# ask-the-community
m
Are there any examples of using ContainerTasks with FlyteFile and FlyteDirectory? I’m trying to use a ContainerTask that has a FlyteFile as an input and a directory as the output, but I’m struggling to make anything work. Any pointers?
n
this should work! do you have code you can share to help debug? what error are you getting?
I’m guessing you’ve seen this page? https://docs.flyte.org/projects/cookbook/en/latest/auto/core/containerization/raw_container.html#container-tasks Are you using python in your container task or another language?
m
copilot-downloader-output
So I think the issue is to do with how I’m defining my File inputs and Directory outputs. I’ve got the example you shared working with no issues, however when using a FlyteFile it seems to struggle pulling the file in.
train_task_container = ContainerTask(
    name="regression_container",
    input_data_dir="/var/inputs",
    output_data_dir="/var/outputs",
    inputs=kwtypes(x=FlyteFile),
    outputs=kwtypes(output=FlyteFile),
    image="pachyderm/housing-prices:1.11.0",
    command=[
        "python",
        "regression.py",
        "--input",
        "/var/inputs",
        "--target-col",
        "MEDV",
        "--output",
        "/var/outputs/output",
    ],
)
This is how the step is defined, so it’s running a python script but failing because no file is picked up. I’ve attached the copilot downloader output, which seems to show this as well. Again, this works fine when using python primitive types for inputs and outputs.
n
what does your script look like?
also, can you share the workflow that uses the `train_task_container` task?
m
Yep, the script is from the pachyderm examples and works correctly. Its output shows it found no input CSV, hence no output. I’ve ssh’ed into the relevant pod, and `/var/inputs/` is empty.
And the workflow:
Untitled.py
I’m basically comparing the ShellTask to the ContainerTask, ShellTask is working fine with the same script
n
so it looks like `--input /var/inputs` might be the issue here: can you use the templating syntax `--input {{.inputs.x}}` in `train_task_container` instead? Basically the current code points to the `/var/inputs` directory whereas `{{.inputs.x}}` will inject the correct filepath
m
Thanks for looking at this 🙏
Getting the following error now
Pod failed. No message received from kubernetes.
[flyte-copilot-downloader] terminated with ExitCode 0.
[az6rsxswjlzql79xl2bv-n2-0] terminated with exit code (1). Reason [Error]. Message: 
Traceback (most recent call last):
  File "regression.py", line 101, in <module>
    main()
  File "regression.py", line 79, in main
    print("Datasets: {}".format(input_files))
UnboundLocalError: local variable 'input_files' referenced before assignment
.
[flyte-copilot-sidecar] terminated with ExitCode 0.
And the command executing in container
python regression.py --input s3://adarga-ds-lab-2-flyte-data/data/2c/a9j9fgb4l89j8h6t9gfr-n0-0/238d789391a50d97f62e7fcf20e9d42c/boston_housing.csv --target-col MEDV --output /var/outputs/output
So obviously the S3 URI is being passed in instead of a local file path
j
@Michael Tinsley did you ever find a solution to this? I see the same behavior and the only workaround I can think of is to have the task read the file from S3 itself, but that obviously defeats the purpose of Flyte managing the glue
m
No I didn’t… I got a bit sidetracked - I’ve been meaning to open a GH issue, I’m kind of glad someone else has experienced the same tbh
To add, for my use case I can use the ShellTask instead of a ContainerTask, but I was just trying to compare them. For us, it’s a question of which is quicker for migrating old pipelines over to Flyte, but there’s not much in it, we just need to add the awscli and flytekit deps
j
yeah makes sense, I'm doing similar exploration where in our case we are migrating lots of existing R code
i'll continue down this path and let you know if i find a solution
I found a solution. For `ContainerTask`, with a `myinput=FlyteFile` input variable and the input directory configured as `/var/inputs`:
`/var/inputs/myinput` = the file
`{{.inputs.myinput}}` = the remote file URI
Your original attempt was `--input /var/inputs`, which wasn't right; it should have been `--input /var/inputs/x`
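Applied to the task from earlier in the thread, that would look roughly like the sketch below. This is untested against a real cluster and assumes the copilot downloader places each input at `<input_data_dir>/<variable name>`, as observed above. The helper names `local_input_path`, `build_command` and `make_task` are made up for illustration; the flytekit import is kept inside `make_task` so the path logic can be checked without flytekit installed.

```python
import posixpath


def local_input_path(input_data_dir: str, var_name: str) -> str:
    """Path inside the container where the input named var_name is expected.

    Assumption from this thread: the file lands in a file named after the
    input variable, directly under input_data_dir.
    """
    return posixpath.join(input_data_dir, var_name)


def build_command(input_data_dir: str, output_data_dir: str) -> list:
    """Command for regression.py, pointing --input at the file, not the directory."""
    return [
        "python",
        "regression.py",
        "--input",
        local_input_path(input_data_dir, "x"),  # /var/inputs/x, not /var/inputs
        "--target-col",
        "MEDV",
        "--output",
        posixpath.join(output_data_dir, "output"),
    ]


def make_task():
    """Build the ContainerTask; requires flytekit to be installed."""
    from flytekit import ContainerTask, kwtypes
    from flytekit.types.file import FlyteFile

    return ContainerTask(
        name="regression_container",
        input_data_dir="/var/inputs",
        output_data_dir="/var/outputs",
        inputs=kwtypes(x=FlyteFile),
        outputs=kwtypes(output=FlyteFile),
        image="pachyderm/housing-prices:1.11.0",
        command=build_command("/var/inputs", "/var/outputs"),
    )
```

The only change from the original definition is the `--input` argument; `{{.inputs.x}}` is deliberately not used, since it expands to the remote URI.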
m
Ah great, thanks for working that out - I’ll give it a go today
That works for me when using python primitives, but anything that needs downloading fails. Looking more into this, it looks like it’s the init container that is failing to download from S3, but it’s returning a 0 exit code, which it probably shouldn’t 🤷
l
I'm also trying to do the same thing, i.e., get a remote file into a ContainerTask, but still haven't figured out how. It seems the directory `/tmp/inputs` is mounted into `input_data_dir="/var/inputs"` of the docker container, but I can't figure out how to download the file into `/tmp/inputs` with a ContainerTask.
j
for me they were automatically downloaded into a file named after the variable, so `/var/inputs/myvar`
l
Hey, that worked. It worked for a publicly accessible file; now other people in my company may be able to help figure out how to download internal files on the cloud (but at least the log indicates it tried to download, so that's progress). Thanks!