# flyte-support
m
first time flyte user here. I have a ContainerTask which pulls an image from AWS ECR and injects inputs like an AWS S3 URI into the command, which the task needs in order to run. My docker image has a subprocess command that calls AWS S3 to download those URI file inputs, but for some reason it raises CalledProcessError for the command: returned non-zero exit status 1. Is there a better way to download input files from S3 into a Flyte ContainerTask at runtime and also upload output back to S3? Can someone please provide guidance on how I can debug and move ahead with this problem?
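For context, the setup described probably looks roughly like the sketch below; the task name, ECR image reference, and entrypoint are hypothetical, with the S3 URI passed as a plain string and the image shelling out to aws s3 cp:
# Hypothetical reconstruction of the setup described above, not the actual code.
from flytekit import ContainerTask, kwtypes

process_from_s3 = ContainerTask(
    name="process-from-s3",                       # hypothetical task name
    input_data_dir="/var/inputs",
    inputs=kwtypes(s3_uri=str),                   # S3 URI injected as a raw string
    image="<account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>",  # ECR image placeholder
    command=[
        "python",
        "entrypoint.py",                          # inside the image, this runs something like
        "--s3-uri",                               # subprocess.run(["aws", "s3", "cp", s3_uri, "."], check=True)
        "{{.inputs.s3_uri}}",                     # raw-container input interpolation
    ],
)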
d
did you put the AWS S3 URI in a FlyteFile?
m
yes, FlyteFile is my data-type for s3 uri
Can you please share a code snippet where an S3 file URI is passed as input to a task, which can then download the file, do some processing on it, and upload the result back to S3?
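A minimal sketch of that pattern, assuming a plain Python @task rather than a ContainerTask; the bucket names, paths, and the "processing" step are hypothetical:
from flytekit import task, workflow
from flytekit.types.file import FlyteFile


@task
def process_s3_file(infile: FlyteFile) -> FlyteFile:
    local_path = infile.download()              # Flyte pulls the s3:// object to a local temp path
    out_path = "/tmp/processed.txt"             # hypothetical output location
    with open(local_path) as src, open(out_path, "w") as dst:
        dst.write(src.read().upper())           # placeholder for the real processing step
    # Returning a FlyteFile makes Flyte upload the result; remote_path pins it
    # to a specific S3 location instead of the default blob-store prefix.
    return FlyteFile(path=out_path, remote_path="s3://my-bucket/outputs/processed.txt")


@workflow
def wf(uri: FlyteFile) -> FlyteFile:
    # At launch time, pass uri="s3://my-bucket/inputs/raw.txt" (or any URI the
    # cluster's service role can read).
    return process_s3_file(infile=uri)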
d
I'm wondering if there's some permission error
did you use attr access?
m
the error message would have been different in that case I guess, like access denied or something in the CalledProcessError
no, i haven't used attr access. I am not aware of it
d
does it work in local execution?
m
the process is too big and cannot run locally, so I have to push it to the hosted Flyte instance to see
i tested this setup once on EC2 and it was working fine there
should i download the file first and then try to send it via FlyteFile instead of sending the AWS S3 URI?
d
maybe we have to give the containertask more resources?
putting the aws s3 uri in a FlyteFile is enough
m
it has that already specified from pre-tested benchmarks.
d
how many resources did you give it, especially memory?
m
cpu 4, mem 32Gi, nvidia.com/gpu: 1
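For reference, a sketch of how that request might be declared on a Python @task; ContainerTask accepts similar requests/limits arguments, though the exact kwargs may vary by flytekit version:
from flytekit import Resources, task


@task(
    requests=Resources(cpu="4", mem="32Gi", gpu="1"),
    limits=Resources(cpu="4", mem="32Gi", gpu="1"),
)
def heavy_step() -> None:
    ...                                          # placeholder body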
d
so we have 3 containers to execute the container task
1 to download the input, 1 to execute the code, 1 to upload the output
2 of them are sidecars
do you know which container failed?
m
nope, actually it's just one single container task which is supposed to do all of that in my case, as I see start -> container_task -> end
I plan to try removing the download step inside the container task by somehow feeding the input file directly to the task. Let me try that and get back! It may clear up doubts about access, network issues, etc.
c
@microscopic-animal-17045, are those s3 files in a bucket that your container has access to? If so, you can specify those files as inputs to the ContainerTask, something like:
from flytekit import ContainerTask, kwtypes
from flytekit.types.file import FlyteFile

consume_files = ContainerTask(
    name="my-name",
    input_data_dir="/var/inputs",
    inputs=kwtypes(file1=FlyteFile, file2=FlyteFile),
    image="<your-image>",
    command=[...],
)
and invoke consume_files specifying file1 and file2. Flyte will make sure that those files show up in /var/inputs/file1 and /var/inputs/file2 respectively before your code runs.
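A sketch of that invocation from a workflow; the workflow name is hypothetical, and the s3:// URIs get passed as the FlyteFile inputs at launch time:
from flytekit import workflow
from flytekit.types.file import FlyteFile


@workflow
def consume_wf(file1: FlyteFile, file2: FlyteFile) -> None:
    # consume_files is the ContainerTask defined above. Flyte downloads each
    # FlyteFile into /var/inputs/<input-name> inside the raw container before
    # the command runs.
    consume_files(file1=file1, file2=file2)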
m
hey @high-accountant-32689, thank you for the suggestion, the above issue has been resolved.
h
@microscopic-animal-17045, can you share what was the final resolution?
m
sure, i switched to @task methods and used subprocess commands along with the @vscode decorator to debug failure cases. I leveraged the FlyteFile download() method and direct aws s3 client commands to copy results back to S3. I had no luck downloading a FlyteFile inside a ContainerTask, but when i switched to ShellTask it worked, and later i moved to just @task, since ShellTask also has limitations on its resource allocation attributes. My workflow required the gpu and memory to be pre-defined, so ultimately the basics helped me after all those attempts.
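A rough sketch of that final approach, with hypothetical paths, bucket, and processing command; @vscode is assumed to come from the flytekitplugins-flyin plugin and can be dropped if it isn't installed:
import subprocess

from flytekit import Resources, task
from flytekit.types.file import FlyteFile
from flytekitplugins.flyin import vscode


@task(requests=Resources(cpu="4", mem="32Gi", gpu="1"))
@vscode  # spins up a VSCode server in the task pod for interactive debugging
def run_pipeline(infile: FlyteFile) -> str:
    local_input = infile.download()                              # FlyteFile download()
    subprocess.run(
        ["python", "heavy_job.py", local_input, "-o", "/tmp/result.bin"],  # hypothetical processing step
        check=True,
    )
    dest = "s3://my-bucket/outputs/result.bin"                   # hypothetical destination
    subprocess.run(["aws", "s3", "cp", "/tmp/result.bin", dest], check=True)  # direct aws s3 copy back
    return dest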
Now i am stuck getting streaming logs out of those subprocesses to show progress to my end users in the app. At present i just have a poll every 60 seconds showing the run status, and i am looking to somehow get hold of the logfile my subprocess is writing inside that container
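Not a Flyte-specific answer, but one generic way to surface progress is to stream the child process's output line by line into the task's own stdout so it lands in the pod logs; a minimal sketch:
import subprocess
import sys


def run_streaming(cmd: list[str]) -> None:
    # Launch the child with a pipe and forward its output as it is produced,
    # instead of waiting for the process to finish.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    assert proc.stdout is not None
    for line in proc.stdout:          # yields lines as the child writes them
        sys.stdout.write(line)        # forward to the task's stdout, i.e. the pod logs
        sys.stdout.flush()
    ret = proc.wait()
    if ret != 0:
        raise subprocess.CalledProcessError(ret, cmd)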