# flyte-support
a
I am using ContainerTasks to process some files and return a FlyteDirectory. Has anyone figured out a way to set the upload path in S3 instead of the randomly generated path?
h
Yes, you can do that when you construct the returned FlyteDirectory. Let me pull up an example.
Actually let me ask on #C06H1SFA19R 🙂
It got it right! 🙂
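from flytekit import task, FlyteDirectory

@task
def process_files() -> FlyteDirectory:
    # Local directory inside the task container holding the files to upload
    local_dir = "/path/to/local/dir"
    # Explicit S3 destination, used instead of the auto-generated raw-data path
    remote_dir = "s3://your-bucket/specific/path/"
    # FlyteDirectory takes the upload target via `remote_directory` (FlyteFile uses `remote_path`)
    return FlyteDirectory(local_dir, remote_directory=remote_dir)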
Needless to say, the Pod running your task needs write access to that S3 path.
t
Do directories work for ContainerTasks? I thought copilot could not handle directory I/O yet.
h
🤦 My bad, I totally glossed over that. Per https://docs.flyte.org/en/latest/user_guide/customizing_dependencies/raw_containers.html raw containers unfortunately do not support FlyteDirectory at the moment. There is a PR @thankful-minister-83577 is reviewing that adds support for FlyteDirectory as an input: https://github.com/flyteorg/flyte/pull/5715. Would you be interested in contributing a similar PR to support returning a directory as an output?
a
My ContainerTask is similar to this:
from typing import List

from flytekit import ContainerTask, kwtypes
from flytekit.types.directory import FlyteDirectory
from flytekit.types.file import FlyteFile

calculate_ellipse_area_shell = ContainerTask(
    name="ellipse-area-metadata-shell",
    # Inputs are materialized here; outputs are collected from here
    input_data_dir="/var/inputs",
    output_data_dir="/var/outputs",
    inputs=kwtypes(files=List[FlyteFile]),
    outputs=kwtypes(output=FlyteDirectory),
    image="ghcr.io/flyteorg/rawcontainers-shell:v1",
    command=[
        "./calculate-ellipse-area.sh",
        "/var/inputs",
        "/var/outputs",
    ],
)
I have multiple files as input and a FlyteDirectory as output. I want to control where the FlyteDirectory is uploaded in Flyte; right now it just goes to a randomly generated path.
d
Regarding whether directories work for ContainerTasks: output is supported, input is not.
h
@astonishing-airport-72525 Have you tried configuring the raw data prefix? In your `pyflyte run --remote` command, you can append `--raw-data-prefix s3://my-custom-bucket/my-custom-prefix/`, which should instruct the system to store outputs in that location instead.
Note that this applies to all tasks within a given execution, not just that one raw container task.
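For scripted launches, the flag placement can be sketched in Python. This is a minimal sketch, assuming `pyflyte` is on the PATH; the helper `build_run_command` and the `workflows.py`/`my_wf` names are hypothetical. Run-level options such as `--remote` and `--raw-data-prefix` go after `run` and before the script path.

```python
import shlex
from typing import List


def build_run_command(script: str, workflow: str, raw_data_prefix: str) -> List[str]:
    """Build a `pyflyte run --remote` argv that overrides the raw data prefix.

    Hypothetical helper for illustration; it is not part of flytekit.
    """
    # This sketch only accepts S3 URIs because the thread targets S3.
    if not raw_data_prefix.startswith("s3://"):
        raise ValueError(f"expected an s3:// URI, got {raw_data_prefix!r}")
    # Add a trailing slash to match the prefix form used in the thread.
    if not raw_data_prefix.endswith("/"):
        raw_data_prefix += "/"
    return [
        "pyflyte", "run", "--remote",
        # Applies to every task in this execution, not just the raw container task.
        "--raw-data-prefix", raw_data_prefix,
        script, workflow,
    ]


if __name__ == "__main__":
    cmd = build_run_command("workflows.py", "my_wf", "s3://my-custom-bucket/my-custom-prefix")
    print(shlex.join(cmd))
```

To actually launch, pass the list to `subprocess.run(cmd, check=True)`. Building an argv list rather than a shell string avoids quoting issues if the prefix ever contains special characters.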
a
I will give this a shot, thanks! Would this work when using a Kubernetes deployment as well?
h
Mind clarifying? How do you create k8s deployments? Or did you mean PodTemplates?
a
I installed the Helm chart, then I register the workflow and trigger it from the web UI. I started looking into the pluginmachinery Go code, and it looks like it always generates a random output prefix when adding the init and sidecar containers. I don't see a way to change it when using raw container tasks.