# ask-the-community
p
Hey team, I was wondering if anyone would be so kind as to show me how to pass a FlyteDirectory to a ContainerTask? There was a related thread a few months ago, but I've been trying this from every angle and can't get the objects/files to show up in the container. Here's how to reproduce:
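from flytekit import ContainerTask, kwtypes, task, workflow
from flytekit.types.directory import FlyteDirectory

bt = ContainerTask(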
    name="basic-test",
    input_data_dir="/var/inputs",
    output_data_dir="/var/outputs",
    inputs=kwtypes(indir=FlyteDirectory),
    outputs=kwtypes(),
    image="ghcr.io/flyteorg/rawcontainers-shell:v2",
    command=[
        "ls",
        "-la",
        "/var/inputs",
    ],
)

@task
def get_dir(dirpath: str) -> FlyteDirectory:
    fd = FlyteDirectory(path=dirpath)
    return fd

@workflow
def wf():
    fd = get_dir(dirpath='s3://my-s3-bucket/cv-in')
    bt(indir=fd)
Running it with `pyflyte run --remote`, the above produces an empty `ls` output in the k8s logs. Trying to explicitly pass `/var/inputs/indir` to `ls` returns a "No such file or dir" exception for the task. Any help is appreciated as always!
e
I would also like to know how to do this!
s
Hey @Dan Rammer (hamersaw)! Do we have a fix for this? https://flyte-org.slack.com/archives/CP2HDHKE1/p1674731887002389
p
@Samhita Alla just curious if maybe this use case is slightly out of scope? Or more generally, what direction is the project going in regarding custom container extensibility? While looking for a workaround, I'm exploring the `container_image` parameter of the `flytekit.task()` decorator (as described in the multiple containers docs). I'd assume that using a regular Task instead of ContainerTask would better handle FlyteDirectory. I'm also assuming that if I build my custom container using `FROM ghcr.io/flyteorg/flytekit:py3.8-latest` then it should have all the necessary dependencies. I'll be hacking on it moving forward but I just wanted to get your take.
s
"I'd assume that using a regular Task instead of ContainerTask would better handle FlyteDirectory."
Could you elaborate on how?
"I'm also assuming that if I build my custom container using `FROM ghcr.io/flyteorg/flytekit:py3.8-latest` then it should have all the necessary dependencies."
What are the necessary dependencies?
d
@Pryce your linked example should work. It seems like there may be a bug in the support for `FlyteDirectory` in `ContainerTask`. Would you mind filing a bug? It should be relatively simple to ensure this support in flytecopilot (to download inputs to / upload outputs from the executing container). Also interested in diving deeper into "what direction is the project going in regarding custom container extensibility?". The goal is complete extensibility. In your use case, I think it's just a matter of choosing the right abstraction within the API. If you plan on processing the data with a Python function, the `@task` decorator from the flytekit API is obviously the simplest approach, but if it requires something more complex, perhaps `ContainerTask` is the right abstraction.
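For anyone following along, here is a rough local simulation of the staging behavior the original example expects: the directory input ends up at `<input_data_dir>/<input name>` (so `/var/inputs/indir` in the thread). This is only a sketch using the standard library; `stage_directory_input` and `demo` are hypothetical names, the temp directories stand in for the S3 bucket and the container filesystem, and none of it is flytecopilot's actual implementation.

```python
import shutil
import tempfile
from pathlib import Path


def stage_directory_input(source_dir: Path, input_data_dir: Path, input_name: str) -> Path:
    """Copy source_dir to <input_data_dir>/<input_name> (hypothetical stand-in for the download step)."""
    target = input_data_dir / input_name
    shutil.copytree(source_dir, target)
    return target


def demo() -> list:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)

        # Stand-in for the remote directory (s3://my-s3-bucket/cv-in in the thread).
        source = root / "cv-in"
        source.mkdir()
        (source / "a.txt").write_text("a")
        (source / "b.txt").write_text("b")

        # Stand-in for the container's input_data_dir (/var/inputs in the thread).
        input_data_dir = root / "var" / "inputs"
        input_data_dir.mkdir(parents=True)

        # The input is named "indir", matching kwtypes(indir=FlyteDirectory).
        staged = stage_directory_input(source, input_data_dir, "indir")

        # Roughly what `ls /var/inputs/indir` would be expected to show.
        return sorted(p.name for p in staged.iterdir())


if __name__ == "__main__":
    print(demo())
```

If the real staging step behaved like this, the `ls` in the reproduction would list the files under `/var/inputs/indir` instead of coming back empty.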
p
@Samhita Alla I'm basing my assumptions on the fact that FlyteDirectories work without issue in the `@task`-decorated examples from the Working with Folders docs. My dependencies comment is based on a perhaps flawed understanding of the different containers used in Flyte. In my mind, the `@task` tasks need to be "flyte aware" in some way, as suggested by the workflow lifecycle docs: "Flyte expects to run in a container that has an entrypoint called `pyflyte-execute`. This entrypoint is provided when you `pip install flytekit`". Using raw containers via ContainerTask suggests to me that they don't need any such dependencies. I'm still wrapping my head around the abstractions, so this might be way off.
@Dan Rammer (hamersaw) I've raised the issue and will be more proactive in the future as I get more confident that it's not just user error! As for the extensibility, maybe I'm just trying to make sense of the `Task --> PythonTask --> ContainerTask` vs `Task --> PythonTask --> PythonAutoContainerTask --> PythonFunctionTask` dependency paths and which container/dependencies are appropriate in different situations. So basically I'm just trying to get a handle on the different abstractions while still getting some work done instead of reading source code all day 😅 I appreciate the patience from both of you as I get up to speed!
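To make the two chains in this message a bit more concrete, here is a toy sketch with empty stand-in classes. They only borrow the class names from the message and are not flytekit's real classes, which may have additional bases and a lot more behavior; `chain` is a hypothetical helper. The point it illustrates is that `ContainerTask` does not pass through `PythonAutoContainerTask`, the branch whose tasks rely on the `pyflyte-execute` entrypoint mentioned above.

```python
# Stand-in classes that mirror only the names and ordering from the message.
class Task:
    """Stand-in for the common base."""


class PythonTask(Task):
    """Stand-in for the shared Python-side base."""


class ContainerTask(PythonTask):
    """Stand-in for the raw-container branch (arbitrary image and command)."""


class PythonAutoContainerTask(PythonTask):
    """Stand-in for the branch that runs Python code inside a flytekit-aware image."""


class PythonFunctionTask(PythonAutoContainerTask):
    """Stand-in for what the @task decorator produces."""


def chain(cls: type) -> str:
    """Render a class's ancestry from base to leaf, skipping `object`."""
    names = [c.__name__ for c in cls.__mro__ if c is not object]
    return " --> ".join(reversed(names))


if __name__ == "__main__":
    print(chain(ContainerTask))
    print(chain(PythonFunctionTask))
    # The two branches only share PythonTask and Task.
    print(issubclass(ContainerTask, PythonAutoContainerTask))
```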