loud-belgium-4006
04/24/2023, 12:09 PM
from flytekit import ContainerTask, kwtypes, task, workflow
from flytekit.types.directory import FlyteDirectory

bt = ContainerTask(
    name="basic-test",
    input_data_dir="/var/inputs",
    output_data_dir="/var/outputs",
    inputs=kwtypes(indir=FlyteDirectory),
    outputs=kwtypes(),
    image="ghcr.io/flyteorg/rawcontainers-shell:v2",
    command=[
        "ls",
        "-la",
        "/var/inputs",
    ],
)

@task
def get_dir(dirpath: str) -> FlyteDirectory:
    fd = FlyteDirectory(path=dirpath)
    return fd

@workflow
def wf():
    fd = get_dir(dirpath='s3://my-s3-bucket/cv-in')
    bt(indir=fd)
Running it with pyflyte run --remote, the above produces an empty ls output in the k8s logs. Trying to explicitly pass /var/inputs/indir to ls returns a "No such file or dir" exception for the task. Any help is appreciated as always!
billowy-winter-86593
04/24/2023, 1:28 PM
tall-lock-23197
loud-belgium-4006
05/01/2023, 4:06 AM
container_image parameter of the flytekit.task() decorator (as described in the multiple containers docs). I'd assume that using a regular Task instead of ContainerTask would better handle FlyteDirectory. I'm also assuming that if I build my custom container using FROM ghcr.io/flyteorg/flytekit:py3.8-latest then it should have all the necessary dependencies. I'll be hacking on it moving forward but I just wanted to get your take.
tall-lock-23197
"I'd assume that using a regular Task instead of ContainerTask would better handle FlyteDirectory."
Could you elaborate how?
"I'm also assuming that if I build my custom container using FROM ghcr.io/flyteorg/flytekit:py3.8-latest then it should have all the necessary dependencies."
What are the necessary dependencies?
hallowed-mouse-14616
05/01/2023, 12:01 PM
FlyteDirectory in ContainerTask. Would you mind filing a bug? It should be relatively simple to ensure this support in flytecopilot (to download inputs / upload outputs to / from the executing container).
"Also interested in diving deeper into what direction is the project going in regarding custom container extensibility?"
The goal is complete extensibility. In your use case, I think it is just a matter of choosing the right abstraction within the API. If you plan on processing the data with a Python function, the @task decorator from the flytekit API is obviously the simplest approach, but if it requires something more complex, perhaps ContainerTask is the right abstraction.
loud-belgium-4006
05/01/2023, 10:06 PM
@task-decorated examples from the Working with Folders docs. My dependencies comment is based on a perhaps flawed understanding of the different containers used in Flyte. In my mind, the @task tasks need to be "flyte aware" in some way, as suggested in the workflow lifecycle docs saying "Flyte expects to run in a container that has an entrypoint called pyflyte-execute. This entrypoint is provided when you pip install flytekit". Using raw containers via ContainerTask to me suggests that they don't need any such dependencies. I'm still wrapping my head around the abstractions so this might be way off.
@hallowed-mouse-14616 I've raised the issue and will be more proactive in the future as I get more confident that it's not just user error! As for the extensibility, maybe I'm just trying to make sense of the Task --> PythonTask --> ContainerTask vs Task --> PythonTask --> PythonAutoContainerTask --> PythonFunctionTask dependency paths and which container/dependencies are appropriate in different situations. So basically I'm just trying to get a handle on the different abstractions while still getting some work done instead of reading source code all day 😅
I appreciate the patience from both of you as I get up to speed!