# announcements
k
Hi, I am trying to use lists of FlyteFiles inside dataclasses. It seems that when a dataclass is passed from one task to another, the reference to the remote source is lost. See the following example:
```
from dataclasses import dataclass
from flytekit import task, workflow
from typing import List

from dataclasses_json import dataclass_json
from flytekit.types.file import FlyteFile


@dataclass_json
@dataclass
class InputsContainer:
    files: List[FlyteFile]


@task
def task1(inputs: List[FlyteFile]) -> InputsContainer:
    print("TASK1 remote source: ", inputs[0].remote_source)
    return InputsContainer(files=inputs)


@task
def task2(inputs: InputsContainer) -> None:
    print("TASK2 remote source: ", inputs.files[0].remote_source)


@workflow
def main_workflow(inputs: List[FlyteFile]) -> None:
    task1_outputs = task1(inputs=inputs)
    task2(inputs=task1_outputs)


if __name__ == '__main__':
    file_path = FlyteFile("s3://test-bucket/test.json")
    main_workflow(inputs=[file_path])
```
The output generated is:
```
TASK1 remote source:  s3://test-bucket/test.json
TASK2 remote source:  None
```
Could anyone help me out here? Thanks!
k
cc @Kevin Su @Eduardo Apolinario (eapolinario)
e
@Klemens Kasseroller, can you confirm which version of flytekit you're running?
k
This is really odd, let's try to reproduce it. cc @Samhita Alla?
s
Passing a dataclass that holds a single FlyteFile (no list) works as expected:
```
from dataclasses import dataclass

from dataclasses_json import dataclass_json
from flytekit import task, workflow
from flytekit.types.file import FlyteFile


# Same example as above, but the dataclass holds a single FlyteFile instead of a list.
@dataclass_json
@dataclass
class InputsContainer:
    files: FlyteFile


@task
def task1(inputs: FlyteFile) -> InputsContainer:
    print("TASK1 remote source: ", inputs.remote_source)
    return InputsContainer(files=inputs)


@task
def task2(inputs: InputsContainer) -> None:
    print("TASK2 remote source: ", inputs.files.remote_source)


@workflow
def main_workflow(inputs: FlyteFile) -> None:
    task1_outputs = task1(inputs=inputs)
    task2(inputs=task1_outputs)


if __name__ == '__main__':
    file_path = FlyteFile("s3://test-bucket/test.json")
    main_workflow(inputs=file_path)
```
Output:
```
TASK1 remote source:  s3://test-bucket/test.json
TASK2 remote source:  s3://test-bucket/test.json
```
We haven’t handled serialization of lists of Flyte types yet, though deserialization already covers them.
@Kevin Su, WDYT?
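A minimal sketch of how a list can slip past a serializer, assuming (hypothetically) that the serializer decides whether to apply file handling by checking whether a field's declared type is a subclass of the file class. `File` is a hypothetical stand-in for `FlyteFile`, and `is_file_type_naive` / `contains_file_type` are illustrative helpers, not flytekit code:

```python
from dataclasses import dataclass, fields
from typing import List, get_args


class File:
    """Hypothetical stand-in for FlyteFile, used only for illustration."""


@dataclass
class InputsContainer:
    single: File
    files: List[File]


def is_file_type_naive(tp) -> bool:
    # Matches only a field annotated with the file class itself;
    # typing.List[File] is not a class, so it is skipped.
    return isinstance(tp, type) and issubclass(tp, File)


def contains_file_type(tp) -> bool:
    # Also matches generics such as List[File], at any nesting depth,
    # by recursing into the annotation's type arguments.
    if is_file_type_naive(tp):
        return True
    return any(contains_file_type(arg) for arg in get_args(tp))


for f in fields(InputsContainer):
    print(f.name, is_file_type_naive(f.type), contains_file_type(f.type))
# single True True
# files False True
```

Under this assumption, the `issubclass`-style check only catches the single-file field, which matches the thread's observation that a single FlyteFile works while a list does not.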
k
That’s a miss. cc @Eduardo Apolinario (eapolinario), we were going to generate tests, right?
k
@Eduardo Apolinario (eapolinario) I am using the latest version of flytekit, 1.0.3.
k
Yeah, it looks like we don’t support using a list of Flyte types in a dataclass. I’m fixing it. https://github.com/flyteorg/flytekit/blob/cc2a4e7d6a3763b4905334b87add324159da44e5/flytekit/core/type_engine.py#L331-L360
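The actual fix lives in flytekit's type engine. The sketch below is not that code; it is a hypothetical illustration of the general idea, assuming a simplified `File` stand-in with `path` and `remote_source` and a plain-dict wire format: serialize and deserialize by recursing into lists and nested dataclasses, so a file keeps its remote source wherever it appears.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, List, Optional, get_args, get_origin


@dataclass
class File:
    """Hypothetical stand-in for FlyteFile: a local path plus an optional remote URI."""
    path: str
    remote_source: Optional[str] = None


@dataclass
class InputsContainer:
    files: List[File]


def serialize(value: Any) -> Any:
    # Check File first (it is itself a dataclass), then recurse into
    # nested dataclasses and lists so no File is skipped.
    if isinstance(value, File):
        return {"path": value.path, "remote_source": value.remote_source}
    if is_dataclass(value):
        return {f.name: serialize(getattr(value, f.name)) for f in fields(value)}
    if isinstance(value, list):
        return [serialize(v) for v in value]
    return value


def deserialize(tp: Any, data: Any) -> Any:
    # Rebuild the value from its declared type, mirroring serialize().
    if tp is File:
        return File(**data)
    if is_dataclass(tp):
        return tp(**{f.name: deserialize(f.type, data[f.name]) for f in fields(tp)})
    if get_origin(tp) is list:
        (item_type,) = get_args(tp)
        return [deserialize(item_type, d) for d in data]
    return data


container = InputsContainer(files=[File("test.json", "s3://test-bucket/test.json")])
restored = deserialize(InputsContainer, serialize(container))
print(restored.files[0].remote_source)  # s3://test-bucket/test.json
```

Dicts and other containers are left out for brevity; a fuller version would add matching branches to both functions.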