Klemens Kasseroller

06/08/2022, 12:33 PM
Hi, I am trying to use lists of FlyteFiles inside dataclasses. It seems that when the dataclass is passed from one task to another, the reference to the remote source is lost. See the following example:
from dataclasses import dataclass
from flytekit import task, workflow
from typing import List

from dataclasses_json import dataclass_json
from flytekit.types.file import FlyteFile


@dataclass_json
@dataclass
class InputsContainer:
    files: List[FlyteFile]


@task
def task1(inputs: List[FlyteFile]) -> InputsContainer:
    print("TASK1 remote source: ", inputs[0].remote_source)
    return InputsContainer(files=inputs)


@task
def task2(inputs: InputsContainer) -> None:
    print("TASK2 remote source: ", inputs.files[0].remote_source)


@workflow
def main_workflow(inputs: List[FlyteFile]) -> None:
    task1_outputs = task1(inputs=inputs)
    task2(inputs=task1_outputs)


if __name__ == '__main__':
    file_path = FlyteFile("s3://test-bucket/test.json")
    main_workflow(inputs=[file_path])
The output generated is:
TASK1 remote source:  s3://test-bucket/test.json
TASK2 remote source:  None
Could anyone help me out here? Thanks!

katrina

06/08/2022, 3:59 PM
cc @Kevin Su @Eduardo Apolinario (eapolinario)

Eduardo Apolinario (eapolinario)

06/09/2022, 12:11 AM
@Klemens Kasseroller, can you confirm which version of flytekit you're running?

Ketan (kumare3)

06/09/2022, 4:38 AM
This is really odd, let's try to reproduce it - cc @Samhita Alla ?

Samhita Alla

06/09/2022, 5:00 AM
FlyteFile -> Dataclass conversion works as expected without a list.
@dataclass_json
@dataclass
class InputsContainer:
    files: FlyteFile


@task
def task1(inputs: FlyteFile) -> InputsContainer:
    print("TASK1 remote source: ", inputs.remote_source)
    return InputsContainer(files=inputs)


@task
def task2(inputs: InputsContainer) -> None:
    print("TASK2 remote source: ", inputs.files.remote_source)


@workflow
def main_workflow(inputs: FlyteFile) -> None:
    task1_outputs = task1(inputs=inputs)
    task2(inputs=task1_outputs)


if __name__ == '__main__':
    file_path = FlyteFile("s3://test-bucket/test.json")
    main_workflow(inputs=file_path)
Output:
TASK1 remote source:  s3://test-bucket/test.json
TASK2 remote source:  s3://test-bucket/test.json
We haven’t really handled serialization of lists of Flyte types inside a dataclass, though deserialization has it covered.
@Kevin Su, WDYT?
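To make the gap concrete, here is a toy model in plain Python, not flytekit's actual code. `RemoteFile`, `to_literal`, `from_literal`, and the local path `/tmp/test.json` are all hypothetical stand-ins. The sketch assumes that a file keeps its remote reference only if the serializer writes out the remote URI, and that a serializer which does not walk into lists falls back to dumping each element's local path.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class RemoteFile:
    # Hypothetical stand-in for FlyteFile: a local path plus an optional remote URI.
    path: str
    remote_source: Optional[str] = None


def to_literal(value: Any, recurse_into_lists: bool) -> Any:
    """Convert a value into plain JSON-like data.

    A RemoteFile is written out as its remote URI when it has one.
    Lists are only walked when recurse_into_lists is True; otherwise each
    element is dumped naively with its local path (the assumed failure mode).
    """
    if isinstance(value, RemoteFile):
        return {"path": value.remote_source or value.path}
    if isinstance(value, list):
        if recurse_into_lists:
            return [to_literal(item, True) for item in value]
        return [{"path": item.path} for item in value]
    return value


def from_literal(data: dict) -> RemoteFile:
    """Rebuild a RemoteFile; only a URI-looking path yields a remote source."""
    path = data["path"]
    return RemoteFile(path=path, remote_source=path if "://" in path else None)


original: List[RemoteFile] = [
    RemoteFile(path="/tmp/test.json", remote_source="s3://test-bucket/test.json")
]

# Serializer that skips lists: the remote reference is gone after the round trip.
lossy = [from_literal(d) for d in to_literal(original, recurse_into_lists=False)]
# Serializer that walks lists: the remote reference survives.
kept = [from_literal(d) for d in to_literal(original, recurse_into_lists=True)]

print("without list handling:", lossy[0].remote_source)
print("with list handling:   ", kept[0].remote_source)
```

In this model the deserializer is identical in both runs. Only the serializer's handling of lists changes the result, which matches the symptom reported above.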

Ketan (kumare3)

06/09/2022, 6:10 AM
That’s a miss. Cc @Eduardo Apolinario (eapolinario), we were going to generate tests, right?

Klemens Kasseroller

06/09/2022, 9:04 AM
@Eduardo Apolinario (eapolinario) I am using the latest version of flytekit - 1.0.3

Kevin Su

06/09/2022, 9:07 AM
Yeah, it looks like we don’t support using a list of Flyte types in a dataclass. I’m fixing it. https://github.com/flyteorg/flytekit/blob/cc2a4e7d6a3763b4905334b87add324159da44e5/flytekit/core/type_engine.py#L331-L360