important-hamburger-34837
02/07/2024, 1:33 PM
proud-answer-87162
02/07/2024, 1:52 PM
Problem: passing the output directly downloads all the files from gcs path, not only the outputs of the current run.
how are you detecting that behavior? do you see it in gcs or flyte logs? can you paste the task signatures you're working with here?
important-hamburger-34837
02/07/2024, 1:58 PM
proud-answer-87162
02/07/2024, 2:02 PM
FlyteFile instead of a FlyteDirectory?
important-hamburger-34837
02/07/2024, 2:03 PM
proud-answer-87162
02/07/2024, 2:06 PM
important-hamburger-34837
02/07/2024, 2:06 PM
important-hamburger-34837
02/07/2024, 2:06 PM
proud-answer-87162
02/07/2024, 2:07 PM
important-hamburger-34837
02/07/2024, 2:11 PM
op1 = task1()  # op1 is a FlyteDirectory with remote_path
op2 = task2(inp=op1)  # passing op1 within the same wf; this should use a temporary artifact
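The behaviour described above can be illustrated without Flyte at all. The following is only a stdlib sketch, assuming the cause is that every run writes to the same fixed remote path and the downstream task then fetches the whole directory. A local temp directory stands in for the bucket, and `produce`, `consume`, and the run ids are hypothetical names, not flytekit API.

```python
import shutil
import tempfile
import uuid
from pathlib import Path


def produce(prefix: Path, run_id: str) -> Path:
    """Write one output file for this run under `prefix` and return the prefix."""
    prefix.mkdir(parents=True, exist_ok=True)
    (prefix / f"result-{run_id}.txt").write_text(run_id)
    return prefix


def consume(prefix: Path) -> list[str]:
    """Copy the whole prefix, as a directory-level download would, and list it."""
    local = Path(tempfile.mkdtemp()) / "download"
    shutil.copytree(prefix, local)
    return sorted(p.name for p in local.iterdir())


bucket = Path(tempfile.mkdtemp())  # stand-in for the remote bucket (assumption)

# Fixed prefix shared by every run: the second consumer sees both runs' files.
shared = bucket / "outputs"
produce(shared, "run1")
shared_seen = consume(produce(shared, "run2"))

# Unique prefix per run: the consumer sees only the current run's file.
unique = bucket / "outputs" / uuid.uuid4().hex
unique_seen = consume(produce(unique, "run3"))

print(shared_seen)
print(unique_seen)
```

If this matches what is happening, giving each run its own output location (or letting Flyte pick one) would keep the downstream task from pulling earlier runs' files.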
important-hamburger-34837
02/07/2024, 2:11 PM
proud-answer-87162
02/07/2024, 2:25 PM
current_context().working_directory in the python flytekit. (and i think this fs behavior is cloud agnostic, so should be the same for gcs. but a flyte contributor might need to confirm that)
important-hamburger-34837
02/07/2024, 2:26 PM
important-hamburger-34837
02/07/2024, 2:28 PM
proud-answer-87162
02/07/2024, 2:34 PM
important-hamburger-34837
02/07/2024, 2:35 PM
proud-answer-87162
02/07/2024, 2:36 PM
FlyteDirectory have a warning about what you are seeing:
This class should not be used on very large datasets, as merely listing the dataset will cause the entire dataset to be downloaded. Listing on S3 and other backend object stores is not consistent and we should not need data to be downloaded to list.
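One way to read the earlier FlyteFile suggestion alongside this warning is to hand the downstream task an explicit list of the files the current run produced, rather than the directory as a whole. The sketch below shows that idea using only the standard library. `write_outputs` and `fetch` are hypothetical helpers and a temp directory stands in for the shared remote path, so this is an illustration of the approach, not flytekit code.

```python
import shutil
import tempfile
from pathlib import Path


def write_outputs(prefix: Path, names: list[str]) -> list[Path]:
    """Write this run's files under `prefix` and return exactly those paths."""
    prefix.mkdir(parents=True, exist_ok=True)
    written = []
    for name in names:
        path = prefix / name
        path.write_text(name)
        written.append(path)
    return written


def fetch(files: list[Path]) -> list[str]:
    """Copy only the listed files; nothing else under the prefix is read."""
    local = Path(tempfile.mkdtemp())
    for src in files:
        shutil.copy(src, local / src.name)
    return sorted(p.name for p in local.iterdir())


prefix = Path(tempfile.mkdtemp()) / "outputs"  # stand-in for a shared remote path
write_outputs(prefix, ["old-a.txt", "old-b.txt"])  # left over from an earlier run
current = write_outputs(prefix, ["new.txt"])       # the current run

fetched = fetch(current)
total_under_prefix = len(list(prefix.iterdir()))
print(fetched, total_under_prefix)
```

The consumer never lists the prefix, so files left there by earlier runs are not copied even though they still exist.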
important-hamburger-34837
02/07/2024, 2:37 PM