When using `FlyteFile` we end up returning something like th Flyte #flytekit

rich-garden-69988

03/17/2022, 2:49 AM

When using `FlyteFile`, we end up returning something like this when we want to upload a file to a specific GCS path. This is fine if you are returning one file, but it gets verbose when working with a handful of output files.

FlyteFile(random_local_path, remote_path=os.path.join(gcs_outdir, os.path.basename(random_local_path)))

Does anyone have recommendations to simplify this? I was thinking a small PR adding an extra parameter to `FlyteFile` could be nice, something like the following, but thought I’d ask here in case anyone has other ideas!

FlyteFile(random_local_path, remote_dir=gcs_outdir)
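Until something like `remote_dir` exists, one way to cut the repetition is a small helper that builds the remote path from a local path and a target directory. This is only a sketch: `remote_path_for` is a hypothetical name, not part of flytekit, and it assumes the remote directory is a `/`-separated URI such as a `gs://` path.

```python
import os
import posixpath


def remote_path_for(local_path: str, remote_dir: str) -> str:
    """Return remote_dir joined with the basename of local_path.

    Hypothetical helper (not a flytekit API). posixpath is used so the
    remote URI always gets "/" separators, whatever the local OS is.
    """
    return posixpath.join(remote_dir, os.path.basename(local_path))


# Intended use inside a task (assumes flytekit is installed):
#   FlyteFile(local_path, remote_path=remote_path_for(local_path, gcs_outdir))
```

With several outputs, each return value then needs only the local path and the shared output directory.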

rich-garden-69988

03/17/2022, 2:50 AM

The above outlines a simple case, but it can get unwieldy when you start defining a temp file inline, like so:

FlyteFile(os.path.join(current_context().working_directory, "message.txt"), remote_path=....)

rich-garden-69988

03/17/2022, 3:01 AM

This example may help show what I think I want, and I’m wondering if anyone else feels the same. I’d be happy to open a PR if we think this would be useful, and any suggestions would be greatly appreciated!

thankful-minister-83577

03/17/2022, 8:11 AM

@rich-garden-69988 why don’t you set the raw output prefix?

🙏 1

thankful-minister-83577

03/17/2022, 8:12 AM

then you shouldn’t have to do any of that.

rich-garden-69988

03/17/2022, 1:34 PM

What if I want it used for some files (permanent) but not for others (temporary file output from a task)? Some tasks may have 5 outputs that are meant to be uploaded to a specific GCS location and 1 output that is temporary (where a random path is fine).
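The mixed case described here can be sketched as a helper that only sets `remote_path` when a permanent directory is given, so temporary outputs fall through to whatever location Flyte picks by default. `flytefile_kwargs` is a hypothetical name, and the sketch assumes the `FlyteFile` constructor accepts `path` and `remote_path` keyword arguments.

```python
import os
import posixpath
from typing import Any, Dict, Optional


def flytefile_kwargs(local_path: str, remote_dir: Optional[str] = None) -> Dict[str, Any]:
    """Build constructor kwargs for FlyteFile (hypothetical helper).

    If remote_dir is given, the file is pinned to remote_dir/<basename>.
    If it is None, remote_path is left unset so the default upload
    location applies.
    """
    kwargs: Dict[str, Any] = {"path": local_path}
    if remote_dir is not None:
        kwargs["remote_path"] = posixpath.join(remote_dir, os.path.basename(local_path))
    return kwargs


# Intended use inside a task (assumes flytekit is installed):
#   log_file = FlyteFile(**flytefile_kwargs(log_path, remote_dir=gcs_outdir))  # permanent
#   scratch = FlyteFile(**flytefile_kwargs(tmp_path))                          # temporary
```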

rich-garden-69988

03/17/2022, 2:23 PM

why don’t you set the raw output prefix?

Could you maybe elaborate on this too? This can be set on the context?

rich-garden-69988

03/18/2022, 4:15 PM

Anyone else have thoughts on this?

thankful-minister-83577

03/21/2022, 12:50 AM

oh oops sorry. i was on vacation last week

thankful-minister-83577

03/21/2022, 12:51 AM

was just answering when i had time.

thankful-minister-83577

03/21/2022, 12:51 AM

ping ketan or eduardo or just repost in the future in the channel

thankful-minister-83577

03/21/2022, 12:52 AM

see https://docs.flyte.org/en/latest/concepts/data_management.html#raw-data-prefix

rich-garden-69988

03/21/2022, 12:52 AM

No apologies necessary @thankful-minister-83577!! Hope you had a nice vacation!! I can try to jump on office hours this week to chat about it with Ketan!

thankful-minister-83577

03/21/2022, 12:53 AM

https://github.com/flyteorg/flyte/blob/master/deployment/eks/flyte_helm_generated.yaml#L501 for the default as set by propeller

thankful-minister-83577

03/21/2022, 12:53 AM

https://github.com/flyteorg/flyte/issues/2070 for how we hope to have this be a project/domain level default in the future instead of just a propeller default

rich-garden-69988

03/21/2022, 12:55 AM

Yeah, I think this is still global. For a single task, I may have some outputs that need to be uploaded to an explicit location, and others where the randomly generated path using `raw_data_prefix` is fine. Does that make sense? I can show you some examples of why I would need something like that (it’s extremely common in bioinformatics, e.g., intermediate files are uploaded to an ephemeral location so the next task can use them, but the logs are kept in a permanent GCS location).

rich-garden-69988

03/21/2022, 12:55 AM

That’s where having `remote_dir` would be handy, but may be best to discuss more in OH

thankful-minister-83577

03/21/2022, 1:00 AM

oh yeah i remember thinking when i first saw this that it might be configurable via environment variables, but I think it’s not actually

thankful-minister-83577

03/21/2022, 1:00 AM

yeah maybe something to discuss in OH

👍 1
