Greg Gydush

03/17/2022, 2:49 AM
When using FlyteFile, we end up returning something like this when we want to upload a file to a specific GCS path. This is fine if you are returning one file, but it gets verbose when working with a handful of output files.
FlyteFile(random_local_path, remote_path=os.path.join(gcs_outdir, os.path.basename(random_local_path)))
Does anyone have recommendations to simplify this? I was thinking a small PR adding an additional parameter to FlyteFile could be nice, but thought I’d ask here in case anyone has other ideas to simplify!
FlyteFile(random_local_path, remote_dir=gcs_outdir)
Above outlines a simple case, but it can get unwieldy when you start thinking about defining tempfile in-line like so:
FlyteFile(os.path.join(current_context().working_directory, "message.txt"), remote_path=....)
This example may help show what I think I want, and I'm wondering if anyone else feels the same. I’d be happy to PR if we think this would be useful - or if you have suggestions they’d be greatly appreciated!
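Until something like remote_dir exists, the repeated join/basename call could be factored into a small helper. This is a minimal sketch, not part of flytekit: the name remote_path_in and the example bucket are hypothetical, and it only builds the string that would be passed as remote_path.

```python
import os
import posixpath


def remote_path_in(local_path: str, remote_dir: str) -> str:
    """Return remote_dir joined with the file name of local_path.

    Hypothetical helper (not a flytekit API); it only builds the string.
    """
    # os.path.basename handles the local path on the host OS;
    # posixpath.join keeps forward slashes in the remote URI.
    return posixpath.join(remote_dir, os.path.basename(local_path))


# Assumed usage inside a task (requires flytekit):
# from flytekit.types.file import FlyteFile
# FlyteFile(local_path, remote_path=remote_path_in(local_path, gcs_outdir))
```

The call site then shrinks to one short expression per output file, which is close to what the proposed remote_dir parameter would give.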

Yee

03/17/2022, 8:11 AM
@Greg Gydush why don’t you set the raw output prefix?
then you shouldn’t have to do any of that.

Greg Gydush

03/17/2022, 1:34 PM
What if I want it used for some files (permanent) but not for others (temporary file output from a task)? Some tasks may have 5 outputs that are meant to be uploaded to a specific GCS location and 1 output that is temporary (where a random path is fine).
"why don’t you set the raw output prefix?"
Could you maybe elaborate on this too? This can be set on the context?
Anyone else have thoughts on this?

Yee

03/21/2022, 12:50 AM
oh oops sorry. i was on vacation last week
was just answering when i had time.
ping ketan or eduardo or just repost in the future in the channel

Greg Gydush

03/21/2022, 12:52 AM
No apologies necessary @Yee!! Hope you had a nice vacation!! I can try to jump on office hours this week to chat about it with Ketan!
https://github.com/flyteorg/flyte/issues/2070 for how we hope to have this be a project/domain level default in the future instead of just a propeller default

Greg Gydush

03/21/2022, 12:55 AM
Yeah, I think this is still global. For a single task, I may have some outputs that need to be uploaded to an explicit location, and others where the randomly generated path using raw_data_prefix is fine. Does that make sense? I can show you some examples of why I would need something like that (it’s extremely common in bioinfo, e.g., intermediate files are uploaded to an ephemeral location so the next task can use them, but the logs are kept in a permanent GCS location)
That’s where having remote_dir would be handy, but it may be best to discuss more in OH
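The mixed case described here (some outputs pinned to a permanent location, others left to the default random path) could be sketched as below. This is an illustration under assumptions, not a flytekit feature: plan_remote_paths, the file names, and the bucket are all hypothetical, and the function only decides which outputs get an explicit remote path.

```python
import os
import posixpath
from typing import Dict, Iterable, Optional, Set


def plan_remote_paths(
    local_paths: Iterable[str], permanent_dir: str, permanent_names: Set[str]
) -> Dict[str, Optional[str]]:
    """Map each local path to an explicit remote path, or None for the default.

    Hypothetical helper: files whose names are in permanent_names go under
    permanent_dir; everything else is left to the raw output prefix.
    """
    plan: Dict[str, Optional[str]] = {}
    for path in local_paths:
        name = os.path.basename(path)
        if name in permanent_names:
            plan[path] = posixpath.join(permanent_dir, name)
        else:
            plan[path] = None  # no explicit location requested
    return plan


# Assumed usage inside a task (requires flytekit):
# from flytekit.types.file import FlyteFile
# outputs = [FlyteFile(p, remote_path=r) if r else FlyteFile(p)
#            for p, r in plan_remote_paths(paths, gcs_logs_dir, {"run.log"}).items()]
```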

Yee

03/21/2022, 1:00 AM
oh yeah i remember thinking when i first saw this that it might be configurable via environment variables, but I think it’s not actually
yeah maybe something to discuss in OH