# flytekit
Greg Gydush
When using `FlyteFile`, we end up returning something like this when we want to upload a file to a specific GCS path. This is fine if you have one file you are returning, but it can get verbose when working with a handful of output files.
```python
FlyteFile(random_local_path, remote_path=os.path.join(gcs_outdir, os.path.basename(random_local_path)))
```
Does anyone have recommendations to simplify this? I was thinking a small PR adding an additional parameter to `FlyteFile` could be nice, but thought I’d ask here in case anyone has other ideas for simplifying it!
```python
FlyteFile(random_local_path, remote_dir=gcs_outdir)
```
The above is a simple case, but it gets unwieldy once you start defining temp files inline, like so:
```python
FlyteFile(os.path.join(current_context().working_directory, "message.txt"), remote_path=....)
```
This example may help show what I think I want, and I’m wondering if anyone else feels the same. I’d be happy to open a PR if we think this would be useful, or if you have suggestions, they’d be greatly appreciated!
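Until something like `remote_dir` exists, the path joining in both snippets can be hidden behind a small helper. This is a minimal sketch, not flytekit API: `remote_path_in_dir` is a hypothetical name, and it only builds the URI you would pass to the existing `remote_path` argument. It takes the basename with `os.path` (the local path is OS-specific) but joins the remote side with `posixpath`, so `gs://` URIs always use `/`.

```python
import os
import posixpath
from typing import Optional


def remote_path_in_dir(local_path: str, remote_dir: str, name: Optional[str] = None) -> str:
    """Build the remote URI for ``local_path`` inside ``remote_dir``.

    Hypothetical helper mimicking the proposed ``remote_dir`` behaviour:
    the file keeps its local basename (or ``name``, if given).
    """
    # The local path is OS-specific, so take its basename with os.path.
    basename = name if name is not None else os.path.basename(local_path)
    # Remote URIs (gs://...) always use "/", so join with posixpath rather
    # than os.path, which would use "\" on Windows.
    return posixpath.join(remote_dir, basename)


# Usage inside a flytekit task (not executed here; assumes flytekit is installed):
#   local = os.path.join(current_context().working_directory, "message.txt")
#   return FlyteFile(local, remote_path=remote_path_in_dir(local, gcs_outdir))
```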
Yee
@Greg Gydush why don’t you set the raw output prefix?
then you shouldn’t have to do any of that.
Greg Gydush
What if I want it used for some files (permanent) but not for others (temporary file outputs from a task)? Some tasks may have 5 outputs that are meant to be uploaded to a specific GCS location, and 1 output that is temporary (where a random path is fine).
> why don’t you set the raw output prefix?
Could you elaborate on this too? Can this be set on the context?
Anyone else have thoughts on this?
Yee
oh oops sorry. i was on vacation last week
was just answering when i had time.
ping ketan or eduardo or just repost in the future in the channel
Greg Gydush
No apologies necessary @Yee!! Hope you had a nice vacation!! I can try to jump on office hours this week to chat about it with Ketan!
See https://github.com/flyteorg/flyte/issues/2070 for how we hope to make this a project/domain-level default in the future, instead of just a propeller default.
Greg Gydush
Yeah, I think this is still global. For a single task, I may have some outputs that need to be uploaded to an explicit location and others where the randomly generated path using `raw_data_prefix` is fine. Does that make sense? I can show you some examples of why I would need something like that (it’s extremely common in bioinfo, e.g., intermediate files are uploaded to an ephemeral location so the next task can use them, but the logs are kept in a permanent GCS location).
That’s where having `remote_dir` would be handy, but it may be best to discuss more in OH (office hours).
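For the mixed case described here (a few permanent outputs at explicit GCS locations, the rest at generated paths), the same idea extends to a per-output plan. Again a minimal sketch with hypothetical names (`plan_remote_paths`, `permanent`): outputs listed as permanent get an explicit URI, and the rest map to `None`. This assumes that passing `remote_path=None` to `FlyteFile` behaves like omitting it, so flytekit falls back to its generated path under the raw output prefix.

```python
import os
import posixpath
from typing import Dict, Iterable, Optional, Set


def plan_remote_paths(
    local_paths: Iterable[str],
    permanent: Set[str],
    remote_dir: str,
) -> Dict[str, Optional[str]]:
    """Map each local output path to the ``remote_path`` to give FlyteFile.

    Hypothetical helper: outputs whose basename is in ``permanent`` get an
    explicit URI under ``remote_dir``; all others map to None, meaning
    "leave remote_path unset and let flytekit generate a path".
    """
    plan: Dict[str, Optional[str]] = {}
    for local in local_paths:
        base = os.path.basename(local)
        if base in permanent:
            # Permanent output (e.g. logs): explicit, predictable location.
            plan[local] = posixpath.join(remote_dir, base)
        else:
            # Ephemeral output: a generated path is fine.
            plan[local] = None
    return plan


# Usage inside a flytekit task (not executed here; assumes flytekit is installed):
#   plan = plan_remote_paths([log_path, tmp_path], {"run.log"}, gcs_logdir)
#   files = [FlyteFile(p, remote_path=r) for p, r in plan.items()]
```

Keying on the basename keeps the call site short; if two permanent outputs share a basename they would collide in `remote_dir`, so a fuller version would need to handle that.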
Yee
oh yeah, i remember thinking when i first saw this that it might be configurable via environment variables, but I don’t think it actually is
yeah maybe something to discuss in OH