Hi everyone, I'm wondering if anyone has the same need...
# ask-the-community
q
Hi everyone, I'm wondering if anyone has the same need as me: I'm prototyping my code in a notebook, so in this context I'm trying to run "flyte-like" code interactively, without pyflyte. But to have the input data at hand in my notebook I have to download the files (or directories) manually (e.g. using `s3fs`). I tried to use `FlyteFile("s3://....")` (or `FlyteDirectory`) and the `.download()` method but ... nothing happens (which is not what I expected haha). Reading through the source code, I guess there is a clever way to leverage the machinery of flytekit to properly initialize a `FlyteFile` or `FlyteDirectory` in the context of a notebook (maybe using `flytekit.current_context()`?), but I'm a bit lost. Is there a clever way to prototype code in a notebook using native Flyte objects? Am I the only one trying to work that way with Flyte? Otherwise, how do you prototype code?
k
I just made it possible to download data. You could always download data using Flyte remote. Look at `pyflyte fetch`, and if you have the latest Flyte console, look at the data URIs in the inputs and outputs tabs.
This is absolutely a fair use case.
q
Thanks @Ketan (kumare3) I'll have a look at it and get back to you if I have questions 🙂👍
> you could always download data using Flyte remote.

Using `FlyteRemote`, I'm able to do something like that:
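```python
from tempfile import mktemp

from flytekit.configuration import Config
from flytekit.remote import FlyteRemote

# Config.auto() builds the config from a Flyte config file and environment variables.
flyteremote = FlyteRemote(config=Config.auto())
uri = "s3://..."
tmpfile = mktemp()  # local path the remote object is downloaded to
flyteremote.file_access.get_data(local_path=tmpfile, remote_path=uri)
```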
That's better since it's cloud-agnostic: I'm not using `s3fs`.
But is there a way to initialize a `FlyteFile` or a `FlyteDirectory` the same way Flyte does inside a task (i.e. downloading the remote data locally and giving a pointer to it)?
Haha actually that's pretty simple:
```python
from tempfile import mkdtemp, mktemp

from flytekit.configuration import Config
from flytekit.remote import FlyteRemote
from flytekit.types.directory import FlyteDirectory
from flytekit.types.file import FlyteFile


def download_flyte_directory(uri):
    # Download the remote directory into a fresh local temp directory.
    flyteremote = FlyteRemote(config=Config.auto())
    tmp_directory = mkdtemp()
    flyteremote.file_access.download_directory(local_path=tmp_directory, remote_path=uri)
    return FlyteDirectory(tmp_directory)


def download_flyte_file(uri):
    # Download the remote file to a local temp path.
    flyteremote = FlyteRemote(config=Config.auto())
    tmp_file = mktemp()
    flyteremote.file_access.download(local_path=tmp_file, remote_path=uri)
    return FlyteFile(tmp_file)


my_flyte_file = download_flyte_file("s3://this_is_a_file.txt")
my_flyte_directory = download_flyte_directory("s3://this_is_a_directory/")
```
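A possible refinement of the file helper, sketched under a couple of assumptions: `tempfile.mktemp()` is deprecated in the Python standard library (another process can create the file between name generation and use), and the random temp name drops the original file name and extension. The variant below downloads into a fresh `mkdtemp()` directory and keeps the remote file name. The names `local_target_for` and `download_flyte_file_keep_name` are hypothetical, not part of flytekit; the `get_data` call is the same one used earlier in the thread.

```python
import posixpath
import tempfile
from pathlib import Path
from urllib.parse import urlparse


def local_target_for(uri: str) -> Path:
    """Return a not-yet-existing path inside a fresh temp directory,
    keeping the file name (and extension) of the remote URI."""
    parsed = urlparse(uri)
    # "s3://bucket/key.txt" puts the key in .path; "s3://key.txt" puts the
    # whole name in .netloc, so fall back to it (and to "data" as a last resort).
    name = posixpath.basename(parsed.path.rstrip("/")) or parsed.netloc or "data"
    return Path(tempfile.mkdtemp()) / name


def download_flyte_file_keep_name(uri: str):
    # flytekit is imported lazily so the path helper works without it installed.
    from flytekit.configuration import Config
    from flytekit.remote import FlyteRemote
    from flytekit.types.file import FlyteFile

    remote = FlyteRemote(config=Config.auto())
    target = local_target_for(uri)
    remote.file_access.get_data(remote_path=uri, local_path=str(target))
    return FlyteFile(str(target))
```

Keeping the extension can matter in a notebook when downstream code infers the format from the file name; the temp directory is not cleaned up automatically, so removing it is left to the caller.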
e
Hey @Ketan (kumare3), if I want to share a downloaded file between other tasks (multiple GBs, read-only), should I pass it as a FlyteFile? Or does the remote download it to an already accessible location (shared volume)? I know that I can also set storage in the resources and probably share it that way too… CC @Chen Vilinsky @Yaniv Ben Zvi
k
Today you have to redownload the file. You can probably mount volumes etc. in your pod to reuse it.
Shameless plug - we have a way to accelerate files for reuse.
e
You mean using something like FUSE...
k
FUSE is not going to speed things up, it just downloads in the background, right? But you can use it, or just attach EFS or share a volume.
c
Do you have documentation on mounting volumes? Currently, we are trying to use a task that downloads a directory from GCS, and the directory will then be used by a map_task. I guess mounting a volume will do the trick, because passing the FlyteDirectory doesn't seem to work...
k
pod_templates
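As a rough illustration of the pod template route, here is a minimal sketch that mounts an existing PersistentVolumeClaim into the task container. It assumes flytekit's `PodTemplate` and the `kubernetes` Python client are installed, that a claim with a suitable access mode already exists in the cluster, and that the primary container is named `"primary"`. The function name, the `shared-data` volume name, the claim name, and the mount path are all hypothetical placeholders.

```python
def shared_volume_pod_template(claim_name: str, mount_path: str, read_only: bool = True):
    """Build a flytekit PodTemplate that mounts an existing PVC at mount_path."""
    # Imported inside the function so the cell still loads where flytekit or
    # the kubernetes client is not installed.
    from flytekit import PodTemplate
    from kubernetes.client import (
        V1Container,
        V1PersistentVolumeClaimVolumeSource,
        V1PodSpec,
        V1Volume,
        V1VolumeMount,
    )

    volume_name = "shared-data"  # arbitrary name linking the volume to its mount
    return PodTemplate(
        primary_container_name="primary",
        pod_spec=V1PodSpec(
            containers=[
                V1Container(
                    name="primary",  # must match primary_container_name
                    volume_mounts=[
                        V1VolumeMount(
                            name=volume_name,
                            mount_path=mount_path,
                            read_only=read_only,
                        )
                    ],
                )
            ],
            volumes=[
                V1Volume(
                    name=volume_name,
                    persistent_volume_claim=V1PersistentVolumeClaimVolumeSource(
                        claim_name=claim_name, read_only=read_only
                    ),
                )
            ],
        ),
    )
```

The result would be passed to a task as `@task(pod_template=shared_volume_pod_template("my-shared-claim", "/mnt/shared"))`: with `read_only=False` on the task that downloads the directory, and with the default read-only mount on the task wrapped by `map_task`. Whether many pods can mount the same claim at once depends on the storage class and access mode of the volume.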
e
Thnx