#ask-the-community

Denis Shvetsov

11/16/2023, 3:40 PM
Hi, can somebody help me understand the FlyteFile and FlyteDirectory concepts? I have a dataset of images on the host machine where the demo cluster is deployed. Is there a recommended way to access this dataset? For now I see only one solution, which is to upload the dataset somewhere, then download it in a Flyte task and put it in a FlyteDirectory. But maybe there is a better way?
but also, FlyteFile and FlyteDirectory are just abstracted references to data

Denis Shvetsov

11/16/2023, 8:36 PM
I’m very new to Flyte, so I may not have understood everything. But as I understand it, Flyte uses S3 and downloads the FlyteDirectory in every task. So it doesn’t matter whether I use FlyteDirectory directly or upload my dataset to S3 by hand and download it by hand in the task?

Samhita Alla

11/17/2023, 4:51 AM
@Denis Shvetsov flytefile and flytedirectory are abstractions. they handle automatic upload and download by interacting seamlessly with the specified blob storage, be it s3, gcs or abs. they use fsspec under the hood. you can also upload and download the data manually, but the whole point of annotating the parameter with flytefile/flytedirectory is to avoid doing it manually! 🙂

Denis Shvetsov

11/17/2023, 8:34 AM
So is there a recommended way to do it?
Also, I’m trying to put a 2.5 GB folder with 4400 images into a FlyteDirectory, and I don’t know where to see the task logs. I found only this:
Attempt 01
Failed
tar: Removing leading `/' from member names

Getting <s3://my-s3-bucket/flytesnacks/development/IY5ZQHNFDBKVZ26NLZGB43DTLM======/script_mode.tar.gz> to /root/
Could it be because of the size? When I try to save a smaller dataset, it works fine.

Samhita Alla

11/17/2023, 11:26 AM
oh, that could be the reason. have you checked the kubernetes logs?
the option to view pod logs should be available on the UI.
@Pryce since you've been pursuing this as well, thought you might want to chime in in case you have an answer.

Denis Shvetsov

11/17/2023, 12:43 PM
Could you tell me where I can obtain the token for kubernetes dashboard? if I started flyte with
flytectl demo start
?

Samhita Alla

11/17/2023, 1:46 PM
don't you see a URL on the UI?

Denis Shvetsov

11/17/2023, 1:52 PM
Ah, ok, it is possible to skip authentication with a token 🙂 I see the same logs as in “Show Error”
tar: Removing leading `/' from member names
Getting <s3://my-s3-bucket/flytesnacks/development/DJL24MBDQIXGHMP4E4JSLOY6KA======/script_mode.tar.gz> to /root/

Pryce

11/17/2023, 4:16 PM
Hi @Denis Shvetsov I've actually seen this recently. Would you mind sharing your code or at least a minimal version to reproduce?

Denis Shvetsov

11/17/2023, 4:26 PM
And if it helps: if I replace the URL on line 27 with https://storage.googleapis.com/picrystal-bucket/crowdhuman/Images_CUT.zip (a much smaller dataset), it works.

Pryce

11/17/2023, 4:52 PM
Mmmm yeah that's interesting
What command are you using to run this btw?

Denis Shvetsov

11/17/2023, 4:54 PM
pyflyte run --remote % workflow
where % is the name of the file

Pryce

11/17/2023, 5:52 PM
Are you able to get it working with the smaller dataset? The smaller dataset runs fine for me, there's an issue with the larger one.
@Denis Shvetsov I figured out the issue. Your request is writing chunks to memory before they are written to the object store. Since you're trying to fetch a 2.5GB file and your task is limited to the default 1GB of memory, it's failing with OOM. This should fix it:
@task(cache=False, cache_version="1.2", requests=Resources(cpu='1', mem='3Gi'))
def download_crowdhuman_dataset() -> FlyteDirectory:
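Raising the memory request is the fix given above. As a complementary idea, and only a sketch since the original download code is not shown in the thread, the download itself can be streamed to disk in fixed-size chunks so that the whole archive is never held in memory at once. The helper names `stream_to_file` and `download` and the chunk size are assumptions chosen for illustration; only the Python standard library is used.

```python
import shutil
import urllib.request
from pathlib import Path

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB per read; an arbitrary illustrative value


def stream_to_file(source, dest_path, chunk_size=CHUNK_SIZE):
    """Copy a binary file-like object to dest_path one chunk at a time.

    Returns the number of bytes written to disk.
    """
    dest = Path(dest_path)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with open(dest, "wb") as out:
        # copyfileobj reads at most chunk_size bytes per iteration,
        # so memory use stays bounded regardless of the file size.
        shutil.copyfileobj(source, out, length=chunk_size)
    return dest.stat().st_size


def download(url, dest_path):
    """Stream the body at url to dest_path without buffering it fully."""
    with urllib.request.urlopen(url) as response:
        return stream_to_file(response, dest_path)
```

A task could call `download(url, local_zip_path)` and then unpack the archive before returning the FlyteDirectory. Whether this alone keeps the task under its memory limit depends on the rest of the task, so the larger `mem` request may still be needed.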

Denis Shvetsov

11/20/2023, 9:34 AM
That works, thank you!