Hi, can somebody help me to understand FlyteFile a...
# ask-the-community
d
Hi, can somebody help me understand the FlyteFile and FlyteDirectory concepts? I have a dataset of images on the host machine where the demo cluster is deployed. Is there a recommended way to access this dataset? For now I see only one solution, which is to upload the dataset somewhere, then download it in a Flyte task and put it in a FlyteDirectory. But maybe there is a better way?
but also, FlyteFile and FlyteDirectory are just abstracted references to data
d
I’m very new to Flyte, so I might not have understood everything. But as I understand it, Flyte uses S3 and downloads the FlyteDirectory in every task. So it doesn’t matter whether I use FlyteDirectory directly or upload my dataset to S3 by hand and download it by hand in the task?
s
@Denis Shvetsov FlyteFile and FlyteDirectory are abstractions. they handle upload and download automatically, interacting with the specified blob storage, be it S3, GCS or ABS. they use fsspec under the hood. you can also upload and download the data manually, but the whole point of annotating the parameter with FlyteFile/FlyteDirectory is to avoid doing it manually! 🙂
d
So is there a recommended way to do it?
Also, I’m trying to put a 2.5 GB folder with 4400 images into a FlyteDirectory, and I don’t know where to see the logs of the task. I found only this:
Attempt 01
Failed
tar: Removing leading `/' from member names

Getting <s3://my-s3-bucket/flytesnacks/development/IY5ZQHNFDBKVZ26NLZGB43DTLM======/script_mode.tar.gz> to /root/
Could it be because of the size? When I try to save a smaller dataset it works fine.
s
oh, that could be the reason. have you checked the kubernetes logs?
the option to view pod logs should be available on the UI.
@Pryce since you've been pursuing this as well, thought you might want to chime in in case you have an answer.
d
Could you tell me where I can obtain the token for the Kubernetes dashboard if I started Flyte with
flytectl demo start
?
s
don't you see a URL on the UI?
d
Ah, ok, it is possible to skip authentication with a token 🙂 I see the same logs as in “Show Error”
tar: Removing leading `/' from member names
Getting <s3://my-s3-bucket/flytesnacks/development/DJL24MBDQIXGHMP4E4JSLOY6KA======/script_mode.tar.gz> to /root/
p
Hi @Denis Shvetsov I've actually seen this recently. Would you mind sharing your code or at least a minimal version to reproduce?
d
And if it helps: if I replace the URL on line 27 with https://storage.googleapis.com/picrystal-bucket/crowdhuman/Images_CUT.zip (a much smaller dataset), it works.
p
Mmmm yeah that's interesting
What command are you using to run this btw?
d
pyflyte run --remote % workflow
where % is the name of the file
p
Are you able to get it working with the smaller dataset? The smaller dataset runs fine for me, there's an issue with the larger one.
@Denis Shvetsov I figured out the issue. Your request is writing chunks to memory before they are written to the object store. Since you're trying to fetch a 2.5GB file, and your task is limited to the default 1GB memory, it's failing with OOM. This should fix it:
@task(cache=False, cache_version="1.2", requests=Resources(cpu='1', mem='3Gi'))
def download_crowdhuman_dataset() -> FlyteDirectory:
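If raising the memory request is not an option, another approach is to stream the download to disk in fixed-size chunks, so the copy itself only holds about one chunk in memory at a time. The sketch below is not from this thread and is not Flyte API: it uses only the Python standard library, and `stream_to_disk` and `CHUNK_SIZE` are hypothetical names. It assumes the dataset is fetched over plain HTTP(S) inside the task and that the task has enough local disk for the file. It only addresses the download step; whether it avoids the OOM in a given task depends on where the buffering actually happens.

```python
import shutil
import urllib.request
from pathlib import Path

# Hypothetical chunk size: 1 MiB per read. Any modest value works.
CHUNK_SIZE = 1024 * 1024


def stream_to_disk(url: str, dest: Path, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy the body at `url` to `dest` chunk by chunk and return the bytes written."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    # copyfileobj reads at most `chunk_size` bytes at a time, so the whole
    # payload is never held in memory by this function.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return dest.stat().st_size
```

Inside a task, one would presumably call this with the dataset URL and a path in the task's working directory, unpack the archive, and return the resulting folder as the FlyteDirectory.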
d
That works, thank you!
c
Hey, I'm having a very similar error message and what could be a similar problem: 1. How can I diagnose if the issue is that I'm not giving a task enough memory? 2. If it's not, how could I surface an error message here to find out why my task is failing?
s
an OOM error, or any error for that matter, should usually surface in the pod logs or the UI. if that isn't happening, it's likely related to the pod crashing. have you tried describing the pod?
kubectl describe pod <pod-name> -n <namespace>
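To check for an out-of-memory kill programmatically, one option is to inspect the pod's JSON status instead of reading the describe output by eye. The sketch below assumes the output of `kubectl get pod <pod-name> -n <namespace> -o json` has been loaded into a dict (for example with `json.load`); `oom_killed_containers` is a hypothetical helper name and `sample_pod` is made up for illustration. Kubernetes records the reason `OOMKilled` under a container's `state.terminated` or `lastState.terminated` when the container was killed for exceeding its memory limit.

```python
def oom_killed_containers(pod: dict) -> list:
    """Return the names of containers whose current or last state is an OOM kill."""
    status = pod.get("status") or {}
    statuses = (status.get("initContainerStatuses") or []) + (
        status.get("containerStatuses") or []
    )
    killed = []
    for container in statuses:
        # "state" is the current state, "lastState" the one before a restart.
        for key in ("state", "lastState"):
            terminated = (container.get(key) or {}).get("terminated") or {}
            if terminated.get("reason") == "OOMKilled":
                killed.append(container.get("name", "<unnamed>"))
                break
    return killed


# Made-up pod status for illustration only.
sample_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "primary",
                "state": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
            }
        ]
    }
}
print(oom_killed_containers(sample_pod))  # ['primary']
```

An empty result does not rule out a memory problem: if the pod has already been deleted or was evicted by the node, the container status may not carry this reason.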
c
I'll try to make a separate thread with my issue. I'm not seeing an OOM error, but I am getting a similar message to what @Denis Shvetsov got above (https://flyte-org.slack.com/archives/CP2HDHKE1/p1700229165926579?thread_ts=1700149252.292709&cid=CP2HDHKE1)