acoustic-carpenter-78188
02/13/2023, 3:59 PM
The labels.txt file contains the labels for each image example_*.png.
dataset/
    labels.txt
    example_abc.png
    example_xyz.png
    ...
Goal: What should the final outcome look like, ideally?
As a Flyte user, I should be able to lazily iterate over a FlyteDirectory holding such a dataset, so that I don't have to download the entire directory up front and can instead start training as soon as the first batch of data is available on the running Pod.
Requirements
• Should support iteration over files in the directory in a random order
• Potentially support iteration of batches of files in a random order
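To make the two requirements concrete, here is a minimal sketch of random-order, batched iteration over filenames only. It does not touch Flyte at all; `shuffled_batches`, `batch_size`, and `seed` are hypothetical names chosen for illustration, and the filenames are assumed to be known up front (for example, read from labels.txt).

```python
import random
from typing import Iterator, List, Optional, Sequence


def shuffled_batches(
    filenames: Sequence[str],
    batch_size: int = 1,
    seed: Optional[int] = None,
) -> Iterator[List[str]]:
    """Yield the filenames in a random order, grouped into batches.

    Hypothetical helper, not part of flytekit. The last batch may be
    smaller than batch_size.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    order = list(filenames)              # copy so the caller's list is untouched
    random.Random(seed).shuffle(order)   # seeded RNG keeps runs reproducible
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]


if __name__ == "__main__":
    names = ["example_abc.png", "example_xyz.png", "example_123.png"]
    for batch in shuffled_batches(names, batch_size=2, seed=0):
        print(batch)
```

With `batch_size=1` this covers the first requirement (single files in a random order); larger values cover the second.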
Describe alternatives you've considered
Users would have to create their own workaround to:
1. store the filenames for all the examples in a custom Flyte type (probably a dataclass)
2. create their own iterable downloader by combining the root FlyteDirectory with the filenames from (1) and use the FileAccessProvider to fetch individual files.
3. iterate over the files in the user-defined dataloader
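The workaround steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not flytekit's API: `LazyDirectory` is a hypothetical dataclass, and `fetch` is a placeholder callable standing in for whatever FileAccessProvider call downloads a single remote file to a local path.

```python
import os
import random
import tempfile
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional


@dataclass
class LazyDirectory:
    """Hypothetical wrapper pairing a remote root with known filenames."""

    root: str                            # remote URI of the directory (step 1)
    filenames: List[str]                 # example names stored alongside it (step 1)
    fetch: Callable[[str, str], None]    # assumed signature: (remote_path, local_path)
    seed: Optional[int] = None
    local_dir: str = field(default_factory=tempfile.mkdtemp)

    def __iter__(self) -> Iterator[str]:
        order = list(self.filenames)
        random.Random(self.seed).shuffle(order)
        for name in order:
            remote = f"{self.root.rstrip('/')}/{name}"
            local = os.path.join(self.local_dir, name)
            # Step 2: download one file at a time, only when it is requested.
            self.fetch(remote, local)
            # Step 3: hand the local path to the training loop.
            yield local
```

Because `__iter__` is a generator, nothing is downloaded until the consumer asks for the next item, so training can begin after the first file arrives.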
flyteorg/flyte