little-cricket-84530
12/14/2022, 6:09 PM
hallowed-mouse-14616
12/14/2022, 6:45 PM
broad-monitor-993
12/14/2022, 6:54 PM
little-cricket-84530
12/14/2022, 7:01 PM
thankful-minister-83577
thankful-minister-83577
little-cricket-84530
12/14/2022, 7:22 PM
thankful-minister-83577
thankful-minister-83577
little-cricket-84530
12/14/2022, 7:26 PM
little-cricket-84530
12/14/2022, 7:26 PM
thankful-minister-83577
little-cricket-84530
12/14/2022, 7:27 PM
broad-monitor-993
12/14/2022, 10:05 PM
i think if the data is easily chunk-able and is slightly cpu intensive, i would opt for the map task approach
agreed! basically you’ll want the task that produces the data to output 2 things: (i) the StructuredDataset itself with the chunked parquet file and (ii) a list of filenames for each chunk.
Then, you’ll want to map over a dataclass that contains a reference to the StructuredDataset in addition to the filename of the chunk you want to process for a particular map task
little-cricket-84530
12/14/2022, 10:08 PM
broad-monitor-993
12/14/2022, 10:14 PM
[...] StructuredDataset as input in a map task, is there a way for me to only download one of the chunks onto the map task pod?
broad-monitor-993
12/14/2022, 10:21 PM
[...] current_context() to use the file_access API to download a specific chunk
little-cricket-84530
12/14/2022, 10:22 PM
broad-monitor-993
12/14/2022, 10:23 PM
thankful-minister-83577
thankful-minister-83577
thankful-minister-83577
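
The pattern discussed in this thread (a producer that returns the dataset location plus a list of chunk filenames, then a per-chunk step that fetches only its own chunk) can be sketched roughly as below. This is a plain-Python illustration, not the code from the thread: it uses a local directory as a stand-in for blob storage, text files as a stand-in for parquet parts, and a simple loop where Flyte would fan out with a map task. The names `ChunkRef`, `produce_chunks`, `process_chunk`, and `run` are hypothetical. In a real workflow the copy step would presumably be replaced by a download through the `file_access` object on `current_context()`, as suggested above.

```python
import shutil
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import List, Tuple


@dataclass
class ChunkRef:
    """Hypothetical per-chunk input: the dataset location plus one chunk's filename."""
    dataset_uri: str
    filename: str


def produce_chunks(dataset_dir: Path, n_chunks: int) -> Tuple[str, List[str]]:
    """Write n_chunks files and return (dataset location, list of chunk filenames)."""
    dataset_dir.mkdir(parents=True, exist_ok=True)
    filenames = []
    for i in range(n_chunks):
        name = f"chunk-{i:05d}.txt"  # stand-in for a parquet part file
        rows = [str(i * 10 + j) for j in range(3)]
        (dataset_dir / name).write_text("\n".join(rows))
        filenames.append(name)
    return str(dataset_dir), filenames


def process_chunk(ref: ChunkRef, workdir: Path) -> int:
    """Fetch only the referenced chunk into workdir, then do the per-chunk work."""
    local = workdir / ref.filename
    # Stand-in for a single-object download from remote storage (assumption:
    # in Flyte this would go through the task context's file_access API).
    shutil.copy(Path(ref.dataset_uri) / ref.filename, local)
    return sum(int(line) for line in local.read_text().splitlines())


def run(n_chunks: int = 3) -> List[int]:
    """Produce chunks, build one ChunkRef per chunk, and process each independently."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        uri, names = produce_chunks(root / "dataset", n_chunks)
        refs = [ChunkRef(dataset_uri=uri, filename=n) for n in names]
        results = []
        for i, ref in enumerate(refs):  # a map task would run these in parallel
            workdir = root / f"worker-{i}"  # each worker sees only its own chunk
            workdir.mkdir()
            results.append(process_chunk(ref, workdir))
        return results


if __name__ == "__main__":
    print(run(3))
```

The point of passing a small dataclass rather than the whole dataset is that each mapped invocation receives only a reference and a filename, so it can decide to pull a single chunk instead of materializing everything on the pod.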