when we pass df to and from flyte tasks - is there...
# flyte-support
f
When we pass a df to and from Flyte tasks, is there any native way to ask Flyte to do sampling as it's reading and writing the data? E.g. I sometimes want to just randomly sample or write N rows to/from a task, and I'm wondering if Flyte could do this natively in some way.
The use case: say task A makes a big df, and tasks B1 and B2 both need it as input. B1 needs all of it, but B2 only needs a random sample of N rows from the df.
h
@faint-cat-74485, we don't support this today. That said, how should the sampling be expressed? Let's take your example, how would sampling of the upstream df be defined?
f
I'm not 100% sure tbh, maybe I could pass the sampling param to the task decorator. Maybe it's a horrible idea idk. Just that I find myself having some tasks where first thing I do is quickly downsample the data right away.
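For reference, a minimal sketch of the downsample-first workaround and of what a decorator-style sampling param might look like. This is plain pandas, not a Flyte feature: `downsample` and `sample_first_arg` are hypothetical names, `b1`/`b2` are stand-ins for the tasks, and I haven't checked how the wrapper composes with flytekit's `@task`. The full df is still read before sampling, so this saves compute in the task body, not I/O.

```python
import functools

import pandas as pd


def downsample(df: pd.DataFrame, n: int, seed: int = 0) -> pd.DataFrame:
    """Return up to n randomly chosen rows of df (all rows if df is smaller)."""
    return df.sample(n=min(n, len(df)), random_state=seed)


def sample_first_arg(n: int, seed: int = 0):
    """Hypothetical decorator: downsample the `df` argument before the body runs."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(df: pd.DataFrame, *args, **kwargs):
            return fn(downsample(df, n, seed), *args, **kwargs)

        return wrapper

    return decorator


# Stand-in for task B1: needs the whole df.
def b1(df: pd.DataFrame) -> int:
    return len(df)


# Stand-in for task B2: only needs a random sample of N rows.
@sample_first_arg(n=100, seed=42)
def b2(df: pd.DataFrame) -> int:
    return len(df)


big_df = pd.DataFrame({"x": range(10_000)})  # stand-in for task A's output
rows_b1 = b1(big_df)  # sees every row
rows_b2 = b2(big_df)  # sees only the sampled rows
```

Fixing `seed` keeps the sample reproducible across runs, which matters if task outputs are cached.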
h
@faint-cat-74485, can you capture that in a GitHub issue? [flyte-core]
f