average-winter-43654
08/10/2023, 12:48 PMStructured Dataset
.
• It seems that Flyte support two default ways to pass data between tasks (i.e. between k8s pods), Parquet
and Pickle
. However, parquet only works with pandas.DataFrame, which is automatically treated as a default dataset type called schema
by flyte. All of the other data types are supported by pickle.
• Flyte introduced a concept called Structured Dataset
to support custom data type for using Parquet. This doc introduced a way to include numpy array
as a Structured Dataset
by creating classes like Encoder
, Decoder
, and Renderer
.
• Is my understanding correct?
• And what is the real benefit of using Parquet? I know it may have a better performance and a more efficient compress rate than using Pickle. How about static type checking? Will flyte do type checks for schema
and Structrured Dataset
between tasks that are connected within one workflow during compile time?
Thanks.tall-lock-23197
tall-lock-23197
Will flyte do type checks ...What kind of type checks are you referring to?