Quick question about the dask plugin (this also applies more broadly to spark as well) - how is serialization handled between tasks that use a pandas vs dask dataframe (or spark vs dask, etc)?
For ex, task A processes a dataset with spark and returns a spark df. Task B is a dask task that loads the processed dataset to train a model. My current understanding is that the types need to properly line up in order for the workflow to type check, but the return type of A would be a pyspark DF while the argument in B would be a dask dataframe.
How are folks handling this case where we might want to mix spark and dask in pipelines? Does structured dataset figure into this? Any clarification would be appreciated