
Nada Saiyed

08/15/2022, 7:47 PM
Hello, what's the best way to represent a complex schema in a dataframe passed between tasks? I think it's not supported natively in StructuredDataset, but can it somehow be represented as FlyteSchema?
Can a List or Dict type be specified as part of the schema?
Or, for example, if I have a Spark dataframe with this schema, can it be enforced somehow?
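from pyspark.sql.types import StringType, StructField, StructType

# Nested schema: "name" is itself a struct of three nullable string fields.
schema = StructType([
    StructField('name', StructType([
        StructField('firstname', StringType(), True),
        StructField('middlename', StringType(), True),
        StructField('lastname', StringType(), True),
    ])),
    StructField('state', StringType(), True),
    StructField('gender', StringType(), True),
])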
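The thread doesn't settle whether Flyte enforces nested struct columns. As a stopgap, here is a minimal, library-free sketch of checking records against the same nested layout in plain Python. PERSON_SCHEMA and validate_row are hypothetical names, not Flyte or PySpark APIs, and rows are assumed to be plain dicts (for example, what PySpark's Row.asDict(recursive=True) returns).

```python
from typing import Any, Dict, List

# Hypothetical plain-Python mirror of the Spark schema above: a nested dict
# stands for a struct, a Python type stands for a leaf column type.
PERSON_SCHEMA: Dict[str, Any] = {
    "name": {"firstname": str, "middlename": str, "lastname": str},
    "state": str,
    "gender": str,
}


def validate_row(row: Dict[str, Any], schema: Dict[str, Any], path: str = "") -> List[str]:
    """Return error messages for one record; an empty list means it conforms.

    Missing fields and None values are accepted, mirroring nullable=True.
    """
    errors: List[str] = []
    # Fields present in the record but absent from the schema are rejected.
    for key in sorted(row.keys() - schema.keys()):
        errors.append(f"{path}{key}: unexpected field")
    for key, expected in schema.items():
        value = row.get(key)
        where = f"{path}{key}"
        if value is None:
            continue  # nullable: missing or null is allowed
        if isinstance(expected, dict):
            # Struct field: the value must itself be a mapping; recurse into it.
            if isinstance(value, dict):
                errors.extend(validate_row(value, expected, path=f"{where}."))
            else:
                errors.append(f"{where}: expected struct, got {type(value).__name__}")
        elif not isinstance(value, expected):
            errors.append(f"{where}: expected {expected.__name__}, got {type(value).__name__}")
    return errors


good = {"name": {"firstname": "Ada", "middlename": None, "lastname": "Lovelace"},
        "state": "NY", "gender": "F"}
bad = {"name": "Ada Lovelace", "state": "NY", "gender": "F"}
print(validate_row(good, PERSON_SCHEMA))  # []
print(validate_row(bad, PERSON_SCHEMA))   # ['name: expected struct, got str']
```

A check like this could run at the start of a downstream task; it only validates shape and leaf types, not Spark-specific types such as timestamps or decimals.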

Niels Bantilan

08/15/2022, 7:53 PM
Are you using a pandas or Spark dataframe? Not sure if nested dict/struct-like objects are supported currently… @Yee @Kevin Su @Eduardo Apolinario (eapolinario)?

Nada Saiyed

08/15/2022, 7:54 PM
Spark. It would be good if I could define it in pandas as well, for some less complex datatypes like a list or a set.
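For list- or set-valued cells, one workaround (an assumption here, not something confirmed in this thread) follows from the storage format: if the dataframe is serialized to a columnar format such as Parquet or Arrow, lists and maps have native types but sets do not. Set cells can therefore travel as sorted lists and be rebuilt afterwards. The helpers sets_to_lists and lists_to_sets are hypothetical, and columns are modelled as a plain dict of column name to cell values.

```python
from typing import Any, Dict, List

# Hypothetical helpers, assuming a columnar format such as Parquet or Arrow
# that has list and map types but no set type: set-valued cells are sent as
# sorted lists and turned back into sets on the receiving side.


def sets_to_lists(columns: Dict[str, List[Any]], set_columns: List[str]) -> Dict[str, List[Any]]:
    """Replace set cells with sorted lists (elements must be mutually comparable)."""
    out = dict(columns)  # shallow copy; columns not listed are shared as-is
    for name in set_columns:
        out[name] = [None if cell is None else sorted(cell) for cell in columns[name]]
    return out


def lists_to_sets(columns: Dict[str, List[Any]], set_columns: List[str]) -> Dict[str, List[Any]]:
    """Inverse of sets_to_lists: turn list cells back into sets, keeping nulls."""
    out = dict(columns)
    for name in set_columns:
        out[name] = [None if cell is None else set(cell) for cell in columns[name]]
    return out


table = {"id": [1, 2, 3], "tags": [{"b", "a"}, set(), None]}
encoded = sets_to_lists(table, ["tags"])
print(encoded["tags"])  # [['a', 'b'], [], None]
assert lists_to_sets(encoded, ["tags"]) == table
```

Sorting makes the encoded form deterministic, so identical sets always serialize to identical lists; with pandas the same per-cell conversion could be applied to a column with Series.map.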

Niels Bantilan

08/15/2022, 7:57 PM
To be clear, you want to support the use case of arbitrary collection/map-type objects as individual values in a dataframe?

Nada Saiyed

08/15/2022, 8:00 PM
For pandas dataframes, yes. And for Spark dataframes, different StructTypes.

Samhita Alla

08/16/2022, 5:18 AM
cc: @Shivay Lamba