# announcements
Hello. What's the best way to represent a complex schema in a dataframe passed between tasks? I think it's not supported natively in `StructuredDataset`, but can it somehow be represented as `FlyteSchema`?
Can a `List` or `Dict` type be specified as part of the schema?
Or, for example, if I have a Spark dataframe with this schema, can it be enforced somehow?
```python
from pyspark.sql.types import StringType, StructField, StructType

# Nested schema: "name" is itself a struct with three string sub-fields.
schema = StructType([
    StructField('name', StructType([
        StructField('firstname', StringType(), True),
        StructField('middlename', StringType(), True),
        StructField('lastname', StringType(), True),
    ])),
    StructField('state', StringType(), True),
    StructField('gender', StringType(), True),
])
```
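Short of native support, one option is to check records against the nested shape yourself before handing data to the next task. A minimal sketch in plain Python, assuming rows are available as nested dicts (a struct field becomes a nested dict); `EXPECTED_SCHEMA` and `validate_row` are hypothetical names, not part of Flyte or PySpark:

```python
from typing import Any, Dict, List

# Hypothetical plain-Python mirror of the nested StructType above:
# a nested dict marks a struct field, a Python type marks a leaf field.
EXPECTED_SCHEMA: Dict[str, Any] = {
    "name": {"firstname": str, "middlename": str, "lastname": str},
    "state": str,
    "gender": str,
}


def validate_row(row: Dict[str, Any], schema: Dict[str, Any], path: str = "") -> List[str]:
    """Return error messages for one row; an empty list means it conforms."""
    errors: List[str] = []
    for field, expected in schema.items():
        where = path + field
        if field not in row:
            errors.append(f"missing field {where}")
            continue
        value = row[field]
        if value is None:
            # Every field in the Spark schema is nullable, so None is accepted.
            continue
        if isinstance(expected, dict):
            # Struct field: the value must itself be a dict; recurse into it.
            if isinstance(value, dict):
                errors.extend(validate_row(value, expected, where + "."))
            else:
                errors.append(f"{where} should be a struct")
        elif not isinstance(value, expected):
            errors.append(f"{where} should be {expected.__name__}")
    # Keys not listed in the schema are ignored in this sketch.
    return errors


good = {"name": {"firstname": "James", "middlename": None, "lastname": "Smith"},
        "state": "CA", "gender": "M"}
bad = {"name": "James Smith", "state": "CA"}

print(validate_row(good, EXPECTED_SCHEMA))  # []
print(validate_row(bad, EXPECTED_SCHEMA))   # ['name should be a struct', 'missing field gender']
```

Errors carry a dotted path (e.g. `name.firstname`), so a failure inside the nested struct points at the exact sub-field.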
Are you using a pandas or a Spark dataframe? I'm not sure whether nested dict/struct-like objects are currently supported… @Yee @Kevin Su @Eduardo Apolinario (eapolinario)?
Spark.
It would be good to be able to define it in pandas as well, for some less complex data types like a `list` or a `set`.
To be clear, you want to support arbitrary collection/map-type objects as individual values in a dataframe?
For pandas dataframes, yes. And for Spark dataframes, different `StructType`s.
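In pandas, list, set, or dict values are stored as ordinary Python objects in `object`-dtype columns, so pandas itself does not check their inner types. A minimal sketch of a per-column check, assuming the frame has first been turned into a list of row dicts (for example with `df.to_dict(orient="records")`); `COLUMNS`, `matches`, and `check_rows` are hypothetical names, not part of Flyte or pandas:

```python
from typing import Any, Dict, List, Set, Tuple, get_args, get_origin

# Hypothetical column spec (not a Flyte API): cells may hold collections.
COLUMNS: Dict[str, Any] = {
    "id": int,
    "tags": List[str],
    "codes": Set[int],
    "attrs": Dict[str, int],
}


def matches(value: Any, expected: Any) -> bool:
    """Check a cell value against a plain type or a List/Set/Dict annotation."""
    origin = get_origin(expected)
    if origin is None:
        # Plain type such as int or str.
        return isinstance(value, expected)
    if not isinstance(value, origin):
        return False
    args = get_args(expected)
    if origin is dict:
        key_type, value_type = args
        return all(matches(k, key_type) and matches(v, value_type)
                   for k, v in value.items())
    # List[...] or Set[...]: check every element recursively.
    (item_type,) = args
    return all(matches(item, item_type) for item in value)


def check_rows(rows: List[Dict[str, Any]], columns: Dict[str, Any]) -> List[Tuple[int, str]]:
    """Return (row_index, column) pairs whose cell does not match the spec."""
    bad: List[Tuple[int, str]] = []
    for i, row in enumerate(rows):
        for col, expected in columns.items():
            # A missing column reads as None and fails the check.
            if not matches(row.get(col), expected):
                bad.append((i, col))
    return bad


rows = [
    {"id": 1, "tags": ["a", "b"], "codes": {1, 2}, "attrs": {"k": 1}},
    {"id": 2, "tags": "a,b", "codes": {3}, "attrs": {"k": "v"}},
]
print(check_rows(rows, COLUMNS))  # [(1, 'tags'), (1, 'attrs')]
```

Writing the spec with `typing` annotations keeps it close to ordinary Python type hints, and because `matches` recurses, nested specs such as `List[Dict[str, int]]` also work.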
cc: @Shivay Lamba