#flytekit

Varun Kulkarni

08/29/2022, 7:54 PM
👋 is it possible to create an annotated `FlyteSchema` object that also includes metadata on whether a particular column is nullable vs required, along with its type?

Yee

08/29/2022, 7:57 PM
can you take a look at the structured dataset object instead? structured datasets support a broader set of types.

Varun Kulkarni

08/29/2022, 8:10 PM
thanks for the response - can you share an example of how I can annotate a structured dataset to also include info on whether null values are allowed? I don't see anything on this topic in the documentation.
I guess the example uses `kwtypes`, which, from reading the code, looks like it relies on Python's native type system. So a different phrasing of this question is: do Structured Datasets / FlyteSchemas allow for `Optional` / `Union` type annotations in their column definitions?

Dylan Wilder

08/29/2022, 8:55 PM
alt: py 3.10 supports js-style unions (PEP 604), e.g. `field: str | None`

Yee

08/29/2022, 9:57 PM
you should be able to use optional types in structured dataset columns yes.
though not for FlyteSchema - the column types there are much more limited.

Kevin Su

08/30/2022, 8:30 AM
@Varun Kulkarni is this what you want?
import typing
from typing import Annotated
import pandas as pd
from flytekit import kwtypes, task

cols = kwtypes(Name=str, Age=typing.Optional[int], Height=int)

@task
def get_df(a: int) -> Annotated[pd.DataFrame, cols]:
    # Column names in the DataFrame must match those declared in cols.
    return pd.DataFrame({"Name": ["Tom", "Joseph"], "Age": [a, None], "Height": [160, 178]})
For now, the column is also nullable even if you don't use `typing.Optional`.

Varun Kulkarni

08/30/2022, 2:08 PM
yeah that's what I'm looking for! For now we're not relying on Flyte to perform the actual enforcement of types - our application can handle that after serialization / deserialization - but it would be great if in the future we could have some sort of guarantee that, e.g., a structured dataset does not contain any nulls after deserialization if the column isn't optional.

Kevin Su

08/30/2022, 2:14 PM
Have you tried the Pandera plugin? It provides a flexible and expressive interface for defining schemas for tabular data. https://docs.flyte.org/projects/cookbook/en/latest/auto/integrations/flytekit_plugins/pandera_examples/index.html