I've been having issues with the `pydantic` plugin too. Specifically, I think the problem occurs when a `pydantic.BaseModel` is an input to an execution. A workflow that uses `pydantic` models only for the inputs and outputs of internal tasks works fine.
I'm testing the simplest possible case:

```python
from pydantic import BaseModel

class Config(BaseModel):
    value: int
```
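
For reference, here's a minimal sketch of how I'm exercising it (names are illustrative): passing `Config` as a workflow input is the failing case, while passing it only between internal tasks works.

```python
import flytekitplugins.pydantic  # noqa: F401  # importing should register the BaseModel transformer
from flytekit import task, workflow

@task
def use_config(config: Config) -> int:
    # Passing the BaseModel between internal tasks like this works fine.
    return config.value

@workflow
def wf(config: Config) -> int:
    # Using the BaseModel as a workflow (i.e. execution) input is what fails.
    return use_config(config=config)
```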
I think the problem stems from the slightly unusual way that a `pydantic.BaseModel` is serialised to a Flyte `Literal`. The format is a map with two keys: `BaseModel JSON` and `Serialized Flyte Objects`. `BaseModel JSON` appears to be a struct formed from the result of `pydantic.BaseModel.json()`. When there are complex types, e.g. a pandas DataFrame, these are put in `Serialized Flyte Objects` and a placeholder is put in `BaseModel JSON`.
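
To illustrate, this is my reading of the serialised structure for a model with one simple field and one DataFrame field (a rough Python rendering, not the exact wire format, and `frame` is a hypothetical field name):

```python
# Rough shape of the map literal the plugin produces (illustrative only):
literal_as_python = {
    "BaseModel JSON": {
        "value": 1,
        "frame": "placeholder",  # complex field replaced by a placeholder
    },
    "Serialized Flyte Objects": {
        "frame": "<Flyte-serialised pandas DataFrame>",
    },
}
```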
This structure makes sense to me, as it enables working with complex types that Flyte can serialise but `pydantic` can't. The problem is that the transformer just declares the literal type to be `types.LiteralType(simple=types.SimpleType.STRUCT)`.
Everything works most of the time, despite the types being very different from what they are declared to be. I think the problem is an explicit validation in `flyteadmin` that fails:

```
details: invalid config input wrong type. Expected simple:STRUCT , but got map_value_type:<union_type:<variants:<map_value_type:<simple:NONE > > variants:<simple:STRUCT > > >
```
This validation only seems to happen when the `pydantic.BaseModel` is used as an input to an execution.
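
For example, kicking off an execution with the model as an input is enough to hit it (a sketch assuming a registered `wf` and a reachable backend; the endpoint and project/domain names are placeholders, and the `FlyteRemote` usage is from memory):

```python
from flytekit.configuration import Config as FlyteConfig
from flytekit.remote import FlyteRemote

remote = FlyteRemote(
    config=FlyteConfig.for_endpoint("flyte.example.com"),  # hypothetical endpoint
    default_project="flytesnacks",
    default_domain="development",
)

# This is where flyteadmin rejects the input literal with the error above.
execution = remote.execute(wf, inputs={"config": Config(value=1)})
```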
I think the solution is probably to update `BaseModelTransformer.get_literal_type` to reflect the literal that is actually created. However, this could be a bit tricky: `Serialized Flyte Objects` is a map type that could contain basically anything, so it's difficult to define a precise literal type for it.
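
Something like the following is what I have in mind, mirroring the type that appears in the error message (a rough sketch against flytekit's `LiteralType`/`UnionType` model classes, not a tested patch):

```python
from flytekit.models import types

def get_literal_type(self, t: type) -> types.LiteralType:
    # Declare the map-of-union shape the transformer actually emits,
    # instead of a bare STRUCT.
    return types.LiteralType(
        map_value_type=types.LiteralType(
            union_type=types.UnionType(
                variants=[
                    # "Serialized Flyte Objects": a map whose values we can't
                    # type precisely, hence the NONE value type in the error.
                    types.LiteralType(
                        map_value_type=types.LiteralType(simple=types.SimpleType.NONE)
                    ),
                    # "BaseModel JSON": a struct.
                    types.LiteralType(simple=types.SimpleType.STRUCT),
                ]
            )
        )
    )
```

The awkward part, as noted above, is the `Serialized Flyte Objects` variant: the `NONE` value type doesn't really capture what that map can hold.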