# contribute
n
Hi community, we are working on this <PR> handling the union-type compatibility check. After a few discussions, @thankful-minister-83577 stated that our final decision may be to support a strict subset match only. To keep it simple, we might consider only supporting JSON schemas from Mashumaro. Curious what you think! cc @cool-lifeguard-49380 @high-accountant-32689
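For context on what a strict subset match could mean in practice: one schema is accepted in place of another only if it carries every field the target requires, with matching types. The sketch below is a minimal illustration under those assumptions; the function name and the exact rules are not from the PR.

```python
# Minimal sketch of a "strict subset match" between two JSON schemas.
# Illustrative only: the rules and the name `is_strict_subset` are
# assumptions, not the PR's actual implementation.

def is_strict_subset(candidate: dict, target: dict) -> bool:
    """Accept `candidate` wherever `target` is expected: every field the
    target requires must exist in the candidate, and shared fields must
    declare the same type."""
    cand_props = candidate.get("properties", {})

    # Every required field of the target must be present in the candidate.
    if any(f not in cand_props for f in target.get("required", [])):
        return False

    # Shared fields must agree on their declared JSON type.
    for name, tgt in target.get("properties", {}).items():
        if name in cand_props and cand_props[name].get("type") != tgt.get("type"):
            return False

    return True
```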
t
yeah let’s see what fabio thinks. eduardo are you okay with this?
h
Yes, I like this.
d
so pydantic's json schema is also supported, right?
or not
t
it should be yeah. but i would not think of it like, supporting pydantic, it’s more like supporting the style of json schema that pydantic produces. but the code is not specific to pydantic right? if i start using attrs tomorrow or i write my own dataclass-like library, the json schema that it produces might be similar to pydantic’s
also i think it’s okay to not support passing pydantic -> dataclass and vice-versa.
(since this would not be allowed in mypy)
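To make the "style of json schema" point concrete: mashumaro and pydantic both emit object schemas with `properties` and `required` keys for equivalent types, which is why a generic check doesn't need library-specific branches. A small demonstration, assuming pydantic v2 and mashumaro 3.x:

```python
from dataclasses import dataclass

from mashumaro.jsonschema import build_json_schema
from pydantic import BaseModel


@dataclass
class PointDC:
    x: int
    y: int


class PointPD(BaseModel):
    x: int
    y: int


# Both schemas are object schemas with "properties" for x and y and a
# "required" list, so a subset check can treat them uniformly.
print(build_json_schema(PointDC).to_dict())
print(PointPD.model_json_schema())
```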
n
If we want to support capability checks (the strict subset check) with schemas generated from different packages, maybe we should try the LGPL package, since handling all those schemas ourselves would require library-specific logic, which we'd rather avoid.
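As a sketch of what leaning on an off-the-shelf package could look like: the package in the message above isn't named, so the `jsonschema` package is used here purely as an illustration of dispatching on whatever JSON Schema draft a generator targets, instead of writing per-library handling:

```python
from jsonschema import validators


def make_validator(schema: dict):
    # validator_for picks the validator class matching the schema's
    # "$schema" key and falls back to the latest supported draft, so
    # schemas from different generators need no per-library code paths.
    cls = validators.validator_for(schema)
    cls.check_schema(schema)  # fail fast on malformed schemas
    return cls(schema)
```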
c
> also i think it’s okay to not support passing pydantic -> dataclass and vice-versa.

Yes, let’s not support this, I don’t think it’s necessary. I think we should decide on some type(s) of schema(s) that the backend understands and uses to solve ambiguity in union types. Even if we support multiple types of schemas, the backend doesn’t have to translate between them, because we also don’t allow pydantic -> dataclass etc. Type transformers need to provide the schema in one of the supported format(s). If a new type transformer for a json-like type is added, no backend changes should be needed. Instead, all potentially required logic to provide the schema in the right format should live in the type transformer in flytekit. This way we maintain extensibility. I don’t have a good overview of which schema format(s) those should be. Is Mashumaro well adopted in the industry?

One thing I still dislike a bit: for cache checks, we don’t detect schema changes via a schema potentially provided in the literal type’s metadata, but via its type_structure, which effectively is also a schema. I kinda dislike that for cache checks we’ll use the type_structure schema, but if your type transformer wants to take part in the union ambiguity solving logic, you need to also provide the schema in a specific format as the literal type’s metadata. I wish there were a single way to provide the schema to the backend that is used for everything. Not sure if it’s possible to use the type structure object to solve the union problem though.
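To make the "backend solves ambiguity in union types" idea concrete, here is a hedged sketch: given one schema per union variant, accept the single variant the incoming value validates against and treat anything else as ambiguous. The function and flow are assumptions for illustration, not the actual backend logic; validation uses the `jsonschema` package.

```python
from jsonschema import ValidationError, validate


def resolve_union_variant(value: dict, variant_schemas: list[dict]) -> int:
    """Return the index of the single variant schema `value` satisfies.

    Raises ValueError on zero or multiple matches, since an ambiguous
    match is exactly what a strict check is meant to reject.
    """
    matches = []
    for i, schema in enumerate(variant_schemas):
        try:
            validate(instance=value, schema=schema)
            matches.append(i)
        except ValidationError:
            continue
    if len(matches) != 1:
        raise ValueError(f"ambiguous or failed union match: variants {matches}")
    return matches[0]
```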
d
> One thing I still dislike a bit: for cache checks, we don’t detect schema changes via a schema potentially provided in the literal type’s metadata, but via its type_structure, which effectively is also a schema. I kinda dislike that for cache checks we’ll use the type_structure schema, but if your type transformer wants to take part in the union ambiguity solving logic, you need to also provide the schema in a specific format as the literal type’s metadata. I wish there were a single way to provide the schema to the backend that is used for everything. Not sure if it’s possible to use the type structure object to solve the union problem though.
this is a good point, but I think we should put this as a low priority
maybe a housekeeping issue?
it will be time-consuming to do this, and we have lots of bugs to solve