# contribute
n
Hi community, we are working on this <PR> handling the union-type compatibility check. After a few discussions, @thankful-minister-83577 stated that our final decision may be to support a strict subset match only. To keep it simple, we might consider only supporting JSON schemas from Mashumaro. Curious what you think! cc @cool-lifeguard-49380 @high-accountant-32689
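For context on what a strict subset match could mean in practice: one schema is accepted in place of another only if it carries every field the target requires, with matching types. The sketch below is a minimal illustration under those assumptions; the function name and the exact rules are not from the PR.

```python
# Minimal sketch of a "strict subset match" between two JSON schemas.
# Illustrative only: the rules and the name `is_strict_subset` are
# assumptions, not the PR's actual implementation.

def is_strict_subset(candidate: dict, target: dict) -> bool:
    """Accept `candidate` wherever `target` is expected: every field the
    target requires must exist in the candidate, and shared fields must
    declare the same type."""
    cand_props = candidate.get("properties", {})

    # Every required field of the target must be present in the candidate.
    if any(f not in cand_props for f in target.get("required", [])):
        return False

    # Shared fields must agree on their declared JSON type.
    for name, tgt in target.get("properties", {}).items():
        if name in cand_props and cand_props[name].get("type") != tgt.get("type"):
            return False

    return True
```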
t
yeah let’s see what fabio thinks. eduardo are you okay with this?
h
Yes, I like this.
d
so pydantic's json schema is also supported, right?
or not
t
it should be yeah. but i would not think of it like, supporting pydantic, it’s more like supporting the style of json schema that pydantic produces. but the code is not specific to pydantic right? if i start using attrs tomorrow or i write my own dataclass-like library, the json schema that it produces might be similar to pydantic’s
also i think it’s okay to not support passing pydantic -> dataclass and vice-versa.
(since this would not be allowed in mypy)
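To make the "style of json schema" point concrete: mashumaro and pydantic both emit object schemas with `properties` and `required` keys for equivalent types, which is why a generic check doesn't need library-specific branches. A small demonstration, assuming pydantic v2 and mashumaro 3.x:

```python
from dataclasses import dataclass

from mashumaro.jsonschema import build_json_schema
from pydantic import BaseModel


@dataclass
class PointDC:
    x: int
    y: int


class PointPD(BaseModel):
    x: int
    y: int


# Both schemas are object schemas with "properties" for x and y and a
# "required" list, so a subset check can treat them uniformly.
print(build_json_schema(PointDC).to_dict())
print(PointPD.model_json_schema())
```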
n
If we want to support capability checks (the strict subset check) with schemas generated from different packages, maybe we should try the LGPL package, since handling all those schemas ourselves would require library-specific logic, which we'd rather avoid.
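As a sketch of what leaning on an off-the-shelf package could look like: the package in the message above isn't named, so the `jsonschema` package is used here purely as an illustration of dispatching on whatever JSON Schema draft a generator targets, instead of writing per-library handling:

```python
from jsonschema import validators


def make_validator(schema: dict):
    # validator_for picks the validator class matching the schema's
    # "$schema" key and falls back to the latest supported draft, so
    # schemas from different generators need no per-library code paths.
    cls = validators.validator_for(schema)
    cls.check_schema(schema)  # fail fast on malformed schemas
    return cls(schema)
```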
c
> also i think it’s okay to not support passing pydantic -> dataclass and vice-versa.

Yes, let’s not support this, I don’t think it’s necessary. I think we should decide on some type(s) of schema(s) that the backend understands and uses to solve ambiguity in union types. Even if we support multiple types of schemas, the backend doesn’t have to translate between them, because we also don’t allow pydantic -> dataclass etc. Type transformers need to provide the schema in one of the supported format(s). If a new type transformer for a json-like type is added, no backend changes should be needed. Instead, all potentially required logic to provide the schema in the right format should live in the type transformer in flytekit. This way we maintain extensibility. I don’t have a good overview of which schema format(s) those should be. Is Mashumaro well adopted in the industry?

One thing I still dislike a bit: for cache checks, we don’t detect schema changes via a schema potentially provided in the literal type’s metadata, but via its type_structure, which effectively is also a schema. I kinda dislike that for cache checks we’ll use the type_structure schema, but if your type transformer wants to take part in the union ambiguity solving logic, you need to also provide the schema in a specific format as the literal type’s metadata. I wish there were a single way to provide the schema to the backend that is used for everything. Not sure if it’s possible to use the type structure object to solve the union problem though.
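To make the "backend solves ambiguity in union types" idea concrete, here is a hedged sketch: given one schema per union variant, accept the single variant the incoming value validates against and treat anything else as ambiguous. The function and flow are assumptions for illustration, not the actual backend logic; validation uses the `jsonschema` package.

```python
from jsonschema import ValidationError, validate


def resolve_union_variant(value: dict, variant_schemas: list[dict]) -> int:
    """Return the index of the single variant schema `value` satisfies.

    Raises ValueError on zero or multiple matches, since an ambiguous
    match is exactly what a strict check is meant to reject.
    """
    matches = []
    for i, schema in enumerate(variant_schemas):
        try:
            validate(instance=value, schema=schema)
            matches.append(i)
        except ValidationError:
            continue
    if len(matches) != 1:
        raise ValueError(f"ambiguous or failed union match: variants {matches}")
    return matches[0]
```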
d
> One thing I still dislike a bit: for cache checks, we don’t detect schema changes via a schema potentially provided in the literal type’s metadata, but via its type_structure, which effectively is also a schema. I kinda dislike that for cache checks we’ll use the type_structure schema, but if your type transformer wants to take part in the union ambiguity solving logic, you need to also provide the schema in a specific format as the literal type’s metadata. I wish there were a single way to provide the schema to the backend that is used for everything. Not sure if it’s possible to use the type structure object to solve the union problem though.
this is a good point, but I think we should put this as a low priority
maybe a housekeeping issue?
it will be time-consuming to do this, and we have lots of bugs to solve