I m thinking about writing a type transformer for pydantic s Flyte #flyte-support

I’m thinking about writing a type transformer for ...

cool-lifeguard-49380

03/02/2023, 5:29 PM

I’m thinking about writing a type transformer for pydantic’s

BaseModel

(which has several benefits over data classes which is why our ML engineers asked for this).

Copy code

from pydantic import BaseModel

class Conf(BaseModel):
    ...

@task
def train(conf: Conf):
   ...

Assuming there is a

class BaseModelTransformer(TypeTransformer[BaseModel]):

, would it be invoked if the user specifies

train(conf: Conf)

or only in case they specify

train(conf: BaseModel)

cool-lifeguard-49380

03/02/2023, 5:41 PM

My understanding is that this should work because this logic kicks in, is this correct?

thankful-minister-83577

03/02/2023, 6:31 PM

yeah this will work.

thankful-minister-83577

03/02/2023, 6:31 PM

would you be able to upstream this fabio?

thankful-minister-83577

03/02/2023, 6:32 PM

we’re actually thinking of rewriting the dataclass transformer as well, would be nice if they looked the same

thankful-minister-83577

03/02/2023, 6:35 PM

the main thing is just that the underlying transformers should be called… that is if you have a pydantic model that has a

StructuredDataset

(or like another separate custom type), the same thing should happen as if it were not in a pydantic model

thankful-minister-83577

03/02/2023, 6:37 PM

not familiar enough with pydantic tbh, need to play around with it some more

cool-lifeguard-49380

03/02/2023, 6:39 PM

If we go forward with this, I will upstream 👍

we’re actually thinking of rewriting the dataclass transformer as well, would be nice if they looked the same

I think the pydantic base model transformer could be rather simple. For pydantic, I unfortunately need to know which exact class is trying to be deserialized.

Copy code

def pydantic_encoder(obj):
    if isinstance(obj, BaseModel):
        return {'__pydantic_model__': obj.__class__.__name__, **obj.dict()}
    else:
        return obj

def pydantic_decoder(obj):
    if '__pydantic_model__' in obj:
        model_name = obj.pop('__pydantic_model__')
        model_class = globals()[model_name]
        return model_class(**obj)
    else:
        return obj

serialized = json.dumps(m, default=pydantic_encoder)

reconstructed = json.loads(serialized, object_hook=pydantic_decoder)

assert reconstructed == m

(Maybe not get if from

globals()

, first try, but save the path from which we can get it with

importlib

.) A base model has a

.schema_json()

which I would also save into the protobuf. Then, at deserialization time, we load the class using

importlib

, compare the schemas, and load using

pydantic_decoder

. I’m not 100% convinced of the part with

importlib

😕 but don’t have a better idea yet since we need to know the class when instantiating the python value again. Are you aware of a better approach?

cool-lifeguard-49380

03/02/2023, 6:47 PM

The nice thing if we know the class is that this “integer being converted to float” during a json (de-)serialization roundtrip doesn’t happen. The dataclass_json transformer walks the dataclass and converts back to int, right?

thankful-minister-83577

03/02/2023, 6:47 PM

let me play around with this early next week and get back to you?

thankful-minister-83577

03/02/2023, 6:48 PM

and thank you

thankful-minister-83577

03/02/2023, 6:48 PM

for the upstream

cool-lifeguard-49380

03/02/2023, 6:48 PM

let me play around with this early next week and get back to you?

Of course. If you have an idea for a better approach, I can also try it out if you give me the hint ..

agreeable-kitchen-44189

03/03/2023, 6:07 PM

We do have a custom de/serializer like this within Pachama, I remember it having some slight drawbacks but not exactly which. I’m currently OOO but will be back Monday to take a look

🙏 1

cool-lifeguard-49380

03/03/2023, 7:19 PM

@thankful-minister-83577 I gave this some more thoughts and actually, if I’m not mistaken, it can be done way simpler and cleaner than I thought yesterday. What do you think about this implementation? And about what I noted here? (It doesn’t rely on this hack with

globals()

importlib

anymore that yesterday I thought I would need.)

cool-lifeguard-49380

03/07/2023, 5:13 PM

@broad-monitor-993 about pydantic base model transformer

🙏 1

cool-lifeguard-49380

03/07/2023, 5:13 PM

@rich-garden-69988

cool-lifeguard-49380

03/07/2023, 5:13 PM

Let’s compare 🙂

rich-garden-69988

03/07/2023, 5:14 PM

Want to have a quick call potentially?

👍 1

cool-lifeguard-49380

03/07/2023, 5:36 PM

Interested in your thoughts about the implementation and whether this is interesting to upstream like this or with modifications @broad-monitor-993 🙂

rich-garden-69988

03/07/2023, 5:37 PM

I think we'd benefit from automatic type registration, so downstream implementers of new models don't have to explicitly register (or is that handled already?)

rich-garden-69988

03/07/2023, 5:39 PM

We also had a lot of issues when we ran this at scale (if you pass model spec to 1,000s of tasks, gRPC starts to become a huge issue) - serializing to file resolves this, but just wanted to point this out!

broad-monitor-993

03/07/2023, 5:47 PM

let’s have a call to coordinate this effort? @high-accountant-32689 @cool-lifeguard-49380 @rich-garden-69988 what times work for you this week? What timezones are y’all in?

rich-garden-69988

03/07/2023, 5:52 PM

Very flexible today, tomorrow after 2PM PT, pretty flexible Thursday after 10AM PT until 2PM

cool-lifeguard-49380

03/07/2023, 5:53 PM

I’m in CET so it’s ~7pm now. Flexible later today after 8pm. Thursday evening my time/day your time would also work well.

rich-garden-69988

03/07/2023, 6:07 PM

Would be down for later today (if not too late for you)

cool-lifeguard-49380

03/07/2023, 6:10 PM

Depends on when 🙂 In an hour its 8pm here. Would work for me between ~8-10:30

rich-garden-69988

03/07/2023, 6:10 PM

any time in that slot works for me 🙂

cool-lifeguard-49380

03/07/2023, 6:15 PM

Actually I could also do from now on, not 8

rich-garden-69988

03/07/2023, 6:36 PM

@broad-monitor-993?

broad-monitor-993

03/07/2023, 7:46 PM

sorry, my day today is packed with meetings… do you mind if we chat on Thursday 11am PT, 2pm ET, 8pm CET?

👍 1

rich-garden-69988

03/07/2023, 8:18 PM

Thurs 11AM works for me!

broad-monitor-993

03/07/2023, 8:26 PM

@cool-lifeguard-49380 @rich-garden-69988 mind sending me an email for a calendar invite?

rich-garden-69988

03/07/2023, 8:29 PM

I can send one over

rich-garden-69988

03/07/2023, 8:29 PM

anyone else to add?

thankful-minister-83577

03/07/2023, 8:53 PM

me please

👍 1

broad-monitor-993

03/07/2023, 8:55 PM

also @high-accountant-32689 probably

👍 2

173 Views

Open in Slack

Previous Next