I've been having issues with the `pydantic` plugin too. Specifically, I think the problem occurs when a `pydantic.BaseModel` is an input to an execution. A workflow that uses `pydantic` models only for the inputs and outputs of internal tasks works fine.
I'm testing the simplest possible case:

```python
from pydantic import BaseModel

class Config(BaseModel):
    value: int
```
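
For reference, here's a minimal sketch of how I'm exercising it (names are illustrative): passing `Config` as a workflow input is the failing case, while passing it only between internal tasks works.

```python
import flytekitplugins.pydantic  # noqa: F401  # importing should register the BaseModel transformer
from flytekit import task, workflow

@task
def use_config(config: Config) -> int:
    # Passing the BaseModel between internal tasks like this works fine.
    return config.value

@workflow
def wf(config: Config) -> int:
    # Using the BaseModel as a workflow (i.e. execution) input is what fails.
    return use_config(config=config)
```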
I think the problem stems from the slightly unusual way that a `pydantic.BaseModel` is serialised to a Flyte `Literal`. The format is a map with two keys: `BaseModel JSON` and `Serialized Flyte Objects`. `BaseModel JSON` appears to be a struct formed from the result of `pydantic.BaseModel.json()`. When there are complex types, e.g. a pandas DataFrame, these are put in `Serialized Flyte Objects` and a placeholder is put in `BaseModel JSON`.
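
To illustrate, this is my reading of the serialised structure for a model with one simple field and one DataFrame field (a rough Python rendering, not the exact wire format, and `frame` is a hypothetical field name):

```python
# Rough shape of the map literal the plugin produces (illustrative only):
literal_as_python = {
    "BaseModel JSON": {
        "value": 1,
        "frame": "placeholder",  # complex field replaced by a placeholder
    },
    "Serialized Flyte Objects": {
        "frame": "<Flyte-serialised pandas DataFrame>",
    },
}
```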
This structure makes sense to me, as it enables working with complex types that Flyte can serialise but `pydantic` can't. The problem is that the transformer just declares the literal type to be `types.LiteralType(simple=types.SimpleType.STRUCT)`.
Everything works most of the time, despite the types being very different from what they are declared to be. I think the problem is an explicit validation in `flyteadmin` that fails:

```
details: invalid config input wrong type. Expected simple:STRUCT , but got map_value_type:<union_type:<variants:<map_value_type:<simple:NONE > > variants:<simple:STRUCT > > >
```
This validation only seems to happen when the `pydantic.BaseModel` is used as an input to an execution.
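
For example, kicking off an execution with the model as an input is enough to hit it (a sketch assuming a registered `wf` and a reachable backend; the endpoint and project/domain names are placeholders, and the `FlyteRemote` usage is from memory):

```python
from flytekit.configuration import Config as FlyteConfig
from flytekit.remote import FlyteRemote

remote = FlyteRemote(
    config=FlyteConfig.for_endpoint("flyte.example.com"),  # hypothetical endpoint
    default_project="flytesnacks",
    default_domain="development",
)

# This is where flyteadmin rejects the input literal with the error above.
execution = remote.execute(wf, inputs={"config": Config(value=1)})
```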
I think the solution is probably to update `BaseModelTransformer.get_literal_type` to reflect the literal that is actually created. However, this could be a bit tricky: `Serialized Flyte Objects` is a map type that could contain basically anything, so it's difficult to define a precise literal type for it.
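
Something like the following is what I have in mind, mirroring the type that appears in the error message (a rough sketch against flytekit's `LiteralType`/`UnionType` model classes, not a tested patch):

```python
from flytekit.models import types

def get_literal_type(self, t: type) -> types.LiteralType:
    # Declare the map-of-union shape the transformer actually emits,
    # instead of a bare STRUCT.
    return types.LiteralType(
        map_value_type=types.LiteralType(
            union_type=types.UnionType(
                variants=[
                    # "Serialized Flyte Objects": a map whose values we can't
                    # type precisely, hence the NONE value type in the error.
                    types.LiteralType(
                        map_value_type=types.LiteralType(simple=types.SimpleType.NONE)
                    ),
                    # "BaseModel JSON": a struct.
                    types.LiteralType(simple=types.SimpleType.STRUCT),
                ]
            )
        )
    )
```

The awkward part, as noted above, is the `Serialized Flyte Objects` variant: the `NONE` value type doesn't really capture what that map can hold.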