thousands-car-79657
01/31/2024, 10:39 PM
`BaseModel` cacheable if one of its fields is `FlyteFile` type? I guess I need some sort of `HashMethod` since `BaseModel` is a non-flyte offloaded object. Any recommended approach? Thanks!
tall-lock-23197
`remote_path`, that should hit the cache. Also, if you're passing the same file, the cache should hit automatically.
thousands-car-79657
02/01/2024, 4:51 PM
So for `remote_path`, I can specify different S3 paths (even if the file content is the same) and it should hit the cache?
thousands-car-79657
02/01/2024, 4:51 PM
tall-lock-23197
"so for remote_path, I can specify different S3 paths ..."
I don't think so. The `remote_path` has to be the same.
"this works with the flytekit pydantic plugin right?"
Should work! I think `FlyteFile` cache has to work out of the box.
thousands-car-79657
02/01/2024, 5:25 PM
`remote_path` or the local path, no?
tall-lock-23197
thousands-car-79657
02/01/2024, 5:27 PM
tall-lock-23197
thousands-car-79657
02/01/2024, 5:31 PM
class MyModel(BaseModel):
    ffile: Annotated[FlyteFile, HashMethod(hash_my_file)]
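A minimal sketch of what `hash_my_file` could look like, assuming the goal is a cache key that depends on file content rather than on the path. It uses only the standard library and assumes the value passed in is path-like (as `FlyteFile` is); the chunk size is arbitrary. Whether the pydantic plugin honors `HashMethod` on a model field is not confirmed in this thread.

```python
import hashlib
import os
from typing import Union


def hash_my_file(path: Union[str, os.PathLike]) -> str:
    """Return a SHA-256 hex digest of the file's bytes (content-based hash)."""
    digest = hashlib.sha256()
    # Read in chunks so large files are not loaded into memory at once.
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

With a hash like this, two files stored at different paths but holding the same bytes produce the same digest, which is the behavior asked about above.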
tall-lock-23197
thousands-car-79657
02/02/2024, 7:45 AM
tall-lock-23197
tall-lock-23197
thousands-car-79657
02/02/2024, 4:33 PM
tall-lock-23197
thousands-car-79657
02/02/2024, 6:54 PM
thousands-car-79657
02/03/2024, 1:46 AM
{
    "input": input,
    "params": params,
}
But got the error in the log
thousands-car-79657
02/03/2024, 1:46 AM
import pathlib

from flytekit import dynamic, task, workflow
from flytekit.types.file import FlyteFile
from pydantic import BaseModel

class ConfigWithFlyteFiles(BaseModel):
    flytefiles: list[FlyteFile]

    def __eq__(self, __value: object) -> bool:
        # Equal when the paired files have identical text content.
        return isinstance(__value, ConfigWithFlyteFiles) and all(
            pathlib.Path(self_file).read_text() == pathlib.Path(other_file).read_text()
            for self_file, other_file in zip(self.flytefiles, __value.flytefiles)
        )

@task
def example_task(input: str) -> str:
    return input

@dynamic(cache=True, cache_version="1.0")
def example_dynamic(input: str, params: ConfigWithFlyteFiles) -> str:
    return example_task(input=input)

@workflow
def example_workflow(input: str, params: ConfigWithFlyteFiles) -> str:
    return example_dynamic(input=input, params=params)
thousands-car-79657
02/03/2024, 1:47 AM
_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.INVALID_ARGUMENT
    details = "invalid params input wrong type. Expected simple:STRUCT , but got map_value_type:<union_type:<variants:<map_value_type:<blob:<> > > variants:<simple:STRUCT > > > "
    debug_error_string = "UNKNOWN:Error received from peer ipv4:172.21.196.17:443 {created_time:"2024-02-03T01:43:35.346807118+00:00", grpc_status:3, grpc_message:"invalid params input wrong type. Expected simple:STRUCT , but got map_value_type:<union_type:<variants:<map_value_type:<blob:<> > > variants:<simple:STRUCT > > > "}"
thousands-car-79657
02/03/2024, 1:48 AM
`flytekit==1.10.3`, `flytekitplugins-pydantic==1.10.3`, `pydantic==1.10.13`
Appreciate any pointers when you have a chance to take a look, thanks, @tall-lock-23197!
tall-lock-23197
user
02/03/2024, 7:54 AM
tall-lock-23197
calm-pilot-2010
02/06/2024, 12:31 AM
`pydantic` plugin too. Specifically, I think the problem is when the `pydantic.BaseModel` is an input to an execution. When running a workflow that uses `pydantic` for inputs and outputs just of internal tasks, it works fine.
I'm testing the simplest possible case:
class Config(BaseModel):
    value: int
I think the problem stems from the slightly unusual way that `pydantic.BaseModel` is serialised to a Flyte `Literal`. The format is a map with 2 keys: `BaseModel JSON` and `Serialized Flyte Objects`. `BaseModel JSON` appears to be a struct formed from the result of `pydantic.BaseModel.json()`. When there are complex types, e.g. a pandas DataFrame, these get put in `Serialized Flyte Objects` and a placeholder is put in `BaseModel JSON`.
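The two-key layout described above can be illustrated with plain dictionaries. This is only a sketch of the idea, not the plugin's code: the helper names `is_json_native` and `split_fields` and the placeholder format are hypothetical, and only the two key names come from the message.

```python
import json
from typing import Any, Dict

JSON_KEY = "BaseModel JSON"
OBJECTS_KEY = "Serialized Flyte Objects"


def is_json_native(value: Any) -> bool:
    """True if the standard json module can serialise the value directly."""
    try:
        json.dumps(value)
        return True
    except (TypeError, ValueError):
        return False


def split_fields(fields: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Keep JSON-friendly values inline; move the rest to a side store."""
    json_part: Dict[str, Any] = {}
    objects: Dict[str, Any] = {}
    for name, value in fields.items():
        if is_json_native(value):
            json_part[name] = value
        else:
            # Hypothetical placeholder format, chosen for illustration only.
            placeholder = f"__placeholder_{len(objects)}__"
            objects[placeholder] = value
            json_part[name] = placeholder
    return {JSON_KEY: json_part, OBJECTS_KEY: objects}
```

The result is a map of two entries with different shapes, which is why a single declared type of `STRUCT` does not describe it.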
This structure makes sense to me as it enables working with complex types that Flyte can serialise but `pydantic` can't. The problem is that the transformer just declares the literal type to be `types.LiteralType(simple=types.SimpleType.STRUCT)`.
Everything works most of the time despite the types being very different from what they are declared to be. I think the problem is that there is an explicit validation in `flyteadmin` that fails.
details: invalid config input wrong type. Expected simple:STRUCT , but got map_value_type:<union_type:<variants:<map_value_type:<simple:NONE > > variants:<simple:STRUCT > > >
This validation seems to only happen when the `pydantic.BaseModel` is used as input to an execution.
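A toy model of the mismatch, assuming the check is a strict comparison between the declared type and the type of the literal that was actually sent. The classes and `check_input` below are hypothetical stand-ins, not `flyteadmin` or flytekit code; the two type shapes are copied from the error message above.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Simple:
    name: str  # e.g. "STRUCT" or "NONE"


@dataclass(frozen=True)
class UnionOf:
    variants: Tuple["TypeDesc", ...]


@dataclass(frozen=True)
class MapOf:
    value_type: "TypeDesc"


TypeDesc = Union[Simple, UnionOf, MapOf]


def check_input(declared: TypeDesc, actual: TypeDesc) -> None:
    """Raise if the literal's type differs from the declared interface type."""
    if declared != actual:
        raise ValueError(f"wrong type. Expected {declared}, but got {actual}")


# What the transformer declares versus what the literal looks like.
DECLARED = Simple("STRUCT")
ACTUAL = MapOf(UnionOf((MapOf(Simple("NONE")), Simple("STRUCT"))))
```

Calling `check_input(DECLARED, ACTUAL)` raises, while a literal that really is a `STRUCT` passes, mirroring why only the validated path (execution inputs) fails.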
I think the solution is probably to update `BaseModelTransformer.get_literal_type` to reflect the literal that is actually created. However, I think this could be a bit tricky because `Serialized Flyte Objects` is a map type which could contain basically anything, so it's difficult to define the literal type for this.
thousands-car-79657
02/06/2024, 2:28 AM