Xinzhou Liu
01/31/2024, 10:39 PM
[...] `BaseModel` cacheable if one of its fields is `FlyteFile` type? I guess I need some sort of HashMethod since BaseModel is a non-flyte offloaded object. Any recommended approach? Thanks!
Samhita Alla
[...] `remote_path`, that should hit the cache. also, if you're passing the same file, the cache should hit automatically.
Xinzhou Liu
02/01/2024, 4:51 PM
[...] `remote_path`, I can specify different S3 paths (even if the file content is the same) and it should hit the cache?
Xinzhou Liu
02/01/2024, 4:51 PM
Samhita Alla
"so for `remote_path`, I can specify different S3 paths ..."
i don't think so. the `remote_path` has to be the same.
"this works with the flytekit pydantic plugin right?"
should work! i think flytefile cache has to work out of the box.
Xinzhou Liu
02/01/2024, 5:25 PM
[...] `remote_path` or the local path, no?
Samhita Alla
Xinzhou Liu
02/01/2024, 5:27 PM
Samhita Alla
Xinzhou Liu
02/01/2024, 5:31 PM
class MyModel(BaseModel):
    ffile: Annotated[FlyteFile, HashMethod(hash_my_file)]
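The snippet above refers to a `hash_my_file` function without showing it. A minimal sketch of what such a content-based hash might look like is below, assuming the argument can be treated as a local path. The body, the `chunk_size` parameter, and the choice of SHA-256 are assumptions for illustration, not something stated in the thread.

```python
import hashlib
import os
from pathlib import Path


def hash_my_file(path: "str | os.PathLike[str]", chunk_size: int = 1 << 20) -> str:
    """Return a SHA-256 hex digest of the file's bytes (hypothetical helper)."""
    digest = hashlib.sha256()
    with Path(os.fspath(path)).open("rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

With a hash like this, two files with identical bytes stored at different locations produce the same digest. Whether the pydantic plugin honours a `HashMethod` annotation on a model field is not established in this thread.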
Samhita Alla
Xinzhou Liu
02/02/2024, 7:45 AM
Samhita Alla
Samhita Alla
Xinzhou Liu
02/02/2024, 4:33 PM
Samhita Alla
Xinzhou Liu
02/02/2024, 6:54 PM
Xinzhou Liu
02/03/2024, 1:46 AM
{
    "input": input,
    "params": params,
}
But got the error in the log
Xinzhou Liu
02/03/2024, 1:46 AM
import pathlib

from flytekit import dynamic, task, workflow
from flytekit.types.file import FlyteFile
from pydantic import BaseModel

class ConfigWithFlyteFiles(BaseModel):
    flytefiles: list[FlyteFile]

    # Two configs compare equal when their files have identical text content.
    def __eq__(self, __value: object) -> bool:
        return isinstance(__value, ConfigWithFlyteFiles) and all(
            pathlib.Path(self_file).read_text() == pathlib.Path(other_file).read_text()
            for self_file, other_file in zip(self.flytefiles, __value.flytefiles)
        )

@task
def example_task(input: str) -> str:
    return input

@dynamic(cache=True, cache_version="1.0")
def example_dynamic(input: str, params: ConfigWithFlyteFiles) -> str:
    return example_task(input=input)

@workflow
def example_workflow(input: str, params: ConfigWithFlyteFiles) -> str:
    return example_dynamic(input=input, params=params)
Xinzhou Liu
02/03/2024, 1:47 AM
_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.INVALID_ARGUMENT
    details = "invalid params input wrong type. Expected simple:STRUCT , but got map_value_type:<union_type:<variants:<map_value_type:<blob:<> > > variants:<simple:STRUCT > > > "
    debug_error_string = "UNKNOWN:Error received from peer ipv4:172.21.196.17:443 {created_time:"2024-02-03T01:43:35.346807118+00:00", grpc_status:3, grpc_message:"invalid params input wrong type. Expected simple:STRUCT , but got map_value_type:<union_type:<variants:<map_value_type:<blob:<> > > variants:<simple:STRUCT > > > "}"
Xinzhou Liu
02/03/2024, 1:48 AM
[...] `flytekit==1.10.3`, `flytekitplugins-pydantic==1.10.3`, `pydantic==1.10.13`
Appreciate any pointers when you have a chance to take a look, thanks, @Samhita Alla!
Samhita Alla
Samhita Alla
Thomas Newton
02/06/2024, 12:31 AM
[...] `pydantic` plugin too. Specifically, I think the problem is when the `pydantic.BaseModel` is an input to an execution. When running a workflow that uses `pydantic` only for the inputs and outputs of internal tasks, it works fine.
I'm testing the simplest possible case:
class Config(BaseModel):
    value: int
I think the problem stems from the slightly unusual way that `pydantic.BaseModel` is serialised to a Flyte `Literal`. The format is a map with 2 keys: `BaseModel JSON` and `Serialized Flyte Objects`. `BaseModel JSON` appears to be a struct formed from the result of `pydantic.BaseModel.json()`. When there are complex types, e.g. a pandas DataFrame, these get put in `Serialized Flyte Objects` and a placeholder is put in `BaseModel JSON`.
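To make the two-key layout described above concrete, here is a toy model in plain Python. It is only a sketch of the idea, not the plugin's implementation: the function name `split_for_serialisation` and the placeholder format are hypothetical, and "complex type" is approximated as "anything `json.dumps` rejects".

```python
import json
from typing import Any

# Key names as described in the thread.
BASEMODEL_JSON_KEY = "BaseModel JSON"
FLYTE_OBJECTS_KEY = "Serialized Flyte Objects"


def split_for_serialisation(fields: dict[str, Any]) -> dict[str, Any]:
    """Toy model: JSON-friendly values stay inline, others are moved aside."""
    inline: dict[str, Any] = {}
    offloaded: dict[str, Any] = {}
    for name, value in fields.items():
        try:
            json.dumps(value)  # probe whether the value is JSON-serialisable
            inline[name] = value
        except TypeError:
            # Hypothetical placeholder format, chosen only for this sketch.
            placeholder = f"__placeholder_{len(offloaded)}__"
            offloaded[placeholder] = value
            inline[name] = placeholder
    return {BASEMODEL_JSON_KEY: inline, FLYTE_OBJECTS_KEY: offloaded}
```

The point of the sketch is that the result is a map of two differently shaped entries, which is not the same thing as a single struct.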
This structure makes sense to me as it enables working with complex types that Flyte can serialise but `pydantic` can't. The problem is that the transformer just declares the literal type to be `types.LiteralType(simple=types.SimpleType.STRUCT)`.
Everything works most of the time despite the types being very different from what they are declared to be. I think the problem is that there is an explicit validation in `flyteadmin` that fails.
details: invalid config input wrong type. Expected simple:STRUCT , but got map_value_type:<union_type:<variants:<map_value_type:<simple:NONE > > variants:<simple:STRUCT > > >
This validation seems to only happen when the `pydantic.BaseModel` is used as input to an execution.
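The failure mode described above (a declared type that does not match the shape of the value actually supplied) can be modelled with a toy check. This is an illustration only and is not `flyteadmin`'s code: `ToyType` and `validate_input` are hypothetical names, and real literal types are far richer than a single string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyType:
    """Hypothetical stand-in for a literal type, reduced to a single label."""
    kind: str  # e.g. "STRUCT" or "MAP"


def validate_input(name: str, declared: ToyType, actual: ToyType) -> None:
    """Raise if the supplied value's type differs from the declared one."""
    if declared != actual:
        raise ValueError(
            f"invalid {name} input wrong type. "
            f"Expected {declared.kind}, but got {actual.kind}"
        )
```

In this toy model, declaring `STRUCT` while supplying a `MAP` raises, which mirrors the shape of the error message quoted above. A check that is only run at one entry point would also explain why the mismatch goes unnoticed elsewhere.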
I think the solution is probably to update `BaseModelTransformer.get_literal_type` to reflect the literal that is actually created. However, I think this could be a bit tricky because `Serialized Flyte Objects` is a map type which could contain basically anything, so it's difficult to define the literal type for this.
Xinzhou Liu
02/06/2024, 2:28 AM