We are having an issue with input parameters not being deter Flyte #flyte-support

We are having an issue with input parameters not b...

nutritious-match-70820

03/06/2025, 12:04 PM

We are having an issue with input parameters not being deterministic when moving up from flyte 1.13 We are runnin pyflyte with --params params.yaml Seems to coincide with the msgpack update from 1.14. Do you no longer consider dicts to be deterministic input, even if resolved from the cli in flytekit/interaction/click_types.JsonParamType. Or is this a oversight @task(cache=Cache(version="hard-coded",serialize=True)) def generic_task_no_params(dataset: str) -> str: return dataset @task(cache=Cache(version="hard-coded",serialize=True)) def generic_task(dataset: str, params: dict) -> str: return dataset @workflow def generic_wf(dataset: str, params: dict) -> None: generic_task_no_params(dataset=dataset) # caching works generic_task(dataset=dataset, params=params) # no caching generic_task(dataset=dataset, params=params["key"]) # no caching Note that this worked perfect with cache hits when we were at 1.13 version

careful-australia-19356

03/10/2025, 4:10 PM

@nutritious-match-70820, I'm looking into this.

nutritious-match-70820

03/17/2025, 2:30 PM

Any update on this? Do you need more information to verify the issue?

nutritious-match-70820

03/31/2025, 12:46 PM

We fixed it by overriding the function flytekit.core.type_engine.DictTransformer.dict_to_binary_literal With https://github.com/flyteorg/flytekit/blob/master/flytekit/core/type_engine.py#L2063 which is deprecated. Seems like the new msgpack version doesn't support the usecase with nested yaml/json

crooked-lifeguard-46802

05/12/2025, 1:10 PM

👋 has this been fixed? Still seeing non-deterministic cache misses on tasks that take dataclasses as inputs on

flytekit==1.15.3

freezing-airport-6809

05/12/2025, 1:26 PM

Hmm this is a change in behavior- cc @thankful-minister-83577 @damp-lion-88352 do you folks know about this. @crooked-lifeguard-46802 can you use pydantic (do you know if this happens with pydantic)

crooked-lifeguard-46802

05/12/2025, 1:29 PM

Switching to pydantic would be a pretty large refactor 😓 Another issue that I’m seeing on these tasks with cache misses is sometimes that inputs also become malformed in the Flyte UI (though logging the input from inside the task shows that the inputs is correctly deserialized) In the screenshot,

model_type

is an Enum type, and the value should be

card_cashing

but its getting exploded into a list/iterable for some reason 🤔

crooked-lifeguard-46802

05/12/2025, 1:30 PM

do you know if this happens with pydantic

We don’t have any pydantic dataclasses in our codebase currently

freezing-airport-6809

05/12/2025, 1:33 PM

Cool this is a good clue. @nutritious-match-70820 do you have an enum type too?

damp-lion-88352

05/12/2025, 1:36 PM

can you help me try dict with type hints?

crooked-lifeguard-46802

05/12/2025, 1:36 PM

The input mangling is not just happening for enum types, seeing it for

Optional[str]

as well 😬 Here’s the interface for this input:

Copy code

start_date: {"type":{"unionType":{"variants":[{"structure":{"dataclassType":{},"tag":"str"},"simple":"STRING"},{"structure":{"dataclassType":{},"tag":"none"},"simple":"NONE"}]}}}

crooked-lifeguard-46802

05/12/2025, 1:40 PM

can you help me try dict with type hints?

@damp-lion-88352 was this for me? (we are not currently using untyped dicts in our workflow)

damp-lion-88352

05/12/2025, 1:41 PM

I can find a time to investigate this at this week

crooked-lifeguard-46802

05/12/2025, 1:41 PM

I tried rerunning the same workflow with

FLYTE_USE_OLD_DC_FORMAT=true

and didn’t see cache misses btw • hard to say with certainity if this is a proper fix due to the non-deterministic nature of the bug!

damp-lion-88352

05/12/2025, 1:42 PM

I think I might know a bit of this problem

damp-lion-88352

05/12/2025, 1:43 PM

so javascript doesn't have integer, only float

damp-lion-88352

05/12/2025, 1:43 PM

but our console is built by javascript

damp-lion-88352

05/12/2025, 1:43 PM

so when you use int in dict, we will miss the cache

damp-lion-88352

05/12/2025, 1:44 PM

I think use

FLYTE_USE_OLD_DC_FORMAT=true

in console is a good way to hit cache

damp-lion-88352

05/12/2025, 1:44 PM

this might be a constraint for msgpack idl

crooked-lifeguard-46802

05/12/2025, 1:46 PM

so javascript doesn’t have integer, only float

but our console is built by javascript

so when you use int in dict, we will miss the cache

I don’t know much about flyte internals, but I would assume that flyte doesn’t really use javascript for the actual serde logic? I always thought that the javascript is only used to display the human-readable representation on the task input/output, and the serde is handled within python + go?

damp-lion-88352

05/12/2025, 1:47 PM

when you input in flyteconsole, we will turn your input to javascript value and seriialize them

crooked-lifeguard-46802

05/12/2025, 1:50 PM

Ahh I see, that makes sense. For our workflows, we are seeing the cache issues on inner tasks that get its inputs from the output of another task! Our high level workflow only takes in string inputs, no integers/dicts etc.

gentle-night-59824

05/12/2025, 5:02 PM

👋 @damp-lion-88352 thanks for willing to investigate this issue this week! I work with archit so I'm also happy to look into this a bit further from our side if you have any tips. I basically poked into the datacatalog logs & the database, but I wasn't able to find anything conclusive I noticed that for the exact same inputs the

tag_name

we were searching the database for was different from what the first execution that generated the cache we should've used, but wasn't sure where to go from there, perhaps testing flyte's tag generation logic locally on these inputs?

damp-lion-88352

05/13/2025, 8:27 AM

I think the reason is because

damp-lion-88352

05/13/2025, 8:28 AM

the first execution that generated the cache

this will generate cache for msgpack idl inputs which includes integer but when you input from frontend, you will generate msgpack idl inputs which doesn't include integer

damp-lion-88352

05/13/2025, 8:29 AM

I think one possible solution is we add a task like

Copy code

def return_input(dc: DC) -> DC
    return dc

damp-lion-88352

05/13/2025, 8:29 AM

and cache output after this

return_input

task

damp-lion-88352

05/13/2025, 8:30 AM

and 1 more option is to use

FLYTE_USE_OLD_DC_FORMAT=true

to launch task in your user client and frontend so that we will all use json str to cache

thankful-minister-83577

05/13/2025, 4:13 PM

so this is not repro-able only from the command line?

crooked-lifeguard-46802

05/13/2025, 4:51 PM

I don’t have a minimum reproducing example at hand unfortunately! We do not have javascript in our input loop where we are seeing the cache misses: • Our workflow that we are seeing the cache miss on only has a string input like so

experiment_id = baseline candidate

• Rest of the inputs all generated from tasks so there is no javascript round-trip (basically we already have

def return_input(dc: DC) -> DC

type tasks that generates the inputs for our tasks that we are seeing unexpected cache misses on) Could you point me to where is the hashed value of the cache inputs is computed? Is it computed in python or somewhere else?

damp-lion-88352

05/16/2025, 8:50 AM

Hi, @nutritious-match-70820 I just tried to reproduce your case by this workflow

Copy code

from flytekit import task, workflow
from flytekit.core.cache import Cache

@task(cache=Cache(version="hard-coded",serialize=True))
def generic_task_no_params(dataset: str) -> str:
    return dataset
@task(cache=Cache(version="hard-coded",serialize=True))
def generic_task(dataset: str, params: dict) -> str:
    return dataset
@workflow
def generic_wf(dataset: str, params: dict) -> None:
    generic_task_no_params(dataset=dataset) # caching works
    generic_task(dataset=dataset, params=params) # no caching

and when (second time) I triggered the workflow in the console, it doesn't hit the cache at the first time (producing struct protobuf in the literal but when I triggerd again in the console, it will hit the cache here

damp-lion-88352

05/16/2025, 8:51 AM

this is because the first time you run the workflow, we are producing msgpack bytes in literal (for the input

dict

)

damp-lion-88352

05/16/2025, 8:57 AM

but when you trigger the workflow again from the terminal, you will hit the cache generated from the first execution

damp-lion-88352

05/16/2025, 9:04 AM

I think this can be fixed if we support produce msgpack bytes literal from the flyte console, which is not supported now

damp-lion-88352

05/16/2025, 9:26 AM

How we cache in flyte? flytepropeller will hash the literal map from here https://github.com/flyteorg/flyte/blob/b3330ba4430538f91ae9fc7d868a29a2e96db8bd/flyteplugins/go/tasks/pluginmachinery/catalog/hashing.go#L60-L83 and compute the hash here https://github.com/flyteorg/flyte/blob/9178aaa34f4d7d5427289413ae4a5df0b6995928/flytestdlib/pbhash/pbhash.go#L24-L47

damp-lion-88352

05/16/2025, 9:27 AM

but how we generate the protobuf is either from flytekit (python) or flyteconsole (javascript)

damp-lion-88352

05/16/2025, 9:38 AM

The cache's location stores at the table artifact_data in postegres database. you can see the location column. and the cache itself store at s3 (Of course, XD) for example this is my query result

Copy code

flyte=# SELECT * FROM artifact_data LIMIT 1;
          created_at           |          updated_at           | deleted_at | dataset_project |               dataset_name                | dataset_domain |       dataset_version        |             artifact_id              | name |                                                                                 location
-------------------------------+-------------------------------+------------+-----------------+-------------------------------------------+----------------+------------------------------+--------------------------------------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 2025-05-16 08:43:57.795506+00 | 2025-05-16 08:43:57.795506+00 |            | flytesnacks     | flyte_task-example.generic_task_no_params | development    | hard-coded-x3LhCJG8-zCZxwZgs | e09aecc8-8703-447b-b469-5cdd8426b9cf | o0   | <s3://my-s3-bucket/metadata/flytesnacks/development/flyte_task-example.generic_task_no_params/hard-coded-x3LhCJG8-zCZxwZgs/e09aecc8-8703-447b-b469-5cdd8426b9cf/o0/data.pb>
(1 row)

8 Views

Open in Slack

Previous Next