We are having an issue with input parameters not b...
# flyte-support
n
We are having an issue with input parameters not being deterministic when moving up from flyte 1.13 We are runnin pyflyte with --params params.yaml Seems to coincide with the msgpack update from 1.14. Do you no longer consider dicts to be deterministic input, even if resolved from the cli in flytekit/interaction/click_types.JsonParamType. Or is this a oversight @task(cache=Cache(version="hard-coded",serialize=True)) def generic_task_no_params(dataset: str) -> str: return dataset @task(cache=Cache(version="hard-coded",serialize=True)) def generic_task(dataset: str, params: dict) -> str: return dataset @workflow def generic_wf(dataset: str, params: dict) -> None: generic_task_no_params(dataset=dataset) # caching works generic_task(dataset=dataset, params=params) # no caching generic_task(dataset=dataset, params=params["key"]) # no caching Note that this worked perfect with cache hits when we were at 1.13 version
c
@nutritious-match-70820, I'm looking into this.
n
Any update on this? Do you need more information to verify the issue?
We fixed it by overriding the function flytekit.core.type_engine.DictTransformer.dict_to_binary_literal With https://github.com/flyteorg/flytekit/blob/master/flytekit/core/type_engine.py#L2063 which is deprecated. Seems like the new msgpack version doesn't support the usecase with nested yaml/json
c
👋 has this been fixed? Still seeing non-deterministic cache misses on tasks that take dataclasses as inputs on
flytekit==1.15.3
f
Hmm this is a change in behavior- cc @thankful-minister-83577 @damp-lion-88352 do you folks know about this. @crooked-lifeguard-46802 can you use pydantic (do you know if this happens with pydantic)
c
Switching to pydantic would be a pretty large refactor 😓 Another issue that I’m seeing on these tasks with cache misses is sometimes that inputs also become malformed in the Flyte UI (though logging the input from inside the task shows that the inputs is correctly deserialized) In the screenshot,
model_type
is an Enum type, and the value should be
card_cashing
but its getting exploded into a list/iterable for some reason 🤔
do you know if this happens with pydantic
We don’t have any pydantic dataclasses in our codebase currently
f
Cool this is a good clue. @nutritious-match-70820 do you have an enum type too?
d
can you help me try dict with type hints?
c
The input mangling is not just happening for enum types, seeing it for
Optional[str]
as well 😬 Here’s the interface for this input:
Copy code
start_date: {"type":{"unionType":{"variants":[{"structure":{"dataclassType":{},"tag":"str"},"simple":"STRING"},{"structure":{"dataclassType":{},"tag":"none"},"simple":"NONE"}]}}}
can you help me try dict with type hints?
@damp-lion-88352 was this for me? (we are not currently using untyped dicts in our workflow)
d
I can find a time to investigate this at this week
c
I tried rerunning the same workflow with
FLYTE_USE_OLD_DC_FORMAT=true
and didn’t see cache misses btw • hard to say with certainity if this is a proper fix due to the non-deterministic nature of the bug!
d
I think I might know a bit of this problem
so javascript doesn't have integer, only float
but our console is built by javascript
so when you use int in dict, we will miss the cache
I think use
FLYTE_USE_OLD_DC_FORMAT=true
in console is a good way to hit cache
this might be a constraint for msgpack idl
c
so javascript doesn’t have integer, only float
but our console is built by javascript
so when you use int in dict, we will miss the cache
I don’t know much about flyte internals, but I would assume that flyte doesn’t really use javascript for the actual serde logic? I always thought that the javascript is only used to display the human-readable representation on the task input/output, and the serde is handled within python + go?
d
when you input in flyteconsole, we will turn your input to javascript value and seriialize them
c
Ahh I see, that makes sense. For our workflows, we are seeing the cache issues on inner tasks that get its inputs from the output of another task! Our high level workflow only takes in string inputs, no integers/dicts etc.
g
👋 @damp-lion-88352 thanks for willing to investigate this issue this week! I work with archit so I'm also happy to look into this a bit further from our side if you have any tips. I basically poked into the datacatalog logs & the database, but I wasn't able to find anything conclusive I noticed that for the exact same inputs the
tag_name
we were searching the database for was different from what the first execution that generated the cache we should've used, but wasn't sure where to go from there, perhaps testing flyte's tag generation logic locally on these inputs?
d
I think the reason is because
the first execution that generated the cache
this will generate cache for msgpack idl inputs which includes integer but when you input from frontend, you will generate msgpack idl inputs which doesn't include integer
I think one possible solution is we add a task like
Copy code
def return_input(dc: DC) -> DC
    return dc
and cache output after this
return_input
task
and 1 more option is to use
FLYTE_USE_OLD_DC_FORMAT=true
to launch task in your user client and frontend so that we will all use json str to cache
t
so this is not repro-able only from the command line?
c
I don’t have a minimum reproducing example at hand unfortunately! We do not have javascript in our input loop where we are seeing the cache misses: • Our workflow that we are seeing the cache miss on only has a string input like so
experiment_id = baseline candidate
• Rest of the inputs all generated from tasks so there is no javascript round-trip (basically we already have
def return_input(dc: DC) -> DC
type tasks that generates the inputs for our tasks that we are seeing unexpected cache misses on) Could you point me to where is the hashed value of the cache inputs is computed? Is it computed in python or somewhere else?
d
Hi, @nutritious-match-70820 I just tried to reproduce your case by this workflow
Copy code
from flytekit import task, workflow
from flytekit.core.cache import Cache

@task(cache=Cache(version="hard-coded",serialize=True))
def generic_task_no_params(dataset: str) -> str:
    return dataset
@task(cache=Cache(version="hard-coded",serialize=True))
def generic_task(dataset: str, params: dict) -> str:
    return dataset
@workflow
def generic_wf(dataset: str, params: dict) -> None:
    generic_task_no_params(dataset=dataset) # caching works
    generic_task(dataset=dataset, params=params) # no caching
and when (second time) I triggered the workflow in the console, it doesn't hit the cache at the first time (producing struct protobuf in the literal but when I triggerd again in the console, it will hit the cache here
this is because the first time you run the workflow, we are producing msgpack bytes in literal (for the input
dict
)
but when you trigger the workflow again from the terminal, you will hit the cache generated from the first execution
I think this can be fixed if we support produce msgpack bytes literal from the flyte console, which is not supported now
but how we generate the protobuf is either from flytekit (python) or flyteconsole (javascript)
The cache's location stores at the table artifact_data in postegres database. you can see the location column. and the cache itself store at s3 (Of course, XD) for example this is my query result
Copy code
flyte=# SELECT * FROM artifact_data LIMIT 1;
          created_at           |          updated_at           | deleted_at | dataset_project |               dataset_name                | dataset_domain |       dataset_version        |             artifact_id              | name |                                                                                 location
-------------------------------+-------------------------------+------------+-----------------+-------------------------------------------+----------------+------------------------------+--------------------------------------+------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 2025-05-16 08:43:57.795506+00 | 2025-05-16 08:43:57.795506+00 |            | flytesnacks     | flyte_task-example.generic_task_no_params | development    | hard-coded-x3LhCJG8-zCZxwZgs | e09aecc8-8703-447b-b469-5cdd8426b9cf | o0   | <s3://my-s3-bucket/metadata/flytesnacks/development/flyte_task-example.generic_task_no_params/hard-coded-x3LhCJG8-zCZxwZgs/e09aecc8-8703-447b-b469-5cdd8426b9cf/o0/data.pb>
(1 row)