#ask-the-community

Constance Ferragu

11/27/2023, 3:26 PM
Hi all, we seem to be hitting a caching bug. Task A runs successfully and reports that its outputs were cached (execution A1). Some time later the same task is run twice (executions A2 and A3). Both executions report taking their outputs from cache, and "View source execution" points to A1. However, execution A3 gets a different output, and we do not know where it comes from. Thanks in advance!

Samhita Alla

11/28/2023, 8:40 AM
is it the same output, though? have you loaded the pytorch module to check if it's the same? also, have you tried immediately rerunning it? do you see different output URIs?

Constance Ferragu

11/28/2023, 10:22 AM
Yes, it is the same PyTorch module once it is loaded. However, the URIs are different, which means the downstream tasks are not read from cache when they should be.
Task A3 indicates that the execution was read from cache and points to task A1, but the two tasks have different file outputs.
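A simplified model of why differing URIs would break downstream caching: assuming the cache key is derived from the task name, the cache version, and the input values, and that a file input is represented by its URI, two identical files stored at different URIs produce different keys. The function `cache_key`, the task name `task_b`, and the URIs below are hypothetical and are not Flyte's actual implementation.

```python
import hashlib
import json


def cache_key(task_name: str, cache_version: str, inputs: dict) -> str:
    """Hypothetical cache key: a digest of task name, cache version and inputs."""
    payload = json.dumps(
        {"task": task_name, "version": cache_version, "inputs": inputs},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Made-up URIs standing in for the file outputs of executions A1 and A3.
uri_a1 = "gs://example-bucket/a1/model.pt"
uri_a3 = "gs://example-bucket/a3/model.pt"

key_from_a1 = cache_key("task_b", "1.0", {"model": uri_a1})
key_from_a3 = cache_key("task_b", "1.0", {"model": uri_a3})

# Same file content but a different URI: the downstream key changes,
# so the downstream task would miss the cache under this model.
print(key_from_a1 == key_from_a3)
```

Under this assumption, the downstream cache misses follow directly from the URI mismatch, so the open question is only why A3 produced a new URI at all.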

Samhita Alla

11/28/2023, 1:12 PM
@David Espejo (he/him) any idea why this is the case?

Daniel Danciu

12/08/2023, 1:54 PM
We are seeing a very similar potential cache corruption behavior again. Notice how the task below, which has caching disabled, saves its output to gs://cradle-bio-pipelines/62/frbtcqftl2w1xy-... ? Here frbtcqftl2w1xy is supposed to be the current execution ID of the task, but it's not. It's a 4-month-old execution ID.
The actual execution id of the task is:
What makes this issue really severe for our case is that there is no workaround. No matter what we do, including disabling the cache, we still get the incorrect cache results.
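A check like this could be scripted as a quick sanity test, assuming the output path contains a segment that starts with the execution ID, as in the URI above. The helper `uri_mentions_execution` and the example values are hypothetical; this is only a sketch of the comparison, not part of Flyte.

```python
from urllib.parse import urlparse


def uri_mentions_execution(output_uri: str, execution_id: str) -> bool:
    """Return True if any path segment of the URI starts with the execution ID."""
    path = urlparse(output_uri).path
    segments = [segment for segment in path.split("/") if segment]
    return any(segment.startswith(execution_id) for segment in segments)


# Made-up URI and IDs, chosen only to illustrate the stale-ID case.
stale_uri = "gs://example-bucket/62/oldexec123-n0-0/output"

print(uri_mentions_execution(stale_uri, "oldexec123"))
print(uri_mentions_execution(stale_uri, "newexec456"))
```

If the second call, with the current execution ID, returns False for a freshly produced output, the output path was written under some other execution's prefix.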

Franziska Geiger

12/08/2023, 2:55 PM
Looking deeper into this issue (I'm working with Daniel), it seems the main problem comes from the map task caching configuration. Assume we have this:
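from functools import partial

from flytekit import TaskMetadata, map_task, task, workflow

# Caching is configured on the task itself...
@task(cache=True, cache_version="0.0.2", cache_serialize=True)
def do_something(...):
    ...

@workflow
def wf():
    do_partial = partial(do_something, ...)
    # ...and again on the map task, through TaskMetadata.
    res = map_task(
        do_partial,
        metadata=TaskMetadata(cache=True, cache_serialize=True, cache_version="0.0.2"),
    )(...)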
This is what we saw:
• Despite the UI saying that caching is disabled, it is actually caching (but loading an unrelated cache entry).
• Disabling caching on the task (do_something) has no effect.
• Disabling caching on both the task and the map task actually disables caching, and as expected nothing is cached.
• We were not able to find a version where caching works correctly.

Samhita Alla

12/08/2023, 4:44 PM
@Daniel Danciu are the outputs being read from cache even though caching is disabled? @Franziska Geiger have you tried disabling cache only in the metadata? because as far as i know, that has to disable the cache. i think you need to set the cache only in the metadata in the case of a map task.

Daniel Danciu

12/08/2023, 5:21 PM
To be clear: the fact that the output is read from cache is the smallest problem here. We managed to avoid caching by disabling caching in TaskMetadata. Apparently, disabling the cache in the @task annotation is ignored (but the Flyte UI incorrectly displays a message saying that caching was disabled, which misled us). The actual problem is that the cached values do not match the inputs. So the TL;DR is that we managed to work around the problem by disabling caching. However, when the cache is enabled, the task returns cached data that is incorrect.
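One possible way to detect cached values that do not match the inputs is to have the task return a fingerprint of its inputs alongside the result, and to compare that fingerprint with the inputs the caller actually passed. The sketch below uses plain Python only; `fingerprint`, `do_something_checked`, and `matches_inputs` are hypothetical names, and the task body is a stand-in rather than our real task.

```python
import hashlib
import json
from typing import Any, Dict, Tuple


def fingerprint(inputs: Dict[str, Any]) -> str:
    """Digest of the inputs, stable across key ordering."""
    payload = json.dumps(inputs, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def do_something_checked(x: int, scale: int) -> Tuple[int, str]:
    """Stand-in task body: returns the result plus a fingerprint of its inputs."""
    return x * scale, fingerprint({"x": x, "scale": scale})


def matches_inputs(result: Tuple[int, str], inputs: Dict[str, Any]) -> bool:
    """True if the fingerprint stored with the result matches the given inputs."""
    _, recorded = result
    return recorded == fingerprint(inputs)


# A result computed for (x=2, scale=3) ...
result = do_something_checked(2, 3)

# ... matches the inputs it was computed from, but not a different set of inputs,
# which is what a wrongly served cache entry would look like to the caller.
print(matches_inputs(result, {"x": 2, "scale": 3}))
print(matches_inputs(result, {"x": 5, "scale": 3}))
```

This does not fix the cache, but it would turn a silently wrong cache hit into a detectable mismatch in the calling workflow.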
s

Samhita Alla

12/11/2023, 7:25 AM
i see two issues here:
• disabling cache in the task annotation is ignored in the UI (can you file a bug, please?)
• when cache is enabled, the task returns some cached data that is incorrect (could you overwrite cached outputs? the option is available on the UI)