I have a question about flyte caching when running workflows as standard python functions for exampl...

microscopic-furniture-57275

01/18/2024, 10:19 PM

I have a question about flyte caching when running workflows as standard python functions for example in local testing -- i.e., not by running

pyflyte run

from the CLI, but rather just executing the python workflow function directly. In this case, how best to disable caching? I see that work has been done to give

pyflyte run

some local-cache options, but that doesn't apply here. In my codebase, I use a global flag that is typically always on to allow/disallow caching like


@task( cache=GLOBAL_CACHING_ENABLED ... )

If I take care to turn this flag OFF in my test module BEFORE the module containing the task is loaded, I can disable caching this way. But it would be nice if there were a more programmatic way to do this that didn't rely as much on order of module loading - just something to say, "hey, flytekit, turn off local caching please!" From the flyte console when "relaunching" a workflow on our k8s cluster, or from code that launches jobs programmatically via FlyteRemote, I can pass an

overwrite_cache

parameter. Is there some flytekit API I can call to turn off local caching in a similar way? I see reference to

LocalTaskCache.clear()

in this PR, but I don't find docs about it. The other option would seem to be executing

pyflyte local-cache clear

in a subprocess during the test, but this feels pretty hacky in a test that's running a very-fast workflow a few times to ensure results are either all different or all the same (which caching gets in the way of). Thoughts? Thanks!
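The load-order dependence described above comes from how Python evaluates decorator arguments: the value of the flag is read once, when the decorated function is defined. Here is a minimal sketch of that behavior. The `task` decorator below is a hypothetical stand-in written for illustration, not flytekit's implementation, and `cache_enabled` is an invented attribute name.

```python
GLOBAL_CACHING_ENABLED = True


def task(cache=False):
    """Hypothetical stand-in for a task decorator (not flytekit's)."""
    def wrap(fn):
        # The flag's value is captured here, once, at decoration time.
        fn.cache_enabled = cache
        return fn
    return wrap


@task(cache=GLOBAL_CACHING_ENABLED)
def square(x: int) -> int:
    return x * x


# Flipping the flag after `square` is defined does not affect `square`.
GLOBAL_CACHING_ENABLED = False


@task(cache=GLOBAL_CACHING_ENABLED)
def cube(x: int) -> int:
    return x ** 3


print(square.cache_enabled, cube.cache_enabled)
```

Only tasks defined after the flag changes see the new value, which is why the test module has to flip the flag before importing the module that contains the tasks.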

microscopic-furniture-57275

01/19/2024, 5:54 AM

Upon re-reading that PR, it looks like one can export

FLYTE_LOCAL_CACHE_ENABLED=false

in the testing environment to cause the effect I'm looking for -- no use of local caching. I missed this initially, because I first thought one was required to modify whatever config file holds the config option discussed:


[local]
cache_enabled=False

Am I reading this correctly? One can disable all use of the local-cache by setting the environment variable? Presumably this will become available in a forthcoming release of flytekit? (I see it was approved 8 hours ago!)
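For a test suite, one way to apply the environment variable is a small context manager that sets it and restores the previous value afterwards. This is only a sketch: `local_cache_disabled` is a hypothetical helper name, and it is an assumption here that flytekit reads `FLYTE_LOCAL_CACHE_ENABLED` late enough for a change made inside the process to take effect. If it reads the value only once, the variable would need to be exported before flytekit is imported.

```python
import os
from contextlib import contextmanager

ENV_VAR = "FLYTE_LOCAL_CACHE_ENABLED"  # name taken from the PR discussed above


@contextmanager
def local_cache_disabled():
    """Hypothetical helper: set the variable to "false", then restore it."""
    previous = os.environ.get(ENV_VAR)
    os.environ[ENV_VAR] = "false"
    try:
        yield
    finally:
        # Restore whatever was there before, including "not set at all".
        if previous is None:
            os.environ.pop(ENV_VAR, None)
        else:
            os.environ[ENV_VAR] = previous


with local_cache_disabled():
    # Run the workflow function directly here, e.g. my_workflow(...)
    print(os.environ[ENV_VAR])
```

Exporting the variable in the shell or CI configuration before the tests start avoids the timing question entirely.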

tall-lock-23197

01/19/2024, 1:58 PM

i believe so! and yes, it should become available in the forthcoming release.

tall-lock-23197

01/19/2024, 1:59 PM

cc @agreeable-kitchen-44189

agreeable-kitchen-44189

01/19/2024, 2:07 PM

Yes that’s correct! We’re having a similar use case whilst testing and are also aiming to use the environment variable 👍

microscopic-furniture-57275

01/19/2024, 2:08 PM

@agreeable-kitchen-44189 Thanks for this work 🙏

agreeable-kitchen-44189

01/19/2024, 2:08 PM

I’ve just merged the PR, so the next pre-release should have it

