Eugene Cha

05/31/2022, 6:13 AM
We're trying to run the caching.py example to see how caching works, but it only appears to work sometimes. We increased the sleep time to 50 seconds:
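# Imports the snippet relies on
import time

import pandas
from flytekit import HashMethod, task, workflow
from flytekit.core.node_creation import create_node
from typing_extensions import Annotated


def hash_pandas_dataframe(df: pandas.DataFrame) -> str: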
    return str(pandas.util.hash_pandas_object(df))


@task
def uncached_data_reading_task() -> Annotated[
    pandas.DataFrame, HashMethod(hash_pandas_dataframe)
]:
    return pandas.DataFrame({"column_1": [1, 2, 3]})


@task(cache=True, cache_version="1.0")
def cached_data_processing_task(df: pandas.DataFrame) -> pandas.DataFrame:
    time.sleep(50)
    return df * 2


@task
def compare_dataframes(df1: pandas.DataFrame, df2: pandas.DataFrame):
    assert df1.equals(df2)


@workflow
def cached_dataframe_wf():
    raw_data = uncached_data_reading_task()

    # We execute `cached_data_processing_task` twice, but we force those
    # two executions to happen serially to demonstrate how the second run
    # hits the cache.
    t1_node = create_node(cached_data_processing_task, df=raw_data)
    t2_node = create_node(cached_data_processing_task, df=raw_data)
    t1_node >> t2_node

    # Confirm that the dataframes actually match
    compare_dataframes(df1=t1_node.o0, df2=t2_node.o0)


if __name__ == "__main__":
    df1 = cached_dataframe_wf()
    print(f"Running cached_dataframe_wf once : {df1}")
Sometimes the caching works and sometimes it doesn't. We've tried running with pyflyte run --remote caching.py cached_dataframe_wf as well as using the relaunch button, but as you can see in the pictures it tends not to work, and I'm not sure why. Any ideas?

Prafulla Mahindrakar

05/31/2022, 8:14 AM
Hi @Eugene Cha, can you check the following metric from datacatalog:
get_success_count
You can port-forward your datacatalog pod like this:
kubectl port-forward datacatalog-6797ff48c6-tvkm5  -n flyte 10254:10254
Then access the metrics locally at http://localhost:10254/metrics. Every cache hit increments this counter. The UI also shows the cache symbol.
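A minimal sketch of reading that counter from Python instead of the browser, assuming the port-forward above is running and the endpoint serves the usual Prometheus text format without trailing timestamps. The helper names `read_counters` and `fetch_metrics` are hypothetical, and the exact full metric name is not given in this thread, so the code matches on the `get_success_count` fragment only.

```python
import urllib.request


def read_counters(metrics_text: str, name_fragment: str) -> dict:
    """Return {sample name: value} for samples whose name contains name_fragment."""
    counters = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like "<name>{labels} <value>"; the value is the
        # last space-separated field (assumes no trailing timestamp).
        name, _, value = line.rpartition(" ")
        if name_fragment in name:
            try:
                counters[name] = float(value)
            except ValueError:
                # Not a numeric sample; ignore it.
                continue
    return counters


def fetch_metrics(url: str = "http://localhost:10254/metrics") -> str:
    """Download the metrics page exposed by the port-forward."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


# Usage (requires the port-forward to be active):
#   before = read_counters(fetch_metrics(), "get_success_count")
#   ... relaunch the workflow ...
#   after = read_counters(fetch_metrics(), "get_success_count")
```

Comparing the values before and after a relaunch shows whether the second run produced any cache hits.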
This also assumes you have left the propeller cache config at its default value:
MaxCacheAge  config.Duration `json:"max-cache-age" pflag:", Cache entries past this age will incur cache miss. 0 means cache never expires"`
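The rule described in that field comment can be sketched as follows. This is an illustration of the stated semantics only (entries older than the limit miss, and 0 disables expiry), not propeller's actual implementation; the function name `is_cache_entry_usable` is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_cache_entry_usable(
    created_at: datetime,
    max_cache_age: timedelta,
    now: Optional[datetime] = None,
) -> bool:
    """Return True if an entry created at created_at may still be served."""
    now = now or datetime.now(timezone.utc)
    # A max age of 0 means the cache never expires.
    if max_cache_age == timedelta(0):
        return True
    # Entries past the max age incur a cache miss.
    return now - created_at <= max_cache_age
```

With the default of 0, an entry of any age is still usable, so expiry should not explain intermittent misses unless the value was changed.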
Another thing you can check is the propeller log for executions that used the cache:
kubectl logs -n flyte flytepropeller-6844db64cf-5jtxn | grep "Catalog CacheHit" | wc -l

Eugene Cha

05/31/2022, 8:18 AM
I'm using flytectl demo and there are no datacatalog or flytepropeller pods.

Prafulla Mahindrakar

05/31/2022, 8:24 AM
You should be able to check the same logs in the demo too. Find the Docker container for Flyte and check its logs.
You should be able to find it using the entry point script.

Eugene Cha

05/31/2022, 8:27 AM
I've checked the pods in the flyte namespace and I only see the Kubernetes dashboard, minio, and postgres pods.

Prafulla Mahindrakar

05/31/2022, 8:29 AM
Propeller and all the other components are bundled into a single binary in the demo, so you won't get these logs from the pods. Instead, you can get them directly from the Docker container that the demo runs.

Eugene Cha

05/31/2022, 8:29 AM
ah
{"json":{"exec_id":"fcbe5b0421ec342e7bb2","node":"n2","ns":"flytesnacks-development","res_ver":"266527","routine":"worker-3","src":"pre_post_execution.go:55","tasktype":"python-task","wf":"flytesnacksdevelopmentflyte.workflows.caching2.cached_dataframe_wf"},"level":"error","msg":"No CacheHIT and no Error received. Illegal state, Cache State: CACHE_DISABLED","ts":"2022-05-31T061225Z"}
I don't see logs regarding datacatalog

Prafulla Mahindrakar

05/31/2022, 11:27 AM
Ahhh, so it seems the demo has caching disabled. If caching is disabled, you won't see any logs from datacatalog.
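A small sketch for telling the two situations apart in a log dump: it counts lines mentioning "Catalog CacheHit" (the string grepped for earlier) and lines reporting CACHE_DISABLED (as in the error line above). It assumes one log record per line, either JSON with a "msg" field or plain text; the function name `summarize_cache_logs` is hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def summarize_cache_logs(lines: Iterable[str]) -> Counter:
    """Count cache hits and 'cache disabled' reports in propeller log lines."""
    summary = Counter()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            record = json.loads(raw)
        except ValueError:
            record = None
        # Use the "msg" field of JSON records; fall back to the raw line.
        msg = str(record.get("msg", "")) if isinstance(record, dict) else raw
        if "Catalog CacheHit" in msg:
            summary["cache_hit"] += 1
        if "CACHE_DISABLED" in msg:
            summary["cache_disabled"] += 1
    return summary


# Usage: save the container logs to a file, then
#   with open("flyte.log") as f:
#       print(summarize_cache_logs(f))
```

A nonzero `cache_disabled` count with zero `cache_hit` matches the behavior seen in this thread: the catalog is never consulted at all.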

Eugene Cha

05/31/2022, 11:28 AM
How do I enable caching in demo?

Prafulla Mahindrakar

05/31/2022, 11:28 AM
Yes, checking on this now.

Eugene Cha

05/31/2022, 11:45 AM
Ah
There's no datacatalog in the demo, right?

Prafulla Mahindrakar

05/31/2022, 11:46 AM
Even datacatalog is bundled as part of the demo executable. It uses minio for the cached data references.

Eugene Cha

05/31/2022, 11:48 AM
Hmm. Is there a way to enable caching in the demo? The team wanted to see caching in action, but I had so many issues trying to set up a production-level system in our on-premise environment.

Prafulla Mahindrakar

05/31/2022, 11:49 AM
Checking with @Kevin Su whether we have used this on the demo. I will try to check what's happening in the interim. Sorry to hear that you ran into so many issues with your prod setup.

Eugene Cha

05/31/2022, 11:50 AM
No worries. Thanks so much for the help Prafulla

Kevin Su

05/31/2022, 3:47 PM
@Eugene Cha Good catch. Caching doesn't work because the default catalog type is noop. I just created a PR to fix it: https://github.com/flyteorg/flyte/pull/2564. To unblock you, you can use the image I just built:
flytectl demo start --image pingsutw/sandbox-lite-test

Ketan (kumare3)

05/31/2022, 4:57 PM
cc @Eugene Cha we do not have caching enabled in demo - 😞 Thank you for the catch.

Eugene Cha

06/02/2022, 3:16 AM
Works great. Thank you so much, guys.

Kevin Su

06/02/2022, 4:23 AM
Awesome!!