Thread
#announcements

    Eugene Cha

    3 months ago
    We're trying to run the caching.py example to see how caching works, but it only appears to work sometimes. We increased the sleep time to 50 seconds:
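    import time
    
    import pandas
    from flytekit import HashMethod, task, workflow
    from flytekit.core.node_creation import create_node
    from typing_extensions import Annotated
    
    
    def hash_pandas_dataframe(df: pandas.DataFrame) -> str: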
        return str(pandas.util.hash_pandas_object(df))
    
    
    @task
    def uncached_data_reading_task() -> Annotated[
        pandas.DataFrame, HashMethod(hash_pandas_dataframe)
    ]:
        return pandas.DataFrame({"column_1": [1, 2, 3]})
    
    
    @task(cache=True, cache_version="1.0")
    def cached_data_processing_task(df: pandas.DataFrame) -> pandas.DataFrame:
        time.sleep(50)
        return df * 2
    
    
    @task
    def compare_dataframes(df1: pandas.DataFrame, df2: pandas.DataFrame):
        assert df1.equals(df2)
    
    
    @workflow
    def cached_dataframe_wf():
        raw_data = uncached_data_reading_task()
    
        # We execute `cached_data_processing_task` twice, but we force those
        # two executions to happen serially to demonstrate how the second run
        # hits the cache.
        t1_node = create_node(cached_data_processing_task, df=raw_data)
        t2_node = create_node(cached_data_processing_task, df=raw_data)
        t1_node >> t2_node
    
        # Confirm that the dataframes actually match
        compare_dataframes(df1=t1_node.o0, df2=t2_node.o0)
    
    
    if __name__ == "__main__":
        df1 = cached_dataframe_wf()
        print(f"Running cached_dataframe_wf once : {df1}")
    But sometimes the caching works and sometimes it doesn't. We've tried running with pyflyte run --remote caching.py cached_dataframe_wf as well as the relaunch button, but as you can see in the pictures, it tends not to work and I'm not sure why. Any ideas?

    Prafulla Mahindrakar

    3 months ago
    Hi @Eugene Cha, can you check the following metric from datacatalog:
    get_success_count
    You can port-forward your datacatalog pod like this:
    kubectl port-forward datacatalog-6797ff48c6-tvkm5  -n flyte 10254:10254
    And access the metrics locally at http://localhost:10254/metrics. Every cache hit increments this counter. The UI also shows the cache symbol.
    Also, I'm assuming you've left this propeller cache config at its default value:
    MaxCacheAge  config.Duration `json:"max-cache-age" pflag:", Cache entries past this age will incur cache miss. 0 means cache never expires"`
    Another log you can check for executions using the cache:
    kubectl logs -n flyte flytepropeller-6844db64cf-5jtxn | grep "Catalog CacheHit" | wc -l
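
The counter check described above can be scripted once the port-forward is running. What follows is a minimal sketch, not an official Flyte tool: it parses Prometheus-style text (the format a /metrics endpoint normally serves) and sums every sample whose name contains get_success_count. The thread does not give the full metric name or any prefix, so the helper matches by substring. The function name `matching_counters` and the sample text are hypothetical.

```python
from typing import Dict


def matching_counters(metrics_text: str, needle: str) -> Dict[str, float]:
    """Sum the values of all samples whose metric name contains `needle`."""
    found: Dict[str, float] = {}
    for raw in metrics_text.splitlines():
        line = raw.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            # Labeled sample: name{label="value"} 12
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1:].split()
        else:
            # Unlabeled sample: name 12
            name, *rest = line.split()
        if needle in name and rest:
            found[name] = found.get(name, 0.0) + float(rest[0])
    return found


# Hypothetical sample text; real metric names and prefixes will differ.
SAMPLE = """\
# HELP example:get_success_count hypothetical help text
# TYPE example:get_success_count counter
example:get_success_count 3
example:get_failure_count{reason="not_found"} 2
"""

print(matching_counters(SAMPLE, "get_success_count"))
```

To use it against the live endpoint, fetch the text first, for example with urllib.request.urlopen("http://localhost:10254/metrics").read().decode(), and pass the result in. If the value does not grow between two runs of the workflow, the second run did not hit the cache.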

    Eugene Cha

    3 months ago
    I'm using flytectl demo and there are no datacatalog or flytepropeller pods

    Prafulla Mahindrakar

    3 months ago
    You should be able to check the same logs in the demo too. Find the Docker container for Flyte and check its logs.
    You should be able to find it using the entry point script.

    Eugene Cha

    3 months ago
    I've checked the pods in the flyte namespace and I only see the kubernetes dashboard, minio, and postgres pods

    Prafulla Mahindrakar

    3 months ago
    In the demo, propeller and all the other components are bundled into one single binary, so you won’t get these logs from the pods. Instead, you can get them directly from the Docker container that the demo runs.

    Eugene Cha

    3 months ago
    ah
    {"json":{"exec_id":"fcbe5b0421ec342e7bb2","node":"n2","ns":"flytesnacks-development","res_ver":"266527","routine":"worker-3","src":"pre_post_execution.go:55","tasktype":"python-task","wf":"flytesnacks:development:flyte.workflows.caching2.cached_dataframe_wf"},"level":"error","msg":"No CacheHIT and no Error received. Illegal state, Cache State: CACHE_DISABLED","ts":"2022-05-31T06:12:25Z"}
    I don't see logs regarding datacatalog
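
Log lines like the one above are JSON, so the cache state can be pulled out programmatically instead of read by eye. The sketch below assumes only what the pasted line shows: a top-level "msg" field that contains the text "Cache State: <STATE>". The helper names `cache_state` and `summarize` are hypothetical, and the sample line is a shortened copy of the one pasted in this thread.

```python
import json
import re
from collections import Counter
from typing import Iterable, Optional


def cache_state(log_line: str) -> Optional[str]:
    """Return the cache state named in a JSON log line, or None if absent."""
    try:
        record = json.loads(log_line)
    except json.JSONDecodeError:
        return None  # Not a JSON log line.
    if not isinstance(record, dict):
        return None
    match = re.search(r"Cache State: (\w+)", str(record.get("msg", "")))
    return match.group(1) if match else None


def summarize(lines: Iterable[str]) -> Counter:
    """Count how often each cache state appears across many log lines."""
    states = (cache_state(line) for line in lines)
    return Counter(state for state in states if state is not None)


# Shortened copy of the log line from this thread.
LINE = (
    '{"json":{"exec_id":"fcbe5b0421ec342e7bb2","node":"n2",'
    '"src":"pre_post_execution.go:55","tasktype":"python-task"},'
    '"level":"error",'
    '"msg":"No CacheHIT and no Error received. Illegal state, '
    'Cache State: CACHE_DISABLED",'
    '"ts":"2022-05-31T06:12:25Z"}'
)

print(cache_state(LINE))
```

Feeding the container's log output line by line into `summarize` gives a quick count per state. Seeing only CACHE_DISABLED, as here, suggests the cache is turned off rather than missing intermittently.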

    Prafulla Mahindrakar

    3 months ago
    Ahhh, so it seems the demo has caching disabled. If caching is disabled, you won’t see any logs from datacatalog.

    Eugene Cha

    3 months ago
    How do I enable caching in demo?

    Prafulla Mahindrakar

    3 months ago
    Yes, checking on this now

    Eugene Cha

    3 months ago
    Ah
    There's no data catalog in the demo right?

    Prafulla Mahindrakar

    3 months ago
    Even datacatalog is bundled as part of the demo executable, as it uses minio for the cached data ref.

    Eugene Cha

    3 months ago
    Hmm. Is there a way to enable caching in the demo? The team wanted to see caching in action, but I had so many issues trying to set up a production-level system in our on-premise setup

    Prafulla Mahindrakar

    3 months ago
    Checking with @Kevin Su whether we have used this on the demo. I will try to check what's happening in the interim. Sorry to hear that you ran into many issues with your prod setup

    Eugene Cha

    3 months ago
    No worries. Thanks so much for the help Prafulla

    Kevin Su

    3 months ago
    @Eugene Cha Good catch. Cache doesn’t work because the default catalog type is noop. I just created a PR to fix it: https://github.com/flyteorg/flyte/pull/2564 To unblock you, you can use the image I just built:
    flytectl demo start --image pingsutw/sandbox-lite-test

    Ketan (kumare3)

    3 months ago
    cc @Eugene Cha we do not have caching enabled in demo - 😞 Thank you for the catch.

    Eugene Cha

    3 months ago
    works great. thank you so much guys

    Kevin Su

    3 months ago
    Awesome!!