Martin Hwasser
09/20/2022, 12:40 PM
Martin Hwasser
09/20/2022, 12:52 PM
HashMethod, would this work?
Ketan (kumare3)
Niels Bantilan
09/20/2022, 2:16 PM
Annotated types to compute a hash of incoming data so you don’t have to manually pass an md5 hash to the task, see here:
https://docs.flyte.org/projects/cookbook/en/latest/auto/core/flyte_basics/task_cache.html#caching-of-non-flyte-offloaded-objects
Martin Hwasser
09/21/2022, 7:38 AM
str type. It doesn’t work though, perhaps because str is a primitive?
Martin Hwasser
09/21/2022, 8:02 AM
@task
def hash_dataset_function(dataset_name: str) -> str:
    return hashlib.md5(
        open(f"data/dataset/{dataset_name}.dvc", "rb").read()
    ).hexdigest()

@task
def get_dataset_name(process: str) -> Annotated[str, HashMethod(hash_dataset_function)]:
    return process

@task(cache=True, cache_version="1.0")
def cached_task(dataset_name: str) -> float:
    ...

@workflow
def wf():
    dataset_name = get_dataset_name(process=process)
    always_cached = cached_task(dataset_name)
Martin Hwasser
09/21/2022, 9:44 AM
Martin Hwasser
09/21/2022, 9:52 AM
@dataclass @dataclass_json which has the md5 checksum. If the other approach is supposed to work, let me know.
Ketan (kumare3)
Yee
Eduardo Apolinario (eapolinario)
09/23/2022, 12:17 AM
Eduardo Apolinario (eapolinario)
09/23/2022, 9:42 PM
hash_dataset_function with @task). You can test it out by installing flytekit from master or waiting for the next release (which should happen about 1 week from now)
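A minimal sketch of the hashing helper from the snippet earlier in the thread, written as a plain Python function rather than a Flyte task. It assumes that HashMethod expects an ordinary callable that takes the value and returns a string. The last message hints at this, and the hash function in the linked caching docs is undecorated. The dataset_dir parameter and the DATASET_DIR constant are hypothetical additions for illustration; the original hard-codes data/dataset.

```python
import hashlib
from pathlib import Path

# Hypothetical default for illustration; the thread hard-codes "data/dataset"
# relative to the working directory.
DATASET_DIR = Path("data/dataset")


def hash_dataset_function(dataset_name: str, dataset_dir: Path = DATASET_DIR) -> str:
    """Return the md5 hex digest of the dataset's .dvc pointer file.

    Deliberately NOT decorated with @task: it is an ordinary function that is
    called directly with the value to hash (here, the dataset name).
    """
    dvc_file = dataset_dir / f"{dataset_name}.dvc"
    return hashlib.md5(dvc_file.read_bytes()).hexdigest()
```

In the thread's own snippet this function would be passed as-is, HashMethod(hash_dataset_function), inside the Annotated return type of get_dataset_name. The digest changes whenever the .dvc file changes, which is what would invalidate the cache of the downstream task.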