Martin Hwasser
09/20/2022, 12:40 PM
[...] HashMethod, would this work?
Ketan (kumare3)
Niels Bantilan
09/20/2022, 2:16 PM
[...] Annotated types to compute a hash of incoming data so you don’t have to manually pass an md5 hash to the task, see here:
https://docs.flyte.org/projects/cookbook/en/latest/auto/core/flyte_basics/task_cache.html#caching-of-non-flyte-offloaded-objects
Martin Hwasser
09/21/2022, 7:38 AM
[...] str type. It doesn’t work though, perhaps because str is a primitive?
@task
def hash_dataset_function(dataset_name: str) -> str:
    return hashlib.md5(
        open(f"data/dataset/{dataset_name}.dvc", "rb").read()
    ).hexdigest()

@task
def get_dataset_name(process: str) -> Annotated[str, HashMethod(hash_dataset_function)]:
    return process

@task(cache=True, cache_version="1.0")
def cached_task(dataset_name: str) -> float:
    ...

@workflow
def wf():
    dataset_name = get_dataset_name(process=process)
    always_cached = cached_task(dataset_name)
[...] @dataclass @dataclass_json which has the md5 checksum. If the other approach is supposed to work, let me know.
Ketan (kumare3)
Yee
Eduardo Apolinario (eapolinario)
09/23/2022, 12:17 AM
[...] hash_dataset_function with @task). You can test it out by installing flytekit from master or waiting for the next release (which should happen about 1 week from now).
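
For reference, here is a minimal sketch of the hashing step discussed in this thread, written as a plain Python function rather than a Flyte task. In the caching docs linked above, the function handed to HashMethod is an ordinary function, not a task. The sketch uses only the standard library so it can be run without flytekit. The `root` parameter, the `demo` helper, and the file contents are hypothetical and exist only for illustration; they are not part of the original snippet or of flytekit.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Tuple


def hash_dataset_file(dataset_name: str, root: Path) -> str:
    """Return the md5 hex digest of <root>/<dataset_name>.dvc."""
    path = root / f"{dataset_name}.dvc"
    return hashlib.md5(path.read_bytes()).hexdigest()


def demo() -> Tuple[str, str, str]:
    """Hash a throwaway .dvc file twice, edit it, then hash it again."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        dvc_file = root / "train.dvc"  # hypothetical dataset name

        dvc_file.write_bytes(b"md5: aaa\n")
        first = hash_dataset_file("train", root)
        again = hash_dataset_file("train", root)  # same content, same digest

        dvc_file.write_bytes(b"md5: bbb\n")
        changed = hash_dataset_file("train", root)  # new content, new digest
    return first, again, changed


if __name__ == "__main__":
    first, again, changed = demo()
    print("stable:", first == again)
    print("changes with content:", first != changed)
```

The point of hashing the .dvc file rather than the dataset name is that the digest stays the same while the file is unchanged and differs once its content changes, which is the property a cache key needs here.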