# ask-the-community
j
Hi all, I have a question on controlling how Flyte determines an object's hash for caching. We are using a class with the `@dataclass_json` and `@dataclass` decorators that extends `DataClassJsonMixin`:
@dataclass_json
@dataclass
class Document(DataClassJsonMixin):
    url: str
    id: int
    flag: bool

    def __init__(self, url: str, id: int, flag: bool):
        self.url = url
        self.id = id
        self.flag = flag
and we have a task that takes in a `list[Document]` and has `cache` set to `True`. Currently, as expected, if I run that task on a `Document` and then run it on another `Document` with the same properties, it will read from the cache for the second run of the task. What I'd like to know is: is there any way to specify a different hashing method? In my case, all I really care about is the `url` property: if the `id` value is different but `url` is still the same, I would like to read from the cache rather than rerunning my task. I know the ability to set annotations for a specific hashing function is mentioned here, but if I try to add those annotations to this class I get the error `Flytekit does not currently have support for FlyteAnnotations applied to Dataclass.Type`
s
You're looking at v0.3 docs. Here's the latest: https://docs.flyte.org/projects/cookbook/en/latest/auto_examples/development_lifecycle/task_cache.html#caching-of-non-flyte-offloaded-objects. And regarding custom hashing methods, @Eduardo Apolinario (eapolinario), how should that be implemented?
j
Ah, thanks, I must have gotten to the old docs through Google somehow. Curious for the answer on the custom hashing methods!
I believe I found a solution here -- I set up and registered a `TypeTransformer` for my class, which allowed me to set annotations for a given hashing function as desired. I had to explicitly write out my transformer functions, but for a simple class like this that wasn't too much effort. It would still be nice to have the ability to set a specific hashing function on a `dataclass_json` class, though.
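For anyone landing here later, this is a minimal sketch of the kind of `url`-only hashing function I mean. It uses only the standard library; the helper names `hash_document_by_url` and `hash_documents_by_url` are hypothetical, and the Flyte side (registering the `TypeTransformer` and attaching the hash annotation) is not shown here.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Document:
    url: str
    id: int
    flag: bool


def hash_document_by_url(doc: Document) -> str:
    """Hypothetical helper: derive a cache key from `url` only, ignoring `id` and `flag`."""
    return hashlib.sha256(doc.url.encode("utf-8")).hexdigest()


def hash_documents_by_url(docs: list[Document]) -> str:
    """Hypothetical helper: combine per-document hashes for a list input, preserving order."""
    digest = hashlib.sha256()
    for doc in docs:
        digest.update(hash_document_by_url(doc).encode("ascii"))
    return digest.hexdigest()


a = Document(url="https://example.com/a", id=1, flag=True)
b = Document(url="https://example.com/a", id=2, flag=False)  # same url, different id and flag
c = Document(url="https://example.com/b", id=1, flag=True)   # different url

# Same url gives the same key, so a cache keyed on this hash would be hit.
same_key = hash_documents_by_url([a]) == hash_documents_by_url([b])
# A different url gives a different key, so the task would rerun.
different_key = hash_documents_by_url([a]) != hash_documents_by_url([c])
```

Hashing each document first and then combining the digests keeps list order significant, which is an assumption on my part; if order should not matter for your task, the per-document hashes could be sorted before combining.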
s
Understood. We've recently integrated mashumaro, and you don't need to use `dataclass_json` anymore. Here's an example: https://flyte.org/blog/flyte-1-10-monorepo-new-agents-eager-workflows-and-more#mashumaro-to-serializedeserialize-dataclasses
j
Got it; however, it seems even with this new `mashumaro` `DataClassJSONMixin`, I'm still not able to set an annotation for a specific hashing function (so I'm still forced to write my own `TypeTransformer` if I want to only have my hash take into account some subset of the object's properties).