Hi! I found a bug in flytekit regarding the attrib...
# ask-the-community
m
Hi! I found a bug in flytekit regarding the attribute access for dataclass type promises in case of int fields. The bug can be reproduced with flytekit==1.10.3 with the following example taken from the documentation by changing the type from str to int:
Copy code
@dataclass_json
@dataclass
class Fruit:
    name: int # was str

@task
def print_message(message: int):
    print(message)
    return

@task
def dataclass_task() -> Fruit:
    return Fruit(name=10)


@workflow
def dataclass_wf():
    fruit_instance = dataclass_task()
    print_message(message=fruit_instance.name)
Running this workflow results in:
Copy code
TypeTransformerFailedError: Error encountered while executing 'dataclass_wf':
  Failed to convert inputs of task 'access-test.print_message':
  Error converting input 'message' at position 0:
  Cannot convert literal <FlyteLiteral scalar { primitive { float_value: 10 } }> to <class 'int'>
This is happening because when a new promise is created from a dataclass instance by accessing one of its attributes, that promise has a pb struct representation with a float_value inside. That promise will be resolved in flytekit.core.promise:translate_inputs_to_literals to a float literal in flyte's type system. Later when that value is about to be converted to an integer using the TypeEngine with the registered SimpleTransformer for ints, it will fail to access the underlying data because its now stored inside lv.scalar.primitive.float_value instead of lv.scalar.primitive.integer:
Copy code
TypeEngine.register(
        SimpleTransformer(
            "int",
            int,
            _type_models.LiteralType(simple=_type_models.SimpleType.INTEGER),
            lambda x: Literal(scalar=Scalar(primitive=Primitive(integer=x))),
            lambda x: x.scalar.primitive.integer   # <- This doesn't exist. x.scalar.primite.float_value does.
        )
    )
If I replace that accessor for the transformer with:
Copy code
def _fix_int_access(lv: Literal) -> int:
    if lv.scalar.primitive.integer is None:
        return int(lv.scalar.primitive.float_value)
    return lv.scalar.primitive.integer
then it works fine. Also the unit test suite completes successfuly with this change. I see that this behaviour was encountered before when dataclasses are passed inside promises and these float values are converted back to integer after the python dataclass has been reconstructed. This is unfortunately not covered by that fix. I can also see a different fix for this right where this conversion first takes place in flytekit.core.promise:
Copy code
try:
  if type(v) is Promise:
    v = resolve_attr_path_in_promise(v)  # <-- HERE we can convert the Literal to contain an integer instead of a float_value
    result[k] = TypeEngine.to_literal(ctx, v, t, var.type)
except TypeTransformerFailedError as exc:
  raise TypeTransformerFailedError(f"Failed argument '{k}': {exc}") from exc
The straightforward fix here worked also, but messed up some of the unit tests which I haven't investigated further yet. I'm not sure which one would be the better solution, but I think fixing it right where the discrepancy first manifests between the type annotations and the underlying data in the flyte's type system seems better for me. I'm happy to create an issue and a PR but I first wanted to ask if this approach is the right way to go about this.
k
yes, mind creating a pr, we can take a look
cc @Byron Hsu
m
https://github.com/flyteorg/flytekit/pull/2180 Let me know if I missed some things in the process
b
Hi @Máté Csákvári thanks for surfacing up the issue and the PR. I implemented attribute access log. I suspect the issue lies in the condition here in propeller. it should have gone into that line but instead to the float one. I will take a closer look cc @Chi-Sheng Liu(劉奇聖)
m
This issue came up at some point I believe when dataclass promises were implemented. This https://github.com/flyteorg/flytekit/blob/master/flytekit/core/type_engine.py#L722 fixes the same issue when all the int attributes of a dataclass are float when transforming back from protobuf. The actual place where float type gets into the typesystem I believe is here: https://github.com/flyteorg/flytekit/blob/master/flytekit/core/promise.py#L132. The type operator interprets the float value coming from protobuf as float type
b
The solution makes sense to me at the first glance. Will take a closer look tomorrow
m
Thank you very much! Fixing it at that promise resolve level could also be possible I think, but when I tried it it affected some other parts of the code and I didn't go deeper into it.
b
sticking with what flytekit currently has to resolve protobuf strict caveet is a safer way!
Very impressed by your thorough investigation. Thanks a lot!
m
No problem, I like this stuff 🙂 Also we really need that feature :)
b
i just reviewed
m
thanks, i'll check it out soon
a
I agree with sticking to Flytekit's current approach to resolve the protobuf strict caveat; it's a safer way. When I attempted to reproduce the bug it fixes and then implement the solution myself at the promise resolve level, it messed up some test cases. @Máté Csákvári Your investigation and implementation are much better! thanks
By the way, an off-topic question: Do you think there is a need for
data class
annotations
?
m
Unfortunately in the PR @Byron Hsu found an example which gets broken by this change and it seems like the promise level solution should be the way. But thats tricky because we don't know the original type of the variable, just the target and I don't see any way to properly cast the variable as int or float. My idea was to send that type info down inside the protobuf struct and get it from there in the promise, but the implementation I have messes up some unit tests, but generally its working. I was caught up with some deadlines in work so I couldn't finish it, but I intend to.
If the dataclass question is directed to me, I'm not sure where you mean its needed. We use them extensively so being able to pass them to flyte tasks/workflows is essential for us.
a
Hi , Máté. I see. That makes sense. Is this the types mismatch [error case] (https://github.com/flyteorg/flytekit/pull/2180/files#r1484022019) you mentioned that found by Byron?
m
yes! with the change in the PR, this wf becomes valid, but it shouldn't. And thats the root of the issue, we know the target type where we want to resolve the promise, but not the origin.
a
What if we pass down the target type into
resolve_attr_path_in_promise()
at
translate_inputs_to_literals()
. Adding something like,
new_st = int(new_st) if t==int else new_st
before
to_literal_type()
in resolving
dataclass
remaining path. It works in local mode.
By origin you mean like
python_val
?
m
I mean the type annotation in the dataclass for that attribute so that we can convert back to it. Btw what you wrote is what i prototyped but i n a slightly more general way and passing down every fields type into the protobuf struct. And it works, its just that some unit tests failed because they rely on a specific layout of the protobuf which gets changed now.
a
Got it!
m
But that might not be neccessary if we pass this info to translate_inputs_to_literals as a param or something, but that seemed like a big change to me and i tried to make the pr as small as possible
a
Indeed. Thanks for surfacing up the more general solutions. Feel free to open another issue. I can work on it.