freezing-airport-6809
NamedTuple vs dataclass support.
• @workflow and @dynamic seem to have some special Promise handling for NamedTuple types specifically
◦ Within a @workflow I can construct NamedTuple instances, including passing promise objects into their attributes
◦ Within a @workflow I can access attributes on a NamedTuple instance and get a promise object back which resolves to a field in that record
◦ The return types of these seem equivalent to the original NamedTuple definition, but are not the same type, i.e. isinstance(output, MyNamedTuple) == False on the output of a task that returns a MyNamedTuple, even though the object in all other respects pretends to be a MyNamedTuple instance.
• None of this behavior is supported for dataclass types
◦ Including the wonky isinstance stuff
Is this special-cased logic for NamedTuple?
• We’d really love to use exactly one of NamedTuple or dataclass for all of our structures in our Flyte workflows
• The above behaviors are very desirable (in fact I think I asked for exactly this ~6mo ago! very excited to see it)
• But the mismatched feature support between NamedTuple and dataclass is making it hard for us to put a stake in the ground and just pick one.
• Can we either implement the above behaviors (minus the isinstance bit) for dataclass or allow NamedTuples as task inputs?
----------
Let me explain what NamedTuples are meant to be.
So when you have a multi-valued return from a Python function, e.g.,
def foo() -> (int, int, int):
    return 10, 10, 10
In this case the int, int, int is actually a tuple in the Python world, since you can receive it as
x = foo()
and x would be a tuple.
But Python does special handling for tuples, as the following is also valid:
m, n, o = foo()
Flyte itself supports multiple values as outputs. This is not specific to Python; it applies to tasks written in arbitrary languages.
By default, flytekit names the returned values o1, o2, …, on.
This is because every returned value in Flyte is bound by its name (o1, etc.) in the downstream consumer.
Flyte also supports naming the outputs, e.g. x, y, z instead of o1, …
But in Python there is no easy way to name the outputs, so we decided to use NamedTuples.
The rationale: NamedTuples behave the same as tuples, but allow naming.
NamedTuples are not handled specially in flytekit, except for the naming.
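To make the "same as a tuple, but with names" point concrete, here is a minimal plain-Python sketch. It does not use flytekit at all, and FooOutputs is a made-up name for illustration only; the field names x, y, z are the ones mentioned above.

```python
from typing import NamedTuple


class FooOutputs(NamedTuple):
    # Hypothetical output type; the names x, y, z stand in for o1, o2, o3.
    x: int
    y: int
    z: int


def foo() -> FooOutputs:
    return FooOutputs(x=10, y=10, z=10)


result = foo()

# A NamedTuple is still a tuple, so positional unpacking keeps working.
m, n, o = result

# The only extra thing it carries is the field names.
field_names = result._fields
first_value = result.x
```

Here result unpacks exactly like the plain tuple from the earlier foo(), and the only addition is the set of field names, which is the part flytekit would use to name the outputs.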
On the other hand, dataclasses are a very Python-native thing. We wanted users to be able to return arbitrarily complex objects as a single value, so in the above example
def foo() -> (int, dataclass, ...):
    ...
Dataclasses are sent through Flyte as opaque JSON objects, and the engine tries not to load them, to keep its security high.
Thus dataclasses are more like int: each one is a single value.
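One way to picture "a dataclass travels as a single opaque JSON value" is the standard-library sketch below. This is only an illustration under assumptions: Config, to_opaque_json and from_opaque_json are hypothetical names, and flytekit's real serialization path is not shown here and may differ.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Config:
    # Hypothetical dataclass used only for this illustration.
    name: str
    retries: int


def to_opaque_json(value) -> str:
    # Producer side: the whole object collapses into one JSON string.
    return json.dumps(asdict(value))


def from_opaque_json(cls, payload: str):
    # Consumer side: only code that knows the class rebuilds the object.
    return cls(**json.loads(payload))


payload = to_opaque_json(Config(name="train", retries=3))
restored = from_opaque_json(Config, payload)
```

Anything sitting between the producer and the consumer only ever sees payload, a single string it has no need to parse, which is the sense in which a dataclass behaves like one value rather than several named outputs.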
Tuples and namedtuples cannot be returned today as an individual object, so
def foo() -> (int, tuple):
    ....
is not valid.