# flytekit
For visibility of folks

----

There’s a separate issue we’ve noticed with `NamedTuple` vs `dataclass` support.

• `@workflow` and `@dynamic` seem to have some special Promise handling for `NamedTuple` types specifically
    ◦ Within a `@workflow` I can construct `NamedTuple` instances, including passing promise objects into their attributes
    ◦ Within a `@workflow` I can access attributes on a `NamedTuple` instance and get a promise object back which resolves to a field in that record
    ◦ The return types of these seem equivalent to, but not the same as, the original `NamedTuple` definition, i.e. `isinstance(output, MyNamedTuple) == False` on a task output from a task that returns a `MyNamedTuple`, even though the object in all other respects pretends to be a `MyNamedTuple` instance.
• None of this behavior is supported for `dataclass` types
    ◦ Including the wonky `isinstance` stuff

Is this special-cased logic for `NamedTuple`?

• We’d really love to use exactly one of `NamedTuple` or `dataclass` for all of our structures in our Flyte workflows
• The above behaviors are very desirable (in fact I think I asked for exactly this ~6mo ago! very excited to see it)
• But the mismatched support of features for `NamedTuple` and `dataclass` is making it hard for us to put a stake in the ground and just pick one.
• Can we either implement the above behaviors (minus the `isinstance` bit) for `dataclass` or allow `NamedTuple`s as task inputs?

----------

Let me explain what NamedTuples are meant to be. When you have a multi-valued return from a Python function, i.e.,
```python
def foo() -> (int, int, int):
    return 10, 10, 10
```

In this case the `int, int, int` is actually a tuple in the Python world, since you can receive it as

```python
x = foo()
```

and `x` would be a tuple. But Python handles tuples specially, so the following is also valid:

```python
m, n, o = foo()
```

Flyte actually supports multiple values as outputs. This is not specific to Python; it holds for tasks written in arbitrary languages. By default, flytekit names the returned values `o1, o2, …, on`. This is because every returned value in Flyte is bound by its name (`o1` etc.) in the downstream consumer. Flyte also supports naming the outputs, e.g. `x, y, z` instead of `o1…`.

But in Python there is no easy way to name the outputs, so we decided to use NamedTuples. The rationale: NamedTuples behave the same as tuples, but allow naming. NamedTuples are not handled specially in flytekit, except for the naming.

On the other hand, dataclasses are a very Python-native thing. We wanted users to be able to return arbitrarily complex objects as a single value, so in the above example
```python
# Schematic signature: `dataclass` stands for any dataclass type.
def foo() -> (int, dataclass, ...):
    ...
```

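A concrete version of that signature can be sketched in plain Python. This is only an illustration under stated assumptions: `Stats` and its fields are hypothetical, and the standard-library JSON round trip is a stand-in for shipping a record as one value, not flytekit's actual serialization code.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Stats:  # hypothetical record type, invented for this example
    mean: float
    count: int


def foo() -> (int, Stats):
    # Two outputs: an int, and a whole record that counts as ONE value.
    return 10, Stats(mean=2.5, count=4)


number, stats = foo()

# Stand-in for a wire format: the entire record becomes one JSON document.
payload = json.dumps(asdict(stats))

# A consumer that knows the type can rebuild the record; anything in
# between can pass `payload` along without parsing it.
restored = Stats(**json.loads(payload))
```

The point of the sketch is that the record is never split into separate outputs: however many fields `Stats` has, it occupies a single position in the return signature.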
Dataclasses are sent through Flyte as opaque JSON objects, and the engine tries not to load them, to keep its security high. Thus dataclasses are more like `int`: a single value. Tuples and namedtuples, by contrast, cannot be returned today as an individual object, so

```python
def foo() -> (int, tuple):
    ...
```

is not valid.
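For contrast, the naming role of NamedTuples described earlier can be sketched with the standard library alone. `FooOutputs` and the field names `x`, `y`, `z` are chosen just for this example, and nothing here calls flytekit; it only shows the property being relied on, namely that a NamedTuple still behaves like a tuple while carrying a name for each position.

```python
from typing import NamedTuple


class FooOutputs(NamedTuple):
    # Hypothetical output names, chosen only for this example.
    x: int
    y: int
    z: int


def foo() -> FooOutputs:
    return FooOutputs(x=10, y=10, z=10)


result = foo()

# Unpacks positionally, exactly like a plain tuple.
m, n, o = result

# The names travel with the type, so each value can be bound by name.
names = FooOutputs._fields
by_name = result._asdict()
```

Here `names` holds the field names in declaration order and `by_name` maps each name to its value, which is the kind of name-to-value binding a downstream consumer needs.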