freezing-airport-6809
NamedTuple vs dataclass support.
• @workflow and @dynamic seem to have some special Promise handling for NamedTuple types specifically
◦ Within a @workflow I can construct NamedTuple instances, including passing promise objects into their attributes
◦ Within a @workflow I can access attributes on a NamedTuple instance and get a promise object back which resolves to a field in that record
◦ The return types of these seem equivalent to, but not the same as, the original NamedTuple definition, i.e. isinstance(output, MyNamedTuple) == False on the output of a task that returns a MyNamedTuple, even though the object in all other respects pretends to be a MyNamedTuple instance.
• None of this behavior is supported for dataclass types
◦ Including the wonky isinstance stuff
Is this special-cased logic for NamedTuple?
• We’d really love to use exactly one of NamedTuple or dataclass for all of our structures in our Flyte workflows
• The above behaviors are very desirable (in fact I think I asked for exactly this ~6mo ago! very excited to see it)
• But the mismatched feature support between NamedTuple and dataclass is making it hard for us to put a stake in the ground and just pick one.
• Can we either implement the above behaviors (minus the isinstance bit) for dataclass or allow NamedTuples as task inputs?
----------
Let me explain what NamedTuples are meant to be.
So when you have a multi-valued return from a Python function, e.g.,
def foo() -> (int, int, int):
    return 10, 10, 10
In this case the int, int, int is actually a tuple in the Python world, as you can receive it as
x = foo()
and x would be a tuple.
But Python does special handling for tuples, as the following is valid
m, n, o = foo()
Flyte actually supports multiple values as outputs. This is not specific to Python; it applies to arbitrary languages.
By default, flytekit names all the return values o1, o2, …, on. This is because every returned value in Flyte is actually bound by its name (o1 etc.) in the downstream consumer.
Flyte also supports naming the outputs, e.g. x, y, z instead of o1, …
But, in python there is no easy way to name the outputs. So we decided to use NamedTuples.
Rationale: NamedTuples behave the same as tuples, but allow naming.
NamedTuples are not handled specially in flytekit, except for the naming.
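To make the "same as tuples, but with names" point concrete, here is a minimal plain-Python sketch. It uses only the standard library, not flytekit; the class name FooOutputs and the field names x, y, z are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class FooOutputs(NamedTuple):
    # Hypothetical output names, used only for this illustration.
    x: int
    y: int
    z: int


def foo() -> FooOutputs:
    return FooOutputs(x=10, y=10, z=10)


result = foo()

# Still an ordinary tuple: positional unpacking works exactly as before.
m, n, o = result

# The only addition is that each position now also has a name.
print(result.x, result.y, result.z)
print(isinstance(result, tuple))
```

So the NamedTuple here carries names for the individual return values; it is still a tuple underneath.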
On the other hand, dataclasses are a very Python-native thing. We wanted users to be able to return arbitrarily complex objects as a single value, so in the above example
def foo() -> (int, dataclass, ...):
    ...
Dataclasses are sent through Flyte as opaque JSON objects, and the engine tries not to load them, to keep its security high.
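As a rough illustration of what "opaque JSON object" means, the sketch below round-trips a dataclass through a JSON string using only the standard library. This is not flytekit's actual serialization code; the Stats class and both helper functions are hypothetical names made up for the example.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Stats:  # hypothetical dataclass, for illustration only
    mean: float
    count: int


def to_opaque_json(value: Stats) -> str:
    # Producer side: the whole object becomes one JSON string.
    return json.dumps(asdict(value))


def from_opaque_json(payload: str) -> Stats:
    # Consumer side: only code that knows the type rebuilds the object.
    return Stats(**json.loads(payload))


payload = to_opaque_json(Stats(mean=2.5, count=4))
restored = from_opaque_json(payload)
print(payload)
print(restored)
```

Anything sitting between producer and consumer can pass payload along as a single value without ever constructing a Stats instance.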
Thus dataclasses are more like int, while tuples and NamedTuples cannot be returned today as an individual object:
def foo() -> (int, tuple):
    ...
is not valid.