# ask-the-community
f
Quick question about the type system: Is there a technical reason why a tuple type transformer cannot be implemented easily? This is somehow something that is regularly met with disbelief and a bit of frustration when our engineers stumble on this. This is not meant as criticism, I’d rather like to know whether there is a known technical reason why this is complicated.
from flytekit import task, workflow


@task
def test(inp: tuple) -> tuple:
    return inp

@workflow
def wf():
    test(inp=(1, 2, 3))

if __name__ == "__main__":
    wf()
gives:
RestrictedTypeError: Transformer for type <class 'tuple'> is restricted currently
n
we’ve talked about this a lot… if I recall correctly it has something to do with there not being a native tuple representation in protobuf @Yee @Eduardo Apolinario (eapolinario). It remains a big pain point for me when I use flytekit
h
The current struggle is that named tuples are used to define named outputs... I think it won't be obvious how to distinguish between those and these...
m
hm, is it not possible to add some scoped metadata to the protobuf definition to distinguish the two?
h
I don't think protobuf is the issue, there is enough flexibility in literals/literaltypes to describe a tuple..
m
The current struggle is that named tuples are used to define named outputs... I think it won't be obvious how to distinguish between those and these...
I meant to make whatever distinction is needed here
h
@task
def test(inp: tuple) -> namedtuple("a", "b"):
    return inp

@task
def test2(inp: tuple) -> namedtuple("a", "b"):
    return inp
How to tell if the user wanted a single output that's of type tuple or two outputs a and b?
m
pick one to be default and accept an annotated type to be more specific
that's the most trivial way to do it. but ideally you'd use a dedicated type for named outputs to disambiguate.
h
I agree!
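The "pick a default and accept an annotated type" idea above can be sketched with the standard library alone. This is only an illustration, not how flytekit works: the `NamedOutputs` marker, the `describe_outputs` helper, and the `o0` default output name are all assumptions made up for this sketch.

```python
from typing import Annotated, NamedTuple, get_args, get_origin, get_type_hints


class NamedOutputs:
    """Hypothetical marker meaning 'split this NamedTuple into named outputs'."""


class Pair(NamedTuple):
    a: int
    b: str


def describe_outputs(fn):
    """Return {output_name: type} for fn's return annotation.

    Default: the whole return value is one output, here called "o0"
    (an assumed name). Only Annotated[<NamedTuple>, NamedOutputs] is
    split into one output per field.
    """
    ret = get_type_hints(fn, include_extras=True)["return"]
    if get_origin(ret) is Annotated:
        base, *metadata = get_args(ret)
        if NamedOutputs in metadata and hasattr(base, "_fields"):
            return get_type_hints(base)
        ret = base
    return {"o0": ret}


def single() -> Pair: ...  # one output whose type is the tuple itself


def split() -> Annotated[Pair, NamedOutputs]: ...  # two outputs, a and b
```

With this, `describe_outputs(single)` reports a single tuple-typed output, while `describe_outputs(split)` reports the two named outputs, so the ambiguity is resolved by the annotation rather than by guessing.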
f
Considering this example:
from flytekit import task, workflow


@task
def test() -> tuple[str, str, tuple[str, str]]:
    return "foo", "bar", ("foo-inner", "bar-inner")

@workflow
def wf() -> tuple[str, str, tuple[str, str]]:
    return test()

if __name__ == "__main__":
    print(wf())
I think it would be ok to treat the “outer or main” DefaultNamedTupleOutput return value (the one that is always returned when there are multiple return values) with the existing logic rather than with a potential new tuple type transformer, and to invoke a tuple type transformer only for the “inner” tuple here. Of course it would be nice to have a general solution, but I think it’s more important that users can do what is shown in the code snippet.

---

One could argue that a tuple type transformer isn’t very important, since one could just return e.g. a list instead. The problem is that changing the tuple to a list in the example above requires a user to understand that the type engine exists in the first place and what its limitations are. The type engine works so well that most users, at least early on, don’t realize it’s there. I onboarded ~30 people onto Flyte at my previous and current company and would say that there are 3 typical situations where people first learn that the type engine exists:
1) when they try to return tuples,
2) when they put an int into a dict and get a dict with a float on the other side (maybe fixed now?), and
3) when I show them how to build a custom type transformer.
They typically find 3) super cool but are also typically (maybe unreasonably) frustrated when learning about 1) and 2). Something like “returning a tuple is the most basic python thing, why doesn’t even this work?” I guess when something works well, it doesn’t cause gratefulness but creates the expectation that it works perfectly 🤷‍♂️
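The "existing logic for the outer return value, tuple transformer only for inner tuples" proposal can be sketched roughly as follows. This is a plain-Python illustration, not flytekit code: `split_return_annotation` is a hypothetical helper, and the `o0`, `o1`, ... output names are assumed here for illustration.

```python
from typing import get_args, get_origin


def split_return_annotation(annotation):
    """Map a return annotation to {output_name: type}.

    Only the outermost tuple[...] is unpacked into separate outputs.
    A tuple nested inside it is kept as an ordinary type, which is the
    only place a (hypothetical) tuple type transformer would be needed.
    """
    if get_origin(annotation) is tuple:
        return {f"o{i}": t for i, t in enumerate(get_args(annotation))}
    return {"o0": annotation}


outputs = split_return_annotation(tuple[str, str, tuple[str, str]])
```

Here `outputs` has three entries: two `str` outputs and one output whose type is still `tuple[str, str]`.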
m
I can echo that the lack of tuple support was jarring for me and I have seen users have their first WTF moment there as well.
h
I came to learn that even regular outputs are read as tuples, and there is no way of distinguishing between:
@task
def task1() -> (int, str):  # parenthesized so the annotation is valid Python
  ...

# and:

@task
def task2() -> tuple[int, str]:
  ...
I think the "right" thing to do here (perhaps not practical) is to always treat outputs as "one" output that's indexable (much like what Python does)... Support is being added to Flyte for indexing into nested fields, right, @Fabio Grätz?
f
I think I have heard somewhere that indexing into tuples etc is being worked on but I can’t confirm for sure.
But I agree that this would be the “right” solution.
Shall I transfer this discussion into the RFC incubator?
h
Let's!
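For the RFC, the "always one indexable output" idea discussed above could look roughly like this. It is a minimal sketch in plain Python; `OutputRef` and its `resolve` method are hypothetical names invented for illustration and are not part of flytekit.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class OutputRef:
    """Hypothetical lazy reference to (part of) a task's single output."""

    task_name: str
    path: Tuple[Union[int, str], ...] = ()

    def __getitem__(self, key):
        # Indexing does not touch any data; it only records the access path.
        return OutputRef(self.task_name, self.path + (key,))

    def resolve(self, value):
        # Once the real value exists, follow the recorded path into it.
        for key in self.path:
            value = value[key]
        return value


out = OutputRef("test")
inner = out[2][0]
result = inner.resolve(("foo", "bar", ("foo-inner", "bar-inner")))
```

In this sketch a task always has exactly one output, and a workflow picks pieces out of it by indexing, so `-> (int, str)` and `-> tuple[int, str]` would no longer need to be told apart.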