Quick question about the type system (Flyte #flyte-support)


cool-lifeguard-49380

09/15/2023, 4:38 PM

Quick question about the type system: Is there a technical reason why a tuple type transformer cannot be implemented easily? Our engineers regularly react to this with disbelief and a bit of frustration when they stumble on it. This is not meant as criticism; I’d just like to know whether there is a known technical reason why this is complicated.

💯 1

cool-lifeguard-49380

09/15/2023, 4:38 PM


from flytekit import task, workflow


@task
def test(inp: tuple) -> tuple:
    return inp

@workflow
def wf():
    test(inp=(1, 2, 3))

if __name__ == "__main__":
    wf()

cool-lifeguard-49380

09/15/2023, 4:39 PM

gives:


RestrictedTypeError: Transformer for type <class 'tuple'> is restricted currently
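For context on the error: flytekit's type engine picks a transformer based on the annotated Python type, and the message shows that some types (here `tuple`) are explicitly marked as restricted. The toy model below only illustrates that kind of lookup; the class and method names are hypothetical and this is not flytekit's actual code.

```python
class RestrictedTypeError(Exception):
    """Raised when a type is recognized but deliberately unsupported (toy version)."""


class ToyTypeEngine:
    """Hypothetical, simplified model of a type-transformer registry."""

    def __init__(self):
        self._transformers = {}   # python type -> transformer name
        self._restricted = set()  # types that are recognized but blocked

    def register(self, py_type, transformer_name):
        self._transformers[py_type] = transformer_name

    def register_restricted(self, py_type):
        self._restricted.add(py_type)

    def get_transformer(self, py_type):
        # Restricted types fail loudly instead of using the generic fallback.
        if py_type in self._restricted:
            raise RestrictedTypeError(
                f"Transformer for type {py_type} is restricted currently"
            )
        return self._transformers.get(py_type, "generic-fallback")
```

Registering `tuple` as restricted and then asking for its transformer reproduces the same message text as above, while unregistered types fall through to the (hypothetical) generic fallback.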

broad-monitor-993

09/15/2023, 5:35 PM

We’ve talked about this a lot… If I recall correctly, it has something to do with there not being a native tuple representation in protobuf @thankful-minister-83577 @high-accountant-32689. It remains a big pain point for me when I use flytekit.

➕ 1

high-park-82026

09/15/2023, 6:20 PM

The current struggle is that named tuples are used to define named outputs... I think it won't be obvious how to distinguish between those and these...

full-ram-17934

09/15/2023, 6:21 PM

hm, is it not possible to add some scoped metadata to the protobuf definition to distinguish the two?

high-park-82026

09/15/2023, 6:22 PM

I don't think protobuf is the issue; there is enough flexibility in literals/literal types to describe a tuple.
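To illustrate how existing literal structures could describe a tuple: a fixed-length tuple can be encoded as a collection of values plus a type description that records each position's type and a tag saying "restore this as a tuple, not a list". This is a minimal sketch that uses plain dicts as stand-ins for Flyte's Literal/LiteralType messages; the field names (`collection`, `metadata`, `python_type`) are hypothetical, and it only handles flat tuples of plain classes such as `int` and `str`.

```python
import typing


def tuple_literal_type(py_type):
    """Describe a fixed-length tuple type (stand-in for a LiteralType)."""
    elem_types = typing.get_args(py_type)
    return {
        "collection": [t.__name__ for t in elem_types],  # one entry per position
        "metadata": {"python_type": "tuple"},            # tag: rebuild a tuple
    }


def to_literal(value, py_type):
    """Encode a flat tuple, checking length and per-position types."""
    elem_types = typing.get_args(py_type)
    if len(value) != len(elem_types):
        raise TypeError(f"expected {len(elem_types)} elements, got {len(value)}")
    for v, t in zip(value, elem_types):
        if not isinstance(v, t):
            raise TypeError(f"{v!r} is not a {t.__name__}")
    return {"type": tuple_literal_type(py_type), "values": list(value)}


def to_python(literal):
    """Decode: the metadata tag decides between tuple and list."""
    values = literal["values"]
    if literal["type"]["metadata"].get("python_type") == "tuple":
        return tuple(values)
    return values
```

The point of the tag is that a tuple and a list of the same values would otherwise be indistinguishable on the way back.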

full-ram-17934

09/15/2023, 6:23 PM

> The current struggle is that named tuples are used to define named outputs... I think it won't be obvious how to distinguish between those and these...

I meant to make whatever distinction is needed here

high-park-82026

09/15/2023, 6:24 PM


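from typing import NamedTuple

from flytekit import task

# Both signatures are identical, so the intent cannot be read from the annotation.
Pair = NamedTuple("Pair", [("a", int), ("b", int)])

@task
def test(a: int, b: int) -> Pair:  # meant: ONE output whose value is a named tuple?
    return Pair(a, b)

@task
def test2(a: int, b: int) -> Pair:  # meant: TWO outputs named "a" and "b"?
    return Pair(a, b)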

How do we tell whether the user wanted a single output of type tuple or two outputs?

and

full-ram-17934

09/15/2023, 6:24 PM

pick one to be the default and accept an annotated type to be more specific
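A minimal sketch of "pick a default, opt in with an annotated type", assuming the default stays the current behavior (a tuple return annotation means one output per element, named `o0`, `o1`, ...) and a hypothetical `SingleOutput` marker inside `typing.Annotated` means "the whole tuple is one value". Only the standard `typing` module is used; no flytekit API is implied.

```python
import typing
from typing import Annotated


class SingleOutput:
    """Hypothetical marker: treat the annotated tuple as ONE output value."""


def output_names(return_annotation):
    """Return the output names a (hypothetical) engine would create."""
    if typing.get_origin(return_annotation) is Annotated:
        base, *extras = typing.get_args(return_annotation)
        if any(e is SingleOutput or isinstance(e, SingleOutput) for e in extras):
            return ["o0"]  # explicit opt-in: the whole tuple is one value
        return_annotation = base  # unrelated metadata: fall through to default
    if typing.get_origin(return_annotation) is tuple:
        # Default: keep today's behavior, one output per tuple element.
        return [f"o{i}" for i in range(len(typing.get_args(return_annotation)))]
    return ["o0"]
```

Usage: `output_names(tuple[int, str])` yields two outputs, while `output_names(Annotated[tuple[int, str], SingleOutput])` yields one. Choosing the current behavior as the default keeps existing workflows unchanged.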

full-ram-17934

09/15/2023, 6:25 PM

That's the most trivial way to do it, but ideally you'd use a dedicated type for named outputs to disambiguate.
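The dedicated-type alternative could look roughly like this: only subclasses of a hypothetical `Outputs` base class declare multiple named outputs, so every other annotation, including `tuple[...]` and `NamedTuple` types, is a single value. The class names and the `o0` default output name are assumptions for illustration.

```python
import typing
from typing import NamedTuple


class Outputs:
    """Hypothetical dedicated base class: subclasses declare named task outputs."""


def interpret(return_annotation):
    """Map a return annotation to {output_name: type} (hypothetical rules)."""
    if (
        typing.get_origin(return_annotation) is None  # skip tuple[...] etc.
        and isinstance(return_annotation, type)
        and issubclass(return_annotation, Outputs)
    ):
        return typing.get_type_hints(return_annotation)
    # Anything else, including tuple[...] and NamedTuple types, is ONE value.
    return {"o0": return_annotation}


class TrainOutputs(Outputs):
    accuracy: float
    model_path: str


class Point(NamedTuple):
    x: int
    y: int
```

With this rule, `Point` is just a value type that a tuple transformer could handle, and there is no longer any overlap with named outputs.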

high-park-82026

09/15/2023, 10:09 PM

I agree!

cool-lifeguard-49380

09/16/2023, 9:01 AM

Considering this example:


from flytekit import task, workflow


@task
def test() -> tuple[str, str, tuple[str, str]]:
    return "foo", "bar", ("foo-inner", "bar-inner")

@workflow
def wf() -> tuple[str, str, tuple[str, str]]:
    return test()

if __name__ == "__main__":
    print(wf())

I think it would be OK to handle the “outer” or “main” DefaultNamedTupleOutput return value (the one that is always returned when there are multiple return values) with the existing logic rather than with a potential new tuple type transformer, and to invoke a tuple type transformer only for the “inner” tuple here. Of course a general solution would be nice, but I think it’s more important that users can do what is shown in the code snippet.

One could argue that a tuple type transformer isn’t very important, since one could just return e.g. a list instead. The problem is that changing the tuple to a list in the example above requires the user to understand that the type engine exists in the first place and what its limitations are. The type engine works so well that most users, at least early on, don’t realize it’s there. I have onboarded ~30 people onto Flyte at my previous and my current company, and I would say there are 3 typical situations in which people first learn that the type engine exists:

1) When they try to return tuples.
2) When they put an int into a dict and get a dict with a float on the other side (maybe fixed now?).
3) When I show them how to build a custom type transformer.

They typically find 3) super cool, but they are also typically (maybe unreasonably) frustrated when learning about 1) and 2), with reactions like “returning a tuple is the most basic Python thing, why doesn’t even this work?” I guess when something works well, it doesn’t create gratitude but the expectation that it works perfectly 🤷‍♂️

👍 2
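A sketch of the outer/inner split proposed above, assuming outputs keep default names `o0`, `o1`, ...: the outer tuple annotation is planned with the existing multiple-output logic, and only elements that are themselves tuples are routed to a (hypothetical) tuple type transformer. `plan_outputs` and the labels it returns are illustrative names, not flytekit functions.

```python
import typing


def plan_outputs(return_annotation):
    """Hypothetical: decide how each output of a task would be handled.

    The outer tuple keeps today's meaning (one output per element);
    any element that is itself a tuple becomes a single value for a
    hypothetical tuple transformer.
    """
    if typing.get_origin(return_annotation) is not tuple:
        return {"o0": ("single", return_annotation)}
    plan = {}
    for i, elem in enumerate(typing.get_args(return_annotation)):
        kind = "tuple-transformer" if typing.get_origin(elem) is tuple else "existing"
        plan[f"o{i}"] = (kind, elem)
    return plan
```

For the `tuple[str, str, tuple[str, str]]` example, `o0` and `o1` stay on the existing path and only `o2` would need the new transformer.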

full-ram-17934

09/18/2023, 7:33 PM

I can echo that the lack of tuple support was jarring for me and I have seen users have their first WTF moment there as well.

👍 3

high-park-82026

09/26/2023, 7:25 PM

I came to learn that even regular (multiple) outputs are read as tuples, and there is no way of distinguishing between:


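from flytekit import task

# Intended: two separate outputs, an int and a str.
# (`-> int, str` is not valid Python, so this has to be spelled as a tuple too.)
@task
def task1() -> tuple[int, str]:
  ...

# and:

# Intended: a single output whose value is an (int, str) tuple.
@task
def task2() -> tuple[int, str]:
  ...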

I think the "right" thing to do here (perhaps not practical) is to always treat outputs as "one" output that's indexable (much like what Python does)... Support is being added to Flyte for indexing into nested fields, right, @cool-lifeguard-49380?
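To make the "one indexable output" idea concrete: a task would expose a single output reference, and indexing it records a path (e.g. `[2][0]`) that is resolved later against the actual value. `OutputRef` and its methods are hypothetical and only model the idea; they are not Flyte's promise API.

```python
class OutputRef:
    """Hypothetical reference to (part of) a task's single output value.

    Indexing does not touch any data; it only records a path that is
    resolved later, once the upstream value exists.
    """

    def __init__(self, node, path=()):
        self.node = node        # which upstream task produced the value
        self.path = tuple(path)  # sequence of indices/keys to apply

    def __getitem__(self, key):
        # Return a new reference with the key appended to the path.
        return OutputRef(self.node, self.path + (key,))

    def resolve(self, outputs):
        """Look up the node's value and walk the recorded path."""
        value = outputs[self.node]
        for key in self.path:
            value = value[key]
        return value
```

Under this model, `-> int, str` and `-> tuple[int, str]` would both be one output, and `ref[0]` / `ref[1]` would give access to the parts, much like Python itself.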

cool-lifeguard-49380

09/27/2023, 7:46 AM

I think I have heard somewhere that indexing into tuples etc. is being worked on, but I can’t confirm it for sure.

cool-lifeguard-49380

09/27/2023, 7:46 AM

But I agree that this would be the “right” solution.

cool-lifeguard-49380

09/27/2023, 7:46 AM

Shall I transfer this discussion into the RFC incubator?

👍 1

💯 2

high-park-82026

09/27/2023, 8:03 PM

Let's!
