https://github.com/flyteorg/flyte/issues/2720 (Flyte, #flytekit)

acoustic-carpenter-78188

10/29/2023, 12:06 AM

#2720 [Core feature] Add Outputs() as the idiom for multiple outputs, to avoid user confusion on NamedTuple-vs-dataclass
Issue created by jdanbrown

Motivation: Why do you think this is important?

(Copying out a feature request from Slack; hopefully I captured enough context here.)

I now understand the (important) distinction between dataclasses and NamedTuple in flytekit:
• NamedTuple : output :: kwargs : input. This makes a lot of sense, because you want multiple outputs, with names.
• NamedTuple cannot be used as a datatype, only as a return type to mean "multiple outputs".
• dataclass is a normal datatype that you can use wherever you like. It doesn't mean "multiple outputs".

What's confusing is that the way a Python user declares a datatype using @dataclass or NamedTuple is basically the same:

from dataclasses import dataclass
from typing import NamedTuple

@dataclass
class Config:
    epochs: int
    cv_splits: int
    ...

class Config(NamedTuple):
    epochs: int
    cv_splits: int
    ...
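
To make the look-alike trap concrete, here is a minimal plain-Python sketch of how the two declarations differ at runtime. It says nothing about flytekit's internals; the class names `ConfigData` and `ConfigTuple` are made up for illustration so that both can coexist in one file.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import NamedTuple


@dataclass
class ConfigData:
    epochs: int
    cv_splits: int


class ConfigTuple(NamedTuple):
    epochs: int
    cv_splits: int


data = ConfigData(epochs=10, cv_splits=5)
tup = ConfigTuple(epochs=10, cv_splits=5)

# Both expose the same named attributes...
same_attrs = data.epochs == tup.epochs and data.cv_splits == tup.cv_splits

# ...but only the NamedTuple instance is a tuple, so only it unpacks positionally.
is_tuple = (isinstance(data, tuple), isinstance(tup, tuple))
epochs, cv_splits = tup

# Field names are discovered through different mechanisms.
tuple_fields = ConfigTuple._fields
data_fields = tuple(f.name for f in fields(ConfigData))
```

The declarations read almost identically, yet one produces a tuple of named positions and the other an ordinary object, which is the kind of difference a framework can key on.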

So it's a very easy trap to think they're basically interchangeable, and then get confused and/or frustrated when flyte behaves in very different ways depending on which one you used.

My team is trying to anticipate adding the rest of our team as flyte users (~8 people), as well as handfuls more teams (~5–10 teams) as users, and we think this is an important friction to get ahead of and have a simple recommendation and happy path for.

Goal: What should the final outcome look like, ideally?

Library pseudocode
• Define this once
• Document/explain to users to use Outputs instead of NamedTuple for task outputs

from typing import NamedTuple

# Outputs is like NamedTuple except:
#   - It fills in the type name for you -- it's a nuisance parameter, and flyte ignores it
#   - You use it inline instead of inheriting from it, for both type and value usages
#   - TODO Add metaclass stuff to make this code actually work as a type (and a value)
# Fields are passed as (name, type) pairs; the keyword form NamedTuple("Outputs", x=int)
# is deprecated since Python 3.13.
def Outputs(**kwargs):
    return NamedTuple("Outputs", list(kwargs.items()))
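
As a rough illustration of the "type and value" gap the TODO mentions, the sketch below uses only the standard library. The helper name `outputs` and the alias `EvalOutputs` are hypothetical, not flytekit API. It also sidesteps the metaclass question by binding the generated type to a name once, which is weaker than the fully inline usage the proposal asks for.

```python
from typing import NamedTuple, get_type_hints


def outputs(**fields):
    """Hypothetical helper: build a NamedTuple type from name=type pairs."""
    # The type name is fixed to "Outputs", so callers never have to invent one.
    return NamedTuple("Outputs", list(fields.items()))


# Bind the generated type once so it can act as both annotation and constructor.
EvalOutputs = outputs(success=bool, score=float)


def evaluate(threshold: float) -> EvalOutputs:
    score = 0.9  # Stand-in for a real computation
    return EvalOutputs(success=score >= threshold, score=score)


result = evaluate(0.5)
hints = get_type_hints(evaluate)  # The return annotation is the generated type
```

Calling `outputs(...)` separately in the annotation and in the `return` statement would create two distinct classes, which is presumably why the proposal needs extra machinery to make a single inline `Outputs(...)` serve both roles.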

Example user code:

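from dataclasses import dataclass
from typing import List

import numpy as np
import pandas as pd
import tensorflow as tf
from flytekit import task

from wherever import Outputs  # Placeholder module for the proposed helper

@dataclass
class Config:
    ...

@dataclass
class TrainStats:
    ...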

@task
def evaluate_model(
    config: Config,         # A user-defined dataclass
    model: tf.keras.Model,  # Some type from a library
    metrics: List[str],     # A normal python datatype
) -> Outputs(               # Use Outputs() inline as a type instead of declaring a NamedTuple
    success=bool,           # A normal python datatype
    stats=TrainStats,       # A user-defined dataclass
    thresholds=np.ndarray,  # Some type from a library
):
    ...
    return Outputs(         # Also use Outputs() as a value, matching the type above
        success=...,
        stats=...,
        thresholds=...,
    )

# Simple tasks can ofc still return single outputs too
#   - With no name, i.e. flyte's default o1 naming
@task
def sample_train_data(X: pd.DataFrame) -> pd.DataFrame:
    ...

Describe alternatives you've considered: .

Propose: Link/Inline OR Additional context: No response

Are you sure this issue hasn't been raised already? ☑︎ Yes

Have you read the Code of Conduct? ☑︎ Yes

flyteorg/flyte

acoustic-carpenter-78188

10/29/2023, 12:06 AM

#2720 [Core feature] Add Outputs() as the idiom for multiple outputs, to avoid user confusion on NamedTuple-vs-dataclass Issue closed as not planned by github-actions[bot] flyteorg/flyte

acoustic-carpenter-78188

10/29/2023, 11:32 PM

#2720 [Core feature] Add Outputs() as the idiom for multiple outputs, to avoid user confusion on NamedTuple-vs-dataclass Issue reopened by wild-endeavor flyteorg/flyte
