Anyone has any idea what I’m doing wrong here?
# ask-the-community
s
Anyone has any idea what I’m doing wrong here?
e
@seunggs, the support for named outputs is a bit confusing. Essentially, you can think of them as the equivalent of kwargs, but for outputs, in the context of Flyte. In other words, a named tuple can't be passed directly as an input to a downstream task, but you can pass its individual members as inputs. An example will clarify:
import typing
from flytekit import task, workflow, dynamic

my_tuple = typing.NamedTuple("A", b=str, c=int)

@task
def t1() -> my_tuple:
    return my_tuple(b="hello world", c=42)

@task
def t3(b: str, c: int):
    print(f"{b} - {c}")

@workflow
def wf_valid():
    res = t1()
    t3(b=res.b, c=res.c)
Notice how we have to access the values separately in the invocation of the downstream `t3`. In other words, we couldn't have `NamedTuple` as an input of `t3`. In your example, you're probably using the result of calling `train_task` as an input to another task, right? Can you share how that's happening in your case?
s
No actually there’s only one task right now
_wf_outputs=typing.NamedTuple("WfOutputs",train_task_o0=flytekit.types.file.file.FlyteFile)

@workflow
def mnist(_wf_args:Hyperparameters)->_wf_outputs:
    train_task_o0_=train_task(hp=_wf_args)
    return _wf_outputs(train_task_o0_)
e
got it, what if you substitute the definition of the workflow with:
@workflow
def mnist(_wf_args:Hyperparameters)->_wf_outputs:
    return train_task(hp=_wf_args)
@seunggs ^
s
So are you suggesting workflows cannot output tuples?
In the tutorial though, I think that's exactly what the doc is doing
e
no, I'm supposing that `train_task` already returns a named tuple
s
Oh I see - so you’re saying I’m returning a tuple of tuple
So the inner tuple is violating the validation
e
it's a little bit more involved than this. If you really want to do this you'll have to do something like:
_wf_outputs=typing.NamedTuple("WfOutputs",train_task_o0=flytekit.types.file.file.FlyteFile)

@workflow
def mnist(_wf_args:Hyperparameters)->_wf_outputs:
    x = train_task(hp=_wf_args)
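    # Assumption: `o0` stands in for whatever field name train_task's named tuple actually declares
    return _wf_outputs(train_task_o0=x.o0)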
essentially named tuples are treated specially. Their only purpose is to give you a way to refer to returned objects by name
99% of the time you can use a dataclass to achieve what you want
(So instead of returning a named tuple from `train_task` you return a dataclass)
s
Hmm sorry I’m a bit confused as to the cause of this error - am I allowed to return a named tuple as task output and workflow output?
I get that you cannot receive tuples as task inputs
e
yeah, I'll admit, the support for named tuples is really confusing. The tldr is that you're allowed to return them, but you need to be careful about how you use them in downstream tasks (as named tuples are not allowed to be passed as inputs)
s
Yes that was my understanding - but I only have one task here and it’s still giving me the error which is why I’m a bit confused here
e
so, going back to your code, did my suggestion of returning the result of calling `train_task` work?
s
I've got to run but let me try that and report back
It sounds like that might do the trick but I’ll confirm and get back to you
Hey sorry it took a while - I was pulled into something else for a couple of days. I just deployed the workflow successfully - the problem was as you suspected - I was returning a named tuple inside a named tuple.
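For anyone landing here with the same error, the nesting can be reproduced in plain Python without Flyte. This is only an analogy: inside a real `@workflow` the task result is a promise rather than a concrete value, `str` stands in for `FlyteFile`, and the field name `o0` and the `TaskOutputs` type are assumptions, not taken from the original code.

```python
import typing


# Hypothetical stand-in for the task's output type; the field name `o0`
# is an assumption, and `str` replaces FlyteFile to keep this stdlib-only.
class TaskOutputs(typing.NamedTuple):
    o0: str


class WfOutputs(typing.NamedTuple):
    train_task_o0: str


def train_task() -> TaskOutputs:
    # Plain function standing in for the Flyte task.
    return TaskOutputs(o0="model.pt")


result = train_task()

# What the failing workflow did: wrap the whole named tuple again,
# so the single field holds a TaskOutputs instead of the file.
nested = WfOutputs(result)

# What the fix amounts to: pass the member, not the tuple.
flat = WfOutputs(train_task_o0=result.o0)

print(type(nested.train_task_o0).__name__)
print(type(flat.train_task_o0).__name__)
```

Plain Python does not enforce the annotation, so the nested version is built without complaint here; it only shows that the single field ends up holding a whole named tuple, which is the shape the discussion above identifies as the cause of the error.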
Since I have you though - if you don’t mind me asking another question. The model I deployed is a simple mnist model (using Pytorch Lightning). I’m using pip-compile with these requirements:
torch>=1.13.0
torchvision>=0.14.0
pytorch_lightning>=1.8.1
flytekit>=1.2.3
matplotlib>=3.6.2
I’m uploading the image to gcr - and it shows an image size of 4GB (!!). This seems exceptionally large for something so simple. Also, it takes a very long time to build and deploy the workflow (11m+)
Is this expected or do you think I’m doing something wrong here?
Here’s my dockerfile and it’s mostly based on flyte docs (except for pip-compile part):
FROM python:3.8-slim-buster

WORKDIR /root
ENV VENV /opt/venv
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV PYTHONPATH /root

RUN apt-get update && apt-get install -y build-essential curl

# Install pip-tools
RUN pip3 install pip-tools

# Install the AWS cli separately to prevent issues with boto being written over
RUN pip3 install awscli

# Install flytectl
RUN curl -sL https://ctl.flyte.org/install | bash
ENV PATH="/root/bin:$PATH"

ENV VENV /opt/venv
# Virtual environment
RUN python3 -m venv ${VENV}
ENV PATH="${VENV}/bin:$PATH"

# Compile source dependencies (i.e. requirements.in) to requirements.txt and then use that to install Python dependencies
COPY ./requirements.in /root
RUN pip-compile --output-file=/root/requirements.txt /root/requirements.in
# --no-cache-dir to prevent OOMKilled
RUN pip install --no-cache-dir -r /root/requirements.txt

# Copy the actual code
COPY . /root

# Init flytectl to use the correct remote host
RUN flytectl config init --host=https://flyte.sidetrek.com

# This tag is supplied by the build script and will be used to determine the version
# when registering tasks, workflows, and launch plans
ARG tag
ENV FLYTE_INTERNAL_IMAGE $tag
e
@seunggs, sorry for the delay. Can you run `docker history <image_tag>` to get a sense of which step is taking up space? I have a feeling that all the pip operations you're running while building the image are the culprit.
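If the `docker history` output is long, a small helper can rank the layers by size. The sketch below is a hypothetical helper, not part of Docker or Flyte: it assumes the output was produced with `docker history --format '{{.Size}} {{.CreatedBy}}' <image_tag>`, so each line starts with a human-readable size such as `114MB` followed by the command that created the layer.

```python
import re

# Decimal units as printed in human-readable sizes (e.g. "4.1kB", "1.2GB").
_UNITS = {"B": 1, "kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def parse_size(text: str) -> float:
    """Convert a human-readable size such as '114MB' into bytes."""
    match = re.fullmatch(r"([\d.]+)\s*([kMGT]?B)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return float(number) * _UNITS[unit]


def largest_layers(history_output: str, top: int = 5):
    """Return up to `top` (size_in_bytes, created_by) pairs, largest first.

    Assumes one layer per line in the form '<size> <created-by command>'.
    """
    layers = []
    for line in history_output.splitlines():
        parts = line.split(None, 1)  # split off the leading size column
        if not parts:
            continue  # skip blank lines
        created_by = parts[1].strip() if len(parts) > 1 else ""
        layers.append((parse_size(parts[0]), created_by))
    return sorted(layers, key=lambda layer: layer[0], reverse=True)[:top]
```

Saving the command's output to a file and passing its contents to `largest_layers` should show at a glance whether the `pip install` layer dominates the image, as suspected above.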