Hi, I got some additional basic questions in my po...
# ask-the-community
y
Hi, I have some additional basic questions about my PoC. How do I add a custom Python dependency in a local k3s environment? • My environment is the local k3s cluster that is spun up by running
flytectl demo start
• The project directory is created by running
pyflyte init flyte_example
• I plan to use some python libraries in my workflow, for example
import nlpaug
• I added the library
nlpaug
in the
requirements.txt
, then executed
./docker_build.sh -r localhost:30000 -v 0.0.1
• Then I pushed the docker image to the docker registry of flyte k3s cluster
docker push localhost:30000/flyte_example:0.0.1
• Then I got this error message when I ran
pyflyte register workflows -i localhost:30000/flyte_example:0.0.1
Failed with Unknown Exception <class 'ModuleNotFoundError'> Reason: No module named 'nlpaug'
No module named 'nlpaug'
• I also tried it with
pyflyte run
and
pyflyte run --remote
. Got the same error.
y
maybe this is something we can disclaim better @David Espejo (he/him) (when you get back)
flyte ultimately runs a container image for python functions. it's flyte's responsibility to run that image with the right inputs, store the outputs, and pass them to downstream tasks if needed. but it's the user's responsibility to ensure that the image being run matches the code (and requirements) that you want.
so you have to build the image the correct way.
however, flyte has features that make this easier:
• this notion of fast register (which is on by default for pyflyte run/register). this basically zips up the code and unzips it at run time, helping to ensure you have the latest code.
• ImageSpec - if you specify your requirements in image spec form, flytekit will help build and manage images for the user. (keep in mind that this image is only used at run-time, not at registration-time. at registration time, assuming you're running the pyflyte register command from your host terminal, you'll still be relying on your local virtualenv)
i’m not sure what docker_build.sh is - this isn’t part of flytekit.
are you sure it’s doing the right thing?
can you also check the Task pane (if you click on a node in the UI, the far-right pane) - it’s just a json dump of the task definition. can you check to make sure the image is correct?
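Since registration relies on the host's virtualenv, one way to catch a missing dependency before calling pyflyte register is to compare requirements.txt against what is installed in the active interpreter. The following is a minimal sketch using only the standard library. The helper names (requirement_names, is_installed, missing_requirements) are made up for illustration and are not part of flytekit, and the name parsing is deliberately simplistic.

```python
import re
from importlib import metadata


def requirement_names(lines):
    """Extract bare distribution names from requirements.txt-style lines."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r/-e
            continue
        # The name is everything before the first version specifier, extra, or marker.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def is_installed(name):
    """Return True if the distribution is installed in the current interpreter."""
    try:
        metadata.version(name)
        return True
    except metadata.PackageNotFoundError:
        return False


def missing_requirements(lines, installed=is_installed):
    """List the requirements that the current environment does not provide."""
    return [name for name in requirement_names(lines) if not installed(name)]


# The requirements.txt contents shown in this thread.
lines = ["flytekit>=1.5.0", "pandas", "scikit-learn", "nlpaug"]
print("Not installed locally:", missing_requirements(lines))
```

Running this inside and outside the project venv should show whether the interpreter that pyflyte uses can actually import everything the workflow modules need.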
y
I just found that after running
pyflyte init flyte_example
(this PR: https://github.com/flyteorg/flytekit/pull/738), we get a template folder. That folder contains a default docker_build.sh and Dockerfile.
y
oh i see
pull the image then and run pip freeze on it?
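The pip freeze check suggested here can be scripted. This is a rough sketch that assumes docker is on the PATH and that pip inside the image resolves to the environment the tasks use (the Dockerfile in this thread puts /opt/venv/bin first on PATH). The function names are illustrative only.

```python
import subprocess


def parse_freeze(text):
    """Map lower-cased package names to versions from `pip freeze` output."""
    packages = {}
    for line in text.splitlines():
        name, sep, version = line.strip().partition("==")
        if sep:  # ignore lines that are not plain name==version pins
            packages[name.lower()] = version
    return packages


def image_packages(image):
    """Run `pip freeze` inside the image and return the parsed result."""
    result = subprocess.run(
        ["docker", "run", "--rm", image, "pip", "freeze"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_freeze(result.stdout)


def image_has_package(image, package):
    """Return True if the package appears in the image's `pip freeze` output."""
    return package.lower() in image_packages(image)
```

For the image in this thread, the call would be image_has_package("localhost:30000/flyte_example:1.0.0", "nlpaug"). A True result means the image itself is fine and the failure comes from somewhere else, such as the local environment.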
y
➜  flyte_example tree
.
├── Dockerfile
├── LICENSE
├── README.md
├── docker_build.sh
├── requirements.txt
└── workflows
    ├── __init__.py
    ├── __pycache__
    │   ├── __init__.cpython-311.pyc
    │   └── example.cpython-311.pyc
    └── example.py
➜  flyte_example cat requirements.txt
flytekit>=1.5.0
pandas
scikit-learn
nlpaug
➜  flyte_example cat Dockerfile
FROM python:3.8-slim-buster

WORKDIR /root
ENV VENV /opt/venv
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV PYTHONPATH /root

RUN apt-get update && apt-get install -y build-essential

ENV VENV /opt/venv
# Virtual environment
RUN python3 -m venv ${VENV}
ENV PATH="${VENV}/bin:$PATH"

# Install Python dependencies
COPY requirements.txt /root
RUN pip install -r /root/requirements.txt

# Copy the actual code
COPY . /root

# This tag is supplied by the build script and will be used to determine the version
# when registering tasks, workflows, and launch plans
ARG tag
ENV FLYTE_INTERNAL_IMAGE $tag
➜  flyte_example cat docker_build.sh
#!/bin/bash

set -e

# SET the REGISTRY here, where the docker container should be pushed
REGISTRY=""

# SET the appname here
PROJECT_NAME="flyte_example"

while getopts p:r:v:h flag
do
    case "${flag}" in
        p) PROJECT_NAME=${OPTARG};;
        r) REGISTRY=${OPTARG};;
        v) VERSION=${OPTARG};;
        h) echo "Usage: ${0} [-h|[-p <project_name>][-r <registry_name>][-v <version>]]"
           echo "  h: help (this message)"
           echo "  p: PROJECT_NAME for your workflows. Defaults to 'flyte_example'."
           echo "  r: REGISTRY name where the docker container should be pushed. Defaults to none - localhost"
           echo "  v: VERSION of the build. Defaults to using the current git head SHA"
           exit 1;;
        *) echo "Usage: ${0} [-h|[-p <project_name>][-r <registry_name>][-v <version>]]"
           exit 1;;
    esac
done

# If you are using git, then this will automatically use the git head as the
# version
if [ -z "${VERSION}" ]; then
  echo "No version set, using git commit head sha as the version"
  VERSION=$(git rev-parse HEAD)
fi

TAG=${PROJECT_NAME}:${VERSION}
if [ -z "${REGISTRY}" ]; then
  echo "No registry set, creating tag ${TAG}"
else
 TAG="${REGISTRY}/${TAG}"
 echo "Registry set: creating tag ${TAG}"
fi

# Should be run in the folder that has Dockerfile
docker build --tag ${TAG} .

echo "Docker image built with tag ${TAG}. You can use this image to run pyflyte package."
➜  flyte_example cat workflows/example.py
import typing
from flytekit import task, workflow

import nlpaug


@task
def say_hello(name: str) -> str:
    return f"hello {name}!"

@task
def greeting_length(greeting: str) -> int:
    return len(greeting)

@workflow
def mike_wang_wf(name: str = "union") -> typing.Tuple[str, int]:
    greeting = say_hello(name=name)
    greeting_len = greeting_length(greeting=greeting)
    return greeting, greeting_len

if __name__ == "__main__":
    print(f"Running wf() { mike_wang_wf(name='passengers') }")
➜  flyte_example kubectl -n flyte get all
NAME                                                      READY   STATUS    RESTARTS   AGE
pod/flyte-sandbox-docker-registry-67b7b67664-vjvws        1/1     Running   0          10h
pod/flyte-sandbox-kubernetes-dashboard-6757db879c-9wqk6   1/1     Running   0          10h
pod/flyte-sandbox-proxy-d95874857-bvzsv                   1/1     Running   0          10h
pod/flyte-sandbox-postgresql-0                            1/1     Running   0          10h
pod/flyte-sandbox-minio-645c8ddf7c-rmp7g                  1/1     Running   0          10h
pod/flyte-sandbox-98749fb56-682jf                         1/1     Running   0          10h

NAME                                         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                         AGE
service/flyte-sandbox-docker-registry        NodePort    10.43.101.44   <none>        5000:30000/TCP                  10h
service/flyte-sandbox-grpc                   ClusterIP   10.43.246.18   <none>        8089/TCP                        10h
service/flyte-sandbox-http                   ClusterIP   10.43.80.86    <none>        8088/TCP                        10h
service/flyte-sandbox-kubernetes-dashboard   ClusterIP   10.43.171.47   <none>        80/TCP                          10h
service/flyte-sandbox-minio                  NodePort    10.43.83.1     <none>        9000:30002/TCP,9001:31951/TCP   10h
service/flyte-sandbox-postgresql             NodePort    10.43.202.17   <none>        5432:30001/TCP                  10h
service/flyte-sandbox-postgresql-hl          ClusterIP   None           <none>        5432/TCP                        10h
service/flyte-sandbox-proxy                  NodePort    10.43.119.60   <none>        8000:30080/TCP                  10h
service/flyte-sandbox-webhook                ClusterIP   10.43.85.237   <none>        443/TCP                         10h
➜  flyte_example ./docker_build.sh -r localhost:30000 -v 1.0.0
Registry set: creating tag localhost:30000/flyte_example:1.0.0
[+] Building 85.8s (12/12) FINISHED
 => [internal] load build definition from Dockerfile                                                                      0.1s
 => => transferring dockerfile: 636B                                                                                      0.0s
 => [internal] load .dockerignore                                                                                         0.0s
 => => transferring context: 2B                                                                                           0.0s
 => [internal] load metadata for docker.io/library/python:3.8-slim-buster                                                 2.1s
 => [1/7] FROM docker.io/library/python:3.8-slim-buster@sha256:8799b0564103a9f36cfb8a8e1c562e11a9a6f2e3bb214e2adc23982b3  8.2s
 => => resolve docker.io/library/python:3.8-slim-buster@sha256:8799b0564103a9f36cfb8a8e1c562e11a9a6f2e3bb214e2adc23982b3  0.0s
 => => sha256:8799b0564103a9f36cfb8a8e1c562e11a9a6f2e3bb214e2adc23982b36a04511 988B / 988B                                0.0s
 => => sha256:90834dba6381dfc3957573dc7a3e6c5c8ed255cf60079329a6da2b5e6d4257b8 1.37kB / 1.37kB                            0.0s
 => => sha256:addd6962740ab9fd79a788945daa24348c11adcec97d47a647e0a61c86cc9f60 6.87kB / 6.87kB                            0.0s
 => => sha256:8b91b88d557765cd8c6802668755a3f6dc4337b6ce15a17e4857139e5fc964f3 27.14MB / 27.14MB                          3.1s
 => => sha256:824416e234237961c9c5d4f41dfe5b295a3c35a671ee52889bfb08d8e257ec4c 2.78MB / 2.78MB                            0.6s
 => => sha256:8f777578c172d018077d3dc22d6654911fff60066097943fe8c4697ecf8aac35 12.89MB / 12.89MB                          1.8s
 => => sha256:cbfea27109a8b1136059a7973ccb8243889faf162ebc173a05909dcb0bec03c9 244B / 244B                                1.3s
 => => sha256:276dfcf5deffff3c5d540a8e0d9a18656a4c03637a8b4f4eec1f4a147799c901 3.14MB / 3.14MB                            2.0s
 => => extracting sha256:8b91b88d557765cd8c6802668755a3f6dc4337b6ce15a17e4857139e5fc964f3                                 2.9s
 => => extracting sha256:824416e234237961c9c5d4f41dfe5b295a3c35a671ee52889bfb08d8e257ec4c                                 0.3s
 => => extracting sha256:8f777578c172d018077d3dc22d6654911fff60066097943fe8c4697ecf8aac35                                 0.7s
 => => extracting sha256:cbfea27109a8b1136059a7973ccb8243889faf162ebc173a05909dcb0bec03c9                                 0.0s
 => => extracting sha256:276dfcf5deffff3c5d540a8e0d9a18656a4c03637a8b4f4eec1f4a147799c901                                 0.4s
 => [internal] load build context                                                                                         0.0s
 => => transferring context: 1.63kB                                                                                       0.0s
 => [2/7] WORKDIR /root                                                                                                   0.5s
 => [3/7] RUN apt-get update && apt-get install -y build-essential                                                       14.2s
 => [4/7] RUN python3 -m venv /opt/venv                                                                                   3.5s
 => [5/7] COPY requirements.txt /root                                                                                     0.0s
 => [6/7] RUN pip install -r /root/requirements.txt                                                                      50.6s
 => [7/7] COPY . /root                                                                                                    0.0s
 => exporting to image                                                                                                    6.5s
 => => exporting layers                                                                                                   6.5s
 => => writing image sha256:7b65b29a43020c4b9ebcc81f4d2b7440b7ac21e9ca556f8ff27756eae9ed5509                              0.0s
 => => naming to localhost:30000/flyte_example:1.0.0                                                                      0.0s

Use 'docker scan' to run Snyk tests against images to find vulnerabilities and learn how to fix them
Docker image built with tag localhost:30000/flyte_example:1.0.0. You can use this image to run pyflyte package.
➜  flyte_example docker image ls
REPOSITORY                                    TAG                                            IMAGE ID       CREATED          SIZE
localhost:30000/flyte_example                 1.0.0                                          7b65b29a4302   38 seconds ago   1.14GB
localhost:30000/flyte_example                 0.0.1                                          36f2cbcdfc0a   11 hours ago     1.29GB
cr.flyte.org/flyteorg/flyte-sandbox-bundled   sha-951a93080972bbcf85bc92ec54ac62e074be887f   b66458f6f9b9   13 days ago      1.49GB
registry                                      2                                              4bb5ea59f8e0   5 weeks ago      24MB
➜  flyte_example docker push localhost:30000/flyte_example:1.0.0
The push refers to repository [localhost:30000/flyte_example]
290016b638a7: Pushed
68c2ba55c8f6: Pushed
5251342774d0: Pushed
7db6ae363d6e: Pushed
00f5b583631f: Pushed
5f70bf18a086: Layer already exists
e6c5004ee77f: Pushed
997b8e79e84f: Pushed
3054512b6f71: Pushed
ae2d55769c5e: Pushed
e2ef8a51359d: Pushed
1.0.0: digest: sha256:a04724009611fab07cfdbc960bed887f81573cf2d32394d92eaa77dd9fb558b3 size: 2627
➜  flyte_example pyflyte register workflows -i localhost:30000/flyte_example:1.0.0
Running pyflyte register from /Users/yuanwang/PycharmProjects/flyte_example with images ImageConfig(default_image=Image(name='default', fqn='localhost:30000/flyte_example', tag='1.0.0'), images=[Image(name='default', fqn='localhost:30000/flyte_example', tag='1.0.0')]) and image destination folder /root on 1 package(s) ('/Users/yuanwang/PycharmProjects/flyte_example/workflows',)
Registering against localhost:30080
Detected Root /Users/yuanwang/PycharmProjects/flyte_example, using this to create deployable package...
2023-07-25 00:34:34,507189 WARNING  {"asctime": "2023-07-25 00:34:34,507", "name": "flytekit.cli", "levelname": "WARNING", "message": "Could not determine ignored files due to:\nb'fatal: not a git repository (or any of the parent              ignore.py:51
                                    directories): .git\\n'\nNot applying any filters"}
No output path provided, using a temporary directory at /var/folders/wl/8003r09n2sj0ggf1mhj44db00000gq/T/tmp10zuu9fd instead
Computed version is FOdS36hniFF_ggAfXxMO8g==
Loading packages ['workflows'] under source root /Users/yuanwang/PycharmProjects/flyte_example
Failed with Unknown Exception <class 'ModuleNotFoundError'> Reason: No module named 'nlpaug'
Basically, I followed the steps in this Flyte Fundamental. However, I got this error.
y
have you installed nlpaug in your local env?
y
You mean in venv?
y
yes
y
I have installed it in venv. But is it relevant?
OK. Now it's working. Thanks for the help. That means we have to run the
pyflyte register
command within a Python venv that has all the custom dependencies installed. Is this a requirement only for a local k3s cluster, or is it also required for a real k8s cluster in the cloud?
(flyte_example) ➜  flyte_example pyflyte register workflows -i localhost:30000/flyte_example:1.0.0
Running pyflyte register from /Users/yuanwang/PycharmProjects/flyte_example with images ImageConfig(default_image=Image(name='default', fqn='localhost:30000/flyte_example', tag='1.0.0'), images=[Image(name='default', fqn='localhost:30000/flyte_example', tag='1.0.0')]) and image destination folder /root on 1 package(s) ('/Users/yuanwang/PycharmProjects/flyte_example/workflows',)
Registering against localhost:30080
Detected Root /Users/yuanwang/PycharmProjects/flyte_example, using this to create deployable package...
2023-07-25 00:39:34,901349 WARNING  {"asctime": "2023-07-25 00:39:34,901", "name": "flytekit.cli", "levelname": "WARNING", "message": "Could not determine ignored files due to:\nb'fatal: not a git repository (or any of the parent              ignore.py:51
                                    directories): .git\\n'\nNot applying any filters"}
No output path provided, using a temporary directory at /var/folders/wl/8003r09n2sj0ggf1mhj44db00000gq/T/tmpqnhnchaj instead
Computed version is 3uYOfQrhNfPA3EjKNdNwHA==
Loading packages ['workflows'] under source root /Users/yuanwang/PycharmProjects/flyte_example
Successfully serialized 4 flyte objects
[✔] Registration workflows.example.say_hello type TASK successful with version 3uYOfQrhNfPA3EjKNdNwHA==
[✔] Registration workflows.example.greeting_length type TASK successful with version 3uYOfQrhNfPA3EjKNdNwHA==
[✔] Registration workflows.example.mike_wang_wf type WORKFLOW successful with version 3uYOfQrhNfPA3EjKNdNwHA==
[✔] Registration workflows.example.mike_wang_wf type LAUNCH_PLAN successful with version 3uYOfQrhNfPA3EjKNdNwHA==
Successfully registered 4 entities
y
As I understand it, this is required in both cases.
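The reason the local environment matters is visible in the log above: the failure comes right after "Loading packages ['workflows']", which suggests the workflow module is imported on the host, so a top-level import of nlpaug runs there regardless of the cluster. The sketch below illustrates that general Python behavior with the standard library only. The module name missing_dep_for_demo is invented and stands in for a package that exists in the image but not locally.

```python
import importlib.util
import tempfile
from pathlib import Path

# "missing_dep_for_demo" is a made-up name for a package that is not installed locally.
TOP_LEVEL = "import missing_dep_for_demo\n\ndef run():\n    return 'ok'\n"
DEFERRED = "def run():\n    import missing_dep_for_demo\n    return 'ok'\n"


def try_load(source, name):
    """Load the source as a module and report whether loading succeeded."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / f"{name}.py"
        path.write_text(source)
        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)
        try:
            spec.loader.exec_module(module)  # executes the module's top-level code
        except ModuleNotFoundError as exc:
            return f"load failed: {exc}"
        return "loaded"


print(try_load(TOP_LEVEL, "top_level_import"))  # the import runs at load time
print(try_load(DEFERRED, "deferred_import"))    # the import runs only when run() is called
```

The first module fails as soon as it is loaded, while the second loads because its import is deferred to the function body. Whether deferring imports suits a given project is a judgment call; the fix that worked in this thread was installing the dependency in the venv.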
y
However, if I run
pyflyte run
within the same venv, I will get the
no module named nlpaug
error in the console. This happens regardless of whether the image option is specified in the
pyflyte run
command.
(flyte_example) ➜  flyte_example pyflyte run --remote -i localhost:30000/flyte_example:1.0.0 workflows/example.py wf
Go to http://localhost:30080/console/projects/flytesnacks/domains/development/executions/fb013548df81f4d23982 to see execution in the console.
(flyte_example) ➜  flyte_example pyflyte run --remote workflows/example.py wf
Go to http://localhost:30080/console/projects/flytesnacks/domains/development/executions/f3ee8f75789714feb9b3 to see execution in the console.
y
If --image isn't specified, the default image will be used, which definitely does not include nlpaug. As long as the image you build has nlpaug, it should be fine.
Maybe try building the image with a new tag, and make sure you are using the right image by checking the console.
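When comparing the image shown in the console with the one passed via -i, note that the registry port and the tag both use a colon. The register log above splits the reference into fqn='localhost:30000/flyte_example' and tag='1.0.0'. A small helper along these lines (hypothetical, not a flytekit function, and not handling digest references) can make that comparison explicit.

```python
def split_image_ref(ref):
    """Split 'registry:port/name:tag' into (fqn, tag); tag is None if absent."""
    last_slash = ref.rfind("/")
    colon = ref.rfind(":")
    if colon > last_slash:  # a colon after the last slash separates the tag
        return ref[:colon], ref[colon + 1:]
    return ref, None  # any remaining colon belongs to the registry port


def same_image(expected, actual):
    """Return True if both references name the same repository and tag."""
    return split_image_ref(expected) == split_image_ref(actual)


print(split_image_ref("localhost:30000/flyte_example:1.0.0"))
```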
y
@Yicheng Lu • I built a new image with a new tag, and it works now. Actually, the image with the old tag also works now, so I am not sure what happened yesterday. Anyway, thanks a lot for your patience and your professional support, Yicheng. • Maybe we can emphasize the point about using the virtual environment in the official documentation. I thought it might be a prerequisite only for testing flyte with
pyflyte run
, but I did not realize that it is actually necessary for almost all Flyte CLI commands. • In addition, is there any documentation that describes registration and execution in more detail? For example, is FlyteAdmin an operating system process on my local laptop, or is it a service in the K8s cluster? I did not see either of them when checking
ps aux
and
kubectl -n flyte get all
. I am asking because I would like to understand the architecture better so that I can diagnose more issues myself.
that might be helpful.
flyteadmin is a service, but in the flyte-binary chart, all flyte components have been bundled together into one big binary.
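Because the components are bundled, there is no separate flyteadmin process or pod to look for. One rough way to confirm the service is up from the laptop is to check that the endpoint pyflyte talks to accepts connections. The sketch below is a generic TCP check and knows nothing about Flyte itself; the host and port to test are taken from the "Registering against localhost:30080" log line.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the sandbox in this thread, is_port_open("localhost", 30080) returning False would point to the cluster or its port mapping rather than to the workflow code.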
y
@Yee Thank you.