Tom Szumowski

    1 month ago
    Hi everyone! Another beginner question here. I'd like to run one task in a workflow with a specific image. I found and followed these docs but ran into an error. I attached the provided example basic_workflow.py, but with the t2 task decorator changed to:
    @task(container_image="python:3.7")
    When I run it, I get this error:
    [f902b0296c1a94ed4ade-n1-0] terminated with exit code (128). Reason [StartError]. Message: 
    failed to create containerd task: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "pyflyte-fast-execute": executable file not found in $PATH: unknown.
    Is it possible to use any arbitrary image as a task there? Or does the image need to follow a specific build process that includes pyflyte-fast-execute? Thank you!
    I should also note that I am running this on GCP GKE, with the following command:
    pyflyte run --image gcr.io/urbn-data-science/flytekit-test-wrapper:latest --remote workflows/basic_workflow_custom.py my_wf --a 10 --b foobar
    The gcr.io/urbn-data-science/flytekit-test-wrapper:latest image is a default image I built because of the GCP Workload Identity errors discussed here. I'm not sure whether that conflicts with my goal above.
    Kevin Su

    1 month ago
    A custom image also needs flytekit installed.
    So you have to create a new Dockerfile that uses python:3.7 as the base image and installs flytekit on top of it.
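The StartError above comes from the container lacking the entrypoint scripts that flytekit installs. One way to confirm that a candidate image is usable is to run a small check inside it. This is a minimal sketch, assuming Python is available in the image; `missing_entrypoints` is a hypothetical helper, not part of flytekit.

```python
import shutil

# Console scripts installed by `pip install flytekit`. The error in this
# thread was caused by the second one being absent from the image.
FLYTE_ENTRYPOINTS = ("pyflyte-execute", "pyflyte-fast-execute")


def missing_entrypoints(names=FLYTE_ENTRYPOINTS):
    """Return the names that cannot be found on PATH (hypothetical helper)."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_entrypoints()
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("All Flyte entrypoints found")
```

Running this in a plain python:3.7 container should report both scripts as missing, while an image with flytekit pip-installed should report none.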
    Tom Szumowski

    1 month ago
    ah. Thanks! That worked.
    @Kevin Su A follow-up question. If I just pip-install flytekit, it's missing other key packages/config as shown in the flytekit Dockerfile:
    WORKDIR /root
    ENV PYTHONPATH /root
    
    RUN pip install awscli
    RUN pip install gsutil
    
    ARG VERSION
    ARG DOCKER_IMAGE
    
    # Pod tasks should be exposed in the default image
    RUN pip install -U flytekit==$VERSION flytekitplugins-pod==$VERSION
    
    ENV FLYTE_INTERNAL_IMAGE "$DOCKER_IMAGE"
    If I wish to use a custom image, do I need to create a new image that runs all of the above as well? Otherwise I was getting GCP permission errors. And if so, does that mean this should be applied to every custom Dockerfile I wish to have?
    Ketan (kumare3)

    1 month ago
    @Tom Szumowski most users build one image for most of their workflows. When you do need more than one image, use image_config to auto-substitute them.
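The auto-substitution mentioned here relies on named images. As I understand flytekit, a task can reference an image by template, for example container_image="{{.image.gpu.fqn}}:{{.image.gpu.version}}", and the names are supplied at registration time (for instance with repeated --image name=fqn:tag flags). The sketch below only illustrates how such a template could resolve. It is not flytekit's implementation, and `resolve_image`, `IMAGES`, and the image names are hypothetical.

```python
import re

# Named images as (fully qualified name, version) pairs.
# These names and registries are made up for illustration.
IMAGES = {
    "default": ("gcr.io/example-project/flyte-default", "v1"),
    "gpu": ("gcr.io/example-project/flyte-gpu", "v1"),
}

# Matches placeholders such as {{.image.gpu.fqn}} or {{.image.gpu.version}}.
_PLACEHOLDER = re.compile(r"\{\{\s*\.image\.(\w+)\.(fqn|version)\s*\}\}")


def resolve_image(template, images):
    """Replace image placeholders with values from `images` (illustrative only)."""

    def _substitute(match):
        name, field = match.group(1), match.group(2)
        fqn, version = images[name]  # raises KeyError for an unknown image name
        return fqn if field == "fqn" else version

    return _PLACEHOLDER.sub(_substitute, template)


if __name__ == "__main__":
    print(resolve_image("{{.image.gpu.fqn}}:{{.image.gpu.version}}", IMAGES))
```

With this pattern the task code never hard-codes a registry path, so the same workflow file can be registered against different image builds.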
    Tom Szumowski

    1 month ago
    @Ketan (kumare3) this is great! Thank you for the resources. Love the documentation and templating to guide best practices around larger scale management 💯.
    Ketan (kumare3)

    1 month ago
    @Tom Szumowski you are welcome, and we are sorry if the docs are a little hard to find.
    Please file issues on how we can improve the docs.
    Tom Szumowski

    1 month ago
    Docs have been largely great so far, but I will consider that for the future.
    I think where I got thrown for a loop was that flytekit needs to be packaged in the image. That's largely just due to my misunderstanding of how the pods were being deployed. Once I layered it in, I got a custom GPU image running with reasonable ease. 👍
    Ketan (kumare3)

    1 month ago
    @Tom Szumowski you have to leave Kubeflow and Argo behind - welcome to Flyte
    Tom Szumowski

    1 month ago
    In our current pipelines (Airflow, KFP), we have different images all over. I kind of like the idea of having a common project-wide default image and only customizing when absolutely needed. That makes configuration management easier and more consistent. Our pipeline tasks largely need the same packages anyway, the usual: pandas, sklearn, torch, numpy, etc. 🙂
    Ketan (kumare3)

    1 month ago
    yup
    and the way we do it is build the images in CI
    and then iterate on it quickly using pyflyte run / register, what we call fast register