Hi all 👋
I have a problem sharing data between tasks. I found a similar issue here in the discussions (link):
```
Workflow[flyte-anti-fraud-ml:development:app.workflow.main_flow] failed. RuntimeExecutionError: max number of system retry attempts [31/30] exhausted. Last known status message: failed at Node[n0]. RuntimeExecutionError: failed during plugin execution, caused by: error file @[s3://my-s3-bucket/metadata/propeller/flyte-anti-fraud-ml-development-f31c365f02c114639b00/n0/data/0/error.pb] is too large [28775519] bytes, max allowed [10485760] bytes
```
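To double-check that I'm reading the error right, here is the reported file size versus the default limit (both numbers taken straight from the message above):

```python
# Numbers taken from the error message above
error_file_bytes = 28_775_519     # size of error.pb reported by propeller
default_limit_bytes = 10_485_760  # the "max allowed" value, i.e. 10 MiB

print(error_file_bytes / 1024 / 1024)          # ~27.4 MiB
print(error_file_bytes > default_limit_bytes)  # True
```

So the error file is roughly 27.4 MiB, well over the 10 MiB default.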
I added the `max-output-size-bytes` param to the `flyte-propeller-config` and waited for all changes to apply before re-submitting a new task:

```shell
kubectl edit configmap -n flyte flyte-propeller-config
```

The propeller section of my `flyte-propeller-config` looks like:
```yaml
core.yaml: |
  manager:
    pod-application: flytepropeller
    pod-template-container-name: flytepropeller
    pod-template-name: flytepropeller-template
  propeller:
    max-output-size-bytes: 52428800
    downstream-eval-duration: 30s
    enable-admin-launcher: true
    leader-election:
      enabled: true
      lease-duration: 15s
      lock-config-map:
        name: propeller-leader
        namespace: flyte
      renew-deadline: 10s
      retry-period: 2s
    limit-namespace: all
    max-workflow-retries: 3
    metadata-prefix: metadata/propeller
    metrics-prefix: flyte
    prof-port: 10254
```
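For reference, the value I set is 50 MiB, which should comfortably cover the ~27 MiB error file:

```python
# 52428800 bytes is 50 MiB, well above the 28775519-byte error.pb
new_limit_bytes = 50 * 1024 * 1024
print(new_limit_bytes)               # 52428800
print(new_limit_bytes > 28_775_519)  # True
```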
Storage and task-resource configuration have been set up via `kubectl -n flyte edit cm flyte-admin-base-config`:
```yaml
storage.yaml: |
  storage:
    type: minio
    container: "my-s3-bucket"
    stow:
      kind: s3
      config:
        access_key_id: minio
        auth_type: accesskey
        secret_key: miniostorage
        disable_ssl: true
        endpoint: http://minio.flyte.svc.cluster.local:9000
        region: us-east-1
    signedUrl:
      stowConfigOverride:
        endpoint: http://localhost:30084
    enable-multicontainer: false
    limits:
      maxDownloadMBs: 50
task_resource_defaults.yaml: |
  task_resources:
    defaults:
      cpu: 1
      memory: 3000Mi
      storage: 100Mi
    limits:
      cpu: 4
      gpu: 1
      memory: 3Gi
      storage: 500Mi
```
Changing `maxDownloadMBs` also didn't change the situation. Changing the cache `max_size_mbs` in `flyte-propeller-config` from 0 to a custom value didn't work either:
```yaml
cache.yaml: |
  cache:
    max_size_mbs: 100
    target_gc_percent: 70
```
I tried several times with different params, but the error came up on every new execution.
I saw that neither `max-output-size-bytes` nor `max-workflow-retries` (changed from 30 to 3) is being picked up by the workflow execution:

```
RuntimeExecutionError: max number of system retry attempts [31/30] exhausted...
error file @[s3://my-s3-bucket/metadata/propeller/flyte-anti-fraud-ml-development-f31c365f02c114639b00/n0/data/0/error.pb] is too large [28775519] bytes, max allowed [10485760] bytes...
```
Here are my CLI steps to create a new execution:
```shell
kubectl -n flyte edit cm flyte-admin-base-config
kubectl edit configmap -n flyte flyte-propeller-config
flytectl get task-resource-attribute -p flyteexamples -d development
flytectl update project -p flyte-anti-fraud-ml -d development --storage.cache.max_size_mbs 100
flytectl get launchplan --project flyte-anti-fraud-ml --domain development app.workflow.main_flow --latest --execFile exec_spec.yaml
flytectl create execution --project flyte-anti-fraud-ml --domain development --execFile exec_spec.yaml
```
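One thing I'm not sure about: whether flytepropeller re-reads its configmap while running. If it only loads config at startup, I assume the pod would need a restart after the edit, something like the following (the deployment name `flytepropeller` is an assumption from my install, not verified):

```shell
# Restart propeller so it re-reads the edited configmap
# (deployment and namespace names assumed from a standard Flyte install)
kubectl -n flyte rollout restart deployment flytepropeller
kubectl -n flyte rollout status deployment flytepropeller
```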
What additional steps do I have to take to force Flyte to use my propeller changes and solve the problem of the 10 MB max size allowed for serialized uploads?