Hello! I have a deployment up and running (mostly?) in AWS right now. However, when I am trying to r...

Matthew Krueger

Hello! I have a deployment up and running (mostly?) in AWS right now. However, when I am trying to run something with

pyflyte

I am getting the following error:

grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.UNKNOWN
        details = "failed to create a signed url. Error: WebIdentityErr: failed to retrieve credentials
caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity

I have been double checking all the SAs and everything seems to be in order but obviously I am missing something. Any pointers would be appreciated!

Team, i am trying to modify existing registered workflow and register again with some changes but e...

Sathish kumar Venkatesan

about 3 years ago

Team, i am trying to modify existing registered workflow and register again with some changes but ending up with below exception. it is showing exception with launch plan already exists. kindly let me know best practices to redeploy the workflow with same launch plan.

details = "launch plan with different structure already exists with id resource_type:LAUNCH_PLAN project:"cloudops-max-flyte-demo" domain:"development" name:"snowflake_cron_scheduled_lp" version:"v1"

Hi everyone :raised_hands: In my company we are evaluating Flyte :partyparrot:and we want to deploy...

Marti Jorda Roca

over 2 years ago

Hi everyone 🙌 In my company we are evaluating Flyte 🦜and we want to deploy an MVP to AWS EKS, with s3 and RDS postgres. We are following the guide flyte-the-hard-way but we are stuck in the 05-deploy-with-helm.md. When we try to install flyte the pod throws the following error:

/go/pkg/mod/gorm.io/gorm@v1.24.1-0.20221019064659-5dd2bb482755/gorm.go:206
[error] failed to initialize database, got error failed to connect to `host=***** user=flyteadmin database=flyteadmin`: server error (FATAL: no pg_hba.conf entry for host "172.32.101.12", user "flyteadmin", database "flyteadmin", no encryption (SQLSTATE 28000))

We have run this command to test the database connection:

kubectl run pgsql-postgresql-client --rm --tty -i --restart='Never' --namespace testdb --image <http://docker.io/bitnami/postgresql:11.7.0-debian-10-r9|docker.io/bitnami/postgresql:11.7.0-debian-10-r9> --env='PGPASSWORD=<Password>' --command -- psql testdb --host <RDS-ENDPOINT-NAME> -U flyteadmin -d flyteadmin -p 5432

And works. The only difference is that when we run it asks for the password. Anyone knows what’s happening here?

❤️ 3

Hi, I have a question: Is there a way from within a task code execution to retrieve information on u...

Arshak Ulubabyan

about 3 years ago

Hi, I have a question: Is there a way from within a task code execution to retrieve information on under what domain & project is it running. on? E.g. I have Development/Staging/Production domains, and when I run the workflow, I want task to load configs for the right environment.

Hi folks... I have a `@dynamic` task that calls a `@task` task that's expected to return a dict. Ins...

Rupsha Chaudhuri

over 3 years ago

Hi folks... I have a

@dynamic

task that calls a

@task

task that's expected to return a dict. Instead I'm getting a

flytekit.core.promise.Promise

object back. I also don't see any of the logs from that task.. nor do I see the called task executing on the console. What's weird is that the same called task works as expected when invoked from other tasks in the same workflow. Can someone help me understand why a task would not execute and return a Promise instead?

Got a question with regards to RBAC because I couldn’t find anything in the doc. Is it possible at ...

Stephen

over 2 years ago

Got a question with regards to RBAC because I couldn’t find anything in the doc. Is it possible at the moment to limit access to some specific projects to a group of people only? I haven’t found anything where I could limit viewing/ editing a project to a specific group of developers.

Running the greeting example... I am getting the following error on the `say_hello` task :thread:

Peter Sulcs

over 2 years ago

Running the greeting example... I am getting the following error on the

say_hello

task 🧵

hey everyone, i'm trying to run ray on the flyte sandbox deployment, and am running into the followi...

James Evers

about 3 years ago

hey everyone, i'm trying to run ray on the flyte sandbox deployment, and am running into the following error. it looks like its basically an OOM error, but i'm not sure how to increase the default size of

/dev/shm

. what's the best approach to bumping this default in the sandbox deployment?

/usr/local/lib/python3.8/dist-packages/paramiko/transport.py:236: CryptographyDeprecationWarning: Blowfish has been deprecated
  "class": algorithms.Blowfish,
2022-09-14 18:10:43,190	WARNING services.py:1882 -- WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 67108864 bytes available. This will harm performance! You may be able to free up space by deleting files in /dev/shm. If you are inside a Docker container, you can increase /dev/shm size by passing '--shm-size=0.10gb' to 'docker run' (or add it to the run_options list in a Ray cluster config). Make sure to set this to more than 30% of available RAM.
2022-09-14 18:10:46,611	INFO worker.py:1509 -- Started a local Ray instance. View the dashboard at [1m[32m<http://127.0.0.1:8265> [39m[22m
[2022-09-14 18:10:57,864 E 1 1] <http://core_worker.cc:149|core_worker.cc:149>: Failed to register worker 01000000ffffffffffffffffffffffffffffffffffffffffffffffff to Raylet. IOError: [RayletClient] Unable to register worker with raylet. No such file or directory

Hi, I have a ContianerTask as shown below ```my-task = ContainerTask( metadata=TaskMetadata(cach...

Gaurav Kumar

about 2 years ago

Hi, I have a ContianerTask as shown below

my-task = ContainerTask(
    metadata=TaskMetadata(cache=True, cache_version="1.0"),
    name="my-task",
    image="my-image",
    input_data_dir="/var/inputs",
    output_data_dir="/var/outputs",
    inputs=kwtypes(inDir=str),
    outputs=kwtypes(out=str),
    requests=Resources(gpu="1"),
    limits=Resources(gpu="1"),
    command=[
        "/bin/bash",
    ],
    arguments=[
        "-c",
        "echo \"out\" > /var/outputs/out; ... other commands"
        ],
   ....
)

I wanted to cache the task, for which I found that I had to put inputs/outputs even though I don’t need them. So, I just a string “out” in

/var/outputs/out

as shown in the

arguments

and put a string in the

inDir

as below while calling the task.

@workflow
def aeb_sanity_workflow(data: Dict):
    ## -----------------------------------------------------------------------------
    .......
    my_task_promise = my-task(inDir="some string")
    ........

This was working for me with earlier version of Flyte mentioned below

<http://cr.flyte.org/flyteorg/flyte-sandbox-bundled:sha-1a8d37570cda76cc01bf8c26354f4aad4debcd0a|cr.flyte.org/flyteorg/flyte-sandbox-bundled:sha-1a8d37570cda76cc01bf8c26354f4aad4debcd0a>

However, I use the master version of flyte patched with https://github.com/flyteorg/flyte/pull/3256 and manually built in

docker/sandbox-bundled

using

make build-gpu

because I needed gpu support in sandbox. I’m seeing that with this latest version, I saw two issues which were not there with
<http://cr.flyte.org/flyteorg/flyte-sandbox-bundled:sha-1a8d37570cda76cc01bf8c26354f4aad4debcd0a|cr.flyte.org/flyteorg/flyte-sandbox-bundled:sha-1a8d37570cda76cc01bf8c26354f4aad4debcd0a>

tag: v1.8.1

1. For the above mentioned ContainerTask, It’s throwing errors saying output doesn’t exist after workflow execution. I haven’t changed a single line of code in in the

my-task

except the latest flyte image. 2. Also, for the task that needs GPU, since, the image size is huge ~24 GB, k8 node came under disk pressure, and severals pods were evicted.

> kubectl describe pod <gpu-pod>
  Warning  Evicted              8m10s (x3 over 9m30s)  kubelet            The node was low on resource: ephemeral-storage.
  Warning  ExceededGracePeriod  8m (x3 over 9m20s)     kubelet            Container runtime did not kill the pod within specified grace period.
  Normal   Pulled               7m59s                  kubelet            Successfully pulled image "my-gpu-image" in 8m40.26240502s
  Normal   Created              7m59s                  kubelet            Created container primary
  Normal   Started              7m58s                  kubelet            Started container primary
  Normal   Killing              7m58s                  kubelet            Stopping container primary
  Warning  Evicted              7m30s                  kubelet            The node was low on resource: ephemeral-storage. Container primary was using 13516Ki, which exceeds its request of 0.

> kubectl describe nodes <>
Warning  FreeDiskSpaceFailed      52m                    kubelet                failed to garbage collect required amount of images. Wanted to free 110758122291 bytes, but freed 155692522 bytes
  Warning  ImageGCFailed            52m                    kubelet                failed to garbage collect required amount of images. Wanted to free 110758122291 bytes, but freed 155692522 bytes
  Warning  FreeDiskSpaceFailed      47m                    kubelet                failed to garbage collect required amount of images. Wanted to free 111138763571 bytes, but freed 0 bytes
  Warning  ImageGCFailed            47m                    kubelet                failed to garbage collect required amount of images. Wanted to free 111138763571 bytes, but freed 0 bytes
  Warning  EvictionThresholdMet     7m56s (x3 over 11m)    kubelet                Attempting to reclaim ephemeral-storage
  Normal   NodeNotReady             7m49s                  node-controller        Node 1fefe346c083 status is now: NodeNotReady
  Normal   NodeHasSufficientMemory  7m47s (x3 over 57m)    kubelet                Node 1fefe346c083 status is now: NodeHasSufficientMemory
  Normal   NodeHasDiskPressure      7m47s (x2 over 11m)    kubelet                Node 1fefe346c083 status is now: NodeHasDiskPressure

I didn’t observe these issues in this image

<http://cr.flyte.org/flyteorg/flyte-sandbox-bundled:sha-1a8d37570cda76cc01bf8c26354f4aad4debcd0a|cr.flyte.org/flyteorg/flyte-sandbox-bundled:sha-1a8d37570cda76cc01bf8c26354f4aad4debcd0a>

tag: v1.8.1

Hi, I'm trying to run Sandbox deployment with k3d cluster, following the instruction: <https://docs....

Tatyana Vlasova

over 3 years ago

Hi, I'm trying to run Sandbox deployment with k3d cluster, following the instruction: https://docs.flyte.org/en/latest/deployment/sandbox.html#deploying-your-own-flyte-sandbox-environment-to-a-k8s-cluster At the resource page I see:

TODO: These instructions currently still rely on the old kustomize setup, and will be moved over to the Helm chart soon.

Is there any relevant instruction for deploying Flyte with k3d, because the current version runs into error (or maybe is not full) with the 2d step:

k3d cluster create -p "30081:30081" –no-lb –k3s-server-arg '–no-deploy=traefik' –k3s-server-arg '–no-deploy=servicelb' flyte

Error:

Error: accepts between 0 and 1 arg(s), received 6

Tried to fix it with (following the k3d hints for tags):

k3d cluster create flyte -p 30081:30081 --no-lb  --k3s-arg "–no-deploy=traefik" --k3s-arg "–no-deploy=servicelb"

but get an error again:

FATA[0000] failed to transform ports: No nodefilters specified

What is the right setup procedure and what are the right parameters for this?

Previous 8910 Next

Flyte

Flyte enables production-grade orchestration for machine learning workflows and data processing created to accelerate local workflows to production.