what kubernetes versions (min, max) does flyte support?

Eugene Cha

what kubernetes versions (min, max) does flyte support?

Nicholas LoFaso

Hi all, how can I tell if a map_task is successfully cached? As far as I can tell this bugfix should now allow for caching, but the UI doesn’t give an indication that it was saved.

Thomas Blom

over 1 year ago

I'm running into this error in an @dynamic task:

TooLarge: Event message exceeds maximum gRPC size limit, caused by [rpc error: code = ResourceExhausted desc = grpc: received message larger than max (5928244 vs. 4194304)]

I found a message from @Dan Rammer (hamersaw) from a year ago that says (abbreviated):

So this error is happening when propeller sends an event to admin <and exceeds the configured gRPC buffer size>

What I don't understand is the "event message" that is occurring and its contents -- or how to get around it. It is related to the number of tasks I create in the @dynamic, and the error occurs before any of the tasks get launched. My use case looks like this:

@dynamic
def some_dynamic_workflow(input):
    for i in range(n):
        # manipulate input to get input1, input2, etc.
        res1 = task1(input1)
        res2 = task2(input2, res1)
        task3(input3, res1, res2)  # writes all results to filesystem

    summary = X  # some local computation, resulting in a smallish object

    return summary

For small n, this works fine; as n gets bigger, I get the error. The error occurs before I see any tasks launched, so presumably it is related to sending info about the inputs of the tasks that need to be launched (in one event message?). Some of my inputs do in fact contain long protein sequences, so they may be e.g. 100K in size, but I don't understand why these are ALL presumably getting sent in a single event message and causing the size issue. I'm not passing any big collections of them around, just one at a time between tasks. And looking at the pod log for my flyte-binary via k9s, I don't even see this logged, so all I have to go on is the message at top that is shown in Flyte Console. Help? Thanks!
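For scale, the numbers in the error can be worked through directly. The sketch below only does the arithmetic: the two byte counts come from the error message, the 100 KB per-input figure is the rough size mentioned in the question, and the idea that all inputs end up in one message is the poster's guess, not confirmed Flyte behavior.

```python
# Byte counts taken from the error message in the question.
MESSAGE_BYTES = 5_928_244
GRPC_LIMIT_BYTES = 4_194_304  # 4 MiB, gRPC's default maximum receive size

# Assumption: roughly 100 KB per large input, as estimated in the question.
APPROX_INPUT_BYTES = 100_000


def overshoot(message_bytes: int, limit_bytes: int) -> int:
    """Return by how many bytes the message exceeds the limit."""
    return message_bytes - limit_bytes


def inputs_that_fit(limit_bytes: int, input_bytes: int) -> int:
    """Upper bound on inputs that fit in one message, ignoring all overhead."""
    return limit_bytes // input_bytes


if __name__ == "__main__":
    print(overshoot(MESSAGE_BYTES, GRPC_LIMIT_BYTES))
    print(inputs_that_fit(GRPC_LIMIT_BYTES, APPROX_INPUT_BYTES))
```

If the guess holds, a few dozen inputs of that size would already fill the limit, which would be consistent with the failure appearing only as n grows.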

Rene Penkert

over 1 year ago

Hi, I’m trying to understand how Flyte works under the hood to evaluate whether it can deliver the necessary performance for us. I have a deployment on EKS following the Single Cluster Simple Cloud Deployment guide and have executed some simple workflows. After looking at FlytePropeller Architecture, the FlytePropeller Deep Dive video (https://www.youtube.com/watch?v=FJ-rG9lZDhY), and Optimizing Performance, I still can’t map it to what is running in my cluster.

1. What are the actual components running in my EKS cluster that represent FlyteAdmin, FlytePropeller, and WorkQueue? There is one Pod, flyte-backend-flyte-binary-xxx, so does that include everything, and can I only scale everything together?
2. "FlytePropeller can scale to 1000s of workers on a single CPU": "worker" is used a lot in regard to FlytePropeller, but what is actually meant by that? An instance of FlytePropeller, i.e. a Pod? A process inside the flyte-backend-flyte-binary Pod? A node in the cluster? What is a worker and how can I observe what it is doing?
3. How is scaling of the cluster supposed to work? Assume I want to increase the number of concurrent tasks. How would I make sure that the cluster can handle it? The sections Scaling out FlyteAdmin, Scaling out Datacatalog, and Scaling out FlytePropeller do not describe what I actually need to change to make it work, except deploying the “FlytePropeller Manager”. I don’t have a deployment in my cluster called “FlyteAdmin” or “Datacatalog”, so I’m not sure what is meant by “Datacatalog is a stateless service and its replicas (in the kubernetes deployment) can be simply increased to allow higher throughput”.

Apologies in advance if these questions are trivial. It would also help if you could point me to some design documents or similar that I can read to answer my questions 🙂

Robert Ambrus

over 2 years ago

Hi community! I'm trying to run a dummy python job on a Databricks job cluster (_new_cluster_) using pyflyte like this:

pyflyte run -i <corporate_docker_registry>/<prefix>/flyte-dbx-demo:0.0.1 --remote --destination-dir . dbx_example_job_cluster.py my_databricks_job

The job was successfully submitted to Databricks, but cluster creation failed because the image could not be pulled from our corporate docker registry (due to missing credentials). I analyzed the audit logs and found that the docker image config looks like this (the _basic_auth_ block is missing):

"docker_image": {
  "url": "<corporate_docker_registry>/<prefix>/flyte-dbx-demo:0.0.1",
}

I'm assuming that if we add imagePullSecrets to our service account as described here, Flyte will pass the Docker credentials to Databricks in the job definition like this:

"docker_image": {
  "url": "<corporate_docker_registry>/<prefix>/flyte-dbx-demo:0.0.1",
  "basic_auth": {
    "username": <user>,
    "password": <token>
  }
}

Can you please confirm? (Please note that I'm only experiencing this issue when trying to run a job on a _new_cluster_; I was able to successfully complete a job on an _existing_cluster_.) cc @Evan Sadler @Kevin Su
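To make the difference between the two payloads concrete, here is a minimal sketch that builds either shape. The helper name `docker_image_block` is hypothetical and is not part of flytekit or the Databricks plugin; only the `url` and `basic_auth` field names come from the snippets in the question, and whether Flyte fills them from imagePullSecrets is exactly what is being asked.

```python
import json
from typing import Optional


def docker_image_block(
    url: str,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> dict:
    """Build a docker_image dict; add basic_auth only when both credentials are given.

    Hypothetical helper for illustration, not an API of Flyte or Databricks.
    """
    block = {"url": url}
    if username is not None and password is not None:
        block["basic_auth"] = {"username": username, "password": password}
    return block


if __name__ == "__main__":
    # Placeholder values; the real registry, user, and token are not known here.
    print(json.dumps(docker_image_block("registry.example/demo:0.0.1"), indent=2))
    print(json.dumps(
        docker_image_block("registry.example/demo:0.0.1", "user", "token"),
        indent=2,
    ))
```

The first call reproduces the shape seen in the audit logs, the second the shape that carries registry credentials.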

Alexey Kharlamov

over 2 years ago

Hi guys! How do I set up maxWorkflowNodes (https://docs.flyte.org/en/latest/deployment/configuration/generated/flyteadmin_config.html#maxworkflownodes-int) in a flyte-binary deployment?

Derek Yu

over 2 years ago

Hi, is it possible to change the bucket that flyte uses? If so, are there migration instructions? We tried, and the console now shows 500 errors when reading data from the old bucket for old workflows.

Albert Wibowo

over 2 years ago

Hello, I have multiple docker contexts running on my laptop. Is it possible to run flytectl demo start on a specific docker context? Apologies for spamming questions here; still trying to learn the ropes 😄

Mike Ossareh

over 2 years ago

Is there a way to rebuild the database that flyte admin uses from the s3 bucket? Today we attempted to migrate to a new s3 bucket: we did a backup, restored it to a new bucket, and then updated our values.yaml to point to the new bucket. The backend started erroring with:

Unable to read WorkflowClosure from location s3://[redacted-old-bucket-name]/metadata/admin/flytesnacks/production/plaster.genv2.generators.sigproc_v2.sigproc_v2_analyze_workflow/23.5.19 : path:s3://[redacted-old-bucket-name]/metadata/admin/flytesnacks/production/plaster.genv2.generators.sigproc_v2.sigproc_v2_analyze_workflow/23.5.19: Conf container:[redacted-new-bucket-name] != Passed Container:[redacted-old-bucket-name]. Dynamic loading is disabled: not found" debug_error_string = "UNKNOWN:Error received from peer {grpc_message:"...

(I've elided the end of the message because it repeats the error message).
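The error reads as a comparison between the bucket in a stored reference and the bucket in the configuration. Below is a minimal sketch of that comparison, which could help list which stored URIs still point at the old bucket. The function names and the example bucket names are hypothetical, and this is not Flyte's implementation.

```python
from typing import Iterable, List
from urllib.parse import urlparse


def container_of(uri: str) -> str:
    """Return the bucket (container) part of an s3:// URI."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3 URI: {uri}")
    return parsed.netloc


def stale_references(uris: Iterable[str], configured_bucket: str) -> List[str]:
    """Return the URIs whose bucket differs from the configured one."""
    return [u for u in uris if container_of(u) != configured_bucket]


if __name__ == "__main__":
    # Hypothetical URIs standing in for references stored before the migration.
    stored = [
        "s3://old-bucket/metadata/admin/project/domain/workflow/1",
        "s3://new-bucket/metadata/admin/project/domain/workflow/2",
    ]
    for uri in stale_references(stored, "new-bucket"):
        print(uri)
```

Any URI this reports would fail the same "Conf container != Passed Container" style of check, since copying the objects does not rewrite references that were recorded with the old bucket name.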

Samuel Bentley

over 2 years ago

Hi all, I've managed to get a local demo cluster working on my old laptop with workflows running etc., but after following the exact same steps on my new one I get the following error whenever I run a workflow. Note: I've tried the cookbook hello_world example and the greeting workflow, and both give the same error. I think I'm doing something stupid, but I just can't figure it out.

containers with unready status: [f0de017c01c7140d8a4c-n0-0]|Back-off pulling image "cr.flyte.org/flyteorg/flytekit:py3.9-1.5.0"

Any ideas?
