
Thomas Blom

over 1 year ago
I'm running into this error in an @dynamic task:
```
TooLarge: Event message exceeds maximum gRPC size limit, caused by [rpc error: code = ResourceExhausted desc = grpc: received message larger than max (5928244 vs. 4194304)]
```
I found a message from @Dan Rammer (hamersaw) from a year ago that says (abbreviated):
So this error is happening when propeller sends an event to admin <and exceeds the configured gRPC buffer size>
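The two numbers in the error are byte counts: the size of the message propeller tried to send and the maximum size the receiver accepts. A quick conversion shows the limit is exactly 4 MiB (which matches gRPC-Go's default maximum receive size) and that the event overshot it by roughly 41%:

```python
# Decode the byte counts reported in the gRPC error message.
MIB = 1024 * 1024

received_bytes = 5928244  # size of the event message that was sent
limit_bytes = 4194304     # maximum message size the receiver accepts

print(f"limit:    {limit_bytes / MIB:.2f} MiB")           # limit:    4.00 MiB
print(f"received: {received_bytes / MIB:.2f} MiB")        # received: 5.65 MiB
print(f"over by:  {received_bytes - limit_bytes} bytes")  # over by:  1733940 bytes
```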
What I don't understand is what this "event message" is, what it contains, and how to get around it. It is related to the number of tasks I create in the @dynamic, and the error occurs before any of the tasks get launched. My use case looks like this:
```python
from flytekit import dynamic

@dynamic
def some_dynamic_workflow(input):
    # n, task1/task2/task3 and X stand in for details omitted from this post
    for i in range(n):
        # manipulate input to get input1, input2, etc.
        res1 = task1(input1)
        res2 = task2(input2, res1)
        task3(input3, res1, res2)  # writes all results to filesystem

    summary = X  # some local computation, resulting in a smallish object

    return summary
```
For small `n`, this works fine; as `n` gets bigger, I get the error. The error occurs before I see any tasks launched, so presumably it is related to sending information about the inputs of the tasks that need to be launched, maybe all in one event message? Some of my inputs do contain long protein sequences, so each may be around 100K in size, but I don't understand why they would ALL be sent in a single event/message and cause the size issue. I'm not passing any big collections of them around, just one at a time between tasks. Looking at the pod log for my flyte-binary via k9s, I don't even see this error logged, so all I have to go on is the message shown at the top of Flyte Console. Help? Thanks!
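If, as the post suspects, the inputs for all loop iterations end up in one event message, the message grows roughly linearly with `n`. A back-of-the-envelope estimate shows how quickly ~100K inputs would exhaust a 4 MiB limit. This is a sketch under that assumption only; `max_iterations` is a hypothetical helper and the per-iteration sizes are illustrative, not measured:

```python
MIB = 1024 * 1024
LIMIT_BYTES = 4 * MIB  # 4194304, the limit from the error message


def max_iterations(per_iteration_bytes: int,
                   fixed_overhead_bytes: int = 0,
                   limit_bytes: int = LIMIT_BYTES) -> int:
    """Largest n with fixed_overhead + n * per_iteration <= limit.

    Assumes (hypothetically) that every loop iteration adds a roughly
    constant number of bytes to a single event message.
    """
    if per_iteration_bytes <= 0:
        raise ValueError("per_iteration_bytes must be positive")
    return max(0, (limit_bytes - fixed_overhead_bytes) // per_iteration_bytes)


print(max_iterations(100_000))  # 41  (one ~100K sequence per iteration)
print(max_iterations(300_000))  # 13  (~100K for each of the three tasks)
```

Under these assumptions only a few dozen iterations fit, which would be consistent with the error appearing once `n` grows past a modest size.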

Rene Penkert

over 1 year ago
Hi, I'm trying to understand how Flyte works under the hood to evaluate whether it can deliver the necessary performance for us. I have a deployment on EKS following the Single Cluster Simple Cloud Deployment guide and have executed some simpler workflows. Even after reading FlytePropeller Architecture, watching the YouTube video FlytePropeller Deep Dive (https://www.youtube.com/watch?v=FJ-rG9lZDhY), and reading Optimizing Performance, I still can't map these concepts to what is running in my cluster.

1. What are the actual components running in my EKS cluster that represent FlyteAdmin, FlytePropeller and the WorkQueue? There is one Pod, `flyte-backend-flyte-binary-xxx`. Does that include everything, and can I only scale everything together?
2. "FlytePropeller can scale to 1000s of workers on a single CPU": "worker" is used a lot in regard to FlytePropeller, but what is actually meant by it? An instance of FlytePropeller, i.e. a Pod? A process within the `flyte-backend-flyte-binary` Pod? A node in the cluster? What is a worker, and how can I observe what it is doing?
3. How is scaling of the cluster supposed to work? Assume I want to increase the number of concurrent tasks. How would I make sure the cluster can handle it? Scaling out FlyteAdmin, Scaling out Datacatalog and Scaling out FlytePropeller do not describe what I actually need to change to make it work, except deploying the "FlytePropeller Manager". I don't have a deployment in my cluster called "FlyteAdmin" or "Datacatalog", so I'm not sure what is meant by "Datacatalog is a stateless service and its replicas (in the kubernetes deployment) can be simply increased to allow higher throughput".

Apologies in advance if these questions are trivial. It would also help if you could point me to some design documents or similar that I can read to answer my questions 🙂
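To illustrate the general meaning of "worker" in a Kubernetes-style controller: a worker is typically a concurrent loop inside a single process (in a Go controller, usually a goroutine) that repeatedly takes a workflow key from a shared work queue and processes it, so one Pod can run many workers. The Python sketch below shows only that generic worker-pool pattern; it is not FlytePropeller's code, and `run_workers` and the `wf-N` keys are hypothetical:

```python
import queue
import threading


def run_workers(workflow_ids, num_workers):
    """Hypothetical sketch: a fixed pool of workers draining one shared work queue."""
    work_queue = queue.Queue()
    for wf_id in workflow_ids:
        work_queue.put(wf_id)

    processed = []
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                wf_id = work_queue.get_nowait()
            except queue.Empty:
                return  # queue drained; this worker exits
            # A real controller would evaluate the workflow's state here (one "round").
            with lock:
                processed.append((worker_id, wf_id))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed


results = run_workers([f"wf-{i}" for i in range(10)], num_workers=3)
print(len(results))  # 10
```

A real controller would re-enqueue each workflow for later evaluation rounds instead of draining the queue once; the point here is only that "workers" are concurrent loops within one process, not Pods or cluster nodes.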