#ask-the-community

Thomas Blom

03/15/2024, 4:19 AM
I'm running into this error in an @dynamic task:
TooLarge: Event message exceeds maximum gRPC size limit, caused by [rpc error: code = ResourceExhausted desc = grpc: received message larger than max (5928244 vs. 4194304)]
I found a message from @Dan Rammer (hamersaw) from a year ago that says (abbreviated):
So this error is happening when propeller sends an event to admin <and exceeds the configured gRPC buffer size>
What I don't understand is the "event message" that is occurring and its contents -- or how to get around it. It is related to the number of tasks I create in the @dynamic, and the error occurs before any of the tasks get launched. My use case looks like this:
@dynamic
def some_dynamic_workflow(input):

  for i in range(n):
    # manipulate input to get input1, input2, etc.
    res1 = task1( input1 )
    res2 = task2( input2, res1 )
    task3( input3, res1, res2 )         # writes all results to filesystem

  summary = X # some local computation, resulting in a smallish object

  return summary
For small `n`, this works fine; as `n` gets bigger, I get the error. This error occurs before I see any tasks launched - so presumably is related to sending info about the task inputs -- in one event-message? -- that need to be launched? Some of my inputs do in fact contain long protein sequences, so may be e.g. 100K in size - but I don't understand why these ALL are presumably getting sent in some single event/message, and causing the size issue. I'm not passing any big collections of them around -- just one at a time between tasks. And looking at the pod-log for my flyte-binary via k9s, I don't even see this logged, so all I have to go on is the message at top that is shown in Flyte Console. Help? Thanks!
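A quick back-of-the-envelope check of the numbers in the error message, as a rough sketch only. The two byte counts come from the error text; the 100,000-byte input size is an assumption based on the "100K" figure above, and the idea that all inputs land in one message is the presumption stated in this post, not something confirmed here.

```python
# Numbers taken from the error message in this thread.
GRPC_LIMIT_BYTES = 4194304   # the "max" in the error, i.e. 4 MiB
OBSERVED_BYTES = 5928244     # size of the rejected event message

# Assumption for illustration: each inlined input is roughly 100 KB.
ASSUMED_INPUT_BYTES = 100_000


def max_inlined_inputs(limit_bytes: int, input_bytes: int) -> int:
    """How many inputs of a given size fit in one message, ignoring overhead."""
    return limit_bytes // input_bytes


overage = OBSERVED_BYTES - GRPC_LIMIT_BYTES
fits = max_inlined_inputs(GRPC_LIMIT_BYTES, ASSUMED_INPUT_BYTES)

print(f"message is {overage} bytes over the limit")
print(f"about {fits} inputs of {ASSUMED_INPUT_BYTES} bytes fit under the limit")
```

Under these assumptions, a few dozen inlined inputs of that size would be enough to cross the limit, which is consistent with small `n` working and larger `n` failing.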

Ketan (kumare3)

03/15/2024, 4:46 AM
Because you are passing them to functions and they are not offloaded. You can offload them and pass blob links instead; that should work.

Thomas Blom

03/20/2024, 2:17 AM
I've read all the docs I can find and believe the crux is this, from this doc page:
For every task that receives input, Flyte sends an Inputs Metadata object, which contains all the primitive or simple scalar values inlined, but in the case of complex, large objects, they are offloaded and the Metadata simply stores a reference to the object.
I am passing around `dataclass`-based objects that are definitely "large and complex" -- they contain 100K+ of protein sequences, for example. I've tried passing one to a task and returning it, which should "offload" it if that were supported for dataclasses, but I don't think it is. The same page above also says:
If `FlyteFile` or any natively supported type like `pandas.DataFrame` is used, Flyte will automatically offload and download data from the configured object-store paths
I know DataFrame, FlyteDirectory, and FlyteFile are examples of these types. Are these the only ones? Lots of data types recognized by Flyte are defined elsewhere in the docs, but I haven't found any good reference for what automatically gets offloaded to blob storage and what does not. Just trying to find my way around and learn best practices to deal with this gRPC overflow issue; it would be cool if I could annotate my dataclass-based classes to cause them to be offloaded automatically - they're not huge, but the problem is trying to spin up a number of parallel tasks from a @dynamic and how this results in (presumably) a single gRPC message with all inputs stuffed into it.
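To make the "offload and pass a link" idea concrete, here is a minimal sketch in plain Python. It does not use any Flyte API: the class names, the `offload` helper, and the local temp directory are all hypothetical stand-ins, and JSON is used only as a rough proxy for the size of an inlined input. In Flyte, the role of the path would presumably be played by an offloaded type such as FlyteFile; whether a dataclass holding such a field behaves this way is not shown here.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class ProteinJobInline:
    """Hypothetical input that carries the sequence itself."""
    name: str
    sequence: str


@dataclass
class ProteinJobRef:
    """Hypothetical input that carries only a path to the sequence."""
    name: str
    sequence_path: str

    def load_sequence(self) -> str:
        # The consumer reads the payload only when it needs it.
        return Path(self.sequence_path).read_text()


def offload(job: ProteinJobInline, directory: Path) -> ProteinJobRef:
    """Write the large field to a file and keep only a reference to it."""
    path = directory / f"{job.name}.txt"
    path.write_text(job.sequence)
    return ProteinJobRef(name=job.name, sequence_path=str(path))


def serialized_size(obj) -> int:
    """Size in bytes of the JSON form, a stand-in for an inlined input."""
    return len(json.dumps(asdict(obj)).encode("utf-8"))


workdir = Path(tempfile.mkdtemp())
inline_job = ProteinJobInline(name="job0", sequence="ACDEFGHIKL" * 10_000)
ref_job = offload(inline_job, workdir)

inline_size = serialized_size(inline_job)
ref_size = serialized_size(ref_job)
print(f"inline: {inline_size} bytes, by reference: {ref_size} bytes")
```

The point of the design is that the per-task input stays small no matter how long the sequence is, so the total metadata for many tasks grows with the number of references rather than with the payload size.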