# flyte-connectors
a
After upgrading from 1.13.x to 1.14.1, we've noticed that our Async Agents sometimes seem to call get in rapid succession, causing issues with database lookups. We expected get to be called at some interval, but that doesn't always seem to be the case. Is this interval configurable, and if so, how?
g
Yes, you can set a rate limiter. https://flyte-org.slack.com/archives/C06SYN9QJ5N/p1741117528227339?thread_ts=1741109569.518379&cid=C06SYN9QJ5N
# Limit the agent get method to about one call per second (qps is queries per second)
 webApi:
   readRateLimiter:
     burst: 0
     qps: 1
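To make the qps and burst terms concrete, here is a minimal sketch of a generic token-bucket limiter. It is not Flyte's implementation: the class name `TokenBucket` and its behavior are assumptions for illustration only, and how Flyte itself treats `burst: 0` is not covered here.

```python
class TokenBucket:
    """Generic token bucket (hypothetical, not Flyte code).

    Refills `qps` tokens per second and never holds more than `burst` tokens.
    """

    def __init__(self, qps: float, burst: int, now: float = 0.0):
        self.qps = qps
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill according to the time elapsed since the previous call.
        elapsed = now - self.last
        self.last = now
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.qps)
        # Spend one token per permitted call.
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(qps=1, burst=1)
first = bucket.allow(now=0.0)   # first call is permitted
second = bucket.allow(now=0.2)  # a second call 200 ms later is throttled
third = bucket.allow(now=1.2)   # a call one second later is permitted again
```

In this sketch the limit applies to all calls that share the bucket, so it spaces calls out overall rather than per task.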
a
Is this new?
g
readRateLimiter is not new
a
We're just trying to understand why issues have started since upgrading from 1.13.x to 1.14.1
For us, it's getting called back to back from the same workflow, so we'll see 2 calls in less than a second
g
Do you see 2 calls every second, or only 2 calls in the first second?
a
Just the 1st second in this case, and it seems intermittent. In our case, the 1st call is about to return success (or has just returned success), and the 2nd kicks off and returns failed.
g
Why does it fail the second time? Shouldn't it be idempotent?
Configuring the qps can help address the issue.
a
It's an implementation detail, but the agent in this case implements a locking mechanism to prevent workflows from running at the same time, and we implemented it to allow queueing. In our success case, the item is dequeued, and the following call, which checks the queued item's position, found that it was gone.
There are probably ways to reduce the race conditions there, but I'd argue it is reasonable to expect the same "instance" of an agent not to run after a terminal response (failed, succeeded, etc.), and not to run multiple times simultaneously for that same instance (at least, not without respecting whatever the poll interval is; it would be our responsibility if it polls every 30 seconds and our call ran for more than 30 seconds).
It's not an issue of different instances of the agents conflicting, so QPS isn't really our problem. It could help by pushing the poll interval closer to what's intended, but it comes with other drawbacks.
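One way to reduce the race described above, sketched under assumptions: record the terminal result when an item is dequeued, so a repeated get for the same item returns the same answer instead of failing. Everything here (`LockQueue`, `Phase`, the in-memory storage) is hypothetical and stands in for the real database-backed agent; it does not use any Flyte API.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    QUEUED = "queued"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class LockQueue:
    """Hypothetical in-memory stand-in for the database-backed lock queue."""

    waiting: list = field(default_factory=list)   # item ids, head holds the lock next
    finished: dict = field(default_factory=dict)  # item id -> terminal Phase

    def enqueue(self, item_id: str) -> None:
        self.waiting.append(item_id)

    def get(self, item_id: str) -> Phase:
        # A terminal result was already recorded: return it unchanged,
        # so a repeated call is idempotent.
        if item_id in self.finished:
            return self.finished[item_id]
        # Unknown item: it was never queued, so report failure.
        if item_id not in self.waiting:
            self.finished[item_id] = Phase.FAILED
            return Phase.FAILED
        # Head of the queue acquires the lock and is dequeued.
        if self.waiting[0] == item_id:
            self.waiting.pop(0)
            self.finished[item_id] = Phase.SUCCEEDED
            return Phase.SUCCEEDED
        return Phase.QUEUED


queue = LockQueue()
queue.enqueue("wf-1")
queue.enqueue("wf-2")
results = [queue.get("wf-2"), queue.get("wf-1"), queue.get("wf-1")]
```

In a real database the check and the dequeue would need to happen in one transaction for this to hold when two calls overlap; the sketch only covers back-to-back calls.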