After upgrading from 1.13.x to 1.14.1, we've noticed that our Flyte #flyte-connectors

alert-oil-1341

04/01/2025, 2:42 PM

After upgrading from 1.13.x to 1.14.1, we've noticed that our Async Agents sometimes seem to call `get` in rapid succession, causing issues with database lookups. We expected `get` to be called at some interval, but that doesn't always seem to be the case. Is this interval configurable, and if so, how?

glamorous-carpet-83516

04/01/2025, 2:44 PM

Yes, you can set a rate limiter: https://flyte-org.slack.com/archives/C06SYN9QJ5N/p1741117528227339?thread_ts=1741109569.518379&cid=C06SYN9QJ5N

# Limit the agent get method to about one call per second (qps = queries per second)
webApi:
  readRateLimiter:
    burst: 0
    qps: 1
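
To make the `qps` and `burst` knobs more concrete, here is a minimal sketch of a generic token-bucket limiter. It is not Flyte's implementation: the `TokenBucket` class and its `allow` method are hypothetical names, and how Flyte itself treats `burst: 0` is not covered here (the sketch uses `burst=1`).

```python
class TokenBucket:
    """Generic token bucket (illustrative only, not Flyte's code).

    Refills `qps` tokens per second and never holds more than `burst` tokens.
    """

    def __init__(self, qps: float, burst: int, now: float = 0.0):
        self.qps = qps
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = now             # time of the last refill

    def allow(self, now: float) -> bool:
        # Refill based on elapsed time, capped at the bucket size.
        elapsed = now - self.last
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.qps)
        self.last = now
        # Spend one token per permitted call.
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


bucket = TokenBucket(qps=1, burst=1)
first = bucket.allow(0.0)    # bucket is full, call permitted
second = bucket.allow(0.2)   # only 0.2 tokens refilled, call rejected
third = bucket.allow(1.2)    # a full second has passed, call permitted
```

In this sketch a single shared bucket caps the total call rate across all callers; it does not by itself space out the calls made for one particular task.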

alert-oil-1341

04/01/2025, 2:45 PM

Is this new?

glamorous-carpet-83516

04/01/2025, 2:45 PM

readRateLimiter is not new

alert-oil-1341

04/01/2025, 2:46 PM

We're just trying to understand why the issues started after upgrading from 1.13.x to 1.14.1.

alert-oil-1341

04/01/2025, 2:53 PM

For us, it's getting called back to back from the same workflow, so we'll see 2 calls in less than a second

glamorous-carpet-83516

04/01/2025, 2:55 PM

"so we'll see 2 calls in less than a second"

Did you see 2 calls every second, or just 2 calls in the first second?

abundant-laptop-64153

04/01/2025, 3:03 PM

Just the 1st second in this case, and it seems intermittent. In our case, the 1st call is about to return success (or has just returned success), and the 2nd kicks off and returns failed.

glamorous-carpet-83516

04/01/2025, 5:54 PM

Why does it fail the second time? Shouldn't it be idempotent?

glamorous-carpet-83516

04/01/2025, 5:55 PM

Configuring the qps can help address the issue.

abundant-laptop-64153

04/01/2025, 6:33 PM

It's an implementation detail, but the agent in this case implements a locking mechanism to prevent workflows from running at the same time, and we implemented it to allow queueing. In our success case, the item is de-queued; the following call then checked that queued item's position, and it was gone.
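
One way to make a status check like this tolerate a repeated call is to remember the terminal outcome per item, so a second `get` returns the same answer instead of failing on a missing queue entry. The sketch below is a plain in-memory illustration under that assumption; `QueueLock`, `Phase`, and the method names are hypothetical and do not come from flytekit or from the agent described above.

```python
from collections import deque
from enum import Enum


class Phase(Enum):
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


class QueueLock:
    """Hypothetical queue-based lock whose status check is idempotent."""

    def __init__(self):
        self._queue = deque()   # waiting item ids, head holds the lock next
        self._terminal = {}     # item id -> recorded terminal Phase

    def create(self, item_id: str) -> None:
        self._queue.append(item_id)

    def get(self, item_id: str) -> Phase:
        # A terminal outcome is sticky: repeated calls return the same phase.
        if item_id in self._terminal:
            return self._terminal[item_id]
        # Never queued and no recorded outcome: a genuine failure.
        if item_id not in self._queue:
            self._terminal[item_id] = Phase.FAILED
            return Phase.FAILED
        # Head of the queue: de-queue and record success before returning.
        if self._queue[0] == item_id:
            self._queue.popleft()
            self._terminal[item_id] = Phase.SUCCEEDED
            return Phase.SUCCEEDED
        return Phase.RUNNING


lock = QueueLock()
lock.create("wf-a")
first = lock.get("wf-a")    # de-queues and succeeds
second = lock.get("wf-a")   # back-to-back call still reports success
```

A real agent would keep this state in its database and make the de-queue and the outcome write atomic; the in-memory dict here only shows the shape of the idea.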

abundant-laptop-64153

04/01/2025, 6:36 PM

There are probably ways to reduce the race conditions there, but I'd argue it is reasonable to expect the same "instance" of an agent not to run after a terminal response (failed, succeeded, etc.), and not to run multiple times simultaneously for that same instance (at least, not without respect to whatever the poll interval is; it would be our responsibility if it ran every 30 sec and we ran for more than 30 sec).
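
If the caller cannot be relied on to space out polls, the agent side can enforce a minimum interval per item itself. The following is a rough sketch of that workaround, not a Flyte feature: `PollGuard`, `min_interval`, and the `fetch` callback are hypothetical names, and it only helps when all calls for an item land in the same agent process.

```python
import threading
import time
from typing import Callable, Dict, Tuple


class PollGuard:
    """Hypothetical per-item guard: serializes polls and reuses a recent result."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic):
        self._min_interval = min_interval
        self._clock = clock
        self._lock = threading.Lock()
        self._last: Dict[str, Tuple[float, str]] = {}  # id -> (time, status)

    def get(self, item_id: str, fetch: Callable[[str], str]) -> str:
        # Holding the lock during fetch keeps polls from overlapping,
        # at the cost of serializing polls for all items.
        with self._lock:
            now = self._clock()
            cached = self._last.get(item_id)
            if cached is not None and now - cached[0] < self._min_interval:
                return cached[1]  # polled too recently, reuse the last status
            status = fetch(item_id)
            self._last[item_id] = (now, status)
            return status


def lookup(item_id: str) -> str:
    # Stand-in for the real database lookup.
    return "running"


guard = PollGuard(min_interval=30.0)
status = guard.get("wf-a", lookup)
repeat = guard.get("wf-a", lookup)  # within 30 s, served from the cached status
```

The trade-off is staleness: a status change is reported up to `min_interval` seconds late.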

abundant-laptop-64153

04/01/2025, 6:38 PM

It's not an issue of different instances of the agent conflicting, so QPS isn't really our problem. It could help by getting the poll interval closer to what was intended, but it comes with other drawbacks.
