Alex Pozimenko

05/09/2022, 9:02 PM
hi team, we're having issues with metric collection on the recent flyteadmin. Telegraf starts throwing lots of these after just a day or two of uptime and some metrics disappear from the dashboard:
Error in plugin [inputs.prometheus]: error reading body: net/http: request canceled (Client.Timeout exceeded while reading body)
original settings were:
interval = "10s"
response_timeout = "3s"
I changed these to 30s and 15s respectively, and it seems to be ok for now. Do you have a recommendation on what these should be set to?
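For reference, the adjusted values would look roughly like this, assuming both are set on the `[[inputs.prometheus]]` block of the Telegraf config. The URL is a hypothetical placeholder, not the actual flyteadmin endpoint.

```toml
[[inputs.prometheus]]
  # Hypothetical scrape target; replace with the real flyteadmin metrics URL.
  urls = ["http://flyteadmin.example:10254/metrics"]

  # Scrape every 30s (was 10s).
  interval = "30s"

  # Allow up to 15s for the response to arrive (was 3s).
  response_timeout = "15s"
```

The timeout stays below the interval, so a slow scrape should finish or be canceled before the next one starts.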

Ketan (kumare3)

05/10/2022, 5:45 PM
hmm this is interesting
the timeout is from flyteadmin?
that is interesting - seems admin is under load?

Alex Pozimenko

05/10/2022, 7:30 PM
@Ketan (kumare3) yes, flyteadmin. No significant load, CPU < 20%