Hi team, we're having issues with metric collection on a recent flyteadmin. Telegraf starts throwing lots of these errors after just a day or two of uptime, and some metrics disappear from the dashboard:
Error in plugin [inputs.prometheus]: error reading body: net/http: request canceled (Client.Timeout exceeded while reading body)
original settings were:
interval = "10s"
response_timeout = "3s"
I changed these to 30s and 15s and it seems to be ok for now.
Do you have a recommendation on what these should be set to?
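For reference, a rough sketch of how the changed values would sit in the Telegraf config. This assumes both options are set on the `inputs.prometheus` plugin itself (`interval` could equally be set under `[agent]`), and the URL is only a placeholder, not the real endpoint:

```toml
# Sketch only: adjusted scrape settings for the flyteadmin metrics endpoint.
[[inputs.prometheus]]
  # Placeholder address (assumption); substitute the real flyteadmin metrics URL.
  urls = ["http://flyteadmin.example/metrics"]
  # Per-plugin collection interval, overriding the agent default.
  interval = "30s"          # was "10s"
  # Maximum time to wait for the full response, including reading the body.
  response_timeout = "15s"  # was "3s"
```

Keeping `response_timeout` below `interval` means a slow scrape is abandoned before the next one is due.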
freezing-airport-6809
05/10/2022, 5:45 PM
hmm this is interesting
freezing-airport-6809
05/10/2022, 5:45 PM
the timeout is from flyteadmin?
freezing-airport-6809
05/10/2022, 5:46 PM
that is interesting - seems admin is under load?
orange-hairdresser-63684
05/10/2022, 7:30 PM
@freezing-airport-6809 yes, flyteadmin. No significant load, CPU < 20%