# ask-the-community

Nicholas LoFaso

03/09/2023, 7:12 PM
Hi, we’re running a bunch of large jobs, each with thousands of tasks, and noticed our Postgres database sat at 100% CPU for many hours. The DB currently has only 4 vCPUs and 15 GB of memory. We also saw many lines like this in the datacatalog log:
```
"textPayload": "2023/03/07 15:38:00 \u001b[32m/go/src/github.com/flyteorg/datacatalog/pkg/repositories/gormimpl/artifact.go:64 \u001b[33mSLOW SQL >= 200ms",
```
Do you have a recommended Postgres size for large production workloads? I’m also going through this doc to make sure we’re following best practices.
We’re also seeing many of these in our Postgres logs:
```
2023-03-07 16:14:58.742 UTC [1644473]: [4-1] db=datacatalog,user=flyteadmin ERROR:  duplicate key value violates unique constraint "tags_pkey"
2023-03-07 16:14:58.874 UTC [1644473]: [7-1] db=datacatalog,user=flyteadmin ERROR:  duplicate key value violates unique constraint "datasets_pkey"
```
Any idea why those are showing up?
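A minimal Python sketch for quantifying the log lines quoted above, assuming they have been exported as plain text (either with raw ANSI escape bytes or with the JSON-escaped `\u001b` form shown in the datacatalog payload). The helpers `strip_ansi` and `tally` are hypothetical and not part of Flyte or Postgres tooling. It counts unique-constraint violations per constraint name and datacatalog `SLOW SQL` warnings per Go source location, which can help show whether a single query path dominates.

```python
import re
from collections import Counter

# ANSI color codes, either as the raw ESC byte or JSON-escaped as the literal text "\u001b".
ANSI = re.compile(r"(?:\x1b|\\u001b)\[[0-9;]*m")
# Postgres unique-violation message; captures the constraint name (e.g. tags_pkey).
DUP_KEY = re.compile(r'duplicate key value violates unique constraint "([^"]+)"')
# GORM slow-query warning; captures the Go file:line that issued the query.
SLOW_SQL = re.compile(r"(\S+\.go:\d+)\s+SLOW SQL")


def strip_ansi(line: str) -> str:
    """Remove terminal color codes so the patterns see plain text (hypothetical helper)."""
    return ANSI.sub("", line)


def tally(lines):
    """Return (duplicate-key counts by constraint, slow-SQL counts by source location)."""
    dup_keys: Counter = Counter()
    slow_sql: Counter = Counter()
    for raw in lines:
        line = strip_ansi(raw)
        if m := DUP_KEY.search(line):
            dup_keys[m.group(1)] += 1
        if m := SLOW_SQL.search(line):
            slow_sql[m.group(1)] += 1
    return dup_keys, slow_sql


if __name__ == "__main__":
    # Sample lines copied from the logs in this thread.
    sample = [
        "2023/03/07 15:38:00 \\u001b[32m/go/src/github.com/flyteorg/datacatalog/pkg/repositories/gormimpl/artifact.go:64 \\u001b[33mSLOW SQL >= 200ms",
        '2023-03-07 16:14:58.742 UTC [1644473]: [4-1] db=datacatalog,user=flyteadmin ERROR:  duplicate key value violates unique constraint "tags_pkey"',
        '2023-03-07 16:14:58.874 UTC [1644473]: [7-1] db=datacatalog,user=flyteadmin ERROR:  duplicate key value violates unique constraint "datasets_pkey"',
    ]
    dup, slow = tally(sample)
    print(dup.most_common())
    print(slow.most_common())
```

Running this over a full day of logs, rather than the three sample lines, gives a ranked list of the constraints and query sites worth raising when comparing notes on the workflow pattern.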

Ketan (kumare3)

03/10/2023, 12:59 AM
@Nicholas LoFaso is this in datacatalog? We have not seen these issues.
Ya, for larger workloads the DB can run hot, but 100% CPU for datacatalog seems odd.
Can you share more?
Happy to hop on a call to help.

Nicholas LoFaso

03/10/2023, 2:08 AM
Yes, that was a datacatalog error. The 100% CPU was on our managed Postgres instance, not the data manager itself.
I’m on the East Coast and available for a call any time tomorrow.

Ketan (kumare3)

03/10/2023, 5:57 AM
Are you using the same Postgres for both datacatalog and admin?

Nicholas LoFaso

03/10/2023, 3:54 PM
Yes, the same one for both.
We significantly increased the size of the Postgres instance, and that resolved the current bottleneck. I’ll be digging into our Flyte performance over the next couple of weeks, so I’ll likely have more questions. For now, I plan to set up Prometheus to gather the data plane metrics.

Ketan (kumare3)

03/10/2023, 9:26 PM
Yup.
But it still seems odd that Postgres was hammered.
We would love to understand your workflow pattern.

Nicholas LoFaso

03/10/2023, 9:45 PM
We would love to share and improve our pattern and configuration. Maybe we can set up a call late next week, after I’ve had a chance to gather some metrics?