#ask-the-community

Quinn Miller

01/10/2024, 4:03 PM
has anyone gotten this issue when trying to install the default dev chart on EKS:
Error from server (BadRequest): container "flyte" in pod "flyte-backend-flyte-binary-587f6c94d4-pm5zx" is waiting to start: PodInitializing

Chris Grass

01/10/2024, 4:07 PM
is postgres up? do you see any errors in logs regarding flyte's ability to communicate with the db?

Quinn Miller

01/10/2024, 4:10 PM
I'm not sure, that's what I get when I do a `kubectl logs`, and when I try to `kubectl exec` I get:
Defaulted container "flyte" out of: flyte, wait-for-db (init)
error: unable to upgrade connection: container not found ("flyte")

Chris Grass

01/10/2024, 4:13 PM
try `kubectl describe pod/flyte-backend-flyte-binary-587f6c94d4-pm5zx`
from my experience, most of the `PodInitializing` failures are related to flyte being unable to communicate with the db. that can be network/access issues, db down issues, password/secret problems, etc
usually the `kubectl describe pod` output gives a clear indication of what is failing during init
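
A minimal sketch of that inspection, assuming the pod name from the error above and the `wait-for-db` init container named in the `kubectl exec` output. The helper name `inspect_init` is hypothetical and only there to keep the two commands together:

```shell
# Hypothetical helper: show why a pod is stuck in PodInitializing.
# $1 = pod name, $2 = init container name (both taken from this thread).
inspect_init() {
  # Events and per-container state, including init containers
  kubectl describe pod "$1"
  # Logs of the init container itself; the main container has not started yet
  kubectl logs "$1" -c "$2"
}

# Usage:
# inspect_init flyte-backend-flyte-binary-587f6c94d4-pm5zx wait-for-db
```

Passing `-c` matters here because `kubectl logs` and `kubectl exec` default to the main `flyte` container, which is still waiting to start.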

Quinn Miller

01/10/2024, 4:21 PM
I didn't have some of the prereq stuff set up properly. I'll fix it up and let you know if it works.

Trọng Đạt Bùi

01/10/2024, 5:02 PM
@Quinn Miller are you deploying the Flyte binary on a k8s cluster?

Quinn Miller

01/10/2024, 5:27 PM
I'm using the simple cloud deployment via helm: https://docs.flyte.org/en/latest/deployment/deployment/cloud_simple.html
definitely an issue with the `wait-for-db` init container
wait-for-db:
    Container ID:  containerd://6fea6c36f27b44ff163eb44ecb05bb9b6f2b803bdbcb4b6c758237505b53a6c7
    Image:         postgres:15-alpine
    Image ID:      docker.io/library/postgres@sha256:ddaa3615f15a3d0ba9ef5f9af398d093fcce511fea1187049444fcd09626c21a
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -ec
    Args:
      until pg_isready \
        -h <redacted> \
        -p 5432 \
        -U postgres
      do
        echo waiting for database
        sleep 0.1
      done
      
    State:          Running
      Started:      Wed, 10 Jan 2024 10:19:04 -0700
    Ready:          False

Chris Grass

01/10/2024, 5:30 PM
can you interact with postgres given the args used above?
and is it within the cluster or a cloud instance (RDS?)
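
One way to run that check from inside the cluster is sketched below, assuming a throwaway pod using the same `postgres:15-alpine` image as the init container. The helper name `check_db` and the pod name `pg-check` are hypothetical:

```shell
# Hypothetical helper: run pg_isready from a temporary pod inside the cluster.
# $1 = database host (e.g. the RDS endpoint), $2 = port, $3 = user
check_db() {
  kubectl run pg-check --rm -i --restart=Never \
    --image=postgres:15-alpine -- \
    pg_isready -h "$1" -p "$2" -U "$3"
}

# Usage:
# check_db <RDS_HOST_DNS> 5432 postgres
```

Note that `pg_isready` only reports whether the server is accepting connections. It does not validate credentials or SSL settings, so a passing check does not guarantee that flyte itself can log in.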

Quinn Miller

01/10/2024, 5:31 PM
RDS, double checking network connectivity
now getting:
[error] failed to initialize database, got error failed to connect to `host=<redacted> database=<redacted>: server error (FATAL: no pg_hba.conf entry for host "<redacted>", user "postgres", database "<redacted>", no encryption (SQLSTATE 28000))
on the flyte container
the init container is now reaching the ready state

Chris Grass

01/10/2024, 6:33 PM
that looks familiar - i think i had to set `options: sslmode=verify-full`

Quinn Miller

01/10/2024, 6:41 PM
would I specify that in `eks-starter.yaml`?

Chris Grass

01/10/2024, 6:42 PM
yeah, in the db configuration. e.g.:
database:
    username: postgres
    password: <DB_PASSWORD>
    host: <RDS_HOST_DNS>
    dbname: flyteadmin
    options: sslmode=verify-full

Quinn Miller

01/10/2024, 6:46 PM
Okay I tried that and now hitting:
failed to initialize database, got error failed to connect to `host=<redacted> user=postgres database=<redacted>`: failed to write startup message (x509: certificate signed by unknown authority)
panic: interface conversion: error is x509.UnknownAuthorityError, not *pgconn.PgError
folks on stackoverflow say:
The way I solved this was: Added the line as below in `pg_hba.conf`:
hostnossl    all          all            0.0.0.0/0  trust
and this was modified in `postgresql.conf`, as shown:
listen_addresses = '*'
this is for development so we aren't concerned with SSL quite yet

Chris Grass

01/10/2024, 6:49 PM
if you aren't concerned with ssl i think you can update the chart to be `options: sslmode=disable`

Quinn Miller

01/10/2024, 6:51 PM
I think that is the default option because I get the same original error message
I can dig into the image and templates to try to see how pg_hba.conf and postgresql.conf are getting set

Chris Grass

01/10/2024, 6:53 PM
i haven't had to update those files directly - it might be an RDS configuration thing (requiring SSL?)
i think i followed this post to get ssl working for me

Quinn Miller

01/10/2024, 7:33 PM
Fixed the issue: if SSL verification is turned off you can't use the default Postgres RDS parameter group; you need to use a custom one with `rds.force_ssl` set to `0` instead of the default `1`
Thank you to everyone who jumped in to lend a hand!
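
For anyone landing here later, a rough sketch of that fix with the AWS CLI, for development use only. The helper name `disable_force_ssl` and all argument values are hypothetical, and the parameter group family (for example `postgres15`) is an assumption that has to match the actual RDS engine version:

```shell
# Hypothetical helper: create a custom parameter group with rds.force_ssl=0
# and attach it to an RDS instance. Intended for dev environments only.
# $1 = new parameter group name
# $2 = parameter group family (assumed, e.g. postgres15)
# $3 = DB instance identifier
disable_force_ssl() {
  # Default parameter groups cannot be edited, so create a custom one
  aws rds create-db-parameter-group \
    --db-parameter-group-name "$1" \
    --db-parameter-group-family "$2" \
    --description "dev: rds.force_ssl disabled"
  # Turn off forced SSL in the custom group
  aws rds modify-db-parameter-group \
    --db-parameter-group-name "$1" \
    --parameters "ParameterName=rds.force_ssl,ParameterValue=0,ApplyMethod=immediate"
  # Point the instance at the custom group
  aws rds modify-db-instance \
    --db-instance-identifier "$3" \
    --db-parameter-group-name "$1"
}

# Usage:
# disable_force_ssl <PARAMETER_GROUP_NAME> <FAMILY> <DB_INSTANCE_ID>
```

The instance may need a reboot before the newly attached parameter group takes effect. For anything beyond development, keeping `rds.force_ssl` at `1` and configuring the client with the RDS CA bundle is the safer route.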