# ask-the-community
y
can you grab the logs from the failed init container?
also can you kubectl get service -n flyte
if there’s no obvious solution there, can always run a raw ubuntu container, install psql and investigate manually
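a throwaway debug pod along those lines might look like this (pod name and service name are illustrative; the service name matches the sandbox chart's default):

```shell
# Start a disposable Ubuntu pod in the flyte namespace (removed on exit)
kubectl run pg-debug -n flyte --rm -it --image=ubuntu -- bash

# Inside the pod: install the psql client and probe the service directly
apt-get update && apt-get install -y postgresql-client
psql -h flyte-sandbox-postgresql -p 5432 -U postgres
```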
also logs from the postgres container itself
you need to get the logs for the init container
-c wait-for-db
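something like this should pull both sets of logs (replace the placeholder with the actual failing Flyte pod name; the postgres pod name is from the sandbox chart):

```shell
# Logs from the wait-for-db init container of the failing pod
kubectl logs -n flyte <flyte-pod-name> -c wait-for-db

# Logs from the postgres container itself
kubectl logs -n flyte flyte-sandbox-postgresql-0
```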
logs from postgres container itself?
also install psql locally on your host machine.
and then kubectl port-forward postgres, see if that works. this should help isolate if the issue is with the service or with the container
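a sketch of that isolation test, assuming the pod name from the sandbox chart:

```shell
# Forward local port 5432 straight to the pod, bypassing the Service
kubectl port-forward -n flyte pod/flyte-sandbox-postgresql-0 5432:5432 &

# Then connect from the host with the locally installed psql.
# If this works but in-cluster access fails, the Service/CNI path is suspect.
psql -h 127.0.0.1 -p 5432 -U postgres
```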
i
about installing psql on the host machine: I got inside the container and, from there, I received a response on port 5432. Isn't that sufficient to determine whether the pod is working?
y
well postgres shouldn’t be crashlooping
so maybe fix that first… though not sure how
i haven’t seen that before
i
Agreed. I observed that postgres did not write anything to the PV path. On the other hand, when I got into my minikube cluster, there was a bunch of stuff inside the PV path. I tried changing the path permissions, but that did not work.
y
try getting the postgres logs with -p as well, to see whether the previous container instances exited with a different error message
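for reference, `-p`/`--previous` shows the logs of the prior (crashed) instance of a container:

```shell
# Logs from the previous, terminated instance of the postgres container
kubectl logs -n flyte flyte-sandbox-postgresql-0 -p
```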
i
sorry, I did not understand your last message.
y
so the container works if you port forward, but not from within the cluster?
i
this is really weird.
do you have any suggestions as to why it might happen?
y
can you paste the output for get service again?
and also
get pod -o yaml
for the postgres pod, and
get service -o yaml
for the postgres service
have not seen this before… feel like there’s just a simple incorrect configuration somewhere
i
get pod postgresql -o yaml
------------------------
```yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    cni.projectcalico.org/containerID: c1bbc73a376aecc7323494f7e8a9890c06a3ed8a89b72c5c8832f7412db085d4
    cni.projectcalico.org/podIP: 192.168.195.68/32
    cni.projectcalico.org/podIPs: 192.168.195.68/32
  creationTimestamp: "2023-10-11T12:39:44Z"
  generateName: flyte-sandbox-postgresql-
  labels:
    app.kubernetes.io/component: primary
    app.kubernetes.io/instance: flyte-sandbox
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: postgresql
    apps.kubernetes.io/pod-index: "0"
    controller-revision-hash: flyte-sandbox-postgresql-555fcbd8
    helm.sh/chart: postgresql-12.1.9
    statefulset.kubernetes.io/pod-name: flyte-sandbox-postgresql-0
  name: flyte-sandbox-postgresql-0
  namespace: flyte
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: flyte-sandbox-postgresql
    uid: c3ec310a-64e0-4bff-a645-da53121016d8
  resourceVersion: "2017"
  uid: 62e9297d-4594-4952-9755-e1d6b9543eaf
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchLabels:
              app.kubernetes.io/component: primary
              app.kubernetes.io/instance: flyte-sandbox
              app.kubernetes.io/name: postgresql
          topologyKey: kubernetes.io/hostname
        weight: 1
  containers:
  - env:
    - name: BITNAMI_DEBUG
      value: "false"
    - name: POSTGRESQL_PORT_NUMBER
      value: "5432"
    - name: POSTGRESQL_VOLUME_DIR
      value: /bitnami/postgresql
    - name: PGDATA
      value: /bitnami/postgresql/data
    - name: POSTGRES_PASSWORD
      valueFrom:
        secretKeyRef:
          key: postgres-password
          name: flyte-sandbox-postgresql
    - name: POSTGRESQL_ENABLE_LDAP
      value: "no"
    - name: POSTGRESQL_ENABLE_TLS
      value: "no"
    - name: POSTGRESQL_LOG_HOSTNAME
      value: "false"
    - name: POSTGRESQL_LOG_CONNECTIONS
      value: "false"
    - name: POSTGRESQL_LOG_DISCONNECTIONS
      value: "false"
    - name: POSTGRESQL_PGAUDIT_LOG_CATALOG
      value: "off"
    - name: POSTGRESQL_CLIENT_MIN_MESSAGES
      value: error
    - name: POSTGRESQL_SHARED_PRELOAD_LIBRARIES
      value: pgaudit
    image: docker.io/bitnami/postgresql:latest
    imagePullPolicy: IfNotPresent
    livenessProbe:
      exec:
        command:
        - /bin/sh
        - -c
        - exec pg_isready -U "postgres" -h 127.0.0.1 -p 5432
      failureThreshold: 6
      initialDelaySeconds: 30
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
    name: postgresql
    ports:
    - containerPort: 5432
      name: tcp-postgresql
      protocol: TCP
    readinessProbe:
      exec:
        command:
        - /bin/sh
        - -c
        - -e
        - |
          exec pg_isready -U "postgres" -h 127.0.0.1 -p 5432
          [ -f /opt/bitnami/postgresql/tmp/.initialized ] || [ -f /bitnami/postgresql/.initialized ]
      failureThreshold: 6
      initialDelaySeconds: 5
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
    resources:
      requests:
        cpu: 250m
        memory: 256Mi
    securityContext:
      runAsUser: 1001
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /bitnami/postgresql
      name: data
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-gdp78
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostname: flyte-sandbox-postgresql-0
  initContainers:
  - command:
    - /bin/sh
    - -ec
    - |
      chown 1001:1001 /bitnami/postgresql
      mkdir -p /bitnami/postgresql/data
      chmod 700 /bitnami/postgresql/data
      find /bitnami/postgresql -mindepth 1 -maxdepth 1 -not -name "conf" -not -name ".snapshot" -not -name "lost+found" | \
        xargs -r chown -R 1001:1001
    image: docker.io/bitnami/bitnami-shell:latest
    imagePullPolicy: IfNotPresent
    name: init-chmod-data
    resources: {}
    securityContext:
      runAsUser: 0
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /bitnami/postgresql
      name: data
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-gdp78
      readOnly: true
  nodeName: kmdlab11
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1001
  serviceAccount: default
  serviceAccountName: default
  subdomain: flyte-sandbox-postgresql-hl
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: flyte-sandbox-db-storage
  - name: kube-api-access-gdp78
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-10-11T12:40:09Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2023-10-11T12:40:19Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2023-10-11T12:40:19Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2023-10-11T12:40:09Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://b605d207a7b3c0ad76194982f093b8abd52af85b5905a17147aa659fb007bdbb
    image: docker.io/bitnami/postgresql:latest
    imageID: docker.io/bitnami/postgresql@sha256:c971ee380470048be39ef84436f46b90766937a6e082810309001ce43989a5ff
    lastState: {}
    name: postgresql
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2023-10-11T12:40:10Z"
  hostIP: 141.44.32.81
  initContainerStatuses:
  - containerID: containerd://88e4de7e05d1787b499ba3939c7d37e9525712941bb34365ae5e6bc963fd697d
    image: docker.io/bitnami/bitnami-shell:latest
    imageID: docker.io/bitnami/bitnami-shell@sha256:201f48da3d894ce5aaf22e042683319a0e7095fccc8ef069ef792eb06426b6b9
    lastState: {}
    name: init-chmod-data
    ready: true
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: containerd://88e4de7e05d1787b499ba3939c7d37e9525712941bb34365ae5e6bc963fd697d
        exitCode: 0
        finishedAt: "2023-10-11T12:40:09Z"
        reason: Completed
        startedAt: "2023-10-11T12:40:09Z"
  phase: Running
  podIP: 192.168.195.68
  podIPs:
  - ip: 192.168.195.68
  qosClass: Burstable
  startTime: "2023-10-11T12:40:09Z"
```
get svc postgresql -o yaml
------------
```yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    meta.helm.sh/release-name: flyte-sandbox
    meta.helm.sh/release-namespace: flyte
  creationTimestamp: "2023-10-11T12:39:44Z"
  labels:
    app.kubernetes.io/component: primary
    app.kubernetes.io/instance: flyte-sandbox
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: postgresql
    helm.sh/chart: postgresql-12.1.9
  name: flyte-sandbox-postgresql
  namespace: flyte
  resourceVersion: "1614"
  uid: 5e9079f3-7450-474e-b2a9-b17d3a1e1a17
spec:
  clusterIP: 10.106.227.185
  clusterIPs:
  - 10.106.227.185
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: tcp-postgresql
    nodePort: 30001
    port: 5432
    protocol: TCP
    targetPort: tcp-postgresql
  selector:
    app.kubernetes.io/component: primary
    app.kubernetes.io/instance: flyte-sandbox
    app.kubernetes.io/name: postgresql
  sessionAffinity: None
  type: NodePort
status:
  loadBalancer: {}
```
@Yee I am using the Calico CNI. I was curious whether Calico is behind this anomaly, so I tried installing Flannel. With Flannel, the containers remain stuck in ContainerCreating.
I just tried creating a simple ClusterIP service that selects the postgres pod via a different pod label, but the same thing is still happening.
```yaml
apiVersion: v1
kind: Service
metadata:
  name: flyte-postgresql
  namespace: flyte
spec:
  type: ClusterIP
  ports:
  - name: postgresql
    port: 5432
    protocol: TCP
  selector:
    bug: fix
```
y
and copy/paste
get service
again? (not the -o yaml, just the regular get)
i
I can reach this service from local psql by using port-forward, but not from inside the cluster
get svc -o yaml
```yaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2023-10-11T13:25:02Z"
  name: flyte-postgresql
  namespace: flyte
  resourceVersion: "4959"
  uid: 5004aa34-d7ae-4819-9bee-0f3d1539bc69
spec:
  clusterIP: 10.110.117.109
  clusterIPs:
  - 10.110.117.109
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: postgresql
    port: 5432
    protocol: TCP
    targetPort: 5432
  selector:
    bug: fix
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
```
y
this is the other service you set up?
i
yes
y
is that the right selector?
like it came through?
i
i added an additional label to the postgresql pod, just to see whether there is an issue with the service definition.
apparently, it does not work with a different service definition either.
y
but like when you
k get service flyte-postgresql
it showed that it found the pod?
i
yes, as I said, I added an additional label to the pod, so the pod is listed in the endpoints of this newly created service.
and when I forwarded the port, it worked, just like before. but it did not work from inside the cluster.
y
i don’t think i can help more, then. kinda at my limit on k8s networking knowledge.
could you maybe try with a different container? something other than postgres
i don’t think this is a flyte thing… feels like k8s cni thing
did you figure it out?
were you able to get any service running? if we can get a non-postgres container’s service working it’d be easier to track down.
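a quick way to try that, sketched with a plain nginx pod (all names here are illustrative):

```shell
# Run a plain nginx pod and expose it with a ClusterIP service
kubectl run nginx-test -n flyte --image=nginx --labels=app=nginx-test
kubectl expose pod nginx-test -n flyte --port=80 --name=nginx-test

# From a second pod in the cluster, hit the service DNS name.
# If this also fails, the problem is cluster networking, not postgres.
kubectl run curl-test -n flyte --rm -it --image=curlimages/curl -- \
  curl -sS http://nginx-test.flyte.svc.cluster.local
```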
i
So I have identified that it is indeed a network issue, probably some misconfiguration of the CNI. As this is a bare-metal cluster, it's extremely hard to get things right. Let's see.