flyte [1.218ms] [rows:1] SELECT count(*) FROM p...
# ask-the-community
m
flyte [1.218ms] [rows:1] SELECT count(*) FROM pg_indexes WHERE tablename = 'artifacts' AND indexname = 'artifacts_dataset_uuid_idx' AND schemaname = CURRENT_SCHEMA()
flyte 2023/07/18 18:19:44 /go/pkg/mod/gorm.io/gorm@v1.24.1-0.20221019064659-5dd2bb482755/callbacks.go:134 ERROR: duplicate key value violates unique constraint "executions_pkey" (SQLSTATE 23505)
any suggestions on how to fix this? A failed update to flyte-binary 1.8.0 and a rollback to flyte-binary 1.6.2 seem to have left our system in a broken state.
t
@Yee - do you think this is related to values that have not been updated in your Helm charts? In a different thread, you'd referred to version 1.8.0 of the flyte-binary (I'm still not completely clear on where to find that precise version number), and we have attempted to use pulumi/helm to update our binary from 1.6.2 to 1.8.0. The update failed with the error below. As @Mike Ossareh (who knows much more about this than I do, but is away at the moment) indicated above, a rollback to 1.6.2 leaves our flyte database in a broken state. Any ideas?
~  kubernetes:helm.sh/v3:Release informatics-001-install updating (6s) [diff: ~resourceNames,version]; error: 1 error occurred:
  
   ~  kubernetes:helm.sh/v3:Release informatics-001-install **updating failed** [diff: ~resourceNames,version]; error: 1 error occurred:
  
      pulumi:pulumi:Stack flyte-informatics-001 running error: update failed
  
      pulumi:pulumi:Stack flyte-informatics-001 **failed** 1 error
  
  Diagnostics:
    kubernetes:helm.sh/v3:Release (informatics-001-install):
  
      error: 1 error occurred:
      	* Helm release "flyte/flyte" failed to initialize completely. Use Helm CLI to investigate.: failed to become available within allocated timeout. Error: Helm Release flyte/flyte: cannot patch "flyte-flyte-binary" with kind Deployment: Deployment.apps "flyte-flyte-binary" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/component":"flyte-binary", "app.kubernetes.io/instance":"flyte", "app.kubernetes.io/name":"flyte-binary"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable
  
    pulumi:pulumi:Stack (flyte-informatics-001):
      error: update failed
m
I don’t have the time or bandwidth to look into these issues properly. The helm error Thomas lists above (we use pulumi to manage our stack, hence the format here) was fixed by deleting the flyte-binary deployment. I think the indentation fix from 2 months ago caused some spurious error from the POV of kubernetes.
This fixed our flyte console failing to load, so it seems the database error is somewhat benign.
y
the db error is a red herring i believe.
executions_pkey is the primary key of the executions table. the constraint is (execution_project, execution_domain, execution_name)
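To illustrate what a violation of that composite key looks like, here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in for Postgres, reduces the table to the three key columns named above, and uses hypothetical row values; the real Flyte schema has many more columns, and Postgres reports the failure as SQLSTATE 23505 rather than SQLite's `IntegrityError` message.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Postgres, and the table is
# reduced to the three key columns from the constraint described above.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE executions (
        execution_project TEXT NOT NULL,
        execution_domain  TEXT NOT NULL,
        execution_name    TEXT NOT NULL,
        PRIMARY KEY (execution_project, execution_domain, execution_name)
    )
    """
)

# Hypothetical values, for illustration only.
row = ("demo-project", "development", "exec-001")
conn.execute("INSERT INTO executions VALUES (?, ?, ?)", row)

# Inserting the same (project, domain, name) triple again violates the key.
try:
    conn.execute("INSERT INTO executions VALUES (?, ?, ?)", row)
    duplicate_rejected = False
except sqlite3.IntegrityError as exc:
    duplicate_rejected = True
    print(f"rejected: {exc}")

# A row that differs in any one key column is accepted.
conn.execute(
    "INSERT INTO executions VALUES (?, ?, ?)",
    ("demo-project", "development", "exec-002"),
)
row_count = conn.execute("SELECT count(*) FROM executions").fetchone()[0]
```

The rejected insert leaves the existing row untouched, which is consistent with the error being noisy rather than destructive.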
remind me again, which helm charts have you guys been using? just flyte-binary? or were you using the flyte or flyte-core helm charts at one point?
m
@Yee agree the db issue seems benign. We’ve only ever used flyte-binary. I think the issue is related to a merge by jeev ~2 weeks ago. He changed the indentation on labels, and even though the output is the same, k8s is freaking out about the indentation.
I’m not 100% convinced, but it’s the only release in the time since we last updated that mucks with the thing that is failing.
y
i saw this pr
the old indentation is wrong though
8 is correct
k8s is freaking out? or pulumi?
m
k8s
it was rejecting the update due to the Deployment resource changing
y
cuz labels are immutable?
if anything should be mutable it’s labels
m
strong agree
just to re-iterate - I did not get forensic on this because it’s not my focus rn. I was in “just get it working”-mode.
y
yeah sure
m
I too am keenly interested in what was happening…
y
and the match labels didn’t change in that pr
m
but maybe they went from [] => nil for some reason? or vice versa?
also.. as far as I’m concerned - k8s is oblivious to whitespace. So like… I’m even more confused.
y
k8s may be but yaml is definitely not. it’s very whitespace dependent wrt indents
m
yah, for sure, but I’d expect the comparison of whether changes have actually occurred to be based on go structs and not the actual yaml.
tbf, I don’t muck with indentation that much - so this is just expectation and speculation on my part. Not knowledge or experience.
ugh… I got interested now. Here’s the change set of the file between the versions we were using: https://github.com/flyteorg/flyte/compare/v1.6.2...v1.8.0#diff-bc8270245ec2c05798daddb9a1a8c7261db27f70573a43fa1057d9dec7c2f416
j
i added a component label to the flyte-binary deployment (to differentiate between flyte-binary and flyteagent now that we have multiple deployments). that’s likely it based on the log above. it should be easy to recover by dropping the old deployment and rerunning helm - minimal downtime and no loss. but i don’t recall seeing this issue when upgrading our dev env. i’ll have to confirm again. i don’t see why helm couldn’t just handle this.
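A rough sketch of why an added label trips the immutability check while whitespace does not: the comparison is between parsed key/value maps, not YAML text. The new selector below is copied from the error log above; the old selector is an assumption, inferred from the statement that the component label was the addition. `selector_diff` is a hypothetical helper, not part of Kubernetes or Helm.

```python
def selector_diff(old, new):
    """Return the keys added, removed, or changed between two matchLabels maps."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Assumption: the pre-upgrade selector, i.e. the new one minus the component label.
old_selector = {
    "app.kubernetes.io/instance": "flyte",
    "app.kubernetes.io/name": "flyte-binary",
}

# Taken from the spec.selector value in the error message.
new_selector = {
    "app.kubernetes.io/component": "flyte-binary",
    "app.kubernetes.io/instance": "flyte",
    "app.kubernetes.io/name": "flyte-binary",
}

diff = selector_diff(old_selector, new_selector)
selector_is_unchanged = old_selector == new_selector
print(diff)
```

Any non-empty diff here means the Deployment's `spec.selector` would change, which the API server rejects as an update to an immutable field.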
alas i can reproduce this now.
the easiest way to resolve this is to recreate the deployment (just drop the old deployment and let helm/pulumi take over) @Mike Ossareh @Thomas Blom
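The recovery described above could be scripted roughly as follows. This is a sketch under assumptions, not a verified procedure: the Deployment name `flyte-flyte-binary` and release name `flyte` come from the error log, while the namespace `flyte`, the chart reference `flyteorg/flyte-binary`, and the version string are guesses to be replaced with your own values. `recovery_commands` and `run` are hypothetical helpers, and the default is a dry run that only prints the commands.

```python
import shlex
import subprocess


def recovery_commands(namespace, deployment, release, chart, version):
    """Build the two commands: drop the old Deployment, then re-run the upgrade."""
    delete_cmd = [
        "kubectl", "delete", "deployment", deployment,
        "--namespace", namespace,
    ]
    upgrade_cmd = [
        "helm", "upgrade", release, chart,
        "--namespace", namespace,
        "--version", version,
    ]
    return [delete_cmd, upgrade_cmd]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


commands = recovery_commands(
    namespace="flyte",                # assumption
    deployment="flyte-flyte-binary",  # from the error log
    release="flyte",                  # from the error log
    chart="flyteorg/flyte-binary",    # assumption
    version="v1.8.0",                 # assumption: chart version string
)
run(commands)  # dry run: prints the commands without executing them
```

If the stack is managed through pulumi, as in this thread, the second step would be re-running the pulumi update instead of calling `helm upgrade` directly. A custom values file, if you use one, would also need to be passed to the upgrade.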
t
Thanks @jeev for the followup!
m
thanks @jeev, that's exactly what we ended up doing!
b
having the exact same issue here - upgrade to 1.8.1 failed and rollback to 1.7.0 left the current flyte-binary in shambles. @jeev what do you mean by “_drop the old deployment and let helm take over_”? we tried to delete the deployment, and ran helm upgrade again but we’re still seeing issues with the readiness probe failing
j
well if you were able to helm upgrade and the deployment came up with the correct selectors, it's a different issue i think
anything in the flyte-binary logs?