#ask-the-community

Mike Ossareh

07/18/2023, 6:22 PM
flyte [1.218ms] [rows:1] SELECT count(*) FROM pg_indexes WHERE tablename = 'artifacts' AND indexname = 'artifacts_dataset_uuid_idx' AND schemaname = CURRENT_SCHEMA()
flyte 2023/07/18 18:19:44 /go/pkg/mod/gorm.io/gorm@v1.24.1-0.20221019064659-5dd2bb482755/callbacks.go:134 ERROR: duplicate key value violates unique constraint "executions_pkey" (SQLSTATE 23505)
Any suggestions on how to fix this? A failed update to flyte-binary 1.8.0 followed by a rollback to flyte-binary 1.6.2 seems to have left our system in a broken state.

Thomas Blom

07/18/2023, 7:08 PM
@Yee - do you think this is related to values that have not been updated in your Helm charts? In a different thread, you'd referred to version 1.8.0 of the flyte-binary (I'm still not completely clear on where to find that precise version number), and we have attempted to use pulumi/helm to update our binary from 1.6.2 to 1.8.0. The update failed with the error below. As @Mike Ossareh (who knows much more about this than I do, but is away at the moment) indicated above, a rollback to 1.6.2 leaves our flyte database in a broken state. Any ideas?
~  kubernetes:helm.sh/v3:Release informatics-001-install updating (6s) [diff: ~resourceNames,version]; error: 1 error occurred:
  
   ~  kubernetes:helm.sh/v3:Release informatics-001-install updating failed [diff: ~resourceNames,version]; error: 1 error occurred:
  
      pulumi:pulumi:Stack flyte-informatics-001 running error: update failed
  
      pulumi:pulumi:Stack flyte-informatics-001 failed 1 error
  
  Diagnostics:
    kubernetes:helm.sh/v3:Release (informatics-001-install):
  
      error: 1 error occurred:
      	* Helm release "flyte/flyte" failed to initialize completely. Use Helm CLI to investigate.: failed to become available within allocated timeout. Error: Helm Release flyte/flyte: cannot patch "flyte-flyte-binary" with kind Deployment: Deployment.apps "flyte-flyte-binary" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/component":"flyte-binary", "app.kubernetes.io/instance":"flyte", "app.kubernetes.io/name":"flyte-binary"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable
  
    pulumi:pulumi:Stack (flyte-informatics-001):
      error: update failed

Mike Ossareh

07/18/2023, 8:36 PM
I don’t have the time or bandwidth to look into these issues properly. The helm error Thomas lists above (we use pulumi to manage our stack, hence the format here) was fixed by deleting the flyte-binary deployment. I think the indentation fix from 2 months ago caused some spurious error from the POV of kubernetes.
Deleting it also fixed our flyte console, which had not been loading, so the database error seems to be fairly benign.

Yee

07/18/2023, 9:39 PM
the db error is a red herring i believe.
executions_pkey is the primary key of the executions table. the constraint is (execution_project, execution_domain, execution_name)
remind me again, which helm charts have you guys been using? just flyte-binary? or were you using the flyte or flyte-core helm charts at one point?
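
The composite key Yee describes can be illustrated with a small sketch. This is not Flyte's actual schema or code: it uses an in-memory SQLite database as a stand-in for PostgreSQL, keeps only the three key columns named above plus a hypothetical `phase` column, and uses made-up project, domain, and execution names. It only shows why re-inserting an existing (project, domain, name) triple is rejected while the existing row stays intact, which is consistent with the error being noisy rather than destructive.

```python
import sqlite3

# In-memory SQLite stands in for the Postgres database (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE executions (
        execution_project TEXT NOT NULL,
        execution_domain  TEXT NOT NULL,
        execution_name    TEXT NOT NULL,
        phase             TEXT,  -- hypothetical extra column
        PRIMARY KEY (execution_project, execution_domain, execution_name)
    )
    """
)


def insert_execution(project, domain, name, phase="RUNNING"):
    """Insert one row; return False if the composite key already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO executions VALUES (?, ?, ?, ?)",
                (project, domain, name, phase),
            )
        return True
    except sqlite3.IntegrityError:
        # SQLite's counterpart of Postgres SQLSTATE 23505 (unique violation).
        return False


first = insert_execution("flytesnacks", "development", "abc123")
second = insert_execution("flytesnacks", "development", "abc123")  # same triple
third = insert_execution("flytesnacks", "production", "abc123")    # different domain
print(first, second, third)
```

Only the second insert fails, because all three key columns match an existing row.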
m

Mike Ossareh

07/18/2023, 9:41 PM
@Yee agree the db issue seems benign. We’ve only ever used flyte-binary. I think the issue is related to a merge by jeev ~2 weeks ago. He changed the indentation on labels, and even though the rendered output is the same, k8s is freaking out about it.
I’m not 100% convinced, but it’s the only release in the time since we last updated that mucks with the thing that is failing.

Yee

07/18/2023, 9:43 PM
i saw this pr
the old indentation is wrong though
8 is correct
k8s is freaking out? or pulumi?

Mike Ossareh

07/18/2023, 9:43 PM
k8s
it was rejecting the update due to the Deployment resource changing

Yee

07/18/2023, 9:44 PM
cuz labels are immutable?
if anything should be mutable it’s labels

Mike Ossareh

07/18/2023, 9:46 PM
strong agree
just to re-iterate - I did not get forensic on this because it’s not my focus rn. I was in “just get it working”-mode.

Yee

07/18/2023, 9:46 PM
yeah sure

Mike Ossareh

07/18/2023, 9:46 PM
I too am keenly interested in what was happening…

Yee

07/18/2023, 9:46 PM
and the match labels didn’t change in that pr

Mike Ossareh

07/18/2023, 9:48 PM
but maybe they went from [] => nil for some reason? or vice versa?
also.. as far as I’m concerned - k8s is oblivious to whitespace. So like… I’m even more confused.

Yee

07/18/2023, 9:52 PM
k8s may be but yaml is definitely not. it’s very whitespace dependent wrt indents

Mike Ossareh

07/18/2023, 9:52 PM
yah, for sure, but I’d expect the comparison of whether changes have actually occurred to be based on go structs and not the actual yaml.
tbf, I don’t muck with indentation that much - so this is just expectation and speculation on my part. Not knowledge or experience.
ugh… I got interested now. Here’s the change set of the file between the versions we were using: https://github.com/flyteorg/flyte/compare/v1.6.2...v1.8.0#diff-bc8270245ec2c05798daddb9a1a8c7261db27f70573a43fa1057d9dec7c2f416

jeev

07/19/2023, 1:14 AM
i added a component label to the flyte-binary deployment (to differentiate between flyte-binary and flyteagent now that we have multiple deployments). that’s likely it based on the log above. it should be easy to recover by dropping the old deployment and rerunning helm - minimal downtime and no loss. but i don’t recall seeing this issue when upgrading our dev env. i’ll have to confirm again. i don’t see why helm couldn’t just handle this.
alas i can reproduce this now.
the easiest way to resolve this is to recreate the deployment (just drop the old deployment and let helm/pulumi take over) @Mike Ossareh @Thomas Blom
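
A rough sketch of the diagnosis and recovery jeev describes, for illustration only. The new selector is copied from the error message earlier in this thread. The old selector is an assumption (the same labels minus the component label jeev says he added). The namespace, release, and deployment names are read off the error text, and the chart reference `flyteorg/flyte-binary` is a hypothetical repo alias that depends on how the Helm repo was added locally. The script only prints the commands; it does not run them.

```python
# Selector from the error message in this thread (what the newer chart renders).
NEW_SELECTOR = {
    "app.kubernetes.io/component": "flyte-binary",
    "app.kubernetes.io/instance": "flyte",
    "app.kubernetes.io/name": "flyte-binary",
}

# ASSUMPTION: the 1.6.2 selector was identical except for the component label.
OLD_SELECTOR = {
    key: value
    for key, value in NEW_SELECTOR.items()
    if key != "app.kubernetes.io/component"
}


def selector_diff(old, new):
    """Return the label keys that were added, removed, or changed."""
    shared = set(old) & set(new)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(key for key in shared if old[key] != new[key]),
    }


def recovery_commands(namespace, deployment, release, chart):
    """Build the two commands: delete the Deployment, then rerun Helm."""
    return [
        ["kubectl", "delete", "deployment", deployment, "--namespace", namespace],
        ["helm", "upgrade", release, chart, "--namespace", namespace],
    ]


diff = selector_diff(OLD_SELECTOR, NEW_SELECTOR)
if any(diff.values()):
    # Any difference in spec.selector is rejected on update, so the
    # Deployment has to be recreated rather than patched.
    print("selector differs:", diff)
    commands = recovery_commands(
        namespace="flyte",                # from 'Helm release "flyte/flyte"'
        deployment="flyte-flyte-binary",  # from the error message
        release="flyte",
        chart="flyteorg/flyte-binary",    # hypothetical chart reference
    )
    for command in commands:
        print(" ".join(command))
```

With Pulumi managing the release, as in this thread, the second step would be rerunning `pulumi up` rather than calling `helm upgrade` directly.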

Thomas Blom

07/19/2023, 1:41 PM
Thanks @jeev for the followup!

Mike Ossareh

07/19/2023, 4:09 PM
thanks @jeev, that's exactly what we ended up doing!

Brian Tang

10/03/2023, 9:40 AM
having the exact same issue here - upgrade to 1.8.1 failed and rollback to 1.7.0 left the current flyte-binary in shambles. @jeev what do you mean by “drop the old deployment and let helm take over”? we tried deleting the deployment and ran helm upgrade again, but we’re still seeing the readiness probe fail

jeev

10/03/2023, 3:50 PM
well if you were able to helm upgrade and the deployment came up with the correct selectors, it’s a different issue i think
anything in the flyte-binary logs?