# flyte-support
h
Hey all, we’re facing an issue with the Flyte garbage collector. Previously, it worked fine with the default `gc-interval` (30m) without us configuring anything explicitly. Recently, however, we’ve noticed that pods marked as `Succeeded` are lingering for much longer than expected. I tried setting `gc-interval` explicitly in our ConfigMap, but it doesn’t seem to take effect:
data:
  000-core.yaml: |
    admin:
      endpoint: localhost:8089
      insecure: true
    catalog-cache:
      endpoint: localhost:8081
      insecure: true
      type: datacatalog
    cluster_resources:
      standaloneDeployment: false
      templatePath: /etc/flyte/cluster-resource-templates
    logger:
      show-source: true
      level: 1
    propeller:
      create-flyteworkflow-crd: true
      gc-interval: 1m
    webhook:
      certDir: /var/run/flyte/certs
      localCert: true
      secretName: flyte-binary-webhook-secret
      serviceName: flyte-binary-webhook
      servicePort: 443
    flyte:
      admin:
        disableClusterResourceManager: false
        disableScheduler: false
        disabled: false
        seedProjects:
        - flytesnacks
      dataCatalog:
        disabled: false
      propeller:
        disableWebhook: false
        disabled: false
        gc-interval: 1m
When I set `delete-resource-on-finalize: true`, the pods are deleted immediately, but that’s not ideal since we need some time to debug. Has anyone faced a similar issue? Is there something I’m missing in the configuration, or another setting I should adjust? How can I debug this to understand why the garbage collector isn’t working as expected? Thanks in advance for your help!
a
If you're using `flyte-core`, this should be set in `configmap.core.propeller.gc-interval`. Could you get logs from the `flytepropeller` Pod? Maybe there are hints there.
h
Hey David, thanks for your response! We’re actually using `flyte-binary`, not `flyte-core`. In the `flyte-binary` pod, the only log I see is:
`metadata.finalizers: "flyte-finalizer": prefer a domain-qualified finalizer name to avoid accidental conflicts with other finalizer writers`
Appreciate your help!
a
I think that warning message can be safely ignored (context). For binary, then, the setting should go in `configuration.propeller.gc-interval`.
h
Thanks for your help! We were able to solve it. It was the `max-ttl-hours` setting. We were using the default value (23 hours), and now we’ve decreased it.
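For anyone landing here later, a rough sketch of what the fix might look like in the rendered config. This assumes `max-ttl-hours` sits in the same `propeller:` block of `000-core.yaml` as `gc-interval` in the ConfigMap posted above. The value `1` is only an illustration, not the value we actually chose. As far as I understand it, `gc-interval` controls how often the garbage collector runs, while `max-ttl-hours` controls how long completed workflows are kept before they become eligible for collection, which would explain why lowering `gc-interval` alone changed nothing.

```yaml
# Sketch only: placement assumed to match the propeller block shown earlier in this thread.
propeller:
  create-flyteworkflow-crd: true
  # How often the garbage collector runs (default 30m per the thread).
  gc-interval: 30m
  # How long completed workflows are retained, in hours (default 23 per the thread).
  # Illustrative value; pick one that leaves enough time to debug.
  max-ttl-hours: 1
```

Keeping `delete-resource-on-finalize` unset and tuning `max-ttl-hours` instead leaves finished pods around for a bounded debugging window rather than deleting them immediately.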
a
that's good to know. Thanks for sharing!