abundant-laptop-47033
04/13/2022, 7:41 PM
freezing-airport-6809
inject-finalizer or set maxParallelism
freezing-airport-6809
freezing-airport-6809
abundant-laptop-47033
04/13/2022, 8:12 PM
inject-finalizer more!
I set maxParallelism to 3 (from 25) and it did not help. Am I right in my understanding that setting it lower would help, as it would limit the number of nodes per workflow that propeller launches?
abundant-laptop-47033
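The question above assumes that maxParallelism caps how many nodes run at once within a single workflow execution. A minimal Python sketch of that assumption follows. It is a toy model, not propeller's scheduler, and the function name and execution count are made up for illustration. If the cap really is per execution, the cluster-wide total still scales with the number of executions in flight, which is one possible reason a lower cap alone might not help.

```python
def max_concurrent_nodes(executions: int, max_parallelism: int) -> int:
    """Upper bound on concurrently running nodes if the cap applies per execution.

    Assumption: each workflow execution is limited independently, as the
    question describes. This is not taken from propeller's source.
    """
    if executions < 0 or max_parallelism < 1:
        raise ValueError("executions must be >= 0 and max_parallelism >= 1")
    return executions * max_parallelism


# Hypothetical load: 40 executions in flight at the same time.
before = max_concurrent_nodes(executions=40, max_parallelism=25)
after = max_concurrent_nodes(executions=40, max_parallelism=3)
print(before, after)
```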
04/19/2022, 12:55 AM
inject-finalizer suggestion, I tried it out and my original issue seems solved, but now I have a backlog of pods that are stuck in Terminating (for 4+ hours so far) and still have the finalizer on them. Have you seen this before?
freezing-airport-6809
freezing-airport-6809
abundant-laptop-47033
04/19/2022, 5:06 PM
hallowed-mouse-14616
04/19/2022, 6:18 PM
inject-finalizer configuration and that fixed it, only now the Flyte pods are taking a very long time to terminate? And the pods still have the "flyte" finalizer set on them?
abundant-laptop-47033
04/19/2022, 6:19 PM
hallowed-mouse-14616
04/19/2022, 6:20 PM
hallowed-mouse-14616
04/19/2022, 6:23 PM
abundant-laptop-47033
04/19/2022, 6:23 PM
hallowed-mouse-14616
04/19/2022, 6:27 PM
hallowed-mouse-14616
04/19/2022, 6:29 PM
abundant-laptop-47033
04/19/2022, 6:32 PM
abundant-laptop-47033
04/19/2022, 6:33 PM
apiVersion: v1
items:
- apiVersion: v1
  kind: Pod
  metadata:
    annotations:
      cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    creationTimestamp: "2022-04-19T07:30:49Z"
    deletionGracePeriodSeconds: 0
    deletionTimestamp: "2022-04-19T10:36:03Z"
    finalizers:
    - flyte/flytek8s
    labels:
      domain: main
      execution-id: f11qb62y-n1-0-dn0-0
      interruptible: "true"
      node-id: n2
      project: sunflower
      shard-key: "4"
      task-name: sunflower-workflows-flyte-workflows-trim-fastqs-and-align-workf
      workflow-name: sunflower-workflows-flyte-workflows-trim-fastqs-and-align-workf
    name: f11qb62y-n1-0-dn0-0-n2-0
    namespace: sunflower-main
    ownerReferences:
    - apiVersion: flyte.lyft.com/v1alpha1
      blockOwnerDeletion: true
      controller: true
      kind: flyteworkflow
      name: f11qb62y-n1-0-dn0-0
      uid: e0525cc8-17c2-463b-a5dd-342776c69bbd
    resourceVersion: "266395947"
    uid: 393c4d2b-42ad-4061-8c95-a5d4f6eabb67
  spec:
    affinity:
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - preference:
            matchExpressions:
            - key: cloud.google.com/gke-preemptible
              operator: Exists
          weight: 1
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: k8s.freenome.net/node-role
              operator: In
              values:
              - flyte-worker
            - key: cloud.google.com/gke-preemptible
              operator: In
              values:
              - "true"
    containers:
    - args:
      - pyflyte-execute
      - --inputs
      - gs://freenome-orchid-staging-flyte-data/metadata/propeller/sunflower-main-f11qb62y-n1-0-dn0-0/n2/data/inputs.pb
      - --output-prefix
      - gs://freenome-orchid-staging-flyte-data/metadata/propeller/sunflower-main-f11qb62y-n1-0-dn0-0/n2/data/0
      - --raw-output-data-prefix
      - gs://freenome-orchid-staging-flyte-data/ak/f11qb62y-n1-0-dn0-0-n2-0
      - --resolver
      - flytekit.core.python_auto_container.default_task_resolver
      - --
      - task-module
      - sunflower.workflows.flyte_workflows.trim_fastqs_and_align_workflow
      - task-name
      - align
      env:
      - name: FLYTE_INTERNAL_CONFIGURATION_PATH
        value: /usr/src/app/sunflower/workflows/config/workflows.config
      - name: FLYTE_INTERNAL_IMAGE
        value: gcr.io/freenome-build/ap/sunflower:20220414.2
      - name: FLYTE_INTERNAL_EXECUTION_WORKFLOW
        value: sunflower:main:sunflower.workflows.flyte_workflows.trim_fastqs_and_align_workflow.trim_fastqs_and_align_wf
      - name: FLYTE_INTERNAL_EXECUTION_ID
        value: f11qb62y-n1-0-dn0-0
      - name: FLYTE_INTERNAL_EXECUTION_PROJECT
        value: sunflower
      - name: FLYTE_INTERNAL_EXECUTION_DOMAIN
        value: main
      - name: FLYTE_ATTEMPT_NUMBER
        value: "0"
      - name: FLYTE_INTERNAL_TASK_PROJECT
        value: sunflower
      - name: FLYTE_INTERNAL_TASK_DOMAIN
        value: main
      - name: FLYTE_INTERNAL_TASK_NAME
        value: sunflower.workflows.flyte_workflows.trim_fastqs_and_align_workflow.align
      - name: FLYTE_INTERNAL_TASK_VERSION
        value: "20220414.2"
      - name: FLYTE_INTERNAL_PROJECT
        value: sunflower
      - name: FLYTE_INTERNAL_DOMAIN
        value: main
      - name: FLYTE_INTERNAL_NAME
        value: sunflower.workflows.flyte_workflows.trim_fastqs_and_align_workflow.align
      - name: FLYTE_INTERNAL_VERSION
        value: "20220414.2"
      - name: SUNFLOWER_STATIC_DATA_GCS_PATH
        value: gs://freenome-orchid-staging-static-data
      image: gcr.io/freenome-build/ap/sunflower:20220414.2
      imagePullPolicy: IfNotPresent
      name: f11qb62y-n1-0-dn0-0-n2-0
      resources:
        limits:
          cpu: "16"
          ephemeral-storage: 52Gi
          memory: 57Gi
        requests:
          cpu: "16"
          ephemeral-storage: 26Gi
          memory: 57Gi
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: FallbackToLogsOnError
      volumeMounts:
      - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
        name: kube-api-access-6h4gh
        readOnly: true
    dnsPolicy: ClusterFirst
    enableServiceLinks: true
    nodeName: gke-orchid-west1-flyte-worker-a3b955b5-pccx
    preemptionPolicy: PreemptLowerPriority
    priority: 0
    restartPolicy: Never
    schedulerName: default-scheduler
    securityContext: {}
    serviceAccount: default
    serviceAccountName: default
    terminationGracePeriodSeconds: 30
    tolerations:
    - effect: NoSchedule
      key: k8s.freenome.net/node-role
      operator: Equal
      value: flyte-worker
    - effect: NoExecute
      key: node.kubernetes.io/not-ready
      operator: Exists
      tolerationSeconds: 300
    - effect: NoExecute
      key: node.kubernetes.io/unreachable
      operator: Exists
      tolerationSeconds: 300
    volumes:
    - name: kube-api-access-6h4gh
      projected:
        defaultMode: 420
        sources:
        - serviceAccountToken:
            expirationSeconds: 3607
            path: token
        - configMap:
            items:
            - key: ca.crt
              path: ca.crt
            name: kube-root-ca.crt
        - downwardAPI:
            items:
            - fieldRef:
                apiVersion: v1
                fieldPath: metadata.namespace
              path: namespace
  status:
    conditions:
    - lastProbeTime: null
      lastTransitionTime: "2022-04-19T07:30:49Z"
      reason: PodCompleted
      status: "True"
      type: Initialized
    - lastProbeTime: null
      lastTransitionTime: "2022-04-19T08:24:46Z"
      reason: PodCompleted
      status: "False"
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: "2022-04-19T08:24:46Z"
      reason: PodCompleted
      status: "False"
      type: ContainersReady
    - lastProbeTime: null
      lastTransitionTime: "2022-04-19T07:30:49Z"
      status: "True"
      type: PodScheduled
    containerStatuses:
    - containerID: containerd://7a10fb36a46f0c245ee20736b470f2dd271755c0d50eb5b9c0e38e0a801b6d20
      image: gcr.io/freenome-build/ap/sunflower:20220414.2
      imageID: gcr.io/freenome-build/ap/sunflower@sha256:c1029318c0902a1a7c69cc9252c11c0c745d6cc5433857bc06d83b98885742a5
      lastState: {}
      name: f11qb62y-n1-0-dn0-0-n2-0
      ready: false
      restartCount: 0
      started: false
      state:
        terminated:
          containerID: containerd://7a10fb36a46f0c245ee20736b470f2dd271755c0d50eb5b9c0e38e0a801b6d20
          exitCode: 0
          finishedAt: "2022-04-19T08:24:45Z"
          reason: Completed
          startedAt: "2022-04-19T07:30:49Z"
    hostIP: 172.31.0.27
    phase: Succeeded
    podIP: 172.20.53.23
    podIPs:
    - ip: 172.20.53.23
    qosClass: Guaranteed
    startTime: "2022-04-19T07:30:49Z"
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
abundant-laptop-47033
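The pod dump above shows the pattern being discussed: phase Succeeded, a deletionTimestamp already set, and the flyte/flytek8s finalizer still present. A minimal Python sketch of how one might flag such pods from parsed `kubectl get pods -o json` output follows. The helper names and the "now" value are hypothetical, and only the fields visible in the dump are used.

```python
from datetime import datetime, timezone

# Finalizer name exactly as it appears in the pod dump above.
FLYTE_FINALIZER = "flyte/flytek8s"


def parse_k8s_time(value: str) -> datetime:
    """Parse a Kubernetes timestamp such as 2022-04-19T10:36:03Z (UTC)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def stuck_on_finalizer(pod: dict, now: datetime, finalizer: str = FLYTE_FINALIZER):
    """Return time elapsed since deletionTimestamp if `finalizer` still blocks deletion.

    Returns None when the pod is not being deleted or no longer carries the finalizer.
    """
    meta = pod.get("metadata", {})
    deleted_at = meta.get("deletionTimestamp")
    if deleted_at is None or finalizer not in meta.get("finalizers", []):
        return None
    return now - parse_k8s_time(deleted_at)


# Fields copied from the dump; everything else is omitted for brevity.
pod = {
    "metadata": {
        "name": "f11qb62y-n1-0-dn0-0-n2-0",
        "deletionTimestamp": "2022-04-19T10:36:03Z",
        "finalizers": ["flyte/flytek8s"],
    },
    "status": {"phase": "Succeeded"},
}

# Hypothetical observation time, chosen only for illustration.
now = parse_k8s_time("2022-04-19T18:33:00Z")
waiting = stuck_on_finalizer(pod, now)
print(pod["metadata"]["name"], waiting)
```

A pod for which this returns a large duration is waiting on whichever controller owns the finalizer to remove it, so the check only identifies candidates and does not explain why the finalizer was left in place.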
04/19/2022, 6:42 PM
abundant-laptop-47033
04/19/2022, 6:42 PM
hallowed-mouse-14616
04/19/2022, 7:17 PM
abundant-laptop-47033
04/19/2022, 7:22 PM
hallowed-mouse-14616
04/19/2022, 7:26 PM
hallowed-mouse-14616
04/19/2022, 7:26 PM
abundant-laptop-47033
04/19/2022, 7:30 PM
hallowed-mouse-14616
04/19/2022, 7:46 PM
high-park-82026
hallowed-mouse-14616
04/19/2022, 9:41 PM
hallowed-mouse-14616
04/19/2022, 9:41 PM
hallowed-mouse-14616
04/19/2022, 9:42 PM
abundant-laptop-47033
04/19/2022, 9:46 PM
launch_plan = LaunchPlan.create("name", workflow) and then calling launch_plan(). Should we instead just be calling workflow() directly?
hallowed-mouse-14616
04/19/2022, 10:02 PM
abundant-laptop-47033
04/19/2022, 10:04 PM
abundant-laptop-47033
04/19/2022, 10:07 PM
abundant-laptop-47033
04/20/2022, 12:23 AM
freezing-airport-6809
abundant-laptop-47033
04/20/2022, 4:11 AM
freezing-airport-6809
high-park-82026
abundant-laptop-47033
04/21/2022, 2:12 PM
hallowed-mouse-14616
04/21/2022, 2:16 PM
abundant-laptop-47033
04/26/2022, 8:26 PM
abundant-laptop-47033
04/26/2022, 8:26 PM
hallowed-mouse-14616
04/27/2022, 8:52 AM