• JP Kosymna
    6 months ago
    I installed flyte v0.19.1 on AWS using opta, but I can't seem to get logging to work. I created a log group in AWS named flyte-prod-tasks-logs and updated the values-eks.yaml file accordingly. In opta/aws/flyte.yaml I added the following:
    task_logs:
        plugins:
            logs:
                cloudwatch-enabled: true
                cloudwatch-log-group: flyte-prod-tasks-logs
                cloudwatch-region: "{vars.region}"
    My workflow runs successfully, but I don't see anything in the log group. My ShellTask prints to standard out and I have a PythonTask that uses logging. Is there anything special I need to do, or some configuration I missed?
  • > opta version
    v0.22.1
    > terraform --version
    Terraform v1.0.11
    on linux_amd64
• Ketan (kumare3)
    6 months ago
  • Can you paste the contents of:
    kubectl get cm flyte-propeller-config -n flyte -o yaml
  • cc @Haytham Abuelfutuh 🙂
• JP Kosymna
    6 months ago
    apiVersion: v1
    data:
      admin.yaml: |
        admin:
          clientId: 'flytepropeller'
          clientSecretLocation: /etc/secrets/client_secret
          endpoint: flyteadmin:81
          insecure: true
        event:
          capacity: 1000
          rate: 500
          type: admin
      cache.yaml: |
        cache:
          max_size_mbs: 1024
          target_gc_percent: 70
      catalog.yaml: |
        catalog-cache:
          endpoint: datacatalog:89
          insecure: true
          type: datacatalog
      copilot.yaml: |
        plugins:
          k8s:
            co-pilot:
              image: cr.flyte.org/flyteorg/flytecopilot:v0.0.24
              name: flyte-copilot-
              start-timeout: 30s
      core.yaml: |
        manager:
          pod-application: flytepropeller
          pod-template-container-name: flytepropeller
          pod-template-name: flytepropeller-template
        propeller:
          downstream-eval-duration: 30s
          enable-admin-launcher: true
          gc-interval: 12h
          kube-client-config:
            burst: 25
            qps: 100
            timeout: 30s
          leader-election:
            enabled: true
            lease-duration: 15s
            lock-config-map:
              name: propeller-leader
              namespace: flyte
            renew-deadline: 10s
            retry-period: 2s
          limit-namespace: all
          max-workflow-retries: 50
          metadata-prefix: metadata/propeller
          metrics-prefix: flyte
          prof-port: 10254
          queue:
            batch-size: -1
            batching-interval: 2s
            queue:
              base-delay: 5s
              capacity: 1000
              max-delay: 120s
              rate: 100
              type: maxof
            sub-queue:
              capacity: 1000
              rate: 100
              type: bucket
            type: batch
          rawoutput-prefix: s3://flyte-prod-service-flyte
          workers: 40
          workflow-reeval-duration: 30s
        webhook:
          certDir: /etc/webhook/certs
          serviceName: flyte-pod-webhook
      enabled_plugins.yaml: |
        tasks:
          task-plugins:
            default-for-task-types:
              container: container
              container_array: k8s-array
              sidecar: sidecar
            enabled-plugins:
            - container
            - sidecar
            - k8s-array
      k8s.yaml: |
        plugins:
          k8s:
            default-cpus: 100m
            default-env-vars: []
            default-memory: 100Mi
      logger.yaml: |
        logger:
          level: 5
          show-source: true
      resource_manager.yaml: |
        propeller:
          resourcemanager:
            type: noop
      storage.yaml: |
        storage:
          type: s3
          container: "flyte-prod-service-flyte"
          connection:
            auth-type: iam
            region: us-west-2
          limits:
            maxDownloadMBs: 10
      task_logs.yaml: |
        plugins:
          logs:
            cloudwatch-enabled: true
            cloudwatch-log-group: flyte-prod-tasks-logs
            cloudwatch-region: us-west-2
            kubernetes-enabled: false
    kind: ConfigMap
    metadata:
      annotations:
        meta.helm.sh/release-name: service-flyte-helmchart
        meta.helm.sh/release-namespace: flyte
      creationTimestamp: "2022-01-26T19:57:43Z"
      labels:
        app.kubernetes.io/instance: service-flyte-helmchart
        app.kubernetes.io/managed-by: Helm
        app.kubernetes.io/name: flyteadmin
        helm.sh/chart: flyte-core-v0.1.10
      name: flyte-propeller-config
      namespace: flyte
      resourceVersion: "8963"
      selfLink: /api/v1/namespaces/flyte/configmaps/flyte-propeller-config
      uid: 3fdd593b-ee78-476b-84fc-8c8a8c9b3502
• Ketan (kumare3)
    6 months ago
    cc @Haytham Abuelfutuh
• Haytham Abuelfutuh
    6 months ago
    Hey @JP Kosymna are these array tasks? Because you can choose to run array tasks in a different environment (e.g. AWS Batch), you will need to configure their logs separately… TL;DR: You can modify the task_logs.yaml like this:
    task_logs.yaml: |
        plugins:
          logs:
            cloudwatch-enabled: true
            cloudwatch-log-group: flyte-prod-tasks-logs
            cloudwatch-region: us-west-2
            kubernetes-enabled: false
          k8s-array:
            logs:
              config:
                cloudwatch-enabled: true
                cloudwatch-log-group: flyte-prod-tasks-logs
                cloudwatch-region: us-west-2
                kubernetes-enabled: false
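    (A side note on applying this: propeller reads flyte-propeller-config at startup, so after editing the configmap the deployment would likely need a restart to pick up the change. A hedged sketch, assuming the deployment is named flytepropeller in the flyte namespace:)
    kubectl rollout restart deployment/flytepropeller -n flyte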
• Yuvraj
    6 months ago
    cc: @Eugene Cha Is it the same issue?
• JP Kosymna
    6 months ago
    I've updated my config as @Haytham Abuelfutuh suggested, but I still see no logs in the log group. I'm now running a super simple task for testing:
    import logging
    import uuid
    from flytekit import task

    @task
    def get_uuid() -> str:
        id = str(uuid.uuid4())
        logging.info(f'got new id {id}')
        print(f'got new id {id}')
        return id
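    (A hedged side note, probably not the root cause here since print() also writes to stdout: the stdlib root logger defaults to WARNING, so logging.info() records can be dropped unless something raises the level, e.g.:)
    # hedged sketch: raise the root logger to INFO so logging.info() records are emitted
    logging.basicConfig(level=logging.INFO)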
• Ketan (kumare3)
    6 months ago
    Do you have a cloudwatch agent installed?
• JP Kosymna
    6 months ago
    Only if the opta install did it for me. How would I check?
• Ketan (kumare3)
    6 months ago
    Should be a daemonset?
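    (A hedged way to check, assuming any log agent would be deployed as a daemonset with "fluent" or "cloudwatch" in its name:)
    kubectl get daemonsets --all-namespaces | grep -iE 'fluent|cloudwatch'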
• JP Kosymna
    6 months ago
    I don't see anything.
• Ketan (kumare3)
    6 months ago
    hmm, cc @Yuvraj / @Nitin Aggarwal do you guys know what could be off?
  • cc @JD Palomino
• Ketan (kumare3)
    6 months ago
    it seems after using opta, they are not seeing any cloudwatch daemon
• Haytham Abuelfutuh
    6 months ago
    Opta doesn’t set up cloudwatch logs (fluentd, etc.) AFAIK… these steps will have to be done manually/separately afterwards…
• Ketan (kumare3)
    6 months ago
    ohh, we need to add this to the docs then?
  • i.e. that you should set up your preferred log provider
• JD Palomino
    6 months ago
    er…. yeah, we only set up cloudwatch logs for the VPC and EKS control plane
  • not for the individual containers, as that's a personal decision
  • but I can help you set it up!
• Haytham Abuelfutuh
    6 months ago
    we have this one paragraph here https://docs.flyte.org/projects/cookbook/en/latest/auto/deployment/configure_logging_links.html#sphx-glr-auto-deployment-configure-logging-links-py:
    Every organization potentially uses different log aggregators, making it hard to create a one-size-fits-all solution. Some examples of the log aggregators include cloud-hosted solutions like AWS CloudWatch, GCP Stackdriver, Splunk, Datadog, etc. Flyte does not have an opinion here and provides a simplified interface to configure your log provider. Flyte-sandbox ships with the Kubernetes dashboard to visualize the logs. This may not be safe for production; hence we recommend users explore other log aggregators.
  • I think if we can document how to do that in our guide here: https://docs.flyte.org/en/latest/deployment/aws/manual.html that would be fantastic!
• JP Kosymna
    6 months ago
    Ok, this makes sense
  • @JD Palomino I would appreciate your help getting this set up
• JD Palomino
    6 months ago
    cool, lmk when you’re free
• JP Kosymna
    6 months ago
    @JD Palomino showed me how to view the pod logs using kubectl and that will work for now. I'll look into https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-EKS-logs.html or another log aggregator in the future.
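    (A hedged note for later: the Container Insights Fluent Bit setup ships application logs to a log group named /aws/containerinsights/<cluster-name>/application by default, not to a custom group, so the task_logs config would need to point at that group, or Fluent Bit would need to be reconfigured to write to flyte-prod-tasks-logs. A sketch assuming the default group name:)
    task_logs.yaml: |
        plugins:
          logs:
            cloudwatch-enabled: true
            cloudwatch-log-group: /aws/containerinsights/<cluster-name>/application
            cloudwatch-region: us-west-2
            kubernetes-enabled: false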
• Ketan (kumare3)
    6 months ago
    ya
• Haytham Abuelfutuh
    6 months ago
    yeah.. We have used datadog, stackdriver, cloudwatch, kibana… there are many of those, as you can tell, and most organizations have already decided on one or the other, which is why flyte only lets you configure how the log link is formulated without getting too deep into setting these up…
• Eugene Cha
    6 months ago
    @Yuvraj I think it’s the same issue
  • oh you can just view the logs in kubectl? I should try that
• Haytham Abuelfutuh
    6 months ago
    kubectl get pods -n <namespace>
    should list all the pods in, say, the flyte-snacks namespace… Unless nodes are deeply nested, the pod names will have the structure <execId>-n0-0, where <execId> is the flyte execution id, n0 is the node id (unless you explicitly name it), and 0 is the retry attempt. Then:
    kubectl logs -n <namespace> <pod name>
    Or, to watch logs as they come:
    kubectl logs -n <namespace> <pod name> --follow
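    (As a concrete, hypothetical example: for an execution with id f8a9b2c in the flytesnacks-development namespace, the first attempt of the first node would be:)
    kubectl logs -n flytesnacks-development f8a9b2c-n0-0 --follow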
• JP Kosymna
    6 months ago
    Somewhat related: I have some nested nodes via a map_task that seem to get cleaned up almost immediately upon completion. Does flyte have any options to control how long pods are left around? Something like https://argoproj.github.io/argo-workflows/fields/#ttlstrategy
• Ketan (kumare3)
    6 months ago
    @JP Kosymna yes, it does have an option
• JP Kosymna
    6 months ago
    Thanks for the links. The behavior I'm seeing is that only the pods from the map_task are getting removed immediately; the other pods do seem to stick around for 23h.
  • If that's just how map_tasks work, that's fine. I should probably set up a real logging solution going forward.
• Ketan (kumare3)
    6 months ago
    ohh interesting