# flyte-deployment
j
I installed flyte v0.19.1 on aws using opta, but I can't seem to get logging to work. I created a log group in AWS named flyte-prod-tasks-logs and updated the values-eks.yaml file accordingly. In opta/aws/flyte.yaml I added the following:
task_logs:
    plugins:
        logs:
            cloudwatch-enabled: true
            cloudwatch-log-group: flyte-prod-tasks-logs
            cloudwatch-region: "{vars.region}"
My workflow runs successfully but I don't see anything in the log group. My ShellTask prints to standard out and I have a PythonTask that uses logging. Is there anything special I need to do? Or some configuration I missed?
> opta version
v0.22.1
> terraform --version
Terraform v1.0.11
on linux_amd64
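For context, the tasks in question probably look something like this (a minimal, hypothetical sketch with made-up names, not the actual workflow; the ShellTask import path is from flytekit.extras.tasks.shell and may differ by flytekit version):
import logging
import uuid

from flytekit import task, workflow
from flytekit.extras.tasks.shell import ShellTask

# A ShellTask whose script output goes to the pod's stdout
echo = ShellTask(
    name="echo",
    script="echo 'hello from a ShellTask'",
)

@task
def make_id() -> str:
    new_id = str(uuid.uuid4())
    logging.info("generated id %s", new_id)  # standard Python logging
    return new_id

@workflow
def wf() -> str:
    echo()
    return make_id()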
Can you paste the contents of:
kubectl get -n flyte cm flyte-propeller-config -o yaml
cc @Haytham Abuelfutuh 🙂
j
apiVersion: v1
data:
  admin.yaml: |
    admin:
      clientId: 'flytepropeller'
      clientSecretLocation: /etc/secrets/client_secret
      endpoint: flyteadmin:81
      insecure: true
    event:
      capacity: 1000
      rate: 500
      type: admin
  cache.yaml: |
    cache:
      max_size_mbs: 1024
      target_gc_percent: 70
  catalog.yaml: |
    catalog-cache:
      endpoint: datacatalog:89
      insecure: true
      type: datacatalog
  copilot.yaml: |
    plugins:
      k8s:
        co-pilot:
          image: cr.flyte.org/flyteorg/flytecopilot:v0.0.24
          name: flyte-copilot-
          start-timeout: 30s
  core.yaml: |
    manager:
      pod-application: flytepropeller
      pod-template-container-name: flytepropeller
      pod-template-name: flytepropeller-template
    propeller:
      downstream-eval-duration: 30s
      enable-admin-launcher: true
      gc-interval: 12h
      kube-client-config:
        burst: 25
        qps: 100
        timeout: 30s
      leader-election:
        enabled: true
        lease-duration: 15s
        lock-config-map:
          name: propeller-leader
          namespace: flyte
        renew-deadline: 10s
        retry-period: 2s
      limit-namespace: all
      max-workflow-retries: 50
      metadata-prefix: metadata/propeller
      metrics-prefix: flyte
      prof-port: 10254
      queue:
        batch-size: -1
        batching-interval: 2s
        queue:
          base-delay: 5s
          capacity: 1000
          max-delay: 120s
          rate: 100
          type: maxof
        sub-queue:
          capacity: 1000
          rate: 100
          type: bucket
        type: batch
      rawoutput-prefix: s3://flyte-prod-service-flyte
      workers: 40
      workflow-reeval-duration: 30s
    webhook:
      certDir: /etc/webhook/certs
      serviceName: flyte-pod-webhook
  enabled_plugins.yaml: |
    tasks:
      task-plugins:
        default-for-task-types:
          container: container
          container_array: k8s-array
          sidecar: sidecar
        enabled-plugins:
        - container
        - sidecar
        - k8s-array
  k8s.yaml: |
    plugins:
      k8s:
        default-cpus: 100m
        default-env-vars: []
        default-memory: 100Mi
  logger.yaml: |
    logger:
      level: 5
      show-source: true
  resource_manager.yaml: |
    propeller:
      resourcemanager:
        type: noop
  storage.yaml: |
    storage:
      type: s3
      container: "flyte-prod-service-flyte"
      connection:
        auth-type: iam
        region: us-west-2
      limits:
        maxDownloadMBs: 10
  task_logs.yaml: |
    plugins:
      logs:
        cloudwatch-enabled: true
        cloudwatch-log-group: flyte-prod-tasks-logs
        cloudwatch-region: us-west-2
        kubernetes-enabled: false
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: service-flyte-helmchart
    meta.helm.sh/release-namespace: flyte
  creationTimestamp: "2022-01-26T19:57:43Z"
  labels:
    app.kubernetes.io/instance: service-flyte-helmchart
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: flyteadmin
    helm.sh/chart: flyte-core-v0.1.10
  name: flyte-propeller-config
  namespace: flyte
  resourceVersion: "8963"
  selfLink: /api/v1/namespaces/flyte/configmaps/flyte-propeller-config
  uid: 3fdd593b-ee78-476b-84fc-8c8a8c9b3502
k
cc @Haytham Abuelfutuh
h
Hey @JP Kosymna, are these array tasks? Because array tasks can run in a different environment (e.g. AWS Batch), their logs need to be configured separately… TL;DR: you can modify the task_logs.yaml like this:
task_logs.yaml: |
    plugins:
      logs:
        cloudwatch-enabled: true
        cloudwatch-log-group: flyte-prod-tasks-logs
        cloudwatch-region: us-west-2
        kubernetes-enabled: false
      k8s-array:
        logs:
          config:
            cloudwatch-enabled: true
            cloudwatch-log-group: flyte-prod-tasks-logs
            cloudwatch-region: us-west-2
            kubernetes-enabled: false
y
cc: @Eugene Cha Is it the same issue?
j
I've updated my config as @Haytham Abuelfutuh suggested, but I still see no logs in the log group. I'm now running a super simple task for testing:
import logging
import uuid

from flytekit import task

@task
def get_uuid() -> str:
    id = str(uuid.uuid4())
    logging.info(f'got new id {id}')
    print(f'got new id {id}')
    return id
k
Do you have a CloudWatch agent installed?
j
Only if the opta install did it for me. How would I check?
k
Should be a daemonset?
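One quick way to check (assuming the AWS Container Insights setup, which installs its agents into the amazon-cloudwatch namespace; adjust if your agent lives elsewhere):
# look for a CloudWatch / Fluent Bit / fluentd log agent running as a DaemonSet
kubectl get daemonsets -n amazon-cloudwatch
# or search every namespace for anything log-agent-like
kubectl get daemonsets -A | grep -iE 'cloudwatch|fluent'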
j
I don't see anything.
k
hmm cc @Yuvraj / @Nitin Aggarwal do you guys know what can be off?
cc @JD Palomino
k
it seems after using opta, they are not seeing any cloudwatch daemon
h
Opta doesn’t set up cloudwatch logs (fluentd, etc.) AFAIK… these steps will have to be followed manually/separately afterwards…
k
ohh, we need to add this to the docs then?
i.e. that you should set up your preferred logs provider
j
er…. yeah we only setup cloudwatch logs for the vpc and eks control plane
not for the individual containers as that’s a personal decision
but I can help you setting it up!
h
we have this one paragraph here https://docs.flyte.org/projects/cookbook/en/latest/auto/deployment/configure_logging_links.html#sphx-glr-auto-deployment-configure-logging-links-py:
Every organization potentially uses different log aggregators, making it hard to create a one-size-fits-all solution. Some examples of the log aggregators include cloud-hosted solutions like AWS CloudWatch, GCP Stackdriver, Splunk, Datadog, etc.
Flyte does not have an opinion here and provides a simplified interface to configure your log provider. Flyte-sandbox ships with the Kubernetes dashboard to visualize the logs. This may not be safe for production; hence we recommend users explore other log aggregators.
I think if we can document how to do that in our guide here: https://docs.flyte.org/en/latest/deployment/aws/manual.html that would be fantastic!
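For reference, the “simplified interface” mentioned above is the same plugins.logs block shown earlier: once a log agent is actually shipping container logs to CloudWatch, the link shown in the UI can be templated. A rough sketch only — the cloudwatch-template-uri key and the {{ .podName }} / {{ .namespace }} / {{ .containerName }} variables come from the flyteplugins log config, but the exact console URL and stream naming depend on your agent and Flyte version:
task_logs.yaml: |
    plugins:
      logs:
        cloudwatch-enabled: true
        cloudwatch-region: us-west-2
        cloudwatch-log-group: flyte-prod-tasks-logs
        # illustrative only: point the UI link at whatever stream pattern your agent writes
        cloudwatch-template-uri: "https://console.aws.amazon.com/cloudwatch/home?region=us-west-2#logEventViewer:group=flyte-prod-tasks-logs;stream=var.log.containers.{{ .podName }}_{{ .namespace }}_{{ .containerName }}"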
j
Ok, this makes sense
@JD Palomino I would appreciate your help to get this setup
j
cool, lmk when you’re free
j
@JD Palomino showed me how to view the pods logs using kubectl and that will work for now. I'll look into https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-EKS-logs.html or another log aggregator in the future
k
ya
h
yeah… we have used Datadog, Stackdriver, CloudWatch, Kibana… there are many of those, as you can tell, and most organizations have already settled on one or another, which is why Flyte only lets you configure how the log link should be formulated without getting too deep into setting these up…
e
@Yuvraj I think it’s the same issue
oh you can just view the logs in kubectl? I should try that
h
kubectl get pods -n <namespace>
That should list all the pods in, say, the flytesnacks namespace… Unless nodes are deeply nested, the pod names will have the structure:
<execId>-n0-0
Where <execId> is the Flyte execution id, n0 is the node id (unless you explicitly name it), and 0 is the retry attempt. Then:
kubectl logs -n <namespace> <pod name>
Or to watch logs as they come:
kubectl logs -n <namespace> <pod name> --follow
j
this tool is super useful https://github.com/wercker/stern
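For example (placeholders as above; stern matches pods by a name regex and tails all of their containers):
# follow logs from every pod belonging to a single execution
stern -n <namespace> <execId>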
👍 1
j
Somewhat related: I have some nested nodes via a map_task whose pods seem to get cleaned up almost immediately upon completion. Does flyte have any option to control how long pods are left around? Something like https://argoproj.github.io/argo-workflows/fields/#ttlstrategy
k
@JP Kosymna yes, it does have an option
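For reference, this sits next to the gc-interval already visible in the propeller ConfigMap above. A minimal sketch, assuming the max-ttl-hours key from FlytePropeller's config (default 23 hours); verify the key against your Flyte version:
core.yaml: |
    propeller:
      gc-interval: 12h    # how often completed workflows are scanned for cleanup
      max-ttl-hours: 23   # hours a completed workflow (and its pods) is kept before GC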
j
Thanks for the links. The behavior I'm seeing is that only the pods from the map_task are getting removed immediately; the other pods do seem to stick around for 23h.
If that's just how map_tasks work, that's fine. I should probably set up a real logging solution going forward.
k
ohh interesting