Hi all 👋
I have a problem with sharing data between tasks.
I found a similar issue here in discussions (link):
```
Workflow[flyte-anti-fraud-ml:development:app.workflow.main_flow] failed. RuntimeExecutionError: max number of system retry attempts [31/30] exhausted. Last known status message: failed at Node[n0]. RuntimeExecutionError: failed during plugin execution, caused by: error file @[s3://my-s3-bucket/metadata/propeller/flyte-anti-fraud-ml-development-f31c365f02c114639b00/n0/data/0/error.pb] is too large [28775519] bytes, max allowed [10485760] bytes
```
I added the `max-output-size-bytes` param to `flyte-propeller-config` and waited for all changes to apply before re-submitting a new task:
```shell
kubectl edit configmap -n flyte flyte-propeller-config
```
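A quick sanity check that the edit actually landed in the live ConfigMap (object and namespace names assumed to be the defaults here):

```shell
# Print the live propeller ConfigMap and confirm the new limit is present
kubectl -n flyte get configmap flyte-propeller-config -o yaml | grep -n "max-output-size-bytes"
```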
The propeller section of my `flyte-propeller-config` looks like:
```yaml
core.yaml: |
    manager:
      pod-application: flytepropeller
      pod-template-container-name: flytepropeller
      pod-template-name: flytepropeller-template
    propeller:
      max-output-size-bytes: 52428800
      downstream-eval-duration: 30s
      enable-admin-launcher: true
      leader-election:
        enabled: true
        lease-duration: 15s
        lock-config-map:
          name: propeller-leader
          namespace: flyte
        renew-deadline: 10s
        retry-period: 2s
      limit-namespace: all
      max-workflow-retries: 3
      metadata-prefix: metadata/propeller
      metrics-prefix: flyte
      prof-port: 10254
```
Task configuration has been set up via `kubectl -n flyte edit cm flyte-admin-base-config`:
```yaml
storage.yaml: |
    storage:
      type: minio
      container: "my-s3-bucket"
      stow:
        kind: s3
        config:
          access_key_id: minio
          auth_type: accesskey
          secret_key: miniostorage
          disable_ssl: true
          endpoint: http://minio.flyte.svc.cluster.local:9000
          region: us-east-1
      signedUrl:
        stowConfigOverride:
          endpoint: http://localhost:30084
      enable-multicontainer: false
      limits:
        maxDownloadMBs: 50
task_resource_defaults.yaml: |
    task_resources:
      defaults:
        cpu: 1
        memory: 3000Mi
        storage: 100Mi
      limits:
        cpu: 4
        gpu: 1
        memory: 3Gi
        storage: 500Mi
```
Changing `maxDownloadMBs` also didn't change the situation.
Changing the cache `max_size_mbs` in `flyte-propeller-config` from 0 to a custom value didn't work either:
```yaml
cache.yaml: |
    cache:
      max_size_mbs: 100
      target_gc_percent: 70
```
I tried several times with different params, but the error arose on every new execution.
I can see that neither `max-output-size-bytes` nor `max-workflow-retries` (changed from 30 to 3) is being picked up by the workflow execution:

```
RuntimeExecutionError: max number of system retry attempts [31/30] exhausted...
error file @[s3://my-s3-bucket/metadata/propeller/flyte-anti-fraud-ml-development-f31c365f02c114639b00/n0/data/0/error.pb] is too large [28775519] bytes, max allowed [10485760] bytes...
```
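For context on why the file is so large: `error.pb` holds the serialized error the failing task reported, so a task that raises with a multi-megabyte message (e.g. a dumped dataframe or giant traceback embedded in the exception) will overflow the limit regardless of retries. Independent of the propeller config, one workaround is to keep the raised message small. A minimal sketch in plain Python; `truncate_error` and `MAX_ERR_BYTES` are hypothetical names, not Flyte APIs:

```python
# Sketch: keep task error messages small so the serialized error file
# (error.pb) stays well under propeller's max-output-size-bytes limit.
MAX_ERR_BYTES = 64 * 1024  # hypothetical cap, far below the 10 MiB default

def truncate_error(msg: str, limit: int = MAX_ERR_BYTES) -> str:
    """Trim an oversized error message, keeping a note that it was cut."""
    data = msg.encode("utf-8")
    if len(data) <= limit:
        return msg
    return data[:limit].decode("utf-8", errors="ignore") + " ... [truncated]"

# Inside a Flyte task, wrap the failure path, e.g.:
# raise RuntimeError(truncate_error(huge_payload_or_traceback))
```

Similarly, for the original "sharing data between tasks" problem, passing large data by reference (e.g. via `FlyteFile`, which goes through blob storage) instead of as inline literals keeps outputs under the size limit.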
Hereafter are my CLI steps to create a new execution:
- `kubectl -n flyte edit cm flyte-admin-base-config`
- `kubectl edit configmap -n flyte flyte-propeller-config`
- `flytectl get task-resource-attribute -p flyteexamples -d development`
- `flytectl update project -p flyte-anti-fraud-ml -d development --storage.cache.max_size_mbs 100`
- `flytectl get launchplan --project flyte-anti-fraud-ml --domain development app.workflow.main_flow --latest --execFile exec_spec.yaml`
- `flytectl create execution --project flyte-anti-fraud-ml --domain development --execFile exec_spec.yaml`
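One thing the steps above don't include: `kubectl edit` only updates the ConfigMaps, and in many Flyte deployments the running flyteadmin and flytepropeller pods keep their old configuration until restarted. A hedged extra step, assuming the default deployment names in the `flyte` namespace:

```shell
# Restart both services so the edited ConfigMaps are re-read on startup.
# Deployment names are assumed defaults; adjust if yours differ.
kubectl -n flyte rollout restart deployment/flyteadmin deployment/flytepropeller
kubectl -n flyte rollout status deployment/flytepropeller
```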
What additional steps do I have to take to force Flyte to use my propeller changes and solve the problem of the 10 MB max size allowed for serialized uploads?