#ask-the-community

Anthony

09/28/2022, 8:31 AM
Hi all 👋 I have a problem sharing data between tasks. I found a similar issue here in the discussions (link).
```
Workflow[flyte-anti-fraud-ml:development:app.workflow.main_flow] failed. RuntimeExecutionError: max number of system retry attempts [31/30] exhausted. Last known status message: failed at Node[n0]. RuntimeExecutionError: failed during plugin execution, caused by: error file @[s3://my-s3-bucket/metadata/propeller/flyte-anti-fraud-ml-development-f31c365f02c114639b00/n0/data/0/error.pb] is too large [28775519] bytes, max allowed [10485760] bytes
```
I added the `max-output-size-bytes` param to the `flyte-propeller-config` with `kubectl edit configmap -n flyte flyte-propeller-config` and waited for all changes to apply before re-submitting a new task.
The propeller section of my `flyte-propeller-config` looks like this:
```
core.yaml: |
    manager:
      pod-application: flytepropeller
      pod-template-container-name: flytepropeller
      pod-template-name: flytepropeller-template
    propeller:
      max-output-size-bytes: 52428800
      downstream-eval-duration: 30s
      enable-admin-launcher: true
      leader-election:
        enabled: true
        lease-duration: 15s
        lock-config-map:
          name: propeller-leader
          namespace: flyte
        renew-deadline: 10s
        retry-period: 2s
      limit-namespace: all
      max-workflow-retries: 3
      metadata-prefix: metadata/propeller
      metrics-prefix: flyte
      prof-port: 10254
```
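To put the byte values side by side, here is a minimal Python sketch (the constant and function names are illustrative, not part of Flyte) that converts the limit from the error message, the new `max-output-size-bytes` value, and the reported error-file size to MiB:

```python
# Byte values taken from the error message and the propeller config in this thread.
MIB = 1024 * 1024

DEFAULT_LIMIT_BYTES = 10485760     # "max allowed" in the error message
CONFIGURED_LIMIT_BYTES = 52428800  # max-output-size-bytes set in core.yaml
ERROR_FILE_BYTES = 28775519        # size of error.pb reported by propeller


def to_mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 1048576 bytes)."""
    return num_bytes / MIB


def fits(num_bytes: int, limit_bytes: int) -> bool:
    """Return True if a file of num_bytes stays within limit_bytes."""
    return num_bytes <= limit_bytes


if __name__ == "__main__":
    print(f"default limit:    {to_mib(DEFAULT_LIMIT_BYTES):.2f} MiB")
    print(f"configured limit: {to_mib(CONFIGURED_LIMIT_BYTES):.2f} MiB")
    print(f"error file:       {to_mib(ERROR_FILE_BYTES):.2f} MiB")
    print("fits default limit:   ", fits(ERROR_FILE_BYTES, DEFAULT_LIMIT_BYTES))
    print("fits configured limit:", fits(ERROR_FILE_BYTES, CONFIGURED_LIMIT_BYTES))
```

The 50 MiB value would cover the roughly 27.4 MiB error file, but only once the running propeller has actually loaded the edited config.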
Task configuration has been set up via `kubectl -n flyte edit cm flyte-admin-base-config`:
```
storage.yaml: |
    storage:
      type: minio
      container: "my-s3-bucket"
      stow:
        kind: s3
        config:
          access_key_id: minio
          auth_type: accesskey
          secret_key: miniostorage
          disable_ssl: true
          endpoint: http://minio.flyte.svc.cluster.local:9000
          region: us-east-1
      signedUrl:
        stowConfigOverride:
          endpoint: http://localhost:30084
      enable-multicontainer: false
      limits:
        maxDownloadMBs: 50
task_resource_defaults.yaml: |
    task_resources:
      defaults:
        cpu: 1
        memory: 3000Mi
        storage: 100Mi
      limits:
        cpu: 4
        gpu: 1
        memory: 3Gi
        storage: 500Mi
```
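Since the `task_resources` defaults and limits above mix units (`3000Mi` vs `3Gi`), a quick check that each default stays within its limit can be useful. This is only a sketch: `parse_quantity` is a hypothetical helper that understands bare numbers and the binary `Ki`/`Mi`/`Gi` suffixes used here, not the full Kubernetes quantity syntax.

```python
# Binary suffixes used in the task_resources section (a subset of Kubernetes quantities).
SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def parse_quantity(value) -> float:
    """Parse a quantity like 3000Mi, 3Gi, or a bare number such as 4."""
    text = str(value).strip()
    for suffix, factor in SUFFIXES.items():
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * factor
    return float(text)  # plain number, e.g. a CPU count


def defaults_within_limits(defaults: dict, limits: dict) -> dict:
    """Map each resource present in both sections to True if default <= limit."""
    return {
        name: parse_quantity(defaults[name]) <= parse_quantity(limits[name])
        for name in defaults
        if name in limits
    }


# Values copied from task_resource_defaults.yaml above.
task_resources = {
    "defaults": {"cpu": 1, "memory": "3000Mi", "storage": "100Mi"},
    "limits": {"cpu": 4, "gpu": 1, "memory": "3Gi", "storage": "500Mi"},
}

if __name__ == "__main__":
    print(defaults_within_limits(task_resources["defaults"], task_resources["limits"]))
```

Note that `3Gi` is 3072Mi, so the 3000Mi memory default sits just under the limit.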
Also, changing `maxDownloadMBs` didn't change the situation. Changing the cache `max_size_mbs` in `flyte-propeller-config` from 0 to a custom value didn't help either:
```
cache.yaml: |
    cache:
      max_size_mbs: 100
      target_gc_percent: 70
```
I tried several times with different params, but the error arose on every new execution. I noticed that neither `max-output-size-bytes` nor `max-workflow-retries` (changed from 30 to 3) is passed to the workflow execution:
```
RuntimeExecutionError: max number of system retry attempts [31/30] exhausted...

error file @[s3://my-s3-bucket/metadata/propeller/flyte-anti-fraud-ml-development-f31c365f02c114639b00/n0/data/0/error.pb] is too large [28775519] bytes, max allowed [10485760] bytes...
```
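The reported `max allowed` value is a useful diagnostic: if it still reads 10485760 after the config map was edited to 52428800, the running propeller has not picked up the new config. A minimal sketch that extracts both numbers with a regular expression; the pattern is written against the message text shown here and assumes that wording stays stable.

```python
import re

# Matches the "... is too large [N] bytes, max allowed [M] bytes" fragment of the error.
SIZE_PATTERN = re.compile(r"is too large \[(\d+)\] bytes, max allowed \[(\d+)\] bytes")


def parse_size_error(message: str):
    """Return (actual_bytes, max_allowed_bytes), or None if the pattern is absent."""
    match = SIZE_PATTERN.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def config_applied(message: str, configured_limit: int) -> bool:
    """True if the limit reported by propeller equals the value set in the config map."""
    parsed = parse_size_error(message)
    return parsed is not None and parsed[1] == configured_limit


error = (
    "error file @[s3://my-s3-bucket/metadata/propeller/"
    "flyte-anti-fraud-ml-development-f31c365f02c114639b00/n0/data/0/error.pb] "
    "is too large [28775519] bytes, max allowed [10485760] bytes"
)

if __name__ == "__main__":
    print(parse_size_error(error))          # (28775519, 10485760)
    print(config_applied(error, 52428800))  # False: the old limit is still in effect
```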
Here are my CLI steps to create a new execution:
```
- kubectl -n flyte edit cm flyte-admin-base-config
- kubectl edit configmap -n flyte flyte-propeller-config
- flytectl get task-resource-attribute -p flyteexamples -d development
- flytectl update project -p flyte-anti-fraud-ml -d development --storage.cache.max_size_mbs 100
- flytectl get launchplan --project flyte-anti-fraud-ml --domain development app.workflow.main_flow --latest --execFile exec_spec.yaml
- flytectl create execution --project flyte-anti-fraud-ml --domain development --execFile exec_spec.yaml
```
What additional steps do I have to take to make flytectl use my propeller changes and get past the 10 MB maximum size allowed for serialized uploads to Flyte?

Samhita Alla

09/28/2022, 9:55 AM
@Anthony, can you try restarting your propeller after editing your config map?
I'm not sure this solves the problem, though, but there's no harm in giving it a try.

Anthony

09/28/2022, 10:00 AM
I'm using registration and task submission via flytectl. Also, after changing the config with `kubectl edit configmap -n flyte flyte-propeller-config`, I have to wait 5 min for a healthy upstream in the Flyte UI console (I thought propeller restarted automatically once new config changes were submitted). How can I forcibly restart `flyte-propeller` via the `kubectl` CLI?
I'm already using two steps in my Makefile, `serialize` --> `register`:
```
.PHONY: serialize
serialize:
	echo ${CURDIR}
	pyflyte -c flyte.config --pkgs app package \
		--force \
		--in-container-source-path /root \
		--image ${FULL_IMAGE_NAME}:${VERSION}

.PHONY: register
# register: docker-push serialize
register: serialize
	flytectl -c ${FLYTECTL_CONFIG} \
		register files \
		--project ${PROJECT} \
		--domain development \
		--archive flyte-package.tgz \
		--force \
		--version ${VERSION}
```
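The same two Makefile steps can also be expressed as argument lists, for example to drive them from a Python script with `subprocess`. This sketch reuses only the flags from the Makefile; the image name, config path, and version in the `__main__` block are placeholders, and it defaults to a dry run that just prints the commands.

```python
import subprocess


def serialize_cmd(image: str, version: str, config: str = "flyte.config") -> list:
    """pyflyte package step from the Makefile's serialize target."""
    return [
        "pyflyte", "-c", config, "--pkgs", "app", "package",
        "--force",
        "--in-container-source-path", "/root",
        "--image", f"{image}:{version}",
    ]


def register_cmd(flytectl_config: str, project: str, version: str,
                 domain: str = "development",
                 archive: str = "flyte-package.tgz") -> list:
    """flytectl register step from the Makefile's register target."""
    return [
        "flytectl", "-c", flytectl_config,
        "register", "files",
        "--project", project,
        "--domain", domain,
        "--archive", archive,
        "--force",
        "--version", version,
    ]


def run(cmd: list, dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder values standing in for FULL_IMAGE_NAME, VERSION, FLYTECTL_CONFIG, PROJECT.
    run(serialize_cmd("my-registry/flyte-anti-fraud-ml", "v1"))
    run(register_cmd("flytectl.yaml", "flyte-anti-fraud-ml", "v1"))
```

As in the Makefile, registration depends on serialization having produced `flyte-package.tgz` first.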
Also, according to the docs, some flags are deprecated in favour of using a config file via `flyte admin`:
```
--sourceUploadPath string       Deprecated: Update flyte admin to avoid having to configure storage access from flytectl.
```
My problem is that one task generates dataclass instances on exit, and another task should take these instances as input params:
```
from flytekit import workflow

# Excerpt: Forecast, logger, the tasks, xgb_params and BoostingCustMetric are defined elsewhere in the project.
@workflow
def main_flow() -> Forecast:
    """
    Main Flyte WorkFlow consisting of three tasks:
        -  @preproc_and_split
        -  @train_xgboost_clf
        -  @get_predictions
    """
    logger.info(log="#START -- START Raw Preprocessing and Splitting", timestamp=None)
    train_cls, target_cls = preproc_and_split()

    logger.info(log="#START -- START Initialize Boosting Params", timestamp=None)
    saved_mpath = train_xgboost_clf(
        feat_cls=train_cls,
        target_cls=target_cls,
        xgb_params=xgb_params,
        cust_metric=BoostingCustMetric,
    )
```
where
```
def preproc_and_split() -> Tuple[Fraud_Raw_PostProc_Data_Class, Fraud_Raw_Target_Data_Class]:
```
These dataclasses are produced only after task execution, so I cannot register them as proto files: they are promised return values of my function. The 10 MB error arises when these dataclasses are shared between tasks, and I'm wondering how to apply changes in `flyte-propeller-config` to get past this serialization limit.
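A rough way to see whether such a dataclass can approach the 10 MB cap is to estimate its serialized size before returning it. This sketch assumes that flytekit stores a dataclass output inline as a JSON-like literal, so the UTF-8 length of `json.dumps(asdict(...))` is used only as an approximation; `FraudFeatures` and its fields are hypothetical stand-ins for `Fraud_Raw_PostProc_Data_Class`.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

MAX_OUTPUT_BYTES = 10485760  # default limit reported in the error message


@dataclass
class FraudFeatures:
    """Hypothetical stand-in for Fraud_Raw_PostProc_Data_Class."""
    columns: List[str] = field(default_factory=list)
    rows: List[List[float]] = field(default_factory=list)


def approx_serialized_bytes(obj) -> int:
    """Approximate inline size: UTF-8 length of the dataclass dumped to JSON."""
    return len(json.dumps(asdict(obj)).encode("utf-8"))


def fits_inline(obj, limit: int = MAX_OUTPUT_BYTES) -> bool:
    """True if the approximate serialized size stays under the limit."""
    return approx_serialized_bytes(obj) <= limit


if __name__ == "__main__":
    small = FraudFeatures(columns=["amount", "hour"], rows=[[12.5, 3.0]])
    # 120k rows of 10 floats each: about 12 MB once rendered as JSON text.
    large = FraudFeatures(columns=[f"f{i}" for i in range(10)],
                          rows=[[0.123456] * 10 for _ in range(120_000)])
    for name, obj in [("small", small), ("large", large)]:
        print(name, approx_serialized_bytes(obj), fits_inline(obj))
```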

Samhita Alla

09/28/2022, 10:05 AM
`kubectl -n flyte rollout restart deploy flytepropeller` is the restart command.
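To script this, here is a sketch that wraps the restart command with Python's `subprocess` and then waits for the new pod using `kubectl rollout status`. It assumes `kubectl` is on the PATH and pointed at the right cluster, and it defaults to a dry run that only prints the commands.

```python
import subprocess


def restart_commands(namespace: str = "flyte", deployment: str = "flytepropeller") -> list:
    """Commands to restart propeller so it re-reads flyte-propeller-config."""
    return [
        ["kubectl", "-n", namespace, "rollout", "restart", "deploy", deployment],
        # Block until the restarted deployment's rollout has completed.
        ["kubectl", "-n", namespace, "rollout", "status", "deploy", deployment],
    ]


def restart_propeller(dry_run: bool = True) -> None:
    """Print each command; run it only when dry_run is False."""
    for cmd in restart_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    restart_propeller(dry_run=True)
```

With `dry_run=False`, the script returns after `rollout status` reports the rollout complete, at which point a new execution can be created.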

Anthony

09/28/2022, 10:06 AM
Lemme check this.
@Samhita Alla, restarting flytepropeller solved my issue:
`kubectl -n flyte rollout restart deploy flytepropeller`
🎉
Thank you!