# ask-the-community
b
Hi folks, I have set up Flyte based on the Single Cluster Simple Cloud Deployment docs and deployed a simple workflow based on the Getting Started guide. I am getting the following error in the console. Any idea why? Thanks.
[1/1] currentAttempt done. Last Error: USER::Pod failed. No message received from kubernetes.
[f2d93e14fa6164601aad-n0-0] terminated with exit code (137). Reason [OOMKilled]. Message: 
{"asctime": "2023-03-08 13:31:31,597", "name": "flytekit", "levelname": "WARNING", "message": "FlyteSchema is deprecated, use Structured Dataset instead."}
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-mmqveczs because the default path (/home/flytekit/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
.
m
The default container resource limits are very low. You can override them:
from flytekit import Resources, task

@task(limits=Resources(mem="256Mi"))
def your_task(...
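For readers wondering why the pod was reported as OOMKilled with exit code 137: container exit codes above 128 conventionally mean "terminated by signal (code - 128)", and 137 - 128 = 9 is SIGKILL, which is what the kernel sends when a container exceeds its memory limit. A minimal sketch of that decoding, using only the standard library (the helper name `describe_exit_code` is made up for illustration and is not part of Flyte or Kubernetes):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a container exit code (hypothetical helper, not a Flyte API)."""
    if code > 128:
        # By shell convention, 128 + N means "killed by signal N".
        number = code - 128
        try:
            name = signal.Signals(number).name
        except ValueError:
            # Signal number not known on this platform; report it numerically.
            name = f"signal {number}"
        return f"killed by {name}"
    if code == 0:
        return "exited normally"
    return f"exited with error status {code}"


print(describe_exit_code(137))
```

If a task keeps hitting this, raising the memory limit as shown above is the usual first step.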
k
thank you @Michael Tinsley for helping folks - the core maintainers highly appreciate the help and the community effort
b
@Michael Tinsley thanks for your help. That solved my problem, but now I get a new error, even though the IAM role has full S3 bucket access.
Traceback (most recent call last):
  File "/usr/local/bin/pyflyte-fast-execute", line 8, in <module>
    sys.exit(fast_execute_task_cmd())
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/flytekit/bin/entrypoint.py", line 502, in fast_execute_task_cmd
    _download_distribution(additional_distribution, dest_dir)
  File "/usr/local/lib/python3.10/site-packages/flytekit/tools/fast_registration.py", line 111, in download_distribution
    FlyteContextManager.current_context().file_access.get_data(additional_distribution, os.path.join(destination, ""))
  File "/usr/local/lib/python3.10/site-packages/flytekit/core/data_persistence.py", line 456, in get_data
    raise FlyteAssertion(
flytekit.exceptions.user.FlyteAssertion: Failed to get data from s3://senn-ai-mlops-flyte/xa/flytesnacks/development/MZGUPI5PR6WYBPOECGZDQE2L4A======/scriptmode.tar.gz to /root/ (recursive=False).

Original exception: Access Denied
k
Every execution in Flyte is associated with a Kubernetes service account. This service account can have an IAM role associated with it (if you set up IAM roles for service accounts on EKS). Then ensure that the associated role has access to the metadata bucket as well as your data bucket. This makes it possible to not share any data with Flyte components and yet still allow sharing.
b
@Ketan (kumare3) I am using the same bucket for metadata and data. I am able to register the workflow.
storage:
    metadataContainer: "senn-ai-mlops-flyte"
    userDataContainer: "senn-ai-mlops-flyte"
    provider: s3
    providerConfig:
      s3:
        region: "eu-central-1"
        authType: "iam"
My pods should have access to the bucket.
AWS_DEFAULT_REGION : eu-central-1
AWS_METADATA_SERVICE_NUM_ATTEMPTS : 20
AWS_METADATA_SERVICE_TIMEOUT : 5
AWS_REGION : eu-central-1
AWS_ROLE_ARN : xxxxxxx/playground-eks-flyte-user
AWS_STS_REGIONAL_ENDPOINTS : regional
AWS_WEB_IDENTITY_TOKEN_FILE : /var/run/secrets/eks.amazonaws.com/serviceaccount/token
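The two variables that matter most in the listing above are AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE, which EKS injects into pods whose service account is annotated with a role. A rough sketch of a check that could be run inside a task container to confirm they are present (the helper name `missing_irsa_vars` is hypothetical, not a flytekit or AWS SDK function):

```python
import os
from typing import Mapping

# Environment variables injected for IAM Roles for Service Accounts (IRSA).
IRSA_VARS = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE")


def missing_irsa_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the IRSA variables that are unset or empty (hypothetical helper)."""
    return [name for name in IRSA_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_irsa_vars()
    print("missing:", missing if missing else "none")
```

Note that this only shows the pod was handed a role and a token. It does not prove that the role's trust policy accepts that service account, so an Access Denied error can still occur even when both variables are set.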
d
Hi @Bosco Raju, you could annotate the default service account for each namespace like this:
kubectl annotate serviceaccount -n flyte default eks.amazonaws.com/role-arn=arn:aws:iam::xxxx:role/playground-eks-flyte-user
In your case, check if you're using a different namespace name instead of flyte. Also double-check the S3-related permissions configured for your IAM role.
b
@David Espejo (he/him) I have annotated the default service account for each namespace. What was missing was including each namespace and service account in Trusted entities under the IAM role. My pods are running successfully now, but in the console I get the following error. Any idea why?
UNKNOWN::Outputs not generated by task execution
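For anyone hitting the same Access Denied issue, the Trusted entities fix described above amounts to listing every namespace and service account pair in the role's trust policy. A minimal sketch of what such a policy document looks like, generated with the standard library. The account ID, OIDC provider path, and namespace names below are placeholders and assumptions, not values from this thread; `irsa_trust_policy` is a hypothetical helper:

```python
import json


def irsa_trust_policy(account_id, oidc_provider, service_accounts):
    """Build an IRSA trust policy dict (hypothetical helper).

    oidc_provider: the cluster's OIDC issuer without the https:// prefix.
    service_accounts: iterable of (namespace, service_account_name) pairs.
    """
    subjects = [f"system:serviceaccount:{ns}:{sa}" for ns, sa in service_accounts]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                        # One entry per namespace/service account allowed to assume the role.
                        f"{oidc_provider}:sub": subjects,
                    }
                },
            }
        ],
    }


# Placeholder values: replace with your own account, OIDC issuer, and namespaces.
policy = irsa_trust_policy(
    "111122223333",
    "oidc.eks.eu-central-1.amazonaws.com/id/EXAMPLE",
    [("flyte", "default"), ("flytesnacks-development", "default")],
)
print(json.dumps(policy, indent=2))
```

The namespace flytesnacks-development assumes Flyte's usual one-namespace-per-project-and-domain layout. Check the namespaces in your own cluster before copying this.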
d
Could you share your workflow definition, and also the logs from the pod?
b
Logs from the pod:
{"asctime": "2023-03-08 20:44:45,474", "name": "flytekit", "levelname": "WARNING", "message": "FlyteSchema is deprecated, use Structured Dataset instead."}
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-lz93jpiy because the default path (/home/flytekit/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
tar: Removing leading `/' from member names
{"asctime": "2023-03-08 20:44:48,361", "name": "flytekit", "levelname": "WARNING", "message": "FlyteSchema is deprecated, use Structured Dataset instead."}
2023-03-08T20:44:49.338369856Z
Update: Plus, the flyte binary pod crashes after the run.
s
@Bosco Raju, what's the flytekit version?
b
@Samhita Alla I am using Flytekit Version: 1.4.0
s
We yanked 1.4.0. Can you install 1.4.1?
b
Thanks @Samhita Alla, upgrading to 1.4.1 fixed the problem.