Blair Anson
04/15/2023, 7:58 AM
pyflyte run --remote command fails with Handshake failed with fatal error SSL_ERROR_SSL.
$ FLYTE_SDK_LOGGING_LEVEL=20 pyflyte run --remote example.py training_workflow --hyperparameters '{"C": 0.1}'
{"asctime": "2023-04-15 16:54:36,923", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/blair/.flyte/config.yaml"}
{"asctime": "2023-04-15 16:54:36,950", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/blair/.flyte/config.yaml"}
{"asctime": "2023-04-15 16:54:36,954", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/blair/.flyte/config.yaml"}
{"asctime": "2023-04-15 16:54:37,937", "name": "flytekit", "levelname": "INFO", "message": "We won't register PyTorchCheckpointTransformer, PyTorchTensorTransformer, and PyTorchModuleTransformer because torch is not installed."}
{"asctime": "2023-04-15 16:54:38,379", "name": "flytekit", "levelname": "INFO", "message": "We won't register TensorFlowRecordFileTransformer, TensorFlowRecordsDirTransformer and TensorFlowModelTransformerbecause tensorflow is not installed."}
{"asctime": "2023-04-15 16:54:38,408", "name": "flytekit", "levelname": "INFO", "message": "We won't register bigquery handler for structured dataset because we can't find the packages google-cloud-bigquery-storage and google-cloud-bigquery"}
{"asctime": "2023-04-15 16:54:38,696", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/blair/.flyte/config.yaml"}
{"asctime": "2023-04-15 16:54:38,697", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/blair/.flyte/config.yaml"}
E0415 16:54:39.685038207 177107 ssl_transport_security.cc:1420] Handshake failed with fatal error SSL_ERROR_SSL: error:100000f7:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER.
E0415 16:54:40.191239374 177107 ssl_transport_security.cc:1420] Handshake failed with fatal error SSL_ERROR_SSL: error:100000f7:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER.
Failed with Exception: Reason: SYSTEM:Unknown
RPC Failed, with Status: StatusCode.UNAVAILABLE
details: failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:8088: Ssl handshake failed: SSL_ERROR_SSL: error:100000f7:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
Debug string UNKNOWN:failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:8088: Ssl handshake failed: SSL_ERROR_SSL: error:100000f7:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER {created_time:"2023-04-15T16:54:40.193233866+09:00", grpc_status:14}
I understand this error usually occurs when the .flyte/config.yaml and environment variable config are not correct. I have checked both, but I must be missing something obvious.
Here is my setup:
The remote cluster is AWS EKS running in a VPC.
Flyte was installed following the instructions in https://docs.flyte.org/en/latest/deployment/deployment/cloud_simple.html
Local ports are forwarded to these Flyte services:
kubectl -n flyte port-forward service/flyte-backend-flyte-binary-grpc 8089:8089 &
kubectl -n flyte port-forward service/flyte-backend-flyte-binary-http 8088:8088 &
Env vars...
$ echo $FLYTECTL_CONFIG
/home/blair/.flyte/config.yaml
$ echo $KUBECONFIG
:/home/blair/.kube/config
.flyte/config.yaml
admin:
  # For GRPC endpoints you might want to use dns:///flyte.myexample.com
  endpoint: dns:///localhost:8088
  authType: Pkce
  insecure: false
logger:
  show-source: true
  level: 0
insecure: true
I get the following error
$ FLYTE_SDK_LOGGING_LEVEL=20 pyflyte run --remote example.py training_workflow --hyperparameters '{"C": 0.1}'
{"asctime": "2023-04-15 17:00:10,016", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/blair/.flyte/config.yaml"}
{"asctime": "2023-04-15 17:00:10,032", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/blair/.flyte/config.yaml"}
{"asctime": "2023-04-15 17:00:10,034", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/blair/.flyte/config.yaml"}
{"asctime": "2023-04-15 17:00:11,255", "name": "flytekit", "levelname": "INFO", "message": "We won't register PyTorchCheckpointTransformer, PyTorchTensorTransformer, and PyTorchModuleTransformer because torch is not installed."}
{"asctime": "2023-04-15 17:00:11,875", "name": "flytekit", "levelname": "INFO", "message": "We won't register TensorFlowRecordFileTransformer, TensorFlowRecordsDirTransformer and TensorFlowModelTransformerbecause tensorflow is not installed."}
{"asctime": "2023-04-15 17:00:11,897", "name": "flytekit", "levelname": "INFO", "message": "We won't register bigquery handler for structured dataset because we can't find the packages google-cloud-bigquery-storage and google-cloud-bigquery"}
{"asctime": "2023-04-15 17:00:12,160", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/blair/.flyte/config.yaml"}
{"asctime": "2023-04-15 17:00:12,162", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/blair/.flyte/config.yaml"}
Failed with Exception: Reason: SYSTEM:Unknown
RPC Failed, with Status: StatusCode.UNAVAILABLE
details: failed to connect to all addresses; last error: INTERNAL: ipv4:127.0.0.1:8088: Trying to connect an http1.x server
Debug string UNKNOWN:failed to connect to all addresses; last error: INTERNAL: ipv4:127.0.0.1:8088: Trying to connect an http1.x server {created_time:"2023-04-15T17:00:13.277402386+09:00", grpc_status:14}
It looks like gRPC cannot connect, but the port forward seems to be working fine, as I can open the web console in a browser at http://localhost:8088/console
Ketan (kumare3)
jeev
Blair Anson
04/16/2023, 1:40 AM
I changed the endpoint port in .flyte/config.yaml, but I see the same error with port 8089:
$ FLYTE_SDK_LOGGING_LEVEL=20 pyflyte run --remote example.py training_workflow --hyperparameters '{"C": 0.1}'
{"asctime": "2023-04-16 10:37:01,444", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/blair/.flyte/config.yaml"}
{"asctime": "2023-04-16 10:37:01,467", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/blair/.flyte/config.yaml"}
{"asctime": "2023-04-16 10:37:01,470", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/blair/.flyte/config.yaml"}
{"asctime": "2023-04-16 10:37:02,739", "name": "flytekit", "levelname": "INFO", "message": "We won't register PyTorchCheckpointTransformer, PyTorchTensorTransformer, and PyTorchModuleTransformer because torch is not installed."}
{"asctime": "2023-04-16 10:37:03,340", "name": "flytekit", "levelname": "INFO", "message": "We won't register TensorFlowRecordFileTransformer, TensorFlowRecordsDirTransformer and TensorFlowModelTransformerbecause tensorflow is not installed."}
{"asctime": "2023-04-16 10:37:03,382", "name": "flytekit", "levelname": "INFO", "message": "We won't register bigquery handler for structured dataset because we can't find the packages google-cloud-bigquery-storage and google-cloud-bigquery"}
{"asctime": "2023-04-16 10:37:03,707", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/blair/.flyte/config.yaml"}
{"asctime": "2023-04-16 10:37:03,709", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/blair/.flyte/config.yaml"}
E0416 10:37:04.581003542 260714 ssl_transport_security.cc:1420] Handshake failed with fatal error SSL_ERROR_SSL: error:100000f7:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER.
E0416 10:37:04.878786218 260714 ssl_transport_security.cc:1420] Handshake failed with fatal error SSL_ERROR_SSL: error:100000f7:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER.
Failed with Exception: Reason: SYSTEM:Unknown
RPC Failed, with Status: StatusCode.UNAVAILABLE
details: failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:8089: Ssl handshake failed: SSL_ERROR_SSL: error:100000f7:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
Debug string UNKNOWN:failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:8089: Ssl handshake failed: SSL_ERROR_SSL: error:100000f7:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER {created_time:"2023-04-16T10:37:04.880480722+09:00", grpc_status:14}
jeev
insecure?
Blair Anson
04/16/2023, 1:45 AM
false
$ cat ~/.flyte/config.yaml
admin:
  # For GRPC endpoints you might want to use dns:///flyte.myexample.com
  endpoint: dns:///localhost:8089
  authType: Pkce
  insecure: false
logger:
  show-source: true
  level: 0
jeev
insecure: true
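The two connection failures in this thread point at two separate settings. WRONG_VERSION_NUMBER means the client started a TLS handshake but the endpoint behind the port forward answered in plaintext, which insecure: true addresses. "Trying to connect an http1.x server" means the plaintext gRPC client reached an HTTP/1.x listener, i.e. the HTTP port 8088 rather than the gRPC port 8089. The helper below is a hypothetical sketch (not part of flytekit or pyflyte) that simply encodes that mapping from the error text, assuming the plaintext kubectl port-forward setup described above.

```python
"""Hypothetical helper: map gRPC connection errors seen with pyflyte to likely config fixes."""

from typing import Optional

# Substrings copied from the error output in this thread, each paired with the
# config change that resolved it here (assumption: plaintext kubectl
# port-forward to the flyte-binary grpc/http services).
_DIAGNOSES = (
    (
        "WRONG_VERSION_NUMBER",
        "Client attempted TLS against a plaintext endpoint; set admin.insecure: true.",
    ),
    (
        "Trying to connect an http1.x server",
        "Endpoint is an HTTP/1.x port; point admin.endpoint at the gRPC port (8089 here).",
    ),
)


def diagnose(details: str) -> Optional[str]:
    """Return a suggested fix for a gRPC error string, or None if it is not recognised."""
    for needle, advice in _DIAGNOSES:
        if needle in details:
            return advice
    return None


if __name__ == "__main__":
    sample = (
        "failed to connect to all addresses; last error: INTERNAL: "
        "ipv4:127.0.0.1:8088: Trying to connect an http1.x server"
    )
    print(diagnose(sample))
```

With both settings applied, the admin block here ends up with endpoint: dns:///localhost:8089 and insecure: true, after which the run gets past the gRPC connection and fails later on S3 instead.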
Blair Anson
04/16/2023, 2:03 AM
$ FLYTE_SDK_LOGGING_LEVEL=20 pyflyte run --remote example.py training_workflow --hyperparameters '{"C": 0.1}'
{"asctime": "2023-04-16 10:53:40,037", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/blair/.flyte/config.yaml"}
{"asctime": "2023-04-16 10:53:40,058", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/blair/.flyte/config.yaml"}
{"asctime": "2023-04-16 10:53:40,060", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/blair/.flyte/config.yaml"}
{"asctime": "2023-04-16 10:53:41,334", "name": "flytekit", "levelname": "INFO", "message": "We won't register PyTorchCheckpointTransformer, PyTorchTensorTransformer, and PyTorchModuleTransformer because torch is not installed."}
{"asctime": "2023-04-16 10:53:41,924", "name": "flytekit", "levelname": "INFO", "message": "We won't register TensorFlowRecordFileTransformer, TensorFlowRecordsDirTransformer and TensorFlowModelTransformerbecause tensorflow is not installed."}
{"asctime": "2023-04-16 10:53:41,971", "name": "flytekit", "levelname": "INFO", "message": "We won't register bigquery handler for structured dataset because we can't find the packages google-cloud-bigquery-storage and google-cloud-bigquery"}
{"asctime": "2023-04-16 10:53:42,300", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/blair/.flyte/config.yaml"}
{"asctime": "2023-04-16 10:53:42,302", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/blair/.flyte/config.yaml"}
Failed with Exception: Reason: USER:ValueError
Value error! Received: 403. Request to send data https://meta-bucket.s3.us-west-2.amazonaws.com/flytesnacks/development/4FSJB7UHCV36ICGGUNFBJCEDZU%3D%3D%3D%3D%3D%3D/script_mode.tar.gz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=xxxxx(redacted)xxxx failed
Hmm, eks-starter.yaml does have a role with S3 permissions for that meta bucket:
serviceAccount:
  create: true
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::xxx(redacted)xxx:role/flyte-role"
Annotations: eks.amazonaws.com/role-arn: arn:aws:iam::xxx(redacted)xxx:role/flyte-role
but it is not there, so I have to assume the service account creation is not adding the IAM role.
$ kubectl describe serviceaccount default -n flyte
Name: default
Namespace: flyte
Labels: <none>
Annotations: <none>
Image pull secrets: <none>
Mountable secrets: default-token-t29lf
Tokens: default-token-t29lf
Events: <none>
$ kubectl describe serviceaccount flyte-backend-flyte-binary -n flyte
Name: flyte-backend-flyte-binary
Namespace: flyte
Labels: app.kubernetes.io/instance=flyte-backend
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=flyte-binary
app.kubernetes.io/version=1.16.0
helm.sh/chart=flyte-binary-v1.5.0
Annotations: meta.helm.sh/release-name: flyte-backend
meta.helm.sh/release-namespace: flyte
Image pull secrets: <none>
Mountable secrets: flyte-backend-flyte-binary-token-4pxfl
Tokens: flyte-backend-flyte-binary-token-4pxfl
Events: <none>
jeev
Blair Anson
04/16/2023, 4:21 AM
$ kubectl describe serviceaccount flyte-backend-flyte-binary -n flyte
Name: flyte-backend-flyte-binary
Namespace: flyte
Labels: app.kubernetes.io/instance=flyte-backend
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=flyte-binary
app.kubernetes.io/version=1.16.0
helm.sh/chart=flyte-binary-v1.5.0
Annotations: eks.amazonaws.com/role-arn: arn:aws:iam::xxxx:role/poc-eks-flyte3_iamserviceaccount_role
meta.helm.sh/release-name: flyte-backend
meta.helm.sh/release-namespace: flyte
Image pull secrets: <none>
Mountable secrets: flyte-backend-flyte-binary-token-4pxfl
Tokens: flyte-backend-flyte-binary-token-4pxfl
Events: <none>
I tried pyflyte again but still get an S3 error
$ FLYTE_SDK_LOGGING_LEVEL=20 pyflyte run --remote example.py training_workflow --hyperparameters '{"C": 0.1}'
{"asctime": "2023-04-16 13:04:29,645", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/blair/.flyte/config.yaml"}
{"asctime": "2023-04-16 13:04:29,663", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/blair/.flyte/config.yaml"}
{"asctime": "2023-04-16 13:04:29,666", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/blair/.flyte/config.yaml"}
{"asctime": "2023-04-16 13:04:30,941", "name": "flytekit", "levelname": "INFO", "message": "We won't register PyTorchCheckpointTransformer, PyTorchTensorTransformer, and PyTorchModuleTransformer because torch is not installed."}
{"asctime": "2023-04-16 13:04:31,587", "name": "flytekit", "levelname": "INFO", "message": "We won't register TensorFlowRecordFileTransformer, TensorFlowRecordsDirTransformer and TensorFlowModelTransformerbecause tensorflow is not installed."}
{"asctime": "2023-04-16 13:04:31,631", "name": "flytekit", "levelname": "INFO", "message": "We won't register bigquery handler for structured dataset because we can't find the packages google-cloud-bigquery-storage and google-cloud-bigquery"}
{"asctime": "2023-04-16 13:04:31,994", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/blair/.flyte/config.yaml"}
{"asctime": "2023-04-16 13:04:31,996", "name": "flytekit", "levelname": "INFO", "message": "Using flytectl/YAML config /home/blair/.flyte/config.yaml"}
Failed with Exception: Reason: USER:ValueError
Value error! Received: 403. Request to send data https://meta-bucket.s3.us-west-2.amazonaws.com/flytesnacks/development/4FSJB7UHCV36ICGGUNFBJCEDZU%3D%3D%3D%3D%3D%3D/script_mode.tar.gz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=xxxxx(redacted)xxxx failed
The IAM service account looks like this:
$ kubectl describe serviceaccount flyte-backend-flyte-binary -n flyte
Name: flyte-backend-flyte-binary
Namespace: flyte
Labels: app.kubernetes.io/instance=flyte-backend
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=flyte-binary
app.kubernetes.io/version=1.16.0
helm.sh/chart=flyte-binary-v1.5.0
Annotations: eks.amazonaws.com/role-arn: arn:aws:iam::xxxx:role/poc-eks-flyte3_iamserviceaccount_role
meta.helm.sh/release-name: flyte-backend
meta.helm.sh/release-namespace: flyte
Image pull secrets: <none>
Mountable secrets: flyte-backend-flyte-binary-token-4pxfl
Tokens: flyte-backend-flyte-binary-token-4pxfl
Events: <none>
The trust relationship for the role looks like this...
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::xxxx:oidc-provider/oidc.eks.us-west-2.amazonaws.com/id/XXXXXXXXX"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-west-2.amazonaws.com/id/XXXXXXXXX:sub": "system:serviceaccount:flyte:flyte-backend-flyte-binary",
          "oidc.eks.us-west-2.amazonaws.com/id/XXXXXXXXX:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
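The sub condition in a trust policy like this pins the role to exactly one Kubernetes service account, written as system:serviceaccount:<namespace>:<service-account>. A pod running under any other namespace or service account cannot assume the role, even if the role's own S3 permissions are correct. The sketch below builds such a policy and checks whether a given namespace/service account would be admitted. The function names and the account ID 111122223333 are illustrative placeholders, and the check only handles exact StringEquals matches, not StringLike wildcards.

```python
"""Sketch: build an EKS IRSA trust policy and check which service accounts it admits."""

import json


def irsa_trust_policy(account_id: str, oidc_issuer: str, namespace: str, service_account: str) -> dict:
    """Trust policy letting one Kubernetes service account assume the role via web identity.

    oidc_issuer is the provider host/path without scheme,
    e.g. "oidc.eks.us-west-2.amazonaws.com/id/XXXXXXXXX".
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_issuer}"},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{oidc_issuer}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                        f"{oidc_issuer}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


def admits(policy: dict, oidc_issuer: str, namespace: str, service_account: str) -> bool:
    """True if some statement's StringEquals sub condition names exactly this namespace/service account."""
    wanted = f"system:serviceaccount:{namespace}:{service_account}"
    for stmt in policy["Statement"]:
        conditions = stmt.get("Condition", {}).get("StringEquals", {})
        if conditions.get(f"{oidc_issuer}:sub") == wanted:
            return True
    return False


if __name__ == "__main__":
    issuer = "oidc.eks.us-west-2.amazonaws.com/id/XXXXXXXXX"
    policy = irsa_trust_policy("111122223333", issuer, "flyte", "flyte-backend-flyte-binary")
    print(json.dumps(policy, indent=2))
    # A pod using the default service account in another namespace is not admitted.
    print(admits(policy, issuer, "flytesnacks-development", "default"))
```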
jeev
Does the poc-eks-flyte3_iamserviceaccount_role IAM role have permissions on the meta-bucket S3 bucket?
Blair Anson
04/16/2023, 4:46 AM
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "s3:ListStorageLensConfigurations",
        "s3:ListAccessPointsForObjectLambda",
        "s3:GetAccessPoint",
        "s3:PutAccountPublicAccessBlock",
        "s3:GetAccountPublicAccessBlock",
        "s3:ListAllMyBuckets",
        "s3:ListAccessPoints",
        "s3:PutAccessPointPublicAccessBlock",
        "s3:ListJobs",
        "s3:PutStorageLensConfiguration",
        "s3:ListMultiRegionAccessPoints",
        "s3:CreateJob"
      ],
      "Resource": "*"
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::meta-bucket",
        "arn:aws:s3:::user-bucket"
      ]
    }
...
jeev
{
  "Sid": "VisualEditor1",
  "Effect": "Allow",
  "Action": "s3:*",
  "Resource": [
    "arn:aws:s3:::meta-bucket",
    "arn:aws:s3:::meta-bucket/*",
    "arn:aws:s3:::user-bucket",
    "arn:aws:s3:::user-bucket/*"
  ]
}
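The only change from the earlier policy is the /* entries. In IAM, object-level actions such as s3:GetObject and s3:PutObject are authorized against object ARNs (arn:aws:s3:::bucket/key), while the bare bucket ARN covers bucket-level actions such as s3:ListBucket. That is why granting s3:* on arn:aws:s3:::meta-bucket alone still produced a 403 on the script_mode.tar.gz upload. A minimal sketch that generates the statement for any list of buckets, keeping the broad s3:* action used in this thread (the function name is hypothetical):

```python
"""Sketch: IAM statement granting S3 access to buckets and to the objects inside them."""

import json
from typing import Iterable


def s3_full_access_statement(buckets: Iterable[str], sid: str = "VisualEditor1") -> dict:
    resources = []
    for bucket in buckets:
        resources.append(f"arn:aws:s3:::{bucket}")    # bucket-level actions, e.g. s3:ListBucket
        resources.append(f"arn:aws:s3:::{bucket}/*")  # object-level actions, e.g. s3:GetObject, s3:PutObject
    return {"Sid": sid, "Effect": "Allow", "Action": "s3:*", "Resource": resources}


if __name__ == "__main__":
    print(json.dumps(s3_full_access_statement(["meta-bucket", "user-bucket"]), indent=2))
```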
Blair Anson
04/16/2023, 5:10 AM
Pod failed. No message received from kubernetes.
[feccea785a4c046ee848-n0-0] terminated with exit code (137). Reason [OOMKilled]. Message:
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-hk2xuhf_ because the default path (/home/flytekit/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
@task(limits=Resources(mem="256Mi"))
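By the usual shell and container-runtime convention, an exit code above 128 means the process was killed by signal (code - 128). Exit code 137 is therefore 128 + 9, SIGKILL, which is what the kernel OOM killer sends when a container exceeds its memory limit; that matches the OOMKilled reason and is why raising the memory limit via Resources(mem=...) is the fix. A small stdlib sketch (hypothetical helper name) that decodes such exit codes:

```python
"""Decode a container exit code into the signal that killed it (128 + N convention)."""

import signal
from typing import Optional


def killing_signal(exit_code: int) -> Optional[str]:
    """Return the signal name for exit codes above 128, else None (a normal exit)."""
    if exit_code > 128:
        try:
            return signal.Signals(exit_code - 128).name
        except ValueError:
            # No such signal number on this platform.
            return None
    return None


if __name__ == "__main__":
    print(killing_signal(137))
```

By contrast, the exit code (1) in the next failure is an ordinary error exit from the task process, not a signal, so it points at an application error such as the S3 PermissionError rather than at resource limits.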
The next error in the workflow appears to be an S3 permission error:
[1/1] currentAttempt done. Last Error: USER::Pod failed. No message received from kubernetes.
[f9ecd5cb543bf4a68b9d-n0-0] terminated with exit code (1). Reason [Error]. Message:
on3.8/asyncio/tasks.py", line 455, in wait_for
return await fut
File "/usr/local/lib/python3.8/site-packages/s3fs/core.py", line 1171, in _get_file
body, content_length = await _open_file(range=0)
File "/usr/local/lib/python3.8/site-packages/s3fs/core.py", line 1162, in _open_file
resp = await self._call_s3(
File "/usr/local/lib/python3.8/site-packages/s3fs/core.py", line 347, in _call_s3
return await _error_wrapper(
File "/usr/local/lib/python3.8/site-packages/s3fs/core.py", line 139, in _error_wrapper
raise err
PermissionError: Access Denied
"system:serviceaccount:flyte:flyte-backend-flyte-binary"?
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
flyte flyte-backend-flyte-binary-74b4dbb9ff-ktmns 1/1 Running 0 11m
flytesnacks-development f9ecd5cb543bf4a68b9d-n0-0 0/1 Error 0 5m29s
I created an IAM service account in flytesnacks-development, but the Access Denied error still occurs:
eksctl create iamserviceaccount \
--name flytesnacks-development-role-sa \
--namespace flytesnacks-development \
--cluster poc-eks-flyte3 \
--attach-policy-arn arn:aws:iam::xxxx:policy/flyte-policy \
--approve \
--role-name poc-eks-flyte3_flytesnacks-development_iamserviceaccount_role
jeev
Blair Anson
04/16/2023, 6:58 AM
$ kubectl describe sa flytesnacks-development-role-sa -n flytesnacks-development
Name: flytesnacks-development-role-sa
Namespace: flytesnacks-development
Labels: app.kubernetes.io/managed-by=eksctl
Annotations: eks.amazonaws.com/role-arn: arn:aws:iam::xxxx:role/poc-eks-flyte3_flytesnacks-development_iamserviceaccount_role
Image pull secrets: <none>
Mountable secrets: flytesnacks-development-role-sa-token-s4ltw
Tokens: flytesnacks-development-role-sa-token-s4ltw
Events: <none>
Here is the description of the pod with the error:
kubectl describe pod fd049603a9d0f4a58a91-n0-0 -n flytesnacks-development
Name: fd049603a9d0f4a58a91-n0-0
Namespace: flytesnacks-development
Priority: 0
Node: ip-10-1-1-49.us-west-2.compute.internal/10.1.1.49
Start Time: Sun, 16 Apr 2023 15:24:47 +0900
Labels: domain=development
execution-id=fd049603a9d0f4a58a91
interruptible=false
node-id=n0
project=flytesnacks
shard-key=6
task-name=example-get-data
workflow-name=example-training-workflow
Annotations: cluster-autoscaler.kubernetes.io/safe-to-evict: false
kubernetes.io/psp: eks.privileged
....
I'm trying to use flytekitplugins.papermill in the example here: https://docs.flyte.org/projects/cookbook/en/latest/auto/case_studies/feature_engineering/eda/notebook.html
It works locally, but when run remotely it fails with ModuleNotFoundError: No module named 'flytekitplugins.papermill'.
I don't see any instructions for installing the papermill plugin on the cluster in the link below, so how is that module meant to be installed?
https://docs.flyte.org/en/latest/deployment/plugins/k8s/index.html
jeev
Blair Anson
04/16/2023, 2:09 PM
jeev
Blair Anson
04/16/2023, 2:48 PM
$ pip freeze | grep flyte
flyteidl==1.3.17
flytekit==1.4.1
flytekitplugins-papermill==1.5.0
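A ModuleNotFoundError that appears remotely but not locally means the container image the task runs in lacks the plugin, even though the local environment shown by pip freeze has it (note that the local plugin 1.5.0 and flytekit 1.4.1 versions also differ). A quick way to confirm is to run a check inside the image itself. The helper below is a hypothetical sketch using importlib.util.find_spec, which locates a module without importing it:

```python
"""Hypothetical helper: report which modules cannot be found in the current Python environment."""

import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # find_spec imports the parent package of a dotted name, so a
            # missing parent (e.g. flytekitplugins) is raised here instead.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(missing_modules(["flytekit", "flytekitplugins.papermill"]))
```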
jeev
Blair Anson
04/16/2023, 3:17 PM
pyflyte --image xxxx run xxxx
I also see @task(container_image="xxxx"), but I don't see the same option for NotebookTask. Is it possible to pass in the image to the NotebookTask?
Also, is there a way to change the default task image, so I don't have to override the image being used?
jeev
Blair Anson
04/16/2023, 3:25 PM
FROM ghcr.io/flyteorg/flytekit:py3.8-latest
USER root
RUN pip install -U flytekitplugins-papermill
USER flytekit
[1/1] currentAttempt done. Last Error: USER::Pod failed. No message received from kubernetes.
[f1cedfad3740243a8801-n0-0] terminated with exit code (1). Reason [Error]. Message:
r(InstanceTrackingMeta, cls).__call__(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/flytekitplugins/papermill/task.py", line 140, in __init__
raise ValueError(f"Illegal notebook path passed in {self._notebook_path}")
ValueError: Illegal notebook path passed in /root/supermarket_regression.ipynb
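The traceback shows the error is raised from NotebookTask.__init__, which appears to validate notebook_path when the task object is constructed, so the .ipynb has to exist at that path inside the container. One way to check whether a file actually made it into the fast-register archive (the script_mode.tar.gz uploaded to the metadata bucket) is to list the archive's members. The sketch below assumes a local copy of the archive; the function name is hypothetical.

```python
"""Sketch: list the members of a .tar.gz archive and report expected files that are absent."""

import tarfile
from typing import Iterable, List


def missing_from_archive(archive_path: str, expected: Iterable[str]) -> List[str]:
    """Return the expected basenames that do not appear anywhere in the gzipped tar archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        basenames = {name.rsplit("/", 1)[-1] for name in tar.getnames()}
    return [name for name in expected if name not in basenames]
```

For example, missing_from_archive("script_mode.tar.gz", ["workflow.py", "supermarket_regression.ipynb"]) on a downloaded copy returns the files that were not packaged.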
jeev
Blair Anson
04/16/2023, 3:51 PM
s3://meta-bucket/flytesnacks/development/BXAKRFPKUSL47MC36AP7GTZZDA======/scriptmode.tar.gz
Inside it is only the workflow.py file.
jeev
Blair Anson
04/16/2023, 3:53 PM
jeev
Blair Anson
04/16/2023, 4:23 PM
pyflyte register -d development -p flytesnacks -i xxxxxx.dkr.ecr.us-west-2.amazonaws.com/flyteorg/flytekit:py3.8-latest ./
How do I run using that tarball? I don't see an option in pyflyte run.
jeev
Ketan (kumare3)
Blair Anson
04/16/2023, 4:35 PM
pyflyte run now supports multiple files
What is the syntax for multiple files? I tried this but get an error:
$ pyflyte run --remote --image xxxx.dkr.ecr.us-west-2.amazonaws.com/flyteorg/flytekit:py3.8-latest workflow1.py notebook_wf --n_estimators 100 supermarket_regression.ipynb
Ketan (kumare3)
Blair Anson
04/16/2023, 4:56 PM
Ketan (kumare3)
Blair Anson
04/16/2023, 5:26 PM
I used pyflyte register, but could not see how to then run the registered file. Do you have an example command you could share?
https://flyte-org.slack.com/archives/CP2HDHKE1/p1681662219041209?thread_ts=1681545516.802469&cid=CP2HDHKE1
Ketan (kumare3)
Blair Anson
04/17/2023, 3:11 AM
NotebookTask(render_deck=True, ....
I was expecting this...
but got this...
Ketan (kumare3)
Blair Anson
04/17/2023, 5:29 AM
Ketan (kumare3)
Samhita Alla
Blair Anson
04/17/2023, 7:31 AM
Samhita Alla
Could you add disable_deck=False to your @task decorator? Forgot to mention that you need to enable it.
from flytekit import task

@task(disable_deck=False)
def t1() -> str:
    ...
Blair Anson
04/17/2023, 12:33 PM
With disable_deck=False, it now displays the rendered notebook. Thank you!
https://docs.flyte.org/projects/cookbook/en/latest/auto/integrations/flytekit_plugins/papermilltasks/simple.html
import os
import pathlib

from flytekit import Resources, kwtypes
from flytekitplugins.papermill import NotebookTask

nb = NotebookTask(
    name="pipeline-nb",
    notebook_path=os.path.join(
        pathlib.Path(__file__).parent.absolute(), "supermarket_regression.ipynb"
    ),
    inputs=kwtypes(
        n_estimators=int,
        max_depth=int,
        max_features=str,
        min_samples_split=int,
        random_state=int,
    ),
    outputs=kwtypes(mae_score=float),
    requests=Resources(cpu="2", mem="1Gi"),
    render_deck=True,
)
Ketan (kumare3)
Blair Anson
04/22/2023, 12:33 PM
NotebookTask?
I have been using a PodTemplate to set the default Docker image for a normal @task(), as per the link below. However, the NotebookTask ignores the image setting in the PodTemplate, although it does apply other settings such as VolumeMount. How do I change the default Docker image for a NotebookTask without using pyflyte --image xxxx run xxxx?
https://docs.flyte.org/en/latest/deployment/configuration/general.html#using-default-k8s-podtemplates