narrow-helicopter-35399
01/12/2024, 8:44 AM
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=172.16.8.240 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner local:///usr/local/bin/entrypoint.py pyflyte-execute --inputs s3://my-s3-bucket/metadata/propeller/flytesnacks-development-axcw4r59m9s9t8bjswq7/n0/data/inputs.pb --output-prefix s3://my-s3-bucket/metadata/propeller/flytesnacks-development-axcw4r59m9s9t8bjswq7/n0/data/0 --raw-output-data-prefix s3://my-s3-bucket/data/h0/axcw4r59m9s9t8bjswq7-n0-0 --checkpoint-path s3://my-s3-bucket/data/h0/axcw4r59m9s9t8bjswq7-n0-0/_flytecheckpoints --prev-checkpoint '""' --resolver flytekit.core.python_auto_container.default_task_resolver -- task-module workflows.pyspark_example task-name hello_spark
bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
24/01/12 08:14:17 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Getting s3://my-s3-bucket/metadata/propeller/flytesnacks-development-axcw4r59m9s9t8bjswq7/n0/data/inputs.pb to /tmp/flyte-wk4v2nip/sandbox/local_flytekit/inputs.pb
{"asctime": "2024-01-12 08:14:20,715", "name": "flytekit.entrypoint", "levelname": "ERROR", "message": "Exception when executing task workflows.pyspark_example.hello_spark, reason Failed to get data from s3://my-s3-bucket/metadata/propeller/flytesnacks-development-axcw4r59m9s9t8bjswq7/n0/data/inputs.pb to /tmp/flyte-wk4v2nip/sandbox/local_flytekit/inputs.pb (recursive=False).\n\nOriginal exception: Unable to locate credentials"}
{"asctime": "2024-01-12 08:14:20,715", "name": "flytekit.entrypoint", "levelname": "ERROR", "message": "!! Begin Unknown System Error Captured by Flyte !!"}
Unable to locate credentials
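The NoCredentialsError above comes from botocore's credential chain finding nothing inside the driver pod. A minimal sanity check, assuming botocore is present in the task image (it ships with flytekit's S3 dependencies), is to exec into the failing pod and run:

import botocore.session

# get_credentials() returns None when no env vars, config files,
# or instance metadata resolve to usable credentials
print(botocore.session.get_session().get_credentials())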
We deploy flyte-sandbox through Helm. Are we setting the wrong MinIO parameters in the values.yaml file?
values.yaml
docker-registry:
  enabled: false
  image:
    registry: harbor.linecorp.com/ecacda
    repository: cr.flyte.org/flyteorg/registry
    tag: 2.8.1
    pullPolicy: Always
  persistence:
    enabled: false
  service:
    type: NodePort
    nodePort: 30000
flyte-binary:
  nameOverride: flyte-sandbox
  enabled: true
  configuration:
    database:
      host: '{{ printf "%s-postgresql" .Release.Name | trunc 63 | trimSuffix "-" }}'
      password: postgres
    storage:
      metadataContainer: my-s3-bucket
      userDataContainer: my-s3-bucket
      provider: s3
      providerConfig:
        s3:
          disableSSL: true
          v2Signing: true
          endpoint: http://{{ printf "%s-minio" .Release.Name | trunc 63 | trimSuffix "-" }}.{{ .Release.Namespace }}:9000
          authType: accesskey
          accessKey: minio
          secretKey: miniostorage
    logging:
      level: 6
      plugins:
        kubernetes:
          enabled: true
          templateUri: |-
            http://localhost:30080/kubernetes-dashboard/#/log/{{ .namespace }}/{{ .podName }}/pod?namespace={{ .namespace }}
    inline:
      storage:
        signedURL:
          stowConfigOverride:
            endpoint: http://10.227.231.9:30003
      plugins:
        k8s:
          default-env-vars:
            - FLYTE_AWS_ENDPOINT: http://{{ printf "%s-minio" .Release.Name | trunc 63 | trimSuffix "-" }}.{{ .Release.Namespace }}:9000
            - FLYTE_AWS_ACCESS_KEY_ID: minio
            - FLYTE_AWS_SECRET_ACCESS_KEY: miniostorage
        cluster_resources:
          refreshInterval: 5m
          customData:
            - production:
                - projectQuotaCpu:
                    value: "5"
                - projectQuotaMemory:
                    value: "4000Mi"
            - staging:
                - projectQuotaCpu:
                    value: "2"
                - projectQuotaMemory:
                    value: "3000Mi"
            - development:
                - projectQuotaCpu:
                    value: "4"
                - projectQuotaMemory:
                    value: "5000Mi"
          refresh: 5m
    inlineConfigMap: '{{ include "flyte-sandbox.configuration.inlineConfigMap" . }}'
  clusterResourceTemplates:
    inlineConfigMap: '{{ include "flyte-sandbox.clusterResourceTemplates.inlineConfigMap" . }}'
  deployment:
    image:
      repository: harbor.linecorp.com/ecacda/flyteorg/flyte-binary
      tag: sha-7b3-a90
      pullPolicy: Always
    waitForDB:
      image:
        repository: harbor.linecorp.com/ecacda/cr.flyte.org/flyteorg/bitnami/postgresql
        tag: 15.1.0-debian-11-r20
        pullPolicy: Always
  rbac:
    # This is strictly NOT RECOMMENDED in production clusters, and is only for use
    # within local Flyte sandboxes.
    # When using cluster resource templates to create additional namespaced roles,
    # Flyte is required to have a superset of those permissions. To simplify
    # experimenting with new backend plugins that require additional roles be created
    # with cluster resource templates (e.g. Spark), we add the following:
    extraRules:
      - apiGroups:
          - '*'
        resources:
          - '*'
        verbs:
          - '*'
kubernetes-dashboard:
  enabled: false
  image:
    tag: sandbox
    pullPolicy: Never
  extraArgs:
    - --enable-insecure-login
    - --enable-skip-login
  protocolHttp: true
  service:
    externalPort: 80
  rbac:
    create: true
    clusterRoleMetrics: false
    clusterReadOnlyRole: true
minio:
  enabled: true
  image:
    registry: harbor.linecorp.com/ecacda
    repository: cr.flyte.org/flyteorg/bitnami/minio
    tag: 2023.1.25-debian-11-r0
    pullPolicy: Always
  auth:
    rootUser: minio
    rootPassword: miniostorage
  defaultBuckets: my-s3-bucket
  extraEnvVars:
    - name: MINIO_BROWSER_REDIRECT_URL
      value: http://localhost:30080/minio
  service:
    type: NodePort
    nodePorts:
      api: 30003
  persistence:
    enabled: true
    existingClaim: '{{ include "flyte-sandbox.persistence.minioVolumeName" . }}'
  volumePermissions:
    enabled: true
    image:
      registry: harbor.linecorp.com/ecacda
      repository: cr.flyte.org/flyteorg/bitnami/bitnami-shell
      tag: 11-debian-11-r76
      pullPolicy: Always
postgresql:
  enabled: true
  image:
    registry: harbor.linecorp.com/ecacda
    repository: cr.flyte.org/flyteorg/bitnami/postgresql
    tag: 15.1.0-debian-11-r20
    pullPolicy: Always
  auth:
    postgresPassword: postgres
  shmVolume:
    enabled: false
  primary:
    service:
      type: NodePort
      nodePorts:
        postgresql: 30001
    persistence:
      enabled: true
      existingClaim: '{{ include "flyte-sandbox.persistence.dbVolumeName" . }}'
  volumePermissions:
    enabled: true
    image:
      registry: harbor.linecorp.com/ecacda
      repository: cr.flyte.org/flyteorg/bitnami/bitnami-shell
      tag: 11-debian-11-r76
      pullPolicy: Always
sandbox:
  # dev Routes requests to an instance of Flyte running locally on a developer's
  # development environment. This is only usable if the flyte-binary chart is disabled.
  dev: true
  buildkit:
    enabled: true
    image:
      repository: moby/buildkit
      tag: buildx-stable-1
      pullPolicy: Always
  proxy:
    enabled: true
    image:
      repository: envoyproxy/envoy
      tag: v1.23-latest
      pullPolicy: Always
narrow-helicopter-35399
01/12/2024, 8:51 AM
narrow-helicopter-35399
01/12/2024, 8:54 AM
apiVersion: v1
data:
  000-core.yaml: |
    admin:
      endpoint: localhost:8089
      insecure: true
    catalog-cache:
      endpoint: localhost:8081
      insecure: true
      type: datacatalog
    cluster_resources:
      standaloneDeployment: false
      templatePath: /etc/flyte/cluster-resource-templates
    logger:
      show-source: true
      level: 6
    propeller:
      create-flyteworkflow-crd: true
    webhook:
      certDir: /var/run/flyte/certs
      localCert: true
      secretName: flyte-flyte-sandbox-webhook-secret
      serviceName: flyte-flyte-sandbox-webhook
      servicePort: 443
    flyte:
      admin:
        disableClusterResourceManager: false
        disableScheduler: false
        disabled: false
        seedProjects:
          - flytesnacks
      dataCatalog:
        disabled: false
      propeller:
        disableWebhook: false
        disabled: false
  001-plugins.yaml: |
    tasks:
      task-plugins:
        default-for-task-types:
          container: container
          container_array: k8s-array
          sidecar: sidecar
          spark: spark
        enabled-plugins:
          - container
          - sidecar
          - k8s-array
          - agent-service
          - spark
    plugins:
      logs:
        kubernetes-enabled: true
        kubernetes-template-uri: http://localhost:30080/kubernetes-dashboard/#/log/{{ .namespace }}/{{ .podName }}/pod?namespace={{ .namespace }}
        cloudwatch-enabled: false
        stackdriver-enabled: false
      k8s:
        co-pilot:
          image: "cr.flyte.org/flyteorg/flytecopilot-release:v1.10.7-b0"
      k8s-array:
        logs:
          config:
            kubernetes-enabled: true
            kubernetes-template-uri: http://localhost:30080/kubernetes-dashboard/#/log/{{ .namespace }}/{{ .podName }}/pod?namespace={{ .namespace }}
            cloudwatch-enabled: false
            stackdriver-enabled: false
  002-database.yaml: |
    database:
      postgres:
        username: postgres
        host: flyte-postgresql
        port: 5432
        dbname: flyte
        options: "sslmode=disable"
  003-storage.yaml: |
    propeller:
      rawoutput-prefix: s3://my-s3-bucket/data
    storage:
      type: stow
      stow:
        kind: s3
        config:
          region: us-east-1
          disable_ssl: true
          v2_signing: true
          endpoint: http://flyte-minio.flyte:9000
          auth_type: accesskey
      container: my-s3-bucket
  100-inline-config.yaml: |
    plugins:
      cluster_resources:
        customData:
          - production:
              - projectQuotaCpu:
                  value: "5"
              - projectQuotaMemory:
                  value: 4000Mi
          - staging:
              - projectQuotaCpu:
                  value: "2"
              - projectQuotaMemory:
                  value: 3000Mi
          - development:
              - projectQuotaCpu:
                  value: "4"
              - projectQuotaMemory:
                  value: 5000Mi
        refresh: 5m
        refreshInterval: 5m
      k8s:
        default-env-vars:
          - FLYTE_AWS_ENDPOINT: http://flyte-minio.flyte:9000
          - FLYTE_AWS_ACCESS_KEY_ID: minio
          - FLYTE_AWS_SECRET_ACCESS_KEY: miniostorage
    storage:
      signedURL:
        stowConfigOverride:
          endpoint: http://10.227.231.9:30003
    task_resources:
      defaults:
        cpu: 500m
        ephemeralStorage: 0
        gpu: 0
        memory: 1Gi
      limits:
        cpu: 0
        ephemeralStorage: 0
        gpu: 0
        memory: 0
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: flyte
    meta.helm.sh/release-namespace: flyte
  creationTimestamp: "2024-01-10T01:58:35Z"
  labels:
    app.kubernetes.io/instance: flyte
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: flyte-sandbox
    app.kubernetes.io/version: 1.16.0
    helm.sh/chart: flyte-binary-v1.10.7-b0
    k8slens-edit-resource-version: v1
  name: flyte-flyte-sandbox-config
  namespace: flyte
  resourceVersion: "772808677"
  uid: cbb1f7eb-f857-44ff-9ec9-793012fad166
tall-lock-23197
narrow-helicopter-35399
01/12/2024, 9:13 AM
narrow-helicopter-35399
01/12/2024, 9:16 AM
...
...
...
flyte-binary:
  nameOverride: flyte-sandbox
  enabled: true
  configuration:
    database:
      host: '{{ printf "%s-postgresql" .Release.Name | trunc 63 | trimSuffix "-" }}'
      password: postgres
    storage:
      metadataContainer: my-s3-bucket
      userDataContainer: my-s3-bucket
      provider: s3
      providerConfig:
        s3:
          disableSSL: true
          v2Signing: true
          endpoint: http://{{ printf "%s-minio" .Release.Name | trunc 63 | trimSuffix "-" }}.{{ .Release.Namespace }}:9000
          authType: accesskey
          accessKey: minio
          secretKey: miniostorage
    logging:
      level: 6
      plugins:
        kubernetes:
          enabled: true
          templateUri: |-
            http://localhost:30080/kubernetes-dashboard/#/log/{{ .namespace }}/{{ .podName }}/pod?namespace={{ .namespace }}
    inline:
      storage:
        signedURL:
          stowConfigOverride:
            endpoint: http://10.227.231.9:30003
      plugins:
        k8s:
          default-env-vars:
            - FLYTE_AWS_ENDPOINT: http://{{ printf "%s-minio" .Release.Name | trunc 63 | trimSuffix "-" }}.{{ .Release.Namespace }}:9000
            - FLYTE_AWS_ACCESS_KEY_ID: minio
            - FLYTE_AWS_SECRET_ACCESS_KEY: miniostorage
        spark:
          # Edit the Spark configuration as you see fit
          spark-config-default:
            - spark.driver.cores: "1"
            - spark.hadoop.fs.s3a.aws.credentials.provider: "com.amazonaws.auth.DefaultAWSCredentialsProviderChain"
            - spark.kubernetes.allocation.batch.size: "50"
            - spark.hadoop.fs.s3a.acl.default: "BucketOwnerFullControl"
            - spark.hadoop.fs.s3n.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
            - spark.hadoop.fs.AbstractFileSystem.s3n.impl: "org.apache.hadoop.fs.s3a.S3A"
            - spark.hadoop.fs.s3.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
            - spark.hadoop.fs.AbstractFileSystem.s3.impl: "org.apache.hadoop.fs.s3a.S3A"
            - spark.hadoop.fs.s3a.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
            - spark.hadoop.fs.AbstractFileSystem.s3a.impl: "org.apache.hadoop.fs.s3a.S3A"
            - spark.network.timeout: 600s
            - spark.executorEnv.KUBERNETES_REQUEST_TIMEOUT: 100000
            - spark.executor.heartbeatInterval: 60s
        cluster_resources:
          refreshInterval: 5m
...
...
and execute the command: helm upgrade flyte flyteorg/flyte-sandbox -n flyte -f values.yaml
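To double-check what the release actually received, helm can echo the user-supplied values back (standard helm CLI, same release and namespace as above):

helm get values flyte -n flyte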
narrow-helicopter-35399
01/12/2024, 9:20 AM
apiVersion: v1
data:
  000-core.yaml: |
    admin:
      endpoint: localhost:8089
      insecure: true
    catalog-cache:
      endpoint: localhost:8081
      insecure: true
      type: datacatalog
    cluster_resources:
      standaloneDeployment: false
      templatePath: /etc/flyte/cluster-resource-templates
    logger:
      show-source: true
      level: 6
    propeller:
      create-flyteworkflow-crd: true
    webhook:
      certDir: /var/run/flyte/certs
      localCert: true
      secretName: flyte-flyte-sandbox-webhook-secret
      serviceName: flyte-flyte-sandbox-webhook
      servicePort: 443
    flyte:
      admin:
        disableClusterResourceManager: false
        disableScheduler: false
        disabled: false
        seedProjects:
          - flytesnacks
      dataCatalog:
        disabled: false
      propeller:
        disableWebhook: false
        disabled: false
  001-plugins.yaml: |
    tasks:
      task-plugins:
        default-for-task-types:
          container: container
          container_array: k8s-array
          sidecar: sidecar
          spark: spark
        enabled-plugins:
          - container
          - sidecar
          - k8s-array
          - agent-service
          - spark
    plugins:
      logs:
        kubernetes-enabled: true
        kubernetes-template-uri: http://localhost:30080/kubernetes-dashboard/#/log/{{ .namespace }}/{{ .podName }}/pod?namespace={{ .namespace }}
        cloudwatch-enabled: false
        stackdriver-enabled: false
      k8s:
        co-pilot:
          image: "cr.flyte.org/flyteorg/flytecopilot-release:v1.10.7-b0"
      k8s-array:
        logs:
          config:
            kubernetes-enabled: true
            kubernetes-template-uri: http://localhost:30080/kubernetes-dashboard/#/log/{{ .namespace }}/{{ .podName }}/pod?namespace={{ .namespace }}
            cloudwatch-enabled: false
            stackdriver-enabled: false
  002-database.yaml: |
    database:
      postgres:
        username: postgres
        host: flyte-postgresql
        port: 5432
        dbname: flyte
        options: "sslmode=disable"
  003-storage.yaml: |
    propeller:
      rawoutput-prefix: s3://my-s3-bucket/data
    storage:
      type: stow
      stow:
        kind: s3
        config:
          region: us-east-1
          disable_ssl: true
          v2_signing: true
          endpoint: http://flyte-minio.flyte:9000
          auth_type: accesskey
      container: my-s3-bucket
  100-inline-config.yaml: |
    plugins:
      cluster_resources:
        customData:
          - production:
              - projectQuotaCpu:
                  value: "5"
              - projectQuotaMemory:
                  value: 4000Mi
          - staging:
              - projectQuotaCpu:
                  value: "2"
              - projectQuotaMemory:
                  value: 3000Mi
          - development:
              - projectQuotaCpu:
                  value: "4"
              - projectQuotaMemory:
                  value: 5000Mi
        refresh: 5m
        refreshInterval: 5m
      k8s:
        default-env-vars:
          - FLYTE_AWS_ENDPOINT: http://flyte-minio.flyte:9000
          - FLYTE_AWS_ACCESS_KEY_ID: minio
          - FLYTE_AWS_SECRET_ACCESS_KEY: miniostorage
      spark:
        spark-config-default:
          - spark.driver.cores: "1"
          - spark.hadoop.fs.s3a.aws.credentials.provider: com.amazonaws.auth.DefaultAWSCredentialsProviderChain
          - spark.kubernetes.allocation.batch.size: "50"
          - spark.hadoop.fs.s3a.acl.default: BucketOwnerFullControl
          - spark.hadoop.fs.s3n.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
          - spark.hadoop.fs.AbstractFileSystem.s3n.impl: org.apache.hadoop.fs.s3a.S3A
          - spark.hadoop.fs.s3.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
          - spark.hadoop.fs.AbstractFileSystem.s3.impl: org.apache.hadoop.fs.s3a.S3A
          - spark.hadoop.fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
          - spark.hadoop.fs.AbstractFileSystem.s3a.impl: org.apache.hadoop.fs.s3a.S3A
          - spark.network.timeout: 600s
          - spark.executorEnv.KUBERNETES_REQUEST_TIMEOUT: 100000
          - spark.executor.heartbeatInterval: 60s
    storage:
      signedURL:
        stowConfigOverride:
          endpoint: http://10.227.231.9:30003
    task_resources:
      defaults:
        cpu: 500m
        ephemeralStorage: 0
        gpu: 0
        memory: 1Gi
      limits:
        cpu: 0
        ephemeralStorage: 0
        gpu: 0
        memory: 0
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: flyte
    meta.helm.sh/release-namespace: flyte
  creationTimestamp: "2024-01-10T01:58:35Z"
  labels:
    app.kubernetes.io/instance: flyte
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: flyte-sandbox
    app.kubernetes.io/version: 1.16.0
    helm.sh/chart: flyte-binary-v1.10.7-b0
    k8slens-edit-resource-version: v1
  name: flyte-flyte-sandbox-config
  namespace: flyte
  resourceVersion: "772868563"
  uid: cbb1f7eb-f857-44ff-9ec9-793012fad166
narrow-helicopter-35399
01/12/2024, 9:24 AM
Getting s3://my-s3-bucket/metadata/propeller/flytesnacks-development-abnfplffdnxcjbx69g7g/n0/data/inputs.pb to /tmp/flyte-z3p38zvj/sandbox/local_flytekit/inputs.pb
{"asctime": "2024-01-12 09:21:37,115", "name": "flytekit.entrypoint", "levelname": "ERROR", "message": "Exception when executing task workflows.pyspark_example.hello_spark, reason Failed to get data from s3://my-s3-bucket/metadata/propeller/flytesnacks-development-abnfplffdnxcjbx69g7g/n0/data/inputs.pb to /tmp/flyte-z3p38zvj/sandbox/local_flytekit/inputs.pb (recursive=False).\n\nOriginal exception: Unable to locate credentials"}
{"asctime": "2024-01-12 09:21:37,116", "name": "flytekit.entrypoint", "levelname": "ERROR", "message": "!! Begin Unknown System Error Captured by Flyte !!"}
{"asctime": "2024-01-12 09:21:37,116", "name": "flytekit.entrypoint", "levelname": "ERROR", "message": "Traceback (most recent call last):\n File \"/usr/local/lib/python3.9/dist-packages/flytekit/core/data_persistence.py\", line 473, in get_data\n self.get(remote_path, to_path=local_path, recursive=is_multipart, **kwargs)\n File \"/usr/local/lib/python3.9/dist-packages/flytekit/core/data_persistence.py\", line 251, in get\n dst = file_system.get(from_path, to_path, recursive=recursive, **kwargs)\n File \"/usr/local/lib/python3.9/dist-packages/fsspec/asyn.py\", line 118, in wrapper\n return sync(self.loop, func, *args, **kwargs)\n File \"/usr/local/lib/python3.9/dist-packages/fsspec/asyn.py\", line 103, in sync\n raise return_result\n File \"/usr/local/lib/python3.9/dist-packages/fsspec/asyn.py\", line 56, in _runner\n result[0] = await coro\n File \"/usr/local/lib/python3.9/dist-packages/fsspec/asyn.py\", line 609, in _get\n rpaths = [\n File \"/usr/local/lib/python3.9/dist-packages/fsspec/asyn.py\", line 610, in <listcomp>\n p for p in rpaths if not (trailing_sep(p) or await self._isdir(p))\n File \"/usr/local/lib/python3.9/dist-packages/s3fs/core.py\", line 1411, in _isdir\n return bool(await self._lsdir(path))\n File \"/usr/local/lib/python3.9/dist-packages/s3fs/core.py\", line 706, in _lsdir\n async for c in self._iterdir(\n File \"/usr/local/lib/python3.9/dist-packages/s3fs/core.py\", line 738, in _iterdir\n s3 = await self.get_s3(bucket)\n File \"/usr/local/lib/python3.9/dist-packages/s3fs/core.py\", line 336, in get_s3\n return await self._s3creator.get_bucket_client(bucket)\n File \"/usr/local/lib/python3.9/dist-packages/s3fs/utils.py\", line 39, in get_bucket_client\n response = await general_client.head_bucket(Bucket=bucket_name)\n File \"/usr/local/lib/python3.9/dist-packages/aiobotocore/client.py\", line 366, in _make_api_call\n http, parsed_response = await self._make_request(\n File \"/usr/local/lib/python3.9/dist-packages/aiobotocore/client.py\", line 391, in _make_request\n return await self._endpoint.make_request(\n File \"/usr/local/lib/python3.9/dist-packages/aiobotocore/endpoint.py\", line 96, in _send_request\n request = await self.create_request(request_dict, operation_model)\n File \"/usr/local/lib/python3.9/dist-packages/aiobotocore/endpoint.py\", line 84, in create_request\n await self._event_emitter.emit(\n File \"/usr/local/lib/python3.9/dist-packages/aiobotocore/hooks.py\", line 66, in _emit\n response = await resolve_awaitable(handler(**kwargs))\n File \"/usr/local/lib/python3.9/dist-packages/aiobotocore/_helpers.py\", line 15, in resolve_awaitable\n return await obj\n File \"/usr/local/lib/python3.9/dist-packages/aiobotocore/signers.py\", line 24, in handler\n return await self.sign(operation_name, request)\n File \"/usr/local/lib/python3.9/dist-packages/aiobotocore/signers.py\", line 82, in sign\n auth.add_auth(request)\n File \"/usr/local/lib/python3.9/dist-packages/botocore/auth.py\", line 418, in add_auth\n raise NoCredentialsError()\nbotocore.exceptions.NoCredentialsError: Unable to locate credentials\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File \"/usr/local/bin/entrypoint.py\", line 88, in _dispatch_execute\n ctx.file_access.get_data(inputs_path, local_inputs_file)\n File \"/usr/local/lib/python3.9/dist-packages/flytekit/core/data_persistence.py\", line 475, in get_data\n raise FlyteAssertion(\nflytekit.exceptions.user.FlyteAssertion: Failed to get data from 
s3://my-s3-bucket/metadata/propeller/flytesnacks-development-abnfplffdnxcjbx69g7g/n0/data/inputs.pb to /tmp/flyte-z3p38zvj/sandbox/local_flytekit/inputs.pb (recursive=False).\n\nOriginal exception: Unable to locate credentials\n"}
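Since the FLYTE_AWS_* variables are supplied through default-env-vars, it is worth confirming they actually reach the Spark driver's environment. A quick check with standard kubectl (the pod name is a placeholder for the failing driver pod):

kubectl -n flytesnacks-development get pods
kubectl -n flytesnacks-development exec <driver-pod-name> -- env | grep FLYTE_AWS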
narrow-helicopter-35399
01/12/2024, 9:25 AM
tall-lock-23197
narrow-helicopter-35399
01/12/2024, 9:32 AM
narrow-helicopter-35399
01/12/2024, 9:42 AM
apiVersion: v1
data:
  000-core.yaml: |
    admin:
      endpoint: localhost:8089
      insecure: true
    catalog-cache:
      endpoint: localhost:8081
      insecure: true
      type: datacatalog
    cluster_resources:
      standaloneDeployment: false
      templatePath: /etc/flyte/cluster-resource-templates
    logger:
      show-source: true
      level: 6
    propeller:
      create-flyteworkflow-crd: true
    webhook:
      certDir: /var/run/flyte/certs
      localCert: true
      secretName: flyte-flyte-sandbox-webhook-secret
      serviceName: flyte-flyte-sandbox-webhook
      servicePort: 443
    flyte:
      admin:
        disableClusterResourceManager: false
        disableScheduler: false
        disabled: false
        seedProjects:
          - flytesnacks
      dataCatalog:
        disabled: false
      propeller:
        disableWebhook: false
        disabled: false
  001-plugins.yaml: |
    tasks:
      task-plugins:
        default-for-task-types:
          container: container
          container_array: k8s-array
          sidecar: sidecar
          spark: spark
        enabled-plugins:
          - container
          - sidecar
          - k8s-array
          - agent-service
          - spark
    plugins:
      logs:
        kubernetes-enabled: true
        kubernetes-template-uri: http://localhost:30080/kubernetes-dashboard/#/log/{{ .namespace }}/{{ .podName }}/pod?namespace={{ .namespace }}
        cloudwatch-enabled: false
        stackdriver-enabled: false
      k8s:
        co-pilot:
          image: "cr.flyte.org/flyteorg/flytecopilot-release:v1.10.7-b0"
      k8s-array:
        logs:
          config:
            kubernetes-enabled: true
            kubernetes-template-uri: http://localhost:30080/kubernetes-dashboard/#/log/{{ .namespace }}/{{ .podName }}/pod?namespace={{ .namespace }}
            cloudwatch-enabled: false
            stackdriver-enabled: false
  002-database.yaml: |
    database:
      postgres:
        username: postgres
        host: flyte-postgresql
        port: 5432
        dbname: flyte
        options: "sslmode=disable"
  003-storage.yaml: |
    propeller:
      rawoutput-prefix: s3://my-s3-bucket/data
    storage:
      type: stow
      stow:
        kind: s3
        config:
          region: us-east-1
          disable_ssl: true
          v2_signing: true
          endpoint: http://flyte-minio.flyte:9000
          auth_type: accesskey
      container: my-s3-bucket
  100-inline-config.yaml: |
    plugins:
      cluster_resources:
        customData:
          - production:
              - projectQuotaCpu:
                  value: "5"
              - projectQuotaMemory:
                  value: 4000Mi
          - staging:
              - projectQuotaCpu:
                  value: "2"
              - projectQuotaMemory:
                  value: 3000Mi
          - development:
              - projectQuotaCpu:
                  value: "4"
              - projectQuotaMemory:
                  value: 5000Mi
        refresh: 5m
        refreshInterval: 5m
      k8s:
        default-env-vars:
          - FLYTE_AWS_ENDPOINT: http://flyte-minio.flyte:9000
          - FLYTE_AWS_ACCESS_KEY_ID: minio
          - FLYTE_AWS_SECRET_ACCESS_KEY: miniostorage
      spark:
        spark-config-default:
          - spark.driver.cores: "1"
          - spark.hadoop.fs.s3a.aws.credentials.provider: org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
          - spark.hadoop.fs.s3a.endpoint: http://minio.flyte:9000
          - spark.hadoop.fs.s3a.access.key: minio
          - spark.hadoop.fs.s3a.secret.key: miniostorage
          - spark.hadoop.fs.s3a.path.style.access: "true"
          - spark.kubernetes.allocation.batch.size: "50"
          - spark.hadoop.fs.s3a.acl.default: BucketOwnerFullControl
          - spark.hadoop.fs.s3n.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
          - spark.hadoop.fs.AbstractFileSystem.s3n.impl: org.apache.hadoop.fs.s3a.S3A
          - spark.hadoop.fs.s3.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
          - spark.hadoop.fs.AbstractFileSystem.s3.impl: org.apache.hadoop.fs.s3a.S3A
          - spark.hadoop.fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
          - spark.hadoop.fs.AbstractFileSystem.s3a.impl: org.apache.hadoop.fs.s3a.S3A
    storage:
      signedURL:
        stowConfigOverride:
          endpoint: http://10.227.231.9:30003
    task_resources:
      defaults:
        cpu: 500m
        ephemeralStorage: 0
        gpu: 0
        memory: 1Gi
      limits:
        cpu: 0
        ephemeralStorage: 0
        gpu: 0
        memory: 0
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: flyte
    meta.helm.sh/release-namespace: flyte
  creationTimestamp: "2024-01-10T01:58:35Z"
  labels:
    app.kubernetes.io/instance: flyte
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: flyte-sandbox
    app.kubernetes.io/version: 1.16.0
    helm.sh/chart: flyte-binary-v1.10.7-b0
    k8slens-edit-resource-version: v1
  name: flyte-flyte-sandbox-config
  namespace: flyte
  resourceVersion: "772878263"
  uid: cbb1f7eb-f857-44ff-9ec9-793012fad166
narrow-helicopter-35399
01/12/2024, 9:45 AM
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=172.16.8.246 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner local:///usr/local/bin/entrypoint.py pyflyte-execute --inputs s3://my-s3-bucket/metadata/propeller/flytesnacks-development-awlzj45z59p8bhdplc74/n0/data/inputs.pb --output-prefix s3://my-s3-bucket/metadata/propeller/flytesnacks-development-awlzj45z59p8bhdplc74/n0/data/0 --raw-output-data-prefix s3://my-s3-bucket/data/0m/awlzj45z59p8bhdplc74-n0-0 --checkpoint-path s3://my-s3-bucket/data/0m/awlzj45z59p8bhdplc74-n0-0/_flytecheckpoints --prev-checkpoint '""' --resolver flytekit.core.python_auto_container.default_task_resolver -- task-module workflows.pyspark_example task-name hello_spark
bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
24/01/12 09:42:32 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Getting s3://my-s3-bucket/metadata/propeller/flytesnacks-development-awlzj45z59p8bhdplc74/n0/data/inputs.pb to /tmp/flyte-hloonq75/sandbox/local_flytekit/inputs.pb
{"asctime": "2024-01-12 09:42:34,794", "name": "flytekit.entrypoint", "levelname": "ERROR", "message": "Exception when executing task workflows.pyspark_example.hello_spark, reason Failed to get data from s3://my-s3-bucket/metadata/propeller/flytesnacks-development-awlzj45z59p8bhdplc74/n0/data/inputs.pb to /tmp/flyte-hloonq75/sandbox/local_flytekit/inputs.pb (recursive=False).\n\nOriginal exception: Unable to locate credentials"}
{"asctime": "2024-01-12 09:42:34,795", "name": "flytekit.entrypoint", "levelname": "ERROR", "message": "!! Begin Unknown System Error Captured by Flyte !!"}
{"asctime": "2024-01-12 09:42:34,795", "name": "flytekit.entrypoint", "levelname": "ERROR", "message": "Traceback (most recent call last):\n File \"/usr/local/lib/python3.9/dist-packages/flytekit/core/data_persistence.py\", line 473, in get_data\n self.get(remote_path, to_path=local_path, recursive=is_multipart, **kwargs)\n File \"/usr/local/lib/python3.9/dist-packages/flytekit/core/data_persistence.py\", line 251, in get\n dst = file_system.get(from_path, to_path, recursive=recursive, **kwargs)\n File \"/usr/local/lib/python3.9/dist-packages/fsspec/asyn.py\", line 118, in wrapper\n return sync(self.loop, func, *args, **kwargs)\n File \"/usr/local/lib/python3.9/dist-packages/fsspec/asyn.py\", line 103, in sync\n raise return_result\n File \"/usr/local/lib/python3.9/dist-packages/fsspec/asyn.py\", line 56, in _runner\n result[0] = await coro\n File \"/usr/local/lib/python3.9/dist-packages/fsspec/asyn.py\", line 609, in _get\n rpaths = [\n File \"/usr/local/lib/python3.9/dist-packages/fsspec/asyn.py\", line 610, in <listcomp>\n p for p in rpaths if not (trailing_sep(p) or await self._isdir(p))\n File \"/usr/local/lib/python3.9/dist-packages/s3fs/core.py\", line 1411, in _isdir\n return bool(await self._lsdir(path))\n File \"/usr/local/lib/python3.9/dist-packages/s3fs/core.py\", line 706, in _lsdir\n async for c in self._iterdir(\n File \"/usr/local/lib/python3.9/dist-packages/s3fs/core.py\", line 738, in _iterdir\n s3 = await self.get_s3(bucket)\n File \"/usr/local/lib/python3.9/dist-packages/s3fs/core.py\", line 336, in get_s3\n return await self._s3creator.get_bucket_client(bucket)\n File \"/usr/local/lib/python3.9/dist-packages/s3fs/utils.py\", line 39, in get_bucket_client\n response = await general_client.head_bucket(Bucket=bucket_name)\n File \"/usr/local/lib/python3.9/dist-packages/aiobotocore/client.py\", line 366, in _make_api_call\n http, parsed_response = await self._make_request(\n File \"/usr/local/lib/python3.9/dist-packages/aiobotocore/client.py\", line 391, in _make_request\n return await self._endpoint.make_request(\n File \"/usr/local/lib/python3.9/dist-packages/aiobotocore/endpoint.py\", line 96, in _send_request\n request = await self.create_request(request_dict, operation_model)\n File \"/usr/local/lib/python3.9/dist-packages/aiobotocore/endpoint.py\", line 84, in create_request\n await self._event_emitter.emit(\n File \"/usr/local/lib/python3.9/dist-packages/aiobotocore/hooks.py\", line 66, in _emit\n response = await resolve_awaitable(handler(**kwargs))\n File \"/usr/local/lib/python3.9/dist-packages/aiobotocore/_helpers.py\", line 15, in resolve_awaitable\n return await obj\n File \"/usr/local/lib/python3.9/dist-packages/aiobotocore/signers.py\", line 24, in handler\n return await self.sign(operation_name, request)\n File \"/usr/local/lib/python3.9/dist-packages/aiobotocore/signers.py\", line 82, in sign\n auth.add_auth(request)\n File \"/usr/local/lib/python3.9/dist-packages/botocore/auth.py\", line 418, in add_auth\n raise NoCredentialsError()\nbotocore.exceptions.NoCredentialsError: Unable to locate credentials\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File \"/usr/local/bin/entrypoint.py\", line 88, in _dispatch_execute\n ctx.file_access.get_data(inputs_path, local_inputs_file)\n File \"/usr/local/lib/python3.9/dist-packages/flytekit/core/data_persistence.py\", line 475, in get_data\n raise FlyteAssertion(\nflytekit.exceptions.user.FlyteAssertion: Failed to get data from 
s3://my-s3-bucket/metadata/propeller/flytesnacks-development-awlzj45z59p8bhdplc74/n0/data/inputs.pb to /tmp/flyte-hloonq75/sandbox/local_flytekit/inputs.pb (recursive=False).\n\nOriginal exception: Unable to locate credentials\n"}
tall-lock-23197
plugins.yaml
narrow-helicopter-35399
01/12/2024, 10:38 AM
plugins.yaml file?
tall-lock-23197
001-plugins.yaml?
narrow-helicopter-35399
01/12/2024, 10:44 AM
narrow-helicopter-35399
01/12/2024, 10:44 AM
apiVersion: v1
data:
  000-core.yaml: |
    admin:
      endpoint: localhost:8089
      insecure: true
    catalog-cache:
      endpoint: localhost:8081
      insecure: true
      type: datacatalog
    cluster_resources:
      standaloneDeployment: false
      templatePath: /etc/flyte/cluster-resource-templates
    logger:
      show-source: true
      level: 6
    propeller:
      create-flyteworkflow-crd: true
    webhook:
      certDir: /var/run/flyte/certs
      localCert: true
      secretName: flyte-flyte-sandbox-webhook-secret
      serviceName: flyte-flyte-sandbox-webhook
      servicePort: 443
    flyte:
      admin:
        disableClusterResourceManager: false
        disableScheduler: false
        disabled: false
        seedProjects:
          - flytesnacks
      dataCatalog:
        disabled: false
      propeller:
        disableWebhook: false
        disabled: false
  001-plugins.yaml: |
    tasks:
      task-plugins:
        default-for-task-types:
          container: container
          container_array: k8s-array
          sidecar: sidecar
          spark: spark
        enabled-plugins:
          - container
          - sidecar
          - k8s-array
          - agent-service
          - spark
    plugins:
      logs:
        kubernetes-enabled: true
        kubernetes-template-uri: http://localhost:30080/kubernetes-dashboard/#/log/{{ .namespace }}/{{ .podName }}/pod?namespace={{ .namespace }}
        cloudwatch-enabled: false
        stackdriver-enabled: false
      k8s:
        co-pilot:
          image: "cr.flyte.org/flyteorg/flytecopilot-release:v1.10.7-b0"
      k8s-array:
        logs:
          config:
            kubernetes-enabled: true
            kubernetes-template-uri: http://localhost:30080/kubernetes-dashboard/#/log/{{ .namespace }}/{{ .podName }}/pod?namespace={{ .namespace }}
            cloudwatch-enabled: false
            stackdriver-enabled: false
      spark:
        spark-config-default:
          - spark.driver.cores: "1"
          - spark.hadoop.fs.s3a.aws.credentials.provider: org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
          - spark.hadoop.fs.s3a.endpoint: http://minio.flyte:9000
          - spark.hadoop.fs.s3a.access.key: minio
          - spark.hadoop.fs.s3a.secret.key: miniostorage
          - spark.hadoop.fs.s3a.path.style.access: "true"
          - spark.kubernetes.allocation.batch.size: "50"
          - spark.hadoop.fs.s3a.acl.default: BucketOwnerFullControl
          - spark.hadoop.fs.s3n.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
          - spark.hadoop.fs.AbstractFileSystem.s3n.impl: org.apache.hadoop.fs.s3a.S3A
          - spark.hadoop.fs.s3.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
          - spark.hadoop.fs.AbstractFileSystem.s3.impl: org.apache.hadoop.fs.s3a.S3A
          - spark.hadoop.fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
          - spark.hadoop.fs.AbstractFileSystem.s3a.impl: org.apache.hadoop.fs.s3a.S3A
  002-database.yaml: |
    database:
      postgres:
        username: postgres
        host: flyte-postgresql
        port: 5432
        dbname: flyte
        options: "sslmode=disable"
  003-storage.yaml: |
    propeller:
      rawoutput-prefix: s3://my-s3-bucket/data
    storage:
      type: stow
      stow:
        kind: s3
        config:
          region: us-east-1
          disable_ssl: true
          v2_signing: true
          endpoint: http://flyte-minio.flyte:9000
          auth_type: accesskey
      container: my-s3-bucket
  100-inline-config.yaml: |
    plugins:
      cluster_resources:
        customData:
          - production:
              - projectQuotaCpu:
                  value: "5"
              - projectQuotaMemory:
                  value: 4000Mi
          - staging:
              - projectQuotaCpu:
                  value: "2"
              - projectQuotaMemory:
                  value: 3000Mi
          - development:
              - projectQuotaCpu:
                  value: "4"
              - projectQuotaMemory:
                  value: 5000Mi
        refresh: 5m
        refreshInterval: 5m
      k8s:
        default-env-vars:
          - FLYTE_AWS_ENDPOINT: http://flyte-minio.flyte:9000
          - FLYTE_AWS_ACCESS_KEY_ID: minio
          - FLYTE_AWS_SECRET_ACCESS_KEY: miniostorage
      spark:
        spark-config-default:
          - spark.driver.cores: "1"
          - spark.hadoop.fs.s3a.aws.credentials.provider: org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
          - spark.hadoop.fs.s3a.endpoint: http://minio.flyte:9000
          - spark.hadoop.fs.s3a.access.key: minio
          - spark.hadoop.fs.s3a.secret.key: miniostorage
          - spark.hadoop.fs.s3a.path.style.access: "true"
          - spark.kubernetes.allocation.batch.size: "50"
          - spark.hadoop.fs.s3a.acl.default: BucketOwnerFullControl
          - spark.hadoop.fs.s3n.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
          - spark.hadoop.fs.AbstractFileSystem.s3n.impl: org.apache.hadoop.fs.s3a.S3A
          - spark.hadoop.fs.s3.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
          - spark.hadoop.fs.AbstractFileSystem.s3.impl: org.apache.hadoop.fs.s3a.S3A
          - spark.hadoop.fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
          - spark.hadoop.fs.AbstractFileSystem.s3a.impl: org.apache.hadoop.fs.s3a.S3A
    storage:
      signedURL:
        stowConfigOverride:
          endpoint: http://10.227.231.9:30003
    task_resources:
      defaults:
        cpu: 500m
        ephemeralStorage: 0
        gpu: 0
        memory: 1Gi
      limits:
        cpu: 0
        ephemeralStorage: 0
        gpu: 0
        memory: 0
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: flyte
    meta.helm.sh/release-namespace: flyte
  creationTimestamp: "2024-01-10T01:58:35Z"
  labels:
    app.kubernetes.io/instance: flyte
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: flyte-sandbox
    app.kubernetes.io/version: 1.16.0
    helm.sh/chart: flyte-binary-v1.10.7-b0
    k8slens-edit-resource-version: v1
  name: flyte-flyte-sandbox-config
  namespace: flyte
  resourceVersion: "772909613"
  uid: cbb1f7eb-f857-44ff-9ec9-793012fad166
narrow-helicopter-35399
01/12/2024, 10:48 AM
narrow-helicopter-35399
01/12/2024, 10:51 AM
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=172.16.8.248 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner local:///usr/local/bin/entrypoint.py pyflyte-execute --inputs s3://my-s3-bucket/metadata/propeller/flytesnacks-development-a52lrhcsbnsk569j6t74/n0/data/inputs.pb --output-prefix s3://my-s3-bucket/metadata/propeller/flytesnacks-development-a52lrhcsbnsk569j6t74/n0/data/0 --raw-output-data-prefix s3://my-s3-bucket/data/ds/a52lrhcsbnsk569j6t74-n0-0 --checkpoint-path s3://my-s3-bucket/data/ds/a52lrhcsbnsk569j6t74-n0-0/_flytecheckpoints --prev-checkpoint '""' --resolver flytekit.core.python_auto_container.default_task_resolver -- task-module workflows.pyspark_example task-name hello_spark
bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
24/01/12 10:47:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Getting s3://my-s3-bucket/metadata/propeller/flytesnacks-development-a52lrhcsbnsk569j6t74/n0/data/inputs.pb to /tmp/flyte-ius582vp/sandbox/local_flytekit/inputs.pb
{"asctime": "2024-01-12 10:47:39,918", "name": "flytekit.entrypoint", "levelname": "ERROR", "message": "Exception when executing task workflows.pyspark_example.hello_spark, reason Failed to get data from s3://my-s3-bucket/metadata/propeller/flytesnacks-development-a52lrhcsbnsk569j6t74/n0/data/inputs.pb to /tmp/flyte-ius582vp/sandbox/local_flytekit/inputs.pb (recursive=False).\n\nOriginal exception: Unable to locate credentials"}
{"asctime": "2024-01-12 10:47:39,919", "name": "flytekit.entrypoint", "levelname": "ERROR", "message": "!! Begin Unknown System Error Captured by Flyte !!"}
{"asctime": "2024-01-12 10:47:39,919", "name": "flytekit.entrypoint", "levelname": "ERROR", "message": "Traceback (most recent call last):\n File \"/usr/local/lib/python3.9/dist-packages/flytekit/core/data_persistence.py\", line 473, in get_data\n self.get(remote_path, to_path=local_path, recursive=is_multipart, **kwargs)\n File \"/usr/local/lib/python3.9/dist-packages/flytekit/core/data_persistence.py\", line 251, in get\n dst = file_system.get(from_path, to_path, recursive=recursive, **kwargs)\n File \"/usr/local/lib/python3.9/dist-packages/fsspec/asyn.py\", line 118, in wrapper\n return sync(self.loop, func, *args, **kwargs)\n File \"/usr/local/lib/python3.9/dist-packages/fsspec/asyn.py\", line 103, in sync\n raise return_result\n File \"/usr/local/lib/python3.9/dist-packages/fsspec/asyn.py\", line 56, in _runner\n result[0] = await coro\n File \"/usr/local/lib/python3.9/dist-packages/fsspec/asyn.py\", line 609, in _get\n rpaths = [\n File \"/usr/local/lib/python3.9/dist-packages/fsspec/asyn.py\", line 610, in <listcomp>\n p for p in rpaths if not (trailing_sep(p) or await self._isdir(p))\n File \"/usr/local/lib/python3.9/dist-packages/s3fs/core.py\", line 1411, in _isdir\n return bool(await self._lsdir(path))\n File \"/usr/local/lib/python3.9/dist-packages/s3fs/core.py\", line 706, in _lsdir\n async for c in self._iterdir(\n File \"/usr/local/lib/python3.9/dist-packages/s3fs/core.py\", line 738, in _iterdir\n s3 = await self.get_s3(bucket)\n File \"/usr/local/lib/python3.9/dist-packages/s3fs/core.py\", line 336, in get_s3\n return await self._s3creator.get_bucket_client(bucket)\n File \"/usr/local/lib/python3.9/dist-packages/s3fs/utils.py\", line 39, in get_bucket_client\n response = await general_client.head_bucket(Bucket=bucket_name)\n File \"/usr/local/lib/python3.9/dist-packages/aiobotocore/client.py\", line 366, in _make_api_call\n http, parsed_response = await self._make_request(\n File \"/usr/local/lib/python3.9/dist-packages/aiobotocore/client.py\", line 391, in _make_request\n return await self._endpoint.make_request(\n File \"/usr/local/lib/python3.9/dist-packages/aiobotocore/endpoint.py\", line 96, in _send_request\n request = await self.create_request(request_dict, operation_model)\n File \"/usr/local/lib/python3.9/dist-packages/aiobotocore/endpoint.py\", line 84, in create_request\n await self._event_emitter.emit(\n File \"/usr/local/lib/python3.9/dist-packages/aiobotocore/hooks.py\", line 66, in _emit\n response = await resolve_awaitable(handler(**kwargs))\n File \"/usr/local/lib/python3.9/dist-packages/aiobotocore/_helpers.py\", line 15, in resolve_awaitable\n return await obj\n File \"/usr/local/lib/python3.9/dist-packages/aiobotocore/signers.py\", line 24, in handler\n return await self.sign(operation_name, request)\n File \"/usr/local/lib/python3.9/dist-packages/aiobotocore/signers.py\", line 82, in sign\n auth.add_auth(request)\n File \"/usr/local/lib/python3.9/dist-packages/botocore/auth.py\", line 418, in add_auth\n raise NoCredentialsError()\nbotocore.exceptions.NoCredentialsError: Unable to locate credentials\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File \"/usr/local/bin/entrypoint.py\", line 88, in _dispatch_execute\n ctx.file_access.get_data(inputs_path, local_inputs_file)\n File \"/usr/local/lib/python3.9/dist-packages/flytekit/core/data_persistence.py\", line 475, in get_data\n raise FlyteAssertion(\nflytekit.exceptions.user.FlyteAssertion: Failed to get data from 
s3://my-s3-bucket/metadata/propeller/flytesnacks-development-a52lrhcsbnsk569j6t74/n0/data/inputs.pb to /tmp/flyte-ius582vp/sandbox/local_flytekit/inputs.pb (recursive=False).\n\nOriginal exception: Unable to locate credentials\n"}
tall-lock-23197
narrow-helicopter-35399
01/12/2024, 10:53 AM
narrow-helicopter-35399
01/12/2024, 10:54 AM
pyspark_example.py
import random
from operator import add

import flytekit
from flytekit import ImageSpec, Resources, task, workflow
from flytekitplugins.spark import Spark

spark_image = ImageSpec(
    registry="harbor.linecorp.com/ecacda"
)


@task(
    task_config=Spark(
        # this configuration is applied to the Spark cluster
        spark_conf={
            "spark.driver.memory": "1000M",
            "spark.executor.memory": "500M",
            "spark.executor.cores": "1",
            "spark.executor.instances": "1",
            "spark.driver.cores": "1",
        },
        # executor_path="/usr/bin/python3",
        # applications_path="local:///opt/entrypoint.sh",
    ),
    limits=Resources(mem="3000M"),
    cache_version="1",
    container_image=spark_image,
)
def hello_spark(partitions: int) -> float:
    print("=> Starting Spark with Partitions: {}".format(partitions))
    n = 100000 * partitions
    sess = flytekit.current_context().spark_session
    count = (
        sess.sparkContext.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
    )
    pi_val = 4.0 * count / n
    print("Pi val is: {}".format(pi_val))
    return pi_val


def f(_):
    x = random.random() * 2 - 1
    y = random.random() * 2 - 1
    return 1 if x**2 + y**2 <= 1 else 0


@workflow
def wf() -> float:
    """
    This workflow is used like any other workflow. Since the image is a property
    of the task, the workflow does not care about how the image is configured.
    """
    pi = hello_spark(partitions=50)
    return pi


if __name__ == "__main__":
    wf()
narrow-helicopter-35399
01/12/2024, 10:55 AM
FLYTE_SDK_LOGGING_LEVEL=20 pyflyte register --non-fast -p flytesnacks --domain development pyspark_example.py --version s14
narrow-helicopter-35399
01/12/2024, 10:56 AM
narrow-helicopter-35399
01/12/2024, 10:58 AM
hello world task can be executed successfully
tall-lock-23197
narrow-helicopter-35399
01/12/2024, 11:01 AM
narrow-helicopter-35399
01/12/2024, 11:07 AM
pyflyte register -p flytesnacks --domain development src
and run the secret example code:
import flytekit
from flytekit import Secret, task, workflow

SECRET_GROUP = "user-info"
SECRET_NAME = "user_secret"


@task(secret_requests=[Secret(group=SECRET_GROUP, key=SECRET_NAME)])
def hello_secret():
    print("hello secret")
    context = flytekit.current_context()
    secret_val = context.secrets.get(SECRET_GROUP, SECRET_NAME)
    print(secret_val)


@workflow
def wf():
    hello_secret()


if __name__ == "__main__":
    wf()
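For this to resolve at runtime, Flyte's default secret manager looks for a Kubernetes secret named after the group, with a data key matching the secret name, in the execution namespace. A minimal setup sketch (the namespace follows the project-domain pattern above; the value is a placeholder):

kubectl create secret generic user-info \
  --from-literal=user_secret=mysecretvalue \
  -n flytesnacks-development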
narrow-helicopter-35399
01/12/2024, 11:09 AM
tall-lock-23197
narrow-helicopter-35399
01/12/2024, 11:57 AM
narrow-helicopter-35399
01/12/2024, 11:59 AM
pyflyte run --remote cache_example_wf.py wf
from flytekit import task, workflow

# Doc: https://docs.flyte.org/projects/cookbook/en/latest/auto_examples/development_lifecycle/task_cache.html


@task(cache=True, cache_version="1.0")  # noqa: F841
def square(n: int) -> int:
    """
    Parameters:
        n (int): name of the parameter for the task will be derived from the name of the input variable.
            The type will be automatically deduced to ``Types.Integer``.
    Return:
        int: The label for the output will be automatically assigned, and the type will be deduced from the annotation.
    """
    return n * n


@workflow
def wf() -> int:
    return square(n=10)
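A note on the cache semantics here: results are keyed on the task signature, its inputs, and cache_version, so re-running wf() unchanged should be served from the cache, while bumping cache_version forces a recompute. A minimal sketch of invalidating the cache (the version string is arbitrary; reuses the imports above):

@task(cache=True, cache_version="1.1")  # any new version string invalidates earlier cache entries
def square(n: int) -> int:
    return n * n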
narrow-helicopter-35399
01/12/2024, 12:00 PM
pyflyte run --remote dynamic_example_wf.py wf --s1 test1 --s2 test2
import typing

from flytekit import dynamic, task, workflow

# Doc: https://docs.flyte.org/projects/cookbook/en/latest/auto_examples/advanced_composition/dynamics.html#id1


@task
def return_index(character: str) -> int:
    """
    Computes the character index (which needs to fit into the 26 characters list)"""
    if character.islower():
        return ord(character) - ord("a")
    else:
        return ord(character) - ord("A")


@task
def update_list(freq_list: typing.List[int], list_index: int) -> typing.List[int]:
    """
    Notes the frequency of characters"""
    freq_list[list_index] += 1
    return freq_list


@task
def derive_count(freq1: typing.List[int], freq2: typing.List[int]) -> int:
    """
    Derives the number of common characters"""
    count = 0
    for i in range(26):
        count += min(freq1[i], freq2[i])
    return count


@dynamic
def count_characters(s1: str, s2: str) -> int:
    """
    Calls the required tasks and returns the final result"""
    # s1 and s2 are accessible
    # initialize an empty list consisting of 26 empty slots corresponding to every alphabet (lower and upper case)
    freq1 = [0] * 26
    freq2 = [0] * 26
    # looping through the string s1
    for i in range(len(s1)):
        # index and freq1 are not accessible as they are promises
        index = return_index(character=s1[i])
        freq1 = update_list(freq_list=freq1, list_index=index)
    # looping through the string s2
    for i in range(len(s2)):
        # index and freq2 are not accessible as they are promises
        index = return_index(character=s2[i])
        freq2 = update_list(freq_list=freq2, list_index=index)
    # counting the common characters
    return derive_count(freq1=freq1, freq2=freq2)


@workflow
def wf(s1: str, s2: str) -> int:
    """
    Calls the dynamic workflow and returns the result"""
    # sending two strings to the workflow
    return count_characters(s1=s1, s2=s2)


if __name__ == "__main__":
    print(wf(s1="Pear", s2="Earth"))
narrow-helicopter-35399
01/12/2024, 12:01 PM
tall-lock-23197
narrow-helicopter-35399
01/13/2024, 2:34 PM
tall-lock-23197
narrow-helicopter-35399
01/15/2024, 7:02 AM
spark-config-default in the values.yaml file.
spark-config-default:
  - spark.driver.cores: "1"
  - spark.hadoop.fs.s3a.aws.credentials.provider: "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"
  - spark.hadoop.fs.s3a.endpoint: http://{{ printf "%s-minio" .Release.Name | trunc 63 | trimSuffix "-" }}.{{ .Release.Namespace }}:9000
  - spark.hadoop.fs.s3a.access.key: "minio"
  - spark.hadoop.fs.s3a.secret.key: "miniostorage"
  - spark.hadoop.fs.s3a.path.style.access: "true"
  - spark.kubernetes.allocation.batch.size: "50"
  - spark.hadoop.fs.s3a.acl.default: "BucketOwnerFullControl"
  - spark.hadoop.fs.s3n.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
  - spark.hadoop.fs.AbstractFileSystem.s3n.impl: "org.apache.hadoop.fs.s3a.S3A"
  - spark.hadoop.fs.s3.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
  - spark.hadoop.fs.AbstractFileSystem.s3.impl: "org.apache.hadoop.fs.s3a.S3A"
  - spark.hadoop.fs.s3a.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
  - spark.hadoop.fs.AbstractFileSystem.s3a.impl: "org.apache.hadoop.fs.s3a.S3A"
The notably different part is here:
  - spark.hadoop.fs.s3a.aws.credentials.provider: "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"
  - spark.hadoop.fs.s3a.endpoint: http://{{ printf "%s-minio" .Release.Name | trunc 63 | trimSuffix "-" }}.{{ .Release.Namespace }}:9000
  - spark.hadoop.fs.s3a.access.key: "minio"
  - spark.hadoop.fs.s3a.secret.key: "miniostorage"
  - spark.hadoop.fs.s3a.path.style.access: "true"
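For reference, the same s3a settings can also be supplied per task through the Spark plugin's task_config instead of platform-wide; this is a sketch only, assuming the MinIO service address and credentials shown above:

# Hypothetical per-task equivalent of the platform-wide spark-config-default fix
from flytekitplugins.spark import Spark

spark_cfg = Spark(
    spark_conf={
        "spark.hadoop.fs.s3a.aws.credentials.provider": "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider",
        "spark.hadoop.fs.s3a.endpoint": "http://flyte-minio.flyte:9000",  # assumed MinIO service DNS name
        "spark.hadoop.fs.s3a.access.key": "minio",
        "spark.hadoop.fs.s3a.secret.key": "miniostorage",
        "spark.hadoop.fs.s3a.path.style.access": "true",
    },
)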
By the way, I learned how to enable the Spark plugin when deploying flyte-sandbox with Helm. In the values.yaml file, I need to add the following to the flyte-binary block. (Currently, the docs only describe how to do this for the flyte-binary and flyte-core charts.)
enabled_plugins:
  tasks:
    task-plugins:
      enabled-plugins:
        - container
        - sidecar
        - k8s-array
        - agent-service
        - spark
      default-for-task-types:
        container: container
        sidecar: sidecar
        container_array: k8s-array
        spark: spark
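A quick way to verify the plugin block was rendered after helm upgrade is to inspect the generated ConfigMap (standard kubectl; the ConfigMap name matches the dumps above):

kubectl -n flyte get configmap flyte-flyte-sandbox-config -o yaml | grep -A 10 enabled-plugins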
Maybe we can add a flyte-sandbox section to the docs describing these values.yaml settings. I could submit a PR to update the doc, what do you think?
narrow-helicopter-35399
01/15/2024, 4:59 PM