rhythmic-glass-85662
04/26/2022, 3:53 PM
`k8s_spark.dataframe_passing.my_smart_structured_dataset` example. I've been able to run the other spark example `pyspark_pi`. I've set up the K8s Operator, built the Docker Image based on the Dockerfile in the `cookbook/integrations/kubernetes/k8s` folder. Getting this error about s3:
[3/3] currentAttempt done. Last Error: SYSTEM::Traceback (most recent call last):
File "/opt/venv/lib/python3.8/site-packages/flytekit/exceptions/scopes.py", line 165, in system_entry_point
return wrapped(*args, **kwargs)
File "/opt/venv/lib/python3.8/site-packages/flytekit/core/base_task.py", line 527, in dispatch_execute
raise TypeError(
Message:
Failed to convert return value for var o0 for function k8s_spark.dataframe_passing.create_spark_df with error <class 'py4j.protocol.Py4JJavaError'>: An error occurred while calling o41.parquet.
: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3"
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3443)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
at org.apache.hadoop.fs.FileSystem.get(FiInternal(DataFrameWriter.scala:355)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239)
at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:781)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.lang.Thread.run(Thread.java:748)
SYSTEM ERROR! Contact platform administrators.
freezing-airport-6809
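from flytekit import task
from flytekitplugins.spark import Spark

@task(
    task_config=Spark(
        spark_conf={
            # Route the bare "s3" scheme to the S3A filesystem implementation
            "spark.hadoop.fs.s3.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
        }
    )
)
def foo():
    pass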
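A minimal sketch of the same idea, extended to the three scheme spellings that show up in this thread (`s3`, `s3n`, `s3a`). The helper name `s3a_scheme_overrides` is hypothetical; only the property names and class names are taken from the messages here. It builds a plain dict that could be passed as `spark_conf`, so it runs without Spark installed.

```python
def s3a_scheme_overrides(schemes=("s3", "s3n", "s3a")):
    """Build Spark conf entries that route each URI scheme to the S3A connector.

    Hypothetical helper: only the property and class names come from the thread.
    """
    conf = {}
    for scheme in schemes:
        # FileSystem API binding for the scheme
        conf[f"spark.hadoop.fs.{scheme}.impl"] = "org.apache.hadoop.fs.s3a.S3AFileSystem"
        # AbstractFileSystem (FileContext API) binding for the same scheme
        conf[f"spark.hadoop.fs.AbstractFileSystem.{scheme}.impl"] = "org.apache.hadoop.fs.s3a.S3A"
    return conf


if __name__ == "__main__":
    for key, value in s3a_scheme_overrides().items():
        print(f"{key}={value}")
```

These entries only tell Hadoop which class to use; the class itself still has to be on the classpath inside the image, which is what the rest of the thread is about.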
rhythmic-glass-85662
04/26/2022, 4:31 PM
freezing-airport-6809
kubectl -n flyte get cm flyte-propeller-config
rhythmic-glass-85662
04/26/2022, 4:43 PM
thankful-minister-83577
kubectl -n flyte rollout restart deploy flytepropeller
rhythmic-glass-85662
04/26/2022, 4:46 PM
rhythmic-glass-85662
04/26/2022, 6:55 PM
freezing-airport-6809
org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3"
freezing-airport-6809
rhythmic-glass-85662
04/26/2022, 10:33 PM
`.config('spark.jars.packages', 'org.apache.hadoop:hadoop-aws:3.2.0') \` in the spark config (tried that with task config to no avail) or when I start the spark app. And EMR handles it in production.
What is the recommended method for adding it to the container?
freezing-airport-6809
rhythmic-glass-85662
04/27/2022, 2:06 AM
freezing-airport-6809
flytekit_install_spark3
freezing-airport-6809
tall-lock-23197
rhythmic-glass-85662
04/27/2022, 11:42 AM
RUN flytekit_install_spark3.sh
tall-lock-23197
rhythmic-glass-85662
04/27/2022, 1:34 PM
./var/lib/docker/overlay2/de24159d93f1ddb20b418e05ba33ceb3f545008bebfd370439553a34ea53aa29/diff/opt/spark/jars/hadoop-aws-3.2.0.jar
tall-lock-23197
tall-lock-23197
rhythmic-glass-85662
04/27/2022, 2:39 PM
tall-lock-23197
rhythmic-glass-85662
04/27/2022, 5:26 PM
rhythmic-glass-85662
04/27/2022, 5:28 PM
`kubectl -n flyte get cm flyte-propeller-config` but it just seems to tell me the size of the config and how long it's lived.
rhythmic-glass-85662
04/27/2022, 5:44 PM
rhythmic-glass-85662
04/27/2022, 5:46 PM
tall-lock-23197
`Answer from Java side is empty`). When I inspected the config map through `kubectl -n flyte get cm flyte-propeller-config`, spark wasn’t present. When I asked @great-school-54368 about it, he said we ought to have `flyte` as the key in the `values-override.yaml` file, meaning the config has to be present under the `flyte` key:
flyte:
  cluster_resource_manager:
    # -- Enables the Cluster resource manager component
    enabled: true
    # -- Configmap for ClusterResource parameters
    config:
      ...
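A small sketch of what "present under the `flyte` key" means for an override that was written without it. The helper name `nest_under_flyte` is hypothetical, and it assumes the whole override is meant to live under `flyte`, as the message describes; it works on a plain dict such as one loaded from `values-override.yaml`.

```python
import copy


def nest_under_flyte(values):
    """Return a copy of `values` with every top-level entry moved under 'flyte'.

    Hypothetical helper; a mapping whose only key is already 'flyte' is
    returned unchanged (as a copy).
    """
    if set(values) == {"flyte"}:
        return copy.deepcopy(values)
    return {"flyte": copy.deepcopy(values)}


if __name__ == "__main__":
    overrides = {"cluster_resource_manager": {"enabled": True}}
    print(nest_under_flyte(overrides))
```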
But this isn’t successfully bringing up all the pods after following https://docs.flyte.org/en/latest/deployment/plugin_setup/k8s/index.html#deployment-plugin-setup-k8s:
flyte-kubernetes-dashboard-7fd989b99d-9dj2r 1/1 Running 0 7m40s
flyteconsole-668f9ccdd8-bj8pf 1/1 Running 0 7m40s
flyteadmin-7db9b49c6f-24dq6 1/1 Running 0 7m40s
syncresources-27519190-4ws6k 0/1 Completed 0 6m45s
syncresources-27519191-rkhhn 0/1 Completed 0 5m45s
syncresources-27519192-jtc7q 0/1 Completed 0 4m45s
postgres-5b4ccdcd68-9gpv8 1/1 Running 0 3m51s
minio-999cb6d9b-ltnzn 1/1 Running 0 3m51s
flyte-contour-contour-7cfc9f6fb5-g842s 1/1 Running 0 3m51s
flytepropeller-cfcdd6bf5-krtw2 1/1 Running 0 3m51s
datacatalog-7cc7d996d5-s4dmv 1/1 Running 0 3m51s
flyte-pod-webhook-67dfc889df-wsxqd 1/1 Running 0 3m51s
flytescheduler-6d6f79d89-6lqt6 1/1 Running 3 3m51s
flyte-contour-envoy-pcwjh 2/2 Running 0 2m45s
flyteadmin-8c8b86d46-v2jxw 0/2 Init:CrashLoopBackOff 4 3m51s
syncresources-27519193-shg2c 0/1 CrashLoopBackOff 4 3m45s
syncresources-27519194-w86t7 0/1 Error 4 2m45s
flyteconsole-85df86887d-ftcjg 0/1 ErrImagePull 0 3m51s
syncresources-27519195-l77td 0/1 Error 3 105s
syncresources-27519196-kcj2p 1/1 Running 2 45s
Notes:
• `flyteconsole` is failing because it’s trying to pull "cr.flyte.org/flyteorg/flyteconsole-release:v0.19.0", which doesn’t exist.
• `flyte`’s version on running `helm upgrade ..` is 0.19.0, which needs to be upgraded to 1.0.0.
• In the `flyteadmin` pod log, I see `Back-off restarting failed container`; here’s the event log:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 13m default-scheduler Successfully assigned flyte/flyteadmin-8c8b86d46-v2jxw to eba2bc8245ba
Normal Pulled 13m kubelet Container image "ecr.flyte.org/ubuntu/postgres:13-21.04_beta" already present on machine
Normal Created 13m kubelet Created container check-db-ready
Normal Started 13m kubelet Started container check-db-ready
Normal Pulling 13m kubelet Pulling image "cr.flyte.org/flyteorg/flyteadmin-release:v0.19.0"
Normal Pulled 13m kubelet Successfully pulled image "cr.flyte.org/flyteorg/flyteadmin-release:v0.19.0" in 34.417862933s
Normal Created 13m kubelet Created container run-migrations
Normal Started 13m kubelet Started container run-migrations
Normal Pulled 13m kubelet Container image "cr.flyte.org/flyteorg/flyteadmin-release:v0.19.0" already present on machine
Normal Created 13m kubelet Created container seed-projects
Normal Started 13m kubelet Started container seed-projects
Normal Pulled 11m (x4 over 13m) kubelet Container image "cr.flyte.org/flyteorg/flyteadmin-release:v0.19.0" already present on machine
Normal Created 11m (x4 over 13m) kubelet Created container sync-cluster-resources
Normal Started 11m (x4 over 13m) kubelet Started container sync-cluster-resources
Warning BackOff 3m39s (x37 over 12m) kubelet Back-off restarting failed container
Not sure if this would resolve the issue, but IMO, we’ll have to fix this first.
limited-dog-47035
04/28/2022, 1:41 PM
freezing-airport-6809
tall-lock-23197
high-park-82026
tall-lock-23197
`helm dep update`. Will continue debugging the s3 issue.
tall-lock-23197
Failed to convert return value for var o0 for function k8s_spark.dataframe_passing.create_spark_df with error <class 'py4j.protocol.Py4JJavaError'>: An error occurred while calling o140.parquet.
: java.io.FileNotFoundException: PUT 0-byte object on r3/adggpq6ghqkppp2jjj9m-n0-3/8addd767922501d67338242d43f12759/_temporary/0/: com.amazonaws.services.s3.model.AmazonS3Exception: Not Found (Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; Request ID: null; S3 Extended Request ID: null; Proxy: minio.flyte), S3 Extended Request ID: null:404 Not Found
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:260)
at org.apache.hadoop.fs.s3a.Invoker.once(I$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5227)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5173)
at com.amazonaws.services.s3.AmazonS3Client.access$300(AmazonS3Client.java:415)
at com.amazonaws.services.s3.AmazonS3Client$PutObjectStrategy.invokeServiceCall(AmazonS3Client.java:6289)
at com.amazonaws.services.s3.AmazonS3Client.uploadObject(AmazonS3Client.java:1834)
at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1794)
at org.apache.hadoop.fs.s3a.S3AFileSystem.putObjectDirect(S3AFileSystem.java:2432)
at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$createEmptyObject$22(S3AFileSystem.java:4098)
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:115)
... 54 more
with help from @great-school-54368. I guess the problem now is concerning providing the required access.
I’m seeing this error after I’ve added:
- spark.hadoop.fs.s3a.proxy.host: minio.flyte
- spark.hadoop.fs.s3a.proxy.port: 9000
to the spark conf in the `values-override.yaml` file, minio creds to the dockerfile, along with a couple of other changes.
But if there’s no proxy, I see the following error:
Failed to convert return value for var o0 for function k8s_spark.dataframe_passing.create_spark_df with error <class 'py4j.protocol.Py4JJavaError'>: An error occurred while calling o138.parquet.
: java.nio.file.AccessDeniedException: s3://my-s3-bucket/3d/aphcqdxts4ccpdbzqjjr-n0-3/ff104636f156a8f5de8ef1378b0c81a8: getFileStatus on s3://my-s3-bucket/3d/aphcqdxts4ccpdbzqjjr-n0-3/ff104636f156a8f5de8ef1378b0c81a8: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 91ZXJZRMJPSHJ7QY; S3 Extended Request ID: pOYFp8feFwlpc0fSnTDoQ75RklZu51sa69plw+kORlJbqjSazc/o5+MuUZCBGNINRMDi61IqMmU=; Proxy: null),pClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5227)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5173)
at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1360)
at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$getObjectMetadata$6(S3AFileSystem.java:2066)
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:412)
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:375)
at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:2056)
at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:2032)
at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3273)
... 46 more
SYSTEM ERROR! Contact platform administrators.
Can someone help me resolve this issue? @User / @User
freezing-airport-6809
great-school-54368
04/29/2022, 4:28 PM
rhythmic-glass-85662
05/02/2022, 2:57 PM
freezing-airport-6809
tall-lock-23197
Are u using the right service account
Btw Ketan, I’ve used the “spark” service account.
tall-lock-23197
`values-override.yaml` config on this page.
@rhythmic-glass-85662 & @limited-dog-47035, we’ll have to use `flyte` as the base key in the `values-override.yaml` file (example) until this PR is merged, which @great-school-54368 is working on.
rhythmic-glass-85662
05/04/2022, 1:55 PM
limited-dog-47035
05/04/2022, 2:12 PM
freezing-airport-6809
rhythmic-glass-85662
05/04/2022, 5:19 PM
`values-override.yaml` with the new changes with the minio configurations and still getting the same error =/
freezing-airport-6809
rhythmic-glass-85662
05/04/2022, 6:26 PM
freezing-airport-6809
rhythmic-glass-85662
05/04/2022, 6:31 PM
[3/3] currentAttempt done. Last Error: SYSTEM::Traceback (most recent call last):
File "/opt/venv/lib/python3.8/site-packages/flytekit/exceptions/scopes.py", line 165, in system_entry_point
return wrapped(*args, **kwargs)
File "/opt/venv/lib/python3.8/site-packages/flytekit/core/base_task.py", line 527, in dispatch_execute
raise TypeError(
Message:
Failed to convert return value for var o0 for function k8s_spark.dataframe_passing.create_spark_df with error <class 'py4j.protocol.Py4JJavaError'>: An error occurred while calling o41.parquet.
: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "s3"
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3443)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
at org.apache.hadoop.fs.FileSystem.get(FiInternal(DataFrameWriter.scala:355)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239)
at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:781)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.lang.Thread.run(Thread.java:748)
SYSTEM ERROR! Contact platform administrators.
great-school-54368
05/04/2022, 6:33 PM
`values-override.yaml`?
rhythmic-glass-85662
05/04/2022, 6:45 PM
rhythmic-glass-85662
05/04/2022, 7:11 PM
pods "flyte-sparkoperator-5f8b4845f8-" is forbidden: error looking up
service account flyte/flyte-sparkoperator: serviceaccount
"flyte-sparkoperator" not found
great-school-54368
05/04/2022, 7:38 PM
flyte:
  cluster_resource_manager:
    # -- Enables the Cluster resource manager component
    enabled: true
    # -- Configmap for ClusterResource parameters
    config:
      # -- ClusterResource parameters
      # Refer to the [structure](https://pkg.go.dev/github.com/lyft/flyteadmin@v0.3.37/pkg/runtime/interfaces#ClusterResourceConfig) to customize.
      cluster_resources:
        refreshInterval: 5m
        templatePath: "/etc/flyte/clusterresource/templates"
        customData:
          - production:
              - projectQuotaCpu:
                  value: "5"
              - projectQuotaMemory:
                  value: "4000Mi"
          - staging:
              - projectQuotaCpu:
                  value: "2"
              - projectQuotaMemory:
                  value: "3000Mi"
          - development:
              - projectQuotaCpu:
                  value: "4"
              - projectQuotaMemory:
                  value: "5000Mi"
        refresh: 5m
    # -- Resource templates that should be applied
    templates:
      # -- Template for namespaces resources
      - key: aa_namespace
        value: |
          apiVersion: v1
          kind: Namespace
          metadata:
            name: {{ namespace }}
          spec:
            finalizers:
            - kubernetes
      - key: ab_project_resource_quota
        value: |
          apiVersion: v1
          kind: ResourceQuota
          metadata:
            name: project-quota
            namespace: {{ namespace }}
          spec:
            hard:
              limits.cpu: {{ projectQuotaCpu }}
              limits.memory: {{ projectQuotaMemory }}
      - key: ac_spark_role
        value: |
          apiVersion: rbac.authorization.k8s.io/v1beta1
          kind: Role
          metadata:
            name: spark-role
            namespace: {{ namespace }}
          rules:
          - apiGroups: ["*"]
            resources: ["pods"]
            verbs: ["*"]
          - apiGroups: ["*"]
            resources: ["services"]
            verbs: ["*"]
          - apiGroups: ["*"]
            resources: ["configmaps", "persistentvolumeclaims"]
            verbs: ["*"]
      - key: ad_spark_service_account
        value: |
          apiVersion: v1
          kind: ServiceAccount
          metadata:
            name: spark
            namespace: {{ namespace }}
      - key: ae_spark_role_binding
        value: |
          apiVersion: rbac.authorization.k8s.io/v1beta1
          kind: RoleBinding
          metadata:
            name: spark-role-binding
            namespace: {{ namespace }}
          roleRef:
            apiGroup: rbac.authorization.k8s.io
            kind: Role
            name: spark-role
          subjects:
          - kind: ServiceAccount
            name: spark
            namespace: {{ namespace }}
  sparkoperator:
    enabled: true
    plugin_config:
      plugins:
        spark:
          # -- Spark default configuration
          spark-config-default:
            # We override the default credentials chain provider for Hadoop so that
            # it can use the serviceAccount based IAM role or ec2 metadata based.
            # This is more in line with how AWS works
            - spark.hadoop.fs.s3a.aws.credentials.provider: "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"
            - spark.hadoop.fs.s3a.endpoint: "http://minio.flyte.svc.cluster.local:9000"
            - spark.hadoop.fs.s3a.access.key: "minio"
            - spark.hadoop.fs.s3a.secret.key: "miniostorage"
            - spark.hadoop.fs.s3a.path.style.access: "true"
            - spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version: "2"
            - spark.kubernetes.allocation.batch.size: "50"
            - spark.hadoop.fs.s3a.acl.default: "BucketOwnerFullControl"
            - spark.hadoop.fs.s3n.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
            - spark.hadoop.fs.AbstractFileSystem.s3n.impl: "org.apache.hadoop.fs.s3a.S3A"
            - spark.hadoop.fs.s3.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
            - spark.hadoop.fs.AbstractFileSystem.s3.impl: "org.apache.hadoop.fs.s3a.S3A"
            - spark.hadoop.fs.s3a.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
            - spark.hadoop.fs.AbstractFileSystem.s3a.impl: "org.apache.hadoop.fs.s3a.S3A"
            - spark.hadoop.fs.s3a.multipart.threshold: "536870912"
            - spark.excludeOnFailure.enabled: "true"
            - spark.excludeOnFailure.timeout: "5m"
            - spark.task.maxfailures: "8"
  configmap:
    enabled_plugins:
      # -- Tasks specific configuration [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#GetConfig)
      tasks:
        # -- Plugins configuration, [structure](https://pkg.go.dev/github.com/flyteorg/flytepropeller/pkg/controller/nodes/task/config#TaskPluginConfig)
        task-plugins:
          # -- [Enabled Plugins](https://pkg.go.dev/github.com/flyteorg/flyteplugins/go/tasks/config#Config). Enable sagemaker*, athena if you install the backend
          # plugins
          enabled-plugins:
            - container
            - sidecar
            - k8s-array
            - spark
          default-for-task-types:
            container: container
            sidecar: sidecar
            container_array: k8s-array
            spark: spark
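As a rough way to sanity-check a `spark-config-default` list like the one above before deploying it, the sketch below merges the list of single-key mappings into one dict and reports which URI schemes still have no `spark.hadoop.fs.<scheme>.impl` entry. Both function names are hypothetical, and the input is assumed to be the already-parsed YAML list (a list of one-entry dicts).

```python
def flatten_spark_defaults(entries):
    """Merge a list of single-key mappings into one flat conf dict.

    Later entries win if a key is repeated.
    """
    merged = {}
    for entry in entries:
        merged.update(entry)
    return merged


def unmapped_schemes(conf, schemes=("s3", "s3n", "s3a")):
    """Return the schemes that have no `spark.hadoop.fs.<scheme>.impl` entry."""
    return [s for s in schemes if f"spark.hadoop.fs.{s}.impl" not in conf]


if __name__ == "__main__":
    entries = [
        {"spark.hadoop.fs.s3a.endpoint": "http://minio.flyte.svc.cluster.local:9000"},
        {"spark.hadoop.fs.s3.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem"},
    ]
    conf = flatten_spark_defaults(entries)
    print("schemes without an impl mapping:", unmapped_schemes(conf))
```

An empty result only means the mappings are declared; it says nothing about whether propeller actually picked up the config or whether the jar is in the image.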
rhythmic-glass-85662
05/04/2022, 10:07 PM
[1/1] currentAttempt done. Last Error: USER::Spark Job Submission Failed with Error: failed to run spark-submit for SparkApplication flytesnacks-development/a9sgv4zlx2nqhljrrnd2-n0-0: WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
22/05/04 22:04:19 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
22/05/04 22:04:19 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
22/05/04 22:04:20 INFO KerberosConfDriverFeatureStep: You have not specified a krb5.conf file locally or via a ConfigMap. Make sure that you have the krb5.conf locally on the driver image.
Exception in thread "main" org.apache.spark.SparkException: Please specify spark.kubernetes.file.upload.path property.
at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:299)
at org.apache.spark.deploy.k8s.KubernetesUtils$.renameMainAppResource(KubernetesUtils.scala:270)
at org.apache.spark.deploy.k8s.features.DriverCommandFeatureStep.configureForPython(DriverCommandFeatureStep.scala:109)
at org.apache.spark.deploy.k8s.features.DriverCommandFeatureStep.configurePod(DriverCommandFeatureStep.scala:44)
at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$3(KubernetesDriverBuilder.scala:59)
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
at scala.collection.immutable.List.foldLeft(List.scala:89)
at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:58)
at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:106)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3(KubernetesClientApplication.scala:213)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3$adapted(KubernetesClientApplication.scala:207)
at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2611)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:207)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
22/05/04 22:04:20 INFO ShutdownHookManager: Shutdown hook called
22/05/04 22:04:20 INFO ShutdownHookManager: Deleting directory /tmp/spark-857af52f-ce40-4c26-893a-44478cd6dd98
I'm starting to understand a bit more how this is all working together so I'll keep trying to debug it.
tall-lock-23197
`helm dep update` in the `flyte/charts/flyte` repository after you clone https://github.com/flyteorg/flyte?
tall-lock-23197
## BUILD
flytectl sandbox exec -- docker build . --tag "examples:v1" --file k8s_spark/Dockerfile
pyflyte --pkgs k8s_spark package --image "examples:v1"
flytectl register files --project flytesnacks --domain development --archive flyte-package.tgz --version v1
tall-lock-23197
rhythmic-glass-85662
05/12/2022, 4:24 PM
freezing-airport-6809
rhythmic-glass-85662
05/13/2022, 12:59 PM
freezing-airport-6809
rhythmic-glass-85662
05/14/2022, 12:09 PM
`pyspark_pi` example just fine without installing the k8s_sparkoperator. But cannot run the `dataframe_passing` example. As soon as I install the k8s_sparkoperator, neither of them will run due to the error below.
[1/1] currentAttempt done. Last Error: USER::Spark Job Submission Failed with Error: failed to run spark-submit for SparkApplication flytesnacks-development/a78dhklnqtwphd22bm8v-n0-0: WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
22/05/14 12:05:04 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
22/05/14 12:05:04 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
22/05/14 12:05:05 INFO KerberosConfDriverFeatureStep: You have not specified a krb5.conf file locally or via a ConfigMap. Make sure that you have the krb5.conf locally on the driver image.
Exception in thread "main" org.apache.spark.SparkException: Please specify spark.kubernetes.file.upload.path property.
at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:299)
at org.apache.spark.deploy.k8s.KubernetesUtils$.renameMainAppResource(KubernetesUtils.scala:270)
at org.apache.spark.deploy.k8s.features.DriverCommandFeatureStep.configureForPython(DriverCommandFeatureStep.scala:109)
at org.apache.spark.deploy.k8s.features.DriverCommandFeatureStep.configurePod(DriverCommandFeatureStep.scala:44)
at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$3(KubernetesDriverBuilder.scala:59)
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
at scala.collection.immutable.List.foldLeft(List.scala:89)
at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:58)
at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:106)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3(KubernetesClientApplication.scala:213)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$3$adapted(KubernetesClientApplication.scala:207)
at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2611)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:207)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:179)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
22/05/14 12:05:05 INFO ShutdownHookManager: Shutdown hook called
22/05/14 12:05:05 INFO ShutdownHookManager: Deleting directory /tmp/spark-c50b58af-9177-4c36-877e-0ed7b7149eaf
tall-lock-23197
`kubectl -n flyte get cm flyte-propeller-config -o yaml`, do you see spark config?
tall-lock-23197
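One rough way to answer the "do you see spark config?" question without reading the whole dump by eye: save the `-o yaml` output to a file and scan it for the `enabled-plugins` list. This is a sketch with a hypothetical helper name, using a plain line scan rather than a YAML parser, and it assumes the list is laid out as `- name` items directly after an `enabled-plugins:` line, as in the values file quoted earlier in the thread.

```python
def enabled_plugins(configmap_text):
    """Collect the `- name` items that directly follow an `enabled-plugins:` line."""
    plugins = []
    in_list = False
    for raw in configmap_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments do not end the list
        if line == "enabled-plugins:":
            in_list = True
            continue
        if in_list:
            if line.startswith("- "):
                plugins.append(line[2:].strip())
            else:
                in_list = False  # first non-item line closes the list
    return plugins


if __name__ == "__main__":
    sample = """
    tasks:
      task-plugins:
        enabled-plugins:
          - container
          - sidecar
          - k8s-array
        default-for-task-types:
          container: container
    """
    found = enabled_plugins(sample)
    print("spark enabled:", "spark" in found)
```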
rhythmic-glass-85662
05/16/2022, 6:15 PM
tall-lock-23197
FROM ubuntu:focal
LABEL org.opencontainers.image.source https://github.com/flyteorg/flytesnacks
WORKDIR /root
ENV VENV /opt/venv
ENV LANG C.UTF-8
ENV LC_ALL C.UTF-8
ENV PYTHONPATH /root
ENV DEBIAN_FRONTEND=noninteractive
# Install Python3 and other basics
RUN apt-get update && apt-get install -y python3.8 python3.8-venv make build-essential libssl-dev python3-pip curl
# Install AWS CLI to run on AWS (for GCS install GSutil). This will be removed
# in future versions to make it completely portable
RUN pip3 install awscli
WORKDIR /opt
RUN curl https://sdk.cloud.google.com > install.sh
RUN bash /opt/install.sh --install-dir=/opt
ENV PATH $PATH:/opt/google-cloud-sdk/bin
WORKDIR /root
ENV VENV /opt/venv
# Virtual environment
RUN python3 -m venv ${VENV}
ENV PATH="${VENV}/bin:$PATH"
RUN pip3 install wheel
# Install Python dependencies
COPY k8s_spark/requirements.txt /root
RUN pip install -r /root/requirements.txt
RUN flytekit_install_spark3.sh
# Adding Tini support for the spark pods
RUN wget https://github.com/krallin/tini/releases/download/v0.18.0/tini && \
cp tini /sbin/tini && cp tini /usr/bin/tini && \
chmod a+x /sbin/tini && chmod a+x /usr/bin/tini
# Setup Spark environment
ENV JAVA_HOME /usr/lib/jvm/java-8-openjdk-amd64
ENV SPARK_HOME /opt/spark
ENV SPARK_VERSION 3.2.1
ENV PYSPARK_PYTHON ${VENV}/bin/python3
ENV PYSPARK_DRIVER_PYTHON ${VENV}/bin/python3
# Copy the makefile targets to expose on the container. This makes it easier to register.
# Delete this after we update CI
COPY in_container.mk /root/Makefile
# Delete this after we update CI to not serialize inside the container
COPY k8s_spark/sandbox.config /root
# Copy the actual code
COPY k8s_spark/ /root/k8s_spark
# This tag is supplied by the build script and will be used to determine the version
# when registering tasks, workflows, and launch plans
ARG tag
ENV FLYTE_INTERNAL_IMAGE $tag
# Copy over the helper script that the SDK relies on
RUN cp ${VENV}/bin/flytekit_venv /usr/local/bin/
RUN chmod a+x /usr/local/bin/flytekit_venv
# For spark we want to use the default entrypoint which is part of the
# distribution, also enable the virtualenv for this image.
ENTRYPOINT ["/opt/entrypoint.sh"]
tall-lock-23197
rhythmic-glass-85662
05/17/2022, 12:29 PM
tall-lock-23197
`kubectl version --short` in your docker sandbox?
tall-lock-23197
`kubernetes` directory in flytesnacks:
flytectl sandbox start --source .
# export kubeconfig and flytectl config commands
helm repo add flyteorg https://flyteorg.github.io/flyte
helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
helm dep update  # run in the flyte repo, flyte/charts/flyte directory
helm install spark-operator spark-operator/spark-operator --namespace spark-operator --create-namespace
helm upgrade flyte flyteorg/flyte -f values-override.yaml -n flyte
# wait for a few seconds until all the pods are up and running; check the status using kubectl get pods -n flyte
flytectl sandbox exec -- docker build . --tag "examples:v1" --file k8s_spark/Dockerfile
pyflyte --pkgs k8s_spark package --image "examples:v1"
flytectl register files --project flytesnacks --domain development --archive flyte-package.tgz --version v1
rhythmic-glass-85662
05/18/2022, 12:12 AM
`kubectl` command to pull the log, I'm just exporting it from the kubernetes console. If there's a better way I should be doing it let me know; still learning kubernetes.
rhythmic-glass-85662
05/18/2022, 12:13 AM
tall-lock-23197
`kubectl`-related commands once you open bash for your docker container using `docker exec -it <container-id> bash` (run `docker ps` to know the container ID), although the K8s console log should be the same as the pod log. In any case, can you please send me the pod log? Run `kubectl get pods -n flytesnacks-development` first, get the pod name (it’s usually of the format `<…>-n0-0-driver`, but if you don’t see this, it should be `<…>-n0-0`), and run `kubectl logs <pod-name> -n flytesnacks-development`.
Also, please double check the order of your commands and K8s version and lemme know. Make sure you wait for a few seconds until all the pods are in COMPLETED and RUNNING after you run the `helm upgrade …` command. To check the pods’ status, run `kubectl get pods -n flyte` in the docker container bash.
IMO, `spark.kubernetes.file.upload.path` should be available automatically. Your code’s failing even before the spark submission, and hence you aren’t seeing any service account-related error. I’m presuming the spark operator isn’t getting installed properly. Let’s see if you’re able to circumvent the issue after you run the commands I’ve sent.
rhythmic-glass-85662
05/19/2022, 6:54 PM
`kubectl get pods -n flytesnacks-development` I get "No resources found in flytesnacks-development namespace"
• Here are all the pods that it says are running. This is after replicating the error trying to run the workflow.
tall-lock-23197
`flytesnacks-development` namespace after you run the workflow. Let me DM you.
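For the pod-log steps described a few messages up (list the pods in `flytesnacks-development`, prefer the `<…>-n0-0-driver` pod, otherwise fall back to `<…>-n0-0`), here is a minimal sketch of the name-picking part. The function names are hypothetical, and the input is assumed to be the plain-text output of `kubectl get pods -n flytesnacks-development` pasted or piped in as a string.

```python
def pod_names(kubectl_output):
    """Extract pod names (first column) from `kubectl get pods` text output."""
    names = []
    for line in kubectl_output.splitlines():
        fields = line.split()
        if not fields or fields[0] == "NAME":
            continue  # skip blank lines and the header row
        if line.lstrip().startswith("No resources found"):
            continue  # kubectl's message when the namespace has no pods
        names.append(fields[0])
    return names


def pick_task_pod(names):
    """Prefer a '-n0-0-driver' pod; otherwise a '-n0-0' pod; otherwise None."""
    for suffix in ("-n0-0-driver", "-n0-0"):
        for name in names:
            if name.endswith(suffix):
                return name
    return None


if __name__ == "__main__":
    empty = "No resources found in flytesnacks-development namespace."
    print(pick_task_pod(pod_names(empty)))
```

The chosen name would then go into `kubectl logs <pod-name> -n flytesnacks-development`; a `None` result matches the "No resources found" situation reported above.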