limited-dog-47035
08/04/2022, 8:34 PM
"msg":"failed to load plugin - spark: no matches for kind \"SparkApplication\" in version \"sparkoperator.k8s.io/v1beta2\"
My propeller config map looks like this:
# Source: flyte-core/templates/propeller/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: flyte-propeller-config
  namespace: flyte
  labels:
    app.kubernetes.io/name: flyteadmin
data:
  admin.yaml: |
    admin:
      clientId: 'flytepropeller'
      clientSecretLocation: /etc/secrets/client_secret
      endpoint: flyteadmin:81
      insecure: true
    event:
      capacity: 1000
      rate: 500
      type: admin
  catalog.yaml: |
    catalog-cache:
      endpoint: datacatalog:89
      insecure: true
      type: datacatalog
  copilot.yaml: |
    plugins:
      k8s:
        co-pilot:
          image: cr.flyte.org/flyteorg/flytecopilot:v0.0.24
          name: flyte-copilot-
          start-timeout: 30s
  core.yaml: |
    manager:
      pod-application: flytepropeller
      pod-template-container-name: flytepropeller
      pod-template-name: flytepropeller-template
    propeller:
      downstream-eval-duration: 30s
      enable-admin-launcher: true
      gc-interval: 12h
      kube-client-config:
        burst: 25
        qps: 100
        timeout: 30s
      leader-election:
        enabled: true
        lease-duration: 15s
        lock-config-map:
          name: propeller-leader
          namespace: flyte
        renew-deadline: 10s
        retry-period: 2s
      limit-namespace: all
      max-workflow-retries: 50
      metadata-prefix: metadata/propeller
      metrics-prefix: flyte
      prof-port: 10254
      queue:
        batch-size: -1
        batching-interval: 2s
        queue:
          base-delay: 5s
          capacity: 1000
          max-delay: 120s
          rate: 100
          type: maxof
        sub-queue:
          capacity: 1000
          rate: 100
          type: bucket
        type: batch
      rawoutput-prefix: s3://${ parameters.s3_bucket_name }/
      workers: 40
      workflow-reeval-duration: 30s
    webhook:
      certDir: /etc/webhook/certs
      serviceName: flyte-pod-webhook
  enabled_plugins.yaml: |
    tasks:
      task-plugins:
        default-for-task-types:
          container: container
          container_array: k8s-array
          sidecar: sidecar
          spark: spark
        enabled-plugins:
        - container
        - sidecar
        - k8s-array
        - spark
  k8s.yaml: |
    plugins:
      k8s:
        default-cpus: 100m
        default-env-vars: []
        default-memory: 100Mi
  resource_manager.yaml: |
    propeller:
      resourcemanager:
        type: noop
  storage.yaml: |
    storage:
      type: s3
      container: "${ parameters.s3_bucket_name }"
      connection:
        auth-type: iam
        region: ${ parameters.aws_region }
      limits:
        maxDownloadMBs: 10
  cache.yaml: |
    cache:
      max_size_mbs: 1024
      target_gc_percent: 70
  task_logs.yaml: |
    plugins:
      logs:
        cloudwatch-enabled: false
        kubernetes-enabled: false
  spark.yaml: |
    plugins:
      spark:
        spark-config-default:
        - spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version: "2"
        - spark.kubernetes.allocation.batch.size: "50"
        - spark.hadoop.fs.s3a.acl.default: "BucketOwnerFullControl"
        - spark.hadoop.fs.s3n.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
        - spark.hadoop.fs.AbstractFileSystem.s3n.impl: "org.apache.hadoop.fs.s3a.S3A"
        - spark.hadoop.fs.s3.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
        - spark.hadoop.fs.AbstractFileSystem.s3.impl: "org.apache.hadoop.fs.s3a.S3A"
        - spark.hadoop.fs.s3a.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
        - spark.hadoop.fs.AbstractFileSystem.s3a.impl: "org.apache.hadoop.fs.s3a.S3A"
        - spark.hadoop.fs.s3a.multipart.threshold: "536870912"
        - spark.blacklist.enabled: "true"
        - spark.blacklist.timeout: "5m"
        - spark.task.maxfailures: "8"
And my cluster resource template
# Source: flyte-core/templates/clusterresourcesync/cluster_resource_configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: clusterresource-template
  namespace: flyte
  labels:
    app.kubernetes.io/name: flyteadmin
    helm.sh/chart: flyte-core-v0.1.10
data:
  aa_namespace.yaml: |
    apiVersion: v1
    kind: Namespace
    metadata:
      name: {{ namespace }}
    spec:
      finalizers:
      - kubernetes
  aab_default_service_account.yaml: |
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: default
      namespace: {{ namespace }}
  ab_project_resource_quota.yaml: |
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: project-quota
      namespace: {{ namespace }}
    spec:
      hard:
        limits.cpu: {{ projectQuotaCpu }}
        limits.memory: {{ projectQuotaMemory }}
  ac_spark_role.yaml: |
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: spark-role
      namespace: {{ namespace }}
    rules:
    - apiGroups: ["*"]
      resources:
      - pods
      verbs:
      - '*'
    - apiGroups: ["*"]
      resources:
      - services
      verbs:
      - '*'
    - apiGroups: ["*"]
      resources:
      - configmaps
      verbs:
      - '*'
  ad_spark_service_account.yaml: |
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: spark
      namespace: {{ namespace }}
  ae_spark_role_binding.yaml: |
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: spark-role-binding
      namespace: {{ namespace }}
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: spark-role
    subjects:
    - kind: ServiceAccount
      name: spark
      namespace: {{ namespace }}
hallowed-mouse-14616
08/04/2022, 8:47 PM
limited-dog-47035
08/04/2022, 8:49 PM
limited-dog-47035
08/04/2022, 8:50 PM
limited-dog-47035
08/04/2022, 8:57 PM
spark-operator namespace
limited-dog-47035
08/04/2022, 10:27 PM
"No plugin found for Handler-type [spark], defaulting to [container]"
Also
"No plugin found for Handler-type [python-task], defaulting to [container]"
However, based on some other Slack messages I searched, this seems to be the normal behavior?
hallowed-mouse-14616
08/05/2022, 11:22 AM
python-task
this is normal. Basically, under the plugins configuration there is a mapping of task types to plugin IDs. Typically these are the same, so it seems a little redundant. For example:
enabled_plugins.yaml: |
  tasks:
    task-plugins:
      default-for-task-types:
        container: container
        container_array: k8s-array
        sidecar: sidecar
        spark: spark
      enabled-plugins:
      - container
      - sidecar
      - k8s-array
      - spark
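[Editor's note] The task-type → plugin mapping above amounts to a dictionary lookup that falls back to the container plugin when no match is found. A minimal Python sketch of that behavior (illustrative only; flytepropeller itself implements this in Go, and the function and variable names below are invented for this example):

```python
def resolve_plugin(task_type, default_for_task_types, enabled_plugins,
                   fallback="container"):
    """Return the plugin id for a task type, falling back to `fallback`.

    Mirrors the log line:
    "No plugin found for Handler-type [...], defaulting to [container]"
    """
    plugin = default_for_task_types.get(task_type)
    if plugin is None or plugin not in enabled_plugins:
        return fallback
    return plugin


# Mapping and enabled set taken from the enabled_plugins.yaml above.
mapping = {
    "container": "container",
    "container_array": "k8s-array",
    "sidecar": "sidecar",
    "spark": "spark",
}
enabled = {"container", "sidecar", "k8s-array", "spark"}

print(resolve_plugin("spark", mapping, enabled))        # spark
print(resolve_plugin("python-task", mapping, enabled))  # container (fallback)
```

So `python-task` defaulting to `container` is expected, but `spark` should resolve to the spark plugin when it loads successfully.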
In propeller, if we register a task type that doesn't have an associated plugin, it falls back to the container plugin. As I mentioned, this is normal for python-task, but for spark it could be an issue. Did you change this configuration?
hallowed-mouse-14616
08/05/2022, 11:23 AM
freezing-airport-6809
limited-dog-47035
08/05/2022, 3:14 PM
freezing-airport-6809
great-school-54368
08/05/2022, 3:30 PM
kubectl api-versions | grep 'sparkoperator.k8s.io'
kubectl api-resources | grep 'sparkoperator.k8s.io'
limited-dog-47035
08/05/2022, 3:38 PM
limited-dog-47035
08/05/2022, 3:51 PM
$ kubectl --kubeconfig=ss-dev-new1 api-versions | grep 'sparkoperator.k8s.io'
sparkoperator.k8s.io/v1beta2
$ kubectl --kubeconfig=ss-dev-new1 api-resources | grep 'sparkoperator.k8s.io'
scheduledsparkapplications   scheduledsparkapp   sparkoperator.k8s.io/v1beta2   true   ScheduledSparkApplication
sparkapplications            sparkapp            sparkoperator.k8s.io/v1beta2   true   SparkApplication
great-school-54368
08/05/2022, 6:12 PM
limited-dog-47035
08/05/2022, 7:02 PM
helm template flyte flyteorg/flyte-core -f https://raw.githubusercontent.com/flyteorg/flyte/master/charts/flyte-core/values-sandbox.yaml -f values-override.yaml -n flyte > spark-override.yaml
limited-dog-47035
08/05/2022, 7:09 PM
limited-dog-47035
08/05/2022, 7:09 PM
time="2022-08-05T19:03:52Z" level=info msg=------------------------------------------------------------------------
time="2022-08-05T19:03:52Z" level=info msg="App [flytepropeller], Version [unknown], BuildSHA [unknown], BuildTS [2022-08-05 19:03:52.575011998 +0000 UTC m=+0.049406538]"
time="2022-08-05T19:03:52Z" level=info msg=------------------------------------------------------------------------
time="2022-08-05T19:03:52Z" level=info msg="Detected: 8 CPU's\n"
{"json":{},"level":"error","msg":"failed to initialize token source provider. Err: failed to fetch auth metadata. Error: rpc error: code = Unimplemented desc = unknown service flyteidl.service.AuthMetadataService","ts":"2022-08-05T19:03:53Z"}
{"json":{},"level":"warning","msg":"Starting an unauthenticated client because: can't create authenticated channel without a TokenSourceProvider","ts":"2022-08-05T19:03:53Z"}
{"json":{},"level":"error","msg":"failed to initialize token source provider. Err: failed to fetch auth metadata. Error: rpc error: code = Unimplemented desc = unknown service flyteidl.service.AuthMetadataService","ts":"2022-08-05T19:03:53Z"}
{"json":{},"level":"warning","msg":"Starting an unauthenticated client because: can't create authenticated channel without a TokenSourceProvider","ts":"2022-08-05T19:03:53Z"}
{"json":{},"level":"warning","msg":"defaulting max ttl for workflows to 23 hours, since configured duration is larger than 23 [23]","ts":"2022-08-05T19:03:53Z"}
{"json":{},"level":"warning","msg":"stow configuration section missing, defaulting to legacy s3/minio connection config","ts":"2022-08-05T19:03:53Z"}
I0805 19:03:53.592186 1 leaderelection.go:243] attempting to acquire leader lease flyte/propeller-leader...
I0805 19:04:10.415449 1 leaderelection.go:253] successfully acquired lease flyte/propeller-leader
freezing-airport-6809