Thread
#ask-the-community
    Rodrigo Baron

    7 months ago
    I've deployed Flyte with kubeadm along with spark-k8s-operator, which runs this example successfully. Then I tried running the example from flytesnacks and hit some issues (logs in 🧵). Does anyone know about
    Non-spark-on-k8s command provided
    ?
    ++ id -u
    + myuid=0
    ++ id -g
    + mygid=0
    + set +e
    ++ getent passwd 0
    + uidentry=root:x:0:0:root:/root:/bin/bash
    + set -e
    + '[' -z root:x:0:0:root:/root:/bin/bash ']'
    + SPARK_CLASSPATH=':/opt/spark/jars/*'
    + env
    + grep SPARK_JAVA_OPT_
    + sort -t_ -k4 -n
    + sed 's/[^=]*=\(.*\)/\1/g'
    + readarray -t SPARK_EXECUTOR_JAVA_OPTS
    + '[' -n '' ']'
    + '[' '' == 2 ']'
    + '[' '' == 3 ']'
    + '[' -n '' ']'
    + '[' -z ']'
    + case "$1" in
    + echo 'Non-spark-on-k8s command provided, proceeding in pass-through mode...'
    + CMD=("$@")
    Non-spark-on-k8s command provided, proceeding in pass-through mode...
    + exec /usr/bin/tini -s -- pyflyte-execute --inputs s3://my-s3-bucket/metadata/propeller/flytesnacks-development-p5jty2p9e5/k8ssparkpysparkpihellospark/data/inputs.pb --output-prefix s3://my-s3-bucket/metadata/propeller/flytesnacks-development-p5jty2p9e5/k8ssparkpysparkpihellospark/data/0 --raw-output-data-prefix s3://my-s3-bucket/ko/p5jty2p9e5-f3cpp6wq-0 --resolver flytekit.core.python_auto_container.default_task_resolver -- task-module k8s_spark.pyspark_pi task-name hello_spark
    Welcome to Flyte! Version: 0.22.2
    Attempting to run with flytekit.core.python_auto_container.default_task_resolver...
    WARNING:root:No config file provided or invalid flyte config_file_path flytekit.config specified.
    Using user directory /tmp/flyte/20220204_165445/sandbox/local_flytekit/a504ea1e3e3c62771228ed2d83186827
    {"asctime": "2022-02-04 16:54:57,721", "name": "flytekit", "levelname": "DEBUG", "message": "Task returns unnamed native tuple <class 'float'>"}
    DEBUG:flytekit:Task returns unnamed native tuple <class 'float'>
    {"asctime": "2022-02-04 16:54:57,722", "name": "flytekit", "levelname": "DEBUG", "message": "Task returns unnamed native tuple <class 'int'>"}
    DEBUG:flytekit:Task returns unnamed native tuple <class 'int'>
    {"asctime": "2022-02-04 16:54:57,821", "name": "flytekit", "levelname": "DEBUG", "message": "Task returns unnamed native tuple <class 'float'>"}
    DEBUG:flytekit:Task returns unnamed native tuple <class 'float'>
    No images specified, will use the default image
    Running native-typed task
    INFO:root:Entering timed context: Copying (s3://my-s3-bucket/metadata/propeller/flytesnacks-development-p5jty2p9e5/k8ssparkpysparkpihellospark/data/inputs.pb -> /tmp/flyte7c4gkp2a/local_flytekit/inputs.pb)
    INFO:root:Output of command '['aws', '--endpoint-url', 'http://192.168.0.222:30084', 's3', 'cp', 's3://my-s3-bucket/metadata/propeller/flytesnacks-development-p5jty2p9e5/k8ssparkpysparkpihellospark/data/inputs.pb', '/tmp/flyte7c4gkp2a/local_flytekit/inputs.pb']':
    b'Completed 22 Bytes/22 Bytes (248 Bytes/s) with 1 file(s) remaining\rdownload: s3://my-s3-bucket/metadata/propeller/flytesnacks-development-p5jty2p9e5/k8ssparkpysparkpihellospark/data/inputs.pb to ../tmp/flyte7c4gkp2a/local_flytekit/inputs.pb\n'
    
    INFO:root:Exiting timed context: Copying (s3://my-s3-bucket/metadata/propeller/flytesnacks-development-p5jty2p9e5/k8ssparkpysparkpihellospark/data/inputs.pb -> /tmp/flyte7c4gkp2a/local_flytekit/inputs.pb) [Wall Time: 9.195532751000428s, Process Time: 0.0055688540000000675s]
    22/02/04 16:56:07 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    INFO:py4j.java_gateway:Error while receiving.
    Traceback (most recent call last):
      File "/opt/venv/lib/python3.8/site-packages/py4j/java_gateway.py", line 1207, in send_command
        raise Py4JNetworkError("Answer from Java side is empty")
    py4j.protocol.Py4JNetworkError: Answer from Java side is empty
    ERROR:root:Exception while sending command.
    Traceback (most recent call last):
      File "/opt/venv/lib/python3.8/site-packages/py4j/java_gateway.py", line 1207, in send_command
        raise Py4JNetworkError("Answer from Java side is empty")
    py4j.protocol.Py4JNetworkError: Answer from Java side is empty
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/opt/venv/lib/python3.8/site-packages/py4j/java_gateway.py", line 1033, in send_command
        response = connection.send_command(command)
      File "/opt/venv/lib/python3.8/site-packages/py4j/java_gateway.py", line 1211, in send_command
        raise Py4JNetworkError(
    py4j.protocol.Py4JNetworkError: Error while receiving
    ERROR:root:!! Begin System Error Captured by Flyte !!
    ERROR:root:Traceback (most recent call last):
    
          File "/opt/venv/lib/python3.8/site-packages/flytekit/common/exceptions/scopes.py", line 165, in system_entry_point
            return wrapped(*args, **kwargs)
          File "/opt/venv/lib/python3.8/site-packages/flytekit/core/base_task.py", line 442, in dispatch_execute
            new_user_params = self.pre_execute(ctx.user_space_params)
          File "/opt/venv/lib/python3.8/site-packages/flytekitplugins/spark/task.py", line 122, in pre_execute
            self.sess = sess_builder.getOrCreate()
          File "/opt/venv/lib/python3.8/site-packages/pyspark/sql/session.py", line 228, in getOrCreate
            sc = SparkContext.getOrCreate(sparkConf)
          File "/opt/venv/lib/python3.8/site-packages/pyspark/context.py", line 384, in getOrCreate
            SparkContext(conf=conf or SparkConf())
          File "/opt/venv/lib/python3.8/site-packages/pyspark/context.py", line 146, in __init__
            self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
          File "/opt/venv/lib/python3.8/site-packages/pyspark/context.py", line 209, in _do_init
            self._jsc = jsc or self._initialize_context(self._conf._jconf)
          File "/opt/venv/lib/python3.8/site-packages/pyspark/context.py", line 321, in _initialize_context
            return self._jvm.JavaSparkContext(jconf)
          File "/opt/venv/lib/python3.8/site-packages/py4j/java_gateway.py", line 1568, in __call__
            return_value = get_return_value(
          File "/opt/venv/lib/python3.8/site-packages/py4j/protocol.py", line 334, in get_return_value
            raise Py4JError(
    
    Message:
    
        An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext
    
    SYSTEM ERROR! Contact platform administrators.
    ERROR:root:!! End Error Captured by Flyte !!
    INFO:root:Entering timed context: Writing (/tmp/flyte7c4gkp2a/local_flytekit/engine_dir -> s3://my-s3-bucket/metadata/propeller/flytesnacks-development-p5jty2p9e5/k8ssparkpysparkpihellospark/data/0)
    INFO:root:Output of command '['aws', '--endpoint-url', 'http://192.168.0.222:30084', 's3', 'cp', '--recursive', '--acl', 'bucket-owner-full-control', '/tmp/flyte7c4gkp2a/local_flytekit/engine_dir', 's3://my-s3-bucket/metadata/propeller/flytesnacks-development-p5jty2p9e5/k8ssparkpysparkpihellospark/data/0']':
    b'Completed 1.7 KiB/1.7 KiB (291.7 KiB/s) with 1 file(s) remaining\rupload: ../tmp/flyte7c4gkp2a/local_flytekit/engine_dir/error.pb to s3://my-s3-bucket/metadata/propeller/flytesnacks-development-p5jty2p9e5/k8ssparkpysparkpihellospark/data/0/error.pb\n'

    INFO:root:Exiting timed context: Writing (/tmp/flyte7c4gkp2a/local_flytekit/engine_dir -> s3://my-s3-bucket/metadata/propeller/flytesnacks-development-p5jty2p9e5/k8ssparkpysparkpihellospark/data/0) [Wall Time: 11.491473693000444s, Process Time: 0.009896259000000018s]
    INFO:root:Engine folder written successfully to the output prefix s3://my-s3-bucket/metadata/propeller/flytesnacks-development-p5jty2p9e5/k8ssparkpysparkpihellospark/data/0
    ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:37755)
    Traceback (most recent call last):
      File "/opt/venv/lib/python3.8/site-packages/py4j/java_gateway.py", line 977, in _get_connection
        connection = self.deque.pop()
    IndexError: pop from an empty deque
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/opt/venv/lib/python3.8/site-packages/py4j/java_gateway.py", line 1115, in start
        self.socket.connect((self.address, self.port))
    ConnectionRefusedError: [Errno 111] Connection refused
    ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:37755)
    Traceback (most recent call last):
      File "/opt/venv/lib/python3.8/site-packages/py4j/java_gateway.py", line 977, in _get_connection
        connection = self.deque.pop()
    IndexError: pop from an empty deque
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/opt/venv/lib/python3.8/site-packages/py4j/java_gateway.py", line 1115, in start
        self.socket.connect((self.address, self.port))
    ConnectionRefusedError: [Errno 111] Connection refused
    ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:37755)
    Traceback (most recent call last):
      File "/opt/venv/lib/python3.8/site-packages/py4j/java_gateway.py", line 977, in _get_connection
        connection = self.deque.pop()
    IndexError: pop from an empty deque
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/opt/venv/lib/python3.8/site-packages/py4j/java_gateway.py", line 1115, in start
        self.socket.connect((self.address, self.port))
    ConnectionRefusedError: [Errno 111] Connection refused
    ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:37755)
    Traceback (most recent call last):
      File "/opt/venv/lib/python3.8/site-packages/py4j/java_gateway.py", line 977, in _get_connection
        connection = self.deque.pop()
    IndexError: pop from an empty deque
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/opt/venv/lib/python3.8/site-packages/py4j/java_gateway.py", line 1115, in start
        self.socket.connect((self.address, self.port))
    ConnectionRefusedError: [Errno 111] Connection refused
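    (For context on the message in the trace above: it comes from the dispatch logic in the Spark container image's entrypoint.sh. When the first argument is neither a Spark driver nor executor command, the script just echoes the pass-through notice and execs the given command unchanged via tini. A simplified sketch, wrapped in a function for illustration; the real script assembles full spark-submit arguments:)

```shell
# Simplified sketch (assumed) of the dispatch branch in the Spark image's
# entrypoint.sh. The real script builds spark-submit/executor command lines
# and finishes with: CMD=("$@"); exec /usr/bin/tini -s -- "${CMD[@]}"
spark_entrypoint() {
  case "$1" in
    driver)
      echo "driver: would run spark-submit"
      ;;
    executor)
      echo "executor: would run the executor backend"
      ;;
    *)
      # Anything else (e.g. pyflyte-execute) falls through untouched:
      echo "Non-spark-on-k8s command provided, proceeding in pass-through mode..."
      ;;
  esac
}
```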
    Haytham Abuelfutuh

    7 months ago
    @Yee @Eduardo Apolinario (eapolinario) would you be able to help @Rodrigo Baron?
    Yee

    7 months ago
    hey @Rodrigo Baron when you get a chance, could you send us the pod specs for the spark operator you have running and the task pod that failed?
    i’ll try that on my end as well and see what the differences are.
    from there hopefully we can try to narrow it down
    i’m not familiar with the setup prescribed by the py-pi yaml files (not that i’m terribly familiar with the helm chart we use either, but at least that’s one difference we can start to dig into)
    Rodrigo Baron

    7 months ago
    the pod is running the flyte task:
    apiVersion: v1
    kind: Pod
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
        cni.projectcalico.org/containerID: 501668ab8335b3661174264ea733d6a0589a733172a8ae3a388d8e99d434abfa
        cni.projectcalico.org/podIP: 172.29.77.31/32
        cni.projectcalico.org/podIPs: 172.29.77.31/32
      creationTimestamp: "2022-02-04T21:01:39Z"
      labels:
        domain: development
        execution-id: byn0i5mak6
        interruptible: "false"
        node-id: k8ssparkpysparkpihellospark
        project: flytesnacks
        shard-key: "11"
        task-name: k8s-spark-pyspark-pi-hello-spark
        workflow-name: flytegen-k8s-spark-pyspark-pi-hello-spark
      name: byn0i5mak6-f3cpp6wq-0
      namespace: flytesnacks-development
      ownerReferences:
      - apiVersion: flyte.lyft.com/v1alpha1
        blockOwnerDeletion: true
        controller: true
        kind: flyteworkflow
        name: byn0i5mak6
        uid: a0c7d530-8416-40de-9a70-11024bc7a7a6
      resourceVersion: "1081142"
      uid: 731da94e-2786-4d23-9366-ebe7a841f62e
    spec:
      containers:
      - args:
        - pyflyte-execute
        - --inputs
        - s3://my-s3-bucket/metadata/propeller/flytesnacks-development-byn0i5mak6/k8ssparkpysparkpihellospark/data/inputs.pb
        - --output-prefix
        - s3://my-s3-bucket/metadata/propeller/flytesnacks-development-byn0i5mak6/k8ssparkpysparkpihellospark/data/0
        - --raw-output-data-prefix
        - s3://my-s3-bucket/e3/byn0i5mak6-f3cpp6wq-0
        - --resolver
        - flytekit.core.python_auto_container.default_task_resolver
        - --
        - task-module
        - k8s_spark.pyspark_pi
        - task-name
        - hello_spark
        env:
        - name: FLYTE_INTERNAL_IMAGE
          value: rodrigobaron/flyte:0.0.4
        - name: FLYTE_INTERNAL_EXECUTION_WORKFLOW
          value: flytesnacks:development:.flytegen.k8s_spark.pyspark_pi.hello_spark
        - name: FLYTE_INTERNAL_EXECUTION_ID
          value: byn0i5mak6
        - name: FLYTE_INTERNAL_EXECUTION_PROJECT
          value: flytesnacks
        - name: FLYTE_INTERNAL_EXECUTION_DOMAIN
          value: development
        - name: FLYTE_ATTEMPT_NUMBER
          value: "0"
        - name: FLYTE_INTERNAL_TASK_PROJECT
          value: flytesnacks
        - name: FLYTE_INTERNAL_TASK_DOMAIN
          value: development
        - name: FLYTE_INTERNAL_TASK_NAME
          value: k8s_spark.pyspark_pi.hello_spark
        - name: FLYTE_INTERNAL_TASK_VERSION
          value: v1
        - name: FLYTE_INTERNAL_PROJECT
          value: flytesnacks
        - name: FLYTE_INTERNAL_DOMAIN
          value: development
        - name: FLYTE_INTERNAL_NAME
          value: k8s_spark.pyspark_pi.hello_spark
        - name: FLYTE_INTERNAL_VERSION
          value: v1
        - name: FLYTE_AWS_ENDPOINT
          value: http://192.168.0.222:30084
        - name: FLYTE_AWS_ACCESS_KEY_ID
          value: minio
        - name: FLYTE_AWS_SECRET_ACCESS_KEY
          value: miniostorage
        image: rodrigobaron/flyte:0.0.4
        imagePullPolicy: IfNotPresent
        name: byn0i5mak6-f3cpp6wq-0
        resources:
          limits:
            cpu: 100m
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          name: kube-api-access-hg2dw
          readOnly: true
      dnsPolicy: ClusterFirst
      enableServiceLinks: true
      nodeName: k8s
      preemptionPolicy: PreemptLowerPriority
      priority: 0
      restartPolicy: Never
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: default
      serviceAccountName: default
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoExecute
        key: node.kubernetes.io/not-ready
        operator: Exists
        tolerationSeconds: 300
      - effect: NoExecute
        key: node.kubernetes.io/unreachable
        operator: Exists
        tolerationSeconds: 300
      volumes:
      - name: kube-api-access-hg2dw
        projected:
          defaultMode: 420
          sources:
          - serviceAccountToken:
              expirationSeconds: 3607
              path: token
          - configMap:
              items:
              - key: ca.crt
                path: ca.crt
              name: kube-root-ca.crt
          - downwardAPI:
              items:
              - fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
                path: namespace
    status:
      conditions:
      - lastProbeTime: null
        lastTransitionTime: "2022-02-04T21:01:39Z"
        status: "True"
        type: Initialized
      - lastProbeTime: null
        lastTransitionTime: "2022-02-04T21:01:42Z"
        status: "True"
        type: Ready
      - lastProbeTime: null
        lastTransitionTime: "2022-02-04T21:01:42Z"
        status: "True"
        type: ContainersReady
      - lastProbeTime: null
        lastTransitionTime: "2022-02-04T21:01:39Z"
        status: "True"
        type: PodScheduled
      containerStatuses:
      - containerID: docker://184942b452022d150ab107adbe2134be7621907b834cb0c25cc86754bb710d04
        image: rodrigobaron/flyte:0.0.4
        imageID: docker-pullable://rodrigobaron/flyte@sha256:bf082fbb2bb7626956d2976cb418c8fe7344c82279e7b628e3d18697c988e0c3
        lastState: {}
        name: byn0i5mak6-f3cpp6wq-0
        ready: true
        restartCount: 0
        started: true
        state:
          running:
            startedAt: "2022-02-04T21:01:41Z"
      hostIP: 192.168.0.222
      phase: Running
      podIP: 172.29.77.31
      podIPs:
      - ip: 172.29.77.31
      qosClass: Guaranteed
      startTime: "2022-02-04T21:01:39Z"
    the pod is running the spark-operator:
    apiVersion: v1
    kind: Pod
    metadata:
      annotations:
        cni.projectcalico.org/containerID: f4e801034ed42fe8954ef4a40ae13261719fc332ef14cd349e6ab039433fa365
        cni.projectcalico.org/podIP: 172.29.77.32/32
        cni.projectcalico.org/podIPs: 172.29.77.32/32
        prometheus.io/path: /metrics
        prometheus.io/port: "10254"
        prometheus.io/scrape: "true"
      creationTimestamp: "2022-02-04T14:17:10Z"
      generateName: flyte-sparkoperator-86bc9b4dc9-
      labels:
        app.kubernetes.io/instance: flyte
        app.kubernetes.io/name: sparkoperator
        pod-template-hash: 86bc9b4dc9
      name: flyte-sparkoperator-86bc9b4dc9-npgxp
      namespace: default
      ownerReferences:
      - apiVersion: apps/v1
        blockOwnerDeletion: true
        controller: true
        kind: ReplicaSet
        name: flyte-sparkoperator-86bc9b4dc9
        uid: 71878b96-d5d4-473a-91bf-8b31488b3427
      resourceVersion: "997418"
      uid: 10d48af7-cb93-43d9-8be0-850e27ef167a
    spec:
      containers:
      - args:
        - -v=2
        - -logtostderr
        - -namespace=
        - -ingress-url-format=
        - -controller-threads=10
        - -resync-interval=30
        - -enable-batch-scheduler=false
        - -enable-metrics=true
        - -metrics-labels=app_type
        - -metrics-port=10254
        - -metrics-endpoint=/metrics
        - -metrics-prefix=
        - -enable-resource-quota-enforcement=false
        image: gcr.io/spark-operator/spark-operator:v1beta2-1.2.0-3.0.0
        imagePullPolicy: IfNotPresent
        name: sparkoperator
        ports:
        - containerPort: 10254
          name: metrics
          protocol: TCP
        resources:
          limits:
            cpu: 100m
            memory: 300Mi
          requests:
            cpu: 100m
            memory: 300Mi
        securityContext: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          name: kube-api-access-ppdjt
          readOnly: true
      dnsPolicy: ClusterFirst
      enableServiceLinks: true
      nodeName: k8s
      preemptionPolicy: PreemptLowerPriority
      priority: 0
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: flyte-sparkoperator
      serviceAccountName: flyte-sparkoperator
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoExecute
        key: node.kubernetes.io/not-ready
        operator: Exists
        tolerationSeconds: 300
      - effect: NoExecute
        key: node.kubernetes.io/unreachable
        operator: Exists
        tolerationSeconds: 300
      volumes:
      - name: kube-api-access-ppdjt
        projected:
          defaultMode: 420
          sources:
          - serviceAccountToken:
              expirationSeconds: 3607
              path: token
          - configMap:
              items:
              - key: ca.crt
                path: ca.crt
              name: kube-root-ca.crt
          - downwardAPI:
              items:
              - fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
                path: namespace
    status:
      conditions:
      - lastProbeTime: null
        lastTransitionTime: "2022-02-04T14:17:10Z"
        status: "True"
        type: Initialized
      - lastProbeTime: null
        lastTransitionTime: "2022-02-04T14:17:14Z"
        status: "True"
        type: Ready
      - lastProbeTime: null
        lastTransitionTime: "2022-02-04T14:17:14Z"
        status: "True"
        type: ContainersReady
      - lastProbeTime: null
        lastTransitionTime: "2022-02-04T14:17:10Z"
        status: "True"
        type: PodScheduled
      containerStatuses:
      - containerID: docker://6a7e8c352ffdc324c549815e6327c58f87b80828b72db9fb0980945609c01524
        image: gcr.io/spark-operator/spark-operator:v1beta2-1.2.0-3.0.0
        imageID: docker-pullable://gcr.io/spark-operator/spark-operator@sha256:a8bb2e06fce6c3b140d952fd978a3044d55e34e7e8fb6f510e095549f90ee6d2
        lastState: {}
        name: sparkoperator
        ready: true
        restartCount: 0
        started: true
        state:
          running:
            startedAt: "2022-02-04T14:17:14Z"
      hostIP: 192.168.0.222
      phase: Running
      podIP: 172.29.77.32
      podIPs:
      - ip: 172.29.77.32
      qosClass: Guaranteed
      startTime: "2022-02-04T14:17:10Z"
    Yee

    7 months ago
    and remind me what version of flytekit you have?
    oh and also, any obvious logs in the spark operator pod?
    like anything erroring etc
    just looking at the operator pod, not really seeing any differences. the resources you have seem a bit low but that’s it.
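    (The 100m/200Mi on the task pod above come from Flyte's platform-wide task resource defaults. A sketch of raising them in flyteadmin's configuration; key names are the commonly documented ones, and exact placement depends on how you deploy Flyte:)

```yaml
# Hypothetical excerpt from flyteadmin config: raise the default
# resources applied to task pods that don't request their own.
task_resources:
  defaults:
    cpu: 500m
    memory: 1Gi
  limits:
    cpu: "2"
    memory: 4Gi
```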
    Rodrigo Baron

    7 months ago
    if I increase the memory the task runs successfully, however there aren't any new logs about the Spark job in the spark-operator pod
    the most obvious thing is this log:
    + readarray -t SPARK_EXECUTOR_JAVA_OPTS
    + '[' -n '' ']'
    + '[' '' == 2 ']'
    + '[' '' == 3 ']'
    + '[' -n '' ']'
    + '[' -z ']'
    + case "$1" in
    + echo 'Non-spark-on-k8s command provided, proceeding in pass-through mode...'
    i'm using the latest version of flytekit: 0.26.1
    ooh,
    spark
    is missing from my conf's
    enabled-plugins:
    that's the fix .. thanks for your support 🙂
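    (For anyone landing here with the same symptom: the task ran as a plain container pod instead of a SparkApplication because the Spark plugin wasn't enabled in FlytePropeller. A sketch of the relevant propeller config, mirroring the commonly documented plugin names; exact placement depends on your deployment:)

```yaml
# FlytePropeller task-plugin config sketch: without "spark" in
# enabled-plugins, spark tasks fall back to the container plugin.
tasks:
  task-plugins:
    enabled-plugins:
      - container
      - sidecar
      - k8s-array
      - spark
    default-for-task-types:
      container: container
      sidecar: sidecar
      container_array: k8s-array
      spark: spark
```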
    Yee

    7 months ago
    sorry about this. glad you figured it out.
    didn’t even consider this, definitely should have
    yeah it was weird that you were able to get the pod spec… typically what happens is that the worker pods are killed immediately by the driver