• Rodrigo Baron

    Rodrigo Baron

    6 months ago
    I've deployed Flyte using kubeadm along with the spark-on-k8s operator, which runs its own example successfully. But when I tried running the example from flytesnacks I hit some issues (logs in 🧵). Does anyone know about
    Non-spark-on-k8s command provided
    ?
  • ++ id -u
    + myuid=0
    ++ id -g
    + mygid=0
    + set +e
    ++ getent passwd 0
    + uidentry=root:x:0:0:root:/root:/bin/bash
    + set -e
    + '[' -z root:x:0:0:root:/root:/bin/bash ']'
    + SPARK_CLASSPATH=':/opt/spark/jars/*'
    + env
    + grep SPARK_JAVA_OPT_
    + sort -t_ -k4 -n
    + sed 's/[^=]*=\(.*\)/\1/g'
    + readarray -t SPARK_EXECUTOR_JAVA_OPTS
    + '[' -n '' ']'
    + '[' '' == 2 ']'
    + '[' '' == 3 ']'
    + '[' -n '' ']'
    + '[' -z ']'
    + case "$1" in
    + echo 'Non-spark-on-k8s command provided, proceeding in pass-through mode...'
    + CMD=("$@")
    Non-spark-on-k8s command provided, proceeding in pass-through mode...
    + exec /usr/bin/tini -s -- pyflyte-execute --inputs s3://my-s3-bucket/metadata/propeller/flytesnacks-development-p5jty2p9e5/k8ssparkpysparkpihellospark/data/inputs.pb --output-prefix s3://my-s3-bucket/metadata/propeller/flytesnacks-development-p5jty2p9e5/k8ssparkpysparkpihellospark/data/0 --raw-output-data-prefix s3://my-s3-bucket/ko/p5jty2p9e5-f3cpp6wq-0 --resolver flytekit.core.python_auto_container.default_task_resolver -- task-module k8s_spark.pyspark_pi task-name hello_spark
    Welcome to Flyte! Version: 0.22.2
    Attempting to run with flytekit.core.python_auto_container.default_task_resolver...
    WARNING:root:No config file provided or invalid flyte config_file_path flytekit.config specified.
    Using user directory /tmp/flyte/20220204_165445/sandbox/local_flytekit/a504ea1e3e3c62771228ed2d83186827
    {"asctime": "2022-02-04 16:54:57,721", "name": "flytekit", "levelname": "DEBUG", "message": "Task returns unnamed native tuple <class 'float'>"}
    DEBUG:flytekit:Task returns unnamed native tuple <class 'float'>
    {"asctime": "2022-02-04 16:54:57,722", "name": "flytekit", "levelname": "DEBUG", "message": "Task returns unnamed native tuple <class 'int'>"}
    DEBUG:flytekit:Task returns unnamed native tuple <class 'int'>
    {"asctime": "2022-02-04 16:54:57,821", "name": "flytekit", "levelname": "DEBUG", "message": "Task returns unnamed native tuple <class 'float'>"}
    DEBUG:flytekit:Task returns unnamed native tuple <class 'float'>
    No images specified, will use the default image
    Running native-typed task
    INFO:root:Entering timed context: Copying (s3://my-s3-bucket/metadata/propeller/flytesnacks-development-p5jty2p9e5/k8ssparkpysparkpihellospark/data/inputs.pb -> /tmp/flyte7c4gkp2a/local_flytekit/inputs.pb)
    INFO:root:Output of command '['aws', '--endpoint-url', 'http://192.168.0.222:30084', 's3', 'cp', 's3://my-s3-bucket/metadata/propeller/flytesnacks-development-p5jty2p9e5/k8ssparkpysparkpihellospark/data/inputs.pb', '/tmp/flyte7c4gkp2a/local_flytekit/inputs.pb']':
    b'Completed 22 Bytes/22 Bytes (248 Bytes/s) with 1 file(s) remaining\rdownload: s3://my-s3-bucket/metadata/propeller/flytesnacks-development-p5jty2p9e5/k8ssparkpysparkpihellospark/data/inputs.pb to ../tmp/flyte7c4gkp2a/local_flytekit/inputs.pb\n'

    INFO:root:Exiting timed context: Copying (s3://my-s3-bucket/metadata/propeller/flytesnacks-development-p5jty2p9e5/k8ssparkpysparkpihellospark/data/inputs.pb -> /tmp/flyte7c4gkp2a/local_flytekit/inputs.pb) [Wall Time: 9.195532751000428s, Process Time: 0.0055688540000000675s]
    22/02/04 16:56:07 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    INFO:py4j.java_gateway:Error while receiving.
    Traceback (most recent call last):
      File "/opt/venv/lib/python3.8/site-packages/py4j/java_gateway.py", line 1207, in send_command
        raise Py4JNetworkError("Answer from Java side is empty")
    py4j.protocol.Py4JNetworkError: Answer from Java side is empty
    ERROR:root:Exception while sending command.
    Traceback (most recent call last):
      File "/opt/venv/lib/python3.8/site-packages/py4j/java_gateway.py", line 1207, in send_command
        raise Py4JNetworkError("Answer from Java side is empty")
    py4j.protocol.Py4JNetworkError: Answer from Java side is empty
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/opt/venv/lib/python3.8/site-packages/py4j/java_gateway.py", line 1033, in send_command
        response = connection.send_command(command)
      File "/opt/venv/lib/python3.8/site-packages/py4j/java_gateway.py", line 1211, in send_command
        raise Py4JNetworkError(
    py4j.protocol.Py4JNetworkError: Error while receiving
    ERROR:root:!! Begin System Error Captured by Flyte !!
    ERROR:root:Traceback (most recent call last):
    
          File "/opt/venv/lib/python3.8/site-packages/flytekit/common/exceptions/scopes.py", line 165, in system_entry_point
            return wrapped(*args, **kwargs)
          File "/opt/venv/lib/python3.8/site-packages/flytekit/core/base_task.py", line 442, in dispatch_execute
            new_user_params = self.pre_execute(ctx.user_space_params)
          File "/opt/venv/lib/python3.8/site-packages/flytekitplugins/spark/task.py", line 122, in pre_execute
            self.sess = sess_builder.getOrCreate()
          File "/opt/venv/lib/python3.8/site-packages/pyspark/sql/session.py", line 228, in getOrCreate
            sc = SparkContext.getOrCreate(sparkConf)
          File "/opt/venv/lib/python3.8/site-packages/pyspark/context.py", line 384, in getOrCreate
            SparkContext(conf=conf or SparkConf())
          File "/opt/venv/lib/python3.8/site-packages/pyspark/context.py", line 146, in __init__
            self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
          File "/opt/venv/lib/python3.8/site-packages/pyspark/context.py", line 209, in _do_init
            self._jsc = jsc or self._initialize_context(self._conf._jconf)
          File "/opt/venv/lib/python3.8/site-packages/pyspark/context.py", line 321, in _initialize_context
            return self._jvm.JavaSparkContext(jconf)
          File "/opt/venv/lib/python3.8/site-packages/py4j/java_gateway.py", line 1568, in __call__
            return_value = get_return_value(
          File "/opt/venv/lib/python3.8/site-packages/py4j/protocol.py", line 334, in get_return_value
            raise Py4JError(
    
    Message:
    
        An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext
    
    SYSTEM ERROR! Contact platform administrators.
    ERROR:root:!! End Error Captured by Flyte !!
    INFO:root:Entering timed context: Writing (/tmp/flyte7c4gkp2a/local_flytekit/engine_dir -> s3://my-s3-bucket/metadata/propeller/flytesnacks-development-p5jty2p9e5/k8ssparkpysparkpihellospark/data/0)
    INFO:root:Output of command '['aws', '--endpoint-url', 'http://192.168.0.222:30084', 's3', 'cp', '--recursive', '--acl', 'bucket-owner-full-control', '/tmp/flyte7c4gkp2a/local_flytekit/engine_dir', 's3://my-s3-bucket/metadata/propeller/flytesnacks-development-p5jty2p9e5/k8ssparkpysparkpihellospark/data/0']':
    b'Completed 1.7 KiB/1.7 KiB (291.7 KiB/s) with 1 file(s) remaining\rupload: ../tmp/flyte7c4gkp2a/local_flytekit/engine_dir/error.pb to s3://my-s3-bucket/metadata/propeller/flytesnacks-development-p5jty2p9e5/k8ssparkpysparkpihellospark/data/0/error.pb\n'

    INFO:root:Exiting timed context: Writing (/tmp/flyte7c4gkp2a/local_flytekit/engine_dir -> s3://my-s3-bucket/metadata/propeller/flytesnacks-development-p5jty2p9e5/k8ssparkpysparkpihellospark/data/0) [Wall Time: 11.491473693000444s, Process Time: 0.009896259000000018s]
    INFO:root:Engine folder written successfully to the output prefix s3://my-s3-bucket/metadata/propeller/flytesnacks-development-p5jty2p9e5/k8ssparkpysparkpihellospark/data/0
    ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:37755)
    Traceback (most recent call last):
      File "/opt/venv/lib/python3.8/site-packages/py4j/java_gateway.py", line 977, in _get_connection
        connection = self.deque.pop()
    IndexError: pop from an empty deque
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/opt/venv/lib/python3.8/site-packages/py4j/java_gateway.py", line 1115, in start
        self.socket.connect((self.address, self.port))
    ConnectionRefusedError: [Errno 111] Connection refused
    ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:37755)
    Traceback (most recent call last):
      File "/opt/venv/lib/python3.8/site-packages/py4j/java_gateway.py", line 977, in _get_connection
        connection = self.deque.pop()
    IndexError: pop from an empty deque
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/opt/venv/lib/python3.8/site-packages/py4j/java_gateway.py", line 1115, in start
        self.socket.connect((self.address, self.port))
    ConnectionRefusedError: [Errno 111] Connection refused
    ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:37755)
    Traceback (most recent call last):
      File "/opt/venv/lib/python3.8/site-packages/py4j/java_gateway.py", line 977, in _get_connection
        connection = self.deque.pop()
    IndexError: pop from an empty deque
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/opt/venv/lib/python3.8/site-packages/py4j/java_gateway.py", line 1115, in start
        self.socket.connect((self.address, self.port))
    ConnectionRefusedError: [Errno 111] Connection refused
    ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:37755)
    Traceback (most recent call last):
      File "/opt/venv/lib/python3.8/site-packages/py4j/java_gateway.py", line 977, in _get_connection
        connection = self.deque.pop()
    IndexError: pop from an empty deque
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "/opt/venv/lib/python3.8/site-packages/py4j/java_gateway.py", line 1115, in start
        self.socket.connect((self.address, self.port))
    ConnectionRefusedError: [Errno 111] Connection refused
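Aside on the log above: the "Non-spark-on-k8s command provided" line is emitted by the Spark image's entrypoint script, which only treats the literal arguments driver and executor specially and passes any other command (here, pyflyte-execute) through unchanged. A minimal sketch of that dispatch, reconstructed from the set -x trace above rather than copied verbatim from the real script:

```shell
#!/usr/bin/env bash
# Sketch of the dispatch at the end of the Spark image's entrypoint.sh
# (reconstructed from the trace above; the real script does more setup).
dispatch() {
  case "$1" in
    driver | executor)
      # Real spark-on-k8s submissions get rewritten into a JVM launch command.
      echo "spark-on-k8s command: $1"
      ;;
    *)
      # Anything else is run as-is under tini.
      echo "Non-spark-on-k8s command provided, proceeding in pass-through mode..."
      ;;
  esac
}

dispatch pyflyte-execute
```

So the pass-through message itself is expected for a Flyte task command; the actual failure is the driver JVM dying while creating the JavaSparkContext, which is what produces the Py4J "Answer from Java side is empty" and "Connection refused" errors further down.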
  • Haytham Abuelfutuh

    Haytham Abuelfutuh

    6 months ago
    @Yee @Eduardo Apolinario (eapolinario) would you be able to help @Rodrigo Baron?
  • Yee

    Yee

    6 months ago
    hey @Rodrigo Baron when you get a chance, could you send us the pod specs for the spark operator you have running and the task pod that failed?
  • i’ll try that on my end as well and see what the differences are.
  • from there hopefully we can try to narrow it down
  • i’m not familiar with the setup prescribed by the py-pi yaml files (not that i’m terribly familiar with the helm chart we use either, but at least that’s one difference we can start to dig into)
  • Rodrigo Baron

    Rodrigo Baron

    6 months ago
    this is the pod running the flyte task:
    apiVersion: v1
    kind: Pod
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
        cni.projectcalico.org/containerID: 501668ab8335b3661174264ea733d6a0589a733172a8ae3a388d8e99d434abfa
        cni.projectcalico.org/podIP: 172.29.77.31/32
        cni.projectcalico.org/podIPs: 172.29.77.31/32
      creationTimestamp: "2022-02-04T21:01:39Z"
      labels:
        domain: development
        execution-id: byn0i5mak6
        interruptible: "false"
        node-id: k8ssparkpysparkpihellospark
        project: flytesnacks
        shard-key: "11"
        task-name: k8s-spark-pyspark-pi-hello-spark
        workflow-name: flytegen-k8s-spark-pyspark-pi-hello-spark
      name: byn0i5mak6-f3cpp6wq-0
      namespace: flytesnacks-development
      ownerReferences:
      - apiVersion: flyte.lyft.com/v1alpha1
        blockOwnerDeletion: true
        controller: true
        kind: flyteworkflow
        name: byn0i5mak6
        uid: a0c7d530-8416-40de-9a70-11024bc7a7a6
      resourceVersion: "1081142"
      uid: 731da94e-2786-4d23-9366-ebe7a841f62e
    spec:
      containers:
      - args:
        - pyflyte-execute
        - --inputs
        - s3://my-s3-bucket/metadata/propeller/flytesnacks-development-byn0i5mak6/k8ssparkpysparkpihellospark/data/inputs.pb
        - --output-prefix
        - s3://my-s3-bucket/metadata/propeller/flytesnacks-development-byn0i5mak6/k8ssparkpysparkpihellospark/data/0
        - --raw-output-data-prefix
        - s3://my-s3-bucket/e3/byn0i5mak6-f3cpp6wq-0
        - --resolver
        - flytekit.core.python_auto_container.default_task_resolver
        - --
        - task-module
        - k8s_spark.pyspark_pi
        - task-name
        - hello_spark
        env:
        - name: FLYTE_INTERNAL_IMAGE
          value: rodrigobaron/flyte:0.0.4
        - name: FLYTE_INTERNAL_EXECUTION_WORKFLOW
          value: flytesnacks:development:.flytegen.k8s_spark.pyspark_pi.hello_spark
        - name: FLYTE_INTERNAL_EXECUTION_ID
          value: byn0i5mak6
        - name: FLYTE_INTERNAL_EXECUTION_PROJECT
          value: flytesnacks
        - name: FLYTE_INTERNAL_EXECUTION_DOMAIN
          value: development
        - name: FLYTE_ATTEMPT_NUMBER
          value: "0"
        - name: FLYTE_INTERNAL_TASK_PROJECT
          value: flytesnacks
        - name: FLYTE_INTERNAL_TASK_DOMAIN
          value: development
        - name: FLYTE_INTERNAL_TASK_NAME
          value: k8s_spark.pyspark_pi.hello_spark
        - name: FLYTE_INTERNAL_TASK_VERSION
          value: v1
        - name: FLYTE_INTERNAL_PROJECT
          value: flytesnacks
        - name: FLYTE_INTERNAL_DOMAIN
          value: development
        - name: FLYTE_INTERNAL_NAME
          value: k8s_spark.pyspark_pi.hello_spark
        - name: FLYTE_INTERNAL_VERSION
          value: v1
        - name: FLYTE_AWS_ENDPOINT
          value: http://192.168.0.222:30084
        - name: FLYTE_AWS_ACCESS_KEY_ID
          value: minio
        - name: FLYTE_AWS_SECRET_ACCESS_KEY
          value: miniostorage
        image: rodrigobaron/flyte:0.0.4
        imagePullPolicy: IfNotPresent
        name: byn0i5mak6-f3cpp6wq-0
        resources:
          limits:
            cpu: 100m
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          name: kube-api-access-hg2dw
          readOnly: true
      dnsPolicy: ClusterFirst
      enableServiceLinks: true
      nodeName: k8s
      preemptionPolicy: PreemptLowerPriority
      priority: 0
      restartPolicy: Never
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: default
      serviceAccountName: default
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoExecute
        key: node.kubernetes.io/not-ready
        operator: Exists
        tolerationSeconds: 300
      - effect: NoExecute
        key: node.kubernetes.io/unreachable
        operator: Exists
        tolerationSeconds: 300
      volumes:
      - name: kube-api-access-hg2dw
        projected:
          defaultMode: 420
          sources:
          - serviceAccountToken:
              expirationSeconds: 3607
              path: token
          - configMap:
              items:
              - key: ca.crt
                path: ca.crt
              name: kube-root-ca.crt
          - downwardAPI:
              items:
              - fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
                path: namespace
    status:
      conditions:
      - lastProbeTime: null
        lastTransitionTime: "2022-02-04T21:01:39Z"
        status: "True"
        type: Initialized
      - lastProbeTime: null
        lastTransitionTime: "2022-02-04T21:01:42Z"
        status: "True"
        type: Ready
      - lastProbeTime: null
        lastTransitionTime: "2022-02-04T21:01:42Z"
        status: "True"
        type: ContainersReady
      - lastProbeTime: null
        lastTransitionTime: "2022-02-04T21:01:39Z"
        status: "True"
        type: PodScheduled
      containerStatuses:
      - containerID: docker://184942b452022d150ab107adbe2134be7621907b834cb0c25cc86754bb710d04
        image: rodrigobaron/flyte:0.0.4
        imageID: docker-pullable://rodrigobaron/flyte@sha256:bf082fbb2bb7626956d2976cb418c8fe7344c82279e7b628e3d18697c988e0c3
        lastState: {}
        name: byn0i5mak6-f3cpp6wq-0
        ready: true
        restartCount: 0
        started: true
        state:
          running:
            startedAt: "2022-02-04T21:01:41Z"
      hostIP: 192.168.0.222
      phase: Running
      podIP: 172.29.77.31
      podIPs:
      - ip: 172.29.77.31
      qosClass: Guaranteed
      startTime: "2022-02-04T21:01:39Z"
  • and this is the spark-operator pod:
    apiVersion: v1
    kind: Pod
    metadata:
      annotations:
        cni.projectcalico.org/containerID: f4e801034ed42fe8954ef4a40ae13261719fc332ef14cd349e6ab039433fa365
        cni.projectcalico.org/podIP: 172.29.77.32/32
        cni.projectcalico.org/podIPs: 172.29.77.32/32
        prometheus.io/path: /metrics
        prometheus.io/port: "10254"
        prometheus.io/scrape: "true"
      creationTimestamp: "2022-02-04T14:17:10Z"
      generateName: flyte-sparkoperator-86bc9b4dc9-
      labels:
        app.kubernetes.io/instance: flyte
        app.kubernetes.io/name: sparkoperator
        pod-template-hash: 86bc9b4dc9
      name: flyte-sparkoperator-86bc9b4dc9-npgxp
      namespace: default
      ownerReferences:
      - apiVersion: apps/v1
        blockOwnerDeletion: true
        controller: true
        kind: ReplicaSet
        name: flyte-sparkoperator-86bc9b4dc9
        uid: 71878b96-d5d4-473a-91bf-8b31488b3427
      resourceVersion: "997418"
      uid: 10d48af7-cb93-43d9-8be0-850e27ef167a
    spec:
      containers:
      - args:
        - -v=2
        - -logtostderr
        - -namespace=
        - -ingress-url-format=
        - -controller-threads=10
        - -resync-interval=30
        - -enable-batch-scheduler=false
        - -enable-metrics=true
        - -metrics-labels=app_type
        - -metrics-port=10254
        - -metrics-endpoint=/metrics
        - -metrics-prefix=
        - -enable-resource-quota-enforcement=false
        image: gcr.io/spark-operator/spark-operator:v1beta2-1.2.0-3.0.0
        imagePullPolicy: IfNotPresent
        name: sparkoperator
        ports:
        - containerPort: 10254
          name: metrics
          protocol: TCP
        resources:
          limits:
            cpu: 100m
            memory: 300Mi
          requests:
            cpu: 100m
            memory: 300Mi
        securityContext: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          name: kube-api-access-ppdjt
          readOnly: true
      dnsPolicy: ClusterFirst
      enableServiceLinks: true
      nodeName: k8s
      preemptionPolicy: PreemptLowerPriority
      priority: 0
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: flyte-sparkoperator
      serviceAccountName: flyte-sparkoperator
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoExecute
        key: node.kubernetes.io/not-ready
        operator: Exists
        tolerationSeconds: 300
      - effect: NoExecute
        key: node.kubernetes.io/unreachable
        operator: Exists
        tolerationSeconds: 300
      volumes:
      - name: kube-api-access-ppdjt
        projected:
          defaultMode: 420
          sources:
          - serviceAccountToken:
              expirationSeconds: 3607
              path: token
          - configMap:
              items:
              - key: ca.crt
                path: ca.crt
              name: kube-root-ca.crt
          - downwardAPI:
              items:
              - fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
                path: namespace
    status:
      conditions:
      - lastProbeTime: null
        lastTransitionTime: "2022-02-04T14:17:10Z"
        status: "True"
        type: Initialized
      - lastProbeTime: null
        lastTransitionTime: "2022-02-04T14:17:14Z"
        status: "True"
        type: Ready
      - lastProbeTime: null
        lastTransitionTime: "2022-02-04T14:17:14Z"
        status: "True"
        type: ContainersReady
      - lastProbeTime: null
        lastTransitionTime: "2022-02-04T14:17:10Z"
        status: "True"
        type: PodScheduled
      containerStatuses:
      - containerID: docker://6a7e8c352ffdc324c549815e6327c58f87b80828b72db9fb0980945609c01524
        image: gcr.io/spark-operator/spark-operator:v1beta2-1.2.0-3.0.0
        imageID: docker-pullable://gcr.io/spark-operator/spark-operator@sha256:a8bb2e06fce6c3b140d952fd978a3044d55e34e7e8fb6f510e095549f90ee6d2
        lastState: {}
        name: sparkoperator
        ready: true
        restartCount: 0
        started: true
        state:
          running:
            startedAt: "2022-02-04T14:17:14Z"
      hostIP: 192.168.0.222
      phase: Running
      podIP: 172.29.77.32
      podIPs:
      - ip: 172.29.77.32
      qosClass: Guaranteed
      startTime: "2022-02-04T14:17:10Z"
  • Yee

    Yee

    6 months ago
    and remind me what version of flytekit you have?
  • oh and also, any obvious logs in the spark operator pod?
  • like anything erroring etc
  • just looking at the operator pod, not really seeing any differences. the resources you have seem a bit low but that’s it.
  • Rodrigo Baron

    Rodrigo Baron

    6 months ago
    if I increase the memory the task runs successfully, but there still aren't any new logs in the spark-operator pod about the spark job
  • the most obvious thing is this log:
    + readarray -t SPARK_EXECUTOR_JAVA_OPTS
    + '[' -n '' ']'
    + '[' '' == 2 ']'
    + '[' '' == 3 ']'
    + '[' -n '' ']'
    + '[' -z ']'
    + case "$1" in
    + echo 'Non-spark-on-k8s command provided, proceeding in pass-through mode...'
  • i'm using the latest version of flytekit: 0.26.1
  • ooh, it's spark that's missing from enabled-plugins: in my conf
  • this is the fix .. thanks for your support 🙂
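For anyone landing here later: the fix was enabling the spark task plugin in FlytePropeller's configuration. A sketch of the relevant fragment, assuming the key layout used by the Flyte deployment configs at the time (exact nesting and the other plugin entries will vary with your install):

```yaml
# flytepropeller config (sketch; key layout may differ between Flyte releases)
tasks:
  task-plugins:
    enabled-plugins:
      - container
      - sidecar
      - spark   # the missing entry; without it the task runs as a plain
                #   container pod and never reaches the spark operator
    default-for-task-types:
      container: container
      sidecar: sidecar
      spark: spark
```

This also explains the symptoms above: with the plugin disabled, propeller launched the task as an ordinary pod (hence the retrievable pod spec and the pass-through entrypoint message), and the tiny default container resources were too small for a driver JVM.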
  • Yee

    Yee

    6 months ago
    sorry about this. glad you figured it out.
  • didn’t even consider this, definitely should have
  • yeah it was weird that you were able to get the pod spec… typically what happens is that the worker pods are killed immediately by the driver