# ask-the-community
d
I'm trying to integrate Spark with Flyte. I have an example workflow set up with a regular Python Flyte task, as well as a Spark task. I am using
apache/spark:3.5.0-python3
as the Docker image for the step that is running the Spark job. When I run the workflow, the spark step fails with an error message like this:
24/06/03 15:07:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
/usr/bin/python3: can't open file '/opt/venv/bin/entrypoint.py': [Errno 2] No such file or directory
The regular Spark image doesn't seem to have that entrypoint.py script, so I can only assume this integration is supposed to use a Flyte-specific Spark image of some sort. I tried using the Flyte image
ghcr.io/flyteorg/flytekit:py3.12-1.12.0
, which I use for my regular Python tasks, but that one doesn't seem to have Spark installed on it. Is there a particular image that must be used with Spark tasks in Flyte? I can't figure out where this entrypoint script that the Flyte task is using should be coming from.
e
yes, can you give
ghcr.io/flyteorg/flytekit:spark-1.12.0
a try?
s
The Spark image should also be used automatically when you use ImageSpec: https://docs.flyte.org/en/latest/flytesnacks/examples/k8s_spark_plugin/pyspark_pi.html#spark-task
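For reference, a minimal sketch of the ImageSpec approach from the linked pyspark_pi example. The registry name is a placeholder you'd replace with your own; flytekit then builds an image with the Spark plugin baked in, so you don't have to pick a base image by hand:

```python
from flytekit import ImageSpec, task
from flytekitplugins.spark import Spark

# Assumption: "ghcr.io/your-org" is a placeholder registry you can push to.
custom_image = ImageSpec(
    registry="ghcr.io/your-org",
    packages=["flytekitplugins-spark"],
)


@task(
    task_config=Spark(
        spark_conf={
            "spark.driver.memory": "1000M",
            "spark.executor.instances": "1",
        }
    ),
    container_image=custom_image,
)
def my_spark_task() -> float:
    import flytekit

    # The plugin injects a Spark session into the task context.
    sess = flytekit.current_context().spark_session
    return sess.sparkContext.parallelize(range(10)).sum()
```

With this, `pyflyte run --remote` builds and pushes the image before registering the workflow, so the entrypoint script ends up in the container automatically.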
d
Hi @Eduardo Apolinario (eapolinario), I tried out that image and am getting a similar error with it:
24/06/04 15:24:04 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
/usr/bin/python3: can't open file '/opt/venv/bin/entrypoint.py': [Errno 2] No such file or directory
Do you expect that script to be present in this image?