# ask-the-community
t
Hi 🙂 We are trying to build a Spark task, and we noticed some behavior that we don't quite understand. When running the code from this example (https://docs.flyte.org/projects/cookbook/en/stable/auto/integrations/kubernetes/k8s_spark/pyspark_pi.html), everything works fine. We would like to leverage fast registration (https://docs.flyte.org/projects/cookbook/en/latest/auto/deployment/deploying_workflows.html#fast-registration) to avoid having to rebuild and re-push a Docker image. But if we change the code in, say, the function `f`, and then use `pyflyte --pkgs path.to.code package --image my_image --force --fast` to package the code, then upload and run the new version of the workflow, we notice that the behavior of `f` is the same as before. As I understand it, this could be because the Spark executors still have the previous code. Does this work as intended? TL;DR: can we use `--fast` to avoid building and pushing Docker images for Spark tasks?
k
After packaging, you have to register it. May I recommend using `pyflyte run` directly? So:
`pyflyte run --remote --image ghcr.io/flyteorg/flytecookbook:k8s_spark-43585feeccabc8a48452dc6838426f3acf4c6a9d pyspark_pi.py my_spark --triggered_date now`
t
Thank you for your answer. However, I need to run the code remotely, so I use `flytectl register files --archive flyte-package.tgz etc.` with a correct project, domain and service account. Then I use the console to run the workflow. The behavior I am expecting: the new code packaged in my .tgz file should be used by the Spark executors, since I used the `--fast` flag. What I observed: the Spark driver code is updated, but the Spark executors' code is the one from the image, not the one from the .tgz file. I wonder if this is expected behavior and, if it is, how I can register a workflow with new code without having to rebuild a Docker image.
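One way to confirm the driver/executor mismatch described here is to fingerprint the module file that each process actually imports. The following is only a diagnostic sketch: `make_probe` and `executor_fingerprints` are hypothetical helper names, `path.to.code` stands in for the real module, and the sketch assumes the module is importable by name on both the driver and the executors.

```python
def make_probe(module_name):
    """Build a function that reports which copy of module_name a process imports."""

    def probe(_=None):
        # Imports live inside the nested function so that it stays self-contained
        # when Spark serializes it (nested functions are normally shipped by value).
        import hashlib
        import importlib

        module = importlib.import_module(module_name)
        path = module.__file__
        with open(path, "rb") as handle:
            digest = hashlib.sha256(handle.read()).hexdigest()[:12]
        # The path shows where the code was loaded from, the digest which version.
        return path, digest

    return probe


def executor_fingerprints(sc, module_name, partitions=4):
    """Run the probe inside Spark executors; sc is an existing SparkContext."""
    rdd = sc.parallelize(range(partitions), partitions)
    return rdd.map(make_probe(module_name)).distinct().collect()
```

Inside the Spark task, calling `make_probe("path.to.code")()` gives the driver's view, and `executor_fingerprints(sess.sparkContext, "path.to.code")` gives the executors' view, where `sess` is the session obtained from `flytekit.current_context().spark_session` as in the linked example. If the digests differ, the executors are importing a different copy of the file than the driver.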
k
This is not expected behavior
The code should be available to the driver
The executors should get it from the driver
But we would love help with any issues you find. Cc @Evan Sadler: do you have fast registration working with Spark or Databricks?
e
I have only tried it with databricks
k
And it works right?
e
Yeah, it worked. I needed to be careful that the destination directory worked with the entrypoint.py file on Databricks. I am not sure if it is similar for K8s Spark. Something like this fixed it: `pyflyte register --destination-dir .`
t
Thanks, I will try to investigate this!
e
At least in my case, the issue was that the image used for Databricks had Python running in a directory that wasn't `/root` or whatever I had set it to. Good luck!
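To check whether the working directory matches the destination directory used at registration, a task can log where Python is actually running. This is a minimal sketch: `describe_runtime_paths` is a hypothetical helper, not part of flytekit.

```python
import os
import sys


def describe_runtime_paths(max_entries=5):
    """Return the current working directory and the first sys.path entries."""
    return {
        "cwd": os.getcwd(),
        # Earlier sys.path entries win at import time, so the head of the list
        # shows which location is searched first for the workflow code.
        "sys_path_head": list(sys.path[:max_entries]),
    }
```

Printing `describe_runtime_paths()` at the start of the task shows whether the process runs in `/root` or somewhere else.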
t
Hi again! Thanks again for your quick answers 🙂 I tried what you suggested, using the flag `--destinationDirectory` in my command `flytectl register etc.`, which has the same effect as `pyflyte register --destination-dir etc.`. It did not solve my problem (my code did actually run in `/root`), so I might write a GitHub issue later, unless this works 'as intended' or is a limitation of Spark (i.e. Spark executors are expected to pull the image and use it 'as is' instead of using the updated code).
t
@Théo LACOUR Did you get around to making an issue? If you did, could you link it here? thanks!
t
Yes I did, here is the link https://github.com/flyteorg/flyte/issues/3338 🙂
y
@Théo LACOUR May I ask whether you are packaging and registering the workflow from a local machine with a Mac M1 chip? I'm asking because one of my colleagues encountered exactly the same issue as you, while I'm not able to reproduce it on my laptop, which is also a Mac but with an Intel chip 🙂