Hey folks. I was wondering if Flyte supports writing Spark jobs in Scala. I see a lot of examples for PySpark using the Python Flyte API, but I want to use flyte-scala to write Scala Spark pipelines, and documentation seems kinda sparse on this topic.
03/31/2022, 9:34 PM
this does not exist yet
but if you folks are interested we can probably get something in during the later part of April
this is completely doable
cc @Guillaume Perchais / @Babis Kiosidis / @Nelson Arapé from the Spotify team, who help us drive the Java/Scala SDK
@William Young do you work for Spotify? 😄 If so we should definitely prioritize this. There are a lot of folks in the community who want this 😄.
03/31/2022, 9:38 PM
yes I do!
It would make a lot of folks in my part of the org very happy if this were possible. 🙂
Implementation-wise, we are looking at three possibilities: running Spark on Databricks, Dataproc, and K8s. Preferably all 3. I think the first two would just be a matter of packaging up a jar and making an API call. Don’t know much about K8s though.
03/31/2022, 11:04 PM
ya the first 2 should be trivial, but the Databricks and Dataproc plugins need to be implemented
and for k8s, all the backend work is already done
just need a Java plugin
i actually gave instructions to someone
let me see if i can share the issue / docs?
this is pretty easy to do
04/01/2022, 1:19 PM
I took a quick look. I think the Databricks one might be harder, as they don’t have a Java API that I can find. Dataproc of course does have one, and I’m guessing the implementation would be somewhat similar to the Dataflow one we already have. Would need guidance on K8s. cc @Mark Grey
04/01/2022, 1:24 PM
K8s should not be hard
We actually want to implement a backend plugin for data to
Would you folks have a few minutes to sync?
04/01/2022, 3:19 PM
So the folks you mentioned above are probably the actual owners of the Flyte infrastructure within Spotify. I am working with Mark (mentioned above), who is closer to them organizationally than I am; I am just a motivated potential user. But given that Spark is not a “blessed” technology within Spotify yet, I'm unsure of their roadmap. I’d be happy to look into this myself if I had guidance from one of the above, though. It might require an internal discussion, as I don’t know the code structure well enough yet to tell how much of this is a potential open source contribution and how much is Spotify-specific.
04/01/2022, 3:21 PM
Mostly to understand what the goal is and how to maximize impact in the short term
04/06/2022, 3:12 PM
I'll get back to you in the next week or two. Have to have some internal discussions first, but I still really wanna do this.