Thread
#announcements
    William Young

    5 months ago
    Hey folks. I was wondering if Flyte has support for writing spark jobs in Scala. I see a lot of examples for pyspark using the python flyte api, but I want to use flyte-scala to write scala spark pipelines, and documentation seems kinda sparse on this topic.
    Ketan (kumare3)

    5 months ago
    this does not exist yet
    but if you folks are interested we can probably get something in by the later part of April
    this is completely doable
    cc @Guillaume Perchais / @Babis Kiosidis / @Nelson Arapé from the Spotify team who help us drive the java/scala sdk
    @William Young do you work for Spotify 😄 If so we should definitely prioritize this. There are a lot of folks in the community who want this 😄.
    William Young

    5 months ago
    yes I do!
    It would make a lot of folks in my part of the org very happy if this were possible. 🙂
    Implementation-wise, we are looking at three possibilities: running Spark on Databricks, Dataproc, and K8s. Preferably all 3. I think the first two would just be packaging up a jar and making an API call. Don’t know much about K8s though.
    Ketan (kumare3)

    5 months ago
    ya the first 2 should be trivial, but databricks and dataproc plugins need to be implemented
    and for k8s all the backend work is already done
    just need a java plugin
    i actually gave instructions to someone
    let me see if i can share the issue / docs?
    this is pretty easy to do
    William Young

    5 months ago
    I took a quick look. I think the Databricks one might be harder as they don’t have a Java API that I can find. Dataproc of course does have one though, and I’m guessing the implementation would be somewhat similar to the Dataflow one we already have. Would need guidance on K8s. cc @Mark Grey
    Ketan (kumare3)

    5 months ago
    K8s should not be hard
    We actually want to implement a backend plugin for Databricks
    Would you folks have a few minutes to sync?
    William Young

    5 months ago
    So the folks you mentioned above are probably the actual owners of the Flyte infrastructure within Spotify. I am working with Mark (mentioned above) who is closer to them organizationally than I am. I am just a motivated potential user. But given that Spark is not a “blessed” technology within Spotify yet, I'm unsure as to their roadmap. I’d be happy to look into this myself if I had guidance from one of the above though. Might require an internal discussion as I don’t know the code structure well enough yet to know how much of this is a potential open source contribution, and how much is Spotify-specific.
    Ketan (kumare3)

    5 months ago
    More from the point of view of understanding what the goal is and how to maximize impact in the short term
    William Young

    5 months ago
    Will get back to you in the next week or two? Have to have some internal discussions first, but I still really wanna do this.
    Ketan (kumare3)

    5 months ago
    this is fantastic
    we should be able to help with this too