Hey! Has anyone gotten apache-beam to run on Flyte natively? I was going to explore this a bit, but I wanted to see if anyone here has thought through this.
It might be possible to use the Spark components to run apache-beam instead of writing a custom backend.
09/07/2022, 12:38 AM
But we haven't explored it yet. Would love to understand the use case.
09/07/2022, 1:45 PM
Apache Beam is really good for model inference if you're using Python ML packages. The SDK is in Python, so there are no annoying Spark Java errors, no special Python UDFs, and no building Spark transformers.
I want to use it to batch-score models in parallel using this: https://beam.apache.org/documentation/sdks/python-machine-learning/
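For context on the batch-scoring pattern: roughly speaking, Beam's `RunInference` transform groups elements into batches and calls the model once per batch across parallel workers. The snippet below is a plain-Python sketch of that idea only, not Beam's API. `score_batch` is a hypothetical stand-in for a real model's `predict()`, and the batch size and worker count are arbitrary assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Iterable, Iterator, List, Sequence


def batched(items: Iterable[float], batch_size: int) -> Iterator[List[float]]:
    """Yield consecutive lists of at most `batch_size` items."""
    it = iter(items)
    while batch := list(islice(it, batch_size)):
        yield batch


def score_batch(batch: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for model.predict(): a fixed linear model."""
    weight, bias = 2.0, 1.0
    return [weight * x + bias for x in batch]


def batch_score(items: Iterable[float], batch_size: int = 4, workers: int = 4) -> List[float]:
    """Score items batch by batch on a worker pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the order the batches were submitted.
        scored = pool.map(score_batch, batched(items, batch_size))
        return [y for batch in scored for y in batch]


if __name__ == "__main__":
    print(batch_score(range(10), batch_size=3))
```

Threads keep the sketch simple; for CPU-heavy models a process pool (or, in Beam, a distributed runner such as Dataflow) is what actually buys the parallelism.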
Also, GCP's Dataflow is cool because its autoscaling features are incredible: both vertical and horizontal scaling. Plus you can use GPUs.
I'll see if I can run it using the Spark backend… would be so cool if it just kind of worked.
09/07/2022, 2:41 PM
With the Spark backend it should.
But I want to understand what you mean by a Flyte backend for Beam.
09/07/2022, 2:45 PM
I just meant, like, a way to spin up and run Apache Beam on Flyte.
Similar to how Spark works.
But for Beam. If Spark doesn't work, I know people use Flink, which might support more streaming features, but I don't really need those… maybe one day!