Ray is for distributed computation tasks, its similar to Spark.
Essentially, imagine you have a CSV that is 100GB and you want to do some processing on it. You’re probably not going to want to load that into memory on 1 machine.
With Ray, we can split that up and put a fraction of it on any number of machines, its generally easier to get lots of small machine than it is to get one massive machine.
Within Flyte we can integrate using Ray for tasks into the rest of a pipeline. It can automatically spin up a Ray cluster of whatever size we want, execute whatever we want it to do, output the data or whatever we output to the next Flyte task, and shut down the Ray cluster that we no longer need.
IMO its easiest to understand Ray in terms of big datasets, but you can obviously use it for training jobs etc.