Infrastructure orchestration doesn’t have to be so hard. In this blog post, Union.ai Backend Systems Engineer Kevin Su shows how to use Ray from within Flyte without changing the core of your code.
"With Ray native support in Flyte, users can easily transition from prototyping to production and rely on Flyte to automatically manage the Ray cluster lifecycle."
- Keshi Dai (Spotify)
Ray and Flyte: Distributed Computing and Orchestration
08/26/2022, 12:18 PM
Our Flyte stuff is running on AWS us-east-2 at the moment, and I’m looking into a way of doing training on GCP (because AWS only has 8x A100, and only lower-end GPUs in us-east-2 😞)
Could I use this here? Run Ray on GCP with GPUs and start it from Flyte, passing e.g. S3 paths for input/output data?
Before this, my idea was to use the k8s API to create/manage pods on our GCP cluster manually, but this might be better (or something like https://cloud.google.com/sdk/gcloud/reference/ai-platform/jobs/submit/training)
08/29/2022, 5:55 PM
If you install Propeller on the GCP cluster as well, you are able to submit the Ray job to the GCP cluster. @Yee, correct me if I’m wrong.
08/29/2022, 6:07 PM
Yeah, this seems really complicated. Currently (and Keshi had this question too), you need to have Propeller running in the same cluster as the Ray operator. Today, when the plugin is called by Propeller, it will always create the CRD in the same cluster that Propeller itself is running in.
But even if that were not the case, and the plugin (or a plugin built for this) could monitor a Ray job CRD in another cluster, there’s still the question of data, right? That would be a problem regardless of what compute technology you’re using - mainly permissions and cost, depending on the size of the data.
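For context, the CRD referred to here is the KubeRay operator’s RayJob custom resource — the object Propeller creates in its own cluster. A hedged sketch of its shape, with illustrative values and pod templates elided:

```yaml
# Sketch of a KubeRay RayJob custom resource; names and sizes are illustrative.
apiVersion: ray.io/v1alpha1
kind: RayJob
metadata:
  name: my-ray-job
spec:
  entrypoint: python train.py
  rayClusterSpec:
    headGroupSpec:
      template: ...        # head pod template (elided)
    workerGroupSpecs:
      - groupName: gpu-group
        replicas: 2
        template: ...      # worker pod template (elided)
```

Because Propeller both creates and watches this object, the Ray operator must be reachable in the same cluster — which is exactly the limitation discussed above.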