Infrastructure orchestration doesn’t have to be so hard · Flyte #announcements

thankful-church-46366

08/25/2022, 4:49 PM

Infrastructure orchestration doesn’t have to be so hard. Learn how to use Ray from within Flyte without changing the core of your code in this blog post by Union.ai Backend Systems Engineer @glamorous-carpet-83516. "With Ray native support in Flyte, users can easily transition from prototyping to production and rely on Flyte to automatically manage the Ray cluster lifecycle." - Keshi Dai (Spotify)

Ray and Flyte: Distributed Computing and Orchestration

elegant-petabyte-32634

08/26/2022, 12:18 PM

Our Flyte stuff is running on AWS us-east-2 at the moment, and I'm looking into a way of doing training on GCP (because AWS only has 8x A100 instances, and only bad GPUs in us-east-2 😞). Could I use this here? That is, run Ray on GCP with GPUs and start it from Flyte, passing e.g. S3 paths for input/output data? Before this, my idea was to use the k8s API to create/manage pods on our GCP cluster manually, but this might be better (or something like https://cloud.google.com/sdk/gcloud/reference/ai-platform/jobs/submit/training).

thankful-church-46366

08/29/2022, 5:28 PM

@thankful-minister-83577 @glamorous-carpet-83516

glamorous-carpet-83516

08/29/2022, 5:55 PM

If you install Propeller on the GCP cluster as well, you can submit the Ray job to the GCP cluster. @thankful-minister-83577, correct me if I’m wrong.

thankful-minister-83577

08/29/2022, 6:07 PM

Yeah, this seems really complicated. Currently (and Keshi had this question too), you need to have Propeller running in the same cluster as the Ray operator. When Propeller calls the plugin today, the plugin always creates the CRD in the same cluster that Propeller itself is running in.
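To make that constraint concrete: a Kubernetes custom resource identifies itself only by name and namespace, so nothing in the object says which cluster it belongs to; it is created in whichever cluster's API server the submitting client is connected to. The sketch builds a manifest loosely shaped like a KubeRay RayJob (ray.io/v1alpha1, the API version in use around 2022). It is not the Flyte plugin's actual code, and the helper name, image tag, entrypoint, namespace, and worker group name are all hypothetical placeholders.

```python
"""Toy sketch: a Ray job custom resource as a plain Kubernetes object.

Assumption: the field layout loosely follows the KubeRay RayJob CRD
(ray.io/v1alpha1); every concrete value here is a hypothetical placeholder.
"""


def build_rayjob_manifest(name: str, namespace: str, worker_replicas: int) -> dict:
    """Return a minimal RayJob-shaped manifest (hypothetical helper)."""
    container = {"name": "ray", "image": "rayproject/ray:2.0.0"}  # placeholder image
    return {
        "apiVersion": "ray.io/v1alpha1",
        "kind": "RayJob",
        # Metadata identifies the object only *within* a cluster (name + namespace).
        # There is no "target cluster" field: the object lands in whatever cluster
        # the client submitting it is authenticated against.
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "entrypoint": "python train.py",  # hypothetical entrypoint
            "shutdownAfterJobFinishes": True,  # tear the Ray cluster down afterwards
            "rayClusterSpec": {
                "headGroupSpec": {
                    "rayStartParams": {"dashboard-host": "0.0.0.0"},
                    "template": {"spec": {"containers": [container]}},
                },
                "workerGroupSpecs": [
                    {
                        "groupName": "gpu-workers",  # hypothetical group name
                        "replicas": worker_replicas,
                        "rayStartParams": {},
                        "template": {"spec": {"containers": [container]}},
                    }
                ],
            },
        },
    }


manifest = build_rayjob_manifest("train-run-1", "flytesnacks-development", worker_replicas=4)
print(manifest["kind"], manifest["metadata"])
```

Because the destination is implied by the client rather than stated in the object, creating the Ray job on a GCP cluster means the submitter needs credentials for that cluster, or a Propeller running there, as suggested earlier in the thread.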

thankful-minister-83577

08/29/2022, 6:09 PM

But even if that were not the case, and the plugin (or a new plugin) could monitor a Ray job CRD in another cluster, there’s still the question of data, right? That would be a problem regardless of which compute technology you’re using: mainly permissions, and cost depending on the size of the data.
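A rough way to size the cost side: cross-cloud transfer is typically billed per GB by the source cloud as egress, so GCP workers reading S3 data cost roughly dataset size × number of full reads × per-GB rate. The rate in this sketch is a hypothetical placeholder, not a quoted AWS price, and the sketch ignores storage, request, and inter-region charges.

```python
"""Back-of-the-envelope cross-cloud data transfer cost.

Assumption: PRICE_PER_GB is a hypothetical placeholder, not a quoted rate;
check the provider's current egress pricing before relying on the numbers.
"""

PRICE_PER_GB = 0.09  # hypothetical USD per GB transferred out of the source cloud


def transfer_cost(dataset_gb: float, reads: int, price_per_gb: float = PRICE_PER_GB) -> float:
    """Cost in USD of moving `dataset_gb` across clouds `reads` times."""
    if dataset_gb < 0 or reads < 0:
        raise ValueError("dataset_gb and reads must be non-negative")
    return round(dataset_gb * reads * price_per_gb, 2)


# Streaming a 500 GB dataset from S3 on each of 10 epochs pays the transfer 10 times...
streamed = transfer_cost(500, reads=10)
# ...while copying it once to storage on the GCP side pays it once.
copied_once = transfer_cost(500, reads=1)
print(f"streamed every epoch: ${streamed:,.2f}  copied once: ${copied_once:,.2f}")
```

Repeated reads multiply the bill, so copying the data once to the GCP side usually beats streaming it from S3 on every epoch, at the price of keeping a second copy in sync. Permissions are a separate problem: the GCP workers still need AWS credentials that can read (and, for outputs, write) the S3 paths.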
