Hi! I have kind of an open-ended ask: does anyone have experience using Flyte in production with purpose-built batch schedulers like Apache YuniKorn or Volcano? If so, I'd love to chat to understand more about your experience and any pragmatic details of what the integration process looks like. Bonus points if you're also using any Flyte plugins that rely on operators to create ephemeral clusters (e.g., Spark or Ray) and might have some insights to share about how those interact with a batch scheduler.
05/31/2023, 9:48 PM
I've given this a brief glance in the past while looking into how to support gang scheduling (I think Volcano supports this), which would be awesome for our Flyte (& Argo) workloads. Haven't gotten deep enough in the weeds for a POC though, and certainly haven't used it in prod yet.
06/01/2023, 2:02 AM
Yes, we have used Volcano in the past.
There is actually core support for using any scheduler. For advanced use cases you can use pod templates.
I do think the eventual goal is to optimize global schedules with cooperative scheduling.
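For anyone reading along, here is a minimal sketch of the pod template route mentioned above. It builds a Kubernetes `PodTemplate` manifest whose pod spec sets `schedulerName`, the standard Kubernetes field for handing a pod to a non-default scheduler. The helper `build_pod_template`, the template name `flyte-batch-template`, the namespace, and the placeholder container are assumptions for illustration, not something confirmed in this thread. How Flyte picks the template up (for example, a default pod template name in the propeller config) depends on your deployment, so check the docs for your version.

```python
import json


def build_pod_template(name: str, namespace: str, scheduler_name: str) -> dict:
    """Return a Kubernetes PodTemplate manifest that pins pods to one scheduler.

    All names passed in are hypothetical; only the manifest layout and the
    `schedulerName` field come from the Kubernetes API.
    """
    return {
        "apiVersion": "v1",
        "kind": "PodTemplate",
        "metadata": {"name": name, "namespace": namespace},
        "template": {
            "spec": {
                # Pods created from this template are scheduled by the named
                # scheduler instead of the default kube-scheduler.
                "schedulerName": scheduler_name,
                # Kubernetes requires at least one container in a pod spec;
                # this entry is a placeholder (assumption), not a real image.
                "containers": [
                    {"name": "default", "image": "placeholder-image"}
                ],
            }
        },
    }


if __name__ == "__main__":
    # "volcano" is the scheduler name Volcano registers by default; swap in
    # "yunikorn" or whatever name your scheduler is deployed under.
    manifest = build_pod_template("flyte-batch-template", "flyte", "volcano")
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be applied with `kubectl apply -f`, since Kubernetes accepts JSON manifests. Gang scheduling usually needs more than `schedulerName` (for Volcano, typically a pod group as well), which this sketch does not cover.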