# flyte-support
c
I just started with Flyte and I am building a workflow that has a task requiring inference with a large model. How do I reduce the cold start of this task, i.e. avoid loading the model from disk to GPU every time it gets invoked? Is there a way to keep a task/pod warm (I don't mind it blocking a set of GPUs)? Any pointers are appreciated.
f
This is available on Union today, not on Flyte.
You can pin models to memory and reuse them across multiple tasks, even in different workflows.
You can also load vLLM, etc.
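For context, a common workaround in open-source Flyte is to cache the model in a process-level global, so the expensive disk-to-GPU load runs only once per container and later invocations that land on the same (reused) container skip it. A minimal sketch of the pattern, with a hypothetical `load_model` standing in for your real loader (e.g. `torch.load(...).to("cuda")`) — not Union's pinning feature, just the plain-Python caching idea:

```python
import functools

def load_model(path: str) -> dict:
    # Hypothetical heavyweight loader; in practice this would be the
    # slow disk -> GPU transfer you want to avoid repeating.
    return {"path": path, "weights": "loaded"}

@functools.lru_cache(maxsize=1)
def get_model(path: str) -> dict:
    # lru_cache makes this run once per process; subsequent calls in the
    # same container return the already-loaded model object immediately.
    return load_model(path)

def predict(x: float, model_path: str = "/models/big.pt") -> float:
    model = get_model(model_path)  # warm after the first call
    # Placeholder inference step.
    return x * 2
```

Inside a flytekit `@task`, calling `get_model(...)` at the top of the task body gives the same effect whenever the platform reuses the container; it does not, by itself, guarantee the pod stays alive between invocations.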
c
In Flyte or Union?
f
Union
c
Is Union open source, so that I can run it on my cluster?
f
Union is not open source, but it can be deployed to your cluster.
It's the commercial version.
c
Thank you