# flyte-support
c
I just started with Flyte and I am building a workflow that has a task requiring inference with a large model. How do I reduce the cold start of this task, i.e. avoid loading the model from disk to GPU every time it gets invoked? Is there a way to keep a task/pod warm (I don't mind it blocking a set of GPUs)? Any pointers are appreciated.
f
This is available on Union today, not on Flyte.
You can pin models to memory and reuse them across multiple tasks, even in different workflows.
You can also load vLLM, etc.
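For context, a common workaround in open-source Flyte is to cache the model in a process-level global, so the expensive disk-to-GPU load runs only once per container and later invocations that land on the same (reused) container skip it. A minimal sketch of the pattern, with a hypothetical `load_model` standing in for your real loader (e.g. `torch.load(...).to("cuda")`) — not Union's pinning feature, just the plain-Python caching idea:

```python
import functools

def load_model(path: str) -> dict:
    # Hypothetical heavyweight loader; in practice this would be the
    # slow disk -> GPU transfer you want to avoid repeating.
    return {"path": path, "weights": "loaded"}

@functools.lru_cache(maxsize=1)
def get_model(path: str) -> dict:
    # lru_cache makes this run once per process; subsequent calls in the
    # same container return the already-loaded model object immediately.
    return load_model(path)

def predict(x: float, model_path: str = "/models/big.pt") -> float:
    model = get_model(model_path)  # warm after the first call
    # Placeholder inference step.
    return x * 2
```

Inside a flytekit `@task`, calling `get_model(...)` at the top of the task body gives the same effect whenever the platform reuses the container; it does not, by itself, guarantee the pod stays alive between invocations.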
c
In Flyte or Union?
f
Union
c
Is Union open source, so that I can run it on my cluster?
f
Union is not open source, but it can be deployed to your cluster.
It's the commercial version.
c
Thank you