# ask-the-community
Hi, I'm trying to plan out the developer experience for people working in a Flyte setup where we'd make heavier use of pre-registered workflows/tasks (the reference_task decorator). That pushes us towards running on a local k8s cluster, which is fine, but my first experiments suggest it's pretty slow, adding something like 10+ s per task. Any idea why that might be?
hi - would you mind elaborating a bit? I’m a bit confused… the thing is, using these reference entities actually skips a step in the registration process, so if anything it should be faster. Why do you say that it pushes towards local k8s?
are you fetching tasks from a broader corporate flyte backend, and then testing executions locally with a local k8s?
hi, no I'm registering & running on local k8s, and get this kind of delay
the tasks I'm running are the demo ones from the docs, like creating a tiny dataframe and doing quick stats on it
and I'm running Linux, so I know I can start Docker containers in milliseconds
to simplify things, how long would you expect https://docs.flyte.org/en/latest/getting_started/index.html#executing-workflows-on-a-flyte-cluster to take, with the cluster already running?
it's taking me 25s in testing right now, though I think I've seen 50s. 25s for two tasks in a workflow doing almost no work is a lot
the broader point is that if it's going to take minutes to hours to run basic multi-step workloads on a local cluster purely because of overhead, I have to do something about the developer experience
Hi @Ian Calvert, the 10s+ of overhead you're seeing is a little unusual. We expect overhead in a production environment to be in the seconds range, typically around 3-4s. Are you running the sandbox / demo env locally? It's worth noting that it is not optimized for performance and is therefore significantly slower than an EKS or AWS deployment.
Also, what version of Flyte are you currently running?
yeah this is the demo env. I'm surprised that there's extra latency running locally compared to EKS, since there's no machine provisioning required. I'll try the sandbox and see if that's any different
yeah, so demo and sandbox performance will be similar. The performance difference isn't a result of machine provisioning etc.; rather, the Flyte components are configured for a lightweight environment.
flytectl reports 0.6.14, flytekit in Python is 1.1.3. I'll update to 1.2 while I'm setting things up anyway
interesting, thanks. I'll see whether I can set up a more prod-like instance on my machine
sure, let me know if you want to discuss anything. I'm always up for diving into Flyte performance 🙂. The main things to look at are Flyte node runtimes and how they compare to k8s pod and container durations. Like I mentioned, this overhead should be measurable in seconds; if it's not, we can look at modifying some configuration.
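As a rough illustration of that comparison, here is a small Python sketch that splits one node's wall-clock time into time spent in task code, pod startup, and time outside the pod's lifetime. The `NodeTimings` record and `breakdown` function are hypothetical, not part of Flyte; the timestamps are assumed to be copied by hand from the Flyte console (node start/end) and from `kubectl get pod -o yaml` (`metadata.creationTimestamp` and the container's `state.terminated.startedAt` / `finishedAt`). The example numbers are made up, not measurements.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class NodeTimings:
    """Timestamps for one task node (hypothetical record, not a Flyte API)."""
    node_started: datetime        # node start, as shown by Flyte
    node_finished: datetime       # node end, as shown by Flyte
    pod_created: datetime         # pod metadata.creationTimestamp
    container_started: datetime   # container state.terminated.startedAt
    container_finished: datetime  # container state.terminated.finishedAt


def _seconds(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds()


def breakdown(t: NodeTimings) -> dict[str, float]:
    """Split one node's wall-clock time into task code, pod startup and non-pod time."""
    node = _seconds(t.node_started, t.node_finished)
    container = _seconds(t.container_started, t.container_finished)
    pod_total = _seconds(t.pod_created, t.container_finished)
    return {
        "node_s": node,
        "container_s": container,  # time spent running your task code
        # scheduling, image pull and container start
        "pod_startup_s": _seconds(t.pod_created, t.container_started),
        # before the pod is created and after the container exits
        "outside_pod_s": node - pod_total,
        "total_overhead_s": node - container,  # everything that is not task code
    }


# Made-up example: a 12 s node of which only 2 s is task code.
t0 = datetime(2022, 10, 1, 12, 0, 0)
example = NodeTimings(
    node_started=t0,
    node_finished=t0 + timedelta(seconds=12),
    pod_created=t0 + timedelta(seconds=2),
    container_started=t0 + timedelta(seconds=6),
    container_finished=t0 + timedelta(seconds=8),
)
print(breakdown(example))
# {'node_s': 12.0, 'container_s': 2.0, 'pod_startup_s': 4.0, 'outside_pod_s': 6.0, 'total_overhead_s': 10.0}
```

A large `pod_startup_s` points at k8s scheduling or image pulls, while a large `outside_pod_s` points at time spent in Flyte itself before and after the pod runs.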
so a single task with a relatively short execution time (i.e., seconds) is also the worst case for Flyte overhead. For most use cases this small overhead is easily amortized over a large number of tasks executing in parallel.
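The amortization argument can be made concrete with a deliberately simplified model. It assumes a fixed per-task overhead (4 s here, the upper end of the 3-4s quoted above), identical tasks, and, for the fan-out case, enough cluster capacity to start every task at once. The names `Estimate`, `sequential_chain` and `parallel_fanout` are illustrative only. The model also shows why a long chain of short tasks is the expensive case: every step's overhead sits on the critical path.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    wall_clock_s: float
    overhead_per_task_s: float  # non-work wall-clock time divided by task count


def sequential_chain(n: int, overhead_s: float, work_s: float) -> Estimate:
    """n tasks that each depend on the previous one: every overhead is on the critical path."""
    wall = n * (overhead_s + work_s)
    return Estimate(wall, (wall - n * work_s) / n)


def parallel_fanout(n: int, overhead_s: float, work_s: float) -> Estimate:
    """n independent tasks, assuming the cluster starts them all at once."""
    wall = overhead_s + work_s  # overheads overlap, so the wall clock pays it ~once
    return Estimate(wall, (wall - work_s) / n)


print(sequential_chain(10, overhead_s=4.0, work_s=2.0))
# Estimate(wall_clock_s=60.0, overhead_per_task_s=4.0)
print(parallel_fanout(10, overhead_s=4.0, work_s=2.0))
# Estimate(wall_clock_s=6.0, overhead_per_task_s=0.4)
```

Real clusters have finite capacity and overhead that varies per task, so treat these numbers as bounds on intuition rather than predictions.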
thanks, really appreciate the help. I'd like to take a dive into that sometime. I'm expecting a lot of the workflows to be longer chains where each individual execution is pretty fast, because a key part is sharing already-deployed reference tasks. My next steps are to understand the reference_task flow and see if I can dynamically switch between local code and reference tasks so that development can happen at a reasonable pace. I'll also try out Prefect and see which one requires less hackery
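One possible approach to that dynamic switch is to pick the implementation once, at import time, from an environment flag. The sketch below is framework-agnostic Python with hypothetical names (`USE_REFERENCE_TASKS`, `select_impl`, `summarize_local`, `summarize_reference`). In a Flyte project the reference branch would presumably be a stub decorated with flytekit's `@reference_task(project=..., domain=..., name=..., version=...)` whose signature mirrors the registered task, and the local branch an ordinary `@task`. Whether this fits a given workflow layout is an assumption, not an established Flyte recipe.

```python
import os
from typing import Callable, TypeVar

F = TypeVar("F", bound=Callable)

# Hypothetical flag name; choose whatever fits your team's setup.
FLAG = "USE_REFERENCE_TASKS"


def select_impl(local: F, reference: F) -> F:
    """Return the reference stand-in when the flag is "1", otherwise the local code."""
    return reference if os.environ.get(FLAG, "0") == "1" else local


def summarize_local(values: list[float]) -> float:
    # Local code you edit and run directly while developing.
    return sum(values) / len(values)


def summarize_reference(values: list[float]) -> float:
    # Placeholder for the already-registered task. In Flyte this would be a
    # stub with the same signature, decorated with @reference_task(...);
    # here it only marks which branch was chosen.
    raise NotImplementedError("dispatches to the registered task on the cluster")


# Resolved once when the module is imported.
summarize = select_impl(summarize_local, summarize_reference)
```

Because the choice is made at import time, the flag has to be set before the module defining the workflow is loaded; workflows then call `summarize` without caring which branch was selected.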
@Ian Calvert sounds great! Please ping if there's anything we can help with.