Would anyone be able to point me in the direction ...
# flyte-support
c
Would anyone be able to point me in the direction of where Flyte’s DAG model lives in the source code repository?
h
So the CompiledWorkflowClosure is the protobuf definition of a workflow. It is the result of "compiling" a Workflow proto. Then FlyteAdmin calls the BuildFlyteWorkflow function to create a FlyteWorkflow CRD based on a CompiledWorkflowClosure and the input values to the workflow. The CRD is what FlytePropeller uses to execute a workflow.
Maybe a little more information than what you wanted - but depending on what you're looking for it can be in one of a few places.
c
@hallowed-mouse-14616 I appreciate all the info I can get, so thank you! I actually wrote my own DAG workflow framework before discovering Flyte so I am very interested in the internals. Since I don’t have the support of a team for my project, it’s on hold for now 😅
So in short, I am interested in how Flyte treats workflows from a graph-theoretic perspective.
@hallowed-mouse-14616 Thanks for the excellent links to Flyte’s underlying DAG model. Is there a place you could point me to for more code providing context on the actual execution heuristics for the DAG model?
My team and I are curious about this, and about whether there is any existing explicit message-passing algorithm for nodes in a Flyte workflow. We are wondering if Flyte is stateful or stateless, and if the Chandy–Lamport algorithm might be a useful feature to contribute.
h
Oh fantastic! So right now things are a little embedded. The entrypoint is the RecursiveNodeHandler function, which is initially called on the DAG's start-node. It then traverses the DAG in a depth-first search, scheduling and executing nodes along the way. Admittedly this may not be the most efficient algorithm for many workflows. Refactoring this is something of a passion project of mine, so I'm personally pretty invested in your vision here. A few other community members have expressed interest in allowing pluggable schedulers in Flyte, so that different workflows can execute under different schedulers.
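To make the traversal described above concrete, here is a minimal Python sketch of a depth-first walk that starts at the start-node and only "executes" a node once all of its upstream nodes are done. It is a toy illustration and not Flyte's actual code: the graph, the function names (`upstream_of`, `recursive_visit`, `run`), and the idea of appending to a list as a stand-in for execution are all assumptions made for the example.

```python
from typing import Dict, List, Set

# Hypothetical toy DAG: node id -> list of downstream node ids.
DOWNSTREAM: Dict[str, List[str]] = {
    "start-node": ["a", "b"],
    "a": ["c"],
    "b": ["c"],
    "c": ["end-node"],
    "end-node": [],
}


def upstream_of(dag: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert the downstream map so each node knows its upstream nodes."""
    upstream: Dict[str, List[str]] = {node: [] for node in dag}
    for parent, children in dag.items():
        for child in children:
            upstream[child].append(parent)
    return upstream


def recursive_visit(dag, upstream, node: str, done: Set[str], order: List[str]) -> None:
    """Depth-first visit; a node runs only when all of its upstream nodes are done."""
    if node in done:
        return
    if any(parent not in done for parent in upstream[node]):
        return  # not ready yet; it will be reached again through another parent
    order.append(node)  # stand-in for scheduling and executing the node
    done.add(node)
    for child in dag[node]:
        recursive_visit(dag, upstream, child, done, order)


def run(dag: Dict[str, List[str]], start: str = "start-node") -> List[str]:
    """Walk the DAG from the start node and return the execution order."""
    done: Set[str] = set()
    order: List[str] = []
    recursive_visit(dag, upstream_of(dag), start, done, order)
    return order


print(run(DOWNSTREAM))
```

In this sketch the walk descends into `a` first, skips `c` because `b` has not finished, and only runs `c` when it is reached again through `b`. A real scheduler would also have to deal with asynchronous node completion, failures, and retries, which this toy ignores.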
As far as the stateful vs stateless debate, the components of Flyte are stateless. FlytePropeller handles workflow executions. It stores execution state in a k8s CRD, aptly named FlyteWorkflow, and operates as a k8s controller. So it periodically processes the workflow, checking node statuses and scheduling new nodes (along with many other operations), and then updates the CRD with the current execution state.
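The "stateless component, state in the CRD" pattern described above can be sketched roughly as follows. This is a simplified Python illustration under stated assumptions, not how FlytePropeller is implemented: a plain dict stands in for the Kubernetes API server, the phase names (`QUEUED`, `RUNNING`, `SUCCEEDED`) and the functions `new_workflow`, `reconcile`, and `run_to_completion` are hypothetical, and every running node is assumed to finish by the next round.

```python
import copy
from typing import Dict, List

# Hypothetical workflow: node id -> upstream node ids it depends on.
UPSTREAM: Dict[str, List[str]] = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}


def new_workflow() -> dict:
    """Build a toy stand-in for the persisted resource, with every node queued."""
    return {"status": {node: "QUEUED" for node in UPSTREAM}}


def reconcile(resource: dict) -> dict:
    """One processing round: read persisted status, advance it, return a new copy.

    The function keeps nothing in memory between calls; everything it needs
    comes from the resource it is handed.
    """
    updated = copy.deepcopy(resource)
    status = updated["status"]
    # Check node statuses (assumption: anything running has finished by now).
    for node, phase in status.items():
        if phase == "RUNNING":
            status[node] = "SUCCEEDED"
    # Schedule queued nodes whose upstream nodes have all succeeded.
    for node, deps in UPSTREAM.items():
        if status[node] == "QUEUED" and all(status[d] == "SUCCEEDED" for d in deps):
            status[node] = "RUNNING"
    return updated


def run_to_completion(store: Dict[str, dict], name: str, max_rounds: int = 10) -> int:
    """Repeatedly reconcile, writing state back to the store after each round."""
    rounds = 0
    while any(p != "SUCCEEDED" for p in store[name]["status"].values()):
        if rounds >= max_rounds:
            raise RuntimeError("workflow did not converge")
        store[name] = reconcile(store[name])  # persist the updated state
        rounds += 1
    return rounds


store = {"wf": new_workflow()}
print(run_to_completion(store, "wf"), store["wf"]["status"])
```

Because `reconcile` depends only on the persisted resource, the process running it could be restarted between any two rounds and pick up where it left off, which is the sense of "stateless" used in the message above.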
If, after you look into the code a little bit, you (and/or your team) are interested in diving into abstracting the node scheduling algorithms away, I would be very interested in hopping on a call to discuss this. We could potentially set up some collaboration with other community members.
c
Absolutely! There is a lot I’d also love to discuss. Right now we’re still looking to get the green light from leadership after we demonstrate incremental proof-of-concepts, but assuming that happens I’d want to look into getting an additional green light to disclose info on my DAG framework to share details about that with you.
h
That sounds great! Please keep us posted.
c
Will do!
So not all DAGs are Trees, but all Trees are DAGs, and Flyte’s executor/DAG model just so happens to be a Tree? 🙂
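As a purely graph-theoretic aside on the tree vs DAG distinction raised here, the following Python sketch checks both properties on two small graphs. It says nothing about what Flyte's model actually is; the graphs and the helper names (`in_degrees`, `is_acyclic`, `is_rooted_tree`) are made up for illustration, and each graph is assumed to list every node as a key.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]  # node id -> downstream node ids


def in_degrees(dag: Graph) -> Dict[str, int]:
    """Count incoming edges for every node."""
    degrees = {node: 0 for node in dag}
    for children in dag.values():
        for child in children:
            degrees[child] += 1
    return degrees


def is_acyclic(dag: Graph) -> bool:
    """Kahn's algorithm: a graph is acyclic if every node can be peeled off in order."""
    degrees = in_degrees(dag)
    ready = [node for node, degree in degrees.items() if degree == 0]
    seen = 0
    while ready:
        node = ready.pop()
        seen += 1
        for child in dag[node]:
            degrees[child] -= 1
            if degrees[child] == 0:
                ready.append(child)
    return seen == len(dag)


def is_rooted_tree(dag: Graph) -> bool:
    """A rooted tree is acyclic, has one root, and every other node has one parent."""
    degrees = list(in_degrees(dag).values())
    return is_acyclic(dag) and degrees.count(0) == 1 and degrees.count(1) == len(dag) - 1


# A "diamond": two branches that join again, so the join node has two parents.
diamond: Graph = {"start": ["a", "b"], "a": ["end"], "b": ["end"], "end": []}
# A branching graph in which no node has more than one parent.
tree: Graph = {"root": ["x", "y"], "x": ["z"], "y": [], "z": []}

print(is_acyclic(diamond), is_rooted_tree(diamond))
print(is_acyclic(tree), is_rooted_tree(tree))
```

The diamond is a DAG but not a tree, because its join node has two upstream nodes. A depth-first traversal works on either shape, so the choice of traversal alone does not imply that the underlying model is a tree.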