# introductions
j
Hello everyone! I work with Ought – we're currently evaluating a couple of frameworks to handle a data pipeline: ingesting academic papers, joining, filtering, enriching, transforming, calculating embeddings, … We're excited to have Flyte in the mix. I'm also passively looking for a replacement for Cortex.dev after they were acquired, and more general MLOps tooling – so UnionML would be something we look at if Flyte works out well for the pipeline.
❤️ 4
s
Welcome @James Brady! Glad to have you here. For orchestrating ML pipelines in production, UnionML should fit the bill! @Niels Bantilan would be the best person to chat with about UnionML!
👍 1
n
hi @James Brady, welcome! Yes let’s chat, curious to learn more about what you do at Ought
👋 1
m
welcome @James Brady
👋 1
j
Hi all! Just working through the User Guide now, and enjoying it.
👍 1
I'd like to actually kick the tyres on a couple of things which seem to be a little awkward in some of the other platforms I'm evaluating:
1. Having some parts of our data pipeline run on GPU-accelerated nodes (but not all parts).
2. Parallel processing of batches in our dataset.
For #1, I see these docs; for #2, I see this – are there other resources you'd point me towards for these use cases?
n
1. for a code example of GPU-accelerated nodes, see here. Another useful class here is `Resources`
2. yes, map tasks are the recommended way to parallelize nodes. A more flexible alternative would be dynamic workflows (the tradeoff is that dynamics don't have the ability to express `concurrency` and `min_success_ratio`)
j
Fabulous! Thanks Niels
s
> yes, map tasks are the recommended way to parallelize nodes, a more flexible alternative would be dynamic workflows (the tradeoff is that dynamics don't have the ability to express `concurrency` and `min_success_ratio`)
+1. Also, map tasks do not spin up nodes for every instance, whereas dynamic workflows do.
k
Map tasks are statically parallelized: they will run a new container per unit of work
j
Am I understanding this ☝️ correctly: dynamic workflows create new nodes (like k8s nodes) for each item being mapped over, whereas `map_task` creates a new pod per item being mapped over? (Side-note: I've not seen where I configure the podspec we need, but I'm sure I will, just haven't read the docs yet)
I know you can set resource overrides for `map_task` – do those override the resource requests for the pods being created?
k
I think Samhita confused the terms. Think of a dynamic workflow as a parent graph node that spins off new child nodes representing a new child graph: at runtime, a dynamic workflow receives its inputs and creates a new workflow, and that workflow has graph nodes 😊
Everything by default runs in k8s pods
That may independently trigger k8s scale-up and scale-down
j
Ah ha, I see. yeah "node" is overloaded 😬
k
Please help us improve the docs
👍 1
Yes, so by default every task is launched as a container in a pod. The pod itself is a platform-wide config using pod templates (don't worry about this unless you want to customize some part of the pod). If you want user-side control of the pod, use the flytekit pods plugin
👍 1
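For the user-side route mentioned above, a hedged sketch assuming `flytekitplugins-pod` and the `kubernetes` Python client are installed (the task name and container details are illustrative):

```python
# Sketch, assuming flytekitplugins-pod and the kubernetes client are installed.
from flytekit import task
from flytekitplugins.pod import Pod
from kubernetes.client import V1Container, V1PodSpec


@task(
    task_config=Pod(
        pod_spec=V1PodSpec(
            containers=[
                # the container matching primary_container_name is where
                # the task code runs; customize image, volumes, etc. here
                V1Container(name="primary"),
            ],
        ),
        primary_container_name="primary",
    )
)
def heavy_step(x: int) -> int:
    return x + 1
```

This gives per-task control of the pod spec, as opposed to the platform-wide pod templates mentioned above.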