# introductions
j
Hello everyone! I work with Ought – we're currently evaluating a couple of frameworks to handle a data pipeline: ingesting academic papers, joining, filtering, enriching, transforming, calculating embeddings, … We're excited to have Flyte in the mix. I'm also passively looking for a replacement for Cortex.dev after they were acquired, and more general MLOps tooling – so UnionML would be something we look at if Flyte works out well for the pipeline.
❤️ 4
s
Welcome @James Brady! Glad to have you here. For orchestrating ML pipelines in production, UnionML should fit the bill! @Niels Bantilan would be the best person to chat with about UnionML!
👍 1
n
hi @James Brady, welcome! Yes let’s chat, curious to learn more about what you do at Ought
👋 1
m
welcome @James Brady
👋 1
j
Hi all! Just working through the User Guide now, and enjoying it.
👍 1
I'd like to actually kick the tyres on a couple of things which seem to be a little awkward in some of the other platforms I'm evaluating:
1. Having some parts of our data pipeline run on GPU-accelerated nodes (but not all parts).
2. Parallel processing of batches in our dataset.
For #1, I see these docs; for #2, I see this – are there other resources you'd point me towards for these use cases?
n
1. for a code example of GPU-accelerated nodes, see here. Another useful class here is `Resources`
2. yes, map tasks are the recommended way to parallelize nodes. A more flexible alternative would be dynamic workflows (the tradeoff is that dynamics don't have the ability to express `concurrency` and `min_success_ratio`)
j
Fabulous! Thanks Niels
s
> yes, map tasks are the recommended way to parallelize nodes, a more flexible alternative would be dynamic workflows (the tradeoff is that dynamics don't have the ability to express `concurrency` and `min_success_ratio`)
+1. Also, map tasks do not spin up nodes for every instance, whereas dynamic workflows do.
k
Map tasks are statically parallelized: they will run a new container per unit of work
j
Am I understanding this ☝️ correctly: dynamic workflows create new nodes (like k8s nodes) for each item being mapped over, whereas `map_task` creates a new pod per item being mapped over? (Side-note: I've not seen where I configure the podspec we need, but I'm sure I will, just haven't read the docs yet)
I know you can set resource overrides for `map_task` – do those override the resource requests for the pods being created?
k
I think Samhita confused the terms. Think of a dynamic workflow as a parent graph node that spins off new child nodes representing a new child graph: at runtime, a dynamic workflow receives its inputs and creates a new workflow, and that workflow has graph nodes 😊
Everything by default runs in k8s pods
That may independently trigger k8s scale-up and scale-down
j
Ah ha, I see. yeah "node" is overloaded 😬
k
Please help us improve the docs
👍 1
Yes, so by default every task is launched as a container in a pod. The pod itself is a platform-wide config using pod templates (don't worry about this unless you want to customize some part of the pod). If you want user-side control of the pod, use the flytekit pods plugin
👍 1
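For the user-side route mentioned above, a hedged sketch assuming `flytekitplugins-pod` and the `kubernetes` Python client are installed (the task name and container details are illustrative):

```python
# Sketch, assuming flytekitplugins-pod and the kubernetes client are installed.
from flytekit import task
from flytekitplugins.pod import Pod
from kubernetes.client import V1Container, V1PodSpec


@task(
    task_config=Pod(
        pod_spec=V1PodSpec(
            containers=[
                # the container matching primary_container_name is where
                # the task code runs; customize image, volumes, etc. here
                V1Container(name="primary"),
            ],
        ),
        primary_container_name="primary",
    )
)
def heavy_step(x: int) -> int:
    return x + 1
```

This gives per-task control of the pod spec, as opposed to the platform-wide pod templates mentioned above.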