# ask-the-community
s
Hi everyone, I have a question about workflows that have many tasks executed in a different system (for example, Hive or Spark). Does Flyte create one pod for each task? I am trying to study the performance impact of this when we have high-fanout workflows of this kind. Any pointers will be helpful.
b
It depends on the task. Most tasks are pods IMO (TensorFlow task, PyTorch task, etc.). Other tasks are just a process (like a client) in propeller that calls a web API over HTTP. For example, the Databricks plugin uses this pattern.
Go to webAPI to understand the second type
s
Thanks a lot, Byron. This helps. I will follow up if I have more questions. 👍
b
No problem
d
Hey @Saravanan Arumugam, so there are a number of ways Flyte can run tasks, which, as @Byron Hsu mentioned, may change between task types (thanks!). For simple tasks Flyte directly creates k8s Pods; this includes things like Python functions with the `@task` decorator, PodTasks, and ContainerTasks. Additionally, there are separate plugins for Spark, Ray, Dask, TensorFlow, PyTorch, etc. that use existing k8s operators. In this scheme Flyte creates the requisite k8s custom resource, the operator is tasked with execution, and Flyte tracks the job status. And then finally there are web API plugins, for which Flyte does not create any k8s resources; rather, it interacts with an external service - for example Snowflake, Databricks, etc.
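To make the three categories concrete for a high-fanout workflow, here is a minimal sketch that tallies what Flyte itself would create per task. The category labels, the task-type strings, and the function names are made up for illustration and are not Flyte identifiers; the grouping only mirrors the description above. Note that an operator will typically start its own pods for a custom resource, which this tally does not count.

```python
from collections import Counter

# Grouping taken from the explanation above; all names here are illustrative.
DIRECT_POD = {"python_task", "pod_task", "container_task"}
OPERATOR_CRD = {"spark", "ray", "dask", "tensorflow", "pytorch"}
WEB_API = {"snowflake", "databricks"}


def created_by_flyte(task_type):
    """Return what Flyte itself creates in k8s for one task of this type."""
    if task_type in DIRECT_POD:
        return "pod"
    if task_type in OPERATOR_CRD:
        return "custom_resource"  # the operator handles execution from here
    if task_type in WEB_API:
        return "nothing"  # Flyte talks to an external service instead
    raise ValueError(f"unknown task type: {task_type}")


def tally(task_types):
    """Count the k8s resources Flyte creates directly for a list of tasks."""
    return Counter(created_by_flyte(t) for t in task_types)


# A hypothetical fanout: 100 web API tasks, 5 Spark jobs, 2 plain Python tasks.
fanout = ["snowflake"] * 100 + ["spark"] * 5 + ["python_task"] * 2
print(tally(fanout))
```

Under these assumptions, a fanout made mostly of web API tasks results in very few pods created by Flyte, since only the plain Python tasks map to pods directly.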
k
Great answer @Byron Hsu, let me actually elaborate a little more. All tasks will be a container/pod by default. But tasks actually have a task plugin system in the back. This can be a number of ways…
haha @Dan Rammer (hamersaw) beat me to it
b
One question: what is the use case for the sync web API? It seems I haven't seen any plugin using it.
k
coming in the future
😄
i mean you could use it to call services and get instantaneous response
like a rest api caller
this can make it possible to write serverless workflows that run at high speed
the goal is to do this with Knative to provide Lambda-like services. But today it's a little early
b
For the asynchronous API, for example when we launch a Databricks client to call an HTTP API, the next task still needs to wait for this task to complete, right? If so, why is it called async?
Can you elaborate more on sync vs. async?
k
No, it is async in terms of propeller, i.e., Flyte propeller will call it and will not expect a response in the `create` call, while sync will expect a response instantaneously. This will block the propeller loops themselves.
Sync should be like a web API call that returns a response, for example looking up a value in a DB.
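A rough sketch of the difference in call shape, in plain Python. The class and method names (`AsyncStyleClient`, `SyncStyleClient`, `create`, `get`, `do`) are hypothetical stand-ins and not the actual plugin interface; the fake job simply finishes after a fixed number of polls.

```python
import itertools


class AsyncStyleClient:
    """Hypothetical async-style plugin: create() only submits, get() polls later."""

    def __init__(self, polls_until_done=2):
        self._ids = itertools.count(1)
        self._polls_left = {}
        self._polls_until_done = polls_until_done

    def create(self, query):
        # Returns a handle immediately; no result is expected from this call.
        job_id = f"job-{next(self._ids)}"
        self._polls_left[job_id] = self._polls_until_done
        return job_id

    def get(self, job_id):
        # Each poll moves the fake job one step closer to completion.
        self._polls_left[job_id] -= 1
        return "SUCCEEDED" if self._polls_left[job_id] <= 0 else "RUNNING"


class SyncStyleClient:
    """Hypothetical sync-style plugin: one call that returns the answer."""

    def __init__(self, table):
        self._table = table

    def do(self, key):
        # The caller waits for this lookup, so it should be fast.
        return self._table[key]


async_client = AsyncStyleClient()
job = async_client.create("SELECT 1")
first_poll = async_client.get(job)
second_poll = async_client.get(job)

sync_client = SyncStyleClient({"user:42": "active"})
value = sync_client.do("user:42")
print(job, first_poll, second_poll, value)
```

In both cases a downstream task still waits for the result; the difference is only whether the caller is held up inside the call itself.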
b
So for async, `create` will not be blocked by the web API. For sync, it will be.
What do you mean by “no Async in terms of propeller”?
Just want to make sure: for an async task, the next task is still blocked until this task terminates, right?
d
Yes, downstream tasks can not execute without an upstream task completing
So propeller works as a k8s operator, periodically evaluating workflows in "rounds", where each round consists of checking the statuses of tasks and scheduling new ones, retries, etc. These operations are all done synchronously within a round. For example: check the status of node `n1`; if it's complete, write its results to the cache (given it's a cached node), then schedule downstream node `n2`, etc. With the sync web API plugin, checking the status of a node is a synchronous operation, so it requires a request RTT for each node check. These add up if there are many tasks, making a single "round" of propeller long. Alternatively, the async web API plugin uses a background process in propeller to periodically check the task status and store it in a cache, so within a propeller "round" each task status is looked up in memory, which makes for extremely fast "rounds".
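The cost difference can be sketched with a toy model. Everything here is an assumption for illustration: `FakeService`, `StatusCache`, `sync_round`, `async_round`, and the 50 ms RTT are invented, and the latency is only added up rather than actually waited for. It is not how propeller is implemented, just the shape of the argument.

```python
from dataclasses import dataclass, field

RTT_SECONDS = 0.05  # assumed round-trip time for one remote status request


@dataclass
class FakeService:
    """Stand-in for the external service behind a web API plugin."""
    statuses: dict
    requests: int = 0

    def get_status(self, task_id):
        self.requests += 1
        return self.statuses[task_id]


def sync_round(service, task_ids):
    """Check every task inline: the round pays one RTT per task."""
    results = {t: service.get_status(t) for t in task_ids}
    return results, RTT_SECONDS * len(task_ids)


@dataclass
class StatusCache:
    """In-memory statuses, refreshed by a background process."""
    entries: dict = field(default_factory=dict)

    def refresh(self, service, task_ids):
        # Runs outside the round, so its latency does not slow the round down.
        for t in task_ids:
            self.entries[t] = service.get_status(t)


def async_round(cache, task_ids):
    """Read statuses from memory: no remote request inside the round."""
    return {t: cache.entries.get(t, "UNKNOWN") for t in task_ids}, 0.0


task_ids = [f"n{i}" for i in range(30)]
service = FakeService({t: "RUNNING" for t in task_ids})

sync_statuses, sync_cost = sync_round(service, task_ids)

cache = StatusCache()
cache.refresh(service, task_ids)  # happens between rounds, in the background
requests_before = service.requests
async_statuses, async_cost = async_round(cache, task_ids)

print(f"sync round: {sync_cost:.2f}s of request latency, async round: {async_cost:.2f}s")
```

The total number of remote requests is similar either way; what changes is whether their latency is paid inside the round or outside it.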
b
this adds up if there are many tasks
Is it the case that one parent task has many children tasks, so in each round it needs to check many tasks, and the sync API blocks the check function, so it takes a long time?
This is extremely insightful thanks!
d
Is it the case that one parent task has many children tasks
It could be, more so just instances where propeller is executing many tasks in parallel - obviously if there are 30 tasks running then propeller has to check the status of 30 tasks.
b
@Dan Rammer (hamersaw) thanks!