Okay interesting outcome of an experiment i just did. This is my simple workflow:
from flytekit import workflow, task, dynamic, LaunchPlan, Resources, FixedRate
import time
from datetime import timedelta
@task(requests=Resources(cpu="500m", mem="500Mi"))
def say_hello(input: str):
time.sleep(60)
print(f"Hello {input}")
@workflow
def wf():
counter = 50
for i in range(counter):
say_hello(input="World")
LaunchPlan.get_or_create(
name="p-launchplan",
workflow=wf,
max_parallelism=2,
schedule=FixedRate(duration=timedelta(minutes=3)),
)
Executing this and observing the pods in kubernetes showed that Flyte is only launching 2 pods at a time but does not wait until the pods are done. So it basically increments with 2 over time and still ends up with 50 running pods at some time. I really thought
max_parallelism
reflects the amount of RUNNING pods at a time. Is this intended or am i missing something?