https://flyte.org logo
#ask-the-community
Title
# ask-the-community
l

Len Strnad

02/23/2024, 4:11 PM
I have been pretty happy with this pattern around
map_task
using
dynamic
. Sharing here to see if anyone has any “gotchas” or additional ideas. I wrap
map_task
with a
dynamic
task for two reasons: 1. We can parameterize the concurrency arg, this is helpful for us when we have a big variety in the number of inputs between executions. 2. If caching at the task level, we can cache at the dynamic level (I just recently discovered dynamic also supports caching). This (from what I can tell) avoids the need to check the cache on every task in map_task and will check cache for just the dynamic task. This can save a few minutes in some cases!
Copy code
from flytekit import map_task, task, workflow, dynamic


@task(cache=True, cache_version="0.0.1")
def build_inputs() -> list[int]:
    return list(range(10))


@task(cache=True, cache_version="0.0.1")
def example_task(x: int) -> int:
    return x


@dynamic(cache=True, cache_version="0.0.1")
def dynamic_map_task(xs: list[int], concurrency: int) -> list[int]:
    return map_task(example_task, concurrency=concurrency)(x=xs)


@workflow
def example_workflow(concurrency: int = 16) -> list[int]:
    inputs = build_inputs()
    return dynamic_map_task(xs=inputs, concurrency=concurrency)
k

Ketan (kumare3)

02/26/2024, 3:33 PM
Dynamic does support caching
Your approach is indeed supported
Can you write a blog to educate others 😍
l

Len Strnad

02/26/2024, 4:49 PM
Where could I get started and is there anything else worth adding around?
k

Ketan (kumare3)

02/26/2024, 5:00 PM
No just write your experience and usecase - where do you work? You can write it there or on Flyte.org. Preferably at your company’s blog. Usecases and education together is better
l

Len Strnad

02/27/2024, 3:08 PM
I am at Pachama. I am not sure about our priority of writing a blog like this and I would probably do so outside of working hours. Maybe I’ll make a short medium article and make sure to add use cases examples and differences in cache hit times.
k

Ketan (kumare3)

02/27/2024, 3:09 PM
Also happy to peer review suggest edits. Cc @David Espejo (he/him) / @Sage Elliott
l

Len Strnad

02/27/2024, 3:10 PM
Sounds good! No promises, but I’ll put it on the docket!
h

Hema Jayachandran

04/12/2024, 8:43 AM
Hi @Len Strnad We tried your approach here wrapping map_task using dynamic in one of our long running pipelines. It actually helped us in saving time by avoiding the need to read cached results from each task within a map_task. Thanks for sharing this gratitude thank you
k

Ketan (kumare3)

04/12/2024, 3:31 PM
Why were you reading cached results in a map task would love to understand the problem
l

Len Strnad

04/12/2024, 3:34 PM
I think they mean they saved time in their workflow by checking one cache (the dynamic task) instead of each task in the map task individually. I could be wrong. That is why I often do it. I'll do the same thing when using stubs from workflows in other projects that don't have the most performance caching in sub tasks.
h

Hema Jayachandran

04/16/2024, 6:55 AM
@Ketan (kumare3) We have a cron schedule associated with the launch plan and we prefer to not rerun the map tasks in every run, hence we have enabled caching on the input task. We had a different problem too with the map_tasks that it gets stuck at initialising for a very long time like 1h+ on re-runs and we couldn’t figure out why that was happening as no pods were getting spawned or no logs were present. So an immediate solution for us was to avoid the attempt to rerun the map task and wrap it with dynamic, so that it only checks the cache at dynamic level. @Len Strnad solution saved the time for our workflows.
l

Len Strnad

04/16/2024, 1:19 PM
@Hema Jayachandran How many tasks and what type of objects are being cached? Just curious.