https://flyte.org logo
#ask-the-community
Title
# ask-the-community
a

Alex Lyashok

01/17/2024, 2:01 AM
what is the recommended way to do a workflow that is basically doing map_reduce - i.e. runs a workflow for each of the inputs i started with dynamic tasks but those are uber-hard to track in console i now have a sequence of map_tasks but then each step waits until all inputs get processed before proceeding. is there a better way that is still usable in console?
j

Jay Ganbat

01/17/2024, 3:33 AM
are these separate map tasks part of a same overall workflow? as in initially you had 1 dynamic task that is calling 1 of these map tasks underlying tasks?
a

Alex Lyashok

01/17/2024, 4:02 AM
now i have
Copy code
@workflow()
def crack_documents(filters: List[str], limit: Optional[int] = None) -> str:
    doc_ids = filter_documents(filters=filters, limit=limit)

    page_images = map_task(extract_page_images, min_success_ratio=0)(doc_id=doc_ids)
    thumbs = map_task(extract_thumbnails, min_success_ratio=0)(doc_id=doc_ids)
    insets = map_task(extract_insets, min_success_ratio=0)(doc_id=doc_ids)
    ocr_google = map_task(extract_ocr_google, min_success_ratio=0)(doc_id=doc_ids)
    ocr_msft = map_task(extract_ocr_msft, min_success_ratio=0)(doc_id=doc_ids)
    captions = map_task(caption_images, min_success_ratio=0)(doc_id=doc_ids)
    text_embeddings = map_task(embed_text, min_success_ratio=0)(doc_id=doc_ids)

    page_images >> thumbs
    page_images >> insets
    thumbs >> ocr_google
    thumbs >> ocr_msft
    ocr_msft >> text_embeddings
    ocr_google >> text_embeddings
    insets >> captions
    captions >> text_embeddings

    return "SUCCESS"
so new map_task only gets kicked off after preiovus gets completed any way to have sub-workflows as map_tasks?
j

Jay Ganbat

01/17/2024, 4:56 AM
ohh you are right, map task only takes a flyte task right? 😞
i wonder if you can wrap all those sub tasks into a subworkflow and call the launchplan from a single dynamic task.
does your previous idea like this and it wasnt able to expand? i have seen few instances of that as well unfortunately but as of now, i dont think its possible without reverting back to dynamic tasks, just have to wait for each map-task
a

Alex Lyashok

01/17/2024, 12:14 PM
i have 1000s of the inoputs so dynamic task becomes quickly unusable
do ArrayNode do anything any different?
d

Dan Rammer (hamersaw)

01/17/2024, 3:18 PM
Unfortunately, right now ArrayNode executes exactly the same as maptask. So it only operates over single flyte tasks. However, part of the decision to implement it is that it conceptually can support mapping over any Flyte node type (dynamic, subworkflow, etc, ) in addition to any task type. We're seeing more and more ask for this support and will likely prioritize this in the next few months.
2 Views