what is the recommended way to do a workflow that ...
# ask-the-community
a
what is the recommended way to do a workflow that is basically doing map_reduce - i.e. runs a workflow for each of the inputs i started with dynamic tasks but those are uber-hard to track in console i now have a sequence of map_tasks but then each step waits until all inputs get processed before proceeding. is there a better way that is still usable in console?
j
are these separate map tasks part of a same overall workflow? as in initially you had 1 dynamic task that is calling 1 of these map tasks underlying tasks?
a
now i have
Copy code
@workflow()
def crack_documents(filters: List[str], limit: Optional[int] = None) -> str:
    doc_ids = filter_documents(filters=filters, limit=limit)

    page_images = map_task(extract_page_images, min_success_ratio=0)(doc_id=doc_ids)
    thumbs = map_task(extract_thumbnails, min_success_ratio=0)(doc_id=doc_ids)
    insets = map_task(extract_insets, min_success_ratio=0)(doc_id=doc_ids)
    ocr_google = map_task(extract_ocr_google, min_success_ratio=0)(doc_id=doc_ids)
    ocr_msft = map_task(extract_ocr_msft, min_success_ratio=0)(doc_id=doc_ids)
    captions = map_task(caption_images, min_success_ratio=0)(doc_id=doc_ids)
    text_embeddings = map_task(embed_text, min_success_ratio=0)(doc_id=doc_ids)

    page_images >> thumbs
    page_images >> insets
    thumbs >> ocr_google
    thumbs >> ocr_msft
    ocr_msft >> text_embeddings
    ocr_google >> text_embeddings
    insets >> captions
    captions >> text_embeddings

    return "SUCCESS"
so new map_task only gets kicked off after preiovus gets completed any way to have sub-workflows as map_tasks?
j
ohh you are right, map task only takes a flyte task right? 😞
i wonder if you can wrap all those sub tasks into a subworkflow and call the launchplan from a single dynamic task.
does your previous idea like this and it wasnt able to expand? i have seen few instances of that as well unfortunately but as of now, i dont think its possible without reverting back to dynamic tasks, just have to wait for each map-task
a
i have 1000s of the inoputs so dynamic task becomes quickly unusable
do ArrayNode do anything any different?
d
Unfortunately, right now ArrayNode executes exactly the same as maptask. So it only operates over single flyte tasks. However, part of the decision to implement it is that it conceptually can support mapping over any Flyte node type (dynamic, subworkflow, etc, ) in addition to any task type. We're seeing more and more ask for this support and will likely prioritize this in the next few months.