# ask-the-community
s
Hello :), what is the best way to develop a producer-consumer pipeline using Flyte? Scenario: a producer reads words from a large file and passes each word to a "word queue". Workers listening on that queue pick up a word, perform an operation (say, computing the word's length), and forward the length to a "word length" queue. A worker listening on the "word length" queue plots a histogram of word lengths in real time. Once the file is exhausted, the workers are shut down and the final histogram is persisted. The above is a toy example; we want to implement the same architecture for a workload involving protein sequences with a computationally heavy operation. Currently we have built a workflow that chunks the large file and passes the chunks to N workers via `map_task()`; however, due to the DAG nature of a Flyte workflow, the workers have to wait until the chunking finishes. What is a good pattern for doing this in Flyte? Thanks for your help!
k
And you cannot collapse the `word length` and `word length histogram` steps into one Python function, or you prefer not to?
At the moment, sadly, all map tasks have to finish before downstream tasks can start.
We are indeed working on mapping over workflows/launchplans, and we will be delivering this later. But we would love to understand whether you can work with this limitation today.
cc @John Votta / @Haytham Abuelfutuh - map task over launchplans.
s
@Ketan (kumare3), we are currently using `map_task()` to distribute the work across workers; however, that is not the ask here. The ask is to spin up N consumers and M producers that communicate over a queue.
k
At Union we are working on reusable containers; it's in early alpha right now. You won't have to think about queues.