# flyte-support
s
Hi everyone, I’m wondering if Flyte supports streaming data between tasks, allowing them to run simultaneously. For example, let’s say I have three tasks: producer, mapper, and reducer.

Producer generates chunks of data sequentially:
```python
from time import sleep
from typing import Iterator

def producer() -> Iterator[str]:
    yield "hello"
    sleep(10)
    yield "world"
    yield "!"
```
Mapper processes each item as it arrives:
```python
def mapper(s: str) -> str:
    return s.upper()
```
Reducer aggregates the transformed results:
```python
from typing import Iterable

def reducer(l: Iterable[str]) -> str:
    return ' '.join(l)
```
Does Flyte support (or plan to support) this kind of streaming execution, where tasks can process data as it becomes available? Or do all tasks need to fully complete before the next one starts? Thanks in advance!
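For reference, the behavior being asked about is what plain Python generators already give you when chained (no Flyte involved; the `sleep` is shortened here, and `mapper` is wrapped to consume a stream, which is an adaptation of the snippets above):

```python
from time import sleep
from typing import Iterable, Iterator

def producer() -> Iterator[str]:
    yield "hello"
    sleep(0.1)  # shortened from 10s for this sketch
    yield "world"
    yield "!"

def mapper(items: Iterable[str]) -> Iterator[str]:
    for s in items:
        yield s.upper()  # each item is processed as it arrives

def reducer(items: Iterable[str]) -> str:
    return " ".join(items)

# The reducer pulls through the mapper, which pulls from the producer,
# so "HELLO" is computed before "world" has even been yielded.
print(reducer(mapper(producer())))  # -> HELLO WORLD !
```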
c
Correct, this is not supported currently, although we had a GitHub discussion about this feature some time ago; we called it streaming literals. @steep-nest-3156, I'd love to understand your use case better, maybe we could collaborate on this.
s
We don’t have a clear requirement for that yet; the team is going to discuss the details later. Our use case is assessing a model on a dataset, and the assessment consists of multiple steps. When the dataset consists of images, we can’t load it all into memory, so every step has to load the data, process it, and save it to disk for the following steps. Currently we run locally, and we plan to make this streaming: each step loads a batch of data that fits in memory, processes it, and then moves on to the next batch, minimizing disk I/O. In parallel we are integrating with Flyte, and on Flyte such streaming would not work. Another question is whether we even need such streaming on Flyte: as I understand it, the result of a task needs to be saved to disk somewhere anyway (network communication may also be involved), so with streaming we don’t win much. UPD: maybe it would let us see partial (not just final) results of the last step.
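A minimal plain-Python sketch of the batch-wise plan described above, where each step only ever holds one memory-sized batch; all names here (`load_batches`, `assess_step`, `run_pipeline`) are illustrative, not Flyte APIs:

```python
from typing import Iterator, List

BATCH_SIZE = 2  # small value for illustration

def load_batches(paths: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield memory-sized batches of dataset items (e.g. image paths)."""
    for i in range(0, len(paths), batch_size):
        yield paths[i:i + batch_size]

def assess_step(batch: List[str]) -> List[str]:
    """One assessment step applied to a single batch (placeholder logic)."""
    return [f"scored:{p}" for p in batch]

def run_pipeline(paths: List[str]) -> List[str]:
    results: List[str] = []
    for batch in load_batches(paths, BATCH_SIZE):
        # The next batch is only loaded after this one is fully handled,
        # so memory use stays bounded by one batch.
        results.extend(assess_step(batch))
    return results

print(run_pipeline(["img0", "img1", "img2"]))
```

In a Flyte integration without streaming literals, each batch boundary would instead become a materialized task output, which is the disk/network round-trip discussed above.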