# ask-the-community
a
Hey Everyone! I'm trying to create workflows around video processing: splitting the data into chunks, processing frames with ML, then coalescing the information back together. There are different stages of extracting information from the video/images in parallel. Previously, our system relied on Google Cloud Functions to split up the processing, plus a worker that kept track of all the processes, which was a bit verbose. I'm trying to simplify it with Flyte, but I'm unsure where to get started. Any tips would be appreciated. Thanks!
k
I think you can try using a map task in Flyte. Flyte can launch a pod for each chunk of data and run the preprocessing in parallel: https://docs.flyte.org/projects/cookbook/en/latest/auto/core/control_flow/map_task.html#sphx-glr-auto-core-control-flow-map-task-py
k
Or you can use dynamic workflows.
One problem: splitting the video so that each task processes a single frame might be too expensive for Flyte (at the moment - more coming later). This is because it spawns a new pod for every task execution today.
a
How long does spinning up new pods take?
For our processing, we currently just batch up the video, so a chunk of video is processed in one "task".
k
Spawning new pods depends on a few things: the network, the size of the containers, and IP address availability.
The lowest I have seen is 1-2 seconds.
a
Is it linear? So if I'm spinning up a couple thousand parallel tasks, will it take super long?
k
Hmm, no, it amortizes,
but a couple thousand may take time.
What is the end goal?
How fast do you want the end-to-end run to be?
a
I mean first goal (which is the primary motivation for me switching) is by far reliability. With cloud functions, messages in the queue sometimes get dropped, or take super long to get acknowledged, and a friend of mine recommended Flyte so I thought I'd look into it. Currently I've been having to handle a lot of edge cases where data is processed multiple times or dropped entirely
k
Ohh, so reliability will be so much better. Also, with caching and recovery you will see much better results.
let me ping you
a
For end to end, it would be cool to have essentially logarithmic processing time for video. We'd love to have a 60-second video process in 60 seconds, and then scale logarithmically from there.
k
I think that can be done if you chunk the video correctly.
So for Flyte, the larger the chunk size, the better.
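To make the chunking advice concrete, here is a minimal plain-Python sketch of the split / process-in-parallel / coalesce pattern discussed above. It does not use Flyte itself: `chunk_ranges`, `process_chunk`, and `process_video` are hypothetical names, the thread pool only stands in for the per-chunk fan-out that a map task would do with pods, and the "ML step" is a placeholder that just counts frames.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def chunk_ranges(total_frames: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split frame indices [0, total_frames) into half-open (start, end) chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        (start, min(start + chunk_size, total_frames))
        for start in range(0, total_frames, chunk_size)
    ]


def process_chunk(frame_range: Tuple[int, int]) -> int:
    """Hypothetical stand-in for the per-chunk ML step: returns the frame count."""
    start, end = frame_range
    return end - start


def process_video(total_frames: int, chunk_size: int) -> int:
    """Fan out one unit of work per chunk, then coalesce the per-chunk results."""
    ranges = chunk_ranges(total_frames, chunk_size)
    # The thread pool is only an illustration of the parallel fan-out;
    # in the thread's setting each chunk would run as its own task.
    with ThreadPoolExecutor() as pool:
        per_chunk = list(pool.map(process_chunk, ranges))
    return sum(per_chunk)


if __name__ == "__main__":
    # Assuming 30 fps, a 60-second video has 1800 frames.
    print(len(chunk_ranges(1800, 300)))  # number of chunks at 300 frames each
    print(process_video(1800, 300))      # total frames processed
```

Under the assumed 30 fps, a chunk size of 300 frames turns a 60-second video into 6 units of work instead of 1800 single-frame ones, so the per-pod startup cost mentioned earlier is paid far fewer times.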