• a

    Abhinav Ayalur

    2 months ago
    Hey Everyone! I'm trying to create workflows around video processing: splitting the data into chunks, processing frames with ML, then coalescing the information back together. There are different stages of extracting information from the video/images in parallel. Previously, our system relied on Google Cloud Functions to split up the processing, plus a worker that kept track of all the processes, which was a bit verbose. I'm trying to simplify it with Flyte, but I'm unsure where to get started. Any tips would be appreciated. Thanks!
  • Kevin Su

    Kevin Su

    2 months ago
    I think you can try to use a map task in Flyte. Flyte can launch a pod for each chunk of data and run the preprocessing in parallel: https://docs.flyte.org/projects/cookbook/en/latest/auto/core/control_flow/map_task.html#sphx-glr-auto-core-control-flow-map-task-py
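
The split, process-in-parallel, coalesce shape described above can be sketched in plain Python. This is only a stand-in for the pattern, not Flyte code: a thread pool plays the role of the per-chunk pods, and `split_into_chunks`, `process_chunk`, `coalesce`, and `run_pipeline` are hypothetical names whose "processing" just counts frames.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_into_chunks(num_frames: int, chunk_size: int) -> List[range]:
    """Split frame indices [0, num_frames) into consecutive chunks."""
    return [
        range(start, min(start + chunk_size, num_frames))
        for start in range(0, num_frames, chunk_size)
    ]


def process_chunk(frames: range) -> int:
    """Hypothetical per-chunk work; a real task would run ML on each frame."""
    return len(frames)


def coalesce(partials: List[int]) -> int:
    """Merge the per-chunk results into a single answer."""
    return sum(partials)


def run_pipeline(num_frames: int, chunk_size: int) -> int:
    chunks = split_into_chunks(num_frames, chunk_size)
    # Fan out: one unit of work per chunk (a map task would do this per pod).
    with ThreadPoolExecutor(max_workers=4) as pool:
        partials = list(pool.map(process_chunk, chunks))
    # Fan in: combine the partial results.
    return coalesce(partials)
```

In a Flyte workflow the fan-out step would be the map task and the fan-in step an ordinary downstream task; the linked docs page covers the actual API.
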
  • Ketan (kumare3)

    Ketan (kumare3)

    2 months ago
    or you can use Dynamic Workflows
  • one problem: simply splitting the video so that each task processes a single frame might be too expensive for Flyte (at the moment; more coming later). This is because today it spawns a new pod for every task execution
  • a

    Abhinav Ayalur

    2 months ago
    how long does spawning up new pods take?
  • for our processing, we currently just batch up the video, so a chunk of video will be processed in a "task"
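
To make the per-frame versus per-chunk difference concrete, here is a small arithmetic sketch of how many task executions (and so pods) each approach needs. The 30 fps frame rate, the 150-frame batch size, and the `pods_needed` helper are all assumptions for illustration, not numbers from this thread.

```python
import math


def pods_needed(num_frames: int, frames_per_task: int) -> int:
    """One pod per task execution, so pods = number of batches."""
    return math.ceil(num_frames / frames_per_task)


fps = 30          # assumed frame rate
duration_s = 60   # a 60 second video
num_frames = fps * duration_s

per_frame_pods = pods_needed(num_frames, 1)    # one frame per task
batched_pods = pods_needed(num_frames, 150)    # 5 seconds of video per task

print(per_frame_pods, batched_pods)
```
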
  • Ketan (kumare3)

    Ketan (kumare3)

    2 months ago
    Spawning new pods depends on a few things: the network, the size of the containers, and IP addresses
  • So the lowest I have seen is 1-2 seconds
  • a

    Abhinav Ayalur

    2 months ago
    Is it linear? So if I'm spinning up a couple thousand parallel tasks, will it take super long?
  • Ketan (kumare3)

    Ketan (kumare3)

    2 months ago
    hmm no, it amortizes
  • but a couple thousand may take time
  • what is the end goal?
  • how fast do you want the end-to-end to be?
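
One rough way to picture "it amortizes" is to assume pods start in concurrent waves rather than one after another. The sketch below is a back-of-the-envelope model under that assumption only: `max_concurrent` is a hypothetical cluster limit, and real startup behaviour depends on the network, image size, and IP availability mentioned above.

```python
import math


def sequential_launch_seconds(num_tasks: int, startup_s: float) -> float:
    """Worst case: every pod waits for the previous one to start."""
    return num_tasks * startup_s


def wave_launch_seconds(num_tasks: int, startup_s: float, max_concurrent: int) -> float:
    """Assumed model: pods start in waves of max_concurrent, each wave costs startup_s."""
    waves = math.ceil(num_tasks / max_concurrent)
    return waves * startup_s


# 2000 tasks, 2 s startup each, assumed limit of 100 pods starting at once.
print(sequential_launch_seconds(2000, 2.0))
print(wave_launch_seconds(2000, 2.0, 100))
```

Under these assumed numbers the launch overhead is far below the strictly linear case, but it is still not negligible for a couple thousand tasks.
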
  • a

    Abhinav Ayalur

    2 months ago
    I mean first goal (which is the primary motivation for me switching) is by far reliability. With cloud functions, messages in the queue sometimes get dropped, or take super long to get acknowledged, and a friend of mine recommended Flyte so I thought I'd look into it. Currently I've been having to handle a lot of edge cases where data is processed multiple times or dropped entirely
  • Ketan (kumare3)

    Ketan (kumare3)

    2 months ago
    ohh so reliability will be so much better, also with caching and recovery you will see much better outputs
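
The idea behind caching, as it relates to chunks being processed more than once, can be sketched as a lookup keyed on the chunk and a version string: a repeated request for the same key returns the stored result instead of redoing the work. This is a plain Python illustration of the concept, not how Flyte implements it; `ChunkCache` and its methods are hypothetical names.

```python
from typing import Callable, Dict, Tuple


class ChunkCache:
    """In-memory stand-in for a result cache keyed on (chunk id, version)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], int] = {}
        self.calls = 0  # how many times the real work actually ran

    def run(self, chunk_id: str, version: str, fn: Callable[[str], int]) -> int:
        key = (chunk_id, version)
        if key not in self._store:
            # Cache miss: do the work once and remember the result.
            self.calls += 1
            self._store[key] = fn(chunk_id)
        return self._store[key]


cache = ChunkCache()
work = lambda chunk_id: len(chunk_id)  # placeholder for real processing

cache.run("chunk-0", "v1", work)
cache.run("chunk-0", "v1", work)  # duplicate request, served from the cache
cache.run("chunk-0", "v2", work)  # new version, recomputed
```
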
  • let me ping you
  • a

    Abhinav Ayalur

    2 months ago
    For end to end, it would be cool to have an essentially logarithmic processing time for video. We'd love to have a 60 second video process in 60 seconds, and then scale logarithmically afterwards
  • Ketan (kumare3)

    Ketan (kumare3)

    2 months ago
    I think that can be done, if you chunk the video correctly
  • so for Flyte, the larger the chunk size, the better
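
The chunk-size trade-off can be sketched with a simple model: if every chunk runs in its own pod and all pods run at once, wall-clock time is roughly pod startup plus the time to process one chunk, while the share of each pod's life spent on startup shrinks as chunks grow. The 2 second startup and the cost of 5 compute seconds per second of video are assumptions for illustration, as are the function names.

```python
def wall_clock_seconds(chunk_s: float, startup_s: float, cost_per_video_s: float) -> float:
    """Assumes all chunks run fully in parallel, one pod per chunk."""
    return startup_s + chunk_s * cost_per_video_s


def overhead_fraction(chunk_s: float, startup_s: float, cost_per_video_s: float) -> float:
    """Share of each pod's lifetime spent on startup rather than useful work."""
    work_s = chunk_s * cost_per_video_s
    return startup_s / (startup_s + work_s)


startup_s = 2.0         # assumed pod startup time
cost_per_video_s = 5.0  # assumed compute seconds per second of video

for chunk_s in (1.0, 5.0, 10.0):
    wall = wall_clock_seconds(chunk_s, startup_s, cost_per_video_s)
    overhead = overhead_fraction(chunk_s, startup_s, cost_per_video_s)
    print(f"chunk={chunk_s:>4}s wall={wall:>5}s overhead={overhead:.1%}")
```

In this model smaller chunks finish sooner but waste more of each pod on startup, so the useful question is how large a chunk can get while still meeting the end-to-end target.
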