# flyte-support
Hi, I’m looking to bootstrap data labeling orchestration with Flyte. Since it’s for data labeling, a lot of data will be waiting to be labeled (wait_for_input or approve), and we will need to run ingestion and automatic pre-labeling as the data comes in (streaming data inputs). I have two questions:
1. The documentation seems to indicate that a “wait” node is long-running while it waits. How many resources does it take, and does it count towards max-parallelism? How many resources do I need to allocate for paused tasks?
2. I want to pre-label a stream of data, so the overhead of creating a launch plan for every piece of data and then starting containers for the nodes is rather inefficient. Is there a mechanism for long-running nodes that do stream processing? Can I scale out these nodes preemptively to maximize resource utilization?