Dynamic Job Registration
# ask-the-community
d
Dynamic Job Registration So the jobs I am planning to run on Flyte are "dynamic" in that the set of jobs that exist (and their schedules) can change minute to minute. For example, if one of our customers goes into our UI and adds an "export" then we suddenly have a new ExportJob that needs to be scheduled and ready to go in less than 15 minutes. All instances of "ExportJob" share the same code but each one has a unique identity and set of inputs. We typically find all of the jobs that exist (somewhere under 10K total) by periodically making a bunch of service calls and then updating our scheduling system. I'm trying to figure out the correct way to build the same thing using Flyte, and just want to double check my understanding of the solution. It looks like there are two main options:
1. dynamically generate the workflow code and then execute the Flyte CLI's "register" command: https://docs.flyte.org/en/latest/concepts/registration.html
2. manually call LaunchPlanCreateRequest?
Are both of these supported workflows, or is #2 considered a bit of a hack?
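For illustration, the periodic sync described in the question (discover every job, then update the scheduler) can be sketched as a diff between the desired set of jobs and the set already registered. This is only a sketch under assumptions: the `ExportJob` fields and the `reconcile` helper are hypothetical names, and the step that actually registers or removes anything in Flyte is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExportJob:
    """Hypothetical record for one export; field names are assumptions."""
    job_id: str      # unique identity of this export
    schedule: str    # schedule expression, e.g. a cron string
    inputs: tuple    # (key, value) pairs passed to the shared job code


def reconcile(desired, registered):
    """Compare two dicts keyed by job_id.

    Returns (to_create, to_update, to_remove):
    jobs missing from the scheduler, jobs whose definition changed,
    and ids that are registered but no longer wanted.
    """
    to_create = [job for jid, job in desired.items() if jid not in registered]
    to_update = [
        job for jid, job in desired.items()
        if jid in registered and registered[jid] != job
    ]
    to_remove = [jid for jid in registered if jid not in desired]
    return to_create, to_update, to_remove
```

Running this on every sync pass keeps the number of registration calls proportional to what changed rather than to the full set of jobs.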
d
Hey @David Cupp! Great question. Is the execution of the job a one-time thing? Or when users add an "export" does that start a job that should run on a schedule (ex. daily, hourly, etc)? IIUC the goal is to trigger a workflow on a user action. We have been referring to this process as "reactive workflows". Currently, there are a few community members who have home-grown systems for their specific use-case. As you suggested, there are APIs for this. Most notably flytekit remote has a function for starting an execution given a workflow name and a set of inputs. This is a simple wrapper over the gRPC API, so if Python doesn't fit your use-case it should be simple to write your own wrapper in whatever language you need.
d
Oh interesting... I was not talking about user-triggered actions. This particular use case involves a job that will have a [possibly complicated] schedule, like RRULE (I asked a separate question here about implementing it). But it needs to get to the scheduling system fast, because, e.g. at 6:15pm a user might create a job that runs every weekday at 6:30pm and they, reasonably, expect to see the first run 15 minutes later. We could keep our existing scheduler though, and just use Flyte as the execution system.
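To make the timing requirement concrete, here is a minimal sketch of the "every weekday at 6:30pm" example. In standard five-field cron syntax that schedule is `30 18 * * 1-5`; whether Flyte accepts exactly that form should be checked against its schedule docs. The `next_weekday_run` helper is a hypothetical name, and the sketch ignores time zones.

```python
from datetime import datetime, time, timedelta


def next_weekday_run(now, run_at=time(18, 30)):
    """Return the first Monday-to-Friday occurrence of run_at strictly after now."""
    candidate = datetime.combine(now.date(), run_at)
    # Step forward one day at a time until the candidate is in the future
    # and does not fall on Saturday (5) or Sunday (6).
    while candidate <= now or candidate.weekday() >= 5:
        candidate += timedelta(days=1)
    return candidate
```

A job created on a weekday at 6:15pm gets its first run 15 minutes later, so the new schedule has to reach the scheduler within that window.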
d
Oh sure, I recall the question - looks like here right?
d
Yeah my plan is to try to contribute an RRULE feature if we go with Prefect. I don't see RRULE being a problem.
d
Oh sure, an RRULE contribution would be awesome! Would really appreciate it. For a quick PoC though you could certainly plug Flyte into your existing scheduling system using raw RPC calls.
d
That's good to know. What about registering jobs though? If we assume that it's the future and I've successfully contributed the RRULE support, can I still register Launch Plans/Workflows dynamically using the API?
Might be a dumb question -- I'm asking because I get the sense that the normal workflow is for people to write some Python in files and run a CLI tool, and I don't know if directly hitting the API is normal, or if I am accidentally creating a hack.
d
Yeah, so @Prafulla Mahindrakar may be able to answer better than me. Please feel free to chime in here regarding scheduling stuff.
So all workflow executions in Flyte are performed by calling an endpoint on FlyteAdmin to start a launchplan, whether this is from a CLI, the FlyteConsole UI, an external RPC, etc. The scheduling service is just another layer over the top of this to periodically call a launchplan.
So if the workflow is exactly the same for each instance, just with different input values, you would register the workflow and create the launchplan. Then, regardless of whether you use the Flyte scheduler or an external RPC, just periodically start that workflow using the FlyteAdmin endpoint. Nothing hacky about it.
d
"you would register the workflow"
You are referring to LaunchPlanCreateRequest + LaunchPlanUpdateRequest?
Oh sorry, that's obviously not right; it doesn't have the schedule.
d
Correct. So registering a workflow can be done with flytectl or pyflyte, documented here. This should create a "default" launchplan which does not have hardcoded inputs etc. If you wanted to create other launchplans to, for example, hardcode some inputs or change the workflow execution parameters (overwrite cache, max parallelism, etc) you can do that with the messages you linked.
I'm looking right now to see what options we have to externally interface with the scheduler - it should be exposed through an RPC API, and if it's not, it certainly should be.
d
LaunchPlanMetadata has a schedule attribute. Oh right, I saw: LaunchPlanCreateRequest -> LaunchPlanSpec -> LaunchPlanMetadata -> Schedule
d
Just found it 😂
Yeah so you could create a launchplan from scratch for each "export" that has a schedule.
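As a rough picture of that nesting, the sketch below builds one request per "export" using plain stand-in dataclasses. These are not the real flyteidl message classes: only the chain LaunchPlanCreateRequest -> LaunchPlanSpec -> LaunchPlanMetadata -> Schedule and the `schedule` attribute come from the thread. Every other field name, the `request_for_export` helper, and the `export_workflow` default are assumptions for illustration and should be checked against the actual protobuf definitions.

```python
from dataclasses import dataclass, field


@dataclass
class Schedule:
    cron_expression: str          # assumed field name, illustrative only


@dataclass
class LaunchPlanMetadata:
    schedule: Schedule            # the attribute mentioned in the thread


@dataclass
class LaunchPlanSpec:
    workflow_name: str            # hypothetical stand-in for the workflow identifier
    entity_metadata: LaunchPlanMetadata
    fixed_inputs: dict = field(default_factory=dict)


@dataclass
class LaunchPlanCreateRequest:
    name: str                     # hypothetical stand-in for the launch plan identifier
    spec: LaunchPlanSpec


def request_for_export(export_id, cron, inputs, workflow_name="export_workflow"):
    """Build one scheduled launch plan request for a single export."""
    metadata = LaunchPlanMetadata(schedule=Schedule(cron_expression=cron))
    spec = LaunchPlanSpec(
        workflow_name=workflow_name,   # all exports share the same workflow
        entity_metadata=metadata,
        fixed_inputs=dict(inputs),     # per-export inputs are baked in
    )
    return LaunchPlanCreateRequest(name=f"export-{export_id}", spec=spec)
```

The point of the shape is that the workflow is registered once, while each export gets its own small launch plan carrying its inputs and schedule.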
d
beautiful. That is exactly what I was hoping to hear. Thanks!
d
No problem! Let us know if you run into any problems! Or have any suggestions here. An RRULE would make this really powerful, hopefully we can work on that together 😄
d
Oh yeah I'd love to just sit down and implement a feature like that. I will post again if I have more questions. Thanks for digging into the code for me -- I had a big question mark next to "register jobs at runtime" and I'm happy I can mark that one as a solid Yes.
p
So, as mentioned earlier, you can currently package and create a scheduled launchplan with code like that shown here: https://docs.flyte.org/en/latest/concepts/schedules.html You can then register it and enable the schedule in the same command from the generated packaged proto file: https://docs.flyte.org/projects/flytectl/en/latest/gen/flytectl_register_files.html Currently the scheduler supports cron and fixed-rate schedules. Adding support for RRULE would be awesome.