# flyte-support
f
Hi all. A while back I mentioned some pain points I feel related to scheduled workflow executions. Some of these can be filed under "things I miss about Airflow", but they are more generally related to backfills and scheduled launch plans. I was asked to write up a list and put it in a feature request. But I thought I'd start here to gather any other ideas before getting to a specific set of feature requests. Here are a few thoughts (not at the level of specificity of a design doc):
• Problem: When creating a new scheduled launch plan, I often want to backfill to some given date.
  ◦ Potential approach: Include an optional argument `start_date` for `CronSchedule` that indicates how far back in the past the time-based view should go. All past workflow executions will be scheduled when this is registered, perhaps sequentially.
  ◦ Defaults to a value based on the time the schedule was activated. That is, by default there is no backfill done.
  ◦ This determines the number of backfill "slots" to be executed.
• Problem: The workflow UI shows the list of workflow executions based on the time they were run. But for scheduled launch plans, I want to see a view based on the scheduled times. How do I know if a particular failed scheduled workflow execution ever had a subsequent success?
  ◦ Potential approach: Have a "schedule time" based view for each schedule.
  ◦ Visually display the status of each execution slot in the backfill by the status of the last execution. E.g., success, failure, running, etc.
  ◦ Click on an execution slot to relaunch a failed or successful task or launch a task that has not yet run.
• Problem: Running backfills for a scheduled launch plan is tedious. Is there a better way than just a lot of clicking in the UI?
  ◦ Potential approach: Provide backfill capabilities in the UI for scheduled launch plans. For example, select a launch plan with a schedule, select a start datetime and end datetime, then run.
  ◦ Provide a few concurrency and backfill settings. (Borrowed from Airflow)
  ◦ `max_active_runs` sets the maximum number of executions running concurrently. Setting it to 1 would backfill sequentially, one at a time.
• Problem: In the workflow UI, in the section "All Executions in the Workflow", there is no easy way to distinguish between scheduled runs and manually triggered runs. And the inputs to the execution are not readily available in this view.
  ◦ Potential approach: Enable a filtered view of only those executions that were initiated by a schedule (rather than by the user launching via the UI).
  ◦ Filter by the name of the launch plan used to trigger the workflow execution.
  ◦ Include columns for the value of the time-based argument passed to the workflow, and/or include the name of the launch plan used to trigger the execution.
I'd be happy to file these as feature requests, but I'd welcome any feedback before I get started.
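To make the "slots" idea concrete, here is a minimal sketch in plain Python. It is not a Flyte API: `Execution`, `backfill_slots`, and `slot_statuses` are hypothetical names, and a fixed interval stands in for a real cron expression. It enumerates the slots implied by a `start_date` and reports each slot by the status of its most recent execution, with `None` for slots that never ran.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Optional


@dataclass
class Execution:
    """Hypothetical record of one run; not a Flyte type."""
    scheduled_at: datetime  # the time-based argument passed to the workflow
    started_at: datetime    # when the run actually started
    status: str             # e.g. "SUCCEEDED", "FAILED", "RUNNING"


def backfill_slots(start_date: datetime, now: datetime, interval: timedelta) -> List[datetime]:
    """Enumerate scheduled times from start_date up to, but not including, now."""
    slots = []
    t = start_date
    while t < now:
        slots.append(t)
        t += interval
    return slots


def slot_statuses(slots: Iterable[datetime], executions: Iterable[Execution]) -> Dict[datetime, Optional[str]]:
    """Map each slot to the status of its most recently started execution."""
    latest: Dict[datetime, Execution] = {}
    for ex in executions:
        current = latest.get(ex.scheduled_at)
        if current is None or ex.started_at > current.started_at:
            latest[ex.scheduled_at] = ex
    # Slots with no execution at all are reported as None ("not yet run").
    return {s: (latest[s].status if s in latest else None) for s in slots}


# Example: a daily schedule backfilled to Jan 1, viewed on Jan 4.
slots = backfill_slots(datetime(2023, 1, 1), datetime(2023, 1, 4), timedelta(days=1))
runs = [
    Execution(datetime(2023, 1, 1), datetime(2023, 1, 1, 0, 5), "FAILED"),
    Execution(datetime(2023, 1, 1), datetime(2023, 1, 5, 9, 0), "SUCCEEDED"),  # later retry
    Execution(datetime(2023, 1, 2), datetime(2023, 1, 2, 0, 5), "FAILED"),
]
view = slot_statuses(slots, runs)
```

In this example the Jan 1 slot shows as succeeded because the retry started later than the failed run, Jan 2 stays failed, and Jan 3 has no run yet.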
❤️ 6
f
@fancy-yak-23698 thank you for the feedback. A while back we were asking about how we can improve. We did add one thing and want to get your take on it. It is not documented yet, so it is early, but it is a very simple feature, `pyflyte backfill`: https://github.com/flyteorg/flytekit/pull/1420 We are also working on Flyte execution tags to allow arbitrary groupings, like grouping by schedule tags. Maybe we should show the executions view under the launch plan, already filtered? https://github.com/flyteorg/flyte/pull/3320
But cc @broad-monitor-993
👍 1
I do want to update the UI a little bit. The problem is holes. How do holes even happen? Is it because you turned off a specific time instance? More on this: for scheduled launch plans, should the UI simply interpolate the possible previous time ranges, and then fill them with something like grayed-out bars?
@brainy-church-54824 ^
👀 1
I really love this conversation too. Please help us shape the data engineering part of the product.
b
Thanks for the great feedback @fancy-yak-23698! Re the problems that you outlined, I think each of them would entail efforts that would improve Flyte’s UX!
Problem: When creating a new scheduled launch plan, I often want to backfill to some given date.
We recently introduced the `pyflyte backfill` / `FlyteRemote.launch_backfill` commands, which generate a static workflow that performs a backfill for some time window. This makes your backfills fully reproducible/tracked on Flyte. We could support dynamic-workflow-based backfills, but we’re starting off with static workflows. Let us know if you have any thoughts on this! Documentation is still WIP.
Problem: The workflow UI shows the list of workflow executions based on the time they were run. But for scheduled launch plans, I want to see a view based on the scheduled times. How do I know if a particular failed schedule workflow execution ever had a subsequent success?
This is great feedback! Definitely something to coordinate with the UI team @late-eye-50215
Problem: Running backfills for a scheduled launch plan is tedious. Is there a better way than just a lot of clicking in the UI?
This could be addressed by a `@dynamic` workflow that compiles the backfill workflow on the fly. This would also be able to support concurrency by executing chunks of dates over the backfill window.
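A rough sketch of the chunking idea, in plain Python with no Flyte calls: `date_chunks` is a hypothetical helper, a fixed interval stands in for the real schedule, and `max_active_runs` borrows the Airflow-style name from the original post. Each yielded chunk would be the set of dates launched together, with chunks processed one after another, so at most `max_active_runs` executions are active at once.

```python
from datetime import datetime, timedelta
from typing import Iterator, List


def date_chunks(
    start: datetime,
    end: datetime,
    interval: timedelta,
    max_active_runs: int,
) -> Iterator[List[datetime]]:
    """Yield consecutive groups of at most max_active_runs scheduled times in [start, end)."""
    if max_active_runs < 1:
        raise ValueError("max_active_runs must be at least 1")
    chunk: List[datetime] = []
    t = start
    while t < end:
        chunk.append(t)
        if len(chunk) == max_active_runs:
            yield chunk
            chunk = []
        t += interval
    if chunk:
        # Final partial chunk when the window is not an exact multiple.
        yield chunk


# Example: five daily slots, at most two running at a time.
chunks = list(date_chunks(datetime(2023, 1, 1), datetime(2023, 1, 6), timedelta(days=1), 2))
```

With `max_active_runs=1` every chunk holds a single date, which corresponds to a strictly sequential backfill.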
Problem: In the workflow UI, in the section “All Executions in the Workflow”, there is no easy way to distinguish between scheduled runs and manually triggered runs. And the inputs to the execution are not readily available in this view.
We’re thinking about introducing execution tags that can be used to group together sets of executions. That combined with tag filter/sort functionality in the UI would make this possible @freezing-airport-6809 @late-eye-50215
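As an illustration of how tag-based filtering could separate scheduled runs from manual ones, here is a small sketch. It assumes nothing about the eventual Flyte design: `ExecutionRecord`, `filter_by_tags`, and the tag values such as "scheduled" are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class ExecutionRecord:
    """Hypothetical execution summary; not a Flyte type."""
    name: str
    launch_plan: str
    tags: FrozenSet[str] = frozenset()


def filter_by_tags(executions: Iterable[ExecutionRecord], required: Iterable[str]) -> List[ExecutionRecord]:
    """Keep only the executions that carry every required tag."""
    wanted = frozenset(required)
    return [e for e in executions if wanted <= e.tags]


# Example data with made-up names and tags.
executions = [
    ExecutionRecord("exec-a", "daily_lp", frozenset({"scheduled", "daily"})),
    ExecutionRecord("exec-b", "daily_lp", frozenset({"manual"})),
    ExecutionRecord("exec-c", "hourly_lp", frozenset({"scheduled"})),
]
scheduled_only = filter_by_tags(executions, ["scheduled"])
```

Filtering on the launch plan name, as suggested in the original post, would be the same pattern applied to the `launch_plan` field.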
I think we can organize these efforts in an epic Improving Launchplans and Backfill UX. @fancy-yak-23698 I created a stub RFC discussion here, please go ahead and edit it as you see fit! Your initial thoughts are great, but feel free to add more detail/ideas/solution proposals to the problems you outlined.
🔥 1