#3212 [Core] Backfill and look-back

Issue created by frelyf

Description

Users want to know how to run backfill jobs. The docs should include an example of how to trigger a launch plan across a range of dates. The launch plan should take a datetime as an input so that the range of dates can be applied to it. This is different from dynamic and map tasks because backfill jobs likely want to run on their own nodes and want to be evaluated at compile time.

A similar use case is workflows that want look-back. When users run daily jobs, it is convenient to quickly evaluate the state of the last couple of days. If the data paths of the previous days' outputs are well known, a simple gsutil ls (or the equivalent on other file systems) is enough to check this. If a previous day is missing its output data, then that day should be processed and that workflow will proceed with the heavy compute. Being able to put the workflow iterator on a cron schedule would make this possible. I haven't been able to prove this case yet, but believe it is technically possible.

Method to iterate over dates on a launch plan and return a workflow:
`
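from datetime import datetime

from croniter import croniter
from flytekit import CronSchedule, LaunchPlan, Workflow


def generate_backfill_workflow(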
    start_date: datetime, end_date: datetime, base_lp: LaunchPlan
) -> Workflow:

    if base_lp.schedule is None:
        raise ValueError("Backfill can only be created for scheduled launchplans")

    # Only cron-based schedules can be expanded into concrete run times.
    if not isinstance(base_lp.schedule, CronSchedule):
        raise NotImplementedError("The launchplan schedule needs to be a cron schedule")

    if start_date >= end_date:
        raise ValueError("Start date should be earlier than end date")

    print(f"Generating backfill for {start_date} to {end_date}")
    wf = Workflow(name=f"backfill-{base_lp.name}")
    lp_iter = croniter(
        base_lp.schedule.cron_schedule.schedule, start_time=start_date, ret_type=datetime
    )
    while True:
        next_start_date = lp_iter.get_next()
        if next_start_date > end_date:
            break
        print(f"Adding -> {next_start_date}")
        wf.add_launch_plan(base_lp, kickoff_time=next_start_date)

    return wf
`

Are you sure this issue hasn't been raised already? ☑︎ Yes
Have you read the Code of Conduct? ☑︎ Yes

flyteorg/flyte
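
The look-back case described above could be sketched roughly as follows. This is only a minimal illustration using the standard library: the one-directory-per-day layout, the local filesystem standing in for `gsutil ls`, and the names `expected_output_path` and `missing_days` are all assumptions, not part of flytekit.

```python
import tempfile
from datetime import date, timedelta
from pathlib import Path
from typing import List


def expected_output_path(root: Path, day: date) -> Path:
    # Hypothetical layout: one directory per day, e.g. <root>/2023-01-05/
    return root / day.isoformat()


def missing_days(root: Path, today: date, look_back: int) -> List[date]:
    """Return the days in the look-back window whose output directory is absent.

    The window covers the `look_back` days before `today`, oldest first.
    """
    window = [today - timedelta(days=offset) for offset in range(look_back, 0, -1)]
    return [day for day in window if not expected_output_path(root, day).exists()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Pretend that only one of the last three days produced output.
        expected_output_path(root, date(2023, 1, 3)).mkdir()
        for day in missing_days(root, today=date(2023, 1, 5), look_back=3):
            print(f"Needs reprocessing -> {day}")
```

Each date returned here would be a candidate kickoff time for the launch plan, so only the days with missing output would trigger the heavy compute.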