# flyte-github
#3586 [BUG] Native scheduler failed silently and restarted when running malformed Launch Plan
Issue created by pradithya

Describe the bug
The native scheduler cannot create an execution for a scheduled Launch Plan that doesn't provide all required inputs. The error is not surfaced to the user and is only available as an error log in the native scheduler. Additionally, the native scheduler restarts when it tries to catch up on the launch plan:
{"json":{"src":"schedule_executor.go:93"},"level":"error","msg":"failed to catch up on all the schedules. Aborting","ts":"2023-04-12T07:04:15Z"}
{"json":{"src":"schedule_executor.go:94"},"level":"info","msg":"Flyte native scheduler shutdown","ts":"2023-04-12T07:04:15Z"}
It's also possible that the catch-up procedure for a valid scheduled launch plan is never executed if there is a malformed launch plan. This line (https://github.com/flyteorg/flyteadmin/blob/eb695b19dcc6fd53492176586c2ab9d64f0c990d/scheduler/core/gocron_scheduler.go#L190) immediately exits CatchupAll, which can potentially starve other launch plans.

Expected behavior
1. Scheduled Launch Plans with incomplete inputs should be rejected during registration.
2. Errors from the scheduler's create-execution request should be surfaced to the user so that they are aware of the issue.
3. The scheduler should not restart when a malformed scheduled launch plan fails to execute.

I think fixing 1 is more urgent, as it avoids this issue altogether. However, 2 will also be useful in case any other condition can lead to this.

Additional context to reproduce
Use the following workflow and launch plan code.
from datetime import datetime

from flytekit import CronSchedule, LaunchPlan, task, workflow

@task
def square(a: int) -> int:
    return a * a

@task
def add(a: int, b: int) -> int:
    return a + b

@workflow
def my_wf(kickoff_time: datetime, a: int, b: int) -> int:
    # a and b are required inputs
    x = square(a=a)
    return add(a=x, b=b)

my_wf_lp = LaunchPlan.get_or_create(
    name="my-schedule",
    workflow=my_wf,
    fixed_inputs={
        "a": 1,
        # omit b from fixed_inputs, so the scheduled launch plan will only pass in "kickoff_time" and "a" input.
    },
    schedule=CronSchedule(
        schedule="*/5 * * * *",
        kickoff_time_input_arg="kickoff_time",
    ),
)
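
The registration-time check asked for in expected behavior 1 could look roughly like the sketch below. It is plain Python for illustration only, not flyteadmin's actual validation code; the function name `missing_required_inputs` and its parameters are hypothetical. It assumes a scheduled run can only receive fixed inputs, default inputs, and the kickoff-time argument.

```python
from typing import Iterable, Optional, Set


def missing_required_inputs(
    required_inputs: Iterable[str],
    fixed_inputs: Iterable[str],
    default_inputs: Iterable[str] = (),
    kickoff_time_input_arg: Optional[str] = None,
) -> Set[str]:
    """Return the required workflow inputs a scheduled run would not receive.

    Hypothetical helper: the names here are not part of flytekit or flyteadmin.
    """
    # Inputs a scheduled execution can supply without a human caller.
    provided = set(fixed_inputs) | set(default_inputs)
    if kickoff_time_input_arg is not None:
        provided.add(kickoff_time_input_arg)
    return set(required_inputs) - provided


# The launch plan from this report: "b" is never supplied.
missing = missing_required_inputs(
    required_inputs=["kickoff_time", "a", "b"],
    fixed_inputs=["a"],
    kickoff_time_input_arg="kickoff_time",
)
print(sorted(missing))  # ['b']
```

A non-empty result at registration would be grounds to reject the scheduled launch plan before the scheduler ever tries to run it.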
Log in scheduler
{
  "json": {
    "routine": "jobfunc-11804557365892249653",
    "src": "executor_impl.go:110"
  },
  "level": "error",
  "msg": "failed to create execution create request %+v due to %vproject:\"sample\" domain:\"development\" name:\"f0bd182b3249867ba000\" spec:<launch_plan:<resource_type:LAUNCH_PLAN project:\"sample\" domain:\"development\" name:\"ml_pipeline.launchplan.schedule\" version:\"0.1.5\" > metadata:<mode:SCHEDULED scheduled_at:<seconds:1681271880 > > > inputs:<literals:<key:\"kickoff_time\" value:<scalar:<primitive:<datetime:<seconds:1681271880 > > > > > >  rpc error: code = InvalidArgument desc = expected_inputs b missing",
  "ts": "2023-04-12T04:04:42Z"
}
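
The log above shows a single create-execution failure. To keep such a failure from aborting catch-up for every other schedule (the starvation concern around CatchupAll), the loop could collect errors per schedule instead of exiting on the first one. The following is a minimal Python sketch of that idea, not the scheduler's Go implementation; `catch_up_all`, `catch_up_one`, and `fake_catch_up` are hypothetical names.

```python
from typing import Callable, Dict, Iterable


def catch_up_all(
    schedule_names: Iterable[str],
    catch_up_one: Callable[[str], None],
) -> Dict[str, Exception]:
    """Run catch-up for every schedule, collecting failures instead of aborting."""
    failures: Dict[str, Exception] = {}
    for name in schedule_names:
        try:
            catch_up_one(name)
        except Exception as err:
            # Isolate the failure to this schedule and keep going.
            failures[name] = err
    return failures


def fake_catch_up(name: str) -> None:
    # Hypothetical stand-in for the scheduler's create-execution call.
    if name == "my-schedule":
        raise ValueError("expected_inputs b missing")


failures = catch_up_all(["my-schedule", "healthy-schedule"], fake_catch_up)
for name, err in failures.items():
    print(f"{name}: {err}")
```

The returned mapping is also a natural place to hook in user-facing error reporting (expected behavior 2), since each failure stays associated with the schedule that caused it.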
Screenshots
No response

Are you sure this issue hasn't been raised already?
☑︎ Yes

Have you read the Code of Conduct?
☑︎ Yes

flyteorg/flyte