GitHub
04/10/2023, 7:17 PM<https://github.com/flyteorg/flyte/tree/master|master>
by wild-endeavor
<https://github.com/flyteorg/flyte/commit/268d167372d2ec1ef03150604c39026a74d95dc9|268d1673>
- Run functional tests against sandbox-bundled (#3581)
flyteorg/flyteGitHub
04/10/2023, 11:10 PMGitHub
04/11/2023, 12:32 AM<https://github.com/flyteorg/flytepropeller/tree/master|master>
by hamersaw
<https://github.com/flyteorg/flytepropeller/commit/ef164c5b7088510a88056c2a2f3a08af4295e026|ef164c5b>
- Remove resource injection on the node for container task (#544)
flyteorg/flytepropellerGitHub
04/11/2023, 7:22 AMGitHub
04/11/2023, 11:54 AMCausedByError: Failed to propagate Abort for workflow. Error: 0: [SystemError] system error, caused by: rpc error: code = PermissionDenied desc = Cannot abort an already terminate workflow execution
.
One of the subworkflows is intended to fail under certain conditions. When it fails, Propeller tries to abort the remaining running subworkflows. Sometimes they are aborted properly, but other times Propeller receives the PermissionDenied error above from Flyte Admin.
This looks like a race condition in Propeller: when Propeller checks the status of the remaining subworkflows they are still reported as "running", but by the time the abort is called they have already moved to a terminated status. In the case I checked, the failing subworkflow (the one trying to abort the rest) and the successful one finished 3 ms apart, so I think the sibling is reported as running at check time although it has actually Succeeded when the abort call executes. These lines may be relevant to the issue: https://github.com/flyteorg/flytepropeller/blob/master/pkg/controller/nodes/task/handler.go#L795-L825
(currentPhase might change when p.Abort is called)
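The suspected check-then-act window can be reproduced in a few lines. This is only a toy model of the race described above, not Propeller's code: FakeAdmin, Phase, AlreadyTerminated, and check_then_abort are hypothetical names, and the "between" hook stands in for the few milliseconds in which the sibling finishes on its own.

```python
from enum import Enum


class Phase(Enum):
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    ABORTED = "ABORTED"


class AlreadyTerminated(Exception):
    """Stand-in for the PermissionDenied response from Flyte Admin."""


class FakeAdmin:
    """Hypothetical in-memory stand-in for Flyte Admin."""

    def __init__(self):
        self.phases = {}

    def get_phase(self, execution_id):
        return self.phases[execution_id]

    def abort(self, execution_id):
        # Admin refuses to abort anything that is no longer running.
        if self.phases[execution_id] is not Phase.RUNNING:
            raise AlreadyTerminated(execution_id)
        self.phases[execution_id] = Phase.ABORTED


def check_then_abort(admin, execution_id, between=lambda: None):
    """Check-then-act: the phase can change between the read and the abort."""
    if admin.get_phase(execution_id) is Phase.RUNNING:
        between()  # models the window in which the execution finishes by itself
        admin.abort(execution_id)


admin = FakeAdmin()
admin.phases["sub-b"] = Phase.RUNNING


def finish_sub_b():
    admin.phases["sub-b"] = Phase.SUCCEEDED


try:
    check_then_abort(admin, "sub-b", between=finish_sub_b)
    outcome = "aborted"
except AlreadyTerminated:
    outcome = "rejected"
```

With the sibling finishing inside the window, the abort is rejected even though the earlier status check saw it as running.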
Please check the attached screenshots to see how different executions of the same code produce different results.
Eventually, the parent workflow (the one containing the subworkflows) fails with this error:
RuntimeExecutionError: max number of system retry attempts [51/50] exhausted.
This error increases the number of calls made to FlyteAdmin and also inflates the metric associated with the PermissionDenied error.
Please do not hesitate to ask for further information if needed.
Expected behavior
FlytePropeller should not retry aborting a node that is already in a terminated status, and that node's status in the parent workflow should be updated to the terminated status (sometimes the node is shown as running in the parent although it shows as succeeded when you open the subworkflow).
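One way to picture the expected behavior is an abort that treats the "already terminated" rejection as an answer rather than as a failure to retry. This is a rough sketch under assumptions: AlreadyTerminatedError, abort_child, the dict-shaped node status, and the get_phase callable are all made up for illustration and do not reflect Propeller's actual types.

```python
class AlreadyTerminatedError(Exception):
    """Stand-in for Admin rejecting an abort of a finished execution."""


def abort_child(node_status, abort_call, get_phase):
    """Abort a child execution once and record the terminal phase that results.

    node_status: dict holding the parent's view of the child (hypothetical shape).
    abort_call:  callable asking Admin to abort; may raise AlreadyTerminatedError.
    get_phase:   callable returning the child's current phase as reported by Admin.
    """
    try:
        abort_call()
        node_status["phase"] = "ABORTED"
    except AlreadyTerminatedError:
        # The child finished first: adopt its real phase instead of retrying.
        node_status["phase"] = get_phase()
    return node_status["phase"]


def rejecting_abort():
    raise AlreadyTerminatedError("child already finished")


status = {"phase": "RUNNING"}
result = abort_child(status, rejecting_abort, get_phase=lambda: "SUCCEEDED")
```

Handled this way, the rejection would not count against the system retry budget, and the parent would show the child as succeeded instead of running.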
Additional context to reproduce
No response
Screenshots
(three screenshots attached to the original issue)
GitHub
04/11/2023, 1:35 PMGitHub
04/11/2023, 8:48 PMGitHub
04/11/2023, 9:17 PMGitHub
04/11/2023, 9:32 PMflyte
repo docs
flyteorg/flyte
✅ All checks have passed
6/6 successful checksGitHub
04/11/2023, 9:35 PMGitHub
04/11/2023, 9:39 PM<https://github.com/flyteorg/flytekit-python-template/tree/main|main>
by zeryx
<https://github.com/flyteorg/flytekit-python-template/commit/f8281b33d330eb0e79a80ca2c013413d311a8bf3|f8281b33>
- Update README.md (#27)
flyteorg/flytekit-python-templateGitHub
04/11/2023, 9:59 PM<https://github.com/flyteorg/flyteconsole/tree/master|master>
by jsonporter
<https://github.com/flyteorg/flyteconsole/commit/7115a0610d03511465373969f8493f0ee997f590|7115a061>
- feat: show launchplan in execution table (#738)
flyteorg/flyteconsoleGitHub
04/11/2023, 10:01 PM<https://github.com/flyteorg/flyteconsole/tree/master|master>
by jsonporter
<https://github.com/flyteorg/flyteconsole/commit/c5fc069c6d5dfeacb21afa166c4fc3141cac3974|c5fc069c>
- feat: show launch plan information in workflow's schedules (#739)
flyteorg/flyteconsoleGitHub
04/11/2023, 10:13 PMGitHub
04/12/2023, 1:49 AMflytectl
. This limitation introduces UX friction for users who are used to operating their ML pipelines from the UI. Being able to activate and deactivate launch plans from Flyteconsole would greatly improve its usability.
Goal: What should the final outcome look like, ideally?
Users should be able to activate and deactivate a specific version of a launch plan from Flyte Console.
Propose: Link/Inline OR Additional context
No response
Are you sure this issue hasn't been raised already?
☑︎ Yes
Have you read the Code of Conduct?
☑︎ Yes
flyteorg/flyteGitHub
04/12/2023, 4:07 AM{"json":{"src":"schedule_executor.go:93"},"level":"error","msg":"failed to catch up on all the schedules. Aborting","ts":"2023-04-12T07:04:15Z"}
{"json":{"src":"schedule_executor.go:94"},"level":"info","msg":"Flyte native scheduler shutdown","ts":"2023-04-12T07:04:15Z"}
It's also possible that the catch-up procedure for a valid scheduled launch plan is never executed if another launch plan is malformed. This line exits CatchupAll immediately, which can starve the other launch plans:
https://github.com/flyteorg/flyteadmin/blob/eb695b19dcc6fd53492176586c2ab9d64f0c990d/scheduler/core/gocron_scheduler.go#L190
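The starvation effect can be shown with a simplified model of a catch-up loop. This is not the scheduler's Go code. Both functions and the schedule names below are hypothetical, and the sketch only contrasts stopping at the first failure with recording the failure and continuing.

```python
def catchup_all_fail_fast(schedules, catchup_one):
    """Stops at the first failure, so later schedules are never caught up."""
    done = []
    for schedule in schedules:
        if not catchup_one(schedule):
            return done, False
        done.append(schedule)
    return done, True


def catchup_all_isolated(schedules, catchup_one):
    """Records failures and keeps going, so one bad schedule cannot starve the rest."""
    done, failed = [], []
    for schedule in schedules:
        (done if catchup_one(schedule) else failed).append(schedule)
    return done, failed


def catchup_one(name):
    # Pretend only the malformed launch plan fails to catch up.
    return name != "lp-malformed"


schedules = ["lp-ok-1", "lp-malformed", "lp-ok-2"]
fail_fast_done, fail_fast_ok = catchup_all_fail_fast(schedules, catchup_one)
isolated_done, isolated_failed = catchup_all_isolated(schedules, catchup_one)
```

In the fail-fast variant "lp-ok-2" is never reached, which is the starvation described above. The isolated variant catches up both valid schedules and reports the malformed one separately.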
Expected behavior
1. A scheduled launch plan with incomplete inputs should be rejected during registration.
2. Errors in the scheduler's create-execution request should be surfaced to the user so that they are aware of the issue.
3. The scheduler should not restart when a malformed scheduled launch plan fails to execute.
I think fixing 1 is the most urgent, as it avoids this issue altogether. However, 2 would also be useful in case some other condition can lead to the same failure.
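For item 1, the registration-time check could look roughly like the sketch below. It assumes that a scheduled execution only receives the fixed inputs, any default inputs, and the kickoff-time argument. The function name and its parameters are hypothetical and not part of flytekit or flyteadmin.

```python
def missing_scheduled_inputs(required_inputs, fixed_inputs, default_inputs,
                             kickoff_time_input_arg=None):
    """Return the required workflow inputs a scheduled launch plan cannot supply."""
    provided = set(fixed_inputs) | set(default_inputs)
    if kickoff_time_input_arg:
        # The scheduler fills this argument in at trigger time.
        provided.add(kickoff_time_input_arg)
    return sorted(set(required_inputs) - provided)


# Mirrors the launch plan in this report: "b" is required but never provided.
missing = missing_scheduled_inputs(
    required_inputs=["kickoff_time", "a", "b"],
    fixed_inputs={"a": 1},
    default_inputs={},
    kickoff_time_input_arg="kickoff_time",
)
```

Registration could then be rejected whenever the returned list is non-empty, instead of failing later inside the scheduler.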
Additional context to reproduce
Use the following workflow and launch plan code.
from datetime import datetime

from flytekit import CronSchedule, LaunchPlan, task, workflow


@task
def square(a: int) -> int:
    return a * a


@task
def add(a: int, b: int) -> int:
    return a + b


@workflow
def my_wf(kickoff_time: datetime, a: int, b: int) -> int:
    # a and b are required inputs
    x = square(a=a)
    return add(a=x, b=b)


my_wf_lp = LaunchPlan.get_or_create(
    name="my-schedule",
    workflow=my_wf,
    fixed_inputs={
        "a": 1,
        # b is omitted from fixed_inputs, so the scheduled launch plan
        # only passes in the "kickoff_time" and "a" inputs.
    },
    schedule=CronSchedule(
        schedule="*/5 * * * *",
        kickoff_time_input_arg="kickoff_time",
    ),
)
Log in scheduler
{
  "json": {
    "routine": "jobfunc-11804557365892249653",
    "src": "executor_impl.go:110"
  },
  "level": "error",
  "msg": "failed to create execution create request %+v due to %vproject:\"sample\" domain:\"development\" name:\"f0bd182b3249867ba000\" spec:<launch_plan:<resource_type:LAUNCH_PLAN project:\"sample\" domain:\"development\" name:\"ml_pipeline.launchplan.schedule\" version:\"0.1.5\" > metadata:<mode:SCHEDULED scheduled_at:<seconds:1681271880 > > > inputs:<literals:<key:\"kickoff_time\" value:<scalar:<primitive:<datetime:<seconds:1681271880 > > > > > > rpc error: code = InvalidArgument desc = expected_inputs b missing",
  "ts": "2023-04-12T04:04:42Z"
}
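Until such errors are surfaced to users, the gRPC code and description can be pulled out of scheduler log lines like the one above. This is a small sketch that assumes one JSON object per log line and an "rpc error: code = ... desc = ..." suffix in "msg". The helper name is made up, and the sample message is abbreviated with "..." for readability.

```python
import json
import re

# Matches the trailing gRPC error that appears in the scheduler's "msg" field.
RPC_ERROR = re.compile(r"rpc error: code = (?P<code>\w+) desc = (?P<desc>.*)$")


def extract_rpc_error(log_line):
    """Return timestamp, gRPC code and description, or None if no RPC error is present."""
    record = json.loads(log_line)
    match = RPC_ERROR.search(record.get("msg", ""))
    if match is None:
        return None
    return {"ts": record.get("ts"), "code": match["code"], "desc": match["desc"]}


sample = json.dumps({
    "level": "error",
    "msg": "failed to create execution create request ... "
           "rpc error: code = InvalidArgument desc = expected_inputs b missing",
    "ts": "2023-04-12T04:04:42Z",
})
summary = extract_rpc_error(sample)
```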
Screenshots
No response
flyteorg/flyteGitHub
04/12/2023, 4:22 AMGitHub
04/12/2023, 4:22 AM<https://github.com/flyteorg/datacatalog/tree/master|master>
by jeevb
<https://github.com/flyteorg/datacatalog/commit/9ebdb93534cd4048a236b10b98aeba395849565a|9ebdb935>
- Infer GOOS and GOARCH from environment (#103)
flyteorg/datacatalogGitHub
04/12/2023, 4:22 AMGitHub
04/12/2023, 4:22 AM<https://github.com/flyteorg/flytepropeller/tree/master|master>
by jeevb
<https://github.com/flyteorg/flytepropeller/commit/2be01e251f21687e581e2f78a55b965e46cf184f|2be01e25>
- Infer GOOS and GOARCH from environment (#552)
flyteorg/flytepropellerGitHub
04/12/2023, 4:26 AMGitHub
04/12/2023, 4:36 AMGitHub
04/12/2023, 4:43 AMGitHub
04/12/2023, 4:49 AMGitHub
04/12/2023, 4:49 AMGitHub
04/12/2023, 5:01 AMGitHub
04/12/2023, 6:09 AMGitHub
04/12/2023, 1:03 PMflytectl
commands:
• To list all launch plans: flytectl get launchplans -p {project} -d {domain} {launchplan_name}
• To activate the correct one: flytectl update launchplan -p {project} -d {domain} {launchplan_name} --version {version} --activate
It would be nice to simplify this process.
Goal: What should the final outcome look like, ideally?
Create a CLI command which activates the latest version of a launch plan by default. It could also accept a specific version as a parameter.
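The "latest by default, explicit version if given" rule might be sketched as follows. How "latest" is determined is an assumption here (most recent creation time). pick_version and the (version, created_at) tuples are hypothetical and not an existing flytekit or flytectl API.

```python
from datetime import datetime


def pick_version(versions, requested=None):
    """Choose which launch plan version to activate.

    versions:  list of (version, created_at) tuples for one launch plan.
    requested: explicit version from the user, or None to take the latest.
    """
    if not versions:
        raise ValueError("launch plan has no registered versions")
    if requested is not None:
        if requested not in {version for version, _ in versions}:
            raise ValueError(f"unknown version: {requested}")
        return requested
    # Assumption: "latest" means the most recently created version.
    return max(versions, key=lambda item: item[1])[0]


registered = [
    ("0.1.4", datetime(2023, 4, 10, 9, 0)),
    ("0.1.5", datetime(2023, 4, 12, 4, 0)),
    ("0.1.3", datetime(2023, 4, 1, 12, 0)),
]
latest = pick_version(registered)
pinned = pick_version(registered, requested="0.1.3")
```

Sorting by creation time rather than by version string avoids surprises when versions are git hashes or other non-ordered identifiers.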
Describe alternatives you've considered
I propose to add this feature to pyflyte
Propose: Link/Inline OR Additional context
No response
flyteorg/flyteGitHub
04/12/2023, 1:05 PMSynchronousFlyteClient
has an update_launch_plan method, so I decided to wrap it as a CLI command.
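The shape of such a wrapper, sketched with argparse and a stand-in client so that it runs without a Flyte backend. FakeClient, its argument shapes, the "ACTIVE"/"INACTIVE" strings, and the command name are assumptions for illustration. The real client's signature and the CLI added in the PR may differ.

```python
import argparse


class FakeClient:
    """Hypothetical stand-in; it only records what an update call would receive."""

    def __init__(self):
        self.calls = []

    def update_launch_plan(self, launch_plan_id, state):
        self.calls.append((launch_plan_id, state))


def build_parser():
    parser = argparse.ArgumentParser(prog="activate-launchplan")
    parser.add_argument("name")
    parser.add_argument("-p", "--project", required=True)
    parser.add_argument("-d", "--domain", required=True)
    parser.add_argument("--version", default=None)
    parser.add_argument("--deactivate", action="store_true")
    return parser


def run(argv, client, latest_version):
    """Parse CLI arguments and forward them to the client's update call."""
    args = build_parser().parse_args(argv)
    # Fall back to the latest known version when none is given explicitly.
    version = args.version or latest_version
    state = "INACTIVE" if args.deactivate else "ACTIVE"
    client.update_launch_plan((args.project, args.domain, args.name, version), state)
    return version, state


client = FakeClient()
result = run(["-p", "sample", "-d", "development", "my-schedule"],
             client, latest_version="0.1.5")
```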
Tracking Issue
flyteorg/flyte#3587
Follow-up issue
NA
flyteorg/flytekit
✅ All checks have passed
30/30 successful checksGitHub
04/12/2023, 2:21 PM<https://github.com/flyteorg/flyte/tree/master|master>
by cosmicBboy
<https://github.com/flyteorg/flyte/commit/36efb93aabcfed787d008998f8bd85d4f7ccb3c6|36efb93a>
- [auto-update-contributors] update all-contributors (#3571)
flyteorg/flyte