I need some help getting launchplans with a cronjo...
# announcements
f
I need some help getting launchplans with a cronjob configured to work: I first register a launchplan with a
* * * * *
cron expression via flyteremote. After that, the flyte UI still shows
This workflow has no schedules
, as expected. I then execute
flytectl update launchplan -p default -d development --version fabio-cron-2a3fb7f-dirty-BQN44 cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 --activate
which gives:
Copy code
updated launchplan successfully on cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0%
In the UI, the schedule is updated as well, it now says
Every minute
. Problem: I’m not seeing any executions. The flyte admin shows this in the logs after I activated the launchplan:
Copy code
{"json":{"src":"launch_plan.go:108"},"level":"info","msg":"Adding resource type for unspecified value in request: [%+v]id:\u003cproject:\"default\" domain:\"development\" name:\"cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0\" version:\"fabio-cron-2a3fb7f-dirty-BQN44\" \u003e state:ACTIVE ","ts":"2022-03-31T18:13:01Z"}
{"json":{"src":"log.go:42"},"level":"warning","msg":"Failed to parse authenticated client metadata when creating audit log","ts":"2022-03-31T18:13:01Z"}
{"json":{"src":"log.go:91"},"level":"info","msg":"Recording request: [{\"Principal\":{\"Subject\":\"\",\"ClientID\":\"\",\"TokenIssuedAt\":\"0001-01-01T00:00:00Z\"},\"Client\":{\"ClientIP\":\"\"},\"Request\":{\"Method\":\"UpdateLaunchPlan\",\"Parameters\":{\"domain\":\"development\",\"name\":\"cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0\",\"project\":\"default\",\"version\":\"fabio-cron-2a3fb7f-dirty-BQN44\"},\"Mode\":1,\"ReceivedAt\":\"2022-03-31T18:13:01.578706629Z\"},\"Response\":{\"ResponseCode\":\"OK\",\"SentAt\":\"2022-03-31T18:13:01.597983728Z\"}}]","ts":"2022-03-31T18:13:01Z"}
The
flytescheduler
doesn’t show any update in the logs. Does anyone have some guidance where to start debugging this? Thanks!
The helm values files contains the following:
Copy code
# --
# Flyte uses a cloud hosted Cron scheduler to run workflows on a schedule. The following module is optional. Without,
# this module, you will not have scheduled launchplans/workflows.
workflow_scheduler:
  enabled: true
  type: native
k
cc @katrina?
👀 1
k
hey @Fabio Grätz does flytescheduler show other logs? do you have other schedules active?
f
Copy code
time="2022-03-31T18:01:37Z" level=info msg="Using config file: [/etc/flyte/config/admin.yaml /etc/flyte/config/db.yaml /etc/flyte/config/logger.yaml /etc/flyte/config/server.yaml]"
{"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [externalevents] updated. No update handler registered.","ts":"2022-03-31T18:01:38Z"}
{"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [cluster_resources] updated. No update handler registered.","ts":"2022-03-31T18:01:38Z"}
{"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [queues] updated. No update handler registered.","ts":"2022-03-31T18:01:38Z"}
{"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [storage] updated. No update handler registered.","ts":"2022-03-31T18:01:38Z"}
{"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [remotedata] updated. No update handler registered.","ts":"2022-03-31T18:01:38Z"}
{"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [registration] updated. No update handler registered.","ts":"2022-03-31T18:01:38Z"}
{"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [task_resources] updated. No update handler registered.","ts":"2022-03-31T18:01:38Z"}

[33m[2022-03-31 18:01:38][0m [35m[info] registering callback `validations:validate` from /go/pkg/mod/github.com/qor/validations@v0.0.0-20171228122639-f364bca61b46/callbacks.go:74[0m [31;1m [0m

[33m[2022-03-31 18:01:38][0m [35m[info] registering callback `validations:validate` from /go/pkg/mod/github.com/qor/validations@v0.0.0-20171228122639-f364bca61b46/callbacks.go:77[0m [31;1m [0m
{"json":{"src":"client.go:181"},"level":"error","msg":"failed to initialize token source provider. Err: failed to fetch auth metadata. Error: rpc error: code = Unimplemented desc = unknown service flyteidl.service.AuthMetadataService","ts":"2022-03-31T18:01:38Z"}
{"json":{"src":"client.go:186"},"level":"warning","msg":"Starting an unauthenticated client because: can't create authenticated channel without a TokenSourceProvider","ts":"2022-03-31T18:01:38Z"}
{"json":{"src":"client.go:65"},"level":"info","msg":"Initialized Admin client","ts":"2022-03-31T18:01:38Z"}
{"json":{"src":"scheduler.go:60"},"level":"info","msg":"Successfully initialized a native flyte scheduler","ts":"2022-03-31T18:01:38Z"}
{"json":{"src":"schedule_executor.go:37"},"level":"info","msg":"Flyte native scheduler started successfully","ts":"2022-03-31T18:01:38Z"}
{"json":{"src":"server.go:96"},"level":"info","msg":"Starting profiling server on port [10254]","ts":"2022-03-31T18:01:38Z"}
{"json":{"src":"schedule_executor.go:57"},"level":"info","msg":"Number of schedules retrieved 0","ts":"2022-03-31T18:01:38Z"}
These are all its logs. No, no other schedules running, trying to get this to work on a schedule for the first time 🙂
p
Hi @Fabio Grätz, sorry that you are running into this issue. Also can you send the admin logs. Also can you post the output for
flytectl version
f
Copy code
flytectl version

 A new release of flytectl is available: 0.5.8 → v0.5.9
To upgrade, run: brew update && brew upgrade flytectl
<https://github.com/flyteorg/flytectl/releases/tag/v0.5.9>

{
  "App": "flytectl",
  "Build": "ebda958",
  "Version": "0.5.8",
  "BuildTime": "2022-04-01 10:22:09.671428 +0200 CEST m=+0.080706115"
}%
p
And also can you help me with your admin version that you are using in your deployment
f
These are the last 3k lines of admin (has been running for a while).
Copy code
Containers:
  flyteadmin:
    Container ID:  <containerd://f8538b021cabdbf6d3475f20991e259f3bc68ea5c6903c09197a3358dbb1bc3>7
    Image:         <http://cr.flyte.org/flyteorg/flyteadmin-release:v0.18.2|cr.flyte.org/flyteorg/flyteadmin-release:v0.18.2>
p
Thank you . let me check and get back to you on this
f
Thanks a lot, I really do appreciate the help 🙂
p
Also just to confirm your launch is correctly activated. Can you check this for me.
Copy code
flytectl get launchplans -p flytesnacks -d development cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 -o yaml
I tried using same schedule (
* * * * *
) on sample launch plan that we have in our flytesnacks examples here https://github.com/flyteorg/flytesnacks/blob/f7af27e23b3935a166645cf96a68583cdd263a87/cookbook/core/scheduled_workflows/lp_schedules.py#L55 but that worked well for me I deployed the latest sandbox though. One thing i could suggest at this point is to upgrade to latest released flyteadmin. The one you are using is 4 months old https://github.com/flyteorg/flyteadmin/pkgs/container/flyteadmin-release/16570755?tag=v0.19.3
Meanwhile i will try to dig more how this can happen
One additional thing i would like to check is do you have KickOffTimeInputArg on your schedule eg : cron schedule from flytesnacks with kickofftime
Copy code
cron_lp = LaunchPlan.get_or_create(
    name="my_cron_scheduled_lp",
    workflow=date_formatter_wf,
    schedule=CronSchedule(
        # Note that kickoff_time_input_arg matches the workflow input we defined above: kickoff_time
        # But in case you are using the AWS scheme of schedules and not using the native scheduler then switch over the schedule parameter with cron_expression
        schedule="* * * * *",  # Following schedule runs every min
        kickoff_time_input_arg="kickoff_time",
    ),
)
And later i see a lot deactivation attempts in your logs after 7 mins Activation sent at :
2022-03-31T18:12:32Z
First deactivation sent at :
2022-03-31T18:19:51Z
The o/p of
flytectl get launchplans -p flytesnacks -d development cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering  -o yaml
would be helpful
f
flytectl update launchplan -p default -d development --version fabio-cron-2a3fb7f-dirty-BQN44 cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 --activate
gives:
updated launchplan successfully on cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0
Then:
flytectl get launchplans -p default -d development cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0  -o yaml
(Note the different project and workflow name compared to the command you posted) One thing that might be relevant that I just realized: I’m accessing flyte via
kubectl -n flyte port-forward svc/flyteadmin 30081:81
One additional thing i would like to check is do you have KickOffTimeInputArg on your schedule
eg : cron schedule from flytesnacks with kickofftime
No, I don’t use this arg, is it required?
p
Sorry about that , but there seems to be a bug in the code which you seem to have hit https://github.com/flyteorg/flyteadmin/blob/64da9d3fba663da35fe2354d3b0bb20e519b4456/pkg/manager/impl/launch_plan_manager.go#L140 So CronExpression is the legacy way of defining schedules for which this function has a check but it doesn’t have a check for ScheduleExpression which is new way of defining schedules. And hence your schedule is not getting enabled. To unblock you can add a kick off time input arg as mentioned in flytesnacks example .(Fixed rate schedule doesn’t have this bug)
I will create an issue and we will have this released soon. Meanwhile please use the workaround for now
🙂 1
f
Ok perfect, this is what i will try, thanks for debugging so quickly!!!
k
thank you @Prafulla Mahindrakar!
f
I fear I might need your help again @Prafulla Mahindrakar I was able to create a cronjob via FixedRate but now I have the problem that the cronjob won’t stop anymore 🙈 First, it showed “Every 1 minutes” in the UI. After
flytectl update … --archive
it stopped showing that but still runs the workflow every minute.
Copy code
flytectl update launchplan -p default -d development --version fabio-cron-2a3fb7f-dirty-3M0U1 hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 --archive
gives
Copy code
updated launchplan successfully on hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0
But the command doesn’t cause any logs in the flytescheduler
k
what?
cc @katrina
f
k
was just about to ask for logs 🙂 thanks
Copy code
{"json":{"lp":"hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0","src":"event_scheduler_impl.go:81"},"level":"info","msg":"Deactivated the schedule {{UNSPECIFIED default development hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 fabio-cron-2a3fb7f-dirty-3M0U1 {} [] 0} } in the scheduler","ts":"2022-04-01T18:51:27Z"}
{"json":{"lp":"hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0","src":"launch_plan_manager.go:265"},"level":"debug","msg":"disabled launch plan: [resource_type:LAUNCH_PLAN project:\"default\" domain:\"development\" name:\"hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0\" version:\"fabio-cron-2a3fb7f-dirty-3M0U1\" ]","ts":"2022-04-01T18:51:27Z"}
this looks right, why is scheduler still running it?
f
Not sure
They all have this version:
And there appears to be only one launchplan for that:
Copy code
flytectl -p default -d development get launchplan | grep 3M0U1
| fabio-cron-2a3fb7f-dirty-3M0U1                     | hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0      | LAUNCH_PLAN | 2022-04-01T18:36:28.931942Z |
Is there a way to deactivate all launchplans?
pyflyte lp --help
suggests that there only is a way to activate-all.
k
no, but at most one launch plan should be active across {project}-{domain}-{name} versions
f
I redployed the helm chart (database and buckets are persistent of course) and now the workflow is not executed anymore. Flyte scheduler still shows these logs:
Copy code
{"json":{"src":"identifier.go:54"},"level":"debug","msg":"Returning hash for [{ResourceType:UNSPECIFIED Project:default Domain:development Name:cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-ERM5F XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}]: 5967604739973439342","ts":"2022-04-01T19:18:22Z"}
{"json":{"src":"gocron_scheduler.go:148"},"level":"debug","msg":"Job doesn't exists in the map for name 5967604739973439342 with schedule {BaseModel:{ID:1 CreatedAt:2022-04-01 13:42:12.120162 +0000 +0000 UpdatedAt:2022-04-01 13:45:18.437093 +0000 +0000 DeletedAt:\u003cnil\u003e} SchedulableEntityKey:{Project:default Domain:development Name:cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-ERM5F} CronExpression:* * * * * FixedRateValue:0 Unit:MINUTE KickoffTimeInputArg:kickoff_time Active:0xc0001f24f8}  and hence already removed","ts":"2022-04-01T19:18:22Z"}
{"json":{"src":"identifier.go:54"},"level":"debug","msg":"Returning hash for [{ResourceType:UNSPECIFIED Project:default Domain:development Name:hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-BYUKX XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}]: 13364209728234539433","ts":"2022-04-01T19:18:22Z"}
{"json":{"src":"gocron_scheduler.go:148"},"level":"debug","msg":"Job doesn't exists in the map for name 13364209728234539433 with schedule {BaseModel:{ID:2 CreatedAt:2022-04-01 18:32:29.471284 +0000 +0000 UpdatedAt:2022-04-01 18:32:47.517206 +0000 +0000 DeletedAt:\u003cnil\u003e} SchedulableEntityKey:{Project:default Domain:development Name:hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-BYUKX} CronExpression:* * * * * FixedRateValue:0 Unit:MINUTE KickoffTimeInputArg:kickoff_time Active:0xc0001f26a8}  and hence already removed","ts":"2022-04-01T19:18:22Z"}
{"json":{"src":"identifier.go:54"},"level":"debug","msg":"Returning hash for [{ResourceType:UNSPECIFIED Project:default Domain:development Name:hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-3M0U1 XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}]: 1598734420952108938","ts":"2022-04-01T19:18:22Z"}
{"json":{"src":"gocron_scheduler.go:148"},"level":"debug","msg":"Job doesn't exists in the map for name 1598734420952108938 with schedule {BaseModel:{ID:3 CreatedAt:2022-04-01 18:37:01.571398 +0000 +0000 UpdatedAt:2022-04-01 19:02:16.956318 +0000 +0000 DeletedAt:\u003cnil\u003e} SchedulableEntityKey:{Project:default Domain:development Name:hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-3M0U1} CronExpression: FixedRateValue:1 Unit:MINUTE KickoffTimeInputArg: Active:0xc0001f27e8}  and hence already removed","ts":"2022-04-01T19:18:22Z"}
{"json":{"src":"identifier.go:54"},"level":"debug","msg":"Returning hash for [{ResourceType:UNSPECIFIED Project:default Domain:development Name:cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-ERM5F XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}]: 5967604739973439342","ts":"2022-04-01T19:18:52Z"}
{"json":{"src":"gocron_scheduler.go:148"},"level":"debug","msg":"Job doesn't exists in the map for name 5967604739973439342 with schedule {BaseModel:{ID:1 CreatedAt:2022-04-01 13:42:12.120162 +0000 +0000 UpdatedAt:2022-04-01 13:45:18.437093 +0000 +0000 DeletedAt:\u003cnil\u003e} SchedulableEntityKey:{Project:default Domain:development Name:cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-ERM5F} CronExpression:* * * * * FixedRateValue:0 Unit:MINUTE KickoffTimeInputArg:kickoff_time Active:0xc00077a938}  and hence already removed","ts":"2022-04-01T19:18:52Z"}
{"json":{"src":"identifier.go:54"},"level":"debug","msg":"Returning hash for [{ResourceType:UNSPECIFIED Project:default Domain:development Name:hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-BYUKX XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}]: 13364209728234539433","ts":"2022-04-01T19:18:52Z"}
{"json":{"src":"gocron_scheduler.go:148"},"level":"debug","msg":"Job doesn't exists in the map for name 13364209728234539433 with schedule {BaseModel:{ID:2 CreatedAt:2022-04-01 18:32:29.471284 +0000 +0000 UpdatedAt:2022-04-01 18:32:47.517206 +0000 +0000 DeletedAt:\u003cnil\u003e} SchedulableEntityKey:{Project:default Domain:development Name:hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-BYUKX} CronExpression:* * * * * FixedRateValue:0 Unit:MINUTE KickoffTimeInputArg:kickoff_time Active:0xc00077aa88}  and hence already removed","ts":"2022-04-01T19:18:52Z"}
{"json":{"src":"identifier.go:54"},"level":"debug","msg":"Returning hash for [{ResourceType:UNSPECIFIED Project:default Domain:development Name:hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-3M0U1 XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}]: 1598734420952108938","ts":"2022-04-01T19:18:53Z"}
{"json":{"src":"gocron_scheduler.go:148"},"level":"debug","msg":"Job doesn't exists in the map for name 1598734420952108938 with schedule {BaseModel:{ID:3 CreatedAt:2022-04-01 18:37:01.571398 +0000 +0000 UpdatedAt:2022-04-01 19:02:16.956318 +0000 +0000 DeletedAt:\u003cnil\u003e} SchedulableEntityKey:{Project:default Domain:development Name:hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-3M0U1} CronExpression: FixedRateValue:1 Unit:MINUTE KickoffTimeInputArg: Active:0xc00077ab58}  and hence already removed","ts":"2022-04-01T19:18:53Z"}
k
I wonder if this is a similar issue with the scheduler not handling CronSchedule and CronExpression type schedules
but the logs you pasted indicate that the schedule isn't active in the scheduler's job store
f
It is also not running anymore now after redeploying the helm chart (logs are from after that).
🤞 1
I’m getting more errors now when registering workflows:
Copy code
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.INTERNAL
	details = "failed to create workflow in propeller the server could not find the requested resource (post flyteworkflows.flyte.lyft.com)"
	debug_error_string = "{"created":"@1648841090.727274000","description":"Error received from peer ipv4:127.0.0.1:30081","file":"src/core/lib/surface/call.cc","file_line":904,"grpc_message":"failed to create workflow in propeller the server could not find the requested resource (post flyteworkflows.flyte.lyft.com)","grpc_status":13}"
>
Maybe I bricked something. I think it might be best if I clear the database and the buckets and start fresh, see if the problem persists. Luckily it is not a system where I cannot do that 🙂
k
wait you get that registering workflows or starting executions?
f
Could be starting the execution, my python script first registers using flyteremote and then starts the execution.
Can confirm that the registration works. Execution doesn’t, also via UI:
Please don’t bother, I must have messed something up. I will first start fresh and then see if the problem re-occurs
Have a nice weekend!!
Thanks for the help 🙂
k
this almost looks like an issue with creating the flyteworkflow CRD, can you try
kubectl get fly
and see if that even returns successfully
f
kubectl get fly error: the server doesn’t have a resource type “fly”
Interesting
k
ah okay starting fresh should help 🙂 hope you have a good weekend too, i think i found an issue with the CronSchedule in the scheduler but I'll follow up with Prafulla offline to verify and/or fix
yeah a fresh deploy should fix that 😄
f
ah okay starting fresh should help 🙂 hope you have a good weekend too, i think i found an issue with the CronSchedule in the scheduler but I’ll follow up with Prafulla offline to verify and/or fix
I was using the FixedRateScheduler now.
1
p
Hi @Fabio Grätz, Would suggest for now please use the cron schedule with kick off time until we figure out this descheduling issue .
f
@Emirhan Karagül
186 Views