• Fabio Grätz

    Fabio Grätz

    4 months ago
    I need some help getting launchplans with a cronjob configured to work: I first register a launchplan with a
    * * * * *
    cron expression via flyteremote. After that, the flyte UI still shows
    This workflow has no schedules
    , as expected. I then execute
    flytectl update launchplan -p default -d development --version fabio-cron-2a3fb7f-dirty-BQN44 cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 --activate
    which gives:
    updated launchplan successfully on cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0%
    In the UI, the schedule is updated as well, it now says
    Every minute
    . Problem: I’m not seeing any executions. The flyte admin shows this in the logs after I activated the launchplan:
    {"json":{"src":"launch_plan.go:108"},"level":"info","msg":"Adding resource type for unspecified value in request: [%+v]id:\u003cproject:\"default\" domain:\"development\" name:\"cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0\" version:\"fabio-cron-2a3fb7f-dirty-BQN44\" \u003e state:ACTIVE ","ts":"2022-03-31T18:13:01Z"}
    {"json":{"src":"log.go:42"},"level":"warning","msg":"Failed to parse authenticated client metadata when creating audit log","ts":"2022-03-31T18:13:01Z"}
    {"json":{"src":"log.go:91"},"level":"info","msg":"Recording request: [{\"Principal\":{\"Subject\":\"\",\"ClientID\":\"\",\"TokenIssuedAt\":\"0001-01-01T00:00:00Z\"},\"Client\":{\"ClientIP\":\"\"},\"Request\":{\"Method\":\"UpdateLaunchPlan\",\"Parameters\":{\"domain\":\"development\",\"name\":\"cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0\",\"project\":\"default\",\"version\":\"fabio-cron-2a3fb7f-dirty-BQN44\"},\"Mode\":1,\"ReceivedAt\":\"2022-03-31T18:13:01.578706629Z\"},\"Response\":{\"ResponseCode\":\"OK\",\"SentAt\":\"2022-03-31T18:13:01.597983728Z\"}}]","ts":"2022-03-31T18:13:01Z"}
    The
    flytescheduler
    doesn’t show any update in the logs. Does anyone have some guidance where to start debugging this? Thanks!
  • The helm values files contains the following:
    # --
    # Flyte uses a cloud hosted Cron scheduler to run workflows on a schedule. The following module is optional. Without,
    # this module, you will not have scheduled launchplans/workflows.
    workflow_scheduler:
      enabled: true
      type: native
  • Ketan (kumare3)

    Ketan (kumare3)

    4 months ago
    cc @katrina?
  • k

    katrina

    4 months ago
    hey @Fabio Grätz does flytescheduler show other logs? do you have other schedules active?
  • Fabio Grätz

    Fabio Grätz

    4 months ago
    time="2022-03-31T18:01:37Z" level=info msg="Using config file: [/etc/flyte/config/admin.yaml /etc/flyte/config/db.yaml /etc/flyte/config/logger.yaml /etc/flyte/config/server.yaml]"
    {"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [externalevents] updated. No update handler registered.","ts":"2022-03-31T18:01:38Z"}
    {"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [cluster_resources] updated. No update handler registered.","ts":"2022-03-31T18:01:38Z"}
    {"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [queues] updated. No update handler registered.","ts":"2022-03-31T18:01:38Z"}
    {"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [storage] updated. No update handler registered.","ts":"2022-03-31T18:01:38Z"}
    {"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [remotedata] updated. No update handler registered.","ts":"2022-03-31T18:01:38Z"}
    {"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [registration] updated. No update handler registered.","ts":"2022-03-31T18:01:38Z"}
    {"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [task_resources] updated. No update handler registered.","ts":"2022-03-31T18:01:38Z"}
    
    [33m[2022-03-31 18:01:38][0m [35m[info] registering callback `validations:validate` from /go/pkg/mod/github.com/qor/validations@v0.0.0-20171228122639-f364bca61b46/callbacks.go:74[0m [31;1m [0m
    
    [33m[2022-03-31 18:01:38][0m [35m[info] registering callback `validations:validate` from /go/pkg/mod/github.com/qor/validations@v0.0.0-20171228122639-f364bca61b46/callbacks.go:77[0m [31;1m [0m
    {"json":{"src":"client.go:181"},"level":"error","msg":"failed to initialize token source provider. Err: failed to fetch auth metadata. Error: rpc error: code = Unimplemented desc = unknown service flyteidl.service.AuthMetadataService","ts":"2022-03-31T18:01:38Z"}
    {"json":{"src":"client.go:186"},"level":"warning","msg":"Starting an unauthenticated client because: can't create authenticated channel without a TokenSourceProvider","ts":"2022-03-31T18:01:38Z"}
    {"json":{"src":"client.go:65"},"level":"info","msg":"Initialized Admin client","ts":"2022-03-31T18:01:38Z"}
    {"json":{"src":"scheduler.go:60"},"level":"info","msg":"Successfully initialized a native flyte scheduler","ts":"2022-03-31T18:01:38Z"}
    {"json":{"src":"schedule_executor.go:37"},"level":"info","msg":"Flyte native scheduler started successfully","ts":"2022-03-31T18:01:38Z"}
    {"json":{"src":"server.go:96"},"level":"info","msg":"Starting profiling server on port [10254]","ts":"2022-03-31T18:01:38Z"}
    {"json":{"src":"schedule_executor.go:57"},"level":"info","msg":"Number of schedules retrieved 0","ts":"2022-03-31T18:01:38Z"}
  • These are all its logs. No, no other schedules running, trying to get this to work on a schedule for the first time 🙂
  • p

    Prafulla Mahindrakar

    4 months ago
    Hi @Fabio Grätz, sorry that you are running into this issue. Also can you send the admin logs. Also can you post the output for
    flytectl version
  • Fabio Grätz

    Fabio Grätz

    4 months ago
    flytectl version
    
     A new release of flytectl is available: 0.5.8 → v0.5.9
    To upgrade, run: brew update && brew upgrade flytectl
    <https://github.com/flyteorg/flytectl/releases/tag/v0.5.9>
    
    {
      "App": "flytectl",
      "Build": "ebda958",
      "Version": "0.5.8",
      "BuildTime": "2022-04-01 10:22:09.671428 +0200 CEST m=+0.080706115"
    }%
  • p

    Prafulla Mahindrakar

    4 months ago
    And also can you help me with your admin version that you are using in your deployment
  • Fabio Grätz

    Fabio Grätz

    4 months ago
    These are the last 3k lines of admin (has been running for a while).
  • Containers:
      flyteadmin:
        Container ID:  <containerd://f8538b021cabdbf6d3475f20991e259f3bc68ea5c6903c09197a3358dbb1bc3>7
        Image:         <http://cr.flyte.org/flyteorg/flyteadmin-release:v0.18.2|cr.flyte.org/flyteorg/flyteadmin-release:v0.18.2>
  • p

    Prafulla Mahindrakar

    4 months ago
    Thank you . let me check and get back to you on this
  • Fabio Grätz

    Fabio Grätz

    4 months ago
    Thanks a lot, I really do appreciate the help 🙂
  • p

    Prafulla Mahindrakar

    4 months ago
    Also just to confirm your launch is correctly activated. Can you check this for me.
    flytectl get launchplans -p flytesnacks -d development cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 -o yaml
    I tried using same schedule (
    * * * * *
    ) on sample launch plan that we have in our flytesnacks examples here https://github.com/flyteorg/flytesnacks/blob/f7af27e23b3935a166645cf96a68583cdd263a87/cookbook/core/scheduled_workflows/lp_schedules.py#L55 but that worked well for me I deployed the latest sandbox though. One thing i could suggest at this point is to upgrade to latest released flyteadmin. The one you are using is 4 months oldhttps://github.com/flyteorg/flyteadmin/pkgs/container/flyteadmin-release/16570755?tag=v0.19.3
  • Meanwhile i will try to dig more how this can happen
  • One additional thing i would like to check is do you have KickOffTimeInputArg on your schedule eg : cron schedule from flytesnacks with kickofftime
    cron_lp = LaunchPlan.get_or_create(
        name="my_cron_scheduled_lp",
        workflow=date_formatter_wf,
        schedule=CronSchedule(
            # Note that kickoff_time_input_arg matches the workflow input we defined above: kickoff_time
            # But in case you are using the AWS scheme of schedules and not using the native scheduler then switch over the schedule parameter with cron_expression
            schedule="* * * * *",  # Following schedule runs every min
            kickoff_time_input_arg="kickoff_time",
        ),
    )
  • And later i see a lot deactivation attempts in your logs after 7 mins Activation sent at :
    2022-03-31T18:12:32Z
    First deactivation sent at :
    2022-03-31T18:19:51Z
  • The o/p of
    flytectl get launchplans -p flytesnacks -d development cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering  -o yaml
    would be helpful
  • Fabio Grätz

    Fabio Grätz

    4 months ago
    flytectl update launchplan -p default -d development --version fabio-cron-2a3fb7f-dirty-BQN44 cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 --activate
    gives:
    updated launchplan successfully on cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0
    Then:
    flytectl get launchplans -p default -d development cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0  -o yaml
    (Note the different project and workflow name compared to the command you posted) One thing that might be relevant that I just realized: I’m accessing flyte via
    kubectl -n flyte port-forward svc/flyteadmin 30081:81
  • One additional thing i would like to check is do you have KickOffTimeInputArg on your schedule eg : cron schedule from flytesnacks with kickofftime No, I don’t use this arg, is it required?
  • p

    Prafulla Mahindrakar

    4 months ago
    Sorry about that , but there seems to be a bug in the code which you seem to have hit https://github.com/flyteorg/flyteadmin/blob/64da9d3fba663da35fe2354d3b0bb20e519b4456/pkg/manager/impl/launch_plan_manager.go#L140 So CronExpression is the legacy way of defining schedules for which this function has a check but it doesn’t have a check for ScheduleExpression which is new way of defining schedules. And hence your schedule is not getting enabled. To unblock you can add a kick off time input arg as mentioned in flytesnacks example .(Fixed rate schedule doesn’t have this bug)
  • I will create an issue and we will have this released soon. Meanwhile please use the workaround for now
  • Fabio Grätz

    Fabio Grätz

    4 months ago
    Ok perfect, this is what i will try, thanks for debugging so quickly!!!
  • k

    katrina

    4 months ago
    thank you @Prafulla Mahindrakar!
  • Fabio Grätz

    Fabio Grätz

    4 months ago
    I fear I might need your help again @Prafulla Mahindrakar I was able to create a cronjob via FixedRate but now I have the problem that the cronjob won’t stop anymore 🙈 First, it showed “Every 1 minutes” in the UI. After
    flytectl update … --archive
    it stopped showing that but still runs the workflow every minute.
    flytectl update launchplan -p default -d development --version fabio-cron-2a3fb7f-dirty-3M0U1 hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 --archive
    gives
    updated launchplan successfully on hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0
    But the command doesn’t cause any logs in the flytescheduler
  • message has been deleted
  • Ketan (kumare3)

    Ketan (kumare3)

    4 months ago
    what?
  • cc @katrina
  • Fabio Grätz

    Fabio Grätz

    4 months ago
  • k

    katrina

    4 months ago
    was just about to ask for logs 🙂 thanks
  • {"json":{"lp":"hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0","src":"event_scheduler_impl.go:81"},"level":"info","msg":"Deactivated the schedule {{UNSPECIFIED default development hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 fabio-cron-2a3fb7f-dirty-3M0U1 {} [] 0} } in the scheduler","ts":"2022-04-01T18:51:27Z"}
    {"json":{"lp":"hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0","src":"launch_plan_manager.go:265"},"level":"debug","msg":"disabled launch plan: [resource_type:LAUNCH_PLAN project:\"default\" domain:\"development\" name:\"hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0\" version:\"fabio-cron-2a3fb7f-dirty-3M0U1\" ]","ts":"2022-04-01T18:51:27Z"}
    this looks right, why is scheduler still running it?
  • Fabio Grätz

    Fabio Grätz

    4 months ago
    Not sure
  • They all have this version:
  • And there appears to be only one launchplan for that:
    flytectl -p default -d development get launchplan | grep 3M0U1
    | fabio-cron-2a3fb7f-dirty-3M0U1                     | hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0      | LAUNCH_PLAN | 2022-04-01T18:36:28.931942Z |
  • Is there a way to deactivate all launchplans?
  • pyflyte lp --help
    suggests that there only is a way to activate-all.
  • k

    katrina

    4 months ago
    no, but at most one launch plan should be active across {project}-{domain}-{name} versions
  • Fabio Grätz

    Fabio Grätz

    4 months ago
    I redployed the helm chart (database and buckets are persistent of course) and now the workflow is not executed anymore. Flyte scheduler still shows these logs:
    {"json":{"src":"identifier.go:54"},"level":"debug","msg":"Returning hash for [{ResourceType:UNSPECIFIED Project:default Domain:development Name:cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-ERM5F XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}]: 5967604739973439342","ts":"2022-04-01T19:18:22Z"}
    {"json":{"src":"gocron_scheduler.go:148"},"level":"debug","msg":"Job doesn't exists in the map for name 5967604739973439342 with schedule {BaseModel:{ID:1 CreatedAt:2022-04-01 13:42:12.120162 +0000 +0000 UpdatedAt:2022-04-01 13:45:18.437093 +0000 +0000 DeletedAt:\u003cnil\u003e} SchedulableEntityKey:{Project:default Domain:development Name:cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-ERM5F} CronExpression:* * * * * FixedRateValue:0 Unit:MINUTE KickoffTimeInputArg:kickoff_time Active:0xc0001f24f8}  and hence already removed","ts":"2022-04-01T19:18:22Z"}
    {"json":{"src":"identifier.go:54"},"level":"debug","msg":"Returning hash for [{ResourceType:UNSPECIFIED Project:default Domain:development Name:hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-BYUKX XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}]: 13364209728234539433","ts":"2022-04-01T19:18:22Z"}
    {"json":{"src":"gocron_scheduler.go:148"},"level":"debug","msg":"Job doesn't exists in the map for name 13364209728234539433 with schedule {BaseModel:{ID:2 CreatedAt:2022-04-01 18:32:29.471284 +0000 +0000 UpdatedAt:2022-04-01 18:32:47.517206 +0000 +0000 DeletedAt:\u003cnil\u003e} SchedulableEntityKey:{Project:default Domain:development Name:hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-BYUKX} CronExpression:* * * * * FixedRateValue:0 Unit:MINUTE KickoffTimeInputArg:kickoff_time Active:0xc0001f26a8}  and hence already removed","ts":"2022-04-01T19:18:22Z"}
    {"json":{"src":"identifier.go:54"},"level":"debug","msg":"Returning hash for [{ResourceType:UNSPECIFIED Project:default Domain:development Name:hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-3M0U1 XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}]: 1598734420952108938","ts":"2022-04-01T19:18:22Z"}
    {"json":{"src":"gocron_scheduler.go:148"},"level":"debug","msg":"Job doesn't exists in the map for name 1598734420952108938 with schedule {BaseModel:{ID:3 CreatedAt:2022-04-01 18:37:01.571398 +0000 +0000 UpdatedAt:2022-04-01 19:02:16.956318 +0000 +0000 DeletedAt:\u003cnil\u003e} SchedulableEntityKey:{Project:default Domain:development Name:hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-3M0U1} CronExpression: FixedRateValue:1 Unit:MINUTE KickoffTimeInputArg: Active:0xc0001f27e8}  and hence already removed","ts":"2022-04-01T19:18:22Z"}
    {"json":{"src":"identifier.go:54"},"level":"debug","msg":"Returning hash for [{ResourceType:UNSPECIFIED Project:default Domain:development Name:cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-ERM5F XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}]: 5967604739973439342","ts":"2022-04-01T19:18:52Z"}
    {"json":{"src":"gocron_scheduler.go:148"},"level":"debug","msg":"Job doesn't exists in the map for name 5967604739973439342 with schedule {BaseModel:{ID:1 CreatedAt:2022-04-01 13:42:12.120162 +0000 +0000 UpdatedAt:2022-04-01 13:45:18.437093 +0000 +0000 DeletedAt:\u003cnil\u003e} SchedulableEntityKey:{Project:default Domain:development Name:cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-ERM5F} CronExpression:* * * * * FixedRateValue:0 Unit:MINUTE KickoffTimeInputArg:kickoff_time Active:0xc00077a938}  and hence already removed","ts":"2022-04-01T19:18:52Z"}
    {"json":{"src":"identifier.go:54"},"level":"debug","msg":"Returning hash for [{ResourceType:UNSPECIFIED Project:default Domain:development Name:hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-BYUKX XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}]: 13364209728234539433","ts":"2022-04-01T19:18:52Z"}
    {"json":{"src":"gocron_scheduler.go:148"},"level":"debug","msg":"Job doesn't exists in the map for name 13364209728234539433 with schedule {BaseModel:{ID:2 CreatedAt:2022-04-01 18:32:29.471284 +0000 +0000 UpdatedAt:2022-04-01 18:32:47.517206 +0000 +0000 DeletedAt:\u003cnil\u003e} SchedulableEntityKey:{Project:default Domain:development Name:hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-BYUKX} CronExpression:* * * * * FixedRateValue:0 Unit:MINUTE KickoffTimeInputArg:kickoff_time Active:0xc00077aa88}  and hence already removed","ts":"2022-04-01T19:18:52Z"}
    {"json":{"src":"identifier.go:54"},"level":"debug","msg":"Returning hash for [{ResourceType:UNSPECIFIED Project:default Domain:development Name:hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-3M0U1 XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}]: 1598734420952108938","ts":"2022-04-01T19:18:53Z"}
    {"json":{"src":"gocron_scheduler.go:148"},"level":"debug","msg":"Job doesn't exists in the map for name 1598734420952108938 with schedule {BaseModel:{ID:3 CreatedAt:2022-04-01 18:37:01.571398 +0000 +0000 UpdatedAt:2022-04-01 19:02:16.956318 +0000 +0000 DeletedAt:\u003cnil\u003e} SchedulableEntityKey:{Project:default Domain:development Name:hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-3M0U1} CronExpression: FixedRateValue:1 Unit:MINUTE KickoffTimeInputArg: Active:0xc00077ab58}  and hence already removed","ts":"2022-04-01T19:18:53Z"}
  • k

    katrina

    4 months ago
    I wonder if this is a similar issue with the scheduler not handling CronSchedule and CronExpression type schedules
  • but the logs you pasted indicate that the schedule isn't active in the scheduler's job store
  • Fabio Grätz

    Fabio Grätz

    4 months ago
    It is also not running anymore now after redeploying the helm chart (logs are from after that).
  • I’m getting more errors now when registering workflows:
    grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    	status = StatusCode.INTERNAL
    	details = "failed to create workflow in propeller the server could not find the requested resource (post flyteworkflows.flyte.lyft.com)"
    	debug_error_string = "{"created":"@1648841090.727274000","description":"Error received from peer ipv4:127.0.0.1:30081","file":"src/core/lib/surface/call.cc","file_line":904,"grpc_message":"failed to create workflow in propeller the server could not find the requested resource (post flyteworkflows.flyte.lyft.com)","grpc_status":13}"
    >
    Maybe I bricked something. I think it might be best if I clear the database and the buckets and start fresh, see if the problem persists. Luckily it is not a system where I cannot do that 🙂
  • k

    katrina

    4 months ago
    wait you get that registering workflows or starting executions?
  • Fabio Grätz

    Fabio Grätz

    4 months ago
    Could be starting the execution, my python script first registers using flyteremote and then starts the execution.
  • Can confirm that the registration works. Execution doesn’t, also via UI:
  • Please don’t bother, I must have messed something up. I will first start fresh and then see if the problem re-occurs
  • Have a nice weekend!!
  • Thanks for the help 🙂
  • k

    katrina

    4 months ago
    this almost looks like an issue with creating the flyteworkflow CRD, can you try
    kubectl get fly
    and see if that even returns successfully
  • Fabio Grätz

    Fabio Grätz

    4 months ago
    kubectl get fly error: the server doesn’t have a resource type “fly”
  • Interesting
  • k

    katrina

    4 months ago
    ah okay starting fresh should help 🙂 hope you have a good weekend too, i think i found an issue with the CronSchedule in the scheduler but I'll follow up with Prafulla offline to verify and/or fix
  • yeah a fresh deploy should fix that 😄
  • Fabio Grätz

    Fabio Grätz

    4 months ago
    ah okay starting fresh should help 🙂 hope you have a good weekend too, i think i found an issue with the CronSchedule in the scheduler but I’ll follow up with Prafulla offline to verify and/or fix I was using the FixedRateScheduler now.
  • p

    Prafulla Mahindrakar

    4 months ago
    Hi @Fabio Grätz, Would suggest for now please use the cron schedule with kick off time until we figure out this descheduling issue .
  • Fabio Grätz

    Fabio Grätz

    4 months ago
    @Emirhan Karagül