Fabio Grätz

    Fabio Grätz

    5 months ago
    I need some help getting launchplans with a cronjob configured to work: I first register a launchplan with a
    * * * * *
    cron expression via flyteremote. After that, the flyte UI still shows
    This workflow has no schedules
    , as expected. I then execute
    flytectl update launchplan -p default -d development --version fabio-cron-2a3fb7f-dirty-BQN44 cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 --activate
    which gives:
    updated launchplan successfully on cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0%
    In the UI, the schedule is updated as well, it now says
    Every minute
    . Problem: I’m not seeing any executions. The flyte admin shows this in the logs after I activated the launchplan:
    {"json":{"src":"launch_plan.go:108"},"level":"info","msg":"Adding resource type for unspecified value in request: [%+v]id:\u003cproject:\"default\" domain:\"development\" name:\"cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0\" version:\"fabio-cron-2a3fb7f-dirty-BQN44\" \u003e state:ACTIVE ","ts":"2022-03-31T18:13:01Z"}
    {"json":{"src":"log.go:42"},"level":"warning","msg":"Failed to parse authenticated client metadata when creating audit log","ts":"2022-03-31T18:13:01Z"}
    {"json":{"src":"log.go:91"},"level":"info","msg":"Recording request: [{\"Principal\":{\"Subject\":\"\",\"ClientID\":\"\",\"TokenIssuedAt\":\"0001-01-01T00:00:00Z\"},\"Client\":{\"ClientIP\":\"\"},\"Request\":{\"Method\":\"UpdateLaunchPlan\",\"Parameters\":{\"domain\":\"development\",\"name\":\"cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0\",\"project\":\"default\",\"version\":\"fabio-cron-2a3fb7f-dirty-BQN44\"},\"Mode\":1,\"ReceivedAt\":\"2022-03-31T18:13:01.578706629Z\"},\"Response\":{\"ResponseCode\":\"OK\",\"SentAt\":\"2022-03-31T18:13:01.597983728Z\"}}]","ts":"2022-03-31T18:13:01Z"}
    The
    flytescheduler
    doesn’t show any update in the logs. Does anyone have some guidance where to start debugging this? Thanks!
    The helm values files contains the following:
    # --
    # Flyte uses a cloud hosted Cron scheduler to run workflows on a schedule. The following module is optional. Without,
    # this module, you will not have scheduled launchplans/workflows.
    workflow_scheduler:
      enabled: true
      type: native
    Ketan (kumare3)

    Ketan (kumare3)

    5 months ago
    cc @katrina?
    k

    katrina

    5 months ago
    hey @Fabio Grätz does flytescheduler show other logs? do you have other schedules active?
    Fabio Grätz

    Fabio Grätz

    5 months ago
    time="2022-03-31T18:01:37Z" level=info msg="Using config file: [/etc/flyte/config/admin.yaml /etc/flyte/config/db.yaml /etc/flyte/config/logger.yaml /etc/flyte/config/server.yaml]"
    {"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [externalevents] updated. No update handler registered.","ts":"2022-03-31T18:01:38Z"}
    {"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [cluster_resources] updated. No update handler registered.","ts":"2022-03-31T18:01:38Z"}
    {"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [queues] updated. No update handler registered.","ts":"2022-03-31T18:01:38Z"}
    {"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [storage] updated. No update handler registered.","ts":"2022-03-31T18:01:38Z"}
    {"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [remotedata] updated. No update handler registered.","ts":"2022-03-31T18:01:38Z"}
    {"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [registration] updated. No update handler registered.","ts":"2022-03-31T18:01:38Z"}
    {"json":{"src":"viper.go:398"},"level":"debug","msg":"Config section [task_resources] updated. No update handler registered.","ts":"2022-03-31T18:01:38Z"}
    
    [33m[2022-03-31 18:01:38][0m [35m[info] registering callback `validations:validate` from /go/pkg/mod/github.com/qor/validations@v0.0.0-20171228122639-f364bca61b46/callbacks.go:74[0m [31;1m [0m
    
    [33m[2022-03-31 18:01:38][0m [35m[info] registering callback `validations:validate` from /go/pkg/mod/github.com/qor/validations@v0.0.0-20171228122639-f364bca61b46/callbacks.go:77[0m [31;1m [0m
    {"json":{"src":"client.go:181"},"level":"error","msg":"failed to initialize token source provider. Err: failed to fetch auth metadata. Error: rpc error: code = Unimplemented desc = unknown service flyteidl.service.AuthMetadataService","ts":"2022-03-31T18:01:38Z"}
    {"json":{"src":"client.go:186"},"level":"warning","msg":"Starting an unauthenticated client because: can't create authenticated channel without a TokenSourceProvider","ts":"2022-03-31T18:01:38Z"}
    {"json":{"src":"client.go:65"},"level":"info","msg":"Initialized Admin client","ts":"2022-03-31T18:01:38Z"}
    {"json":{"src":"scheduler.go:60"},"level":"info","msg":"Successfully initialized a native flyte scheduler","ts":"2022-03-31T18:01:38Z"}
    {"json":{"src":"schedule_executor.go:37"},"level":"info","msg":"Flyte native scheduler started successfully","ts":"2022-03-31T18:01:38Z"}
    {"json":{"src":"server.go:96"},"level":"info","msg":"Starting profiling server on port [10254]","ts":"2022-03-31T18:01:38Z"}
    {"json":{"src":"schedule_executor.go:57"},"level":"info","msg":"Number of schedules retrieved 0","ts":"2022-03-31T18:01:38Z"}
    These are all its logs. No, no other schedules running, trying to get this to work on a schedule for the first time 🙂
    p

    Prafulla Mahindrakar

    5 months ago
    Hi @Fabio Grätz, sorry that you are running into this issue. Also can you send the admin logs. Also can you post the output for
    flytectl version
    Fabio Grätz

    Fabio Grätz

    5 months ago
    flytectl version
    
     A new release of flytectl is available: 0.5.8 → v0.5.9
    To upgrade, run: brew update && brew upgrade flytectl
    <https://github.com/flyteorg/flytectl/releases/tag/v0.5.9>
    
    {
      "App": "flytectl",
      "Build": "ebda958",
      "Version": "0.5.8",
      "BuildTime": "2022-04-01 10:22:09.671428 +0200 CEST m=+0.080706115"
    }%
    p

    Prafulla Mahindrakar

    5 months ago
    And also can you help me with your admin version that you are using in your deployment
    Fabio Grätz

    Fabio Grätz

    5 months ago
    These are the last 3k lines of admin (has been running for a while).
    Containers:
      flyteadmin:
        Container ID:  <containerd://f8538b021cabdbf6d3475f20991e259f3bc68ea5c6903c09197a3358dbb1bc3>7
        Image:         <http://cr.flyte.org/flyteorg/flyteadmin-release:v0.18.2|cr.flyte.org/flyteorg/flyteadmin-release:v0.18.2>
    p

    Prafulla Mahindrakar

    5 months ago
    Thank you . let me check and get back to you on this
    Fabio Grätz

    Fabio Grätz

    5 months ago
    Thanks a lot, I really do appreciate the help 🙂
    p

    Prafulla Mahindrakar

    5 months ago
    Also just to confirm your launch is correctly activated. Can you check this for me.
    flytectl get launchplans -p flytesnacks -d development cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 -o yaml
    I tried using same schedule (
    * * * * *
    ) on sample launch plan that we have in our flytesnacks examples here https://github.com/flyteorg/flytesnacks/blob/f7af27e23b3935a166645cf96a68583cdd263a87/cookbook/core/scheduled_workflows/lp_schedules.py#L55 but that worked well for me I deployed the latest sandbox though. One thing i could suggest at this point is to upgrade to latest released flyteadmin. The one you are using is 4 months oldhttps://github.com/flyteorg/flyteadmin/pkgs/container/flyteadmin-release/16570755?tag=v0.19.3
    Meanwhile i will try to dig more how this can happen
    One additional thing i would like to check is do you have KickOffTimeInputArg on your schedule eg : cron schedule from flytesnacks with kickofftime
    cron_lp = LaunchPlan.get_or_create(
        name="my_cron_scheduled_lp",
        workflow=date_formatter_wf,
        schedule=CronSchedule(
            # Note that kickoff_time_input_arg matches the workflow input we defined above: kickoff_time
            # But in case you are using the AWS scheme of schedules and not using the native scheduler then switch over the schedule parameter with cron_expression
            schedule="* * * * *",  # Following schedule runs every min
            kickoff_time_input_arg="kickoff_time",
        ),
    )
    And later i see a lot deactivation attempts in your logs after 7 mins Activation sent at :
    2022-03-31T18:12:32Z
    First deactivation sent at :
    2022-03-31T18:19:51Z
    The o/p of
    flytectl get launchplans -p flytesnacks -d development cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering  -o yaml
    would be helpful
    Fabio Grätz

    Fabio Grätz

    5 months ago
    flytectl update launchplan -p default -d development --version fabio-cron-2a3fb7f-dirty-BQN44 cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 --activate
    gives:
    updated launchplan successfully on cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0
    Then:
    flytectl get launchplans -p default -d development cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0  -o yaml
    (Note the different project and workflow name compared to the command you posted) One thing that might be relevant that I just realized: I’m accessing flyte via
    kubectl -n flyte port-forward svc/flyteadmin 30081:81
    One additional thing i would like to check is do you have KickOffTimeInputArg on your schedule
    eg : cron schedule from flytesnacks with kickofftime
    No, I don’t use this arg, is it required?
    p

    Prafulla Mahindrakar

    5 months ago
    Sorry about that , but there seems to be a bug in the code which you seem to have hit https://github.com/flyteorg/flyteadmin/blob/64da9d3fba663da35fe2354d3b0bb20e519b4456/pkg/manager/impl/launch_plan_manager.go#L140 So CronExpression is the legacy way of defining schedules for which this function has a check but it doesn’t have a check for ScheduleExpression which is new way of defining schedules. And hence your schedule is not getting enabled. To unblock you can add a kick off time input arg as mentioned in flytesnacks example .(Fixed rate schedule doesn’t have this bug)
    I will create an issue and we will have this released soon. Meanwhile please use the workaround for now
    Fabio Grätz

    Fabio Grätz

    5 months ago
    Ok perfect, this is what i will try, thanks for debugging so quickly!!!
    k

    katrina

    5 months ago
    thank you @Prafulla Mahindrakar!
    Fabio Grätz

    Fabio Grätz

    5 months ago
    I fear I might need your help again @Prafulla Mahindrakar I was able to create a cronjob via FixedRate but now I have the problem that the cronjob won’t stop anymore 🙈 First, it showed “Every 1 minutes” in the UI. After
    flytectl update … --archive
    it stopped showing that but still runs the workflow every minute.
    flytectl update launchplan -p default -d development --version fabio-cron-2a3fb7f-dirty-3M0U1 hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 --archive
    gives
    updated launchplan successfully on hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0
    But the command doesn’t cause any logs in the flytescheduler
    message has been deleted
    Ketan (kumare3)

    Ketan (kumare3)

    5 months ago
    what?
    cc @katrina
    Fabio Grätz

    Fabio Grätz

    5 months ago
    k

    katrina

    5 months ago
    was just about to ask for logs 🙂 thanks
    {"json":{"lp":"hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0","src":"event_scheduler_impl.go:81"},"level":"info","msg":"Deactivated the schedule {{UNSPECIFIED default development hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 fabio-cron-2a3fb7f-dirty-3M0U1 {} [] 0} } in the scheduler","ts":"2022-04-01T18:51:27Z"}
    {"json":{"lp":"hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0","src":"launch_plan_manager.go:265"},"level":"debug","msg":"disabled launch plan: [resource_type:LAUNCH_PLAN project:\"default\" domain:\"development\" name:\"hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0\" version:\"fabio-cron-2a3fb7f-dirty-3M0U1\" ]","ts":"2022-04-01T18:51:27Z"}
    this looks right, why is scheduler still running it?
    Fabio Grätz

    Fabio Grätz

    5 months ago
    Not sure
    They all have this version:
    And there appears to be only one launchplan for that:
    flytectl -p default -d development get launchplan | grep 3M0U1
    | fabio-cron-2a3fb7f-dirty-3M0U1                     | hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0      | LAUNCH_PLAN | 2022-04-01T18:36:28.931942Z |
    Is there a way to deactivate all launchplans?
    pyflyte lp --help
    suggests that there only is a way to activate-all.
    k

    katrina

    5 months ago
    no, but at most one launch plan should be active across {project}-{domain}-{name} versions
    Fabio Grätz

    Fabio Grätz

    5 months ago
    I redployed the helm chart (database and buckets are persistent of course) and now the workflow is not executed anymore. Flyte scheduler still shows these logs:
    {"json":{"src":"identifier.go:54"},"level":"debug","msg":"Returning hash for [{ResourceType:UNSPECIFIED Project:default Domain:development Name:cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-ERM5F XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}]: 5967604739973439342","ts":"2022-04-01T19:18:22Z"}
    {"json":{"src":"gocron_scheduler.go:148"},"level":"debug","msg":"Job doesn't exists in the map for name 5967604739973439342 with schedule {BaseModel:{ID:1 CreatedAt:2022-04-01 13:42:12.120162 +0000 +0000 UpdatedAt:2022-04-01 13:45:18.437093 +0000 +0000 DeletedAt:\u003cnil\u003e} SchedulableEntityKey:{Project:default Domain:development Name:cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-ERM5F} CronExpression:* * * * * FixedRateValue:0 Unit:MINUTE KickoffTimeInputArg:kickoff_time Active:0xc0001f24f8}  and hence already removed","ts":"2022-04-01T19:18:22Z"}
    {"json":{"src":"identifier.go:54"},"level":"debug","msg":"Returning hash for [{ResourceType:UNSPECIFIED Project:default Domain:development Name:hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-BYUKX XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}]: 13364209728234539433","ts":"2022-04-01T19:18:22Z"}
    {"json":{"src":"gocron_scheduler.go:148"},"level":"debug","msg":"Job doesn't exists in the map for name 13364209728234539433 with schedule {BaseModel:{ID:2 CreatedAt:2022-04-01 18:32:29.471284 +0000 +0000 UpdatedAt:2022-04-01 18:32:47.517206 +0000 +0000 DeletedAt:\u003cnil\u003e} SchedulableEntityKey:{Project:default Domain:development Name:hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-BYUKX} CronExpression:* * * * * FixedRateValue:0 Unit:MINUTE KickoffTimeInputArg:kickoff_time Active:0xc0001f26a8}  and hence already removed","ts":"2022-04-01T19:18:22Z"}
    {"json":{"src":"identifier.go:54"},"level":"debug","msg":"Returning hash for [{ResourceType:UNSPECIFIED Project:default Domain:development Name:hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-3M0U1 XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}]: 1598734420952108938","ts":"2022-04-01T19:18:22Z"}
    {"json":{"src":"gocron_scheduler.go:148"},"level":"debug","msg":"Job doesn't exists in the map for name 1598734420952108938 with schedule {BaseModel:{ID:3 CreatedAt:2022-04-01 18:37:01.571398 +0000 +0000 UpdatedAt:2022-04-01 19:02:16.956318 +0000 +0000 DeletedAt:\u003cnil\u003e} SchedulableEntityKey:{Project:default Domain:development Name:hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-3M0U1} CronExpression: FixedRateValue:1 Unit:MINUTE KickoffTimeInputArg: Active:0xc0001f27e8}  and hence already removed","ts":"2022-04-01T19:18:22Z"}
    {"json":{"src":"identifier.go:54"},"level":"debug","msg":"Returning hash for [{ResourceType:UNSPECIFIED Project:default Domain:development Name:cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-ERM5F XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}]: 5967604739973439342","ts":"2022-04-01T19:18:52Z"}
    {"json":{"src":"gocron_scheduler.go:148"},"level":"debug","msg":"Job doesn't exists in the map for name 5967604739973439342 with schedule {BaseModel:{ID:1 CreatedAt:2022-04-01 13:42:12.120162 +0000 +0000 UpdatedAt:2022-04-01 13:45:18.437093 +0000 +0000 DeletedAt:\u003cnil\u003e} SchedulableEntityKey:{Project:default Domain:development Name:cron_hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-ERM5F} CronExpression:* * * * * FixedRateValue:0 Unit:MINUTE KickoffTimeInputArg:kickoff_time Active:0xc00077a938}  and hence already removed","ts":"2022-04-01T19:18:52Z"}
    {"json":{"src":"identifier.go:54"},"level":"debug","msg":"Returning hash for [{ResourceType:UNSPECIFIED Project:default Domain:development Name:hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-BYUKX XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}]: 13364209728234539433","ts":"2022-04-01T19:18:52Z"}
    {"json":{"src":"gocron_scheduler.go:148"},"level":"debug","msg":"Job doesn't exists in the map for name 13364209728234539433 with schedule {BaseModel:{ID:2 CreatedAt:2022-04-01 18:32:29.471284 +0000 +0000 UpdatedAt:2022-04-01 18:32:47.517206 +0000 +0000 DeletedAt:\u003cnil\u003e} SchedulableEntityKey:{Project:default Domain:development Name:hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-BYUKX} CronExpression:* * * * * FixedRateValue:0 Unit:MINUTE KickoffTimeInputArg:kickoff_time Active:0xc00077aa88}  and hence already removed","ts":"2022-04-01T19:18:52Z"}
    {"json":{"src":"identifier.go:54"},"level":"debug","msg":"Returning hash for [{ResourceType:UNSPECIFIED Project:default Domain:development Name:hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-3M0U1 XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}]: 1598734420952108938","ts":"2022-04-01T19:18:53Z"}
    {"json":{"src":"gocron_scheduler.go:148"},"level":"debug","msg":"Job doesn't exists in the map for name 1598734420952108938 with schedule {BaseModel:{ID:3 CreatedAt:2022-04-01 18:37:01.571398 +0000 +0000 UpdatedAt:2022-04-01 19:02:16.956318 +0000 +0000 DeletedAt:\u003cnil\u003e} SchedulableEntityKey:{Project:default Domain:development Name:hydra_workflow_cfg_flyte_workflows.collaborative_filtering.workflow_0 Version:fabio-cron-2a3fb7f-dirty-3M0U1} CronExpression: FixedRateValue:1 Unit:MINUTE KickoffTimeInputArg: Active:0xc00077ab58}  and hence already removed","ts":"2022-04-01T19:18:53Z"}
    k

    katrina

    5 months ago
    I wonder if this is a similar issue with the scheduler not handling CronSchedule and CronExpression type schedules
    but the logs you pasted indicate that the schedule isn't active in the scheduler's job store
    Fabio Grätz

    Fabio Grätz

    5 months ago
    It is also not running anymore now after redeploying the helm chart (logs are from after that).
    I’m getting more errors now when registering workflows:
    grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    	status = StatusCode.INTERNAL
    	details = "failed to create workflow in propeller the server could not find the requested resource (post flyteworkflows.flyte.lyft.com)"
    	debug_error_string = "{"created":"@1648841090.727274000","description":"Error received from peer ipv4:127.0.0.1:30081","file":"src/core/lib/surface/call.cc","file_line":904,"grpc_message":"failed to create workflow in propeller the server could not find the requested resource (post flyteworkflows.flyte.lyft.com)","grpc_status":13}"
    >
    Maybe I bricked something. I think it might be best if I clear the database and the buckets and start fresh, see if the problem persists. Luckily it is not a system where I cannot do that 🙂
    k

    katrina

    5 months ago
    wait you get that registering workflows or starting executions?
    Fabio Grätz

    Fabio Grätz

    5 months ago
    Could be starting the execution, my python script first registers using flyteremote and then starts the execution.
    Can confirm that the registration works. Execution doesn’t, also via UI:
    Please don’t bother, I must have messed something up. I will first start fresh and then see if the problem re-occurs
    Have a nice weekend!!
    Thanks for the help 🙂
    k

    katrina

    5 months ago
    this almost looks like an issue with creating the flyteworkflow CRD, can you try
    kubectl get fly
    and see if that even returns successfully
    Fabio Grätz

    Fabio Grätz

    5 months ago
    kubectl get fly error: the server doesn’t have a resource type “fly”
    Interesting
    k

    katrina

    5 months ago
    ah okay starting fresh should help 🙂 hope you have a good weekend too, i think i found an issue with the CronSchedule in the scheduler but I'll follow up with Prafulla offline to verify and/or fix
    yeah a fresh deploy should fix that 😄
    Fabio Grätz

    Fabio Grätz

    5 months ago
    ah okay starting fresh should help 🙂 hope you have a good weekend too, i think i found an issue with the CronSchedule in the scheduler but I’ll follow up with Prafulla offline to verify and/or fix
    I was using the FixedRateScheduler now.
    p

    Prafulla Mahindrakar

    5 months ago
    Hi @Fabio Grätz, Would suggest for now please use the cron schedule with kick off time until we figure out this descheduling issue .
    Fabio Grätz

    Fabio Grätz

    5 months ago
    @Emirhan Karagül