Any ideas for debugging why a registered cron job ...
# flyte-support
t
Any ideas for debugging why a registered cron job is not running? These green successes were manually queued runs. Normally if I see something in “Frequency” at the top I expect the Cron to run at that set frequency. Thank you!
t
i assume it’s active right?
(i don’t remember if the ui shows that up top if inactive)
getting the status of the launch plan with flytectl or something would be a good first check before diving into logs.
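A rough sketch of that first check in Python, shelling out to flytectl. It assumes flytectl is installed and configured for the cluster. It also assumes, unconfirmed in this thread, that the JSON output carries the state under `closure.state` and omits the field when the plan is inactive. The helper names are hypothetical.

```python
import json
import subprocess


def launch_plan_cmd(project, domain, name):
    """Build a flytectl command that fetches the latest version of a launch plan as JSON."""
    return [
        "flytectl", "get", "launchplan",
        "-p", project, "-d", domain, name,
        "--latest", "-o", "json",
    ]


def launch_plan_state(raw_json):
    """Extract the launch plan state from flytectl JSON output.

    Assumption: the state lives at closure.state, and a missing field
    means the default value (INACTIVE).
    """
    doc = json.loads(raw_json)
    # Output may be a single object or a list of launch plans.
    if isinstance(doc, list):
        if not doc:
            raise ValueError("no launch plan returned")
        doc = doc[0]
    return doc.get("closure", {}).get("state", "INACTIVE")


def check_launch_plan(project, domain, name):
    """Run flytectl and return the reported state (needs cluster access)."""
    result = subprocess.run(
        launch_plan_cmd(project, domain, name),
        check=True, capture_output=True, text=True,
    )
    return launch_plan_state(result.stdout)
```

Calling `check_launch_plan("my-project", "development", "my_cron_lp")` (placeholder names) should return `"ACTIVE"` for a launch plan whose schedule is expected to fire.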
t
Yeah it’s active. In my experience if a launch plan is inactive it never shows up at the top here. Any recommendations as to which logs to check where I would see where the job should be getting queued up?
Never mind, it seems Flyte just got a little bit stuck. I deleted my Flyte binary pod in Kubernetes, and when it came back up it started the cron job. All good! :)
h
Thank you for circling back, @tall-exabyte-99685. It's concerning to have to restart the binary to unstick it. We would like to dig deeper into this. Do you capture logs by any chance? Maybe you upload them to CloudWatch/StackDriver?
t
When it was stuck there weren’t any logs at the time (it had been running for a long time with no cron workflows activated, it’s our dev cluster and our workflows only run in flyte on prod). If it happens again I will capture the logs and post them here, I’m also curious what’s going on
I've seen it stuck both ways now. This time I deactivated a cron job (rather than activating one) and the deactivated job continues to run, until I delete the pod and let it restart. For context, this is the image we're running: cr.flyte.org/flyteorg/flyte-binary-release:v1.11.0 @high-park-82026 @thankful-minister-83577
f
Cc @high-park-82026 / @thankful-minister-83577 this does not seem right
Is it only with Flyte binary vs Flyte full?
Maybe some threading issue
t
We haven’t tried Flyte full since we don’t really have a use for it with our current needs. It doesn’t seem to be an issue that impacts all images though, for example we’ve never seen this on our production cluster, only on our dev cluster (where we run cron jobs much more rarely, really just to test to make sure things work as we expect before we activate on production).
t
still investigating this. if you notice this happening again, logs would be super helpful. logs from around the time the activate/deactivate call was made to the launch plan. Look for something like `Enabled schedules for activated launch plan` or `Activated scheduled entity for`. that should print when you activate the launch plan. if you have database access set up, could you also give us the record in `schedulable_entities` for the newly active launch plan, and after it’s supposed to run also the most recent two entries in `schedule_entities_snapshots`?
Feel free to dm it to us. it shouldn’t have any sensitive information, just names of launch plans.
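For anyone collecting those logs, here is a minimal Python sketch that picks out the two messages quoted above from saved log output. The marker strings come from this thread. The function name and the sample lines are made up for illustration and are not real Flyte output.

```python
# Substrings quoted in the thread; expected when a launch plan is activated.
MARKERS = (
    "Enabled schedules for activated launch plan",
    "Activated scheduled entity for",
)


def find_schedule_log_lines(lines):
    """Return (line_number, text) pairs for lines containing either marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(marker in line for marker in MARKERS):
            hits.append((number, line.rstrip("\n")))
    return hits


# Hypothetical sample lines, only to show the call shape.
sample = [
    "starting up\n",
    "Activated scheduled entity for my_cron_lp\n",
    "unrelated message\n",
]
matches = find_schedule_log_lines(sample)
```

The same function works on a saved log file, for example `find_schedule_log_lines(open("flyte-binary.log"))`, where the filename is a placeholder. If the result is empty for the window around the activate call, that is worth mentioning when sharing the logs.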
f
What version are you running? We had a bug a few versions ago that we fixed.
t
1.11.0
f
@tall-exabyte-99685 my guess is that the Kube API server is returning slow-down or some other error, and hence Flyte is throttling. Have you raised the kubeclient config a lot?