# flyte-deployment
I’ve been trying to get runtime PodTemplates working, but I’m struggling. I have deployed the following PodTemplate in the flyte-backend namespace, which is where the Flyte binary is deployed. I’ve also deployed it in the Flyte project/domain workspace, but the same issue happens.
apiVersion: v1
kind: PodTemplate
metadata:
  name: flyte-template-test
  namespace: flyte-backend
template:
  spec:
    containers:
      - name: default
        image: docker.io/rwgrim/docker-noop
        terminationMessagePath: "/dev/foo"
    hostNetwork: false
Then I try and use it with a super simple test:
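from flytekit import task
@task(pod_template_name="flyte-template-test")  # name taken from the error message below
def check() -> bool:
    return True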
Submit it, and it errors with
Workflow[playground:development:using_templates.template_wf.gpu_workflow] failed. RuntimeExecutionError: max number of system retry attempts [11/10] exhausted. Last known status message: failed at Node[n0]. RuntimeExecutionError: failed during plugin execution, caused by: failed to execute handle for plugin [container]: [BadTaskSpecification] PodTemplate 'flyte-template-test' does not exist
I’ve been through the permissions, and the default cluster role should have permission?
      - podtemplates
  - verbs:
      - create
      - delete
      - deletecollection
      - get
      - list
      - patch
      - post
      - update
      - watch
Does anyone have any pointers? Should be able to do some really cool stuff once I’ve got this working
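For reference, a watch-based cache generally needs the list and watch verbs (and usually get) on the resource it tracks. Below is a minimal sketch of checking a role's rules for that. It assumes the rules are available as plain dictionaries shaped like the `rules` entries of a Kubernetes ClusterRole. The helper names `_has`, `rule_grants` and `allows` are hypothetical and not part of any Flyte or Kubernetes library.

```python
from typing import Iterable, Mapping, Sequence

Rule = Mapping[str, Sequence[str]]


def _has(values: Sequence[str], wanted: str) -> bool:
    # RBAC treats "*" as a wildcard for API groups, resources and verbs.
    return "*" in values or wanted in values


def rule_grants(rule: Rule, api_group: str, resource: str, verb: str) -> bool:
    # A single rule grants a verb only if group, resource and verb all match.
    return (
        _has(rule.get("apiGroups", []), api_group)
        and _has(rule.get("resources", []), resource)
        and _has(rule.get("verbs", []), verb)
    )


def allows(rules: Iterable[Rule], api_group: str, resource: str, verbs: Iterable[str]) -> bool:
    # RBAC rules are additive, so each verb may be granted by a different rule.
    rules = list(rules)
    return all(
        any(rule_grants(rule, api_group, resource, verb) for rule in rules)
        for verb in verbs
    )


# Illustrative rule modelled on the verbs quoted above.
# PodTemplate lives in the core ("") API group.
example_rules = [
    {"apiGroups": [""], "resources": ["podtemplates"], "verbs": ["get", "list", "watch"]},
]
print(allows(example_rules, "", "podtemplates", ["get", "list", "watch"]))
```

On a live cluster, `kubectl auth can-i watch podtemplates --as=system:serviceaccount:<namespace>:<service-account>` answers the same question directly.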
basic question... you confirmed the pod template was actually deployed?
Yep… unless I’m missing something super basic?
ok and the default cluster role is bound to the flyte service account?
and they're in the same namespace?
Yep, that’s all in the flyte-backend namespace, all pretty much default as per the flyte-binary helm chart 😬
Well that's the low-hanging fruit, I'm about to try this myself so will let you know how it goes
Thanks, appreciate it 🙏
I should add as well, I have logging maxed out and it doesn’t seem to be spitting anything out with respect to this 🤨
@Michael Tinsley with debug logging on flytepropeller it should output a log message like
registered PodTemplate '%s:%s' in store
where '%s:%s' is the namespace and PodTemplate name, if it is added to the cache. If this message doesn't appear, then something is blocking propeller from seeing it.
Also, do you know what versions of flytekit and flytepropeller you are running? This is a relatively new feature.
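A quick way to check for that message is to scan the propeller logs for it. The sketch below builds a regular expression from the format string quoted above. The function name `registered_templates` and the sample log lines are hypothetical; real propeller output will wrap the message in additional fields.

```python
import re
from typing import Iterable, Set, Tuple

# Pattern derived from the quoted format string:
#   registered PodTemplate '%s:%s' in store
REGISTERED = re.compile(r"registered PodTemplate '([^':]+):([^']+)' in store")


def registered_templates(lines: Iterable[str]) -> Set[Tuple[str, str]]:
    """Return the (namespace, name) pairs that the log reports as registered."""
    found = set()
    for line in lines:
        match = REGISTERED.search(line)
        if match:
            found.add((match.group(1), match.group(2)))
    return found


# Hypothetical log lines for illustration only.
sample = [
    "debug: registered PodTemplate 'flyte-backend:flyte-template-test' in store",
    "error: PodTemplate 'flyte-template-test' does not exist",
]
print(registered_templates(sample))
```

An empty result for the namespace in question would suggest the watch never delivered the PodTemplate to propeller.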
Okay, so I’ve enabled debug logging, and it does reference the PodTemplate error; I must have missed this before. I don’t get any mention of a PodTemplate being registered though 🤨 Is there any setting that needs enabling for PodTemplates? And I am running v1.8.1 of the flyte-binary helm chart, so everything should be up to date 😬
Is there any setting that needs enabling for PodTemplates?
There shouldn't be. Propeller uses a k8s resource watch to capture PodTemplate creations, updates, and deletes. Maybe you could try to emulate a watch using kubectl (something like kubectl get --watch ...) with the same permissions as propeller and see if the PodTemplate changes are viewable. It sounds like for some reason propeller isn't being notified of updates from the k8s apiserver.
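One way to approximate "the same permissions as propeller" is kubectl's `--as` impersonation flag. The sketch below only builds the command. The function name `watch_command` is hypothetical, and using "flyte-backend" as the service account name is a guess for illustration; substitute whatever account propeller actually runs as.

```python
import shlex
from typing import List


def watch_command(namespace: str, service_account: str, sa_namespace: str) -> List[str]:
    """Build a kubectl command that watches PodTemplates while impersonating a service account."""
    return [
        "kubectl", "get", "podtemplates",
        "--namespace", namespace,
        "--watch",
        # --as impersonates a user; service accounts use this username form.
        "--as", f"system:serviceaccount:{sa_namespace}:{service_account}",
    ]


# The service account name here is an assumption, not taken from the thread.
cmd = watch_command("flyte-backend", "flyte-backend", "flyte-backend")
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run` and then creating or editing a PodTemplate in another terminal should show whether the change events are visible. Impersonation only works if your own user is permitted to impersonate that service account.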
@Michael Tinsley I ran into the same problem as you; the only solution I found was to set default_pod_template_name.
I’ve just tried this and can confirm it works 🙌 … I’ve added a default and the logs show they’ve all been picked up and they are usable! Awesome! Thank you @Gopal Vashishtha @Dan Rammer (hamersaw) - Is this behaviour expected? Or should I create a bug report?
@Michael Tinsley please file a bug. I know in earlier versions of PodTemplate support we didn't start the watch API unless the default_pod_template_name value was set. However, with the introduction of task-level PodTemplates, this should be running continuously.
Thanks 👍 Opened an issue -> https://github.com/flyteorg/flyte/issues/3946
You guys are the best, I ran into this exact issue today. Will try setting the default in yaml on Monday!