[RESOLVED] Facing below error while trying to use ...
# ask-the-community
m
[RESOLVED] Facing below error while trying to use podTemplate for a TfJob task.
Workflow[moj-ml-workflows-flyte-example:development:flyte.example.tfjob.workflow.mnist_tensorflow_workflow] failed. RuntimeExecutionError: max number of system retry attempts [51/50] exhausted. Last known status message: failed at Node[n0]. RuntimeExecutionError: failed during plugin execution, caused by: failed to execute handle for plugin [tensorflow]: [BadTaskSpecification] Unable to create pod spec: [[BadTaskSpecification] PodTemplate 'flyte-pod-template' does not exist]
Have created podTemplate using below yaml file:
apiVersion: v1
kind: PodTemplate
metadata:
  name: flyte-pod-template
  namespace: development
template:
  metadata:
    labels:
      foo: testLabel
    annotations:
      foo: initial-value
      bar: initial-value
  spec:
    containers:
      - name: default
        image: docker.io/rwgrim/docker-noop
        terminationMessagePath: "/dev/foo"
Verified it in corresponding k8s namespace:
kubectl get podTemplate -n development

NAME                 CONTAINERS   IMAGES                         POD LABELS
flyte-pod-template   default      docker.io/rwgrim/docker-noop   foo=testLabel
kubectl get podTemplate -n flyte

NAME                 CONTAINERS   IMAGES                         POD LABELS
flyte-pod-template   default      docker.io/rwgrim/docker-noop   foo=templateLabel
Task definition:
s
I think the pod template needs to be present in the namespace where the execution is triggered.
m
The pods are getting created in the development namespace and the Flyte cluster is set up in the flyte namespace; both have a podTemplate present with this name.
s
It needs to be present in the flytesnacks-development namespace (assuming flytesnacks is the project you're triggering your execution in)
m
For me the executions/flyte-pods are getting created in just the development namespace, no project name prefixed
kubectl get flyteworkflows.flyte.lyft.com -n development
NAME                   AGE
af80ebe30d4426b22c38   10h
ar22wvbzk49qrc7b2htn   3m24s
ffdf33fba0b3b9f42000   6h49m
UI view for my example project: all the successful/failed executions go into the development namespace
m
I ran into the same issue recently; this is due to a bug. I am not at my PC just now, but I will link the thread when I am. The fix, however, is to create a completely default pod template and set it as the default pod template in the Flyte config. Flyte will then start picking up all the templates and you should be good.
m
Aah, the default template doesn't serve my purpose though 😞 ... I needed custom labels for the individual tasks. For TfJobs the labels aren't getting propagated from launchplan or so, the task level podTemplate was my only hope
Do you mean one default template in config will help the cluster to pick these individual task template as well?
m
You misunderstand.... You need to set a default flyte wide podTemplate, and then you can use a task level template. It's an issue where Flyte doesn't start looking for templates at all unless a default is set
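A minimal sketch of the config change described above, assuming the default is registered through the FlytePropeller Kubernetes plugin setting `default-pod-template-name`. The template name `flyte-default-template` is hypothetical, and where this fragment lives (Helm values, a ConfigMap, etc.) depends on how the cluster was deployed.

```yaml
# FlytePropeller configuration fragment (file location depends on your deployment).
plugins:
  k8s:
    # Hypothetical name; it should match a PodTemplate that exists in the cluster.
    default-pod-template-name: flyte-default-template
```

FlytePropeller will likely need to be restarted before the setting takes effect.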
m
Got it...on to it 🤞
m
Give it a go and you will see what I mean 😀
f
There is a hierarchy. You can have a pod template in the flyte namespace, then one in the namespace of the execution. Then on the task.
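As an illustration of the base of that hierarchy, here is a sketch of a deliberately plain PodTemplate in the flyte namespace. The name `flyte-default-template` is hypothetical; the placeholder container mirrors the manifest posted earlier in the thread, and no custom labels are set so that the task-level templates can supply them.

```yaml
apiVersion: v1
kind: PodTemplate
metadata:
  name: flyte-default-template   # hypothetical name
  namespace: flyte               # namespace where Flyte itself runs in this thread
template:
  spec:
    containers:
      # Placeholder container, same image as in the manifest posted earlier
      - name: default
        image: docker.io/rwgrim/docker-noop
```

It can be applied the same way as the task-level template, for example with `kubectl apply -f` on the file containing it.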
m
Thanks @Michael Tinsley, it worked!!
Thanks @Samhita Alla @Fabio Grätz