# flyte-support
t
When I upgraded `flytekit` to `1.13.7` (from `1.10.3`) while the backend `flyte` version was `1.13.1`, I got the following system error running a remote workflow:
```
RuntimeExecutionError: max number of system retry attempts [51/50] exhausted. Last known status message: worker error(s) encountered: [0]: failed at Node[n0]. RuntimeExecutionError: failed during plugin execution, caused by: failed to execute handle for plugin [sidecar]: [BadTaskSpecification] invalid TaskSpecification, primary container [primary] not defined
```
Could it be caused by the `flytekit` version being ahead of the `flyte` version? How does version compatibility work?
p
can you send the workflow and tasks you're using? It looks like you might have a custom pod where the primary container is misnamed
t
Thanks for the pointer! The code is fairly complex with custom task resolver and custom container image, but I will check the container config to see if there is any mismatch between the flytekit versions.
p
you should describe the pod and make sure it has a container named `primary`
do you have access to `kubectl` for your cluster?
t
yeah, I do have access to kubectl to describe the pod
It seems to be related to `map_task`
p
can you put the describe pod output here? or the output of
```
kubectl -n datology-development describe pods <failing pod name> | grep primary_container_name
```
t
I’ll rerun the workflow. The previous pods have been cleaned up.
h
@thousands-car-79657, in flytekit 1.12.0 we switched the implementation of `map_task` to use array nodes. This was supposed to be a no-op. Just to unblock you, you can still import the legacy map task from here, but I'd love to understand what broke in your case.
t
Let me check the code. We did a few tweaks to tailor `map_task` to our use cases, so very likely.
@prehistoric-leather-97354 I got:
```
k -n staging describe pods fves1p1y-0 | grep primary_
                  primary_container_name: primary
```
p
ah then yeah I would agree with you it's some version mismatch issue. But maybe the map task needs to be given the primary container name explicitly?
t
Thanks for the tips, guys! I will report back
h
For sure. Once we have more details, let's try to capture that in a gh issue.
t
Ah, the issue is indeed `primary_container_name`. If the `task_config` of the task is not overwritten, `primary_container_name` will be the pod name. But if it's overwritten with `flytekitplugins.pod.task.Pod`, then `primary_container_name` will be `primary`, which causes the error.
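For context, here is a minimal plain-Python sketch of the check behind that error message. This is NOT the actual flytepropeller sidecar-plugin code; the function and container names are illustrative assumptions. The idea is simply that the backend looks up the task's declared `primary_container_name` among the pod spec's container names and rejects the task if no container matches:

```python
# Illustrative model of the sidecar plugin's primary-container check.
# Not flytepropeller source; names here are made up for the sketch.

def validate_primary_container(primary_container_name, container_names):
    """Raise if the declared primary container is absent from the pod spec."""
    if primary_container_name not in container_names:
        raise ValueError(
            "invalid TaskSpecification, primary container "
            f"[{primary_container_name}] not defined"
        )

# Default case: the declared primary name matches an actual container.
validate_primary_container("my-task", ["my-task"])  # passes

# Mismatch case (as in this thread): the config declares "primary",
# but the pod's containers kept a different, task-derived name.
try:
    validate_primary_container("primary", ["my-task"])
except ValueError as e:
    print(e)  # invalid TaskSpecification, primary container [primary] not defined
```

So the fix is to make the declared name and the actual container name agree, whichever side you change.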
It turns out that the code was using the legacy `flytekitplugins-pod` plugin. Once I switched to `PodTemplate` (as advised here), it worked!