fierce-oil-47448
06/26/2024, 6:09 AM
template:
  metadata:
    annotations:
      yunikorn.apache.org/task-group-name: ""
      yunikorn.apache.org/task-groups: ""
      yunikorn.apache.org/schedulingPolicyParameters: ""
Should task-group-name be a unique value shared by all the pods that belong to the same Flyte execution and task group? Or does YuniKorn use a different unique identifier to tell which pods belong to the same Flyte execution, so that task-group-name only needs to be unique within that execution?
Cc @cool-lifeguard-49380
cool-lifeguard-49380
06/26/2024, 6:59 AM
fierce-oil-47448
06/26/2024, 7:00 AM
fierce-oil-47448
06/26/2024, 7:04 AM
applicationId label is needed for identifying pods belonging to the same application.
cool-lifeguard-49380
06/26/2024, 7:12 AM
cool-lifeguard-49380
06/26/2024, 7:13 AM
fierce-oil-47448
06/26/2024, 7:23 AM
fierce-oil-47448
06/26/2024, 7:24 AM
cool-lifeguard-49380
06/26/2024, 7:38 AM
cool-lifeguard-49380
06/26/2024, 7:38 AM
fierce-oil-47448
06/26/2024, 7:41 AM
fierce-oil-47448
06/26/2024, 7:41 AM
cool-lifeguard-49380
06/26/2024, 7:43 AM
fierce-oil-47448
06/26/2024, 7:44 AM
cool-lifeguard-49380
06/26/2024, 8:02 AM
glamorous-carpet-83516
06/26/2024, 9:45 AM
swift-scooter-38934
06/26/2024, 4:09 PM
apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: dist-mnist-for-e2e-test
  namespace: kubeflow
  labels:
    applicationId: app-01
spec:
  tfReplicaSpecs:
    PS:
      replicas: 1
      restartPolicy: Never
      template:
        metadata:
          labels:
            applicationId: "tf-job-001"
            queue: root.sandbox
          annotations:
            yunikorn.apache.org/task-group-name: task-group-example
            yunikorn.apache.org/task-groups: |-
              [{
                "name": "task-group-example",
                "minMember": 3,
                "minResource": {
                  "cpu": "100m",
                  "memory": "50M"
                },
                "nodeSelector": {},
                "tolerations": [],
                "affinity": {},
                "topologySpreadConstraints": []
              }]
        spec:
          schedulerName: yunikorn
          containers:
            - name: tensorflow
              image: kubeflow/tf-dist-mnist-test:1.0
              imagePullPolicy: IfNotPresent
    Worker:
      replicas: 2
      restartPolicy: Never
      template:
        metadata:
          labels:
            applicationId: "tf-job-001"
            queue: root.sandbox
          annotations:
            yunikorn.apache.org/task-group-name: task-group-example
            yunikorn.apache.org/task-groups: |-
              [{
                "name": "task-group-example",
                "minMember": 3,
                "minResource": {
                  "cpu": "100m",
                  "memory": "50M"
                },
                "nodeSelector": {},
                "tolerations": [],
                "affinity": {},
                "topologySpreadConstraints": []
              }]
        spec:
          schedulerName: yunikorn
          containers:
            - name: tensorflow
              image: kubeflow/tf-dist-mnist-test:1.0
              imagePullPolicy: IfNotPresent
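In the manifest above, PS (1 replica) and Worker (2 replicas) share one task group whose minMember is 3. A rough way to sanity-check that relationship before submitting is sketched below. This is only an illustration: `check_min_member` and the trimmed `tf_replica_specs` dict are hypothetical names made up here, not part of YuniKorn, Kubeflow, or Flyte.

```python
import json

GROUP_NAME_KEY = "yunikorn.apache.org/task-group-name"
GROUP_DEFS_KEY = "yunikorn.apache.org/task-groups"


def check_min_member(replica_specs):
    """Map each task group name to (total replicas, declared minMember)."""
    replicas = {}
    min_member = {}
    for spec in replica_specs.values():
        annotations = spec["template"]["metadata"]["annotations"]
        # Count the pods this replica type contributes to its task group.
        name = annotations[GROUP_NAME_KEY]
        replicas[name] = replicas.get(name, 0) + spec["replicas"]
        # Record the minMember declared in the task-groups JSON.
        for group in json.loads(annotations[GROUP_DEFS_KEY]):
            min_member[group["name"]] = group["minMember"]
    return {name: (replicas.get(name, 0), count) for name, count in min_member.items()}


# Trimmed, hand-written copy of the PS and Worker specs (hypothetical helper data).
task_groups = json.dumps([{
    "name": "task-group-example",
    "minMember": 3,
    "minResource": {"cpu": "100m", "memory": "50M"},
}])
annotations = {GROUP_NAME_KEY: "task-group-example", GROUP_DEFS_KEY: task_groups}
tf_replica_specs = {
    "PS": {"replicas": 1, "template": {"metadata": {"annotations": annotations}}},
    "Worker": {"replicas": 2, "template": {"metadata": {"annotations": annotations}}},
}

print(check_min_member(tf_replica_specs))  # {'task-group-example': (3, 3)}
```

Here the two numbers match, so the gang size declared in the annotation equals the number of pods the job will actually create.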
fierce-oil-47448
06/26/2024, 4:17 PM
swift-scooter-38934
06/26/2024, 10:03 PM
fierce-oil-47448
06/26/2024, 10:10 PM
swift-scooter-38934
06/26/2024, 10:49 PM
yunikorn.apache.org/task-group-name
yunikorn.apache.org/task-groups
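For reference, a minimal sketch of how a plugin might assemble these two annotations together with the applicationId label mentioned earlier in the thread. The function name `build_gang_metadata`, its parameters, and the example ID `flyte-exec-abc123` are all hypothetical, and using the Flyte execution ID as the applicationId is an assumption rather than something confirmed in this thread.

```python
import json


def build_gang_metadata(app_id, group_name, min_member, min_resource, queue=None):
    """Build pod labels and annotations for one YuniKorn task group (sketch)."""
    # The applicationId label ties pods to the same application.
    labels = {"applicationId": app_id}
    if queue is not None:
        labels["queue"] = queue
    # task-groups holds a JSON list; task-group-name picks this pod's group.
    task_groups = [{
        "name": group_name,
        "minMember": min_member,
        "minResource": min_resource,
    }]
    annotations = {
        "yunikorn.apache.org/task-group-name": group_name,
        "yunikorn.apache.org/task-groups": json.dumps(task_groups),
    }
    return {"labels": labels, "annotations": annotations}


# Hypothetical values: one ID per execution, reused on every pod of the job.
meta = build_gang_metadata(
    app_id="flyte-exec-abc123",
    group_name="task-group-example",
    min_member=3,
    min_resource={"cpu": "100m", "memory": "50M"},
    queue="root.sandbox",
)
print(meta["labels"])
print(meta["annotations"]["yunikorn.apache.org/task-groups"])
```

Calling the helper once per job and reusing the result for every replica type keeps the label and both annotations identical across pods, as in the TFJob example posted above.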
fierce-oil-47448
06/26/2024, 11:32 PM
freezing-airport-6809