purple-father-70173
02/18/2025, 7:49 PM
import os
import typing

import ray
from flytekit import Resources, task
from flytekitplugins.ray import HeadNodeConfig, RayJobConfig, WorkerNodeConfig


@ray.remote
def f():
    val = os.uname().nodename
    print(val)
    return val


@task(
    container_image=custom_image,  # custom_image is defined elsewhere
    task_config=RayJobConfig(
        head_node_config=HeadNodeConfig(),
        worker_node_config=[
            WorkerNodeConfig(group_name="ray-group", replicas=2)
        ],
    ),
    requests=Resources(cpu="12", mem="64Gi", gpu="1"),
)
def ray_task() -> typing.List[str]:
    futures = [f.remote() for _ in range(100)]
    return ray.get(futures)
2025-02-18 19:09:07,996 INFO worker.py:1832 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
2025-02-18 19:09:08,000 INFO packaging.py:574 -- Creating a file package for local module '/root'.
2025-02-18 19:09:08,001 INFO packaging.py:366 -- Pushing file package 'gcs://_ray_pkg_4a190130c7bd83a1.zip' (0.00MiB) to Ray cluster...
2025-02-18 19:09:08,002 INFO packaging.py:379 -- Successfully pushed file package 'gcs://_ray_pkg_4a190130c7bd83a1.zip'.
(f pid=1181) abf8s2dxg8jrdfcs8bd8-raydevraytask-0-hmqgh
(f pid=1184) abf8s2dxg8jrdfcs8bd8-raydevraytask-0-hmqgh
(f pid=1180) abf8s2dxg8jrdfcs8bd8-raydevraytask-0-hmqgh
(f pid=1186) abf8s2dxg8jrdfcs8bd8-raydevraytask-0-hmqgh
(f pid=1178) abf8s2dxg8jrdfcs8bd8-raydevraytask-0-hmqgh
(f pid=1185) abf8s2dxg8jrdfcs8bd8-raydevraytask-0-hmqgh
(f pid=1187) abf8s2dxg8jrdfcs8bd8-raydevraytask-0-hmqgh
(f pid=1179) abf8s2dxg8jrdfcs8bd8-raydevraytask-0-hmqgh
(f pid=1182) abf8s2dxg8jrdfcs8bd8-raydevraytask-0-hmqgh
(f pid=1183) abf8s2dxg8jrdfcs8bd8-raydevraytask-0-hmqgh
[... same line repeated for all 100 remote calls across pids 1178-1187; every call prints the hostname of the submitter pod abf8s2dxg8jrdfcs8bd8-raydevraytask-0-hmqgh ...]
When I try to add the RAY_ADDRESS to the submitter it says that I don't have the node config to do that, and if I stop the submitter's Ray server and then connect to the existing cluster it gets killed by a probe.

purple-father-70173
02/18/2025, 8:11 PM
ray start --address=..., but my assumption based on the provided examples was that this isn't necessary.

clean-glass-36808
02/18/2025, 8:53 PM

clean-glass-36808
02/18/2025, 8:56 PM

purple-father-70173
02/18/2025, 8:58 PM

freezing-airport-6809
flytekitplugins-ray in the image?
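For context, a minimal sketch of baking flytekitplugins-ray into the task image with flytekit's ImageSpec; the image name and registry below are placeholders rather than values from this thread:

from flytekit import ImageSpec

# Hypothetical image definition; swap in your own registry/base image.
custom_image = ImageSpec(
    name="ray-dev",
    registry="ghcr.io/example",
    packages=["flytekitplugins-ray"],  # installs the Flyte Ray task plugin (which pulls in ray) into the image
)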
freezing-airport-6809
ray backend plugin in propeller config?

clean-glass-36808
02/18/2025, 11:10 PM

purple-father-70173
02/18/2025, 11:18 PM

purple-father-70173
02/18/2025, 11:19 PM

clean-glass-36808
02/18/2025, 11:25 PM

purple-father-70173
02/18/2025, 11:45 PM
apiVersion: v1
kind: Pod
metadata:
annotations:
operator.1password.io/status: injected
creationTimestamp: "2025-02-18T19:08:56Z"
generateName: abf8s2dxg8jrdfcs8bd8-raydevraytask-0-
labels:
batch.kubernetes.io/controller-uid: 7bd62b58-f6ca-4d8a-9c77-02b7ac8d2708
batch.kubernetes.io/job-name: abf8s2dxg8jrdfcs8bd8-raydevraytask-0
controller-uid: 7bd62b58-f6ca-4d8a-9c77-02b7ac8d2708
domain: development
execution-id: abf8s2dxg8jrdfcs8bd8
flyte-pod: "true"
interruptible: "false"
job-name: abf8s2dxg8jrdfcs8bd8-raydevraytask-0
node-id: raydevraytask
project: flytesnacks
shard-key: "8"
task-name: ray-dev-ray-task
workflow-name: flytegen-ray-dev-ray-task
name: abf8s2dxg8jrdfcs8bd8-raydevraytask-0-hmqgh
namespace: fl97
ownerReferences:
- apiVersion: batch/v1
blockOwnerDeletion: true
controller: true
kind: Job
name: abf8s2dxg8jrdfcs8bd8-raydevraytask-0
uid: 7bd62b58-f6ca-4d8a-9c77-02b7ac8d2708
resourceVersion: "144746380"
uid: a3c2a5f2-1426-412d-9ce8-67cc6f733ab7
spec:
affinity: {}
containers:
- args:
- pyflyte-fast-execute
- --additional-distribution
- s3://flyte/flytesnacks/development/VYXT4JISOSHJE4TPC7HGG63MYE======/faste62fbecc1ad336d6b69b63d0ddb08673.tar.gz
- --dest-dir
- .
- --
- pyflyte-execute
- --inputs
- s3://flyte/metadata/propeller/flytesnacks-development-abf8s2dxg8jrdfcs8bd8/raydevraytask/data/inputs.pb
- --output-prefix
- s3://flyte/metadata/propeller/flytesnacks-development-abf8s2dxg8jrdfcs8bd8/raydevraytask/data/0
- --raw-output-data-prefix
- s3://flyte/data/2e/abf8s2dxg8jrdfcs8bd8-raydevraytask-0
- --checkpoint-path
- s3://flyte/data/2e/abf8s2dxg8jrdfcs8bd8-raydevraytask-0/_flytecheckpoints
- --prev-checkpoint
- '""'
- --resolver
- flytekit.core.python_auto_container.default_task_resolver
- --
- task-module
- ray_dev
- task-name
- ray_task
command:
- /op/bin/op
- run
- --
- /op/bin/op
- run
- --
env:
- name: OP_SERVICE_ACCOUNT_TOKEN
valueFrom:
secretKeyRef:
key: token
name: op-service-account
- name: FLYTE_INTERNAL_EXECUTION_WORKFLOW
value: flytesnacks:development:.flytegen.ray_dev.ray_task
- name: FLYTE_INTERNAL_EXECUTION_ID
value: abf8s2dxg8jrdfcs8bd8
- name: FLYTE_INTERNAL_EXECUTION_PROJECT
value: flytesnacks
- name: FLYTE_INTERNAL_EXECUTION_DOMAIN
value: development
- name: FLYTE_ATTEMPT_NUMBER
value: "0"
- name: FLYTE_INTERNAL_TASK_PROJECT
value: flytesnacks
- name: FLYTE_INTERNAL_TASK_DOMAIN
value: development
- name: FLYTE_INTERNAL_TASK_NAME
value: ray_dev.ray_task
- name: FLYTE_INTERNAL_TASK_VERSION
value: ieQ9YgRouHsQLaDuyamylw
- name: FLYTE_INTERNAL_PROJECT
value: flytesnacks
- name: FLYTE_INTERNAL_DOMAIN
value: development
- name: FLYTE_INTERNAL_NAME
value: ray_dev.ray_task
- name: FLYTE_INTERNAL_VERSION
value: ieQ9YgRouHsQLaDuyamylw
- name: _F_L_MIN_SIZE_MB
value: "10"
- name: _F_L_MAX_SIZE_MB
value: "1000"
- name: FLYTE_AWS_ENDPOINT
value: http://10.141.3.3:9000
- name: FLYTE_AWS_ACCESS_KEY_ID
value: minio
- name: FLYTE_AWS_SECRET_ACCESS_KEY
value: miniostorage
- name: PYTHONUNBUFFERED
value: "1"
- name: RAY_DASHBOARD_ADDRESS
value: dfcs8bd8-raydevraytask-0-raycluster-dg9jg-head-svc.fl97.svc.cluster.local:8265
- name: RAY_JOB_SUBMISSION_ID
value: abf8s2dxg8jrdfcs8bd8-raydevraytask-0-tkdrl
- name: OP_INTEGRATION_NAME
value: 1Password Kubernetes Webhook
- name: OP_INTEGRATION_ID
value: K8W
- name: OP_INTEGRATION_BUILDNUMBER
value: "1000101"
image: 579102688835.dkr.ecr.us-east-1.amazonaws.com/ssi:IaMZmB8OWMWoYdNNyJ0L0w
imagePullPolicy: Always
name: abf8s2dxg8jrdfcs8bd8-raydevraytask-0
ports:
- containerPort: 8080
name: http
protocol: TCP
resources:
limits:
cpu: "12"
memory: 64Gi
nvidia.com/gpu: "1"
requests:
cpu: "12"
memory: 64Gi
nvidia.com/gpu: "1"
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: FallbackToLogsOnError
volumeMounts:
- mountPath: /shared
name: utility-volume
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-bh9db
readOnly: true
- mountPath: /op/bin/
name: op-bin
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
imagePullSecrets:
- name: ecr-ssi
initContainers:
- command:
- sh
- -c
- cp /usr/local/bin/op /op/bin/
image: 1password/op:2
imagePullPolicy: IfNotPresent
name: copy-op-bin
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /op/bin/
name: op-bin
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-bh9db
readOnly: true
nodeName: fl97-hgx-04
nodeSelector:
nvidia.com/gpu.present: "true"
preemptionPolicy: PreemptLowerPriority
priority: 0
restartPolicy: Never
schedulerName: default-scheduler
securityContext: {}
serviceAccount: dev
serviceAccountName: dev
terminationGracePeriodSeconds: 30
tolerations:
- key: nvidia.com/gpu
value: "true"
- effect: NoSchedule
key: nvidia.com/gpu
operator: Equal
value: "true"
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: utility-volume
persistentVolumeClaim:
claimName: pvc-utility-exascaler
- name: kube-api-access-bh9db
projected:
defaultMode: 420
sources:
- serviceAccountToken:
expirationSeconds: 3607
path: token
- configMap:
items:
- key: ca.crt
path: ca.crt
name: kube-root-ca.crt
- downwardAPI:
items:
- fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
path: namespace
- emptyDir:
medium: Memory
name: op-bin
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2025-02-18T19:09:15Z"
status: "False"
type: PodReadyToStartContainers
- lastProbeTime: null
lastTransitionTime: "2025-02-18T19:08:59Z"
reason: PodCompleted
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2025-02-18T19:09:13Z"
reason: PodCompleted
status: "False"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2025-02-18T19:09:13Z"
reason: PodCompleted
status: "False"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2025-02-18T19:08:56Z"
status: "True"
type: PodScheduled
containerStatuses:
- containerID: containerd://dc54b94390650993c552f0f06bb5353e35cec9349e88c88319de520313fd9bf9
image: 579102688835.dkr.ecr.us-east-1.amazonaws.com/ssi:IaMZmB8OWMWoYdNNyJ0L0w
imageID: 579102688835.dkr.ecr.us-east-1.amazonaws.com/ssi@sha256:b764f67ebdef3a6a972f7fb5f6657c7cad073ef06ea0de300c2a880b5857c891
lastState: {}
name: abf8s2dxg8jrdfcs8bd8-raydevraytask-0
ready: false
restartCount: 0
started: false
state:
terminated:
containerID: containerd://dc54b94390650993c552f0f06bb5353e35cec9349e88c88319de520313fd9bf9
exitCode: 0
finishedAt: "2025-02-18T19:09:13Z"
reason: Completed
startedAt: "2025-02-18T19:09:00Z"
hostIP: 10.141.1.4
hostIPs:
- ip: 10.141.1.4
initContainerStatuses:
- containerID: containerd://737cca74705887e5efb4e0ca9e03acf029244b51dceb821c511bcb342b3f2e35
image: docker.io/1password/op:2
imageID: docker.io/1password/op@sha256:e7b4dcc8df09659096cc7b7dfbeb6119eb49c2f01a5083d4c477ac5f9a23413d
lastState: {}
name: copy-op-bin
ready: true
restartCount: 0
started: false
state:
terminated:
containerID: containerd://737cca74705887e5efb4e0ca9e03acf029244b51dceb821c511bcb342b3f2e35
exitCode: 0
finishedAt: "2025-02-18T19:08:59Z"
reason: Completed
startedAt: "2025-02-18T19:08:59Z"
phase: Succeeded
podIP: 10.42.4.164
podIPs:
- ip: 10.42.4.164
qosClass: Burstable
startTime: "2025-02-18T19:08:58Z"
purple-father-70173
02/18/2025, 11:46 PM

clean-glass-36808
02/18/2025, 11:50 PM
/op/bin/op

purple-father-70173
02/18/2025, 11:50 PM
configuration:
..
inline:
...
tasks:
task-plugins:
enabled-plugins:
- container
- sidecar
- k8s-array
- pytorch
- ray
default-for-task-types:
- container: container
- container_array: k8s-array
- pytorch: pytorch
- ray: ray
plugins:
...
ray:
ttlSecondsAfterFinished: 3600
rbac:
extraRules:
- apiGroups:
- kubeflow.org
resources:
- pytorchjobs
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- "<http://ray.io|ray.io>"
resources:
- rayjobs
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
clean-glass-36808
02/18/2025, 11:50 PM
ray job submit

purple-father-70173
02/18/2025, 11:50 PM
> 1password secrets injector
It's this, I can also share my k8s pod template

clean-glass-36808
02/18/2025, 11:51 PM

purple-father-70173
02/18/2025, 11:51 PM
apiVersion: v1
kind: PodTemplate
metadata:
name: default
namespace: fl97
template:
metadata:
labels:
flyte-pod: "true"
spec:
imagePullSecrets:
- name: ecr-ssi
nodeSelector:
nvidia.com/gpu.present: "true"
serviceAccountName: dev
containers:
- name: default
image: ghcr.io/flyteorg/flytekit:flyteinteractive-latest
command: ["/op/bin/op", "run", "--"]
env:
- name: OP_SERVICE_ACCOUNT_TOKEN
valueFrom:
secretKeyRef:
name: op-service-account
key: token
ports:
- name: http
containerPort: 8080
protocol: TCP
imagePullPolicy: Always
volumeMounts:
- name: utility-volume
mountPath: /shared
volumes:
- name: utility-volume
persistentVolumeClaim:
claimName: pvc-utility-exascaler
tolerations:
- key: nvidia.com/gpu
value: "true"
It sounds like we have some clash with the command.

purple-father-70173
02/18/2025, 11:52 PM

clean-glass-36808
02/18/2025, 11:52 PM

clean-glass-36808
02/18/2025, 11:54 PM
ExternalSecret. Not sure about with ray tasks, I'd have to look.

clean-glass-36808
02/18/2025, 11:54 PM

purple-father-70173
02/18/2025, 11:55 PM
Should I remove the command in my PodTemplate, since it specifically overrides the RayJobConfig?

purple-father-70173
02/18/2025, 11:56 PM
What about PyTorch or any kind of Python task type?

clean-glass-36808
02/18/2025, 11:58 PM
Not sure how the PyTorch plugin works, maybe @freezing-airport-6809 knows. I think for python tasks there is no command set, so it probably just works without conflict.

purple-father-70173
02/19/2025, 12:00 AM
ray job submit

clean-glass-36808
02/19/2025, 12:02 AM
ray job submit

clean-glass-36808
02/19/2025, 12:03 AM
command:
- ray
- job
- submit
- '--address'
- http://a4bfkvblbc4ljtmzp6ww-n0-0-head-svc.flytesnacks-production.svc.cluster.local:8265
- '--runtime-env-json'
- '{"env_vars":{"PATH":"/run/flyte/bin:${PATH}"},"pip":["numpy","pandas"]}'
- '--'
- pyflyte-execute
- '--inputs'
...
- '--prev-checkpoint'
- ''
- '--resolver'
- site-packages.flytekit.core.python_auto_container.default_task_resolver
- '--'
- task-module
- ray_example
- task-name
- ray_task
purple-father-70173
02/19/2025, 12:04 AM

purple-father-70173
02/19/2025, 12:05 AM

clean-glass-36808
02/19/2025, 12:05 AM
Well, all the pyflyte-execute stuff is also defined as args for me. I don't think the Ray plugin is used in a production sense much, so there's probably lots of cruft.

clean-glass-36808
02/19/2025, 12:07 AM

purple-father-70173
02/19/2025, 12:08 AM

purple-father-70173
02/19/2025, 12:08 AM

freezing-airport-6809
> Well, all the pyflyte-execute stuff is also defined as args for me. I don't think the Ray plugin is used in a production sense much, so there's probably lots of cruft.
WDYM by this? We do use it in production at scale.

freezing-airport-6809

clean-glass-36808
02/19/2025, 12:27 AM

purple-father-70173
02/19/2025, 12:28 AM
Removed the command from my PodTemplate. I have it working now.
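For reference, a minimal sketch of that default PodTemplate with the command override removed (the fix described above); the values are taken from the template shared earlier, and with no command set the Ray plugin's submitter keeps its own ray job submit command:

apiVersion: v1
kind: PodTemplate
metadata:
  name: default
  namespace: fl97
template:
  spec:
    imagePullSecrets:
      - name: ecr-ssi
    nodeSelector:
      nvidia.com/gpu.present: "true"
    serviceAccountName: dev
    containers:
      - name: default
        image: ghcr.io/flyteorg/flytekit:flyteinteractive-latest
        imagePullPolicy: Always
        # no command override here, so the plugin/task can set it
        env:
          - name: OP_SERVICE_ACCOUNT_TOKEN
            valueFrom:
              secretKeyRef:
                name: op-service-account
                key: token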