Rob Ulbrich
08/31/2023, 12:01 PM
Ketan (kumare3)
Fabio Grätz
08/31/2023, 2:16 PM
Ketan (kumare3)
Rob Ulbrich
08/31/2023, 2:30 PM
Ketan (kumare3)
Fabio Grätz
08/31/2023, 2:36 PM
Rob Ulbrich
08/31/2023, 2:37 PM
pyflyte run --remote -p workflow -d dev --service-account default mnist.py horovod_training_wf
Fabio Grätz
08/31/2023, 2:38 PM
Rob Ulbrich
08/31/2023, 2:38 PM
--service-account flag. The service account to use is default.
Fabio Grätz
08/31/2023, 2:38 PM
Rob Ulbrich
08/31/2023, 2:39 PM
Fabio Grätz
08/31/2023, 2:39 PM
Ketan (kumare3)
Fabio Grätz
08/31/2023, 2:41 PM
SparkApplication?
Rob Ulbrich
08/31/2023, 2:42 PM
pyflyte
Fabio Grätz
08/31/2023, 2:43 PM
kubectl -n <namespace name> get serviceaccount default -o yaml
Rob Ulbrich
08/31/2023, 2:45 PM
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    iam.gke.io/gcp-service-account: flyte-workflow-dev-sa@<redacted>
  creationTimestamp: "2023-08-03T07:15:09Z"
  name: default
  namespace: dev
  resourceVersion: "676089"
  uid: dd7f67ce-2530-4c59-883d-2e08bebeebbb
Fabio Grätz
08/31/2023, 2:45 PM
Rob Ulbrich
08/31/2023, 2:45 PM
default service account is the one that we want to use
Fabio Grätz
08/31/2023, 2:45 PM
Rob Ulbrich
08/31/2023, 2:45 PM
Fabio Grätz
08/31/2023, 2:45 PM
Rob Ulbrich
08/31/2023, 2:46 PM
Fabio Grätz
08/31/2023, 2:46 PM
Ketan (kumare3)
Fabio Grätz
08/31/2023, 2:47 PM
Ketan (kumare3)
Rob Ulbrich
08/31/2023, 2:50 PM
@task(
    container_image="europe-west4-docker.pkg.dev/<redacted>/flyte/mpi-mnist:latest",
    task_config=MPIJob(
        launcher=Launcher(
            replicas=1,
        ),
        worker=Worker(
            replicas=2,
        ),
    ),
    retries=3,
    requests=Resources(cpu="1", mem="1000Mi"),
    limits=Resources(cpu="2", mem="4000Mi"),
)
Ketan (kumare3)
Fabio Grätz
08/31/2023, 3:02 PM
@Dimss consider trying the v2 controller, which doesn’t depend on ServiceAccounts 🙂
Rob Ulbrich
08/31/2023, 3:08 PM
Fabio Grätz
08/31/2023, 3:09 PM
Rob Ulbrich
08/31/2023, 3:10 PM
https://github.com/kubeflow/training-operator
Fabio Grätz
08/31/2023, 3:13 PM
Ketan (kumare3)
Rahul Mehta
08/31/2023, 4:11 PM
Yubo Wang
08/31/2023, 4:25 PM
Fabio Grätz
08/31/2023, 5:28 PM
Yubo Wang
08/31/2023, 5:38 PM
Rob Ulbrich
09/01/2023, 2:09 PM
kubectl exec. It then connects to the sidecar container instead of the main container where the worker is located.
Ketan (kumare3)
Rob Ulbrich
09/04/2023, 7:56 AM
Ketan (kumare3)
Rob Ulbrich
09/05/2023, 7:38 AM