Tao He
08/02/2022, 3:51 AM
Does "horovod.spark.run to execute the distributed training function" mean that Flyte will launch num_proc Spark workers on num_proc Flyte workers? Or does it just launch them on one Spark worker?
Ketan (kumare3)
Flyte workers
today. Flyte launches ephemeral Spark clusters using Spark operator or Spark for K8s.
In the Horovod case, it simply uses Horovod's Spark integration; just make sure the configuration is correct.
Tao He
08/02/2022, 5:57 AM
"Flyte launches ephemeral Spark clusters using Spark operator or Spark for K8s."
I see. Thanks!
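"Make sure the configuration is correct" mostly means sizing the ephemeral cluster so Horovod's processes fit on it. Below is a minimal sketch, assuming that each Horovod process started by horovod.spark.run(fn, num_proc=...) occupies one Spark task slot and that all num_proc slots must be free at the same time. The helper name spark_conf_for_horovod and its parameters are hypothetical; only the returned keys (spark.executor.instances, spark.executor.cores, spark.task.cpus) are standard Spark settings. The resulting dict could then be passed as spark_conf to the flytekit Spark plugin's task config.

```python
import math


def spark_conf_for_horovod(num_proc: int, cores_per_executor: int = 1, cpus_per_task: int = 1) -> dict:
    """Hypothetical helper: size an ephemeral Spark cluster so that `num_proc`
    Horovod processes can run concurrently.

    Assumption: each Horovod process is one Spark task needing `cpus_per_task`
    cores, and every process must be scheduled at once.
    """
    if num_proc < 1:
        raise ValueError("num_proc must be at least 1")
    if cpus_per_task < 1 or cores_per_executor < cpus_per_task:
        raise ValueError("need cores_per_executor >= cpus_per_task >= 1")

    # How many Horovod processes (Spark tasks) one executor can host at a time.
    slots_per_executor = cores_per_executor // cpus_per_task
    # Round up so total slots >= num_proc.
    executors = math.ceil(num_proc / slots_per_executor)

    # Spark expects configuration values as strings.
    return {
        "spark.executor.instances": str(executors),
        "spark.executor.cores": str(cores_per_executor),
        "spark.task.cpus": str(cpus_per_task),
    }


if __name__ == "__main__":
    # 4 Horovod processes, 2 cores per executor -> 2 executors.
    print(spark_conf_for_horovod(num_proc=4, cores_per_executor=2))
```

Rounding up rather than down is the important choice: with too few slots, the Horovod processes cannot all start, and the job waits instead of training.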
Ketan (kumare3)
Tao He
08/04/2022, 6:46 AM
"Flyte tries to prevent starting the spark cluster itself, if the task is cached"
Now I understand that the Spark job (launching the Spark cluster with spark-operator and submitting a job to it) is "a single task" in a Flyte workflow.
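Because the whole Spark job is one task, a cache hit can skip the expensive cluster launch entirely. The toy, in-process model below illustrates that idea only. It is an assumption-laden sketch, not Flyte's implementation: Flyte performs the lookup in its backend before the Spark operator is asked for anything, and in flytekit the task would be declared with cache=True and a cache_version. The class ToySparkTaskRunner and its method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple


@dataclass
class ToySparkTaskRunner:
    """Hypothetical model: a cached Spark task only starts a cluster on a cache miss."""

    # Cache keyed on (task name, cache version, inputs); inputs must be hashable.
    cache: Dict[Tuple[str, str, Tuple[Any, ...]], Any] = field(default_factory=dict)
    cluster_launches: int = 0

    def _launch_ephemeral_cluster(self) -> None:
        # Stand-in for the costly step (driver and executor pods being created).
        self.cluster_launches += 1

    def run(self, name: str, cache_version: str, fn: Callable[..., Any], *inputs: Any) -> Any:
        key = (name, cache_version, inputs)
        if key in self.cache:
            # Cache hit: reuse the stored output; no cluster is started.
            return self.cache[key]
        # Cache miss: start a cluster, run the job, and remember the result.
        self._launch_ephemeral_cluster()
        result = fn(*inputs)
        self.cache[key] = result
        return result


if __name__ == "__main__":
    runner = ToySparkTaskRunner()
    runner.run("train", "1.0", lambda x: x * 2, 21)
    runner.run("train", "1.0", lambda x: x * 2, 21)  # cache hit
    print(runner.cluster_launches)
```

Bumping the cache version (or changing the inputs) produces a new key, so the next run launches a fresh cluster.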
Ketan (kumare3)
Tao He
08/05/2022, 1:48 AM
Ketan (kumare3)
Tao He
08/05/2022, 2:34 AM
Ketan (kumare3)
Tao He
08/05/2022, 4:05 AM
Ketan (kumare3)