Tao He
"horovod.spark.run to execute the distributed training function" means that Flyte will launch num_proc Spark workers on num_proc Flyte workers? Or does it just launch them on one Spark worker?
Ketan (kumare3)
Flyte workers today. Flyte launches ephemeral Spark clusters using Spark operator or Spark for K8s. In the Horovod case, it simply uses the Horovod-with-Spark integration; just make sure the configuration is correct.
Tao He
"Flyte launches ephemeral Spark clusters using Spark operator or Spark for K8s." I see. Thanks!
Ketan (kumare3)
Tao He
"Flyte tries to prevent starting the spark cluster itself, if the task is cached." Now I understand that the Spark job (launching the Spark cluster using spark-operator and submitting a job to it) is "a single task" in a Flyte workflow.