rich-monitor-45380
08/02/2022, 3:51 AM
Does "use `horovod.spark.run` to execute the distributed training function" mean that Flyte will launch `num_proc` Spark workers on `num_proc` Flyte workers? Or does it just launch them on one Spark worker?
freezing-airport-6809
Flyte workers
today. Flyte launches ephemeral Spark clusters using Spark operator or Spark for K8s.
In the Horovod case, it simply uses Horovod's Spark integration; just make sure the configuration is correct.
rich-monitor-45380
08/02/2022, 5:57 AM
"Flyte launches ephemeral Spark clusters using Spark operator or Spark for K8s."
I see. Thanks!
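One way to make "just make sure the configuration is correct" concrete for Horovod on Spark: as far as we understand, `horovod.spark.run(fn, num_proc=N)` runs the N Horovod processes as N Spark tasks inside the one ephemeral cluster Flyte starts for the task, and all N must be running at the same time. The sketch below assumes one Horovod process per Spark task slot and a fixed executor count (no dynamic allocation); the helpers `concurrent_task_slots` and `check_horovod_fits` are hypothetical, not part of Flyte or Horovod.

```python
from typing import Dict

# Hypothetical helpers (not a Flyte or Horovod API).
# Assumption: one Horovod process per Spark task slot, static executor count.


def concurrent_task_slots(spark_conf: Dict[str, str]) -> int:
    """How many Spark tasks this configuration can run at the same time."""
    executors = int(spark_conf["spark.executor.instances"])
    cores_per_executor = int(spark_conf["spark.executor.cores"])
    cpus_per_task = int(spark_conf.get("spark.task.cpus", "1"))  # Spark's default is 1
    # Each executor runs floor(cores / cpus_per_task) tasks in parallel.
    return executors * (cores_per_executor // cpus_per_task)


def check_horovod_fits(spark_conf: Dict[str, str], num_proc: int) -> None:
    """Raise ValueError if num_proc Horovod processes cannot all run at once."""
    slots = concurrent_task_slots(spark_conf)
    if num_proc > slots:
        raise ValueError(
            f"num_proc={num_proc} needs {num_proc} concurrent Spark tasks, "
            f"but this config only provides {slots}"
        )


# Example: 2 executors x 2 cores, 1 CPU per task -> 4 concurrent slots.
spark_conf = {
    "spark.executor.instances": "2",
    "spark.executor.cores": "2",
    "spark.task.cpus": "1",
}
print(concurrent_task_slots(spark_conf))  # 4
check_horovod_fits(spark_conf, num_proc=4)  # fits, so nothing is raised
```

The same dictionary is the kind of thing you would pass as `spark_conf` to the `Spark` task config from `flytekitplugins.spark`, with `num_proc=4` going to `horovod.spark.run`. GPU scheduling or dynamic allocation changes the slot arithmetic, so treat this as a rough sanity check only.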
rich-monitor-45380
08/02/2022, 6:12 AM
freezing-airport-6809
rich-monitor-45380
08/04/2022, 6:46 AM
"Flyte tries to prevent starting the Spark cluster itself if the task is cached."
Now I understand that the Spark job (launching the Spark cluster using the Spark operator and submitting a job to it) is "a single task" in a Flyte workflow.
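Because the whole Spark job is one Flyte task, a cache hit can be served before any cluster is created. The toy model below sketches only that ordering (check the cache first, launch the cluster only on a miss); it is not Flyte's implementation. The cache key built from the task name, `cache_version`, and input values is a simplifying assumption, and `run_cached_spark_task` is a hypothetical name.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List

# Toy model, NOT Flyte's implementation: the cache is checked before the
# (simulated) Spark cluster is started, so a hit skips the whole Spark job.
_cache: Dict[str, Any] = {}
cluster_launches: List[str] = []  # one entry per simulated cluster start


def cache_key(task_name: str, cache_version: str, inputs: Dict[str, Any]) -> str:
    # Assumption: the key combines task identity, cache_version and input values.
    payload = json.dumps(
        {"task": task_name, "version": cache_version, "inputs": inputs},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()


def run_cached_spark_task(
    task_name: str,
    cache_version: str,
    inputs: Dict[str, Any],
    body: Callable[..., Any],
) -> Any:
    key = cache_key(task_name, cache_version, inputs)
    if key in _cache:
        return _cache[key]  # cache hit: no cluster is launched at all
    cluster_launches.append(task_name)  # cache miss: "start" the cluster
    result = body(**inputs)  # the Spark job runs as this single task
    _cache[key] = result
    return result


def train(epochs: int) -> str:
    return f"trained for {epochs} epochs"


run_cached_spark_task("train", "1", {"epochs": 3}, train)
run_cached_spark_task("train", "1", {"epochs": 3}, train)  # served from cache
print(len(cluster_launches))  # 1
```

In flytekit itself, caching is enabled per task with `cache=True` and `cache_version="..."` on the `@task` decorator; changing `cache_version` forces a re-run, which the model mimics by including the version in the key.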
rich-monitor-45380
08/04/2022, 6:51 AM
freezing-airport-6809
rich-monitor-45380
08/05/2022, 1:48 AM
freezing-airport-6809
rich-monitor-45380
08/05/2022, 2:34 AM
freezing-airport-6809
rich-monitor-45380
08/05/2022, 4:05 AM
freezing-airport-6809
freezing-airport-6809