Fredrik Lyford
11/30/2022, 12:12 PM
Geoff Salmon
11/30/2022, 2:16 PM
Could you set spark.driver.extraClassPath and spark.executor.extraClassPath in the task's spark conf, and then call into your scala spark driver code from the pyspark python code? Something like
import flytekit
from flytekit import task
from flytekitplugins.spark import Spark

@task(
    task_config=Spark(
        # this configuration is applied to the spark cluster
        spark_conf={
            "spark.driver.extraClassPath": ...,
            "spark.executor.extraClassPath": ...,
        }
    ),
)
def spark_task() -> float:
    sess = flytekit.current_context().spark_session
    return sess.sparkContext._jvm.com.my.scala.package.ScalaDriver.go()
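For the extraClassPath values, one hypothetical fill-in (the jar path here is an assumption for illustration, not from this thread) is to point both settings at wherever the scala jar lives inside the task image:

```python
# Hypothetical path to the jar containing your scala driver classes
# inside the task container image.
jar_path = "/opt/spark/jars/my-scala-driver.jar"

# Both the driver and the executors need the jar on their classpath so the
# _jvm call can resolve the scala classes on every node.
spark_conf = {
    "spark.driver.extraClassPath": jar_path,
    "spark.executor.extraClassPath": jar_path,
}
```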
Geoff Salmon
11/30/2022, 2:18 PM
sess.sparkContext._jsc.sc()
returns the JVM-side SparkContext
object which you could pass to the scala side too. There are probably thread-local or static references within spark that you could use to get the spark context on the scala side too.
Geoff Salmon
11/30/2022, 2:22 PM
com.my.scala.package.ScalaDriver
in sess.sparkContext._jvm.com.my.scala.package.ScalaDriver.go()
is a class name in your scala code and go
is a static method. I'm using the java names for things here because I'm not super familiar with scala.
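A hedged sketch of what that entry point could look like on the JVM side, written in java to match Geoff's java naming; the class name, method name, and return value are illustrative, not from the thread:

```java
// Hypothetical sketch of the JVM-side entry point described above.
// In scala this would be `object ScalaDriver { def go(): Double = ... }`;
// scala emits static forwarder methods for object members, so py4j can
// reach it exactly like this plain-java static method.
public class ScalaDriver {
    // py4j resolves this via sess.sparkContext._jvm.<package>.ScalaDriver.go()
    public static double go() {
        // A real driver would obtain the active SparkContext here
        // (e.g. SparkContext.getOrCreate() on the scala side) and run the job.
        // Placeholder result so the sketch is self-contained:
        return 1.0;
    }
}
```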
Ketan (kumare3)
Fredrik Lyford
12/01/2022, 7:49 AM
Fredrik Lyford
12/01/2022, 9:41 AM
Geoff Salmon
12/01/2022, 7:40 PM
Fredrik Lyford
12/02/2022, 10:08 AM