# flyte-support
Hello team, I'm currently trying to integrate a Databricks Spark Scala job into an existing Flyte workflow. I found a doc that goes into some detail on using the Databricks plugin to run analysis over the MovieLens data using Python. Is there a way for the Databricks plugin to launch a Scala Spark job and pass it some command-line arguments?
I think I can make this work with an instance of `SparkJob` (https://github.com/flyteorg/flytekit/blob/cc3a7a9277852dc06bc1b2c041a962be973d8faf/plugins/flytekit-spark/flytekitplugins/spark/models.py#L19), where `spark_type` is `SCALA`, `application_file` is the path to the JAR file in object storage, and `main_class` is `Main`, plus a Databricks token and instance. What I cannot see is a way to inject command-line args so they get passed to the Databricks API. I could possibly insert them into the `databricks_conf` attribute of `SparkJob`, but I'm not entirely certain how this attribute is used to construct the API call. I also found this open PR and added some questions there: https://github.com/flyteorg/flytekit/pull/767
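For concreteness, here is a minimal sketch of what I have in mind. It assumes the `SparkJob` model in my flytekit version accepts the Databricks fields (`databricks_conf`, `databricks_token`, `databricks_instance`; the exact constructor signature varies across versions), and it guesses that `databricks_conf` is merged into the Databricks Jobs "Runs Submit" request body, in which case `spark_jar_task.parameters` would be where command-line args land. That last part is exactly what I haven't been able to confirm.

```python
# Hedged sketch only: the SparkJob signature varies across flytekit versions,
# and whether databricks_conf is passed verbatim into the Databricks Jobs
# "Runs Submit" request body is the open question above.
from flytekitplugins.spark.models import SparkJob, SparkType

job = SparkJob(
    spark_type=SparkType.SCALA,
    application_file="s3://my-bucket/jars/analysis-assembly.jar",  # hypothetical JAR path
    main_class="Main",
    spark_conf={},
    hadoop_conf={},
    executor_path="",
    # Assumption: if databricks_conf maps onto the Runs Submit payload, then
    # spark_jar_task.parameters is where CLI args would be injected
    # (parameters IS a real field of the Databricks Jobs API's spark_jar_task).
    databricks_conf={
        "run_name": "flyte-scala-movielens",  # hypothetical run name
        "new_cluster": {
            "spark_version": "11.3.x-scala2.12",
            "node_type_id": "i3.xlarge",
            "num_workers": 2,
        },
        "spark_jar_task": {
            "main_class_name": "Main",
            "parameters": ["--input", "s3://my-bucket/movielens/", "--mode", "batch"],
        },
    },
    databricks_token="<REDACTED>",
    databricks_instance="<my-workspace>.cloud.databricks.com",
)
```

If someone can confirm whether the plugin forwards `databricks_conf` to the Runs Submit endpoint like this, that would answer the question.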