# ask-the-community
a
Hi, we are using the Databricks built-in plugin in our shop and there seems to be an issue with the applications_path parameter passed in the Databricks workflow config. We are trying to override the entrypoint file location using applications_path, but the overridden value is not being picked up and always defaults to the value in the plugin config. It looks like we can no longer override the entrypoint via config from the workflow; it is currently set to whatever is configured server-side.
k
Would need more info. This is not using the agent yet, I assume?
a
No this is not using the agent
this is using the databricks built-in plugin
k
You can still override the application path. Here is an example:
```python
from flytekit import Resources, task
from flytekitplugins.spark import Spark

@task(
    task_config=Spark(
        # this configuration is applied to the spark cluster
        spark_conf={
            "spark.driver.memory": "1000M",
            "spark.executor.memory": "1000M",
            "spark.executor.cores": "1",
            "spark.executor.instances": "2",
            "spark.driver.cores": "1",
        },
        executor_path="/usr/bin/python3",
        applications_path="local:///usr/local/bin/entrypoint.py",
    ),
    limits=Resources(mem="2000M"),
    cache_version="1",
    container_image=spark_image,  # spark_image is defined elsewhere
)
def hello_spark(partitions: int) -> float:
    ...
```
a
@Kevin Su we are trying to do the override in the Databricks workflow config because we want to run our tasks on our Databricks instances:
```python
from flytekit import Resources, task
from flytekitplugins.spark import Databricks

@task(
    task_config=Databricks(
        # this configuration is applied to the spark cluster
        spark_conf={
            "spark.driver.memory": "1000M",
            "spark.executor.memory": "1000M",
            "spark.executor.cores": "1",
            "spark.executor.instances": "2",
            "spark.driver.cores": "1",
        },
        databricks_conf={
            ...
        },
        applications_path="dbfs:///Filestore/tables/entrypoint.py",
    ),
    limits=Resources(mem="2000M"),
    cache_version="1",
)
def hello_spark(partitions: int) -> float:
    ...
```
The entrypoint override through applications_path is not being picked up and is always using the location specified in the plugin config (values.yaml)
We have tried it multiple times and, trust me, the applications_path provided in the Databricks workflow config is simply being ignored.
k
For the backend plugin, instead of setting applications_path in the Databricks task config, you need to set the path in the propeller config:
```yaml
plugins:
  databricks:
    entrypointFile: dbfs:///FileStore/tables/entrypoint.py
    databricksInstance: <DATABRICKS_ACCOUNT>.cloud.databricks.com
  k8s:
    default-env-vars:
      - FLYTE_AWS_ACCESS_KEY_ID: <AWS_ACCESS_KEY_ID>
      - FLYTE_AWS_SECRET_ACCESS_KEY: <AWS_SECRET_ACCESS_KEY>
      - AWS_DEFAULT_REGION: <AWS_REGION>
```
a
Yeah @Kevin Su, understood, but there is an issue using the recommended entrypoint script across two different Databricks runtimes. The entrypoint script provided in the Flyte documentation works with the latest Databricks runtimes (version 11 and above), but it throws an error on older runtimes like 10.4, so for 10.4 we had to switch to the older version of your entrypoint script. To switch between different versions of the entrypoint script, we wanted to use the applications_path parameter so we can override the entrypoint script path; a sketch of what we have in mind follows below.
I haven't looked too deeply into the entrypoint script, so I'm not sure why it behaves differently between Databricks runtimes.
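For context, this is roughly the kind of selection we were hoping to do if the per-task override worked. It is only a hypothetical sketch: the DBFS paths, the ENTRYPOINTS map, and the databricks_config helper are made up for illustration; only the Databricks task config and applications_path come from the example above.
```python
# Hypothetical sketch: pick the entrypoint script to match the target
# Databricks runtime and pass it via applications_path.
from flytekit import Resources, task
from flytekitplugins.spark import Databricks

# Made-up DBFS locations for the two versions of the entrypoint script.
ENTRYPOINTS = {
    "10.4": "dbfs:///FileStore/tables/entrypoint_legacy.py",
    "12.2": "dbfs:///FileStore/tables/entrypoint.py",
}

def databricks_config(runtime: str) -> Databricks:
    # Build a task config whose applications_path matches the runtime.
    return Databricks(
        spark_conf={"spark.driver.memory": "1000M"},
        databricks_conf={},
        applications_path=ENTRYPOINTS[runtime],
    )

@task(task_config=databricks_config("10.4"), limits=Resources(mem="2000M"))
def hello_spark(partitions: int) -> float:
    ...
```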
k
This is the entrypoint file: https://github.com/flyteorg/flytetools/commit/aff8a9f2adbf5deda81d36d59a0b8fa3b1fc3679. It allows Flyte to use a different command (pyflyte-execute) to run a Spark job on the Databricks platform.
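Conceptually, it is a small shim: Databricks launches the script, and the script hands the task command off to pyflyte-execute. A minimal sketch of that idea only, assuming simple argument forwarding (this is not the actual contents of the linked file):
```python
# Sketch of the entrypoint idea: forward whatever arguments Databricks
# passes to this script straight to pyflyte-execute.
import subprocess
import sys

def main() -> None:
    # Databricks passes the serialized Flyte task command as arguments;
    # hand them off to pyflyte-execute unchanged.
    subprocess.run(["pyflyte-execute", *sys.argv[1:]], check=True)

if __name__ == "__main__":
    main()
```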
What’s the error?
a
Yeah, I think this is the entrypoint file we are using now, and it works fine with Databricks runtime 12.2. But with Databricks runtime 10.4 and this entrypoint script we were getting weird module import errors: modules that were part of the application source code failed to import, and switching to the older entrypoint script (from about 11 months ago) worked with 10.4.
I am not able to share the exact error message here because it's been quite some time, but either way, is it possible to enable the applications_path override parameter through the Databricks config? It could be beneficial in the future, for what it's worth.
k
Are you using pyflyte run?
a
we are doing pyflyte package + flytectl register
we are also doing --fast packaging
k
There are some issues with using fast register with Spark tasks.
Have you tried a non-fast-registered Spark task?
a
We can try that
For fast packaging we are providing the destination dir (/databricks/driver) so Flyte knows where to inject the source code at run time, but I don't think that is needed for non-fast; we just have to make sure the application source code is copied to the /databricks/driver directory in the image. Right?
s
yes, that should work.