Hello, does anyone know or hear that databricks plugin has been… | Flyte #flyte-support

salmon-refrigerator-32115

11/07/2023, 5:21 PM

Hello, does anyone know, or has anyone heard, whether the databricks_plugin has been used by any organizations in their production environments? https://docs.flyte.org/en/latest/deployment/plugins/webapi/databricks.html#databricks-plugin

freezing-airport-6809

11/07/2023, 5:49 PM

@salmon-refrigerator-32115 we are also migrating to Databricks agents, as they can run both locally and remotely. They are much simpler to update (since they are written in Python) and offer similar power.
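
A minimal sketch of why an agent can run both locally and remotely, assuming the create / poll / cancel shape that Flyte's async agents are built around: the agent is an ordinary Python object that submits a job, checks its status, and can cancel it, so the same code can be driven by a local run or by the deployed agent service. `FakeJobService`, `HypotheticalAgent` and `run_locally` are hypothetical names for illustration only; they are not flytekit APIs, and the real agent base classes have different signatures.

```python
import itertools
from enum import Enum


class Phase(Enum):
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    CANCELLED = "CANCELLED"


class FakeJobService:
    """Hypothetical in-memory stand-in for a remote job service such as Databricks."""

    def __init__(self, polls_until_done: int = 2):
        self._ids = itertools.count(1)
        self._remaining = {}  # job_id -> status polls left before the job succeeds
        self._cancelled = set()
        self._polls_until_done = polls_until_done

    def submit(self, payload: dict) -> int:
        # The payload is ignored by this fake; a real service would launch a run.
        job_id = next(self._ids)
        self._remaining[job_id] = self._polls_until_done
        return job_id

    def status(self, job_id: int) -> Phase:
        if job_id in self._cancelled:
            return Phase.CANCELLED
        if self._remaining[job_id] > 0:
            self._remaining[job_id] -= 1
            return Phase.RUNNING
        return Phase.SUCCEEDED

    def cancel(self, job_id: int) -> None:
        self._cancelled.add(job_id)


class HypotheticalAgent:
    """Illustrative create/get/delete agent; NOT the flytekit agent API."""

    def __init__(self, service: FakeJobService):
        self.service = service

    def create(self, task_config: dict) -> int:
        return self.service.submit(task_config)

    def get(self, job_id: int) -> Phase:
        return self.service.status(job_id)

    def delete(self, job_id: int) -> None:
        self.service.cancel(job_id)


def run_locally(agent: HypotheticalAgent, task_config: dict) -> Phase:
    """Drive the agent as a local run might: create once, then poll until not RUNNING."""
    job_id = agent.create(task_config)
    phase = agent.get(job_id)
    while phase is Phase.RUNNING:
        phase = agent.get(job_id)
    return phase
```

Swapping `FakeJobService` for a real client is where the Databricks-specific work would go; the polling loop itself would not need to change between local and remote use.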

salmon-refrigerator-32115

11/07/2023, 5:50 PM

@freezing-airport-6809, Thanks for the heads-up!

freezing-airport-6809

11/07/2023, 5:50 PM

cc @damp-lion-88352 (OSS contributor), who has also helped folks from Expedia use the agent

freezing-airport-6809

11/07/2023, 5:51 PM

we are working on making it all work locally etc

freezing-airport-6809

11/07/2023, 5:51 PM

It's a little early, but eventually you should be able to migrate to the agent without many changes and greatly simplify your testing, etc.

glamorous-carpet-83516

11/07/2023, 8:59 PM

HBO also uses it in production

glamorous-carpet-83516

11/07/2023, 9:40 PM

cc @damp-lion-88352, could you share the Databricks docs?

salmon-refrigerator-32115

11/07/2023, 9:48 PM

Hi @glamorous-carpet-83516, I represent HBO. Unfortunately, Evan Sadler only tried it in Dev before he left. And I don’t know if it’s working in Dev either.

damp-lion-88352

11/08/2023, 2:51 AM

https://flyte--4008.org.readthedocs.build/en/4008/deployment/agents/databricks.html#deployment-agent-setup-databricks

damp-lion-88352

11/08/2023, 2:51 AM

https://docs.flyte.org/en/latest/deployment/plugins/webapi/databricks.html#deployment-plugin-setup-webapi-databricks

damp-lion-88352

11/08/2023, 2:52 AM

The first one is for the agent, the second one is for the plugin.

damp-lion-88352

11/08/2023, 2:52 AM

Do you need help to set up the databricks plugin?

salmon-refrigerator-32115

11/08/2023, 2:54 AM

Hi @damp-lion-88352, Yes.

damp-lion-88352

11/08/2023, 2:55 AM

I recommend you use the databricks_plugin doc for now; the agent version hasn't been merged yet.

damp-lion-88352

11/08/2023, 2:55 AM

https://github.com/flyteorg/flyte/pull/4361

damp-lion-88352

11/08/2023, 2:55 AM

This PR can help you figure out more details about how to set it up.

damp-lion-88352

11/08/2023, 2:56 AM

If you need more help, please describe your problem; Kevin and I will try our best to help you.

salmon-refrigerator-32115

11/08/2023, 8:04 PM

Hi @damp-lion-88352, @glamorous-carpet-83516, before I decide whether to migrate my production workflow from the open-source k8s Spark operator implementation to the Databricks agent or plugin, may I confirm a few things with you?
1. Is installing the k8s Spark operator still needed? (I don’t think so, but just to double-check.) https://docs.flyte.org/en/latest/deployment/plugins/k8s/index.html#install-the-kubernetes-operator
2. Are the following configurations still needed? https://docs.flyte.org/en/latest/deployment/plugins/k8s/index.html#specify-plugin-configuration Specifically, this:

cluster_resource_manager:    # <- Is this needed?
  enabled: true
  config:
    cluster_resources:
      refreshInterval: 5m
      templatePath: "/etc/flyte/clusterresource/templates"
      customData:
        - production:
            - projectQuotaCpu:
                value: "5"
            - projectQuotaMemory:
                value: "4000Mi"
        - staging:
            - projectQuotaCpu:
                value: "2"
            - projectQuotaMemory:
                value: "3000Mi"
        - development:
            - projectQuotaCpu:
                value: "4"
            - projectQuotaMemory:
                value: "3000Mi"
      refresh: 5m

  # -- Resource templates that should be applied
  templates:
    # -- Template for namespaces resources
    - key: aa_namespace
      value: |
        apiVersion: v1
        kind: Namespace
        metadata:
          name: {{ namespace }}
        spec:
          finalizers:
          - kubernetes

    - key: ab_project_resource_quota
      value: |
        apiVersion: v1
        kind: ResourceQuota
        metadata:
          name: project-quota
          namespace: {{ namespace }}
        spec:
          hard:
            limits.cpu: {{ projectQuotaCpu }}
            limits.memory: {{ projectQuotaMemory }}

    - key: ac_spark_role     # <- Is this needed?
      value: |
        apiVersion: rbac.authorization.k8s.io/v1beta1
        kind: Role
        metadata:
          name: spark-role
          namespace: {{ namespace }}
        rules:
        - apiGroups: ["*"]
          resources:
          - pods
          verbs:
          - '*'
        - apiGroups: ["*"]
          resources:
          - services
          verbs:
          - '*'
        - apiGroups: ["*"]
          resources:
          - configmaps
          verbs:
          - '*'

    - key: ad_spark_service_account     # <- Is this needed?
      value: |
        apiVersion: v1
        kind: ServiceAccount
        metadata:
          name: spark
          namespace: {{ namespace }}

    - key: ae_spark_role_binding     # <- Is this needed?
      value: |
        apiVersion: rbac.authorization.k8s.io/v1beta1
        kind: RoleBinding
        metadata:
          name: spark-role-binding
          namespace: {{ namespace }}
        roleRef:
          apiGroup: rbac.authorization.k8s.io
          kind: Role
          name: spark-role
        subjects:
        - kind: ServiceAccount
          name: spark
          namespace: {{ namespace }}

sparkoperator:   # <- Is this needed?
  enabled: true
  plugin_config:
    plugins:
      spark:
        # Edit the Spark configuration as you see fit
        spark-config-default:
          - spark.driver.cores: "1"
          - spark.hadoop.fs.s3a.aws.credentials.provider: "com.amazonaws.auth.DefaultAWSCredentialsProviderChain"
          - spark.kubernetes.allocation.batch.size: "50"
          - spark.hadoop.fs.s3a.acl.default: "BucketOwnerFullControl"
          - spark.hadoop.fs.s3n.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
          - spark.hadoop.fs.AbstractFileSystem.s3n.impl: "org.apache.hadoop.fs.s3a.S3A"
          - spark.hadoop.fs.s3.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
          - spark.hadoop.fs.AbstractFileSystem.s3.impl: "org.apache.hadoop.fs.s3a.S3A"
          - spark.hadoop.fs.s3a.impl: "org.apache.hadoop.fs.s3a.S3AFileSystem"
          - spark.hadoop.fs.AbstractFileSystem.s3a.impl: "org.apache.hadoop.fs.s3a.S3A"
          - spark.network.timeout: 600s
          - spark.executorEnv.KUBERNETES_REQUEST_TIMEOUT: 100000
          - spark.executor.heartbeatInterval: 60s

Also, what does the equivalent Databricks plugin configuration look like in the Helm chart? Could you point me to an example?
3. How is Databricks Spark job logging integrated with the Flyte UI?
4. When running a Flyte Spark task on the server, is the --service-account spark option still required?
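
On the `cluster_resource_manager` part of question 2: the templates above contain `{{ namespace }}`, `{{ projectQuotaCpu }}` and `{{ projectQuotaMemory }}` placeholders. The cluster resource controller is understood (an assumption, not confirmed in this thread) to render each template once per project-domain namespace, filling the quota placeholders from the matching `customData` entry. The sketch below mimics only that substitution step in plain Python. It assumes the default `<project>-<domain>` namespace naming, ignores the refresh loop, and never talks to a cluster; `CUSTOM_DATA`, `QUOTA_TEMPLATE` and `render` are hypothetical names.

```python
import re

# Per-domain values, mirroring the customData block in the config above.
CUSTOM_DATA = {
    "production": {"projectQuotaCpu": "5", "projectQuotaMemory": "4000Mi"},
    "staging": {"projectQuotaCpu": "2", "projectQuotaMemory": "3000Mi"},
    "development": {"projectQuotaCpu": "4", "projectQuotaMemory": "3000Mi"},
}

# Same shape as the ab_project_resource_quota template above.
QUOTA_TEMPLATE = """apiVersion: v1
kind: ResourceQuota
metadata:
  name: project-quota
  namespace: {{ namespace }}
spec:
  hard:
    limits.cpu: {{ projectQuotaCpu }}
    limits.memory: {{ projectQuotaMemory }}
"""

# Matches "{{ name }}" with optional spaces around the variable name.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, project: str, domain: str) -> str:
    """Fill {{ var }} placeholders for one project-domain pair (hypothetical helper)."""
    # Assumption: namespace defaults to "<project>-<domain>".
    values = {"namespace": f"{project}-{domain}", **CUSTOM_DATA[domain]}

    def substitute(match):
        # Raises KeyError if a template uses a variable with no value for this domain.
        return values[match.group(1)]

    return PLACEHOLDER.sub(substitute, template)
```

For example, `render(QUOTA_TEMPLATE, "flytesnacks", "staging")` yields a quota in the `flytesnacks-staging` namespace with `limits.cpu: 2` and `limits.memory: 3000Mi`.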

salmon-refrigerator-32115

12/11/2023, 7:13 PM

Hi @damp-lion-88352, @glamorous-carpet-83516, Is the agent version of databricks available now?

glamorous-carpet-83516

12/11/2023, 7:14 PM

Yes, we just merged the Databricks agent PR.

glamorous-carpet-83516

12/11/2023, 7:14 PM

flytekit also supports submitting a Databricks job during local execution

glamorous-carpet-83516

12/11/2023, 7:14 PM

so you can easily test it locally
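
For a rough idea of what submitting a Databricks job involves when testing locally, here is a sketch that builds, but does not send, a request to the Databricks Jobs API `runs/submit` endpoint using a personal access token. This is an illustration under assumptions, not what flytekit does internally: the `/api/2.1/jobs/runs/submit` path and the Bearer-token header follow Databricks' public REST API, while `build_submit_request`, the workspace host, the token and the payload values are placeholders you would replace with your own `databricks_conf`.

```python
import json
import urllib.request


def build_submit_request(instance: str, token: str, databricks_conf: dict) -> urllib.request.Request:
    """Build (without sending) a POST to the Databricks Jobs API runs/submit endpoint.

    `instance` is the workspace host; `databricks_conf` is the job payload,
    passed through unchanged as the JSON body. Hypothetical helper.
    """
    url = f"https://{instance}/api/2.1/jobs/runs/submit"
    body = json.dumps(databricks_conf).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # Databricks personal access token
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder payload in the Jobs API 2.1 multi-task form; values are not recommendations.
example_conf = {
    "run_name": "flyte-local-test",
    "tasks": [
        {
            "task_key": "main",
            "new_cluster": {"spark_version": "<spark-version>", "node_type_id": "<node-type>", "num_workers": 1},
            "spark_python_task": {"python_file": "dbfs:/path/to/entrypoint.py"},
        }
    ],
}
example_request = build_submit_request("my-workspace.cloud.databricks.com", "dapi-placeholder", example_conf)
```

Sending it (for example with `urllib.request.urlopen`) would require a real workspace and token; keeping construction separate makes the payload easy to inspect in a local test.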

salmon-refrigerator-32115

12/11/2023, 7:15 PM

Great news. @glamorous-carpet-83516, which one should I use, the agent or the plugin?

glamorous-carpet-83516

12/11/2023, 7:17 PM

The agent. We will deprecate the backend plugins, and the agents are well maintained right now.

glamorous-carpet-83516

12/11/2023, 7:17 PM

check out the example in the PR description. https://github.com/flyteorg/flytekit/pull/1951

salmon-refrigerator-32115

12/11/2023, 7:20 PM

Nice. @glamorous-carpet-83516, could you point me to the docs for installing the Databricks agent on the k8s backend?

glamorous-carpet-83516

12/11/2023, 7:21 PM

yes, are you deploying flyte on EKS?

salmon-refrigerator-32115

12/11/2023, 7:21 PM

yes

salmon-refrigerator-32115

12/11/2023, 7:21 PM

through helm charts

glamorous-carpet-83516

12/11/2023, 7:22 PM

okok

glamorous-carpet-83516

12/11/2023, 7:49 PM

1. Enable the agent here
2. Update the plugin config (check it out here)
3. Update the agent secret
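
The three steps above, sketched as edits to a Helm values dict plus the Secret for step 3. The key paths (`flyteagent.enabled`, `configmap.enabled_plugins.tasks.task-plugins`), the `agent-service` plugin name, the Secret name `flyteagent` and the `flyte_databricks_access_token` key are assumptions modeled on the flyte-core chart and agent docs of this period, so check them against the chart version you deploy. The one fixed requirement, set by Kubernetes itself, is that values under a Secret's `data` field are base64-encoded, which is what `agent_secret_manifest` does. Both function names are hypothetical.

```python
import base64
import copy


def enable_databricks_agent(values: dict) -> dict:
    """Return a copy of Helm values with the agent enabled and Spark tasks routed to it.

    All key paths are assumptions modeled on the flyte-core chart; verify them
    against the chart version you deploy.
    """
    out = copy.deepcopy(values)  # leave the caller's values untouched

    # Step 1: enable the agent deployment (assumed key: flyteagent.enabled).
    out.setdefault("flyteagent", {})["enabled"] = True

    # Step 2: register the agent plugin and make it the default for "spark" tasks.
    task_plugins = (
        out.setdefault("configmap", {})
        .setdefault("enabled_plugins", {})
        .setdefault("tasks", {})
        .setdefault("task-plugins", {})
    )
    enabled = task_plugins.setdefault("enabled-plugins", [])
    if "agent-service" not in enabled:
        enabled.append("agent-service")
    # Assumes default-for-task-types is a map (task type -> plugin name).
    task_plugins.setdefault("default-for-task-types", {})["spark"] = "agent-service"
    return out


def agent_secret_manifest(token: str, namespace: str = "flyte") -> dict:
    """Step 3: a Secret holding the Databricks token (name and key are assumptions)."""
    encoded = base64.b64encode(token.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": "flyteagent", "namespace": namespace},
        "type": "Opaque",
        "data": {"flyte_databricks_access_token": encoded},  # Secret data values must be base64
    }
```

The values function is idempotent (running it twice does not add `agent-service` twice), which makes it safe to apply to values files that may already be partially updated.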

salmon-refrigerator-32115

12/11/2023, 7:50 PM

Thanks

glamorous-carpet-83516

12/11/2023, 7:53 PM

Here is the Dockerfile for the agent. https://github.com/flyteorg/flytekit/blob/master/Dockerfile.agent
