Databricks


Setting up Databricks for Data Observability

Welcome to our comprehensive guide on integrating our product with Databricks for enhanced data observability. This step-by-step tutorial is designed to facilitate a smooth setup process, ensuring you can leverage the full potential of Databricks in monitoring and managing your data ecosystem. Here's what we'll cover:

  1. Setting Up Your Databricks Connection: Learn how to securely connect to Databricks using an access token. This section will walk you through obtaining your Databricks access token and using it to establish a connection. (Estimated Task Time: 5 minutes)
  2. Configuring Your Databricks Notebook: We'll guide you through the quick and straightforward process of configuring the Databricks notebook to work seamlessly with our product. (Estimated Task Time: 2 minutes)

Prerequisite

Before diving into the integration process, you'll need a Databricks Access Token. This token serves as your key to connecting our product with Databricks. You can obtain an access token by navigating to your Databricks workspace: click on your user name, select "User Settings", then "Developer", and go to "Access Tokens", where you can manage and generate new tokens. Additionally, Databricks offers API options for token generation, which you can explore in the official Databricks documentation.
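
If you prefer to create the token programmatically, the sketch below illustrates a call to the Databricks Token API (POST /api/2.0/token/create). It is an illustration only: the workspace URL, the existing credential used to authenticate the call, and the token comment and lifetime are placeholder values you would replace with your own.

  # Illustrative sketch: create a Databricks personal access token via the REST API.
  # Assumes you already have some credential (e.g. an existing token) to authenticate.
  import requests

  DATABRICKS_HOST = "https://adb-my-hostname.0.azuredatabricks.net"  # your workspace URL
  EXISTING_CREDENTIAL = "<existing-token>"  # placeholder

  resp = requests.post(
      f"{DATABRICKS_HOST}/api/2.0/token/create",
      headers={"Authorization": f"Bearer {EXISTING_CREDENTIAL}"},
      json={"comment": "Kensu integration", "lifetime_seconds": 90 * 24 * 3600},
  )
  resp.raise_for_status()
  new_token = resp.json()["token_value"]  # shown only once -- store it securely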

Configuration





Connect Kensu to Databricks

To install the Kensu Spark agent on multiple Databricks clusters at once, follow these steps:

  1. Open the Kensu app and navigate to the "Collectors" tab.
  2. Configure a new Databricks connection:
    1. Workspace host is the base part of the URL of your Databricks instance, e.g. https://adb-my-hostname.0.azuredatabricks.net
    2. Databricks Access Token: the token obtained as described in the Prerequisite section above
Creating a Databricks connection


  3. Select the clusters that you want Kensu to track.

Selecting the clusters


  4. Restart the Databricks clusters you want to observe to enable the configuration. You can restart them from the Databricks UI, or via the REST API as sketched below.
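
If many clusters are involved, restarting them through the Databricks Clusters API can be quicker than doing it one by one in the UI. The snippet below is a sketch only: the workspace URL, access token, and cluster IDs are placeholders, and it assumes the clusters are currently running (terminated clusters are started with the clusters/start endpoint instead).

  # Illustrative sketch: restart the selected clusters via the Databricks Clusters API.
  import requests

  DATABRICKS_HOST = "https://adb-my-hostname.0.azuredatabricks.net"  # your workspace URL
  TOKEN = "<databricks-access-token>"  # placeholder
  CLUSTER_IDS = ["0101-120000-abcd1234"]  # placeholder IDs of the clusters tracked by Kensu

  for cluster_id in CLUSTER_IDS:
      resp = requests.post(
          f"{DATABRICKS_HOST}/api/2.0/clusters/restart",
          headers={"Authorization": f"Bearer {TOKEN}"},
          json={"cluster_id": cluster_id},
      )
      resp.raise_for_status()
      print(f"Restart requested for cluster {cluster_id}")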



Note that if you are using a Databricks Unity-enabled cluster (i.e. one whose default catalog is a Unity catalog), these clusters are more restricted for security reasons, so you will need to add an extra Scala code cell at the end of each notebook that you want to be tracked by the Kensu agent.

Databricks cell at the end of the notebook




Fine-tuning agent configuration for each notebook

Once you have run a notebook for the first time, an entry will appear in the Kensu app's "Collectors => Integrations" tab. From there you can manage the Kensu agent configuration for that notebook by clicking on the edit icon to:

  • provide agent configuration parameters (mostly the same as in conf.ini as described in the general Kensu Spark agent documentation, except that you do not need to provide kensu_ingestion_url or kensu_ingestion_token when configuring from the Kensu app)
  • select a Kensu application group and token, which can be used to manage who has access to the metadata about this notebook in Kensu
Select a notebook to set configuration for


Once you set the application group for the first time, the earlier ingested entities (without an explicit application group) will be hidden, because without an application group the accessibility/privacy of information about the notebook is undefined.

Saving agent configuration parameters and application group


Configuration parameters

Default values: the following values will be used if not set explicitly in the configuration:

  • process_name = <Notebook name>
  • project_name = <Notebook name>
  • code_location = <Notebook path>
  • user_name = <Notebook user>

The rest of the configuration options behave the same as in conf.ini, described in the general Kensu Spark agent documentation.
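
As an illustration only, an agent configuration for a notebook could look roughly like the following key/value pairs. The values are made-up examples, the exact file layout may differ from your conf.ini, and kensu_ingestion_url / kensu_ingestion_token can be omitted when configuring from the Kensu app:

  ; Illustrative sketch of agent configuration parameters (conf.ini style).
  ; Keys that are omitted fall back to the defaults listed above.
  process_name = daily_sales_report
  project_name = sales_analytics
  code_location = /Repos/analytics/daily_sales_report
  user_name = data.engineer@example.com
  ; Only needed when not configuring from the Kensu app:
  ; kensu_ingestion_url = <your-kensu-ingestion-url>
  ; kensu_ingestion_token = <your-kensu-ingestion-token>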

Supported Spark & Databricks cluster versions

Spark version | Databricks runtime version | Status                     | Unity supported?
3.0.1         | 7.3 LTS                    | ⚠️ partial support         | N/A
3.1.2         | 9.1 LTS                    | ✅ ⚠️ mostly OK            | N/A
3.2.1         | 10.4 LTS                   | ❌ not supported currently | N/A
3.3.0         | 11.3 LTS                   |                            |
3.3.1         | 12.0                       |                            |
3.3.1         | 12.1                       |                            |
3.3.2         | 12.2 LTS                   |                            |
3.4.0         | 13.0                       |                            |
3.4.0         | 13.1                       |                            |
3.4.0         | 13.2 Beta                  |                            |
3.4.1         | 13.3 LTS                   | ⚠️ seems to work OK, using the 3.4.0 jar (a 3.4.1 jar is being prepared for release) |
3.5.0         | 14.0                       | ✅ ⚠️ works (needs the unreleased spark-agent v1.3.1 jar for Spark 3.5.0) |

* ML clusters were not tested.

More versions may be supported, but they either currently have some limitations or have not yet been tested. Contact Kensu customer support if needed.

How does it work internally?

The Kensu Databricks agent works by creating a Databricks global cluster init script which, if the cluster is enabled to be tracked by Kensu, does the following (a rough sketch of the effect is given after this list):

  • adds the Kensu jar to the Spark classpath
  • registers a Kensu Spark listener to be called by Apache Spark
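
For illustration only, the net effect of such an init script is roughly equivalent to placing the agent jar on the cluster and setting Spark properties like the ones below. The jar path and listener class name shown here are hypothetical placeholders, not the actual values used by the Kensu-generated script:

  # Hypothetical illustration of spark-conf properties an init script could set.
  # The jar location and listener class name are placeholders, not Kensu's real ones.
  spark.driver.extraClassPath /databricks/jars/kensu-spark-agent.jar
  spark.extraListeners com.example.HypotheticalKensuSparkListener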



Updated 06 Feb 2024