Databricks
Welcome to our comprehensive guide on integrating our product with Databricks for enhanced data observability. This step-by-step tutorial is designed to facilitate a smooth setup process, ensuring you can leverage the full potential of Databricks in monitoring and managing your data ecosystem. Here's what we'll cover:
- Setting Up Your Databricks Connection: Learn how to securely connect to Databricks using an access token. This section will walk you through obtaining your Databricks access token and using it to establish a connection. (Estimated Task Time: 5 minutes)
- Configuring Your Databricks Notebook: We'll guide you through the quick and straightforward process of configuring the Databricks notebook to work seamlessly with our product. (Estimated Task Time: 2 minutes)
Before diving into the integration process, you'll need a Databricks Access Token. This token serves as your key to connecting our product with Databricks. To obtain one, open your Databricks workspace, click on your username, select "User Settings", then "Developer", and go to "Access Tokens", where you can manage existing tokens and generate new ones. Databricks also offers API options for token generation, which you can explore in the official Databricks documentation.
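Before pasting a token into the Kensu app, it can help to confirm that the workspace accepts it. The following is a minimal sketch, not part of the Kensu product: it assumes the Databricks REST endpoint `/api/2.0/clusters/list` and bearer-token authentication, and the function names and the example token value are hypothetical.

```python
import urllib.error
import urllib.request


def build_clusters_list_request(workspace_host: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request to the Clusters API list endpoint."""
    # Assumption: the token is sent as a standard "Bearer" Authorization header.
    url = workspace_host.rstrip("/") + "/api/2.0/clusters/list"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


def token_is_accepted(workspace_host: str, access_token: str) -> bool:
    """Send the request and report whether the workspace answered with HTTP 200.

    HTTP errors (for example 401 or 403) are reported as False.
    Network failures (urllib.error.URLError) are left to propagate.
    """
    request = build_clusters_list_request(workspace_host, access_token)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        return False
```

Calling `token_is_accepted("https://adb-my-hostname.0.azuredatabricks.net", "<your token>")` performs a real network call, so run it only against a workspace you are allowed to access.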
To install the Kensu Spark agent on multiple Databricks clusters at once, follow these steps:
1. Open the Kensu app and navigate to the "Collectors" tab.
2. Configure a new Databricks connection by providing:
   - Workspace host: the base part of the URL of your Databricks instance, e.g. https://adb-my-hostname.0.azuredatabricks.net
   - Databricks Access Token
3. Select the clusters that you want Kensu to track.
4. Restart the Databricks clusters you want to observe so that the configuration takes effect.
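The workspace host is only the scheme and host name of the workspace URL, so anything copied from the browser address bar after the host (path, query string, fragment) should be dropped. As a small illustration, here is a sketch of that trimming; the helper name `workspace_host` is hypothetical and is not a Kensu or Databricks API.

```python
from urllib.parse import urlsplit


def workspace_host(url: str) -> str:
    """Return the scheme and host of a Databricks URL, dropping path, query, and fragment."""
    parts = urlsplit(url.strip())
    if not parts.scheme or not parts.netloc:
        # A bare host name without "https://" is rejected rather than guessed at.
        raise ValueError(f"not an absolute URL: {url!r}")
    return f"{parts.scheme}://{parts.netloc}"


# Example: a URL copied from the browser while a notebook is open.
print(workspace_host("https://adb-my-hostname.0.azuredatabricks.net/?o=123#notebook/456"))
```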
Note that if you are using a Unity Catalog-enabled Databricks cluster (also called default catalog), which is more restricted for security reasons, you will need to add an extra Scala code cell at the end of each notebook that you want the Kensu agent to track.
Once you have run a notebook for the first time, an entry will appear in the Kensu app's "Collectors => Integrations" tab. From there you can manage the Kensu agent configuration for that notebook by clicking the edit icon to:
- provide agent configuration parameters (mostly the same as in conf.ini, as described in the general Kensu Spark agent documentation, except that you do not need to provide kensu_ingestion_url or kensu_ingestion_token when configuring from the Kensu app)
- select a Kensu application group and token, which can be used to manage who can see the metadata about this notebook in Kensu
Once you set the application group for the first time, the entities ingested earlier (without an explicit application group) will be hidden, because without an application group the accessibility and privacy of information about the notebook are undefined.
Default values: the following values are used when they are not set explicitly in the active configuration:
- process_name = <Notebook name>
- project_name = <Notebook name>
- code_location = <Notebook path>
- user_name = <Notebook user>
The remaining configuration options behave the same as in conf.ini, as described in the general Kensu Spark agent documentation.
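The fallback behaviour described above can be pictured as explicit settings layered over notebook-derived defaults. The sketch below only illustrates that rule; it is not the agent's actual implementation, and the function name `effective_config` and its parameters are hypothetical.

```python
def effective_config(explicit: dict, notebook_name: str, notebook_path: str, notebook_user: str) -> dict:
    """Merge explicitly configured values over the notebook-derived defaults."""
    defaults = {
        "process_name": notebook_name,
        "project_name": notebook_name,
        "code_location": notebook_path,
        "user_name": notebook_user,
    }
    # Keys present in `explicit` win; missing keys fall back to the defaults.
    return {**defaults, **explicit}


# Example with made-up notebook details: only project_name is set explicitly.
config = effective_config(
    {"project_name": "Sales reporting"},
    notebook_name="daily_sales",
    notebook_path="/Users/someone@example.com/daily_sales",
    notebook_user="someone@example.com",
)
print(config["process_name"], "|", config["project_name"])
```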
Spark version | Databricks runtime version | Status | Unity supported? |
---|---|---|---|
3.0.1 | 7.3 LTS | ⚠️ partial support | N/A |
3.1.2 | 9.1 LTS | ✅ ⚠️ mostly OK | N/A |
3.2.1 | 10.4 LTS | ❌ not supported currently | N/A |
3.3.0 | 11.3 LTS | ✅ | ✅ |
3.3.1 | 12.0 | ✅ | ✅ |
3.3.1 | 12.1 | ✅ | ✅ |
3.3.2 | 12.2 LTS | ✅ | ✅ |
3.4.0 | 13.0 | ✅ | ✅ |
3.4.0 | 13.1 | ✅ | ✅ |
3.4.0 | 13.2 Beta | ✅ | ✅ |
3.4.1 | 13.3 LTS | ✅ ⚠️ seems to work OK using the 3.4.0 jar (3.4.1 jar being prepared for release) | ✅ |
3.5.0 | 14.0 | ✅ ⚠️ works (needs the unreleased spark-agent v1.3.1 jar for Spark 3.5.0) | ✅ |
*ML clusters were not tested
Other versions may work, but they either have known limitations or have not yet been tested. Contact Kensu customer support if needed.
The Kensu Databricks agent works by creating a Databricks global cluster init script which, if Kensu tracking is enabled for the cluster:
- adds a Kensu jar to the Spark classpath
- registers a Kensu Spark listener to be called by Apache Spark