The integration between the Databricks and Kensu platforms brings mutual customers enhanced data observability and metadata automation, complementing the capabilities of Databricks Unity Catalog.
This integration lets data teams deploying Databricks jobs automatically harvest metadata, lineage (traces), and data metrics while the deployed Spark jobs execute, supporting data quality, performance, and compliance. Key capabilities include:
- Automated metadata harvesting and lineage
- Comprehensive data source support
- Automatic computation of data metrics
- Support for batch and streaming jobs
- Runtime discrepancy detection and recommendations
- Streamlined root cause analysis
- Native circuit-breaking support
Installing the integration takes only the following few steps:
1 - Log in to your Kensu instance
2 - On the sidebar, navigate to Collectors > Configure a connection
3 - Click on the Databricks Logo
4 - Provide your Databricks information requested in the previous section
5 - Then click on Connect
6 - The Kensu interface will list all clusters of your Databricks account, so you can select the cluster you want Kensu to observe the data usages. Click on Configure to validate your choice.
7 - You can also set Kensi on all new clusters that will be created in your Databricks account, to ensure all data usage are observed by Kensu. For this, switch the button Enable Kensu on every new cluster by default, then click on Configure.
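If you prefer to identify the cluster programmatically before step 6, a minimal sketch using the standard Databricks REST API is shown below; the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables are placeholders for your own workspace URL and personal access token.

```python
# Illustrative sketch: lists the clusters in your workspace so you can find
# the ID and name of the cluster you want Kensu to observe (step 6).
# Uses the standard Databricks REST API; host and token values are placeholders.
import os
import requests

host = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.cloud.databricks.com
token = os.environ["DATABRICKS_TOKEN"]  # a Databricks personal access token

resp = requests.get(
    f"{host}/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()

for cluster in resp.json().get("clusters", []):
    print(cluster["cluster_id"], cluster["cluster_name"], cluster["state"])
```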
If you selected an existing cluster, you need to restart it for the Kensu integration to be enabled. On restart, the appropriate Kensu JAR (agent) is attached to the cluster and the kensu Python module is installed.
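Once the cluster has restarted, you can optionally verify from a notebook cell that the kensu Python module was installed; a minimal sanity check:

```python
# Minimal sanity check, run in a notebook cell on the restarted cluster:
# verifies that the kensu Python module installed by the integration is importable.
import importlib.util

if importlib.util.find_spec("kensu") is not None:
    print("kensu module is available on this cluster")
else:
    print("kensu module not found; check that the cluster was restarted after configuration")
```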
Running your notebooks on these clusters automatically tracks the usage and metrics of the data involved in those notebooks.
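For example, a typical notebook such as the following sketch needs no Kensu-specific code; the agent attached to the cluster observes its reads, transformation, and write (the table names and paths here are placeholder examples, not part of the integration):

```python
# A typical notebook: the Kensu agent attached to the cluster observes the
# two reads, the join/aggregation, and the write below, and reports their
# lineage and data metrics. Paths and column names are placeholders.
from pyspark.sql import functions as F

# `spark` is the SparkSession predefined in Databricks notebooks.
orders = spark.read.parquet("/mnt/raw/orders")          # observed input
customers = spark.read.parquet("/mnt/raw/customers")    # observed input

daily_revenue = (
    orders.join(customers, "customer_id")
    .groupBy("order_date", "country")
    .agg(F.sum("amount").alias("revenue"))
)

daily_revenue.write.mode("overwrite").parquet("/mnt/gold/daily_revenue")  # observed output
```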
Kensu will notify you of any unexpected behavior in your data.
Like most Databricks customers, you have probably started using Unity Catalog.
In that case, the agent Kensu installs on the cluster needs a dedicated hint to ensure that metrics are computed for the data sources involved in your Spark jobs.
This is done by adding a single cell at the end of your notebooks.
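The exact contents of this cell are provided in the Kensu documentation for your agent version; the snippet below is only an illustrative sketch, in which the import path and the helper name report_unity_metrics are hypothetical placeholders, not the actual Kensu API.

```python
# Illustrative sketch only: the import path and helper name below are
# hypothetical placeholders for the hint cell described above. Use the exact
# cell given in the Kensu documentation for your agent version.
from kensu.pyspark import report_unity_metrics  # hypothetical import

# Hint the Kensu agent to compute metrics for the Unity Catalog data sources
# used by this notebook's Spark jobs.
report_unity_metrics(spark)  # hypothetical call
```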