Databricks


The integration between the Databricks and Kensu platforms brings mutual customers enhanced data observability and metadata automation, complementing the capabilities of Databricks Unity.

This integration empowers data teams deploying Databricks jobs to automate the harvesting of metadata, lineage (traces), and data metrics during the execution of their Spark jobs, helping ensure data quality, performance, and compliance.

Key Features and Benefits

Automated Metadata Harvesting and Lineage

  • Streamline the collection of metadata and lineage information during Spark job execution.
  • Eliminate manual effort and potential human errors associated with capturing these critical insights.

Comprehensive Data Source Support

  • Seamlessly interact with all data source formats, whether internal or external, for read or write operations.
  • Enrich and automate visibility into any format, such as CSV, Kafka, Parquet, Delta tables, and more.

Automatic Computation of Data Metrics

  • Obtain valuable data metrics, including descriptive statistics, automatically from the involved data sources.
  • Drive data-driven decision-making by quickly accessing essential data quality indicators.

Support for Batch and Streaming Jobs

  • Ensure data observability in both batch and streaming Spark jobs.
  • Stay informed about data quality and issues in batch and streaming scenarios.

Runtime Discrepancy Detection and Recommendations

  • Detect discrepancies in the data during Spark job execution and receive timely recommendations.
  • Empower engineers and data users to address data-related issues proactively without impacting stakeholders.

Streamlined Root Cause Analysis

  • Expedite the root cause analysis process with comprehensive metadata, lineage, and data metrics.
  • Troubleshoot and debug data issues efficiently, minimizing the risk of user disappointment and bad decisions.

Native Circuit-Breaking Support

  • Leverage circuit-breaking capabilities to halt Spark job execution when data doesn't match expectations or contains defects.
  • Safeguard your operations and prevent downstream issues caused by faulty or inconsistent data.
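
This last feature deserves a concrete illustration. Conceptually, circuit breaking means failing a job fast, before bad data propagates downstream. The following hand-rolled Scala sketch shows the idea only; it is not the Kensu API, and the column, rule, and paths are hypothetical:

Scala
%scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical rule: refuse to publish data containing null order_id values
def circuitBreak(df: DataFrame): Unit = {
  val nullIds = df.filter(col("order_id").isNull).count()
  if (nullIds > 0)
    throw new IllegalStateException(
      s"Circuit breaker tripped: $nullIds rows with null order_id; halting job")
}

val orders = spark.read.format("delta").load("/mnt/raw/orders") // hypothetical path
circuitBreak(orders) // fails fast instead of propagating bad data
orders.write.format("delta").mode("overwrite").save("/mnt/curated/orders")

With the integration in place, Kensu provides this halting behavior natively, driven by the expectations you configure, so you do not need to hand-roll checks like this.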

Getting Started

Prerequisites

To proceed with the installation of this integration, you'll only need the following:

  • A Kensu Instance (Request a free trial if you haven't got one yet).
  • A Databricks account.
  • A Databricks Personal Access Token (see Databricks documentation; a quick sanity check is sketched below).
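
The token lets Kensu call the Databricks REST API on your behalf, for instance to list the clusters you will select in step 6 below. If you want to sanity-check your credentials first, a minimal Scala sketch (requires Java 11+; the environment variable names are illustrative assumptions) can call the standard clusters/list endpoint:

Scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

// Illustrative env vars holding your workspace URL and personal access token
val workspaceUrl = sys.env("DATABRICKS_HOST")  // e.g. https://adb-1234567890123456.7.azuredatabricks.net
val token        = sys.env("DATABRICKS_TOKEN")

val request = HttpRequest.newBuilder()
  .uri(URI.create(s"$workspaceUrl/api/2.0/clusters/list"))
  .header("Authorization", s"Bearer $token")
  .GET()
  .build()

val response = HttpClient.newHttpClient()
  .send(request, HttpResponse.BodyHandlers.ofString())

println(response.body()) // JSON listing of the clusters Kensu can observe

A 200 response with a JSON cluster list confirms that the token and workspace URL are usable.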

Installation

To install the integration, follow these steps:

1 - Log in to your Kensu instance

2 - On the sidebar, navigate to Collectors > Configure a connection

3 - Click on the Databricks logo

Navigate to the Databricks integration setup page


4 - Provide the Databricks information requested (see Prerequisites above)

5 - Click on Connect

Enter Databricks Connection Info


6 - The Kensu interface will list all clusters in your Databricks account so that you can select the clusters whose data usage Kensu should observe. Click on Configure to validate your choice.

Select the Databricks Clusters and Configure


7 - You can also enable Kensu on all new clusters created in your Databricks account, to ensure all data usage is observed by Kensu. To do so, flip the Enable Kensu on every new cluster by default switch, then click on Configure.



Enable Kensu on every new cluster by default

Usage

If you selected an existing cluster, you need to restart it for the Kensu integration to be enabled. Restarting attaches the appropriate Kensu JAR (the agent) to the cluster and installs the kensu Python module.

Running your notebooks on these clusters will automatically track the usage and metrics of the data involved in those notebooks.

Kensu will notify you if there is unexpected behavior in your data.
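
For example, an ordinary notebook cell like the following needs no Kensu-specific code; the agent attached to the cluster observes the read, the write, and the lineage between them, and computes metrics on both sides (the paths are hypothetical):

Scala
%scala
// Plain Spark code: the Kensu agent observes it without any changes
val sales = spark.read
  .option("header", "true")
  .csv("/mnt/landing/sales.csv")       // hypothetical input path

val monthly = sales.groupBy("month").sum("amount")

monthly.write
  .format("delta")
  .mode("overwrite")
  .save("/mnt/curated/monthly_sales")  // hypothetical output path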

Unity

Like most Databricks customers, you have probably started using Unity.

In this case, the agent installed by Kensu on the cluster requires a dedicated hint to ensure that metrics are computed for the data sources involved in your Spark jobs.

This is done by adding a single cell at the end of your notebooks:

Scala
%scala
io.kensu.sparkcollector.KensuSparkCollector.computeDelayedStats(spark)
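
For example, a Unity-backed notebook might end as follows; only the final cell is Kensu-specific, and the table names are hypothetical:

Scala
%scala
// Regular notebook logic against Unity Catalog tables
val orders = spark.read.table("main.sales.orders")   // hypothetical table
orders.filter("status = 'COMPLETE'")
  .write.mode("overwrite")
  .saveAsTable("main.sales.orders_complete")         // hypothetical table

%scala
// Last cell: hint the agent to compute the delayed statistics
// for the data sources used above
io.kensu.sparkcollector.KensuSparkCollector.computeDelayedStats(spark)

Placing the hint in the final cell ensures it runs after every data source in the notebook has been touched.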


Technical details



Architecture Diagram: Databricks Connector



