Kensu Documentation

Docs powered by archbee 

Find the root cause with Kensu

In this section, we run the pipeline again now that the rule exists. This run raises a ticket for the Min-Max rule defined in "Create a monitoring rule in Kensu".

Definition



A ticket is like a notification or work item. The Kensu platform creates it when an observation falls outside the thresholds of a rule.

Tickets draw attention to data events worth reviewing because they may indicate issues.
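
To make the definition concrete, here is a minimal sketch of the idea in plain Python. It is not Kensu's implementation or API: `MinMaxRule`, `Ticket`, and `evaluate` are hypothetical names, and the bounds and values are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MinMaxRule:
    """Hypothetical rule: a statistic must stay within [minimum, maximum]."""
    statistic: str
    minimum: float
    maximum: float


@dataclass
class Ticket:
    """Hypothetical work item raised when an observation breaks a rule."""
    statistic: str
    observed: float
    reason: str


def evaluate(rule: MinMaxRule, observed: float) -> Optional[Ticket]:
    """Return a Ticket if the observed value is out of bounds, else None."""
    if rule.minimum <= observed <= rule.maximum:
        return None
    reason = f"{observed} is outside [{rule.minimum}, {rule.maximum}]"
    return Ticket(rule.statistic, observed, reason)


# Illustrative bounds and values only; not taken from the tutorial data.
rule = MinMaxRule("Intraday_Delta.std", minimum=0.1, maximum=0.5)
print(evaluate(rule, 0.3))  # within bounds: no ticket
print(evaluate(rule, 0.9))  # out of bounds: a ticket is returned
```

The bounds are treated as inclusive here; whether the platform does the same is an assumption of this sketch.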

Detect a data issue

1️⃣ Run the following command, this time using data from December 2021.

Choose the variant that matches your setup: Docker Pandas, Docker PySpark, Local Pandas, or Local PySpark.



2️⃣ Look at the Home page. The ticket count is now 1. Click on Tickets to review it.


Analyze the issue

1️⃣ On the Tickets page:

  1. To see the reason for the ticket, click the + icon.
  2. Click the Data Source name report_buzzfeed.csv.



2️⃣ Display the Min-Max rule by clicking the chart icon.


The chart displays Intraday_Delta.std.


Notes



The red exclamation mark flags the observation that violated the Min-Max rule.

The table below the chart shows the observation that is out of bounds.

Find the Root Cause

Explore the statistics of the data source

Now, investigate the drop in quality of the data set.

To do so, drill into the values for Intraday_Delta.

1️⃣ Click +Select Attributes.




2️⃣ Select the Intraday_Delta checkbox, then click OK.


The statistics reveal a problem. With about 20 business days per month, you would expect the data set to have around 20 rows, yet the last run has only 3.

If you use PySpark, the count statistic is named nrows.


The row count is a useful metric to follow. To catch similar issues in the future, you could add a rule on the Count to ensure it stays around 20.
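
As a rough illustration of what such a count rule checks, the sketch below compares a run's row count with an expected value. The function name `count_within_tolerance` and the tolerance of 5 rows are assumptions made for this example; they are not Kensu settings. The value 3 is the row count reported above, while 20 stands in for a typical month.

```python
def count_within_tolerance(count: int, expected: int = 20, tolerance: int = 5) -> bool:
    """Return True if the row count is within `tolerance` rows of `expected`."""
    return abs(count - expected) <= tolerance


# 20 is a stand-in for a typical month; 3 is the count seen in the last run.
row_counts = {"typical month": 20, "December 2021 run": 3}

for run_name, count in row_counts.items():
    status = "ok" if count_within_tolerance(count) else "out of bounds"
    print(f"{run_name}: {count} rows -> {status}")
```

With PySpark, the same check would apply to the nrows statistic instead of count.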

Explore the lineage to find the origin of the issue

Having found an issue, we now look for its root cause.

To support this, Kensu collects the technical data lineage. This lets you browse the data sources backward, from the faulty data source to the origins of the pipeline.
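
Browsing backward through lineage amounts to walking a graph from the faulty data source toward its inputs. The sketch below shows that walk as a breadth-first traversal over a plain dictionary. It does not use the Kensu API, and the lineage itself is partly invented: only report_buzzfeed.csv and monthly_assets.csv come from this tutorial, while the names starting with `hypothetical_` are placeholders.

```python
from collections import deque
from typing import Dict, List


def upstream_sources(start: str, created_from: Dict[str, List[str]]) -> List[str]:
    """List every data source upstream of `start`, nearest first."""
    seen = {start}
    order: List[str] = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for parent in created_from.get(current, []):
            if parent not in seen:  # guard against revisiting shared inputs
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


# Each data source maps to the sources it was created from.
# Entries named hypothetical_* are placeholders, not part of the tutorial.
created_from = {
    "report_buzzfeed.csv": ["monthly_assets.csv", "hypothetical_other_input.csv"],
    "monthly_assets.csv": [],
    "hypothetical_other_input.csv": ["hypothetical_raw_feed.csv"],
}

for source in upstream_sources("report_buzzfeed.csv", created_from):
    print(source)
```

Each name returned is a candidate to inspect, which is what the steps below do by hand for monthly_assets.csv.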

1️⃣ In the Creation Of panel, you see the data sources that were used to create the report_buzzfeed.csv file. They represent its upstream lineage. Click the bar next to monthly_assets.csv, toward the bottom. This bar is the node for the upstream data source monthly_assets.csv.

2️⃣ Click on View Data Source Details.




3️⃣ The monthly_assets.csv data source page opens. You can explore its statistics.

4️⃣ Now, go back up the page to the Statistics area. Click +Select Attribute.


Select Symbol.


In the table of observations, you can see both runs of the pipeline. They are sorted by execution time (Timestamp), most recent first. Observe that the number of unique Symbols (num_categories) has increased by one.

Looking at other columns, notice:

  1. The ENFA stock symbol shows 3 rows, compared with 21 in the first execution.
  2. A new stock symbol, BZFD, appears.

Based on these observations, you have discovered a change of ticker symbol.

On the 6th of December, the ENFA ticker changed its name to BZFD. Therefore, we have 3 records for ENFA and the remaining 19 days for BZFD.

With only 3 ENFA rows left, the standard deviation of Intraday_Delta changed dramatically compared to the previous month, which triggered the Min-Max rule.


To prevent future issues, you could add a rule on the number of categories.
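
The sketch below illustrates what a rule on the number of categories would catch. It rebuilds only the Symbol column for the renamed stock, using the row counts given above (21 ENFA rows in the first run, then 3 ENFA and 19 BZFD rows); the real file may contain other symbols. The helper names `symbol_stats` and `new_categories` are hypothetical and not part of Kensu.

```python
from collections import Counter
from typing import Dict, List

# Symbol column for the renamed stock only, using the row counts from this page.
first_run = ["ENFA"] * 21
december_run = ["ENFA"] * 3 + ["BZFD"] * 19


def symbol_stats(symbols: List[str]) -> Dict[str, object]:
    """Compute the number of distinct symbols and the rows per symbol."""
    counts = Counter(symbols)
    return {"num_categories": len(counts), "rows_per_symbol": dict(counts)}


def new_categories(previous: List[str], current: List[str]) -> List[str]:
    """Return symbols present in the current run but absent from the previous one."""
    return sorted(set(current) - set(previous))


before = symbol_stats(first_run)
after = symbol_stats(december_run)

print(before)
print(after)

# A simple category rule: flag the run when the number of distinct symbols changes.
if after["num_categories"] != before["num_categories"]:
    print("New symbols:", new_categories(first_run, december_run))
```

Flagging any change in the distinct count is a deliberately strict choice for this example; a real rule might allow a range instead.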

Go to the next step

Updated 08 Nov 2022
UP NEXT
Create a monitoring rule programmatically