Create a monitoring rule in Kensu
Run the program for the first time
In this section, you will execute the pipeline by running the data_ingestion.py and reporting.py scripts. You can find these scripts in the python_code folder of the repository you cloned.
These programs manipulate CSV files using vanilla Pandas together with the Kensu Pandas layer. This extra layer augments the installed Pandas version with Data Observability capabilities: it traces and logs your data usage, and it profiles (that is, computes metrics on) the data consumed and produced.
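To make "profiling" more concrete, here is a minimal sketch of the kind of metrics a profiler might compute, written with plain Pandas only. It is an illustration, not the Kensu implementation: the profile function, the sample data, and the "column.statistic" key format are assumptions made for this example.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "Intraday_Delta": [0.01, -0.02, 0.03, 0.00],
    "Volume": [100, 150, 120, 130],
})

def profile(frame: pd.DataFrame) -> dict:
    """Flatten the numeric summary statistics into 'column.statistic' keys."""
    # describe() returns count, mean, std, min, quartiles and max per numeric column.
    summary = frame.describe()
    return {
        f"{column}.{stat}": float(summary.loc[stat, column])
        for column in summary.columns
        for stat in summary.index
    }

metrics = profile(df)
print(sorted(metrics))
```

Each entry in metrics is a single number attached to a column, which is the kind of value a monitoring rule can later be checked against.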
Both applications follow the Data Observability Driven Development (DODD) principles.
For general information on how to use Kensu with Python, see Configure the Python agent.
To execute the pipeline, choose one of the following options and run the corresponding Python or Docker commands:
- Docker Pandas
- Local Pandas
- Local PySpark
These commands will run both Python scripts, one after the other, using data from November 2021.
🎉 Congratulations! 🎉 You've sent your first data observability information to Kensu! You can now go to the Kensu data sources page and review the data.
In the next section, you will learn how to use the Kensu UI to view and work with the data observations, for example to troubleshoot data problems.
Create a Min-Max rule in Kensu
Suppose that the Risk Officers have agreed to track several quality metrics for the business.
The first metric is the monthly volatility of the returns for the Buzzfeed stock. If the volatility exceeds 20%, the risk officers will reduce the amount of Buzzfeed stock in the portfolio.
The volatility of the returns, also called the risk, is the standard deviation of the returns.
To monitor this metric, the reporting officer adds a rule to the report_buzzfeed.csv data source. The rule fires an alert when the standard deviation of Intraday_delta exceeds 20%.
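As a quick illustration of the metric itself, the sketch below computes the standard deviation of a series of returns with Pandas and checks it against the textbook formula. The return values are made up for this example, and the assumption is that returns are stored as fractions (0.05 for 5%), so that 20% corresponds to 0.2.

```python
import math
import pandas as pd

# Hypothetical intraday returns, expressed as fractions (0.05 means 5%).
returns = pd.Series([0.05, -0.10, 0.15, -0.05, 0.20], name="Intraday_delta")

# Pandas computes the sample standard deviation (ddof=1) by default.
volatility = returns.std()

# The same value computed by hand from the definition:
# sqrt( sum((x - mean)^2) / (n - 1) )
mean = returns.mean()
manual = math.sqrt(((returns - mean) ** 2).sum() / (len(returns) - 1))

print(f"volatility = {volatility:.4f}")
```

For this sample the volatility is about 0.13, which is below the 20% limit, so no action would be needed.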
How to Create the Rule
1️⃣ Go to the Kensu main page.
2️⃣ Click on Data Sources.
3️⃣ In the Logical Data Source Name column, select the data source to which you want to add the rule: report_buzzfeed.csv.
4️⃣ Hover the mouse over the Add rule button and click Add Min-Max.
5️⃣ Enter these parameters:
- Statistic Name: Use the drop-down to select Intraday_Delta.std (for Pandas) or Intraday_Delta.stddev (for PySpark). You can also filter the values by typing the first letters in the text field.
- Maximum valid value: Enter 0.2, meaning you want to be notified if the value exceeds 0.2.
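Conceptually, a Min-Max rule compares one statistic against an optional lower and upper bound. The following sketch shows that logic in plain Python. The violates_min_max helper is hypothetical and not part of the Kensu API, and treating a value exactly equal to a bound as valid is an assumption based on the word "exceeds" above.

```python
from typing import Optional

def violates_min_max(value: float,
                     minimum: Optional[float] = None,
                     maximum: Optional[float] = None) -> bool:
    """Return True when value falls outside the [minimum, maximum] range.

    A bound left as None is not checked. Hypothetical helper for
    illustration only, not part of the Kensu API.
    """
    if minimum is not None and value < minimum:
        return True
    if maximum is not None and value > maximum:
        return True
    return False

# Rule from this tutorial: only a maximum of 0.2 is set.
print(violates_min_max(0.15, maximum=0.2))  # False: within the limit
print(violates_min_max(0.25, maximum=0.2))  # True: would raise an alert
```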
After clicking OK, you will see a new rule in the Rules section:
In the Find the root cause with Kensu section, we will trigger an alert for this rule and then see how to use Kensu to find the root cause.