Observe your pipeline
Run the program for the first time
In this section, you will execute the pipeline by running the load_first_campain.py program and the Marketing dbt project. Both are available in the repository you cloned.
The first part manipulates CSV files and ingests them into a PostgreSQL database using vanilla Pandas together with the Kensu Pandas layer. This extra layer augments the installed Pandas version with data observability capabilities: it tracks and logs your data usage, and it also profiles the data consumed and produced (e.g., computes metrics on it).
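The Kensu Pandas layer itself is not shown in this tutorial. As a rough illustration of what "augmenting" a data library can mean, the sketch below wraps hypothetical read and write helpers. Each call records which data sources were consumed and produced, plus a simple row-count metric. Everything here is a hypothetical stand-in and not Kensu's API: `ObservedIO`, `read_csv`, `write_table`, the sample CSV content, and the input file names. An in-memory SQLite database replaces PostgreSQL so that the example runs with the standard library only.

```python
import csv
import io
import sqlite3


class ObservedIO:
    """Hypothetical tracking layer: NOT the Kensu API, only an illustration."""

    def __init__(self, conn):
        self.conn = conn
        self.inputs = []   # data sources read so far
        self.events = []   # (kind, source, row_count) records

    def read_csv(self, name, text):
        # Parse the CSV and remember it as an input of the application.
        rows = list(csv.DictReader(io.StringIO(text)))
        self.inputs.append(name)
        self.events.append(("read", name, len(rows)))
        return rows

    def write_table(self, table, rows):
        # Write rows to a new table (names are trusted in this sketch) and
        # return lineage edges from every input read so far to this output.
        cols = list(rows[0].keys())
        col_sql = ", ".join(cols)
        placeholders = ", ".join("?" for _ in cols)
        self.conn.execute(f"CREATE TABLE {table} ({col_sql})")
        self.conn.executemany(
            f"INSERT INTO {table} ({col_sql}) VALUES ({placeholders})",
            [tuple(r[c] for c in cols) for r in rows],
        )
        self.events.append(("write", table, len(rows)))
        return [(src, table) for src in self.inputs]


# Hypothetical sample data, not taken from the tutorial's repository.
orders_csv = "order_id,customer_id,total_qty\n1,10,3\n2,11,\n"
customers_csv = "customer_id,phone\n10,555-0100\n11,555-0101\n"

io_layer = ObservedIO(sqlite3.connect(":memory:"))
orders = io_layer.read_csv("orders.csv", orders_csv)
customers = {c["customer_id"]: c for c in io_layer.read_csv("customers.csv", customers_csv)}
joined = [{**o, "phone": customers[o["customer_id"]]["phone"]} for o in orders]
edges = io_layer.write_table("orders_and_customers", joined)
print(edges)
print(io_layer.events)
```

The design choice is to observe at the I/O boundary. The transformation code in the middle (the join) stays ordinary Python, which is the general idea behind instrumenting a library rather than the application.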
The dbt part transforms the tables in the PostgreSQL database to create new tables. The final result is a list of customers that the marketing team will call to propose a new product. The Docker container includes an augmented version of dbt in which the Kensu-py library is installed to send data observability metadata to the Kensu platform.
Both applications follow the Data Observability Driven Development (DODD) principles.
For general information on using Kensu with Python, see Configure the Python agent.
Run the following command in your terminal. This command will trigger the full pipeline, feeding the database with Python and executing the dbt models.
🎉 Congratulations! 🎉 You have sent data observability metrics to Kensu!
Check the result in Kensu
In this section, we will walk through the Kensu platform and see the elements we have collected.
Access the Data Source page
1️⃣ Go to the Kensu main page and click on Data Sources
2️⃣ You will see the list of all the data sources used or created by the program. Select the orders_and_customers data source.
You will access the Data Source page, which presents the metadata we have collected with the Kensu agents.
Explore Observability metrics
For each execution, Kensu automatically gathers a set of predefined metrics based on the data source schema. For instance, the agent collects distribution metrics for numerical columns, value frequencies for categorical columns, and even timestamps for date columns.
1️⃣ Go to the Statistics panel and click on Select Attributes
2️⃣ Select total_qty, which represents the number of items in the order, and click on OK to display the metrics.
As total_qty is a numerical field, the agent computed the minimum, mean, and maximum values of the column, as well as the number of missing values.
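To make type-dependent metrics concrete, here is a small standard-library sketch that profiles columns the way the text describes. A numerical column gets its minimum, mean, maximum, and missing-value count. A categorical column gets value frequencies. Several things are assumptions for illustration: the function names, the metric keys (other than `nullrows`, which appears in the rule discussed later), and the sample values. The Kensu agent's actual metric set and naming may differ.

```python
from collections import Counter
from statistics import mean


def profile_numeric(values):
    """Hypothetical numeric profile: min, mean, max, and count of missing values."""
    present = [v for v in values if v is not None]
    return {
        "min": min(present) if present else None,
        "mean": mean(present) if present else None,
        "max": max(present) if present else None,
        "nullrows": len(values) - len(present),
    }


def profile_categorical(values):
    """Hypothetical categorical profile: frequency of each value, plus missing count."""
    present = [v for v in values if v is not None]
    return {
        "frequencies": dict(Counter(present)),
        "nullrows": len(values) - len(present),
    }


# Hypothetical sample values for total_qty (number of items per order).
total_qty = [3, 1, None, 4, 2]
print(profile_numeric(total_qty))
# Hypothetical categorical column.
print(profile_categorical(["web", "store", "web", None]))
```

Missing values are excluded before computing min, mean, and max, and are counted separately. A column with many nulls therefore shows up in `nullrows` rather than silently skewing the mean.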
Explore the data lineage
The Kensu agent also consolidates the data lineage, that is, the data flows that take place inside the application.
These flows are displayed in the Lineage panel.
The first diagram, Creation of, explains how the data source was created, while the second one, Usage of, shows how the data source is used.
Filters give you a more granular view of the lineage. For example, you can list the applications that create the data source, or find out which data sources are involved in creating a specific field.
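The two lineage diagrams can be thought of as traversals of a directed graph. "Creation of" walks the edges upstream from a data source, and "Usage of" walks them downstream. The sketch below models this with plain dictionaries. It also includes a field-level mapping to answer "which sources feed this field?". Only `orders_and_customers`, `contact_list`, `total_qty`, and `phone` come from this tutorial. The other source names, all edges, and the field mapping are hypothetical, and this is not how Kensu stores lineage internally.

```python
from collections import defaultdict

# Hypothetical dataset-level lineage edges: (input source, output source).
EDGES = [
    ("orders.csv", "orders_and_customers"),
    ("customers.csv", "orders_and_customers"),
    ("orders_and_customers", "contact_list"),
]

# Hypothetical field-level lineage: output field -> input fields it derives from.
FIELD_LINEAGE = {
    ("orders_and_customers", "total_qty"): [("orders.csv", "qty")],
    ("orders_and_customers", "phone"): [("customers.csv", "phone")],
    ("contact_list", "phone"): [("orders_and_customers", "phone")],
}

upstream = defaultdict(set)
downstream = defaultdict(set)
for src, dst in EDGES:
    upstream[dst].add(src)
    downstream[src].add(dst)


def walk(graph, start):
    """Return every node reachable from start (excluding start itself)."""
    seen, stack = set(), [start]
    while stack:
        for nxt in graph.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def field_sources(source, field):
    """Collect the original (source, field) pairs feeding a field.

    Assumes the hypothetical field lineage is acyclic.
    """
    parents = FIELD_LINEAGE.get((source, field))
    if not parents:
        return {(source, field)}  # no known parent: treat as an origin
    result = set()
    for parent in parents:
        result |= field_sources(*parent)
    return result


print(sorted(walk(upstream, "contact_list")))            # like "Creation of"
print(sorted(walk(downstream, "orders_and_customers")))  # like "Usage of"
print(field_sources("contact_list", "phone"))            # field-level filter
```

Keeping both an upstream and a downstream index makes each diagram a single traversal. The field-level mapping is what allows a filter to narrow the answer from whole data sources down to one column.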
Overview of the dbt tests
Kensu also lets you create rules programmatically, based on dbt tests.
Tests are declared in the dbt schema.yml file. In this example, a not_null test was added to the phone column of the produced table.
This will be translated as a rule in the Kensu platform.
1️⃣ From the Kensu main page, click on Rules
This lists all the created rules. You can see a rule set on the phone.nullrows attribute of the contact_list data source: the not_null test has been converted into a min/max rule. If the number of null values in the phone column exceeds 0, a ticket is generated.
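The conversion described above can be sketched as follows. A dbt `not_null` test on a column becomes a min/max rule on that column's `nullrows` metric with a maximum of 0, and any observed value outside the bounds produces a ticket. Several names are hypothetical: the `Rule` class, the `rule_from_dbt_test` and `check` helpers, and the ticket text. They only mirror the behavior described in this section and are not Kensu's API or its actual conversion code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    """Hypothetical min/max rule on one metric of one data source."""
    data_source: str
    metric: str
    minimum: Optional[float] = None
    maximum: Optional[float] = None


def rule_from_dbt_test(model, column, test):
    """Translate a dbt column test into a rule (only not_null is sketched)."""
    if test == "not_null":
        # not_null means no null rows are allowed: nullrows must stay <= 0.
        return Rule(model, f"{column}.nullrows", maximum=0)
    raise ValueError(f"unsupported test in this sketch: {test}")


def check(rule, observed):
    """Return a ticket message if the observed value breaks the rule, else None."""
    too_low = rule.minimum is not None and observed < rule.minimum
    too_high = rule.maximum is not None and observed > rule.maximum
    if too_low or too_high:
        return (f"Ticket: {rule.data_source}.{rule.metric} = {observed}, "
                f"allowed range [{rule.minimum}, {rule.maximum}]")
    return None


rule = rule_from_dbt_test("contact_list", "phone", "not_null")
print(rule)
print(check(rule, 0))  # None: no null phone numbers
print(check(rule, 3))  # a ticket message
```

Expressing the test as bounds on a metric, rather than as a one-off boolean check, lets the same rule be evaluated on every run against the metrics the agent already collects.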
In the Prevent further issues section, we will trigger an alert and create a ticket based on that rule, and we will see how to add rules manually from the UI to prevent other issues.