Prevent further issues
It has been several months since the last campaign. The Marketing team asks you to execute the pipeline in order to prepare a new campaign.
In your terminal window, run the second script:
You log into the Kensu platform, and you see that a new ticket has been created!
1️⃣ Click on the Ticket title to see the list of tickets
2️⃣ By clicking on the + icon, you can access the details and context of the ticket. You will see in which project, application, and environment the rule has been violated.
3️⃣ Click on the data source name, testme.testme_schema.contact_list, in order to be redirected to the corresponding data source page.
The collected metadata and observability metrics will help you identify the source of the issue.
1️⃣ The first step is to use the lineage to find the origin of the data. Click on the customer_list.csv rectangle representing the data source.
2️⃣ Click on View data source details in order to navigate to this data source.
3️⃣ Once on the customer_list.csv data source page, a shortcut lets you see the missing values observability metrics. Click on the button to display the chart.
In the table below the chart, you can see both executions, ordered by descending timestamp. The latest execution contains about 49% null values for the column of interest, as reported by the phone.nullrows metric.
In this case, it means that half of the customers cannot be contacted by phone. Customer behavior may have changed, with fewer people giving their phone number, while the number of null rows for email seems to be stable. The marketing team decides to change the process and contact the customers by email. Therefore, they ask the data team to provide them with the email addresses.
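To make the metric concrete, here is a minimal sketch of how a null-row count and its share of the total could be computed for a column. It assumes that a "null row" is a row whose field is missing or blank, which may differ from how Kensu computes phone.nullrows and email.nullrows. The sample rows and the `null_rows` helper are hypothetical and only stand in for customer_list.csv.

```python
import csv
import io

# Hypothetical sample standing in for customer_list.csv (not real data).
SAMPLE_CSV = """name,phone,email
Ana,555-0101,ana@example.com
Bo,,bo@example.com
Cy,555-0103,
Di,,di@example.com
"""

def null_rows(rows, column):
    """Count rows whose value for `column` is missing or blank (assumed definition)."""
    return sum(1 for row in rows if not (row.get(column) or "").strip())

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for column in ("phone", "email"):
    count = null_rows(rows, column)
    # Report the count and its share of all rows, as the chart and table do.
    print(f"{column}.nullrows = {count} ({count / len(rows):.0%} of {len(rows)} rows)")
```

Comparing this share across executions is what reveals a jump such as the one seen on the phone column.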
To avoid further issues, you decide to create a variability rule. This rule will detect high volatility in the number of missing values of the email field, email.nullrows, so that the situation you had with the phone numbers won't happen with the email addresses.
1️⃣ In the Rules section, hover the mouse over the Add rule button and click Add Variability.
2️⃣ Set the following parameters:
- Statistic Name: Use the drop-down to select email.nullrows. You can also filter the values by typing the first letter in the text field.
- Maximum variation value: Enter 20, meaning you want to be notified if the variation exceeds 20%.
- Leave the For no more than the past field blank.
After clicking OK, you will see a new rule in the Rules section:
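As the latest execution contained 16 null rows, a 20% variation corresponds to 3.2 rows. The next execution will therefore be linked to a ticket if the number of null rows in the email column falls outside the range 12.8 to 19.2, that is, if it reaches 20 or more, or drops to 12 or fewer.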
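The arithmetic behind this rule can be sketched as follows. This is only an illustration, assuming the variation is measured relative to the previous execution's value; the exact computation used by Kensu is not shown in this tutorial, and the function names `variation_bounds` and `violates` are hypothetical.

```python
def variation_bounds(previous, max_variation_pct):
    """Return (lower, upper) limits around `previous` for a relative variation cap."""
    delta = previous * max_variation_pct / 100
    return previous - delta, previous + delta

def violates(previous, current, max_variation_pct):
    """True when `current` differs from `previous` by more than the allowed percentage."""
    if previous == 0:
        # Assumption: any change from zero counts as a violation.
        return current != 0
    return abs(current - previous) / previous * 100 > max_variation_pct

# Latest execution had 16 null rows; the rule allows a 20% variation.
lower, upper = variation_bounds(16, 20)
print(f"allowed range: {lower:.1f} to {upper:.1f}")

for current in (12, 13, 19, 20):
    print(current, "-> ticket" if violates(16, current, 20) else "-> ok")
```

Under this assumption, 13 to 19 null rows stay within the limit, while 12 or 20 would trigger a ticket.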