Azure Data Factory
The primary purpose of the Azure Data Factory Collector is to make Azure Data Factory data observable by generating data observations from Data Factory pipeline executions. It lets users trace data flow, understand dependencies, and ensure data quality and compliance. With this collector, you get a comprehensive view of your data pipelines and can make informed decisions.
The Azure Data Factory Collector uses the Azure Data Factory Python SDK to interact with Azure resources.
It retrieves pipeline and activity run information directly from Azure Data Factory and extracts lineage and contextual information (pipeline name, Azure Data Factory project, environment, and timestamp).
This shows which pipelines produce specific outputs and which consume particular inputs. To enrich the observations, the collector also retrieves schema details and computes statistical metrics on the data sources (see Data Source Connections).
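To make the lineage step concrete, here is a minimal sketch of how run records could be turned into a lineage observation. It is not the collector's implementation: `ActivityRunInfo`, `PipelineRunInfo`, and `extract_lineage` are hypothetical names, and the sketch assumes each activity's input and output dataset names have already been resolved (the SDK's activity run objects do not expose them in this simple form). Lineage is modeled coarsely: every input of an activity is linked to every output of that same activity.

```python
from dataclasses import dataclass, field
from datetime import datetime


# Hypothetical, simplified records. The real SDK returns richer objects,
# and dataset names would first have to be resolved from activity details.
@dataclass
class ActivityRunInfo:
    activity_name: str
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)


@dataclass
class PipelineRunInfo:
    pipeline_name: str
    factory_name: str
    environment: str
    run_start: datetime
    activities: list[ActivityRunInfo] = field(default_factory=list)


def extract_lineage(run: PipelineRunInfo) -> dict:
    """Build a lineage observation: each activity input feeds each of its outputs."""
    edges = sorted(
        {(src, dst) for act in run.activities for src in act.inputs for dst in act.outputs}
    )
    return {
        # Contextual information attached to every observation.
        "context": {
            "pipeline": run.pipeline_name,
            "factory": run.factory_name,
            "environment": run.environment,
            "timestamp": run.run_start.isoformat(),
        },
        "edges": edges,
    }
```

Linking all inputs to all outputs of an activity yields activity-level lineage; finer-grained (for example column-level) lineage would need the mapping details of each activity.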
The Azure Data Factory Collector gathers information from your Data Factory pipeline runs, activity runs, and the tables they use. This includes pipeline run details such as start time, end time, and status, as well as the input and output datasets used by each pipeline. It also captures schema information and performance metrics for the datasets used in your Data Factory pipelines.
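The schema and metrics part can be illustrated with a small profiling function. This is a hypothetical sketch: the document does not list which metrics the collector computes, so the row count, null count, and numeric min/max/mean below are assumptions, and the rows are passed in from memory rather than read through a data source connection. `profile_rows` is an illustrative name.

```python
from statistics import mean


def profile_rows(rows: list[dict]) -> dict:
    """Infer a simple schema and per-column metrics from rows given as dicts."""
    columns = sorted({key for row in rows for key in row})
    schema: dict[str, str] = {}
    column_metrics: dict[str, dict] = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        # The type of the first non-null value stands in for the column type.
        schema[col] = type(present[0]).__name__ if present else "unknown"
        stats = {"nulls": len(values) - len(present)}
        # Booleans are ints in Python, so exclude them from numeric stats.
        numeric = [v for v in present if isinstance(v, (int, float)) and not isinstance(v, bool)]
        if numeric:
            stats.update(min=min(numeric), max=max(numeric), mean=mean(numeric))
        column_metrics[col] = stats
    return {"schema": schema, "row_count": len(rows), "columns": column_metrics}
```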
The collector integrates with Azure Data Factory through the Azure Data Factory Python SDK, which it uses to retrieve pipeline run metadata and other information about Azure resources.
With this integration in place, you can fetch data observations for all your Azure Data Factory pipelines.
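The retrieval step might look like the following sketch, which calls two real `azure-mgmt-datafactory` operations, `pipeline_runs.query_by_factory` and `activity_runs.query_by_pipeline_run`, and follows their continuation tokens. The function name `fetch_run_metadata` and the shape of the returned records are assumptions for illustration; the client and filter objects are passed in so the logic can be exercised without Azure credentials.

```python
def fetch_run_metadata(client, resource_group: str, factory_name: str, filters) -> list[dict]:
    """Collect pipeline runs and their activity runs for one data factory.

    `client` is expected to behave like azure.mgmt.datafactory's
    DataFactoryManagementClient and `filters` like models.RunFilterParameters.
    The record layout returned here is a hypothetical choice.
    """

    def all_pages(query, *args):
        # Follow continuation tokens until the service reports no more pages.
        filters.continuation_token = None
        while True:
            response = query(*args, filters)
            yield from response.value
            if not response.continuation_token:
                break
            filters.continuation_token = response.continuation_token

    # Materialize pipeline runs first, because the activity queries below
    # reuse (and reset) the same filters object.
    pipeline_runs = list(
        all_pages(client.pipeline_runs.query_by_factory, resource_group, factory_name)
    )

    records = []
    for run in pipeline_runs:
        activities = all_pages(
            client.activity_runs.query_by_pipeline_run, resource_group, factory_name, run.run_id
        )
        records.append(
            {
                "run_id": run.run_id,
                "pipeline": run.pipeline_name,
                "status": run.status,
                "start": run.run_start,
                "end": run.run_end,
                "activities": [
                    {"name": a.activity_name, "type": a.activity_type, "status": a.status}
                    for a in activities
                ],
            }
        )
    return records
```

In a real deployment, `client` would typically be `DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)` (from the `azure-mgmt-datafactory` and `azure-identity` packages) and `filters` a `RunFilterParameters(last_updated_after=..., last_updated_before=...)` covering the time window to observe.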
The Azure Data Factory Collector includes an agent that integrates with the Circuit Breaker.
This lets users automatically stop Data Factory executions when an incident occurs.
With this feature, the collector continuously monitors the observations collected from Azure Data Factory to identify data-related issues in pipeline executions. If a data issue is detected, the Circuit Breaker halts the affected pipeline run, protecting the integrity of your data and preventing potential downstream issues.
To achieve this, the agent consists of an Azure webhook that triggers the Circuit Breaker.
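One way a webhook-triggered halt could work is sketched below. The incident payload shape (`run_id`, `issues`) and the function name `circuit_breaker` are hypothetical, and cancelling the run through the SDK is an assumption: the actual agent may stop the pipeline differently, for example by failing the webhook activity. The calls `pipeline_runs.get` and `pipeline_runs.cancel` are real `azure-mgmt-datafactory` operations.

```python
# Pipeline run statuses for which a cancel request still makes sense.
ACTIVE_STATUSES = {"Queued", "InProgress"}


def circuit_breaker(client, resource_group: str, factory_name: str, payload: dict) -> str:
    """Cancel a pipeline run if the (hypothetical) incident payload reports data issues.

    Assumed payload shape: {"run_id": str, "issues": [str, ...]}.
    `client` is expected to behave like DataFactoryManagementClient.
    """
    run_id = payload["run_id"]
    if not payload.get("issues"):
        return "no-incident"
    run = client.pipeline_runs.get(resource_group, factory_name, run_id)
    if run.status not in ACTIVE_STATUSES:
        # Nothing to stop: the run already finished or is being cancelled.
        return f"skipped ({run.status})"
    # is_recursive=True also cancels child pipelines triggered by this run.
    client.pipeline_runs.cancel(resource_group, factory_name, run_id, is_recursive=True)
    return "cancelled"
```

Checking the run status first avoids issuing a cancel for a run that has already completed.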
The Azure Data Factory collector relies on the Azure SDK and direct connections to the data sources used in the Azure Data Factory pipelines.
To register an Azure Data Factory Connection, please follow these steps.