Azure Data Factory

The Azure Data Factory collector adds data observability to your Azure Data Factory pipelines. By gathering metadata from Data Factory pipeline runs and Azure Data Lake Storage, it provides insight into data lineage, schema changes, and performance metrics.

Purpose

The primary purpose is to make Azure Data Factory data observable by generating data observations from Data Factory pipeline executions. This lets users trace data flow, understand dependencies, and check data quality and compliance. With this collector, you get a comprehensive view of your data pipelines and can make informed decisions.

How it works

The Azure Data Factory collector uses the Azure Data Factory Python SDK to interact with Azure resources. It retrieves pipeline and activity run information directly from Azure Data Factory and extracts lineage and contextual information (pipeline name, Azure Data Factory project, environment, timestamp). This shows which pipelines produce specific outputs and which consume particular inputs. To enrich the observations, the collector also retrieves schema details and computes statistical metrics on the data sources (see Data Source Connections).

Features

Data collection

The Azure Data Factory collector gathers information from your Data Factory pipeline runs, activity runs, and the tables they use. This covers pipeline run details such as start time, end time, and status, as well as the input and output datasets used by each pipeline. It also captures schema information and performance metrics for the datasets used in your Data Factory pipelines.

Integration with Azure Data Factory

The collector integrates with Azure Data Factory through the Azure Data Factory Python SDK, which it uses to retrieve "pipeline run" metadata and other information about Azure resources. With this integration in place, you can fetch data observations for all your Azure Data Factory pipelines.

Circuit breaker feature

The Azure Data Factory collector comes with an agent, described in Import ADF Circuit Breaker Agent, that lets you automatically stop Data Factory executions when incidents occur. With this feature, the collector constantly monitors the observations in Azure Data Factory to identify data-related issues with pipeline executions. If a data issue is detected, the circuit breaker halts the affected pipeline run, protecting the integrity of your data and preventing downstream issues. To achieve this, the agent is composed of an Azure webhook that triggers the circuit breaker workflow.

Configuration of the Azure Data Factory collector

The Azure Data Factory collector relies on the Azure SDK and on direct connections to the data sources used in the Azure Data Factory pipelines. To register an Azure Data Factory connection, follow Register Azure Data Factory Connection.
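
To illustrate the retrieval step described under "how it works", here is a minimal sketch of querying pipeline runs with the Azure SDK for Python and turning one run into the contextual fields listed there (pipeline name, project, environment, timestamp). It is not the collector's actual implementation. The function names `fetch_pipeline_runs` and `run_context`, the dictionary keys, the use of the factory name as the project, and the `environment` label are all assumptions made for this example. It also assumes the `azure-identity` and `azure-mgmt-datafactory` packages are installed and that credentials are available to `DefaultAzureCredential`.

```python
from datetime import datetime, timedelta, timezone
from types import SimpleNamespace


def fetch_pipeline_runs(subscription_id, resource_group, factory_name, hours=24):
    """Query the pipeline runs updated in the last `hours` hours for one factory."""
    # Imported lazily so the rest of this sketch runs without the Azure packages.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import RunFilterParameters

    client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)
    now = datetime.now(timezone.utc)
    window = RunFilterParameters(
        last_updated_after=now - timedelta(hours=hours),
        last_updated_before=now,
    )
    response = client.pipeline_runs.query_by_factory(
        resource_group, factory_name, window
    )
    return response.value


def run_context(run, factory_name, environment):
    """Build a context record from one pipeline run (hypothetical record shape)."""
    duration = None
    if run.run_start and run.run_end:
        duration = (run.run_end - run.run_start).total_seconds()
    return {
        "pipeline_name": run.pipeline_name,
        "project": factory_name,  # assumption: the factory name stands in for the project
        "environment": environment,  # assumption: supplied by the caller
        "timestamp": run.run_end or run.run_start,
        "status": run.status,
        "duration_seconds": duration,
    }


# Stand-in object with the same attribute names as the SDK's PipelineRun model,
# so the example can be tried without access to an Azure subscription.
sample_run = SimpleNamespace(
    pipeline_name="copy_sales",
    run_start=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    run_end=datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc),
    status="Succeeded",
)
context = run_context(sample_run, "my-factory", "production")
```

Against a real factory, the records would come from something like `[run_context(r, "my-factory", "production") for r in fetch_pipeline_runs(subscription_id, resource_group, "my-factory")]`, where the subscription, resource group, and factory names are placeholders.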