Fabric | Getting Started with Data Factory, Pipelines, and Connectors

This article is intended primarily for managers, IT specialists, and technical decision-makers who are evaluating Microsoft Fabric and considering its implementation. It is also intended for the broader professional audience exploring Fabric, as well as for educational purposes. The text focuses on the principles of Fabric Data Factory, key technical concepts, and practical benefits. Detailed implementation procedures are not included – these are covered in more specialized articles available on this website. A structured overview of all Fabric-related articles, including context, can be found here: Fabric | dbt – Architecture and the Role of dbt in the Medallion Architecture. An unstructured overview is also available in the Microsoft Fabric category.

The Role of Fabric Data Factory in Data Architecture

All data that data professionals work with goes through several phases during processing. First, data must be extracted from source systems, then processed, cleansed, historized, modeled for storage, and finally reported on.

It is therefore essential to have a robust tool for this work. Throughout the entire data processing lifecycle, we must also respect security principles, data governance, and a sustainable, scalable architecture, so that the solution does not become unmanageable over time as data volumes and complexity grow.

Fabric Data Factory is a tool designed for data integration and orchestration of data processes. It serves as a central data engineering layer that manages data flows, specifically:

  • Data acquisition – ingesting data from various source systems into the central OneLake storage
  • Data cleansing, maintaining data history (SCD2), transformation, and reporting – subsequent cleansing and processing into additional data layers (Silver, Gold), as well as delivery to analytical and reporting layers (such as Power BI and others)
  • Orchestration of the above – enabling the creation of control mechanisms that trigger different tasks sequentially and in the correct order, defining and managing when, from where, and how data is processed

Data Acquisition, Transformation, and Orchestration in Fabric

Fabric Data Factory enables integration with a wide range of data sources, from relational databases and file storage systems to cloud services and application interfaces.

Fabric Pipelines for Data Flows and Orchestration

The core concept of Fabric Data Factory is the pipeline, which represents a managed workflow. Visually, the environment is user-friendly: individual pipeline components are assembled using drag-and-drop, resulting in a fully visual, low-code experience.


[Screenshot: an example Fabric pipeline]

A pipeline works on a similar principle to, for example, an SSIS package. It is a collection of predefined tasks packaged into a single executable object. Each pipeline can be executed independently or triggered by another pipeline, which forms the basis of orchestration. A pipeline defines:

  • Step sequences – for example, load metadata, extract data, store data
  • Dependencies – what should run and in what order
  • Execution conditions – conditions under which a step is executed or skipped
  • Error-handling behavior – for example, triggering another process in case of failure (such as sending an email)
  • Monitoring and logging – each pipeline includes background logging
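The concepts above (step sequences, dependencies, execution conditions, error handling, and logging) can be sketched in plain Python. This is an illustrative model of how a pipeline behaves, not Fabric's actual API – all names (`Step`, `run_pipeline`, `on_failure`) are hypothetical:

```python
import logging
from dataclasses import dataclass, field
from typing import Callable

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")

@dataclass
class Step:
    name: str
    action: Callable[[], None]
    depends_on: list = field(default_factory=list)   # dependencies: what must succeed first
    condition: Callable[[], bool] = lambda: True     # execution condition: run or skip

def run_pipeline(steps, on_failure=None):
    """Run steps in sequence, skipping steps whose condition is false or
    whose dependencies have not succeeded; stop and call the error
    handler (e.g. send an email) when a step fails."""
    succeeded = set()
    for step in steps:
        if not all(d in succeeded for d in step.depends_on):
            logging.warning("skip %s: unmet dependency", step.name)
            continue
        if not step.condition():
            logging.info("skip %s: condition false", step.name)
            continue
        try:
            step.action()
            succeeded.add(step.name)
            logging.info("ok %s", step.name)  # background logging per step
        except Exception as exc:
            logging.error("fail %s: %s", step.name, exc)
            if on_failure:
                on_failure(step.name, exc)
            break
    return succeeded

# The example step sequence from the list above: load metadata, extract, store.
results = []
pipeline = [
    Step("load_metadata", lambda: results.append("meta")),
    Step("extract_data", lambda: results.append("raw"), depends_on=["load_metadata"]),
    Step("store_data", lambda: results.append("stored"), depends_on=["extract_data"]),
]
done = run_pipeline(pipeline, on_failure=lambda name, exc: print(f"alert: {name} failed"))
```

In a real pipeline these steps would be activities configured in the visual designer; the sketch only shows how dependencies, conditions, and failure handling compose.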

Fabric thus enables the creation of robust data processes that are easy to monitor, audit, and execute repeatedly.

Note: For data acquisition, other options can also be used, such as third-party tools like Fivetran, or Fabric Notebooks, which are executable objects containing custom code written in Python or SQL.

Data Acquisition (Ingestion) – Pipelines and Prebuilt Connectors

Initial connections to data sources can be established using prebuilt connectors, which abstract users from technical details and coding. These connectors, as shown in the screenshot, handle data acquisition from source systems. The ingested data can then be stored, for example, in a Lakehouse or Data Warehouse artifact and further processed.
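As a rough sketch of what a connector-driven ingestion step does, the following uses an in-memory SQLite database as a stand-in for a relational source system and a local CSV file as a stand-in for the Bronze landing area; in Fabric, a prebuilt connector and a Copy activity would handle both ends without code. The table and directory names are hypothetical:

```python
import csv
import sqlite3
from pathlib import Path

# Stand-in for a relational source system (in Fabric, a prebuilt
# connector would abstract this connection away).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER, name TEXT, city TEXT)")
source.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                   [(1, "Alice", "Prague"), (2, "Bob", "Brno")])

def ingest_table(conn, table, landing_dir):
    """Extract all rows from a source table and land them as a CSV file,
    mimicking a copy step writing raw data into the Bronze layer."""
    landing_dir = Path(landing_dir)
    landing_dir.mkdir(parents=True, exist_ok=True)
    cur = conn.execute(f"SELECT * FROM {table}")
    out = landing_dir / f"{table}.csv"
    with out.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cur.description])  # header row
        writer.writerows(cur.fetchall())
    return out

path = ingest_table(source, "customers", "bronze")
```

The real value of prebuilt connectors is precisely that this extraction and landing logic does not have to be written or maintained by hand.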


[Screenshot: prebuilt Fabric connectors]

Data Transformation into Silver and Gold – Notebooks, SQL, Python, dbt

Subsequent data engineering tasks following data acquisition (Bronze layer) include data cleansing and historization into the Silver layer. This can be achieved using various features supported by Fabric, depending primarily on preferences and architectural design, for example:

  • Fabric Notebooks – executable objects containing code (Python, SQL)
  • dbt (Data Build Tool) – data can be loaded from the Lakehouse and transformed into a Data Warehouse artifact using the open-source dbt tool. The website contains many articles with detailed implementation guidance, for example: Fabric | dbt – How I Build Gold Layer Dimensional Tables (SCD2) in Data Projects
  • Dataflow Gen2 – a visual tool similar to Power Query, allowing transformations to be configured interactively. It is less flexible than Notebooks or dbt and has higher consumption of paid Fabric capacity compared to pipelines
  • SQL Procedures in the Data Warehouse – if the architecture is based on SQL-driven processing within the data warehouse, stored procedures can also be used
  • External tools
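To make the SCD2 historization mentioned above concrete, here is a minimal sketch of the merge logic in plain Python: unchanged rows are left alone, changed rows are closed (their `valid_to` is set) and a new current version is inserted, and new keys are added. In practice this would be a dbt snapshot or a SQL MERGE in the warehouse; all names here (`apply_scd2`, column names) are illustrative:

```python
from datetime import date

def apply_scd2(dimension, snapshot, key, tracked, today):
    """Merge a source snapshot into an SCD2 dimension table:
    close changed current rows and insert new versions."""
    # Index the current (open) rows by business key.
    current = {r[key]: r for r in dimension if r["valid_to"] is None}
    for row in snapshot:
        existing = current.get(row[key])
        if existing is None:
            # New key: insert as a new current row.
            dimension.append({**row, "valid_from": today, "valid_to": None})
        elif any(existing[c] != row[c] for c in tracked):
            # Tracked attribute changed: close the old version, open a new one.
            existing["valid_to"] = today
            dimension.append({**row, "valid_from": today, "valid_to": None})
    return dimension

dim = [{"id": 1, "city": "Prague", "valid_from": date(2024, 1, 1), "valid_to": None}]
snap = [{"id": 1, "city": "Brno"}, {"id": 2, "city": "Ostrava"}]
dim = apply_scd2(dim, snap, key="id", tracked=["city"], today=date(2025, 1, 1))
```

After the merge, the dimension holds the closed Prague version plus two open rows, so the full history of each key remains queryable.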

Summary and Conclusion – Operations, Governance, and Benefits

An important aspect of Fabric Data Factory is monitoring. Every pipeline run is logged, enabling tracking of processing status, error identification, and performance analysis of individual steps. This information is critical for the solution's long-term security, performance, and cost optimization, including workload and Fabric capacity optimization.
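The kind of analysis this run history supports can be sketched as follows. The run records below are hypothetical; in Fabric they would come from the built-in monitoring experience rather than a hand-made list:

```python
from collections import Counter
from datetime import datetime

# Hypothetical pipeline run log records (status plus start/end timestamps).
runs = [
    {"pipeline": "ingest_sales", "status": "Succeeded",
     "start": "2025-01-01T02:00:00", "end": "2025-01-01T02:04:30"},
    {"pipeline": "ingest_sales", "status": "Failed",
     "start": "2025-01-02T02:00:00", "end": "2025-01-02T02:01:10"},
    {"pipeline": "build_silver", "status": "Succeeded",
     "start": "2025-01-01T03:00:00", "end": "2025-01-01T03:12:00"},
]

def summarize(runs):
    """Count run outcomes per pipeline and compute the average
    run duration in seconds - the basis for error and performance analysis."""
    statuses = Counter((r["pipeline"], r["status"]) for r in runs)
    durations = {}
    for r in runs:
        secs = (datetime.fromisoformat(r["end"])
                - datetime.fromisoformat(r["start"])).total_seconds()
        durations.setdefault(r["pipeline"], []).append(secs)
    avg = {p: sum(v) / len(v) for p, v in durations.items()}
    return statuses, avg

statuses, avg = summarize(runs)
```

Aggregations like this (failure counts, average durations) are what make long-running pipelines easy to audit and their capacity consumption easy to reason about.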

From an organizational perspective, Data Factory provides:

  • standardization of data flows
  • centralization under a single platform
  • reduced technical complexity
  • scalability
  • better control over how data is created and used within the environment

About Ing. Jan Zedníček - Data Engineer & Controlling

My name is Jan Zedníček, and I have been working as a freelance Data Engineer for roughly 10 years. During this time, I have been publishing case studies and technical guides on this website for professionals, students, and enthusiasts interested in Data Engineering, particularly on Microsoft technologies, as well as in corporate finance and reporting solutions. 🔥 If you found this article helpful, please share it or mention me on your website or community forum.
