Bulk Table Automated Ingestion in Microsoft Fabric Data Factory Using a Single Pipeline and JSON Configuration

This article is a technical guide to implementing bulk table imports with a single pipeline in Fabric Data Factory (or Azure Data Factory), using a ForEach activity and an external JSON configuration file.

This approach enables easy addition, removal, and modification of tables without changing the pipeline itself. The JSON file lets us configure many powerful parameters centrally for all tables, such as:

  • Source table name
  • Target parquet file name
  • is_enabled (whether the table should be processed)
  • Primary key
  • Ingestion method (full load vs incremental load)
  • Referenced notebook for Delta table ingestion
  • and more…

Data ingestion and the issue of manually creating a large number of data flows

When building a data platform, we often need to extract and process a large number of tables from source systems. Creating a separate pipeline for each table is highly inefficient, especially when the number of objects can easily reach tens or hundreds.

An elegant solution is to use a single orchestrator (per source system) that iterates through a list of tables stored in a JSON file and imports them using a universal method. Fabric provides the features required to achieve this, and we will need:

  • 1× Fabric Data Factory pipeline
  • A JSON configuration file stored in a Data Lake
  • The AdventureWorksLT (lightweight) sample database as the source data

And that is essentially everything.

Note: This approach cannot always be used for various objective reasons. However, when the source contains structured data such as a database, this method is applicable.

Solution principle in Fabric Data Factory for pipeline automation

Fabric Data Factory is a tool for orchestrating data workflows across the Fabric environment. It enables execution and management of copy activities, transformations, notebooks, web calls, dataset parameterization, and automation of complex ETL/ELT processes. Thanks to parameterization, it is ideal for large-scale data ingestion.

  1. The configuration JSON contains the list of tables and their metadata (settings)
  2. The pipeline performs a Lookup on the JSON file.
  3. An optional Filter activity selects only items with is_enabled = true. This allows us to enable/disable tables dynamically.
  4. A ForEach activity iterates through the tables and settings.
  5. Each iteration triggers a Copy activity that extracts data from the source into a parquet file.
  6. After downloading, a notebook is executed to transform the data or write it into Delta tables.
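The six steps above can be illustrated with a short, dependency-free Python sketch. This is only an analogy for the control flow — in Fabric this logic lives in pipeline activities, not in code — and the sample table entries are trimmed down for brevity:

```python
import json

# Illustrative sketch of the pipeline's control flow (Lookup -> Filter -> ForEach).
config_text = """
{
  "sourceSystem": "AdventureWorks",
  "tables": [
    {"schema": "SalesLT", "tableName": "Address", "is_enabled": true},
    {"schema": "SalesLT", "tableName": "Customer", "is_enabled": false}
  ]
}
"""

config = json.loads(config_text)        # step 2: Lookup reads the JSON config
enabled = [t for t in config["tables"]  # step 3: keep only enabled tables
           if t["is_enabled"]]

for table in enabled:                   # step 4: ForEach iterates the list
    # step 5 would be the Copy activity; here we just report what would run
    print(f"copy {table['schema']}.{table['tableName']} -> parquet")
```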

Sample JSON config for controlling bulk imports

This JSON file acts as the central control structure — defining all tables that will be imported. Typically, it is stored in Lakehouse files.

{
  "sourceSystem": "AdventureWorks",
  "tables": [
    {
      "schema": "SalesLT",
      "tableName": "Address",
      "parquetName": "address",
      "deltaSchema": "adventureworks",
      "deltaName": "bt_aw_address",
      "deltaNotebook": "2a0b0af8...",
      "source_updated_at": "ModifiedDate",
      "source_businesskey": "AddressID",
      "is_enabled": true,
      "ingestion_method": "fullload"
    },
    {
      "schema": "SalesLT",
      "tableName": "Customer",
      "parquetName": "customer",
      "deltaSchema": "adventureworks",
      "deltaName": "bt_aw_customer",
      "deltaNotebook": "acf935d3...",
      "source_updated_at": "ModifiedDate",
      "source_businesskey": "CustomerID",
      "is_enabled": true,
      "ingestion_method": "fullload"
    }
    /* ...more tables... */
  ]
}
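Because a malformed entry only surfaces at pipeline runtime, it can be worth validating the config before deploying it. Below is a minimal sketch of such a check; the required-key list and the `"incremental"` value are assumptions based on the sample above, not a fixed schema:

```python
import json

# Keys the pipeline expressions rely on (assumed from the sample config).
REQUIRED_KEYS = {"schema", "tableName", "parquetName", "is_enabled", "ingestion_method"}

def validate_config(text: str) -> list[str]:
    """Return a list of problems found in the config; an empty list means OK."""
    problems = []
    config = json.loads(text)
    for i, table in enumerate(config.get("tables", [])):
        missing = REQUIRED_KEYS - table.keys()
        if missing:
            problems.append(f"tables[{i}] is missing keys: {sorted(missing)}")
        if table.get("ingestion_method") not in ("fullload", "incremental"):
            problems.append(f"tables[{i}] has an unknown ingestion_method")
    return problems

sample = ('{"sourceSystem": "AdventureWorks", "tables": '
          '[{"schema": "SalesLT", "tableName": "Address", "parquetName": "address", '
          '"is_enabled": true, "ingestion_method": "fullload"}]}')
print(validate_config(sample))  # an empty list means the config is valid
```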

Step-by-step guide for creating the Data Factory pipeline

Step 1 – Preparing the JSON configuration file

Store the JSON configuration in Fabric Lakehouse or ADLS: /_configs/{sourceSystem}_{Name}.json.

[Image: JSON configuration file for the bulk import]

Step 2 – Creating a new pipeline

Create a new pipeline.

[Image: Creating a new pipeline in Microsoft Fabric]

Step 3 – Lookup activity and JSON configuration

Add a Lookup activity and load the JSON configuration from the Lakehouse.

[Image: Lookup activity settings]

Step 4 – ForEach loop activity and expression setup

Add a ForEach loop activity and configure it to read the collection of tables from the lookup output. Enable sequential processing.

[Image: ForEach loop settings]
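Assuming the Lookup activity from Step 3 is named Lookup Config (adjust the name to match your pipeline), the Items expression of the ForEach might look like:

@activity('Lookup Config').output.firstRow.tables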

Step 5 – If conditions activity

We need to filter tables using the is_enabled parameter. This can be done with a Filter activity or an If Condition activity.

  • Drag the If Condition inside the ForEach activity, then click edit to enter its inner flow.

[Image: If Condition activity settings]

  • Configure the activity to evaluate the is_enabled flag for the currently processed table. When true, activities in the True branch will be executed. When false, nothing happens.

[Image: If Condition expression settings]
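The If Condition expression evaluating the flag of the currently processed table can be written as:

@equals(item().is_enabled, true)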

Step 6 – Copy Activity: universal import and automated ingestion folder structure

  • (a) Source: Parameterize the dataset using expressions such as @item().schema and @item().tableName.

[Image: Copy activity source settings]

  • (b) Source: Additional columns may be configured under the Advanced section. These columns are important for further processing in the silver layer and template automation — we do not need to manually define primary keys or SCD2 tracking columns, which supports downstream automation.

[Image: Additional columns under the Advanced source settings]
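As an example, the additional columns might carry ingestion metadata and the business key from the config (the column names here are illustrative, not prescribed):

  • ingestion_timestamp = @utcnow()
  • source_businesskey = @item().source_businesskey
  • source_table = @concat(item().schema, '.', item().tableName)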

  • Sink: Parameterize the dataset so that parquet files follow a structure optimized for partition pruning:
/data-lake/
   └── <source_system>/
       └── <table_name>/
           └── <ingestion_method>/
               └── year=<ingestion_year>/
                   └── month=<ingestion_month>/
                       └── day=<ingestion_day>/
                           └── <ingestion_timestamp>data.parquet
  • File path is defined as:

@concat(pipeline().parameters.p_destination_root,'/', pipeline().parameters.p_source_system, '/', item().parquetName, '/ingestion_method=',item().ingestion_method, '/ingestion_year=', formatDateTime(utcNow(),'yyyy'), '/ingestion_month=', formatDateTime(utcNow(),'MM'), '/ingestion_day=', formatDateTime(utcNow(),'dd') )

[Image: Copy activity sink settings]

  • And the file name:

@concat(formatDateTime(utcNow(),'yyyyMMdd_HHmmss'),'_',item().parquetName,'.parquet')
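To see what the two expressions above produce, here is a small Python sketch that mirrors them; the parameter and item values are made up for illustration:

```python
from datetime import datetime, timezone

# Hypothetical pipeline parameters and ForEach item, mirroring the expressions above.
p_destination_root = "/data-lake"
p_source_system = "AdventureWorks"
item = {"parquetName": "address", "ingestion_method": "fullload"}

now = datetime.now(timezone.utc)
folder = (f"{p_destination_root}/{p_source_system}/{item['parquetName']}"
          f"/ingestion_method={item['ingestion_method']}"
          f"/ingestion_year={now:%Y}/ingestion_month={now:%m}/ingestion_day={now:%d}")
file_name = f"{now:%Y%m%d_%H%M%S}_{item['parquetName']}.parquet"

print(folder + "/" + file_name)
```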

Step 7 – Additional downstream steps (notebook)

After Step 6, all tables marked as enabled in the configuration file are processed and stored as parquet files. Optionally, we can also trigger a downstream notebook. The JSON config references a notebook GUID, which can be executed after data ingestion. This notebook may include deduplication, cleaning, SCD2 processing, or any other logic.
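The deduplication such a notebook performs can be illustrated with a dependency-free sketch: per business key, keep only the row with the latest source_updated_at value. In Fabric this would typically be PySpark over the parquet files; the sample rows below are invented for illustration:

```python
# Keep the latest row per business key -- a simplified stand-in for the
# deduplication a Fabric notebook might run over the ingested parquet files.
rows = [
    {"AddressID": 1, "City": "Bothell",  "ModifiedDate": "2024-01-10"},
    {"AddressID": 1, "City": "Seattle",  "ModifiedDate": "2024-03-05"},
    {"AddressID": 2, "City": "Portland", "ModifiedDate": "2024-02-01"},
]

latest: dict[int, dict] = {}
for row in rows:
    key = row["AddressID"]  # source_businesskey from the JSON config
    # ISO date strings compare correctly as plain strings
    if key not in latest or row["ModifiedDate"] > latest[key]["ModifiedDate"]:
        latest[key] = row

deduplicated = sorted(latest.values(), key=lambda r: r["AddressID"])
print(deduplicated)
```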

Conclusion, testing, and check

After running the prepared solution, we can see that the tables are processed only when is_enabled = true (entries marked as disabled are skipped):

[Image: Completed ForEach run with disabled tables skipped]

When inspecting the Lakehouse, we see a clean, partitioned folder structure ready for efficient parquet access, partition pruning, and simplified maintenance.

[Image: Partitioned folder structure in the Lakehouse]

This approach enables processing dozens of tables using a single pipeline, simplifies adding and removing tables, and significantly reduces data ingestion management overhead. Updating the JSON configuration is sufficient for the pipeline to automatically execute the import according to the new configuration. For larger data volumes or table counts, further optimizations and asynchronous processing should be considered.


About Ing. Jan Zedníček - Data Engineer & Controlling

My name is Jan Zedníček and I have been working as a freelancer for many companies for more than 10 years. I used to work as a financial controller, analyst, and manager at many different companies in the fields of banking and manufacturing. When I am not at work, I like playing volleyball and chess, and doing workouts in the gym. 🔥 If you found this article helpful, please share it or mention me on your website.
