This article is a technical guide on how to implement a bulk table import using a single pipeline in Fabric Data Factory (or Azure Data Factory) by using a ForEach container and an external JSON configuration file.
This approach enables easy addition, removal, and modification of tables without changing the pipeline itself. The JSON file lets us centrally manage many powerful parameters for all tables, such as:
- Source table name
- Target parquet file name
- is_enabled (whether the table should be processed)
- Primary key
- Ingestion method (full load vs incremental load)
- Referenced notebook for Delta table ingestion
- and more…
Data ingestion and the issue of manually creating a large number of data flows
When building a data platform, we often need to extract and process a large number of tables from source systems. Creating a separate pipeline for each table is highly inefficient, especially when the number of objects can easily reach tens or hundreds.
An elegant solution is to use a single orchestrator (per source system) that iterates through a list of tables stored in a JSON file and imports them using a universal method. Fabric provides the features required to achieve this, and we will need:
- 1× Fabric Data Factory pipeline
- A JSON configuration file stored in a Data Lake
- The AdventureWorksLT sample database as the source data
And that is essentially everything.
Note: This approach cannot always be used for various objective reasons. However, when the source contains structured data such as a database, this method is applicable.
Solution principle in Fabric Data Factory for pipeline automation
Fabric Data Factory is a tool for orchestrating data workflows across the Fabric environment. It enables execution and management of copy activities, transformations, notebooks, web calls, dataset parameterization, and automation of complex ETL/ELT processes. Thanks to parameterization, it is ideal for large-scale data ingestion.
- The configuration JSON contains the list of tables and their metadata (settings)
- The pipeline performs a Lookup on the JSON file.
- An optional Filter activity selects only items with is_enabled = true. This allows us to enable or disable tables dynamically.
- A ForEach activity iterates through the tables and their settings.
- Each iteration triggers a Copy activity that downloads data from the source into a parquet file.
- After downloading, a notebook is executed to transform the data or write it into Delta tables.
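The control flow above can be sketched in plain Python (a minimal sketch of the orchestration logic only; `copy_table` and `run_notebook` are hypothetical stand-ins for the Copy activity and the notebook call):

```python
import json


def run_pipeline(config_json: str) -> list:
    """Mimics the pipeline: Lookup -> Filter -> ForEach -> Copy -> Notebook."""
    config = json.loads(config_json)                      # Lookup activity
    enabled = [t for t in config["tables"] if t["is_enabled"]]  # Filter activity
    processed = []
    for t in enabled:                                     # ForEach (sequential)
        copy_table(t)                                     # Copy activity: source -> parquet
        run_notebook(t["deltaNotebook"], t)               # optional downstream notebook
        processed.append(t["tableName"])
    return processed


def copy_table(table_config):
    """Hypothetical stand-in for the Copy activity."""
    pass


def run_notebook(notebook_id, table_config):
    """Hypothetical stand-in for the notebook invocation."""
    pass
```

Disabled tables never reach the loop body, which is exactly what the Filter (or If Condition) activity achieves in the real pipeline.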
Sample JSON config for controlling bulk imports
This JSON file acts as the central control structure — defining all tables that will be imported. Typically, it is stored in Lakehouse files.
{
  "sourceSystem": "AdventureWorks",
  "tables": [
    {
      "schema": "SalesLT",
      "tableName": "Address",
      "parquetName": "address",
      "deltaSchema": "adventureworks",
      "deltaName": "bt_aw_address",
      "deltaNotebook": "2a0b0af8...",
      "source_updated_at": "ModifiedDate",
      "source_businesskey": "AddressID",
      "is_enabled": true,
      "ingestion_method": "fullload"
    },
    {
      "schema": "SalesLT",
      "tableName": "Customer",
      "parquetName": "customer",
      "deltaSchema": "adventureworks",
      "deltaName": "bt_aw_customer",
      "deltaNotebook": "acf935d3...",
      "source_updated_at": "ModifiedDate",
      "source_businesskey": "CustomerID",
      "is_enabled": true,
      "ingestion_method": "fullload"
    }
    /* ...more tables... */
  ]
}
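Before wiring the config into the pipeline, it is worth validating it once. A minimal sketch (the key names come from the sample above; the set of allowed ingestion methods is an assumption based on this article):

```python
# Keys taken from the sample config above; adjust to your own schema.
REQUIRED_KEYS = {"schema", "tableName", "parquetName", "is_enabled", "ingestion_method"}
# Assumed allowed values, per the full load / incremental distinction in this article.
ALLOWED_METHODS = {"fullload", "incremental"}


def validate_config(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means the config is OK."""
    problems = []
    for i, table in enumerate(config.get("tables", [])):
        missing = REQUIRED_KEYS - table.keys()
        if missing:
            problems.append(f"table #{i}: missing keys {sorted(missing)}")
        if table.get("ingestion_method") not in ALLOWED_METHODS:
            problems.append(f"table #{i}: unknown ingestion_method {table.get('ingestion_method')!r}")
    return problems
```

Running such a check in CI or in a notebook catches typos in the config before the pipeline fails mid-run.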
Step-by-step guide for creating the Data Factory pipeline
Step 1 – Preparing the JSON configuration file
Store the JSON configuration in Fabric Lakehouse or ADLS: /_configs/{sourceSystem}_{Name}.json.
Step 2 – Creating a new pipeline
Create a new pipeline.
Step 3 – Lookup activity and JSON configuration
Add a Lookup activity and load the JSON configuration from the Lakehouse.
Step 4 – ForEach loop activity and expression setup
Add a ForEach loop activity and configure it to read the collection of tables from the lookup output. Enable sequential processing.
Step 5 – If conditions activity
We need to filter tables by the is_enabled flag. This can be done with a Filter activity or an If Condition activity.
- Drag the If Condition inside the ForEach activity, then click edit to enter its inner flow.
- Configure the activity to evaluate the is_enabled flag for the currently processed table. When true, the activities in the True branch are executed; when false, the table is skipped.
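Depending on how the configuration is read, the is_enabled value may arrive as a real boolean or as the string "true"/"false". A tolerant check, sketched in Python (the same idea applies when writing the If Condition expression):

```python
def is_enabled(value) -> bool:
    """Coerce an is_enabled value to bool, tolerating 'true'/'false' strings."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() == "true"
```

This avoids the classic pitfall where a non-empty string like "false" would be treated as truthy.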
Step 6 – Copy Activity: universal import and automated ingestion folder structure
- (a) Source: Parameterize the dataset using expressions such as @item().schema and @item().tableName.
- (b) Source: Additional columns may be configured under the Advanced section. These columns are important for further processing in the silver layer and template automation — we do not need to manually define primary keys or SCD2 tracking columns, which supports downstream automation.
- Sink: Parameterize the dataset so that parquet files follow a structure optimized for partition pruning:
/data-lake/
└── <source_system>/
    └── <parquet_name>/
        └── ingestion_method=<ingestion_method>/
            └── ingestion_year=<yyyy>/
                └── ingestion_month=<MM>/
                    └── ingestion_day=<dd>/
                        └── <ingestion_timestamp>_<parquet_name>.parquet
- File path is defined as:
@concat(pipeline().parameters.p_destination_root,'/', pipeline().parameters.p_source_system, '/', item().parquetName, '/ingestion_method=',item().ingestion_method, '/ingestion_year=', formatDateTime(utcNow(),'yyyy'), '/ingestion_month=', formatDateTime(utcNow(),'MM'), '/ingestion_day=', formatDateTime(utcNow(),'dd') )
- And the file name:
@concat(formatDateTime(utcNow(),'yyyyMMdd_HHmmss'),'_',item().parquetName,'.parquet')
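The two expressions above can be mirrored in Python to sanity-check the resulting paths (a sketch only; `build_destination` is a hypothetical helper, not part of the pipeline):

```python
from datetime import datetime, timezone


def build_destination(root, source_system, item, now=None):
    """Mirror the pipeline's folder-path and file-name expressions."""
    now = now or datetime.now(timezone.utc)   # the expressions use utcNow()
    folder = "/".join([
        root,
        source_system,
        item["parquetName"],
        f"ingestion_method={item['ingestion_method']}",
        f"ingestion_year={now:%Y}",
        f"ingestion_month={now:%m}",
        f"ingestion_day={now:%d}",
    ])
    file_name = f"{now:%Y%m%d_%H%M%S}_{item['parquetName']}.parquet"
    return folder, file_name
```

Keeping the partition keys (`ingestion_method`, `ingestion_year`, ...) in key=value form is what enables partition pruning when the parquet files are later read.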
Step 7 – Additional downstream steps (notebook)
After Step 6, all tables marked as enabled in the configuration file are processed and stored as parquet files. Optionally, we can also trigger a downstream notebook: the JSON config references a notebook GUID, which is executed after data ingestion. This notebook may handle deduplication, cleaning, SCD2 processing, or any other logic.
Conclusion, testing, and check
After running the prepared solution, we can see that the tables are processed only when is_enabled = true (entries marked as disabled are skipped):
When inspecting the Lakehouse, we see a clean, partitioned folder structure ready for efficient parquet access, partition pruning, and simplified maintenance.
This approach enables processing dozens of tables using a single pipeline, simplifies adding and removing tables, and significantly reduces data ingestion management overhead. Updating the JSON configuration is sufficient for the pipeline to automatically execute the import according to the new configuration. For larger data volumes or table counts, further optimizations and asynchronous processing should be considered.