In the last article, I introduced the Control Flow feature. Using control flow we can manage through tasks what SSIS package does. The most frequently used is a Data flow task, which contains data transfer logic (ETL processes).
Difference between Control Flow and Data flow task
Unfortunately, these 2 terms are often confused. Control flow is not a task. It is the SSIS package management layer. It defines tasks (eg Execute SQL, Script task, Dataflow task, etc.) and relations/sequence between them.
Data flow is one of the tasks that can be implemented within Control flow and contains a logic to get data from point A to point B. Control flow can contain 1 or more Data flow tasks.
Data Flow – Introduction
As I mentioned earlier, it is probably the most used task at all. This is because the SSIS package is most often created to provide some data flow or data transformation before it is transferred to a target destination.
Together with the connection manager, data flow ensures connection to a data source (and destination), transformation and data transfer to a certain destination. On the screenshot below is the final state of our SSIS package from the previous article where we have prepared the general control flow logic. We have used Execute SQL tasks that provide logging (start and end) and the data flow task (but we have no logic yet in it).
The data flow task on the screenshot is in the middle of the workspace, named “ETL”. We can double-click this task and look inside.
The content of the task is still completely empty. Note that the SSIS Toolbox (on the left in the workspace) offers a different set of components than a control flow.
SSIS Data Flow – Component Types, Source, Destination
Let’s take a look at the SSIS Toolbox – what we have in Data flow by default in Business Intelligence Studio. The toolbox can be logically classified into two basic categories:
- Data Connection (Source and Destination) and
- Data Transformation
In SSIS Toolbox, components are categorized as follows:
1. Data Connection
-
- 1.1. Favorites – We can find here Source and Destination Assistant (wizard), which will help us to create a provider to some file or database table
- 1.2. Other Sources – We can choose the connection provider. E.g. we would choose OLE DB Source to connect to SQL Server.
- ADO NET Source
- CDC Source
- Excel Source
- Flat File Source
- ODBC Source
- OLE DB Source
- Raw File Source
- XML Source
- 1.3. Other Destination – We can choose the connection provider, for example, to connect to SQL Server we would choose OLE DB Source. Data will be imported to this destination
- ADO NET Destination
- Data Mining Model Training
- DataReader Destination
- Dimension processing
- Excel Destination
- Flat File Destination
- ODBC Destination
- OLE DB Destination
- Partion Processing
- Raw File Destination
- Recordset Destination
- SQL Server Compact Destination
- SQL Server Destination
2. Data Transformation
-
- 2.1. Common – Here we find the most commonly used tools for data transformation, such as Derived Column, Data Conversion, etc.
- Aggregate component
- Balanced Data Distributor
- Conditional Split
- Data Conversion
- Data Streaming Destination
- Derived Column
- HDFS File Destination
- HDFS FIle Source
- Lookup component
- Merge component
- Merge Join component
- Multicast component
- OData Source
- OLE DB Command
- Row Count component
- Script Component
- Slowly Changing dimension
- Sort component
- Union All component
- 2.2. Other Transformations
- Audit
- Cache Transform
- CDC Splitter
- Character Map
- Copy Column
- Data Mining Query
- DQS Cleansing
- Export Column
- Fuzzy Grouping
- Fuzzy Lookup
- Import Column
- Percentage Sampling
- Pivot
- Row sampling
- Term Extraction
- Term Lookup
- Unpivot
- 2.1. Common – Here we find the most commonly used tools for data transformation, such as Derived Column, Data Conversion, etc.
Individual components will be presented in separate articles.
Example – Load data from Excel into SQL Server table
The task in our SSIS package will be to modify the Data flow task to do the following logic:
- In the C: \ Biportal_Data folder we have an excel named “Source_Data.xls”. To this file we want to connect
- Add a new Date_Timestamp column as a timestamp
- Save the data to SQL Server – localhost, dbo.Excel_Data table
The source data looks like this:
The target table in SQL Server looks like this:
Solution:
1) First, we need to connect to the data in Excel
- In other Sources, drag the Excel Source component to the Data flow space
- Then click on New connection manager (because the connection to Excel is not created yet)
- Set the path to the file C:\Biportal_Data\Source_Data.xls and then click OK
- Next, select Sheet on which the data is located
- Then click Columns in the left sidebar to see if the result contains our data column
- Everything is OK and we can click OK, Data source is ready
2) Now we want to add a new Date_Timestamp column to our source data
- Select the Derived Column component from the SSIS toolbox and drag it to the data flow. Connect the two components using the arrow
- Double click on the derived column and set it exactly as on the screenshot. It’s done
3) Prepare the Data Destination Provider – we need to load our source data together with the Timestamp (derived column) into the SQL Server table
- Similarly to the Excel source (step 1), we also prepare the destination connection. In SSIS Toolbox under category “Other destination”, select OLE DB Destination and drag it to the Data Flow space. Then open it and set the connection to the target server (localhost) and database (biportal). Under “Name of the table or view” select our target table dbo.Excel_Data
- Then we have to map the columns from the source file to the target table. Click on “Mappings”. If the column names in the source Excel and in the destination table have the same name, the OLE DB destination provider understands to map them. If they are named differently, it is necessary to map the columns manually using black arrows. So there is no need to do anything in our case. Confirm OK
Our final Control flow and data flow look like this
So we can run the entire SSIS package and look at the result
Data has been successfully delivered to a SQL Server table