This article serves as an introduction to SQL Server Integration Services (SSIS) for SQL Server. It contains information about this function and its basic parts. SQL Server is generally understood mainly as a database platform containing a powerful database engine and client – Management Studio for working with databases. There are a number of editions, some of which are available for free with certain limitations, such as SQL Server Express or Developer editions.
But the truth is that SQL Server offers much much more. In addition to the above, it also supports Business Intelligence through SQL Server Integration Services (SSIS) and SQL Reporting Services (SSRS).
What is SSIS used for?
As I mentioned, SSIS is a kind of extension of the database platform within SQL Server. Let’s imagine a data warehouse or data mart environment.
Data Warehouses – Reminder
A data warehouse is a centralized repository where information (data) from the entire company is located and stored in a form that can be easily reported. Data warehouses contain mostly historical context.
This implies that data warehouses have significant differences from the database for applications.
- Data from many enterprise systems is processed
- Data is transformed before storing in Data Warehouse (naming convention, transformation into suitable schema, etc.)
- Data is historical – the application contains the current status, data warehouse also contains historical versions of the record
- Significantly larger volumes of data compared to applications
- Data is processed into the warehouse in bulk outside the business hours – usually daily at night
It is clear that we need a tool to be able to process all this data automatically and regularly (daily) efficiently. And that is exactly what SSIS exists for.
SQL Server Integration Services (SSIS) – Introduction, ETL processes
SSIS is a feature that allows you to efficiently create data integration. Within these integrations we are talking about ETL processes:
- Extract – data extraction from the source system
- Transform – Data in the central repository should have strict rules regarding object structure, naming and architecture
- Load – Upload data to the target storage
We create data integration using SSIS packages. These packages have the extension .dtsx and each package contains some logic. For example, we can have a separate package for each phase of the ETL process, or we can cover all the phases in one package. It is up to us how we design the structure to be as transparent as possible. In larger environments, hundreds of SSIS packages are maintained, and appropriate logical structure and naming is important for clarity.
SSIS package is created in a special program BIDS – Business Intelligence Development Studio. There is an equivalent SSDS (SQL Server Data Tools) name for the tool
BIDS / SSDS – SQL Server Data Tools and SSIS
It is software that is similar to Visual Studio and can be installed when you install SQL Server. It contains a set of tools for developing reports and SSIS packages (data integration). Development with this program is quite simple and intuitive. There is no need for programming because it contains a graphical interface.
You can install SSDS later (Visual Studio is not needed) – SQL Server Data Tools (SSDT) for Visual Studio
To create the first SSIS package, just run the program and start the Integration Services Project. Create a new package under the project and start creating.
After the SSIS project is created, the first package is automatically created
Description of SSIS Business Intelligence Development Studio (BIDS)
If you see such a picture for the first time after opening the first project, you may have a headache because you do not know what it is. But it is not difficult at all and it is only necessary to become familiar with the tool. Let’s describe what we see in the screenshot.
Solution Explorer
In the upper right, we see our project, which I named “ssis_test”. In this project we also see the SSIS package, which is automatically created with the name “package” – it can be renamed. But we can see more:
- Project params – we won’t need it at the beginning, but it’s parameters that we can define at the project level and then call them into packages. This is a global parameter
- Project Connection managers – Here we define a connection to various sources and destinations (eg source is Mysql database and destination SQL Server)
- SSIS Packages – Here we see individual packages, usually within one project we have tens/hundreds of packages of similar character
SSIS package and BIDS/SSDS workspace
Let’s leave the Solution Explorer now and focus on the middle of the screen. We still have an automatically created SSIS package named “Package”.
- Package Connection manager – If the package is open, above the package we can see the connection manager and variables. This is because we do not always want to have a connection defined over a project, but sometimes we want a connection over a package.
- Package Variables – Variables that have scope only on the package, the project does not see them. In variables, we can store various auxiliary computed variables (they can also be taken from an external source) and then pass these variables to SSIS components (see below).
- SSIS Toolbox – On the left part of the desktop you can see the SSIS toolbox. It contains a set of components that are used to develop the package itself (download data, transform and upload data to the target repository). Drag and drop individual components into the control flow section and then set them as needed. The components can be placed behind each other, that is, they are then executed sequentially.
If you want to get acquainted with SSIS toolbox in detail, I recommend the article – SSIS Toolbox – List of all components and tasks. The article lists what the toolbox contains. Hover over a component to get to know it.
Once the SSIS package is created, it can be uploaded directly to the SQL server and run automatically by the SQL agent (every day, for example). In this way, we achieve the automation of the whole process, which we no longer have to worry about.
In the future, I would like to introduce the individual components in the form of SSIS tutorials with examples.