I mainly use SSIS (SQL Server Integration Services) to build ETL pipelines. But in the data warehouse world, we’re shifting more and more from on-prem solutions to the cloud, and from conservative (and expensive) platforms to open-source alternatives. This is especially true for ETL platforms. One promising alternative is Mage.ai.
Mage.ai as an ETL framework
One of the most commonly used ETL platforms among large and enterprise clients (perhaps the most common) is the open-source Apache Airflow. It’s a powerful tool based on Python, but let’s face it: it’s quite complex, from installation through setting up pipelines and orchestration to day-to-day operation. However, similar and much more user-friendly open-source alternatives like Mage.ai are starting to appear.
Mage.ai is a framework designed to automate ETL processes, including both batch pipelines and streaming. The tool offers a range of predefined connectors (Python/SQL scripts) for various sources, a clear file structure, dbt integration, and scalability options. The framework also supports a number of configuration settings, secret management, graph creation, testing scenarios, data cleaning, and more. The documentation is very good.
Mage.ai installation is quite simple
Mage.ai runs on any system that supports Docker, including Windows, Linux, and Mac. It can be installed via (i) Docker, (ii) pip, or (iii) Kubernetes, with Docker being the preferred method. You can check out a demo project at demo.mage.ai.
Installation instructions can be found at https://docs.mage.ai/getting-started/setup. The simplest (though not preferred) method of installation is through pip/conda:
- Create a conda environment (I named mine janzednicek_mageai_article).
- Then, open the terminal and activate the environment.
- Install Mage.ai. I first tried to install it using conda, which didn’t work, but pip was successful (see screenshot). Simply type pip install mage-ai
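To confirm that pip really installed the package into the active environment, a quick check from Python can help. The following is a minimal sketch that uses only the standard library. The helper name installed_version is made up for illustration, and the distribution name mage-ai is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing.

    Hypothetical helper for illustration; it only reads package metadata
    from the currently active Python environment.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# "mage-ai" is the distribution name used in the pip command above.
found = installed_version("mage-ai")
if found:
    print(f"mage-ai {found} is installed")
else:
    print("mage-ai is not installed in this environment")
```

If the script reports that the package is missing, the most likely cause is that the conda environment was not activated before running pip.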
After installation, we can start our first project:
- Open a terminal and go to the folder where you want to create the new Mage.ai project.
- Type mage start my_new_mage_project
After that, the Mage.ai UI should be available in your browser at localhost:6789/ (Mage’s default port).
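If the page does not load, it can help to check whether anything is listening on that port at all. Below is a small sketch using Python’s standard socket module. The function name is_port_open is an assumption for illustration and is not part of Mage.ai. The check only tells you that some process accepts TCP connections on the port, not that it is Mage.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; any refused, unreachable, or
    timed-out connection is reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Call is_port_open("localhost", port) with the port from the address above. A False result usually means the mage start command is not running (or has not finished starting) in the other terminal.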
I highly recommend going through the demo project. Next time, we’ll build our first pipeline: we’ll connect to the Czech National Bank’s website as a data source, download the currency rates, and save them to a SQL Server database, which we’ll configure for this purpose in the io_config.yaml file.