Category Archives: Mage.ai – Tutorials and Intro

Mage.ai is a solid alternative to Apache Airflow. It is lighter and, from my perspective, much more user-friendly for building ETL data flows. It ships with a wide range of predefined connectors (Python and SQL templates) for many sources and destinations, so you don’t have to write code from scratch, which saves development time. And Mage can do much more than that.

Mage.ai – Installation (Anaconda, Windows)

  • Install Python – https://www.python.org/
  • Install Anaconda – https://www.anaconda.com/download
  • Open cmd as an administrator
    • conda create --name mage-ai python=3.8
    • conda activate mage-ai
    • conda install pyodbc
  • Install Mage in the new environment
    • pip install mage-ai
  • Create a directory for your Mage project somewhere on your disk and switch to it
    • mkdir C:\mage
    • cd C:\mage\
  • Mage is initialized and started with the same command in cmd; the last word is the Mage project name (if the project doesn’t exist, it is created; if it exists, it is started)
    • mage start mage-ai
  • Mage instance runs at http://localhost:6789/
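
Once mage start is running, a quick way to confirm the instance is reachable is to probe the URL from Python. This is only a minimal sketch using the standard library: the helper name is_mage_up is hypothetical, and the default URL simply assumes the port mentioned above.

```python
import urllib.error
import urllib.request


def is_mage_up(url: str = "http://localhost:6789/", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url` (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status, so it is running.
        return True
    except OSError:
        # Connection refused, timeout, DNS failure, etc. (URLError is an OSError).
        return False
```

Calling is_mage_up() right after starting Mage should return True; if it returns False, check that the conda environment is active and that nothing else is blocking the port.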

Tutorials for data wizards:

Mage.ai File Structure

  • Mage has a logical directory structure (similar to dbt)
  • All connection settings for data sources/targets live in the io_config.yaml file, which also contains settings templates for various databases
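
To make the role of io_config.yaml more concrete, here is a rough Python sketch of the kind of information a single connection profile carries. The profile layout, the key names (MSSQL_HOST and so on), the sample values, and the MAGE_DB_PASSWORD environment variable are all illustrative assumptions, not copied from Mage’s templates. The only point is that the password is read from the environment instead of being stored in the file.

```python
import os


def build_profile(env=os.environ) -> dict:
    """Assemble one connection profile (key names and values are hypothetical)."""
    return {
        "default": {
            "MSSQL_HOST": "localhost",
            "MSSQL_PORT": 1433,
            "MSSQL_DATABASE": "staging",
            "MSSQL_USER": "etl_user",
            # The secret comes from the environment, never from the config file.
            "MSSQL_PASSWORD": env.get("MAGE_DB_PASSWORD", ""),
        }
    }


profile = build_profile({"MAGE_DB_PASSWORD": "example-only"})
print(sorted(profile["default"]))
```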

Structure (most important components)

  • Pipelines folder – contains data pipelines composed of blocks
  • Block types (within a pipeline) – the same block can be reused in different pipelines, and there are many templates (Python/SQL), so scripts don’t have to be written from scratch
    • data loaders – blocks that load data (connect to a system and extract data). They use one of the connections configured in io_config
    • transformers – take data from data loaders and transform it.
    • data exporters – blocks that write data to the destination
    • dbt – a special block type through which Mage integrates with dbt; possible only when Mage is running in Docker.
    • sensor – checks at regular intervals whether a condition is met and, if so, takes action – for example, checks whether a file has appeared in a folder.
  • Charts – contains graphical representations of data, such as various data quality checks
  • Dbt – the dbt folder is empty by default. You can initialize a dbt project in it by following the instructions in ETL | Mage.ai – Dbt Installation (pip/conda) and project initialization
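
The loader, transformer, and exporter roles described above can be sketched in a few lines of Python. Treat this as a simplified illustration rather than Mage’s generated template: inside Mage each block lives in its own file and usually passes pandas DataFrames, whereas here everything sits in one script and uses plain lists. The function names and sample rows are hypothetical, and the fallback decorator exists only so the sketch runs without mage_ai installed.

```python
# Inside Mage the decorators come from mage_ai; the fallback below only keeps
# this sketch runnable as a plain script when mage_ai is not installed.
try:
    from mage_ai.data_preparation.decorators import (
        data_exporter,
        data_loader,
        transformer,
    )
except ImportError:
    def _passthrough(func):
        return func

    data_loader = transformer = data_exporter = _passthrough


@data_loader
def load_orders(*args, **kwargs):
    """Data loader: a real block would query a source configured in io_config."""
    return [
        {"order_id": 1, "amount": "19.90"},
        {"order_id": 2, "amount": "5.00"},
    ]


@transformer
def add_amount_as_float(rows, *args, **kwargs):
    """Transformer: receives the loader's output and returns a cleaned copy."""
    return [{**row, "amount": float(row["amount"])} for row in rows]


@data_exporter
def export_orders(rows, *args, **kwargs):
    """Data exporter: a real block would write to the destination system."""
    for row in rows:
        print(row)
    return len(rows)


# Chain the three blocks by hand; in Mage the pipeline wires them together.
exported = export_orders(add_amount_as_float(load_orders()))
```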

Mage.ai Functionalities and Use Cases

  • ETL processes (data pumps) with many predefined connectors and scheduling
    • Predefined blocks for various systems
    • Predefined graphs for various scenarios – testing, data quality
    • Detailed logging
  • File manager
  • Integration with dbt (running pipelines and then running dbt once they complete)
  • ETL creation using AI (by connecting to the ChatGPT API)
  • Secrets management
  • Authentication options (off by default, which suits local use after installation; can be enabled when running on a server)
  • Automatic integration with Git – clone/commit/pull/push/tracking changes
  • Integrated terminal
  • Predefined charts and summaries for datasets – testing and data exploration
  • Ability to run pipelines through an API
  • Variables, local and global
  • Active and growing community
  • Good documentation
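
As an illustration of running a pipeline through the API, the sketch below builds the HTTP request for an API trigger without sending it. The URL shape and payload follow my understanding of Mage’s API triggers and should be checked against the documentation for your version; the schedule ID, token, and variable name are made-up placeholders, and build_trigger_request is a hypothetical helper.

```python
import json
import urllib.request


def build_trigger_request(base_url, schedule_id, token, variables=None):
    """Build (but do not send) a POST request that starts a pipeline run."""
    url = (
        f"{base_url.rstrip('/')}/api/pipeline_schedules/"
        f"{schedule_id}/pipeline_runs/{token}"
    )
    # Runtime variables are nested under "pipeline_run" (assumed payload shape).
    payload = {"pipeline_run": {"variables": variables or {}}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_trigger_request(
    "http://localhost:6789", 1, "example-token", {"run_date": "2024-01-26"}
)
print(request.get_method(), request.full_url)

# To actually send it against a running instance:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.status)
```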

Troubleshooting and Error Fixes

So far, while exploring the tool, I’ve encountered the following issues:

ETL | Mage.ai – Charts, Analysis, Testing, Overview, Cleansing

In this guide, we will take a look at the features that Mage.ai offers for data analysis. While this tool is primarily used for ETL pipelines, it also includes features for exploratory analysis, data statistics, and charts. With these features, you can perform initial data analysis within the tool. Mage offers a wide range of…

ETL | Mage.ai – Database configuration in io_config.yaml and secrets (passwords)

In this guide, we will take a look at how to configure the io_config.yaml file in Mage.ai. We will also explore how to hide and encrypt access passwords so that they are not readily available in this configuration file. Mage.ai io_config.yaml Configuration and Location: The io_config.yaml file is the main configuration file for setting up…

Mage.ai | Error UnicodeDecodeError: ‘charmap’ codec – Windows

This article is about troubleshooting. Today, I somehow managed to write a comment that caused the entire Mage.ai instance to crash with a UnicodeDecodeError. How did I manage to do that? Mage.ai UnicodeDecodeError: ‘charmap’ codec can’t decode byte: Mage.ai failed to decode a specific character, “ň”, which is specific to the Czech…

ETL | Mage.ai – Dbt Installation (pip/conda) and project initialization

In the previous article, ETL | Mage.ai – Solid Alternative to Airflow – Intro and Installation, we introduced the ETL tool Mage.ai as a lighter alternative to Apache Airflow. We demonstrated how to get the framework up and running through the terminal and learned that after installation, it runs on localhost:6789/. I promised in…

ETL | Mage.ai Pipeline – data flow – Python, SQL Server

In a recent article introducing Mage.ai, a tool for creating and managing ETL processes, I promised at the end that we would try to create a Mage.ai pipeline in the next article. If you are not familiar with this ETL framework, I recommend going through the introductory article. Source and Destination Databases…

ETL | Mage.ai Docker Installation – dbt-sqlserver – Dbt Debug Error, Fix

Today, I attempted to install Mage.ai via Docker as part of getting familiar with the tool. This is currently (as of 2024-01-26) the only scenario for running dbt natively together with Mage.ai within pipelines. Of course, it is still possible to run a dbt model using custom Python code (in case you use a pip/conda-installed Mage)…

ETL | Mage.ai – Error [Errno 2] No such file or directory

Mage.ai is a great tool for data nerds, but its user experience is not fully polished yet. We might encounter errors that are trivial in nature yet sometimes hard to identify. One of these errors is Error [Errno 2] No such file or directory, and today, I…

ETL | Dbt debug – Configuration and testing of SQL Server database (profiles.yml) – Windows

The previous article focused on installing dbt in the Mage.ai environment or independently, followed by the initialization of a project named mage_dbt – Dbt Installation (pip/conda) and project initialization. So, we have the mage-ai environment installed, into which we have installed dbt-sqlserver. We then tested that we can see the established file structure of the…

ETL | Mage.ai – Intro and Installation – Solid Alternative to Airflow

I mainly use SSIS (SQL Server Integration Services) as the tool for creating ETL pipelines. But in the data warehouse world, we’re shifting more from on-prem solutions to the cloud and from conservative (and expensive) platforms to alternatives. This is especially true for ETL platforms. One promising alternative is Mage.ai. Mage.ai as an ETL framework: “The…