Mage.ai is a solid alternative to Apache Airflow. It is lighter and, from my perspective, much more user-friendly for building ETL data flows. It ships with a wide range of predefined connectors (Python- and SQL-based) to many sources and destinations, so you don’t have to write code from scratch, which makes development time-efficient. However, Mage can do much more.

Mage.ai – Installation (Anaconda, Windows)

  • Install Python – https://www.python.org/
  • Install Anaconda – https://www.anaconda.com/download
  • Open cmd as an administrator
    • conda create --name mage-ai python=3.8
    • conda activate mage-ai
    • conda install pyodbc
  • Install Mage in the new environment
    • pip install mage-ai
  • Create a directory for your Mage project somewhere on your disk and switch to it
    • mkdir C:\mage
    • cd C:\mage\
  • Start Mage with the command below in cmd. The last word is the name of the Mage project: if the project doesn’t exist, it is created; if it exists, it is started
    • mage start mage-ai
  • The Mage instance runs at http://localhost:6789/
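
To confirm that the instance from the last step is actually answering, a small check like the one below can help. It is only a sketch that uses the Python standard library; the function name mage_is_reachable and the 3-second timeout are my own choices, not part of Mage.

```python
import urllib.error
import urllib.request


def mage_is_reachable(url: str = "http://localhost:6789/", timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers successfully at `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Any 2xx/3xx answer means something is serving at this address.
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure or an HTTP error status.
        return False


if __name__ == "__main__":
    print("Mage is up" if mage_is_reachable() else "Mage is not reachable yet")
```

Run it from the same conda environment after `mage start`; if it reports that Mage is not reachable, give the server a few more seconds to start and try again.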

Mage.ai File Structure

  • Mage has a logical directory structure (similar to dbt)
  • All connection settings for data sources/targets live in the io_config.yaml file, which also contains settings templates for various databases

Structure (most important components)

  • Pipelines folder – contains data pipelines composed of blocks
  • Block types (within a pipeline) – the same block can be reused in different pipelines, and many templates (Python/SQL) are available, so scripts don’t have to be written from scratch
    • data loaders – blocks that load data (connect to a system and extract data). They use one of the connections configured in io_config.yaml
    • transformers – take data from data loaders and transform it
    • data exporters – blocks that upload data to the destination
    • dbt – a special block type through which Mage integrates with dbt; possible only when Mage is running in Docker
    • sensor – checks at regular intervals whether a condition is met and, if so, takes action, for example when a file appears in a folder
  • Charts – contains graphical representations of data, such as various data quality checks
  • dbt – the dbt folder is empty by default. You can initialize a dbt project in it by following the instructions in ETL | Mage.ai – Dbt Installation (pip/conda) and project initialization
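
To make the block types more concrete, here is a minimal sketch of a sensor, a data loader, a transformer and a data exporter. In a real project each block lives in its own file and Mage wires them together in the pipeline; here they sit in one file and are chained by a hypothetical run_pipeline helper so the example runs on its own. The input_path and output_path variables, the CSV columns and the use of plain lists of dicts are all my assumptions for illustration.

```python
import csv
import os

try:
    # Inside a Mage project, the block templates import these decorators.
    from mage_ai.data_preparation.decorators import (
        data_exporter,
        data_loader,
        sensor,
        transformer,
    )
except ImportError:
    # Stand-ins so the sketch also runs without Mage installed.
    def _passthrough(function):
        return function

    data_loader = transformer = data_exporter = sensor = _passthrough


@sensor
def file_has_arrived(*args, **kwargs) -> bool:
    # The condition to wait for: the input file exists.
    return os.path.exists(kwargs["input_path"])


@data_loader
def load_rows(*args, **kwargs):
    # Extract: read the CSV file into a list of dicts.
    with open(kwargs["input_path"], newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


@transformer
def keep_paid_orders(rows, *args, **kwargs):
    # Transform: keep paid orders only and cast the column types.
    return [
        {"order_id": int(row["order_id"]), "amount": float(row["amount"])}
        for row in rows
        if row["status"] == "paid"
    ]


@data_exporter
def export_rows(rows, *args, **kwargs) -> None:
    # Load: write the transformed rows to the destination file.
    with open(kwargs["output_path"], "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["order_id", "amount"])
        writer.writeheader()
        writer.writerows(rows)


def run_pipeline(input_path: str, output_path: str):
    """Chain the blocks by hand; returns None if the sensor condition fails."""
    variables = {"input_path": input_path, "output_path": output_path}
    if not file_has_arrived(**variables):
        return None  # a real sensor would be re-checked later instead
    rows = keep_paid_orders(load_rows(**variables), **variables)
    export_rows(rows, **variables)
    return rows
```

Calling run_pipeline with the path of a CSV file that has order_id, amount and status columns writes the paid orders to the output path. The point is the shape of each block: a loader returns data, a transformer receives and returns data, an exporter receives data and writes it somewhere.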

Mage.ai Functionalities and Use Cases

  • ETL processes (data pumps) – many predefined connectors, scheduling
    • Predefined blocks for various systems
    • Predefined graphs for various scenarios – testing, data quality
    • Detailed logging
  • File manager
  • Integration with dbt (running pipelines and running dbt depending on their completion)
  • ETL creation using AI (by connecting to the ChatGPT API)
  • Secrets management
  • Authentication options (off by default, which suits local use after installation, but can be enabled when running on a server)
  • Automatic integration with Git – clone/commit/pull/push/tracking changes
  • Integrated terminal
  • Predefined charts and summaries for datasets – testing and data exploration
  • Ability to run pipelines through an API
  • Variables, local and global
  • Active and growing community
  • Good documentation
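
As an illustration of running a pipeline through an API, the sketch below builds the HTTP request for an API trigger. Treat the details as assumptions to verify against the Mage documentation: I assume the trigger URL is copied from the trigger page in the Mage UI, and that runtime variables go into a JSON body under pipeline_run.variables. The helper name build_trigger_request and the placeholder URL are hypothetical.

```python
import json
import urllib.request


def build_trigger_request(trigger_url: str, variables: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that starts a pipeline run."""
    # Assumed payload shape: runtime variables nested under "pipeline_run".
    payload = {"pipeline_run": {"variables": variables}}
    return urllib.request.Request(
        trigger_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder URL: replace it with the one shown for your own API trigger.
    request = build_trigger_request(
        "http://localhost:6789/api/pipeline_schedules/<trigger_id>/pipeline_runs/<token>",
        {"env": "dev"},
    )
    print(request.get_method(), request.full_url)
    # To actually start the run: urllib.request.urlopen(request)
```

Keeping the request construction separate from sending it makes it easy to inspect the payload before anything is triggered.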

Troubleshooting and Error Fixes

So far, while exploring the tool, I’ve encountered the following issues: