In this guide, we will look at the features that Mage offers for data analysis. Although the tool is primarily used for building ETL pipelines, it also includes features for exploratory analysis, data statistics, and charts, so you can perform an initial data analysis without leaving the tool. Mage ships with a wide range of predefined analytical templates, so there’s no need to write them by hand.

Data Analysis, Visualization, and Exploration in Mage-ai – How and Why?

Before we start integrating data, we need to check it for completeness and accuracy. Based on this analysis, we clean the data or take other steps to improve the quality of the data source. At the end of the day, we don’t want to download dirty data!

Each visualization (graph, summary, etc.) is associated with an object (typically a block), and the visualization itself is stored in a Python file. You can find chart files in the “charts” folder (see screenshot). Currently, this folder is empty because we don’t have any graphs yet.

[Screenshot: the “charts” folder in Mage-ai]

Types of Charts and Data Summaries in Mage-ai with an Example

Let’s try creating some analytics objects. As always, our data source is the AdventureWorks database, specifically its Salesorderdetail table. If you are wondering how to create a pipeline, the process is covered in the guide “ETL | Pipeline – Data Load – Python, SQL Server”.

In Mage, we have a wide range of predefined visualizations to choose from:

  1. Charts: This group is used for the visual representation of data, such as frequency (histogram) or trend analysis over time.
    • Bar chart
    • Histogram
    • Line chart
    • and more
  2. Templates: This group is used for data exploration, data quality analysis, and descriptive statistics.
    • % of missing values (identifying null values across columns)
    • Unique values – exploring redundancy
    • Most frequent values
    • Summary – provides fundamental statistics, such as row count, column count, etc.
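
To make the template group more concrete, here is a minimal sketch in plain Python of what these checks compute: row and column counts, the percentage of missing values, the number of unique values, and the most frequent value per column. It is not Mage's own implementation. The `profile` function and the sample rows are hypothetical, and the column names assume the standard AdventureWorks layout of the table.

```python
from collections import Counter

# Illustrative rows only: made-up values shaped like a few columns of the
# Salesorderdetail table (column names are an assumption about the schema).
rows = [
    {"SalesOrderID": 1, "ProductID": 776, "OrderQty": 1, "CarrierTrackingNumber": "A1"},
    {"SalesOrderID": 1, "ProductID": 777, "OrderQty": 3, "CarrierTrackingNumber": "A1"},
    {"SalesOrderID": 2, "ProductID": 776, "OrderQty": 1, "CarrierTrackingNumber": None},
    {"SalesOrderID": 3, "ProductID": 778, "OrderQty": 2, "CarrierTrackingNumber": None},
]


def profile(rows):
    """Return row/column counts plus per-column missing %, unique count, and top value."""
    columns = list(rows[0].keys()) if rows else []
    report = {"row_count": len(rows), "column_count": len(columns), "columns": {}}
    for col in columns:
        values = [r.get(col) for r in rows]
        # None stands in for a NULL coming from the database.
        present = [v for v in values if v is not None]
        top = Counter(present).most_common(1)
        report["columns"][col] = {
            "missing_pct": 100.0 * (len(values) - len(present)) / len(values),
            "unique": len(set(present)),
            "most_frequent": top[0][0] if top else None,
        }
    return report


report = profile(rows)
print(report["row_count"], report["column_count"])
print(report["columns"]["CarrierTrackingNumber"])
```

A high missing percentage or an unexpectedly low unique count in a column is usually the first hint that the source needs cleaning before it is loaded.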

Chart and template examples (preview)

Below you can find a few previews of pre-defined visualizations that can be further processed using Python.

A) Line chart
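
A line chart of a trend over time needs one value per time period, so the data is typically aggregated before plotting. The sketch below shows that aggregation step in plain Python, assuming we want the total of `LineTotal` per month based on `ModifiedDate`. Both column names, the `monthly_totals` helper, and the sample values are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

# Illustrative (made-up) order lines as (ModifiedDate, LineTotal) pairs.
order_lines = [
    (date(2011, 5, 31), 100.0),
    (date(2011, 5, 31), 50.0),
    (date(2011, 6, 30), 75.0),
    (date(2011, 7, 1), 25.0),
]


def monthly_totals(lines):
    """Sum the amounts per calendar month and return (month, total) pairs in date order."""
    totals = defaultdict(float)
    for day, amount in lines:
        # "YYYY-MM" keys sort chronologically when sorted as plain strings.
        totals[day.strftime("%Y-%m")] += amount
    return sorted(totals.items())


series = monthly_totals(order_lines)
for month, total in series:
    print(month, total)
```

The resulting pairs map directly to the x and y axes of a line chart.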

B) Column descriptive statistics (missing values, min, max, unique, mean, median, mode)

[Screenshot: descriptive statistics for a database table in Mage]
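
For reference, the statistics named in this preview can be reproduced for a single numeric column with Python's standard `statistics` module. This is only a sketch of the calculations, not the code Mage runs. The `describe` helper and the `OrderQty` sample values are hypothetical.

```python
import statistics

# Illustrative (made-up) OrderQty values; None stands for a NULL in the table.
order_qty = [1, 3, 1, 2, None, 4, 1]


def describe(values):
    """Compute basic descriptive statistics for one numeric column.

    Missing values (None) are counted and then ignored. The sketch assumes
    the column has at least one non-missing value.
    """
    present = [v for v in values if v is not None]
    return {
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "unique": len(set(present)),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "mode": statistics.mode(present),
    }


stats = describe(order_qty)
for name, value in stats.items():
    print(name, value)
```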

C) Table summary and missing values (NULL) analysis

[Screenshot: table summary and missing values analysis in Mage]

D) Table unique values analysis of columns

[Screenshot: unique values analysis of a database table in Mage]

E) Most frequent values in the Salesorderdetail table

Ing. Jan Zedníček - Data Engineer & Controlling

My name is Jan Zedníček and I have been working as a freelancer for many companies for more than 10 years. I used to work as a financial controller, analyst, and manager at many different companies in the fields of banking and manufacturing. When I am not at work, I like playing volleyball and chess and working out in the gym.