In the previous article (ETL | Keboola – Introduction, Pricing, Products – An Alternative to Fivetran), we briefly introduced Keboola. Keboola offers a freemium model, so we can try the tool for free as long as we accept some usage limits (the free plan starts with 120 minutes). Let’s take a look at the application environment and get acquainted with the demo project, a tutorial that Keboola uses to train new users on real-life examples.

Keboola Free – How to Create an Account and Project?

Go to the Keboola website, https://www.keboola.com, and register if you don’t have an account yet.

After registration, click on “Start Free Project,” and Keboola will guide you through the process of creating a new project. Before the project is created, you can optionally get some training in one of these forms:

  • Demo project – can be explored without registration
  • Educational videos
  • A call with a Keboola representative who will demonstrate the application to you

If you choose not to receive training, you can go back one step and you will be redirected directly to the application. The default page after logging in is the dashboard of your project.

keboola-dashboard

Structure of Keboola – Menu, Dashboard, Components, Storage, Jobs, etc.

As I mentioned earlier, the dashboard is the default page you land on.

Dashboard – Everything Essential in One Place

On the dashboard, you can see all the basic characteristics of your project, including:

  • Storage overview (used space on Keboola’s disk) – the capacity available in the free plan is up to 250 GB
  • Free minutes overview – in the free plan, you get 120 minutes in the first month and an additional 60 minutes every month.

120 minutes is sufficient for experimenting or setting up a small project. I managed to set up 3 flows (pipelines) that consumed about 30 minutes during testing. The remaining minutes were used up by a Python workspace, which I didn’t realize was consuming minutes while it was running, so I had to purchase more minutes to continue testing.
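The arithmetic behind this budget can be sketched in a few lines of Python. The figures come from this article; the constant and function names are hypothetical, and the sketch does not model whether unused minutes carry over between months, because that is not stated here.

```python
# Figures from the article; all names below are hypothetical helpers.
FIRST_MONTH_MINUTES = 120    # free minutes granted in the first month
MONTHLY_TOP_UP_MINUTES = 60  # additional free minutes in each later month


def remaining_minutes(granted: float, used: float) -> float:
    """Minutes left from a grant; a negative result means it is exhausted."""
    return granted - used


def average_per_flow(total_used: float, flow_count: int) -> float:
    """Average minutes consumed per flow during testing."""
    return total_used / flow_count


flow_testing_minutes = 30  # three flows consumed about 30 minutes in total
left = remaining_minutes(FIRST_MONTH_MINUTES, flow_testing_minutes)
print(left)                                       # 90
print(average_per_flow(flow_testing_minutes, 3))  # 10.0
```

In other words, roughly 90 minutes were left for everything else, which is the amount the running Python workspace used up.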

Flows – Data Flows/Pipelines in Keboola

The “Flows” menu item contains all data flows, or pipelines, which Keboola calls “Flows.” A data flow consists of components. As you can see in the screenshot below, I already have 3 flows created. Individual flows can be organized into a directory structure. You can also see some basic characteristics of each data flow:

  • Scheduled – whether the data flow is scheduled for regular runs
  • Last change – the date of the last change
  • Run results – how the last 10 runs of the flow performed
  • Last use – when the flow was last executed

If you click on a flow, it will take you to the data flow detail, where you can find defined components (see below) for your data sources and destinations, as well as the relationships between them. In the case of the data flow named AdventureWorks, it looks as shown below (more articles will cover flows).

flow detail keboola

Components – Data Sources and Destinations in Keboola

On the components page, you can see all configurations for data sources and destinations. Components are the building blocks of a flow. A flow can also contain other blocks (such as transformations), but it should always contain at least a) a source component and b) a destination component.
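The rule that a flow needs both a source and a destination can be illustrated with a small Python model. This is only a sketch: the `Block` and `Flow` classes and the `missing_kinds` function are hypothetical and are not part of Keboola or its API.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """One building block of a flow (hypothetical model, not Keboola's API)."""
    name: str
    kind: str  # "source", "destination", or "transformation"


@dataclass
class Flow:
    name: str
    blocks: list[Block] = field(default_factory=list)


def missing_kinds(flow: Flow) -> set[str]:
    """Return the required block kinds that the flow does not contain yet."""
    required = {"source", "destination"}
    present = {block.kind for block in flow.blocks}
    return required - present


# Example with made-up block names: a source only, no destination yet.
draft = Flow("example-flow", [Block("example-source", "source")])
print(missing_kinds(draft))  # {'destination'}
```

An empty result from `missing_kinds` means the flow has at least one source and one destination; transformations are optional in this model.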

As you can see, I already have a few components defined (Google Analytics, OneDrive, etc.). Each component needs to be configured, that is, told where to connect (e.g., to my Google Analytics account). This process is straightforward (as discussed in another article) because Keboola includes over 200 predefined components for various systems.

keboola-components

If you click on “Add component,” the application will take you to an interface where you can choose from a large number of components. Once you have configured the components, you can assign them to a flow.

Keboola components

Templates – Predefined Data Flows in Keboola

In this section, you will find predefined flows. Clicking on a template will open the flow settings, where you need to fill in the configuration details. I will try this in another article.

keboola-templates

Storage Buckets (250 GB Limit) in Keboola

The next page is more interesting. On the storage tab, you can see your files, which are located in buckets (containers for files). There are essentially 2 types of buckets in Keboola:

  • IN bucket/file – where files are stored as a result of extractions
  • OUT bucket/file – where files are stored as a result of transformations. Data that enters a transformation is of type IN, and what comes out is of type OUT
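As a small illustration of the IN/OUT split, the sketch below reads the stage from an identifier. It assumes that identifiers start with an `in.` or `out.` prefix (for example `in.c-demo.orders`); that naming pattern, the example identifiers, and the `bucket_stage` function are assumptions made for this illustration and are not taken from this article.

```python
def bucket_stage(identifier: str) -> str:
    """Return "in" or "out" for an identifier such as "in.c-demo.orders".

    Assumption: the stage is the first dot-separated part of the identifier.
    """
    stage = identifier.split(".", 1)[0]
    if stage not in ("in", "out"):
        raise ValueError(f"unknown stage in identifier: {identifier!r}")
    return stage


# Hypothetical identifiers: extracted data lands in IN,
# transformed data lands in OUT.
print(bucket_stage("in.c-demo.orders"))         # in
print(bucket_stage("out.c-demo.orders_clean"))  # out
```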

storage-keboola

Each bucket contains files:

Each file has its page with a preview of the data, statistics, and characteristics. We will cover storage in a separate article.

keboola-files

Transformations – Transformation Tasks in Data Flows

On the transformations page, you can see another group of blocks that can be used to build flows. As I mentioned earlier, a flow must include a) a source and b) a destination. However, sometimes we want to transform data before storing it in the destination, and that is what transformations are for.

keboola-transformations

If you want to create a new transformation, click on “Create Transformation” and choose the type you prefer:

  • Python
  • R
  • Snowflake
  • No-code
  • dbt-core transformation
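To give a feel for the Python option, here is a minimal sketch of what such a transformation might look like. It assumes that input tables are available as CSV files under `in/tables/` and that results are written as CSV files under `out/tables/`; this layout, the file names `orders.csv` and `orders_clean.csv`, the `amount` column, and the function name are all assumptions for illustration, not details confirmed in this article.

```python
import csv
from pathlib import Path


def run_transformation(data_dir: Path) -> int:
    """Copy orders with a positive amount from the IN file to the OUT file.

    Returns the number of rows written. All paths and column names are
    hypothetical.
    """
    source = data_dir / "in" / "tables" / "orders.csv"
    target = data_dir / "out" / "tables" / "orders_clean.csv"
    target.parent.mkdir(parents=True, exist_ok=True)

    kept = 0
    with source.open(newline="", encoding="utf-8") as src, \
            target.open("w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        # Reuse the input header so the output has the same columns.
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames or [])
        writer.writeheader()
        for row in reader:
            if float(row["amount"]) > 0:
                writer.writerow(row)
                kept += 1
    return kept
```

Calling `run_transformation(Path("."))` in a directory that contains `in/tables/orders.csv` would produce `out/tables/orders_clean.csv`. The point is only to show the IN to OUT pattern described in the storage section: data enters as IN and leaves as OUT.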

keboola-transformation

Keboola Testing vs. Production Environment

In Keboola, you can create multiple environments (typically for testing purposes). To do so, click on “New development branch” at the top of the screen.

keboola new environment

Workspaces in Keboola

The last section in the menu is workspaces, where you can create a working environment of the type you choose. Be careful when creating Python and R workspaces for testing purposes (see the warning in the screenshot). These workspaces are charged: they consume minutes for as long as they are active.

workspaces-keboola

You can choose from 3 types: Python, R, and Snowflake.

workspaces keboola

Ing. Jan Zedníček - Data Engineer & Controlling

My name is Jan Zedníček and I have been working as a freelancer for many companies for more than 10 years. I used to work as a financial controller, analyst and manager at many different companies in the fields of banking and manufacturing. When I am not at work, I like playing volleyball and chess and working out at the gym.
