Keboola is a cloud data platform delivered as SaaS, used for building data pipelines and storing data. In the past, extracting, transforming, and loading data (ETL for short) was the domain of IT or BI specialists, because it required knowledge of software architecture, programming, and the right tools. In Keboola, even a person without a technical background can create data flows.
Introduction to Keboola
Keboola is a cloud service, so you don’t need to install anything. The service operates on a Freemium model: you can use Keboola for free with certain limits that are unlikely to get in your way for testing or smaller projects. The most important limits are (1) 120 minutes of computational time and (2) 250 GB of storage. Details can be found on the Keboola pricing page.
I recommend these resources as a starting point:
- Data | Democratization and Data-Driven Approach – why are tools like Keboola becoming increasingly popular?
- ETL | Keboola – Introduction, Pricing, Products – Alternative to Fivetran – an introduction to Keboola
- Keboola Demo Project – a project demonstration on the service provider’s website. There are also some basic interactive tutorials.
Keboola Guides and Information on This Website
On this website, you will find several articles and guides about Keboola that cover selected topics in detail. To follow the tutorials hands-on, you only need a Keboola account. The AdventureWorks database (sample data) is available online, with big thanks to sqlservercentral.com.
- Part 1 – ETL | Keboola Free (for free) – Creating a Project, Basics
- Part 2 – ETL | Keboola – Data Flow Guide – SQL Server to Google Drive
- Part 3 – ETL | Keboola Free – Components, Types, Security
- Part 4 – ETL | Keboola Free – File Storage, Limit, Tokens, IN/OUT
- Part 5 – ETL | Keboola – Flow Transformation (Snowflake), Power BI Report Refresh
Comparison of Keboola Freemium with Fivetran Free
Among the better-known tools, Keboola can be compared to Fivetran, which was listed as a Challenger in the 2023 Gartner report. Both tools are good, and I would say that neither is significantly better than the other.
Overall Evaluation of Keboola vs. Fivetran
Below are the comparison criteria that came to mind. Overall, it is roughly a tie, and the choice between the two tools depends on the current and future conditions in your data environment. Depending on what I expect from the tool, I would choose as follows (I elaborate on this in the detailed evaluation that follows):
Fivetran: If you have a data warehouse or data lake, on-premises or in the cloud, and you only need to bring data in from your primary systems, while handling transformations and orchestration yourself, for example via dbt, Airflow, or other tools.
Keboola: If you are looking for a complete ETL/ELT platform built around the Snowflake ecosystem rather than just a data pipeline. I would also choose Keboola if I expected better orchestration capabilities from the service, the ability to perform transformations directly in Keboola and to code them in a SQL/Python workspace, the ability to write custom components, and much more.
Detailed Evaluation of Keboola vs. Fivetran
There are many differences between Fivetran and Keboola; I will attempt to rate the most significant ones from Keboola's point of view. This is my subjective opinion.
- (Draw) Easy to use – Both platforms are user-friendly, and setting up data components is straightforward.
- (Draw) Support – User support is, I would say, at a high level for both.
- (Draw) Security – Both platforms offer secure connections and various authorization options for cloud services through built-in connectors (OAuth, SSH, certificates, tokens).
- (Draw) Incremental loading and storage modes – Both platforms offer a wide range of options for identifying incremental changes, choosing the storage mode (replace, increment, etc.), and setting table metadata.
- (Loss) Pricing – Keboola charges by usage time, whereas Fivetran charges by MAR (monthly active rows).
- In the Keboola Freemium model, you have 60 minutes available, which you use up relatively quickly because Keboola has a significant processing overhead. The time-based model is logical, because Keboola makes extensive use of Snowflake as a backend for each component (computing costs).
- In contrast, Fivetran does not care how long a job runs and charges you by rows; you get 500,000 rows/month for free. So for smaller projects, the ELT tooling is quite often free.
- (Loss) In my testing, Keboola is slower at processing small tables, which, combined with time-based pricing and a larger number of small tables, may result in increased costs. I tested about 8 tables with a total of about 20,000 rows, and the flow from a SQL Server database to Keboola storage took about 4-5 minutes. That seems like a lot to me. On the other hand, the relationship between the number of records and the runtime is certainly not linear, so I recommend testing it on your own scenario.
- (Win) Keboola is an ETL/Data platform ecosystem coexisting with Snowflake and many other platforms.
- Keboola, therefore, offers versatility for a broader range of users – tools for both laymen (non-technical users) and professionals (Python, API, R, etc.).
- Fivetran works more like a pass-through pipe: it takes a data source and a destination, delivers the data, and that’s it. So it is somewhat limited in versatility, but on the other hand, it has the ELT part perfectly mastered, and everything is lightning-fast.
- (Win) Orchestration Features – Keboola has certain orchestration capabilities. I certainly don’t want to compare Keboola to tools like Airflow, Dagster, Mage.ai, and the like, but Keboola can handle most scenarios requiring flow orchestration at the application level. Moreover, within Keboola flows, we are not limited only to components in the application, but we can do things like:
- Call SQL Server procedures (typically we want to calculate the semantic layer after downloading raw data)
- Refresh Power BI reports (we want to refresh reports after calculating the semantic layer and datasets)
- Trigger something via API
- and more
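Returning to the pricing point in the list above, the difference between time-based and row-based free tiers can be made concrete with a little arithmetic. The following is a minimal sketch, not an official pricing calculator: the function names `runs_per_month` and `within_row_quota` are hypothetical, the budgets (60 and 120 minutes, 500,000 rows) are the figures quoted in this article, and the run durations of 4 and 5 minutes come from my single test, so treat them as assumptions for your own data.

```python
import math


def runs_per_month(free_minutes: float, minutes_per_run: float) -> int:
    """Number of complete runs that fit into a time-based monthly budget."""
    if minutes_per_run <= 0:
        raise ValueError("minutes_per_run must be positive")
    return math.floor(free_minutes / minutes_per_run)


def within_row_quota(active_rows: int, free_rows: int = 500_000) -> bool:
    """True if the rows touched in a month stay inside a row-based free quota."""
    return active_rows <= free_rows


# Budgets quoted in the article; run durations are from one test (assumption).
for budget in (60, 120):
    for run_minutes in (4, 5):
        runs = runs_per_month(budget, run_minutes)
        print(f"{budget} min budget, {run_minutes} min per run: {runs} runs per month")

# The tested data set (about 20,000 rows) against the row-based quota.
print("20,000 rows within the 500,000-row quota:", within_row_quota(20_000))
```

With a 5-minute run, a 60-minute budget covers 12 runs per month, which is not enough for a daily refresh, while the same 20,000 rows sit far below the row-based quota. The sketch does not model how rows that are synced repeatedly are counted, so check the provider's definition of active rows before relying on it.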