ETL | Dbt file structure and dbt_project.yml configuration

In Dbt (data build tools), files are organized in a logical structure based on files/folder organization and configuration (dbt_project.yml). To navigate the project effectively as the codebase grows, it’s perhaps a good idea to create a system within the files.

File Structure of a Dbt Project

After initializing an empty Dbt project, the directory structure is as follows. The root is our Dbt project, containing configuration files dbt_project.yml and dbt_profiles.yml, along with several directories:

my_dbt_project/
|-- dbt_project.yml
|-- dbt_profiles.yml
|-- models/
|   |-- example/
|   |   |-- my_first_dbt_model.sql
|   |   |-- my_second_dbt_model.sql
|-- analysis/
|-- macros/
|-- snapshots/
|-- tests/

Dbt Models and Snapshot Directory Organization

Sql scripts placed in the models and snapshots folders are materialized into our target destination. The target environment (dev and prod) is configured in the profiles.yml file.

Initially, there might be only a few scripts, but as the number grows, it’s a best practice to create a meaningful logical structure. Below is an example, or you can refer to my public Dbt project on GitHub.

my_dbt_project/
|-- dbt_project.yml
|-- models/
|   |-- datamarts/
|   |   |-- datamart1/
|   |   |   |-- Dim_Table1.sql
|   |   |   |-- Fact_Table2.sql
|   |   |-- datamart2/
|   |   |   |-- Dim_Table1.sql
|   |   |   |-- Fact_Table2.sql
|   |-- staging/
|   |   |-- curr_source_system_1_tablename1.sql
|   |   |-- curr_source_system_1_tablename2.sql
|   |   |-- hist_source_system_1_tablename1.sql
|   |   |-- hist_source_system_1_tablename2.sql
|-- analysis/
|   |-- Analysis_purpose1/
|   |   |-- analysis1.sql
|   |   |-- analysis2.sql
|-- macros/
|   |-- my_macro.sql
|-- snapshots/
|   |-- source_system_1/
|   |   |-- snap_source_system_1_tablename1.sql
|   |   |-- snap_source_system_1_tablename2.sql
|   |-- source_system_2/
|   |   |-- snap_source_system_2_tablename1.sql
|   |   |-- snap_source_system_2_tablename2.sql
|-- tests/
|   |-- datamart1/
|   |   |-- test1.sql
|   |   |-- test2.sql
|   |-- datamart2/
|   |   |-- test1.sql
|   |   |-- test2.sql
|   |-- source_system_1/
|   |   |-- source_system_1_test1.sql
|   |   |-- source_system_1_test2.sql
|   |-- source_system_2/
|   |   |-- source_system_2_test1.sql
|   |   |-- source_system_2_test2.sql

Configuration in dbt_project.yml

The structure mentioned above is suitable for data warehouse/datamart environments. Each directory in Models represents a database schema, with tables from the snapshots directory redirected to the staging schema in the models directory. This is configured in the dbt_project.yml file.

models:
  dbt_project:
    staging:
        +materialized: table
        +schema: staging
    datamarts:
        datamart1:
            +materialized: table
            +schema: datamart1
        datamart2:
            +materialized: table
            +schema: datamart2

With this configuration, we instruct Dbt to:

  • Materialize scripts in the models/staging folder as tables in the staging schema.
  • Materialize scripts in the datamarts/datamart1 folder into the datamart1 schema.
  • Materialize scripts in the datamarts/datamart2 folder into the datamart2 schema.

If the schemas do not exist, Dbt will create them in the target environment, and the dbt_user in the profiles.yml file should be set up with the appropriate permissions.

Rate this post
Category: Dbt - Data Build Tool

About Ing. Jan Zedníček - Data Engineer & Controlling

My name is Jan Zednicek, and I have been working as a freelance Data Engineer for roughly 10 years. During this time, I have been publishing case studies and technical guides on this website, targeting professionals, students, and enthusiasts interested in Data Engineering particularly on Microsoft technologies as well as corporate finance and reporting solutions. 🔥 If you found this article helpful, please share it or mention me on your website or Community forum

Leave a Reply

Your email address will not be published. Required fields are marked *