• 12. 2. 2024
  • Ing. Jan Zedníček - Data Engineer & Controlling
  • 0

In Dbt (data build tools), files are organized in a logical structure based on files/folder organization and configuration (dbt_project.yml). To navigate the project effectively as the codebase grows, it’s perhaps a good idea to create a system within the files.

File Structure of a Dbt Project

After initializing an empty Dbt project, the directory structure is as follows. The root is our Dbt project, containing configuration files dbt_project.yml and dbt_profiles.yml, along with several directories:

my_dbt_project/
|-- dbt_project.yml
|-- dbt_profiles.yml
|-- models/
|   |-- example/
|   |   |-- my_first_dbt_model.sql
|   |   |-- my_second_dbt_model.sql
|-- analysis/
|-- macros/
|-- snapshots/
|-- tests/

Dbt Models and Snapshot Directory Organization

Sql scripts placed in the models and snapshots folders are materialized into our target destination. The target environment (dev and prod) is configured in the profiles.yml file.

Initially, there might be only a few scripts, but as the number grows, it’s a best practice to create a meaningful logical structure. Below is an example, or you can refer to my public Dbt project on GitHub.

my_dbt_project/
|-- dbt_project.yml
|-- models/
|   |-- datamarts/
|   |   |-- datamart1/
|   |   |   |-- Dim_Table1.sql
|   |   |   |-- Fact_Table2.sql
|   |   |-- datamart2/
|   |   |   |-- Dim_Table1.sql
|   |   |   |-- Fact_Table2.sql
|   |-- staging/
|   |   |-- curr_source_system_1_tablename1.sql
|   |   |-- curr_source_system_1_tablename2.sql
|   |   |-- hist_source_system_1_tablename1.sql
|   |   |-- hist_source_system_1_tablename2.sql
|-- analysis/
|   |-- Analysis_purpose1/
|   |   |-- analysis1.sql
|   |   |-- analysis2.sql
|-- macros/
|   |-- my_macro.sql
|-- snapshots/
|   |-- source_system_1/
|   |   |-- snap_source_system_1_tablename1.sql
|   |   |-- snap_source_system_1_tablename2.sql
|   |-- source_system_2/
|   |   |-- snap_source_system_2_tablename1.sql
|   |   |-- snap_source_system_2_tablename2.sql
|-- tests/
|   |-- datamart1/
|   |   |-- test1.sql
|   |   |-- test2.sql
|   |-- datamart2/
|   |   |-- test1.sql
|   |   |-- test2.sql
|   |-- source_system_1/
|   |   |-- source_system_1_test1.sql
|   |   |-- source_system_1_test2.sql
|   |-- source_system_2/
|   |   |-- source_system_2_test1.sql
|   |   |-- source_system_2_test2.sql

Configuration in dbt_project.yml

The structure mentioned above is suitable for data warehouse/datamart environments. Each directory in Models represents a database schema, with tables from the snapshots directory redirected to the staging schema in the models directory. This is configured in the dbt_project.yml file.

models:
  dbt_project:
    staging:
        +materialized: table
        +schema: staging
    datamarts:
        datamart1:
            +materialized: table
            +schema: datamart1
        datamart2:
            +materialized: table
            +schema: datamart2

With this configuration, we instruct Dbt to:

  • Materialize scripts in the models/staging folder as tables in the staging schema.
  • Materialize scripts in the datamarts/datamart1 folder into the datamart1 schema.
  • Materialize scripts in the datamarts/datamart2 folder into the datamart2 schema.

If the schemas do not exist, Dbt will create them in the target environment, and the dbt_user in the profiles.yml file should be set up with the appropriate permissions.

Rate this post

Ing. Jan Zedníček - Data Engineer & Controlling

My name is Jan Zedníček and I have been working as a freelancer for many companies for more than 10 years. I used to work as a financial controller, analyst and manager at many different companies in field of banking and manufacturing. When I am not at work, I like playing volleyball, chess, doing a workout in the gym.

🔥 If you found this article helpful, please share it or mention me on your website

Leave a Reply

Your email address will not be published. Required fields are marked *