In Dbt (data build tools), files are organized in a logical structure based on files/folder organization and configuration (dbt_project.yml). To navigate the project effectively as the codebase grows, it’s perhaps a good idea to create a system within the files.
File Structure of a Dbt Project
After initializing an empty Dbt project, the directory structure is as follows. The root is our Dbt project, containing configuration files dbt_project.yml
and dbt_profiles.yml, along with several directories:
my_dbt_project/ |-- dbt_project.yml |-- dbt_profiles.yml |-- models/ | |-- example/ | | |-- my_first_dbt_model.sql | | |-- my_second_dbt_model.sql |-- analysis/ |-- macros/ |-- snapshots/ |-- tests/
Dbt Models and Snapshot Directory Organization
Sql scripts placed in the models and snapshots folders are materialized into our target destination. The target environment (dev and prod) is configured in the profiles.yml
file.
Initially, there might be only a few scripts, but as the number grows, it’s a best practice to create a meaningful logical structure. Below is an example, or you can refer to my public Dbt project on GitHub.
my_dbt_project/ |-- dbt_project.yml |-- models/ | |-- datamarts/ | | |-- datamart1/ | | | |-- Dim_Table1.sql | | | |-- Fact_Table2.sql | | |-- datamart2/ | | | |-- Dim_Table1.sql | | | |-- Fact_Table2.sql | |-- staging/ | | |-- curr_source_system_1_tablename1.sql | | |-- curr_source_system_1_tablename2.sql | | |-- hist_source_system_1_tablename1.sql | | |-- hist_source_system_1_tablename2.sql |-- analysis/ | |-- Analysis_purpose1/ | | |-- analysis1.sql | | |-- analysis2.sql |-- macros/ | |-- my_macro.sql |-- snapshots/ | |-- source_system_1/ | | |-- snap_source_system_1_tablename1.sql | | |-- snap_source_system_1_tablename2.sql | |-- source_system_2/ | | |-- snap_source_system_2_tablename1.sql | | |-- snap_source_system_2_tablename2.sql |-- tests/ | |-- datamart1/ | | |-- test1.sql | | |-- test2.sql | |-- datamart2/ | | |-- test1.sql | | |-- test2.sql | |-- source_system_1/ | | |-- source_system_1_test1.sql | | |-- source_system_1_test2.sql | |-- source_system_2/ | | |-- source_system_2_test1.sql | | |-- source_system_2_test2.sql
Configuration in dbt_project.yml
The structure mentioned above is suitable for data warehouse/datamart environments. Each directory in Models represents a database schema, with tables from the snapshots directory redirected to the staging schema in the models directory. This is configured in the dbt_project.yml file.
models: dbt_project: staging: +materialized: table +schema: staging datamarts: datamart1: +materialized: table +schema: datamart1 datamart2: +materialized: table +schema: datamart2
With this configuration, we instruct Dbt to:
- Materialize scripts in the
models/staging
folder as tables in thestaging
schema. - Materialize scripts in the
datamarts/datamart1
folder into thedatamart1
schema. - Materialize scripts in the
datamarts/datamart2
folder into thedatamart2
schema.
If the schemas do not exist, Dbt will create them in the target environment, and the dbt_user
in the profiles.yml
file should be set up with the appropriate permissions.