In Dbt (data build tools), files are organized in a logical structure based on files/folder organization and configuration (dbt_project.yml). To navigate the project effectively as the codebase grows, it’s perhaps a good idea to create a system within the files.
File Structure of a Dbt Project
After initializing an empty Dbt project, the directory structure is as follows. The root is our Dbt project, containing configuration files dbt_project.yml and dbt_profiles.yml, along with several directories:
my_dbt_project/ |-- dbt_project.yml |-- dbt_profiles.yml |-- models/ | |-- example/ | | |-- my_first_dbt_model.sql | | |-- my_second_dbt_model.sql |-- analysis/ |-- macros/ |-- snapshots/ |-- tests/
Dbt Models and Snapshot Directory Organization
Sql scripts placed in the models and snapshots folders are materialized into our target destination. The target environment (dev and prod) is configured in the profiles.yml file.
Initially, there might be only a few scripts, but as the number grows, it’s a best practice to create a meaningful logical structure. Below is an example, or you can refer to my public Dbt project on GitHub.
my_dbt_project/ |-- dbt_project.yml |-- models/ | |-- datamarts/ | | |-- datamart1/ | | | |-- Dim_Table1.sql | | | |-- Fact_Table2.sql | | |-- datamart2/ | | | |-- Dim_Table1.sql | | | |-- Fact_Table2.sql | |-- staging/ | | |-- curr_source_system_1_tablename1.sql | | |-- curr_source_system_1_tablename2.sql | | |-- hist_source_system_1_tablename1.sql | | |-- hist_source_system_1_tablename2.sql |-- analysis/ | |-- Analysis_purpose1/ | | |-- analysis1.sql | | |-- analysis2.sql |-- macros/ | |-- my_macro.sql |-- snapshots/ | |-- source_system_1/ | | |-- snap_source_system_1_tablename1.sql | | |-- snap_source_system_1_tablename2.sql | |-- source_system_2/ | | |-- snap_source_system_2_tablename1.sql | | |-- snap_source_system_2_tablename2.sql |-- tests/ | |-- datamart1/ | | |-- test1.sql | | |-- test2.sql | |-- datamart2/ | | |-- test1.sql | | |-- test2.sql | |-- source_system_1/ | | |-- source_system_1_test1.sql | | |-- source_system_1_test2.sql | |-- source_system_2/ | | |-- source_system_2_test1.sql | | |-- source_system_2_test2.sql
Configuration in dbt_project.yml
The structure mentioned above is suitable for data warehouse/datamart environments. Each directory in Models represents a database schema, with tables from the snapshots directory redirected to the staging schema in the models directory. This is configured in the dbt_project.yml file.
models:
dbt_project:
staging:
+materialized: table
+schema: staging
datamarts:
datamart1:
+materialized: table
+schema: datamart1
datamart2:
+materialized: table
+schema: datamart2
With this configuration, we instruct Dbt to:
- Materialize scripts in the
models/stagingfolder as tables in thestagingschema. - Materialize scripts in the
datamarts/datamart1folder into thedatamart1schema. - Materialize scripts in the
datamarts/datamart2folder into thedatamart2schema.
If the schemas do not exist, Dbt will create them in the target environment, and the dbt_user in the profiles.yml file should be set up with the appropriate permissions.