A data warehouse is a central information system designed to store data from one or more enterprise operational systems or other external sources such as Excel, SharePoint, and others. Data in the data warehouse is organized into schemas and undergoes a cleansing and structuring process before storage to make it understandable to users and easily reportable. Unlike operational systems, a data warehouse includes mostly a complete history of how data changes over time. The key concept here are fact and dimensional tables.

Facts and Dimension Tables (Facts and Dimensions)

In a data warehouse, you’ll encounter two fundamental types of tables: facts and dimensions. The categorization into facts and dimension tables stems from the necessity to structure data into schemas for clarity and user orientation. This structured approach also reduces the generation of historical records over time. Data warehouses often contain hundreds or even thousands of tables, making data structure crucial. Typically, you’ll come across star schema and snowflake schema designs.

For demonstration purposes, let’s use a freely available sample database from Microsoft – AdventureworksDW2016CTP3 AdventureWorks sample databases. I created the model using a trial version of dbForge Studio Download dbForge Studio for SQL Server. A look at the schema will reveal much. The tables are arranged in a star schema (except for one exception – currencyKey in DimOrganization) and contain financial data (financial statements – balance sheet and income statement).

  • Fact Table: FactFinance contains financial balances and foreign keys to dimensions.
  • Dimension Tables: These tables provide context for the balances.
    • DimAccount – accounts and accounting hierarchies (assets, liabilities, long-term assets, etc.)
    • DimDate – date dimension
    • DimDepartmentGroup – corporate departments (marketing, sales, manufacturing)
    • DimOrganization – corporate divisions and their hierarchies (North America, France, Czech Republic, Europe, etc.)
    • DimScenario – type of value in the fact table (Actual, Budget, Forecast)

Datový model - fakta a dimenze - finance

Fact Tables with numerical data and foreign keys

These tables store metrics or measurable atomic data such as revenue, expenses, or transactional data (point of sale, etc.). Fact tables logically contain the largest number of records and form the foundation of the data warehouse or data lake. They also contain primary keys and foreign keys to dimension tables. All relationships go from the fact table outwards (see Star/Snowflake schema).

Conversely, fact tables must not contain any attributes – text fields, flags, labels, etc. These pieces of information belong in dimension tables. An example of a fact table containing revenue is FactFinance:

Faktová tabulka

Dimension Table in Data Warehouse

Unlike fact tables, dimension tables do not store measurable data. They are used to provide context to facts. For example, if the fact table contains balances from accounting, by connecting the dimension table DimAccount (see the model) and DimScenario, you can obtain revenue by accounts and balance type. By connecting the DimDate dimension, you gain detailed time context (i.e., the time point when the balance is valid) because this table contains time hierarchies (day, week, month, quarter, etc.).

Dimenzní tabulka - DimAccount

Current balances by individual accounts for January 2013:

SELECT
dat.FullDateAlternateKey
acc.AccountDescription
SUM(fin.Amount) AS AccountBalance
FROM
[AdventureworksDW2016CTP3].[dbo].[FactFinance] fin
JOIN [AdventureworksDW2016CTP3].[dbo].[DimAccount] acc
ON fin.AccountKey=acc.AccountKey
JOIN [AdventureworksDW2016CTP3].[dbo].[DimScenario] sce
ON fin.ScenarioKey=sce.ScenarioKey
JOIN [AdventureworksDW2016CTP3].[dbo].[DimDate] dat
ON fin.DateKey=dat.DateKey
WHERE
sce.ScenarioName='Actual'
AND Dat.CalendarYear=2013
AND Dat.EnglishMonthName='January'
GROUP BY
dat.FullDateAlternateKey,
acc.AccountDescription;

Result:

Výsledek příklad na spojení faktů a dimenzí

Rate this post

Ing. Jan Zedníček - Data Engineer & Controlling

My name is Jan Zedníček and I have been working as a freelancer for many companies for more than 10 years. I used to work as a financial controller, analyst and manager at many different companies in field of banking and manufacturing. When I am not at work, I like playing volleyball, chess, doing a workout in the gym.

🔥 If you found this article helpful, please share it or mention me on your website

Leave a Reply

Your email address will not be published. Required fields are marked *