As a business intelligence developer experimenting with ChatGPT, I explore various scenarios that could assist me in my work. These scenarios certainly include assistance in writing code (such as in Python), reverse engineering code, detailed planning of the breakdown of an epic/story/project into JIRA, or perhaps financial analysis of a company? Yes, for these scenarios, I often use ChatGPT. Today, I would like to test how ChatGPT handles the design of a data warehouse. I would approach the application in a similar manner.
ChatGPT Data Warehouse or Database Model
My expectation would not be to receive a finished model, but if I were to use ChatGPT for this scenario, I would rather aim to save time. This means I expect some usable draft of the model, which I can then develop further. For example, in the case of a date dimension, I can have a script created to calculate attributes.
So, the assignment will be to create a simple data model in a star schema with 1 fact table and several dimension tables (facts and dimensions).
- The fact table for us will be a table with orders
- Dimensions for us will be
- Order date
- Product
- Order status
- Customer
We also want the model to be designed so that we use the slowly changing dimension type 2 approach (SCD2) – i.e., attribute historization.
Thus, the prompt looks like this:
It answered something like this:
CREATE TABLE D_Customer (
CustomerKey INT IDENTITY(1,1) PRIMARY KEY,
CustomerID INT,
FirstName NVARCHAR(50),
LastName NVARCHAR(50),
Email NVARCHAR(100),
Phone NVARCHAR(20),
ValidFrom DATETIME,
ValidTo DATETIME,
IsCurrent BIT,
— Additional customer-related columns
);
CREATE TABLE D_Product (
ProductKey INT IDENTITY(1,1) PRIMARY KEY,
ProductID INT,
ProductName NVARCHAR(100),
Category NVARCHAR(50),
Price DECIMAL(10,2),
ValidFrom DATETIME,
ValidTo DATETIME,
IsCurrent BIT,
— Additional product-related columns
);
CREATE TABLE D_OrderStatus (
OrderStatusKey INT IDENTITY(1,1) PRIMARY KEY,
StatusDescription NVARCHAR(100)
— Additional status-related columns
);
OK, that doesn’t look too bad. The tables contain an artificial primary key. They also contain, as is proper and customary, a business key (from the primary system) – which we did not even define in the prompt. The fact table contains surrogate keys. The dimensions in the design contain columns through which we can work with slowly changing dimensions.
Let’s also have the procedure for calculating the date dimension printed:
Based on this test, I think ChatGPT may have significant potential for tasks of this type.