In some situations, we need to get an overview of the data present in a table. Or perhaps we want to check if there are any duplicate values in the table that we do not want. How can we retrieve a list of such duplicate records?

How to Identify Duplicate Rows in an SQL Table?

I will demonstrate the task with a simple example. Let’s assume we have a table of customers named DimCustomer (see the image below). Our customer database has around 20,000 records, and we want to verify if there are customers in the database who appear more than once because we want to record each customer only once.

sql duplikace hodnot - příklad

First, we need to decide which values we will use to detect duplicates. We cannot do it, for example, based on the ‘Name’ column because many people can have the same name.

Testing for duplicates based on email addresses is also not suitable because the same person can be in the database twice with different emails, and we would not detect the duplication. The most reliable way would be to test for duplicates using a unique identifier for a person, such as a social security number.

However, in our table, we do not have that. So, we need to come up with another approach. Let’s say, for illustrative purposes in our case, it should be reasonably reliable to say that unique combinations of FirstName, LastName, Gender, and BirthDate should exist.

SELECT
[FirstName]
,[LastName]
,[BirthDate]
,[Gender]
,COUNT(*)
FROM [AdventureworksDW2016CTP3].[dbo].[DimCustomer]
GROUP BY
[FirstName]
,[LastName]
,[BirthDate]
,[Gender]
HAVING COUNT(*) >1

The script is quite simple. We retrieve all records where the count of identical combinations of FirstName, LastName, BirthDate, and Gender is greater than 1. The result shows 5 records, and in all cases, these individuals appear twice in the DimCustomer table (duplicate count = 2).

pocet duplikaci v sql tabulce

Rate this post

Ing. Jan Zedníček - Data Engineer & Controlling

My name is Jan Zedníček and I have been working as a freelancer for many companies for more than 10 years. I used to work as a financial controller, analyst and manager at many different companies in field of banking and manufacturing. When I am not at work, I like playing volleyball, chess, doing a workout in the gym.

🔥 If you found this article helpful, please share it or mention me on your website

Leave a Reply

Your email address will not be published. Required fields are marked *