Did you ever need to compare 2 lists and test it for duplicities? The easiest way to do it in Excel is to use standard functions as VLOOKUP. Sometimes though, we might need to compare 2 lists where one of them contains typos. Here, Fuzzy lookup comes into play (can be downloaded here).
Typical example is manually maintained database (of people, companies etc.) in Excel. We want to add more records into this evidence from time to time. But before the addition, we want to verify if the added data are not already present in the list. This counts also for cases when we want to add records where the risk of typos exists.
Fuzzy Lookup Example in Excel
Let’s have a list as displayed below (imagine thousands of items). In it, evidence of companies and their web pages. We want to maintain the list without duplicities and broaden it with new data. We therefore need to make sure that the company we want to insert into list is not already present.
The list looks like this:
We want to add more companies into the list. I prepared a list of candidates for insertion into the evidence by copying few companies from the main list. In some of the, I made a typo on purpose (in red). Let’s check it by fuzzy lookup and see if it reveals that the companies are already in the list.
If we would search for duplicities on the basis of VLOOKUP, it would find companies as Chevron, AFLAC, Charter Communications and Coca- cola as new. That would obviously be incorrect. Fuzzy lookup is here to help us detect typos.
Steplist – Fuzzy Lookup Example
1) Install Excel addin from Microsoft web. New bookmark named Fuzzy lookup will appear after the installation.
2) Mark both lists as named area or format the list as a table. If not, it will be impossible to link to the area in fuzzy lookup. Then, add both lists to fuzzy lookup area. Left table will be the table containing data for addition and right table will be containing the list. All columns that will appear in Left/right columns section must be marked, then click on the icon. By doing this, you will set what to compare. Finally, click on GO.
3) Result is a table describing degree of similarity between the 2 lists. It says it through the Similarity number with value between 0 – 1. Closer the number is to 1, the higher is the similarity of the two items from both of the lists (1 meaning absolute match).
Evaluation: All items are duplicates since they have high degree of similarity. No record would be inserted into our list:
- Similarity = 1 means absolute match
- Similarity >= 0,8 means that company and web page already exist in the list. It seems that we are trying to insert company again, but with a typo (which is correct)
- Similarity <0,8 here the degree of similarity is lower, but still high. In our case, it is Charter Communication that haves a mistake in the name of the company and web page address. Second company is Koka-Loka, which haves identical web page and different name. Fuzzy lookup detected high probability of duplicity based on web page match.