Home |
Search |
Today's Posts |
#1
|
|||
|
|||
Evaluating similarity of text strings
Hi All, I have a data list - roughly 2000 items in all, which are names (people, companies, organisations etc) and an amount they have paid. The data is collected from four separate sources and trimmed already to avoid extra spaces. The upshot is that there are instances where the same name is entered more than one way. For example: Alan B Chadd Bob Charles Chadd Alan B Now a human eye can easily spot that the first and third are (probably) the same person and it is worth investigating further. This is fine when they are close to each other, but not practical when they are two pages apart. Therefore, I would like to write a formula that could give a score that ranks the likely similar entries. I am thinking that something like this would be good: 1) Take each letter of the target name, and count how many times it appears in every other of the 2000 entries ignoring capitals (it is possible someone will have typed in a name with or without any proper capitalisation). 2) Add up the totals for each of the 2000 entries 3) Show any items with a score over X (to be picked by trial) or just sort them by the score. Example From Above: The score for "Alan B Chadd" against "Bob Charles" would be: A = 0 l = 1 a = 0 n = 0 Space = 1 B = 1 Space = 1 C = 1 h = 1 a = 0 d = 0 d = 0 Total = 5 The score for "Alan B Chadd" against "Chadd Alan B" would be: A = 3 l = 1 a = 3 n = 1 Space = 2 B = 1 Space = 2 C = 1 h = 1 a = 3 d = 2 d = 2 Total = 22 Obviously it is not perfect, but it should be good enough to point the human in the right direction! Can anyone suggest a way to do this? Thanks in advance, Alan. |
Thread Tools | Search this Thread |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Forum | |||
space between text strings with concatenate | Excel Discussion (Misc queries) | |||
Compare cells/columns and highlight matching text strings | Excel Worksheet Functions | |||
How to make a cell recognize multiple text strings? | Excel Worksheet Functions | |||
Read Text File into Excel Using VBA | Excel Discussion (Misc queries) | |||
Filter long Text strings | Excel Worksheet Functions |