View Single Post
  #2   Report Post  
Jason Morin
 
Posts: n/a
Default

Not perfect either, but...with names in column A, place
this formula in row 1 of an open column and fill down:

=SUMPRODUCT(CODE(MID(A1,ROW(INDIRECT("1:"&LEN(A1)) ),1)))

Then sort the data based on this column. Similar strings
should be closely grouped.

HTH
Jason
Atlanta, GA

-----Original Message-----

Hi All,

I have a data list - roughly 2000 items in all, which

are names
(people, companies, organisations etc) and an amount

they have paid.

The data is collected from four separate sources and

trimmed already
to
avoid extra spaces. The upshot is that there are

instances where the
same name is entered more than one way. For example:

Alan B Chadd
Bob Charles
Chadd Alan B

Now a human eye can easily spot that the first and third

are
(probably) the same person and it is worth investigating

further.
This is fine when they are close to each other, but not

practical
when they are two pages apart.

Therefore, I would like to write a formula that could

give a score
that
ranks the likely similar entries.

I am thinking that something like this would be good:

1) Take each letter of the target name, and count how

many times it
appears in every other of the 2000 entries ignoring

capitals (it is
possible
someone will have typed in a name with or without any

proper
capitalisation).

2) Add up the totals for each of the 2000 entries

3) Show any items with a score over X (to be picked by

trial) or just
sort them by the score.


Example From Above:

The score for "Alan B Chadd" against "Bob Charles" would

be:

A = 0
l = 1
a = 0
n = 0
Space = 1
B = 1
Space = 1
C = 1
h = 1
a = 0
d = 0
d = 0

Total = 5

The score for "Alan B Chadd" against "Chadd Alan B"

would be:

A = 3
l = 1
a = 3
n = 1
Space = 2
B = 1
Space = 2
C = 1
h = 1
a = 3
d = 2
d = 2

Total = 22

Obviously it is not perfect, but it should be good

enough to point the
human in the right direction!


Can anyone suggest a way to do this?

Thanks in advance,

Alan.




.