ExcelBanter - View Single Post

Not perfect either, but...with names in column A, place
this formula in row 1 of an open column and fill down:

=SUMPRODUCT(CODE(MID(A1,ROW(INDIRECT("1:"&LEN(A1)) ),1)))

Then sort the data based on this column. Similar strings
should be closely grouped.

HTH
Jason
Atlanta, GA

-----Original Message-----

Hi All,

I have a data list - roughly 2000 items in all, which
are names
(people, companies, organisations etc) and an amount
they have paid.

The data is collected from four separate sources and
trimmed already
to
avoid extra spaces. The upshot is that there are
instances where the
same name is entered more than one way. For example:

Alan B Chadd
Bob Charles
Chadd Alan B

Now a human eye can easily spot that the first and third
are
(probably) the same person and it is worth investigating
further.
This is fine when they are close to each other, but not
practical
when they are two pages apart.

Therefore, I would like to write a formula that could
give a score
that
ranks the likely similar entries.

I am thinking that something like this would be good:

1) Take each letter of the target name, and count how
many times it
appears in every other of the 2000 entries ignoring
capitals (it is
possible
someone will have typed in a name with or without any
proper
capitalisation).

2) Add up the totals for each of the 2000 entries

3) Show any items with a score over X (to be picked by
trial) or just
sort them by the score.

Example From Above:

The score for "Alan B Chadd" against "Bob Charles" would
be:

A = 0
l = 1
a = 0
n = 0
Space = 1
B = 1
Space = 1
C = 1
h = 1
a = 0
d = 0
d = 0

Total = 5

The score for "Alan B Chadd" against "Chadd Alan B"
would be:

A = 3
l = 1
a = 3
n = 1
Space = 2
B = 1
Space = 2
C = 1
h = 1
a = 3
d = 2
d = 2

Total = 22

Obviously it is not perfect, but it should be good
enough to point the
human in the right direction!

Can anyone suggest a way to do this?

Thanks in advance,

Alan.

.