Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 2,836
Default Find Unique Records; Must Use Fuzzy Logic

My supervisor and I are trying to come up with a way of finding and
eliminating, all duplicates from a large Excel worksheet which came from BCM
in Outlook. I am contemplating a few different ways of doing this. I found
a free utility on www.downloads.com that allows a user to do a €˜complex
filter to eliminate duplicates, but its not really giving me the results I
am looking for. I sorted by Company (Column F) and then by last name (Column
D) and then by first name (Column B). I am now wondering if there is a way
to use some kind of fuzzy logic to do a search for values that are almost
unique and hide the remainder of the rows, or almost duplicate and hide the
remainder of the rows. The issue is that Excel identifies lots of €˜unique
records because it identifies two people with two different office addresses
as two different records, but for our purposes this is one contact.
Similarly, a contacts name could be spelled Freuh, in the personal contacts
part of BCM, and the name could also be spelled in Frueh, in the public
contacts part of BCM. Again, these are two €˜unique records, but again, for
our purposes this is one contact. All company names are listed in Column F,
all last names are in Column D, and all first names are in Column B. I would
like to copy/paste all data on all rows with unique records (F, D, & B) to a
new sheet, or hide all rows with dupes. Any ideas? Would Access be able to
handle this?

Thanks, as always!
Ryan---




--
RyGuy
  #2   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 6,953
Default Find Unique Records; Must Use Fuzzy Logic

While your waiting for a good answer, as an excercise -
Mark the uniques and for the duplicates, list which unique it duplicates:

Frehu
Freuh
Erueh
Fruit
Freh
Rfeuh
Ereuh
Feurh
Fruhe
Furth
Feruh
Ferhu
Fruh
Frueh
Fureh

Now right down the rules you used to do it - define "our purposes". then
maybe someone will have some ideas.

--
Regards,
Tom Ogilvy


"ryguy7272" wrote:

My supervisor and I are trying to come up with a way of finding and
eliminating, all duplicates from a large Excel worksheet which came from BCM
in Outlook. I am contemplating a few different ways of doing this. I found
a free utility on www.downloads.com that allows a user to do a €˜complex
filter to eliminate duplicates, but its not really giving me the results I
am looking for. I sorted by Company (Column F) and then by last name (Column
D) and then by first name (Column B). I am now wondering if there is a way
to use some kind of fuzzy logic to do a search for values that are almost
unique and hide the remainder of the rows, or almost duplicate and hide the
remainder of the rows. The issue is that Excel identifies lots of €˜unique
records because it identifies two people with two different office addresses
as two different records, but for our purposes this is one contact.
Similarly, a contacts name could be spelled Freuh, in the personal contacts
part of BCM, and the name could also be spelled in Frueh, in the public
contacts part of BCM. Again, these are two €˜unique records, but again, for
our purposes this is one contact. All company names are listed in Column F,
all last names are in Column D, and all first names are in Column B. I would
like to copy/paste all data on all rows with unique records (F, D, & B) to a
new sheet, or hide all rows with dupes. Any ideas? Would Access be able to
handle this?

Thanks, as always!
Ryan---




--
RyGuy

  #3   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 373
Default Find Unique Records; Must Use Fuzzy Logic

Harlan Grove sent me this function years ago when I was trying to compare
two strings to see how similar they were. I also included a simple sub to
run it. It returns a number from 0 to 1, depending on how similar it thinks
the strings are. Works good. The three arguments are the strings to be
compared and a boolean that tells it whether to match the whole string or
any part of it. I haven't used it in a while now, so I'd advise just
setting the 3rd argument to True for now. Hope this helps! James

Sub TrySimilar()
MsgBox StrSimilar("Frued", "Freud", True)
End Sub

Function StrSimilar(s1 As String, s2 As String, FindWhole As Boolean) As
Double
'function returns a numerical grade for 2 similar strings, 1.00 being
perfect
'by H Grove
Dim I As Long, j As Long, k As Long, n(2) As Long
Dim c1 As String, c2 As String
Const alphanum As String = "1234567890abcdefghijklmnopqrstuvwxyz "
s1 = LCase(s1)
For I = 1 To Len(s1)
If Not InStr(alphanum, Mid(s1, I, 1)) 0 Then Mid(s1, I, 1) = " "
Next I
s1 = Application.WorksheetFunction.Trim(s1)
s2 = LCase(s2)
For j = 1 To Len(s2)
If Not InStr(alphanum, Mid(s2, j, 1)) 0 Then Mid(s2, j, 1) = " "
Next j
s2 = Application.WorksheetFunction.Trim(s2)
j = 1
n(1) = 0
For I = 1 To Len(s1)
c1 = LCase(Mid(s1, I, 1))
k = 0
Do
c2 = LCase(Mid(s2, j + k, 1))
k = k + 1
Loop Until j + k Len(s2) Or c1 = c2
If c1 = c2 Then
n(1) = n(1) + 1
If j < Len(s2) Then j = j + 1 Else Exit For
End If
Next I
I = 1
n(2) = 0
For j = 1 To Len(s2)
c2 = LCase(Mid(s2, j, 1))
k = 0
Do
c1 = LCase(Mid(s1, I + k, 1))
k = k + 1
Loop Until I + k Len(s1) Or c1 = c2
If c1 = c2 Then
n(2) = n(2) + 1
If I < Len(s1) Then I = I + 1 Else Exit For
End If
Next j
If FindWhole Then
StrSimilar = CDbl(Application.WorksheetFunction.Min(n(1), n(2))) _
/ CDbl(Application.WorksheetFunction.Max(Len(s1), Len(s2)))
Else
StrSimilar = CDbl(n(2)) / Len(s2)
End If
End Function


"Tom Ogilvy" wrote in message
...
While your waiting for a good answer, as an excercise -
Mark the uniques and for the duplicates, list which unique it duplicates:

Frehu
Freuh
Erueh
Fruit
Freh
Rfeuh
Ereuh
Feurh
Fruhe
Furth
Feruh
Ferhu
Fruh
Frueh
Fureh

Now right down the rules you used to do it - define "our purposes". then
maybe someone will have some ideas.

--
Regards,
Tom Ogilvy


"ryguy7272" wrote:

My supervisor and I are trying to come up with a way of finding and
eliminating, all duplicates from a large Excel worksheet which came from
BCM
in Outlook. I am contemplating a few different ways of doing this. I
found
a free utility on www.downloads.com that allows a user to do a 'complex
filter' to eliminate duplicates, but it's not really giving me the
results I
am looking for. I sorted by Company (Column F) and then by last name
(Column
D) and then by first name (Column B). I am now wondering if there is a
way
to use some kind of fuzzy logic to do a search for values that are almost
unique and hide the remainder of the rows, or almost duplicate and hide
the
remainder of the rows. The issue is that Excel identifies lots of 'unique'
records because it identifies two people with two different office
addresses
as two different records, but for our purposes this is one contact.
Similarly, a contact's name could be spelled Freuh, in the personal
contacts
part of BCM, and the name could also be spelled in Frueh, in the public
contacts part of BCM. Again, these are two 'unique' records, but again,
for
our purposes this is one contact. All company names are listed in Column
F,
all last names are in Column D, and all first names are in Column B. I
would
like to copy/paste all data on all rows with unique records (F, D, & B)
to a
new sheet, or hide all rows with dupes. Any ideas? Would Access be able
to
handle this?

Thanks, as always!
Ryan---




--
RyGuy



  #4   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 6,953
Default Find Unique Records; Must Use Fuzzy Logic

MsgBox StrSimilar("Freud", "Frdue", True)
gives a 0.8 as well.

--
Regards,
Tom Ogilvy


"Zone" wrote:

Harlan Grove sent me this function years ago when I was trying to compare
two strings to see how similar they were. I also included a simple sub to
run it. It returns a number from 0 to 1, depending on how similar it thinks
the strings are. Works good. The three arguments are the strings to be
compared and a boolean that tells it whether to match the whole string or
any part of it. I haven't used it in a while now, so I'd advise just
setting the 3rd argument to True for now. Hope this helps! James

Sub TrySimilar()
MsgBox StrSimilar("Frued", "Freud", True)
End Sub

Function StrSimilar(s1 As String, s2 As String, FindWhole As Boolean) As
Double
'function returns a numerical grade for 2 similar strings, 1.00 being
perfect
'by H Grove
Dim I As Long, j As Long, k As Long, n(2) As Long
Dim c1 As String, c2 As String
Const alphanum As String = "1234567890abcdefghijklmnopqrstuvwxyz "
s1 = LCase(s1)
For I = 1 To Len(s1)
If Not InStr(alphanum, Mid(s1, I, 1)) 0 Then Mid(s1, I, 1) = " "
Next I
s1 = Application.WorksheetFunction.Trim(s1)
s2 = LCase(s2)
For j = 1 To Len(s2)
If Not InStr(alphanum, Mid(s2, j, 1)) 0 Then Mid(s2, j, 1) = " "
Next j
s2 = Application.WorksheetFunction.Trim(s2)
j = 1
n(1) = 0
For I = 1 To Len(s1)
c1 = LCase(Mid(s1, I, 1))
k = 0
Do
c2 = LCase(Mid(s2, j + k, 1))
k = k + 1
Loop Until j + k Len(s2) Or c1 = c2
If c1 = c2 Then
n(1) = n(1) + 1
If j < Len(s2) Then j = j + 1 Else Exit For
End If
Next I
I = 1
n(2) = 0
For j = 1 To Len(s2)
c2 = LCase(Mid(s2, j, 1))
k = 0
Do
c1 = LCase(Mid(s1, I + k, 1))
k = k + 1
Loop Until I + k Len(s1) Or c1 = c2
If c1 = c2 Then
n(2) = n(2) + 1
If I < Len(s1) Then I = I + 1 Else Exit For
End If
Next j
If FindWhole Then
StrSimilar = CDbl(Application.WorksheetFunction.Min(n(1), n(2))) _
/ CDbl(Application.WorksheetFunction.Max(Len(s1), Len(s2)))
Else
StrSimilar = CDbl(n(2)) / Len(s2)
End If
End Function


"Tom Ogilvy" wrote in message
...
While your waiting for a good answer, as an excercise -
Mark the uniques and for the duplicates, list which unique it duplicates:

Frehu
Freuh
Erueh
Fruit
Freh
Rfeuh
Ereuh
Feurh
Fruhe
Furth
Feruh
Ferhu
Fruh
Frueh
Fureh

Now right down the rules you used to do it - define "our purposes". then
maybe someone will have some ideas.

--
Regards,
Tom Ogilvy


"ryguy7272" wrote:

My supervisor and I are trying to come up with a way of finding and
eliminating, all duplicates from a large Excel worksheet which came from
BCM
in Outlook. I am contemplating a few different ways of doing this. I
found
a free utility on www.downloads.com that allows a user to do a 'complex
filter' to eliminate duplicates, but it's not really giving me the
results I
am looking for. I sorted by Company (Column F) and then by last name
(Column
D) and then by first name (Column B). I am now wondering if there is a
way
to use some kind of fuzzy logic to do a search for values that are almost
unique and hide the remainder of the rows, or almost duplicate and hide
the
remainder of the rows. The issue is that Excel identifies lots of 'unique'
records because it identifies two people with two different office
addresses
as two different records, but for our purposes this is one contact.
Similarly, a contact's name could be spelled Freuh, in the personal
contacts
part of BCM, and the name could also be spelled in Frueh, in the public
contacts part of BCM. Again, these are two 'unique' records, but again,
for
our purposes this is one contact. All company names are listed in Column
F,
all last names are in Column D, and all first names are in Column B. I
would
like to copy/paste all data on all rows with unique records (F, D, & B)
to a
new sheet, or hide all rows with dupes. Any ideas? Would Access be able
to
handle this?

Thanks, as always!
Ryan---




--
RyGuy




  #5   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 373
Default Find Unique Records; Must Use Fuzzy Logic

Tom, so you test-drove it, huh? Four matching letters out of five would be
80%, so that seems reasonable. James
"Tom Ogilvy" wrote in message
...
MsgBox StrSimilar("Freud", "Frdue", True)
gives a 0.8 as well.

--
Regards,
Tom Ogilvy


"Zone" wrote:

Harlan Grove sent me this function years ago when I was trying to compare
two strings to see how similar they were. I also included a simple sub
to
run it. It returns a number from 0 to 1, depending on how similar it
thinks
the strings are. Works good. The three arguments are the strings to be
compared and a boolean that tells it whether to match the whole string or
any part of it. I haven't used it in a while now, so I'd advise just
setting the 3rd argument to True for now. Hope this helps! James

Sub TrySimilar()
MsgBox StrSimilar("Frued", "Freud", True)
End Sub

Function StrSimilar(s1 As String, s2 As String, FindWhole As Boolean) As
Double
'function returns a numerical grade for 2 similar strings, 1.00 being
perfect
'by H Grove
Dim I As Long, j As Long, k As Long, n(2) As Long
Dim c1 As String, c2 As String
Const alphanum As String = "1234567890abcdefghijklmnopqrstuvwxyz "
s1 = LCase(s1)
For I = 1 To Len(s1)
If Not InStr(alphanum, Mid(s1, I, 1)) 0 Then Mid(s1, I, 1) = " "
Next I
s1 = Application.WorksheetFunction.Trim(s1)
s2 = LCase(s2)
For j = 1 To Len(s2)
If Not InStr(alphanum, Mid(s2, j, 1)) 0 Then Mid(s2, j, 1) = " "
Next j
s2 = Application.WorksheetFunction.Trim(s2)
j = 1
n(1) = 0
For I = 1 To Len(s1)
c1 = LCase(Mid(s1, I, 1))
k = 0
Do
c2 = LCase(Mid(s2, j + k, 1))
k = k + 1
Loop Until j + k Len(s2) Or c1 = c2
If c1 = c2 Then
n(1) = n(1) + 1
If j < Len(s2) Then j = j + 1 Else Exit For
End If
Next I
I = 1
n(2) = 0
For j = 1 To Len(s2)
c2 = LCase(Mid(s2, j, 1))
k = 0
Do
c1 = LCase(Mid(s1, I + k, 1))
k = k + 1
Loop Until I + k Len(s1) Or c1 = c2
If c1 = c2 Then
n(2) = n(2) + 1
If I < Len(s1) Then I = I + 1 Else Exit For
End If
Next j
If FindWhole Then
StrSimilar = CDbl(Application.WorksheetFunction.Min(n(1), n(2))) _
/ CDbl(Application.WorksheetFunction.Max(Len(s1), Len(s2)))
Else
StrSimilar = CDbl(n(2)) / Len(s2)
End If
End Function


"Tom Ogilvy" wrote in message
...
While your waiting for a good answer, as an excercise -
Mark the uniques and for the duplicates, list which unique it
duplicates:

Frehu
Freuh
Erueh
Fruit
Freh
Rfeuh
Ereuh
Feurh
Fruhe
Furth
Feruh
Ferhu
Fruh
Frueh
Fureh

Now right down the rules you used to do it - define "our purposes".
then
maybe someone will have some ideas.

--
Regards,
Tom Ogilvy


"ryguy7272" wrote:

My supervisor and I are trying to come up with a way of finding and
eliminating, all duplicates from a large Excel worksheet which came
from
BCM
in Outlook. I am contemplating a few different ways of doing this. I
found
a free utility on www.downloads.com that allows a user to do a
'complex
filter' to eliminate duplicates, but it's not really giving me the
results I
am looking for. I sorted by Company (Column F) and then by last name
(Column
D) and then by first name (Column B). I am now wondering if there is
a
way
to use some kind of fuzzy logic to do a search for values that are
almost
unique and hide the remainder of the rows, or almost duplicate and
hide
the
remainder of the rows. The issue is that Excel identifies lots of
'unique'
records because it identifies two people with two different office
addresses
as two different records, but for our purposes this is one contact.
Similarly, a contact's name could be spelled Freuh, in the personal
contacts
part of BCM, and the name could also be spelled in Frueh, in the
public
contacts part of BCM. Again, these are two 'unique' records, but
again,
for
our purposes this is one contact. All company names are listed in
Column
F,
all last names are in Column D, and all first names are in Column B.
I
would
like to copy/paste all data on all rows with unique records (F, D, &
B)
to a
new sheet, or hide all rows with dupes. Any ideas? Would Access be
able
to
handle this?

Thanks, as always!
Ryan---




--
RyGuy








  #6   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 2,836
Default Find Unique Records; Must Use Fuzzy Logic

Thanks Tom, and thanks Zone, sweet function. However, I think I need to do
this thing (somewhat the old fashioned way). I have had ASAP utilities for
quite some time, but haven't used it much. I am using it now, and I am
seeing that it is quite helpful for this current project, but it will take
several iterations of applying filters, and doing sorting, and some other
things. Anyway, here is a link for others who want to obtain this valuable
utility:
http://www.asap-utilities.com/downlo...-utilities.php


Regards,
Ryan--



--
RyGuy


"Zone" wrote:

Tom, so you test-drove it, huh? Four matching letters out of five would be
80%, so that seems reasonable. James
"Tom Ogilvy" wrote in message
...
MsgBox StrSimilar("Freud", "Frdue", True)
gives a 0.8 as well.

--
Regards,
Tom Ogilvy


"Zone" wrote:

Harlan Grove sent me this function years ago when I was trying to compare
two strings to see how similar they were. I also included a simple sub
to
run it. It returns a number from 0 to 1, depending on how similar it
thinks
the strings are. Works good. The three arguments are the strings to be
compared and a boolean that tells it whether to match the whole string or
any part of it. I haven't used it in a while now, so I'd advise just
setting the 3rd argument to True for now. Hope this helps! James

Sub TrySimilar()
MsgBox StrSimilar("Frued", "Freud", True)
End Sub

Function StrSimilar(s1 As String, s2 As String, FindWhole As Boolean) As
Double
'function returns a numerical grade for 2 similar strings, 1.00 being
perfect
'by H Grove
Dim I As Long, j As Long, k As Long, n(2) As Long
Dim c1 As String, c2 As String
Const alphanum As String = "1234567890abcdefghijklmnopqrstuvwxyz "
s1 = LCase(s1)
For I = 1 To Len(s1)
If Not InStr(alphanum, Mid(s1, I, 1)) 0 Then Mid(s1, I, 1) = " "
Next I
s1 = Application.WorksheetFunction.Trim(s1)
s2 = LCase(s2)
For j = 1 To Len(s2)
If Not InStr(alphanum, Mid(s2, j, 1)) 0 Then Mid(s2, j, 1) = " "
Next j
s2 = Application.WorksheetFunction.Trim(s2)
j = 1
n(1) = 0
For I = 1 To Len(s1)
c1 = LCase(Mid(s1, I, 1))
k = 0
Do
c2 = LCase(Mid(s2, j + k, 1))
k = k + 1
Loop Until j + k Len(s2) Or c1 = c2
If c1 = c2 Then
n(1) = n(1) + 1
If j < Len(s2) Then j = j + 1 Else Exit For
End If
Next I
I = 1
n(2) = 0
For j = 1 To Len(s2)
c2 = LCase(Mid(s2, j, 1))
k = 0
Do
c1 = LCase(Mid(s1, I + k, 1))
k = k + 1
Loop Until I + k Len(s1) Or c1 = c2
If c1 = c2 Then
n(2) = n(2) + 1
If I < Len(s1) Then I = I + 1 Else Exit For
End If
Next j
If FindWhole Then
StrSimilar = CDbl(Application.WorksheetFunction.Min(n(1), n(2))) _
/ CDbl(Application.WorksheetFunction.Max(Len(s1), Len(s2)))
Else
StrSimilar = CDbl(n(2)) / Len(s2)
End If
End Function


"Tom Ogilvy" wrote in message
...
While your waiting for a good answer, as an excercise -
Mark the uniques and for the duplicates, list which unique it
duplicates:

Frehu
Freuh
Erueh
Fruit
Freh
Rfeuh
Ereuh
Feurh
Fruhe
Furth
Feruh
Ferhu
Fruh
Frueh
Fureh

Now right down the rules you used to do it - define "our purposes".
then
maybe someone will have some ideas.

--
Regards,
Tom Ogilvy


"ryguy7272" wrote:

My supervisor and I are trying to come up with a way of finding and
eliminating, all duplicates from a large Excel worksheet which came
from
BCM
in Outlook. I am contemplating a few different ways of doing this. I
found
a free utility on www.downloads.com that allows a user to do a
'complex
filter' to eliminate duplicates, but it's not really giving me the
results I
am looking for. I sorted by Company (Column F) and then by last name
(Column
D) and then by first name (Column B). I am now wondering if there is
a
way
to use some kind of fuzzy logic to do a search for values that are
almost
unique and hide the remainder of the rows, or almost duplicate and
hide
the
remainder of the rows. The issue is that Excel identifies lots of
'unique'
records because it identifies two people with two different office
addresses
as two different records, but for our purposes this is one contact.
Similarly, a contact's name could be spelled Freuh, in the personal
contacts
part of BCM, and the name could also be spelled in Frueh, in the
public
contacts part of BCM. Again, these are two 'unique' records, but
again,
for
our purposes this is one contact. All company names are listed in
Column
F,
all last names are in Column D, and all first names are in Column B.
I
would
like to copy/paste all data on all rows with unique records (F, D, &
B)
to a
new sheet, or hide all rows with dupes. Any ideas? Would Access be
able
to
handle this?

Thanks, as always!
Ryan---




--
RyGuy






  #7   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 373
Default Find Unique Records; Must Use Fuzzy Logic

You're welcome. Good luck with that. James
"ryguy7272" wrote in message
...
Thanks Tom, and thanks Zone, sweet function. However, I think I need to
do
this thing (somewhat the old fashioned way). I have had ASAP utilities
for
quite some time, but haven't used it much. I am using it now, and I am
seeing that it is quite helpful for this current project, but it will take
several iterations of applying filters, and doing sorting, and some other
things. Anyway, here is a link for others who want to obtain this
valuable
utility:
http://www.asap-utilities.com/downlo...-utilities.php


Regards,
Ryan--



--
RyGuy


"Zone" wrote:

Tom, so you test-drove it, huh? Four matching letters out of five would
be
80%, so that seems reasonable. James
"Tom Ogilvy" wrote in message
...
MsgBox StrSimilar("Freud", "Frdue", True)
gives a 0.8 as well.

--
Regards,
Tom Ogilvy


"Zone" wrote:

Harlan Grove sent me this function years ago when I was trying to
compare
two strings to see how similar they were. I also included a simple
sub
to
run it. It returns a number from 0 to 1, depending on how similar it
thinks
the strings are. Works good. The three arguments are the strings to
be
compared and a boolean that tells it whether to match the whole string
or
any part of it. I haven't used it in a while now, so I'd advise just
setting the 3rd argument to True for now. Hope this helps! James

Sub TrySimilar()
MsgBox StrSimilar("Frued", "Freud", True)
End Sub

Function StrSimilar(s1 As String, s2 As String, FindWhole As Boolean)
As
Double
'function returns a numerical grade for 2 similar strings, 1.00
being
perfect
'by H Grove
Dim I As Long, j As Long, k As Long, n(2) As Long
Dim c1 As String, c2 As String
Const alphanum As String = "1234567890abcdefghijklmnopqrstuvwxyz "
s1 = LCase(s1)
For I = 1 To Len(s1)
If Not InStr(alphanum, Mid(s1, I, 1)) 0 Then Mid(s1, I, 1) = "
"
Next I
s1 = Application.WorksheetFunction.Trim(s1)
s2 = LCase(s2)
For j = 1 To Len(s2)
If Not InStr(alphanum, Mid(s2, j, 1)) 0 Then Mid(s2, j, 1) = "
"
Next j
s2 = Application.WorksheetFunction.Trim(s2)
j = 1
n(1) = 0
For I = 1 To Len(s1)
c1 = LCase(Mid(s1, I, 1))
k = 0
Do
c2 = LCase(Mid(s2, j + k, 1))
k = k + 1
Loop Until j + k Len(s2) Or c1 = c2
If c1 = c2 Then
n(1) = n(1) + 1
If j < Len(s2) Then j = j + 1 Else Exit For
End If
Next I
I = 1
n(2) = 0
For j = 1 To Len(s2)
c2 = LCase(Mid(s2, j, 1))
k = 0
Do
c1 = LCase(Mid(s1, I + k, 1))
k = k + 1
Loop Until I + k Len(s1) Or c1 = c2
If c1 = c2 Then
n(2) = n(2) + 1
If I < Len(s1) Then I = I + 1 Else Exit For
End If
Next j
If FindWhole Then
StrSimilar = CDbl(Application.WorksheetFunction.Min(n(1), n(2)))
_
/ CDbl(Application.WorksheetFunction.Max(Len(s1), Len(s2)))
Else
StrSimilar = CDbl(n(2)) / Len(s2)
End If
End Function


"Tom Ogilvy" wrote in message
...
While your waiting for a good answer, as an excercise -
Mark the uniques and for the duplicates, list which unique it
duplicates:

Frehu
Freuh
Erueh
Fruit
Freh
Rfeuh
Ereuh
Feurh
Fruhe
Furth
Feruh
Ferhu
Fruh
Frueh
Fureh

Now right down the rules you used to do it - define "our purposes".
then
maybe someone will have some ideas.

--
Regards,
Tom Ogilvy


"ryguy7272" wrote:

My supervisor and I are trying to come up with a way of finding and
eliminating, all duplicates from a large Excel worksheet which came
from
BCM
in Outlook. I am contemplating a few different ways of doing this.
I
found
a free utility on www.downloads.com that allows a user to do a
'complex
filter' to eliminate duplicates, but it's not really giving me the
results I
am looking for. I sorted by Company (Column F) and then by last
name
(Column
D) and then by first name (Column B). I am now wondering if there
is
a
way
to use some kind of fuzzy logic to do a search for values that are
almost
unique and hide the remainder of the rows, or almost duplicate and
hide
the
remainder of the rows. The issue is that Excel identifies lots of
'unique'
records because it identifies two people with two different office
addresses
as two different records, but for our purposes this is one contact.
Similarly, a contact's name could be spelled Freuh, in the personal
contacts
part of BCM, and the name could also be spelled in Frueh, in the
public
contacts part of BCM. Again, these are two 'unique' records, but
again,
for
our purposes this is one contact. All company names are listed in
Column
F,
all last names are in Column D, and all first names are in Column
B.
I
would
like to copy/paste all data on all rows with unique records (F, D,
&
B)
to a
new sheet, or hide all rows with dupes. Any ideas? Would Access
be
able
to
handle this?

Thanks, as always!
Ryan---




--
RyGuy








  #8   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 2,836
Default Find Unique Records; Must Use Fuzzy Logic

One more thing...for anyone who stumbles upon this post sometime in the future.
I used this utility (below), along with the other one (above), and got
pretty darn close to where I wanted to be.

http://members.iinet.net.au/~brettdj...l_Column_Match


Regards,
Ryan--

--
RyGuy


"Zone" wrote:

You're welcome. Good luck with that. James
"ryguy7272" wrote in message
...
Thanks Tom, and thanks Zone, sweet function. However, I think I need to
do
this thing (somewhat the old fashioned way). I have had ASAP utilities
for
quite some time, but haven't used it much. I am using it now, and I am
seeing that it is quite helpful for this current project, but it will take
several iterations of applying filters, and doing sorting, and some other
things. Anyway, here is a link for others who want to obtain this
valuable
utility:
http://www.asap-utilities.com/downlo...-utilities.php


Regards,
Ryan--



--
RyGuy


"Zone" wrote:

Tom, so you test-drove it, huh? Four matching letters out of five would
be
80%, so that seems reasonable. James
"Tom Ogilvy" wrote in message
...
MsgBox StrSimilar("Freud", "Frdue", True)
gives a 0.8 as well.

--
Regards,
Tom Ogilvy


"Zone" wrote:

Harlan Grove sent me this function years ago when I was trying to
compare
two strings to see how similar they were. I also included a simple
sub
to
run it. It returns a number from 0 to 1, depending on how similar it
thinks
the strings are. Works good. The three arguments are the strings to
be
compared and a boolean that tells it whether to match the whole string
or
any part of it. I haven't used it in a while now, so I'd advise just
setting the 3rd argument to True for now. Hope this helps! James

Sub TrySimilar()
MsgBox StrSimilar("Frued", "Freud", True)
End Sub

Function StrSimilar(s1 As String, s2 As String, FindWhole As Boolean)
As
Double
'function returns a numerical grade for 2 similar strings, 1.00
being
perfect
'by H Grove
Dim I As Long, j As Long, k As Long, n(2) As Long
Dim c1 As String, c2 As String
Const alphanum As String = "1234567890abcdefghijklmnopqrstuvwxyz "
s1 = LCase(s1)
For I = 1 To Len(s1)
If Not InStr(alphanum, Mid(s1, I, 1)) 0 Then Mid(s1, I, 1) = "
"
Next I
s1 = Application.WorksheetFunction.Trim(s1)
s2 = LCase(s2)
For j = 1 To Len(s2)
If Not InStr(alphanum, Mid(s2, j, 1)) 0 Then Mid(s2, j, 1) = "
"
Next j
s2 = Application.WorksheetFunction.Trim(s2)
j = 1
n(1) = 0
For I = 1 To Len(s1)
c1 = LCase(Mid(s1, I, 1))
k = 0
Do
c2 = LCase(Mid(s2, j + k, 1))
k = k + 1
Loop Until j + k Len(s2) Or c1 = c2
If c1 = c2 Then
n(1) = n(1) + 1
If j < Len(s2) Then j = j + 1 Else Exit For
End If
Next I
I = 1
n(2) = 0
For j = 1 To Len(s2)
c2 = LCase(Mid(s2, j, 1))
k = 0
Do
c1 = LCase(Mid(s1, I + k, 1))
k = k + 1
Loop Until I + k Len(s1) Or c1 = c2
If c1 = c2 Then
n(2) = n(2) + 1
If I < Len(s1) Then I = I + 1 Else Exit For
End If
Next j
If FindWhole Then
StrSimilar = CDbl(Application.WorksheetFunction.Min(n(1), n(2)))
_
/ CDbl(Application.WorksheetFunction.Max(Len(s1), Len(s2)))
Else
StrSimilar = CDbl(n(2)) / Len(s2)
End If
End Function


"Tom Ogilvy" wrote in message
...
While your waiting for a good answer, as an excercise -
Mark the uniques and for the duplicates, list which unique it
duplicates:

Frehu
Freuh
Erueh
Fruit
Freh
Rfeuh
Ereuh
Feurh
Fruhe
Furth
Feruh
Ferhu
Fruh
Frueh
Fureh

Now right down the rules you used to do it - define "our purposes".
then
maybe someone will have some ideas.

--
Regards,
Tom Ogilvy


"ryguy7272" wrote:

My supervisor and I are trying to come up with a way of finding and
eliminating, all duplicates from a large Excel worksheet which came
from
BCM
in Outlook. I am contemplating a few different ways of doing this.
I
found
a free utility on www.downloads.com that allows a user to do a
'complex
filter' to eliminate duplicates, but it's not really giving me the
results I
am looking for. I sorted by Company (Column F) and then by last
name
(Column
D) and then by first name (Column B). I am now wondering if there
is
a
way
to use some kind of fuzzy logic to do a search for values that are
almost
unique and hide the remainder of the rows, or almost duplicate and
hide
the
remainder of the rows. The issue is that Excel identifies lots of
'unique'
records because it identifies two people with two different office
addresses
as two different records, but for our purposes this is one contact.
Similarly, a contact's name could be spelled Freuh, in the personal
contacts
part of BCM, and the name could also be spelled in Frueh, in the
public
contacts part of BCM. Again, these are two 'unique' records, but
again,
for
our purposes this is one contact. All company names are listed in
Column
F,
all last names are in Column D, and all first names are in Column
B.
I
would
like to copy/paste all data on all rows with unique records (F, D,
&
B)
to a
new sheet, or hide all rows with dupes. Any ideas? Would Access
be
able
to
handle this?

Thanks, as always!
Ryan---




--
RyGuy









Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
find unique records frankfurtjoe Excel Discussion (Misc queries) 4 May 19th 09 04:29 PM
find unique records frankfurtjoe Excel Discussion (Misc queries) 1 May 19th 09 01:30 PM
Routine with fuzzy logic to determine the relative comparison of two strings? Elmer Smurdley Excel Worksheet Functions 7 October 13th 07 04:01 PM
Find Unique Records from two sheets M.Siler Excel Programming 1 July 25th 05 10:07 PM
Using fuzzy logic in excel James Excel Discussion (Misc queries) 7 February 16th 05 07:08 PM


All times are GMT +1. The time now is 08:58 AM.

Powered by vBulletin® Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright ©2004-2024 ExcelBanter.
The comments are property of their posters.
 

About Us

"It's about Microsoft Excel"