Home |
Search |
Today's Posts |
#1
Posted to microsoft.public.excel.programming
|
|||
|
|||
Find Unique Records; Must Use Fuzzy Logic
My supervisor and I are trying to come up with a way of finding and
eliminating, all duplicates from a large Excel worksheet which came from BCM in Outlook. I am contemplating a few different ways of doing this. I found a free utility on www.downloads.com that allows a user to do a €˜complex filter to eliminate duplicates, but its not really giving me the results I am looking for. I sorted by Company (Column F) and then by last name (Column D) and then by first name (Column B). I am now wondering if there is a way to use some kind of fuzzy logic to do a search for values that are almost unique and hide the remainder of the rows, or almost duplicate and hide the remainder of the rows. The issue is that Excel identifies lots of €˜unique records because it identifies two people with two different office addresses as two different records, but for our purposes this is one contact. Similarly, a contacts name could be spelled Freuh, in the personal contacts part of BCM, and the name could also be spelled in Frueh, in the public contacts part of BCM. Again, these are two €˜unique records, but again, for our purposes this is one contact. All company names are listed in Column F, all last names are in Column D, and all first names are in Column B. I would like to copy/paste all data on all rows with unique records (F, D, & B) to a new sheet, or hide all rows with dupes. Any ideas? Would Access be able to handle this? Thanks, as always! Ryan--- -- RyGuy |
#2
Posted to microsoft.public.excel.programming
|
|||
|
|||
Find Unique Records; Must Use Fuzzy Logic
While your waiting for a good answer, as an excercise -
Mark the uniques and for the duplicates, list which unique it duplicates: Frehu Freuh Erueh Fruit Freh Rfeuh Ereuh Feurh Fruhe Furth Feruh Ferhu Fruh Frueh Fureh Now right down the rules you used to do it - define "our purposes". then maybe someone will have some ideas. -- Regards, Tom Ogilvy "ryguy7272" wrote: My supervisor and I are trying to come up with a way of finding and eliminating, all duplicates from a large Excel worksheet which came from BCM in Outlook. I am contemplating a few different ways of doing this. I found a free utility on www.downloads.com that allows a user to do a €˜complex filter to eliminate duplicates, but its not really giving me the results I am looking for. I sorted by Company (Column F) and then by last name (Column D) and then by first name (Column B). I am now wondering if there is a way to use some kind of fuzzy logic to do a search for values that are almost unique and hide the remainder of the rows, or almost duplicate and hide the remainder of the rows. The issue is that Excel identifies lots of €˜unique records because it identifies two people with two different office addresses as two different records, but for our purposes this is one contact. Similarly, a contacts name could be spelled Freuh, in the personal contacts part of BCM, and the name could also be spelled in Frueh, in the public contacts part of BCM. Again, these are two €˜unique records, but again, for our purposes this is one contact. All company names are listed in Column F, all last names are in Column D, and all first names are in Column B. I would like to copy/paste all data on all rows with unique records (F, D, & B) to a new sheet, or hide all rows with dupes. Any ideas? Would Access be able to handle this? Thanks, as always! Ryan--- -- RyGuy |
#3
Posted to microsoft.public.excel.programming
|
|||
|
|||
Find Unique Records; Must Use Fuzzy Logic
Harlan Grove sent me this function years ago when I was trying to compare
two strings to see how similar they were. I also included a simple sub to run it. It returns a number from 0 to 1, depending on how similar it thinks the strings are. Works good. The three arguments are the strings to be compared and a boolean that tells it whether to match the whole string or any part of it. I haven't used it in a while now, so I'd advise just setting the 3rd argument to True for now. Hope this helps! James Sub TrySimilar() MsgBox StrSimilar("Frued", "Freud", True) End Sub Function StrSimilar(s1 As String, s2 As String, FindWhole As Boolean) As Double 'function returns a numerical grade for 2 similar strings, 1.00 being perfect 'by H Grove Dim I As Long, j As Long, k As Long, n(2) As Long Dim c1 As String, c2 As String Const alphanum As String = "1234567890abcdefghijklmnopqrstuvwxyz " s1 = LCase(s1) For I = 1 To Len(s1) If Not InStr(alphanum, Mid(s1, I, 1)) 0 Then Mid(s1, I, 1) = " " Next I s1 = Application.WorksheetFunction.Trim(s1) s2 = LCase(s2) For j = 1 To Len(s2) If Not InStr(alphanum, Mid(s2, j, 1)) 0 Then Mid(s2, j, 1) = " " Next j s2 = Application.WorksheetFunction.Trim(s2) j = 1 n(1) = 0 For I = 1 To Len(s1) c1 = LCase(Mid(s1, I, 1)) k = 0 Do c2 = LCase(Mid(s2, j + k, 1)) k = k + 1 Loop Until j + k Len(s2) Or c1 = c2 If c1 = c2 Then n(1) = n(1) + 1 If j < Len(s2) Then j = j + 1 Else Exit For End If Next I I = 1 n(2) = 0 For j = 1 To Len(s2) c2 = LCase(Mid(s2, j, 1)) k = 0 Do c1 = LCase(Mid(s1, I + k, 1)) k = k + 1 Loop Until I + k Len(s1) Or c1 = c2 If c1 = c2 Then n(2) = n(2) + 1 If I < Len(s1) Then I = I + 1 Else Exit For End If Next j If FindWhole Then StrSimilar = CDbl(Application.WorksheetFunction.Min(n(1), n(2))) _ / CDbl(Application.WorksheetFunction.Max(Len(s1), Len(s2))) Else StrSimilar = CDbl(n(2)) / Len(s2) End If End Function "Tom Ogilvy" wrote in message ... While your waiting for a good answer, as an excercise - Mark the uniques and for the duplicates, list which unique it duplicates: Frehu Freuh Erueh Fruit Freh Rfeuh Ereuh Feurh Fruhe Furth Feruh Ferhu Fruh Frueh Fureh Now right down the rules you used to do it - define "our purposes". then maybe someone will have some ideas. -- Regards, Tom Ogilvy "ryguy7272" wrote: My supervisor and I are trying to come up with a way of finding and eliminating, all duplicates from a large Excel worksheet which came from BCM in Outlook. I am contemplating a few different ways of doing this. I found a free utility on www.downloads.com that allows a user to do a 'complex filter' to eliminate duplicates, but it's not really giving me the results I am looking for. I sorted by Company (Column F) and then by last name (Column D) and then by first name (Column B). I am now wondering if there is a way to use some kind of fuzzy logic to do a search for values that are almost unique and hide the remainder of the rows, or almost duplicate and hide the remainder of the rows. The issue is that Excel identifies lots of 'unique' records because it identifies two people with two different office addresses as two different records, but for our purposes this is one contact. Similarly, a contact's name could be spelled Freuh, in the personal contacts part of BCM, and the name could also be spelled in Frueh, in the public contacts part of BCM. Again, these are two 'unique' records, but again, for our purposes this is one contact. All company names are listed in Column F, all last names are in Column D, and all first names are in Column B. I would like to copy/paste all data on all rows with unique records (F, D, & B) to a new sheet, or hide all rows with dupes. Any ideas? Would Access be able to handle this? Thanks, as always! Ryan--- -- RyGuy |
#4
Posted to microsoft.public.excel.programming
|
|||
|
|||
Find Unique Records; Must Use Fuzzy Logic
MsgBox StrSimilar("Freud", "Frdue", True)
gives a 0.8 as well. -- Regards, Tom Ogilvy "Zone" wrote: Harlan Grove sent me this function years ago when I was trying to compare two strings to see how similar they were. I also included a simple sub to run it. It returns a number from 0 to 1, depending on how similar it thinks the strings are. Works good. The three arguments are the strings to be compared and a boolean that tells it whether to match the whole string or any part of it. I haven't used it in a while now, so I'd advise just setting the 3rd argument to True for now. Hope this helps! James Sub TrySimilar() MsgBox StrSimilar("Frued", "Freud", True) End Sub Function StrSimilar(s1 As String, s2 As String, FindWhole As Boolean) As Double 'function returns a numerical grade for 2 similar strings, 1.00 being perfect 'by H Grove Dim I As Long, j As Long, k As Long, n(2) As Long Dim c1 As String, c2 As String Const alphanum As String = "1234567890abcdefghijklmnopqrstuvwxyz " s1 = LCase(s1) For I = 1 To Len(s1) If Not InStr(alphanum, Mid(s1, I, 1)) 0 Then Mid(s1, I, 1) = " " Next I s1 = Application.WorksheetFunction.Trim(s1) s2 = LCase(s2) For j = 1 To Len(s2) If Not InStr(alphanum, Mid(s2, j, 1)) 0 Then Mid(s2, j, 1) = " " Next j s2 = Application.WorksheetFunction.Trim(s2) j = 1 n(1) = 0 For I = 1 To Len(s1) c1 = LCase(Mid(s1, I, 1)) k = 0 Do c2 = LCase(Mid(s2, j + k, 1)) k = k + 1 Loop Until j + k Len(s2) Or c1 = c2 If c1 = c2 Then n(1) = n(1) + 1 If j < Len(s2) Then j = j + 1 Else Exit For End If Next I I = 1 n(2) = 0 For j = 1 To Len(s2) c2 = LCase(Mid(s2, j, 1)) k = 0 Do c1 = LCase(Mid(s1, I + k, 1)) k = k + 1 Loop Until I + k Len(s1) Or c1 = c2 If c1 = c2 Then n(2) = n(2) + 1 If I < Len(s1) Then I = I + 1 Else Exit For End If Next j If FindWhole Then StrSimilar = CDbl(Application.WorksheetFunction.Min(n(1), n(2))) _ / CDbl(Application.WorksheetFunction.Max(Len(s1), Len(s2))) Else StrSimilar = CDbl(n(2)) / Len(s2) End If End Function "Tom Ogilvy" wrote in message ... While your waiting for a good answer, as an excercise - Mark the uniques and for the duplicates, list which unique it duplicates: Frehu Freuh Erueh Fruit Freh Rfeuh Ereuh Feurh Fruhe Furth Feruh Ferhu Fruh Frueh Fureh Now right down the rules you used to do it - define "our purposes". then maybe someone will have some ideas. -- Regards, Tom Ogilvy "ryguy7272" wrote: My supervisor and I are trying to come up with a way of finding and eliminating, all duplicates from a large Excel worksheet which came from BCM in Outlook. I am contemplating a few different ways of doing this. I found a free utility on www.downloads.com that allows a user to do a 'complex filter' to eliminate duplicates, but it's not really giving me the results I am looking for. I sorted by Company (Column F) and then by last name (Column D) and then by first name (Column B). I am now wondering if there is a way to use some kind of fuzzy logic to do a search for values that are almost unique and hide the remainder of the rows, or almost duplicate and hide the remainder of the rows. The issue is that Excel identifies lots of 'unique' records because it identifies two people with two different office addresses as two different records, but for our purposes this is one contact. Similarly, a contact's name could be spelled Freuh, in the personal contacts part of BCM, and the name could also be spelled in Frueh, in the public contacts part of BCM. Again, these are two 'unique' records, but again, for our purposes this is one contact. All company names are listed in Column F, all last names are in Column D, and all first names are in Column B. I would like to copy/paste all data on all rows with unique records (F, D, & B) to a new sheet, or hide all rows with dupes. Any ideas? Would Access be able to handle this? Thanks, as always! Ryan--- -- RyGuy |
#5
Posted to microsoft.public.excel.programming
|
|||
|
|||
Find Unique Records; Must Use Fuzzy Logic
Tom, so you test-drove it, huh? Four matching letters out of five would be
80%, so that seems reasonable. James "Tom Ogilvy" wrote in message ... MsgBox StrSimilar("Freud", "Frdue", True) gives a 0.8 as well. -- Regards, Tom Ogilvy "Zone" wrote: Harlan Grove sent me this function years ago when I was trying to compare two strings to see how similar they were. I also included a simple sub to run it. It returns a number from 0 to 1, depending on how similar it thinks the strings are. Works good. The three arguments are the strings to be compared and a boolean that tells it whether to match the whole string or any part of it. I haven't used it in a while now, so I'd advise just setting the 3rd argument to True for now. Hope this helps! James Sub TrySimilar() MsgBox StrSimilar("Frued", "Freud", True) End Sub Function StrSimilar(s1 As String, s2 As String, FindWhole As Boolean) As Double 'function returns a numerical grade for 2 similar strings, 1.00 being perfect 'by H Grove Dim I As Long, j As Long, k As Long, n(2) As Long Dim c1 As String, c2 As String Const alphanum As String = "1234567890abcdefghijklmnopqrstuvwxyz " s1 = LCase(s1) For I = 1 To Len(s1) If Not InStr(alphanum, Mid(s1, I, 1)) 0 Then Mid(s1, I, 1) = " " Next I s1 = Application.WorksheetFunction.Trim(s1) s2 = LCase(s2) For j = 1 To Len(s2) If Not InStr(alphanum, Mid(s2, j, 1)) 0 Then Mid(s2, j, 1) = " " Next j s2 = Application.WorksheetFunction.Trim(s2) j = 1 n(1) = 0 For I = 1 To Len(s1) c1 = LCase(Mid(s1, I, 1)) k = 0 Do c2 = LCase(Mid(s2, j + k, 1)) k = k + 1 Loop Until j + k Len(s2) Or c1 = c2 If c1 = c2 Then n(1) = n(1) + 1 If j < Len(s2) Then j = j + 1 Else Exit For End If Next I I = 1 n(2) = 0 For j = 1 To Len(s2) c2 = LCase(Mid(s2, j, 1)) k = 0 Do c1 = LCase(Mid(s1, I + k, 1)) k = k + 1 Loop Until I + k Len(s1) Or c1 = c2 If c1 = c2 Then n(2) = n(2) + 1 If I < Len(s1) Then I = I + 1 Else Exit For End If Next j If FindWhole Then StrSimilar = CDbl(Application.WorksheetFunction.Min(n(1), n(2))) _ / CDbl(Application.WorksheetFunction.Max(Len(s1), Len(s2))) Else StrSimilar = CDbl(n(2)) / Len(s2) End If End Function "Tom Ogilvy" wrote in message ... While your waiting for a good answer, as an excercise - Mark the uniques and for the duplicates, list which unique it duplicates: Frehu Freuh Erueh Fruit Freh Rfeuh Ereuh Feurh Fruhe Furth Feruh Ferhu Fruh Frueh Fureh Now right down the rules you used to do it - define "our purposes". then maybe someone will have some ideas. -- Regards, Tom Ogilvy "ryguy7272" wrote: My supervisor and I are trying to come up with a way of finding and eliminating, all duplicates from a large Excel worksheet which came from BCM in Outlook. I am contemplating a few different ways of doing this. I found a free utility on www.downloads.com that allows a user to do a 'complex filter' to eliminate duplicates, but it's not really giving me the results I am looking for. I sorted by Company (Column F) and then by last name (Column D) and then by first name (Column B). I am now wondering if there is a way to use some kind of fuzzy logic to do a search for values that are almost unique and hide the remainder of the rows, or almost duplicate and hide the remainder of the rows. The issue is that Excel identifies lots of 'unique' records because it identifies two people with two different office addresses as two different records, but for our purposes this is one contact. Similarly, a contact's name could be spelled Freuh, in the personal contacts part of BCM, and the name could also be spelled in Frueh, in the public contacts part of BCM. Again, these are two 'unique' records, but again, for our purposes this is one contact. All company names are listed in Column F, all last names are in Column D, and all first names are in Column B. I would like to copy/paste all data on all rows with unique records (F, D, & B) to a new sheet, or hide all rows with dupes. Any ideas? Would Access be able to handle this? Thanks, as always! Ryan--- -- RyGuy |
#6
Posted to microsoft.public.excel.programming
|
|||
|
|||
Find Unique Records; Must Use Fuzzy Logic
Thanks Tom, and thanks Zone, sweet function. However, I think I need to do
this thing (somewhat the old fashioned way). I have had ASAP utilities for quite some time, but haven't used it much. I am using it now, and I am seeing that it is quite helpful for this current project, but it will take several iterations of applying filters, and doing sorting, and some other things. Anyway, here is a link for others who want to obtain this valuable utility: http://www.asap-utilities.com/downlo...-utilities.php Regards, Ryan-- -- RyGuy "Zone" wrote: Tom, so you test-drove it, huh? Four matching letters out of five would be 80%, so that seems reasonable. James "Tom Ogilvy" wrote in message ... MsgBox StrSimilar("Freud", "Frdue", True) gives a 0.8 as well. -- Regards, Tom Ogilvy "Zone" wrote: Harlan Grove sent me this function years ago when I was trying to compare two strings to see how similar they were. I also included a simple sub to run it. It returns a number from 0 to 1, depending on how similar it thinks the strings are. Works good. The three arguments are the strings to be compared and a boolean that tells it whether to match the whole string or any part of it. I haven't used it in a while now, so I'd advise just setting the 3rd argument to True for now. Hope this helps! James Sub TrySimilar() MsgBox StrSimilar("Frued", "Freud", True) End Sub Function StrSimilar(s1 As String, s2 As String, FindWhole As Boolean) As Double 'function returns a numerical grade for 2 similar strings, 1.00 being perfect 'by H Grove Dim I As Long, j As Long, k As Long, n(2) As Long Dim c1 As String, c2 As String Const alphanum As String = "1234567890abcdefghijklmnopqrstuvwxyz " s1 = LCase(s1) For I = 1 To Len(s1) If Not InStr(alphanum, Mid(s1, I, 1)) 0 Then Mid(s1, I, 1) = " " Next I s1 = Application.WorksheetFunction.Trim(s1) s2 = LCase(s2) For j = 1 To Len(s2) If Not InStr(alphanum, Mid(s2, j, 1)) 0 Then Mid(s2, j, 1) = " " Next j s2 = Application.WorksheetFunction.Trim(s2) j = 1 n(1) = 0 For I = 1 To Len(s1) c1 = LCase(Mid(s1, I, 1)) k = 0 Do c2 = LCase(Mid(s2, j + k, 1)) k = k + 1 Loop Until j + k Len(s2) Or c1 = c2 If c1 = c2 Then n(1) = n(1) + 1 If j < Len(s2) Then j = j + 1 Else Exit For End If Next I I = 1 n(2) = 0 For j = 1 To Len(s2) c2 = LCase(Mid(s2, j, 1)) k = 0 Do c1 = LCase(Mid(s1, I + k, 1)) k = k + 1 Loop Until I + k Len(s1) Or c1 = c2 If c1 = c2 Then n(2) = n(2) + 1 If I < Len(s1) Then I = I + 1 Else Exit For End If Next j If FindWhole Then StrSimilar = CDbl(Application.WorksheetFunction.Min(n(1), n(2))) _ / CDbl(Application.WorksheetFunction.Max(Len(s1), Len(s2))) Else StrSimilar = CDbl(n(2)) / Len(s2) End If End Function "Tom Ogilvy" wrote in message ... While your waiting for a good answer, as an excercise - Mark the uniques and for the duplicates, list which unique it duplicates: Frehu Freuh Erueh Fruit Freh Rfeuh Ereuh Feurh Fruhe Furth Feruh Ferhu Fruh Frueh Fureh Now right down the rules you used to do it - define "our purposes". then maybe someone will have some ideas. -- Regards, Tom Ogilvy "ryguy7272" wrote: My supervisor and I are trying to come up with a way of finding and eliminating, all duplicates from a large Excel worksheet which came from BCM in Outlook. I am contemplating a few different ways of doing this. I found a free utility on www.downloads.com that allows a user to do a 'complex filter' to eliminate duplicates, but it's not really giving me the results I am looking for. I sorted by Company (Column F) and then by last name (Column D) and then by first name (Column B). I am now wondering if there is a way to use some kind of fuzzy logic to do a search for values that are almost unique and hide the remainder of the rows, or almost duplicate and hide the remainder of the rows. The issue is that Excel identifies lots of 'unique' records because it identifies two people with two different office addresses as two different records, but for our purposes this is one contact. Similarly, a contact's name could be spelled Freuh, in the personal contacts part of BCM, and the name could also be spelled in Frueh, in the public contacts part of BCM. Again, these are two 'unique' records, but again, for our purposes this is one contact. All company names are listed in Column F, all last names are in Column D, and all first names are in Column B. I would like to copy/paste all data on all rows with unique records (F, D, & B) to a new sheet, or hide all rows with dupes. Any ideas? Would Access be able to handle this? Thanks, as always! Ryan--- -- RyGuy |
#7
Posted to microsoft.public.excel.programming
|
|||
|
|||
Find Unique Records; Must Use Fuzzy Logic
You're welcome. Good luck with that. James
"ryguy7272" wrote in message ... Thanks Tom, and thanks Zone, sweet function. However, I think I need to do this thing (somewhat the old fashioned way). I have had ASAP utilities for quite some time, but haven't used it much. I am using it now, and I am seeing that it is quite helpful for this current project, but it will take several iterations of applying filters, and doing sorting, and some other things. Anyway, here is a link for others who want to obtain this valuable utility: http://www.asap-utilities.com/downlo...-utilities.php Regards, Ryan-- -- RyGuy "Zone" wrote: Tom, so you test-drove it, huh? Four matching letters out of five would be 80%, so that seems reasonable. James "Tom Ogilvy" wrote in message ... MsgBox StrSimilar("Freud", "Frdue", True) gives a 0.8 as well. -- Regards, Tom Ogilvy "Zone" wrote: Harlan Grove sent me this function years ago when I was trying to compare two strings to see how similar they were. I also included a simple sub to run it. It returns a number from 0 to 1, depending on how similar it thinks the strings are. Works good. The three arguments are the strings to be compared and a boolean that tells it whether to match the whole string or any part of it. I haven't used it in a while now, so I'd advise just setting the 3rd argument to True for now. Hope this helps! James Sub TrySimilar() MsgBox StrSimilar("Frued", "Freud", True) End Sub Function StrSimilar(s1 As String, s2 As String, FindWhole As Boolean) As Double 'function returns a numerical grade for 2 similar strings, 1.00 being perfect 'by H Grove Dim I As Long, j As Long, k As Long, n(2) As Long Dim c1 As String, c2 As String Const alphanum As String = "1234567890abcdefghijklmnopqrstuvwxyz " s1 = LCase(s1) For I = 1 To Len(s1) If Not InStr(alphanum, Mid(s1, I, 1)) 0 Then Mid(s1, I, 1) = " " Next I s1 = Application.WorksheetFunction.Trim(s1) s2 = LCase(s2) For j = 1 To Len(s2) If Not InStr(alphanum, Mid(s2, j, 1)) 0 Then Mid(s2, j, 1) = " " Next j s2 = Application.WorksheetFunction.Trim(s2) j = 1 n(1) = 0 For I = 1 To Len(s1) c1 = LCase(Mid(s1, I, 1)) k = 0 Do c2 = LCase(Mid(s2, j + k, 1)) k = k + 1 Loop Until j + k Len(s2) Or c1 = c2 If c1 = c2 Then n(1) = n(1) + 1 If j < Len(s2) Then j = j + 1 Else Exit For End If Next I I = 1 n(2) = 0 For j = 1 To Len(s2) c2 = LCase(Mid(s2, j, 1)) k = 0 Do c1 = LCase(Mid(s1, I + k, 1)) k = k + 1 Loop Until I + k Len(s1) Or c1 = c2 If c1 = c2 Then n(2) = n(2) + 1 If I < Len(s1) Then I = I + 1 Else Exit For End If Next j If FindWhole Then StrSimilar = CDbl(Application.WorksheetFunction.Min(n(1), n(2))) _ / CDbl(Application.WorksheetFunction.Max(Len(s1), Len(s2))) Else StrSimilar = CDbl(n(2)) / Len(s2) End If End Function "Tom Ogilvy" wrote in message ... While your waiting for a good answer, as an excercise - Mark the uniques and for the duplicates, list which unique it duplicates: Frehu Freuh Erueh Fruit Freh Rfeuh Ereuh Feurh Fruhe Furth Feruh Ferhu Fruh Frueh Fureh Now right down the rules you used to do it - define "our purposes". then maybe someone will have some ideas. -- Regards, Tom Ogilvy "ryguy7272" wrote: My supervisor and I are trying to come up with a way of finding and eliminating, all duplicates from a large Excel worksheet which came from BCM in Outlook. I am contemplating a few different ways of doing this. I found a free utility on www.downloads.com that allows a user to do a 'complex filter' to eliminate duplicates, but it's not really giving me the results I am looking for. I sorted by Company (Column F) and then by last name (Column D) and then by first name (Column B). I am now wondering if there is a way to use some kind of fuzzy logic to do a search for values that are almost unique and hide the remainder of the rows, or almost duplicate and hide the remainder of the rows. The issue is that Excel identifies lots of 'unique' records because it identifies two people with two different office addresses as two different records, but for our purposes this is one contact. Similarly, a contact's name could be spelled Freuh, in the personal contacts part of BCM, and the name could also be spelled in Frueh, in the public contacts part of BCM. Again, these are two 'unique' records, but again, for our purposes this is one contact. All company names are listed in Column F, all last names are in Column D, and all first names are in Column B. I would like to copy/paste all data on all rows with unique records (F, D, & B) to a new sheet, or hide all rows with dupes. Any ideas? Would Access be able to handle this? Thanks, as always! Ryan--- -- RyGuy |
#8
Posted to microsoft.public.excel.programming
|
|||
|
|||
Find Unique Records; Must Use Fuzzy Logic
One more thing...for anyone who stumbles upon this post sometime in the future.
I used this utility (below), along with the other one (above), and got pretty darn close to where I wanted to be. http://members.iinet.net.au/~brettdj...l_Column_Match Regards, Ryan-- -- RyGuy "Zone" wrote: You're welcome. Good luck with that. James "ryguy7272" wrote in message ... Thanks Tom, and thanks Zone, sweet function. However, I think I need to do this thing (somewhat the old fashioned way). I have had ASAP utilities for quite some time, but haven't used it much. I am using it now, and I am seeing that it is quite helpful for this current project, but it will take several iterations of applying filters, and doing sorting, and some other things. Anyway, here is a link for others who want to obtain this valuable utility: http://www.asap-utilities.com/downlo...-utilities.php Regards, Ryan-- -- RyGuy "Zone" wrote: Tom, so you test-drove it, huh? Four matching letters out of five would be 80%, so that seems reasonable. James "Tom Ogilvy" wrote in message ... MsgBox StrSimilar("Freud", "Frdue", True) gives a 0.8 as well. -- Regards, Tom Ogilvy "Zone" wrote: Harlan Grove sent me this function years ago when I was trying to compare two strings to see how similar they were. I also included a simple sub to run it. It returns a number from 0 to 1, depending on how similar it thinks the strings are. Works good. The three arguments are the strings to be compared and a boolean that tells it whether to match the whole string or any part of it. I haven't used it in a while now, so I'd advise just setting the 3rd argument to True for now. Hope this helps! James Sub TrySimilar() MsgBox StrSimilar("Frued", "Freud", True) End Sub Function StrSimilar(s1 As String, s2 As String, FindWhole As Boolean) As Double 'function returns a numerical grade for 2 similar strings, 1.00 being perfect 'by H Grove Dim I As Long, j As Long, k As Long, n(2) As Long Dim c1 As String, c2 As String Const alphanum As String = "1234567890abcdefghijklmnopqrstuvwxyz " s1 = LCase(s1) For I = 1 To Len(s1) If Not InStr(alphanum, Mid(s1, I, 1)) 0 Then Mid(s1, I, 1) = " " Next I s1 = Application.WorksheetFunction.Trim(s1) s2 = LCase(s2) For j = 1 To Len(s2) If Not InStr(alphanum, Mid(s2, j, 1)) 0 Then Mid(s2, j, 1) = " " Next j s2 = Application.WorksheetFunction.Trim(s2) j = 1 n(1) = 0 For I = 1 To Len(s1) c1 = LCase(Mid(s1, I, 1)) k = 0 Do c2 = LCase(Mid(s2, j + k, 1)) k = k + 1 Loop Until j + k Len(s2) Or c1 = c2 If c1 = c2 Then n(1) = n(1) + 1 If j < Len(s2) Then j = j + 1 Else Exit For End If Next I I = 1 n(2) = 0 For j = 1 To Len(s2) c2 = LCase(Mid(s2, j, 1)) k = 0 Do c1 = LCase(Mid(s1, I + k, 1)) k = k + 1 Loop Until I + k Len(s1) Or c1 = c2 If c1 = c2 Then n(2) = n(2) + 1 If I < Len(s1) Then I = I + 1 Else Exit For End If Next j If FindWhole Then StrSimilar = CDbl(Application.WorksheetFunction.Min(n(1), n(2))) _ / CDbl(Application.WorksheetFunction.Max(Len(s1), Len(s2))) Else StrSimilar = CDbl(n(2)) / Len(s2) End If End Function "Tom Ogilvy" wrote in message ... While your waiting for a good answer, as an excercise - Mark the uniques and for the duplicates, list which unique it duplicates: Frehu Freuh Erueh Fruit Freh Rfeuh Ereuh Feurh Fruhe Furth Feruh Ferhu Fruh Frueh Fureh Now right down the rules you used to do it - define "our purposes". then maybe someone will have some ideas. -- Regards, Tom Ogilvy "ryguy7272" wrote: My supervisor and I are trying to come up with a way of finding and eliminating, all duplicates from a large Excel worksheet which came from BCM in Outlook. I am contemplating a few different ways of doing this. I found a free utility on www.downloads.com that allows a user to do a 'complex filter' to eliminate duplicates, but it's not really giving me the results I am looking for. I sorted by Company (Column F) and then by last name (Column D) and then by first name (Column B). I am now wondering if there is a way to use some kind of fuzzy logic to do a search for values that are almost unique and hide the remainder of the rows, or almost duplicate and hide the remainder of the rows. The issue is that Excel identifies lots of 'unique' records because it identifies two people with two different office addresses as two different records, but for our purposes this is one contact. Similarly, a contact's name could be spelled Freuh, in the personal contacts part of BCM, and the name could also be spelled in Frueh, in the public contacts part of BCM. Again, these are two 'unique' records, but again, for our purposes this is one contact. All company names are listed in Column F, all last names are in Column D, and all first names are in Column B. I would like to copy/paste all data on all rows with unique records (F, D, & B) to a new sheet, or hide all rows with dupes. Any ideas? Would Access be able to handle this? Thanks, as always! Ryan--- -- RyGuy |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Forum | |||
find unique records | Excel Discussion (Misc queries) | |||
find unique records | Excel Discussion (Misc queries) | |||
Routine with fuzzy logic to determine the relative comparison of two strings? | Excel Worksheet Functions | |||
Find Unique Records from two sheets | Excel Programming | |||
Using fuzzy logic in excel | Excel Discussion (Misc queries) |