Home |
Search |
Today's Posts |
|
#1
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
I have a spreadsheet with about 30,000 records in it, that has names &
addresses, as well as ID numbers in it. The problem is, there are some duplicates in it. Not strictly duplicates, but duplicates nonetheless. The id numbers are unique, but other columns may only be similar. They'd have the same City, State & Zip, but there may be differences like: Bud's Country Store 123 N Main St Podunk, IA 40444 Buds Country Store 123 North Main Street Podunk, IA 40444 Even though the ID number would be unique for each of them, these are duplicate records, even though there are slight differences in the way the name is listed or even in the address line. Is there a macro that I can use to highlight & remove duplicates in a case like this? Thanks, BC |
#2
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
I would write a routine that compared company name, street address, and
city/state and when two of three match, copy these rows to a check sheet. Then manually remove items from the check sheet that aren't duplicates. Finally write a second script that removes the remaining item on the check sheet. I've used this process many times. "Blasting Cap" wrote: I have a spreadsheet with about 30,000 records in it, that has names & addresses, as well as ID numbers in it. The problem is, there are some duplicates in it. Not strictly duplicates, but duplicates nonetheless. The id numbers are unique, but other columns may only be similar. They'd have the same City, State & Zip, but there may be differences like: Bud's Country Store 123 N Main St Podunk, IA 40444 Buds Country Store 123 North Main Street Podunk, IA 40444 Even though the ID number would be unique for each of them, these are duplicate records, even though there are slight differences in the way the name is listed or even in the address line. Is there a macro that I can use to highlight & remove duplicates in a case like this? Thanks, BC |
#3
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
Can you post some sample code that's part of your routine?
I'm not sure if the 2 of 3 match would not give me a lot of false positives. For instance, if I were comparing grocery stores, I could conceivably have any number of correct matches within the same city. What I need to do is to remove/standardize differences - Street becomes ST., Avenue becomes AVE and so on. BC Joel wrote: I would write a routine that compared company name, street address, and city/state and when two of three match, copy these rows to a check sheet. Then manually remove items from the check sheet that aren't duplicates. Finally write a second script that removes the remaining item on the check sheet. I've used this process many times. "Blasting Cap" wrote: I have a spreadsheet with about 30,000 records in it, that has names & addresses, as well as ID numbers in it. The problem is, there are some duplicates in it. Not strictly duplicates, but duplicates nonetheless. The id numbers are unique, but other columns may only be similar. They'd have the same City, State & Zip, but there may be differences like: Bud's Country Store 123 N Main St Podunk, IA 40444 Buds Country Store 123 North Main Street Podunk, IA 40444 Even though the ID number would be unique for each of them, these are duplicate records, even though there are slight differences in the way the name is listed or even in the address line. Is there a macro that I can use to highlight & remove duplicates in a case like this? Thanks, BC |
#4
![]()
Posted to microsoft.public.excel.programming
|
|||
|
|||
![]()
This is a good example. It is a quick test. See how well it works. Modify
as necessary. Add a workseet DupNames to check results. The code will probably take a few minutes to run with 10000 lines. It is comapring every line with every other line with is (Rows * (Rows - 1))/2 Sub cmpcompnames() Const DupNames = "DupNames" LastRows = Cells(Rows.Count, "A").End(xlUp).Row Duprows = 1 With Worksheet(DupNames) For I = 1 To (LastRows - 1) For J = (I + 1) To LastRow CompareCount = 0 If Cells(I, "A") = Cells(J, "A") Then CompareCount = 1 End If If Cells(I, "B") = Cells(J, "B") Then CompareCount = CompareCount + 1 End If If Cells(I, "C") = Cells(J, "C") Then CompareCount = CompareCount + 1 End If 'check for near and perfect matches If CompareCount 2 Then .Cells(RowCount, "A") = Cells(I, "A") .Cells(RowCount, "B") = Cells(I, "B") .Cells(RowCount, "C") = Cells(I, "C") .Cells(RowCount, "D") = Cells(J, "A") .Cells(RowCount, "E") = Cells(J, "B") .Cells(RowCount, "F") = Cells(J, "C") Duprows = Duprows + 1 End If Next J Next I End With End Sub "Blasting Cap" wrote: Can you post some sample code that's part of your routine? I'm not sure if the 2 of 3 match would not give me a lot of false positives. For instance, if I were comparing grocery stores, I could conceivably have any number of correct matches within the same city. What I need to do is to remove/standardize differences - Street becomes ST., Avenue becomes AVE and so on. BC Joel wrote: I would write a routine that compared company name, street address, and city/state and when two of three match, copy these rows to a check sheet. Then manually remove items from the check sheet that aren't duplicates. Finally write a second script that removes the remaining item on the check sheet. I've used this process many times. "Blasting Cap" wrote: I have a spreadsheet with about 30,000 records in it, that has names & addresses, as well as ID numbers in it. The problem is, there are some duplicates in it. Not strictly duplicates, but duplicates nonetheless. The id numbers are unique, but other columns may only be similar. They'd have the same City, State & Zip, but there may be differences like: Bud's Country Store 123 N Main St Podunk, IA 40444 Buds Country Store 123 North Main Street Podunk, IA 40444 Even though the ID number would be unique for each of them, these are duplicate records, even though there are slight differences in the way the name is listed or even in the address line. Is there a macro that I can use to highlight & remove duplicates in a case like this? Thanks, BC |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
![]() |
||||
Thread | Forum | |||
remove duplicate rows but keeping all columns | Excel Worksheet Functions | |||
highlight duplicate rows | Excel Discussion (Misc queries) | |||
How to Highlight Two Rows With Two Columns of Duplicate Values | Excel Worksheet Functions | |||
How do I remove Duplicate rows? | Excel Discussion (Misc queries) | |||
Remove duplicate rows and sum cells | Excel Programming |