Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 16
Default Highlight/remove duplicate rows

I have a spreadsheet with about 30,000 records in it, that has names &
addresses, as well as ID numbers in it.

The problem is, there are some duplicates in it.

Not strictly duplicates, but duplicates nonetheless.

The id numbers are unique, but other columns may only be similar.

They'd have the same City, State & Zip, but there may be differences like:

Bud's Country Store
123 N Main St
Podunk, IA 40444

Buds Country Store
123 North Main Street
Podunk, IA 40444

Even though the ID number would be unique for each of them, these are
duplicate records, even though there are slight differences in the way
the name is listed or even in the address line.

Is there a macro that I can use to highlight & remove duplicates in a
case like this?

Thanks,

BC
  #2   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 9,101
Default Highlight/remove duplicate rows

I would write a routine that compared company name, street address, and
city/state and when two of three match, copy these rows to a check sheet.
Then manually remove items from the check sheet that aren't duplicates.
Finally write a second script that removes the remaining item on the check
sheet.

I've used this process many times.

"Blasting Cap" wrote:

I have a spreadsheet with about 30,000 records in it, that has names &
addresses, as well as ID numbers in it.

The problem is, there are some duplicates in it.

Not strictly duplicates, but duplicates nonetheless.

The id numbers are unique, but other columns may only be similar.

They'd have the same City, State & Zip, but there may be differences like:

Bud's Country Store
123 N Main St
Podunk, IA 40444

Buds Country Store
123 North Main Street
Podunk, IA 40444

Even though the ID number would be unique for each of them, these are
duplicate records, even though there are slight differences in the way
the name is listed or even in the address line.

Is there a macro that I can use to highlight & remove duplicates in a
case like this?

Thanks,

BC

  #3   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 16
Default Highlight/remove duplicate rows

Can you post some sample code that's part of your routine?

I'm not sure if the 2 of 3 match would not give me a lot of false
positives. For instance, if I were comparing grocery stores, I could
conceivably have any number of correct matches within the same city.

What I need to do is to remove/standardize differences - Street becomes
ST., Avenue becomes AVE and so on.

BC



Joel wrote:
I would write a routine that compared company name, street address, and
city/state and when two of three match, copy these rows to a check sheet.
Then manually remove items from the check sheet that aren't duplicates.
Finally write a second script that removes the remaining item on the check
sheet.

I've used this process many times.

"Blasting Cap" wrote:

I have a spreadsheet with about 30,000 records in it, that has names &
addresses, as well as ID numbers in it.

The problem is, there are some duplicates in it.

Not strictly duplicates, but duplicates nonetheless.

The id numbers are unique, but other columns may only be similar.

They'd have the same City, State & Zip, but there may be differences like:

Bud's Country Store
123 N Main St
Podunk, IA 40444

Buds Country Store
123 North Main Street
Podunk, IA 40444

Even though the ID number would be unique for each of them, these are
duplicate records, even though there are slight differences in the way
the name is listed or even in the address line.

Is there a macro that I can use to highlight & remove duplicates in a
case like this?

Thanks,

BC

  #4   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 9,101
Default Highlight/remove duplicate rows

This is a good example. It is a quick test. See how well it works. Modify
as necessary. Add a workseet DupNames to check results. The code will
probably take a few minutes to run with 10000 lines. It is comapring every
line with every other line with is (Rows * (Rows - 1))/2

Sub cmpcompnames()
Const DupNames = "DupNames"

LastRows = Cells(Rows.Count, "A").End(xlUp).Row

Duprows = 1
With Worksheet(DupNames)

For I = 1 To (LastRows - 1)
For J = (I + 1) To LastRow

CompareCount = 0
If Cells(I, "A") = Cells(J, "A") Then
CompareCount = 1
End If
If Cells(I, "B") = Cells(J, "B") Then
CompareCount = CompareCount + 1
End If
If Cells(I, "C") = Cells(J, "C") Then
CompareCount = CompareCount + 1
End If

'check for near and perfect matches
If CompareCount 2 Then
.Cells(RowCount, "A") = Cells(I, "A")
.Cells(RowCount, "B") = Cells(I, "B")
.Cells(RowCount, "C") = Cells(I, "C")
.Cells(RowCount, "D") = Cells(J, "A")
.Cells(RowCount, "E") = Cells(J, "B")
.Cells(RowCount, "F") = Cells(J, "C")
Duprows = Duprows + 1
End If
Next J
Next I
End With

End Sub


"Blasting Cap" wrote:

Can you post some sample code that's part of your routine?

I'm not sure if the 2 of 3 match would not give me a lot of false
positives. For instance, if I were comparing grocery stores, I could
conceivably have any number of correct matches within the same city.

What I need to do is to remove/standardize differences - Street becomes
ST., Avenue becomes AVE and so on.

BC



Joel wrote:
I would write a routine that compared company name, street address, and
city/state and when two of three match, copy these rows to a check sheet.
Then manually remove items from the check sheet that aren't duplicates.
Finally write a second script that removes the remaining item on the check
sheet.

I've used this process many times.

"Blasting Cap" wrote:

I have a spreadsheet with about 30,000 records in it, that has names &
addresses, as well as ID numbers in it.

The problem is, there are some duplicates in it.

Not strictly duplicates, but duplicates nonetheless.

The id numbers are unique, but other columns may only be similar.

They'd have the same City, State & Zip, but there may be differences like:

Bud's Country Store
123 N Main St
Podunk, IA 40444

Buds Country Store
123 North Main Street
Podunk, IA 40444

Even though the ID number would be unique for each of them, these are
duplicate records, even though there are slight differences in the way
the name is listed or even in the address line.

Is there a macro that I can use to highlight & remove duplicates in a
case like this?

Thanks,

BC


Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
remove duplicate rows but keeping all columns DC Excel Worksheet Functions 3 June 10th 09 03:10 AM
highlight duplicate rows James Excel Discussion (Misc queries) 4 October 10th 08 02:43 PM
How to Highlight Two Rows With Two Columns of Duplicate Values JSteph Excel Worksheet Functions 1 December 12th 07 01:59 AM
How do I remove Duplicate rows? 85225 Excel Discussion (Misc queries) 15 March 9th 07 11:41 PM
Remove duplicate rows and sum cells ToddG[_4_] Excel Programming 7 September 27th 04 09:55 PM


All times are GMT +1. The time now is 04:07 PM.

Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
Copyright ©2004-2025 ExcelBanter.
The comments are property of their posters.
 

About Us

"It's about Microsoft Excel"