About Us

Blasting Cap

I have a spreadsheet with about 30,000 records in it, that has names &
addresses, as well as ID numbers in it.

The problem is, there are some duplicates in it.

Not strictly duplicates, but duplicates nonetheless.

The id numbers are unique, but other columns may only be similar.

They'd have the same City, State & Zip, but there may be differences like:

Bud's Country Store
123 N Main St
Podunk, IA 40444

Buds Country Store
123 North Main Street
Podunk, IA 40444

Even though the ID number would be unique for each of them, these are
duplicate records, even though there are slight differences in the way
the name is listed or even in the address line.

Is there a macro that I can use to highlight & remove duplicates in a
case like this?

Thanks,

BC

joel

I would write a routine that compared company name, street address, and
city/state and when two of three match, copy these rows to a check sheet.
Then manually remove items from the check sheet that aren't duplicates.
Finally write a second script that removes the remaining item on the check
sheet.

I've used this process many times.

"Blasting Cap" wrote:

I have a spreadsheet with about 30,000 records in it, that has names &
addresses, as well as ID numbers in it.

The problem is, there are some duplicates in it.

Not strictly duplicates, but duplicates nonetheless.

The id numbers are unique, but other columns may only be similar.

They'd have the same City, State & Zip, but there may be differences like:

Bud's Country Store
123 N Main St
Podunk, IA 40444

Buds Country Store
123 North Main Street
Podunk, IA 40444

Even though the ID number would be unique for each of them, these are
duplicate records, even though there are slight differences in the way
the name is listed or even in the address line.

Is there a macro that I can use to highlight & remove duplicates in a
case like this?

Thanks,

BC

Blasting Cap

Can you post some sample code that's part of your routine?

I'm not sure if the 2 of 3 match would not give me a lot of false
positives. For instance, if I were comparing grocery stores, I could
conceivably have any number of correct matches within the same city.

What I need to do is to remove/standardize differences - Street becomes
ST., Avenue becomes AVE and so on.

BC

Joel wrote:
I would write a routine that compared company name, street address, and
city/state and when two of three match, copy these rows to a check sheet.
Then manually remove items from the check sheet that aren't duplicates.
Finally write a second script that removes the remaining item on the check
sheet.

I've used this process many times.

"Blasting Cap" wrote:

I have a spreadsheet with about 30,000 records in it, that has names &
addresses, as well as ID numbers in it.

The problem is, there are some duplicates in it.

Not strictly duplicates, but duplicates nonetheless.

The id numbers are unique, but other columns may only be similar.

They'd have the same City, State & Zip, but there may be differences like:

Bud's Country Store
123 N Main St
Podunk, IA 40444

Buds Country Store
123 North Main Street
Podunk, IA 40444

Even though the ID number would be unique for each of them, these are
duplicate records, even though there are slight differences in the way
the name is listed or even in the address line.

Is there a macro that I can use to highlight & remove duplicates in a
case like this?

Thanks,

BC

joel

This is a good example. It is a quick test. See how well it works. Modify
as necessary. Add a workseet DupNames to check results. The code will
probably take a few minutes to run with 10000 lines. It is comapring every
line with every other line with is (Rows * (Rows - 1))/2

Sub cmpcompnames()
Const DupNames = "DupNames"

LastRows = Cells(Rows.Count, "A").End(xlUp).Row

Duprows = 1
With Worksheet(DupNames)

For I = 1 To (LastRows - 1)
For J = (I + 1) To LastRow

CompareCount = 0
If Cells(I, "A") = Cells(J, "A") Then
CompareCount = 1
End If
If Cells(I, "B") = Cells(J, "B") Then
CompareCount = CompareCount + 1
End If
If Cells(I, "C") = Cells(J, "C") Then
CompareCount = CompareCount + 1
End If

'check for near and perfect matches
If CompareCount 2 Then
.Cells(RowCount, "A") = Cells(I, "A")
.Cells(RowCount, "B") = Cells(I, "B")
.Cells(RowCount, "C") = Cells(I, "C")
.Cells(RowCount, "D") = Cells(J, "A")
.Cells(RowCount, "E") = Cells(J, "B")
.Cells(RowCount, "F") = Cells(J, "C")
Duprows = Duprows + 1
End If
Next J
Next I
End With

End Sub

"Blasting Cap" wrote:

Can you post some sample code that's part of your routine?

I'm not sure if the 2 of 3 match would not give me a lot of false
positives. For instance, if I were comparing grocery stores, I could
conceivably have any number of correct matches within the same city.

What I need to do is to remove/standardize differences - Street becomes
ST., Avenue becomes AVE and so on.

BC

Joel wrote:
I would write a routine that compared company name, street address, and
city/state and when two of three match, copy these rows to a check sheet.
Then manually remove items from the check sheet that aren't duplicates.
Finally write a second script that removes the remaining item on the check
sheet.

I've used this process many times.

"Blasting Cap" wrote:

I have a spreadsheet with about 30,000 records in it, that has names &
addresses, as well as ID numbers in it.

The problem is, there are some duplicates in it.

Not strictly duplicates, but duplicates nonetheless.

The id numbers are unique, but other columns may only be similar.

They'd have the same City, State & Zip, but there may be differences like:

Bud's Country Store
123 N Main St
Podunk, IA 40444

Buds Country Store
123 North Main Street
Podunk, IA 40444

Even though the ID number would be unique for each of them, these are
duplicate records, even though there are slight differences in the way
the name is listed or even in the address line.

Is there a macro that I can use to highlight & remove duplicates in a
case like this?

Thanks,

BC

Thread Tools	Search this Thread
Show Printable Version	Search this Thread: Advanced Search
Display Modes
Switch to Linear Mode Hybrid Mode Switch to Threaded Mode

Similar Threads
Thread	Thread Starter	Forum	Replies	Last Post
remove duplicate rows but keeping all columns	DC	Excel Worksheet Functions	3	June 10th 09 03:10 AM
highlight duplicate rows	James	Excel Discussion (Misc queries)	4	October 10th 08 02:43 PM
How to Highlight Two Rows With Two Columns of Duplicate Values	JSteph	Excel Worksheet Functions	1	December 12th 07 01:59 AM
How do I remove Duplicate rows?	85225	Excel Discussion (Misc queries)	15	March 9th 07 11:41 PM
Remove duplicate rows and sum cells	ToddG[_4_]	Excel Programming	7	September 27th 04 09:55 PM

Menu

About Us