ExcelBanter

ExcelBanter (https://www.excelbanter.com/)
-   Excel Discussion (Misc queries) (https://www.excelbanter.com/excel-discussion-misc-queries/)
-   -   delete duplicate entries in a large worksheet (https://www.excelbanter.com/excel-discussion-misc-queries/216995-delete-duplicate-entries-large-worksheet.html)

David

delete duplicate entries in a large worksheet
 
Does anyone know a way to delete rows with duplicate data in a document with
over 60,000 rows? Some data is duplicated as much as 10 times and some as
few as once or not at all.
--
Thanks,
David

Bernard Liengme

delete duplicate entries in a large worksheet
 
Visit Chip at www.cpearson.com.
He has lots of good stuff on the subject of duplicates
best wishes
--
Bernard V Liengme
Microsoft Excel MVP
http://people.stfx.ca/bliengme
remove caps from email

"David" wrote in message
...
Does anyone know a way to delete rows with duplicate data in a document
with
over 60,000 rows? Some data is duplicated as much as 10 times and some as
few as once or not at all.
--
Thanks,
David




xlmate

delete duplicate entries in a large worksheet
 
try using Advance Filter

1. Go to Data on the menu bar
2. Select Filter
3. Chosse Advance Filter
4. check Copy to another location ( to keep the original data )
5. Select the range
6. Select a cell on the same worksheet which is out of the existing data range
7. check the Unique records only
8. click OK

HTH
--
Your feedback is very much appreciate, pls click on the Yes button below if
this posting is helpful.

Thank You

cheers, francis









"David" wrote:

Does anyone know a way to delete rows with duplicate data in a document with
over 60,000 rows? Some data is duplicated as much as 10 times and some as
few as once or not at all.
--
Thanks,
David


Chip Pearson

delete duplicate entries in a large worksheet
 
Try some code like the following:


Sub AAA()

Dim LastRow As Long
Dim TestColumn As String
Dim RowNdx As Long
Dim TopRow As Long
Dim WS As Worksheet
Dim DeleteThese As Range

Set WS = ActiveSheet
TestColumn = "A" '<<<< column to test for duplicates
TopRow = 1 '<<<< top-most row of data to test.

With WS
LastRow = .Cells(.Rows.Count, TestColumn).End(xlUp).Row
For RowNdx = LastRow To TopRow Step -1
If Application.CountIf(.Range(.Cells(TopRow, TestColumn), _
.Cells(RowNdx, TestColumn)), _
.Cells(RowNdx, TestColumn)) 1 Then
If DeleteThese Is Nothing Then
Set DeleteThese = .Rows(RowNdx)
Else
Set DeleteThese = _
Application.Union(DeleteThese, .Rows(RowNdx))
End If
End If
Next RowNdx
End With
If Not DeleteThese Is Nothing Then
DeleteThese.Delete
End If

End Sub


Change TestColumn to the letter of the column that is to be used to
test for duplicates. Change TopRow to a value 1 if your worksheet
has some header rows that should not be tested.

Cordially,
Chip Pearson
Microsoft Most Valuable Professional
Excel Product Group, 1998 - 2009
Pearson Software Consulting, LLC
www.cpearson.com
(email on web site)


On Mon, 19 Jan 2009 05:53:02 -0800, David
wrote:

Does anyone know a way to delete rows with duplicate data in a document with
over 60,000 rows? Some data is duplicated as much as 10 times and some as
few as once or not at all.


JB

delete duplicate entries in a large worksheet
 
For Sort BD:

Sub DeleteDuplicateClassic()
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
[A1].Sort Key1:=Range("A2"), Order1:=xlAscending, Header:=xlGuess
For i = [A65000].End(xlUp).Row To 2 Step -1
If Cells(i, 1) = Cells(i - 1, 1) Then Rows(i).Delete
Next i
Application.Calculation = xlCalculationAutomatic
End Sub

Respect Order:2 s for 10.0000 rows and suppress 5%

Sub RespectOrderDictionary()
Set MonDico = CreateObject("Scripting.Dictionary")
Application.ScreenUpdating = False
i = 2
Do While Cells(i, "A") < ""
If Not MonDico.Exists(Cells(i, "A") & Cells(i, "C")) Then
MonDico.Add Cells(i, "A") & Cells(i, "C"), Cells(i, "A") &
Cells(i, "C")
i = i + 1
Else
Rows(i).EntireRow.Delete
End If
Loop
End Sub

1,17 sec for 10.000 rows and 80% suppress:

Sub DeleteDuplicateQuick()
t = Timer()
Application.ScreenUpdating = False
[A1].Sort Key1:=Range("A2"), Order1:=xlAscending, _
Header:=xlGuess
Columns("b:b").Insert Shift:=xlToRight
[B1] = "ColB"
[B2].FormulaR1C1 = "=IF(RC[-1]=R[-1]C[-1],1,0)"
[B2].AutoFill Destination:=Range("B2:B" & [A65000].End(xlUp).Row)
[B:B].Value = [B:B].Value
[A2].CurrentRegion.Sort Key1:=Range("B2"), Order1:=xlAscending,
Header:=xlGuess
[B:B].Replace What:="1", Replacement:="", LookAt:=xlPart
Range("B2:B65000").SpecialCells(xlCellTypeBlanks). EntireRow.Delete
Columns("b:b").Delete Shift:=xlToLeft
MsgBox Timer() - t
End Sub

http://cjoint.com/?bvpZzDBano

JB
http://boisgontierjacques.free.fr


On 19 jan, 14:53, David wrote:
Does anyone know a way to delete rows with duplicate data in a document with
over 60,000 rows? *Some data is duplicated as much as 10 times and some as
few as once or not at all.
--
Thanks,
David



David

delete duplicate entries in a large worksheet
 
Thanks for the responses everyone. Because I don't know anything about VBA
I'm going to try this option first. One follow up question. Is there a way
to tell how many rows were filtered or how many remain?
--
Thanks,
David


"xlmate" wrote:

try using Advance Filter

1. Go to Data on the menu bar
2. Select Filter
3. Chosse Advance Filter
4. check Copy to another location ( to keep the original data )
5. Select the range
6. Select a cell on the same worksheet which is out of the existing data range
7. check the Unique records only
8. click OK

HTH
--
Your feedback is very much appreciate, pls click on the Yes button below if
this posting is helpful.

Thank You

cheers, francis









"David" wrote:

Does anyone know a way to delete rows with duplicate data in a document with
over 60,000 rows? Some data is duplicated as much as 10 times and some as
few as once or not at all.
--
Thanks,
David


xlmate

delete duplicate entries in a large worksheet
 
Hi David,
You may also try this formula
I am assuming that your range data is in Col A and duplicates can be found
in the same Col. Change the range to yours

=IF(COUNTIF($A$2:A2,A2)1,"TRUE","")

copy this formula and select a cell, let C2, Ctrl-V into the formula bar.
Copy as far down as your data is.
Filter for Blanks in Col C, Excel will show all entries with a blank in Col C
Copy this set of data to a new sheet.

or you can filter for "TRUE" in col C and delete all entries with the word
"TRUE"
either way will give you a set of unique data range.

pls note to save a back up before doing any of the suggestions.

To count the number of entries, use =count(A:A)

HTH
--
Your feedback is very much appreciate, pls click on the Yes button below if
this posting is helpful.

Thank You

cheers, francis



"David" wrote:

Thanks for the responses everyone. Because I don't know anything about VBA
I'm going to try this option first. One follow up question. Is there a way
to tell how many rows were filtered or how many remain?
--
Thanks,
David


"xlmate" wrote:

try using Advance Filter

1. Go to Data on the menu bar
2. Select Filter
3. Chosse Advance Filter
4. check Copy to another location ( to keep the original data )
5. Select the range
6. Select a cell on the same worksheet which is out of the existing data range
7. check the Unique records only
8. click OK

HTH
--
Your feedback is very much appreciate, pls click on the Yes button below if
this posting is helpful.

Thank You

cheers, francis









"David" wrote:

Does anyone know a way to delete rows with duplicate data in a document with
over 60,000 rows? Some data is duplicated as much as 10 times and some as
few as once or not at all.
--
Thanks,
David



All times are GMT +1. The time now is 06:02 AM.

Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com