Home |
Search |
Today's Posts |
#1
Posted to microsoft.public.excel.programming
|
|||
|
|||
Best Method to Remove Data Rows
I have a large spreadsheet 60 columns by up to 20,000 rows. I wish to
delete all rows where a specific value does not appear in one column. In the past I have used a loop that reads all the rows from last to first, deleting the entire row when the value in the column does not meet the criteria. This works great except for large files - where it is very slow. I had thought about sorting the file and then deleting those groups of records where the criteria is not being met but this requires extra code to identifier each part of the file to be removed. Another idea was to use autofilters, where the negated criteria selection allows me to delete all visible rows. Any ideas as to the best and fastest method, without having to trawl through all the data, row by row? -- Cheers Nigel |
#2
Posted to microsoft.public.excel.programming
|
|||
|
|||
Best Method to Remove Data Rows
You might want to try turning off automatic calculations.
Nigel wrote: I have a large spreadsheet 60 columns by up to 20,000 rows. I wish to delete all rows where a specific value does not appear in one column. In the past I have used a loop that reads all the rows from last to first, deleting the entire row when the value in the column does not meet the criteria. This works great except for large files - where it is very slow. I had thought about sorting the file and then deleting those groups of records where the criteria is not being met but this requires extra code to identifier each part of the file to be removed. Another idea was to use autofilters, where the negated criteria selection allows me to delete all visible rows. Any ideas as to the best and fastest method, without having to trawl through all the data, row by row? -- Cheers Nigel |
#3
Posted to microsoft.public.excel.programming
|
|||
|
|||
Best Method to Remove Data Rows
It's a good tip, but I already have that in my existing code. In my
spreadsheet most of the data is passive (very few formulas), but there are some overall counters using subtotal that when switched off speed up the deletions, but only slightly. -- Cheers Nigel "Antchi" wrote in message oups.com... You might want to try turning off automatic calculations. Nigel wrote: I have a large spreadsheet 60 columns by up to 20,000 rows. I wish to delete all rows where a specific value does not appear in one column. In the past I have used a loop that reads all the rows from last to first, deleting the entire row when the value in the column does not meet the criteria. This works great except for large files - where it is very slow. I had thought about sorting the file and then deleting those groups of records where the criteria is not being met but this requires extra code to identifier each part of the file to be removed. Another idea was to use autofilters, where the negated criteria selection allows me to delete all visible rows. Any ideas as to the best and fastest method, without having to trawl through all the data, row by row? -- Cheers Nigel |
#4
Posted to microsoft.public.excel.programming
|
|||
|
|||
Best Method to Remove Data Rows
Filter your data for the rows you want to delete and then run the following code Sub Remove_Data() Dim rng As Range Dim rng2 As Range With ActiveSheet.AutoFilter.Range On Error Resume Next Set rng2 = .Offset(1, 0).Resize(.Rows.Count - 1, 1) _ .SpecialCells(xlCellTypeVisible) On Error GoTo 0 End With If rng2 Is Nothing Then ' Do Nothing Else ' Delete the rows Set rng = ActiveSheet.AutoFilter.Range rng.Offset(1, 0).Resize(rng.Rows.Count - 1).Delete End If ActiveSheet.AutoFilterMode = False End Sub |
#5
Posted to microsoft.public.excel.programming
|
|||
|
|||
Best Method to Remove Data Rows
Nigel,
Both stepping through the rows and using the autofilter method will be slow with large workbooks, as Excel needs to move all that data around every time a row or group of rows is deleted. You could try this, which I have found to be faster: it sorts twice, each time getting rid of the rows above the desired value, so Excel is only deleting two blocks of rows. Note that you didn't say what your criteria for saving was: hence the use of a variant. This assumes your data starts in row 2 with a header in row 1. Give it a try - and I would be interested in hearing if it made much of a difference in your case. HTH, Bernie MS Excel MVP Sub Macro1() Dim myCell As Range Dim myCol As Integer Dim WhatToSave As Variant Dim myCalc As Variant WhatToSave = "Criteria for saving" myCol = 3 With Application .ScreenUpdating = False myCalc = .Calculation .Calculation = xlCalculationManual .EnableEvents = False End With Columns(myCol).Sort Key1:=Cells(2, myCol), Order1:=xlAscending, Header:=xlYes Set myCell = Columns(myCol).Find(What:=WhatToSave, After:=Cells(1, myCol), LookAt:=xlWhole) If myCell.Row < 2 Then Range(Cells(2, myCol), myCell(0, 1)).EntireRow.Delete Columns(myCol).Sort Key1:=Cells(2, myCol), Order1:=xlDescending, Header:=xlYes Set myCell = Columns(myCol).Find(What:=WhatToSave, After:=Cells(1, myCol), LookAt:=xlWhole) If myCell.Row < 2 Then Range(Cells(2, myCol), myCell(0, 1)).EntireRow.Delete With Application .ScreenUpdating = True .Calculation = myCalc .EnableEvents = True End With End Sub "Nigel" wrote in message ... I have a large spreadsheet 60 columns by up to 20,000 rows. I wish to delete all rows where a specific value does not appear in one column. In the past I have used a loop that reads all the rows from last to first, deleting the entire row when the value in the column does not meet the criteria. This works great except for large files - where it is very slow. I had thought about sorting the file and then deleting those groups of records where the criteria is not being met but this requires extra code to identifier each part of the file to be removed. Another idea was to use autofilters, where the negated criteria selection allows me to delete all visible rows. Any ideas as to the best and fastest method, without having to trawl through all the data, row by row? -- Cheers Nigel |
#6
Posted to microsoft.public.excel.programming
|
|||
|
|||
Best Method to Remove Data Rows
Thanks for the code, this is similar to the method I developed that uses a
filter selection and deletes the rest. But although faster than the scan all rows method it is still quite slow with my worksheet.. But it is an improvement! -- Cheers Nigel wrote in message oups.com... Filter your data for the rows you want to delete and then run the following code Sub Remove_Data() Dim rng As Range Dim rng2 As Range With ActiveSheet.AutoFilter.Range On Error Resume Next Set rng2 = .Offset(1, 0).Resize(.Rows.Count - 1, 1) _ .SpecialCells(xlCellTypeVisible) On Error GoTo 0 End With If rng2 Is Nothing Then ' Do Nothing Else ' Delete the rows Set rng = ActiveSheet.AutoFilter.Range rng.Offset(1, 0).Resize(rng.Rows.Count - 1).Delete End If ActiveSheet.AutoFilterMode = False End Sub |
#7
Posted to microsoft.public.excel.programming
|
|||
|
|||
Best Method to Remove Data Rows
Thanks for the tip, I had come to the same conclusion that using sort was
the best and fastest method. What I had not considered was using two sorts to create the two blocks of data, neat trick. This works the fastest of the three methods I have explored. Thanks for the advice -- Cheers Nigel "Bernie Deitrick" <deitbe @ consumer dot org wrote in message ... Nigel, Both stepping through the rows and using the autofilter method will be slow with large workbooks, as Excel needs to move all that data around every time a row or group of rows is deleted. You could try this, which I have found to be faster: it sorts twice, each time getting rid of the rows above the desired value, so Excel is only deleting two blocks of rows. Note that you didn't say what your criteria for saving was: hence the use of a variant. This assumes your data starts in row 2 with a header in row 1. Give it a try - and I would be interested in hearing if it made much of a difference in your case. HTH, Bernie MS Excel MVP Sub Macro1() Dim myCell As Range Dim myCol As Integer Dim WhatToSave As Variant Dim myCalc As Variant WhatToSave = "Criteria for saving" myCol = 3 With Application .ScreenUpdating = False myCalc = .Calculation .Calculation = xlCalculationManual .EnableEvents = False End With Columns(myCol).Sort Key1:=Cells(2, myCol), Order1:=xlAscending, Header:=xlYes Set myCell = Columns(myCol).Find(What:=WhatToSave, After:=Cells(1, myCol), LookAt:=xlWhole) If myCell.Row < 2 Then Range(Cells(2, myCol), myCell(0, 1)).EntireRow.Delete Columns(myCol).Sort Key1:=Cells(2, myCol), Order1:=xlDescending, Header:=xlYes Set myCell = Columns(myCol).Find(What:=WhatToSave, After:=Cells(1, myCol), LookAt:=xlWhole) If myCell.Row < 2 Then Range(Cells(2, myCol), myCell(0, 1)).EntireRow.Delete With Application .ScreenUpdating = True .Calculation = myCalc .EnableEvents = True End With End Sub "Nigel" wrote in message ... I have a large spreadsheet 60 columns by up to 20,000 rows. I wish to delete all rows where a specific value does not appear in one column. In the past I have used a loop that reads all the rows from last to first, deleting the entire row when the value in the column does not meet the criteria. This works great except for large files - where it is very slow. I had thought about sorting the file and then deleting those groups of records where the criteria is not being met but this requires extra code to identifier each part of the file to be removed. Another idea was to use autofilters, where the negated criteria selection allows me to delete all visible rows. Any ideas as to the best and fastest method, without having to trawl through all the data, row by row? -- Cheers Nigel |
#8
Posted to microsoft.public.excel.programming
|
|||
|
|||
Best Method to Remove Data Rows
Nigel,
You're welcome. For any other interested parties, I compared the three methods, using a high precision timing routine and a workbook with 22,222 rows of intermixed values, with 3,333 values that should be kept (meaning about 19,000 rows to be deleted) Double Sorting / deletion took between 0.1 and 0.11 seconds. Filter / deletion took 10.42 seconds, about 100 times longer. Stepping up through the rows took 92.85 seconds, about 1000 times longer. Note that for each macro, I turned off screen updating, events, and set calculation to manual. It is an interesting side note (at least to me) that it was this exact problem that got me interested in programming Excel, almost 10 years ago. In Quattro Pro, stepping up and deleting the rows took almost no time, but with Excel 5, it was go out and get a cup of coffee, eat breakfast, run to the bank, etc..... obviously requiring a different technique. HTH, Bernie MS Excel MVP "Nigel" wrote in message ... Thanks for the tip, I had come to the same conclusion that using sort was the best and fastest method. What I had not considered was using two sorts to create the two blocks of data, neat trick. This works the fastest of the three methods I have explored. Thanks for the advice -- Cheers Nigel "Bernie Deitrick" <deitbe @ consumer dot org wrote in message ... Nigel, Both stepping through the rows and using the autofilter method will be slow with large workbooks, as Excel needs to move all that data around every time a row or group of rows is deleted. You could try this, which I have found to be faster: it sorts twice, each time getting rid of the rows above the desired value, so Excel is only deleting two blocks of rows. Note that you didn't say what your criteria for saving was: hence the use of a variant. This assumes your data starts in row 2 with a header in row 1. Give it a try - and I would be interested in hearing if it made much of a difference in your case. HTH, Bernie MS Excel MVP Sub Macro1() Dim myCell As Range Dim myCol As Integer Dim WhatToSave As Variant Dim myCalc As Variant WhatToSave = "Criteria for saving" myCol = 3 With Application .ScreenUpdating = False myCalc = .Calculation .Calculation = xlCalculationManual .EnableEvents = False End With Columns(myCol).Sort Key1:=Cells(2, myCol), Order1:=xlAscending, Header:=xlYes Set myCell = Columns(myCol).Find(What:=WhatToSave, After:=Cells(1, myCol), LookAt:=xlWhole) If myCell.Row < 2 Then Range(Cells(2, myCol), myCell(0, 1)).EntireRow.Delete Columns(myCol).Sort Key1:=Cells(2, myCol), Order1:=xlDescending, Header:=xlYes Set myCell = Columns(myCol).Find(What:=WhatToSave, After:=Cells(1, myCol), LookAt:=xlWhole) If myCell.Row < 2 Then Range(Cells(2, myCol), myCell(0, 1)).EntireRow.Delete With Application .ScreenUpdating = True .Calculation = myCalc .EnableEvents = True End With End Sub "Nigel" wrote in message ... I have a large spreadsheet 60 columns by up to 20,000 rows. I wish to delete all rows where a specific value does not appear in one column. In the past I have used a loop that reads all the rows from last to first, deleting the entire row when the value in the column does not meet the criteria. This works great except for large files - where it is very slow. I had thought about sorting the file and then deleting those groups of records where the criteria is not being met but this requires extra code to identifier each part of the file to be removed. Another idea was to use autofilters, where the negated criteria selection allows me to delete all visible rows. Any ideas as to the best and fastest method, without having to trawl through all the data, row by row? -- Cheers Nigel |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Forum | |||
remove duplicate rows from specific data in a column; excel 2007 | Excel Worksheet Functions | |||
Remove Blank and Non Data Rows | Excel Discussion (Misc queries) | |||
How to conditionally remove rows with data | Excel Worksheet Functions | |||
Remove data from certain rows and place in new row or spreadsheet | Excel Worksheet Functions | |||
Is there any method to remove... | Excel Programming |