Home |
Search |
Today's Posts |
#1
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
Col A has 360,000 cells.
Col B has 240,000 cells. A B 0000000021957 0000000022002 0000000022002 0000000032002 0000000031957 0000000032003 0000000032002 0000000042002 0000000032003 0000000052002 0000000042002 0000000052003 0000000052002 0000000062002 0000000052003 0000000102002 0000000061967 0000000121996 0000000061968 0000000142002 0000000062002 0000000152002 0000000081963 0000000162002 0000000102002 0000000481994 0000000121996 0000000481995 0000000142002 0000000481996 0000000152002 0000000481997 0000000162002 0000000481998 0000000341991 0000000481999 0000000401961 0000000482000 How can I delete the cells in Col A whose contents match cells in Col B? |
#2
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
One method.
Insert a column left of Column A Insert a row at top Add titles in A1:C1..........will need for filtering In A2 enter =COUNTIF($C$2:$C$20,B2)<0 D-click on fill handle to copy down to bottom of Column B Select Columns A and B only DataFilterAutofilter. Filter for True on Column A Select from A2 to bottom of Column B. F5SpecialVisible CellsOK EditClearContents Remove Filter. Select A and B then F5SpecialBlanksOK EditDeleteShift Cells Up Done Gord On Sat, 14 Jan 2012 09:26:02 -0800 (PST), gary wrote: Col A has 360,000 cells. Col B has 240,000 cells. A B 0000000021957 0000000022002 0000000022002 0000000032002 0000000031957 0000000032003 0000000032002 0000000042002 0000000032003 0000000052002 0000000042002 0000000052003 0000000052002 0000000062002 0000000052003 0000000102002 0000000061967 0000000121996 0000000061968 0000000142002 0000000062002 0000000152002 0000000081963 0000000162002 0000000102002 0000000481994 0000000121996 0000000481995 0000000142002 0000000481996 0000000152002 0000000481997 0000000162002 0000000481998 0000000341991 0000000481999 0000000401961 0000000482000 How can I delete the cells in Col A whose contents match cells in Col B? |
#3
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
You posted in the Programming group so I guess VBA is in order.
Sub test() For Each cell In Range("A1:A360000") If WorksheetFunction.CountIf(Range("B1:B240000"), _ cell.Value) < 0 Then cell.ClearContents End If Next End Sub You can get rid of the blanks after running. Gord On Sat, 14 Jan 2012 11:03:53 -0800, Gord Dibben wrote: One method. Insert a column left of Column A Insert a row at top Add titles in A1:C1..........will need for filtering In A2 enter =COUNTIF($C$2:$C$20,B2)<0 D-click on fill handle to copy down to bottom of Column B Select Columns A and B only DataFilterAutofilter. Filter for True on Column A Select from A2 to bottom of Column B. F5SpecialVisible CellsOK EditClearContents Remove Filter. Select A and B then F5SpecialBlanksOK EditDeleteShift Cells Up Done Gord On Sat, 14 Jan 2012 09:26:02 -0800 (PST), gary wrote: Col A has 360,000 cells. Col B has 240,000 cells. A B 0000000021957 0000000022002 0000000022002 0000000032002 0000000031957 0000000032003 0000000032002 0000000042002 0000000032003 0000000052002 0000000042002 0000000052003 0000000052002 0000000062002 0000000052003 0000000102002 0000000061967 0000000121996 0000000061968 0000000142002 0000000062002 0000000152002 0000000081963 0000000162002 0000000102002 0000000481994 0000000121996 0000000481995 0000000142002 0000000481996 0000000152002 0000000481997 0000000162002 0000000481998 0000000341991 0000000481999 0000000401961 0000000482000 How can I delete the cells in Col A whose contents match cells in Col B? |
#4
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
On Jan 14, 11:19*am, Gord Dibben wrote:
You posted in the Programming group so I guess VBA is in order. Sub test() For Each cell In Range("A1:A360000") If WorksheetFunction.CountIf(Range("B1:B240000"), _ cell.Value) < 0 *Then cell.ClearContents End If Next End Sub You can get rid of the blanks after running. Gord On Sat, 14 Jan 2012 11:03:53 -0800, Gord Dibben wrote: One method. Insert a column left of Column A Insert a row at top Add titles in A1:C1..........will need for filtering In A2 enter * =COUNTIF($C$2:$C$20,B2)<0 D-click on fill handle to copy down to bottom of Column B Select Columns A and B only DataFilterAutofilter. Filter for True on Column A Select from A2 to bottom of Column B. F5SpecialVisible CellsOK EditClearContents Remove Filter. Select A and B then F5SpecialBlanksOK EditDeleteShift Cells Up Done Gord On Sat, 14 Jan 2012 09:26:02 -0800 (PST), gary wrote: Col A has 360,000 cells. Col B has 240,000 cells. * * * * * * A * * * * * * * * * * * * * * * * * *B 0000000021957 * 0000000022002 0000000022002 * 0000000032002 0000000031957 * 0000000032003 0000000032002 * 0000000042002 0000000032003 * 0000000052002 0000000042002 * 0000000052003 0000000052002 * 0000000062002 0000000052003 * 0000000102002 0000000061967 * 0000000121996 0000000061968 * 0000000142002 0000000062002 * 0000000152002 0000000081963 * 0000000162002 0000000102002 * 0000000481994 0000000121996 * 0000000481995 0000000142002 * 0000000481996 0000000152002 * 0000000481997 0000000162002 * 0000000481998 0000000341991 * 0000000481999 0000000401961 * 0000000482000 How can I delete the cells in Col A whose contents match cells in Col B?- Hide quoted text - - Show quoted text - Hi Gord, How long should your macro run? (It's been running for more than 2 hours). Is there any way to determine its progress? |
#5
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
On Sat, 14 Jan 2012 09:26:02 -0800 (PST), gary wrote:
Col A has 360,000 cells. Col B has 240,000 cells. A B 0000000021957 0000000022002 0000000022002 0000000032002 0000000031957 0000000032003 0000000032002 0000000042002 0000000032003 0000000052002 0000000042002 0000000052003 0000000052002 0000000062002 0000000052003 0000000102002 0000000061967 0000000121996 0000000061968 0000000142002 0000000062002 0000000152002 0000000081963 0000000162002 0000000102002 0000000481994 0000000121996 0000000481995 0000000142002 0000000481996 0000000152002 0000000481997 0000000162002 0000000481998 0000000341991 0000000481999 0000000401961 0000000482000 How can I delete the cells in Col A whose contents match cells in Col B? Here's another macro, using the AdvancedFilter. Please do this on a copy of your data. You will need to set ws to the proper worksheet. I used Sheet2. ============================= Option Explicit Sub PruneColA() Dim ws As Worksheet Dim rColA As Range, rColB As Range Dim c As Range Dim rCrit As Range Dim i As Long Dim v As Variant Set ws = Worksheets("Sheet2") With ws Set rColA = Range(.Cells(1, "A"), .Cells(.Rows.Count, "A").End(xlUp)) Set rColB = Range(.Cells(1, "B"), .Cells(.Rows.Count, "B").End(xlUp)) Set rCrit = .UsedRange.Resize(rowsize:=2, columnsize:=1).Offset _ (columnoffset:=.UsedRange.Columns.Count + 2) End With Application.ScreenUpdating = False rCrit(1).ClearContents rCrit(2) = "=countif(" & rColB.Address & "," & rColA(2).Address(False, False) & ")0" With rColA .AdvancedFilter Action:=xlFilterInPlace, criteriarange:=rCrit End With rCrit.EntireColumn.Delete On Error Resume Next rColA.Offset(rowoffset:=1).Resize(rowsize:=rColA.R ows.Count - 1) _ .SpecialCells(xlCellTypeVisible).ClearContents On Error GoTo 0 i = 0 ReDim v(1 To WorksheetFunction.CountA(rColA)) For Each c In rColA c.EntireRow.RowHeight = 15 If c.Value < "" Then i = i + 1 v(i) = c.Text End If Next c rColA.ClearContents Set rColA = rColA.Resize(rowsize:=UBound(v)) rColA = WorksheetFunction.Transpose(v) Application.ScreenUpdating = True End Sub =================================== |
#6
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
I have no idea how long it should take but 2 hours+ sounds a little
much. I ran it on about 100 rows which is somewhat smaller than the range you have. Took a second. See Ron's macro..........proably much faster than a loop. Did you try the manual method? Gord On Sat, 14 Jan 2012 14:19:10 -0800 (PST), gary wrote: On Jan 14, 11:19*am, Gord Dibben wrote: You posted in the Programming group so I guess VBA is in order. Sub test() For Each cell In Range("A1:A360000") If WorksheetFunction.CountIf(Range("B1:B240000"), _ cell.Value) < 0 *Then cell.ClearContents End If Next End Sub You can get rid of the blanks after running. Gord On Sat, 14 Jan 2012 11:03:53 -0800, Gord Dibben wrote: One method. Insert a column left of Column A Insert a row at top Add titles in A1:C1..........will need for filtering In A2 enter * =COUNTIF($C$2:$C$20,B2)<0 D-click on fill handle to copy down to bottom of Column B Select Columns A and B only DataFilterAutofilter. Filter for True on Column A Select from A2 to bottom of Column B. F5SpecialVisible CellsOK EditClearContents Remove Filter. Select A and B then F5SpecialBlanksOK EditDeleteShift Cells Up Done Gord On Sat, 14 Jan 2012 09:26:02 -0800 (PST), gary wrote: Col A has 360,000 cells. Col B has 240,000 cells. * * * * * * A * * * * * * * * * * * * * * * * * *B 0000000021957 * 0000000022002 0000000022002 * 0000000032002 0000000031957 * 0000000032003 0000000032002 * 0000000042002 0000000032003 * 0000000052002 0000000042002 * 0000000052003 0000000052002 * 0000000062002 0000000052003 * 0000000102002 0000000061967 * 0000000121996 0000000061968 * 0000000142002 0000000062002 * 0000000152002 0000000081963 * 0000000162002 0000000102002 * 0000000481994 0000000121996 * 0000000481995 0000000142002 * 0000000481996 0000000152002 * 0000000481997 0000000162002 * 0000000481998 0000000341991 * 0000000481999 0000000401961 * 0000000482000 How can I delete the cells in Col A whose contents match cells in Col B?- Hide quoted text - - Show quoted text - Hi Gord, How long should your macro run? (It's been running for more than 2 hours). Is there any way to determine its progress? |
#7
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
On Jan 14, 2:31*pm, Ron Rosenfeld wrote:
On Sat, 14 Jan 2012 09:26:02 -0800 (PST), gary wrote: Col A has 360,000 cells. Col B has 240,000 cells. * * * * * * A * * * * * * * * * * * * * * * * * *B 0000000021957 * * * 0000000022002 0000000022002 * * * 0000000032002 0000000031957 * * * 0000000032003 0000000032002 * * * 0000000042002 0000000032003 * * * 0000000052002 0000000042002 * * * 0000000052003 0000000052002 * * * 0000000062002 0000000052003 * * * 0000000102002 0000000061967 * * * 0000000121996 0000000061968 * * * 0000000142002 0000000062002 * * * 0000000152002 0000000081963 * * * 0000000162002 0000000102002 * * * 0000000481994 0000000121996 * * * 0000000481995 0000000142002 * * * 0000000481996 0000000152002 * * * 0000000481997 0000000162002 * * * 0000000481998 0000000341991 * * * 0000000481999 0000000401961 * * * 0000000482000 How can I delete the cells in Col A whose contents match cells in Col B? Here's another macro, using the AdvancedFilter. Please do this on a copy of your data. You will need to set ws to the proper worksheet. *I used Sheet2. ============================= Option Explicit Sub PruneColA() * * Dim ws As Worksheet * * Dim rColA As Range, rColB As Range * * Dim c As Range * * Dim rCrit As Range * * Dim i As Long * * Dim v As Variant Set ws = Worksheets("Sheet2") With ws * * Set rColA = Range(.Cells(1, "A"), .Cells(.Rows.Count, "A").End(xlUp)) * * Set rColB = Range(.Cells(1, "B"), .Cells(.Rows.Count, "B").End(xlUp)) * * Set rCrit = .UsedRange.Resize(rowsize:=2, columnsize:=1).Offset _ * * * * (columnoffset:=.UsedRange.Columns.Count + 2) End With Application.ScreenUpdating = False rCrit(1).ClearContents rCrit(2) = "=countif(" & rColB.Address & "," & rColA(2).Address(False, False) & ")0" With rColA * * .AdvancedFilter Action:=xlFilterInPlace, criteriarange:=rCrit End With rCrit.EntireColumn.Delete On Error Resume Next rColA.Offset(rowoffset:=1).Resize(rowsize:=rColA.R ows.Count - 1) _ * * .SpecialCells(xlCellTypeVisible).ClearContents On Error GoTo 0 i = 0 ReDim v(1 To WorksheetFunction.CountA(rColA)) For Each c In rColA * * c.EntireRow.RowHeight = 15 * * If c.Value < "" Then * * * * i = i + 1 * * * * v(i) = c.Text * * End If Next c rColA.ClearContents Set rColA = rColA.Resize(rowsize:=UBound(v)) rColA = WorksheetFunction.Transpose(v) Application.ScreenUpdating = True End Sub ===================================- Hide quoted text - - Show quoted text - How long should your macro run? Is there any way to determine its progress? |
#8
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
On Jan 14, 3:01*pm, Gord Dibben wrote:
I have no idea how long it should take but 2 hours+ sounds a little much. I ran it on about 100 rows which is somewhat smaller than the range you have. *Took a second. See Ron's macro..........proably much faster than a loop. Did you try the manual method? Gord On Sat, 14 Jan 2012 14:19:10 -0800 (PST), gary wrote: On Jan 14, 11:19*am, Gord Dibben wrote: You posted in the Programming group so I guess VBA is in order. Sub test() For Each cell In Range("A1:A360000") If WorksheetFunction.CountIf(Range("B1:B240000"), _ cell.Value) < 0 *Then cell.ClearContents End If Next End Sub You can get rid of the blanks after running. Gord On Sat, 14 Jan 2012 11:03:53 -0800, Gord Dibben wrote: One method. Insert a column left of Column A Insert a row at top Add titles in A1:C1..........will need for filtering In A2 enter * =COUNTIF($C$2:$C$20,B2)<0 D-click on fill handle to copy down to bottom of Column B Select Columns A and B only DataFilterAutofilter. Filter for True on Column A Select from A2 to bottom of Column B. F5SpecialVisible CellsOK EditClearContents Remove Filter. Select A and B then F5SpecialBlanksOK EditDeleteShift Cells Up Done Gord On Sat, 14 Jan 2012 09:26:02 -0800 (PST), gary wrote: Col A has 360,000 cells. Col B has 240,000 cells. * * * * * * A * * * * * * * * * * * * * * * * * *B 0000000021957 * 0000000022002 0000000022002 * 0000000032002 0000000031957 * 0000000032003 0000000032002 * 0000000042002 0000000032003 * 0000000052002 0000000042002 * 0000000052003 0000000052002 * 0000000062002 0000000052003 * 0000000102002 0000000061967 * 0000000121996 0000000061968 * 0000000142002 0000000062002 * 0000000152002 0000000081963 * 0000000162002 0000000102002 * 0000000481994 0000000121996 * 0000000481995 0000000142002 * 0000000481996 0000000152002 * 0000000481997 0000000162002 * 0000000481998 0000000341991 * 0000000481999 0000000401961 * 0000000482000 How can I delete the cells in Col A whose contents match cells in Col B?- Hide quoted text - - Show quoted text - Hi Gord, How long should your macro run? *(It's been running for more than 2 hours). *Is there any way to determine its progress?- Hide quoted text - - Show quoted text - In your =Count formula, I changed $C$20 to $C$239820 (which is the number of cells in my spreadsheet) When copying that formula down to the bottom of Col B, it's already taken 30 minutes and "Calculating" is at 4%.. .. |
#9
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
On Jan 14, 2:31*pm, Ron Rosenfeld wrote:
On Sat, 14 Jan 2012 09:26:02 -0800 (PST), gary wrote: Col A has 360,000 cells. Col B has 240,000 cells. * * * * * * A * * * * * * * * * * * * * * * * * *B 0000000021957 * * * 0000000022002 0000000022002 * * * 0000000032002 0000000031957 * * * 0000000032003 0000000032002 * * * 0000000042002 0000000032003 * * * 0000000052002 0000000042002 * * * 0000000052003 0000000052002 * * * 0000000062002 0000000052003 * * * 0000000102002 0000000061967 * * * 0000000121996 0000000061968 * * * 0000000142002 0000000062002 * * * 0000000152002 0000000081963 * * * 0000000162002 0000000102002 * * * 0000000481994 0000000121996 * * * 0000000481995 0000000142002 * * * 0000000481996 0000000152002 * * * 0000000481997 0000000162002 * * * 0000000481998 0000000341991 * * * 0000000481999 0000000401961 * * * 0000000482000 How can I delete the cells in Col A whose contents match cells in Col B? Here's another macro, using the AdvancedFilter. Please do this on a copy of your data. You will need to set ws to the proper worksheet. *I used Sheet2. ============================= Option Explicit Sub PruneColA() * * Dim ws As Worksheet * * Dim rColA As Range, rColB As Range * * Dim c As Range * * Dim rCrit As Range * * Dim i As Long * * Dim v As Variant Set ws = Worksheets("Sheet2") With ws * * Set rColA = Range(.Cells(1, "A"), .Cells(.Rows.Count, "A").End(xlUp)) * * Set rColB = Range(.Cells(1, "B"), .Cells(.Rows.Count, "B").End(xlUp)) * * Set rCrit = .UsedRange.Resize(rowsize:=2, columnsize:=1).Offset _ * * * * (columnoffset:=.UsedRange.Columns.Count + 2) End With Application.ScreenUpdating = False rCrit(1).ClearContents rCrit(2) = "=countif(" & rColB.Address & "," & rColA(2).Address(False, False) & ")0" With rColA * * .AdvancedFilter Action:=xlFilterInPlace, criteriarange:=rCrit End With rCrit.EntireColumn.Delete On Error Resume Next rColA.Offset(rowoffset:=1).Resize(rowsize:=rColA.R ows.Count - 1) _ * * .SpecialCells(xlCellTypeVisible).ClearContents On Error GoTo 0 i = 0 ReDim v(1 To WorksheetFunction.CountA(rColA)) For Each c In rColA * * c.EntireRow.RowHeight = 15 * * If c.Value < "" Then * * * * i = i + 1 * * * * v(i) = c.Text * * End If Next c rColA.ClearContents Set rColA = rColA.Resize(rowsize:=UBound(v)) rColA = WorksheetFunction.Transpose(v) Application.ScreenUpdating = True End Sub ===================================- Hide quoted text - - Show quoted text - Your macro (using the Advanced Filter) is getting Run-time Error '1004' of "AdvancedFilter method of Range clsss failed". Note: I've set ws = Worksheets("Sheet1") |
#10
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
One way...
Option Explicit Sub StripDupes() Dim i&, j&, lRows1&, lRows2&, lLastRow& 'as long Dim vRng As Variant lRows1 = Cells(Rows.Count, "A").End(xlUp).Row lRows2 = Cells(Rows.Count, "B").End(xlUp).Row If lRows1 lRows2 Then lLastRow = lRows1 Else lLastRow = lRows2 vRng = Range("A1:B" & lLastRow) For i = UBound(vRng) To 1 Step -1 If Not vRng(i, 1) = "" Then For j = 1 To UBound(vRng) If vRng(i, 1) = vRng(j, 2) Then Cells(i, 1).Delete shift:=xlUp: Exit For End If Next 'j End If Next 'i End Sub -- Garry Free usenet access at http://www.eternal-september.org ClassicVB Users Regroup! comp.lang.basic.visual.misc |
#11
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
You can speed it up by turning ScreenUpdating off...
One way... Option Explicit Sub StripDupes() Dim i&, j&, lRows1&, lRows2&, lLastRow& 'as long Dim vRng As Variant lRows1 = Cells(Rows.Count, "A").End(xlUp).Row lRows2 = Cells(Rows.Count, "B").End(xlUp).Row If lRows1 lRows2 Then lLastRow = lRows1 Else lLastRow = lRows2 vRng = Range("A1:B" & lLastRow) Application.ScreenUpdating = False For i = UBound(vRng) To 1 Step -1 If Not vRng(i, 1) = "" Then For j = 1 To UBound(vRng) If vRng(i, 1) = vRng(j, 2) Then Cells(i, 1).Delete shift:=xlUp: Exit For End If Next 'j End If Next 'i Application.ScreenUpdating = True End Sub -- Garry Free usenet access at http://www.eternal-september.org ClassicVB Users Regroup! comp.lang.basic.visual.misc |
#12
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
On Sat, 14 Jan 2012 15:15:17 -0800 (PST), gary wrote:
How long should your macro run? Is there any way to determine its progress? It is not possible to say how long the macro would take to run as that is dependent not only on the code, but also on the characteristics of your particular machine and environment. I rewrote the macro in a manner which should significantly improve the speed. The re-written macro, run against the data that you provided which I copied and repeated down to 600,000 rows, took about five minutes to run. However, I only disabled screen updating as there is nothing else on that worksheet of mine. If you have calculations which refer to the cells that are being altered, event triggered macros, and other calls on the resources, that can slow things down. All of those things can be disabled if they are an issue, but for now let's see if we can't get something running. The way it has been rewritten, there is no way to determine its progress. The implementation of AdvancedFilter is an Excel feature, not VBA. Although it should execute much more quickly than looping through the rows of cells, I don't know of any way to monitor its progress. However, there are issues with speed, function, and the AdvancedFilter in Excel especially when dealing with large data bases. I have developed a method which I think should run MUCH more quickly, but it is hard to test on a small database. It does assume that there are no duplicates in Column A, or if there are that you only want to display unique values. Is that a valid assumption? The method needs a little refinement but with your database duplicated down to about 500,000 rows, it runs in less than five seconds. Of course, there are only seven entries in ColA that do not appear in Colb. I have not idea how it would run with a different data set. But try it and let me know. Also, with this method, it would be possible to keep track of where it is. Note the comment at the beginning about setting a reference. It will NOT run if that reference isn't set. If this will be distributed, we can use late binding, but not tonight. =================================== Option Explicit Sub PruneColA() 'Requires setting reference (tools/references) to ' Microsoft Scripting Runtime Dim ws As Worksheet Dim rColA As Range, rColB As Range Dim vColA As Variant, vColB As Variant Dim dColA As Dictionary, dColB As Dictionary Dim i As Long Dim d As Variant Set dColA = New Dictionary Set dColB = New Dictionary Set ws = Worksheets("Sheet2") With ws Set rColA = Range(.Cells(1, "A"), .Cells(.Rows.Count, "A").End(xlUp)) Set rColB = Range(.Cells(1, "B"), .Cells(.Rows.Count, "B").End(xlUp)) End With vColB = rColB vColA = rColA For i = LBound(vColB, 1) + 1 To UBound(vColB, 1) With dColB If Not .Exists(Key:=vColB(i, 1)) Then .Add Key:=vColB(i, 1), Item:=vColB(i, 1) End With Next i For i = LBound(vColA, 1) + 1 To UBound(vColA, 1) If Not dColB.Exists(Key:=vColA(i, 1)) Then With dColA If Not .Exists(Key:=vColA(i, 1)) Then .Add Key:=vColA(i, 1), Item:=vColA(i, 1) End With End If Next i ReDim vColA(1 To dColA.Count) i = 0 For Each d In dColA i = i + 1 vColA(i) = dColA(d) Next d rColA.Offset(rowoffset:=1).ClearContents Set rColA = rColA.Resize(rowsize:=dColA.Count).Offset(rowoffse t:=1) rColA = WorksheetFunction.Transpose(vColA) End Sub +++++++++++++++++++++++++++++++ |
#13
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
On Jan 14, 8:52*pm, Ron Rosenfeld wrote:
On Sat, 14 Jan 2012 15:15:17 -0800 (PST), gary wrote: How long should your macro run? Is there any way to determine its progress? It is not possible to say how long the macro would take to run as that is dependent not only on the code, but also on the characteristics of your particular machine and environment. I rewrote the macro in a manner which should significantly improve the speed. *The re-written macro, run against the data that you provided which I copied and repeated down to 600,000 rows, took about five minutes to run. However, I only disabled screen updating as there is nothing else on that worksheet of mine. *If you have calculations which refer to the cells that are being altered, event triggered macros, and other calls on the resources, that can slow things down. *All of those things can be disabled if they are an issue, but for now let's see if we can't get something running. The way it has been rewritten, there is no way to determine its progress. *The implementation of AdvancedFilter is an Excel feature, not VBA. *Although it should execute much more quickly than looping through the rows of cells, I don't know of any way to monitor its progress. However, there are issues with speed, function, and the AdvancedFilter in Excel especially when dealing with large data bases. *I have developed a method which I think should run MUCH more quickly, but it is hard to test on a small database. *It does assume that there are no duplicates in Column A, or if there are that you only want to display unique values. *Is that a valid assumption? The method needs a little refinement but with your database duplicated down to about 500,000 rows, it runs in less than five seconds. *Of course, there are only seven entries in ColA that do not appear in Colb. *I have not idea how it would run with a different data set. *But try it and let me know. Also, with this method, it would be possible to keep track of where it is.. Note the comment at the beginning about setting a reference. *It will NOT run if that reference isn't set. *If this will be distributed, we can use late binding, but not tonight. =================================== Option Explicit Sub PruneColA() 'Requires setting reference (tools/references) to ' *Microsoft Scripting Runtime * * Dim ws As Worksheet * * Dim rColA As Range, rColB As Range * * Dim vColA As Variant, vColB As Variant * * Dim dColA As Dictionary, dColB As Dictionary * * Dim i As Long * * Dim d As Variant Set dColA = New Dictionary Set dColB = New Dictionary Set ws = Worksheets("Sheet2") With ws * * Set rColA = Range(.Cells(1, "A"), .Cells(.Rows.Count, "A").End(xlUp)) * * Set rColB = Range(.Cells(1, "B"), .Cells(.Rows.Count, "B").End(xlUp)) End With vColB = rColB vColA = rColA For i = LBound(vColB, 1) + 1 To UBound(vColB, 1) * * With dColB * * * * If Not .Exists(Key:=vColB(i, 1)) Then .Add Key:=vColB(i, 1), Item:=vColB(i, 1) * * End With Next i For i = LBound(vColA, 1) + 1 To UBound(vColA, 1) * * If Not dColB.Exists(Key:=vColA(i, 1)) Then * * * * With dColA * * * * * * If Not .Exists(Key:=vColA(i, 1)) Then .Add Key:=vColA(i, 1), Item:=vColA(i, 1) * * * * End With * * End If Next i ReDim vColA(1 To dColA.Count) i = 0 For Each d In dColA * * i = i + 1 * * vColA(i) = dColA(d) Next d rColA.Offset(rowoffset:=1).ClearContents Set rColA = rColA.Resize(rowsize:=dColA.Count).Offset(rowoffse t:=1) rColA = WorksheetFunction.Transpose(vColA) End Sub +++++++++++++++++++++++++++++++ Run-time Error '13' Type Mismatch in rColA = WorksheetFunction.Transpose(vColA) |
#14
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
On Jan 14, 8:52*pm, Ron Rosenfeld wrote:
On Sat, 14 Jan 2012 15:15:17 -0800 (PST), gary wrote: How long should your macro run? Is there any way to determine its progress? It is not possible to say how long the macro would take to run as that is dependent not only on the code, but also on the characteristics of your particular machine and environment. I rewrote the macro in a manner which should significantly improve the speed. *The re-written macro, run against the data that you provided which I copied and repeated down to 600,000 rows, took about five minutes to run. However, I only disabled screen updating as there is nothing else on that worksheet of mine. *If you have calculations which refer to the cells that are being altered, event triggered macros, and other calls on the resources, that can slow things down. *All of those things can be disabled if they are an issue, but for now let's see if we can't get something running. The way it has been rewritten, there is no way to determine its progress. *The implementation of AdvancedFilter is an Excel feature, not VBA. *Although it should execute much more quickly than looping through the rows of cells, I don't know of any way to monitor its progress. However, there are issues with speed, function, and the AdvancedFilter in Excel especially when dealing with large data bases. *I have developed a method which I think should run MUCH more quickly, but it is hard to test on a small database. *It does assume that there are no duplicates in Column A, or if there are that you only want to display unique values. *Is that a valid assumption? The method needs a little refinement but with your database duplicated down to about 500,000 rows, it runs in less than five seconds. *Of course, there are only seven entries in ColA that do not appear in Colb. *I have not idea how it would run with a different data set. *But try it and let me know. Also, with this method, it would be possible to keep track of where it is.. Note the comment at the beginning about setting a reference. *It will NOT run if that reference isn't set. *If this will be distributed, we can use late binding, but not tonight. =================================== Option Explicit Sub PruneColA() 'Requires setting reference (tools/references) to ' *Microsoft Scripting Runtime * * Dim ws As Worksheet * * Dim rColA As Range, rColB As Range * * Dim vColA As Variant, vColB As Variant * * Dim dColA As Dictionary, dColB As Dictionary * * Dim i As Long * * Dim d As Variant Set dColA = New Dictionary Set dColB = New Dictionary Set ws = Worksheets("Sheet2") With ws * * Set rColA = Range(.Cells(1, "A"), .Cells(.Rows.Count, "A").End(xlUp)) * * Set rColB = Range(.Cells(1, "B"), .Cells(.Rows.Count, "B").End(xlUp)) End With vColB = rColB vColA = rColA For i = LBound(vColB, 1) + 1 To UBound(vColB, 1) * * With dColB * * * * If Not .Exists(Key:=vColB(i, 1)) Then .Add Key:=vColB(i, 1), Item:=vColB(i, 1) * * End With Next i For i = LBound(vColA, 1) + 1 To UBound(vColA, 1) * * If Not dColB.Exists(Key:=vColA(i, 1)) Then * * * * With dColA * * * * * * If Not .Exists(Key:=vColA(i, 1)) Then .Add Key:=vColA(i, 1), Item:=vColA(i, 1) * * * * End With * * End If Next i ReDim vColA(1 To dColA.Count) i = 0 For Each d In dColA * * i = i + 1 * * vColA(i) = dColA(d) Next d rColA.Offset(rowoffset:=1).ClearContents Set rColA = rColA.Resize(rowsize:=dColA.Count).Offset(rowoffse t:=1) rColA = WorksheetFunction.Transpose(vColA) End Sub +++++++++++++++++++++++++++++++ I have deleted the duplicates in col A and in Col B. When I re-ran your macro, I still got the Run-time Error '13' Type Mismatch in rColA = WorksheetFunction.Transpose(vColA) |
#15
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
On Sat, 14 Jan 2012 22:08:27 -0800 (PST), gary wrote:
I have deleted the duplicates in col A and in Col B. When I re-ran your macro, I still got the Run-time Error '13' Type Mismatch in rColA = WorksheetFunction.Transpose(vColA) The presence or absence of duplicates is irrelevant to this error. I cannot be sure, but in older versions of Excel (prior to 2003) there were limits in the size of an array that you could use worksheetfunction.transpose. I thought it had been removed, but perhaps not. How long did the macro run before hitting that error? Because that's at the end and the next step would be quick. Try this variation where transposing is not required: ================================== Option Explicit Sub PruneColA() 'Requires setting reference (tools/references) to ' Microsoft Scripting Runtime Dim ws As Worksheet Dim rColA As Range, rColB As Range Dim vColA As Variant, vColB As Variant Dim dColA As Dictionary, dColB As Dictionary Dim i As Long Dim d As Variant Set dColA = New Dictionary Set dColB = New Dictionary Set ws = Worksheets("Sheet2") With ws Set rColA = Range(.Cells(1, "A"), .Cells(.Rows.Count, "A").End(xlUp)) Set rColB = Range(.Cells(1, "B"), .Cells(.Rows.Count, "B").End(xlUp)) End With vColB = rColB vColA = rColA For i = LBound(vColB, 1) + 1 To UBound(vColB, 1) With dColB If Not .Exists(Key:=vColB(i, 1)) Then .Add Key:=vColB(i, 1), Item:=vColB(i, 1) End With Next i For i = LBound(vColA, 1) + 1 To UBound(vColA, 1) If Not dColB.Exists(Key:=vColA(i, 1)) Then With dColA If Not .Exists(Key:=vColA(i, 1)) Then .Add Key:=vColA(i, 1), Item:=vColA(i, 1) End With End If Next i ReDim vColA(1 To dColA.Count, 1 To 1) i = 0 For Each d In dColA i = i + 1 vColA(i, 1) = dColA(d) Next d rColA.Offset(rowoffset:=1).ClearContents Set rColA = rColA.Resize(rowsize:=dColA.Count).Offset(rowoffse t:=1) rColA = vColA End Sub =================================== |
#16
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
On Sat, 14 Jan 2012 19:33:33 -0800 (PST), gary wrote:
Your macro (using the Advanced Filter) is getting Run-time Error '1004' of "AdvancedFilter method of Range clsss failed". Note: I've set ws = Worksheets("Sheet1") As I wrote, previously, there are problems with AdvancedFilter in very large databases in certain environments. This may or may not have been resolved in Excel 2010 Also see my latest posting from today. |
#17
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
On Sat, 14 Jan 2012 22:08:27 -0800 (PST), gary wrote:
I have deleted the duplicates in col A and in Col B. When I re-ran your macro, I still got the Run-time Error '13' Type Mismatch in rColA = WorksheetFunction.Transpose(vColA) Gary, If the last version I provided, which does NOT use TRANSPOSE, doesn't work, we will have to transfer the data to ColA using a loop. It may be that we are running into one of the limitations of VBA and Excel. |
#18
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
On Sun, 15 Jan 2012 07:54:14 -0500, Ron Rosenfeld wrote:
Gary, If the last version I provided, which does NOT use TRANSPOSE, doesn't work, we will have to transfer the data to ColA using a loop. It may be that we are running into one of the limitations of VBA and Excel. Well, in Excel 2007 I just made up a test case of 500,000 entries in col a and col b. There were about 238,000 in col a that were not in col b and the last version worked fine and executed in about 1 minute. |
#19
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
On Sun, 15 Jan 2012 07:39:25 -0500, Ron Rosenfeld wrote:
I cannot be sure, but in older versions of Excel (prior to 2003) there were limits in the size of an array that you could use worksheetfunction.transpose. I thought it had been removed, but perhaps not. Empirically testing this theory in my Excel 2007 reveals that worksheetfunction.transpose works OK with 2^16 elements, but returns the type mismatch error with 2^17 elements, so that is probably why you ran into that error. Avoiding transpose, as I did, should fix it (unless we run into a different limit). |
#20
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
On Jan 15, 5:43*am, Ron Rosenfeld wrote:
On Sun, 15 Jan 2012 07:39:25 -0500, Ron Rosenfeld wrote: I cannot be sure, but in older versions of Excel (prior to 2003) there were limits in the size of an array that you could use worksheetfunction.transpose. *I thought it had been removed, but perhaps not. Empirically testing this theory in my Excel 2007 reveals that worksheetfunction.transpose works OK with 2^16 elements, but returns the type mismatch error with 2^17 elements, so that is probably why you ran into that error. *Avoiding transpose, as I did, should fix it (unless we run into a different limit). Hi Ron, I ran without transpose and it finished in about 1 minute! In Col A, a cell contains 0507811951990 That value is not in Col B. Your macro is displaying it as 5.07811E+11 How can I get it to be displayed as 0507811951990 ? |
#21
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
Hi Ron,
My spreadsheet has: A B 0000000021957 0000000022002 0000000022002 0000000032002 0000000031957 0000000032003 0000000032002 0000000042002 0000000032003 0000000052002 0000000042002 0000000052003 0000000052002 0000000062002 0000000052003 0000000102002 0000000061967 0000000121996 0000000061968 0000000142002 0000000062002 0000000152002 0000000081963 0000000162002 Your macro (without Transpose) returns this: 0000000021957 22002 31957 61967 61968 81963 341991 401961 431978 482010 482011 In my spreadsheet: A2 contains 0000000022002 B1 contains 0000000022002 But your macro results contains 22002 Why are the leading zeroes being dropped? |
#22
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
On Sun, 15 Jan 2012 07:44:41 -0800 (PST), gary wrote:
Hi Ron, My spreadsheet has: A B 0000000021957 0000000022002 0000000022002 0000000032002 0000000031957 0000000032003 0000000032002 0000000042002 0000000032003 0000000052002 0000000042002 0000000052003 0000000052002 0000000062002 0000000052003 0000000102002 0000000061967 0000000121996 0000000061968 0000000142002 0000000062002 0000000152002 0000000081963 0000000162002 Your macro (without Transpose) returns this: 0000000021957 22002 31957 61967 61968 81963 341991 401961 431978 482010 482011 In my spreadsheet: A2 contains 0000000022002 B1 contains 0000000022002 But your macro results contains 22002 Why are the leading zeroes being dropped? I'm glad to hear that the macro is working and not taking hours :-) The leading zero's are being dropped because Excel is trying to be helpful and interpreting the data as numeric. We have two choices to change this and retain the speed: We can format the column as text. We can custom format the column to "0000000000000" (thirteen zero's) The latter retains the numeric characteristics; the former does not, but some Excel functions will still interpret this as a number. The choice is yours. Here's how to modify the code to provide for that. Note the lines near the bottom. =================================== Option Explicit Sub PruneColA() 'Requires setting reference (tools/references) to ' Microsoft Scripting Runtime Dim ws As Worksheet Dim rColA As Range, rColB As Range Dim vColA As Variant, vColB As Variant Dim dColA As Dictionary, dColB As Dictionary Dim i As Long Dim d As Variant Set dColA = New Dictionary Set dColB = New Dictionary Set ws = ActiveSheet With ws Set rColA = Range(.Cells(1, "A"), .Cells(.Rows.Count, "A").End(xlUp)) Set rColB = Range(.Cells(1, "B"), .Cells(.Rows.Count, "B").End(xlUp)) End With vColB = rColB vColA = rColA For i = LBound(vColB, 1) + 1 To UBound(vColB, 1) With dColB If Not .Exists(Key:=vColB(i, 1)) Then .Add Key:=vColB(i, 1), Item:=vColB(i, 1) End With Next i For i = LBound(vColA, 1) + 1 To UBound(vColA, 1) If Not dColB.Exists(Key:=vColA(i, 1)) Then With dColA If Not .Exists(Key:=vColA(i, 1)) Then .Add Key:=vColA(i, 1), Item:=vColA(i, 1) End With End If Next i ReDim vColA(1 To dColA.Count, 1 To 1) i = 0 For Each d In dColA i = i + 1 vColA(i, 1) = dColA(d) Next d rColA.Offset(rowoffset:=1).ClearContents Set rColA = rColA.Resize(rowsize:=dColA.Count).Offset(rowoffse t:=1) 'UNcomment one or the other of the next two lines depending on your preference 'rColA.EntireColumn.NumberFormat = "0000000000000" rColA.EntireColumn.NumberFormat = "@" rColA = vColA End Sub ======================================= |
#23
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
On Sun, 15 Jan 2012 06:34:23 -0800 (PST), gary wrote:
Hi Ron, I ran without transpose and it finished in about 1 minute! In Col A, a cell contains 0507811951990 That value is not in Col B. Your macro is displaying it as 5.07811E+11 How can I get it to be displayed as 0507811951990 ? Same answer as previous: format column as text or custom format "0000000000000" |
#24
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
Great work Ron
Thanks for coming to the rescue. Gord On Sun, 15 Jan 2012 14:03:40 -0500, Ron Rosenfeld wrote: I'm glad to hear that the macro is working and not taking hours :-) The leading zero's are being dropped because Excel is trying to be helpful and interpreting the data as numeric. We have two choices to change this and retain the speed: We can format the column as text. We can custom format the column to "0000000000000" (thirteen zero's) The latter retains the numeric characteristics; the former does not, but some Excel functions will still interpret this as a number. The choice is yours. Here's how to modify the code to provide for that. Note the lines near the bottom. =================================== Option Explicit Sub PruneColA() 'Requires setting reference (tools/references) to ' Microsoft Scripting Runtime Dim ws As Worksheet Dim rColA As Range, rColB As Range Dim vColA As Variant, vColB As Variant Dim dColA As Dictionary, dColB As Dictionary Dim i As Long Dim d As Variant Set dColA = New Dictionary Set dColB = New Dictionary Set ws = ActiveSheet With ws Set rColA = Range(.Cells(1, "A"), .Cells(.Rows.Count, "A").End(xlUp)) Set rColB = Range(.Cells(1, "B"), .Cells(.Rows.Count, "B").End(xlUp)) End With vColB = rColB vColA = rColA For i = LBound(vColB, 1) + 1 To UBound(vColB, 1) With dColB If Not .Exists(Key:=vColB(i, 1)) Then .Add Key:=vColB(i, 1), Item:=vColB(i, 1) End With Next i For i = LBound(vColA, 1) + 1 To UBound(vColA, 1) If Not dColB.Exists(Key:=vColA(i, 1)) Then With dColA If Not .Exists(Key:=vColA(i, 1)) Then .Add Key:=vColA(i, 1), Item:=vColA(i, 1) End With End If Next i ReDim vColA(1 To dColA.Count, 1 To 1) i = 0 For Each d In dColA i = i + 1 vColA(i, 1) = dColA(d) Next d rColA.Offset(rowoffset:=1).ClearContents Set rColA = rColA.Resize(rowsize:=dColA.Count).Offset(rowoffse t:=1) 'UNcomment one or the other of the next two lines depending on your preference 'rColA.EntireColumn.NumberFormat = "0000000000000" rColA.EntireColumn.NumberFormat = "@" rColA = vColA End Sub ======================================= |
#25
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
You can further optimize performance as follows...
Option Explicit Sub StripDupes() Dim i&, j&, lRows1&, lRows2&, lLastRow&, lCalcMode& 'as long Dim vRng As Variant, bEventsEnabled As Boolean lRows1 = Cells(Rows.Count, "A").End(xlUp).Row lRows2 = Cells(Rows.Count, "B").End(xlUp).Row If lRows1 lRows2 Then lLastRow = lRows1 Else lLastRow = lRows2 vRng = Range("A1:B" & lLastRow) With Application lCalcMode = .Calculation: .Calculation = xlCalculationManual bEventsEnabled = .EnableEvents: .EnableEvents = False .ScreenUpdating = False End With For i = UBound(vRng) To 1 Step -1 If Not vRng(i, 1) = "" Then For j = 1 To UBound(vRng) If vRng(i, 1) = vRng(j, 2) Then Cells(i, 1).Delete shift:=xlUp: Exit For End If Next 'j End If Next 'i 'Cleanup With Application .Calculation = lCalcMode .EnableEvents = bEventsEnabled .ScreenUpdating = True End With End Sub -- Garry Free usenet access at http://www.eternal-september.org ClassicVB Users Regroup! comp.lang.basic.visual.misc |
#26
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
On Sun, 15 Jan 2012 11:08:54 -0800, Gord Dibben wrote:
Great work Ron Thanks for coming to the rescue. Gord It was an interesting problem. When I started off, I thought Advanced Filter would be the way to go. I was surprised when it was so problematic dealing with very large sets of data. But then I did some research and reminded myself that others had reported problems in the past with 2007 and the Advanced Filter. Of coursae, then I ran into limitations in using VBA arrays and the Transpose worksheetfunction. But in developing this solution, I learned a lot. I think I could have also used Collections, and avoided the reference to the Scripting Runtime package. I don't know which would be faster. Clearly, the Exist property of a dictionary takes fewer steps than error processing for a collection, in order to tell if something exists; but you have to execute the Exist function each time. |
#27
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
On Sun, 15 Jan 2012 11:08:54 -0800, Gord Dibben wrote:
Great work Ron Thanks for coming to the rescue. Gord And I was really pleasantly surprised at how quickly the dictionary method worked. |
#28
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
on 1/15/2012, Ron Rosenfeld supposed :
On Sun, 15 Jan 2012 07:54:14 -0500, Ron Rosenfeld wrote: Gary, If the last version I provided, which does NOT use TRANSPOSE, doesn't work, we will have to transfer the data to ColA using a loop. It may be that we are running into one of the limitations of VBA and Excel. Well, in Excel 2007 I just made up a test case of 500,000 entries in col a and col b. There were about 238,000 in col a that were not in col b and the last version worked fine and executed in about 1 minute. Ron, I'd appreciate feedback on using my (3rd) posted code on your test data! I extracted the concept of using the array approach from an app I have for filtering out rows of data from a data logging output file. This requires at least xl12 to work due to the amount of data being just under 1GB. I believe the limit on array size is 2GB but since my app uses its own instance of Excel there's nothing else running in its memory space. -- Garry Free usenet access at http://www.eternal-september.org ClassicVB Users Regroup! comp.lang.basic.visual.misc |
#29
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
On Jan 15, 11:03*am, Ron Rosenfeld wrote:
On Sun, 15 Jan 2012 07:44:41 -0800 (PST), gary wrote: Hi Ron, My spreadsheet has: * * * * * A * * * * * * * * * * * * * * * *B 0000000021957 * * * 0000000022002 0000000022002 * * * 0000000032002 0000000031957 * * * 0000000032003 0000000032002 * * * 0000000042002 0000000032003 * * * 0000000052002 0000000042002 * * * 0000000052003 0000000052002 * * * 0000000062002 0000000052003 * * * 0000000102002 0000000061967 * * * 0000000121996 0000000061968 * * * 0000000142002 0000000062002 * * * 0000000152002 0000000081963 * * * 0000000162002 Your macro (without Transpose) returns this: 0000000021957 22002 31957 61967 61968 81963 341991 401961 431978 482010 482011 In my spreadsheet: A2 contains 0000000022002 B1 contains 0000000022002 But your macro results contains 22002 Why are the leading zeroes being dropped? I'm glad to hear that the macro is working and not taking hours :-) The leading zero's are being dropped because Excel is trying to be helpful and interpreting the data as numeric. *We have two choices to change this and retain the speed: * We can format the column as text. * We can custom format the column to "0000000000000" *(thirteen zero's) The latter retains the numeric characteristics; the former does not, but some Excel functions will still interpret this as a number. *The choice is yours. Here's how to modify the code to provide for that. *Note the lines near the bottom. =================================== Option Explicit Sub PruneColA() 'Requires setting reference (tools/references) to ' *Microsoft Scripting Runtime * * Dim ws As Worksheet * * Dim rColA As Range, rColB As Range * * Dim vColA As Variant, vColB As Variant * * Dim dColA As Dictionary, dColB As Dictionary * * Dim i As Long * * Dim d As Variant Set dColA = New Dictionary Set dColB = New Dictionary Set ws = ActiveSheet With ws * * Set rColA = Range(.Cells(1, "A"), .Cells(.Rows.Count, "A").End(xlUp)) * * Set rColB = Range(.Cells(1, "B"), .Cells(.Rows.Count, "B").End(xlUp)) End With vColB = rColB vColA = rColA For i = LBound(vColB, 1) + 1 To UBound(vColB, 1) * * With dColB * * * * If Not .Exists(Key:=vColB(i, 1)) Then .Add Key:=vColB(i, 1), Item:=vColB(i, 1) * * End With Next i For i = LBound(vColA, 1) + 1 To UBound(vColA, 1) * * If Not dColB.Exists(Key:=vColA(i, 1)) Then * * * * With dColA * * * * * * If Not .Exists(Key:=vColA(i, 1)) Then .Add Key:=vColA(i, 1), Item:=vColA(i, 1) * * * * End With * * End If Next i ReDim vColA(1 To dColA.Count, 1 To 1) i = 0 For Each d In dColA * * i = i + 1 * * vColA(i, 1) = dColA(d) Next d rColA.Offset(rowoffset:=1).ClearContents Set rColA = rColA.Resize(rowsize:=dColA.Count).Offset(rowoffse t:=1) 'UNcomment one or the other of the next two lines depending on your preference 'rColA.EntireColumn.NumberFormat = "0000000000000" rColA.EntireColumn.NumberFormat = "@" rColA = vColA End Sub =======================================- Hide quoted text - - Show quoted text - I'm using: rColA.EntireColumn.NumberFormat = "0000000000000" 'rColA.EntireColumn.NumberFormat = "@" Because the result still contains 0000000022002 (which is in Col B) and this makes the result suspect. |
#30
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
On Sun, 15 Jan 2012 14:39:30 -0500, GS wrote:
Ron, I'd appreciate feedback on using my (3rd) posted code on your test data! I extracted the concept of using the array approach from an app I have for filtering out rows of data from a data logging output file. This requires at least xl12 to work due to the amount of data being just under 1GB. I believe the limit on array size is 2GB but since my app uses its own instance of Excel there's nothing else running in its memory space. Gary, When I interrupted it it had been running for 219 seconds. At that point in time it had eliminated 617 entries from the column A list. I then started up my "dictionary" routine. It ran for about 58.6 seconds and eliminated the remaining 260,493 duplicated entries. To set up the sample data, I enter a formula like: A1 & B1: =text(randbetween(1,10^6),"0000000000000") Fill down to row 500,000. Then copy/Paste Values For timing I use the HiRes timer. I initially tried an approach like yours: Examine each cell If the data is invalid, delete the cell and rearrange the rest (delete xlshiftup) After some thought, I decided it should be faster to Get all the good data into a sequential array. Delete ALL the original data Write back the good data array. The approach I used, using the dictionary, works pretty fast. It's disadvantage is that if duplicates in the original data should be retained, it would have to be modified. (i.e. if there are multiple 0000000123456's in column A, and none of that value in Column B, and the multiple values all need to be retained in column A; and they need to be retained in their original order). Fortunately, that is not the case. And if I had Excel 2010, the Advanced Filter might work. I would filter/copy; then delete the original and write back the copy. That would work even with duplicates. But it won't work in Excel 2007 with this data base (and seems to run slower even with smaller databases). |
#31
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
On Jan 15, 12:30*pm, Ron Rosenfeld wrote:
On Sun, 15 Jan 2012 14:39:30 -0500, GS wrote: Ron, I'd appreciate feedback on using my (3rd) posted code on your test data! I extracted the concept of using the array approach from an app I have for filtering out rows of data from a data logging output file. This requires at least xl12 to work due to the amount of data being just under 1GB. I believe the limit on array size is 2GB but since my app uses its own instance of Excel there's nothing else running in its memory space. Gary, When I interrupted it it had been running for 219 seconds. *At that point in time it had eliminated 617 entries from the column A list. I then started up my "dictionary" routine. *It ran for about 58.6 seconds and eliminated the remaining 260,493 duplicated entries. To set up the sample data, I enter a formula like: A1 & B1: *=text(randbetween(1,10^6),"0000000000000") * *Fill down to row 500,000. * Then copy/Paste Values For timing I use the HiRes timer. I initially tried an approach like yours: * *Examine each cell * *If the data is invalid, delete the cell and rearrange the rest (delete xlshiftup) After some thought, I decided it should be faster to * *Get all the good data into a sequential array. * *Delete ALL the original data * *Write back the good data array. The approach I used, using the dictionary, works pretty fast. *It's disadvantage is that if duplicates in the original data should be retained, it would have to be modified. *(i.e. if there are multiple 0000000123456's in column A, and none of that value in Column B, and the multiple values all need to be retained in column A; and they need to be retained in their original order). *Fortunately, that is not the case. And if I had Excel 2010, the Advanced Filter might work. *I would filter/copy; then delete the original and write back the copy. *That would work even with duplicates. *But it won't work in Excel 2007 with this data base (and seems to run *slower even with smaller databases). ========================== I'm using: rColA.EntireColumn.NumberFormat = "0000000000000" 'rColA.EntireColumn.NumberFormat = "@" But the results have: 0000000021957 in A1 (but that value is NOT in Col B) 0000000022002 in A2 (but that value IS in Col B). and this makes the results suspect. |
#32
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
Ron,
For what it is worth... Since, a variant filled with an array of Range.Values is one based and a Dictionary object is one based, the two code lines below should probably omit the "+ 1". For i = LBound(vColB, 1) + 1 To UBound(vColB, 1) For i = LBound(vColA, 1) + 1 To UBound(vColA, 1) '--- Jim Cone Portland, Oregon USA http://www.mediafire.com/PrimitiveSoftware (XL Companion add-in: compares, matches, counts, lists, finds, deletes...) "Ron Rosenfeld" wrote in message ... I'm glad to hear that the macro is working and not taking hours :-) The leading zero's are being dropped because Excel is trying to be helpful and interpreting the data as numeric. We have two choices to change this and retain the speed: We can format the column as text. We can custom format the column to "0000000000000" (thirteen zero's) The latter retains the numeric characteristics; the former does not, but some Excel functions will still interpret this as a number. The choice is yours. Here's how to modify the code to provide for that. Note the lines near the bottom. =================================== Option Explicit Sub PruneColA() 'Requires setting reference (tools/references) to ' Microsoft Scripting Runtime Dim ws As Worksheet Dim rColA As Range, rColB As Range Dim vColA As Variant, vColB As Variant Dim dColA As Dictionary, dColB As Dictionary Dim i As Long Dim d As Variant Set dColA = New Dictionary Set dColB = New Dictionary Set ws = ActiveSheet With ws Set rColA = Range(.Cells(1, "A"), .Cells(.Rows.Count, "A").End(xlUp)) Set rColB = Range(.Cells(1, "B"), .Cells(.Rows.Count, "B").End(xlUp)) End With vColB = rColB vColA = rColA For i = LBound(vColB, 1) + 1 To UBound(vColB, 1) With dColB If Not .Exists(Key:=vColB(i, 1)) Then .Add Key:=vColB(i, 1), Item:=vColB(i, 1) End With Next i For i = LBound(vColA, 1) + 1 To UBound(vColA, 1) If Not dColB.Exists(Key:=vColA(i, 1)) Then With dColA If Not .Exists(Key:=vColA(i, 1)) Then .Add Key:=vColA(i, 1), Item:=vColA(i, 1) End With End If Next i ReDim vColA(1 To dColA.Count, 1 To 1) i = 0 For Each d In dColA i = i + 1 vColA(i, 1) = dColA(d) Next d rColA.Offset(rowoffset:=1).ClearContents Set rColA = rColA.Resize(rowsize:=dColA.Count).Offset(rowoffse t:=1) 'UNcomment one or the other of the next two lines depending on your preference 'rColA.EntireColumn.NumberFormat = "0000000000000" rColA.EntireColumn.NumberFormat = "@" rColA = vColA End Sub ======================================= |
#33
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
From the responses and their results, I think it'd be best to re-state
my OP: I need a list of the values in Col A that are NOT found in Col B. My spreadsheet contains: A B 0000000021957 0000000022002 0000000022002 0000000032002 0000000031957 0000000032003 0000000032002 0000000042002 0000000032003 0000000052002 0000000042002 0000000052003 0000000052002 0000000062002 0000000052003 0000000102002 0000000061967 0000000121996 0000000061968 0000000142002 0000000062002 0000000152002 0000000081963 0000000162002 0000000102002 0000000481994 0000000121996 0000000481995 0000000142002 0000000481996 0000000152002 0000000481997 0000000162002 0000000481998 0000000341991 0000000481999 0000000401961 0000000482000 0000000431978 0000000482002 0000000481994 0000000482008 0000000481995 0000000482009 0000000481996 0000000631995 0000000481997 0000000631996 0000000481998 0000000631997 0000000481999 0000000631998 0000000482000 0000000631999 0000000482002 0000000632000 0000000482008 0000000632001 0000000482009 0000000642000 0000000482010 0000000681994 0000000482011 0000000681995 0000000491959 0000000681996 0000000511958 0000000681997 0000000591982 0000000681998 0000000591983 0000000911997 0000000591984 0000000962001 0000000591990 0000000962003 0000000591991 0000001001997 0000000611962 0000001082006 0000000631993 0000001381994 0000000631995 0000001381995 0000000631996 0000002001994 0000000631997 0000002122007 0000000631998 0000002291995 0000000631999 0000002291996 0000000632000 0000002291997 0000000632001 0000002601999 0000000641984 0000002602000 0000000642000 0000002641998 0000000661957 0000002731994 0000000681994 0000003031994 0000000681995 0000003161994 0000000681996 0000003161995 0000000681997 0000003161996 0000000681998 0000003161997 0000000691959 0000003392009 0000000751990 0000003901998 0000000811961 0000004062006 0000000811991 0000004091994 0000000811992 0000004091995 0000000811993 0000004131998 0000000821959 0000004231998 0000000851958 0000004371995 0000000881990 0000004521995 0000000911997 0000004522000 0000000951959 0000004541997 0000000962001 0000004542000 0000000962003 0000004542001 0000001001997 0000005001998 0000001031957 0000005002002 0000001082006 0000005121997 0000001121970 0000005181994 0000001121973 0000005181998 0000001121974 0000005381995 0000001121975 0000005381996 0000001121976 0000005471994 0000001181960 0000005471995 0000001191952 0000005471996 0000001311961 0000005581996 0000001341959 0000005622000 0000001381994 0000005622001 0000001381995 0000005622002 0000001411981 0000005971994 0000001411982 0000006202004 0000001411983 0000006491995 0000001411991 0000006511994 0000001411992 0000006511996 0000001421956 0000006571994 0000001451982 0000006571995 0000001471982 0000006571996 0000001541990 0000007291994 0000001561957 0000007291996 0000001631971 0000007291998 0000001631972 0000007321994 0000001681959 0000007341997 0000001711991 0000007341998 0000001781953 0000007432003 0000001871955 0000007751995 0000001881952 0000007931996 0000001881955 0000008071994 0000001881959 0000008071995 0000001921986 0000008071997 0000001951958 0000008432000 0000001981958 0000008432001 0000002001954 0000008432002 0000002001955 0000008631995 0000002001994 0000008631996 0000002011957 0000008662000 0000002031958 0000008681998 0000002101957 0000008702003 0000002122007 0000008702004 0000002191957 0000008702005 0000002191963 0000008702006 0000002211979 0000008771994 0000002211980 0000008771995 0000002241960 0000008771996 0000002251958 0000008771997 0000002271976 0000008771998 0000002281981 0000008772000 0000002281982 0000008772003 0000002291982 0000008961994 0000002291983 0000008991996 0000002291984 0000008992005 0000002291985 0000008992006 0000002291986 0000009061997 0000002291987 0000009061998 0000002291993 0000009062006 0000002291995 0000009062007 0000002291996 0000009062009 0000002291997 0000009171995 0000002331955 0000009171997 0000002331961 0000009172003 0000002381990 0000009221994 0000002391960 0000009221996 0000002411958 0000009361995 0000002411967 0000009362003 0000002421958 0000009362004 0000002461981 0000009401994 0000002461982 0000009581998 0000002461983 0000009691994 0000002531986 0000009691995 0000002571990 0000009691996 0000002571991 0000009821996 0000002591977 0000009831996 0000002601963 0000009831997 0000002601999 0000009971994 0000002602000 0000009971995 0000002641998 0000009971996 0000002691990 0000009971997 0000002691991 0000009971998 0000002711961 0000009971999 0000002731990 0000009972000 0000002731992 0000009972001 0000002731993 0000009972002 0000002731994 0000009972003 0000002741956 0000009972004 0000002741990 0000009972005 0000002871977 0000009972006 0000002891956 0000010001998 0000002921961 0000010011994 0000002971956 0000010011995 0000002971991 0000010321994 0000002981959 0000010361997 0000003001989 0000010411996 0000003001990 0000010411997 0000003021990 0000010411998 0000003031994 0000010411999 |
#34
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
Ron Rosenfeld wrote on 1/15/2012 :
On Sun, 15 Jan 2012 14:39:30 -0500, GS wrote: Ron, I'd appreciate feedback on using my (3rd) posted code on your test data! I extracted the concept of using the array approach from an app I have for filtering out rows of data from a data logging output file. This requires at least xl12 to work due to the amount of data being just under 1GB. I believe the limit on array size is 2GB but since my app uses its own instance of Excel there's nothing else running in its memory space. Gary, When I interrupted it it had been running for 219 seconds. At that point in time it had eliminated 617 entries from the column A list. I then started up my "dictionary" routine. It ran for about 58.6 seconds and eliminated the remaining 260,493 duplicated entries. To set up the sample data, I enter a formula like: A1 & B1: =text(randbetween(1,10^6),"0000000000000") Fill down to row 500,000. Then copy/Paste Values For timing I use the HiRes timer. I initially tried an approach like yours: Examine each cell If the data is invalid, delete the cell and rearrange the rest (delete xlshiftup) After some thought, I decided it should be faster to Get all the good data into a sequential array. Delete ALL the original data Write back the good data array. The approach I used, using the dictionary, works pretty fast. It's disadvantage is that if duplicates in the original data should be retained, it would have to be modified. (i.e. if there are multiple 0000000123456's in column A, and none of that value in Column B, and the multiple values all need to be retained in column A; and they need to be retained in their original order). Fortunately, that is not the case. And if I had Excel 2010, the Advanced Filter might work. I would filter/copy; then delete the original and write back the copy. That would work even with duplicates. But it won't work in Excel 2007 with this data base (and seems to run slower even with smaller databases). Thanks, Ron. I suspected it would take a long time since it writes the worksheet in the loop. My source code does everything in memory using the arrays and sett matches to an empty string. Note that the source data is read n from a data logger output file, NOT from a worksheet. I just dump the result back into a blank sheet. I can't use the dictionary because I need to preserve duplicates. The match criteria is an ambient temperature value in a line of text at a specific position and so if it's not '=' then I reset the array element to an empty string, then use the Filter() function to dump the resulting data into a worksheet. Since there's only 1 array to loop once only the process is really fast. -- Garry Free usenet access at http://www.eternal-september.org ClassicVB Users Regroup! comp.lang.basic.visual.misc |
#35
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
Your code took 46 secs to run your sample data on my machine. That's
amazing! I'm curious, now, how it performs if we did similar using a Collection so we can have duplicates in ColA! -- Garry Free usenet access at http://www.eternal-september.org ClassicVB Users Regroup! comp.lang.basic.visual.misc |
#36
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
Ron,
Actually, I misspoke, the Dictionary is zero based, but the recommendation stands. Also, the two lines near the end should read... rColA.ClearContents Set rColA = rColA.Resize(dColA.Count, 1) '--- Jim Cone "Jim Cone" wrote in message ... Ron, For what it is worth... Since, a variant filled with an array of Range.Values is one based and a Dictionary object is one based, the two code lines below should probably omit the "+ 1". For i = LBound(vColB, 1) + 1 To UBound(vColB, 1) For i = LBound(vColA, 1) + 1 To UBound(vColA, 1) '--- Jim Cone Portland, Oregon USA http://www.mediafire.com/PrimitiveSoftware (XL Companion add-in: compares, matches, counts, lists, finds, deletes...) |
#37
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
GS expressed precisely :
Your code took 46 secs to run your sample data on my machine. That's amazing! I'm curious, now, how it performs if we did similar using a Collection so we can have duplicates in ColA! I modified my sub to use your idea to use Dictionary, but NOT put colA in a dictionary and it shaved 11 secs off the ET... Sub StripDupes() Dim i&, j&, lRows1&, lRows2& lMatchesFound& 'as long Dim vRngA, vRngB, vRngOut() 'as variant Dim dRngB As Dictionary lRows1 = Cells(Rows.Count, "A").End(xlUp).Row lRows2 = Cells(Rows.Count, "B").End(xlUp).Row vRngA = Range("A1:A" & lRows1): vRngB = Range("B1:B" & lRows2) Set dRngB = New Dictionary Debug.Print Now() For j = LBound(vRngB) To UBound(vRngB) With dRngB If Not .Exists(Key:=vRngB(j, 1)) Then _ .Add Key:=vRngB(j, 1), Item:=vRngB(j, 1) End With Next 'j sTemp = Mid$(sTemp, 2) ' Debug.Print Now() For i = LBound(vRngA) To UBound(vRngA) If dRngB.Exists(Key:=vRngA(i, 1)) Then _ vRngA(i, 1) = "": lMatchesFound = lMatchesFound + 1 Next 'i ' Debug.Print Now() j = 0: ReDim vRngOut(UBound(vRngA) - lMatchesFound, 1 To 1) For i = LBound(vRngA) To UBound(vRngA) If Not vRngA(i, 1) = "" Then vRngOut(j, 1) = vRngA(i, 1): j = j + 1 End If Next 'i ' Debug.Print Now() Range("A1:A" & lRows1).ClearContents Range("A1").Resize(UBound(vRngOut), 1) = vRngOut Debug.Print Now() End Sub -- Garry Free usenet access at http://www.eternal-september.org ClassicVB Users Regroup! comp.lang.basic.visual.misc |
#38
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
On Sun, 15 Jan 2012 13:31:13 -0800, "Jim Cone" wrote:
Ron, For what it is worth... Since, a variant filled with an array of Range.Values is one based and a Dictionary object is one based, the two code lines below should probably omit the "+ 1". For i = LBound(vColB, 1) + 1 To UBound(vColB, 1) For i = LBound(vColA, 1) + 1 To UBound(vColA, 1) Jim, I assumed that row 1 had a label that did not need to be included in the comparison. If there are no column labels, or if there are but it is guaranteed that they won't be repeated in the data, then you are correct. -- Ron |
#39
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
On Sun, 15 Jan 2012 12:28:24 -0800 (PST), gary wrote:
I'm using: rColA.EntireColumn.NumberFormat = "0000000000000" 'rColA.EntireColumn.NumberFormat = "@" Because the result still contains 0000000022002 (which is in Col B) and this makes the result suspect. Well, if you need column A to be numeric, then column B must be numeric also. If column B values are text, then you should use the Text format "@". When I was testing, I had preformatted both columns as text, and had no problems. Also, please note that I assumed you would have some label in Row 1. If there are no labels, then try this variation, which should work whether or not there is a label: =============================== Option Explicit Sub PruneColA() 'Requires setting reference (tools/references) to ' Microsoft Scripting Runtime Dim ws As Worksheet Dim rColA As Range, rColB As Range Dim vColA As Variant, vColB As Variant Dim dColA As Dictionary, dColB As Dictionary Dim i As Long Dim d As Variant Set dColA = New Dictionary Set dColB = New Dictionary Set ws = ActiveSheet With ws Set rColA = Range(.Cells(1, "A"), .Cells(.Rows.Count, "A").End(xlUp)) Set rColB = Range(.Cells(1, "B"), .Cells(.Rows.Count, "B").End(xlUp)) End With vColB = rColB vColA = rColA For i = LBound(vColB, 1) To UBound(vColB, 1) With dColB If Not .Exists(Key:=vColB(i, 1)) Then .Add Key:=vColB(i, 1), Item:=vColB(i, 1) End With Next i For i = LBound(vColA, 1) To UBound(vColA, 1) If Not dColB.Exists(Key:=vColA(i, 1)) Then With dColA If Not .Exists(Key:=vColA(i, 1)) Then .Add Key:=vColA(i, 1), Item:=vColA(i, 1) End With End If Next i ReDim vColA(1 To dColA.Count, 1 To 1) i = 0 For Each d In dColA i = i + 1 vColA(i, 1) = dColA(d) Next d rColA.Offset.ClearContents rColA.EntireColumn.NumberFormat = "@" Set rColA = rColA.Resize(rowsize:=dColA.Count) rColA = vColA End Sub ================================================= |
#40
Posted to microsoft.public.excel.programming
|
|||
|
|||
Delete matching cells
On Sun, 15 Jan 2012 16:47:41 -0800, "Jim Cone" wrote:
Ron, Actually, I misspoke, the Dictionary is zero based, but the recommendation stands. Also, the two lines near the end should read... rColA.ClearContents Set rColA = rColA.Resize(dColA.Count, 1) '--- Jim Cone That is if you remove the +1. As I wrote, when that was written I had assumed a column label. If the column label is unique, and not found in Column B, then it can be included as I did in a subsequent response, following your suggestion, to the OP. |
Reply |
|
Thread Tools | Search this Thread |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Forum | |||
Delete all cells in range matching certain values | Excel Programming | |||
Help with Matching Text Fields - Then Moving the Matching Cells Side by Side | Excel Discussion (Misc queries) | |||
DELETE ROW 3 MATCHING CRITERIA | Excel Programming | |||
delete all matching rows | Excel Discussion (Misc queries) | |||
Perform Lookup and delete non matching rows? | Excel Programming |