Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3,514
Default Delete matching cells

on 1/16/2012, Jim Cone supposed :
Garry,
It may turn out to be one of those days, especially if the snow sticks.

I plugged in a new set of sample data into xl2010 this morning.
It appears that your code is returning mismatched items... items in col A
that are not in Col B.
But it is not eliminating duplicates.
Column A has 360,000 random 6 digit numbers.
Column B has 240,000 random 6 digit numbers.

Ron's code returns 231,414 unique entries.
Your code returns 279,200 entries: 231,514 unique and 47,686 duplicates.
(i ran my own unique counter on your returned data)

It's too early in the day for me to try to figure out why. <g
'---
Regards,
Jim Cone



"GS"

wrote in message
...
Ron Rosenfeld wrote :

Thanks, Jim. Can you run my final version on your sample data and report
back. I'd be curious to know the results. I'm running XP SP3 and did the
test in xl2007. Thanks in advance...

-- Garry
Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc




"GS" wrote in message ...
Jim Cone explained on 1/15/2012 :
Garry,
More FWIW...
I ran Ron's code on my XP, i3 machine (xl2007) against 600,000 6 digit
random numbers.
360,000 in col A and 240,000 in col B. It took about 7 1/2 seconds.
It returned ~87000 numbers not in col B.
'---
Jim Cone


Jim,
Ron explains the what/why of my version of the task fairly well. What
I'm more interested in is how long it took on your machine to process
the same amount of data as when you ran Ron's version.

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc


  #2   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 1,549
Default Delete matching cells

Garry,
Ran your code and Ron's twice each.
Ron: 14.0 seconds
Garry: 8.2 seconds
Times were identical for tests on each.
Xl2010 on WindowsXP - 360,000 nums in col A, 240,000 nums in col B.

Changes from yesterday: xl2010 vs. xl2007 and more data overlap between columns.
I'm thinking that the xl2010 vba Rnd function may be different.
'---
Jim Cone
Portland, Oregon USA
http://blog.contextures.com/archives...ith-excel-vba/
(workbook with "universal" Last Row function code - free)





"GS"
wrote in message ...

Jim,
Ron explains the what/why of my version of the task fairly well. What I'm more interested in is
how long it took on your machine to process the same amount of data as when you ran Ron's version.

--
Garry



  #3   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3,514
Default Delete matching cells

Jim Cone wrote :
Garry,
Ran your code and Ron's twice each.
Ron: 14.0 seconds
Garry: 8.2 seconds
Times were identical for tests on each.
Xl2010 on WindowsXP - 360,000 nums in col A, 240,000 nums in col B.

Changes from yesterday: xl2010 vs. xl2007 and more data overlap between
columns.
I'm thinking that the xl2010 vba Rnd function may be different.
'--- Jim Cone
Portland, Oregon USA
http://blog.contextures.com/archives...ith-excel-vba/
(workbook with "universal" Last Row function code - free)





"GS"
wrote in message ...

Jim,
Ron explains the what/why of my version of the task fairly well. What I'm
more interested in is how long it took on your machine to process the same
amount of data as when you ran Ron's version.

-- Garry


Thanks, Jim. I guess I was expecting a slower time as compared to Ron's
(approx RonsTime * 0.75), but I'm very happy that you report it was
better by almost half.

Obviously, the Scripting Dictionary is the better way to compare
columns of data. What I find interesting is how slow doing VB
comparison using arrays is. I've learned something valuable here..! My
thanks to you and Ron for your efforts...

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc


  #4   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 1,045
Default Delete matching cells

On Mon, 16 Jan 2012 13:57:59 -0500, GS wrote:

Jim,
Ron explains the what/why of my version of the task fairly well. What
I'm more interested in is how long it took on your machine to process
the same amount of data as when you ran Ron's version.

--
Garry


Garry,

I tried it on a dataset similar to Jim's. 360,000 entries in column A; 240,000 entries in Column B. They were 13 digit text strings with values from 1 to 10^6.

Removing duplicates by using two dictionaries: 28 seconds
Not removing duplicates using code similar to yours: 16.1 seconds

Ignore the timer stuff. It depends on a class installed in my personal .xlam add-in

=====================
Option Explicit
Sub PreserveDups()
Dim oTimer As RonsLibrary.CHiResTimer
Set oTimer = RonsLibrary.New_CHiResTimer
oTimer.StartTimer
'Requires setting reference (tools/references) to
' Microsoft Scripting Runtime

Dim ws As Worksheet
Dim rColA As Range, rColB As Range
Dim vColA As Variant, vColB As Variant
Dim vResults As Variant
Dim dColA As Dictionary, dColB As Dictionary
Dim i As Long
Dim lBlanks As Long
Dim d As Variant
Dim rDest As Range

Set dColA = New Dictionary
Set dColB = New Dictionary
Set ws = ActiveSheet
With ws
Set rColA = Range(.Cells(1, "A"), .Cells(.Rows.Count, "A").End(xlUp))
Set rColB = Range(.Cells(1, "B"), .Cells(.Rows.Count, "B").End(xlUp))
Set rDest = .Cells(1, 7)
End With

vColB = rColB
vColA = rColA

For i = LBound(vColB, 1) To UBound(vColB, 1)
With dColB
If Not .Exists(Key:=vColB(i, 1)) Then .Add Key:=vColB(i, 1), Item:=vColB(i, 1)
End With
Next i

For i = LBound(vColA, 1) To UBound(vColA, 1)
If dColB.Exists(Key:=vColA(i, 1)) Then
vColA(i, 1) = ""
lBlanks = lBlanks + 1
End If
Next i

ReDim vResults(1 To UBound(vColA) - lBlanks, 1 To 1)
i = 0
For Each d In vColA
If d < "" Then
i = i + 1
vResults(i, 1) = d
End If
Next d


rDest.EntireColumn.ClearContents
rDest.EntireColumn.NumberFormat = "@"

Set rDest = rDest.Resize(rowsize:=UBound(vResults, 1))
rDest = vResults

oTimer.StopTimer
Debug.Print oTimer.Elapsed
End Sub
===========================
  #5   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3,514
Default Delete matching cells

Thanks, Ron. I really appreciate your efforts!

I was thinking to now create a function that returns a boolean on
success, and accepts "Optional AllowDupes As Boolean = True" as its arg
so the faster code runs unless the user wants to remove dupes in ColA.
Does that make sense?

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc




  #6   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3,514
Default Delete matching cells

Something to play with...

Function StripDupes(Optional AllowDupes As Boolean = True) As Boolean
' Compares colA to colB and removes colA matches found in colB.
' Args In: AllowDupes: True by default
' Keeps duplicate values found in colA that are not found
in colB;
' If False, duplicate values in colA not found in colB are
removed.
'
' Returns: True if matches are found --AND-- no error occurs;
' False if matches are NOT found --OR-- error occurs.
'
' Sources: Ron Rosenfeld, Jim Cone, Garry Sansom

Dim i&, j&, lRows1&, lRows2&, lMatchesFound& 'as long
Dim vRngA, vRngB, vRngOut() 'as variant
Dim dRngB As New Dictionary

On Error GoTo ErrExit

lRows1 = Cells(Rows.Count, "A").End(xlUp).Row
lRows2 = Cells(Rows.Count, "B").End(xlUp).Row
vRngA = Range("A1:A" & lRows1): vRngB = Range("B1:B" & lRows2)

For j = LBound(vRngB) To UBound(vRngB)
With dRngB
If Not .Exists(Key:=vRngB(j, 1)) Then _
.Add Key:=vRngB(j, 1), Item:=vRngB(j, 1)
End With
Next 'j

If AllowDupes Then '//fastest
For i = LBound(vRngA) To UBound(vRngA)
If dRngB.Exists(Key:=vRngA(i, 1)) Then _
vRngA(i, 1) = "": lMatchesFound = lMatchesFound + 1
Next 'i

j = 0: ReDim vRngOut(UBound(vRngA) - lMatchesFound, 0)
For i = LBound(vRngA) To UBound(vRngA)
If Not vRngA(i, 1) = "" Then _
vRngOut(j, 0) = vRngA(i, 1): j = j + 1
Next 'i

Else '//slowest
Dim dRngA As New Dictionary
For i = LBound(vRngA) To UBound(vRngA)
If Not dRngB.Exists(vRngA(i, 1)) Then
With dRngA
If Not .Exists(Key:=vRngA(i, 1)) Then _
.Add Key:=vRngA(i, 1), Item:=vRngA(i, 1)
End With 'dRngA
End If 'Not dRngB.Exists(vRngA(i, 1))
Next 'i

Dim v As Variant
j = 0: ReDim vRngOut(dRngA.Count, 0)
For Each v In dRngA
vRngOut(j, 0) = dRngA(v): j = j + 1
Next 'v
End If 'AllowDupes

Range("A1:A" & lRows1).ClearContents
Range("A1").Resize(UBound(vRngOut), 1) = vRngOut

ErrExit:
StripDupes = (Err = 0)
End Function 'StripDupes()

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc


  #7   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3,514
Default Delete matching cells

We can shave off even more time if we eliminate the checks when adding
items to the dictionary because the dictionary won't allow dupes...

Function StripDupes(Optional AllowDupes As Boolean = True) As Boolean
' Compares colA to colB and removes colA matches found in colB.
' Args In: AllowDupes: True by default. Keeps duplicate values
' found in colA that are not found in colB. If False,
' duplicate values in colA not found in colB are removed.
'
' Returns: True if matches found and no error occurs;
' False if matches not found --OR-- error occurs.
'
' Sources: Ron Rosenfeld, Jim Cone, Garry Sansom

Dim i&, j&, lRows1&, lRows2&, lMatchesFound& 'as long
Dim vRngA, vRngB, vRngOut() 'as variant
Dim dRngB As New Dictionary

On Error GoTo ErrExit

lRows1 = Cells(Rows.Count, "A").End(xlUp).Row
lRows2 = Cells(Rows.Count, "B").End(xlUp).Row
vRngA = Range("A1:A" & lRows1): vRngB = Range("B1:B" & lRows2)

On Error Resume Next
For j = LBound(vRngB) To UBound(vRngB)
dRngB.Add Key:=vRngB(j, 1), Item:=vRngB(j, 1)
Next 'j
On Error GoTo 0

If AllowDupes Then '//fastest
For i = LBound(vRngA) To UBound(vRngA)
If dRngB.Exists(Key:=vRngA(i, 1)) Then _
vRngA(i, 1) = "": lMatchesFound = lMatchesFound + 1
Next 'i

j = 0: ReDim vRngOut(UBound(vRngA) - lMatchesFound, 0)
For i = LBound(vRngA) To UBound(vRngA)
If Not vRngA(i, 1) = "" Then _
vRngOut(j, 0) = vRngA(i, 1): j = j + 1
Next 'i

Else '//slowest
Dim dRngA As New Dictionary
On Error Resume Next
For i = LBound(vRngA) To UBound(vRngA)
If Not dRngB.Exists(vRngA(i, 1)) Then _
dRngA.Add Key:=vRngA(i, 1), Item:=vRngA(i, 1)
Next 'i
On Error GoTo 0

Dim v As Variant
j = 0: ReDim vRngOut(dRngA.Count, 0)
For Each v In dRngA
vRngOut(j, 0) = dRngA(v): j = j + 1
Next 'v
End If 'AllowDupes

Range("A1:A" & lRows1).ClearContents
Range("A1").Resize(UBound(vRngOut), 1) = vRngOut

ErrExit:
StripDupes = (Err = 0)
End Function 'StripDupes()

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc


  #8   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 1,045
Default Delete matching cells

On Mon, 16 Jan 2012 18:50:24 -0500, GS wrote:

We can shave off even more time if we eliminate the checks when adding
items to the dictionary because the dictionary won't allow dupes...


Your efforts prompted me to study whether dictionary or collection would work faster. And it turns out that my "prune" routine, which eliminates duplicates in Col A, when rewritten using Collections, runs in about 1/6 the time!

My last effort, using dictionaries for col a and col b, on the Jim style database (240,000 entries colA; 360,000 entries col b) took about 29 sec to run. The following process that same data base in 5.5 seconds!!

=============================
Option Explicit
Sub PruneColA2()
Dim ws As Worksheet
Dim rColA As Range, rColB As Range
Dim vColA As Variant, vColB As Variant
Dim vResults As Variant
Dim cColB As Collection
Dim i As Long
Dim lBlanks As Long
Dim v As Variant
Dim rDest As Range

Set cColB = New Collection
Set ws = ActiveSheet
With ws
Set rColA = Range(.Cells(1, "A"), .Cells(.Rows.Count, "A").End(xlUp))
Set rColB = Range(.Cells(1, "B"), .Cells(.Rows.Count, "B").End(xlUp))
Set rDest = .Cells(1, 10)
End With

vColB = rColB
vColA = rColA

On Error Resume Next
For i = LBound(vColB, 1) To UBound(vColB, 1)
With cColB
.Add Key:=vColB(i, 1), Item:=vColB(i, 1)
End With
Next i
On Error GoTo 0

On Error GoTo NotUniqueItem
For i = LBound(vColA, 1) To UBound(vColA, 1)
cColB.Add Item:=vColA(i, 1), Key:=vColA(i, 1)
Next i

ReDim vResults(1 To UBound(vColA) - lBlanks, 1 To 1)
i = 0
For Each v In vColA
If v < "" Then
i = i + 1
vResults(i, 1) = v
End If
Next v


rDest.EntireColumn.ClearContents
rDest.EntireColumn.NumberFormat = "@"

Set rDest = rDest.Resize(rowsize:=UBound(vResults, 1))

rDest = vResults

Exit Sub

NotUniqueItem:
vColA(i, 1) = ""
lBlanks = lBlanks + 1
Resume Next
End Sub
=================================
  #9   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3,514
Default Delete matching cells

I was thinking the same thing! I'll report back in a new thread as Jim
suggested...

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc


  #10   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3,514
Default Delete matching cells

I like how you were able to eliminate the final If...Then construct.<g

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc




  #11   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3,514
Default Delete matching cells

Uh.., I'm seeing that this approach raises a 'Type Mismatch' error in
the first loop where it loads colB into the collection.

Also, this approach empties colA and raises an error trying to write
the output array because the UBound(vColA) and lBlanks are the same
value.

What am I missing?

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc


  #12   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 1,549
Default Delete matching cells

Garry,
I wonder if gary (the original poster) is still around?
He would have his moneys worth by now. <g

Its time for further posts, if any, on this subject to go into a brand new post.
It has gotten a little unwieldy.
'---
Jim Cone




"GS"
wrote in message
...
We can shave off even more time if we eliminate the checks when adding items to the dictionary
because the dictionary won't allow dupes...

Function StripDupes(Optional AllowDupes As Boolean = True) As Boolean
' Compares colA to colB and removes colA matches found in colB.
' Args In: AllowDupes: True by default. Keeps duplicate values
' found in colA that are not found in colB. If False,
' duplicate values in colA not found in colB are removed.
'
' Returns: True if matches found and no error occurs;
' False if matches not found --OR-- error occurs.
'
' Sources: Ron Rosenfeld, Jim Cone, Garry Sansom

Dim i&, j&, lRows1&, lRows2&, lMatchesFound& 'as long
Dim vRngA, vRngB, vRngOut() 'as variant
Dim dRngB As New Dictionary

On Error GoTo ErrExit

lRows1 = Cells(Rows.Count, "A").End(xlUp).Row
lRows2 = Cells(Rows.Count, "B").End(xlUp).Row
vRngA = Range("A1:A" & lRows1): vRngB = Range("B1:B" & lRows2)

On Error Resume Next
For j = LBound(vRngB) To UBound(vRngB)
dRngB.Add Key:=vRngB(j, 1), Item:=vRngB(j, 1)
Next 'j
On Error GoTo 0

If AllowDupes Then '//fastest
For i = LBound(vRngA) To UBound(vRngA)
If dRngB.Exists(Key:=vRngA(i, 1)) Then _
vRngA(i, 1) = "": lMatchesFound = lMatchesFound + 1
Next 'i

j = 0: ReDim vRngOut(UBound(vRngA) - lMatchesFound, 0)
For i = LBound(vRngA) To UBound(vRngA)
If Not vRngA(i, 1) = "" Then _
vRngOut(j, 0) = vRngA(i, 1): j = j + 1
Next 'i

Else '//slowest
Dim dRngA As New Dictionary
On Error Resume Next
For i = LBound(vRngA) To UBound(vRngA)
If Not dRngB.Exists(vRngA(i, 1)) Then _
dRngA.Add Key:=vRngA(i, 1), Item:=vRngA(i, 1)
Next 'i
On Error GoTo 0

Dim v As Variant
j = 0: ReDim vRngOut(dRngA.Count, 0)
For Each v In dRngA
vRngOut(j, 0) = dRngA(v): j = j + 1
Next 'v
End If 'AllowDupes

Range("A1:A" & lRows1).ClearContents
Range("A1").Resize(UBound(vRngOut), 1) = vRngOut

ErrExit:
StripDupes = (Err = 0)
End Function 'StripDupes()

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc




  #13   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 80
Default Delete matching cells

On Jan 16, 5:54*pm, "Jim Cone" wrote:
Garry,
I wonder if gary (the original poster) is still around?
He would have his moneys worth by now. <g

Its time for further posts, if any, on this subject to go into a brand new post.
It has gotten a little unwieldy.
'---
Jim Cone

"GS"
wrote in ...



We can shave off even more time if we eliminate the checks when adding items to the dictionary
because the dictionary won't allow dupes...


Function StripDupes(Optional AllowDupes As Boolean = True) As Boolean
' Compares colA to colB and removes colA matches found in colB.
' Args In: * *AllowDupes: True by default. *Keeps duplicate values
' * * * * * * found in colA that are not found in colB. If False,
' * * * * * * duplicate values in colA not found in colB are removed.
'
' Returns: * *True if matches found and no error occurs;
' * * * * * * False if matches not found --OR-- error occurs.
'
' Sources: * *Ron Rosenfeld, Jim Cone, Garry Sansom


*Dim i&, j&, lRows1&, lRows2&, lMatchesFound& 'as long
*Dim vRngA, vRngB, vRngOut() 'as variant
*Dim dRngB As New Dictionary


*On Error GoTo ErrExit


*lRows1 = Cells(Rows.Count, "A").End(xlUp).Row
*lRows2 = Cells(Rows.Count, "B").End(xlUp).Row
*vRngA = Range("A1:A" & lRows1): vRngB = Range("B1:B" & lRows2)


*On Error Resume Next
* *For j = LBound(vRngB) To UBound(vRngB)
* * *dRngB.Add Key:=vRngB(j, 1), Item:=vRngB(j, 1)
* *Next 'j
*On Error GoTo 0


*If AllowDupes Then '//fastest
* *For i = LBound(vRngA) To UBound(vRngA)
* * *If dRngB.Exists(Key:=vRngA(i, 1)) Then _
* * * *vRngA(i, 1) = "": lMatchesFound = lMatchesFound + 1
* *Next 'i


* *j = 0: ReDim vRngOut(UBound(vRngA) - lMatchesFound, 0)
* *For i = LBound(vRngA) To UBound(vRngA)
* * *If Not vRngA(i, 1) = "" Then _
* * * *vRngOut(j, 0) = vRngA(i, 1): j = j + 1
* *Next 'i


*Else '//slowest
* *Dim dRngA As New Dictionary
* *On Error Resume Next
* * *For i = LBound(vRngA) To UBound(vRngA)
* * * *If Not dRngB.Exists(vRngA(i, 1)) Then _
* * * * *dRngA.Add Key:=vRngA(i, 1), Item:=vRngA(i, 1)
* * *Next 'i
* *On Error GoTo 0


* *Dim v As Variant
* *j = 0: ReDim vRngOut(dRngA.Count, 0)
* *For Each v In dRngA
* * *vRngOut(j, 0) = dRngA(v): j = j + 1
* *Next 'v
*End If 'AllowDupes


*Range("A1:A" & lRows1).ClearContents
*Range("A1").Resize(UBound(vRngOut), 1) = vRngOut


ErrExit:
*StripDupes = (Err = 0)
End Function 'StripDupes()


--
Garry


Free usenet access athttp://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc- Hide quoted text -


- Show quoted text -



Yes, Gary (the OP) is still around.

Ron's macro gave me the results I was looking for! Thanks for all the
contributions!

Gary
Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
Delete all cells in range matching certain values Tommy[_4_] Excel Programming 2 August 13th 07 04:03 PM
Help with Matching Text Fields - Then Moving the Matching Cells Side by Side [email protected] Excel Discussion (Misc queries) 2 June 11th 07 02:38 PM
DELETE ROW 3 MATCHING CRITERIA FIRSTROUNDKO via OfficeKB.com Excel Programming 4 May 2nd 06 03:39 PM
delete all matching rows Rich Excel Discussion (Misc queries) 16 December 25th 05 02:26 AM
Perform Lookup and delete non matching rows? Kobayashi[_11_] Excel Programming 1 October 2nd 03 01:11 PM


All times are GMT +1. The time now is 02:52 AM.

Powered by vBulletin® Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright ©2004-2024 ExcelBanter.
The comments are property of their posters.
 

About Us

"It's about Microsoft Excel"