Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 130
Default how to find union of two arrays

Hello,
Does anyone know a means of comparing two very large arrays (50000
elements) to determine which elements are common to both arrays? I've
tried using looping functions and that is far too slow.
  #2   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 587
Default how to find union of two arrays

hi,

i don't know why your looping is far too slow, it should not, can you show your code

--
isabelle

Le 2011-05-21 11:43, Andrew a écrit :
Hello,
Does anyone know a means of comparing two very large arrays (50000
elements) to determine which elements are common to both arrays? I've
tried using looping functions and that is far too slow.

  #3   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 130
Default how to find union of two arrays

On May 21, 7:43*pm, isabelle wrote:
hi,

i don't know why your looping is far too slow, it should not, can you show your code

--
isabelle

Le 2011-05-21 11:43, Andrew a écrit :







Hello,
Does anyone know a means of comparing two very large arrays (50000
elements) to determine which elements are common to both arrays? *I've
tried using looping functions and that is far too slow.


Isabel,
Imagine one range (A) with 50,000 rows, 2 columns, a second range with
50,000 rows (B) 1 column. I want to find all the elements in B which
have matches in column 1 of A. For each match, I want to copy the
corresponding entry in A column 2 over to a new column 2 in B.

Example.
A=
1 a
2 f
3 u
4 w
5 q

B=
5
1
6
8
9


After running the code, B would be transformed to:
B=
5 q
1 a
8
9
6

To do this I create the following loops.

For k1 = 1 to 50000
dataX=worksheets(1).cells(k1,1)
For k2=1 to 50000
dataY= worksheets(2).cells(k2,1)

If dataX=dataY then
worksheets(1).cells(k1,2)=worksheets(2).cells(k2,2 )
end if

next
next

This code is accurate, but terrribly slow.

This loop has to execute 50000^2 loops, 2.5 billion loops. Let's say
that each iteration requires 20 CPU clock cycles, that's 50 billion
clock cycles. My CPU has a 2.5 GHz processor, so this ideally would
take 20 seconds. It takes more than that, about 45 seconds. If I
were doing this in Matlab or C, I could process this much data in less
than a second, but not because the CPU is faster. There are other
programming constructs available which are faster. I know Excel can
run faster than this, but I don't know any tricks on how to speed this
up.

Any ideas?

  #4   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 587
Default how to find union of two arrays

hi Andrew,

with the functions "Index" and "Match" you can reduce the number of loop


Code:
Sub Macro1()
Dim Wks1 As Worksheet, Wks2 As Worksheet, k2 As Integer
Set Wks1 = Worksheets(1)
Set Wks2 = Worksheets(2)
For k2 = 1 To 5
  If Not IsError(Application.Match(Wks2.Cells(k2, 1), Wks1.Range("A:A"), 0)) Then
   Wks2.Cells(k2, 2) = Application.Index(Wks1.Range("B:B"), Application.Match(Wks2.Cells(k2, 1), Wks1.Range("A:A"), 0))
  End If
Next
Set Wks1 = Nothing
Set Wks2 = Nothing
End Sub

--
isabelle

Le 2011-05-22 09:34, Andrew a écrit :
On May 21, 7:43 pm, wrote:
hi,

i don't know why your looping is far too slow, it should not, can you show your code

--
isabelle

Le 2011-05-21 11:43, Andrew a écrit :







Hello,
Does anyone know a means of comparing two very large arrays (50000
elements) to determine which elements are common to both arrays? I've
tried using looping functions and that is far too slow.


Isabel,
Imagine one range (A) with 50,000 rows, 2 columns, a second range with
50,000 rows (B) 1 column. I want to find all the elements in B which
have matches in column 1 of A. For each match, I want to copy the
corresponding entry in A column 2 over to a new column 2 in B.

Example.
A=
1 a
2 f
3 u
4 w
5 q

B=
5
1
6
8
9


After running the code, B would be transformed to:
B=
5 q
1 a
8
9
6

To do this I create the following loops.

For k1 = 1 to 50000
dataX=worksheets(1).cells(k1,1)
For k2=1 to 50000
dataY= worksheets(2).cells(k2,1)

If dataX=dataY then
worksheets(1).cells(k1,2)=worksheets(2).cells(k2,2 )
end if

next
next

This code is accurate, but terrribly slow.

This loop has to execute 50000^2 loops, 2.5 billion loops. Let's say
that each iteration requires 20 CPU clock cycles, that's 50 billion
clock cycles. My CPU has a 2.5 GHz processor, so this ideally would
take 20 seconds. It takes more than that, about 45 seconds. If I
were doing this in Matlab or C, I could process this much data in less
than a second, but not because the CPU is faster. There are other
programming constructs available which are faster. I know Excel can
run faster than this, but I don't know any tricks on how to speed this
up.

Any ideas?

  #5   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 5,934
Default how to find union of two arrays

Imagine one range (A) with 50,000 rows, 2 columns, a second range with
50,000 rows (B) 1 column. I want to find all the elements in B which
have matches in column 1 of A. For each match, I want to copy the
corresponding entry in A column 2 over to a new column 2 in B.

Example.
A=
1 a
2 f
3 u
4 w
5 q

B=
5
1
6
8
9


After running the code, B would be transformed to:
B=
5 q
1 a
8
9
6

To do this I create the following loops.

For k1 = 1 To 50000
dataX = Worksheets(1).Cells(k1, 1)
For k2 = 1 To 50000
dataY = Worksheets(2).Cells(k2, 1)

If dataX = dataY Then
Worksheets(1).Cells(k1, 2) = Worksheets(2).Cells(k2, 2)
End If

Next
Next

This code is accurate, but terrribly slow.

This loop has to execute 50000^2 loops, 2.5 billion loops. Let's say
that each iteration requires 20 CPU clock cycles, that's 50 billion
clock cycles. My CPU has a 2.5 GHz processor, so this ideally would
take 20 seconds. It takes more than that, about 45 seconds. If I
were doing this in Matlab or C, I could process this much data in less
than a second, but not because the CPU is faster. There are other
programming constructs available which are faster. I know Excel can
run faster than this, but I don't know any tricks on how to speed this
up.


Why use code at all? You can do this with straight Excel formulas. Assuming
your range "A" is located on Sheet1, Columns A and B, starting in Row 1; and
assuming your range "B" is located in Column A starting in Row 1 on any
other sheet, put this formula in B1 on that other sheet and copy it down...

=IF(COUNTIF(Sheet1!A$1:A$500,A1),MATCH(A1,Sheet1!A $1:A$500,0),"")

Rick Rothstein (MVP - Excel)



  #6   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 1,045
Default how to find union of two arrays

On Sun, 22 May 2011 06:34:35 -0700 (PDT), Andrew wrote:

On May 21, 7:43*pm, isabelle wrote:
hi,

i don't know why your looping is far too slow, it should not, can you show your code

--
isabelle

Le 2011-05-21 11:43, Andrew a écrit :







Hello,
Does anyone know a means of comparing two very large arrays (50000
elements) to determine which elements are common to both arrays? *I've
tried using looping functions and that is far too slow.


Isabel,
Imagine one range (A) with 50,000 rows, 2 columns, a second range with
50,000 rows (B) 1 column. I want to find all the elements in B which
have matches in column 1 of A. For each match, I want to copy the
corresponding entry in A column 2 over to a new column 2 in B.

Example.
A=
1 a
2 f
3 u
4 w
5 q

B=
5
1
6
8
9


After running the code, B would be transformed to:
B=
5 q
1 a
8
9
6

To do this I create the following loops.

For k1 = 1 to 50000
dataX=worksheets(1).cells(k1,1)
For k2=1 to 50000
dataY= worksheets(2).cells(k2,1)

If dataX=dataY then
worksheets(1).cells(k1,2)=worksheets(2).cells(k2, 2)
end if

next
next

This code is accurate, but terrribly slow.

This loop has to execute 50000^2 loops, 2.5 billion loops. Let's say
that each iteration requires 20 CPU clock cycles, that's 50 billion
clock cycles. My CPU has a 2.5 GHz processor, so this ideally would
take 20 seconds. It takes more than that, about 45 seconds. If I
were doing this in Matlab or C, I could process this much data in less
than a second, but not because the CPU is faster. There are other
programming constructs available which are faster. I know Excel can
run faster than this, but I don't know any tricks on how to speed this
up.

Any ideas?


If you don't want to do it with worksheet formulas, as suggested by Rick, I think the following code will only have to execute 50,000 loops, so should run quicker.

It does assume that your RangeA is two columns and RangeB is one column, as you wrote. It also assumes that the data in RangeA is not the result of formulas, but that can obviously be changed in the code.

The code also assumes that if there are any duplicates in RangeA column1, that you only want to return the first one for the corresponding RangeB entry.

===================================
Option Explicit
Sub RangeAtoB()
Dim arB As Variant
Dim i As Long
Dim c As Range
'since RangeB is one column, must make it two
arB = Range("RangeB").Resize(columnsize:=2)

With Range("RangeA").Resize(columnsize:=1)
For i = 1 To UBound(arB)
Set c = .Find(what:=arB(i, 1), _
after:=.Item(1), _
LookIn:=xlValues, _
lookat:=xlWhole)
If Not c Is Nothing Then arB(i, 2) = c.Offset(columnoffset:=1)
Next i
End With

Range("rangeB").Resize(columnsize:=2) = arB
End Sub
==============================
  #7   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 2,059
Default how to find union of two arrays

On May 22, 6:34*am, Andrew wrote:
For k1 = 1 to 50000
dataX=worksheets(1).cells(k1,1)
For k2=1 to 50000
dataY= worksheets(2).cells(k2,1)
If dataX=dataY then
worksheets(1).cells(k1,2)=worksheets(2).cells(k2,2 )
end if
next
next

[....]
This loop has to execute 50000^2 loops, 2.5 billion loops.

[....]
I don't know any tricks on how to speed this up.


Well, that depends on details of the data and algorithm that you have
not mentioned.

(Note: I will use Sheet1 for Worksheets(1) and Sheet2 for
Worksheets(2), although those might not be the actual worksheet
names.)

First, your algorithm continues to look for a match even after finding
one. Effectively, it finds the __last__ match. Is that a
requirement? Or is it sufficient to find the __first__ match?

If you want to find the last match, your inner for-loop can be
improved by doing For k2=50000 to 1 step -1.

In either case, the inner loop can stop after finding a match. So you
can put the following statement before End If: Exit For.

Thus, on average a match is found after 25000 comparisons instead of
50000, assuming unique data in both Sheet1!A1:A50000 and Sheet2!
A1:A50000.

That cuts the total iterations in half, down to 1.25 billion.

Second, in your example, Sheet1!A1:A50000 is in ascending order. If
that is always the case, you could use a binary search instead of
linear search. Thus, a match is found in at most 16 comparisions.

That cuts the total iterations down to 800,000 at most, with some
probability that it is much less.

Finally, it is much more efficient to copy Sheet1!A1:B50000 and Sheet2!
A1:A50000 into a Variant variable one time, and to retain the results
in a Variant variable and copy them to Sheet2!B1:B50000 one time.

The time savings can be huge. Every time you access an Excel range,
VBA and Excel processes much exchange information -- two interprocess
communications -- and the Excel process must be scheduled to run while
the VBA process waits.

Moreover, even if Sheet1!A1:A50000 is not in ascending order, it is
probably advantageous to take the time to sort dataX below, if we
implemented a binary search (not included below).

So at a minimum (without the binary search), your VBA algorithm should
be:

Dim dataX, dataY
dataX = Worksheets(1).Range("A1:B50000")
dataY = Worksheets(2).Range("A1:A50000")
redim resultY(1 to 50000, 1)
for k1 = 1 to 50000: for k2 = 1 to 50000
if dataY(k1,1) = dataX(k2,1) then
resultY(k1,2) = dataX(k2,2)
exit for
end if
next: next
Worksheets(2).Range("B1:B50000") = resultY

(Note that I reversed the order of the for-loops. It would make a
difference if you implement a binary search of Sheet1!A1:A50000.)

Other improvements are possible and probably desirable; for example,
avoiding the hardcoded ranges. That is why I use Redim instead of Dim
for resultY.

By reducing the interprocess communications (copying of data between
VBA and Excel), I would be very surprised if that VBA implementation
fails to outperform any of the alternatives mentioned so far: VBA
Match/Index, .Find, and Excel COUNTIF/INDEX/MATCH.

(Although the Excel alternative might be more efficient if IFERROR can
be used instead of COUNTIF.)

I might post relative performance data later, if I get around to it.
  #8   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 2,059
Default how to find union of two arrays

Errata....

On May 22, 4:14*pm, I wrote:
redim resultY(1 to 50000, 1)


Should be:

redim resultY(1 to 50000, 1 to 1)

I wrote:
resultY(k1,2) = dataX(k2,2)


Should be:

resultY(k1,1) = dataX(k2,2)

I wrote:
Every time you access an Excel range, VBA and Excel
processes much exchange information -- two interprocess
communications


Should be: "must exchange"

  #9   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 2,059
Default how to find union of two arrays

On May 22, 7:43*am, "Rick Rothstein"
put this formula in B1 on that other sheet
and copy it down...
=IF(COUNTIF(Sheet1!A$1:A$500,A1),
MATCH(A1,Sheet1!A$1:A$500,0),"")


Presumably you mean:

=IF(COUNTIF(Sheet1!A$1:A$50000,A1),
INDEX(Sheet1!B$1:B$50000,MATCH(A1,Sheet1!A$1:A$500 00,0)),"")

Or

=IF(COUNTIF(Sheet1!A$1:A$50000,A1),
VLOOKUP(A1,Sheet1!A$1:B$50000,2,0),"")

Since COUNTIF will always do 50000 comparisons, on average this does
75000 comparisons for each of 50000 entries in Sheet2!A1:A50000, a
total of 3.75 billion comparisions versus 1.25 billion for Andrew's
algorithm (corrected).

Nonetheless, this __might__ be an improvement because it avoids the
interprocess communication and process scheduling time between VBA and
Excel. (But I would be surprised.)
  #10   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 2,059
Default how to find union of two arrays

On May 22, 7:07*am, isabelle wrote:
Sub Macro1()
Dim Wks1 As Worksheet, Wks2 As Worksheet, k2 As Integer
Set Wks1 = Worksheets(1)
Set Wks2 = Worksheets(2)

[....]
Set Wks1 = Nothing
Set Wks2 = Nothing
End Sub


Why set Wks1 and Wks2 to Nothing before exiting the macro?

Since they are non-static local variables, I would expect VBA to
effectively do the same thing automagically.

Is there a memory leak in some revisions of VBA?


  #11   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 587
Default how to find union of two arrays

hi joeu2004,

Is there a memory leak ?

yes there are

in some revisions of VBA?

it's valid for all versions of VBA

--
isabelle

  #12   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3,514
Default how to find union of two arrays

joeu2004 was thinking very hard :
On May 22, 7:07*am, isabelle wrote:
Sub Macro1()
Dim Wks1 As Worksheet, Wks2 As Worksheet, k2 As Integer
Set Wks1 = Worksheets(1)
Set Wks2 = Worksheets(2)

[....]
Set Wks1 = Nothing
Set Wks2 = Nothing
End Sub


Why set Wks1 and Wks2 to Nothing before exiting the macro?

Since they are non-static local variables, I would expect VBA to
effectively do the same thing automagically.

Is there a memory leak in some revisions of VBA?


My understanding of how this works is that it's just good programming
practice to do this because even though the objects are destroyed when
the procedure ends, the memory space that they occupied in the Stack
still persists. I may be wrong, but this is how it was explained to me
by the leading minds in Excel application development.

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc


  #13   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 2,059
Default how to find union of two arrays

On May 22, 4:14 pm, joeu2004 wrote:
Moreover, even if Sheet1!A1:A50000 is not in ascending
order, it is probably advantageous to take the time to
sort dataX below, if we implemented a binary search


No! A selection sort requires nearly 1.250 billion (10^9)
comparisions, overwhelming any benefit of the binary search.

Excel sort can do that in a blink of an eye. But a VBA implementation
takes "forever". Presumably VBA is significantly slower because it is
interpreted, not compiled.

I wrote:
By reducing the interprocess communications (copying
of data between VBA and Excel), I would be very surprised
if that VBA implementation fails to outperform any of the
alternatives mentioned so far: VBA Match/Index, .Find,
and Excel COUNTIF/INDEX/MATCH.


Color me "surprised"! When I made that comment, I forgot to account
for the fact that VBA is interpreted, not compiled.

Nonetheless, on my computer, a VBA binary search (50,000 times through
50,000 elements) is nearly 2.4 times faster than Isabelle's VBA Match
algorithm (significantly improved and modified to use a binary
search), and nearly 800 to 1200 times faster than Ron's VBA Find
algorithm and Rick's Excel COUNTIF/MATCH formula (improved to use
VLOOKUP and a binary search).

However, Rick's Excel formula can be nearly 3.7 times faster than the
VBA implementation if we use a helper column for the VLOOKUP (with a
binary search) and IF(ISERROR) instead of COUNTIF. In XL2007 and
later, I suspect we would get comparable results with IFERROR(VLOOKUP)
without the need for a helper column.

The improved VBA linear search is 1.8 to 2.2 times faster than Ron's
VBA Find algorithm and Rick's Excel COUNTIF/MATCH formula (improved to
use VLOOKUP, but a linear search).

However, Isabelle's VBA Match algorithm (significantly improved, but a
linear search) is nearly 5.8 times faster than the improved VBA linear
search. Even Isabelle's obviously inefficient implementation (doing
the Match twice) is nearly 2.9 times faster.

And again, Rick's Excel formula can be nearly 6 times faster than the
improved VBA linear search if we use a helper column for the VLOOKUP
(with a linear search).

YMMV significantly for a variety of reasons. Since my improved VBA
linear search algorithm runs nearly 7 times longer than what Andrew
claims ("45 seconds"), I suspect that my system is slow because of
small memory, small cache or both relative to more modern systems.

Also, my CPU is single-core and single-threaded (instruction
"threads", not process threads). At first, I did not think that would
be a factor, since VBA does not take advantage of multiple cores and
instruction threads. However, Excel can, perhaps favoring Rick's
approach. And I suspect it might improve Isabelle's and Ron's
approaches for similar reasons; but that is based on speculation.
  #14   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 84
Default how to find union of two arrays


"Andrew" wrote in message

This loop has to execute 50000^2 loops, 2.5 billion loops.
Let's say that each iteration requires 20 CPU clock
cycles, that's 50 billion clock cycles.


You've been given many suggestions, particularly about taking advantage of
using Excel's built in functions. However which ever approach, with this
type of thing it's really worthwhile to start with both arrays sorted,
typically that's a one off task, or not much extra work when adding small
amounts of data.

Once both sorted, search for the first matching item (and further
duplicates). Then search the second item starting at index+1 of the last
item found. Obviously in any loop after finding a matching item, as soon as
the next item doesn't match (ie after any duplicates) exit that loop and
start the next. You should end up with a fraction of the amount of working
you are doing now.

If regularly trying to find individual items in a large sorted array, search
"binary chop", which will find matches in negligible time even in large
arrays.

Peter Thornton

  #15   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 5,934
Default how to find union of two arrays

put this formula in B1 on that other sheet
and copy it down...
=IF(COUNTIF(Sheet1!A$1:A$500,A1),
MATCH(A1,Sheet1!A$1:A$500,0),"")


Presumably you mean:

=IF(COUNTIF(Sheet1!A$1:A$50000,A1),
INDEX(Sheet1!B$1:B$50000,MATCH(A1,Sheet1!A$1:A$500 00,0)),"")

Or

=IF(COUNTIF(Sheet1!A$1:A$50000,A1),
VLOOKUP(A1,Sheet1!A$1:B$50000,2,0),"")


I can't believe I forgot to included it... I was going for the INDEX formula
version. Thanks for catching that.

Rick Rothstein (MVP - Excel)



  #16   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 5,934
Default how to find union of two arrays

I really like reading your messages... the amount of analysis you are
willing to perform, and then write up for us to read, is simply
awe-inspiring. Thank you for always taking the time to do this.

Rick Rothstein (MVP - Excel)

  #17   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 2,059
Default how to find union of two arrays

On May 22, 7:35*pm, isabelle wrote:
[joeu2004 wrote:]
Is there a memory leak ?

yes there are

in some revisions of VBA?

it's valid for all versions of VBA


I do not notice any such memory leak in XL2003 SP3 with VBA 6.5.1024.

I have only Task Manager Mem Usage and VM Size statistics to look at.
If there is a kernel function that returns the current process's
memory usage, please let me know. I was unable to find one.

I tried the module below. After the MsgBox appears, allow 1-2 seconds
for Task Manager to update.

With #Const globDef set to False (local ws array), both Mem Usage and
VM Size increases by about 4 MB. After exiting the procedure, the Mem
Usage and VM Size reverts to their previous values.

Also note that setting the ws array to Nothing does __not__ reduce
memory usage. Moreover, it is not even necessary to set the ws to
Worksheets(1). The local declaration alone is sufficient to increase
Mem Usage and VM Size.

This is consistent with my understanding that the Dim statement alone
causes the allocation of the memory for the ws array.

(Aside.... I also tried changing Dim to ReDim, with the same
results.)

With #Const globDef set to True (non-local ws array), both Mem Usage
and VM Size increases by about 4 MB the first time the procedure is
executed.

In this case, after exiting the procedure, Mem Usage and VM Size
remains at that level, about 4 MB over their previous values. That is
to be expected, since global storage __should__ persist between
procedure calls.

And again note that setting the ws array to Nothing does __not__
reduce memory usage. And again note that it is not even necessary to
set ws to Worksheets(1).

Mem Usage and VM Size does revert to their previous values when I
remove the global ws array declaration, for example by setting #Const
globDef to False. The memory reduction happens almost immediately,
after waiting 1-2 seconds for Task Manager to update.

(Aside.... I would expect a local Static declaration to behave the
same way as a non-local declaration. But I did not test it. "The
exercise is left to the student" ;-.)

-----

#Const globDef = False

Private Declare Function GetCurrentProcessId _
Lib "kernel32" () As Long

#If globDef Then
Dim ws(1 To 1000000) As Worksheet
#End If

Sub testit()
#If Not globDef Then
Dim ws(1 To 1000000) As Worksheet
#End If
For i = 1 To UBound(ws)
Set ws(i) = Worksheets(1)
Next
MsgBox GetCurrentProcessId() & " set to Worksheets"
For i = 1 To UBound(ws)
Set ws(i) = Nothing
Next
MsgBox GetCurrentProcessId() & " set to Nothing"
End Sub
  #18   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3,514
Default how to find union of two arrays

<FWIW
Here's a snippet from Professional Excel Development Ch.07 that
explains how memory leaks occur, and why it's considered *good
practice* to set objects we're done with to *= Nothing*.


<Quote
"...Normally when you overwrite an object in VBA, VBA cleans up the old
version of the object and reclaims the memory that was used to hold it.
You can also set an object equal to Nothing to reclaim the memory used
by it. It is good practice to do this explicitly when you no longer
need an object, rather than relying on VBA to do it.



Set gclsCells = Nothing



When you create two objects that store references to each other the
system will no longer reclaim the memory they used when they are set to
new versions or when they are set to Nothing. When analyzing the
worksheet in Analysis5.xls with 574 cells in the used range, there is a
loss of about 250KB of RAM each time CreateCellsCollection is executed
during an Excel session.

NOTE

If you are running Windows NT, 2000 or XP you can check the amount of
RAM currently used by Excel by pressing Ctrl+Shift+Esc to display the
Processes window in Task Manager and examining the Mem Usage column for
the row where the Image Name column is EXCEL.EXE.



One way to avoid this problem is to make sure you remove the cross
references from the linked objects before the objects are removed. You
can do this by adding a method like the Terminate method shown in
Listing 7-15 to the problem classes, in our case the CCell class.



Listing 7-15

The Terminate Method in the CCell Class Module

Public Sub Terminate()

Set mclsParent = Nothing

End Sub



The code in Listing 7-16 is added to the CCells class module. It calls
the Terminate method of each Cell class contained in the collection to
destroy the cross reference between the classes.



Listing 7-16

The Terminate Method in the CCells Class Module

Public Sub Terminate()

Dim clsCell As CCell

For Each clsCell In mcolCells

clsCell.Terminate

Set clsCell = Nothing

Next clsCell

Set mcolCells = Nothing

End Sub..."
</Quote

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc


  #19   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3,514
Default how to find union of two arrays

Peter T pretended :
"Andrew" wrote in message

This loop has to execute 50000^2 loops, 2.5 billion loops.
Let's say that each iteration requires 20 CPU clock
cycles, that's 50 billion clock cycles.


You've been given many suggestions, particularly about taking advantage of
using Excel's built in functions. However which ever approach, with this type
of thing it's really worthwhile to start with both arrays sorted, typically
that's a one off task, or not much extra work when adding small amounts of
data.

Once both sorted, search for the first matching item (and further
duplicates). Then search the second item starting at index+1 of the last item
found. Obviously in any loop after finding a matching item, as soon as the
next item doesn't match (ie after any duplicates) exit that loop and start
the next. You should end up with a fraction of the amount of working you are
doing now.

If regularly trying to find individual items in a large sorted array, search
"binary chop", which will find matches in negligible time even in large
arrays.

Peter Thornton


Peter,
I think the time it takes lies in the fact that *Arrays* aren't being
compared, but rather the loops read the ranges directly.

If the 2 ranges were dumped into arrays where at least 1 array was
sorted, a binary search would cut the processing time considerably.
Better, though, (as you say) that both arrays are sorted. The
bottleneck is trying to loop the ranges directly while doing this in
memory would run orders of magnitude faster.<IMO

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc


  #20   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 130
Default how to find union of two arrays

On May 23, 5:42*pm, GS wrote:
Peter T pretended :



"Andrew" wrote in message


This loop has to execute 50000^2 loops, 2.5 billion loops.
Let's say that each iteration requires 20 CPU clock
cycles, that's 50 billion clock cycles.


You've been given many suggestions, particularly about taking advantage of
using Excel's built in functions. However which ever approach, with this type
of thing it's really worthwhile to start with both arrays sorted, typically
that's a one off task, or not much extra work when adding small amounts of
data.


Once both sorted, search for the first matching item (and further
duplicates). Then search the second item starting at index+1 of the last item
found. Obviously in any loop after finding a matching item, as soon as the
next item doesn't match (ie after any duplicates) exit that loop and start
the next. You should end up with a fraction of the amount of working you are
doing now.


If regularly trying to find individual items in a large sorted array, search
"binary chop", which will find matches in negligible time even in large
arrays.


Peter Thornton


Peter,
I think the time it takes lies in the fact that *Arrays* aren't being
compared, but rather the loops read the ranges directly.

If the 2 ranges were dumped into arrays where at least 1 array was
sorted, a binary search would cut the processing time considerably.
Better, though, (as you say) that both arrays are sorted. The
bottleneck is trying to loop the ranges directly while doing this in
memory would run orders of magnitude faster.<IMO

--
Garry

Free usenet access athttp://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc


There's some very good information in this thread. Thank you. I have
yet to try these find and match methods. Although I did use VLOOKUP
for a search and reduced the search time (for the same 50,000 rows of
data) down to 3 seconds. In using VLOOKUP I used named ranges for the
first two input arguments, the search parameter and the lookup table.
Then I tried to do the same thing using arrays instead of ranges. It
took at least one hundred times longer. I would think that it would
be faster to sort through an array than a range, but clearly this is
not true.

thanks again,
Andrew


  #21   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3,514
Default how to find union of two arrays

After serious thinking Andrew wrote :
I would think that it would
be faster to sort through an array than a range, but clearly this is
not true.


Well, that depends on how you do it. Clearly there are fast ways and
slow ways, and so if it took as long as you say it did then I suspect
you used a slow way. For sure, if at least one of your arrays wasn't
sorted, it would take a significant amount of time to process. Far less
time if 1 array is sorted. Orders of magnitude less time if both arrays
are sorted.

Regardless, built-in Excel functions will almost always perform much
faster than VBA will and so if Rick's suggestion works for you I'd be
inclined to use it over VBA.

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc


  #22   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 84
Default how to find union of two arrays


"GS" wrote in message
Peter T pretended :
"Andrew" wrote in message

This loop has to execute 50000^2 loops, 2.5 billion loops.
Let's say that each iteration requires 20 CPU clock
cycles, that's 50 billion clock cycles.


You've been given many suggestions, particularly about taking advantage
of using Excel's built in functions. However which ever approach, with
this type of thing it's really worthwhile to start with both arrays
sorted, typically that's a one off task, or not much extra work when
adding small amounts of data.

Once both sorted, search for the first matching item (and further
duplicates). Then search the second item starting at index+1 of the last
item found. Obviously in any loop after finding a matching item, as soon
as the next item doesn't match (ie after any duplicates) exit that loop
and start the next. You should end up with a fraction of the amount of
working you are doing now.

If regularly trying to find individual items in a large sorted array,
search "binary chop", which will find matches in negligible time even in
large arrays.

Peter Thornton


Peter,
I think the time it takes lies in the fact that *Arrays* aren't being
compared, but rather the loops read the ranges directly.

If the 2 ranges were dumped into arrays where at least 1 array was sorted,
a binary search would cut the processing time considerably. Better,
though, (as you say) that both arrays are sorted. The bottleneck is trying
to loop the ranges directly while doing this in memory would run orders of
magnitude faster.<IMO


Garry,

Indeed, reading cells is considerably longer than reading array values,
though there will be some overhead initially reading 2x50k cells to a array
or pair of arrays.

It really is important to sort the first array, whether using built-in Excel
functions or VBA. Also if using VBA extremely worthwhile to sort the 2nd
array. That's because for each cell in the 2nd array, code will only need to
look from the index+1 of the last match to the first item that doesn't
match, so much less work than available to Excel's built-in functions.
Despite Excel's normally faster methods, it's difficult to take advantage of
that approach in Excel (although could be done) so VBA might be
easier/faster.

As for sorting, as you say in your adjacent post "that depends on how you do
it". There are some extremely fast sort methods out there for VBA, some
compare well if not better than Excel's built-in method (excluding the time
to transfer from cells to array to cells), particularly if the data is all
say doubles.

However to keep things simple, maybe get VBA to leverage Excel's sort, then
compare the sorted data in VBA as above. Without testing not sure which'd be
fastest, VBA+Excel or just Excel.

Peter T


  #23   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 84
Default how to find union of two arrays


"GS" wrote in message ...
<FWIW
Here's a snippet from Professional Excel Development Ch.07 that explains
how memory leaks occur, and why it's considered *good practice* to set
objects we're done with to *= Nothing*.


<Quote
"...Normally when you overwrite an object in VBA, VBA cleans up the old
version of the object and reclaims the memory that was used to hold it.
You can also set an object equal to Nothing to reclaim the memory used by
it. It is good practice to do this explicitly when you no longer need an
object, rather than relying on VBA to do it.



Set gclsCells = Nothing


Not sure I entirely agree, or rather it depends. There may be good reasons
to do that for other objectives, eg done with the object and want to test if
Nothing later. Also probably a good idea to clean up globals as you go when
done.

However simply good practice for the sake of being sure the object reference
is destroyed, perhaps not. Ideally there should never be any doubt about
when and where the reference will looses scope. VB/A's builtin garbage
collection works very efficiently, and explicitly setting to Nothing is
unnecessary work.

Regards,
Peter T

  #24   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3,514
Default how to find union of two arrays

Peter T wrote on 5/24/2011 :
However to keep things simple, maybe get VBA to leverage Excel's sort, then
compare the sorted data in VBA as above. Without testing not sure which'd be
fastest, VBA+Excel or just Excel.


Peter,
This is what I usually do with large arrays as I find Excel's Sort much
faster than any array sort. I simply dump the array into a temp wks,
sort the data, dump the data back into the array, and delete the wks
after I'm done with it. Works a treat.

Otherwise, I use VB[A] methods as described in PED ch17 for
sorting/matching in arrays.

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc


  #25   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 2,059
Default how to find union of two arrays

On May 23, 2:01*pm, GS wrote:
Here's a snippet from Professional Excel Development
Ch.07 that explains how memory leaks occur, and why
it's considered *good practice* to set objects we're
done with to *= Nothing*.


That's a matter of opinion, just as some people might say it is "good
practice" to explicitly initialize non-static local variables to zero
in a procedure instead of relying on the fact they are initialized to
zero each time a procedure is entered.

The development guide describes a complex situation where, I would
agree, it is necessary to set an object variable to Nothing, namely
objects that reference other objects.

But I was asking about a very limited case, namely: setting the
object variable to Nothing just before exiting a procedure, when the
object variable is a non-static local variable.

In that case, VBA help explains:

"When several object variables refer to the same object, memory and
system resources associated with the object to which the variables
refer are released only after all of them have been set to Nothing,
either explicitly using Set, __or_implicitly__ after the last object
variable set to Nothing goes out of scope."

Exiting a procedure is an example of non-static local variable going
out of scope. In that case, VBA should implicitly set the object
variable to Nothing, decreasing the object reference count. If that
causes the reference count to go to zero, the system should free the
memory associated with it.

When Isabelle mentioned a "memory leak", I was thinking of a systemic
memory leak (defect in VBA), not a program memory leak (programming
error).


  #26   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 2,059
Default how to find union of two arrays

On May 23, 11:56*am, joeu2004 wrote:
I do not notice any such memory leak in XL2003 SP3 with
VBA 6.5.1024. [....] I tried the module below.


Not good enough. That's not the memory leak we are concerned with.
  #27   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3,514
Default how to find union of two arrays

Excellent example of the snippet I posted earlier!

I must re-iterate Rick's compliment to you about the effort and
research you do to provide such detailed replies. Most awesome! I hope
the casual posters appreciate those details as much as some of us
regulars do.

Personally, I was satisfied with the first sentence of your earlier
post:

"That's a matter of opinion..."

Some people may just decide to make it their personal rule that any
objects they create will be set "= Nothing" when done with so they are
covered for all situations, *regardless of context*. That, of course,
is entirely their choice.<g

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc


  #28   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 2,059
Default how to find union of two arrays

On May 25, 2:24 pm, "Peter T" wrote:
I haven't fully followed this thread so apologies in
advance if I've missed something, however I don't follow
how this example relates to Isabelle's wanting to explicitly
destroy worksheet references.


You may be right. In a last-minute effort to simplify things, I
removed optional code that I believe would have related to Isabelle's
example better.

The VBA help page states: "memory and system resources associated
with the object to which the variables refer are released only after
all of them have been set to Nothing, either explicitly using Set, or
implicitly after the last object variable set to Nothing goes out of
scope."

I believe that says that non-static local object variables are
implicitly set to Nothing when a procedure is exited.

The better demonstration of that fact would be if in testit1st, I set
the global object variable "a" to Nothing explicitly some time after
returning from testit2, and use Task Manager to show that the memory
is freed.

That could only happen if the reference count for the object went to
zero at that point, which could only happen if the testit2 reference
were implicitly decreased when leaving the testit2 scope by implicitly
setting the local object variable "a" to Nothing.

In fact, I had done exactly that in one version of the example. "The
exercise is left to the student".

Not a very good example, I agree. I could clean all this up. But I
think I'm beating a dead horse at this point.

You wrote:
FWIW issues can arise with classes and circular
references (classes refer to each other), then they can
be difficult if not impossible to tear down and release
memory, but that's a different matter.


Yes it is. Nonetheless, I had intended to demonstrate that very
situation with a separate example anyway, primarily for GS's benefit.
I forgot.

You wrote:
if the reference was to a sheet in an automated instance,
failure to release could leave the Excel instance hanging
in memory.


Can you explain that with an example? I do not understand the
terminology "automated instance" and how that can lead to a memory
leak.

One example that I believe would "leave the Excel instance hanging in
memory" is if we Set a global object variable to a Worksheet or Range
object and neglect to Set the global object variable to Nothing.

That was really the point of the last step in my testit1st macro,
which goes beyond the question I had for Isabelle. It is certainly
the programmer's responsibility to know when it is necessary to Set a
global object variable to Nothing. It depends on the program design.

But that is not the kind of memory leak I thought Isabelle is
referring to. Since the VBA help documentation explains that we can
expect that exiting from the scope effectively sets non-static local
object variables to Nothing (i.e. decrease the object reference
count), I thought Isabelle might be referring to a __VBA__ defect, not
a programming error.

In any case, I believe that you and I are in "violent agreement" not
only on when we __must__ use Set=Nothing, but also when it is really
not necessary, notwithstanding some people's idea of "good practice".

  #29   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 84
Default how to find union of two arrays


"joeu2004" wrote in message
You wrote:
if the reference was to a sheet in an automated instance,
failure to release could leave the Excel instance hanging
in memory.


Can you explain that with an example? I do not understand the
terminology "automated instance" and how that can lead to a memory
leak.


"Automation" is where code (in particular object variables) in one app
refers to another application, whether after creating a new instance or
grabbing an existing one. All code in say VB6 doing anything with Excel is
"Automation", and any instance of Excel so referenced is an "automated
instance".

One example that I believe would "leave the Excel instance hanging in
memory" is if we Set a global object variable to a Worksheet or Range
object and neglect to Set the global object variable to Nothing.


*Only* if the reference is to an automated instance, eg code in Excel1
refers to Excel2, otherwise no problem (good practice is another matter).
Try this -

Private mWs As Worksheet
Sub Test1()
Dim xl As Excel.Application
Dim wb As Workbook
Set xl = New Excel.Application
Set wb = xl.Workbooks.Add
Set mWs = wb.Worksheets(1)
xl.Quit
End Sub
Sub CleanUp()
Set mWs = Nothing
End Sub

Look at Processes in Task Manager (maybe sort "Image Name"), step through
Test1 and see the 2nd Excel instance appear, and remain after doing xl.Quit.
Step through CleanUp and see the 2nd instance removed as mWs is destroyed.
Actually before doing CleanUp stop or break on 'end sub' and look at mWs in
Locals.

Now try this -
Sub Test2()
Dim wb As Workbook
Set wb = Workbooks.Add
Set mWs = wb.Worksheets(1)
wb.Close False
End Sub

See the difference with mWs this time in Locals. mWs retains the now
redundant pointer but that's all, no memory leak. However it's not good
practice, eg "mWs = Not Nothing" would give a misleading result. Of course
in this example no need to explicity release 'wb'.


In any case, I believe that you and I are in "violent agreement" not
only on when we __must__ use Set=Nothing, but also when it is really
not necessary, notwithstanding some people's idea of "good practice".


Violent, ?, -:) But sure, I think we said pretty much the same in response
to Garry's quote from the book.

Regards,
Peter T

Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
Find Min and Max in variable length arrays Dave Excel Programming 1 June 3rd 10 06:22 AM
Compare two arrays in VBA to find dupes John Michl Excel Programming 2 February 6th 07 05:34 PM
Arrays - declaration, adding values to arrays and calculation Maxi[_2_] Excel Programming 1 August 17th 06 04:13 PM
UNION of Arrays - is possible? Marina Limeira Excel Discussion (Misc queries) 1 January 22nd 06 12:38 PM
Where can I find a template for a Credit Union account? Bill Sisk Excel Discussion (Misc queries) 1 August 13th 05 01:43 PM


All times are GMT +1. The time now is 06:41 AM.

Powered by vBulletin® Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright ©2004-2024 ExcelBanter.
The comments are property of their posters.
 

About Us

"It's about Microsoft Excel"