Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 130
Default how to find union of two arrays

Hello,
Does anyone know a means of comparing two very large arrays (50000
elements) to determine which elements are common to both arrays? I've
tried using looping functions and that is far too slow.
  #2   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 587
Default how to find union of two arrays

hi,

i don't know why your looping is far too slow, it should not, can you show your code

--
isabelle

Le 2011-05-21 11:43, Andrew a écrit :
Hello,
Does anyone know a means of comparing two very large arrays (50000
elements) to determine which elements are common to both arrays? I've
tried using looping functions and that is far too slow.

  #3   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 130
Default how to find union of two arrays

On May 21, 7:43*pm, isabelle wrote:
hi,

i don't know why your looping is far too slow, it should not, can you show your code

--
isabelle

Le 2011-05-21 11:43, Andrew a écrit :







Hello,
Does anyone know a means of comparing two very large arrays (50000
elements) to determine which elements are common to both arrays? *I've
tried using looping functions and that is far too slow.


Isabel,
Imagine one range (A) with 50,000 rows, 2 columns, a second range with
50,000 rows (B) 1 column. I want to find all the elements in B which
have matches in column 1 of A. For each match, I want to copy the
corresponding entry in A column 2 over to a new column 2 in B.

Example.
A=
1 a
2 f
3 u
4 w
5 q

B=
5
1
6
8
9


After running the code, B would be transformed to:
B=
5 q
1 a
8
9
6

To do this I create the following loops.

For k1 = 1 to 50000
dataX=worksheets(1).cells(k1,1)
For k2=1 to 50000
dataY= worksheets(2).cells(k2,1)

If dataX=dataY then
worksheets(1).cells(k1,2)=worksheets(2).cells(k2,2 )
end if

next
next

This code is accurate, but terrribly slow.

This loop has to execute 50000^2 loops, 2.5 billion loops. Let's say
that each iteration requires 20 CPU clock cycles, that's 50 billion
clock cycles. My CPU has a 2.5 GHz processor, so this ideally would
take 20 seconds. It takes more than that, about 45 seconds. If I
were doing this in Matlab or C, I could process this much data in less
than a second, but not because the CPU is faster. There are other
programming constructs available which are faster. I know Excel can
run faster than this, but I don't know any tricks on how to speed this
up.

Any ideas?

  #4   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 587
Default how to find union of two arrays

hi Andrew,

with the functions "Index" and "Match" you can reduce the number of loop


Code:
Sub Macro1()
Dim Wks1 As Worksheet, Wks2 As Worksheet, k2 As Integer
Set Wks1 = Worksheets(1)
Set Wks2 = Worksheets(2)
For k2 = 1 To 5
  If Not IsError(Application.Match(Wks2.Cells(k2, 1), Wks1.Range("A:A"), 0)) Then
   Wks2.Cells(k2, 2) = Application.Index(Wks1.Range("B:B"), Application.Match(Wks2.Cells(k2, 1), Wks1.Range("A:A"), 0))
  End If
Next
Set Wks1 = Nothing
Set Wks2 = Nothing
End Sub

--
isabelle

Le 2011-05-22 09:34, Andrew a écrit :
On May 21, 7:43 pm, wrote:
hi,

i don't know why your looping is far too slow, it should not, can you show your code

--
isabelle

Le 2011-05-21 11:43, Andrew a écrit :







Hello,
Does anyone know a means of comparing two very large arrays (50000
elements) to determine which elements are common to both arrays? I've
tried using looping functions and that is far too slow.


Isabel,
Imagine one range (A) with 50,000 rows, 2 columns, a second range with
50,000 rows (B) 1 column. I want to find all the elements in B which
have matches in column 1 of A. For each match, I want to copy the
corresponding entry in A column 2 over to a new column 2 in B.

Example.
A=
1 a
2 f
3 u
4 w
5 q

B=
5
1
6
8
9


After running the code, B would be transformed to:
B=
5 q
1 a
8
9
6

To do this I create the following loops.

For k1 = 1 to 50000
dataX=worksheets(1).cells(k1,1)
For k2=1 to 50000
dataY= worksheets(2).cells(k2,1)

If dataX=dataY then
worksheets(1).cells(k1,2)=worksheets(2).cells(k2,2 )
end if

next
next

This code is accurate, but terrribly slow.

This loop has to execute 50000^2 loops, 2.5 billion loops. Let's say
that each iteration requires 20 CPU clock cycles, that's 50 billion
clock cycles. My CPU has a 2.5 GHz processor, so this ideally would
take 20 seconds. It takes more than that, about 45 seconds. If I
were doing this in Matlab or C, I could process this much data in less
than a second, but not because the CPU is faster. There are other
programming constructs available which are faster. I know Excel can
run faster than this, but I don't know any tricks on how to speed this
up.

Any ideas?

  #5   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 2,059
Default how to find union of two arrays

On May 22, 7:07*am, isabelle wrote:
Sub Macro1()
Dim Wks1 As Worksheet, Wks2 As Worksheet, k2 As Integer
Set Wks1 = Worksheets(1)
Set Wks2 = Worksheets(2)

[....]
Set Wks1 = Nothing
Set Wks2 = Nothing
End Sub


Why set Wks1 and Wks2 to Nothing before exiting the macro?

Since they are non-static local variables, I would expect VBA to
effectively do the same thing automagically.

Is there a memory leak in some revisions of VBA?


  #6   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 587
Default how to find union of two arrays

hi joeu2004,

Is there a memory leak ?

yes there are

in some revisions of VBA?

it's valid for all versions of VBA

--
isabelle

  #7   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3,514
Default how to find union of two arrays

joeu2004 was thinking very hard :
On May 22, 7:07*am, isabelle wrote:
Sub Macro1()
Dim Wks1 As Worksheet, Wks2 As Worksheet, k2 As Integer
Set Wks1 = Worksheets(1)
Set Wks2 = Worksheets(2)

[....]
Set Wks1 = Nothing
Set Wks2 = Nothing
End Sub


Why set Wks1 and Wks2 to Nothing before exiting the macro?

Since they are non-static local variables, I would expect VBA to
effectively do the same thing automagically.

Is there a memory leak in some revisions of VBA?


My understanding of how this works is that it's just good programming
practice to do this because even though the objects are destroyed when
the procedure ends, the memory space that they occupied in the Stack
still persists. I may be wrong, but this is how it was explained to me
by the leading minds in Excel application development.

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc


  #8   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 5,934
Default how to find union of two arrays

Imagine one range (A) with 50,000 rows, 2 columns, a second range with
50,000 rows (B) 1 column. I want to find all the elements in B which
have matches in column 1 of A. For each match, I want to copy the
corresponding entry in A column 2 over to a new column 2 in B.

Example.
A=
1 a
2 f
3 u
4 w
5 q

B=
5
1
6
8
9


After running the code, B would be transformed to:
B=
5 q
1 a
8
9
6

To do this I create the following loops.

For k1 = 1 To 50000
dataX = Worksheets(1).Cells(k1, 1)
For k2 = 1 To 50000
dataY = Worksheets(2).Cells(k2, 1)

If dataX = dataY Then
Worksheets(1).Cells(k1, 2) = Worksheets(2).Cells(k2, 2)
End If

Next
Next

This code is accurate, but terrribly slow.

This loop has to execute 50000^2 loops, 2.5 billion loops. Let's say
that each iteration requires 20 CPU clock cycles, that's 50 billion
clock cycles. My CPU has a 2.5 GHz processor, so this ideally would
take 20 seconds. It takes more than that, about 45 seconds. If I
were doing this in Matlab or C, I could process this much data in less
than a second, but not because the CPU is faster. There are other
programming constructs available which are faster. I know Excel can
run faster than this, but I don't know any tricks on how to speed this
up.


Why use code at all? You can do this with straight Excel formulas. Assuming
your range "A" is located on Sheet1, Columns A and B, starting in Row 1; and
assuming your range "B" is located in Column A starting in Row 1 on any
other sheet, put this formula in B1 on that other sheet and copy it down...

=IF(COUNTIF(Sheet1!A$1:A$500,A1),MATCH(A1,Sheet1!A $1:A$500,0),"")

Rick Rothstein (MVP - Excel)

  #9   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 2,059
Default how to find union of two arrays

On May 22, 7:43*am, "Rick Rothstein"
put this formula in B1 on that other sheet
and copy it down...
=IF(COUNTIF(Sheet1!A$1:A$500,A1),
MATCH(A1,Sheet1!A$1:A$500,0),"")


Presumably you mean:

=IF(COUNTIF(Sheet1!A$1:A$50000,A1),
INDEX(Sheet1!B$1:B$50000,MATCH(A1,Sheet1!A$1:A$500 00,0)),"")

Or

=IF(COUNTIF(Sheet1!A$1:A$50000,A1),
VLOOKUP(A1,Sheet1!A$1:B$50000,2,0),"")

Since COUNTIF will always do 50000 comparisons, on average this does
75000 comparisons for each of 50000 entries in Sheet2!A1:A50000, a
total of 3.75 billion comparisions versus 1.25 billion for Andrew's
algorithm (corrected).

Nonetheless, this __might__ be an improvement because it avoids the
interprocess communication and process scheduling time between VBA and
Excel. (But I would be surprised.)
  #10   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 5,934
Default how to find union of two arrays

put this formula in B1 on that other sheet
and copy it down...
=IF(COUNTIF(Sheet1!A$1:A$500,A1),
MATCH(A1,Sheet1!A$1:A$500,0),"")


Presumably you mean:

=IF(COUNTIF(Sheet1!A$1:A$50000,A1),
INDEX(Sheet1!B$1:B$50000,MATCH(A1,Sheet1!A$1:A$500 00,0)),"")

Or

=IF(COUNTIF(Sheet1!A$1:A$50000,A1),
VLOOKUP(A1,Sheet1!A$1:B$50000,2,0),"")


I can't believe I forgot to included it... I was going for the INDEX formula
version. Thanks for catching that.

Rick Rothstein (MVP - Excel)



  #11   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 1,045
Default how to find union of two arrays

On Sun, 22 May 2011 06:34:35 -0700 (PDT), Andrew wrote:

On May 21, 7:43*pm, isabelle wrote:
hi,

i don't know why your looping is far too slow, it should not, can you show your code

--
isabelle

Le 2011-05-21 11:43, Andrew a écrit :







Hello,
Does anyone know a means of comparing two very large arrays (50000
elements) to determine which elements are common to both arrays? *I've
tried using looping functions and that is far too slow.


Isabel,
Imagine one range (A) with 50,000 rows, 2 columns, a second range with
50,000 rows (B) 1 column. I want to find all the elements in B which
have matches in column 1 of A. For each match, I want to copy the
corresponding entry in A column 2 over to a new column 2 in B.

Example.
A=
1 a
2 f
3 u
4 w
5 q

B=
5
1
6
8
9


After running the code, B would be transformed to:
B=
5 q
1 a
8
9
6

To do this I create the following loops.

For k1 = 1 to 50000
dataX=worksheets(1).cells(k1,1)
For k2=1 to 50000
dataY= worksheets(2).cells(k2,1)

If dataX=dataY then
worksheets(1).cells(k1,2)=worksheets(2).cells(k2, 2)
end if

next
next

This code is accurate, but terrribly slow.

This loop has to execute 50000^2 loops, 2.5 billion loops. Let's say
that each iteration requires 20 CPU clock cycles, that's 50 billion
clock cycles. My CPU has a 2.5 GHz processor, so this ideally would
take 20 seconds. It takes more than that, about 45 seconds. If I
were doing this in Matlab or C, I could process this much data in less
than a second, but not because the CPU is faster. There are other
programming constructs available which are faster. I know Excel can
run faster than this, but I don't know any tricks on how to speed this
up.

Any ideas?


If you don't want to do it with worksheet formulas, as suggested by Rick, I think the following code will only have to execute 50,000 loops, so should run quicker.

It does assume that your RangeA is two columns and RangeB is one column, as you wrote. It also assumes that the data in RangeA is not the result of formulas, but that can obviously be changed in the code.

The code also assumes that if there are any duplicates in RangeA column1, that you only want to return the first one for the corresponding RangeB entry.

===================================
Option Explicit
Sub RangeAtoB()
Dim arB As Variant
Dim i As Long
Dim c As Range
'since RangeB is one column, must make it two
arB = Range("RangeB").Resize(columnsize:=2)

With Range("RangeA").Resize(columnsize:=1)
For i = 1 To UBound(arB)
Set c = .Find(what:=arB(i, 1), _
after:=.Item(1), _
LookIn:=xlValues, _
lookat:=xlWhole)
If Not c Is Nothing Then arB(i, 2) = c.Offset(columnoffset:=1)
Next i
End With

Range("rangeB").Resize(columnsize:=2) = arB
End Sub
==============================
  #12   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 2,059
Default how to find union of two arrays

On May 22, 6:34*am, Andrew wrote:
For k1 = 1 to 50000
dataX=worksheets(1).cells(k1,1)
For k2=1 to 50000
dataY= worksheets(2).cells(k2,1)
If dataX=dataY then
worksheets(1).cells(k1,2)=worksheets(2).cells(k2,2 )
end if
next
next

[....]
This loop has to execute 50000^2 loops, 2.5 billion loops.

[....]
I don't know any tricks on how to speed this up.


Well, that depends on details of the data and algorithm that you have
not mentioned.

(Note: I will use Sheet1 for Worksheets(1) and Sheet2 for
Worksheets(2), although those might not be the actual worksheet
names.)

First, your algorithm continues to look for a match even after finding
one. Effectively, it finds the __last__ match. Is that a
requirement? Or is it sufficient to find the __first__ match?

If you want to find the last match, your inner for-loop can be
improved by doing For k2=50000 to 1 step -1.

In either case, the inner loop can stop after finding a match. So you
can put the following statement before End If: Exit For.

Thus, on average a match is found after 25000 comparisons instead of
50000, assuming unique data in both Sheet1!A1:A50000 and Sheet2!
A1:A50000.

That cuts the total iterations in half, down to 1.25 billion.

Second, in your example, Sheet1!A1:A50000 is in ascending order. If
that is always the case, you could use a binary search instead of
linear search. Thus, a match is found in at most 16 comparisions.

That cuts the total iterations down to 800,000 at most, with some
probability that it is much less.

Finally, it is much more efficient to copy Sheet1!A1:B50000 and Sheet2!
A1:A50000 into a Variant variable one time, and to retain the results
in a Variant variable and copy them to Sheet2!B1:B50000 one time.

The time savings can be huge. Every time you access an Excel range,
VBA and Excel processes much exchange information -- two interprocess
communications -- and the Excel process must be scheduled to run while
the VBA process waits.

Moreover, even if Sheet1!A1:A50000 is not in ascending order, it is
probably advantageous to take the time to sort dataX below, if we
implemented a binary search (not included below).

So at a minimum (without the binary search), your VBA algorithm should
be:

Dim dataX, dataY
dataX = Worksheets(1).Range("A1:B50000")
dataY = Worksheets(2).Range("A1:A50000")
redim resultY(1 to 50000, 1)
for k1 = 1 to 50000: for k2 = 1 to 50000
if dataY(k1,1) = dataX(k2,1) then
resultY(k1,2) = dataX(k2,2)
exit for
end if
next: next
Worksheets(2).Range("B1:B50000") = resultY

(Note that I reversed the order of the for-loops. It would make a
difference if you implement a binary search of Sheet1!A1:A50000.)

Other improvements are possible and probably desirable; for example,
avoiding the hardcoded ranges. That is why I use Redim instead of Dim
for resultY.

By reducing the interprocess communications (copying of data between
VBA and Excel), I would be very surprised if that VBA implementation
fails to outperform any of the alternatives mentioned so far: VBA
Match/Index, .Find, and Excel COUNTIF/INDEX/MATCH.

(Although the Excel alternative might be more efficient if IFERROR can
be used instead of COUNTIF.)

I might post relative performance data later, if I get around to it.
  #13   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 2,059
Default how to find union of two arrays

Errata....

On May 22, 4:14*pm, I wrote:
redim resultY(1 to 50000, 1)


Should be:

redim resultY(1 to 50000, 1 to 1)

I wrote:
resultY(k1,2) = dataX(k2,2)


Should be:

resultY(k1,1) = dataX(k2,2)

I wrote:
Every time you access an Excel range, VBA and Excel
processes much exchange information -- two interprocess
communications


Should be: "must exchange"

  #14   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 2,059
Default how to find union of two arrays

On May 22, 4:14 pm, joeu2004 wrote:
Moreover, even if Sheet1!A1:A50000 is not in ascending
order, it is probably advantageous to take the time to
sort dataX below, if we implemented a binary search


No! A selection sort requires nearly 1.250 billion (10^9)
comparisions, overwhelming any benefit of the binary search.

Excel sort can do that in a blink of an eye. But a VBA implementation
takes "forever". Presumably VBA is significantly slower because it is
interpreted, not compiled.

I wrote:
By reducing the interprocess communications (copying
of data between VBA and Excel), I would be very surprised
if that VBA implementation fails to outperform any of the
alternatives mentioned so far: VBA Match/Index, .Find,
and Excel COUNTIF/INDEX/MATCH.


Color me "surprised"! When I made that comment, I forgot to account
for the fact that VBA is interpreted, not compiled.

Nonetheless, on my computer, a VBA binary search (50,000 times through
50,000 elements) is nearly 2.4 times faster than Isabelle's VBA Match
algorithm (significantly improved and modified to use a binary
search), and nearly 800 to 1200 times faster than Ron's VBA Find
algorithm and Rick's Excel COUNTIF/MATCH formula (improved to use
VLOOKUP and a binary search).

However, Rick's Excel formula can be nearly 3.7 times faster than the
VBA implementation if we use a helper column for the VLOOKUP (with a
binary search) and IF(ISERROR) instead of COUNTIF. In XL2007 and
later, I suspect we would get comparable results with IFERROR(VLOOKUP)
without the need for a helper column.

The improved VBA linear search is 1.8 to 2.2 times faster than Ron's
VBA Find algorithm and Rick's Excel COUNTIF/MATCH formula (improved to
use VLOOKUP, but a linear search).

However, Isabelle's VBA Match algorithm (significantly improved, but a
linear search) is nearly 5.8 times faster than the improved VBA linear
search. Even Isabelle's obviously inefficient implementation (doing
the Match twice) is nearly 2.9 times faster.

And again, Rick's Excel formula can be nearly 6 times faster than the
improved VBA linear search if we use a helper column for the VLOOKUP
(with a linear search).

YMMV significantly for a variety of reasons. Since my improved VBA
linear search algorithm runs nearly 7 times longer than what Andrew
claims ("45 seconds"), I suspect that my system is slow because of
small memory, small cache or both relative to more modern systems.

Also, my CPU is single-core and single-threaded (instruction
"threads", not process threads). At first, I did not think that would
be a factor, since VBA does not take advantage of multiple cores and
instruction threads. However, Excel can, perhaps favoring Rick's
approach. And I suspect it might improve Isabelle's and Ron's
approaches for similar reasons; but that is based on speculation.
  #15   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 5,934
Default how to find union of two arrays

I really like reading your messages... the amount of analysis you are
willing to perform, and then write up for us to read, is simply
awe-inspiring. Thank you for always taking the time to do this.

Rick Rothstein (MVP - Excel)



  #16   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 84
Default how to find union of two arrays


"Andrew" wrote in message

This loop has to execute 50000^2 loops, 2.5 billion loops.
Let's say that each iteration requires 20 CPU clock
cycles, that's 50 billion clock cycles.


You've been given many suggestions, particularly about taking advantage of
using Excel's built in functions. However which ever approach, with this
type of thing it's really worthwhile to start with both arrays sorted,
typically that's a one off task, or not much extra work when adding small
amounts of data.

Once both sorted, search for the first matching item (and further
duplicates). Then search the second item starting at index+1 of the last
item found. Obviously in any loop after finding a matching item, as soon as
the next item doesn't match (ie after any duplicates) exit that loop and
start the next. You should end up with a fraction of the amount of working
you are doing now.

If regularly trying to find individual items in a large sorted array, search
"binary chop", which will find matches in negligible time even in large
arrays.

Peter Thornton

  #17   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 3,514
Default how to find union of two arrays

Peter T pretended :
"Andrew" wrote in message

This loop has to execute 50000^2 loops, 2.5 billion loops.
Let's say that each iteration requires 20 CPU clock
cycles, that's 50 billion clock cycles.


You've been given many suggestions, particularly about taking advantage of
using Excel's built in functions. However which ever approach, with this type
of thing it's really worthwhile to start with both arrays sorted, typically
that's a one off task, or not much extra work when adding small amounts of
data.

Once both sorted, search for the first matching item (and further
duplicates). Then search the second item starting at index+1 of the last item
found. Obviously in any loop after finding a matching item, as soon as the
next item doesn't match (ie after any duplicates) exit that loop and start
the next. You should end up with a fraction of the amount of working you are
doing now.

If regularly trying to find individual items in a large sorted array, search
"binary chop", which will find matches in negligible time even in large
arrays.

Peter Thornton


Peter,
I think the time it takes lies in the fact that *Arrays* aren't being
compared, but rather the loops read the ranges directly.

If the 2 ranges were dumped into arrays where at least 1 array was
sorted, a binary search would cut the processing time considerably.
Better, though, (as you say) that both arrays are sorted. The
bottleneck is trying to loop the ranges directly while doing this in
memory would run orders of magnitude faster.<IMO

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc


  #18   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 130
Default how to find union of two arrays

On May 23, 5:42*pm, GS wrote:
Peter T pretended :



"Andrew" wrote in message


This loop has to execute 50000^2 loops, 2.5 billion loops.
Let's say that each iteration requires 20 CPU clock
cycles, that's 50 billion clock cycles.


You've been given many suggestions, particularly about taking advantage of
using Excel's built in functions. However which ever approach, with this type
of thing it's really worthwhile to start with both arrays sorted, typically
that's a one off task, or not much extra work when adding small amounts of
data.


Once both sorted, search for the first matching item (and further
duplicates). Then search the second item starting at index+1 of the last item
found. Obviously in any loop after finding a matching item, as soon as the
next item doesn't match (ie after any duplicates) exit that loop and start
the next. You should end up with a fraction of the amount of working you are
doing now.


If regularly trying to find individual items in a large sorted array, search
"binary chop", which will find matches in negligible time even in large
arrays.


Peter Thornton


Peter,
I think the time it takes lies in the fact that *Arrays* aren't being
compared, but rather the loops read the ranges directly.

If the 2 ranges were dumped into arrays where at least 1 array was
sorted, a binary search would cut the processing time considerably.
Better, though, (as you say) that both arrays are sorted. The
bottleneck is trying to loop the ranges directly while doing this in
memory would run orders of magnitude faster.<IMO

--
Garry

Free usenet access athttp://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc


There's some very good information in this thread. Thank you. I have
yet to try these find and match methods. Although I did use VLOOKUP
for a search and reduced the search time (for the same 50,000 rows of
data) down to 3 seconds. In using VLOOKUP I used named ranges for the
first two input arguments, the search parameter and the lookup table.
Then I tried to do the same thing using arrays instead of ranges. It
took at least one hundred times longer. I would think that it would
be faster to sort through an array than a range, but clearly this is
not true.

thanks again,
Andrew
  #19   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 84
Default how to find union of two arrays


"GS" wrote in message
Peter T pretended :
"Andrew" wrote in message

This loop has to execute 50000^2 loops, 2.5 billion loops.
Let's say that each iteration requires 20 CPU clock
cycles, that's 50 billion clock cycles.


You've been given many suggestions, particularly about taking advantage
of using Excel's built in functions. However which ever approach, with
this type of thing it's really worthwhile to start with both arrays
sorted, typically that's a one off task, or not much extra work when
adding small amounts of data.

Once both sorted, search for the first matching item (and further
duplicates). Then search the second item starting at index+1 of the last
item found. Obviously in any loop after finding a matching item, as soon
as the next item doesn't match (ie after any duplicates) exit that loop
and start the next. You should end up with a fraction of the amount of
working you are doing now.

If regularly trying to find individual items in a large sorted array,
search "binary chop", which will find matches in negligible time even in
large arrays.

Peter Thornton


Peter,
I think the time it takes lies in the fact that *Arrays* aren't being
compared, but rather the loops read the ranges directly.

If the 2 ranges were dumped into arrays where at least 1 array was sorted,
a binary search would cut the processing time considerably. Better,
though, (as you say) that both arrays are sorted. The bottleneck is trying
to loop the ranges directly while doing this in memory would run orders of
magnitude faster.<IMO


Garry,

Indeed, reading cells is considerably longer than reading array values,
though there will be some overhead initially reading 2x50k cells to a array
or pair of arrays.

It really is important to sort the first array, whether using built-in Excel
functions or VBA. Also if using VBA extremely worthwhile to sort the 2nd
array. That's because for each cell in the 2nd array, code will only need to
look from the index+1 of the last match to the first item that doesn't
match, so much less work than available to Excel's built-in functions.
Despite Excel's normally faster methods, it's difficult to take advantage of
that approach in Excel (although could be done) so VBA might be
easier/faster.

As for sorting, as you say in your adjacent post "that depends on how you do
it". There are some extremely fast sort methods out there for VBA, some
compare well if not better than Excel's built-in method (excluding the time
to transfer from cells to array to cells), particularly if the data is all
say doubles.

However to keep things simple, maybe get VBA to leverage Excel's sort, then
compare the sorted data in VBA as above. Without testing not sure which'd be
fastest, VBA+Excel or just Excel.

Peter T


Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
Find Min and Max in variable length arrays Dave Excel Programming 1 June 3rd 10 06:22 AM
Compare two arrays in VBA to find dupes John Michl Excel Programming 2 February 6th 07 05:34 PM
Arrays - declaration, adding values to arrays and calculation Maxi[_2_] Excel Programming 1 August 17th 06 04:13 PM
UNION of Arrays - is possible? Marina Limeira Excel Discussion (Misc queries) 1 January 22nd 06 12:38 PM
Where can I find a template for a Credit Union account? Bill Sisk Excel Discussion (Misc queries) 1 August 13th 05 01:43 PM


All times are GMT +1. The time now is 02:42 AM.

Powered by vBulletin® Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright ©2004-2024 ExcelBanter.
The comments are property of their posters.
 

About Us

"It's about Microsoft Excel"