View Single Post
  #6   Report Post  
Posted to microsoft.public.excel.programming
Ron Rosenfeld[_2_] Ron Rosenfeld[_2_] is offline
external usenet poster
 
Posts: 1,045
Default how to find union of two arrays

On Sun, 22 May 2011 06:34:35 -0700 (PDT), Andrew wrote:

On May 21, 7:43*pm, isabelle wrote:
hi,

i don't know why your looping is far too slow, it should not, can you show your code

--
isabelle

Le 2011-05-21 11:43, Andrew a écrit :







Hello,
Does anyone know a means of comparing two very large arrays (50000
elements) to determine which elements are common to both arrays? *I've
tried using looping functions and that is far too slow.


Isabel,
Imagine one range (A) with 50,000 rows, 2 columns, a second range with
50,000 rows (B) 1 column. I want to find all the elements in B which
have matches in column 1 of A. For each match, I want to copy the
corresponding entry in A column 2 over to a new column 2 in B.

Example.
A=
1 a
2 f
3 u
4 w
5 q

B=
5
1
6
8
9


After running the code, B would be transformed to:
B=
5 q
1 a
8
9
6

To do this I create the following loops.

For k1 = 1 to 50000
dataX=worksheets(1).cells(k1,1)
For k2=1 to 50000
dataY= worksheets(2).cells(k2,1)

If dataX=dataY then
worksheets(1).cells(k1,2)=worksheets(2).cells(k2, 2)
end if

next
next

This code is accurate, but terrribly slow.

This loop has to execute 50000^2 loops, 2.5 billion loops. Let's say
that each iteration requires 20 CPU clock cycles, that's 50 billion
clock cycles. My CPU has a 2.5 GHz processor, so this ideally would
take 20 seconds. It takes more than that, about 45 seconds. If I
were doing this in Matlab or C, I could process this much data in less
than a second, but not because the CPU is faster. There are other
programming constructs available which are faster. I know Excel can
run faster than this, but I don't know any tricks on how to speed this
up.

Any ideas?


If you don't want to do it with worksheet formulas, as suggested by Rick, I think the following code will only have to execute 50,000 loops, so should run quicker.

It does assume that your RangeA is two columns and RangeB is one column, as you wrote. It also assumes that the data in RangeA is not the result of formulas, but that can obviously be changed in the code.

The code also assumes that if there are any duplicates in RangeA column1, that you only want to return the first one for the corresponding RangeB entry.

===================================
Option Explicit
Sub RangeAtoB()
Dim arB As Variant
Dim i As Long
Dim c As Range
'since RangeB is one column, must make it two
arB = Range("RangeB").Resize(columnsize:=2)

With Range("RangeA").Resize(columnsize:=1)
For i = 1 To UBound(arB)
Set c = .Find(what:=arB(i, 1), _
after:=.Item(1), _
LookIn:=xlValues, _
lookat:=xlWhole)
If Not c Is Nothing Then arB(i, 2) = c.Offset(columnoffset:=1)
Next i
End With

Range("rangeB").Resize(columnsize:=2) = arB
End Sub
==============================