Ron Rosenfeld wrote on 1/15/2012:
On Sun, 15 Jan 2012 14:39:30 -0500, GS wrote:
Ron,
I'd appreciate feedback on using my (3rd) posted code on your test
data! I extracted the concept of using the array approach from an app I
have for filtering out rows of data from a data logging output file.
This requires at least xl12 (Excel 2007) to work, due to the amount of
data being just under 1GB. I believe the limit on array size is 2GB,
but since my app uses its own instance of Excel, there's nothing else
running in its memory space.
Gary,
When I interrupted it, it had been running for 219 seconds. At that point
it had eliminated 617 entries from the column A list.
I then started up my "dictionary" routine. It ran for about 58.6 seconds and
eliminated the remaining 260,493 duplicated entries.
To set up the sample data, I enter a formula like:
A1 & B1: =TEXT(RANDBETWEEN(1,10^6),"0000000000000")
Fill down to row 500,000, then Copy / Paste Values.
For timing I use the HiRes timer.
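(Ron doesn't say which HiRes timer he means; one common way to get
high-resolution timing in VBA wraps the Windows QueryPerformanceCounter
API, roughly like this -- a sketch, not necessarily his routine:)

' Omit PtrSafe under VBA6 (Excel 2007 and earlier).
Private Declare PtrSafe Function QueryPerformanceCounter Lib "kernel32" _
    (lpPerformanceCount As Currency) As Long
Private Declare PtrSafe Function QueryPerformanceFrequency Lib "kernel32" _
    (lpFrequency As Currency) As Long

' Returns elapsed seconds; Currency holds the 64-bit counts, and the
' identical scaling of counter and frequency cancels in the division.
Function HiResSeconds() As Double
    Dim c As Currency, f As Currency
    QueryPerformanceCounter c
    QueryPerformanceFrequency f
    HiResSeconds = c / f
End Function

' Usage:  t = HiResSeconds() : ... : Debug.Print HiResSeconds() - t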
I initially tried an approach like yours (sketched below):
Examine each cell.
If the data is invalid, delete the cell and rearrange the rest
(Delete Shift:=xlShiftUp).
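(That cell-by-cell version looks something like this; IsGood() is a
placeholder for whatever "valid" means here, and the loop runs bottom-up
so the deletes don't skip rows:)

Sub DeleteBadCellsOneByOne()
    Dim i As Long
    With ActiveSheet
        ' Each Delete forces Excel to shuffle the rest of the column
        ' up, which is what makes this so slow on 500,000 rows.
        For i = .Cells(.Rows.Count, "A").End(xlUp).Row To 1 Step -1
            If Not IsGood(.Cells(i, "A").Value) Then
                .Cells(i, "A").Delete Shift:=xlShiftUp
            End If
        Next i
    End With
End Sub

Function IsGood(v As Variant) As Boolean
    IsGood = Len(v) > 0    ' placeholder validity test
End Function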
After some thought, I decided it should be faster to (sketched after this list):
Get all the good data into a sequential array.
Delete ALL the original data.
Write back the good data array.
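(A rough sketch of that pattern, reusing the placeholder IsGood() test
from the previous sketch; all the work happens in memory, with one read
and one write:)

Sub CompactColumnA()
    Dim v As Variant, vOut() As Variant, i As Long, n As Long
    With ActiveSheet
        v = .Range("A1", .Cells(.Rows.Count, "A").End(xlUp)).Value
        ReDim vOut(1 To UBound(v, 1), 1 To 1)
        For i = 1 To UBound(v, 1)           ' collect the keepers
            If IsGood(v(i, 1)) Then
                n = n + 1
                vOut(n, 1) = v(i, 1)
            End If
        Next i
        .Columns("A").ClearContents         ' delete ALL the original data
        If n > 0 Then .Range("A1").Resize(n, 1).Value = vOut  ' one write-back
    End With
End Sub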
The approach I used, with the dictionary, works pretty fast. Its
disadvantage is that it would have to be modified if duplicates in the
original data should be retained (i.e. if there are multiple
0000000123456's in column A, none of that value in column B, and the
multiple values all need to be retained in column A, in their original
order). Fortunately, that is not the case.
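(Something along these lines, if I read the description right -- a
sketch, not Ron's actual code; it treats "duplicated" as "also present
in column B", and the dictionary keys are what collapse any repeats
within column A:)

Sub RemoveMatchesViaDictionary()
    Dim dic As Object, vA As Variant, vB As Variant
    Dim vOut() As Variant, k As Variant, i As Long, n As Long
    Set dic = CreateObject("Scripting.Dictionary")
    With ActiveSheet
        vA = .Range("A1", .Cells(.Rows.Count, "A").End(xlUp)).Value
        vB = .Range("B1", .Cells(.Rows.Count, "B").End(xlUp)).Value
        For i = 1 To UBound(vA, 1)          ' keys de-duplicate column A
            dic(vA(i, 1)) = True
        Next i
        For i = 1 To UBound(vB, 1)          ' drop anything also in column B
            If dic.Exists(vB(i, 1)) Then dic.Remove vB(i, 1)
        Next i
        .Columns("A").ClearContents         ' delete ALL the original data
        If dic.Count > 0 Then
            ReDim vOut(1 To dic.Count, 1 To 1)
            For Each k In dic.Keys          ' 2-D array avoids Transpose limits
                n = n + 1
                vOut(n, 1) = k
            Next k
            .Range("A1").Resize(n, 1).Value = vOut
        End If
    End With
End Sub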
And if I had Excel 2010, the Advanced Filter might work: I would
filter/copy, then delete the original and write back the copy. That would
work even with duplicates. But it won't work in Excel 2007 with this
database (and it seems to run slower even with smaller databases).
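(The filter/copy would look roughly like this -- an untested sketch; it
assumes a header in A1, placeholder range addresses, and AdvancedFilter's
computed-criteria form, where the criteria header cell stays blank and
the formula refers to the first data row:)

Sub FilterCopyKeepers()
    With ActiveSheet
        .Range("D1").ClearContents          ' blank header = computed criterion
        .Range("D2").Formula = "=ISNA(MATCH(A2,$B:$B,0))"   ' not in column B
        .Range("A1", .Cells(.Rows.Count, "A").End(xlUp)).AdvancedFilter _
            Action:=xlFilterCopy, CriteriaRange:=.Range("D1:D2"), _
            CopyToRange:=.Range("F1"), Unique:=False   ' duplicates survive
    End With
End Sub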
Thanks, Ron.
I suspected it would take a long time, since it writes to the worksheet
inside the loop. My source code does everything in memory using the
arrays, setting matches to an empty string. Note that the source data is
read in from a data logger output file, NOT from a worksheet; I just dump
the result back into a blank sheet. I can't use the dictionary because I
need to preserve duplicates. The match criterion is an ambient
temperature value at a specific position in a line of text, so if it's
not '=' I reset that array element to an empty string, then use the
Filter() function to dump the resulting data into a worksheet (sketched
below). Since there's only one array, looped over just once, the process
is really fast.
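(For illustration, the shape of that approach -- the file path, the
character position, and the sheet name are all placeholders. I've used a
distinctive marker rather than a true empty string, since VBA's Filter()
does substring matching and a zero-length match string is a degenerate
case:)

Sub FilterLogLines()
    Const KILL As String = "~~drop~~"       ' assumed unique marker
    Const TEMP_POS As Long = 40             ' assumed position of the '=' flag
    Dim lines() As String, keep() As String
    Dim vOut() As Variant, i As Long, f As Integer

    f = FreeFile                            ' read the logger file in one gulp
    Open "C:\logs\capture.txt" For Input As #f   ' placeholder path
    lines = Split(Input$(LOF(f), f), vbCrLf)
    Close #f

    For i = LBound(lines) To UBound(lines)  ' flag the non-matching lines
        If Mid$(lines(i), TEMP_POS, 1) <> "=" Then lines(i) = KILL
    Next i

    keep = Filter(lines, KILL, False)       ' one pass drops the flagged lines
    If UBound(keep) < 0 Then Exit Sub       ' nothing survived

    ReDim vOut(1 To UBound(keep) + 1, 1 To 1)    ' dump into a blank sheet
    For i = 0 To UBound(keep)
        vOut(i + 1, 1) = keep(i)
    Next i
    Worksheets("Output").Range("A1").Resize(UBound(keep) + 1, 1).Value = vOut
End Sub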
--
Garry
Free usenet access at
http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc