Subject: Very Slow reading excel data into an array (while opened in ne

Hi JP, I'll try to answer all your questions.

First, I went down the path of creating new instances because I had a huge memory problem (Excel's memory usage kept climbing higher and higher, bringing it to a crawl). I tried creating a new instance for each file, and it seemed to resolve that issue. The problem then became that it took forever just to read in the data... no hard data, but just looking at the clock I'd say it went from "instantly" to several seconds.

Responding to your questions in a previous post:

First, let me explain the purpose of this app. Our customer sends us hundreds of contracts each month as PDFs. The information is structured so poorly that it's very difficult to review... my guess is it's done this way intentionally. If you saw it, you wouldn't believe it.

We use a batch PDF converter to convert the files to Excel. I then wrote an app that picks bits and pieces of info spread all over a very chaotic file and summarizes them in an Excel table. The file itself is about 95% blank cells, and 99% of the actual data is useless; we only need the 1% that's randomly spread in between. The end result of my app was really a quantum leap in productivity. We used to have someone manually read and retype a lot of information from the PDF contracts, which took many hours a week. Now it's nearly instant, with a dramatic decrease in time and a dramatic increase in the accuracy of contract review.

Basically, I'm trying to fix a problem that, in all honesty, could be fixed by the customer just by reformatting their contracts. The thing is, if they made the contracts easier to read, it would help highlight errors that are sometimes intentional... because those errors make them more money. Harsh but true.

So anyway, I have all the logic to scan through an array and pluck the few bits of useful info out of the file. It works, and it works extremely well. I just need to get the data into an array.

1) Because of the randomness described above, a pivot table wouldn't work: the files I'm opening are not in a table format.

2) I put the files into the same folder. They are just translations from PDF, so I don't update them. The ranges are all random. Basically, I have to read the entire file into an array... which is usually under 20 columns but can be a couple thousand rows long.

3) I can't simply copy and paste the data... like I said, what I'm doing is like taking a shotgun blast and putting 1 out of every 100 pellets in a nice straight line.

4) I've gone through it pretty thoroughly. It's extremely complicated, but I think I've done a good job of getting it to solve a difficult problem.

5) The few routines I pasted are only a fraction of the total code. You are seeing the entire declaration section, which I admit is quite extensive.

colFileName = col0: col0 = col0 + 1

I've found the above actually works rather nicely: during development I sometimes change the output table, and this gives me the most flexibility to add or move data columns around.
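
To show why I like the pattern, here it is in isolation (a stripped-down sketch; colContract and colDate are made-up example columns, only colFileName is from my real code):

    Dim col0 As Long
    Dim colFileName As Long, colContract As Long, colDate As Long

    col0 = 1
    ' Each output column takes the next free index, so adding or
    ' reordering columns only means moving one of these lines.
    colFileName = col0: col0 = col0 + 1   ' output column 1
    colContract = col0: col0 = col0 + 1   ' output column 2
    colDate = col0: col0 = col0 + 1       ' output column 3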

6) For r = 1 To rowsMaster

Part of my poor naming, because I actually pulled this code from another project. "Master" is just the data file, and arrMaster is just the temp array containing the data found in the file called "Master". I'll try the technique you mentioned, as it is probably a lot quicker. Is there a quick way to grab the range from A1 to the "bottom right" of the file?
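
(My guess is something like the sketch below, using the last used cell, but I don't know if it's the fastest way. wbData is a placeholder for the workbook I just opened.)

    Dim arrMaster As Variant
    With wbData.Worksheets(1)
        ' A1 through the last used cell, pulled into a 2-D array in one shot
        arrMaster = .Range(.Range("A1"), _
                           .Cells.SpecialCells(xlCellTypeLastCell)).Value
    End With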

A couple other comments from Joel's post I'll also answer here: I occasionally auto-save the summary file since it does get bigger, and this solved a previous out-of-memory error I had. However, a summary file with 150 contracts summarized in it is still only about 6MB, so it's not much. I regularly deal with Excel files pushing 30+MB without issues.

... OK, after all that: I just tried the bulk read of a range into an array like you suggested, and it seemed to help a LOT. Now, if I pursue this method of creating new xlApps, how do I stop my screen from blinking like crazy? Before creating all the new apps, I could run the macro and do anything else outside of Excel. Now, whenever a file opens to be read (about twice a second), it steals focus from whatever I'm doing back to Excel... I can't even catch up on the news any more!
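
(I'm guessing the fix is to keep the new instance hidden and quiet, something like this sketch? Here xlApp is the new instance and sPath stands in for each file's path.)

    Dim wb As Workbook
    xlApp.Visible = False          ' keep the instance off-screen
    xlApp.ScreenUpdating = False   ' no repainting while files open
    xlApp.DisplayAlerts = False    ' suppress prompts on open/close
    Set wb = xlApp.Workbooks.Open(sPath, ReadOnly:=True)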

Thanks again for all your help... I'd be amazed if just a single command improves my code that much, but hey, I'll take it any way I can get it.




"JP" wrote:

The proper way, which I'm sure you know, is to declare an object of
the type you want, then destroy it at the end of the macro:

Sub MyMacro()
    Dim xl As Application ' Excel.Application
    Set xl = Application
    ' your code here
    Set xl = Nothing
End Sub
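
If you really do need a separate instance, the same discipline applies: create it, quit it, and release it (a minimal sketch):

    Dim xlApp As Excel.Application
    Set xlApp = New Excel.Application   ' separate instance, hidden by default
    ' ... open and read workbooks through xlApp here ...
    xlApp.Quit                          ' shut the instance down
    Set xlApp = Nothing                 ' release the reference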

I think at least part of the problem is your goal statement. Your
actual goal appears to be to summarize data contained in several
hundred workbooks. But your goal is polluted with one particular path
towards it. Rethinking the path to your goal might help you write better
code.

1) Is there a better way to accomplish this goal --
For example, can you use the Pivot Table's Multiple Consolidation
Ranges feature to grab the information from each workbook? The ranges
all have to have the same headers. If they aren't like that, can they
be made that way?

2) Is there anything you know about the files/folders that can be used
to optimize the code --
Are the files kept in the same folder? If not, can they be moved
into the same folder for this purpose?
Are the files static, or are they updated periodically? If so, are all
of them updated, or just some?
Do the ranges you want to copy in each file have the same name and
size? Or is each range in each worksheet a different size?

3) Why does the information need to be read into an array?
Can you simply cut and paste the information into one super-worksheet?

4) Have you stepped through the code to see where it could be
shortened or changed?

5) The code section that starts "colFileName = col0: col0 = col0 + 1
'Filename "
What are you trying to do here? Can the information be handled some
other way?

6) What is the purpose of the code section that starts "For r = 1 To
rowsMaster " - you should try to lessen hits on the worksheet as much
as possible using the array technique I showed you. Then you can loop
through the array, which will be much faster than telling VBA to keep
hitting the worksheet, which (relatively speaking) is much more costly
in terms of speed and efficiency.
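
To make that concrete, here is the bulk-read pattern in miniature (a sketch; the sheet name and range are placeholders):

    Dim arr As Variant, r As Long, c As Long
    ' One hit on the worksheet: the whole range lands in a 2-D array
    arr = Worksheets("Sheet1").Range("A1:T2000").Value
    For r = LBound(arr, 1) To UBound(arr, 1)
        For c = LBound(arr, 2) To UBound(arr, 2)
            ' work with arr(r, c) entirely in memory
        Next c
    Next r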


I found a routine that might help you. It loops through a folder and
you use it to do something with each file found.

http://www.vbaexpress.com/kb/getarticle.php?kb_id=245

HTH,
JP

On Sep 12, 10:44 pm, MikeZz wrote:
#2: I used "Dim xlApp As New...." because I was having problems with my
original code not releasing memory.
If you search for another of my threads, "What is proper way to Open Close
Release xls files in VBA?", you will see my original code. The problem I was
originally having by not creating a new app instance was that Task Manager
kept showing Excel's memory going up. At around my 150th file, it all
started going very slowly. The routine took about 45 minutes to complete, with
Task Manager showing over 100MB of Excel memory, but it finished without memory crashes.

If you know of a better way to open and close several hundred Excel files,
reading the data from Sheet1 each time into an array, I'm all ears! BTW, the
data array gets recycled each time it opens a new file, so I don't have to
maintain all the data in memory... I process the file's data, then I can release
the original data.
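
The loop I have in mind is basically this (a stripped-down sketch; xlApp is the instance and sPath stands in for each file's path):

    Dim wb As Workbook, arrData As Variant
    Set wb = xlApp.Workbooks.Open(sPath, ReadOnly:=True)
    arrData = wb.Worksheets(1).UsedRange.Value   ' same array, recycled per file
    wb.Close SaveChanges:=False                  ' release the file immediately
    ' ... extract the needed bits from arrData, then on to the next file ...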

#3: Thanks for showing how to read a range into an array... I've been
doing it the hard way for too long.

My Goal:
Read an Excel file (with only one sheet) into an array.
Close the Excel file.
Use my existing routines to extract the pieces of data I need and put them into a
"Summary" workbook.
Repeat this process up to several hundred times.
Not watch my hair turn grey while waiting for it to complete the job.