View Single Post
  #12   Report Post  
Posted to microsoft.public.excel.programming
MikeZz MikeZz is offline
external usenet poster
 
Posts: 152
Default Very Slow reading excel data into an array (while opened in ne

Hi Joel,
I just made a lengthy reply to JP with more info.

I think I have solved part of my issues with memory and speed but do have a
couple follow Q's for you:

Q1:
How can I read an excel file into memory without opening it?
The file is converstion of a pdf so it has no table structure.... no column
headers, just data spread all over with a lot of blank cells.

Q2:
Now the problem opening each data file in it's own xlApp, is that it takes
over my computer. Before, I could start the macro, minimize it and do other
things. Now Focus goes back to excel for each new file and so I can't type
for more than a half second before I jump back to excel.

Any further help would be great!

Thanks,
MikeZz

"Joel" wrote:

It sounds like you are opening and closing the files properly. I can't fix
memory leaks in excel. I know they exist. I don't know if the files you are
opening and closing are casing the meory leak or the spreadsheet you are
running the macro is growing very large. One thing yu may try is to perform
a save after you open every 10 files on the Master Worksheet. This may
release some memory. the second thing you can try is reading the workbooks
without opening the workbooks. There is a method in doing this. You see the
file struct of microsoft applications (word, access, excel) use the same
structure. When you have large Access Databases that are shared by many
people you want to allow the database to be read without opening. You can
use Access commands to read excel files. Excel and Access use the same TABLE
structure to store and read data. Excel command are more flexible than
access. Excel allows random access of the tables while access yu have to
move to the cell location before you can read the data. The move
instructions say move one row or move one column until you reach the lcoation
you are looking for.

"MikeZz" wrote:

Hi JP,
Thanks for the tip on #1. It's duely noted. I never bothered in most cases
because it never impacted performance for me before in other projects where
I've created arrays up to 80 columns wide with 250,000 rows.

#2: I used "Dim xlApp as New...." because I was having problems with my
original code not releasing memory.
If you search for another of my threads "What is proper way to Open Close
Release xls files in VBA?", you will see my original code. The problem I was
originally having by not creating a new app instance was that the Task
Manager kept showing Excel memory going up. At around my 150th file, it all
started going very slow. The routine took about 45 minutes to complete with
TM showing over 100MB of Excel memory but it completed without memory crashes.

If you know of a better way to open and close several hundred excel files...
reading the data from sheet1 each time into an array, I'm all ears! BTW, the
data array gets recycled each time it opens a new file so I don't have to
maintain all the data in memory... I process the file's data then can release
the original data.

#3: Thanks for showing how to read a range into an array.... I've been
doing it the hard way for tooo long.

My Goal:
Read an excel file (with only one sheet) into an array.
Close the excel file.
Use my existing routines to extract pieces of data I need and put into a
"Summary" workbook.
Repeat this process up to several hundred times.
Not watch my hair turn grey while waiting for it to complete the job.

The files themselves are relatively small... 10 columns of data and maybe
only 500 rows.
Out of that, I only need a small amount of information.

I can process 100 files in under a couple minutes.
It can take 30-45 minutes to do 300 files.

Thanks for your help again,
MikeZz

"JP" wrote:

With all due respect, your code is a mess. A few things I noticed:

1) You are declaring variables without typecasting them. For example:

Dim rowsMaster, colsMaster, lastCellMaster
Dim rowMax, rightCol
Dim FoundIndent
Dim matCount, matTotal
Dim ctCellMaster
Dim testRowCountMat
Dim alertStat
Dim tempXLFile

Excel is treating all of these as Variant type which is exponentially
increasing your runtime. Declaring them as variables of a specific
type will go a long way towards micro-optimization.

2) Never use "Dim xlApp As New Excel.Application" because you are
creating a new instance of Excel at the same time you are declaring
the variable. This is never the behavior you want. Are you using this
code in Excel? If so, you can just declare Workbook and Worksheet
variables and skip the reference to the Application object.

3) You can read a worksheet range into an array by simply putting the
Value property of the Range object into the array. For example:

Dim arrData() as Variant
arrData = Range("A1:D5480").Value

And the range will auto-dimension itself to accomodate.


I think perhaps also you should explain what your goal is, because
maybe there is a better way to do what you want, which would lead to
optimization of several orders of magnitude.

--JP


On Sep 12, 4:42 pm, MikeZz wrote:
I have a VBA application that reads in many excel files (sometimes in the
100's), one at a time, scans them for key info, and summarizs the data in a
new workbook. The routine actually doing the reading is the first one:
ReadNewContract below.

After about the 150th file (each usually under 40k), the macro started
coming to a crawl and I noticed in Task Manager that Excel was using more and
more memory (pushing 100MB). Excel was not releasing the files out of VBA
Project.

In a post from Tom Ogilvy, I saw that I could open the file in a new xlApp,
then close it when I was done reading the data.

This got rid of the memory issue but it dramatically increased the time it
took to import the excel data worksheet into an array I could use.

In the original code, I read the excel file into an array using this:
arrMaster(r, c - LeftIndent) = ActiveSheet.Cells(r, c)
I didn't need the xlApp reference because of the way I opened the workbook.

In my revised code, I read the excel file into an array using this:
arrMaster(r, c - LeftIndent) = xlApp.ActiveSheet.Cells(r, c)

For some reason, it is just taking an incredible amount of time just to put
an excel worksheet into an array. In my original code, it happened in a
blink of an eye. With the "improved" code, it takes several seconds.

Can anyone tell what I'm doing wrong?
Is there a better way to read in the excel data?

Thanks!
MikeZz

To make it a cut and paste for testing, I have all basic code attached.
I put all the delcarations at the end so it's easier to find the code in
question.
Search for: '############### QUESTION HERE
to find the area in question.

Thanks!
MikeZz

Sub ReadNewContract(fileNo, arrMaster)

Dim MasterFile
Dim f, c, r
Dim lngCount As Long

Dim Master As Workbook
Dim masterSht As Worksheet
Dim rowsMaster, colsMaster, lastCellMaster
Dim rowMax, rightCol
Dim FoundIndent

Dim matCount, matTotal
Dim ctCellMaster
Dim testRowCountMat
Dim alertStat
Dim tempXLFile

Dim xlApp As New Excel.Application 'ADDED FOR MEMORY

Dim FileString As String 'ADDED FOR MEMORY
xlApp.Application.Visible = True 'ADDED FOR MEMORY

'################################################# ###########################################
'########### READ IN MASTER FILE
'################################################# ###########################################

If fileNo = 1 Or IsEmpty(fileLocExcel) Then
fileLocExcel = Get_File_Info(arrFiles(fileNo, colFileName), "Directory")
End If

FileString = arrFiles(fileNo, colFileName) 'ADDED FOR MEMORY

xlApp.Workbooks.Open (FileString) 'Focus is now on the workbook 'ADDED FOR
MEMORY
'Workbooks.Open (arrFiles(fileNo, colFileName))

Set Master = xlApp.ActiveWorkbook
Set masterSht = xlApp.ActiveSheet
MasterFile = Master.Name

lastCellMaster = LastCellIn(masterSht)
rowsMaster = LastRowIn(masterSht)
arrFiles(fileNo, colFileRows) = rowsMaster
colsMaster = LastColIn(masterSht)

If rowsMaster = Empty Or colsMaster = Empty Then
Exit Sub
End If

ctCellMaster = 0
ReDim arrMaster(0)
ReDim arrMaster(1 To rowsMaster, 0 To colsMaster)
For r = 1 To rowsMaster
LeftIndent = 0
FoundIndent = False
rightCol = 0
For c = 1 To colsMaster

'################################################# ####################
'############### QUESTION HERE ###########################
'
' "xlApp.ActiveSheet.Cells(r, c)" seems to run magnitudes slower than using
' ActiveSheet.Cells(r, c) on a regular active sheet
' in original application instance.
' Is there another way?
'################################################# ####################
'################################################# ####################

If alignLeft = True And FoundIndent = False And
Len(xlApp.ActiveSheet.Cells(r, c)) = 0 Then
LeftIndent = LeftIndent + 1: GoTo nextMc
End If
FoundIndent = True
arrMaster(r, c - LeftIndent) = xlApp.ActiveSheet.Cells(r, c)
If Len(arrMaster(r, c - LeftIndent)) < 0 Then rightCol = c - LeftIndent
nextMc:
Next c
arrMaster(r, 0) = rightCol
Next r

Master.Close SaveChanges:=False

Set masterSht = Nothing
Set Master = Nothing
xlApp.Quit
Set xlApp = Nothing 'ADDED FOR MEMORY

End Sub

Private Sub Get_File_List()
Dim lngCount
Dim maxcols

Call Initialize_Values
maxcols = colFileMaxx

With Application.FileDialog(msoFileDialogOpen)
.AllowMultiSelect = True
.Show

fileCount = .SelectedItems.Count
ReDim arrFiles(0)
ReDim arrFiles(0 To .SelectedItems.Count, 1 To maxcols)
' Display paths of each file selected

If fileCount = 0 Then End

For lngCount = 1 To fileCount
arrFiles(lngCount, colFileName) = .SelectedItems(lngCount)
Next lngCount

End With

Dim f
For f = 1 To fileCount
Call ReadNewContract(f, arrImport)
Next f

End Sub

Private Sub Initialize_Values()
Dim col0, r

ScanTime = Format(Now, "Medium Time")

dateSummaryFormat = Control.Range("dateSummaryFormat")
Select Case dateSummaryFormat
Case "yyyy-mm-dd"
ScanDate = Format(Date, dateSummaryFormat)
Case "dd-mmm-yy"
ScanDate = Format(Date, dateSummaryFormat)
Case "yyyy-ww-ddd"
ScanDate = Format(Date, dateSummaryFormat)
Case Else
ScanDate = Format(Date, "Medium Time")

End Select

'AutoSaveFile = "x" & ScanDate & "Contracts Scanned " & " " &
Replace(ScanTime, ":", "_") & ".xls"
AutoSaveFile = ScanDate & "Contracts Scanned " & " " & Replace(ScanTime,
":", "_") & ".xls"

fileCount = 0
completeCount = 0
col0 = 1
colFileName = col0: col0 = col0 + 1 'Filename
colFileStat = col0: col0 = col0 + 1 'Scan Status
colFileCust = col0: col0 = col0 + 1
colFileType = col0: col0 = col0 + 1 'Contract Type - Part, Tooling, etc
colFileProd = col0: col0 = col0 + 1 'Contract Part Product Area AB/SB/SW/EL
colFilePro2 = col0: col0 = col0 + 1
colFileProj = col0: col0 = col0 + 1 'Contract ALV Project Number
colFileDesc = col0: col0 = col0 + 1 'Contract Part Description
colFilePNum = col0: col0 = col0 + 1 'Contract Part Number