View Single Post
  #5   Report Post  
Posted to microsoft.public.excel.programming
MikeZz MikeZz is offline
external usenet poster
 
Posts: 152
Default Very Slow reading excel data into an array (while opened in ne

Hi JP,
Thanks for the tip on #1. It's duely noted. I never bothered in most cases
because it never impacted performance for me before in other projects where
I've created arrays up to 80 columns wide with 250,000 rows.

#2: I used "Dim xlApp as New...." because I was having problems with my
original code not releasing memory.
If you search for another of my threads "What is proper way to Open Close
Release xls files in VBA?", you will see my original code. The problem I was
originally having by not creating a new app instance was that the Task
Manager kept showing Excel memory going up. At around my 150th file, it all
started going very slow. The routine took about 45 minutes to complete with
TM showing over 100MB of Excel memory but it completed without memory crashes.

If you know of a better way to open and close several hundred excel files...
reading the data from sheet1 each time into an array, I'm all ears! BTW, the
data array gets recycled each time it opens a new file so I don't have to
maintain all the data in memory... I process the file's data then can release
the original data.

#3: Thanks for showing how to read a range into an array.... I've been
doing it the hard way for tooo long.

My Goal:
Read an excel file (with only one sheet) into an array.
Close the excel file.
Use my existing routines to extract pieces of data I need and put into a
"Summary" workbook.
Repeat this process up to several hundred times.
Not watch my hair turn grey while waiting for it to complete the job.

The files themselves are relatively small... 10 columns of data and maybe
only 500 rows.
Out of that, I only need a small amount of information.

I can process 100 files in under a couple minutes.
It can take 30-45 minutes to do 300 files.

Thanks for your help again,
MikeZz

"JP" wrote:

With all due respect, your code is a mess. A few things I noticed:

1) You are declaring variables without typecasting them. For example:

Dim rowsMaster, colsMaster, lastCellMaster
Dim rowMax, rightCol
Dim FoundIndent
Dim matCount, matTotal
Dim ctCellMaster
Dim testRowCountMat
Dim alertStat
Dim tempXLFile

Excel is treating all of these as Variant type which is exponentially
increasing your runtime. Declaring them as variables of a specific
type will go a long way towards micro-optimization.

2) Never use "Dim xlApp As New Excel.Application" because you are
creating a new instance of Excel at the same time you are declaring
the variable. This is never the behavior you want. Are you using this
code in Excel? If so, you can just declare Workbook and Worksheet
variables and skip the reference to the Application object.

3) You can read a worksheet range into an array by simply putting the
Value property of the Range object into the array. For example:

Dim arrData() as Variant
arrData = Range("A1:D5480").Value

And the range will auto-dimension itself to accomodate.


I think perhaps also you should explain what your goal is, because
maybe there is a better way to do what you want, which would lead to
optimization of several orders of magnitude.

--JP


On Sep 12, 4:42 pm, MikeZz wrote:
I have a VBA application that reads in many excel files (sometimes in the
100's), one at a time, scans them for key info, and summarizs the data in a
new workbook. The routine actually doing the reading is the first one:
ReadNewContract below.

After about the 150th file (each usually under 40k), the macro started
coming to a crawl and I noticed in Task Manager that Excel was using more and
more memory (pushing 100MB). Excel was not releasing the files out of VBA
Project.

In a post from Tom Ogilvy, I saw that I could open the file in a new xlApp,
then close it when I was done reading the data.

This got rid of the memory issue but it dramatically increased the time it
took to import the excel data worksheet into an array I could use.

In the original code, I read the excel file into an array using this:
arrMaster(r, c - LeftIndent) = ActiveSheet.Cells(r, c)
I didn't need the xlApp reference because of the way I opened the workbook.

In my revised code, I read the excel file into an array using this:
arrMaster(r, c - LeftIndent) = xlApp.ActiveSheet.Cells(r, c)

For some reason, it is just taking an incredible amount of time just to put
an excel worksheet into an array. In my original code, it happened in a
blink of an eye. With the "improved" code, it takes several seconds.

Can anyone tell what I'm doing wrong?
Is there a better way to read in the excel data?

Thanks!
MikeZz

To make it a cut and paste for testing, I have all basic code attached.
I put all the delcarations at the end so it's easier to find the code in
question.
Search for: '############### QUESTION HERE
to find the area in question.

Thanks!
MikeZz

Sub ReadNewContract(fileNo, arrMaster)

Dim MasterFile
Dim f, c, r
Dim lngCount As Long

Dim Master As Workbook
Dim masterSht As Worksheet
Dim rowsMaster, colsMaster, lastCellMaster
Dim rowMax, rightCol
Dim FoundIndent

Dim matCount, matTotal
Dim ctCellMaster
Dim testRowCountMat
Dim alertStat
Dim tempXLFile

Dim xlApp As New Excel.Application 'ADDED FOR MEMORY

Dim FileString As String 'ADDED FOR MEMORY
xlApp.Application.Visible = True 'ADDED FOR MEMORY

'################################################# ###########################################
'########### READ IN MASTER FILE
'################################################# ###########################################

If fileNo = 1 Or IsEmpty(fileLocExcel) Then
fileLocExcel = Get_File_Info(arrFiles(fileNo, colFileName), "Directory")
End If

FileString = arrFiles(fileNo, colFileName) 'ADDED FOR MEMORY

xlApp.Workbooks.Open (FileString) 'Focus is now on the workbook 'ADDED FOR
MEMORY
'Workbooks.Open (arrFiles(fileNo, colFileName))

Set Master = xlApp.ActiveWorkbook
Set masterSht = xlApp.ActiveSheet
MasterFile = Master.Name

lastCellMaster = LastCellIn(masterSht)
rowsMaster = LastRowIn(masterSht)
arrFiles(fileNo, colFileRows) = rowsMaster
colsMaster = LastColIn(masterSht)

If rowsMaster = Empty Or colsMaster = Empty Then
Exit Sub
End If

ctCellMaster = 0
ReDim arrMaster(0)
ReDim arrMaster(1 To rowsMaster, 0 To colsMaster)
For r = 1 To rowsMaster
LeftIndent = 0
FoundIndent = False
rightCol = 0
For c = 1 To colsMaster

'################################################# ####################
'############### QUESTION HERE ###########################
'
' "xlApp.ActiveSheet.Cells(r, c)" seems to run magnitudes slower than using
' ActiveSheet.Cells(r, c) on a regular active sheet
' in original application instance.
' Is there another way?
'################################################# ####################
'################################################# ####################

If alignLeft = True And FoundIndent = False And
Len(xlApp.ActiveSheet.Cells(r, c)) = 0 Then
LeftIndent = LeftIndent + 1: GoTo nextMc
End If
FoundIndent = True
arrMaster(r, c - LeftIndent) = xlApp.ActiveSheet.Cells(r, c)
If Len(arrMaster(r, c - LeftIndent)) < 0 Then rightCol = c - LeftIndent
nextMc:
Next c
arrMaster(r, 0) = rightCol
Next r

Master.Close SaveChanges:=False

Set masterSht = Nothing
Set Master = Nothing
xlApp.Quit
Set xlApp = Nothing 'ADDED FOR MEMORY

End Sub

Private Sub Get_File_List()
Dim lngCount
Dim maxcols

Call Initialize_Values
maxcols = colFileMaxx

With Application.FileDialog(msoFileDialogOpen)
.AllowMultiSelect = True
.Show

fileCount = .SelectedItems.Count
ReDim arrFiles(0)
ReDim arrFiles(0 To .SelectedItems.Count, 1 To maxcols)
' Display paths of each file selected

If fileCount = 0 Then End

For lngCount = 1 To fileCount
arrFiles(lngCount, colFileName) = .SelectedItems(lngCount)
Next lngCount

End With

Dim f
For f = 1 To fileCount
Call ReadNewContract(f, arrImport)
Next f

End Sub

Private Sub Initialize_Values()
Dim col0, r

ScanTime = Format(Now, "Medium Time")

dateSummaryFormat = Control.Range("dateSummaryFormat")
Select Case dateSummaryFormat
Case "yyyy-mm-dd"
ScanDate = Format(Date, dateSummaryFormat)
Case "dd-mmm-yy"
ScanDate = Format(Date, dateSummaryFormat)
Case "yyyy-ww-ddd"
ScanDate = Format(Date, dateSummaryFormat)
Case Else
ScanDate = Format(Date, "Medium Time")

End Select

'AutoSaveFile = "x" & ScanDate & "Contracts Scanned " & " " &
Replace(ScanTime, ":", "_") & ".xls"
AutoSaveFile = ScanDate & "Contracts Scanned " & " " & Replace(ScanTime,
":", "_") & ".xls"

fileCount = 0
completeCount = 0
col0 = 1
colFileName = col0: col0 = col0 + 1 'Filename
colFileStat = col0: col0 = col0 + 1 'Scan Status
colFileCust = col0: col0 = col0 + 1
colFileType = col0: col0 = col0 + 1 'Contract Type - Part, Tooling, etc
colFileProd = col0: col0 = col0 + 1 'Contract Part Product Area AB/SB/SW/EL
colFilePro2 = col0: col0 = col0 + 1
colFileProj = col0: col0 = col0 + 1 'Contract ALV Project Number
colFileDesc = col0: col0 = col0 + 1 'Contract Part Description
colFilePNum = col0: col0 = col0 + 1 'Contract Part Number
colFileLNum = col0: col0 = col0 + 1 'Contract Less Finish Part Number

colFileCNum = col0: col0 = col0 + 1 'Contract Number
colFileRevs = col0: col0 = col0 + 1 'Contract Revision
colFileDate = col0: col0 = col0 + 1 'Contract Date
colFileReas = col0: col0 = col0 + 1 'Contract Amendment Reason

colFileESOP = col0: col0 = col0 + 1 'Contract Effective Start Date
colFileEEOP = col0: col0 = col0 + 1 'Contract Effective End Date
colFilePeri = col0: col0 = col0 + 1 'Price Periods
colFilePric = col0: col0 = col0 + 1 'Starting Price
colFileLTAs = col0: col0 = col0 + 1 'LTA %= col0: col0 = col0 + 1 's
colFilePri2 = col0: col0 = col0 + 1 'Ending Price (part contracts)
colFileCurr = col0: col0 = col0 + 1 'Currency Type - First
colFileCurX = col0: col0 = col0 + 1 'Currency Type - Change to

colFilePack = col0: col0 = col0 + 1 'Packaging Type - First
colFilePacX = col0: col0 = col0 + 1 'Packaging Type - Change to

colFileCapa = col0: col0 = col0 + 1 'Starting Price
colFileCap2 = col0: col0 = col0 + 1 'Ending Price (part contracts)
colFileHPDa = col0: col0 = col0 + 1 'Hours Per Day - First
colFileHPDX = col0: col0 = col0 + 1 'Hours Per Day - Change to

colFileDunM = col0: col0 = col0 + 1 'Contract Mfg Dunns Code
colFileDDun = col0: col0 = col0 + 1 'Delivery To Duns - First
colFileDDuX = col0: col0 = col0 + 1 'Delivery To Duns - Change to
colFileSDun = col0: col0 = col0 + 1 'Ship From Duns - First
colFileSDuX = col0: col0 = col0 + 1 'Ship From Duns - Change to

colFileTFre = col0: col0 = col0 + 1 'Terms Start - Freight
colFileTFrX = col0: col0 = col0 + 1 'Terms Ending - Freight
colFileTPay = col0: col0 = col0 + 1 'Terms Start - Payment
colFileTPaX = col0: col0 = col0 + 1 'Terms Ending - Payment
colFileTDel = col0: col0 = col0 + 1 'Terms Start - Delivery
colFileTDeX = col0: col0 = col0 + 1 'Terms Ending - Delivery

colFileBuyr = col0: col0 = col0 + 1 'Contract Buyer Name
colFileHead = col0: col0 = col0 + 1 'Contract Header
colFileDown = col0: col0 = col0 + 1 'Contract Download Date

colFileSNam = col0: col0 = col0 + 1 'Contract Sheet Name
colFileRows = col0: col0 = col0 + 1 'Contract Effective End Date
colFileNam2 = col0: col0 = col0 + 1 'New Smart File Name
colFileKeys = col0: col0 = col0 + 1 'Contract Key - Part English, Part
Mexico, Tooling English etc..

colFileMaxx = col0 + 5

col0 = 0
colKeyDeffName = col0 + 1 'Key Definition: Name
colKeyDeffCust = colKeyDeffName + 1 'Key Definition: OEM
colKeyDeffType = colKeyDeffCust + 1 'Key Definition: Contract Type
colKeyDeffLang = colKeyDeffType + 1 'Key Definition: Language
colKeyDeffIden = colKeyDeffLang + 1 'Key Definition: Unique Identifier
String used to deterime which key to use
colKeyDeffSNum = colKeyDeffIden + 1 'Key Definition: Key Sheet Number
colKeyDeffIMxR = colKeyDeffSNum + 1 'Key Definition:
colKeyDeffIMxC = colKeyDeffIMxR + 1 'Key Definition:
colKeyDeffITyp = colKeyDeffIMxC + 1 'Key Definition:
colKeyDeffSNam = colKeyDeffITyp + 1 'Key Definition: Key Sheet Name
colKeyDeffMaxx = colKeyDeffSNam + 5