View Single Post
  #1   Report Post  
Posted to microsoft.public.excel.misc
Robbro Robbro is offline
external usenet poster
 
Posts: 22
Default Lessons from optimizing a slow spreadsheet

I've been working on a report that contains YTD sales and cost info. Its
about 4 megs, took a little over 3 minutes to fully recalculate (I know, many
around here have larger/slower ones, but thats not the point).
The primary data tab has about 4k rows and 50 or so columns, there are 2
reports (top 10 customers and "others) which are about 400 rows and 3000 rows
respectively and 37 columns, which to begin with were had about 8 sum
products and 7 or so sumifs, one vlookup and the rest were sum or
multiplication. There are 2 date fields up top to enter beg and ending to
pull specific month's or quarters out of the data.
At first each sumifs or sumproduct had 4 conditions, it matched customer,
lot #, date after beg date and date before end date. My first step was to
add 2 calculation columns called include to the data tab (1 for top 10 report
1 for other report) that was as follows if(and(transaction date=beg
date,transaction date<=end date),1,0) and reduced the sumifs and sumproducts
to 3 conditions, customer, lot #, include=1. That alone reduced my calc time
by about 50% to just over 1.5 min.
Next I added calculation columns to the data tab to eliminate the need for
sumproducts and replaced all sumproducts with sumifs, this reduced my calc
time to under 1 min.
Finally I had about 15 dynamic named ranges, I calculated how many rows down
my data went just once in a cell using counta(a:a) then referred to that cell
in all my ranges. I had used this to limit the # of rows sumproduct had to
deal with, but since I had replaced all sumproducts with sumifs, and I had
read sumifs can intelligently handle whole column references and not waste
time on empty cells, I changed my sumifs to use entire columns and deleted
all named ranges, this saved me another 30 seconds, and got me to 28 sec for
a full recalc, which seems much more reasonable to me.

It seems with a lot of sumifs or sumproducts each extra condition increases
recalc time nearly exponentially.

My final idea which I have not tried yet and dont really know if it will
help would be to sort my data by date, calculate the rows included in each
month just once and have my sumifs use that range only, eliminating the need
the third condition, so then they would only have to match customer and lot
#. Again this seems the most complicated and I dont know if I would gain or
lose from it.