Improving Nested Formula Calculation Efficiency
On 03/08/2011 00:59, kittronald wrote:
I'm having difficulty with trying to improve the efficiency of a
nested formula.
I'm using Excel 2007 and a third party COM add-in that provides
additional functions written in C++ (not a free product).
The .XLSB workbook contains two worksheets.
Sheet1 has about 1,000,000 cells, each containing the nested
formula.
Sheet2 is the source of the names (On_Setting, Primary,
Secondary and Tertiary) using cell references, not formulas.
Cust_Func is a third party function and can return a number or a
text based #N/A value - not an Excel #N/A.
Below is an example of the formula.
=IF(On_Setting=FALSE,"",IF(IFERROR(PRODUCT(Cust_Fu nc($A1,Primary),1),
0)0,Cust_Func($A1,Primary),IF(IFERROR(PRODUCT(Cus t_Func($A1,Secondary),
1),0)0,Cust_Func($A1,Secondary),IF(IFERROR(PRODUC T(Cust_Func($A1,J
$2,Tertiary),1),0)0,Cust_Func($A1,Tertiary),"Erro r"))))
=IF(On_Setting=FALSE is linked to an ON/OFF radio button that is
used to disable calculation in case of cells that need to be edited
beforehand.
IFERROR(PRODUCT(Cust_Func($A1,Primary),1),0) is used to test
whether a zero or an #N/A value is returned by Cust_Func($A1,Primary).
It singularly fails to do that since PRODUCT(0,1)==0 is a valid number.
(unless Cust_Function returns a list of numbers and if it does you
should multiply them together on the other side of the data bridge)
If Cust_Func($A1,Primary) evaluates to zero or #N/A, the IF
statement moves to the next IF statement.
If Cust_Func($A1,Primary) evaluates to 0, then
Cust_Func($A1,Primary) is returned.
The only variables that change with each nested IF statement are
the names Primary, Secondary and Tertiary.
It would be better to offload the hunt through Primary, Secondary and
Tertiary onto the data server - and up to 6x faster.
You can easily double the speed by adding hidden columns B,C,D
containing Cust_Function($A1, Primary), Cust_Function($A1, Sec...)
Then simplifying the conditional to use the preloaded values.
=IF(On_Setting=FALSE,"",IF(ISNUMBER(B1), B1, IF(ISNUMBER(C1),C1,
IF(ISNUMBER(D1),D1,"Error"))))
It is possible you mean IF (AND(ISNUMBER(B1), B10)
Based on the spec. I can't see the point of the call to PRODUCT(x, 1) is
it is the identity and the extra error handling when it fails is an
additional and unnecessary performance hit.
*** ISSUES***
1) It takes almost 40 minutes for these calculations to complete.
The option of building an Intel octo-core, hyper-threaded and
overclocked desktop isn't immediately possible.
Suggested changes above will only fetch data once for each line instead
of twice. It should be at least 2x faster.
2) It has been speculated that all IF statements in a nested
formula are evaluated, regardless if the second IF statement evaluates
TRUE. Is this correct ?
No that is incorrect even for the lamentably slow XL2007 code. The
evaluation of IF statements is lazy only the ones down the true path are
executed. To test define a UDF in a module and set a breakpoint.
eg.
Function toy(a)
toy = a + 1
End Function
With a breakpoint on the addition then use it in the worksheet
=IF(toy(1), toy(2), toy(3))
BTW You might find that XL2003 runs 20-30% faster. XL2007 is slower.
3) Is it possible to write a UDF that calls the Cust_Func function
(supported) and iterates through the names until a value greater than
zero is returned and then stops iterating to prevent unnecessary
calculations ?
Easy, right ?
The main hit at present is from having to move the data from the server
to XL twice for each IF(,,) you execute. You need to put the choice
between 3 options on the other side so that only one item is transferred
back to XL. That will run somewhere between 2x and 6x faster depending
on the proportion of searches that go to full depth.
You could do a local XL UDF that is slightly better along the lines of
Function MyUDF(a)
on error resume next
b = Cust_Function($A1, Primary)
if b0 then
MyUDF = b
else
c = Cust_Function($A1, Secondary)
if c0 then
MyUDF =c
else
d = Cust_Function($A1, Tertiary)
if d0 then
MyUDF = d
else
MyUDF = "Error"
endif
endif
endif
end function
There may be some syntactic sugar needed around your externally supplied
UDF Cust_Function to make it work - see the manual.
This only moves data from your server when it has to.
Is there no way to recast the problem so that a range of queries could
be passed to the server and it passes back the answers in a similar
sized range? This would have considerably better performance.
Regards,
Martin Brown
|