Martin Brown

Peter N wrote:

My reply is at the bottom.

"Jerry W. Lewis" wrote:

LINEST also returns #REF! when you exceed its hardcoded limit on
predictors. Empirically, that limit appears to be 17 predictors if no
constant is fitted, or 16 predictors and a fitted constant.

As a practical matter, I would be very suspicious of the numerical
properties of LINEST solutions long before hitting that hardcoded
limit. Perhaps that is why it is there, even though MINVERSE can in
theory invert a 52x52 X'X matrix.

Jerry

Harlan Grove wrote:


Peter N wrote...


I am using Excel 2002. It seems to be unable to handle any more than 16
predictor variables. All it returns is a #REF! error. The help does


It is probably falling over and giving wrong answers at around 6 or 7
variables unless your predictors are nearly orthogonal to begin with. In
some tricky cases the old Excel LINEST could fail when fitting anything
beyond a cubic polynomial.

not document any limit. Is this an undocumented limitation? Is Excel 2003
LINEST capable of handling more than 16 variables? How many?

What's your *EXACT* formula? LINEST should return #VALUE! and #NUM!
errors when it can't invert the bilinear form of the independent
variables, but it only returns #REF! when there's a true range
reference error.


My *EXACT* formula is =LINEST(Y14:Y50,A14:Q50,1,1)

Is this hardcoded limit documented anywhere? Does Excel 2003 handle a
larger number of channels?

I have a limited understanding of matrix math, but I understand that solving
the "normal equations" involves inverting the [X'X] matrix. To minimize
roundoff error in the computation, each vector in the X matrix should first
be shifted (centered about its mean). Hopefully LINEST performs this
operation automatically and transparently.


It would be safer (and more nearly true) to assume that it doesn't work
unless you are very careful to precondition the problem so that naive
code can solve it.
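To see what is at stake, here is a quick sketch — in NumPy rather than Excel, with made-up data, and saying nothing about what LINEST actually does internally — of how a large common offset in the raw predictors inflates the condition number of X'X, and how centering removes it:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
# Raw predictors sitting on a large common offset, as instrument data often does.
X = 1000.0 + rng.standard_normal((n, 3))
# The same predictors centered about their column means.
Xc = X - X.mean(axis=0)

cond_raw = np.linalg.cond(X.T @ X)        # enormous: dominated by the offset
cond_centred = np.linalg.cond(Xc.T @ Xc)  # modest: offset removed
print(cond_raw, cond_centred)
```

Whether LINEST centers for you is precisely the open question; the sketch only shows why naive code that skips this step struggles.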

I read the article http://support.microsoft.com/kb/828533#kb3 and Microsoft
acknowledges weaknesses in the numerical methods used by LINEST in Excel 2002
and earlier. The article claims that LINEST has been improved in Excel 2003,
using QR decomposition (I'll have to read up to understand that), resulting
in more robust performance. They also advertise
• Better numeric stability (generally smaller round-off errors)
• Analysis of collinearity issues


It is pretty scary that earlier versions did not use QR decomposition!
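For what it is worth, the difference can be sketched in a few lines (NumPy, with invented data — an illustration of the two algorithms, not of Excel's code): the normal-equations route squares the condition number by forming X'X, while the QR route factors X directly.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((40, 4))
beta_true = np.array([2.0, -1.0, 0.5, 3.0])
y = X @ beta_true + 0.01 * rng.standard_normal(40)

# Normal equations: solve (X'X) b = X'y -- the condition number of X is squared.
b_normal = np.linalg.solve(X.T @ X, X.T @ y)

# QR route: factor X = QR, then solve the triangular system R b = Q'y.
Q, R = np.linalg.qr(X)
b_qr = np.linalg.solve(R, Q.T @ y)
print(b_qr)
```

On a well-conditioned problem like this toy one the two answers agree; it is on nearly collinear predictors that the normal equations give way first.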

In what form is the result of this "analysis of collinearity" presented to
the Excel user? What statistic in the matrix returned by LINEST should I look
at to know that the X matrix is "nearing" collinearity?

If the predictors exhibit near collinearity, the remedy is to drop
predictors from the nearly collinear group. Is there an easy way to identify
which predictors form such a group?


Compute their dot product divided by the product of their magnitudes.
E.g. for vectors A1:An, B1:Bn:

=SUMPRODUCT(A1:An,B1:Bn)/SQRT(SUMPRODUCT(A1:An,A1:An)*SUMPRODUCT(B1:Bn,B1:Bn))

This is zero if they are exactly orthogonal (good thing) or +/- 1 if
they are exactly collinear (very very bad). The closer the predictors
are to being collinear the more unstable the matrix problem becomes.
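That SUMPRODUCT formula is just the cosine of the angle between the two predictor columns. The same check in NumPy (toy vectors, purely for illustration):

```python
import numpy as np

def cosine(a, b):
    """Dot product divided by the product of the magnitudes."""
    return np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b))

x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = 2.0 * x1                          # an exact multiple of x1
x3 = np.array([1.0, -1.0, 1.0, -1.0])  # nearly unrelated to x1

print(cosine(x1, x2))  # 1.0: exactly collinear, very very bad
print(cosine(x1, x3))  # near zero: nearly orthogonal, good
```

One caveat: pairwise cosines can miss a near-dependence among three or more predictors taken together, so the condition number of the full X matrix is the stricter test.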

There are devious ways to fit orthogonal polynomials reliably in Excel
if you really need to do it. Allowing 16 general predictors to be used
without any defensive coding is very optimistic of them!
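As a sketch of the orthogonal-basis idea (again NumPy rather than Excel, and nothing to do with LINEST's internals): fitting a high-degree polynomial in the Chebyshev basis keeps the problem well conditioned where the raw 1, x, x^2, ... basis would start to degrade.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

x = np.linspace(-1.0, 1.0, 101)
y = np.exp(x)

# Degree-8 least-squares fit in the Chebyshev basis, which is (nearly)
# orthogonal on [-1, 1], so the design matrix stays well conditioned.
coef = C.chebfit(x, y, 8)
resid = y - C.chebval(x, coef)
print(np.max(np.abs(resid)))  # tiny residual: the fit reproduces the data
```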

Fitting general vectors to data is fraught with difficulty because of
the intrinsic numerical instability in the Excel algorithms. You must
always check that the fitted model reproduces your data to within the
claimed level of residual error. Trust nothing where LINEST is
concerned. The 2003 version is better, but still not right.
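That residual check can be sketched as follows (a NumPy least-squares fit standing in for LINEST, with invented data and an assumed noise level of 0.05): recompute the fitted values yourself from the reported coefficients and compare the residual scatter against the residual error the tool claims.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((30, 2))
y = X @ np.array([1.5, -0.7]) + 0.05 * rng.standard_normal(30)

# Stand-in for LINEST's output: ordinary least-squares coefficients.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Rebuild the fitted values and check that the residual scatter is
# consistent with the claimed noise level (0.05 here, by construction).
residuals = y - X @ beta
rms = np.sqrt(np.mean(residuals ** 2))
assert rms < 3 * 0.05, "fit does not reproduce the data"
print(rms)
```

If that assertion fires on real data, the reported coefficients do not actually describe your data, whatever the summary statistics say.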

Regards,
Martin Brown