Home |
Search |
Today's Posts |
#1
|
|||
|
|||
Regression with more than 16 variables
I noticed linest and the regrssion tool doesn't allow more than 16 variables.
Does anyone know the most convenient way to do this? I currently don't have SAS. Is it possible in Access or anything else? |
#2
|
|||
|
|||
Without being pessimistic I think you'll struggle to do this mutlivariate
regression in excel. There may be a way but it will be long-winded I am sure. As a suggestion, stats websites often allow you to input data and see the output running off Javascript as an example. Try and find a website that you can cut and paste your data to and crank the analysis... SAS could do it easily. The R project (google it to find it) offers a free stats package. It is the update of S and is pretty much as good as it gets. Upside is that it is free, downside is that it is not user friendly in that you would have to put some hours in before you got around to doing your multivariate regression. Sorry to be a naysayer.... Alex "showcase48" wrote: I noticed linest and the regrssion tool doesn't allow more than 16 variables. Does anyone know the most convenient way to do this? I currently don't have SAS. Is it possible in Access or anything else? |
#3
|
|||
|
|||
You could directly implement the equations that were used in Excel prior
to 2003 http://groups-beta.google.com/group/...595e596d?hl=en Also, the address of the R project is www.r-project.org, and there is an supplemental package for an R-Excel interface Jerry showcase48 wrote: I noticed linest and the regrssion tool doesn't allow more than 16 variables. Does anyone know the most convenient way to do this? I currently don't have SAS. Is it possible in Access or anything else? |
#4
|
|||
|
|||
How many rows/columns will the built in matrix functions (mmult, minverse, etc.) handle? Least squares regression "reduces" to some fairly simple matrix algebra which can be performed using the built in matrix functions. Most of the LSQ regression I do, I prefer to use the matrix notation rather than LINEST or LOGEST, but I've never had to regress on more than 5 or 6 parameters. I haven't heard of any limit to the size of matrix these equations can handle, so, in theory anyway, you should be able to perform the regression this way. -- MrShorty ------------------------------------------------------------------------ MrShorty's Profile: http://www.excelforum.com/member.php...o&userid=22181 View this thread: http://www.excelforum.com/showthread...hreadid=381662 |
#5
|
|||
|
|||
MrShorty -
How many rows/columns will the built in matrix functions (mmult, minverse, etc.) handle? Least squares regression "reduces" to some fairly simple matrix algebra which can be performed using the built in matrix functions. Most of the LSQ regression I do, I prefer to use the matrix notation rather than LINEST or LOGEST, but I've never had to regress on more than 5 or 6 parameters. I haven't heard of any limit to the size of matrix these equations can handle, so, in theory anyway, you should be able to perform the regression this way. < Some of the limitations, e.g., maximum 52 columns by 52 rows for MINVERSE, are described at the end of the following Knowledge Base article: http://support.microsoft.com/default...b;en-us;166342 - Mike www.mikemiddleton.com |
#6
|
|||
|
|||
Based on the information in that article, once could (assuming enough RAM, processor time, and such) theoretically do a least squares regression for up 52 parameters, then. -- MrShorty ------------------------------------------------------------------------ MrShorty's Profile: http://www.excelforum.com/member.php...o&userid=22181 View this thread: http://www.excelforum.com/showthread...hreadid=381662 |
#7
|
|||
|
|||
MrShorty wrote:
Based on the information in that article, once could (assuming enough RAM, processor time, and such) theoretically do a least squares regression for up 52 parameters, then. Theoretically yes, but forming the normal equations is a numerically poor way to do least squares, since numerical singularity is then governed by the ratio of largest to smallest eigenvalues in X'X instead of that ratio in X. MINVERSE will often give a numerically inaccurate solution instead of warning of approximate numerical singularity. Consequently I would tend distrust the output. This issue in 1 dimension is the difference between calculating =SUMSQ(data)-SUM(data)^2/COUNT(data) (as was done by VAR, STDEV, etc prior to Excel 2003) and calculating =DEVSQ(data) The first form becomes more and more inaccurate as the CV of the data decreases. For example, STDEV(x,x+0.1) should be 0.0707106781186548 =SQRT(0.005) regardless of x. But for x=100000 (CV=0.00007%) the first form (and STDEV in Excel 2002 and earlier) is inaccurate in the 4th significant figure of the standard deviation calculation. Between 1000000 and 10000000, all significant figures are lost in the first calculation (summing squared x's has pushed the interesting part of the calculation beyond the accuracy if IEEE double precision). While the second form continues to work well. Personally, I can't remember ever being asked to analyze data with a CV that small, (and if I did, I would have used coding (linear transformation to a more practical range) to avoid numerical problems, but things are not so clear in the multiple dimensions of the general linear model. For instance, if you fit a 6th degree polynomial where the x data is 20.0, 20.5, 21.0, 21.5, ..., 27.0, the condition number of X'X is around 10^30, suggesting that direct solution of the normal equations would require quadruple precision to calculate it http://groups-beta.google.com/group/...9a2bb33e6cdbb8 Pre-2003 LINEST (or the direct MINVERSE() formula linked from my previous post) will try, but will get no figures correct without warning. SAS PROC GLM and PROC REG will complain of numerical singularity (SAS's use of the sweep operator necessarily requires it to form the normal equations). That is why books on statistical computing recommend other approaches, such as Givens rotations, as implemented in http://lib.stat.cmu.edu/apstat/274 http://lib.stat.cmu.edu/apstat/75 the lm() function in S-PLUS and R will get 8 figures correct for each coefficient in the preceding example despite the ill-conditioning. Excel's trendline will get 9 figures correct, but will not go beyond a 6th degree polynomial. There is a contributed package for R that would allow you to pass data from Excel to R, analyze it in R, and pass the result back to Excel. To learn more about R, go to http://www.r-project.org Jerry |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
Similar Threads | ||||
Thread | Forum | |||
How do you populate the X variables in Regression analysis for mo. | Excel Worksheet Functions | |||
Plot 2 variables on Y axis | Charts and Charting in Excel | |||
LINEST maximum number of predictor variables | Excel Worksheet Functions | |||
Problem seting-up Regression Macro | Excel Discussion (Misc queries) | |||
Index Match With 3 Variables | Excel Worksheet Functions |