Posted to microsoft.public.excel.misc
Jerry W. Lewis
How to interpret summary output from multiple regression analysis

While your criterion has a nominal 5% error rate for any single pre-specified
test, the overall error rate is much higher. If the 10 potential regression
variables were statistically independent random variables that had no
predictive value, then the probability of declaring at least one of them
significant anyway would be =1-(1-0.05)^10, or about 0.40.
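The arithmetic above can be checked directly. This is a minimal Python sketch that assumes, as the paragraph does, that the 10 tests are independent and that every null hypothesis is true, so each p-value is uniformly distributed on [0, 1]. The function names are illustrative only and do not come from Excel or any library.

```python
import random


def familywise_error_rate(alpha: float, m: int) -> float:
    """P(at least one falsely 'significant' variable) among m independent
    tests, each run at level alpha, when no variable has predictive value."""
    return 1 - (1 - alpha) ** m


def simulate_familywise_error(alpha: float, m: int,
                              trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo check: under a true null hypothesis a p-value is uniform
    on [0, 1], so draw m independent uniforms per trial and count the
    trials in which at least one of them falls below alpha."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < alpha for _ in range(m))
        for _ in range(trials)
    )
    return hits / trials


print(round(familywise_error_rate(0.05, 10), 3))  # 0.401
print(simulate_familywise_error(0.05, 10))        # close to 0.40
```

With a single pre-specified test (m = 1) the rate is the nominal 5%; it is the number of looks at the data that inflates it.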

When you do this kind of data dredging, what you get is a working
hypothesis, not a proven model. If you have enough data, you might consider
randomly dividing your data into two groups. Then you could estimate a model
with the first group and test that hypothesized model with the second group.
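A minimal sketch of that split-sample check in Python, under a few stated assumptions: the helper names (`split_in_two`, `fit_ols`, `holdout_r2`) are hypothetical, the small hand-rolled least-squares solver stands in for Excel's regression tool, and `X` is assumed to contain only the variables of the hypothesized model, chosen using the first group alone. The second group is used only to measure how well that model predicts data it never saw.

```python
import random


def split_in_two(n, seed=None):
    """Randomly divide row indices 0..n-1 into two groups of (nearly) equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    half = n // 2
    return idx[:half], idx[half:]


def fit_ols(X, y):
    """Least-squares coefficients [intercept, b1, b2, ...] from the normal
    equations (X'X) b = X'y, solved by Gaussian elimination with pivoting."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for k in range(col + 1, p):
            f = A[k][col] / A[col][col]
            for c in range(col, p):
                A[k][c] -= f * A[col][c]
            b[k] -= f * b[col]
    coef = [0.0] * p
    for i in reversed(range(p)):  # back-substitution
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, p))) / A[i][i]
    return coef


def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def holdout_r2(X, y, seed=None):
    """Estimate the model on the first random group, then report R^2 on the
    second group, which played no part in estimating the coefficients."""
    train, test = split_in_two(len(y), seed)
    coef = fit_ols([X[i] for i in train], [y[i] for i in train])
    y_test = [y[i] for i in test]
    mean_test = sum(y_test) / len(y_test)
    ss_res = sum((y[i] - predict(coef, X[i])) ** 2 for i in test)
    ss_tot = sum((v - mean_test) ** 2 for v in y_test)
    return 1 - ss_res / ss_tot


# Illustrative synthetic data: y depends on both columns plus a little noise.
rng = random.Random(1)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(100)]
y = [2 * a - b + rng.gauss(0, 0.1) for a, b in X]
print(holdout_r2(X, y, seed=2))  # near 1 when the fitted model generalizes
```

A hypothesized model that merely fit noise in the first group will typically show a much lower, possibly negative, R^2 on the second group.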

As for methods, two common forms of "stepwise regression" are forward
selection and backward elimination.
- Forward selection starts with no variables and one-by-one adds the
variable whose inclusion causes the largest decrease in residual sum of
squares.
- Backward elimination starts with all of the variables and one-by-one
eliminates the variable whose removal causes the smallest increase in
residual sum of squares.
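The two procedures can be sketched in Python using residual sum of squares (RSS) exactly as described in the list. Assumptions: the function names are hypothetical, the least-squares fit is a small hand-rolled solver rather than Excel's, and no stopping rule is applied. In practice you would stop adding (or removing) variables once the next one no longer passes your significance threshold; here each procedure simply runs to the end and reports the order in which variables enter or leave.

```python
import random


def fit_rss(X, y, cols):
    """RSS of an OLS fit of y on an intercept plus the predictor columns in
    cols, solving the normal equations (X'X) b = X'y by Gaussian elimination."""
    rows = [[1.0] + [r[c] for c in cols] for r in X]
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(A[k][col]))  # partial pivoting
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for k in range(col + 1, p):
            f = A[k][col] / A[col][col]
            for c in range(col, p):
                A[k][c] -= f * A[col][c]
            b[k] -= f * b[col]
    coef = [0.0] * p
    for i in reversed(range(p)):  # back-substitution
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, p))) / A[i][i]
    return sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2
               for r, yi in zip(rows, y))


def forward_selection(X, y):
    """Start with no variables; repeatedly add the one whose inclusion gives
    the largest decrease in RSS. Returns column indices in order of entry."""
    chosen, remaining = [], list(range(len(X[0])))
    while remaining:
        best = min(remaining, key=lambda j: fit_rss(X, y, chosen + [j]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


def backward_elimination(X, y):
    """Start with all variables; repeatedly drop the one whose removal gives
    the smallest increase in RSS. Returns column indices in order of removal."""
    kept, removed = list(range(len(X[0]))), []
    while kept:
        worst = min(kept, key=lambda j: fit_rss(X, y, [c for c in kept if c != j]))
        kept.remove(worst)
        removed.append(worst)
    return removed


# Illustrative synthetic data: column 0 matters a lot, column 2 a little,
# column 1 is pure noise.
rng = random.Random(0)
X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(50)]
y = [3 * r[0] + 0.5 * r[2] + rng.gauss(0, 0.1) for r in X]
print(forward_selection(X, y))     # column 0 should enter first
print(backward_elimination(X, y))  # column 1 should be removed first
```

When predictors are correlated, the two procedures need not agree on which variables to keep, which is one more reason to treat the result as a working hypothesis.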

Jerry

"B52bomber" wrote:

Jerry,

When conducting a multiple regression, sometimes there are "p" values less
than my threshold (p<0.05). I presume this means that these variables could
be eliminated from the regression equation?

Are there any methods to pick which variables to eliminate, so that I can
determine those variables which should be kept in the regression equation?

I am working a problem with 10 potential variables.

Thanks.

"Jerry W. Lewis" wrote:

They are the test statistic and p-value for the test that the corresponding
coefficient is zero. Note that there can be multiple testing issues when you
are "data dredging" instead of prespecifying the coefficient to be tested.
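In symbols, assuming an intercept in the model, n observations, k predictors (so n - k - 1 residual degrees of freedom), and the usual normal-errors assumptions of least squares, the two numbers reported for coefficient j are:

```latex
t_j = \frac{\hat\beta_j}{\operatorname{SE}(\hat\beta_j)}, \qquad
p_j = 2\,\Pr\!\left(T_{n-k-1} \ge \lvert t_j \rvert\right)
```

Here T_{n-k-1} follows Student's t distribution with n - k - 1 degrees of freedom. A small p_j says that an estimate as far from zero as \hat\beta_j would be unlikely if the true coefficient were zero. In Excel the same two-tailed p-value can be reproduced from the t Stat with =TDIST(ABS(t), n-k-1, 2).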

Jerry

"B52bomber" wrote:

When conducting multiple regression analysis, y = f(x1, x2, x3, x4), what do
the t-stat and the p-value mean? I'm trying to determine which variables are
the "real" predictors of the Y value.