How to interpret summary output from multiple regression analysis
When conducting multiple regression analysis y = f(x1, x2, x3, x4), what do
the t-stat and the p-value mean? I'm trying to determine which variables are the "real" predictors of the Y value.
They are the test statistic and p-value for the test of the null hypothesis that the corresponding
coefficient is zero. Note that there can be multiple testing issues when you are "data dredging" instead of prespecifying the coefficient to be tested.

Jerry
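To make the t-stat concrete, here is a minimal sketch of how it could be computed by hand: fit the regression by least squares, then divide each coefficient by its standard error. The function names (`invert`, `ols_t_stats`) and the data are made up for illustration, and this is not necessarily how any particular package does it. The p-value is not computed here; it is the two-sided tail probability of |t| under a t distribution with the residual degrees of freedom, which a statistics package or spreadsheet will report for you.

```python
import math

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment the matrix with the identity, then reduce the left half.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (collinear predictors?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols_t_stats(rows, y):
    """Fit y on an intercept plus the predictors in `rows`.

    Returns (coefficients, standard errors, t statistics, residual df).
    The first entry of each list belongs to the intercept.
    """
    X = [[1.0] + list(r) for r in rows]
    n, k = len(X), len(X[0])
    df = n - k
    if df <= 0:
        raise ValueError("need more observations than coefficients")
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)]
           for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    inv = invert(xtx)
    coef = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [yi - sum(c * v for c, v in zip(coef, row))
             for row, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / df          # residual variance
    se = [math.sqrt(sigma2 * inv[a][a]) for a in range(k)]
    t = [c / s for c, s in zip(coef, se)]            # H0: coefficient = 0
    return coef, se, t, df

# Hypothetical data: two predictors, six observations.
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]
coef, se, t, df = ols_t_stats(rows, y)
for name, c, s, ti in zip(["intercept", "x1", "x2"], coef, se, t):
    print(f"{name}: coef={c:.3f} se={s:.3f} t={ti:.2f} (df={df})")
```

A large |t| (equivalently a small p-value) says the coefficient is hard to explain as zero given the other variables in the model. It does not by itself say the variable is a "real" predictor.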
Jerry,
When conducting a multiple regression, sometimes there are "p" values less than my threshold (p<0.05). I presume this means that these variables could be eliminated from the regression equation?

Are there any methods to pick which variables to eliminate, so that I can determine those variables which should be kept in the regression equation?

I am working a problem with 10 potential variables.

Thanks.
While your criterion has a nominal 5% error rate for any single prespecified
test, the overall error rate is much higher. If the 10 potential regression variables were statistically independent random variables that had no predictive value, then the probability of declaring at least one of the variables to be significant anyway would be =1-(1-0.05)^10 or about 0.40.

When you do this kind of data dredging, what you get is a working hypothesis, not a proven model. If you have enough data, you might consider randomly dividing your data into two groups. Then you could estimate a model with the first group and test that hypothesized model with the second group.

As for methods, two commonly used methods of "stepwise regression" are forward selection and backward elimination. Forward selection starts with no variables and one-by-one adds the variable whose inclusion causes the largest decrease in residual sum of squares. Backward elimination starts with all of the variables and one-by-one eliminates the variable whose removal causes the smallest increase in residual sum of squares.

Jerry
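Two pieces of the reply above can be sketched in code: the 0.40 figure for 10 independent tests at the 5% level, and forward selection by residual sum of squares. This is only a rough illustration. The function names and the data are invented, and no stopping rule is applied because the thread does not specify one; the sketch simply reports the order in which variables would be added and the residual sum of squares after each step.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive among independent null tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

def solve(a, b):
    """Solve the small square system a x = b by Gaussian elimination."""
    n = len(a)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear predictors?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def rss(columns, y, chosen):
    """Residual sum of squares of y on an intercept plus the chosen columns."""
    X = [[1.0] + [columns[c][i] for c in chosen] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)]
           for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    coef = solve(xtx, xty)
    return sum((yi - sum(c * v for c, v in zip(coef, row))) ** 2
               for row, yi in zip(X, y))

def forward_selection_order(columns, y):
    """Greedy forward selection: at each step add the variable that lowers
    the residual sum of squares the most. Returns [(column index, RSS), ...].
    """
    chosen, remaining, path = [], list(range(len(columns))), []
    while remaining:
        best = min(remaining, key=lambda c: rss(columns, y, chosen + [c]))
        chosen.append(best)
        remaining.remove(best)
        path.append((best, rss(columns, y, chosen)))
    return path

print(round(familywise_error_rate(0.05, 10), 2))  # about 0.40

# Hypothetical data: three candidate predictors, six observations.
columns = [
    [1, 2, 3, 4, 5, 6],   # x1
    [2, 1, 4, 3, 6, 5],   # x2
    [1, 0, 0, 1, 1, 0],   # x3
]
y = [5.2, 7.9, 11.1, 13.8, 17.3, 19.9]
for index, value in forward_selection_order(columns, y):
    print(f"add x{index + 1}: RSS = {value:.4f}")
```

Backward elimination would run the same loop in reverse: start from all columns and repeatedly drop the one whose removal raises the residual sum of squares the least. Either way the result is a working hypothesis, best checked against data held out from the selection.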
