#1
Multiple regression help
When doing a multiple regression in Excel, what is the meaning of these outputs (which are all in the same table)?

The first column has my dependent variable, which is labeled here "Intercept", and the independent variables X2, X3, X4, and X5.

The second column is labeled "Coefficients". I think this column has the slope values of each independent variable: X2, X3, X4, and X5. These slope values are in relation to all the other variables, so the slope of X2 is affected by X3, X4, and X5. These slope values are the measure of how each independent variable affects the dependent variable. I'm guessing this allows me to predict where added data will go in the correlation. So, for example, if I want to predict how a film will do (my regression has to do with film gross) according to this data, I would multiply my variables from that movie (budget (X2), first weekend gross (X3), user ratings (X4), and MPAA rating (X5)) by the corresponding values in this column. If what I am saying is right (or at least partially right), I don't know what the value of Intercept is for; since it is from the dependent variable, it shouldn't have a slope value.

The third column, which is labeled "Standard Error", I'm guessing (if what I am saying above about the coefficient values is right) is the accuracy of the predictions that can be made. I'm guessing the larger the number, the bigger the error.

The fourth column, which is labeled "t Stat", I have no clue what it means or how it contributes to my regression. I'm thinking it's some type of testing, but I don't understand what it's testing and why.

The fifth column, which is labeled "P-value", is again something I don't understand. I think it has something to do with "t Stat". My other theory is that it has something to do with probability. I really don't know, though.

The next two columns are labeled "Lower 95%" and "Upper 95%". I believe these are the limits of my correlations. I think that this allows one to say, "I am 95% sure that the predicted data that lies between these lowers and uppers can be predicted by the accuracy of the values in my 'Coefficients' column."

I am also wondering about the graph outputs. The first graph, "Line Fit Plot", outputs 4 scatter diagrams, one for each of my 4 independent variables. The graphs look like they're comparing my dependent variable (on the y axis) to an independent variable (on the x axis). Is this just showing the correlation and relationship of each independent variable to the dependent variable? For each individual diagram, is the comparison being made, and the linear relationship (the direction the lines seem to be going: positive, negative, or none), based on just the independent variable and the dependent variable, or is the independent variable's slope taking into account the other 3 independent variables?

The second graph, the "Residual Plot", does this show the measure of standard error for each point? And the closer to 0 a point gets, the lesser the error?

--
happycow
happycow's Profile: http://www.excelforum.com/member.php...o&userid=25701
View this thread: http://www.excelforum.com/showthread...hreadid=391430
#2
happycow wrote:

> When doing a multiple regression in Excel, what is the meaning of these outputs (which are all in the same table)? The first column has my dependent variable, which is labeled here "Intercept", and the independent variables X2, X3, X4, and X5. The second column is labeled "Coefficients". I think this column has the slope values of each independent variable: X2, X3, X4, and X5. These slope values are in relation to all the other variables, so the slope of X2 is affected by X3, X4, and X5. These slope values are the measure of how each independent variable affects the dependent variable. I'm guessing this allows me to predict where added data will go in the correlation. So, for example, if I want to predict how a film will do (my regression has to do with film gross) according to this data, I would multiply my variables from that movie (budget (X2), first weekend gross (X3), user ratings (X4), and MPAA rating (X5)) by the corresponding values in this column. If what I am saying is right (or at least partially right), I don't know what the value of Intercept is for; since it is from the dependent variable, it shouldn't have a slope value.

The predicted value at a given point (x2, x3, x4, x5) is

c0 + c2*x2 + c3*x3 + c4*x4 + c5*x5

where c0 is the intercept and c2, ..., c5 are the slope coefficients. The "Intercept" row is not the dependent variable and is not a slope: c0 is the predicted value when every independent variable is zero.

> The third column, which is labeled "Standard Error", I'm guessing (if what I am saying above about the coefficient values is right) is the accuracy of the predictions that can be made. I'm guessing the larger the number, the bigger the error.

Yes, in the sense that a larger standard error means the coefficient is estimated less precisely. Strictly, it is the standard error of each coefficient estimate rather than of the predictions themselves.

> The fourth column, which is labeled "t Stat", I have no clue what it means or how it contributes to my regression. I'm thinking it's some type of testing, but I don't understand what it's testing and why.

The t statistic is computed as the coefficient divided by its standard error. Values that are small in absolute value indicate that the particular coefficient may not be needed in the model. "Small" is generally defined in terms of p-values.
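The two calculations described so far (the predicted value and the t statistic) can be sketched in a few lines of Python. This is only an illustration: the coefficient values, standard errors, and film inputs below are made-up numbers, not output from the poster's regression, and the function names `predict` and `t_stats` are hypothetical.

```python
# Hypothetical "Coefficients" and "Standard Error" columns.
# These numbers are invented for illustration; they are NOT from the thread.
coefficients = {"Intercept": 5.0, "X2": 0.5, "X3": 2.0, "X4": 3.0, "X5": -1.5}
standard_errors = {"Intercept": 2.0, "X2": 0.25, "X3": 0.5, "X4": 2.0, "X5": 1.0}


def predict(coefs, x):
    """Return intercept + sum of (slope * value) over the independent variables."""
    return coefs["Intercept"] + sum(coefs[name] * value for name, value in x.items())


def t_stats(coefs, ses):
    """Return each coefficient divided by its own standard error."""
    return {name: coefs[name] / ses[name] for name in coefs}


# One hypothetical film: budget, first weekend gross, user rating, rating code.
film = {"X2": 40.0, "X3": 15.0, "X4": 7.0, "X5": 2.0}

predicted_gross = predict(coefficients, film)
t = t_stats(coefficients, standard_errors)

print(predicted_gross)  # 73.0
print(t["X3"], t["X4"])  # 4.0 1.5
```

Note that the intercept is added once and is not multiplied by anything, which is why it has a coefficient but no variable of its own.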
> The fifth column, which is labeled "P-value", is again something I don't understand. I think it has something to do with "t Stat". My other theory is that it has something to do with probability. I really don't know, though.

Both guesses are hitting around the issue. If a particular coefficient does not belong in the model (the true value is zero, so the observed value is due to random variation), then the p-value is the probability of observing by chance a coefficient at least as large in magnitude as occurred with this data set. Thus the smaller the p-value, the stronger the evidence that a coefficient is really needed. A commonly used criterion is to assume that if p < 0.05, then there is strong evidence that the coefficient is needed.

> The next two columns are labeled "Lower 95%" and "Upper 95%". I believe these are the limits of my correlations. I think that this allows one to say, "I am 95% sure that the predicted data that lies between these lowers and uppers can be predicted by the accuracy of the values in my 'Coefficients' column."

The correct interpretation is that you are 95% confident that the interval (Lower to Upper) contains the true value for the coefficient. Note that the interval is random, while the coefficient is not (it is merely unknown). In particular, for a given data set, the interval either does or does not contain the true value (although you don't know which is true). Thus your confidence is in the procedure that generated the interval, not in the specific interval generated from the specific data set. It is a subtle concept that is often misunderstood.

> I am also wondering about the graph outputs. The first graph, "Line Fit Plot", outputs 4 scatter diagrams, one for each of my 4 independent variables. The graphs look like they're comparing my dependent variable (on the y axis) to an independent variable (on the x axis). Is this just showing the correlation and relationship of each independent variable to the dependent variable?
> For each individual diagram, is the comparison being made, and the linear relationship (the direction the lines seem to be going: positive, negative, or none), based on just the independent variable and the dependent variable, or is the independent variable's slope taking into account the other 3 independent variables?

More or less.

> The second graph, the "Residual Plot", does this show the measure of standard error for each point? And the closer to 0 a point gets, the lesser the error?

Residuals are observed values minus predicted values. If the model is correct, the points in each residual plot should appear randomly scattered about zero, with no systematic pattern. If there is a systematic pattern in one or more residual plots, then the model is probably inadequate.

All of these questions deal with standard concepts from any introductory statistics course. I highly recommend that you take such a course or at least read an introductory statistics text, since there is more to understand than is likely to be imparted in a few newsgroup replies.

Jerry
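For readers who want to see the "Lower 95%" / "Upper 95%" and residual calculations concretely, here is a minimal Python sketch. It assumes the usual interval form, coefficient plus or minus a critical t value times the standard error. The critical value 2.086 is the two-tailed 95% value for 20 residual degrees of freedom, which is an assumed sample size; in Excel the corresponding value for your own degrees of freedom would come from a function such as TINV. All coefficients, standard errors, and data points are made up, and the function names are hypothetical.

```python
# Two-tailed 95% critical t value for 20 residual degrees of freedom.
# The degrees of freedom are an ASSUMPTION made for this illustration.
T_CRIT = 2.086


def confidence_interval(coef, se, t_crit=T_CRIT):
    """Return (lower, upper) = coef -/+ t_crit * se."""
    half_width = t_crit * se
    return coef - half_width, coef + half_width


def excludes_zero(interval):
    """True when zero lies outside the interval (same verdict as p < 0.05 here)."""
    lower, upper = interval
    return lower > 0 or upper < 0


def residuals(observed, predicted):
    """Residual = observed value minus predicted value, point by point."""
    return [o - p for o, p in zip(observed, predicted)]


# Hypothetical coefficient/standard-error pairs (not from the thread).
ci_strong = confidence_interval(2.0, 0.5)  # t = 4.0, larger than 2.086
ci_weak = confidence_interval(3.0, 2.0)    # t = 1.5, smaller than 2.086

print(excludes_zero(ci_strong))  # True
print(excludes_zero(ci_weak))    # False

# Hypothetical observed and predicted values for three data points.
res = residuals([10.0, 12.0, 15.0], [9.0, 12.5, 15.0])
print(res)  # [1.0, -0.5, 0.0]
```

The two columns agree with the p-value column by construction: a coefficient has p < 0.05 exactly when its absolute t statistic exceeds the critical value, which is the same as its 95% interval not containing zero.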