View Single Post
  #12   Report Post  
Posted to comp.os.linux.advocacy,comp.apps.spreadsheets,microsoft.public.excel.misc,microsoft.public.excel
Steve Carroll Steve Carroll is offline
external usenet poster
 
Posts: 8
Default Correct process for creating a linear trend line in Excel

On Jun 17, 1:57*am, Martin Brown
wrote:
On 16/06/2012 00:22, joeu2004 wrote:









"Onion Knight" wrote:
There has been some debate in COLA as to the correct way
to create a linear trendline in Excel. Someone in the forum
kindly made a video to show what he thought was the correct
method
http://tmp.gallopinginsanity.com/Lin...neCreation.mov

[....]
Does anyone see any missing steps for the creation of a
linear trend line?


I cannot find the original debate in comp.os.linux.advocacy (COLA); just
the thread titled "Visualizing where to draw the standard deviation
line", which refers to yet-another "a debate in COLA".


No matter; I'm not interested. I think we can answer your Excel
questions without getting into anything so esoteric.


I see no missing steps in creating the linear trendline per se using Excel.


I would also select the trendline options to display the trendline
equation and the R-squared of the trendline.


And after displaying the trendline equation, I might select the option
to Format Trendline Label in order to alter the display precision of the
coefficients. I would select Scientific with 14 decimal places if I want
to see the trendline coefficients "exactly"; that is, the most precision
that Excel will display.


But none of those "steps" affects how the linear trendline is created.


FYI, we can also use the Excel function LINEST to determine the
regression line coefficients.


If the dates are in A1:A24 and the percentages are in B1:B24, select two
horizontal cells and array-enter the following formula (press
ctrl+shift+Enter instead just Enter): =LINEST(B1:B24,A1:A24).


To compare with the "exact" trendline coefficients, I would also format
the LINEST results as Scientific with 14 decimal places.


You might notice infinitesimal differences between the trendline
coefficients and the LINEST results. But in this case, they are same up
10 or 11 decimal places in this case; "close enough for government
work". As a guess, the difference might be attributed to physically
different internal algorithms and/or to differences in internal
floating-point arithmetic anomalies.


The difference is that (apart from in some very early versions of XL2007
where the graph regression function was wrecked to make it agree with
another well known products answers) the linear regression on XL charts
uses a well designed algorithm that gives a very good approximation to
the true least squares parameter fit to data. LINEST is demonstrably
numerically unstable although it requires at least a 3rd order
polynomial fit and moderately hostile data to break it. Examples have
been posted here or in another m.p.e group previously.
(some years back)

For linear fits the two are more or less interchangable but for higher
orders of polynomial the chart function fit is more accurate.



For some insight into how the Excel LINEST and linear trendline
coefficients might be derived, see
http://en.wikipedia.org/wiki/Simple_linear_regression.


Note that =COVAR(A1:A24,B1:B24)/VARP(A1:A24) in E3 and
=AVERAGE(B1:B24)-E3*AVERAGE(A1:A24) are about the same as the LINEST
results. The difference is in the 3 least-significant binary bits of the
floating-point representations in this case.


You might also take note of the wiki page section titled "Normality
assumption". That may or may not be relevant to the COLA debates.


What is COLA?


The newsgroup where the troll Snit and his sock puppet Onion Knight
are lying and trolling with this topic.