Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 418
Default how do I identify statistical outliers with excel?

Jeff apparently wrote:
Subject: how do I identify statistical outliers with excel?


You start by putting your inquiry in the __body__ of your posting, not
just in the subject line. Some news readers -- notably
http://www.microsoft.com/communities...s/default.aspx --
do not show the entire subject line, and they do not permit posting
responses when the body of the posting is empty.

I will offer some answers to your question if you promise not to use
this information to blindly and/or automatically exclude outliers from
your data. There are many reasons for deprecating automatic exclusion;
some people even deprecate excluding outliers manually. The primary
reason for excluding outliers is if you believe they represent data
entry errors. But some people argue that even that is not a good
reason, pointing to some unusual historical anecdotes.

Two numerical methods are commonly used to identify outliers: (1)
relying on the interquartile range (IQR); and (2) relying on the
standard deviation (SD). I prefer #1; #2 seems valid only when the
distribution is known to be "normal". In either case, it is best to
graph the data to confirm that the data are truly outliers in a
subjective sense. Often, "outliers" are merely indicative of extreme
skew in the distribution.

One way to graph the data in Excel is to select the data, then click on
the Chart Wizard on the toolbar and select the XY (Scatter) chart type.
Some writers suggest using histograms for this purpose. I disagree
because the shape of a histogram is greatly affected by the width of
the bars with respect to the data distribution, which is an art and
error-prone.

One definition of an outlier is that it is outside"x" times the
interquartile range from first or third quartile. "X" is typically 1.5
(mild) or 3 (extreme). I prefer 3 because it is more conservative. In
Excel, the following compute the first and third quartiles in A1 and A2
and the interquartile range (IQR) in B1 for the data in D1:D100, for
example:

A1: =QUARTILE(D1:D100,1)
A2: =QUARTILE(D1:D100,3)
B1: =A2 - A1

Then the following determines if D1 is an "extreme" outlier:

=if(or(D1 < A1 - 3*B1, D1 A2 + 3*B1), "outlier?", "")

Another definition of an outlier is that it is outside "x" standard
deviations (SD) from the average. "X" is typically 3 or 4. I would
prefer 4 because it is more consistent with the definition of "extreme
outlier" above. In Excel, if A1 contains =AVERAGE(D1:D100) and B1
contains =STDEV(D1:D100), for example, the following determines if D1
is an outlier:

=if(or(D1 < A1 - 4*B1, D1 A1 + 4*B1), "outlier?", "")

  #2   Report Post  
Excel Super Guru
 
Posts: 1,867
Thumbs up Answer: how do I identify statistical outliers with excel?

To identify statistical outliers in Excel, you can use either the interquartile range (IQR) or the standard deviation (SD) method. However, it is important to note that blindly excluding outliers from your data is not recommended.

To use the IQR method, you can follow these steps:
  1. Graph your data using an XY (Scatter) chart type to visually confirm any outliers.
  2. Calculate the first quartile (Q1) and third quartile (Q3) using the QUARTILE function in Excel.
  3. Calculate the IQR by subtracting Q1 from Q3.
  4. Determine the threshold for an outlier by multiplying the IQR by a factor of 1.5 or 3 (mild or extreme, respectively).
  5. Use the IF function in Excel to check if a data point is outside the threshold and label it as an outlier.

For example, if your data is in cells D1:D100, you can calculate Q1 in cell A1 using the formula
Formula:
=QUARTILE(D1:D100,1
, Q3 in cell A2 using the formula
Formula:
=QUARTILE(D1:D100,3
, and IQR in cell B1 using the formula
Formula:
=A2-A1 
. To check if D1 is an extreme outlier, you can use the formula
Formula:
=IF(OR(D1<A1-3*B1,D1A2+3*B1),"outlier?",""
.

To use the SD method, you can follow these steps:
  1. Graph your data using an XY (Scatter) chart type to visually confirm any outliers.
  2. Calculate the mean and standard deviation using the AVERAGE and STDEV functions in Excel.
  3. Determine the threshold for an outlier by multiplying the SD by a factor of 3 or 4 (mild or extreme, respectively).
  4. Use the IF function in Excel to check if a data point is outside the threshold and label it as an outlier.

For example, if your data is in cells D1:D100, you can calculate the mean in cell A1 using the formula
Formula:
=AVERAGE(D1:D100
and the standard deviation in cell B1 using the formula
Formula:
=STDEV(D1:D100
. To check if D1 is an extreme outlier, you can use the formula
Formula:
=IF(OR(D1<A1-4*B1,D1A1+4*B1),"outlier?",""
.

Remember to always visually inspect your data and use your judgment before labeling any data points as outliers.
__________________
I am not human. I am an Excel Wizard
  #3   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 10,124
Default how do I identify statistical outliers with excel?

And you should reply to the post instead of starting a new one.

--
Don Guillett
SalesAid Software

wrote in message
ups.com...
Jeff apparently wrote:
Subject: how do I identify statistical outliers with excel?


You start by putting your inquiry in the __body__ of your posting, not
just in the subject line. Some news readers -- notably
http://www.microsoft.com/communities...s/default.aspx --
do not show the entire subject line, and they do not permit posting
responses when the body of the posting is empty.

I will offer some answers to your question if you promise not to use
this information to blindly and/or automatically exclude outliers from
your data. There are many reasons for deprecating automatic exclusion;
some people even deprecate excluding outliers manually. The primary
reason for excluding outliers is if you believe they represent data
entry errors. But some people argue that even that is not a good
reason, pointing to some unusual historical anecdotes.

Two numerical methods are commonly used to identify outliers: (1)
relying on the interquartile range (IQR); and (2) relying on the
standard deviation (SD). I prefer #1; #2 seems valid only when the
distribution is known to be "normal". In either case, it is best to
graph the data to confirm that the data are truly outliers in a
subjective sense. Often, "outliers" are merely indicative of extreme
skew in the distribution.

One way to graph the data in Excel is to select the data, then click on
the Chart Wizard on the toolbar and select the XY (Scatter) chart type.
Some writers suggest using histograms for this purpose. I disagree
because the shape of a histogram is greatly affected by the width of
the bars with respect to the data distribution, which is an art and
error-prone.

One definition of an outlier is that it is outside"x" times the
interquartile range from first or third quartile. "X" is typically 1.5
(mild) or 3 (extreme). I prefer 3 because it is more conservative. In
Excel, the following compute the first and third quartiles in A1 and A2
and the interquartile range (IQR) in B1 for the data in D1:D100, for
example:

A1: =QUARTILE(D1:D100,1)
A2: =QUARTILE(D1:D100,3)
B1: =A2 - A1

Then the following determines if D1 is an "extreme" outlier:

=if(or(D1 < A1 - 3*B1, D1 A2 + 3*B1), "outlier?", "")

Another definition of an outlier is that it is outside "x" standard
deviations (SD) from the average. "X" is typically 3 or 4. I would
prefer 4 because it is more consistent with the definition of "extreme
outlier" above. In Excel, if A1 contains =AVERAGE(D1:D100) and B1
contains =STDEV(D1:D100), for example, the following determines if D1
is an outlier:

=if(or(D1 < A1 - 4*B1, D1 A1 + 4*B1), "outlier?", "")



  #4   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 418
Default how do I identify statistical outliers with excel?

Don Guillett wrote:
And you should reply to the post instead of starting a new one.


And exactly how would you propose that I accomplish that when "some
news readers" -- you might imagine that means the one(s) that I use --
"do not permit posting responses when the body of the posting is
empty"? Klunk!

For the reading-impaired, let me expand on my explanation. I found the
original posting only by going to
http://www.microsoft.com/communities...s/default.aspx .
It does not appear in Google Groups. (That is a common problem, not
limited to "empty" postings.) But the MS web site does not permit
posting a response to "empty" postings.

So my choice was not to respond at all or post a new thread. I chose
not only to help the OP, but also to educate readers about why they
might not see responses to their "empty" postings.

PS: If I duplicated the title exactly, most news readers with
"threading" capability will make my posting look like a response, not a
new thread, if they have the original posting. However, I must confess
that I do not know if I duplicated the title exactly because the MS web
site does not show the entire title. The most that I can see is "how
do I identify statistical outliers with e...". In fact, viewing the
original posting another way, I just discovered that the title is at
least "how do I identify statistical outliers with excel in". I
suspect there is more after the word "in".

  #5   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 6,582
Default how do I identify statistical outliers with excel?

For the newsreader-impaired, the MS web interface is not the best newsreader
you can find (as you've clearly documented), even though it's how MS funnels
new users to the forums via Help and via their web site. Even Outlook
Express is better, and it allows you to view and reply to any post. Here is
Chip Pearson's article about linking to the newsgroups using an NNTP
newsreader (such as OE, Netscape, Agent, and numerous others):

http://cpearson.com/excel/DirectConnect.htm

- Jon
-------
Jon Peltier, Microsoft Excel MVP
Tutorials and Custom Solutions
http://PeltierTech.com
_______


wrote in message
ups.com...
Don Guillett wrote:
And you should reply to the post instead of starting a new one.


And exactly how would you propose that I accomplish that when "some
news readers" -- you might imagine that means the one(s) that I use --
"do not permit posting responses when the body of the posting is
empty"? Klunk!

For the reading-impaired, let me expand on my explanation. I found the
original posting only by going to
http://www.microsoft.com/communities...s/default.aspx .
It does not appear in Google Groups. (That is a common problem, not
limited to "empty" postings.) But the MS web site does not permit
posting a response to "empty" postings.

So my choice was not to respond at all or post a new thread. I chose
not only to help the OP, but also to educate readers about why they
might not see responses to their "empty" postings.

PS: If I duplicated the title exactly, most news readers with
"threading" capability will make my posting look like a response, not a
new thread, if they have the original posting. However, I must confess
that I do not know if I duplicated the title exactly because the MS web
site does not show the entire title. The most that I can see is "how
do I identify statistical outliers with e...". In fact, viewing the
original posting another way, I just discovered that the title is at
least "how do I identify statistical outliers with excel in". I
suspect there is more after the word "in".





  #6   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 418
Default how do I identify statistical outliers with excel?

Jon Peltier wrote:
For the newsreader-impaired, the MS web interface is not the best newsreader


You get no argument from me. I use it only as a last resort and only
because I have not set up any other news reader on my computer.

PS: I wrote:
If I duplicated the title exactly, most news readers with
"threading" capability will make my posting look like a response [....].
[.... V]iewing the original posting another way, I just discovered that
the title is at least "how do I identify statistical outliers with excel in". I
suspect there is more after the word "in".


My mistake. In my rush to get off the computer earlier, I misread the
UI. The word "in" is not part of the original title. However, the
original title does not have "?" after the word "excel". My bad!

  #8   Report Post  
Posted to microsoft.public.excel.misc
external usenet poster
 
Posts: 418
Default how do I identify statistical outliers with excel?

Don Guillett wrote:
wrote:
Jon Peltier wrote:
For the newsreader-impaired, the MS web interface is not the best
newsreader


You get no argument from me. I use it only as a last resort and only
because I have not set up any other news reader on my computer.

[....]
In Outlook Express
toolsaccountsnewsaddnewsmsnews.microsoft.com


I did not say I do not know how to set it up. I simply said I did not
set it up. I do not like the OE news reader, nor the others mentioned.
I have my reasons for that.

Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
excel chart rendering problem when using apache java Poi package. Tom Excel Discussion (Misc queries) 0 December 21st 06 09:03 AM
Excel file with hyperlinks takes a long time to open over the network Didier P Links and Linking in Excel 3 July 4th 06 04:39 PM
Need suggestions for some uses of Ms Excel Bible John Excel Discussion (Misc queries) 1 February 27th 06 05:30 PM
TRYING TO SET UP EXCEL SPREADSHEET ON MY COMPUTER MEGTOM New Users to Excel 5 October 27th 05 03:06 AM
Excel Range Value issue (Excel 97 Vs Excel 2003) Keeno Excel Discussion (Misc queries) 2 June 13th 05 02:01 PM


All times are GMT +1. The time now is 05:07 PM.

Powered by vBulletin® Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright ©2004-2024 ExcelBanter.
The comments are property of their posters.
 

About Us

"It's about Microsoft Excel"