Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1   Report Post  
Posted to microsoft.public.excel.worksheet.functions
external usenet poster
 
Posts: 119
Default Removing statistical outliers

I have a large range of data (thousands of rows) and I was performing the
functions average and max on cells that had time format 37:30:55. I realized
in the data there were a few records that had an abnormally large value in
them and from discussion they were due to unavoidable circumstances. There
are a few records in each data set I am working with which are skewing my
numbers (very large values). Is there a way to tell these functions (or use a
different function that returns the same results) to exclude outliers or any
outlier larger/smaller then what I specify?

I know you can use the average function, select a range, skip over the
outlier and select another range after the value, and repeat, but now that
the data has so many recrds it is getting to be a tiresome task with the new
data (thousands of additional records) every month that I put on a new tab.
It takes a lot of time to go through thousands of rows to find the outlier
and not select them in the average/max function.

  #2   Report Post  
Posted to microsoft.public.excel.worksheet.functions
external usenet poster
 
Posts: 2,722
Default Removing statistical outliers

One way to discard outlier is to use an array* function similar to this:

=AVERAGE(IF(A2:A2000<100,A2:A2000))

This formula is designed to take the average of all number sin the range
A2:A2000 that are less than 100.

*Array functions are confirmed using Ctrl+Shift+Enter, not just Enter.
--
Best Regards,

Luke M
*Remember to click "yes" if this post helped you!*


"William" wrote:

I have a large range of data (thousands of rows) and I was performing the
functions average and max on cells that had time format 37:30:55. I realized
in the data there were a few records that had an abnormally large value in
them and from discussion they were due to unavoidable circumstances. There
are a few records in each data set I am working with which are skewing my
numbers (very large values). Is there a way to tell these functions (or use a
different function that returns the same results) to exclude outliers or any
outlier larger/smaller then what I specify?

I know you can use the average function, select a range, skip over the
outlier and select another range after the value, and repeat, but now that
the data has so many recrds it is getting to be a tiresome task with the new
data (thousands of additional records) every month that I put on a new tab.
It takes a lot of time to go through thousands of rows to find the outlier
and not select them in the average/max function.

  #3   Report Post  
Posted to microsoft.public.excel.worksheet.functions
external usenet poster
 
Posts: 119
Default Removing statistical outliers

Awesome, thanks. Will this work for the MAX function too?

"Luke M" wrote:

One way to discard outlier is to use an array* function similar to this:

=AVERAGE(IF(A2:A2000<100,A2:A2000))

This formula is designed to take the average of all number sin the range
A2:A2000 that are less than 100.

*Array functions are confirmed using Ctrl+Shift+Enter, not just Enter.
--
Best Regards,

Luke M
*Remember to click "yes" if this post helped you!*


"William" wrote:

I have a large range of data (thousands of rows) and I was performing the
functions average and max on cells that had time format 37:30:55. I realized
in the data there were a few records that had an abnormally large value in
them and from discussion they were due to unavoidable circumstances. There
are a few records in each data set I am working with which are skewing my
numbers (very large values). Is there a way to tell these functions (or use a
different function that returns the same results) to exclude outliers or any
outlier larger/smaller then what I specify?

I know you can use the average function, select a range, skip over the
outlier and select another range after the value, and repeat, but now that
the data has so many recrds it is getting to be a tiresome task with the new
data (thousands of additional records) every month that I put on a new tab.
It takes a lot of time to go through thousands of rows to find the outlier
and not select them in the average/max function.

  #4   Report Post  
Posted to microsoft.public.excel.worksheet.functions
external usenet poster
 
Posts: 119
Default Removing statistical outliers

Luke, how do I use time in that array? My time is format hh:mm:ss do I use
quotes, single quotes??

"Luke M" wrote:

One way to discard outlier is to use an array* function similar to this:

=AVERAGE(IF(A2:A2000<100,A2:A2000))

This formula is designed to take the average of all number sin the range
A2:A2000 that are less than 100.

*Array functions are confirmed using Ctrl+Shift+Enter, not just Enter.
--
Best Regards,

Luke M
*Remember to click "yes" if this post helped you!*


"William" wrote:

I have a large range of data (thousands of rows) and I was performing the
functions average and max on cells that had time format 37:30:55. I realized
in the data there were a few records that had an abnormally large value in
them and from discussion they were due to unavoidable circumstances. There
are a few records in each data set I am working with which are skewing my
numbers (very large values). Is there a way to tell these functions (or use a
different function that returns the same results) to exclude outliers or any
outlier larger/smaller then what I specify?

I know you can use the average function, select a range, skip over the
outlier and select another range after the value, and repeat, but now that
the data has so many recrds it is getting to be a tiresome task with the new
data (thousands of additional records) every month that I put on a new tab.
It takes a lot of time to go through thousands of rows to find the outlier
and not select them in the average/max function.

  #5   Report Post  
Posted to microsoft.public.excel.worksheet.functions
external usenet poster
 
Posts: 897
Default Removing statistical outliers

I have an article here that might be useful to you:

http://www.codeforexcelandoutlook.co...lude-extremes/

--JP

On Oct 6, 1:59*pm, William wrote:
I have a large range of data (thousands of rows) and I was performing the
functions average and max on cells that had time format 37:30:55. I realized
in the data there were a few records that had an abnormally large value in
them and from discussion they were due to unavoidable circumstances. There
are a few records in each data set I am working with which are skewing my
numbers (very large values). Is there a way to tell these functions (or use a
different function that returns the same results) to exclude outliers or any
outlier larger/smaller then what I specify?



  #6   Report Post  
Posted to microsoft.public.excel.worksheet.functions
external usenet poster
 
Posts: 8,520
Default Removing statistical outliers

Just to add one; check out the below link by Chip to know how time values are
stored in excel..

http://www.cpearson.com/excel/datetime.htm

If this post helps click Yes
---------------
Jacob Skaria


"William" wrote:

Luke, how do I use time in that array? My time is format hh:mm:ss do I use
quotes, single quotes??

"Luke M" wrote:

One way to discard outlier is to use an array* function similar to this:

=AVERAGE(IF(A2:A2000<100,A2:A2000))

This formula is designed to take the average of all number sin the range
A2:A2000 that are less than 100.

*Array functions are confirmed using Ctrl+Shift+Enter, not just Enter.
--
Best Regards,

Luke M
*Remember to click "yes" if this post helped you!*


"William" wrote:

I have a large range of data (thousands of rows) and I was performing the
functions average and max on cells that had time format 37:30:55. I realized
in the data there were a few records that had an abnormally large value in
them and from discussion they were due to unavoidable circumstances. There
are a few records in each data set I am working with which are skewing my
numbers (very large values). Is there a way to tell these functions (or use a
different function that returns the same results) to exclude outliers or any
outlier larger/smaller then what I specify?

I know you can use the average function, select a range, skip over the
outlier and select another range after the value, and repeat, but now that
the data has so many recrds it is getting to be a tiresome task with the new
data (thousands of additional records) every month that I put on a new tab.
It takes a lot of time to go through thousands of rows to find the outlier
and not select them in the average/max function.

  #7   Report Post  
Posted to microsoft.public.excel.worksheet.functions
external usenet poster
 
Posts: 8,520
Default Removing statistical outliers

24:00 equates to 1. So in the below formula you are filtering entries which
are less than 100 which means you are filtering down to hours which are less
than 2400:00:00 which should an ideal case to go with...

=MAX(IF(A2:A2000<100,A2:A2000))
for finding the max

If this post helps click Yes
---------------
Jacob Skaria


"William" wrote:

Luke, how do I use time in that array? My time is format hh:mm:ss do I use
quotes, single quotes??

"Luke M" wrote:

One way to discard outlier is to use an array* function similar to this:

=AVERAGE(IF(A2:A2000<100,A2:A2000))

This formula is designed to take the average of all number sin the range
A2:A2000 that are less than 100.

*Array functions are confirmed using Ctrl+Shift+Enter, not just Enter.
--
Best Regards,

Luke M
*Remember to click "yes" if this post helped you!*


"William" wrote:

I have a large range of data (thousands of rows) and I was performing the
functions average and max on cells that had time format 37:30:55. I realized
in the data there were a few records that had an abnormally large value in
them and from discussion they were due to unavoidable circumstances. There
are a few records in each data set I am working with which are skewing my
numbers (very large values). Is there a way to tell these functions (or use a
different function that returns the same results) to exclude outliers or any
outlier larger/smaller then what I specify?

I know you can use the average function, select a range, skip over the
outlier and select another range after the value, and repeat, but now that
the data has so many recrds it is getting to be a tiresome task with the new
data (thousands of additional records) every month that I put on a new tab.
It takes a lot of time to go through thousands of rows to find the outlier
and not select them in the average/max function.

  #8   Report Post  
Posted to microsoft.public.excel.worksheet.functions
external usenet poster
 
Posts: 119
Default Removing statistical outliers

That kind of sucks. I have dozens and dozens of data sets, each with a
different outlier some are wierd like < 434 hours or < 820 hours. It will be
a nightmare for me to try and caclulate what 434 equals if 24 is 1. No easier
way, like just using the real time value????

"Jacob Skaria" wrote:

24:00 equates to 1. So in the below formula you are filtering entries which
are less than 100 which means you are filtering down to hours which are less
than 2400:00:00 which should an ideal case to go with...

=MAX(IF(A2:A2000<100,A2:A2000))
for finding the max

If this post helps click Yes
---------------
Jacob Skaria


"William" wrote:

Luke, how do I use time in that array? My time is format hh:mm:ss do I use
quotes, single quotes??

"Luke M" wrote:

One way to discard outlier is to use an array* function similar to this:

=AVERAGE(IF(A2:A2000<100,A2:A2000))

This formula is designed to take the average of all number sin the range
A2:A2000 that are less than 100.

*Array functions are confirmed using Ctrl+Shift+Enter, not just Enter.
--
Best Regards,

Luke M
*Remember to click "yes" if this post helped you!*


"William" wrote:

I have a large range of data (thousands of rows) and I was performing the
functions average and max on cells that had time format 37:30:55. I realized
in the data there were a few records that had an abnormally large value in
them and from discussion they were due to unavoidable circumstances. There
are a few records in each data set I am working with which are skewing my
numbers (very large values). Is there a way to tell these functions (or use a
different function that returns the same results) to exclude outliers or any
outlier larger/smaller then what I specify?

I know you can use the average function, select a range, skip over the
outlier and select another range after the value, and repeat, but now that
the data has so many recrds it is getting to be a tiresome task with the new
data (thousands of additional records) every month that I put on a new tab.
It takes a lot of time to go through thousands of rows to find the outlier
and not select them in the average/max function.

  #9   Report Post  
Posted to microsoft.public.excel.worksheet.functions
external usenet poster
 
Posts: 8,520
Default Removing statistical outliers

I understand you are looking to find Average and max from large data sets...

Have you tried the formula Luke has given on average and the below max
formulas (array formulas)..

=MAX(IF(A2:A2000<100,A2:A2000))

It would be easy for us to suggest a solution if you can give more
information on how your data is arrranged....

If this post helps click Yes
---------------
Jacob Skaria


"William" wrote:

That kind of sucks. I have dozens and dozens of data sets, each with a
different outlier some are wierd like < 434 hours or < 820 hours. It will be
a nightmare for me to try and caclulate what 434 equals if 24 is 1. No easier
way, like just using the real time value????

"Jacob Skaria" wrote:

24:00 equates to 1. So in the below formula you are filtering entries which
are less than 100 which means you are filtering down to hours which are less
than 2400:00:00 which should an ideal case to go with...

=MAX(IF(A2:A2000<100,A2:A2000))
for finding the max

If this post helps click Yes
---------------
Jacob Skaria


"William" wrote:

Luke, how do I use time in that array? My time is format hh:mm:ss do I use
quotes, single quotes??

"Luke M" wrote:

One way to discard outlier is to use an array* function similar to this:

=AVERAGE(IF(A2:A2000<100,A2:A2000))

This formula is designed to take the average of all number sin the range
A2:A2000 that are less than 100.

*Array functions are confirmed using Ctrl+Shift+Enter, not just Enter.
--
Best Regards,

Luke M
*Remember to click "yes" if this post helped you!*


"William" wrote:

I have a large range of data (thousands of rows) and I was performing the
functions average and max on cells that had time format 37:30:55. I realized
in the data there were a few records that had an abnormally large value in
them and from discussion they were due to unavoidable circumstances. There
are a few records in each data set I am working with which are skewing my
numbers (very large values). Is there a way to tell these functions (or use a
different function that returns the same results) to exclude outliers or any
outlier larger/smaller then what I specify?

I know you can use the average function, select a range, skip over the
outlier and select another range after the value, and repeat, but now that
the data has so many recrds it is getting to be a tiresome task with the new
data (thousands of additional records) every month that I put on a new tab.
It takes a lot of time to go through thousands of rows to find the outlier
and not select them in the average/max function.

  #10   Report Post  
Posted to microsoft.public.excel.worksheet.functions
external usenet poster
 
Posts: 119
Default Removing statistical outliers


I tried the formula, and it works, however my outliers are strange numbers
at times (i.e. 2224:18:40). My spreadsheet is pretty simple. Column A is the
date and time when a report was received in format mm/dd/yy hh:mm:ss AM/PM
and the column B is the time it was acknowledged in the same format. Column C
is the difference between column A & B in time format 37:30:55.

I have dozens of tabs and spreadsheets with similiar data (each one unique
to itself representing various things). Some of the records have a huge time
difference due to the fact that someone forgot about it for months and
months. It is always different for outliers one tab has that huge one over
2224 another is 50:34:56 and another is 6:46:20.

Even though the formula works great, I would have to take time out to find
what number to use in the formula for the outliers instead of using a default
formula in my spreadsheets I can reuse.

"Jacob Skaria" wrote:

I understand you are looking to find Average and max from large data sets...

Have you tried the formula Luke has given on average and the below max
formulas (array formulas)..

=MAX(IF(A2:A2000<100,A2:A2000))

It would be easy for us to suggest a solution if you can give more
information on how your data is arrranged....

If this post helps click Yes
---------------
Jacob Skaria


"William" wrote:

That kind of sucks. I have dozens and dozens of data sets, each with a
different outlier some are wierd like < 434 hours or < 820 hours. It will be
a nightmare for me to try and caclulate what 434 equals if 24 is 1. No easier
way, like just using the real time value????

"Jacob Skaria" wrote:

24:00 equates to 1. So in the below formula you are filtering entries which
are less than 100 which means you are filtering down to hours which are less
than 2400:00:00 which should an ideal case to go with...

=MAX(IF(A2:A2000<100,A2:A2000))
for finding the max

If this post helps click Yes
---------------
Jacob Skaria


"William" wrote:

Luke, how do I use time in that array? My time is format hh:mm:ss do I use
quotes, single quotes??

"Luke M" wrote:

One way to discard outlier is to use an array* function similar to this:

=AVERAGE(IF(A2:A2000<100,A2:A2000))

This formula is designed to take the average of all number sin the range
A2:A2000 that are less than 100.

*Array functions are confirmed using Ctrl+Shift+Enter, not just Enter.
--
Best Regards,

Luke M
*Remember to click "yes" if this post helped you!*


"William" wrote:

I have a large range of data (thousands of rows) and I was performing the
functions average and max on cells that had time format 37:30:55. I realized
in the data there were a few records that had an abnormally large value in
them and from discussion they were due to unavoidable circumstances. There
are a few records in each data set I am working with which are skewing my
numbers (very large values). Is there a way to tell these functions (or use a
different function that returns the same results) to exclude outliers or any
outlier larger/smaller then what I specify?

I know you can use the average function, select a range, skip over the
outlier and select another range after the value, and repeat, but now that
the data has so many recrds it is getting to be a tiresome task with the new
data (thousands of additional records) every month that I put on a new tab.
It takes a lot of time to go through thousands of rows to find the outlier
and not select them in the average/max function.



  #11   Report Post  
Posted to microsoft.public.excel.worksheet.functions
external usenet poster
 
Posts: 806
Default Removing statistical outliers

Hello,

Here is a general approach:
http://sulprobil.com/html/outlier_resistant_beta.html

Regards,
Bernd
Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
how do I identify statistical outliers with excel? [email protected] Excel Discussion (Misc queries) 8 April 23rd 23 07:42 PM
how to calculate outliers ismhs Charts and Charting in Excel 1 June 1st 08 02:49 PM
how do I identify statistical outliers with excel Jeff Excel Discussion (Misc queries) 0 January 7th 07 07:53 AM
Boxplots with outliers Confuzzled. Charts and Charting in Excel 1 March 17th 06 07:36 PM
outliers/histograms Julie Excel Discussion (Misc queries) 1 January 14th 06 07:28 AM


All times are GMT +1. The time now is 09:48 AM.

Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
Copyright ©2004-2025 ExcelBanter.
The comments are property of their posters.
 

About Us

"It's about Microsoft Excel"