View Single Post
  #8   Report Post  
Posted to microsoft.public.excel.worksheet.functions
Jacob Skaria Jacob Skaria is offline
external usenet poster
 
Posts: 8,520
Default Removing statistical outliers

I understand you are looking to find Average and max from large data sets...

Have you tried the formula Luke has given on average and the below max
formulas (array formulas)..

=MAX(IF(A2:A2000<100,A2:A2000))

It would be easy for us to suggest a solution if you can give more
information on how your data is arrranged....

If this post helps click Yes
---------------
Jacob Skaria


"William" wrote:

That kind of sucks. I have dozens and dozens of data sets, each with a
different outlier some are wierd like < 434 hours or < 820 hours. It will be
a nightmare for me to try and caclulate what 434 equals if 24 is 1. No easier
way, like just using the real time value????

"Jacob Skaria" wrote:

24:00 equates to 1. So in the below formula you are filtering entries which
are less than 100 which means you are filtering down to hours which are less
than 2400:00:00 which should an ideal case to go with...

=MAX(IF(A2:A2000<100,A2:A2000))
for finding the max

If this post helps click Yes
---------------
Jacob Skaria


"William" wrote:

Luke, how do I use time in that array? My time is format hh:mm:ss do I use
quotes, single quotes??

"Luke M" wrote:

One way to discard outlier is to use an array* function similar to this:

=AVERAGE(IF(A2:A2000<100,A2:A2000))

This formula is designed to take the average of all number sin the range
A2:A2000 that are less than 100.

*Array functions are confirmed using Ctrl+Shift+Enter, not just Enter.
--
Best Regards,

Luke M
*Remember to click "yes" if this post helped you!*


"William" wrote:

I have a large range of data (thousands of rows) and I was performing the
functions average and max on cells that had time format 37:30:55. I realized
in the data there were a few records that had an abnormally large value in
them and from discussion they were due to unavoidable circumstances. There
are a few records in each data set I am working with which are skewing my
numbers (very large values). Is there a way to tell these functions (or use a
different function that returns the same results) to exclude outliers or any
outlier larger/smaller then what I specify?

I know you can use the average function, select a range, skip over the
outlier and select another range after the value, and repeat, but now that
the data has so many recrds it is getting to be a tiresome task with the new
data (thousands of additional records) every month that I put on a new tab.
It takes a lot of time to go through thousands of rows to find the outlier
and not select them in the average/max function.