Posted to microsoft.public.excel.worksheet.functions
 Shane Lindsay external usenet poster Posts: 1 Outlier analysis in excel - trimming means

I would like to use Excel for outlier removal: to calculate the mean
of a range while excluding (but not changing) values that are more than
2 or 3 standard deviations above or below the mean. Excel has a TRIMMEAN
function, but outlier removal based on the above criteria is considered
better for normally distributed data.

I found these formulas posted before on this forum. They exclude
values more than 2 SD above or below the mean. And they work!

=AVERAGE(IF((ABS(rng-AVERAGE(rng)))>(2*STDEV(rng)), "", rng))

=TRIMMEAN(Rng,COUNTIF(Rng,">"&(AVERAGE(Rng)+2*STDEVP(Rng)))/COUNT(Rng))

I also need to know how many data points were "removed", and then
report that as a percentage, and couldn't figure out how to do that
(bearing in mind there could be missing data points).

An alternative method of outlier analysis is to:
a. replace outliers with the 2 standard deviations above or below value
b. replace outliers with the mean

Any suggestions on how to do these things would be gratefully received.

 ExcelBanter AI Excel Super Guru Posts: 1,867 Answer: Outlier analysis in excel - trimming means

Great question! I'm happy to help you with outlier analysis in Excel.

To calculate the mean of a range but not include or change values that are 2 or 3 standard deviations above or below the mean, you can use the following formula:

Code:
`=AVERAGE(IF((ABS(rng-AVERAGE(rng)))>(2*STDEV(rng)), "", rng))`
This is an array formula, so in older Excel versions confirm it with Ctrl+Shift+Enter. Here's how it works:
1. ABS(rng-AVERAGE(rng)) calculates the absolute difference between each value in the range and the mean of the range.
2. (2*STDEV(rng)) calculates twice the sample standard deviation of the range.
3. IF((ABS(rng-AVERAGE(rng)))>(2*STDEV(rng)), "", rng) checks whether the absolute difference between each value and the mean is greater than twice the standard deviation. If it is, the value is replaced with an empty string (""), which AVERAGE ignores; otherwise the value is kept.
4. AVERAGE(IF((ABS(rng-AVERAGE(rng)))>(2*STDEV(rng)), "", rng)) calculates the average of the remaining values.
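
To sanity-check the result outside Excel, the same steps can be sketched in Python. This is only an illustration of the logic above, not part of the original formulas: the function name `mean_without_outliers`, the parameter `k`, and the sample data are made up for the example, and `statistics.stdev` is used because it is the sample standard deviation, like STDEV.

```python
from statistics import mean, stdev

def mean_without_outliers(values, k=2.0):
    """Average the values that lie within k sample SDs of the overall mean."""
    m = mean(values)
    s = stdev(values)  # sample SD (n - 1 denominator), like Excel's STDEV
    # Keep a value unless its distance from the mean exceeds k * SD
    kept = [v for v in values if abs(v - m) <= k * s]
    return mean(kept)

# Hypothetical sample: nine ordinary readings and one obvious outlier (50)
data = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
print(mean_without_outliers(data))  # the 50 is dropped, so this is 92/9
```

Note that the mean and SD are computed once from the full data, outlier included, which mirrors what the Excel formula does.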

To count the number of data points that were "removed" and report that as a percentage, you can use the following formula:

Code:
`=100*(COUNTIF(rng,"<"&AVERAGE(rng)-2*STDEV(rng))+COUNTIF(rng,">"&AVERAGE(rng)+2*STDEV(rng)))/COUNT(rng)`
Here's how it works:
1. COUNTIF(rng,"<"&AVERAGE(rng)-2*STDEV(rng)) counts the number of values in the range that are more than 2 standard deviations below the mean.
2. COUNTIF(rng,">"&AVERAGE(rng)+2*STDEV(rng)) counts the number of values in the range that are more than 2 standard deviations above the mean.
3. COUNT(rng) counts the total number of numeric values in the range, so blank (missing) cells are not counted.
4. 100*(COUNTIF(rng,"<"&AVERAGE(rng)-2*STDEV(rng))+COUNTIF(rng,">"&AVERAGE(rng)+2*STDEV(rng)))/COUNT(rng) calculates the percentage of values that were "removed". The parentheses around the two COUNTIF terms matter: without them only the second count is divided by COUNT(rng).
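
For comparison, here is a rough Python sketch of the same count and percentage. It assumes missing data points are represented as `None` (a choice made for this example only) and skips them, much as COUNT skips blank cells; `outlier_percentage` is a hypothetical helper name.

```python
from statistics import mean, stdev

def outlier_percentage(values, k=2.0):
    """Return (count, percent) of values more than k sample SDs from the mean."""
    # Drop missing points first, so they affect neither the mean nor the total
    nums = [v for v in values if v is not None]
    m, s = mean(nums), stdev(nums)
    removed = sum(1 for v in nums if abs(v - m) > k * s)
    return removed, 100 * removed / len(nums)

# Hypothetical sample with one missing point and one outlier
data = [10, 11, 9, None, 10, 12, 10, 11, 9, 10, 50]
print(outlier_percentage(data))  # (1, 10.0)
```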

To replace outliers with the 2 standard deviations above or below value, you can use the following formula:

Code:
`=IF(rng<AVERAGE(rng)-2*STDEV(rng),AVERAGE(rng)-2*STDEV(rng),IF(rng>AVERAGE(rng)+2*STDEV(rng),AVERAGE(rng)+2*STDEV(rng),rng))`
Here's how it works:
1. IF(rng<AVERAGE(rng)-2*STDEV(rng),AVERAGE(rng)-2*STDEV(rng),IF(rng>AVERAGE(rng)+2*STDEV(rng),AVERAGE(rng)+2*STDEV(rng),rng)) checks whether each value in the range is more than 2 standard deviations below the mean or more than 2 standard deviations above the mean. If it is, the value is replaced with the corresponding limit (the mean minus or plus 2 standard deviations); otherwise the value is kept.
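
The same capping logic, sketched in Python as a cross-check. The helper name `clamp_outliers` and the sample data are assumptions for illustration; the limits are computed from the full data, as in the formula.

```python
from statistics import mean, stdev

def clamp_outliers(values, k=2.0):
    """Pull any value beyond mean +/- k sample SDs back to the nearer limit."""
    m, s = mean(values), stdev(values)
    lower, upper = m - k * s, m + k * s
    # max() lifts low outliers to the lower limit, min() caps high ones
    return [min(max(v, lower), upper) for v in values]

data = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
capped = clamp_outliers(data)
print(capped[-1])  # the 50 becomes mean + 2*SD; the other values are unchanged
```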

To replace outliers with the mean, you can use the following formula:

Code:
`=IF(rng<AVERAGE(rng)-2*STDEV(rng),AVERAGE(rng),IF(rng>AVERAGE(rng)+2*STDEV(rng),AVERAGE(rng),rng))`
Here's how it works:
1. IF(rng<AVERAGE(rng)-2*STDEV(rng),AVERAGE(rng),IF(rng>AVERAGE(rng)+2*STDEV(rng),AVERAGE(rng),rng)) checks whether each value in the range is more than 2 standard deviations below the mean or more than 2 standard deviations above the mean. If it is, the value is replaced with the mean; otherwise the value is kept.
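
A Python sketch of the mean-replacement variant, under the same assumptions as the formula: the replacement value is the mean of the full data, outliers included. `replace_outliers_with_mean` is a hypothetical name used only here.

```python
from statistics import mean, stdev

def replace_outliers_with_mean(values, k=2.0):
    """Swap any value more than k sample SDs from the mean for the mean itself."""
    m, s = mean(values), stdev(values)
    return [m if abs(v - m) > k * s else v for v in values]

data = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
print(replace_outliers_with_mean(data)[-1])  # the 50 becomes the overall mean
```

Because the overall mean is itself pulled toward the outlier, the replacement value here (about 14.2) is noticeably higher than the typical reading of about 10.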
__________________
I am not human. I am an Excel Wizard
Posted to microsoft.public.excel.worksheet.functions
 Lori external usenet poster Posts: 340 Outlier analysis in excel - trimming means

If you are assuming normally distributed data - generally a good
approximation in large samples - you could use the ZTEST function to
give the two tailed P-value of the datapoints. Assuming data is down a
column you could fill down:

=ZTEST(Rng,Rng)

and filter out anything below 2.5% or above 97.5%, this should give
equivalent results to the methods above (actually this is 1.96 sigma,
NORMSDIST(2)=0.977, NORMSDIST(3)=0.999).

Posted to microsoft.public.excel.worksheet.functions
 Lori external usenet poster Posts: 340 Outlier analysis in excel - trimming means

=NORMDIST(Rng,AVERAGE(Rng),STDEV(Rng),1)

(I just noticed ZTEST divides the standard deviation by the square root
of the number of observations, which is appropriate for testing sample
means, not individual observations)
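
For anyone wanting to check the NORMDIST approach outside Excel, a small Python sketch follows. It assumes the cumulative form of NORMDIST (last argument 1) corresponds to the normal CDF, which `statistics.NormalDist.cdf` provides; the helper names and the sample data are made up for the example.

```python
from statistics import NormalDist, mean, stdev

def normal_percentiles(values):
    """Cumulative normal probability of each value, given the sample mean and SD."""
    dist = NormalDist(mean(values), stdev(values))
    return [dist.cdf(v) for v in values]

def flag_tails(values, lower=0.025, upper=0.975):
    """True where a value falls below the 2.5% or above the 97.5% point."""
    return [p < lower or p > upper for p in normal_percentiles(values)]

data = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
print(flag_tails(data))                 # only the last value (50) is flagged
print(round(NormalDist().cdf(2), 3))    # 0.977, matching NORMSDIST(2)
```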
