![]() |
Outlier analysis in excel - trimming means
I would like to use excel for outlier analysis removal - to calculate a
mean of a range but not include or change values that are 2 or 3 standard deviations above or below the mean. Excel has a trimmedmean function but outlier removal based on the above criteria is considered better for normally distributed data. I found these formula posted before on this forum. They don't include values 2 SD above or below the mean. And they work! =AVERAGE(IF((ABS(rng-AVERAGE(rng)))(2*STDEV(rng)), "", rng)) =TRIMMEAN(Rng,COUNTIF(Rng,""&(AVERAGE(Rng)+2*STDE VP(Rng)))/COUNT(Rng)) I also need to know how many data points were "removed", and then report that as a percentage, and couldn't figure out to that (bearing in mind they could be missing data points). An alternative method of outlier analysis is to: a. replace outliers with the 2 standard deviations above or below value b. replace outliers with the mean Any suggestions on how to do these things would be gratefully received. |
Answer: Outlier analysis in excel - trimming means
Great question! I'm happy to help you with outlier analysis in Excel.
To calculate the mean of a range but not include or change values that are 2 or 3 standard deviations above or below the mean, you can use the following formula: Code:
=AVERAGE(IF((ABS(rng-AVERAGE(rng)))(2*STDEV(rng)), "", rng))
To count the number of data points that were "removed" and report that as a percentage, you can use the following formula: Code:
=100*COUNTIF(rng,"<"&AVERAGE(rng)-2*STDEV(rng))+COUNTIF(rng,""&AVERAGE(rng)+2*STDEV(rng))/COUNT(rng)
To replace outliers with the 2 standard deviations above or below value, you can use the following formula: Code:
=IF(rng<AVERAGE(rng)-2*STDEV(rng),AVERAGE(rng)-2*STDEV(rng),IF(rngAVERAGE(rng)+2*STDEV(rng),AVERAGE(rng)+2*STDEV(rng),rng))
To replace outliers with the mean, you can use the following formula: Code:
=IF(rng<AVERAGE(rng)-2*STDEV(rng),AVERAGE(rng),IF(rngAVERAGE(rng)+2*STDEV(rng),AVERAGE(rng),rng))
|
Outlier analysis in excel - trimming means
If you are assuming normally distributed data - generally a good
approximation in large samples - you could use the ZTEST function to give the two tailed P-value of the datapoints. Assuming data is down a column you could fill down: =ZTEST(Rng,Rng) and filter out anything below 2.5% or above 97.5%, this should give equivalent results to the methods above (actually this is 1.96 sigma, NORMSDIST(2)=0.977, NORMSDIST(3)=0.999). Shane Lindsay wrote: I would like to use excel for outlier analysis removal - to calculate a mean of a range but not include or change values that are 2 or 3 standard deviations above or below the mean. Excel has a trimmedmean function but outlier removal based on the above criteria is considered better for normally distributed data. I found these formula posted before on this forum. They don't include values 2 SD above or below the mean. And they work! =AVERAGE(IF((ABS(rng-AVERAGE(rng)))(2*STDEV(rng)), "", rng)) =TRIMMEAN(Rng,COUNTIF(Rng,""&(AVERAGE(Rng)+2*STDE VP(Rng)))/COUNT(Rng)) I also need to know how many data points were "removed", and then report that as a percentage, and couldn't figure out to that (bearing in mind they could be missing data points). An alternative method of outlier analysis is to: a. replace outliers with the 2 standard deviations above or below value b. replace outliers with the mean Any suggestions on how to do these things would be gratefully received. |
Outlier analysis in excel - trimming means
Correction: instead of ZTEST use:
=NORMDIST(Rng,AVERAGE(Rng),STDEV(Rng),1) (I just noticed ZTEST divides the standard deviation by the square root of the number of observaions which is used for testing sample means not observations) Lori wrote: If you are assuming normally distributed data - generally a good approximation in large samples - you could use the ZTEST function to give the two tailed P-value of the datapoints. Assuming data is down a column you could fill down: =ZTEST(Rng,Rng) and filter out anything below 2.5% or above 97.5%, this should give equivalent results to the methods above (actually this is 1.96 sigma, NORMSDIST(2)=0.977, NORMSDIST(3)=0.999). Shane Lindsay wrote: I would like to use excel for outlier analysis removal - to calculate a mean of a range but not include or change values that are 2 or 3 standard deviations above or below the mean. Excel has a trimmedmean function but outlier removal based on the above criteria is considered better for normally distributed data. I found these formula posted before on this forum. They don't include values 2 SD above or below the mean. And they work! =AVERAGE(IF((ABS(rng-AVERAGE(rng)))(2*STDEV(rng)), "", rng)) =TRIMMEAN(Rng,COUNTIF(Rng,""&(AVERAGE(Rng)+2*STDE VP(Rng)))/COUNT(Rng)) I also need to know how many data points were "removed", and then report that as a percentage, and couldn't figure out to that (bearing in mind they could be missing data points). An alternative method of outlier analysis is to: a. replace outliers with the 2 standard deviations above or below value b. replace outliers with the mean Any suggestions on how to do these things would be gratefully received. |
All times are GMT +1. The time now is 06:29 AM. |
Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
ExcelBanter.com