ExcelBanter - Z score on existing data

"jimbo" wrote:
I somehow have to take the outlier out of the group
and then recalculate the z and I have to check for
both low and high.
[....]
I can't figure out how to do it in excel either.
Am I using the wrong approach or is there a way, or a
mathematical compensation that makes up for the high
and low outlier being part of the mean and SD?

I have used a similar iterative approach when eliminating outliers. AFAIK,
there is no "mathematical compensation" factor. You simply need to exclude
outliers from the (re)computation of the mean and SD.

But there are tricks that you might be able to use to make it easier to do
in Excel.

One approach to eliminating outliers is to convert them to text or to clear
their cell. Note that Excel functions like AVERAGE and STDEV ignore text
and empty cells. Therefore, you do not need to adjust their parameters.

For example, if D90 contains the number 123 and you determine that is an
outlier, change the cell to '123 (note the apostrophe) and change the
horizontal format to right-alignment. Or simply clear D90.

You can still use AVERAGE(A1:L100) and STDEV(A1:L100). The mean and SD are
adjusted automatically, and the identification of subsequent outliers will
be based on the new mean and SD.

Note-1: If you change 123 to text, it would be prudent to change to the
font or cell fill color or otherwise identify the data as an excluded
outlier.

Note-2: Simply changing the cell format to Text alone does not cause
numeric constants to be converted to text, although it might change the
horizontal alignment to be similar to text.

Alternatively, it might be tempting to simply change the font or cell fill
color; that is easy to do automatically with a Condition Format. Then you
might use array-entered AVERAGE and STDEV formulas to ignore the colored
outliers.

However, AFAIK, there is no Excel function that could be used in a
conditional expression to recognized and ignore colored outliers. In
particular, CELLS("color") can__not__ be used for that purpose.

But you could write your own VBA function to recognize colored outliers.
See http://www.cpearson.com/excel/colors.aspx for details and ideas. Then
the mean and SD might be computed by the following array-entered formulas
(press ctrl+shift+Enter instead of just Enter):

=AVERAGE(IF(MYCOLOR(A1:L100)=0,A1:L100))
=STDEV(IF(MYCOLOR(A1:L100)=0,A1:L100))

Although I believe the __elimination__ of outliers (in contrast to their
__identification__) should be a manual procedure, you could implement the
iterative process in a macro.

Some unsolicited comments....

"jimbo" wrote:
I have 12 data points representing 12 months of data for
each of 100 people. I'm looking for outliers. So I set
up a high z score, max-mean/SD, and a low z, mean-min/SD.
When I look at my table, my z scores look low; I see clear
outliers that have a reletively low z.

Using the z-score to identify outliers is appropriate if the data are
normally distributed, or if you have good reason to believe a large
population of data would be.

Otherwise, you might consider alternative criteria, such as the
interquartile range (IQR).

In any case, "12 data points" is not much data for computing statistics and
identifying outliers. I hope you mean that the mean and SD are based on
1200 data points, which you might group by month for presentation.