Thread: 2.5 StdDevs
View Single Post
  #11   Report Post  
Posted to microsoft.public.excel.programming
Zone[_3_] Zone[_3_] is offline
external usenet poster
 
Posts: 373
Default 2.5 StdDevs

Tom, Yes, I understand. The reason I asked is that I understand that
one statistical method is to throw out the highest and lowest numbers,
then, if 12 numbers are left, count down 3 and take that number as the
1st quartile. Say I have these numbers:
0.2
1
3.25
5
17
50
125
500
1250
1500
32505
50000

Assuming the most absurd numbers were already thrown out (leaving only
a few absurd numbers), counting down 3 and returning the number would
be 3.25. Obviously this is neither the position or the value of the
3rd number. However, say I want to get the center value of the
dataset.
=QUARTILE(M2:M13,2) gives 87.5, which does not appear to me to be the
center, mathematically.
=AVERAGE(M2:M13) gives 7163.04, which also does not appear to be the
center.

There is quite a difference between these two numbers! The quartile
value Excel gives seems to lie in the middle of the quartile, not on
the threshold.

If the expert says that the figure should be on the order of hundreds
of dollars, then conceivably everything below 50 and above 1500 could
be excluded. Unfortunately, I do not have an exact range for each of
these hundreds of indicators, so I'm trying to throw out outliers by
StdDevs (which, by the way, I can set to 2, 2.5, or 3). After
throwing out the outliers, I want to get meaningful figures for the
1st, 2nd and 3rd Quartiles. These will be used for judging
performance of the clients.

This demonstrates my dilemma. I would prefer not to go into hundreds
of lines of VBA code to render meaningful figures. Any further
guidance from you or John would be appreciated. James

On Jul 9, 3:54?pm, Tom Ogilvy
wrote:
quartile returns the quartile value, not the position of the quartile. if
you had 12 random numbers between 100 and 1000, quartile(A1:A12,1) would
return the appropriate number between 100 and 1000, not 3.75 which would be
the position of the first quartile. See Excel help for details.

Average returns the arithmetic mean.

stdev returns the sample standard deviation. stdevp returns the population
standard deviation. Neither requires you to calculate the arithmetic mean
separately, but of course you would need that value to determine the distance
your datapoint is from the mean; but I already provided that information to
you.

--
Regards,
Tom Ogilvy



"Zone" wrote:
Tom and John,
I am paying very close attention to everything you say. The
problem is that for a particular ratio in a particular industry (such
as supply expense per client per day), it is known and published by
the experts that the figure should be in the range of a few hundred
dollars. Depending on other factors (which are captured), this could
vary between $200 or so and maybe $700. Problem is, the incoming data
includes figures such as 22 cents and $125,000, both of which are
obviously incorrect. Alas, we have no control over the incoming data,
so the best we can do is to try to grab the "reasonable" figures that
are the bulk of the data and ignore the "unreasonable" figures that
have resulted from data input errors, misunderstanding of what is
being asked, etc.
What is wanted is Top, Second and Third Quartiles for each
indicator. These cannot be figured unless the ridiculous data is
excluded. Tom, I presume that when you say the AVERAGE function will
get me the average, this is the same as the mean, since neither
StdDevs nor quartiles can be figured without the mean, to my limited
mathematical knowledge.
I acually wasn't aware that there were a spreadsheet function for
quartiles, so that is interesting. I take it that this reveals true
quartiles, not just counting down 3 in a set of 12 data, for instance.
I am grateful to both of you for your interest in my problem. I
readily admit to being mathematically obtuse. I have hundreds of
indicators to run, so I'm looking for a fast and efficient way to
handle this. James
On Jul 9, 2:56?pm, wrote:
James,


I'm thinking like Tom....what is it exactly you are trying to get to?


Are you trying to eliminate outliers from your dataset?....all those
beyond 2.5 std devs. Or are you trying to find the 25th and 75th
quartiles?


John- Hide quoted text -


- Show quoted text -