ExcelBanter - View Single Post - RANDOM # around a MEAN with STANDARD DEVIATION

joeu2004

On Sep 12, 4:57 pm, "David Biddulph" <groups [at] biddulph.org.uk
wrote:
at your A35 stage you may find yourself with the
((s^2 - sum((A1:A34-m)^2/n))*n/2) term negative, in which
case you wouldn't be able to achieve STDEV of 10.

I thought about that, too; so at one point, the formula was (s -
sqrt(v))^2*n/2, where "v" is sum((A1:A34-m)^2)/n. (The "/n" does need
to be inside the SUM(). It was a typo, although they are the same
mathematically -- albeit not necessarily so "numeric analytically".)

But I concluded that I can prove that s^2 is always greater than "v"
-- ergo, the sqrt argument will never be negative. The derivation of
the formula might be insightful (or insiteful <smile). We want:

s^2 = varp(A1:A36) = sum( (A1:A36-m)^2 ) / n
= sum( (A1:A34-m)^2 ) / n + ( (A35-m)^2 + (A36-m)^2 ) / n

That is by definition. Since all the terms of the sum are non-
negative, sum(...)/n, the partial variance, must be less than or equal
to s^2, the total variance. Therefore, s^2 - sum(...)/n must be non-
negative.

(For the rest of the derivation, see the Endnotes below.)

But aha! I just generated a random set of numbers where that is not
true(!). What's wrong with my proof?

Perhaps nothing. I suspect that the binary arithmetic error is
catching up to me. I did not expect that for such a small set of
numbers -- but I should have written the formula to accomodate it
anyway.

So the formula for A35 should be the array formula (commit with ctrl-
shift-Enter):

=sqrt( ( s - sqrt( sum( (A1:A34-m)^2 )/n ) )^2 *n/2 ) + m

But even with that correction, I get a 1% error difference in the
expected v. actual sd with one particular set of random numbers.

I guess that proves the point I made about "bristling" at Bernd's use
of the word "exactly" <wink.

I had previously thought along partly similar lines, producing 34 random
numbers and intending to set the 35th and 36th to achieve the mean and STDEV
required, but had discarded the idea because the spread of the first 34
might be so large as to prevent the STDEV being achievable.

That was the point I made about "extreme outliers".

I don't think my method avoids extreme outliers completely. But I do
think they are less likely. That is merely based on horrible
empirical experience with the other method (the paradigm: 34 random
numbers plus 2 dependent numbers).

-----

Endnotes

To complete the derivation ....

We select A35 and A36 in the same that we selected the other pairwise
"random" values, namely:

A36 = 2*m - A35

So ....

s^2 = varp(A1:A36) = sum( (A1:A36-m)^2 ) / n
= sum( (A1:A34-m)^2 ) / n + ( (A35-m)^2 + (A36-m)^2 ) / n

s^2 - sum((A1:A34-m)^2)/n = ( (A35-m)^2 + (2*m-A35-m)^2 )/n
= ( (A35-m)^2 + (m-A35)^2 )/n
= 2*(A35-m)^2/n

Somehow, I think that last term should have been obvious to me. But I
had not stopped to think about it.

So ....

(s^2 - sum((A1:A34-m)^2)/n) * n / 2 = (A35-m)^2

sqrt( (s^2 - sum((A1:A34-m)^2)/n) * n / 2 ) + m = A35

QED