Home |
Search |
Today's Posts |
|
#1
![]()
Posted to microsoft.public.excel.worksheet.functions
|
|||
|
|||
![]()
On Sep 12, 4:57 pm, "David Biddulph" <groups [at] biddulph.org.uk
wrote: at your A35 stage you may find yourself with the ((s^2 - sum((A1:A34-m)^2/n))*n/2) term negative, in which case you wouldn't be able to achieve STDEV of 10. I thought about that, too; so at one point, the formula was (s - sqrt(v))^2*n/2, where "v" is sum((A1:A34-m)^2)/n. (The "/n" does need to be inside the SUM(). It was a typo, although they are the same mathematically -- albeit not necessarily so "numeric analytically".) But I concluded that I can prove that s^2 is always greater than "v" -- ergo, the sqrt argument will never be negative. The derivation of the formula might be insightful (or insiteful <smile). We want: s^2 = varp(A1:A36) = sum( (A1:A36-m)^2 ) / n = sum( (A1:A34-m)^2 ) / n + ( (A35-m)^2 + (A36-m)^2 ) / n That is by definition. Since all the terms of the sum are non- negative, sum(...)/n, the partial variance, must be less than or equal to s^2, the total variance. Therefore, s^2 - sum(...)/n must be non- negative. (For the rest of the derivation, see the Endnotes below.) But aha! I just generated a random set of numbers where that is not true(!). What's wrong with my proof? Perhaps nothing. I suspect that the binary arithmetic error is catching up to me. I did not expect that for such a small set of numbers -- but I should have written the formula to accomodate it anyway. So the formula for A35 should be the array formula (commit with ctrl- shift-Enter): =sqrt( ( s - sqrt( sum( (A1:A34-m)^2 )/n ) )^2 *n/2 ) + m But even with that correction, I get a 1% error difference in the expected v. actual sd with one particular set of random numbers. I guess that proves the point I made about "bristling" at Bernd's use of the word "exactly" <wink. I had previously thought along partly similar lines, producing 34 random numbers and intending to set the 35th and 36th to achieve the mean and STDEV required, but had discarded the idea because the spread of the first 34 might be so large as to prevent the STDEV being achievable. That was the point I made about "extreme outliers". I don't think my method avoids extreme outliers completely. But I do think they are less likely. That is merely based on horrible empirical experience with the other method (the paradigm: 34 random numbers plus 2 dependent numbers). ----- Endnotes To complete the derivation .... We select A35 and A36 in the same that we selected the other pairwise "random" values, namely: A36 = 2*m - A35 So .... s^2 = varp(A1:A36) = sum( (A1:A36-m)^2 ) / n = sum( (A1:A34-m)^2 ) / n + ( (A35-m)^2 + (A36-m)^2 ) / n s^2 - sum((A1:A34-m)^2)/n = ( (A35-m)^2 + (2*m-A35-m)^2 )/n = ( (A35-m)^2 + (m-A35)^2 )/n = 2*(A35-m)^2/n Somehow, I think that last term should have been obvious to me. But I had not stopped to think about it. So .... (s^2 - sum((A1:A34-m)^2)/n) * n / 2 = (A35-m)^2 sqrt( (s^2 - sum((A1:A34-m)^2)/n) * n / 2 ) + m = A35 QED |
#2
![]()
Posted to microsoft.public.excel.worksheet.functions
|
|||
|
|||
![]()
On Sep 12, 9:06 pm, I wrote:
So the formula for A35 should be the array formula (commit with ctrl-shift-Enter): =sqrt( ( s - sqrt( sum( (A1:A34-m)^2 )/n ) )^2 *n/2 ) + m Hmm, I don't think so. Obviously s^2 - v does not equal (s - sqrt(v))^2. Oh well, back to the drawing board. As I noted previously, I believe the (original) formula is mathematically correct. But I need to deal with the numerical error caused by binary computer arithmetic. Sorry, don't have the time for this now. Perhaps someone else can offer a solution. |
#3
![]()
Posted to microsoft.public.excel.worksheet.functions
|
|||
|
|||
![]()
On Sep 12, 9:06 pm, joeu2004 wrote:
On Sep 12, 4:57 pm, "David Biddulph" <groups [at] biddulph.org.uk wrote: at your A35 stage you may find yourself with the ((s^2 - sum((A1:A34-m)^2/n))*n/2) term negative, in which case you wouldn't be able to achieve STDEV of 10. Of course you are right. I thought about that, too [...]. But I concluded that I can prove that s^2 is always greater than "v" [....] Since all the terms of the sum are non-negative, sum(...)/n, the partial variance, must be less than or equal to s^2, the total variance. Therefore, s^2 - sum(...)/n must be non-negative. Of course that is wrong. I confused myself between what would be true __if__ sd were 10 and what __is__ true given the sd of the 34 generated data points. Klunk! I will write 10,000 times "I will try not to solve math problems while I am rushing to get ready for a trip". Double klunk! I had previously thought along partly similar lines, producing 34 random numbers and intending to set the 35th and 36th to achieve the mean and STDEV required, but had discarded the idea because the spread of the first 34 might be so large as to prevent the STDEV being achievable. Now I wonder if even that is feasible. After all, isn't that essentially what I tried to do? It is easy to meet the mean criteria, at least by my method. But how else would you meet the sd criteria, other than with my method, which does not work? It seems that the solution must ensure that the variance of the 34 data values (actually 34/36 of the variance) does not exceed the desired variance of the 36 data values. I 'spose we could iterate in a macro until that condition is met. But I generally avoid iterative implementations that are not guaranteed to terminate in a finite (and reasonable) time. Of course, we could put a limit on the number of iterations, returning #NUM if we fail. Hmm, that sounds familiar <wink. Oh well.... |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
![]() |
||||
Thread | Forum | |||
standard deviation | Excel Discussion (Misc queries) | |||
standard deviation | Excel Worksheet Functions | |||
Standard Deviation | Excel Worksheet Functions | |||
Standard Deviation | Excel Worksheet Functions | |||
standard deviation | Excel Discussion (Misc queries) |