View Single Post
  #8   Report Post  
 
Posts: n/a
Default t-distribution puzzle in Excel

Dana DeLouis wrote:
repeating this exercise using new random numbers, I found the sample
kurtosis jumps around ~2.9 with a standard deviation of ~1.1. The
theoretical value for the excess kurtosis of t distribution is 6 / (df
- 4) = 6,


Hi. I'm not a Stats guy, so this is just for gee wiz.
Just for feedback, the math program Mathematica appears to agree with Excel.
Most of the values returned by KurtosisExcess are below 6.
I "think" that Excel (2003) is doing it correctly!

These values agree with what you mentioned...

Variance[StudentTDistribution[5]]}
5/3 (or 1.6666...)

KurtosisExcess[StudentTDistribution[5]]
6

( I know this isn't Excel, but I thought it would be easier to show it this
way instead of just the answers.)

= = = = = = = = = = =
Example #1 Data size = 30,000. The value of KurtosisExcess jumps
around. Some values were below 3, and above 10. df =5.
Here's 10 samples...

t=Table[KurtosisExcess[RandomArray[StudentTDistribution[5], 30000]], {10}]

{5.111078, 4.0498174, 3.8689816, 4.5970144, 7.0692163, 3.2919761,
5.9771902, 4.1000405, 3.23158, 4.8915233}

{Mean[t], StandardDeviation[t]}

{4.6188418, 1.2021682}


= = = = = = = = = = =
Example #2 Data size = 300,000 (Larger size)

t=Table[KurtosisExcess[RandomArray[StudentTDistribution[5], 300000]],
{10}]

{4.4799506, 4.6979051, 7.693889, 4.8315766, 6.7575222, 4.685617,
4.415599, 4.9943591, 6.166258, 4.404873}

{Mean[t], StandardDeviation[t]}

{5.312755, 1.1504864}

As you can see, KurtosisExcess is usually below 6 and varies quite a bit.
Maybe someone good at Stats can jump in and confirm.
--
Dana DeLouis
Win XP & Office 2003


wrote in message
oups.com...
I tried a couple of standard ways to generate random numbers that
follow Student's t-distribution. Something is very wrong, and I don't
know what.

Start with a simple example. In column A, I put 1000 uniformly
distributed on (0,1) numbers, using RAND(). Then I apply to them the
function TINV. E.g., in cell B1, I have

=IF(A1<0.5,-TINV(2*A1,df),TINV(2*(1-A1),df))

where df is my degrees of freedom. I fixed df = 5.

I obtain a column of (presumably) t-distributed numbers. I then check
their excess kurtosis (as calculated by the KURT() function). As I was
repeating this exercise using new random numbers, I found the sample
kurtosis jumps around ~2.9 with a standard deviation of ~1.1. The
theoretical value for the excess kurtosis of t distribution is 6 / (df
- 4) = 6, and the theoretical stdev of the sample kurtosis itself is
sqrt(24/1000) ~ 0.15. So I am puzzled as to why the excess kurtosis is
so incredibly far off. Also puzzled by why it's so unstable.

I then replaced Excel's rand() worksheet function with the
Mersenne-Twister implementation in NtRand
http://www.numtech.com/NtRand/.

Similar result (the sample excess kurtosis jumps wildly but still very
far below the theoretical value of 6).

I then added another column of random uniform (0,1) numbers (in column
C), and tried using instead the formula:

=NORMSINV(A1)/SQRT(df/GAMMAINV(C1,df/2,2))

Similar result.

I tried usign 60,000 instead of 1000 numbers. Now the theoretical stdev
of the sample kurtosis is just 0.02.

Similar result.

I checked the variance and skewness, and those match their values ( df
/ df - 2 = 1.667 and 0 respectively) very closely.

I checked the normal distribution, and its kurtosis matches the
theoretical perfectly.

I know I'm doing something wrong, but what? =(

Thanks....


As the OP said the discussion on the statistical properties of the
sample kurtosis goes on at sci.stat.math and no new fault in Excel has
been discovered.

What has been discovered is how surprisingly slow the Excel inverse
functions are. I knew they were slow in extreme situations but I had
not appreciated how slow they were in general.

Out of curiousity, I then looked at
http://support.microsoft.com/default...21120121120120
to see what Microsoft suggest on generating random variables. As you
can see random number generation via the Analysis Toolpak is no longer
recommended. Instead they suggest =NORMINV(RAND(), mu, sigma) for
generating values from the normal distribution with mean mu and
variance sigma^2.

Curious about the performance I copied the formula =normsinv(rand())
into the range a1:j20000 to see how long it would take. On my machine
it took 154 seconds which is fairly terrible - spreadsheet overheads
notwithstanding.

I then tried inv_normal(rand()) in the same range and it took 4
seconds.

Lastly I tried the Box-Muller approach using
=SQRT(-2*LN(RAND()))*COS(2*PI()*RAND()) in the same range and this took
about 1 second.

Maybe the random number generation section of the page referred to
above could do with updating.

Ian Smith