View Single Post
  #2   Report Post  
Jerry W. Lewis
 
Posts: n/a
Default t-distribution puzzle in Excel

Excel's DIST functions are not that great in the tails. Excel's INV
functions (prior to 2003) are not that great at inverting the DIST
functions. The functions are more than adequate for simply hypothesis
testing, but you have double whammy in using this approach for
generating random numbers. Switch to inv_tdist in
http://members.aol.com/iandjmsmith/examples.xls
If you still have problems, then I will take a closer look.

For small df, there might be numerical issues due to the fact that there
is much less precision for representing small departures from 1 vs.
small departures from 0. If this is your problem, you might want to
only use p-values <0.5 with an auxiliary Bernoulli rv to decide whether
to use it for the upper or lower tail.

Jerry

wrote:

I tried a couple of standard ways to generate random numbers that
follow Student's t-distribution. Something is very wrong, and I don't
know what.

Start with a simple example. In column A, I put 1000 uniformly
distributed on (0,1) numbers, using RAND(). Then I apply to them the
function TINV. E.g., in cell B1, I have

=IF(A1<0.5,-TINV(2*A1,df),TINV(2*(1-A1),df))

where df is my degrees of freedom. I fixed df = 5.

I obtain a column of (presumably) t-distributed numbers. I then check
their excess kurtosis (as calculated by the KURT() function). As I was
repeating this exercise using new random numbers, I found the sample
kurtosis jumps around ~2.9 with a standard deviation of ~1.1. The
theoretical value for the excess kurtosis of t distribution is 6 / (df
- 4) = 6, and the theoretical stdev of the sample kurtosis itself is
sqrt(24/1000) ~ 0.15. So I am puzzled as to why the excess kurtosis is
so incredibly far off. Also puzzled by why it's so unstable.

I then replaced Excel's rand() worksheet function with the
Mersenne-Twister implementation in NtRand
http://www.numtech.com/NtRand/.

Similar result (the sample excess kurtosis jumps wildly but still very
far below the theoretical value of 6).

I then added another column of random uniform (0,1) numbers (in column
C), and tried using instead the formula:

=NORMSINV(A1)/SQRT(df/GAMMAINV(C1,df/2,2))

Similar result.

I tried usign 60,000 instead of 1000 numbers. Now the theoretical stdev
of the sample kurtosis is just 0.02.

Similar result.

I checked the variance and skewness, and those match their values ( df
/ df - 2 = 1.667 and 0 respectively) very closely.

I checked the normal distribution, and its kurtosis matches the
theoretical perfectly.

I know I'm doing something wrong, but what? =(

Thanks....