Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1   Report Post  
 
Posts: n/a
Default t-distribution puzzle in Excel

I tried a couple of standard ways to generate random numbers that
follow Student's t-distribution. Something is very wrong, and I don't
know what.

Start with a simple example. In column A, I put 1000 uniformly
distributed on (0,1) numbers, using RAND(). Then I apply to them the
function TINV. E.g., in cell B1, I have

=IF(A1<0.5,-TINV(2*A1,df),TINV(2*(1-A1),df))

where df is my degrees of freedom. I fixed df = 5.

I obtain a column of (presumably) t-distributed numbers. I then check
their excess kurtosis (as calculated by the KURT() function). As I was
repeating this exercise using new random numbers, I found the sample
kurtosis jumps around ~2.9 with a standard deviation of ~1.1. The
theoretical value for the excess kurtosis of t distribution is 6 / (df
- 4) = 6, and the theoretical stdev of the sample kurtosis itself is
sqrt(24/1000) ~ 0.15. So I am puzzled as to why the excess kurtosis is
so incredibly far off. Also puzzled by why it's so unstable.

I then replaced Excel's rand() worksheet function with the
Mersenne-Twister implementation in NtRand
http://www.numtech.com/NtRand/.

Similar result (the sample excess kurtosis jumps wildly but still very
far below the theoretical value of 6).

I then added another column of random uniform (0,1) numbers (in column
C), and tried using instead the formula:

=NORMSINV(A1)/SQRT(df/GAMMAINV(C1,df/2,2))

Similar result.

I tried usign 60,000 instead of 1000 numbers. Now the theoretical stdev
of the sample kurtosis is just 0.02.

Similar result.

I checked the variance and skewness, and those match their values ( df
/ df - 2 = 1.667 and 0 respectively) very closely.

I checked the normal distribution, and its kurtosis matches the
theoretical perfectly.

I know I'm doing something wrong, but what? =(

Thanks....

  #2   Report Post  
Jerry W. Lewis
 
Posts: n/a
Default t-distribution puzzle in Excel

Excel's DIST functions are not that great in the tails. Excel's INV
functions (prior to 2003) are not that great at inverting the DIST
functions. The functions are more than adequate for simply hypothesis
testing, but you have double whammy in using this approach for
generating random numbers. Switch to inv_tdist in
http://members.aol.com/iandjmsmith/examples.xls
If you still have problems, then I will take a closer look.

For small df, there might be numerical issues due to the fact that there
is much less precision for representing small departures from 1 vs.
small departures from 0. If this is your problem, you might want to
only use p-values <0.5 with an auxiliary Bernoulli rv to decide whether
to use it for the upper or lower tail.

Jerry

wrote:

I tried a couple of standard ways to generate random numbers that
follow Student's t-distribution. Something is very wrong, and I don't
know what.

Start with a simple example. In column A, I put 1000 uniformly
distributed on (0,1) numbers, using RAND(). Then I apply to them the
function TINV. E.g., in cell B1, I have

=IF(A1<0.5,-TINV(2*A1,df),TINV(2*(1-A1),df))

where df is my degrees of freedom. I fixed df = 5.

I obtain a column of (presumably) t-distributed numbers. I then check
their excess kurtosis (as calculated by the KURT() function). As I was
repeating this exercise using new random numbers, I found the sample
kurtosis jumps around ~2.9 with a standard deviation of ~1.1. The
theoretical value for the excess kurtosis of t distribution is 6 / (df
- 4) = 6, and the theoretical stdev of the sample kurtosis itself is
sqrt(24/1000) ~ 0.15. So I am puzzled as to why the excess kurtosis is
so incredibly far off. Also puzzled by why it's so unstable.

I then replaced Excel's rand() worksheet function with the
Mersenne-Twister implementation in NtRand
http://www.numtech.com/NtRand/.

Similar result (the sample excess kurtosis jumps wildly but still very
far below the theoretical value of 6).

I then added another column of random uniform (0,1) numbers (in column
C), and tried using instead the formula:

=NORMSINV(A1)/SQRT(df/GAMMAINV(C1,df/2,2))

Similar result.

I tried usign 60,000 instead of 1000 numbers. Now the theoretical stdev
of the sample kurtosis is just 0.02.

Similar result.

I checked the variance and skewness, and those match their values ( df
/ df - 2 = 1.667 and 0 respectively) very closely.

I checked the normal distribution, and its kurtosis matches the
theoretical perfectly.

I know I'm doing something wrong, but what? =(

Thanks....


  #3   Report Post  
 
Posts: n/a
Default t-distribution puzzle in Excel

Thanks Jerry! Btw, your function is a LOT faster than Excel's tinv!

But the results are still the same.

I'm beginning to wonder.. maybe the distribution of the sample kurtosis
is so *terribly* skewed, even for sample of size 60000, that it looks
to me like I'm getting a biased result? And maybe the variance is so
terribly huge too (I only know the theoretical variance of the sample
kurtosis for normal, not t distribution)..

  #4   Report Post  
Jerry W. Lewis
 
Posts: n/a
Default t-distribution puzzle in Excel

Please clarify what you did. Are you talking about
1. Using TINV(2*p,df) but only for p<0.5 with an auxiliary Bernoulli
variable to decide whether it is the upper or lower tail (multiply
t-value by +/-1).
2. Using Ian Smith's inv_tdist(p,df).
3. Using Ian Smith's inv_tdist(p,df) but only for p<0.5 with an
auxiliary Bernoulli variable to decide whether it is the upper or lower
tail (multiply t-value by +/-1).
4. Something else (please describe)

I would be very surprised if you are finding a VBA function to be faster
than a native Excel function.

In Excel 2002 and earlier versions,
=TINV(p,df)
returns 5E6 for p<4.3E-7, regardless of df (too extreme even for df=1).
Whenever you hit such a value, that will necessarily destabilize your
descriptive statistics.

Excel 2003 truncates the distribution at 1E7, but the p-value for which
TINV returns 1E7 varies with df, and seems to provide around 10 digit
accuracy in this tail, so TINV in 2003 might be acceptable to generate
random variates, but I still would avoid GAMMAINV for this purpose.

By contrast, Ian Smith's VBA functions are as close to machine accuracy
as I have seen in any double precision implementation (better than
commercial statistics packages, better than commercial math libraries, etc.)

Jerry


wrote:

Thanks Jerry! Btw, your function is a LOT faster than Excel's tinv!

But the results are still the same.

I'm beginning to wonder.. maybe the distribution of the sample kurtosis
is so *terribly* skewed, even for sample of size 60000, that it looks
to me like I'm getting a biased result? And maybe the variance is so
terribly huge too (I only know the theoretical variance of the sample
kurtosis for normal, not t distribution)..


  #5   Report Post  
mm
 
Posts: n/a
Default t-distribution puzzle in Excel

Jerry,

I'm ignoring all p 0.5. Then I'm using

inv_tdist(p,df) if q < 0.5
inv_tidst(1-p,df) if q 0.5,

where q = rand() is an uniform r.v. from Excel (independent from the
previous one).

I don't why it's faster, but it is! At least 5 times faster, if not 10
times. I'm using Excel 2003.

But I think I found that my problem is in the statistical domain, not
in the computation. See this thread:
http://groups.google.com/group/sci.s...83e1dd956d59e/
or sci.stat.math thread with subject "t distribution -- sample
kurtosis" if this link doesn't work. (Anyone knows how to link
correctly to threads?..)

Btw, is inv_tdist in public domain?

Thank you!



  #6   Report Post  
Jerry W. Lewis
 
Posts: n/a
Default t-distribution puzzle in Excel



mm wrote:

....

Btw, is inv_tdist in public domain?

....


The first line of the VBA code is a copyright notice, so by definition,
it is not public domain.

However, if Ian Smith had not intended for individuals to use his code,
then I doubt that he would have published the source code in a public
place. If you are wanting to use it commercially, I would recommend
e-mailing him. His address ) is clearly displayed
on his web page
http://members.aol.com/iandjmsmith/
and I have found him to respond readily to communications regarding his
numerical analysis efforts. Ian?

Jerry

  #7   Report Post  
Dana DeLouis
 
Posts: n/a
Default t-distribution puzzle in Excel

repeating this exercise using new random numbers, I found the sample
kurtosis jumps around ~2.9 with a standard deviation of ~1.1. The
theoretical value for the excess kurtosis of t distribution is 6 / (df
- 4) = 6,


Hi. I'm not a Stats guy, so this is just for gee wiz.
Just for feedback, the math program Mathematica appears to agree with Excel.
Most of the values returned by KurtosisExcess are below 6.
I "think" that Excel (2003) is doing it correctly!

These values agree with what you mentioned...

Variance[StudentTDistribution[5]]}
5/3 (or 1.6666...)

KurtosisExcess[StudentTDistribution[5]]
6

( I know this isn't Excel, but I thought it would be easier to show it this
way instead of just the answers.)

= = = = = = = = = = =
Example #1 Data size = 30,000. The value of KurtosisExcess jumps
around. Some values were below 3, and above 10. df =5.
Here's 10 samples...

t=Table[KurtosisExcess[RandomArray[StudentTDistribution[5], 30000]], {10}]

{5.111078, 4.0498174, 3.8689816, 4.5970144, 7.0692163, 3.2919761,
5.9771902, 4.1000405, 3.23158, 4.8915233}

{Mean[t], StandardDeviation[t]}

{4.6188418, 1.2021682}


= = = = = = = = = = =
Example #2 Data size = 300,000 (Larger size)

t=Table[KurtosisExcess[RandomArray[StudentTDistribution[5], 300000]],
{10}]

{4.4799506, 4.6979051, 7.693889, 4.8315766, 6.7575222, 4.685617,
4.415599, 4.9943591, 6.166258, 4.404873}

{Mean[t], StandardDeviation[t]}

{5.312755, 1.1504864}

As you can see, KurtosisExcess is usually below 6 and varies quite a bit.
Maybe someone good at Stats can jump in and confirm.
--
Dana DeLouis
Win XP & Office 2003


wrote in message
oups.com...
I tried a couple of standard ways to generate random numbers that
follow Student's t-distribution. Something is very wrong, and I don't
know what.

Start with a simple example. In column A, I put 1000 uniformly
distributed on (0,1) numbers, using RAND(). Then I apply to them the
function TINV. E.g., in cell B1, I have

=IF(A1<0.5,-TINV(2*A1,df),TINV(2*(1-A1),df))

where df is my degrees of freedom. I fixed df = 5.

I obtain a column of (presumably) t-distributed numbers. I then check
their excess kurtosis (as calculated by the KURT() function). As I was
repeating this exercise using new random numbers, I found the sample
kurtosis jumps around ~2.9 with a standard deviation of ~1.1. The
theoretical value for the excess kurtosis of t distribution is 6 / (df
- 4) = 6, and the theoretical stdev of the sample kurtosis itself is
sqrt(24/1000) ~ 0.15. So I am puzzled as to why the excess kurtosis is
so incredibly far off. Also puzzled by why it's so unstable.

I then replaced Excel's rand() worksheet function with the
Mersenne-Twister implementation in NtRand
http://www.numtech.com/NtRand/.

Similar result (the sample excess kurtosis jumps wildly but still very
far below the theoretical value of 6).

I then added another column of random uniform (0,1) numbers (in column
C), and tried using instead the formula:

=NORMSINV(A1)/SQRT(df/GAMMAINV(C1,df/2,2))

Similar result.

I tried usign 60,000 instead of 1000 numbers. Now the theoretical stdev
of the sample kurtosis is just 0.02.

Similar result.

I checked the variance and skewness, and those match their values ( df
/ df - 2 = 1.667 and 0 respectively) very closely.

I checked the normal distribution, and its kurtosis matches the
theoretical perfectly.

I know I'm doing something wrong, but what? =(

Thanks....



  #8   Report Post  
 
Posts: n/a
Default t-distribution puzzle in Excel

Dana DeLouis wrote:
repeating this exercise using new random numbers, I found the sample
kurtosis jumps around ~2.9 with a standard deviation of ~1.1. The
theoretical value for the excess kurtosis of t distribution is 6 / (df
- 4) = 6,


Hi. I'm not a Stats guy, so this is just for gee wiz.
Just for feedback, the math program Mathematica appears to agree with Excel.
Most of the values returned by KurtosisExcess are below 6.
I "think" that Excel (2003) is doing it correctly!

These values agree with what you mentioned...

Variance[StudentTDistribution[5]]}
5/3 (or 1.6666...)

KurtosisExcess[StudentTDistribution[5]]
6

( I know this isn't Excel, but I thought it would be easier to show it this
way instead of just the answers.)

= = = = = = = = = = =
Example #1 Data size = 30,000. The value of KurtosisExcess jumps
around. Some values were below 3, and above 10. df =5.
Here's 10 samples...

t=Table[KurtosisExcess[RandomArray[StudentTDistribution[5], 30000]], {10}]

{5.111078, 4.0498174, 3.8689816, 4.5970144, 7.0692163, 3.2919761,
5.9771902, 4.1000405, 3.23158, 4.8915233}

{Mean[t], StandardDeviation[t]}

{4.6188418, 1.2021682}


= = = = = = = = = = =
Example #2 Data size = 300,000 (Larger size)

t=Table[KurtosisExcess[RandomArray[StudentTDistribution[5], 300000]],
{10}]

{4.4799506, 4.6979051, 7.693889, 4.8315766, 6.7575222, 4.685617,
4.415599, 4.9943591, 6.166258, 4.404873}

{Mean[t], StandardDeviation[t]}

{5.312755, 1.1504864}

As you can see, KurtosisExcess is usually below 6 and varies quite a bit.
Maybe someone good at Stats can jump in and confirm.
--
Dana DeLouis
Win XP & Office 2003


wrote in message
oups.com...
I tried a couple of standard ways to generate random numbers that
follow Student's t-distribution. Something is very wrong, and I don't
know what.

Start with a simple example. In column A, I put 1000 uniformly
distributed on (0,1) numbers, using RAND(). Then I apply to them the
function TINV. E.g., in cell B1, I have

=IF(A1<0.5,-TINV(2*A1,df),TINV(2*(1-A1),df))

where df is my degrees of freedom. I fixed df = 5.

I obtain a column of (presumably) t-distributed numbers. I then check
their excess kurtosis (as calculated by the KURT() function). As I was
repeating this exercise using new random numbers, I found the sample
kurtosis jumps around ~2.9 with a standard deviation of ~1.1. The
theoretical value for the excess kurtosis of t distribution is 6 / (df
- 4) = 6, and the theoretical stdev of the sample kurtosis itself is
sqrt(24/1000) ~ 0.15. So I am puzzled as to why the excess kurtosis is
so incredibly far off. Also puzzled by why it's so unstable.

I then replaced Excel's rand() worksheet function with the
Mersenne-Twister implementation in NtRand
http://www.numtech.com/NtRand/.

Similar result (the sample excess kurtosis jumps wildly but still very
far below the theoretical value of 6).

I then added another column of random uniform (0,1) numbers (in column
C), and tried using instead the formula:

=NORMSINV(A1)/SQRT(df/GAMMAINV(C1,df/2,2))

Similar result.

I tried usign 60,000 instead of 1000 numbers. Now the theoretical stdev
of the sample kurtosis is just 0.02.

Similar result.

I checked the variance and skewness, and those match their values ( df
/ df - 2 = 1.667 and 0 respectively) very closely.

I checked the normal distribution, and its kurtosis matches the
theoretical perfectly.

I know I'm doing something wrong, but what? =(

Thanks....


As the OP said the discussion on the statistical properties of the
sample kurtosis goes on at sci.stat.math and no new fault in Excel has
been discovered.

What has been discovered is how surprisingly slow the Excel inverse
functions are. I knew they were slow in extreme situations but I had
not appreciated how slow they were in general.

Out of curiousity, I then looked at
http://support.microsoft.com/default...21120121120120
to see what Microsoft suggest on generating random variables. As you
can see random number generation via the Analysis Toolpak is no longer
recommended. Instead they suggest =NORMINV(RAND(), mu, sigma) for
generating values from the normal distribution with mean mu and
variance sigma^2.

Curious about the performance I copied the formula =normsinv(rand())
into the range a1:j20000 to see how long it would take. On my machine
it took 154 seconds which is fairly terrible - spreadsheet overheads
notwithstanding.

I then tried inv_normal(rand()) in the same range and it took 4
seconds.

Lastly I tried the Box-Muller approach using
=SQRT(-2*LN(RAND()))*COS(2*PI()*RAND()) in the same range and this took
about 1 second.

Maybe the random number generation section of the page referred to
above could do with updating.

Ian Smith

Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
Excel 2003 FAILS, but Excel 2000 SUCCEEDS ??? Richard Excel Discussion (Misc queries) 2 May 13th 23 11:46 AM
Opening two separate instances of Excel Ron Bishop Excel Discussion (Misc queries) 2 August 4th 05 05:30 PM
Stop Excel Rounding Dates leinad512 Excel Discussion (Misc queries) 1 April 20th 05 04:19 PM
Excel error - Startup (and Acrobat PDFMaker) gxdata Setting up and Configuration of Excel 0 February 4th 05 03:44 AM
how do i draw a distribution chart in excel travish19 Charts and Charting in Excel 3 January 15th 05 01:18 AM


All times are GMT +1. The time now is 05:26 PM.

Powered by vBulletin® Copyright ©2000 - 2025, Jelsoft Enterprises Ltd.
Copyright ©2004-2025 ExcelBanter.
The comments are property of their posters.
 

About Us

"It's about Microsoft Excel"