Home |
Search |
Today's Posts |
#1
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
How can I use covariance in random number generation
of a normal distribution? For example, I would use VBA to implement Box-Mueller using mean and std dev. I have implemented B-M in C. Can someone provide an algorithm (VBA, C or pseudocode) that incorporates covariance between two and among 3 or more random variables, each normally distributed? Is that called multivariate normal random number generation? FYI, This is for Monte Carlo simulation, if that matters. And I would be content with (even prefer) an algorithm that depends on a uniform RNG. Also, can I accomplish this without using VBA? For example, some people have suggested using NORMDIST(RAND(),mean,sd) as an RNG with normal distribution based on mean and std dev. I don't know how good that is and how that compares quality-wise to Box-Mueller, even assuming that RAND() is a good uniform RNG. (I believe some people question that assumption.) Frankly, I don't know how good Box-Mueller is quality-wise either, assuming a good uniform RNG to start with. I believe some people quibble over B-M as well. But I prefer to keep the discussion simple for now. I am interested in fundamental algorithms, not quibbles about perfection. |
#2
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
I think you meant NORMINV(RAND()...) not NORMDIST(RAND()...)
The former is a mathematically exact way to generate normal random variates, as is Box-Muller. Numerically, I would avoid the INV approach unless you have Excel 2003, sincer earlier inverse functions are not equal to the task. Both approaches can magnifiy inadequacies of the uniform random number generator. See http://groups.google.com/group/micro...b3b6faa69eb726 for more comments in this vein. As for a correlation structure, if x,y,z are independent normal random variables, then x+z and y+z are correlated normal random variables. You just need to generate a few extra variables to drive your correlation structure. Jerry " wrote: How can I use covariance in random number generation of a normal distribution? For example, I would use VBA to implement Box-Mueller using mean and std dev. I have implemented B-M in C. Can someone provide an algorithm (VBA, C or pseudocode) that incorporates covariance between two and among 3 or more random variables, each normally distributed? Is that called multivariate normal random number generation? FYI, This is for Monte Carlo simulation, if that matters. And I would be content with (even prefer) an algorithm that depends on a uniform RNG. Also, can I accomplish this without using VBA? For example, some people have suggested using NORMDIST(RAND(),mean,sd) as an RNG with normal distribution based on mean and std dev. I don't know how good that is and how that compares quality-wise to Box-Mueller, even assuming that RAND() is a good uniform RNG. (I believe some people question that assumption.) Frankly, I don't know how good Box-Mueller is quality-wise either, assuming a good uniform RNG to start with. I believe some people quibble over B-M as well. But I prefer to keep the discussion simple for now. I am interested in fundamental algorithms, not quibbles about perfection. |
#3
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
-
Browse to Google Groups and search for "excel correlated normal random" (without the quotes), and you'll get plenty of answers to your questions. - Mike www.mikemiddleton.com " wrote in message ... How can I use covariance in random number generation of a normal distribution? For example, I would use VBA to implement Box-Mueller using mean and std dev. I have implemented B-M in C. Can someone provide an algorithm (VBA, C or pseudocode) that incorporates covariance between two and among 3 or more random variables, each normally distributed? Is that called multivariate normal random number generation? FYI, This is for Monte Carlo simulation, if that matters. And I would be content with (even prefer) an algorithm that depends on a uniform RNG. Also, can I accomplish this without using VBA? For example, some people have suggested using NORMDIST(RAND(),mean,sd) as an RNG with normal distribution based on mean and std dev. I don't know how good that is and how that compares quality-wise to Box-Mueller, even assuming that RAND() is a good uniform RNG. (I believe some people question that assumption.) Frankly, I don't know how good Box-Mueller is quality-wise either, assuming a good uniform RNG to start with. I believe some people quibble over B-M as well. But I prefer to keep the discussion simple for now. I am interested in fundamental algorithms, not quibbles about perfection. |
#4
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
"Jerry W. Lewis" wrote:
I think you meant NORMINV(RAND()...) not NORMDIST(RAND()...) Yes. And thanks for your response. I was hoping to catch your attention :-). Also, immediately after I posted, I wondered if I am more interested in correlation (e.g. RSQ) than covariance. The non-academic material I am reading uses the terms interchangably. I'm afraid I fell into the same trap. Since they seem to be related, I hope it does not matter much; just a detail, I hope. As for a correlation structure, if x,y,z are independent normal random variables, then x+z and y+z are correlated normal random variables. You just need to generate a few extra variables to drive your correlation structure. I sense that this is very relevant and important to me. But I confess that you lost me completely. (My fault, not yours.) I have no idea what to do with the correlation "structure" (matrix?) in order to build the RNG. Can you be more specific? Perhaps a numerical example would help. The following is based on real data. I wish I had numbers that demonstrate closer correlation and negative correlation. But it is difficult to tweak the data to make that happen. I hope you can imagine that is the case. (And I hope the table is columnarized properly when I post this.) avg sd X 13.1% 18.9% Y 11.0% 11.9% Z 4.9% 2.1% rsq: X Y Z X 1 0.7221 0.0048 Y 0.7221 1 0.0030 Z 0.0048 0.0030 1 covar: X Y Z X 0.0228 0.0182 -0.0003 Y 0.0182 0.0135 -0.0001 Z -0.0003 -0.0001 0.0004 Ignoring the correlation factor, an RNG might be simply NORMINV(RAND(),avg,sd). Alternatively, the Box-Mueller algorithm is (simplified for illustration only): x1 = 2*rnd() - 1 x2 = (2*rnd() - 1) * sqrt(1 - x1^2) w = x1^2 + x2^2 return x1*sqrt(2*ln(w)/w) But if I simply do that for each of X, Y and Z and ignore correlation, I worry that the results of a simulation might not demonstrate the expected "negative" correlation (I wish) between Y and Z and the "close" correlation (I wish) between X and Y. So how do I change (or replace) the simple RNG to incorporate correlation? What would the Excel formula be? Or what would the VBA algorithm look like -- perhaps just a tweak of the B-M algorithm? |
#5
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
"Mike Middleton" wrote:
Browse to Google Groups and search for "excel correlated normal random" (without the quotes) I am glad I got your attention, too :-). Thanks. That does indeed provide more useful hits than the google search I had done earlier. It finds some useful papers, which I have yet to read -- including one that I had failed to bookmark some months ago and could not find again :-). And I am still plowing through the results to see if there are any other useful hits. In the meantime, I am particularly interested in the method you suggested in Jan05. It is simple enough even for me :-). Can you answer some questions? To generate random values from two correlated normal distributions X and Y, you suggested: z1 = norminv(rand(), 0, 1) z2 = norminv(rand(), 0, 1) x = meanX + stdevX*z1 y = meanY + stdevY*(z1*r + z2*sqrt(1-r^2)) 1. Did I translate your formulas correctly? I changed the notation slightly. 2. Does this method have a common name (a la "Box-Mueller" or "polar method")? It would be useful to know for my own documentation as well as if/when I discuss it with others. 3. I believe "rho" is sqrt(rsq(X,Y)) -- i.e. pearson(X,Y). Right? You had noted that "rho" must be between -1 and +1. 4. I believe the above works well for two random variables. What about for N random variables, N = 3? For example, for random variables X1, X2, X3 and X4, would I compute simply: z1 = norminv(rand(), 0, 1) z2 = norminv(rand(), 0, 1) z3 = norminv(rand(), 0, 1) z4 = norminv(rand(), 0, 1) x1 = meanX1 + stdevX1*z1 x2 = meanX2 + stdevX2*(z1*rX12 + z2*sqrt(1-rX12^2)) x3 = meanX3 + stdevX3*(z1*rX13 + z3*sqrt(1-rX13^2)) x4 = meanX4 + stdevX4*(z1*rX14 + z4*sqrt(1-rX14^2)) where, "rX12" is "rho" for X1 and X2; "rX13" is "rho" for X1 and X3; etc. The choice of the correlation with X1 for all cases seems somewhat arbitrary. I wonder if there are more terms or more complex terms in the multivariate case, or if the algorithm should be embellished in some other way. Thanks again for the pointer (google search). |
#6
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
The usual representation of Box-Muller is
http://www.taygeta.com/random/gaussian.html If X,Y, and Z are independent N(0,1), a*X+b*Y and c*Y+d*Z are distributed N(0,a^2+b^2) and N(0,c^2+d^2) with covariance b*c. Just choose a,b,c, and d that will give the desired covariance structure (which will be unchanged by constants to give the desired mean). More generally, if y is a column vector of n independent N(0,1) variables, and M is a matrix of constants with n columns, then MMULT(M,y) is multivariate normal with mean vector 0 and variance-covariance matrix MMULT(M,TRANSPOSE(M)). Again you can add in constants for the desired mean without violence to the covariance structure. Jerry " wrote: "Jerry W. Lewis" wrote: I think you meant NORMINV(RAND()...) not NORMDIST(RAND()...) Yes. And thanks for your response. I was hoping to catch your attention :-). Also, immediately after I posted, I wondered if I am more interested in correlation (e.g. RSQ) than covariance. The non-academic material I am reading uses the terms interchangably. I'm afraid I fell into the same trap. Since they seem to be related, I hope it does not matter much; just a detail, I hope. As for a correlation structure, if x,y,z are independent normal random variables, then x+z and y+z are correlated normal random variables. You just need to generate a few extra variables to drive your correlation structure. I sense that this is very relevant and important to me. But I confess that you lost me completely. (My fault, not yours.) I have no idea what to do with the correlation "structure" (matrix?) in order to build the RNG. Can you be more specific? Perhaps a numerical example would help. The following is based on real data. I wish I had numbers that demonstrate closer correlation and negative correlation. But it is difficult to tweak the data to make that happen. I hope you can imagine that is the case. (And I hope the table is columnarized properly when I post this.) avg sd X 13.1% 18.9% Y 11.0% 11.9% Z 4.9% 2.1% rsq: X Y Z X 1 0.7221 0.0048 Y 0.7221 1 0.0030 Z 0.0048 0.0030 1 covar: X Y Z X 0.0228 0.0182 -0.0003 Y 0.0182 0.0135 -0.0001 Z -0.0003 -0.0001 0.0004 Ignoring the correlation factor, an RNG might be simply NORMINV(RAND(),avg,sd). Alternatively, the Box-Mueller algorithm is (simplified for illustration only): x1 = 2*rnd() - 1 x2 = (2*rnd() - 1) * sqrt(1 - x1^2) w = x1^2 + x2^2 return x1*sqrt(2*ln(w)/w) But if I simply do that for each of X, Y and Z and ignore correlation, I worry that the results of a simulation might not demonstrate the expected "negative" correlation (I wish) between Y and Z and the "close" correlation (I wish) between X and Y. So how do I change (or replace) the simple RNG to incorporate correlation? What would the Excel formula be? Or what would the VBA algorithm look like -- perhaps just a tweak of the B-M algorithm? |
#7
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
Errata ("oh, for the love of P...!") ....
I wrote in one response: I wondered if I am more interested in correlation (e.g. RSQ) than covariance And in another response: I believe "rho" is sqrt(rsq(X,Y)) -- i.e. pearson(X,Y). I meant pearson() in both cases, which is neither rsq() nor sqrt(rsq()). I guess it is called r"sq" for a reason ;-). Klunk! Lo and behold, pearson() does give me the close(er) and negative correlations factors that I expected from the descriptions of the original data. pearson(X,Y) = 0.8497 pearson(X,Z) = -0.0695 pearson(Y,Z) = -0.0544 |
#8
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
-
(1) Looks OK for the two-variable situation. (2) I don't think there is a special name for these expressions, which are special cases of the general mulitvariate normal expressions. Box-Muller is a method for obtaining a single normal random number from two uniform random numbers. (3) OK. Rho is the correlation coefficient (Excel's CORREL and PEARSON worksheet functions). (4) Your four-variable expressions are not correct. The general case needs matrix notation involving all pairwise correlations. Here's some expressions (can't remember the source) for the special case of three normal random numbers: If x1, x2, and x3 are independent random variates, and c12, c13, and c23 are the desired correlations among them, then the correlated random variates v1, v2, and v3, can be calculated most easily as follows: v1 = x1 v2 = x1*c12 + x2*(1 - c12^2)^0.5 v3 = x1*c13 + x2*(c23 - c12*c13)/((1-c12^2)^0.5) + x3*[1-([c23 - c12*c13]^2)/(1-c12^2)-c13^2]^0.5 - Mike www.mikemiddleton.com " wrote in message ... "Mike Middleton" wrote: Browse to Google Groups and search for "excel correlated normal random" (without the quotes) I am glad I got your attention, too :-). Thanks. That does indeed provide more useful hits than the google search I had done earlier. It finds some useful papers, which I have yet to read -- including one that I had failed to bookmark some months ago and could not find again :-). And I am still plowing through the results to see if there are any other useful hits. In the meantime, I am particularly interested in the method you suggested in Jan05. It is simple enough even for me :-). Can you answer some questions? To generate random values from two correlated normal distributions X and Y, you suggested: z1 = norminv(rand(), 0, 1) z2 = norminv(rand(), 0, 1) x = meanX + stdevX*z1 y = meanY + stdevY*(z1*r + z2*sqrt(1-r^2)) 1. Did I translate your formulas correctly? I changed the notation slightly. 2. Does this method have a common name (a la "Box-Mueller" or "polar method")? It would be useful to know for my own documentation as well as if/when I discuss it with others. 3. I believe "rho" is sqrt(rsq(X,Y)) -- i.e. pearson(X,Y). Right? You had noted that "rho" must be between -1 and +1. 4. I believe the above works well for two random variables. What about for N random variables, N = 3? For example, for random variables X1, X2, X3 and X4, would I compute simply: z1 = norminv(rand(), 0, 1) z2 = norminv(rand(), 0, 1) z3 = norminv(rand(), 0, 1) z4 = norminv(rand(), 0, 1) x1 = meanX1 + stdevX1*z1 x2 = meanX2 + stdevX2*(z1*rX12 + z2*sqrt(1-rX12^2)) x3 = meanX3 + stdevX3*(z1*rX13 + z3*sqrt(1-rX13^2)) x4 = meanX4 + stdevX4*(z1*rX14 + z4*sqrt(1-rX14^2)) where, "rX12" is "rho" for X1 and X2; "rX13" is "rho" for X1 and X3; etc. The choice of the correlation with X1 for all cases seems somewhat arbitrary. I wonder if there are more terms or more complex terms in the multivariate case, or if the algorithm should be embellished in some other way. Thanks again for the pointer (google search). |
#9
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
"Mike Middleton" wrote:
(4) Your four-variable expressions are not correct. The general case needs matrix notation involving all pairwise correlations. As I expected. I just wanted confirmation because the OP in the Jan05 tried to draw the same simplifying (and hopeful) conclusion that I did, and no one ever corrected him. If x1, x2, and x3 are independent random variates, and c12, c13, and c23 are the desired correlations among them, then the correlated random variates v1, v2, and v3, can be calculated most easily as follows: v1 = x1 v2 = x1*c12 + x2*(1 - c12^2)^0.5 v3 = x1*c13 + x2*(c23 - c12*c13)/((1-c12^2)^0.5) + x3*[1-([c23 - c12*c13]^2)/(1-c12^2)-c13^2]^0.5 This is actually very helpful, if only for illustrative purposes. Could you provide the equations for __4__ independent random variates? It is not yet clear to me where this progression is headed. |
#10
![]()
Posted to microsoft.public.excel.misc
|
|||
|
|||
![]()
-
You wrote: Could you provide the equations for __4__ independent random variates? < No, I cannot. Perhaps you can find someone else to translate the general-case matrix operations into "equations." - Mike www.mikemiddleton.com " wrote in message ... "Mike Middleton" wrote: (4) Your four-variable expressions are not correct. The general case needs matrix notation involving all pairwise correlations. As I expected. I just wanted confirmation because the OP in the Jan05 tried to draw the same simplifying (and hopeful) conclusion that I did, and no one ever corrected him. If x1, x2, and x3 are independent random variates, and c12, c13, and c23 are the desired correlations among them, then the correlated random variates v1, v2, and v3, can be calculated most easily as follows: v1 = x1 v2 = x1*c12 + x2*(1 - c12^2)^0.5 v3 = x1*c13 + x2*(c23 - c12*c13)/((1-c12^2)^0.5) + x3*[1-([c23 - c12*c13]^2)/(1-c12^2)-c13^2]^0.5 This is actually very helpful, if only for illustrative purposes. Could you provide the equations for __4__ independent random variates? It is not yet clear to me where this progression is headed. |
Reply |
Thread Tools | Search this Thread |
Display Modes | |
|
|
![]() |
||||
Thread | Forum | |||
Seed numbers for random number generation, uniform distribution | Excel Discussion (Misc queries) | |||
Create a Graph with Random Number Generator | Charts and Charting in Excel | |||
Generating (in a random order)each number once from a given number | Excel Worksheet Functions | |||
Random Number Generator | Excel Worksheet Functions | |||
random number without repeating? | Excel Worksheet Functions |