Example: Using sas data step to do Monte Carlo Integration



Yüklə 49,26 Kb.
tarix04.02.2018
ölçüsü49,26 Kb.
#24397

IES 612/STA 4-573/STA 4-576

Spring 2005


Week 14 – IES612-week14-lecture.doc
Monte Carlo Simulation Methods
1. Describe all of the outcomes of an experiment.

2. Characterize the probabilities associated with each outcome.

3. Match these probabilities up with what is produced by some random number generator.

4. Generate a large number of random experiments according to this rule.


Examples:

1. Estimating  using MC methods

2. Sex ratio in family with 4 kids

3. Impact of violating the equal variance assumption in a 2 group t-test

EXAMPLE: Using SAS data step to do Monte Carlo Integration
/*

Problem: Estimate PI using Monte Carlo Integration

Strategy: Equation of a circle with radius=1: x2 + y2 = 1

which can be written y = sqrt(1-x2)

Area of this circle = 

Area of this circle in the first quadrant = /4


Generate Ux ~ Uniform(0,1) and Uy ~ Uniform(0,1)

Check to see if Uy <= sqrt(1-Ux2)

The proportion of generated points when this

Condition is true is an estimate of /4.

*/
data MCint;
/* initialize seed */

seed1 = 12345;


do isim = 1 to 10000;
/* generate point in first quadrant */

Ux = ranuni(seed1);

Uy = ranuni(seed1);
/* check to see if point under the circle */

Under = (Uy <= sqrt(1-Ux**2));


/* output simulation result */

drop isim;

output;

end;
* ODS TRACE ON;



ODS OUTPUT OneWayFreqs=data_freq;

proc freq;

table Under;

run;

ODS OUTPUT CLOSE;

* ODS TRACE OFF;
proc print data=data_freq; run;
data summary; set data_freq;

if under = 1;

PI_est = 4*Percent/100;

prop_est = Percent/100;

SE_PI_EST = 4*sqrt(prop_est * (1 – prop_est)/10000 );

PI_CI_Low = PI_est – 2*SE_PI_EST;

PI_CI_Up = PI_est + 2*SE_PI_EST;

put ‘-----------------------------------------------‘;

put ‘| MC Integration estimate of PI |’;

put ‘-----------------------------------------------‘;

put ‘ PI (estimate) = ‘ PI_est;

put ‘SE [PI (estimate)] = ‘ SE_PI_EST;

put ‘ Approx. 95% CI = ‘ PI_CI_Low ‘, ‘ PI_CI_Up;

put ‘ ‘;

put ‘Based on 10,000 simulated points. ‘;

put ‘-----------------------------------------------‘;

run;
From the SAS LOG file


-----------------------------------------------

| MC Integration estimate of PI |

-----------------------------------------------

PI (estimate) = 3.1056


SE [PI (estimate)] = 0.0166662792

Approx. 95% CI = 3.0722674415 , 3.1389325585
Based on 10,000 simulated points.

-----------------------------------------------
ODS RTF file='D:\baileraj\Classes\Fall 2003\sta402\SAS-programs\week6-MC-fig.rtf'

proc gplot data=MCint;

plot Uy*Ux=Under;

run;

ODS RTF CLOSE;

EXAMPLE: Simulating sex ratio in family with 4 kids


data monte2;

title2 “Problem 2: #boys & girls in families of size 4”;


array x x1-x4; * array containing family of 4 kids;

do j=1 to 1000; * generate 1000 familes;

numboys = 0; * initialize counters of #boys and #girls;

numgirls=0;

do over x; * generate a family of 4 kids;

p = ranuni(0);


if p<.5 then do;

numboys = numboys + 1;

x = 0;

end;


else do;

numgirls = numgirls + 1;

x = 1;

end;
end; * end of the loop for KIDS;



outcome = 0;

if numboys=numgirls then outcome=1;

drop j p;

output;


end; * end of the loop for FAMILIES;
proc print;

proc freq;

table outcome numboys*numgirls;

run;


Obs x1 x2 x3 x4 numboys numgirls outcome
1 1 0 0 0 3 1 0

2 0 0 0 1 3 1 0

3 0 0 0 1 3 1 0

4 0 0 1 1 2 2 1

5 0 1 0 1 2 2 1

6 1 1 0 1 1 3 0

7 1 0 0 1 2 2 1

8 1 1 1 1 0 4 0

9 1 0 1 0 2 2 1

10 0 0 0 0 4 0 0

11 0 0 0 1 3 1 0

. . .


990 0 0 1 0 3 1 0

991 1 1 1 0 1 3 0

992 0 0 1 1 2 2 1

993 1 1 1 0 1 3 0

994 1 0 1 0 2 2 1

995 1 1 0 0 2 2 1

996 0 1 0 0 3 1 0

997 1 0 1 1 1 3 0

998 0 0 0 1 3 1 0

999 0 0 1 0 3 1 0

1000 1 1 0 0 2 2 1
The FREQ Procedure
Cumulative Cumulative

outcome Frequency Percent Frequency Percent

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

0 620 62.00 620 62.00

1 380 38.00 1000 100.00

Table of numboys by numgirls


numboys numgirls
Frequency‚

Percent ‚

Row Pct ‚

Col Pct ‚ 0‚ 1‚ 2‚ 3‚ 4‚ Total

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

0 ‚ 0 ‚ 0 ‚ 0 ‚ 0 ‚ 60 ‚ 60

‚ 0.00 ‚ 0.00 ‚ 0.00 ‚ 0.00 ‚ 6.00 ‚ 6.00

‚ 0.00 ‚ 0.00 ‚ 0.00 ‚ 0.00 ‚ 100.00 ‚

‚ 0.00 ‚ 0.00 ‚ 0.00 ‚ 0.00 ‚ 100.00 ‚

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

1 ‚ 0 ‚ 0 ‚ 0 ‚ 235 ‚ 0 ‚ 235

‚ 0.00 ‚ 0.00 ‚ 0.00 ‚ 23.50 ‚ 0.00 ‚ 23.50

‚ 0.00 ‚ 0.00 ‚ 0.00 ‚ 100.00 ‚ 0.00 ‚

‚ 0.00 ‚ 0.00 ‚ 0.00 ‚ 100.00 ‚ 0.00 ‚

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

2 ‚ 0 ‚ 0 ‚ 380 ‚ 0 ‚ 0 ‚ 380

‚ 0.00 ‚ 0.00 ‚ 38.00 ‚ 0.00 ‚ 0.00 ‚ 38.00

‚ 0.00 ‚ 0.00 ‚ 100.00 ‚ 0.00 ‚ 0.00 ‚

‚ 0.00 ‚ 0.00 ‚ 100.00 ‚ 0.00 ‚ 0.00 ‚

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

3 ‚ 0 ‚ 263 ‚ 0 ‚ 0 ‚ 0 ‚ 263

‚ 0.00 ‚ 26.30 ‚ 0.00 ‚ 0.00 ‚ 0.00 ‚ 26.30

‚ 0.00 ‚ 100.00 ‚ 0.00 ‚ 0.00 ‚ 0.00 ‚

‚ 0.00 ‚ 100.00 ‚ 0.00 ‚ 0.00 ‚ 0.00 ‚

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

4 ‚ 62 ‚ 0 ‚ 0 ‚ 0 ‚ 0 ‚ 62

‚ 6.20 ‚ 0.00 ‚ 0.00 ‚ 0.00 ‚ 0.00 ‚ 6.20

‚ 100.00 ‚ 0.00 ‚ 0.00 ‚ 0.00 ‚ 0.00 ‚

‚ 100.00 ‚ 0.00 ‚ 0.00 ‚ 0.00 ‚ 0.00 ‚

ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ

Total 62 263 380 235 60 1000

6.20 26.30 38.00 23.50 6.00 100.00

EXAMPLE: Is the t-test really robust to violating the equal variance assumption?
/* Problem: Explore whether t-test really is robust to

violations of the equal variance assumption


Strategy: See if the t-test operates at the nominal

Type I error rate when the unequal variance

assumption is violated
Test case: n1=n2=10

Population 1: N(0,1)

Population 2: N(0,4)

*/


data twogroup;
array x{10} x1-x10;

array y{10} y1-y10;
do isim = 1 to 10000;



/* generate samples X~N(0,1) Y~N(0,4) - normal case */

do isample = 1 to 10;

x{isample} = rannor(0);

y{isample} = 2*rannor(0);

end;
/* calculate the t-statistic */

xbar = mean(of x1-x10);

ybar = mean(of y1-y10);
xvar = var(of x1-x10);

yvar = var(of y1-y10);
s2p = (9*xvar + 9*yvar)/18;
tstat = (xbar-ybar)/sqrt(s2p*(2/10));

Pvalue = 2*(1-probt(abs(tstat),18));

Reject05 = (Pvalue <= 0.05);
keep xbar ybar xvar yvar s2p tstat Pvalue Reject05;

output;

end; * end of the simulation loop;
/*

proc print;

run;

*/
proc freq;

table Reject05;

run;
Cumulative Cumulative

Reject05 Frequency Percent Frequency Percent

ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

0 9443 94.43 9443 94.43

1 557 5.57 10000 100.00
So, a nominal error rate of 5% was specified and we rejected 5.57% of the time in the

10000 simulated samples.




IES 612 Monte Carlo Homework


1. Estimate the area under a standard normal curve [i.e. N(0,1)] between -2 and 2 using Monte Carlo Techniques. Place a bound on the error of estimation.
2. Randomly break a stick at some point along its length. Form the ratio of the length of the long end to the short end. Describe the distribution of this ratio. (Some relevance to ideas in Linkage groups in genetics.)
3. A Different Kind of Gambling (from How to Model It)

You are a good student, but your performance in examinations depends very much on your mood.


You tend to be in a good mood 4 days out of 6. When you are in a good mood, there is a 0.9 probability that you will score 100 percent in an examination and only a 0.1 probability that you will score 80 percent.
When you are in a bad mood there is a 0.2 probability that you will score 100 percent, a 0.3 probability of scoring 80 percent, and a 0.5 probability of scoring only 50 percent.
Your teacher offers you a choice. Either you write two equally weighted examinations (a midquarter and final) or else you write ten equally weighted tests (one each week).
Analyze this proposition and explain your choice.
4. Does a confidence interval for the mean of exponentially distributed quantity based upon a normal interval [± t(1-/2)*s/√ n ] cover the mean with the stated confidence coefficient 1-?
5. Regular, Aggregated or Completely Spatially Random?
Data were gathered on two plots of trees. For ease of manipulation, assume the plots were square and thus tree locations in the squares could be scaled to be between 0 and 1. Let the South-West corner of the plot correspond to the "origin" of the plot [ the point (0,0) on a coordinate system scaled over a unit square ] and let the North-East corner correspond to the point farthest away from the origin in the plot [ the point (1,1) on a coordinate system scaled over a unit square ].
One measure of spatial relationships is constructed from measuring distances to nearest neighbors. The distance between two points (x1, y1) and (x2, y2) can be found by d = . The Nearest Neighbor distance to a particular tree is the smallest of all of the distances from this tree to all other trees. One measure of tree closeness might be the average Nearest Neighbor distance. If this average is too "small", then this suggests aggregation. If this average is too "large", then this suggests a regular pattern. Use Monte Carlo simulation methods to evaluate the patterns in the two plots of trees illustrated below. Do you have evidence of aggregation or regular patterns in either of these two plots?
In plot 1, 4 trees were located at points (.25,.75), (.75, .75), (.25,.25), and (.75, .25).

In plot 2, 4 trees were located at points (.25,.75), (.30, .85), (.20, .80), and (.22, .70).




Yüklə 49,26 Kb.

Dostları ilə paylaş:




Verilənlər bazası müəlliflik hüququ ilə müdafiə olunur ©genderi.org 2024
rəhbərliyinə müraciət

    Ana səhifə