Binomial Proportion Monte Carlo
I prepared this lesson for the son of a seventh grade friend who was working on a project for a Math/Science Fair. It involves simulating the results of flips of a fair coin, where the estimated parameter is the probability of success. As you probably know, an estimated binomial probability is a consistent estimator, and his project should demonstrate this. In statistics, “Monte Carlo” is refers to using a mathematical model of something of interest (often a sampling distribution) along with random number generators to address the effects of varying assorted parameters.
Here we have the outcomes from 1,000,000 samples. P2 is the proportion of heads in each sample of two flips of a fair coin. As you can see, there is a lot of error in the estimation of the true proportion (.5). If you had just one sample, 50% of the time you would be off as much as possible (proportion = 0 or 1).
The FREQ Procedure
P2
|
Frequency
|
Percent
|
0
|
25045
|
25.05
|
0.5
|
49701
|
49.70
|
1
|
25254
|
25.25
|
Here we have the outcomes from 1,000,000 samples. P10 is the proportion of heads in each sample of ten flips of a fair coin. As you can see, there is much less error in the estimation of the true proportion (.5). If you had just one sample, 66% of the time you would be off by not more than .1 (.4 proportion .6).
P10
|
Frequency
|
Percent
|
Cumulative
Frequency
|
Cumulative
Percent
|
0
|
102
|
0.10
|
102
|
0.10
|
0.1
|
916
|
0.92
|
1018
|
1.02
|
0.2
|
4442
|
4.44
|
5460
|
5.46
|
0.3
|
11549
|
11.55
|
17009
|
17.01
|
0.4
|
20356
|
20.36
|
37365
|
37.37
|
0.5
|
24875
|
24.88
|
62240
|
62.24
|
0.6
|
20692
|
20.69
|
82932
|
82.93
|
0.7
|
11584
|
11.58
|
94516
|
94.52
|
0.8
|
4368
|
4.37
|
98884
|
98.88
|
0.9
|
1003
|
1.00
|
99887
|
99.89
|
1
|
113
|
0.11
|
100000
|
100.00
|
Here we also have the outcomes from 1,000,000 samples. P20 is the proportion of heads in each sample of twenty flips of a fair coin. As you can see, there is even less error in the estimation of the true proportion (.5). If you had just one sample, 74% of the time you would be off by not more than .1 (.4 proportion .6).
P20
|
Frequency
|
Percent
|
Cumulative
Frequency
|
Cumulative
Percent
|
0.05
|
2
|
0.00
|
2
|
0.00
|
0.1
|
26
|
0.03
|
28
|
0.03
|
0.15
|
123
|
0.12
|
151
|
0.15
|
0.2
|
452
|
0.45
|
603
|
0.60
|
0.25
|
1422
|
1.42
|
2025
|
2.03
|
0.3
|
3707
|
3.71
|
5732
|
5.73
|
0.35
|
7300
|
7.30
|
13032
|
13.03
|
0.4
|
11829
|
11.83
|
24861
|
24.86
|
0.45
|
15990
|
15.99
|
40851
|
40.85
|
0.5
|
17873
|
17.87
|
58724
|
58.72
|
0.55
|
16259
|
16.26
|
74983
|
74.98
|
0.6
|
11944
|
11.94
|
86927
|
86.93
|
0.65
|
7283
|
7.28
|
94210
|
94.21
|
0.7
|
3693
|
3.69
|
97903
|
97.90
|
0.75
|
1519
|
1.52
|
99422
|
99.42
|
0.8
|
447
|
0.45
|
99869
|
99.87
|
0.85
|
104
|
0.10
|
99973
|
99.97
|
0.9
|
27
|
0.03
|
100000
|
100.00
|
The MEANS Procedure
Variable
|
Std Dev
|
|
0.3546092
|
0.1577809
|
0.1115296
|
|
So, how much are we likely to be offer when we have just one sample of 2, 10, or 20 flips. We could compute the mean absolute deviation. This would involve finding, for each sample, the amount by which it was off (observed proportion minus .5) and then average those deviations, ignoring the sign of the deviations. A much more common procedure is to compute the standard deviation, which the square root of the mean squared deviation. This statistic is called the “standard error.” In the table above are the observed standard deviations for each of the three sample sizes. Notice that as the number of coin flips increases, the expected amount of error decreases. This is a very important property of estimators, a property known as “consistency.”
It is known that the standard error of a proportion is equal to , where p is the probability of success (heads in this case), q is the probability of failure (number of heads), and n is the sample size (number of coin flips). For n = 2, the standard error is . For n = 10, the standard error is . For n = 20, the standard error is . Notice that these theoretical values are exceptionally close to those observed. If we had all of eternity to get an infinite number of samples (and if p really were exactly .5), the observed standard errors would perfectly match the theoretical standard error.
SAS Code for This Monte Carlo
options formdlim='-' pageno=min nodate;
DATA coinflips; drop I;
do Sample=1 to 100000;
array Ys[20] Y1-Y20;
do I = 1 to 20;
Ys[I]=UNIFORM(0);
array Head[20] Head1-Head20;
Head[I} = 0;
if Ys[I] >.5 then Head[I] = 1;
end; output; end;
*proc print;
Data Flips; set coinflips;
P2 = (Head1 + Head2)/2;
P10 = Sum(of Head1-Head10)/10;
P20 = Sum(of Head1-Head20)/20;
*proc print;
Proc Freq; Table P2 P10 P20 / plots=freqplot; run;
Proc Means std; var P2 P10 P20; run;
Dostları ilə paylaş: |