Binomial Proportion Monte Carlo
I prepared this lesson for the son of a seventh grade friend who was working on a project for a Math/Science Fair. It involves simulating the results of flips of a fair coin, where the estimated parameter is the probability of success. As you probably know, an estimated binomial probability is a consistent estimator, and his project should demonstrate this. In statistics, “Monte Carlo” is refers to using a mathematical model of something of interest (often a sampling distribution) along with random number generators to address the effects of varying assorted parameters.
Here we have the outcomes from 1,000,000 samples. P2 is the proportion of heads in each sample of two flips of a fair coin. As you can see, there is a lot of error in the estimation of the true proportion (.5). If you had just one sample, 50% of the time you would be off as much as possible (proportion = 0 or 1).
The FREQ Procedure
P2

Frequency

Percent

0

25045

25.05

0.5

49701

49.70

1

25254

25.25

Here we have the outcomes from 1,000,000 samples. P10 is the proportion of heads in each sample of ten flips of a fair coin. As you can see, there is much less error in the estimation of the true proportion (.5). If you had just one sample, 66% of the time you would be off by not more than .1 (.4 proportion .6).
P10

Frequency

Percent

Cumulative
Frequency

Cumulative
Percent

0

102

0.10

102

0.10

0.1

916

0.92

1018

1.02

0.2

4442

4.44

5460

5.46

0.3

11549

11.55

17009

17.01

0.4

20356

20.36

37365

37.37

0.5

24875

24.88

62240

62.24

0.6

20692

20.69

82932

82.93

0.7

11584

11.58

94516

94.52

0.8

4368

4.37

98884

98.88

0.9

1003

1.00

99887

99.89

1

113

0.11

100000

100.00

Here we also have the outcomes from 1,000,000 samples. P20 is the proportion of heads in each sample of twenty flips of a fair coin. As you can see, there is even less error in the estimation of the true proportion (.5). If you had just one sample, 74% of the time you would be off by not more than .1 (.4 proportion .6).
P20

Frequency

Percent

Cumulative
Frequency

Cumulative
Percent

0.05

2

0.00

2

0.00

0.1

26

0.03

28

0.03

0.15

123

0.12

151

0.15

0.2

452

0.45

603

0.60

0.25

1422

1.42

2025

2.03

0.3

3707

3.71

5732

5.73

0.35

7300

7.30

13032

13.03

0.4

11829

11.83

24861

24.86

0.45

15990

15.99

40851

40.85

0.5

17873

17.87

58724

58.72

0.55

16259

16.26

74983

74.98

0.6

11944

11.94

86927

86.93

0.65

7283

7.28

94210

94.21

0.7

3693

3.69

97903

97.90

0.75

1519

1.52

99422

99.42

0.8

447

0.45

99869

99.87

0.85

104

0.10

99973

99.97

0.9

27

0.03

100000

100.00

The MEANS Procedure
Variable

Std Dev


0.3546092

0.1577809

0.1115296


So, how much are we likely to be offer when we have just one sample of 2, 10, or 20 flips. We could compute the mean absolute deviation. This would involve finding, for each sample, the amount by which it was off (observed proportion minus .5) and then average those deviations, ignoring the sign of the deviations. A much more common procedure is to compute the standard deviation, which the square root of the mean squared deviation. This statistic is called the “standard error.” In the table above are the observed standard deviations for each of the three sample sizes. Notice that as the number of coin flips increases, the expected amount of error decreases. This is a very important property of estimators, a property known as “consistency.”
It is known that the standard error of a proportion is equal to , where p is the probability of success (heads in this case), q is the probability of failure (number of heads), and n is the sample size (number of coin flips). For n = 2, the standard error is . For n = 10, the standard error is . For n = 20, the standard error is . Notice that these theoretical values are exceptionally close to those observed. If we had all of eternity to get an infinite number of samples (and if p really were exactly .5), the observed standard errors would perfectly match the theoretical standard error.
SAS Code for This Monte Carlo
options formdlim='' pageno=min nodate;
DATA coinflips; drop I;
do Sample=1 to 100000;
array Ys[20] Y1Y20;
do I = 1 to 20;
Ys[I]=UNIFORM(0);
array Head[20] Head1Head20;
Head[I} = 0;
if Ys[I] >.5 then Head[I] = 1;
end; output; end;
*proc print;
Data Flips; set coinflips;
P2 = (Head1 + Head2)/2;
P10 = Sum(of Head1Head10)/10;
P20 = Sum(of Head1Head20)/20;
*proc print;
Proc Freq; Table P2 P10 P20 / plots=freqplot; run;
Proc Means std; var P2 P10 P20; run;
Dostları ilə paylaş: 