Binomial Proportion Monte Carlo

Yüklə 77,96 Kb.

tarix	05.10.2018
ölçüsü	77,96 Kb.
	#72181

Variable Std Dev
DATA
Freq

Binomial Proportion Monte Carlo

I prepared this lesson for the son of a seventh grade friend who was working on a project for a Math/Science Fair. It involves simulating the results of flips of a fair coin, where the estimated parameter is the probability of success. As you probably know, an estimated binomial probability is a consistent estimator, and his project should demonstrate this. In statistics, “Monte Carlo” is refers to using a mathematical model of something of interest (often a sampling distribution) along with random number generators to address the effects of varying assorted parameters.

Here we have the outcomes from 1,000,000 samples. P2 is the proportion of heads in each sample of two flips of a fair coin. As you can see, there is a lot of error in the estimation of the true proportion (.5). If you had just one sample, 50% of the time you would be off as much as possible (proportion = 0 or 1).

The FREQ Procedure

P2	Frequency	Percent
0	25045	25.05
0.5	49701	49.70
1	25254	25.25

Here we have the outcomes from 1,000,000 samples. P10 is the proportion of heads in each sample of ten flips of a fair coin. As you can see, there is much less error in the estimation of the true proportion (.5). If you had just one sample, 66% of the time you would be off by not more than .1 (.4  proportion  .6).

P10	Frequency	Percent	Cumulative Frequency	Cumulative Percent
0	102	0.10	102	0.10
0.1	916	0.92	1018	1.02
0.2	4442	4.44	5460	5.46
0.3	11549	11.55	17009	17.01
0.4	20356	20.36	37365	37.37
0.5	24875	24.88	62240	62.24
0.6	20692	20.69	82932	82.93
0.7	11584	11.58	94516	94.52
0.8	4368	4.37	98884	98.88
0.9	1003	1.00	99887	99.89
1	113	0.11	100000	100.00

Here we also have the outcomes from 1,000,000 samples. P20 is the proportion of heads in each sample of twenty flips of a fair coin. As you can see, there is even less error in the estimation of the true proportion (.5). If you had just one sample, 74% of the time you would be off by not more than .1 (.4  proportion  .6).

P20	Frequency	Percent	Cumulative Frequency	Cumulative Percent
0.05	2	0.00	2	0.00
0.1	26	0.03	28	0.03
0.15	123	0.12	151	0.15
0.2	452	0.45	603	0.60
0.25	1422	1.42	2025	2.03
0.3	3707	3.71	5732	5.73
0.35	7300	7.30	13032	13.03
0.4	11829	11.83	24861	24.86
0.45	15990	15.99	40851	40.85
0.5	17873	17.87	58724	58.72
0.55	16259	16.26	74983	74.98
0.6	11944	11.94	86927	86.93
0.65	7283	7.28	94210	94.21
0.7	3693	3.69	97903	97.90
0.75	1519	1.52	99422	99.42
0.8	447	0.45	99869	99.87
0.85	104	0.10	99973	99.97
0.9	27	0.03	100000	100.00

The MEANS Procedure

Variable

Std Dev

P10

P20

0.3546092

0.1577809

0.1115296

So, how much are we likely to be offer when we have just one sample of 2, 10, or 20 flips. We could compute the mean absolute deviation. This would involve finding, for each sample, the amount by which it was off (observed proportion minus .5) and then average those deviations, ignoring the sign of the deviations. A much more common procedure is to compute the standard deviation, which the square root of the mean squared deviation. This statistic is called the “standard error.” In the table above are the observed standard deviations for each of the three sample sizes. Notice that as the number of coin flips increases, the expected amount of error decreases. This is a very important property of estimators, a property known as “consistency.”

It is known that the standard error of a proportion is equal to , where p is the probability of success (heads in this case), q is the probability of failure (number of heads), and n is the sample size (number of coin flips). For n = 2, the standard error is . For n = 10, the standard error is . For n = 20, the standard error is . Notice that these theoretical values are exceptionally close to those observed. If we had all of eternity to get an infinite number of samples (and if p really were exactly .5), the observed standard errors would perfectly match the theoretical standard error.

SAS Code for This Monte Carlo

options formdlim='-' pageno=min nodate;

DATA coinflips; drop I;

do Sample=1 to 100000;

array Ys[20] Y1-Y20;

do I = 1 to 20;

Ys[I]=UNIFORM(0);

array Head[20] Head1-Head20;

Head[I} = 0;

if Ys[I] >.5 then Head[I] = 1;

end; output; end;

*proc print;

Data Flips; set coinflips;

P2 = (Head1 + Head2)/2;

P10 = Sum(of Head1-Head10)/10;

P20 = Sum(of Head1-Head20)/20;

*proc print;

Proc Freq; Table P2 P10 P20 / plots=freqplot; run;

Proc Means std; var P2 P10 P20; run;

Yüklə 77,96 Kb.

Dostları ilə paylaş: