Econometrics Lecture Notes: Monte Carlo simulation 1

Date: 04.02.2018

Econometrics Lecture Notes: Monte Carlo simulation
In econometrics we frequently wish to explore the properties of an estimator. That is, we are not so much interested in economic theory and in estimating a relationship between Y and a vector of explanatory variables X. Instead we want to know what happens to our estimator as the sample size increases, as the variance of the error term increases, or as the degree of serial correlation increases. To do this we make up the data set and make up an (economic?) relationship, which we then proceed to estimate. An example will clarify, within the context of GLS and serial correlation:
Step 1: Specify the model:
Y_t = 101 + 2.3X_1t - 0.2X_2t + u_t (1)
Step 2: Specify the sample size n (here n = 100).
Step 3: Generate data for X_1t. To do this we use a random number generator. That is, we program the computer (within, e.g., RATS) to generate 100 numbers, which in turn will be X₁₁, X₁₂, ..., X_1,100. We need to specify the mean (μ₁) and the variance (σ²₁) of these numbers: X₁ ~ N(μ₁, σ²₁). We now use computers to do this; early on it is possible a roulette wheel may have been used, hence the name. Now generate data for X_2t (t = 1, ..., 100) using the same approach. You should probably specify a different mean and variance. We could also make the variable trended by, e.g., generating X_2t and then replacing it with X_2t + 0.2*t, where t goes 1, 2, 3, ..., 100.
Step 4: Generate data for u_t. This takes several stages. We assume that the error term displays first order positive serial correlation:
u_t = 0.7u_t-1 + ε_t (2)
where ε_t is a pure white noise error term. We need to specify its variance (its mean will be zero), and again we use a random number generator, with ε_t ~ N(0, σ²_ε). Here N denotes the normal distribution, i.e. our error terms will be randomly drawn from a normal distribution with mean zero and variance σ²_ε. We tend to use the normal distribution, but you could use others. We now assume that the error term in period 1, u₁, was 0 and can then generate the error terms for periods 2 to 100 by inserting the lagged values u_t-1 and the values for ε_t into equation (2). Excel would do this quite easily.
Step 5: We now know everything on the right hand side of equation (1). We can use this to generate values for Y_t, for all t.
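Steps 2 to 5 can be sketched in a few lines of Python with NumPy. This is only an illustration, not the RATS or Excel procedure the notes have in mind: the means and standard deviations chosen for X_1t and X_2t, the value σ_ε = 1, the seed and the function name `generate_sample` are all assumptions made for the example.

```python
import numpy as np

def generate_sample(n=100, rho=0.7, sigma_eps=1.0, seed=0):
    """Generate one artificial data set for equation (1) with AR(1) errors."""
    rng = np.random.default_rng(seed)

    # Step 3: draw the regressors (means and standard deviations are assumed values).
    x1 = rng.normal(loc=10.0, scale=2.0, size=n)
    x2 = rng.normal(loc=5.0, scale=3.0, size=n)

    # Step 4: white noise eps_t, then u_t = rho * u_{t-1} + eps_t, starting from u_1 = 0.
    eps = rng.normal(loc=0.0, scale=sigma_eps, size=n)
    u = np.zeros(n)
    for t in range(1, n):
        u[t] = rho * u[t - 1] + eps[t]

    # Step 5: everything on the right-hand side of (1) is now known.
    y = 101.0 + 2.3 * x1 - 0.2 * x2 + u
    return y, x1, x2, u

y, x1, x2, u = generate_sample()
```

Fixing the seed makes the artificial sample reproducible; a different seed gives a different draw from the same data generating process.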
Step 6: Estimate equation (1) using OLS, that is, regress Y_t on a constant, X_1t and X_2t. Call the resulting estimates β_0OLS, β_1OLS, β_2OLS for the three coefficients respectively. (Note you will also get estimates for the variance of the error term and the first order correlation coefficient in (2) [in this case 0.7].) Now estimate equation (1) using GLS, where the coefficients will be termed β_0GLS, β_1GLS, β_2GLS, respectively. We could compare these coefficients with the known true values to see which is the closest. But with just one comparison to make, luck will play a part. Hence:
Step 7: Go back to step 3 and repeat the whole process [some would argue that we should go back to step 4, keeping the X’s and changing just the error term and hence Y]. Do this a hundred times and calculate:

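$$\frac{1}{100}\sum_{j=1}^{100}\left(\beta^{j}_{0,OLS} - 101\right)$$

$$\frac{1}{100}\sum_{j=1}^{100}\left(\beta^{j}_{1,OLS} - 2.3\right)$$

$$\frac{1}{100}\sum_{j=1}^{100}\left(\beta^{j}_{2,OLS} - (-0.2)\right)$$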
where the superscript j denotes the simulation, from 1 to 100. This gives you the average bias over the 100 simulations for each of the three coefficients. For an unbiased estimator it should be close to zero. You should do the same for the GLS estimator. What would you expect to find? In general both should be unbiased (unless you added a lagged dependent variable to equation (1), in which case, together with serial correlation, you would get biased estimates when using OLS, but not GLS). However, the GLS estimator should be more efficient: on average it should be closer to the true values. To test this, you would calculate the variance of the 100 OLS estimates for each of the three coefficients and do the same with the GLS estimates.
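The whole experiment in steps 3 to 7 can be put together as a short Python sketch. The notes do not say which GLS procedure is meant, so the code assumes one common feasible version: ρ is estimated from the OLS residuals and the data are then quasi-differenced (a Cochrane-Orcutt style transformation that drops the first observation). The regressor moments, σ_ε = 1, the seed and all function names are likewise assumptions.

```python
import numpy as np

TRUE_BETA = np.array([101.0, 2.3, -0.2])  # coefficients of equation (1)

def simulate_once(rng, n=100, rho=0.7):
    """Steps 3-5: one artificial sample (regressor moments are assumed values)."""
    x1 = rng.normal(10.0, 2.0, n)
    x2 = rng.normal(5.0, 3.0, n)
    eps = rng.normal(0.0, 1.0, n)
    u = np.zeros(n)
    for t in range(1, n):
        u[t] = rho * u[t - 1] + eps[t]
    X = np.column_stack([np.ones(n), x1, x2])
    return X, X @ TRUE_BETA + u

def ols(X, y):
    """Least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def fgls_ar1(X, y):
    """Feasible GLS: estimate rho from OLS residuals, then quasi-difference."""
    e = y - X @ ols(X, y)
    rho_hat = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])
    # The constant column becomes (1 - rho_hat), so its coefficient is still beta_0.
    return ols(X[1:] - rho_hat * X[:-1], y[1:] - rho_hat * y[:-1])

def monte_carlo(reps=100, seed=0):
    """Step 7: repeat the experiment and summarise bias and variance."""
    rng = np.random.default_rng(seed)
    b_ols = np.empty((reps, 3))
    b_gls = np.empty((reps, 3))
    for j in range(reps):
        X, y = simulate_once(rng)
        b_ols[j] = ols(X, y)
        b_gls[j] = fgls_ar1(X, y)
    return {
        "bias_ols": (b_ols - TRUE_BETA).mean(axis=0),
        "bias_gls": (b_gls - TRUE_BETA).mean(axis=0),
        "var_ols": b_ols.var(axis=0, ddof=1),
        "var_gls": b_gls.var(axis=0, ddof=1),
    }

results = monte_carlo()
for name, values in results.items():
    print(name, np.round(values, 4))
```

Each row of `b_ols` and `b_gls` holds the three estimates from one replication, so the column means minus the true values are the average biases and the column variances measure efficiency.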
Step 8: This completes the Monte Carlo process, but you can continue. You could, for example, see what happens as you increase the sample size (at step 2), increase the variance of the error term (at step 4), or introduce heterogeneous error terms into the model, whereby the error term is correlated with one or more of the X’s.
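Two of the extensions just mentioned, a larger sample and a larger error variance, can be explored with a small loop. The sketch uses OLS only and reports the spread of the estimated coefficient on X_1t across replications; the grid of values, the number of replications, the regressor moments and the name `slope_spread` are assumptions for illustration.

```python
import numpy as np

def slope_spread(n, sigma_eps, reps=200, rho=0.7, seed=0):
    """Standard deviation of the OLS estimate of the X1 coefficient across replications."""
    rng = np.random.default_rng(seed)
    estimates = np.empty(reps)
    for j in range(reps):
        x1 = rng.normal(10.0, 2.0, n)      # assumed regressor moments
        x2 = rng.normal(5.0, 3.0, n)
        eps = rng.normal(0.0, sigma_eps, n)
        u = np.zeros(n)
        for t in range(1, n):
            u[t] = rho * u[t - 1] + eps[t]
        y = 101.0 + 2.3 * x1 - 0.2 * x2 + u
        X = np.column_stack([np.ones(n), x1, x2])
        estimates[j] = np.linalg.lstsq(X, y, rcond=None)[0][1]
    return estimates.std(ddof=1)

# Vary the sample size (step 2) and the error standard deviation (step 4).
for n in (25, 100, 400):
    for sigma_eps in (1.0, 3.0):
        print(n, sigma_eps, round(slope_spread(n, sigma_eps), 4))
```

The spread should shrink as n grows and widen as σ_ε grows, which is the pattern the extension is designed to reveal.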
2SLS. If we were doing a Monte Carlo simulation with a simultaneous equation system and comparing OLS with 2SLS, the methodology would be essentially the same. You would need to generate values for all the exogenous X variables on the right hand side of all the equations (remember that you need to specify a simultaneous system). You also need to generate values for all the error terms in all the equations. However, because, e.g., Y₁ would depend upon Y₂, we could not calculate Y₁ in step 5 until we knew Y₂. But similarly we could not generate Y₂ until we knew Y₁. Hence we must calculate the Reduced Form equations, expressing Y₁ and Y₂ solely in terms of exogenous variables and error terms, and calculate Y₁ and Y₂ that way. All of this has been set out within the context of time series analysis because the basic example was of serial correlation. But the techniques can be applied equally to cross section problems.
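As an illustration of the reduced-form step, the sketch below sets up a hypothetical two-equation system, solves it for Y₁ and Y₂ in terms of the exogenous variables and errors, and then compares OLS with 2SLS for the coefficient on Y₂ in the first equation. The system itself, every parameter value, the sample size and the function names are assumptions; the notes do not specify a particular system.

```python
import numpy as np

# Hypothetical structural system (all parameter values are assumed):
#   Y1 = a0 + a1*Y2 + a2*X1 + u1
#   Y2 = b0 + b1*Y1 + b2*X2 + u2
A0, A1, A2 = 1.0, 0.5, 1.0
B0, B1, B2 = 2.0, 0.8, 1.0

def ols(X, y):
    """Least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def simulate_system(rng, n=200):
    """Generate Y1 and Y2 from the reduced form of the structural system."""
    x1 = rng.normal(0.0, 1.0, n)
    x2 = rng.normal(0.0, 1.0, n)
    u1 = rng.normal(0.0, 1.0, n)
    u2 = rng.normal(0.0, 1.0, n)
    c1 = A0 + A2 * x1 + u1          # exogenous part of equation 1
    c2 = B0 + B2 * x2 + u2          # exogenous part of equation 2
    det = 1.0 - A1 * B1             # must be non-zero for a solution to exist
    y1 = (c1 + A1 * c2) / det
    y2 = (B1 * c1 + c2) / det
    return y1, y2, x1, x2

def estimate_a1(y1, y2, x1, x2):
    """OLS and 2SLS estimates of a1 in the first structural equation."""
    const = np.ones(y1.size)
    a1_ols = ols(np.column_stack([const, y2, x1]), y1)[1]
    # First stage: project Y2 on all exogenous variables.
    Z = np.column_stack([const, x1, x2])
    y2_hat = Z @ ols(Z, y2)
    # Second stage: replace Y2 by its fitted value.
    a1_2sls = ols(np.column_stack([const, y2_hat, x1]), y1)[1]
    return a1_ols, a1_2sls

rng = np.random.default_rng(0)
draws = np.array([estimate_a1(*simulate_system(rng)) for _ in range(200)])
bias_ols, bias_2sls = draws.mean(axis=0) - A1
print("average bias  OLS:", round(bias_ols, 3), " 2SLS:", round(bias_2sls, 3))
```

Because Y₂ depends on u₁ through the second equation, OLS on the first equation is biased, while 2SLS uses X₂ as the excluded instrument for Y₂.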
Econometrics Lecture Notes: Dummy variables & Time Trends
Shift Dummy
In the regression Y_t = β₀ + β₁X_1t + β₂X_2t + β₃D_1t, estimated over 1955Q1 to 2003Q4,
D_1t takes a value of 1 in 1974Q1-Q4 and 1979Q1-Q4; otherwise it takes a value of zero. If significant, it implies that in those quarters Y was β₃ higher (lower if negative) than in the rest of the sample. It implies something happened in those quarters to shift the relationship up or down. The quarters I have chosen coincide roughly with the two oil crises of the 1970s. Another example, which I have used in a paper, was when a civil service strike significantly reduced the number of bankrupt firms, because the Inland Revenue and Customs and Excise are major petitioners to the courts against bankrupt firms. The significance of β₃ is therefore a test of a structural change in the periods specified. If D₁ had taken the form D_1t = 0 for all periods until 1971Q1 and a value of one thereafter, and was significant, it would imply a permanent shift upwards/downwards of the relationship and be tantamount to testing for a structural break. In effect it changes the constant term. An alternative is called a:
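A shift dummy of this kind is easy to construct on artificial data. In the sketch below the sample period and the definition of D_1t follow the text, while the coefficient values, the distributions of X_1t and X_2t, the added error term and the seed are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Quarterly sample 1955Q1-2003Q4: 49 years x 4 quarters = 196 observations.
years = np.repeat(np.arange(1955, 2004), 4)
n = years.size

# D_1t = 1 in every quarter of 1974 and 1979, 0 otherwise.
d1 = np.isin(years, [1974, 1979]).astype(float)

# Artificial data; the coefficient values are assumed for illustration.
beta = np.array([10.0, 1.5, -0.8, -5.0])     # beta_0, beta_1, beta_2, beta_3
x1 = rng.normal(10.0, 2.0, n)
x2 = rng.normal(5.0, 3.0, n)
X = np.column_stack([np.ones(n), x1, x2, d1])
y = X @ beta + rng.normal(0.0, 1.0, n)

# OLS: the coefficient on d1 estimates the shift in the constant term.
b_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print("estimated shift beta_3:", round(b_hat[3], 3))
```

With only eight quarters in which the dummy is operative, the estimate of β₃ rests on very few observations, which is one reason a plausible explanation for the dummy matters.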
Slope Dummy
In the regression Y_t = β₀ + β₁X_1t + β₂X_2t + β₃X_1tD_1t, estimated over 1955Q1 to 2003Q4,
a significant β₃ implies that the coefficient on X₁ was β₁ in the periods when D₁ was not operative and β₁ + β₃ in the periods when D₁ was operative.
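The same artificial setup can illustrate a slope dummy: the regressor is the product X_1t·D_1t, and the two implied slopes are read off as β₁ and β₁ + β₃. As before, the coefficient values, the regressor distributions, the added error term and the seed are assumed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Quarterly sample 1955Q1-2003Q4, with D_1t = 1 in 1974 and 1979.
years = np.repeat(np.arange(1955, 2004), 4)
n = years.size
d1 = np.isin(years, [1974, 1979]).astype(float)

# Assumed true values for the illustration.
beta_0, beta_1, beta_2, beta_3 = 10.0, 1.5, -0.8, 0.6
x1 = rng.normal(10.0, 2.0, n)
x2 = rng.normal(5.0, 3.0, n)
y = (beta_0 + beta_1 * x1 + beta_2 * x2
     + beta_3 * x1 * d1 + rng.normal(0.0, 1.0, n))

# The slope dummy enters as the interaction x1 * d1.
X = np.column_stack([np.ones(n), x1, x2, x1 * d1])
b_hat = np.linalg.lstsq(X, y, rcond=None)[0]

slope_off = b_hat[1]               # coefficient on X1 when D1 = 0
slope_on = b_hat[1] + b_hat[3]     # coefficient on X1 when D1 = 1
print(round(slope_off, 3), round(slope_on, 3))
```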
It is tremendously tempting at times, when you look at a plot of the residuals and see large positive (negative) outliers in, say, three successive periods, or see what looks like a structural break, to specify a dummy variable to pick this up. Some would regard this as a form of data mining. However, if you can find a plausible event to explain these shifts, it seems a case of the data informing the theory. But you must have a plausible explanation; simply including a dummy variable because it is significant is not acceptable.
Seasonal Dummies
When we use quarterly or monthly data we can expect there to be regular movements in the data, based on the different times of the year. (People consume more ice cream and lager in summer and possibly drink more coffee in winter.) There are several ways of dealing with this in econometrics. We can use seasonally adjusted data, i.e. data to which someone (in general someone else) has applied some form of filter to take out the regular seasonal movements. The problem is that they may have filtered out more than you would wish. Data is a scarce commodity; in general, filtering removes part of the signal and reduces the econometrician’s ability to discern the true DGP (data generating process). An alternative is to use seasonal dummies. Thus in the following regression with quarterly data
Y_t = β₀ + β₁X_1t + β₂X_2t + β₃SD_1t+ β₄SD_2t + β₅SD_3t
SD_1t, SD_2t and SD_3t are seasonal dummies, e.g.:

Quarter   SD_1t   SD_2t   SD_3t
...
1974Q1    1       0       0
1974Q2    0       1       0
1974Q3    0       0       1
1974Q4    0       0       0
1975Q1    1       0       0
1975Q2    0       1       0
1975Q3    0       0       1
1975Q4    0       0       0
...

When estimated, this indicates that Y_t is β₃ higher (lower) in quarter 1 than in quarter 4, β₄ higher (lower) in quarter 2 than in quarter 4 and β₅ higher (lower) in quarter 3 than in quarter 4. Quarter 4 is the reference quarter: none of the dummy variables is operative in it. If we had included a fourth seasonal dummy variable, then in every quarter exactly one of the dummies would have been 1, that is, ΣSD_jt = 1 for all t (where the summation is across j from 1 to 4). This sum coincides with the constant term, so the regressors would be exactly collinear and we would not be able to invert (X’X) to obtain (X’X)^-1X’Y. Sometimes computers do give results, but they are nonsense. This is known as ‘the dummy variable trap’. It takes other forms too: e.g. if we divide people into rural, town and city dwellers and have dummies for all three, together with a constant, we run into the dummy variable trap.
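The dummy variable trap can be checked numerically by comparing the rank of the regressor matrix with its number of columns. The short sketch below uses five years of quarterly observations (an arbitrary choice) and only the constant and the seasonal dummies, which is enough to show the problem.

```python
import numpy as np

n_years = 5                                      # arbitrary number of years
quarter = np.tile(np.arange(1, 5), n_years)      # 1, 2, 3, 4, 1, 2, ...
const = np.ones(quarter.size)
dummies = np.column_stack([(quarter == q).astype(float) for q in (1, 2, 3, 4)])

X_ok = np.column_stack([const, dummies[:, :3]])  # constant + SD_1, SD_2, SD_3
X_trap = np.column_stack([const, dummies])       # constant + all four dummies

rank_ok = np.linalg.matrix_rank(X_ok)            # 4: full column rank
rank_trap = np.linalg.matrix_rank(X_trap)        # still 4, but X_trap has 5 columns

print(X_ok.shape[1], rank_ok)
print(X_trap.shape[1], rank_trap)
```

With all four dummies the columns sum to the constant, so X’X is singular; dropping either one dummy or the constant restores full rank.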
There is a further method of dealing with seasonality. If we take the above equation and lag it four periods:
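$$Y_{t-4} = \beta_0 + \beta_1 X_{1,t-4} + \beta_2 X_{2,t-4} + \beta_3 SD_{1,t-4} + \beta_4 SD_{2,t-4} + \beta_5 SD_{3,t-4}$$
Subtract this from the previous equation:
$$Y_t - Y_{t-4} = (\beta_0 - \beta_0) + \beta_1[X_{1t} - X_{1,t-4}] + \beta_2[X_{2t} - X_{2,t-4}] + \beta_3[SD_{1t} - SD_{1,t-4}] + \beta_4[SD_{2t} - SD_{2,t-4}] + \beta_5[SD_{3t} - SD_{3,t-4}]$$
$$= \beta_1[X_{1t} - X_{1,t-4}] + \beta_2[X_{2t} - X_{2,t-4}]$$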
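The constant cancels, and each seasonal dummy takes the same value in period t as in period t-4, so the dummy terms vanish. That is, regressing the annual change in the left hand side variable on the annual changes in the right hand side variables [with no constant term] removes the problem of seasonality.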
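The argument can be verified on artificial quarterly data: the four-period difference of every seasonal dummy is identically zero, and a regression of the annual change in Y on the annual changes in X₁ and X₂, without a constant, recovers β₁ and β₂. The coefficient values, regressor distributions, added error term and the helper name `diff4` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 196                                    # quarterly observations
quarter = np.arange(n) % 4 + 1             # 1, 2, 3, 4, 1, 2, ...
sd = np.column_stack([(quarter == q).astype(float) for q in (1, 2, 3)])

# Artificial data with seasonal shifts (all coefficient values are assumed).
x1 = rng.normal(10.0, 2.0, n)
x2 = rng.normal(5.0, 3.0, n)
y = (20.0 + 1.5 * x1 - 0.8 * x2
     + sd @ np.array([4.0, -2.0, 3.0])     # beta_3, beta_4, beta_5
     + rng.normal(0.0, 1.0, n))

def diff4(z):
    """Four-period (annual) difference z_t - z_{t-4}."""
    return z[4:] - z[:-4]

# Each seasonal dummy equals its own value four quarters earlier.
max_dummy_change = np.abs(diff4(sd)).max()

# Regress the annual change in Y on the annual changes in X1 and X2, no constant.
dX = np.column_stack([diff4(x1), diff4(x2)])
b_hat = np.linalg.lstsq(dX, diff4(y), rcond=None)[0]
print(max_dummy_change, np.round(b_hat, 3))
```

Note that differencing costs the first four observations and also differences the error term, so the sketch checks only that the seasonal terms drop out, not the efficiency of the resulting estimator.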
Time Trend
In the Cobb Douglas production function:
Y_t = AK_t^L_t^βe^γt
We can take [natural] logs to base e:
lnY_t = lnA + lnK_t+ βlnL_t+ γt
t is an example of a time trend: it increases by 1 every period, 1, 2, 3, ..., n. It reflects the impact on the left hand side variable of something which is changing in a steady manner over time and which we cannot otherwise model. In the above case it is the impact of productivity growth, and γ is an estimate of productivity growth.
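A minimal sketch of estimating the log-linear production function with a time trend is given here. It writes the exponent on capital as `alpha`, and the parameter values, the way ln K and ln L are generated, the added error term and the seed are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
t = np.arange(1, n + 1, dtype=float)       # time trend 1, 2, ..., n

# Assumed parameter values for the illustration.
ln_A, alpha, beta, gamma = 1.0, 0.3, 0.7, 0.02

# Artificial inputs in logs; capital is given a mild upward drift.
ln_K = 2.0 + 0.01 * t + rng.normal(0.0, 0.2, n)
ln_L = 3.0 + rng.normal(0.0, 0.2, n)
ln_Y = ln_A + alpha * ln_K + beta * ln_L + gamma * t + rng.normal(0.0, 0.05, n)

# Regress ln Y on a constant, ln K, ln L and the trend.
X = np.column_stack([np.ones(n), ln_K, ln_L, t])
b_hat = np.linalg.lstsq(X, ln_Y, rcond=None)[0]
gamma_hat = b_hat[3]
print("estimated gamma:", round(gamma_hat, 4))
```

Because the dependent variable is in logs, the coefficient on t is read as the per-period proportional growth in output not accounted for by capital and labour.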
