A Monte Carlo experiment to study the curse of dimensionality in the multivariate probit model


Table 1: Overall summary of the simulation results




 

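Abbreviations: APB = Absolute Percentage Bias; FSSE = Finite Sample Standard Error; ASE = Asymptotic Standard Error; RMSE = root mean squared error; CP = Coverage Probability. All computation times are in minutes.

Cross-sectional data, uncorrelated random coefficients

| Measure | MACML | GHK-Halton | GHK-SGI | MCMC |
| --- | --- | --- | --- | --- |
| APB, all parameters | 2.64 | 3.89 | 3.05 | 3.45 |
| APB, mean parameters | 0.65 | 1.25 | 2.27 | 4.23 |
| APB, covariance parameters | 3.30 | 4.77 | 3.31 | 3.19 |
| FSSE, all parameters | 0.33 | 0.46 | 0.42 | 0.24 |
| FSSE, mean parameters | 0.22 | 0.28 | 0.26 | 0.25 |
| FSSE, covariance parameters | 0.37 | 0.52 | 0.47 | 0.24 |
| ASE, all parameters | 0.33 | 0.46 | 0.44 | 0.23 |
| ASE, mean parameters | 0.24 | 0.30 | 0.28 | 0.23 |
| ASE, covariance parameters | 0.36 | 0.51 | 0.49 | 0.23 |
| RMSE | 0.345 | 0.466 | 0.429 | 0.360 |
| CP80% | 92.12 | 89.95 | 90.47 | 90.14 |
| Convergence time | 0.72 | 1.06 | 0.86 | 5.31 |
| ASE computation time | 0.31 | 0.33 | 0.31 | -- |
| Total runtime | 1.03 | 1.39 | 1.17 | 5.31 |

Cross-sectional data, correlated random coefficients

| Measure | MACML | GHK-Halton | GHK-SGI | MCMC |
| --- | --- | --- | --- | --- |
| APB, all parameters | 3.16 | 4.25 | 3.72 | 16.43 |
| APB, mean parameters | 0.71 | 0.85 | 0.80 | 1.12 |
| APB, covariance parameters | 3.98 | 5.38 | 4.69 | 21.53 |
| FSSE, all parameters | 0.30 | 0.34 | 0.28 | 0.33 |
| FSSE, mean parameters | 0.28 | 0.32 | 0.26 | 0.35 |
| FSSE, covariance parameters | 0.31 | 0.35 | 0.29 | 0.32 |
| ASE, all parameters | 0.25 | 0.36 | 0.27 | 0.28 |
| ASE, mean parameters | 0.27 | 0.31 | 0.28 | 0.31 |
| ASE, covariance parameters | 0.24 | 0.38 | 0.26 | 0.27 |
| RMSE | 0.309 | 0.429 | 0.373 | 0.541 |
| CP80% | 88.28 | 86.49 | 86.12 | 57.25 |
| Convergence time | 2.07 | 2.79 | 2.66 | 6.44 |
| ASE computation time | 0.41 | 0.38 | 0.36 | -- |
| Total runtime | 2.58 | 3.17 | 3.02 | 6.44 |

Panel data, uncorrelated random coefficients

| Measure | MACML | GHK-Halton FIML | GHK-Halton CML | GHK-SGI CML | MCMC |
| --- | --- | --- | --- | --- | --- |
| APB, all parameters | 3.08 | 7.53 | 4.49 | 28.15 | 8.23 |
| APB, mean parameters | 1.63 | 3.42 | 2.13 | 22.35 | 6.48 |
| APB, covariance parameters | 3.56 | 8.90 | 5.25 | 30.08 | 8.81 |
| FSSE, all parameters | 0.26 | 0.25 | 0.23 | 0.16 | 0.18 |
| FSSE, mean parameters | 0.19 | 0.21 | 0.19 | 0.14 | 0.17 |
| FSSE, covariance parameters | 0.28 | 0.26 | 0.24 | 0.16 | 0.18 |
| ASE, all parameters | 0.22 | 0.25 | 0.23 | 0.16 | 0.17 |
| ASE, mean parameters | 0.16 | 0.20 | 0.19 | 0.15 | 0.16 |
| ASE, covariance parameters | 0.24 | 0.27 | 0.24 | 0.16 | 0.17 |
| RMSE | 0.291 | 0.384 | 0.318 | 0.560 | 0.386 |
| CP80% | 80.42 | 63.73 | 74.01 | 52.46 | 59.23 |
| Convergence time | 5.88 | 13.42 | 6.75 | 14.48 | 7.02 |
| ASE computation time | 12.98 | 17.23 | 13.85 | 18.29 | -- |
| Total runtime | 18.86 | 30.65 | 20.60 | 32.77 | 7.02 |

Panel data, correlated random coefficients

| Measure | MACML | GHK-Halton FIML | GHK-Halton CML | GHK-SGI CML | MCMC |
| --- | --- | --- | --- | --- | --- |
| APB, all parameters | 3.62 | 8.43 | 6.14 | 33.09 | 20.42 |
| APB, mean parameters | 2.23 | 3.86 | 2.89 | 24.39 | 3.45 |
| APB, covariance parameters | 4.08 | 9.95 | 7.22 | 35.99 | 26.08 |
| FSSE, all parameters | 0.18 | 0.21 | 0.21 | 0.12 | 0.21 |
| FSSE, mean parameters | 0.19 | 0.22 | 0.22 | 0.13 | 0.23 |
| FSSE, covariance parameters | 0.18 | 0.20 | 0.20 | 0.12 | 0.20 |
| ASE, all parameters | 0.17 | 0.22 | 0.20 | 0.16 | 0.19 |
| ASE, mean parameters | 0.20 | 0.19 | 0.21 | 0.17 | 0.18 |
| ASE, covariance parameters | 0.16 | 0.23 | 0.19 | 0.15 | 0.19 |
| RMSE | 0.316 | 0.395 | 0.343 | 0.631 | 0.507 |
| CP80% | 75.86 | 58.17 | 71.53 | 49.55 | 55.46 |
| Convergence time | 8.73 | 24.53 | 9.76 | 27.36 | 8.97 |
| ASE computation time | 14.36 | 20.45 | 17.53 | 22.01 | -- |
| Total runtime | 23.09 | 44.98 | 27.29 | 49.37 | 8.97 |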
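As an illustration of how summary measures like those in Table 1 might be computed for a single parameter, the following is a minimal sketch. The formulas are the conventional definitions used in simulation studies and are an assumption here, since the section of the paper that defines them is not part of this excerpt. The function name `summarize` and the toy inputs are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def summarize(true_value, estimates, std_errors, level=0.80):
    """Summarize recovery of one parameter across simulated datasets.

    Assumed (conventional) definitions, not taken from the paper:
      APB  = |mean estimate - true value| / |true value| * 100
      FSSE = standard deviation of the estimates across datasets
      ASE  = mean of the estimated standard errors across datasets
      RMSE = square root of the mean squared estimation error
      CP   = share of datasets whose nominal interval covers the true value
    """
    est_mean = mean(estimates)
    apb = abs((est_mean - true_value) / true_value) * 100.0
    fsse = stdev(estimates)
    ase = mean(std_errors)
    rmse = mean([(e - true_value) ** 2 for e in estimates]) ** 0.5

    # Two-sided normal critical value for the nominal level (about 1.28 for 80%).
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    covered = [abs(e - true_value) <= z * s for e, s in zip(estimates, std_errors)]
    cp = 100.0 * sum(covered) / len(covered)

    return {"apb": apb, "fsse": fsse, "ase": ase, "rmse": rmse, "cp": cp}


# Toy inputs (hypothetical, not from the paper): four simulated datasets.
result = summarize(
    true_value=1.0,
    estimates=[0.9, 1.1, 1.0, 1.2],
    std_errors=[0.1, 0.1, 0.1, 0.1],
)
print(result)
```

In a full study, such per-parameter values would then be averaged over all parameters, or over the mean and covariance parameters separately, to give rows like those in the table.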




1 This is because the individual likelihood function can be written in two stages. First, a product of univariate cumulative normals is integrated over an inner, untruncated one-dimensional integral to obtain each choice occasion-specific probability of the individual. Second, the product of these probabilities across the individual's choice occasions is integrated over an outer, untruncated K-dimensional integral (see Equation (4) in Bhat and Sidharthan, 2011). In our simulation setting, evaluating the integral on this MMNP model basis is much easier than evaluating the 20-dimensional integral of the generic MNP model basis. However, we use the generic MNP model basis here too, because it is the conceptual (and general) basis for this paper.

2 In the CML approach for the panel case, we consider all pairings of the couplet probabilities within an individual (that is, all 10 pairings across the five choice occasions of each individual; see Section 2 for details). However, the CML approach does not need all pairings. A subset of the authors is testing the consequences of using fewer pairings per individual in the CML context (see Bhat, 2014, for additional details). Doing so can lead to substantial reductions in computation time beyond what is presented here for the MACML and other frequentist approaches.

3 Some of these assumptions may be relaxed to generate a variety of spatial/local dependence or time-varying coefficients (see Bhat, 2014).

4 The number of iterations in the simulation chain prior to convergence to the joint posterior distribution of parameters (that is, the “burn-in”) has received considerable attention in the Bayesian estimation literature, with no clear consensus on how many iterations should be treated as burn-in. Some studies (see, for example, Johndrow et al., 2013; Burgette and Reiter, 2013; Wang et al., 2014) use a specific number of burn-in iterations (such as 1,000, 3,000, or 10,000), while others (see, for example, Zhang et al., 2008) use a specific percentage of the total number of sampler iterations used in the estimation (such as 1% or 10%). Some studies (see, for example, Gelman and Shirley, 2011) even question the use of burn-in iterations. While we do not intend to address the burn-in issue in the current paper, we did vary the burn-in from 500 to 1,000 to 10,000 for a select sample of estimation runs across different data generation cases, and found little impact on the metrics used to assess the accuracy and precision of parameter recovery.
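The two burn-in conventions described in footnote 4 (a fixed count or a percentage of the chain) can be sketched as follows. This is only an illustration: the helper `drop_burn_in` is hypothetical, and the chain is toy data drawn from a normal distribution rather than output of the MCMC sampler used in the paper.

```python
import random
from statistics import mean


def drop_burn_in(draws, count=None, fraction=None):
    """Discard an initial burn-in segment, given as a count or a fraction."""
    if (count is None) == (fraction is None):
        raise ValueError("specify exactly one of count or fraction")
    n_burn = count if count is not None else int(round(fraction * len(draws)))
    if not 0 <= n_burn < len(draws):
        raise ValueError("burn-in must leave at least one draw")
    return draws[n_burn:]


# Toy chain (hypothetical): 50,000 draws of a single parameter.
random.seed(0)
chain = [random.gauss(1.0, 0.2) for _ in range(50_000)]

# Fixed-count burn-in, using the three values tried in footnote 4.
for burn in (500, 1_000, 10_000):
    kept = drop_burn_in(chain, count=burn)
    print(burn, len(kept), round(mean(kept), 3))

# Percentage burn-in: discard the first 10% of the chain.
kept_pct = drop_burn_in(chain, fraction=0.10)
print(len(kept_pct))
```

Comparing the posterior summaries across burn-in choices, as in the loop above, mirrors the sensitivity check the footnote describes.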

5 The detailed results for all the cases are available in an online appendix at: http://www.caee.utexas.edu/prof/bhat/ABSTRACTS/SimEval/Appendix.pdf.

6 This significant drop in GHK-SGI performance from the cross-sectional to the panel case may be attributed to one or more of three factors: (1) an increase in the dimension of integration (recall that the panel models using the CML approach in this paper involve 8-dimensional integrals, while the cross-sectional models involve 4-dimensional integrals), (2) the change in the nature of the dataset (cross-sectional to panel), and (3) any potential difficulty of using the SGI approach in conjunction with the CML method. To disentangle these effects, we conducted additional simulation experiments with the GHK-SGI method. Specifically, we estimated models on simulated data for a cross-sectional MNP with uncorrelated random parameters for 7 choice alternatives (dimension of integration equals 6) and 9 choice alternatives (dimension of integration equals 8), with the same simulation configuration as discussed earlier. The overall APB values for the 6- and 8-dimensional integration cases (with the new cross-sectional data) were 6.59 and 12.30, respectively, compared with an overall APB of 3.05 for the 4-dimensional uncorrelated cross-sectional case (see Table 1). These results indicate that the ability of the GHK-SGI method to recover true parameters degrades quickly beyond 4 or 5 dimensions (a recent study by Abay, 2015, confirms this trend). It is worth noting, however, that the 8-dimensional panel data models (as in Table 1) perform much more poorly (APB values are around 30%) than cross-sectional data models of the same dimension. This could be due to the greater number of 8-dimensional integrals evaluated in the panel datasets estimated using the CML approach. That is, for a cross-sectional dataset with 2,500 observations and 9 alternatives, we evaluate a total of 2,500 8-dimensional integrals, while for a panel dataset with 500 individuals and 5 choice occasions each (10 pairings per individual), we evaluate a total of 5,000 8-dimensional integrals. Therefore, it appears that the performance of the SGI method degrades quickly with the dimensionality of integration as well as with the number of integrals evaluated (in this case, the number of 8-dimensional integrals doubled due to the CML approach). However, further research is required to fully disentangle the impact of the nature of the dataset and the dimension of integration on the performance of the SGI method.
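The dimension and integral counts quoted in footnotes 1, 2, and 6 can be reproduced with a few lines of arithmetic. The sketch below assumes that an MNP with I alternatives requires an (I - 1)-dimensional integral per choice occasion, and that the main simulation design uses 5 alternatives (inferred from the 4-dimensional cross-sectional case). The function names are hypothetical.

```python
from math import comb


def integral_dimension(n_alternatives, n_occasions_in_block=1):
    """Dimension of the MNP integral for a block of jointly modeled occasions.

    Assumes (I - 1) utility differences per choice occasion.
    """
    return (n_alternatives - 1) * n_occasions_in_block


def n_integrals(n_individuals, n_occasions=1, pairwise=False):
    """Number of integrals per likelihood evaluation.

    With pairwise CML, each individual contributes one integral per pair
    of choice occasions; otherwise one integral per individual.
    """
    per_individual = comb(n_occasions, 2) if pairwise else 1
    return n_individuals * per_individual


# Cross-sectional cases: 5, 7, and 9 alternatives.
print([integral_dimension(i) for i in (5, 7, 9)])

# Panel case with 5 alternatives: pairwise CML versus the full likelihood.
print(integral_dimension(5, n_occasions_in_block=2))
print(integral_dimension(5, n_occasions_in_block=5))

# Number of 8-dimensional integrals in the two settings of footnote 6.
print(n_integrals(2500))
print(n_integrals(500, n_occasions=5, pairwise=True))
```

Under these assumptions the pairwise CML keeps each integral at 8 dimensions instead of 20, at the cost of evaluating 10 integrals per individual.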

7 While the sampling distribution (whose standard deviation is represented by the FSSE) is not a Bayesian concept, one may invoke the Bernstein–von Mises theorem (see Train, 2009, p. 288), which states that the posterior distribution of each parameter converges to a normal distribution with the same variance as that of the maximum likelihood estimator (or, more inclusively, the frequentist estimator). On that basis, the FSSE values can be used to assess the empirical efficiency of the MCMC estimator.

