Simulation Evaluation of Emerging Estimation Techniques for Multinomial Probit Models


Priyadarshan N. Patil

The University of Texas at Austin

Department of Civil, Architectural and Environmental Engineering

301 E. Dean Keeton St. Stop C1761, Austin, TX 78712, USA

Phone: 1-512-471-4535, Email: priyadarshan@utexas.edu

Subodh K. Dubey

University of South Australia

Institute for Choice

140 Arthur St, Level 13, North Sydney, NSW 2060, Australia

Phone: +61-426-580-105, Email: subbits@gmail.com

Abdul R. Pinjari

University of South Florida

Department of Civil and Environmental Engineering

4202 E Fowler Ave, ENB 118, Tampa, FL 33620, USA

Phone: 1-813-974-9671, Email: apinjari@usf.edu

Elisabetta Cherchi (corresponding author)

Newcastle University

School of Civil Engineering and Geosciences

Transport Operations Research Group (TORG)

Cassie Building, Room 2.21, Newcastle upon Tyne, NE1 7RU, UK

Phone: +44 (0) 191 208 3501, Email: Elisabetta.Cherchi@newcastle.ac.uk



Ricardo Daziano

Cornell University

School of Civil and Environmental Engineering

305 Hollister Hall, Ithaca NY 14853, USA

Phone: 1-607-255-2018, Email: daziano@cornell.edu

Chandra R. Bhat

The University of Texas at Austin

Department of Civil, Architectural and Environmental Engineering

301 E. Dean Keeton St. Stop C1761, Austin, TX 78712, USA

Phone: 1-512-471-4535, Email: bhat@mail.utexas.edu



ABSTRACT

A simulation evaluation is presented to compare alternative estimation techniques for a five-alternative multinomial probit (MNP) model with random parameters, including cross-sectional and panel datasets and for scenarios with and without correlation among random parameters. The different estimation techniques assessed are: (1) The maximum approximate composite marginal likelihood (MACML) approach; (2) The Geweke-Hajivassiliou-Keane (GHK) simulator with Halton sequences, implemented in conjunction with the composite marginal likelihood (CML) estimation approach; (3) The GHK approach with sparse grid nodes and weights, implemented in conjunction with the composite marginal likelihood (CML) estimation approach; and (4) a Bayesian Markov Chain Monte Carlo (MCMC) approach. In addition, for comparison purposes, the GHK simulator with Halton sequences was implemented in conjunction with the traditional, full information maximum likelihood approach as well. The results indicate that the MACML approach provided the best performance in terms of the accuracy and precision of parameter recovery and estimation time for all data generation settings considered in this study. For panel data settings, the GHK approach with Halton sequences, when combined with the CML approach, provided better performance than when implemented with the full information maximum likelihood approach, albeit not better than the MACML approach. The sparse grid approach did not perform well in recovering the parameters as the dimension of integration increased, particularly so with the panel datasets. The Bayesian MCMC approach performed well in datasets without correlations among random parameters, but exhibited limitations in datasets with correlated parameters.


Keywords: Discrete choice, GHK simulator, Sparse grid integration, composite marginal likelihood (CML) method, MACML estimation, Bayesian Markov Chain Monte Carlo (MCMC).



1. INTRODUCTION

Two general econometric model structures have been commonly used in the literature for random utility maximization (RUM)-based discrete choice analysis. These are: (1) the mixed multinomial logit (MMNL) model (McFadden and Train, 2000) and (2) the multinomial probit (MNP) model (Daganzo, 1979).

The MMNL model is typically estimated using the maximum simulated likelihood (MSL) approach, whose desirable asymptotic properties are obtained at the expense of computational cost (because the number of simulation draws has to rise faster than the square root of the number of observations used for estimation). Unfortunately, in situations where the dimensionality of integration is high, such as when spatial/social dependencies are of interest or when considering multi-level (e.g., intra- and inter-individual) unobserved variations in parameters, the computational cost to ensure good asymptotic estimator properties can be prohibitive or simply impractical. Moreover, MSL estimation and inference can be affected by simulation noise, which might cause problems ranging from non-convergence to inaccuracy and/or non-inversion of the Hessian of the log-likelihood function. Yet, MSL continues to be the inference approach of choice for MMNL model estimation.
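Because MSL plays a central role in what follows, a minimal sketch may help fix ideas: the MMNL choice probability is approximated by averaging logit probabilities over simulation draws of the random coefficients. The Python sketch below is illustrative only; the function name, the independent normal coefficients, and the attribute values are our own assumptions, not a specification from any study cited here.

```python
import numpy as np

def mmnl_probability_msl(beta_mean, beta_sd, x, chosen, n_draws=500, seed=0):
    """Simulate the MMNL probability of the chosen alternative by averaging
    logit probabilities over draws of independent normal random coefficients.
    beta_mean, beta_sd: (K,) mean and std. dev. of the coefficients
    x: (I, K) attribute matrix; chosen: index of the chosen alternative."""
    rng = np.random.default_rng(seed)
    K = beta_mean.size
    draws = beta_mean + beta_sd * rng.standard_normal((n_draws, K))  # (R, K)
    v = draws @ x.T                      # (R, I) systematic utilities per draw
    v -= v.max(axis=1, keepdims=True)    # guard against overflow in exp
    p = np.exp(v)
    p /= p.sum(axis=1, keepdims=True)    # logit probability conditional on draw
    return p[:, chosen].mean()           # MSL average over the draws
```

With zero standard deviations the simulated probability collapses to the plain logit probability, which provides a simple sanity check.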

In contrast to the MMNL model, the MNP model has seen relatively little use in the past couple of decades, mainly because its likelihood function involves a truncated multivariate integral (i.e., the cumulative multivariate normal (MVN) function) that is generally more difficult to evaluate using simulation methods than the untruncated multivariate integral in the MMNL model. Many studies in the 1990s and earlier on estimating the MNP focused on simulation-based estimation, leading to important advances, including the well-known Geweke-Hajivassiliou-Keane (GHK) approach. These studies (for example, Hajivassiliou et al., 1996) demonstrated that the GHK outperformed many other simulation-based approaches at that time. As a result, the GHK is by far the most commonly used approach for estimating MNP models. It is worth noting, however, that the GHK is an MSL inference procedure and faces the same problems discussed above in the context of the MSL estimation of the MMNL model. The dimensionality of integration in the MNP model choice probability expressions depends on the number of choice alternatives (and the number of choice occasions per individual in the case of panel data with a general error structure specification). Therefore, the computational cost increases significantly as the number of choice alternatives (or the number of choice occasions per individual) increases. Besides, the GHK simulator is perceived to be relatively more difficult to understand and implement than the MSL simulator for the MMNL (see Train, 2009).

Despite the considerations discussed above, there has been continued interest in the MNP model for a variety of reasons. The MNP model can indeed be computationally more parsimonious than the MMNL in many situations, such as when the number of random coefficients is much larger than the number of alternatives (and when the random coefficients are normally distributed). This is because the MNP likelihood function can be expressed as an integral whose dimensionality does not depend on the number of random coefficients in the specification. Besides, in some contexts, the MVN distributional assumption of the MNP may carry better appeal than the extreme value (or multivariate extreme value) distribution used in logit-based models. For example, in social or spatial interaction models, it is much easier to specify parsimonious correlation structures using the MNP kernel than the logit kernel, primarily because the multivariate normal distribution is closed under affine transformations. This is reflected in the almost exclusive use of the MNP kernel for discrete choice models with spatial/social dependence (see a review in Bhat, 2015). More recently, there has been renewed interest in the estimation of MNP models using a variety of different methods to approximate or simulate the MVN integrals.
Most of these methods can be classified into one of the following three broad categories, each of which is discussed in the next section: (1) improvements to numerical quadrature methods, such as the sparse grid integration (SGI)-based quadrature methods advanced by Heiss and Winschel (2008) and Heiss (2010); (2) Bhat's (2011) maximum approximate composite marginal likelihood (MACML) method, which combines the use of analytic approximations to the MVN integral (as opposed to simulation or numerical evaluation) with a composite marginal likelihood (CML) framework; and (3) advances in Bayesian Markov Chain Monte Carlo (MCMC) methods, particularly those using data augmentation techniques (McCulloch et al., 2000; Imai and van Dyk, 2005).


1.1 The Current Research

Given the increasing interest in MNP models and the emergence of new methods to estimate them, it is timely to evaluate and compare the performance of the different estimation methods available in the literature. Most of the techniques mentioned above have been compared with traditional frequentist simulation-based approaches, particularly the simulation-based GHK approach (Heiss, 2010; Abay, 2015) or the mixed probit MSL approach (Bhat and Sidharthan, 2011). Some efforts have focused solely on the accuracy of evaluating MVN integrals without examining parameter estimation (Sándor and András, 2004; Connors et al., 2014). To our knowledge, no comprehensive comparison exists of the recently emerging methods for MNP estimation in terms of the metrics of importance: the accuracy of parameter recovery, the precision of parameter recovery, and the estimation time. The objective of this paper is to fill this gap. Specifically, we compare the following approaches to estimate MNP models:



  1. The MACML approach (as in Bhat, 2011);

  2. The SGI-based quadrature method embedded into the GHK approach, labeled the GHK-SGI method, used in conjunction with the CML estimation approach;

  3. The GHK simulator using quasi-Monte Carlo draws from Halton sequences (Bhat, 2001; Bhat, 2003), labeled the GHK-Halton method in the rest of the paper. We used this method in conjunction with the traditional, full information maximum likelihood (FIML) approach as well as the CML approach; and

  4. The Bayesian MCMC approach with data augmentation (as in McCulloch et al., 2000).

To undertake the comparison among these approaches, we conduct simulation experiments with synthetic datasets for a five-alternative MNP model with five random coefficients (in the rest of this paper, the number of choice alternatives is denoted by I and the number of random coefficients is denoted by K; so I=5 and K=5), for scenarios with and without correlation among random parameters, and for both cross-sectional and panel data settings in an aspatial context. For panel (or repeated choice) data, we simulate five choice occasions per individual. Subsequently, we evaluate the relative merits of the different estimation techniques for MNP model estimation.
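As a rough illustration of this experimental setup, the sketch below generates a synthetic panel dataset from an MNP data-generating process with I=5 alternatives, K=5 random coefficients, and five choice occasions per individual. The mean vector and Cholesky factor drawn inside the function are arbitrary illustrative values, not the true parameter values used in the paper's experiments, and the function name is our own.

```python
import numpy as np

def generate_mnp_panel(n_ind=200, T=5, I=5, K=5, seed=42):
    """Generate a synthetic panel dataset from an MNP data-generating process:
    individual-specific coefficients beta_q ~ MVN(b, Omega) held fixed across
    an individual's choice occasions, plus IID kernel errors with variance 0.5,
    mirroring the simulation design described above (b and Omega here are
    illustrative, not the paper's true values)."""
    rng = np.random.default_rng(seed)
    b = rng.uniform(-1.0, 1.0, K)                   # illustrative means
    L = np.tril(rng.uniform(0.1, 0.4, (K, K)))      # illustrative Cholesky factor
    np.fill_diagonal(L, rng.uniform(0.5, 1.0, K))
    X = rng.uniform(0.0, 1.0, (n_ind, T, I, K))     # exogenous attributes
    beta = b + rng.standard_normal((n_ind, K)) @ L.T  # beta_q per individual
    V = np.einsum('qtik,qk->qti', X, beta)          # systematic utilities
    eps = np.sqrt(0.5) * rng.standard_normal((n_ind, T, I))  # kernel errors
    choices = np.argmax(V + eps, axis=2)            # utility-maximizing choice
    return X, choices, b, L
```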

In addition to the above discussed methods, we explored Heiss's (2010) modified version of the GHK-SGI method, which implements the SGI in conjunction with an efficient importance sampling (EIS) technique within the GHK approach. However, our experiments with this approach were not successful, with most estimation attempts encountering convergence problems. After a brief description of this method in Section 2.1.4, we discuss the reasons why the method may not have worked in our context.

A few important caveats about our study are in order. In our simulation design, we assume independence in the utility kernel error terms across alternatives at each choice occasion. Technically, in the cross-sectional case, one can then view the model as a mixed multinomial probit (MMNP) model for estimation, in which the likelihood function is written as a product of univariate cumulative normal functions integrated over an untruncated (K+1)-dimensional (i.e., 6-dimensional) integral space (see Equation (2) in Bhat and Sidharthan, 2011). However, the estimation is more easily done using the traditional MNP model basis, which involves only a truncated (I−1)-dimensional (i.e., 4-dimensional) integral space (see next section). So, we use the MNP model basis. In the panel case, with a general error structure that accommodates individual-specific heterogeneity, choice occasion heterogeneity (or intra-individual variations in taste sensitivity across choice occasions), as well as a general covariance structure across the utilities of alternatives at each choice occasion, the result is a T(I−1)-dimensional (i.e., 20-dimensional) integral for evaluating the MNP probabilities in the likelihood function. In spirit, we assume this general structure as the model basis for evaluating the different estimation procedures, even though we only use simpler versions of this structure (that is, we only assume individual-specific heterogeneity in the random coefficients) in the simulation design itself. Technically, in the panel case, assuming only individual-specific heterogeneity simplifies the likelihood function when viewed as an MMNP model for estimation.

For all the frequentist approaches tested on panel data in this paper (with one exception, discussed at the end of this paragraph), we consider the CML estimation approach within the generic MNP model basis, which reduces the dimensionality of integration by compounding (within each individual) all pairs (or couplets) of choice occasion probabilities. Doing so reduces the dimensionality of the MVNCD function to be evaluated in the CML function to 2(I−1) dimensions (that is, to an 8-dimensional MVNCD function in the simulation case). That is, all the frequentist approaches (the GHK-Halton, the GHK-SGI, and the MACML) are applied in the panel case using the CML estimation approach rather than the full maximum likelihood estimation approach (for the cross-sectional case, the CML and the full maximum likelihood estimation approaches collapse to being exactly the same). However, to examine the benefit of the CML-based approach for the GHK-Halton simulator (i.e., the GHK-Halton-CML approach), we also evaluated the performance of the traditional GHK-Halton simulator embedded within the full information maximum likelihood approach (i.e., the GHK-Halton-FIML approach).




2. THE MULTINOMIAL PROBIT MODEL

The structure of the panel version of the MNP model is presented here. The cross-sectional model corresponds to a panel with one choice instance per individual. Also, for ease in presentation, a balanced panel is assumed with all alternatives available at each choice instance for each individual.

Let t (t = 1, 2, …, T) be the index for choice instances, q (q = 1, 2, …, Q) be the index for individuals, and i (i = 1, 2, …, I) be the index for choice alternatives. Next, write the utility U_qti that individual q derives from choosing alternative i at choice instance t as:



U_qti = β_q′ x_qti + ε_qti ,  (1)

where x_qti is a (K × 1) vector of exogenous attributes and β_q is an individual-specific (K × 1) column vector of coefficients. Assume that β_q is distributed multivariate normal with mean vector b and covariance matrix Ω = LL′, where L is a lower-triangular Cholesky factor of Ω. That is, β_q = b + Lξ_q, where ξ_q ~ MVN_K(0_K, IDEN_K), IDEN_K is the identity matrix of size K, and 0_K is a (K × 1) vector of zeros. In this paper, we assume no correlation across the random coefficients β_q of different individuals q, and no variation in β_q across different choice occasions of the same individual. In addition, we assume that the ε_qti terms are IID normally distributed across individuals, alternatives, and choice occasions, with mean zero and variance 0.5.

With the notations as above, we may write the utility expression in Equation (1) as:

U_qti = b′x_qti + β̃_q′ x_qti + ε_qti , where β̃_q = β_q − b ~ MVN_K(0_K, Ω) .  (2)

Next, define the following vectors and matrices (in what follows, IDEN_G stands for the identity matrix of dimension G):

U_q = (U_q1′, U_q2′, …, U_qT′)′ (TI × 1 vector), with U_qt = (U_qt1, U_qt2, …, U_qtI)′ (I × 1 vector),

x_q = (x_q1′, x_q2′, …, x_qT′)′ (TI × K matrix), with x_qt = (x_qt1, x_qt2, …, x_qtI)′ (I × K matrix), and

ε_q = (ε_q1′, ε_q2′, …, ε_qT′)′ (TI × 1 vector), with ε_qt = (ε_qt1, ε_qt2, …, ε_qtI)′ (I × 1 vector).

Equation (2) may now be written in a compact form for all individuals and choice occasions as:



U_q = x_q b + x_q β̃_q + ε_q , where β̃_q = β_q − b .  (3)

The distribution of ε_q may be expressed as ε_q ~ MVN_TI(0_TI, 0.5·IDEN_TI), and that of U_q may be expressed as U_q ~ MVN_TI(V_q, Σ_q), with V_q = x_q b and Σ_q = x_q Ω x_q′ + 0.5·IDEN_TI.

Let individual q be assumed to choose alternative m_qt at choice occasion t, and let m_q = (m_q1, m_q2, …, m_qT)′. To estimate the model, we need to construct the likelihood that the utility differences (taken with respect to the chosen alternative) at all choice occasions are less than zero. To do so, define M_q as a block-diagonal matrix of size T(I−1) × TI, with each block diagonal being of size (I−1) × I and containing the matrix M_qt. M_qt itself is constructed as an identity matrix of size (I−1) with an extra column of "−1" values inserted as the m_qt-th column. Then the construction of the likelihood expression for individual q (i.e., the joint probability of the sequence of choices m_q made by individual q) is given below:

L_q(θ) = Pr(u_q < 0), where u_q = M_q U_q .  (4)

That is, L_q(θ) = ∫_{u < 0} f_{T(I−1)}(u; B_q, Σ̃_q) du, where f_G(·; B, Σ) is the multivariate normal density function for a G-dimensional normal variate with mean B and covariance matrix Σ, B_q = M_q V_q, and Σ̃_q = M_q Σ_q M_q′. To rewrite L_q(θ) in terms of the standard multivariate normal distribution, define ω_q as the diagonal matrix of standard deviations of Σ̃_q. The vector u*_q = ω_q^{−1}(u_q − B_q) is standard multivariate normally distributed with correlation matrix Σ*_q = ω_q^{−1} Σ̃_q ω_q^{−1}. Equation (4) may now be written as:



L_q(θ) = ∫_{u* < −ω_q^{−1} B_q} φ_G(u*; Σ*_q) du* = Φ_G(−ω_q^{−1} B_q; Σ*_q),  (5)

where φ_G and Φ_G are the standard MVN probability density function and the MVNCD function of dimension G, respectively, and G = T × (I−1).

The dimensionality of the integration in Equation (5) is G = T × (I−1). Therefore, as the number of choice alternatives or the number of choice occasions per individual increases, the likelihood function becomes computationally expensive, or in some cases infeasible, to evaluate at the level of accuracy and smoothness needed for parameter estimation using traditional techniques.
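For concreteness, in the cross-sectional case (one choice occasion, so the dimension is I−1), the probability in Equation (5) can be evaluated directly with a numerical MVNCD routine. The sketch below is our own minimal illustration (the function name and the use of SciPy's MVNCD routine are our choices, not the estimation method of any approach compared in this paper); a useful check is that the probabilities across all I alternatives sum to one.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mnp_choice_probability(V, Sigma, m):
    """Probability that alternative m has the highest utility when
    U ~ MVN(V, Sigma): difference the utilities against alternative m and
    evaluate the resulting (I-1)-dimensional MVNCD, as in Equation (5)."""
    I = len(V)
    # M: (I-1) x I difference matrix; row r gives U_i - U_m for some i != m
    M = np.delete(np.eye(I), m, axis=0)
    M[:, m] = -1.0
    B = M @ V                        # mean of the utility differences
    S = M @ Sigma @ M.T              # covariance of the utility differences
    # P(all differences < 0) = MVNCD of N(0, S) evaluated at -B
    return multivariate_normal(mean=np.zeros(I - 1), cov=S).cdf(-B)
```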

A potential solution to reduce the dimensionality of integration is to use the composite marginal likelihood (CML) approach, where the overall likelihood function is calculated as the product of low dimensional marginal densities (see Bhat, 2014). In the current context, the CML function for an individual q may be written as a product of the pairwise joint probabilities of the individual’s choices over all pairs of choice occasions (Bhat, 2011):



L_CML,q(θ) = ∏_{t=1}^{T−1} ∏_{t′=t+1}^{T} Pr(C_qt = m_qt, C_qt′ = m_qt′),  (6)

where C_qt denotes the choice of individual q at occasion t.

To further develop the CML function above, define B_q,tt′ = Δ_tt′ B_q, Σ̃_q,tt′ = Δ_tt′ Σ̃_q (Δ_tt′)′, and Σ*_q,tt′ = ω_q,tt′^{−1} Σ̃_q,tt′ ω_q,tt′^{−1} (with ω_q,tt′ being the diagonal matrix of standard deviations of Σ̃_q,tt′), where Δ_tt′ is a 2(I−1) × T(I−1) selection matrix with an identity matrix of size (I−1) occupying the first (I−1) rows and the [(t−1)(I−1)+1]-th through [t(I−1)]-th columns, and another identity matrix of size (I−1) occupying the last (I−1) rows and the [(t′−1)(I−1)+1]-th through [t′(I−1)]-th columns. All other elements of Δ_tt′ take a value of zero. Then Pr(C_qt = m_qt, C_qt′ = m_qt′) in Equation (6) may be written as:



Pr(C_qt = m_qt, C_qt′ = m_qt′) = Φ_{2(I−1)}(−ω_q,tt′^{−1} B_q,tt′; Σ*_q,tt′).  (7)

Working with the above CML function reduces the dimensionality of integration from T(I−1) (in the likelihood function of Equation (5)) to 2(I−1), thereby reducing the model estimation time substantially and alleviating the convergence and parameter recovery problems arising from the large-dimensional integrals in the original likelihood function. Of course, if there is only one choice occasion, then the CML expression in Equation (7) collapses to the usual full-information likelihood-based estimation approach.
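The pairwise construction of Equations (6) and (7) can be sketched as follows: for each pair of choice occasions, the corresponding rows of the mean vector and covariance matrix of the stacked utility differences are selected, yielding a 2(I−1)-dimensional MVNCD term. The function below is an illustrative sketch (names are ours, and SciPy's MVNCD routine stands in for whatever evaluator an estimation approach would actually use). When T = 2 there is a single pair, so the CML coincides with the full likelihood, which provides a simple check.

```python
import numpy as np
from itertools import combinations
from scipy.stats import multivariate_normal

def pairwise_log_cml(B, S, T, I):
    """Log composite marginal likelihood for one individual, following
    Equations (6)-(7): B and S are the mean vector and covariance matrix of
    the stacked utility differences (length T*(I-1)), and each pair of
    choice occasions (t, t') contributes a 2(I-1)-dimensional MVNCD term."""
    J = I - 1
    logcml = 0.0
    for t, t2 in combinations(range(T), 2):
        # rows selected by the Delta matrix for occasions t and t'
        idx = np.r_[t * J:(t + 1) * J, t2 * J:(t2 + 1) * J]
        B_p, S_p = B[idx], S[np.ix_(idx, idx)]
        logcml += np.log(multivariate_normal(mean=np.zeros(2 * J),
                                             cov=S_p).cdf(-B_p))
    return logcml
```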





2.1 MNP Estimation Techniques

In this section, we discuss the different approaches evaluated in this study for estimating MNP models.



2.1.1 The Maximum Approximate Composite Marginal Likelihood (MACML) Approach

Bhat (2011) proposed the MACML approach that utilizes a CML estimation procedure (Varin, 2008; Bhat, 2014) combined with an analytic approximation to evaluate the MVN cumulative distribution (MVNCD) function in MNP models. The analytic approximation he used is based on the decomposition of the multivariate integral into a product of conditional probabilities that are approximated analytically (see Solow, 1990; and Joe, 1995, though these earlier studies focused on the evaluation of a single MVNCD function, while Bhat proposed an approach to incorporate the analytic approximation in an estimation setting with multiple MVNCD function evaluations). There are at least two advantages of the MACML approach. First, using an analytic expression for the MVNCD function obviates the need for simulation. This also renders the approximated likelihood surface smooth and well behaved for optimization purposes. Second, the CML estimation technique helps in reducing large dimensional integrals (due to panel or repeated-choice data, or spatial/social interactions) into products of lower dimensional integrals. To conserve space, we do not provide additional details on this approach but refer the reader to Bhat (2011).
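The flavor of the analytic approximation can be conveyed with a short sketch of the underlying Solow (1990)/Joe (1995) idea: the joint probability is decomposed into sequential conditional probabilities, each approximated by a linear projection on the lower-order indicator events, so that only univariate and bivariate normal CDFs are ever evaluated. The code below is our own minimal rendering of that idea, not Bhat's MACML implementation.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def phi2(a, b, rho):
    """Standard bivariate normal CDF with correlation rho."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    return multivariate_normal(mean=np.zeros(2), cov=cov).cdf([a, b])

def mvncd_solow_joe(a, R):
    """Analytic approximation of P(X_1<a_1,...,X_n<a_n), X ~ MVN(0, R),
    by sequential conditioning on the indicator events I_k = 1(X_k < a_k),
    with each conditional probability approximated by a linear projection
    (Solow, 1990; Joe, 1995)."""
    n = len(a)
    p = norm.cdf(a)                          # marginal probabilities E[I_k]
    # covariance matrix of the indicators, from bivariate CDFs
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            C[i, j] = (phi2(a[i], a[j], R[i, j]) - p[i] * p[j]) if i != j \
                      else p[i] * (1.0 - p[i])
    prob = phi2(a[0], a[1], R[0, 1])         # exact joint of the first two
    for k in range(2, n):
        # E[I_k | I_1=...=I_{k-1}=1] approximated by a linear projection
        cond = p[k] + C[k, :k] @ np.linalg.solve(C[:k, :k], 1.0 - p[:k])
        prob *= np.clip(cond, 1e-12, 1.0)
    return prob
```

Note that the approximation is exact for two dimensions and for independent components; its appeal is that it is smooth in the parameters, which is what makes the approximated likelihood surface well behaved.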


2.1.2 The GHK-Halton Simulator

The GHK approach starts by transforming the correlated error differences in an MNP model into linear functions of uncorrelated standard normal deviates using the Cholesky decomposition of the error difference covariance matrix. Doing so recasts the MVNCD as a recursive product of univariate (conditional) cumulative normal distributions. We do not discuss this simulator in any more detail, but refer the reader to Train (2009) for a good exposition of the method. The one difference from the discussion in Train (2009) is that we embed the Halton approach to recursively simulate draws from the truncated regions (as discussed in detail in Bhat et al., 2010) instead of drawing from pseudo-Monte Carlo sequences as in the traditional GHK simulation approach; hence the label GHK-Halton simulator. In addition, as indicated earlier, for panel data settings, we consider both the CML estimation approach (using Equation (7)) and the full information MSL (FIML) approach (using Equation (5)) in conjunction with the GHK-Halton simulator.
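A minimal sketch of the GHK recursion may help fix ideas; here SciPy's scrambled Halton sequence stands in for the Halton procedures discussed in Bhat et al. (2010), and the function name is our own. This is an illustration of the simulator, not the exact implementation used in our experiments.

```python
import numpy as np
from scipy.stats import norm, qmc

def ghk_halton(a, Sigma, n_draws=1000, seed=0):
    """GHK estimate of P(X_1<a_1,...,X_G<a_G) for X ~ MVN(0, Sigma), using
    Halton draws in place of pseudo-random uniforms.  The Cholesky factor
    recasts the MVNCD as a recursive product of univariate (conditional)
    cumulative normals."""
    G = len(a)
    L = np.linalg.cholesky(Sigma)
    u = qmc.Halton(d=G, scramble=True, seed=seed).random(n_draws)  # (R, G)
    prob = np.ones(n_draws)
    eta = np.zeros((n_draws, G))         # recursive truncated-normal draws
    for j in range(G):
        # upper bound for the j-th standardized component given earlier draws
        b = (a[j] - eta[:, :j] @ L[j, :j]) / L[j, j]
        Phi_b = norm.cdf(b)
        prob *= Phi_b                    # conditional probability factor
        # draw eta_j from N(0,1) truncated above at b, via the inverse CDF
        eta[:, j] = norm.ppf(np.clip(u[:, j] * Phi_b, 1e-12, 1 - 1e-12))
    return prob.mean()
```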


2.1.3 The GHK Approach with Sparse Grid Integration (GHK-SGI)

Heiss and Winschel (2008) proposed a multivariate quadrature method using the concept of sparse grid integration (SGI), which has been gaining popularity for the evaluation of multidimensional integrals. SGI-based multivariate quadrature is similar to traditional quadrature, except that the multivariate node points at which the integrand is evaluated are chosen cleverly and sparsely (based on a combination rule due to Smolyak, 1963) to avoid the curse of dimensionality that arises from operating on a full grid of all combinations of nodes in all dimensions. Heiss and Winschel (2008) describe this approach in detail and demonstrate its effectiveness in evaluating multidimensional integrals of up to 20 dimensions for MMNL (not MNP) parameter estimation. In the current paper on MNP parameter estimation, we employ the GHK-SGI approach, in which the SGI nodes and weights are used within the GHK framework instead of draws from pseudo-Monte Carlo or quasi-Monte Carlo sequences as in traditional GHK simulation (see Abay, 2015 for a similar setup for multivariate binary probit models). Further, as indicated earlier, we used the CML estimation approach in conjunction with the GHK-SGI approach.
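The Smolyak construction underlying SGI can be sketched compactly: a weighted combination of small tensor-product rules reproduces the polynomial exactness of a much larger full grid. The sketch below is our own minimal implementation for expectations under a standard normal, using one-dimensional Gauss-Hermite rules in which rule i has i points (Heiss and Winschel use nested rules instead). Note that some of the resulting weights are negative, a feature that becomes relevant when sparse grids are combined with importance sampling.

```python
import numpy as np
from itertools import product
from math import comb

def gauss_hermite_prob(m):
    """m-point Gauss-Hermite rule for the standard normal weight function."""
    x, w = np.polynomial.hermite_e.hermegauss(m)
    return x, w / w.sum()                # normalize weights to sum to 1

def smolyak_normal(dim, level):
    """Nodes and weights of a Smolyak sparse grid for expectations under a
    standard normal in `dim` dimensions, built from 1-D Gauss-Hermite rules
    (rule i uses i points); exact for polynomials of total degree
    2*level - 1."""
    q = level + dim - 1
    nodes, weights = [], []
    # combination formula: multi-indices k with q-dim+1 <= |k| <= q
    for k in product(range(1, q + 1), repeat=dim):
        s = sum(k)
        if not (q - dim + 1 <= s <= q):
            continue
        coef = (-1) ** (q - s) * comb(dim - 1, q - s)
        rules = [gauss_hermite_prob(ki) for ki in k]
        for idx in product(*[range(ki) for ki in k]):
            nodes.append([rules[d][0][idx[d]] for d in range(dim)])
            weights.append(coef * np.prod([rules[d][1][idx[d]]
                                           for d in range(dim)]))
    return np.array(nodes), np.array(weights)
```

For example, in two dimensions at level 3 the rule integrates any polynomial of total degree up to 5 exactly under the standard normal.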


2.1.4 The GHK Simulator with Efficient Importance Sampling and Sparse Grids

The performance of quasi-random (e.g., Halton) sequence-based GHK simulation may be enhanced through the use of efficient importance sampling (EIS), a variance-reduction technique based on the idea that certain draws from a given sequence contribute more toward the approximation of the integral than other draws from the same sequence. If one can sample such 'important' values more frequently, the approximation will be quicker and more accurate. Hence, the key to importance sampling is to choose an auxiliary distribution (also called the importance sampler) that facilitates easy sampling of important draws while reducing the sampling variance (i.e., the distance between the importance sampler and the initial sampler). In this context, Heiss (2010) proposes the use of a normally distributed importance sampler inside the GHK simulator, along with a weighted least squares technique proposed by Richard and Zhang (2007) to minimize the sampling variance. Heiss (2010) provides Monte Carlo evidence that the resulting GHK-EIS-SGI approach offers better (and less computationally intensive) parameter recovery than the simulation-based GHK procedure in the context of panel binary probit models.

A potential problem with the use of sparse grids in conjunction with importance sampling is that a significant percentage of sparse grid nodes may be associated with negative weights. Using negative weights within the weighted least squares technique that minimizes the sampling variance (i.e., minimizing an objective function with negative weights) can give undue influence to the negative-weight nodes, causing convergence problems during parameter estimation. This appears to be the reason why our experiments to estimate MNP models with Heiss's (2010) GHK-EIS-SGI approach were not successful, with most estimation attempts encountering convergence issues. We attempted two ad hoc solutions to address this problem: (1) neglect all the SGI nodes with negative weights during the minimization of the sampling variance, and (2) replace negative SGI weights with their absolute values. Neither of these approaches guaranteed a smooth estimation in the current study. Thus, we dropped the GHK-EIS-SGI approach from further investigation in this paper.





2.1.5 The Bayesian MCMC Approach

Advances in the Bayesian domain have led to efficient MCMC methods for estimating MNP models, particularly those using the data augmentation technique. Bayesian estimation of the MNP model consists of a Markov chain Monte Carlo (MCMC) approximation of the posterior distribution of the model parameters. The basic idea is to augment the parameter space so that simulated realizations of the random utilities are generated. Data augmentation in this case implies that the dependent variable (the utility function) becomes observable, making it possible to use standard Bayesian regression techniques for estimating both the population parameters and the random taste variations. Both McCulloch et al. (2000) and Imai and van Dyk (2005) follow this approach. However, the McCulloch et al. (2000) method incorporates the normalization for utility scale after the estimation is done, while Imai and van Dyk's (2005) method improves on this by imposing the normalization for utility scale explicitly at the beginning of estimation (similar to the frequentist approach). In the current study, we evaluate the performance of the McCulloch et al. (2000) method and leave the Imai and van Dyk (2005) approach for future research.
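The data augmentation idea is easiest to see in the binary probit case (Albert and Chib, 1993), of which the McCulloch et al. (2000) MNP sampler is a multivariate generalization: conditional on the latent utilities the model is a linear regression, and conditional on the parameters the latent utilities are truncated normals. The sketch below is this simpler binary illustration (with a flat prior on the coefficients and the function name our own), not the MNP sampler evaluated in this paper.

```python
import numpy as np
from scipy.stats import truncnorm

def probit_gibbs(X, y, n_iter=600, burn=100, seed=0):
    """Gibbs sampler for binary probit via data augmentation (Albert-Chib):
    alternately draw the latent utilities z | beta, y from truncated normals
    and beta | z from its conjugate normal posterior (flat prior on beta)."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    C = np.linalg.cholesky(XtX_inv)
    beta = np.zeros(k)
    draws = []
    for it in range(n_iter):
        mu = X @ beta
        # latent z_i truncated to (0, inf) if y_i = 1, to (-inf, 0) if y_i = 0
        lo = np.where(y == 1, -mu, -np.inf)
        hi = np.where(y == 1, np.inf, -mu)
        z = mu + truncnorm.rvs(lo, hi, size=n, random_state=rng)
        # conjugate update: beta | z ~ N((X'X)^-1 X'z, (X'X)^-1)
        beta = XtX_inv @ (X.T @ z) + C @ rng.standard_normal(k)
        if it >= burn:
            draws.append(beta.copy())
    return np.array(draws)
```

Because the latent utilities are drawn rather than integrated out, no MVNCD evaluation is needed anywhere in the sampler; this is the appeal of data augmentation for the MNP as well.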

