Semiparametric Estimation of the Random Utility Model with Rank-Ordered Choice Data∗

Jin Yan†
Hong Il Yoo‡

April 15, 2017
Abstract

We propose two semiparametric methods for estimating the random utility model using rank-ordered choice data. The framework is semiparametric in that the utility index function includes finite-dimensional preference parameters but the error terms follow an unspecified distribution. Our methods allow for a flexible form of heteroskedasticity across individuals. When the complete preference rankings of alternatives in a choice set are observed, our methods also allow for flexible patterns of heteroskedasticity and correlated errors across alternatives, as well as a variety of random coefficient distributions; in particular, our methods can accommodate most popular parametric random utility models and any finite mixture of those models. The baseline method we develop is the generalized maximum score (GMS) estimator, which is strongly consistent but follows a non-standard asymptotic distribution. To facilitate statistical inference, we make extra but mild regularity assumptions and develop the smoothed generalized maximum score (SGMS) estimator, which is both strongly consistent and asymptotically normal. Our Monte Carlo experiments show that under a variety of stochastic specifications, the GMS and SGMS estimators perform favorably against popular parametric estimators.

∗We thank Xu Cheng, Liran Einav, Jeremy Fox, Bruce Hansen, Han Hong, Arthur Lewbel, Taisuke Otsu, Joris Pinkse, Jack Porter and seminar participants at the 2015 Tsinghua Econometric Conference, the 12th International Symposium on Econometric Theory and Applications, the 2016 Asian and European Meetings of the Econometric Society, Academia Sinica, Newcastle University, Sun Yat-Sen University and the Chinese University of Hong Kong for valuable comments and discussions. We acknowledge funding support provided by the Hong Kong Research Grants Council General Research Fund 2014/2015 (Project No. 14413214) and six anonymous referee reports for the project proposal. All errors are ours.
†Department of Economics, The Chinese University of Hong Kong. Email: [email protected].
‡Durham University Business School, Durham University. Email: [email protected].
Keywords: Rank-ordered; Random utility; Semiparametric estimation; Smoothing

JEL Classification: C14, C35.
1 Introduction
Rank-ordered choices can be elicited using the same type of survey as multinomial choices, specifically one that presents an individual with a finite set of mutually exclusive alternatives. The two elicitation formats may be distinguished by the amount of information that is available to the econometrician. A multinomial choice reports the individual's choice or most preferred alternative from the set, whereas a rank-ordered choice reports further about the individual's preference ordering, such as her second and third preferences: see for example Hausman and Ruud (1987), Calfee et al. (2001), and Train and Winston (2007). One rank-ordered choice observation provides a similar amount of information as several multinomial choice observations, in the sense that it allows inferring what the individual's choices would have been if her more preferred alternatives were not available. This allows fewer individuals to be interviewed to achieve a given level of statistical precision and, as Scarpa et al. (2011) point out, the resulting logistical advantages could be substantial for many non-market valuation studies which involve a narrowly defined population of interest.

We develop semiparametric methods for the estimation of random utility models using rank-ordered choice data. Despite the wide availability of parametric counterparts, such semiparametric methods remain almost undeveloped to date. The random utility function of interest has a typical structure: it comprises a systematic component or utility index varying with finite-dimensional explanatory variables, and an additive stochastic component or error term. The objective is to estimate preference parameters, referring to the coefficients on the explanatory variables. The methods are semiparametric in that they maintain the usual parametric form of the systematic component but place only non-parametric restrictions on the stochastic component.

The parametric methods are equally well-established for multinomial choice and rank-ordered choice data.
In most cases, an analysis of multinomial choice data involves the maximum (simulated) likelihood estimation of one of four models: multinomial logit (MNL), nested MNL, multinomial probit (MNP) and random coefficient or mixed MNL. Each model assumes a different parametric distribution of the stochastic component, and has its own rank-ordered choice counterpart which shares the same assumption: rank-ordered logit (ROL) of Beggs et al. (1981), nested ROL of Dagsvik and Liu (2009), rank-ordered probit (ROP) of Layton and Levine (2003), and mixed ROL of Layton (2000) and Calfee et al. (2001). Building on Falmagne (1978) and Barberá and Pattanaik (1986), McFadden (1986) provides a technique which can be applied to translate any parametric multinomial choice model into the corresponding rank-ordered choice model.

The literature on the semiparametric methods is more lopsided.
For multinomial choice data, several alternative methods exist, including Manski (1975), Ruud (1986), Lee (1995), Lewbel (2000) and Fox (2007). The special case of binomial choice data has attracted even greater attention, and the respectable menagerie includes Ruud (1983), Manski (1985), Han (1987), Horowitz (1992), Klein and Spady (1993) and Sherman (1993), to name a few. When it comes to rank-ordered choice data, we are aware of only one study that aimed at the semiparametric estimation of the preference parameters, namely Hausman and Ruud (1987). In that study, the weighted M-estimator (WME) of Ruud (1983, 1986) is generalized for use with rank-ordered choice data, whereas the original WME was intended for use with binomial and multinomial choice data. The generalized WME allows the consistent estimation of the ratios of coefficients despite stochastic misspecification, but there are two drawbacks affecting its empirical applicability. As the authors acknowledge, the estimator's consistency is confined to the ratios of the coefficients on continuous explanatory variables, and its asymptotic distribution is unknown outside a special case of Newey (1986).

This paper proposes a pair of new semiparametric methods for rank-ordered choice data. We call them the generalized maximum score (GMS) estimator and the smoothed generalized maximum score (SGMS) estimator, respectively. Both estimators are consistent under more general assumptions concerning explanatory variables than the generalized WME of Hausman and Ruud (1987). Roughly speaking, if one of q explanatory variables is continuous, each estimator allows the consistent estimation of the ratios of all coefficients regardless of whether the other q − 1 variables are continuous or discrete. Moreover, the SGMS estimator is asymptotically normal, meaning that it is amenable to the application of usual Wald-type tests.
The GMS estimator follows a non-standard asymptotic distribution, but it does not require extra smoothness conditions. The GMS estimator generalizes the pairwise maximum score (MS) estimator of Fox (2007), which has been developed for use with multinomial choice data and is a modern extension of the classic MS estimator due to Manski (1975). Suppose that the individual faces J alternatives. A multinomial choice observation allows one to infer the outcomes of J − 1 pairwise comparisons where each pair comprises her actual choice and an unchosen alternative. A rank-ordered choice observation allows one to infer the outcomes of more pairwise comparisons. For example, in case the individual ranks all J alternatives from best to worst, her rank-ordered choice would allow one to learn the outcomes of all possible J(J − 1)/2 pairwise comparisons.
The GMS estimator extends the MS estimator by incorporating such extra information. The key identification condition comprises an intuitively plausible set of inequalities: in a pairwise comparison, if one alternative's systematic utility exceeds the other's, its chance of being ranked better also exceeds the other's. The GMS estimator inherits all attractive properties of the MS estimator, two of which are particularly relevant to empirical applications. First, the GMS estimator allows the econometrician to be agnostic about the form of interpersonal heteroskedasticity or scale heterogeneity (Hensher et al., 1999; Fiebig et al., 2010), referring to variations in the overall scale of utility across individuals.¹ This property is desirable because in most studies, the exact form of interpersonal heteroskedasticity matters only to the extent that its misspecification leads to the inconsistent estimation of the core preference parameters. Second, the GMS estimator is consistent when the data generating process (DGP) comprises an arbitrary mixture of different models, provided that it is consistent for each component model. The empirical evidence from behavioral economics (Harrison and Rutström, 2009; Conte et al., 2011) supports the notion that characterizing observed choices requires more than one model, but the parametric estimation of a mixture model demands exact knowledge of the number and composition of the component models.

In addition, when each individual ranks all alternatives from best to worst, the GMS estimator is substantively more flexible than the MS estimator. As we discuss in detail later, the GMS estimator is consistent for all popular parametric models exhibiting flexible substitution patterns, whereas the MS estimator is not.² The GMS estimator therefore delivers what the empiricist may expect from the use of a semiparametric method, namely the ability to estimate all popular parametric models consistently on top of other types of models. This is an interesting finding because in the parametric framework, the advantage of using rank-ordered choice data instead of multinomial choice data is limited to efficiency gains (Hausman and Ruud, 1987), and a multinomial choice model may be more robust to stochastic misspecification than its rank-ordered choice counterpart (Yan and Yoo, 2014). The efficiency-bias tradeoff does not apply in the semiparametric framework, where the advantage of using rank-ordered choice data also includes robustness to a wider variety of DGPs. We note that in most studies on rank-ordered choices, the complete rankings are elicited as required for this result (Hausman and Ruud, 1987; Calfee et al., 2001; Capparros et al., 2008; Scarpa et al., 2011; Yoo and Doiron, 2013; Oviedo and Yoo, 2016).

¹This property explains a major difference between the GMS estimator and the maximum rank correlation (MRC) estimator of Han (1987) and Sherman (1993). The GMS method utilizes the observed ranking information and does pairwise comparisons of alternatives within each individual, allowing the conditional joint distribution of the error terms to vary across individuals. In comparison, the MRC estimator does pairwise comparisons between individuals and requires the error terms to be independent of the explanatory variables, ruling out the possibility of heteroskedasticity across individuals.
²The difference arises because the complete ranking information allows us to replace the exchangeability assumption (Goeree et al., 2005; Fox, 2007) with a much weaker assumption of zero conditional median.
The SGMS estimator offers the same types of practical benefits as the GMS estimator, and addresses the latter's major drawbacks in return for requiring extra smoothness assumptions. The GMS estimator's rate of convergence is N^{−1/3}, which is slower than the usual rate of N^{−1/2}, and it follows the non-standard asymptotic distribution of Kim and Pollard (1990), which is inconvenient for use with conventional hypothesis tests. These properties are inherited from the MS estimator, and arise because the objective function is a sum of step functions. Horowitz (1992) develops the smoothed maximum score (SMS) estimator for binomial choice data, which replaces the step functions with smoothing functions, and Yan (2012) extends the method to multinomial choice data. Our smoothing technique follows this tradition. We show that the SGMS estimator's convergence rate can be made arbitrarily close to N^{−1/2} under extra smoothness conditions and that its asymptotic distribution is normal, with a covariance matrix which can be consistently estimated.

The remainder of this paper is organized as follows. Section 2 develops the GMS estimator and compares it with popular parametric methods. Section 3 develops the SGMS estimator. Section 4 presents the Monte Carlo evidence on the finite sample properties of the proposed estimators. Section 5 concludes.
2 The Model and the Generalized Maximum Score Estimator

2.1 A Random Utility Framework and Rank-Ordered Choice Data
Consider the standard random utility model. An individual in the population of interest faces a finite collection of alternatives. Let J = {1, . . . , J} denote the set of alternatives and let J ≥ 2 be the number of alternatives contained in J. The utility from choosing alternative j, u_j, is assumed as follows:

u_j = x_j′β + ε_j    ∀ j ∈ J,    (1)

where x_j ≡ (x_{j,1}, . . . , x_{j,q})′ ∈ R^q is an observed q-vector containing the attributes of alternative j and their interactions with the individual's characteristics, β ≡ (β_1, . . . , β_q)′ ∈ R^q is the preference parameter vector of interest, and ε_j is the component of utility that is unobserved to the econometrician. The utility index x_j′β is often called systematic (or deterministic) utility, as opposed to the error term ε_j, which is called unsystematic (or stochastic) utility.

Let X ≡ (x_1, . . . , x_J)′ ∈ R^{J×q} be the matrix of the explanatory variables and ε ≡ (ε_1, . . . , ε_J)′ ∈ R^J be the vector of the error terms. Let r(j, u) denote the latent or potentially unobserved ranking of alternative j based on the vector of underlying alternative-specific utilities u ≡ (u_1, u_2, . . . , u_J)′ ∈ R^J. We shall follow the notational convention that r(j, u) = q when j is the qth best alternative in the choice set J, meaning that a smaller ranking value indicates a more preferred alternative. For instance, suppose that J = 4 and u_3 > u_4 > u_1 > u_2. Then, r(1, u) = 3, r(2, u) = 4, r(3, u) = 1 and r(4, u) = 2. Purely for technical convenience, our notation handles any utility tie by assigning a better ranking to an alternative which happens to have a smaller numeric label. For instance, suppose instead that u_3 > u_4 = u_1 > u_2. Then, r(1, u) = 2 and r(4, u) = 3 since numeric label 1 is smaller than 4.

A more formal definition of the latent ranking that incorporates our notational convention is as follows. Let T(j, u) be the set of alternatives with the same utility as alternative j. A(k, T(j, u)) maps element k ∈ T(j, u) one-to-one onto the integers {0, . . . , |T(j, u)| − 1}, where |T| is the number of alternatives in the set T. For any two alternatives k, l ∈ T(j, u), A(k, T(j, u)) < A(l, T(j, u)) if and only if k < l. For any j ∈ J, define its latent ranking as

r(j, u) ≡ L(j, u) + 1 + A(j, T(j, u))    (2)

where L(j, u) denotes the number of alternatives that yield strictly larger utility than alternative j for the individual. Notice that when there is no utility tie, the last term is irrelevant to the latent ranking value since A(j, T(j, u)) = 0. By definition (2), there is a one-to-one mapping between the set {r(j, u) : j = 1, . . . , J} and the set {1, . . . , J}.
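Definition (2) is straightforward to implement. The sketch below is our own illustration (not part of the paper's formal apparatus) and uses 0-based Python indexes in place of the paper's 1-based alternative labels: L(j, u) is the count of strictly better alternatives, and A(j, T(j, u)) is the count of tied alternatives with a smaller label.

```python
def latent_ranking(u):
    """Latent rankings r(j, u) = L(j, u) + 1 + A(j, T(j, u)) from definition (2).

    L(j, u): number of alternatives with strictly larger utility than j.
    A(j, T(j, u)): among alternatives tied with j, the count with a smaller
    (0-based) label, so ties are broken in favor of smaller labels.
    """
    J = len(u)
    ranks = []
    for j in range(J):
        L = sum(1 for k in range(J) if u[k] > u[j])   # strictly better alternatives
        A = sum(1 for k in range(j) if u[k] == u[j])  # tied alternatives with smaller label
        ranks.append(L + 1 + A)
    return ranks

# The paper's examples with J = 4 (alternatives labeled 1..4 map to indexes 0..3):
print(latent_ranking([1.0, 0.0, 3.0, 2.0]))  # u3 > u4 > u1 > u2 -> [3, 4, 1, 2]
print(latent_ranking([2.0, 0.0, 3.0, 2.0]))  # u3 > u4 = u1 > u2 -> [2, 4, 1, 3]
```

The two printed examples reproduce the rankings derived in the text, including the tie-breaking case.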
Next, let r_j denote the reported or actually observed ranking of alternative j, and r ≡ (r_1, . . . , r_J)′ ∈ N^J be the vector of the reported rankings of all alternatives in J. We shall maintain that the reported ranking r_j coincides with the latent ranking r(j, u) in case the individual reports the complete ranking of alternatives, and is a censored version of the latent ranking in case she reports a partial ranking. To facilitate further discussion, suppose that the individual reports the ranking of her best M alternatives, where 1 ≤ M ≤ J − 1, and leaves that of the other J − M alternatives unspecified. As before, suppose that J = 4 and u_3 > u_4 > u_1 > u_2. In case M = 3, the complete ranking is observed since the individual reports her best, second-best and third-best alternatives, allowing the econometrician to infer that the only remaining alternative is her worst one: r = (r_1, r_2, r_3, r_4) = (3, 4, 1, 2), meaning that each alternative's reported ranking is identical to its latent ranking. In case M = 2, only a partial ranking is observed since the individual reports her best and second-best alternatives, and the econometrician cannot tell whether alternative 1 is preferred to alternative 2: r = (3, 3, 1, 2), so that reported ranking r_2 is no longer the same as latent ranking r(2, u). Finally, in case M = 1, the resulting partial ranking observation is identical to a multinomial choice observation since the individual reports only her best alternative: r = (2, 2, 1, 2).

A more formal definition of the reported ranking that incorporates the above discussion is as follows. Let the random set M (M ⊂ J) denote the set of the best M alternatives for the individual, that is, M ≡ {j : r(j, u) ≤ M}. The reported ranking of alternative j then follows the observation rule

r_j = { r(j, u)  if r(j, u) ≤ M, or equivalently, j ∈ M;
      { M + 1    if r(j, u) > M, or equivalently, j ∈ J \ M.    (3)

When M = J − 1, the complete ranking is observed. When M = 1, the resulting partial ranking is observationally equivalent to a multinomial choice. The intermediate cases of partial rankings, which result when 2 ≤ M < J − 1 and J > 3, are much less common in empirical studies, though not unprecedented.³

³See for example Layton (2000) and Train and Winston (2007), both of which analyze data on the best and second-best alternatives; their data structures are M = 2 and J > 3 according to our notations.
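The observation rule (3) is a simple censoring of the latent ranking: ranks up to M pass through, and everything else is pooled at M + 1. A minimal sketch (our own illustration, reusing the running example's latent ranking):

```python
def reported_ranking(r_latent, M):
    """Observation rule (3): report r(j, u) for the best M alternatives,
    and censor every remaining ranking to the common value M + 1."""
    return [rj if rj <= M else M + 1 for rj in r_latent]

# Running example: J = 4 with latent ranking r = (3, 4, 1, 2).
print(reported_ranking([3, 4, 1, 2], M=3))  # complete ranking:   [3, 4, 1, 2]
print(reported_ranking([3, 4, 1, 2], M=2))  # partial ranking:    [3, 3, 1, 2]
print(reported_ranking([3, 4, 1, 2], M=1))  # multinomial choice: [2, 2, 1, 2]
```

The three calls reproduce the M = 3, M = 2 and M = 1 cases discussed above.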
2.2 The Generalized Maximum Score Estimator
This section establishes the strong consistency of the generalized maximum score (GMS) estimator, the first of the two semiparametric methods that we propose. The GMS estimator is semiparametric in the sense that it allows the econometrician to estimate the preference parameters β consistently, without committing to a specific parametric form of the conditional distribution of the errors given the observed attributes, ε|X. Our first assumption pertains to sampling.

Assumption 1. {(r_n, X_n, ε_n) : n = 1, . . . , N} is a random sample of (r, X, ε), where r_n ≡ (r_{n1}, . . . , r_{nJ})′ ∈ N^J, X_n ≡ (x_{n1}, . . . , x_{nJ})′ ∈ R^{J×q}, and ε_n ≡ (ε_{n1}, . . . , ε_{nJ})′ ∈ R^J. For each individual n = 1, . . . , N, (r_n, X_n) is observed.

Assumption 1 states that we have N observations of (r, X), indexed by n, and that individuals are independently and identically distributed (i.i.d.).⁴ For the latter reason, we drop subscript n to avoid notational clutter except when it is needed for clarification. As usual in discrete choice modeling, identification of the parameters β requires scale normalization since they are unique only up to a scale.⁵ When a parametric form of the conditional distribution of ε|X is specified, identification is almost always achieved by normalizing a scale parameter of that distribution.⁶ But when no parametric form is specified, no scale parameter is available for normalization. In a semiparametric framework, identification is therefore achieved by normalizing β instead.

Subject to the prior knowledge that some element of β is non-zero, we can normalize the magnitude of that element. Economists may agree, for example, that the coefficient on the own price variable is negative. Without loss of generality, we assume that |β_1| = 1. Define β̃ ≡ (β_2, . . . , β_q)′ ∈ R^{q−1} as the vector containing the other q − 1 elements of β. The following assumption imposes a restriction on the parameter space.

Assumption 2. β ∈ B where B ≡ {−1, 1} × B̃ and B̃ is a compact subset of R^{q−1}, where q ≥ 2.
Next, we state Assumption 3, which presents a key identification condition pertaining to the strong consistency of the GMS estimator. This assumption implicitly places a restriction on the conditional distribution of ε|X, albeit a non-parametric restriction which is satisfied by a range of parametric functional forms, some of which we will discuss in the subsequent section. Denote the systematic utility of alternative j as v_j ≡ x_j′β for any alternative j ∈ J.

Assumption 3. For any individual, and for any pair of alternatives j, k ∈ J,

v_j > v_k if and only if P(r_j < r_k | X) > P(r_k < r_j | X).    (4)

⁴Throughout this paper, we use n to denote an individual, and j, k, l to denote alternatives.
⁵Multiplying both β and ε by any positive constant leads to the same rank-ordered choice data.
⁶For instance, in the binomial probit model, the variance of the conditional distribution is assumed to be one.
In words, alternative j generates more systematic utility than alternative k if and only if there is a higher chance that j is preferred to k (r_j < r_k) than the reverse (r_k < r_j), conditional on all explanatory variables. Assumption 3 immediately implies that alternatives j and k have the same systematic utility if and only if the probability that alternative j is ranked above alternative k is the same as the probability that alternative k is ranked above alternative j, i.e., P(r_j < r_k | X) = P(r_j > r_k | X) if and only if v_j = v_k.

Two special types of rank-ordered choice data are worth highlighting. First, when M = 1, the individual reports only her best alternative and we have multinomial choice data. In this case, alternative j is ranked above alternative k (r_j < r_k) if and only if j is ranked as the best alternative (r_j = 1), so we have

P(r_j < r_k | X) = P(r_j = 1 | X).    (5)

When we replace P(r_j < r_k | X) with P(r_j = 1 | X) and P(r_k < r_j | X) with P(r_k = 1 | X) in (4), Assumption 3 becomes the monotonicity property of the choice probabilities (Manski, 1975), i.e., the ranking of the choice probability of an alternative is the same as the ranking of the systematic utility of the alternative for any given individual.⁷ Second, when M = J − 1, the individual ranks all alternatives from best to worst, and we have fully rank-ordered choice data.

⁷See Fox (2007) for a detailed discussion of the sufficient conditions for the monotonicity property of the choice probabilities.
With this complete ranking information, we can compare the utilities between any two alternatives. Without loss of generality, let us focus on a pair of alternatives (j, k) such that j < k. Alternative j is ranked above alternative k if and only if the utility from choosing alternative j is larger than the utility from choosing alternative k,⁸ so we have

P(r_j < r_k | X) = P(u_j ≥ u_k | X)
                = P(ε_k − ε_j ≤ v_j − v_k | X).    (6)

The only if part holds under the definition of the ranking r, and the if part is a direct result of complete ranking. The first equality of (6) may not hold if we only observe a partial ranking, i.e., M < J − 1. This is because the event r_j < r_k naturally implies u_j ≥ u_k, but the event u_j ≥ u_k may not imply r_j < r_k. When neither alternative j nor alternative k belongs to the set M, both of them are observed with the same ranking, M + 1, even if u_j > u_k.

For any pair of alternatives, assume that the conditional distribution function of ε_k − ε_j is strictly increasing. Then the well-known (pairwise) zero conditional median (ZCM) restriction, median(ε_k − ε_j | X) = 0, is a necessary and sufficient condition for Assumption 3 when a complete ranking of all the alternatives is available.⁹ The proof is straightforward. Notice that P(r_j < r_k | X) + P(r_k < r_j | X) = 1 when the choice set is fully rank-ordered. For necessity, Assumption 3 implies that v_j − v_k = 0 holds if and only if P(r_j < r_k | X) = 1/2, or equivalently, P(ε_k − ε_j ≤ v_j − v_k | X) = 1/2 by (6). For sufficiency, the ZCM assumption implies that v_j > v_k if and only if P(r_j < r_k | X) > 1/2 by (6), or equivalently, P(r_j < r_k | X) > P(r_k < r_j | X).
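The sufficiency direction can also be checked by simulation. In the sketch below (our own illustration, with hypothetical values), the error difference ε_k − ε_j is skewed and heteroskedastic across individuals but has median zero; by (6), the simulated frequency of r_j < r_k under complete rankings exceeds 1/2 exactly as Assumption 3 requires when v_j > v_k.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000
v_j, v_k = 0.3, 0.0  # systematic utilities with v_j > v_k (hypothetical values)

# epsilon_k - epsilon_j: median zero but skewed (shifted log-normal), with an
# individual-specific scale (interpersonal heteroskedasticity)
sigma = rng.uniform(0.5, 2.0, size=N)
diff = sigma * (rng.lognormal(0.0, 1.0, size=N) - 1.0)  # median of LN(0,1) is 1, so median(diff) = 0

# By (6), P(r_j < r_k | X) = P(eps_k - eps_j <= v_j - v_k | X) under complete rankings
share_j_better = np.mean(diff <= v_j - v_k)
print(share_j_better > 0.5)  # True: the Assumption 3 inequality holds
```

Note that no parametric knowledge of the error distribution is used beyond the median-zero property, which is the point of the semiparametric framework.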
Next, we describe the intuition of applying Assumption 3 to construct the GMS estimator for β. Let 1(·) be the indicator function that equals one if the event in the parenthesis is true and zero otherwise, and let b ≡ (b_1, b̃′)′ be any vector in the parameter space B. Under Assumption 3, if x_j′β > x_k′β is true, then the event r_j < r_k is more likely to occur than the event r_k < r_j; if x_k′β > x_j′β is true, then the event r_k < r_j is more likely to be true than the event r_j < r_k; and if x_j′β = x_k′β holds, then the event r_j < r_k has the same chance to be true as the event r_k < r_j. Therefore, the expected value of the following match

m_jk(b) = 1(r_j < r_k) · 1(x_j′b > x_k′b) + 1(r_k < r_j) · 1(x_k′b > x_j′b) + 1(r_j < r_k) · 1(x_j′b = x_k′b)
        = 1(r_j < r_k) · 1(x_j′b ≥ x_k′b) + 1(r_k < r_j) · 1(x_k′b > x_j′b)    (7)

should be maximized at the true preference parameter vector β. Define x_{nj}′b as the b-utility index of alternative j for individual n. Applying the analogy principle, we propose a semiparametric estimator, b_N ≡ (b_{N,1}, b̃_N′)′ ∈ B, for β as follows:

b_N ∈ argmax_{b∈B} Q_N(b),    (8)

where

Q_N(b) = N^{−1} Σ_{n=1}^{N} Σ_{1≤j<k≤J} [1(r_{nj} < r_{nk}) · 1(x_{nj}′b ≥ x_{nk}′b) + 1(r_{nk} < r_{nj}) · 1(x_{nk}′b > x_{nj}′b)].    (9)

⁸If j > k, then P(r_j < r_k | X) = P(u_j > u_k | X). This is because we break ties using the function A(·, T(j)), and rank alternative k above alternative j if k < j when k ∈ T(j).
⁹This proof does not apply to partially rank-ordered choice data, e.g., multinomial choice data, because the first equality in (6) does not hold. Goeree et al. (2005) give an example showing that the ZCM assumption is not sufficient for the monotonicity property of the choice probabilities.
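To make the estimator concrete, the sample objective in (9) can be evaluated directly. The sketch below is our own illustration (the data and the coarse grid are hypothetical): it codes Q_N(b) and maximizes it by brute force over b = (b_1, b̃)′ with b_1 ∈ {−1, 1}, mirroring the normalization in Assumption 2.

```python
import itertools
import numpy as np

def gms_objective(b, R, X):
    """Sample objective Q_N(b) from (9). R: (N, J) reported rankings;
    X: (N, J, q) explanatory variables; b: (q,) parameter vector."""
    N, J, _ = X.shape
    V = X @ b  # b-utility indexes, shape (N, J)
    Q = 0.0
    for n in range(N):
        for j, k in itertools.combinations(range(J), 2):
            Q += float(R[n, j] < R[n, k] and V[n, j] >= V[n, k])
            Q += float(R[n, k] < R[n, j] and V[n, k] > V[n, j])
    return Q / N

def gms_grid_search(R, X, grid):
    """Brute-force maximizer over b = (b1, b2)' with |b1| = 1 (Assumption 2)."""
    candidates = [np.array([b1, b2]) for b1 in (-1.0, 1.0) for b2 in grid]
    return max(candidates, key=lambda b: gms_objective(b, R, X))

# Tiny noiseless demo with hypothetical data: J = 3, q = 2, true beta = (1, 0.5)'
X = np.array([[[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
              [[1.0, 1.0], [0.0, 2.0], [2.0, 0.0]]])
beta = np.array([1.0, 0.5])
U = X @ beta                                        # utilities without noise
R = np.argsort(np.argsort(-U, axis=1), axis=1) + 1  # complete rankings, best = 1
print(gms_objective(beta, R, X))  # 3.0 = J(J-1)/2: every pair is matched at the truth
```

In this noiseless demo, Q_N(β) attains its upper bound J(J − 1)/2 = 3, but so does any b̃ in an interval around 0.5: the step-function objective is flat over sets of observationally equivalent parameters, which is why point identification requires a continuous covariate as in Assumption 4 below.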
In the special case of M = 1, i.e., when we have multinomial choice data, the estimator b_N defined by (8) becomes the pairwise maximum score (MS) estimator of Fox (2007). When J = 2, i.e., when we have binomial choice data, the estimator b_N becomes the MS estimator of Manski (1985). For this reason, the estimator b_N will be called the generalized maximum score (GMS) estimator.

When all the explanatory variables are discrete, we can always find another parameter vector in the neighborhood of β which generates the same ranking of utility indexes as the true parameter vector.
To achieve point identification, we need to impose an extra assumption on the explanatory variables; namely, we need a continuous explanatory variable conditional on the other explanatory variables. Next, we define a few notations and state the restrictions on the explanatory variables formally in Assumption 4.

Since only the difference in utilities matters to the observed outcome of random utility maximization, we shall assume x_J = 0 without any loss of generality.¹⁰ Next, let x_{jk} ≡ (x_{jk,1}, . . . , x_{jk,q})′ ∈ R^q denote the difference between the explanatory variable vectors of alternatives j and k, that is, x_{jk} = x_j − x_k. In Assumption 2, we assumed that the first parameter has a nonzero value. For each alternative j ∈ J, partition the vector x_j into (x_{j,1}, x̃_j′)′, where x_{j,1} is the first element of x_j and x̃_j ≡ (x_{j,2}, . . . , x_{j,q})′ ∈ R^{q−1} refers to the remainder. So the first element of x_{jk} is x_{jk,1} = x_{j,1} − x_{k,1}, and its remaining elements are x̃_{jk} ≡ (x_{jk,2}, . . . , x_{jk,q})′ = x̃_j − x̃_k. Denote X̃ ≡ (x̃_1, . . . , x̃_J)′ ∈ R^{J×(q−1)}. Vectors x_{nj}, x_{njk}, and x̃_{njk} are the nth observation of vectors x_j, x_{jk}, and x̃_{jk}, respectively. Matrices X_n and X̃_n are the nth observation of matrices X and X̃, respectively.

Assumption 4. The following statements are true.
(a) For any pair of alternatives j, k ∈ J, g_{jk}(x_{jk,1} | x̃_{jk}) denotes the density function of x_{jk,1} conditional on x̃_{jk}, and g_{jk}(x_{jk,1} | x̃_{jk}) is nonzero everywhere on R for almost every x̃_{jk}.
(b) For any constant vector c ≡ (c_1, . . . , c_q)′ ∈ R^q, Xc = 0 with probability one if and only if c = 0.

Assumption 4 is sufficient to show that any other vector b ∈ B would yield a different value of the probability limit of the objective function Q_N(b) than the true parameter vector β does. Assumption 4(a) rules out the local failure of identification, which is important in the semiparametric setting. Assumption 4(b) is analogous to the full-rank condition for the binomial choice model, which prevents the global failure of identification. The following theorem establishes the strong consistency of the GMS estimator. The Appendix provides the proofs of all theorems stated in the main text.

¹⁰If x_J ≠ 0 initially, one can recode x_j as x_j − x_J for all j ∈ J including j = J.
Theorem 1. Let Assumptions 1-4 hold. The GMS estimator b_N defined in (8) converges almost surely to β, the true preference parameter vector in the data generating process.
2.3 Comparisons with Parametric Methods

From the empiricist's perspective, the question of paramount interest would be how flexible the semiparametric model is in comparison with parametric models that one may consider. Modern desktop computing power makes this question especially relevant. Standard computing resources of today can handle the estimation of models that feature fairly flexible, albeit parametric, error structures.

When applied to data on complete rankings (i.e. M = J − 1), the GMS estimator postulates a semiparametric model which nests all popular parametric models and any finite mixture of those models, provided that the explanatory variables satisfy regularity conditions such as Assumption 4. In most studies on rank-ordered choices, the complete rankings are elicited as required for this result.¹¹ Such a degree of flexibility
is not something to be taken for granted. For instance, the MS estimator (Manski, 1975; Fox, 2007) using multinomial choice data is consistent for a family of parametric models featuring exchangeable errors (e.g. multinomial logit and multinomial probit with equicorrelated errors), but not for those parametric models that feature more flexible error structures (e.g. nested multinomial logit, multinomial probit with a general error covariance matrix, and mixed logit).

This section elaborates on the semiparametric model that the GMS estimator postulates, and its comparisons with popular parametric models. To clarify the notion of interpersonal heteroskedasticity here (and later, unobserved interpersonal heterogeneity), we reinstate individual subscript n. With a slight abuse of notations, an observationally equivalent form of equation (1) may be specified to express the utility that individual n derives from alternative j as

u_{nj} = σ_n × (x_{nj}′β) + ε_{nj} for n = 1, 2, . . . , N and j ∈ J,    (10)

where the new parameter σ_n ∈ R^1_+ captures that portion of the overall scale of utility which varies across individuals.¹² Equivalently, σ_n may also be described as a parameter that is inversely proportional to that portion of the error variance which varies across individuals. Consistent estimation of a parametric model requires the correct specification of both the joint density of the errors ε_n|X_n and the functional form of σ_n. The GMS estimator allows both requirements to be relaxed substantially.

Regardless of the depth of rankings observed (i.e. for every M such that 1 ≤ M ≤ J − 1), the GMS estimator is consistent for the semiparametric model that accommodates any form of interpersonal heteroskedasticity via σ_n. For verification, note that when v_{nj} ≡ x_{nj}′β and v_{nk} ≡ x_{nk}′β satisfy the inequality stated in Assumption 3, so does any positive multiple of this pair, σ_n × v_{nj} and σ_n × v_{nk}. The GMS estimator, therefore, allows the empiricist to be agnostic about the exact functional form of σ_n. This is a desirable property because in most studies, σ_n demands attention only to the extent that it must be correctly specified for the consistent estimation of the preference parameters β.

¹¹See, for example, Hausman and Ruud (1987), Calfee et al. (2001), Capparros et al. (2008), Scarpa et al. (2011), Yoo and Doiron (2013) and Oviedo and Yoo (2016).
¹²Since an affine transformation of utilities does not alter observed behavior, the random utility specification (10) is observationally equivalent to u_{nj} = x_{nj}′β + ε_{nj}/σ_n. The slight abuse of notations refers to the fact that ε_j in equation (1) corresponds to ε_{nj}/σ_n, rather than ε_{nj} alone. Note that the presence of a parameter like σ_n does not affect any of our earlier results because they do not rely on ε_{nj} having a standardized scale.
The remainder of this section assumes the use of complete rankings ($M = J - 1$). This allows the semiparametric model to accommodate any model that satisfies the pairwise zero conditional median (ZCM) restriction, i.e.

$$\operatorname{median}(\varepsilon_{nk} - \varepsilon_{nj} \mid X_n) = 0 \quad \text{for any } j, k \in J, \tag{11}$$

which is then a necessary and sufficient condition for Assumption 3 as long as the distribution of $(\varepsilon_{nk} - \varepsilon_{nj}) \mid X_n$ is a strictly increasing function: see Section 2.2. In comparison, any parametric model involves a much stronger set of restrictions affecting other moments too, since the density of $\varepsilon_n \mid X_n$ is specified in full detail.
The semiparametric model based on equation (11) offers considerable flexibility not only over possible distributions of idiosyncratic errors, but also over possible distributions of random coefficients. To see this latter aspect, note that one may view $\varepsilon_n$ as composite errors comprising individual-specific coefficient heterogeneity $\eta_n$ (that has the same dimension as $\beta$) and purely idiosyncratic errors $\epsilon_n$ (that has the same dimension as $\varepsilon_n$) such that

$$\varepsilon_n \equiv X_n \eta_n + \epsilon_n, \tag{12}$$

where a typical entry in $\varepsilon_n$ is $\varepsilon_{nj} \equiv x_{nj}'\eta_n + \epsilon_{nj}$. Suppose now that the idiosyncratic errors $\epsilon_n$ satisfy the pairwise ZCM restriction, $\operatorname{median}(\epsilon_{nk} - \epsilon_{nj} \mid X_n) = 0$ for any $j, k \in J$, and that the usual random coefficient modeling assumption, $(\eta_n \perp \epsilon_n) \mid X_n$, holds. Then, as long as individual heterogeneity has ZCM, i.e. $\operatorname{median}(\eta_n \mid X_n) = 0$, the composite errors $\varepsilon_n$ satisfy the pairwise ZCM restriction in equation (11) too: differencing two composite errors results in a linear combination of conditionally independent random variables, $(x_{nk} - x_{nj})'\eta_n$ and $(\epsilon_{nk} - \epsilon_{nj})$, each of which has a conditional median of zero.^{13} In comparison, a parametric random coefficient model places more rigid restrictions on the distribution of individual heterogeneity $\eta_n$, because the density of $\eta_n \mid X_n$ needs to be specified in full detail much as that of $\epsilon_n \mid X_n$.
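The ZCM property of the differenced composite errors is easy to check numerically. The sketch below is our own minimal simulation (not part of the paper), assuming standard normal heterogeneity $\eta_n$ and i.i.d. type 1 extreme value idiosyncratic errors, with one arbitrarily fixed attribute pair $(x_j, x_k)$ to mimic conditioning on $X_n$:

```python
import numpy as np

rng = np.random.default_rng(1)
S = 200_000

# Fix one attribute pair (x_j, x_k) to mimic conditioning on X_n;
# the numerical values are arbitrary illustrations.
x_j = np.array([1.0, -0.5])
x_k = np.array([-0.3, 2.0])

eta = rng.normal(size=(S, 2))    # eta_n | X_n ~ N(0, I): conditional median zero
e_j = rng.gumbel(size=S)         # i.i.d. type 1 extreme value draws, so that
e_k = rng.gumbel(size=S)         # e_k - e_j is standard logistic, median zero

# Differenced composite error: (x_k - x_j)' eta_n + (e_k - e_j)
diff = eta @ (x_k - x_j) + (e_k - e_j)

print(np.median(diff))           # close to zero, as the ZCM argument implies
```

Both summands are symmetric around zero and conditionally independent, so the empirical median of `diff` shrinks toward zero as `S` grows.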
It is easy to verify that the semiparametric model accommodates the classic troika of parametric random utility models: logit, nested logit and probit. All three models assume away interpersonal heteroskedasticity by setting $\sigma_n = 1$ for all $n = 1, 2, \ldots, N$, and assume an idiosyncratic error density $\varepsilon_n \mid X_n$ that implies the pairwise ZCM condition. In the case of logit, the idiosyncratic errors are i.i.d. extreme value type 1 over alternatives and, as the celebrated result of McFadden (1974) shows, differencing two errors results in a standard logistic random variable that is symmetric around 0. The nested logit directly generalizes the logit model by specifying the joint density of $\varepsilon_n \mid X_n$ as a generalized extreme value (GEV) distribution. This distribution allows for a positive correlation between $\varepsilon_{nj}$ and $\varepsilon_{nk}$ in case alternatives $j$ and $k$ belong to the same nest, or pre-specified subset of $J$. Differencing two GEV errors still results in a logistic random variable that is symmetric around 0, though it may not have the unit scale. Finally, in its unrestricted form, the probit model generalizes the nested logit model by specifying the multivariate normal density $\varepsilon_n \mid X_n \sim N(0, V_\varepsilon)$ that allows for heteroskedasticity of $\varepsilon_{nj}$ over alternatives $j$, and also for any sign of correlation between $\varepsilon_{nj}$ and $\varepsilon_{nk}$. Differencing two zero-mean multivariate normal variables results in a zero-mean normal variable, which is symmetric around its mean.

Random coefficient, or mixed, logit models have become the workhorse of empirical modeling in the recent decade. The semiparametric model accommodates the most popular variant of mixed logit models, as well as its extensions. In the context of the error decomposition (12), a mixed logit model has idiosyncratic errors $\epsilon_n \mid X_n$ that are i.i.d. extreme value type 1 over alternatives, and incorporates a non-degenerate mixing distribution of random heterogeneity $\eta_n \mid X_n$. While the mixing distribution may take any parametric form, specifying $\eta_n \mid X_n \sim N(0, V_\eta)$ is by far the most popular choice, so much so that the generic name mixed logit is often associated with this normal-mixture logit model. Differencing the normal-mixture logit model's composite errors results in a linear combination of conditionally independent zero-mean normal and standard logistic random variables, which has a conditional median of zero. Fiebig et al. (2010) augment the normal-mixture logit model with a log-normally distributed interpersonal heteroskedasticity parameter $\sigma_n$, and find that the resulting Generalized Multinomial Logit model is capable of capturing the multimodality of preferences. Because the semiparametric model allows for any form of $\sigma_n$, it nests the Generalized Multinomial Logit model too. Greene et al. (2006) extend the normal-mixture model in another direction, by allowing the variance-covariance of the random coefficients, $Var(\eta_n \mid X_n)$, to vary with $X_n$. The semiparametric model nests their heteroskedastic normal-mixture logit model too, since this type of generalization does not affect the conditional median of $\eta_n$.

^{13} The coefficients $\beta$ may be interpreted as the median of population preference parameters, vis-à-vis $\eta_n$ that measure individual-specific deviations around them.
The semiparametric model also accommodates any finite mixture of the aforementioned parametric models, and more generally that of all parametric models satisfying the pairwise ZCM restriction. In other words, it allows for the possibility that the data generating process comprises different parametric models for different individuals.^{14} This flexibility comes from the fact that the GMS estimator does not require the density of $\varepsilon_n \mid X_n$ to be identical across all individuals $n = 1, 2, \ldots, N$, as long as each individual's density satisfies the pairwise ZCM restriction. While the finite mixture of parametric models approach has not been applied to the analysis of multinomial choice or rank-ordered choice data, it has motivated influential studies in the binomial choice analysis of decision making under risk (Harrison and Rutström, 2009; Conte et al., 2011). The findings from that literature unambiguously suggest that postulating only one parametric model for all individuals may be an unduly restrictive assumption.
3 The Smoothed GMS Estimator
The maximum score (MS) type estimator is $N^{1/3}$-consistent, and its asymptotic distribution is studied in Cavanagh (1987) and Kim and Pollard (1990). Kim and Pollard have shown that $N^{1/3}$ times the centered MS estimator converges in distribution to the random variable that maximizes a certain Gaussian process for binomial choice data. Their general theorem can be applied to multinomial choice data and rank-ordered choice data too. However, the resulting asymptotic distribution is too complicated to be used for inference in empirical applications. Abrevaya and Huang (2005) prove that the standard bootstrap is not consistent for the MS estimator. Delgado et al. (2001) show that subsampling consistently estimates the asymptotic distribution of the test statistic of the MS estimator for binomial choice data. But subsampling involves an efficiency loss, and its computational cost is very high for the MS or GMS estimator because a global search method is needed to solve the maximization problem for each subsample. In this section, we propose an estimator that complements the GMS estimator by addressing these practical limitations, in return for making some additional assumptions. In the context of Manski's (1985) binomial choice MS estimator, Horowitz (1992) develops a smoothed maximum score (SMS) estimator that replaces the step functions with smooth functions. Yan (2012) applies this technique to derive a smoothed version of Fox's (2007) multinomial choice MS estimator. We use the same approach to derive a smoothed GMS (SGMS) estimator, which offers similar benefits as its SMS predecessors. Specifically, we show that the SGMS estimator has a convergence rate faster than $N^{-1/3}$ under extra smoothness conditions, and also that it is asymptotically normal.

^{14} For example, the nested logit model may generate 1/3 of the sample while the normal-mixture logit may generate the rest.
3.1 The Smoothed GMS Estimator and its Asymptotic Properties
The objective function in (9) can be rewritten as

$$Q_N(b) = N^{-1} \sum_{n=1}^{N} \sum_{1 \le j < k \le J} \left\{ [1(r_{nj} < r_{nk}) - 1(r_{nk} < r_{nj})] \cdot 1(x_{njk}'b \ge 0) + 1(r_{nk} < r_{nj}) \right\} \tag{13}$$

by replacing $1(x_{nkj}'b > 0)$ with $[1 - 1(x_{njk}'b \ge 0)]$. The indicator function of $b$ in (13) can be replaced by a sufficiently smooth function $K(\cdot)$, where $K(\cdot)$ is analogous to a cumulative distribution function. Let $h_N$ be a positive bandwidth that goes to zero when the sample size $N$ goes to infinity. Application of the smoothing idea in Horowitz (1992) to the right-hand side of (13) yields a smoothed version of the GMS (SGMS) estimator

$$b_N^S \in \operatorname*{argmax}_{b \in B} Q_N^S(b, h_N), \tag{14}$$

where

$$Q_N^S(b, h_N) = N^{-1} \sum_{n=1}^{N} \sum_{1 \le j < k \le J} \left\{ [1(r_{nj} < r_{nk}) - 1(r_{nk} < r_{nj})] \cdot K\!\left(x_{njk}'b/h_N\right) + 1(r_{nk} < r_{nj}) \right\}. \tag{15}$$
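As an illustration, the smoothed objective (15) can be coded directly. The sketch below is our own minimal implementation rather than the authors' code, taking $K$ to be the standard normal CDF (the choice used later in the Monte Carlo section) and interpreting $x_{njk}$ as the attribute difference $x_{nj} - x_{nk}$:

```python
import numpy as np
from math import erf, sqrt

def K(t):
    """Smoothing function: the standard normal CDF satisfies Condition 1."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def sgms_objective(b, X, R, h):
    """Q^S_N(b, h) as in equation (15).

    X : (N, J, q) attributes; R : (N, J) ranks (smaller = better);
    b : (q,) parameter vector; h : bandwidth h_N. Tied ranks contribute
    zero, so partial rankings are handled automatically.
    """
    N, J, _ = X.shape
    total = 0.0
    for n in range(N):
        for j in range(J):
            for k in range(j + 1, J):
                d_jk = float(R[n, j] < R[n, k]) - float(R[n, k] < R[n, j])
                x_njk = X[n, j] - X[n, k]      # attribute difference
                total += d_jk * K(x_njk @ b / h) + float(R[n, k] < R[n, j])
    return total / N
```

Replacing `K` with the indicator `x_njk @ b >= 0` recovers the unsmoothed GMS objective (13).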
The next condition states the requirements that the smoothing function $K(\cdot)$ should satisfy for the SGMS estimator $b_N^S$ to be consistent.

Condition 1. Let $\{h_N : N = 1, 2, \ldots\}$ be a sequence of strictly positive real numbers satisfying $\lim_{N \to \infty} h_N = 0$, and let $K(x)$ be a function on $\mathbb{R}$ such that: (a) $|K(x)| < C$ for some finite $C$ and all $x \in (-\infty, \infty)$; and (b) $\lim_{x \to -\infty} K(x) = 0$ and $\lim_{x \to \infty} K(x) = 1$.

Theorem 2. Let Assumptions 1-4 and Condition 1 hold. The SGMS estimator $b_N^S \in B$ defined in (14) converges almost surely to the true preference parameter vector $\beta$.

By Theorem 2, the consistency of the SGMS estimator holds under the same set of assumptions as the GMS estimator, as long as the smoothing function is properly chosen.
Since any cumulative distribution function (e.g. the standard normal distribution function) satisfies Condition 1, the SGMS estimator does not require more assumptions to achieve consistency than the GMS estimator does. Extra assumptions, however, are required in order to derive the asymptotic distribution of the SGMS estimator. Assume that $K(\cdot)$ is twice differentiable. Next, define the first- and second-order derivatives of $Q_N^S(b, h_N)$ with respect to $\tilde{b}$ as $t_N(b, h_N)$ and $H_N(b, h_N)$, respectively, where the vector

$$t_N(b, h_N) = (N h_N)^{-1} \sum_{n=1}^{N} \sum_{1 \le j < k \le J} [1(r_{nj} < r_{nk}) - 1(r_{nk} < r_{nj})] \cdot K'\!\left(x_{njk}'b/h_N\right) \tilde{x}_{njk} \tag{16}$$

and the matrix

$$H_N(b, h_N) = (N h_N^2)^{-1} \sum_{n=1}^{N} \sum_{1 \le j < k \le J} [1(r_{nj} < r_{nk}) - 1(r_{nk} < r_{nj})] \cdot K''\!\left(x_{njk}'b/h_N\right) \tilde{x}_{njk} \tilde{x}_{njk}'. \tag{17}$$
Let $b_{N,1}^S$ denote the first element of $b_N^S \in B$, and $\tilde{b}_N^S$ denote the vector of the other elements. The objective function (15) of the SGMS estimator $b_N^S$ is a smooth function. To derive the first order condition, we make the following assumption:

Assumption 5. $\tilde{\beta}$ is an interior point of $\tilde{B}$.

By Theorem 2 and Assumption 5, $b_{N,1}^S = \beta_1$, $\tilde{b}_N^S$ is an interior point of $\tilde{B}$, and $t_N(b_N^S, h_N) = 0$ with probability approaching 1 as $N \to \infty$. A Taylor series expansion of $t_N(b_N^S, h_N)$ around the true parameter $\beta$ yields

$$t_N(b_N^S, h_N) = t_N(\beta, h_N) + H_N(b_N^*, h_N)(\tilde{b}_N^S - \tilde{\beta}), \tag{18}$$

where $b_N^* \equiv \{b_{N,1}^*, \tilde{b}_N^*\}$, $b_{N,1}^* = b_{N,1}^S = \beta_1$, and $\tilde{b}_N^*$ is a vector between $\tilde{b}_N^S$ and $\tilde{\beta}$. Suppose there is a function $\rho(N)$ such that $\rho(N) t_N(\beta, h_N)$ converges in distribution and also that $H_N(b_N^*, h_N)$ converges in probability to a nonsingular, nonstochastic matrix $H$. Then,

$$\rho(N)(\tilde{b}_N^S - \tilde{\beta}) = -H^{-1} \rho(N) t_N(\beta, h_N) + o_p(1). \tag{19}$$

It is essential to derive the limiting distribution of $\rho(N) t_N(\beta, h_N)$ and the probability limit of $H_N(b_N^*, h_N)$
by (18) and (19) to obtain the asymptotic distribution of the SGMS estimator. Later, we will show that $\rho(N) t_N(\beta, h_N)$ is asymptotically normal if the bandwidth $h_N$ is properly chosen according to the smoothness conditions imposed on the distributions of the continuous explanatory variable and the error terms. Roughly put, the fastest convergence rate of $\tilde{b}_N^S - \tilde{\beta}$ to 0 is $\rho(N)^{-1} = N^{-d/(2d+1)}$ when the conditional probability of a ranking comparison in (4) is $d$th ($d \ge 2$) order differentiable with respect to the systematic utility and the conditional density of the continuous explanatory variable is $(d-1)$th order differentiable. Therefore, a higher convergence rate (corresponding to a larger $d$) is achieved at the cost of making stronger smoothness assumptions on the distributions of the continuous explanatory variable and the error terms. By properly choosing the bandwidth $h_N \propto N^{-1/(2d+1)}$ and the smoothing function $K(\cdot)$ (according to Condition 2 given below), we can conclude that $\rho(N) t_N(\beta, h_N)$ is asymptotically normal. We require the integer $d$ to be no less than 2. If $d = 1$, the random matrix $H_N(b_N^*, h_N)$ does not converge to a non-stochastic matrix $H$, and has an unknown limiting distribution instead; it follows that the limiting distribution of $\rho(N)(\tilde{b}_N^S - \tilde{\beta})$ is also unknown by (18).

In the binomial choice setting, the SMS estimator is derived from a single latent variable equation, where the conditional choice probability of alternative 1,

$$P(r_1 = 1 \mid x) = P(-\bar{\varepsilon} \le x'\beta \mid x), \tag{20}$$

can be expressed as the conditional distribution of the error term $\bar{\varepsilon}$ given a single vector $x$.^{15} This conditional distribution function plays an important role in expressing the limiting distribution of the properly normalized SMS estimator.

^{15} Equation (20) uses the common notation adopted in binomial choice analysis. To connect with our notation, $x$ should be interpreted as $x_1 - x_2$ and $\bar{\varepsilon}$ should be interpreted as $\varepsilon_1 - \varepsilon_2$.
The SGMS estimator is derived from a model with multiple latent vectors. Outside the special case of complete rankings, calculating the probability of a ranking comparison, e.g. $P(r_1 < r_2 \mid X)$, is even more complicated than calculating a choice probability. Consider an example where the individual only reveals her best and second best alternatives from a set with four alternatives. By the definition of the ranking $r$ in (3), we have

$$\begin{aligned} P(r_1 < r_2 \mid X) &= P(u_1 \ge u_2 \ge \max\{u_3, u_4\} \mid X) \\ &\quad + P(u_1 \ge u_3 > \max\{u_2, u_4\} \mid X) + P(u_1 \ge u_4 > \max\{u_2, u_3\} \mid X) \\ &\quad + P(u_3 > u_1 \ge \max\{u_2, u_4\} \mid X) + P(u_4 > u_1 \ge \max\{u_2, u_3\} \mid X). \end{aligned} \tag{21}$$

Calculating $P(r_1 < r_2 \mid X)$ by (21) using the joint distribution (or density) function of the error terms $\varepsilon$ is not an easy task. Fortunately, it is not needed for deriving the asymptotic distribution of $\rho(N) t_N(\beta, h_N)$.

By (16), the convergence rate of $t_N(\beta, h_N)$ to 0 depends on the product of the kernel function $K'(x_{jk}'\beta/h_N)$ and the difference $P(r_j < r_k \mid X) - P(r_k < r_j \mid X)$, the pairwise difference between ranking comparisons. For each pair of alternatives $(j, k)$, the kernel function $K'(x_{jk}'\beta/h_N)$ approaches 0 as $N$ goes to infinity as long as $x_{jk}'\beta$ is nonzero. The difference $P(r_j < r_k \mid X) - P(r_k < r_j \mid X)$ is 0 if $x_{jk}'\beta$ is 0, by Assumption 3. If the difference is $d$th order differentiable with respect to $x_{jk}'\beta$, we choose a $d$th order kernel $K'(\cdot)$ and an appropriate bandwidth $h_N$. Analogous to the results on kernel density estimation, the SGMS estimator's bias is $O(h_N^d)$, its variance is $O[(N h_N)^{-1}]$, and its fastest convergence rate is $N^{-d/(2d+1)}$.
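The decomposition (21) is straightforward to verify by simulation. The following check is ours, assuming (purely for illustration) i.i.d. standard normal utilities over $J = 4$ alternatives, with only the best and second best alternatives observed:

```python
import numpy as np

rng = np.random.default_rng(0)
S = 10_000
u = rng.normal(size=(S, 4))                 # utilities u_1, ..., u_4

# Observed ranking with M = 2: best gets rank 1, second best rank 2,
# and the two unranked alternatives share the worst rank 3.
order = np.argsort(-u, axis=1)
r = np.full((S, 4), 3)
r[np.arange(S), order[:, 0]] = 1
r[np.arange(S), order[:, 1]] = 2

lhs = r[:, 0] < r[:, 1]                     # the event {r_1 < r_2}

u1, u2, u3, u4 = u.T                        # the five cases in (21)
rhs = ((u1 >= u2) & (u2 >= np.maximum(u3, u4))
       | (u1 >= u3) & (u3 > np.maximum(u2, u4))
       | (u1 >= u4) & (u4 > np.maximum(u2, u3))
       | (u3 > u1) & (u1 >= np.maximum(u2, u4))
       | (u4 > u1) & (u1 >= np.maximum(u2, u3)))

print((lhs == rhs).all())                   # the five cases exhaust {r_1 < r_2}
```

The five events are mutually exclusive (each pins down which alternative is best and which is second best), so their union reproduces $\{r_1 < r_2\}$ draw by draw.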
To facilitate a formal derivation of the asymptotic distribution of the SGMS estimator, we first introduce a series of extra notations. Let $v_j \equiv x_j'\beta$ be the systematic utility of choosing alternative $j$ for fixed $\beta$. Denote $v \equiv (v_1, \ldots, v_{J-1}, v_J)'$. For example, $v_J$ is 0 since $x_J$ is normalized to be 0. There is a one-to-one correspondence between $X$ and $(v, \tilde{X})$. Define $\iota_J \equiv (1, \ldots, 1)' \in \mathbb{R}^J$. For any alternative $j \in J$, let $v_{-j}$ be the vector $v - \iota_J v_j$. For example, when $1 < j < J$,

$$v_{-j} = (v_1 - v_j, \ldots, v_{j-1} - v_j, 0, v_{j+1} - v_j, \ldots, v_J - v_j)'.$$

In words, $v_{-j}$ is computed by subtracting the systematic utility of alternative $j$ from the raw vector of systematic utilities. For any pair of alternatives $j, k \in J$, define $v_{-j,k} = v_k - v_j$ and $\tilde{v}_{-j,k}$ as the vector that consists of all elements of $v_{-j}$ excluding $v_{-j,k}$. For example, when $1 < j < k < J$,

$$\tilde{v}_{-j,k} \equiv (v_1 - v_j, \ldots, v_{k-1} - v_j, v_{k+1} - v_j, \ldots, v_J - v_j)'.$$

If $J > 2$, for any three different alternatives $j, k, l \in J$, define $\tilde{v}_{-j,kl}$ as the vector that consists of all of the elements of $v_{-j}$ excluding $v_{-j,k}$ and $v_{-j,l}$. For example, when $1 < j < k < l < J$,

$$\tilde{v}_{-j,kl} \equiv (v_1 - v_j, \ldots, v_{k-1} - v_j, v_{k+1} - v_j, \ldots, v_{l-1} - v_j, v_{l+1} - v_j, \ldots, v_J - v_j)'.$$

Let $p_{jk}(v_{-j,k} \mid \tilde{v}_{-j,k}, \tilde{X})$ denote the conditional density of $v_{-j,k}$ given $(\tilde{v}_{-j,k}, \tilde{X})$. Define the derivatives

$$p_{jk}^{(i)}(v_{-j,k} \mid \tilde{v}_{-j,k}, \tilde{X}) = \partial^i p_{jk}(v_{-j,k} \mid \tilde{v}_{-j,k}, \tilde{X})/\partial(v_{-j,k})^i$$

and $p_{jk}^{(0)}(v_{-j,k} \mid \tilde{v}_{-j,k}, \tilde{X}) \equiv p_{jk}(v_{-j,k} \mid \tilde{v}_{-j,k}, \tilde{X})$. Let $p_{jkl}(v_{-j,k}, v_{-j,l} \mid \tilde{v}_{-j,kl}, \tilde{X})$ denote the joint density of $(v_{-j,k}, v_{-j,l})$ conditional on $(\tilde{v}_{-j,kl}, \tilde{X})$.

Given any pair of alternatives $j, k \in J$, there is a one-to-one correspondence between $X$ and $(v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X})$ for fixed $\beta \in B$. The conditional probability that alternative $j$ is ranked better than alternative $k$ depends on the explanatory matrix $X$, or equivalently, $(v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X})$. Define

$$F_{jk}(v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X}) \equiv P(r_j < r_k \mid v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X}) \tag{22}$$

and

$$\bar{F}_{jk}(v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X}) \equiv P(r_j < r_k \mid v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X}) - P(r_k < r_j \mid v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X}). \tag{23}$$
Next, for any integer $i > 0$, define the following derivatives:

$$\bar{F}_{jk}^{(i)}(v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X}) \equiv \partial^i \bar{F}_{jk}(v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X})/\partial(v_{-j,k})^i$$

whenever the derivatives exist. Likewise, define the scalar constants $k_d$ and $k_\Omega$ by

$$k_d = \int_{-\infty}^{\infty} x^d K'(x)\,dx \quad \text{and} \quad k_\Omega = \int_{-\infty}^{\infty} [K'(x)]^2\,dx,$$

whenever these quantities exist. Finally, define the $(q-1)$ vector $a$ and the $(q-1) \times (q-1)$ matrices $\Omega$ and $H$ as follows:

$$a = k_d \sum_{1 \le j < k \le J} \sum_{i=1}^{d} \frac{1}{i!(d-i)!}\, E\!\left[\bar{F}_{jk}^{(i)}(0, \tilde{v}_{-j,k}, \tilde{X})\, p_{jk}^{(d-i)}(0 \mid \tilde{v}_{-j,k}, \tilde{X})\, \tilde{x}_{jk}\right], \tag{24}$$

$$\Omega = \sum_{1 \le j < k \le J} 2 k_\Omega\, E\!\left[F_{jk}(0, \tilde{v}_{-j,k}, \tilde{X})\, p_{jk}(0 \mid \tilde{v}_{-j,k}, \tilde{X})\, \tilde{x}_{jk} \tilde{x}_{jk}'\right], \tag{25}$$

and

$$H = \sum_{1 \le j < k \le J} E\!\left[\bar{F}_{jk}^{(1)}(0, \tilde{v}_{-j,k}, \tilde{X})\, p_{jk}(0 \mid \tilde{v}_{-j,k}, \tilde{X})\, \tilde{x}_{jk} \tilde{x}_{jk}'\right] \tag{26}$$
whenever these quantities exist. Now, we turn to the derivation of the asymptotic distribution of the SGMS estimator $b_N^S$. We start off by making the following requirements on the smoothing function $K(\cdot)$, in addition to Condition 1.

Condition 2. The following statements are true.

(a) $K(x)$ is twice differentiable for $x \in \mathbb{R}$, $|K'(x)|$ and $|K''(x)|$ are uniformly bounded, and the integrals $\int_{-\infty}^{\infty} [K'(x)]^4\,dx$, $\int_{-\infty}^{\infty} x^2 |K''(x)|\,dx$, and $\int_{-\infty}^{\infty} [K''(x)]^2\,dx$ are finite.

(b) For some integer $d \ge 2$, $\int_{-\infty}^{\infty} |x^d K'(x)|\,dx < \infty$, $k_d \in (0, \infty)$, and $k_\Omega \in (0, \infty)$. For any integer $i$ ($1 \le i < d$), $\int_{-\infty}^{\infty} |x^i K'(x)|\,dx < \infty$ and $\int_{-\infty}^{\infty} x^i K'(x)\,dx = 0$.

(c) For any integer $i$ ($0 \le i \le d$), any $\eta > 0$, and any sequence $\{h_N\}$ converging to 0,

$$\lim_{N \to \infty} h_N^{i-d} \int_{|h_N x| > \eta} |x^i K'(x)|\,dx = 0 \quad \text{and} \quad \lim_{N \to \infty} h_N^{-1} \int_{|h_N x| > \eta} |K''(x)|\,dx = 0.$$
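For instance, the standard normal CDF used later as our smoothing function satisfies Condition 2(b) with $d = 2$: $K'(x) = \phi(x)$ has a vanishing first moment, $k_2 = \int x^2 \phi(x)\,dx = 1$, and $k_\Omega = \int \phi(x)^2\,dx = 1/(2\sqrt{\pi})$. A quick numerical confirmation (our own check, not part of the paper):

```python
import numpy as np

# Trapezoid-rule check of Condition 2(b) for K = standard normal CDF,
# so K'(x) = phi(x) and d = 2. The grid covers [-12, 12], where the
# neglected tails of phi are numerically negligible.
x = np.linspace(-12.0, 12.0, 1_000_001)
dx = x[1] - x[0]
phi = np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)

def integral(f):
    """Composite trapezoid rule on the grid above."""
    return (f[:-1] + f[1:]).sum() * dx / 2.0

k1 = integral(x * phi)            # i = 1 < d: odd moment of K' must vanish
k2 = integral(x**2 * phi)         # k_d for d = 2: the N(0,1) variance, i.e. 1
k_omega = integral(phi**2)        # k_Omega = 1 / (2 sqrt(pi))

assert abs(k1) < 1e-8
assert abs(k2 - 1.0) < 1e-6
assert abs(k_omega - 1.0 / (2.0 * np.sqrt(np.pi))) < 1e-8
```

A kernel with $d > 2$ would require $\int x^2 K'(x)\,dx = 0$, which forces $K'$ to take negative values somewhere; such higher-order kernels cannot come from a monotone CDF.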
Next, we state extra assumptions which are needed for the derivation, with brief comments on the implications of each assumption.

Assumption 6. For any pair of alternatives $j < k$, and for $v_{-j,k}$ in a neighborhood of zero, $\bar{F}_{jk}^{(i)}(v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X})$ exists, is a continuous function of $v_{-j,k}$, and is bounded by a constant $C$ for almost every $(\tilde{v}_{-j,k}, \tilde{X})$, where $C < \infty$ and $i$ is an integer ($1 \le i \le d$).

By definition (23), the function $\bar{F}_{jk}(\cdot)$ can be derived from the conditional distribution of the error terms. Assumption 6 in essence imposes the differentiability requirement on the conditional distribution function of the vector $\varepsilon$ with respect to systematic utilities.

Assumption 7. The following statements are true.

(a) For any pair of alternatives $j, k \in J$, $p_{jk}^{(i)}(v_{-j,k} \mid \tilde{v}_{-j,k}, \tilde{X})$ exists and is a continuous function of $v_{-j,k}$ satisfying $|p_{jk}^{(i)}(v_{-j,k} \mid \tilde{v}_{-j,k}, \tilde{X})| < C$ for $v_{-j,k}$ in a neighborhood of zero, almost every $(\tilde{v}_{-j,k}, \tilde{X})$, some constant $C < \infty$, and any integer $i$ ($1 \le i \le d-1$). In addition, $|p_{jk}(v_{-j,k} \mid \tilde{v}_{-j,k}, \tilde{X})| < C$ for all $v_{-j,k}$ and almost every $(\tilde{v}_{-j,k}, \tilde{X})$.

(b) For any three different alternatives $j, k, l \in J$, $p_{jkl}(v_{-j,k}, v_{-j,l} \mid \tilde{v}_{-j,kl}, \tilde{X}) < C$ for all $(v_{-j,k}, v_{-j,l})$ and almost every $(\tilde{v}_{-j,kl}, \tilde{X})$.

(c) The components of the matrices $\tilde{X}$, $\mathrm{vec}(\tilde{X})\mathrm{vec}(\tilde{X})'$, and $\mathrm{vec}(\tilde{X})\mathrm{vec}(\tilde{X})'\mathrm{vec}(\tilde{X})\mathrm{vec}(\tilde{X})'$ have finite first absolute moments.

Assumption 7 imposes regularity conditions on the explanatory variables. In addition to the continuity requirement imposed by Assumption 4, Assumption 7 further requires that the conditional probability density function of the first explanatory variable, $x_{jk,1}$, given the other explanatory variables is $(d-1)$th order differentiable.

Assumption 8. $(\log N)/(N h_N^4) \to 0$ as $N \to \infty$.
Assumptions 6-8, together with Condition 2, are analogous to typical assumptions made in kernel density estimation. A higher convergence rate of the SGMS estimator can be achieved using a higher order kernel $K'(\cdot)$ when the required derivatives of $\bar{F}(\cdot)$ and $p(\cdot)$ exist.

Assumption 9. The matrix $H$, defined by (26), is negative definite.

Note that the matrix $H$ is analogous to the Hessian information matrix in the quasi-MLE. The following theorem presents the main results concerning the asymptotic distribution of the SGMS estimator.
Theorem 3. Let Assumptions 1-9 and Conditions 1-2 hold for some integer $d \ge 2$ and let $\{b_N^S\}$ be a sequence of solutions to problem (14).

(a) If $N h_N^{2d+1} \to \infty$ as $N \to \infty$, then $h_N^{-d}(\tilde{b}_N^S - \tilde{\beta})$ converges in probability to $-H^{-1} a$.

(b) If $N h_N^{2d+1}$ has a finite limit $\lambda$ as $N \to \infty$, then $(N h_N)^{1/2}(\tilde{b}_N^S - \tilde{\beta})$ converges in distribution to $MVN\!\left(-\lambda^{1/2} H^{-1} a,\; H^{-1} \Omega H^{-1}\right)$.

(c) Let $h_N = (\lambda/N)^{1/(2d+1)}$ with $0 < \lambda < \infty$; let $W$ be any nonstochastic, positive semidefinite matrix such that $a' H^{-1} W H^{-1} a \ne 0$; let $E_A$ denote the expectation with respect to the asymptotic distribution of $N^{d/(2d+1)}(\tilde{b}_N^S - \tilde{\beta})$; and let $MSE \equiv E_A[(\tilde{b}_N^S - \tilde{\beta})' W (\tilde{b}_N^S - \tilde{\beta})]$. The $MSE$ is minimized by setting

$$\lambda = \lambda^* \equiv \mathrm{trace}\!\left(\Omega H^{-1} W H^{-1}\right) \big/ \left(2d\, a' H^{-1} W H^{-1} a\right), \tag{27}$$

in which case $N^{d/(2d+1)}(\tilde{b}_N^S - \tilde{\beta})$ converges in distribution to $MVN\!\left(-(\lambda^*)^{d/(2d+1)} H^{-1} a,\; (\lambda^*)^{-1/(2d+1)} H^{-1} \Omega H^{-1}\right)$.
By Theorem 3, if $N h_N^{2d+1} \to \infty$ as $N \to \infty$, then $h_N^{-d}/N^{d/(2d+1)} = (N h_N^{2d+1})^{-d/(2d+1)} \to 0$; if $N h_N^{2d+1} \to 0$ as $N \to \infty$, then $(N h_N)^{1/2}/N^{d/(2d+1)} = (N h_N^{2d+1})^{1/(4d+2)} \to 0$. Therefore, Theorem 3 implies that the fastest rate of convergence of the SGMS estimator is $N^{-d/(2d+1)}$. Choosing the bandwidth $h_N = (\lambda/N)^{1/(2d+1)}$ with $\lambda \in (0, \infty)$ achieves this fastest rate of convergence. Theorem 3(c) shows that $\lambda^*$, defined by (27), minimizes the $MSE$ of the asymptotic distribution of $N^{d/(2d+1)}(\tilde{b}_N^S - \tilde{\beta})$.
To make the results of Theorem 3 useful in applications, it is necessary to be able to estimate the parameters in the limiting distribution, $a$, $\Omega$, and $H$, consistently from observations of $(r, X)$. The next theorem shows how this can be done.

Theorem 4. Let Assumptions 1-9 and Conditions 1-2 hold for some integer $d \ge 2$, and let the vector $b_N^S$ be a consistent estimator based on $h_N \propto N^{-1/(2d+1)}$. Let $h_N^* \propto N^{-\delta/(2d+1)}$, where $\delta \in (0, 1)$. Then

(a) $\hat{a}_N \equiv (h_N^*)^{-d} t_N(b_N^S, h_N^*)$ converges in probability to $a$.

(b) For $b \in B$ and $n = 1, \ldots, N$, define

$$t_{Nn}(b, h_N) = \sum_{1 \le j < k \le J} h_N^{-1} [1(r_{nj} < r_{nk}) - 1(r_{nk} < r_{nj})]\, K'\!\left(x_{njk}'b/h_N\right) \tilde{x}_{njk};$$

the matrix

$$\hat{\Omega}_N \equiv (h_N/N) \sum_{n=1}^{N} t_{Nn}(b_N^S, h_N)\, t_{Nn}(b_N^S, h_N)'$$

converges in probability to $\Omega$.

(c) $H_N(b_N^S, h_N)$ converges in probability to $H$.

By Theorem 3(c), the asymptotic bias of $N^{d/(2d+1)}(\tilde{b}_N^S - \tilde{\beta})$ is $-\lambda^{d/(2d+1)} H^{-1} a$ when the bandwidth is $h_N = (\lambda/N)^{1/(2d+1)}$. It follows from Theorem 4 that the bias term $-\lambda^{d/(2d+1)} H^{-1} a$ can be estimated consistently by $-\lambda^{d/(2d+1)} H_N(b_N^S, h_N)^{-1} \hat{a}_N$. Therefore, define

$$\tilde{b}_N^u = \tilde{b}_N^S + (\lambda/N)^{d/(2d+1)} H_N(b_N^S, h_N)^{-1} \hat{a}_N \tag{28}$$

as the bias-corrected SGMS estimator.
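For concreteness, the sample quantities in (16)-(17) and Theorem 4(b) can be assembled as below. This is our own illustrative implementation (again with $K$ the standard normal CDF, so $K'(t) = \phi(t)$ and $K''(t) = -t\,\phi(t)$), not the authors' code; the first component of $x_{njk}$ is the scale-normalized one, so $\tilde{x}_{njk}$ drops it:

```python
import numpy as np

def phi(t):
    """K'(t) when K is the standard normal CDF."""
    return np.exp(-t**2 / 2.0) / np.sqrt(2.0 * np.pi)

def sgms_moments(b, X, R, h):
    """Return t_N of (16), H_N of (17) and Omega_N of Theorem 4(b).

    X : (N, J, q) attributes, R : (N, J) ranks (1 = best), b : (q,).
    """
    N, J, q = X.shape
    tn = np.zeros((N, q - 1))                 # per-individual scores t_Nn
    Hsum = np.zeros((q - 1, q - 1))
    for n in range(N):
        for j in range(J):
            for k in range(j + 1, J):
                d_jk = float(R[n, j] < R[n, k]) - float(R[n, k] < R[n, j])
                xjk = X[n, j] - X[n, k]
                z = xjk @ b / h
                xt = xjk[1:]                             # x_tilde_{njk}
                tn[n] += d_jk * phi(z) * xt / h          # builds t_Nn(b, h)
                Hsum += d_jk * (-z * phi(z)) * np.outer(xt, xt)   # K''(z)
    tN = tn.mean(axis=0)                      # equation (16)
    HN = Hsum / (N * h**2)                    # equation (17)
    OmegaN = (h / N) * tn.T @ tn              # Theorem 4(b)
    return tN, HN, OmegaN
```

Theorem 4(a) then gives `a_hat = h_star**(-d) * tN_star`, where `tN_star` is the first output evaluated at the auxiliary bandwidth `h_star`.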
3.2 A Small-Sample Correction
In this subsection, we apply a method proposed by Horowitz (1992) to remove part of the finite sample bias of $\hat{a}_N$. By Theorem 2, $b_{N,1}^S = \beta_1$ with probability approaching 1 as $N$ goes to infinity. A Taylor expansion of $\hat{a}_N$ around $\tilde{b}_N^S = \tilde{\beta}$ yields

$$\hat{a}_N - a = (h_N^*)^{-d} t_N(\beta, h_N^*) - a + (h_N^*)^{-d} H_N(b_N^*, h_N^*)(\tilde{b}_N^S - \tilde{\beta}) \tag{29}$$

with probability approaching 1 as $N$ goes to infinity, where $b_N^*$ is a vector between $b_N^S$ and $\beta$. The right-hand side of (29) shows that the finite sample bias of $\hat{a}_N$ has two components. The first component, $(h_N^*)^{-d} t_N(\beta, h_N^*) - a$, has a nonzero mean due to the use of a nonzero bandwidth $h_N^*$ in estimating $a$. The second component, $(h_N^*)^{-d} H_N(b_N^*, h_N^*)(\tilde{b}_N^S - \tilde{\beta})$, has a nonzero mean due to the use of an estimate of the true parameter vector $\beta$ in estimating $a$.

The bias correction method described here is aimed at removing the second component of bias, of order $N^{-(1-\delta)d/(2d+1)}$. Note that the second component of the right-hand side of (29) can be written as

$$(h_N^*)^{-d} H_N(b_N^*, h_N^*)(\tilde{b}_N^S - \tilde{\beta}) = \left[N h_N (h_N^*)^{2d}\right]^{-1/2} H_N(b_N^*, h_N^*)\,(N h_N)^{1/2}(\tilde{b}_N^S - \tilde{\beta}).$$

The probability limit of $H_N(b_N^*, h_N^*)$ is $H$ by Lemmas 8-9 of Appendix B, and $(N h_N)^{1/2}(\tilde{b}_N^S - \tilde{\beta})$ converges in distribution to $MVN(-\lambda^{1/2} H^{-1} a, H^{-1} \Omega H^{-1})$ by Theorem 3. Therefore, $H_N(b_N^*, h_N^*)(N h_N)^{1/2}(\tilde{b}_N^S - \tilde{\beta})$ converges in distribution to $MVN(-\lambda^{1/2} a, \Omega)$. By this result, we treat $\hat{a}_N$ as an estimator of $a - [N h_N (h_N^*)^{2d}]^{-1/2} \lambda^{1/2} a$ rather than of $a$. Thus, the bias corrected estimator of $a$ is

$$\hat{a}_N^c = \hat{a}_N \Big/ \left\{1 - \left[\lambda^{-1} N h_N (h_N^*)^{2d}\right]^{-1/2}\right\}. \tag{30}$$

3.3 Bandwidth Selection
Theorem 3(c) provides a way to choose the bandwidth for the SGMS estimator. To achieve the minimum $MSE$, the optimal $\lambda^*$ can be consistently estimated by the conclusion of Theorem 4. Therefore, one possible way of choosing the bandwidth is to set $h_N = (\hat{\lambda}/N)^{1/(2d+1)}$ given the integer $d$, where $\hat{\lambda}$ is a consistent estimator for $\lambda^*$. Specifically, the choice of bandwidth can be implemented by taking the following steps.

Step 1. Given $d$, choose $h_N \propto N^{-1/(2d+1)}$ and $h_N^* \propto N^{-\delta/(2d+1)}$ for $\delta \in (0, 1)$.

Step 2. Compute the SGMS estimator $b_N^S$ using $h_N$. Use $b_N^S$ and $h_N^*$ to compute $\hat{a}_N^c$. Use $b_N^S$ and $h_N$ to compute $\hat{\Omega}_N$ and $H_N(b_N^S, h_N)$.

Step 3. Estimate $\lambda^*$ by

$$\hat{\lambda}_N = \mathrm{trace}\!\left\{\hat{\Omega}_N H_N(b_N^S, h_N)^{-1} H_N(b_N^S, h_N)^{-1}\right\} \cdot \left[2d\, (\hat{a}_N^c)' H_N(b_N^S, h_N)^{-1} H_N(b_N^S, h_N)^{-1}\, \hat{a}_N^c\right]^{-1}. \tag{31}$$

Step 4. Calculate the estimated bandwidth $h_N^e = (\hat{\lambda}_N/N)^{1/(2d+1)}$.

Step 5. Compute the SGMS estimator using $h_N^e$.

Note that this approach is analogous to the plug-in method of kernel density estimation. As usual in applications of the plug-in method, the choice of the initial bandwidth $h_N$ and the parameter $\delta$ would require some exploration, because the estimated bandwidth $h_N^e$ may be sensitive to that choice. In our Monte Carlo experiments in the next section, the bandwidth has been initialized by setting $h_N = N^{-1/5}$ and $\delta = 0.1$.
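Step 3 amounts to evaluating (27) with $W$ set to the identity matrix. A compact sketch of Steps 3-4 (our own illustration, taking $\hat{\Omega}_N$, $H_N$ and $\hat{a}_N^c$ as already computed):

```python
import numpy as np

def lambda_hat(Omega_hat, H_hat, a_hat_c, d):
    """Plug-in estimate of lambda* as in equation (31), i.e. (27) with W = I."""
    Hinv = np.linalg.inv(H_hat)
    num = np.trace(Omega_hat @ Hinv @ Hinv)
    den = 2.0 * d * (a_hat_c @ Hinv @ Hinv @ a_hat_c)
    return num / den

def plug_in_bandwidth(lam, N, d):
    """Step 4: h_N^e = (lambda_hat / N)^(1 / (2d + 1))."""
    return (lam / N) ** (1.0 / (2 * d + 1))
```

Because $H$ enters only through $H^{-1} H^{-1}$, the sign convention of the negative definite $H$ (Assumption 9) does not affect the result.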
4 Monte Carlo Experiments
In this section, we provide Monte Carlo simulation results to explore finite-sample properties of the GMS estimator $b_N$ and the SGMS estimator $b_N^S$. We consider six data generating processes (DGPs). In each DGP, individual $n$'s utility from alternative $j$, $u_{nj}$, is specified as

$$u_{nj} = x_{nj,1} \beta_1 + x_{nj,2} \beta_{n2} + \varepsilon_{nj} \quad \text{for } n = 1, 2, \ldots, N \text{ and } j = 1, 2, \ldots, 5. \tag{32}$$

Each DGP is used to simulate two sets of 1000 random samples of $N$ individuals, where $N = 100$ in the first set and $500$ in the second set.

In all DGPs, the first preference parameter $\beta_1$ is a deterministic coefficient and takes the value of 1 for all individuals: $\beta_1 = 1$. In DGPs 1-4, the second preference parameter $\beta_{n2}$ is also a deterministic coefficient and takes the value of 1 for all individuals: $\beta_{n2} = \beta_2 = 1$ for all $n$. In DGPs 5-6, however, $\beta_{n2}$ is a random coefficient that varies across individuals, and each individual's coefficient value is a draw from $N(1, 1)$: $\beta_{n2} = \beta_2 + \eta_n$, where $\beta_2 = 1$ and $\eta_n$ is a $N(0, 1)$ draw.^{16} Each DGP specifies its own distribution of the error terms $\varepsilon_{nj}$; we provide more details below.^{17}

The econometrician observes a utility-based ranking $r_n$ of $J = 5$ alternatives in $J$, as well as the attributes $x_{nj,1}$ and $x_{nj,2}$.^{18} As usual, the depth of observed rankings would influence the finite sample precision of an estimator; and in the context of our semiparametric estimators, it also influences the degree of flexibility that the semiparametric models offer. Recall that when the complete rankings ($M = J - 1 = 4$) are observed, the semiparametric model nests all popular parametric models as special cases; when only partial rankings ($M < 4$) are available, this is not the case because then the semiparametric model cannot accommodate alternative-specific heteroskedasticity and flexible correlation patterns. We will therefore explore the finite sample behavior of the estimators at three depth levels: $M = 1$ when only the best alternative is observed, $M = 2$ when the best and second best alternatives are observed, and $M = 4$ when the complete ranking is observed.

^{16} In random coefficient models, we are often interested in discovering a certain central tendency of the random preference parameter, such as its mean or its median. The mixed logit estimator will consistently estimate $E(\beta_{n2})$ under correct parametric specifications and the proposed semiparametric estimators can consistently estimate $\operatorname{median}(\beta_{n2})$ under Assumptions 1-4. For simplicity of demonstration, we choose $\beta_{n2} \sim N(1, 1)$ such that $E(\beta_{n2}) = \operatorname{median}(\beta_{n2}) = 1$.

^{17} In all DGPs, we normalize the variance of $\varepsilon_{nj}$ to be $\pi^2/6$, subject to rounding errors.

^{18} Here we use a relatively small choice set mainly because the probit and the mixed logit specifications yield objective functions that require multivariate integration, and consequently a lot of computation time. The computation time of the GMS and SGMS estimators per se is affordable even if the choice set is very large.
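To fix ideas, the sketch below simulates one sample from specification (32) with complete rankings. It is our own illustration rather than the paper's code: the attribute distributions follow the description below, while the i.i.d. type 1 extreme value errors (variance $\pi^2/6$) stand in for a logit-type DGP; the other DGPs swap in different error distributions:

```python
import numpy as np

rng = np.random.default_rng(2017)
N, J = 100, 5
beta1, beta2 = 1.0, 1.0

x1 = rng.normal(0.0, np.sqrt(2.0), size=(N, J))   # x_nj,1 ~ N(0, 2)
q = rng.uniform(0.0, 3.0, size=(N, J))            # q_nj ~ U(0, 3)
z = rng.uniform(0.2, 5.0, size=(N, 1))            # z_n ~ U(1/5, 5), one per individual
x2 = q / z                                        # x_nj,2 = q_nj / z_n

eps = rng.gumbel(size=(N, J))                     # illustrative i.i.d. EV1 errors

u = beta1 * x1 + beta2 * x2 + eps                 # equation (32)

r = np.empty((N, J), dtype=int)                   # complete ranking, rank 1 = best
for n in range(N):
    r[n, np.argsort(-u[n])] = np.arange(1, J + 1)
```

Truncating each row of `r` (keeping ranks up to $M$ and pooling the rest at the worst rank) produces the partial-ranking designs with $M = 1$ or $M = 2$.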
In all DGPs, observed attribute uniform draws: specically,
xnj,1
and
qnj
xnj,1
is a draw from
xnj,2 ≡ qnj /zn
where
N (0, 2) and xnj ,2
qnj
is from
U (0, 3)
vary across both individuals and alternatives, whereas
is generated as a ratio of two dierent and
zn
zn
is from
U ( 51 , 5).19
Note that
varies only across individuals. All
three distributions that generate the observed attributes are independent of one another, and
i.i.d.
across
the subscripted dimension(s). For comparison with our GMS and SGMS estimates, we also compute maximum likelihood estimates using three popular parametric models summarized in Section 2.3, namely rank-ordered logit (ROL), rank-ordered probit (ROP), and mixed ROL (MROL). We do not estimate the nested ROL model, primarily because our analysis already includes the ROP model, which is a more flexible parametric method for incorporating correlated errors. In the case of ROP and MROL, we opt to place no constraint on the variance-covariance parameters of the underlying multivariate normal densities.²⁰ This allows us to compare our semiparametric methods with both restrictive (ROL) and very flexible (ROP and MROL) parametric methods. Our discussion focuses on the coefficient ratio β2/β1, which is identified in both parametric and semiparametric models. In the discrete choice analysis of individual preferences, the main parameter of interest often takes the form of a ratio between coefficients on non-price and price attributes; this type of ratio is known as, inter alia, equivalent prices (Hausman and Ruud, 1987), implicit prices (Calfee et al., 2001) and willingness-to-pay (Small et al., 2005). In parametric models, we normalize the scale of the error terms in the usual manner to identify β1 and β2 separately, and we derive the ratio of the relevant slope coefficient estimates. In semiparametric models, we normalize |β1| = 1 to identify β2 and the sign of β1, and we compute the ratio of interest β2/β1 = β2/sign(β1) using the relevant estimates.²¹

Since the GMS estimator entails maximizing a sum of step functions, we use a global search method to compute the GMS estimates: specifically, the differential evolution algorithm of Storn and Price (1995), which was also Fox's (2007) preferred method for computing his multinomial MS estimates. In this Monte Carlo study, we implement a particular version of the SGMS estimator which uses the standard normal distribution function as the smoothing kernel K(·). The resulting objective function is differentiable, and can be maximized by starting any of the usual gradient-based algorithms from several initial search points. For the SGMS estimator, the bandwidth has been initialized by setting hN = N^(−1/5) and δ = 0.1, and optimized subsequently by applying the method in Section 3.3.²²

¹⁹ This pair of uniform distributions ensures that the second observed attribute has approximately the same variance as the first attribute, i.e. Var(qnj/zn) ≃ 2.
²⁰ Our ROP specification requires estimating two slope coefficients (β1 and β2) and eight identified variance-covariance parameters of pairwise error differences. Our MROL specification assumes that both slope coefficients are random and bivariate normal: we estimate two mean (β1 and β2) and three variance-covariance parameters of the bivariate normal density. The ROP (MROL) model has been estimated in Stata using the command -asroprobit- (-mixlogit-); the likelihood function has been simulated by taking 250 pseudo-random draws from Hammersley (Halton) sequences.
²¹ The estimate of the sign will converge at an extremely fast rate, such that there is no need to analyze its finite-sample property.
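The computational strategy just described can be sketched in a few lines. The following is a minimal illustration, not the authors' code: the two-attribute design, sample size, and variable names are our own assumptions, the scale is normalized by β1 = 1, and scipy's implementation of the Storn-Price differential evolution algorithm stands in for the global search, with the standard normal distribution function as the smoothing kernel.

```python
import numpy as np
from scipy.optimize import differential_evolution, minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
N, J = 200, 4
beta = np.array([1.0, 1.5])                 # true (beta1, beta2), |beta1| = 1
X = rng.uniform(-1, 1, size=(N, J, 2))      # two observed attributes
U = X @ beta + rng.gumbel(size=(N, J))      # ROL-type random utilities
ranks = np.argsort(np.argsort(-U, axis=1), axis=1)  # 0 = most preferred

def gms_score(b2):
    """GMS objective: count pairs (j,k) whose observed order matches sign(x_jk'b)."""
    b = np.array([1.0, b2])                 # scale normalization beta1 = 1
    s = 0.0
    for j in range(J):
        for k in range(j + 1, J):
            v = (X[:, j] - X[:, k]) @ b     # x_jk' b
            s += np.sum(np.where(ranks[:, j] < ranks[:, k], v >= 0, v < 0))
    return s

def sgms_score(b2, h):
    """SGMS objective: the step 1(v >= 0) replaced by the normal cdf K(v/h)."""
    b = np.array([1.0, b2])
    s = 0.0
    for j in range(J):
        for k in range(j + 1, J):
            v = (X[:, j] - X[:, k]) @ b
            s += np.sum(np.where(ranks[:, j] < ranks[:, k],
                                 norm.cdf(v / h), norm.cdf(-v / h)))
    return s

# GMS: the objective is a sum of step functions, so use a global search
res = differential_evolution(lambda t: -gms_score(t[0]), bounds=[(-10, 10)], seed=1)
b2_gms = res.x[0]

# SGMS: smooth and differentiable, so a gradient-based search works
h = N ** (-1 / 5)                           # pilot bandwidth h_N = N^(-1/5)
b2_sgms = minimize(lambda t: -sgms_score(t[0], h), x0=[b2_gms], method="BFGS").x[0]
```

Because the GMS objective is piecewise constant in b2, gradient-based routines stall on its flat regions; that is why a derivative-free global search is used for GMS, while a standard quasi-Newton search from a few starting points suffices for the smoothed objective.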
Table 1 summarizes the true distribution of the error terms in each DGP and whether particular methods can estimate β2/β1 consistently. The summary presents a strong case for the importance of considering semiparametric methods for rank-ordered choice data: the GMS/SGMS estimator using complete rankings is the only method that remains consistent throughout all DGPs. The GMS/SGMS estimator using partial rankings is consistent when the error terms are i.i.d. (DGPs 1-2) or heteroskedastic across individuals (DGP 3), but becomes inconsistent in the presence of alternative-specific heteroskedasticity (DGP 4) and/or random coefficients (DGPs 5-6). As usual, a parametric method is consistent only when the DGP happens to coincide with the postulated parametric model itself or its special cases.

Tables 2-7 report each method's bias and RMSE across 1,000 samples of size N simulated from each DGP. In each table, the top and bottom panels summarize the results for sample sizes N = 100 and N = 500, respectively. Efficiency gains from the use of deeper rankings, alongside the usual play of asymptotics, are apparent from the tables. When a method is consistent for a particular DGP, increasing the depth of rankings M holding the sample size N fixed reduces its bias and RMSE. Increasing the sample size holding the depth of rankings fixed has qualitatively the same effects. The GMS estimator using complete rankings is consistent under all DGPs, and displays negligible finite sample bias in most cases. The associated bias is approximately 6% of the true parameter value in DGPs 1 and 2 when N = 100, and 2% or less in all other DGP and/or sample size configurations. These results illustrate a considerable benefit that the use of deeper rankings offers for semiparametric estimation: the partial rankings GMS estimator is consistent for only the first three DGPs (DGPs 1-3), and even under those DGPs, the estimator exhibits larger bias, which sometimes exceeds 10% of the true value when the sample size N = 100 (though the bias stays below 4% when N = 500). Across all depth levels and sample size configurations, the SGMS estimator behaves similarly to its GMS counterpart but tends to display a small increase in bias
²² When the sample size is small, the parameter λ∗ may be estimated with a large standard error due to the slow convergence rate of the bias estimator λ̂N, sometimes resulting in a very large estimate of the bandwidth. We apply a trimming procedure to avoid this situation. The estimated λ∗ is trimmed at a large constant (1000) for all DGPs.
Table 1: Consistency of estimators by Monte Carlo DGPs

DGP  Distribution of ε_nj                                    ROL  ROP  MROL  (S)GMS
1    ε_nj is i.i.d. EV(0,1,0)                                Yes  No   Yes   Yes
2    ε_nj is i.i.d. N(0.577, π²/6)                           No   Yes  No    Yes
3    ε_nj = 0.0055(z_n⁴ + 2z_n²)η_nj, η_nj i.i.d. N(0,1)     No   No   No    Yes
4    ε_nj = 0.75 x_nj,2 η_nj, η_nj i.i.d. N(0,1)             No   No   No    No when M < 4; Yes when M = 4
5    ε_nj is i.i.d. EV(0,1,0)                                No   No   Yes   No when M < 4; Yes when M = 4
6    ε_nj = 0.75 x_nj,2 η_nj, η_nj i.i.d. N(0,1)             No   No   No    No when M < 4; Yes when M = 4

Note: EV(0,1,0) stands for the extreme value type 1 distribution, assumed by the ROL model, with a mean of 0.577 and a variance of π²/6. Where relevant, the error component η_nj is i.i.d. for n = 1, ..., N and j = 1, ..., J. M = 4 (M < 4) refers to an estimator that incorporates the complete (partial) rankings. Yes (No) means the estimator of β2/β1 is (not) consistent given the DGP.
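The stochastic specifications in Table 1 are straightforward to simulate. As a minimal sketch, the following generates rank-ordered choice data under DGP 4's alternative-specific heteroskedasticity, ε_nj = 0.75 x_nj,2 η_nj with η_nj i.i.d. N(0,1); the uniform attribute design, the coefficient values, and the variable names are our own assumptions rather than the authors' exact design.

```python
import numpy as np

rng = np.random.default_rng(1)
N, J = 500, 4
beta = np.array([1.0, 1.5])                  # assumed true (beta1, beta2)

x = rng.uniform(-1, 1, size=(N, J, 2))       # two observed attributes
eta = rng.standard_normal((N, J))            # eta_nj i.i.d. N(0, 1)
eps = 0.75 * x[:, :, 1] * eta                # DGP 4: error scale depends on x_nj,2
U = x @ beta + eps                           # random utilities
ranking = np.argsort(-U, axis=1)             # ranking[n, 0] = most preferred alternative

M = 2                                        # a partial ranking keeps the M most preferred
partial = ranking[:, :M]
```

Because the error scale varies with the alternative's own attribute x_nj,2, the induced variance-covariance structure differs across alternatives, which is exactly the feature that breaks the partial rankings estimators but not the complete rankings GMS/SGMS estimator.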
Table 2: Monte Carlo results on β̂2/β̂1 of DGP 1
[Bias and RMSE of the ROL, ROP, MROL, GMS and SGMS estimators of β2/β1, by ranking depth M = 1, 2, 4; top panel N = 100, bottom panel N = 500.]
Table 3: Monte Carlo results on β̂2/β̂1 of DGP 2
[Bias and RMSE of the ROL, ROP, MROL, GMS and SGMS estimators of β2/β1, by ranking depth M = 1, 2, 4; top panel N = 100, bottom panel N = 500.]
Table 4: Monte Carlo results on β̂2/β̂1 of DGP 3
[Bias and RMSE of the ROL, ROP, MROL, GMS and SGMS estimators of β2/β1, by ranking depth M = 1, 2, 4; top panel N = 100, bottom panel N = 500.]
Table 5: Monte Carlo results on β̂2/β̂1 of DGP 4
[Bias and RMSE of the ROL, ROP, MROL, GMS and SGMS estimators of β2/β1, by ranking depth M = 1, 2, 4; top panel N = 100, bottom panel N = 500.]
Table 6: Monte Carlo results on β̂2/β̂1 of DGP 5
[Bias and RMSE of the ROL, ROP, MROL, GMS and SGMS estimators of β2/β1, by ranking depth M = 1, 2, 4; top panel N = 100, bottom panel N = 500.]
Table 7: Monte Carlo results on β̂2/β̂1 of DGP 6
[Bias and RMSE of the ROL, ROP, MROL, GMS and SGMS estimators of β2/β1, by ranking depth M = 1, 2, 4; top panel N = 100, bottom panel N = 500.]
and a reduction in RMSE, the expected trade-offs from using a smoothing kernel to construct a surrogate objective function.

For DGPs 1, 2 and 5, at least one parametric method allows consistent maximum likelihood estimation. The results suggest that the efficiency gains (as measured by the reduction in RMSE) that a consistent SGMS estimator offers over a consistent GMS estimator are comparable to what a consistent parametric estimator offers over the SGMS estimator itself. The results pertaining to DGPs 3, 4 and 6 present a particularly strong case for considering the use of the semiparametric methods in empirical practice. While none of the popular parametric methods is consistent under these DGPs, at least one parametric method arguably comes close to getting each DGP approximately right; yet, even in the larger sample configuration (N = 500), an approximately correct parametric method may still exhibit a substantial amount of bias. In the context of DGP 3, for instance, the ROP model is a correct specification apart from its failure to capture interpersonal heteroskedasticity; yet, the ROP method's bias stays in the neighbourhood of 20% of the true parameter value. In DGP 4 and DGP 6, there is alternative-specific heteroskedasticity induced via a normal error component which multiplies the second attribute x_nj,2; this error component can be absorbed into the normal random coefficient on x_nj,2, and the MROL model is therefore a correct specification apart from the fact that it postulates the presence of a redundant extreme value error component. While the MROL method's bias is indeed negligible when only information on the most preferred alternative is used (M = 1), it becomes amplified as deeper ranking information is used and may reach 28% with complete rankings (M = 4).

While our experiments were designed to illustrate the properties of the semiparametric methods, the results also add some cautionary notes to the debate over the reliability of rank-ordered choice data. Based on the intuitively convincing premise that ranking is a more cognitively demanding task than making a choice, some researchers contend that in case a parametric method yields systematically different estimates from the first preference (M = 1) and the complete rankings (M = J − 1), the econometrician should not make use of complete rankings: see Chapman and Staelin (1982) and Ben-Akiva et al. (1992) for the influential and earliest proponents of this view. The results pertaining to DGPs 3-6, however, caution against basing data and model selection on comparisons of the first preference estimates and the complete rankings estimates. Inconsistent parametric methods may or may not be equally biased at all depth levels, and it is not always the case that the first preference estimates are subject to smaller misspecification bias than the
complete rankings estimates.
5    Conclusions
To collect more preference information from a given sample of individuals, multinomial choice surveys can be readily modified to elicit rank-ordered choices. All parametric methods for multinomial choices have rank-ordered choice counterparts that exploit the extra information to estimate the underlying random utility model more efficiently. But semiparametric methods for rank-ordered choices remain undeveloped, apart from the seminal work of Hausman and Ruud (1987), which is only applicable to continuous regressors. We develop two semiparametric methods for rank-ordered choices: the generalized maximum score (GMS) estimator and the smoothed generalized maximum score (SGMS) estimator. The GMS estimator builds on the maximum score (MS) estimator (Manski, 1975; Fox, 2007) for multinomial choices. Like its predecessor, the GMS estimator allows the consistent estimation of coefficients on both continuous and discrete regressors when there is a suitable continuous regressor for normalizing the scale of utility. We establish conditions for the strong consistency of the GMS estimator, which follows a non-standard asymptotic distribution and displays an N^(−1/3) convergence rate. The SGMS estimator complements the GMS estimator, much as Horowitz's (1992) smoothed MS estimator complements Manski's (1985) MS estimator in the context of binomial choices. By adding mild regularity conditions, we show that the SGMS estimator is also strongly consistent, and that it is asymptotically normal with a convergence rate approaching N^(−1/2) as the strength of the smoothing conditions increases. Our results are fairly general and cover data on complete rankings (i.e. a full preference ranking of J alternatives is observed) as well as partial rankings (i.e. up to the Mth most preferred out of J alternatives are observed). Our study finds that rank-ordered choices provide an interesting data environment which can facilitate and benefit from the development of semiparametric methods.
Most interestingly, our results show that using the extra information from rank-ordered choices is not just a matter of efficiency gains, contrary to what parametric analyses might lead one to anticipate. For our semiparametric estimators, it is also a matter of consistency, in the sense that using complete rankings instead of partial rankings allows the semiparametric estimators to become robust to wider classes of stochastic specifications. More specifically, the MS estimator for multinomial choices and the GMS/SGMS estimators for partial rankings are robust to any form of interpersonal heteroskedasticity, but they are not robust to any error variance-covariance structure that varies across alternatives, meaning that they cannot consistently estimate flexible parametric models including nested logit, unrestricted probit and random coefficient logit. By contrast, the GMS/SGMS estimators for complete rankings can accommodate such error structures, fulfilling the usual expectation that a semiparametric method be more flexible than popular parametric methods. The main intuition behind this contrast is that the use of complete rankings allows one to infer which alternative is more preferred in every possible pair of alternatives in a choice set. The strong consistency of the GMS/SGMS estimators for complete rankings can therefore be shown under almost the same assumptions as that of the MS estimator for binomial choices, without invoking the stronger assumptions needed to address the more analytically complex cases of multinomial choices or partial rankings. Together with our Monte Carlo evidence on the bias of parametric methods under misspecification, this finding calls for a reconsideration of the conventional wisdom prevailing in the empirical literature.
Since Chapman and Staelin (1982), several studies have contended that in case the estimates using complete rankings diverge from the estimates using information on the best alternative alone (or other types of partial rankings), one should have more faith in the latter set of estimates and question the reliability of data on deeper preference ranks. But with our semiparametric methods, it is the former set of estimates that is consistent under a wider variety of true models. And with parametric methods, the discrepancy may arise even when the reliability of the data is beyond any doubt, as in simulated samples, because the amount of misspecification bias may vary (non-monotonically) with the depth of rankings used. While the premise that an individual finds it easier to tell her best alternative than, say, her third- or fourth-best alternative is intuitively appealing, testing the validity of the conventional wisdom would require a semiparametric method which offers the same degree of robustness regardless of the depth of rankings used in estimation. In our view, the development of such a method is a promising avenue for future research.
A    Proof of Theorem 1
In Appendix A, we provide the proofs of Theorem 1 and of Lemmas 1-3. Lemma 1 establishes the identification condition. Lemma 2 verifies the continuity property of the probability limit of the GMS estimator's objective function QN(b). Lemma 3 shows the uniform convergence of QN(b) to its probability limit. Throughout, for b ∈ R^q, let

Q∗(b) ≡ E[ Σ_{1≤j<k≤J} { 1(r_j < r_k)·1(x′_jk b ≥ 0) + 1(r_k < r_j)·1(x′_kj b > 0) } ]    (A1)

denote the probability limit of QN(b) in (9).

Lemma 1. Under Assumptions 2-4, the true preference parameter vector β uniquely maximizes Q∗(b) for b ∈ B.
Proof. Applying the law of iterated expectations to the right-hand side of (A1) yields

Q∗(b) = E[ Σ_{1≤j<k≤J} { P(r_j < r_k | X)·1(x′_jk b ≥ 0) + P(r_k < r_j | X)·1(x′_kj b > 0) } ]
      = E[ Σ_{1≤j<k≤J} { [P(r_j < r_k | X) − P(r_k < r_j | X)]·1(x′_jk b ≥ 0) + P(r_k < r_j | X) } ].

It follows from Assumption 3 that β globally maximizes Q∗(b), because the sign of P(r_j < r_k | X) − P(r_k < r_j | X) is the same as the sign of x′_jk β. Next, we show that β is a unique global maximizer of Q∗(b). Consider a different parameter vector β− ∈ B. If, for values of X with positive probability, x′_jk β and x′_jk β− have opposite signs for some pair of alternatives j, k ∈ J, then we can conclude Q∗(β) > Q∗(β−). In other words, if we observe that β and β− yield different rankings of systematic utilities with positive probability, then β− will not maximize Q∗(b). We will show this argument for β1 = 1; the argument for β1 = −1 is similar. If β1− = 1, the set of points where β and β− yield different rankings of systematic utility is

D(β, β−) = {X | x′_jk β < 0 < x′_jk β− for some j, k ∈ J}
         = {X | x̃′_jk β̃ < −x_jk,1 < x̃′_jk β̃− for some j, k ∈ J}.

By Assumption 4(a), the probability of D(β, β−) equals zero if and only if x̃′_jk β̃ = x̃′_jk β̃− with probability one for any pair of alternatives j, k ∈ J, which implies that Xβ = Xβ− with probability one. This contradicts Assumption 4(b). If β1− = −1, the set of points where β and β− give different predictions is

D(β, β−) = {X | x_jk,1 < min(x̃′_jk β̃−, −x̃′_jk β̃) for some j, k ∈ J}.

The probability of D(β, β−) is positive by Assumption 4(a). Thus, we have proved that the true preference parameter vector β uniquely maximizes Q∗(b) for b ∈ B. ∎

Lemma 2. Under Assumptions 2 and 4, Q∗(b) is continuous in b ∈ B.

Proof. For any pair of alternatives j < k, define

Q∗_jk(b) ≡ E{ [1(r_j < r_k) − 1(r_k < r_j)]·1(x′_jk b ≥ 0) + 1(r_k < r_j) }.    (A2)

Assume that b1 = 1; the argument for b1 = −1 is symmetric. By the law of iterated expectations,

Q∗_jk(b) = E{ [P(r_j < r_k | x_jk) − P(r_k < r_j | x_jk)]·1(x′_jk b ≥ 0) + P(r_k < r_j | x_jk) }
         = ∫ { ∫_{−x̃′_jk b̃}^{∞} [P(r_j < r_k | x_jk) − P(r_k < r_j | x_jk)] g_jk(x_jk,1 | x̃_jk) dx_jk,1 } dP(x̃_jk) + P(r_k < r_j),    (A3)

where P(x̃_jk) denotes the cumulative distribution function of x̃_jk. The inner integral in curly brackets on the right-hand side of (A3) is a function of x̃_jk and b that is continuous in b ∈ B. Therefore, Q∗(b) = Σ_{1≤j<k≤J} Q∗_jk(b) is also continuous in b ∈ B. ∎

Lemma 3. Under Assumptions 1-2 and 4, QN(b) converges almost surely to Q∗(b) uniformly over b ∈ B.

Proof. For any pair of alternatives j, k ∈ J, define

QNjk(b) ≡ N^(−1) Σ_{n=1}^{N} { [1(r_nj < r_nk) − 1(r_nk < r_nj)]·1(x′_njk b ≥ 0) + 1(r_nk < r_nj) }.

By definitions (A2), (9) and (A1), we have

QN(b) = Σ_{1≤j<k≤J} QNjk(b),   Q∗_jk(b) = E[QNjk(b)],   and   Q∗(b) = Σ_{1≤j<k≤J} Q∗_jk(b).

By Lemma 4 of Manski (1985), with probability one, lim_{N→∞} sup_{b∈R^q} |QNjk(b) − Q∗_jk(b)| = 0. Because QN(b) is the sum of a finite number of terms QNjk(b), QN(b) converges almost surely to Q∗(b) uniformly over b ∈ B. ∎

Proof.
(THEOREM 1) The proof of strong consistency involves verifying the conditions of Theorem 2.1 in
Newey and McFadden (1994):

(1) Q∗(b) is uniquely maximized at β;
(2) the parameter space B is compact;
(3) Q∗(b) is continuous in b; and
(4) QN(b) converges almost surely to its probability limit Q∗(b) uniformly over b ∈ B.

Conditions (1), (3), and (4) are verified by Lemmas 1, 2, and 3, respectively. Condition (2) is guaranteed by Assumption 2. Therefore, the GMS estimator that maximizes QN(b) converges to β almost surely under Assumptions 1-4. ∎
B    Proof of Theorems 2-4
In Appendix B, we provide the proofs of Theorems 2-4 and of Lemmas 4-9. Lemma 4 establishes the uniform convergence of the SGMS objective function to its probability limit. Lemmas 5-6 establish the asymptotic distribution of the normalized forms of t_N(β, h_N). Lemmas 7-9 justify that H_N(b∗_N, h_N) converges to a nonstochastic matrix in probability. By applying a Taylor series expansion, Lemmas 5-7 can be used to derive the asymptotic distribution of the centered, properly normalized SGMS estimator for the random utility model.

Lemma 4. Under Assumptions 1-4 and Condition 1, Q^S_N(b, h_N) converges almost surely to Q∗(b) uniformly over b ∈ B.

Proof. First, we show that Q^S_N(b, h_N) converges almost surely to QN(b) uniformly over b ∈ B, following the method in Lemma 4 of Horowitz (1992). We calculate

|Q^S_N(b, h_N) − QN(b)| ≤ N^(−1) Σ_{n=1}^{N} Σ_{1≤j<k≤J} |1(x′_njk b > 0) − K(x′_njk b / h_N)|.    (B1)

The right-hand side of (B1) is the sum of c_N1(η) and c_N2(η), where

c_N1(η) ≡ N^(−1) Σ_{n=1}^{N} Σ_{1≤j<k≤J} |1(x′_njk b > 0) − K(x′_njk b / h_N)| · 1(|x′_njk b| ≥ η),

c_N2(η) ≡ N^(−1) Σ_{n=1}^{N} Σ_{1≤j<k≤J} |1(x′_njk b > 0) − K(x′_njk b / h_N)| · 1(|x′_njk b| < η),

and η ∈ R is a positive number. Condition 1(b) implies that for any δ > 0, there exists c > 0 such that |K(x) − 1| < δ·J^(−2) and |K(−x)| < δ·J^(−2) for any x > c. As h_N → 0, there exists N0 ∈ N such that η/h_N > c for any N > N0. Therefore, c_N1(η) < δ for any N > N0. We have shown that for each η > 0, c_N1(η) → 0 uniformly over b ∈ B as N → ∞. Next consider c_N2(η). By Condition 1(a), there is a finite C such that

c_N2(η) ≤ (C/N) Σ_{1≤j<k≤J} [ Σ_{n=1}^{N} 1(|x′_njk b| < η) ].    (B2)

Horowitz (1992) shows that the inner-bracket part of the right-hand side of (B2) converges to 0 uniformly over b ∈ B. Because J is finite, c_N2(η) also converges to 0 uniformly over b ∈ B as N → ∞. The right-hand side of (B1) thus converges to 0 uniformly over b ∈ B. Because

sup_{b∈B} |Q^S_N(b, h_N) − Q∗(b)| ≤ sup_{b∈B} |Q^S_N(b, h_N) − QN(b)| + sup_{b∈B} |QN(b) − Q∗(b)|,    (B3)

and we have proved that the right-hand side of (B3) converges to 0 almost surely, Q^S_N(b, h_N) converges almost surely to Q∗(b) uniformly over b ∈ B. ∎

Proof.
(THEOREM 2) The proof of strong consistency involves verifying the conditions of Theorem 2.1 in Newey and McFadden (1994):

(1) Q∗(b) is uniquely maximized at β;
(2) the parameter space B is compact;
(3) Q∗(b) is continuous in b; and
(4) Q^S_N(b, h_N) converges uniformly almost surely to its probability limit Q∗(b).

Conditions (1), (3), and (4) are verified by Lemmas 1, 2, and 4, respectively. Condition (2) is guaranteed by Assumption 2. Therefore, the SGMS estimator that maximizes Q^S_N(b, h_N) converges to β almost surely under Assumptions 1-4 and Condition 1. ∎

Lemma 5. Let Assumptions 1, 3-4, 6-7 and Conditions 1-2 hold. Then

(a) lim_{N→∞} E[h_N^(−d) t_N(β, h_N)] = a; and
(b) lim_{N→∞} Var[(N h_N)^(1/2) t_N(β, h_N)] = Ω.

Proof.
First, under Assumption 1 we calculate that

E[h_N^(−d) t_N(β, h_N)] = Σ_{1≤j<k≤J} E{ [1(r_j < r_k) − 1(r_k < r_j)] K′(x′_jk β / h_N) x̃_jk h_N^(−d−1) } = Σ_{1≤j<k≤J} d_jk,    (B4)

where

d_jk ≡ E{ [1(r_j < r_k) − 1(r_k < r_j)] K′(x′_jk β / h_N) x̃_jk h_N^(−d−1) }.    (B5)

By the law of iterated expectations,

d_jk = E{ [P(r_j < r_k | v_{−j,k}, ṽ_{−j,k}, X̃) − P(r_k < r_j | v_{−j,k}, ṽ_{−j,k}, X̃)] · K′(−v_{−j,k}/h_N) x̃_jk h_N^(−d−1) }
     = E[ F̄_jk(v_{−j,k}, ṽ_{−j,k}, X̃) · K′(−v_{−j,k}/h_N) x̃_jk h_N^(−d−1) ].    (B6)

By Assumption 3, F̄_jk(0, ṽ_{−j,k}, X̃) = 0. Taylor expansion of F̄_jk(v_{−j,k}, ṽ_{−j,k}, X̃) around v_{−j,k} = 0 yields

F̄_jk(v_{−j,k}, ṽ_{−j,k}, X̃) = Σ_{i=1}^{d−1} (1/i!) F̄_jk^(i)(0, ṽ_{−j,k}, X̃)(v_{−j,k})^i + (1/d!) F̄_jk^(d)(ξ, ṽ_{−j,k}, X̃)(v_{−j,k})^d,    (B7)

where ξ is between 0 and v_{−j,k}. Application of the Taylor series expansion for p_jk(v_{−j,k} | ṽ_{−j,k}, X̃) around v_{−j,k} = 0 yields

p_jk(v_{−j,k} | ṽ_{−j,k}, X̃) = Σ_{c=0}^{d−i−1} (1/c!) p_jk^(c)(0 | ṽ_{−j,k}, X̃)(v_{−j,k})^c + (1/(d−i)!) p_jk^(d−i)(ξ_i | ṽ_{−j,k}, X̃)(v_{−j,k})^(d−i),    (B8)

where ξ_i is between 0 and v_{−j,k}. Combining (B7) and (B8) yields

F̄_jk(v_{−j,k}, ṽ_{−j,k}, X̃) p_jk(v_{−j,k} | ṽ_{−j,k}, X̃)
 = Σ_{i=1}^{d−1} [1/(i!(d−i)!)] F̄_jk^(i)(0, ṽ_{−j,k}, X̃) p_jk^(d−i)(ξ_i | ṽ_{−j,k}, X̃)(v_{−j,k})^d
 + (1/d!) F̄_jk^(d)(ξ, ṽ_{−j,k}, X̃) p_jk(v_{−j,k} | ṽ_{−j,k}, X̃)(v_{−j,k})^d
 + Σ_{i=1}^{d−1} Σ_{c=0}^{d−i−1} [1/(i!c!)] F̄_jk^(i)(0, ṽ_{−j,k}, X̃) p_jk^(c)(0 | ṽ_{−j,k}, X̃)(v_{−j,k})^(i+c),    (B9)

whenever the derivatives exist. Assumptions 6-7 imply that all of the derivatives on the right-hand side of (B9) exist and are uniformly bounded for almost every (ṽ_{−j,k}, X̃) if |v_{−j,k}| ≤ η for some η > 0. Let ζ_jk = −v_{−j,k}/h_N. Decompose d_jk into two parts: d_jk ≡ d_jk1 + d_jk2, where

d_jk1 = h_N^(−d) ∫_{|h_N ζ_jk|>η} F̄_jk(−ζ_jk h_N, ṽ_{−j,k}, X̃) p_jk(−ζ_jk h_N | ṽ_{−j,k}, X̃) x̃_jk K′(ζ_jk) dζ_jk dP(ṽ_{−j,k}, X̃)    (B10)

and

d_jk2 = h_N^(−d) ∫_{|h_N ζ_jk|≤η} F̄_jk(−ζ_jk h_N, ṽ_{−j,k}, X̃) p_jk(−ζ_jk h_N | ṽ_{−j,k}, X̃) x̃_jk K′(ζ_jk) dζ_jk dP(ṽ_{−j,k}, X̃).    (B11)

Under Assumption 7 and Condition 2,

|d_jk1| ≤ C h_N^(−d) ∫_{|h_N ζ_jk|>η} |x̃_jk| · |K′(ζ_jk)| dζ_jk dP(ṽ_{−j,k}, X̃) → 0,

where |d_jk1| denotes the vector of the absolute values of d_jk1. Plugging (B9) into (B11) and using the assumption that K′(·) is a dth order kernel yield the result that, as N → ∞,

d_jk2 → k_d ≡ Σ_{i=1}^{d} [1/(i!(d−i)!)] E[ F̄_jk^(i)(0, ṽ_{−j,k}, X̃) p_jk^(d−i)(0 | ṽ_{−j,k}, X̃) x̃_jk ]    (B12)

by Lebesgue's dominated convergence theorem. Therefore, by (B4) we have proved part (a).

Next consider Var[(N h_N)^(1/2) t_N(β, h_N)]. By Assumption 1,

Var[(N h_N)^(1/2) t_N(β, h_N)] = h_N Var{ Σ_{1≤j<k≤J} [1(r_j < r_k) − 1(r_k < r_j)] K′(x′_jk β / h_N) x̃_jk h_N^(−1) }.

Denote

e_N ≡ Σ_{1≤j<k≤J} [1(r_j < r_k) − 1(r_k < r_j)] K′(x′_jk β / h_N) x̃_jk h_N^(−1),    (B13)

then

Var[(N h_N)^(1/2) t_N(β, h_N)] = h_N E(e_N e′_N) − h_N E(e_N) E(e′_N).    (B14)

In part (a), we show that E[h_N^(−d) e_N] = O(1), so h_N E(e_N) E(e′_N) = o(1). Since the binomial choice setting J = 2 has been discussed in Horowitz (1992), the following discussion focuses on the case where J ≥ 3. Define

h_N E(e_N e′_N) ≡ L_N1 + L_N2,    (B15)

where

L_N1 = Σ_{1≤j<k<l≤J} 2 h_N^(−1) E{ [1(r_j < r_k) − 1(r_k < r_j)][1(r_j < r_l) − 1(r_l < r_j)] K′(x′_jk β/h_N) K′(x′_jl β/h_N) x̃_jk x̃′_jl
 + [1(r_j < r_k) − 1(r_k < r_j)][1(r_k < r_l) − 1(r_l < r_k)] K′(x′_jk β/h_N) K′(x′_kl β/h_N) x̃_jk x̃′_kl
 + [1(r_j < r_l) − 1(r_l < r_j)][1(r_k < r_l) − 1(r_l < r_k)] K′(x′_jl β/h_N) K′(x′_kl β/h_N) x̃_jl x̃′_kl },    (B16)

and

L_N2 = Σ_{1≤j<k≤J} h_N^(−1) E{ [1(r_j < r_k) − 1(r_k < r_j)]² K′(x′_jk β/h_N)² x̃_jk x̃′_jk }.    (B17)

Let ζ_jk = −v_{−j,k}/h_N for any pair of alternatives j, k ∈ J. By the law of iterated expectations,

L_N2 = Σ_{1≤j<k≤J} ∫ [ F_jk(−h_N ζ_jk, ṽ_{−j,k}, X̃) + F_kj(h_N ζ_jk, ṽ_{−j,k} + h_N ζ_jk ι_{J−1}, X̃) ] K′(ζ_jk)² x̃_jk x̃′_jk p_jk(−h_N ζ_jk | ṽ_{−j,k}, X̃) dζ_jk dP(ṽ_{−j,k}, X̃).    (B18)

By Assumptions 3, 6-7, Condition 2, and Lebesgue's dominated convergence theorem, L_N2 → Ω when N → ∞. By Assumption 7,

|L_N1| ≤ Σ_{1≤j<k<l≤J} 2C h_N [ ∫ |K′(ζ_jk) K′(ζ_jl) x̃_jk x̃′_jl| dζ_jk dζ_jl dP(ṽ_{−j,kl}, X̃)
 + ∫ |K′(ζ_jk) K′(ζ_kl) x̃_jk x̃′_kl| dζ_kj dζ_kl dP(ṽ_{−k,jl}, X̃)
 + ∫ |K′(ζ_jl) K′(ζ_kl) x̃_jl x̃′_kl| dζ_lj dζ_lk dP(ṽ_{−l,jk}, X̃) ].    (B19)

Thus, by Assumption 7 and Condition 2, L_N1 → 0 when N → ∞. We have proved part (b). ∎

Lemma 6.
Let Assumptions 1, 3-4, 6-7 and Conditions 1-2 hold. Then
(a) If N h_N^(2d+1) → ∞ as N → ∞, then h_N^(−d) t_N(β, h_N) →p a.
(b) If N h_N^(2d+1) → λ, where λ ∈ (0, ∞), as N → ∞, then (N h_N)^(1/2) t_N(β, h_N) →d MVN(λ^(1/2) a, Ω).

Proof. If N h_N^(2d+1) → ∞ as N → ∞, then

Var[h_N^(−d) t_N(β, h_N)] = (N h_N^(2d+1))^(−1) Var[(N h_N)^(1/2) t_N(β, h_N)] → 0

by Lemma 5(b). Lemma 5(b) together with Chebyshev's Theorem implies Lemma 6(a). Next consider part (b). Define

w_N = (N h_N)^(1/2) { t_N(β, h_N) − E[t_N(β, h_N)] }.

Lemma 5(a) implies that

(N h_N)^(1/2) E[t_N(β, h_N)] = (N h_N^(2d+1))^(1/2) E[h_N^(−d) t_N(β, h_N)] → λ^(1/2) a,

so it suffices to prove that γ′w_N is asymptotically distributed as N(0, γ′Ωγ) for any nonstochastic (q−1)-dimensional vector γ such that γ′γ = 1. Denote

t_N(β, h_N) ≡ N^(−1) Σ_{n=1}^{N} t_Nn,

where

t_Nn ≡ t_Nn(β, h_N) = Σ_{1≤j<k≤J} [1(r_nj < r_nk) − 1(r_nk < r_nj)] K′(x′_njk β / h_N) x̃_njk h_N^(−1).

So we have

γ′w_N = (h_N / N)^(1/2) γ′ Σ_{n=1}^{N} [t_Nn − E(t_Nn)].

Let CF_N(·) denote the characteristic function of γ′w_N. Using the proof of Lemma 6 in Horowitz (1992) yields the result that

lim_{N→∞} CF_N(τ) = exp(−γ′Ωγ τ²/2),

which is the same as the characteristic function of N(0, γ′Ωγ). ∎

Lemma 7.
Let Assumptions 1, 3-4, 6-8 and Conditions 1-2 hold. For any pair of alternatives j, k ∈ J,
˜ , assume that ||˜xjk || ≤ c for some c > 0. Let η be some positive real number such that p(1) v −j,k , X) jk (v−j,k |˜ (1) ˜ , ˜ −j,k , X) F¯jk (v−j,k , v
(2) ˜ exist and are uniformly bounded for almost every (˜ ˜ ˜ −j,k , X) and F¯jk (v−j,k , v v −j,k , X)
if |v−j,k | ≤ η. For θ ∈ Rq−1 , dene t∗N (θ)
=
(N h2N )−1
N X
X
˜ 0njk θ)˜ [1(rnj < rnk ) − 1(rnk < rnj )] K 0 (x0njk β/hN + x xnjk .
n=1 1≤j
Dene the sets ΘN , N = 1, 2, . . ., by ΘN = θ : θ ∈ Rq−1 , hN kθk ≤ η/2c .
51
(a)
Then plim sup kt∗N (θ) − E [t∗N (θ)]k = 0.
(B20)
N →∞ θ∈ΘN
(b)
There are nite numbers α1 and α2 such that for all θ ∈ ΘN kE[t∗N (θ)] − Hθk ≤ o(1) + α1 hN kθk + α2 hN kθk
2
(B21)
uniformly over θ ∈ ΘN . Proof.
Dene
g N n (θ)
X
=
n ˜ 0njk θ)˜ xnjk [1(rnj < rnk ) − 1(rnk < rnj )] K 0 (x0njk β/hN + x (B22)
1≤j
h
−E [1(rnj < rnk ) − 1(rnk < rnj )] K
0
(x0njk β/hN
+
˜ 0njk θ)˜ xnjk x
io
.
The remaining part of the proof of (B20) follows the proof of (A15) in Lemma 7 of Horowitz (1992). Next, we prove (B21). Dene
X
E [t∗N (θ)] ≡
t∗N jk (θ),
1≤j
where
t∗N jk (θ)
Decompose
o n 0 0 ˜ 0jk θ)˜ xjk . = h−2 N E [1(rj < rk ) − 1(rk < rj )] K (xjk β/hN + x h i 0 ˜ ¯ ˜ −j,k , X)K ˜ 0jk θ)˜ = h−2 (−v−j,k /hN + x xjk . N E Fjk (v−j,k , v
t∗N jk (θ)
t∗N jk1 = h−2 N
into two parts:
´ |v−j,k |>η
t∗N jk (θ) ≡ t∗N jk1 + t∗N jk2 ,
where
0 ˜ ˜ −j,k , X)K ˜ 0jk θ) F¯jk (v−j,k , v (−v−j,k /hN + x
˜ ˜ ·˜ xjk pjk (v−j,k |˜ v −j,k , X)dv v −j,k , X) −j,k dP (˜
52
and
t∗N jk2 = h−2 N
´ |v−j,k |≤η
0 ˜ ˜ −j,k , X)K ˜ 0jk θ) F¯jk (v−j,k , v (−v−j,k /hN + x
˜ ˜ ·˜ xjk pjk (v−j,k |˜ v −j,k , X)dv v −j,k , X). −j,k dP (˜ For some nite
C > 0,
by Assumption 7(a) and
ˆ
∗
tN jk1 ≤ Ch−2 N
Let
||˜ xjk || ≤ c,
0 ˜ K (−v−j,k /hN + x ˜ 0jk θ) dv−j,k dP (˜ v −j,k , X).
|v−j,k |>η
˜ 0jk θ . ζjk = −v−j,k /hN + x
Since
hN ||θ|| ≤ η/2c
and
||˜ xjk || ≤ c, |v−j,k | > η
implies that
|ζjk | > | − v−j,k /hN | − |˜ x0jk θ| > η/2hN
and
ˆ
∗
tN jk1 ≤ Ch−1 N
|K 0 (ζjk )| dζjk .
(B23)
|ζjk |>η/2hN
We have
lim sup t∗N jk1 = 0,
(B24)
N →∞θ∈ΘN
because the term on the right-hand side of (B23) converges to 0 by Condition 2. Next, we consider
|v−j,k | ≤ η ,
substitution of
d=2
If
into the right-hand side of (B9) yields
˜ pjk (v−j,k |˜ ˜ F¯jk (v−j,k , −˜ v −j,k , X) v −j,k , X) =
t∗N jk2 .
(1) ˜ pjk (0|˜ ˜ −j,k ˜ −j,k , X) F¯jk (0, v v −j,k , X)v (1) 2 ˜ p(1) (ξ1 |ξ1 , v ˜ ˜ −j,k , X) ˜ −j,k , X)(v +F¯jk (0, v −j,k ) jk (2) 2 ˜ pjk (v−j,k |˜ ˜ ˜ −j,k , X) +(1/2)F¯jk (ξ, v v −j,k , X)(v −j,k ) ,
(B25)
53
where
ξ
and
ξ1
are between 0 and
v−j,k .
Decompose
t∗N jk2
into two parts
t∗N jk2 ≡ sN jk1 + sN jk2 ,
where

$$s_{Njk1} = h_N^{-2}\int_{|v_{-j,k}|\le\eta} \bar{F}^{(1)}_{jk}(0,\tilde{v}_{-j,k},\tilde{X})\, p_{jk}(0|\tilde{v}_{-j,k},\tilde{X})\,\tilde{x}_{jk}\, v_{-j,k}\, K'\!\left(-v_{-j,k}/h_N+\tilde{x}'_{jk}\theta\right) dv_{-j,k}\, dP(\tilde{v}_{-j,k},\tilde{X}),$$

and

$$s_{Njk2} = h_N^{-2}\int_{|v_{-j,k}|\le\eta} \left[\bar{F}^{(1)}_{jk}(0,\tilde{v}_{-j,k},\tilde{X})\, p^{(1)}_{jk}(\xi_1|\tilde{v}_{-j,k},\tilde{X}) + (1/2)\,\bar{F}^{(2)}_{jk}(\xi,\tilde{v}_{-j,k},\tilde{X})\, p_{jk}(v_{-j,k}|\tilde{v}_{-j,k},\tilde{X})\right]\tilde{x}_{jk}\,(v_{-j,k})^2\, K'\!\left(-v_{-j,k}/h_N+\tilde{x}'_{jk}\theta\right) dv_{-j,k}\, dP(\tilde{v}_{-j,k},\tilde{X}).$$

Let $\zeta_{jk} = -v_{-j,k}/h_N + \tilde{x}'_{jk}\theta$; then

$$s_{Njk1} = \int_{|\zeta_{jk}-\tilde{x}'_{jk}\theta|\le\eta/h_N} \bar{F}^{(1)}_{jk}(0,\tilde{v}_{-j,k},\tilde{X})\, p_{jk}(0|\tilde{v}_{-j,k},\tilde{X})\,\tilde{x}_{jk}\,(\zeta_{jk}-\tilde{x}'_{jk}\theta)\, K'(\zeta_{jk})\, d\zeta_{jk}\, dP(\tilde{v}_{-j,k},\tilde{X}).$$

Because $\int \zeta K'(\zeta)\, d\zeta = 0$ and $|\tilde{x}'_{jk}\theta h_N| \le \eta/2$,

$$\left|\int_{|\zeta_{jk}-\tilde{x}'_{jk}\theta|\le\eta/h_N} \zeta_{jk} K'(\zeta_{jk})\, d\zeta_{jk}\right| = \left|\int_{|\zeta_{jk}-\tilde{x}'_{jk}\theta|>\eta/h_N} \zeta_{jk} K'(\zeta_{jk})\, d\zeta_{jk}\right| \le \int_{|\zeta_{jk}|>\eta/2h_N} \left|\zeta_{jk} K'(\zeta_{jk})\right| d\zeta_{jk}. \tag{B26}$$

By Condition 2, the right-hand term of (B26) is bounded uniformly over $\theta\in\Theta_N$ and it converges to 0. Therefore, by Lebesgue's dominated convergence theorem,

$$\lim_{N\to\infty}\sup_{\theta\in\Theta_N}\left|\int_{|\zeta_{jk}-\tilde{x}'_{jk}\theta|\le\eta/h_N} \bar{F}^{(1)}_{jk}(0,\tilde{v}_{-j,k},\tilde{X})\, p_{jk}(0|\tilde{v}_{-j,k},\tilde{X})\,\tilde{x}_{jk}\,\zeta_{jk}\, K'(\zeta_{jk})\, d\zeta_{jk}\, dP(\tilde{v}_{-j,k},\tilde{X})\right| = 0. \tag{B27}$$
In addition,

$$\left\|\tilde{x}'_{jk}\theta \int_{|\zeta_{jk}-\tilde{x}'_{jk}\theta|\le\eta/h_N} K'(\zeta_{jk})\, d\zeta_{jk} - \tilde{x}'_{jk}\theta\right\| \le \left|\tilde{x}'_{jk}\theta h_N\right| h_N^{-1}\int_{|\zeta_{jk}-\tilde{x}'_{jk}\theta|>\eta/h_N} \left|K'(\zeta_{jk})\right| d\zeta_{jk} \le (\eta/2)\, h_N^{-1}\int_{|\zeta_{jk}-\tilde{x}'_{jk}\theta|>\eta/h_N} \left|K'(\zeta_{jk})\right| d\zeta_{jk}. \tag{B28}$$

By Condition 2, the right-hand side of (B28) is bounded uniformly over $\theta\in\Theta_N$ and it converges to 0. Next, by Lebesgue's dominated convergence theorem and the definition of $H$,

$$\lim_{N\to\infty}\sup_{\theta\in\Theta_N}\left\|\sum_{1\le j<k\le J}\int_{|\zeta_{jk}-\tilde{x}'_{jk}\theta|\le\eta/h_N} \bar{F}^{(1)}_{jk}(0,\tilde{v}_{-j,k},\tilde{X})\, p_{jk}(0|\tilde{v}_{-j,k},\tilde{X})\,\tilde{x}_{jk}\tilde{x}'_{jk}\theta\, K'(\zeta_{jk})\, d\zeta_{jk}\, dP(\tilde{v}_{-j,k},\tilde{X}) - H\theta\right\| = 0. \tag{B29}$$

For some finite $C > 0$,

$$\|s_{Njk2}\| \le C h_N \int_{|\zeta_{jk}-\tilde{x}'_{jk}\theta|\le\eta/h_N} (\zeta_{jk}-\tilde{x}'_{jk}\theta)^2\, \left|K'(\zeta_{jk})\right| d\zeta_{jk}\, dP(\tilde{v}_{-j,k},\tilde{X}) \le o(1) + \alpha_{jk1}\, h_N \|\theta\| + \alpha_{jk2}\, h_N \|\theta\|^2 \tag{B30}$$

for some finite $\alpha_{jk1}$ and $\alpha_{jk2}$. So part (b) is established by combining (B24), (B27), (B29), and (B30).

Lemma 8.
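The kernel facts used in (B23)-(B28) — $\int K'(\zeta)\,d\zeta = 1$, $\int \zeta K'(\zeta)\,d\zeta = 0$, and vanishing tail integrals of $K'$ — are easy to verify numerically for a concrete smooth choice of $K$. A minimal sketch, assuming the logistic CDF $K(\zeta) = 1/(1+e^{-\zeta})$ (an illustrative choice only; the kernel in the paper is restricted by Conditions 1-2 and is not pinned down in this appendix):

```python
import numpy as np

def trapezoid(y, x):
    """Plain trapezoidal rule (avoids NumPy-version issues with np.trapz)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

# Logistic-CDF kernel: K(z) = 1/(1 + exp(-z)), so K'(z) is the logistic density.
z = np.linspace(-60.0, 60.0, 1_200_001)
e = np.exp(-np.abs(z))            # numerically stable: K'(z) = e^{-|z|}/(1+e^{-|z|})^2
Kp = e / (1.0 + e) ** 2

total = trapezoid(Kp, z)          # integral of K' over the real line: equals 1
first_moment = trapezoid(z * Kp, z)  # integral of z*K'(z): equals 0 by symmetry

def tail(cutoff):
    """Integral of |K'| outside [-cutoff, cutoff]; the (B23)-type tail term."""
    inner = np.abs(z) <= cutoff
    return total - trapezoid(Kp[inner], z[inner])

assert abs(total - 1.0) < 1e-6
assert abs(first_moment) < 1e-8
assert tail(10.0) < tail(5.0) < 0.02   # tails vanish as eta/(2*h_N) grows
```

The tail integral here plays the role of the right-hand side of (B23): as $h_N \to 0$, the cutoff $\eta/2h_N$ grows and the bound vanishes.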
Let Assumptions 1-9 and Conditions 1-2 hold and define $\theta^S_N = (\tilde{b}^S_N - \tilde{\beta})/h_N$, where $b^S_N$ is a SGMS estimator. Then the probability limit of $\theta^S_N$ is zero.

Proof. Given any $\delta > 0$, choose $\gamma$ to be a finite number such that $\Pr(\|\tilde{x}_{jk}\| \le \gamma \text{ for any } 1\le j<k\le J) \ge 1-\delta$. Define $C_\gamma \equiv \{X : \|\tilde{x}_{jk}\| \le \gamma \text{ for any } 1\le j<k\le J\}$ and let $C^0_\gamma$ denote the complement of $C_\gamma$. Let $P_\delta$ be the probability distribution of $X$ conditional on the event $C_\gamma$. The remaining part of the proof of Lemma 8 follows the proof of Lemma 8 of Horowitz (1992).
Lemma 9. Let Assumptions 1-8 and Conditions 1-2 hold. Let $\{\beta_N = (\beta_{N1}, \tilde{\beta}'_N)'\}$ be any sequence in $B$ such that $(\beta_N - \beta)/h_N \to 0$ as $N\to\infty$. Then the probability limit of $H_N(\beta_N, h_N)$ is $H$.

Proof.
Assume that $\beta_{N1} = \beta_1 \in \{-1, 1\}$, because this is true for sufficiently large $N$ if $\beta_{N1} - \beta_1 \to 0$. Denote $\theta_N = (\tilde{\beta}_N - \tilde{\beta})/h_N$. Let $\{a_N\}$ be a sequence such that $a_N \to \infty$ and $a_N \theta_N \to 0$ as $N\to\infty$. Define $\tilde{X}_N = \{\tilde{X} : \|\tilde{x}_{jk}\| \le a_N \text{ for any } 1\le j<k\le J\}$. For any $\epsilon > 0$,

$$\lim_{N\to\infty} P\left[\left|H_N(\beta_N, h_N) - H\right| > \epsilon\right] = \lim_{N\to\infty} P\left[\left|H_N(\beta_N, h_N) - H\right| > \epsilon \,\Big|\, \tilde{X}_N\right].$$

Therefore, by Chebyshev's Theorem, it suffices to show that $E[H_N(\beta_N, h_N)|\tilde{X}_N] \to H$ and $Var[H_N(\beta_N, h_N)|\tilde{X}_N] \to 0$. Consider $E[H_N(\beta_N, h_N)|\tilde{X}_N]$ first. Define $E_N \equiv E[H_N(\beta_N, h_N)|\tilde{X}_N]$; then $E_N = \sum_{1\le j<k\le J} E_{Njk}$, where

$$E_{Njk} = h_N^{-2}\int \bar{F}_{jk}(v_{-j,k},\tilde{v}_{-j,k},\tilde{X})\, p_{jk}(v_{-j,k}|\tilde{v}_{-j,k},\tilde{X})\,\tilde{x}_{jk}\tilde{x}'_{jk}\, K''\!\left(-v_{-j,k}/h_N + \tilde{x}'_{jk}\theta_N\right) dv_{-j,k}\, dP_{Njk}(\tilde{v}_{-j,k},\tilde{X}), \tag{B31}$$

and
$P_{Njk}$ denotes the distribution of $(\tilde{v}_{-j,k},\tilde{X})$ conditional on $\tilde{X}_N$. By Assumptions 6-7, there exists an $\eta$ such that $\bar{F}^{(1)}_{jk}(v_{-j,k},\tilde{v}_{-j,k},\tilde{X})$, $\bar{F}^{(2)}_{jk}(v_{-j,k},\tilde{v}_{-j,k},\tilde{X})$, and $p^{(1)}_{jk}(v_{-j,k}|\tilde{v}_{-j,k},\tilde{X})$ exist and are almost surely uniformly bounded if $|v_{-j,k}| \le \eta$. Therefore, substitution of (B25) into (B31) yields $E_{Njk} = I_{Njk1} + I_{Njk2} + I_{Njk3}$, where
$$I_{Njk1} = h_N^{-2}\int_{|v_{-j,k}|\le\eta} \bar{F}^{(1)}_{jk}(0,\tilde{v}_{-j,k},\tilde{X})\, p_{jk}(0|\tilde{v}_{-j,k},\tilde{X})\,\tilde{x}_{jk}\tilde{x}'_{jk}\, v_{-j,k}\, K''\!\left(-v_{-j,k}/h_N + \tilde{x}'_{jk}\theta_N\right) dv_{-j,k}\, dP_{Njk}(\tilde{v}_{-j,k},\tilde{X}),$$

$$I_{Njk2} = h_N^{-2}\int_{|v_{-j,k}|\le\eta} \left[\bar{F}^{(1)}_{jk}(0,\tilde{v}_{-j,k},\tilde{X})\, p^{(1)}_{jk}(\xi_1|\tilde{v}_{-j,k},\tilde{X}) + (1/2)\,\bar{F}^{(2)}_{jk}(\xi,\tilde{v}_{-j,k},\tilde{X})\, p_{jk}(v_{-j,k}|\tilde{v}_{-j,k},\tilde{X})\right]\tilde{x}_{jk}\tilde{x}'_{jk}\,(v_{-j,k})^2\, K''\!\left(-v_{-j,k}/h_N + \tilde{x}'_{jk}\theta_N\right) dv_{-j,k}\, dP_{Njk}(\tilde{v}_{-j,k},\tilde{X}),$$

and

$$I_{Njk3} = h_N^{-2}\int_{|v_{-j,k}|>\eta} \bar{F}_{jk}(v_{-j,k},\tilde{v}_{-j,k},\tilde{X})\, p_{jk}(v_{-j,k}|\tilde{v}_{-j,k},\tilde{X})\,\tilde{x}_{jk}\tilde{x}'_{jk}\, K''\!\left(-v_{-j,k}/h_N + \tilde{x}'_{jk}\theta_N\right) dv_{-j,k}\, dP_{Njk}(\tilde{v}_{-j,k},\tilde{X}). \tag{B32}$$
Define $\zeta_{jk} = -v_{-j,k}/h_N + \tilde{x}'_{jk}\theta_N$. Then

$$I_{Njk1} = \int_{|\zeta_{jk}-\tilde{x}'_{jk}\theta_N|\le\eta/h_N} \bar{F}^{(1)}_{jk}(0,\tilde{v}_{-j,k},\tilde{X})\, p_{jk}(0|\tilde{v}_{-j,k},\tilde{X})\,\tilde{x}_{jk}\tilde{x}'_{jk}\,(\zeta_{jk}-\tilde{x}'_{jk}\theta_N)\, K''(\zeta_{jk})\, d\zeta_{jk}\, dP_{Njk}(\tilde{v}_{-j,k},\tilde{X}).$$

Because $|\tilde{x}'_{jk}\theta_N| \le a_N\|\theta_N\| \to 0$, by Assumptions 6-7 and Conditions 1-2,

$$I_{Njk1} \to E\left[\bar{F}^{(1)}_{jk}(0,\tilde{v}_{-j,k},\tilde{X})\, p_{jk}(0|\tilde{v}_{-j,k},\tilde{X})\,\tilde{x}_{jk}\tilde{x}'_{jk}\right]. \tag{B33}$$

For some finite $C > 0$, by Assumptions 6-7 and Conditions 1-2,

$$|I_{Njk2}| \le C h_N \int_{|\zeta_{jk}-\tilde{x}'_{jk}\theta_N|\le\eta/h_N} \left|\tilde{x}_{jk}\tilde{x}'_{jk}\right| (\zeta_{jk}-\tilde{x}'_{jk}\theta_N)^2\, \left|K''(\zeta_{jk})\right| d\zeta_{jk}\, dP_{Njk}(\tilde{v}_{-j,k},\tilde{X}) \to 0. \tag{B34}$$

Finally, we calculate $I_{Njk3}$ in (B32):
$$|I_{Njk3}| \le C h_N^{-1}\int_{|\zeta_{jk}-\tilde{x}'_{jk}\theta_N|>\eta/h_N} \left|\tilde{x}_{jk}\tilde{x}'_{jk}\right|\cdot\left|K''(\zeta_{jk})\right| d\zeta_{jk}\, dP_{Njk}(\tilde{v}_{-j,k},\tilde{X}). \tag{B35}$$

Under Assumptions 6-7 and Condition 2, the right-hand side of (B35) converges to 0 as $N$ goes to infinity. Combining (B33), (B34), and (B35) establishes that

$$E_N = \sum_{1\le j<k\le J} E_{Njk} = \sum_{1\le j<k\le J}\left(I_{Njk1} + I_{Njk2} + I_{Njk3}\right) \to H.$$
Next consider $Var[H_N(\beta_N, h_N)|\tilde{X}_N]$:

$$Var[H_N(\beta_N, h_N)|\tilde{X}_N] = N^{-1} Var\left\{\sum_{1\le j<k\le J}\left[1(r_{nj}<r_{nk}) - 1(r_{nk}<r_{nj})\right] K''\!\left(-v_{-j,k}/h_N + \tilde{x}'_{jk}\theta_N\right)\tilde{x}_{jk}\tilde{x}'_{jk}\, h_N^{-2} \,\Big|\, \tilde{X}_N\right\} = N^{-1} E\left[r_N(\theta_N)\, r_N(\theta_N)' \,\big|\, \tilde{X}_N\right] + O(N^{-1}), \tag{B36}$$

where

$$r_N(\theta_N) = \sum_{1\le j<k\le J}\left[1(r_{nj}<r_{nk}) - 1(r_{nk}<r_{nj})\right] K''\!\left(-v_{-j,k}/h_N + \tilde{x}'_{jk}\theta_N\right) vec(\tilde{x}_{jk}\tilde{x}'_{jk})\, h_N^{-2}.$$

Let $\zeta_{jk} = -v_{-j,k}/h_N + \tilde{x}'_{jk}\theta_N$. For some finite $C$,

$$\begin{aligned}
N^{-1} E\left[r_N(\theta_N)\, r_N(\theta_N)' \,\big|\, \tilde{X}_N\right]
\le\ & C h_N (N h_N^4)^{-1}\sum_{1\le j<k\le J}\int vec(\tilde{x}_{jk}\tilde{x}'_{jk})\, vec(\tilde{x}_{jk}\tilde{x}'_{jk})'\,\left[K''(\zeta_{jk})\right]^2 d\zeta_{jk}\, dP_{Njk}(\tilde{v}_{-j,k},\tilde{X}) \\
&+ C h_N^2 (N h_N^4)^{-1}\sum_{k\ne l;\ k,l\in J\setminus\{j\}}\int vec(\tilde{x}_{jk}\tilde{x}'_{jk})\, vec(\tilde{x}_{jl}\tilde{x}'_{jl})'\, K''(\zeta_{jk})\, K''(\zeta_{jl})\, d\zeta_{jk}\, d\zeta_{jl}\, dP_{Njkl}(\tilde{v}_{-j,kl},\tilde{X}),
\end{aligned} \tag{B37}$$

where $P_{Njkl}$ is the distribution of $(\tilde{v}_{-j,kl},\tilde{X})$ conditional on $\tilde{X}_N$. The right-hand side of (B37) converges to zero by Assumptions 6-8 and Condition 2. Therefore, it follows from (B36) that $Var[H_N(\beta_N, h_N)|\tilde{X}_N]$ converges to zero.
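The workhorse of this proof is the Chebyshev argument: a statistic whose conditional mean converges to the target and whose conditional variance vanishes has that target as its probability limit. A toy Monte Carlo illustration of this logic, using a plain sample mean with $E[Z_N]=\mu$ and $Var[Z_N]=1/N$ as a stand-in (not the paper's estimator):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, eps, reps = 2.0, 0.1, 500

freqs = []
for N in (100, 1_000, 10_000):
    # Z_N: sample mean of N iid N(mu, 1) draws, so E[Z_N] = mu, Var[Z_N] = 1/N.
    Z = rng.normal(mu, 1.0, size=(reps, N)).mean(axis=1)
    freq = float(np.mean(np.abs(Z - mu) > eps))
    bound = (1.0 / N) / eps ** 2    # Chebyshev: P(|Z_N - mu| > eps) <= Var/eps^2
    assert freq <= bound
    freqs.append(freq)

assert freqs[-1] <= freqs[0]        # deviation probability shrinks with N
```

The same two ingredients (mean to $H$, variance to 0) are exactly what the lemma verifies for $H_N(\beta_N, h_N)$ conditional on $\tilde{X}_N$.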
Proof. (THEOREM 3) By Theorem 2, $b^S_{N,1} = \beta_1$ and $\tilde{b}^S_N$ is an interior point of $\tilde{B}$ with probability approaching 1 as $N\to\infty$, and consequently $t_N(b^S_N, h_N) = 0$ with probability approaching 1. A Taylor series expansion of $t_N(b^S_N, h_N)$ around the true parameter vector $\beta$ yields

$$t_N(\beta, h_N) + H_N(b^{*}_N, h_N)(\tilde{b}^S_N - \tilde{\beta}) = 0, \tag{B38}$$

where $b^{*}_N$ is between $\beta$ and $b^S_N$.

Part (a): By (B38),

$$h_N^{-d}\, t_N(\beta, h_N) + H_N(b^{*}_N, h_N)\, h_N^{-d}(\tilde{b}^S_N - \tilde{\beta}) = 0$$

with probability approaching 1 as $N\to\infty$. Lemmas 8-9 imply that $\operatorname{plim} H_N(b^{*}_N, h_N) = H$. Because $H$ is nonsingular by Assumption 9, we have

$$h_N^{-d}(\tilde{b}^S_N - \tilde{\beta}) = -H^{-1} h_N^{-d}\, t_N(\beta, h_N) + o_p(1).$$

Part (a) now follows from Lemma 6(a).

Part (b): By (B38),

$$(N h_N)^{1/2}\, t_N(\beta, h_N) + H_N(b^{*}_N, h_N)(N h_N)^{1/2}(\tilde{b}^S_N - \tilde{\beta}) = 0$$

with probability approaching 1 as $N\to\infty$. So, by Lemmas 8-9 and Assumption 9,

$$(N h_N)^{1/2}(\tilde{b}^S_N - \tilde{\beta}) = -H^{-1}(N h_N)^{1/2}\, t_N(\beta, h_N) + o_p(1).$$

Part (b) now follows from Lemma 6(b).

Part (c): By the property of matrix trace,
$$E_A\left[(\tilde{b}^S_N - \tilde{\beta})' W (\tilde{b}^S_N - \tilde{\beta})\right] = Tr\left\{W\, E_A\left[(\tilde{b}^S_N - \tilde{\beta})(\tilde{b}^S_N - \tilde{\beta})'\right]\right\}.$$

Part (b) implies that

$$E_A\left[(\tilde{b}^S_N - \tilde{\beta})(\tilde{b}^S_N - \tilde{\beta})'\right] = N^{-2d/(2d+1)}\left[\lambda^{-1/(2d+1)} H^{-1}\Omega H^{-1} + \lambda^{2d/(2d+1)} H^{-1} a a' H^{-1}\right].$$

Therefore, by the definition of MSE,

$$MSE = N^{-2d/(2d+1)}\, Tr\left\{W H^{-1}\left[\lambda^{-1/(2d+1)}\Omega + \lambda^{2d/(2d+1)} a a'\right] H^{-1}\right\}. \tag{B39}$$

To minimize the MSE, differentiate the right-hand side of (B39) with respect to $\lambda$. From the first-order condition, the MSE is minimized by setting $\lambda = \lambda^{*}$, where

$$\lambda^{*} = \left[trace\left(W H^{-1}\Omega H^{-1}\right)\right]\Big/\left[trace\left(2d\, W H^{-1} a a' H^{-1}\right)\right]. \tag{B40}$$
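The first-order condition behind (B40) can be written out explicitly. Writing $T_1 \equiv trace(W H^{-1}\Omega H^{-1})$ and $T_2 \equiv trace(W H^{-1} a a' H^{-1})$ (shorthand introduced here for exposition), the MSE in (B39) is proportional to $g(\lambda) = \lambda^{-1/(2d+1)} T_1 + \lambda^{2d/(2d+1)} T_2$, and

```latex
g'(\lambda)
  = -\frac{1}{2d+1}\,\lambda^{-\frac{2d+2}{2d+1}}\,T_1
    + \frac{2d}{2d+1}\,\lambda^{-\frac{1}{2d+1}}\,T_2
  = 0
\;\Longleftrightarrow\;
T_1 = 2d\,\lambda\,T_2
\;\Longleftrightarrow\;
\lambda^{*} = \frac{T_1}{2d\,T_2}.
```

Since $g(\lambda)\to\infty$ both as $\lambda\to 0^{+}$ and as $\lambda\to\infty$ whenever $T_1$ and $T_2$ are positive, the unique critical point $\lambda^{*}$ is the global minimizer, matching (B40).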
It follows from Part (b) that $N^{d/(2d+1)}(\tilde{b}^S_N - \tilde{\beta})$ has the asymptotic distribution

$$MVN\left(-(\lambda^{*})^{d/(2d+1)} H^{-1} a,\ (\lambda^{*})^{-1/(2d+1)} H^{-1}\Omega H^{-1}\right).$$
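As a quick numerical sanity check of the minimizer in (B40), one can minimize $g(\lambda) = \lambda^{-1/(2d+1)} T_1 + \lambda^{2d/(2d+1)} T_2$ by brute force on a grid and compare with $T_1/(2d\,T_2)$. The scalars below are arbitrary stand-ins for the two traces:

```python
import numpy as np
from math import isclose

# Stand-in values: d is the bias order, T1 and T2 replace the traces in (B40).
d, T1, T2 = 2, 3.0, 1.5

def g(lam):
    # MSE profile in lambda, up to the factor N^{-2d/(2d+1)} in (B39).
    return lam ** (-1.0 / (2 * d + 1)) * T1 + lam ** (2.0 * d / (2 * d + 1)) * T2

lam_star = T1 / (2 * d * T2)            # closed-form minimizer from (B40)
grid = np.linspace(1e-3, 10.0, 200_001)
lam_grid = float(grid[np.argmin(g(grid))])  # brute-force minimizer on a fine grid

assert isclose(lam_grid, lam_star, abs_tol=1e-3)
assert g(lam_star) <= g(0.1) and g(lam_star) <= g(5.0)
```

With these stand-ins, $\lambda^{*} = 3/(4\cdot 1.5) = 0.5$, and the grid search recovers it to grid precision.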
Proof. (THEOREM 4)

Part (a): Applying a Taylor expansion to $t_N(b^S_N, h^{*}_N)$ around $\beta$ yields

$$(h^{*}_N)^{-d}\, t_N(b^S_N, h^{*}_N) = (h^{*}_N)^{-d}\, t_N(\beta, h^{*}_N) + \left[\partial t_N(b^{*}_N, h^{*}_N)/\partial\tilde{b}'\right](h^{*}_N)^{-d}(\tilde{b}^S_N - \tilde{\beta}) \tag{B41}$$

with probability approaching one as $N\to\infty$, where $b^{*}_N$ is between $b^S_N$ and $\beta$. By Lemma 8, $(\tilde{b}^S_N - \tilde{\beta})/h_N = o_p(1)$. Lemma 9 implies that $\operatorname{plim}\left[\partial t_N(b^{*}_N, h^{*}_N)/\partial\tilde{b}'\right] = H$. In addition, $(h^{*}_N)^{-d}(\tilde{b}^S_N - \tilde{\beta}) = o_p(1)$ because $h_N^{-d}(\tilde{b}^S_N - \tilde{\beta}) = O_p(1)$ by Theorem 3 and $(h^{*}_N/h_N)^{-d} \to 0$. Finally, $\operatorname{plim}\left[(h^{*}_N)^{-d}\, t_N(\beta, h^{*}_N)\right] = a$ by Lemma 5(a). Part (a) now follows by taking probability limits as $N\to\infty$ of each side of (B41).

Part (b): By Chebyshev's Theorem, it suffices to show that $E(\hat{\Omega}_N) \to \Omega$ and $Var(\hat{\Omega}_N) \to 0$. First consider $E(\hat{\Omega}_N)$:
$$E(\hat{\Omega}_N) = h_N\, E\left[t_{Nn}(b^S_N, h_N)\, t_{Nn}(b^S_N, h_N)'\right] \equiv L^{*}_{N1} + L^{*}_{N2}, \tag{B42}$$

where

$$L^{*}_{N1} = h_N^{-1}\sum_{1\le j<k\le J} E\left\{\left[\left(1(r_j<r_k) - 1(r_k<r_j)\right) K'\!\left(x'_{jk} b^S_N/h_N\right)\right]^2 \tilde{x}_{jk}\tilde{x}'_{jk}\right\},$$

and

$$\begin{aligned}
L^{*}_{N2} = 2 h_N^{-1}\sum_{1\le j<k<l\le J} E\big\{ &\left[1(r_j<r_k) - 1(r_k<r_j)\right]\left[1(r_j<r_l) - 1(r_l<r_j)\right] K'\!\left(x'_{jk} b^S_N/h_N\right) K'\!\left(x'_{jl} b^S_N/h_N\right)\tilde{x}_{jk}\tilde{x}'_{jl} \\
+\ &\left[1(r_j<r_k) - 1(r_k<r_j)\right]\left[1(r_k<r_l) - 1(r_l<r_k)\right] K'\!\left(x'_{jk} b^S_N/h_N\right) K'\!\left(x'_{kl} b^S_N/h_N\right)\tilde{x}_{jk}\tilde{x}'_{kl} \\
+\ &\left[1(r_j<r_l) - 1(r_l<r_j)\right]\left[1(r_k<r_l) - 1(r_l<r_k)\right] K'\!\left(x'_{jl} b^S_N/h_N\right) K'\!\left(x'_{kl} b^S_N/h_N\right)\tilde{x}_{jl}\tilde{x}'_{kl}\big\}.
\end{aligned}$$

Let $\theta_N = (\tilde{b}^S_N - \tilde{\beta})/h_N$ and $\zeta_{jk} = -v_{-j,k}/h_N + \tilde{x}'_{jk}\theta_N$. Then

$$\begin{aligned}
L^{*}_{N1} = \sum_{1\le j<k\le J}\int \big\{ &F_{jk}\left[h_N(-\zeta_{jk} + \tilde{x}'_{jk}\theta_N),\, \tilde{v}_{-j,k},\, \tilde{X}\right] \\
+\ &F_{kj}\left[h_N(-\zeta_{jk} + \tilde{x}'_{jk}\theta_N),\, \tilde{v}_{-j,k} + h_N(-\zeta_{jk} + \tilde{x}'_{jk}\theta_N)\iota'_{J-1},\, \tilde{X}\right]\big\} \\
\cdot\ &p_{jk}\left[h_N(-\zeta_{jk} + \tilde{x}'_{jk}\theta_N)\,\big|\,\tilde{v}_{-j,k},\, \tilde{X}\right]\tilde{x}_{jk}\tilde{x}'_{jk}\left[K'(\zeta_{jk})\right]^2 d\zeta_{jk}\, dP(\tilde{v}_{-j,k},\tilde{X}).
\end{aligned} \tag{B43}$$

By Assumptions 3, 6-7, Condition 2, and Lebesgue's dominated convergence theorem, the right-hand side of (B43) converges to $\Omega$ when $N\to\infty$. Under Assumption 7,

$$\begin{aligned}
|L^{*}_{N2}| \le \sum_{1\le j<k<l\le J} 2 C h_N\Big[ &\int\left|K'(\zeta_{jk})\, K'(\zeta_{jl})\,\tilde{x}_{jk}\tilde{x}'_{jl}\right| d\zeta_{jk}\, d\zeta_{jl}\, dP(\tilde{v}_{-j,kl},\tilde{X}) \\
+\ &\int\left|K'(\zeta_{jk})\, K'(\zeta_{kl})\,\tilde{x}_{jk}\tilde{x}'_{kl}\right| d\zeta_{kj}\, d\zeta_{kl}\, dP(\tilde{v}_{-k,jl},\tilde{X}) \\
+\ &\int\left|K'(\zeta_{jl})\, K'(\zeta_{kl})\,\tilde{x}_{jl}\tilde{x}'_{kl}\right| d\zeta_{lj}\, d\zeta_{lk}\, dP(\tilde{v}_{-l,jk},\tilde{X})\Big].
\end{aligned} \tag{B44}$$

Therefore, the right-hand side of (B44) converges to 0 when $N\to\infty$ by Assumption 7 and Condition 2. So $E(\hat{\Omega}_N) \to \Omega$ by (B42).
Next consider $Var(\hat{\Omega}_N)$. By Assumption 1, we can calculate

$$\begin{aligned}
Var(\hat{\Omega}_N) &= \left(h_N^2/N\right) Var\left\{vec\left[t_{Nn}(b^S_N, h_N)\, t_{Nn}(b^S_N, h_N)'\right]\right\} \\
&= \left(h_N^2/N\right) E\left\{vec\left[t_{Nn}(b^S_N, h_N)\, t_{Nn}(b^S_N, h_N)'\right] vec\left[t_{Nn}(b^S_N, h_N)\, t_{Nn}(b^S_N, h_N)'\right]'\right\} + o(1) \\
&= \left(N h_N^2\right)^{-1} E\left[c\, c'\right] + o(1),
\end{aligned} \tag{B45}$$

where

$$c \equiv \sum_{1\le j<k\le J}\ \sum_{1\le l<m\le J} c_{jklm},$$

$$c_{jklm} \equiv \left[1(r_j<r_k) - 1(r_k<r_j)\right]\left[1(r_l<r_m) - 1(r_m<r_l)\right] K'\!\left(x'_{jk} b^S_N/h_N\right) K'\!\left(x'_{lm} b^S_N/h_N\right) vec\left(\tilde{x}_{jk}\tilde{x}'_{lm}\right).$$

The right-hand side of (B45) converges to 0 under Assumption 7 and Condition 2. Therefore, we have proved that $Var(\hat{\Omega}_N) \to 0$.
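The moment condition (B42) reflects the form of the plug-in estimator itself: $\hat{\Omega}_N$ is the bandwidth-scaled sample second moment of the per-observation terms $t_{Nn}(b^S_N, h_N)$. A minimal sketch of that construction, using random stand-in score vectors (the $t_{Nn}$ values, $h_N$, and dimensions below are placeholders, not output of the actual estimator):

```python
import numpy as np

rng = np.random.default_rng(42)
N, p, h_N = 500, 3, 0.2
t = rng.normal(size=(N, p))       # stand-in for t_Nn(b_N^S, h_N), n = 1, ..., N

# Omega_hat_N = h_N * N^{-1} * sum_n t_Nn t_Nn', matching E(Omega_hat_N) in (B42).
omega_hat = h_N * (t.T @ t) / N

assert np.allclose(omega_hat, omega_hat.T)            # symmetric by construction
assert np.all(np.linalg.eigvalsh(omega_hat) >= -1e-10)  # positive semidefinite
```

Symmetry and positive semidefiniteness hold by construction, which is what makes $\hat{\Omega}_N$ usable inside the sandwich form $H^{-1}\Omega H^{-1}$ of Theorem 3.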
Part (c): This is a result implied by Lemma 9.
References

[1] Abrevaya J and Huang J. 2005. On the Bootstrap of the Maximum Score Estimator. Econometrica 73: 1175-1204.

[2] Barberá S and Pattanaik P. 1986. Falmagne and the Rationalizability of Stochastic Choices in Terms of Random Orderings. Econometrica 54: 707-715.

[3] Beggs S, Cardell S, and Hausman J. 1981. Assessing the potential demand for electric cars. Journal of Econometrics 16: 1-19.

[4] Ben-Akiva M, Morikawa T, and Shiroishi F. 1992. Analysis of the reliability of preference ranking data. Journal of Business Research 24: 149-164.

[5] Calfee J, Winston C, and Stempski R. 2001. Econometric Issues in Estimating Consumer Preferences from Stated Preference Data: a Case Study of the Value of Automobile Travel Time. Review of Economics and Statistics 83: 699-707.

[6] Caparros A, Oviedo J, and Campos P. 2008. Would you choose your preferred alternative? Comparing choice and recoded ranking experiments. American Journal of Agricultural Economics 90: 843-855.

[7] Cavanagh CL. 1987. Limiting Behavior of Estimators Defined by Optimization. Unpublished manuscript, Department of Economics, Harvard University, Cambridge, MA.

[8] Chapman R and Staelin R. 1982. Exploiting rank ordered choice set data within the stochastic utility model. Journal of Marketing Research 19: 288-301.

[9] Conte A, Hey JD, and Moffatt PG. 2011. Mixture models of choice under risk. Journal of Econometrics 162: 79-88.

[10] Dagsvik J and Liu G. 2009. A framework for analyzing rank-ordered data with application to automobile demand. Transportation Research Part A 43: 1-12.

[11] Delgado M, Rodríguez-Poo J, and Wolf M. 2001. Subsampling Inference in Cube Root Asymptotics with an Application to Manski's Maximum Score Estimator. Economics Letters 73: 241-250.

[12] Falmagne J. 1978. A Representation Theorem for Finite Random Scale Systems. Journal of Mathematical Psychology 18: 52-72.

[13] Fiebig D, Keane M, Louviere J, and Wasi N. 2010. The Generalized Multinomial Logit Model: Accounting for Scale and Coefficient Heterogeneity. Marketing Science 29: 393-421.

[14] Fox J. 2007. Semiparametric Estimation of Multinomial Discrete-choice Models Using a Subset of Choices. RAND Journal of Economics 38: 1002-1019.

[15] Goeree JK, Holt C, and Palfrey T. 2005. Regular Quantal Response Equilibrium. Experimental Economics 8: 347-367.

[16] Greene WH, Hensher DA, and Rose J. 2006. Accounting for heterogeneity in the variance of unobserved effects in mixed logit models. Transportation Research Part B 40: 75-92.

[17] Han A. 1987. Non-parametric Analysis of a Generalized Regression Model. Journal of Econometrics 35: 303-316.

[18] Harrison GW and Rutström EE. 2009. Expected utility theory and prospect theory: one wedding and a decent funeral. Experimental Economics 12: 133-158.

[19] Hausman J and Ruud P. 1987. Specifying and testing econometric models for rank-ordered data. Journal of Econometrics 34: 83-104.

[20] Hensher DA, Louviere J, and Swait J. 1999. Combining sources of preference data. Journal of Econometrics 89: 197-221.

[21] Horowitz J. 1992. A Smoothed Maximum Score Estimator for the Binary Response Model. Econometrica 60: 505-531.

[22] Kim J and Pollard D. 1990. Cube Root Asymptotics. Annals of Statistics 18: 191-219.

[23] Klein RW and Spady RH. 1993. An Efficient Semiparametric Estimator for Binary Response Models. Econometrica 61: 387-421.

[24] Layton D. 2000. Random coefficient models for stated preference surveys. Journal of Environmental Economics and Management 40: 21-36.

[25] Layton D and Levine R. 2003. How Much Does the Far Future Matter? A Hierarchical Bayesian Analysis of the Public's Willingness to Mitigate Ecological Impacts of Climate Change. Journal of the American Statistical Association 98: 533-544.

[26] Lee L-F. 1995. Semiparametric maximum likelihood estimation of polychotomous and sequential choice models. Journal of Econometrics 65: 381-428.

[27] Lewbel A. 2000. Semiparametric Qualitative Response Model Estimation With Instrumental Variables and Unknown Heteroscedasticity. Journal of Econometrics 97: 145-177.

[28] Manski C. 1975. Maximum Score Estimation of the Stochastic Utility Model of Choice. Journal of Econometrics 3: 205-228.

[29] Manski C. 1985. Semiparametric Analysis of Discrete Response: Asymptotic Properties of the Maximum Score Estimator. Journal of Econometrics 27: 313-334.

[30] McFadden D. 1974. Conditional logit analysis of qualitative choice behavior. In: Zarembka P. (Ed.), Frontiers in Econometrics. Academic Press: New York, pp. 105-142.

[31] McFadden D. 1986. The Choice Theory Approach to Market Research. Marketing Science 5: 275-297.

[32] Newey W. 1986. Linear Instrumental Variable Estimation of Limited Dependent Variable Models with Endogenous Explanatory Variables. Journal of Econometrics 32: 127-141.

[33] Newey W and McFadden D. 1994. Large Sample Estimation and Hypothesis Testing. Handbook of Econometrics 4: 2111-2245.

[34] Oviedo J and Yoo H. 2016. A Latent Class Nested Logit Model for Rank-Ordered Data with Application to Cork Oak Reforestation. Environmental and Resource Economics. DOI: 10.1007/s10640-016-0058-7.

[35] Ruud P. 1983. Sufficient Conditions for the Consistency of Maximum Likelihood Estimation Despite Misspecification of Distribution in Multinomial Discrete Choice Models. Econometrica 51: 225-228.

[36] Ruud P. 1986. Consistent Estimation of Limited Dependent Variable Models Despite Misspecification of Distribution. Journal of Econometrics 32: 157-187.

[37] Scarpa R, Notaro S, Louviere J, and Raffaelli R. 2011. Exploring scale effects of best/worst rank ordered choice data to estimate benefits of tourism in Alpine grazing commons. American Journal of Agricultural Economics 93: 813-828.

[38] Sherman R. 1993. The Limiting Distribution of the Maximum Rank Correlation Estimator. Econometrica 61: 123-137.

[39] Small K, Winston C, and Yan J. 2005. Uncovering the distribution of motorists' preferences for travel time and reliability. Econometrica 73: 1367-1382.

[40] Storn R and Price K. 1997. Differential Evolution: A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces. Journal of Global Optimization 11: 341-359.

[41] Train K and Winston C. 2007. Vehicle choice behavior and the declining market share of U.S. automakers. International Economic Review 48: 1469-1496.

[42] Yan J. 2012. A smoothed maximum score estimator for multinomial discrete choice models. Working paper.

[43] Yan J and Yoo H. 2014. The seeming unreliability of rank-ordered data as a consequence of model misspecification. MPRA Paper No. 56285. http://mpra.ub.uni-muenchen.de/56285/

[44] Yoo H and Doiron D. 2013. The use of alternative preference elicitation methods in complex discrete choice experiments. Journal of Health Economics 32: 1166-1179.