Semiparametric Estimation of the Random Utility Model with Rank-Ordered Choice Data∗

Jin Yan†
Hong Il Yoo‡

April 15, 2017
Abstract

We propose two semiparametric methods for estimating the random utility model using rank-ordered choice data. The framework is semiparametric in that the utility index function includes finite-dimensional preference parameters but the error terms follow an unspecified distribution. Our methods allow for a flexible form of heteroskedasticity across individuals. When the complete preference rankings of alternatives in a choice set are observed, our methods also allow for flexible patterns of heteroskedasticity and correlated errors across alternatives, as well as a variety of random coefficient distributions; in particular, our methods can accommodate most popular parametric random utility models and any finite mixture of those models. The baseline method we develop is the generalized maximum score (GMS) estimator, which is strongly consistent but follows a non-standard asymptotic distribution. To facilitate statistical inference, we make extra but mild regularity assumptions and develop the smoothed generalized maximum score (SGMS) estimator, which is both strongly consistent and asymptotically normal. Our Monte Carlo experiments show that under a variety of stochastic specifications, the GMS and SGMS estimators perform favorably against popular parametric estimators.

∗We thank Xu Cheng, Liran Einav, Jeremy Fox, Bruce Hansen, Han Hong, Arthur Lewbel, Taisuke Otsu, Joris Pinkse, Jack Porter and seminar participants at the 2015 Tsinghua Econometric Conference, the 12th International Symposium on Econometric Theory and Applications, the 2016 Asian and European Meetings of the Econometric Society, Academia Sinica, Newcastle University, Sun Yat-Sen University and the Chinese University of Hong Kong for valuable comments and discussions. We acknowledge funding support provided by the Hong Kong Research Grants Council General Research Fund 2014/2015 (Project No. 14413214) and six anonymous referee reports for the project proposal. All errors are ours.
†Department of Economics, The Chinese University of Hong Kong. Email: [email protected].
‡Durham University Business School, Durham University. Email: [email protected].
Keywords: Rank-ordered; Random utility; Semiparametric estimation; Smoothing

JEL Classification: C14, C35.
1 Introduction
Rank-ordered choices can be elicited using the same type of survey as multinomial choices, specifically one that presents an individual with a finite set of mutually exclusive alternatives. The two elicitation formats may be distinguished by the amount of information that is available to the econometrician. A multinomial choice reports the individual's choice or most preferred alternative from the set, whereas a rank-ordered choice reports further about the individual's preference ordering, such as her second and third preferences: see for example Hausman and Ruud (1987), Calfee et al. (2001), and Train and Winston (2007). One rank-ordered choice observation provides a similar amount of information as several multinomial choice observations, in the sense that it allows inferring what the individual's choices would have been if her more preferred alternatives were not available. This allows fewer individuals to be interviewed to achieve a given level of statistical precision and, as Scarpa et al. (2011) point out, the resulting logistical advantages could be substantial for many non-market valuation studies which involve a narrowly defined population of interest.

We develop semiparametric methods for the estimation of random utility models using rank-ordered choice data. Despite the wide availability of parametric counterparts, such semiparametric methods remain almost undeveloped to date. The random utility function of interest has a typical structure: it comprises a systematic component or utility index varying with finite-dimensional explanatory variables, and an additive stochastic component or error term. The objective is to estimate preference parameters, referring to the coefficients on the explanatory variables. The methods are semiparametric in that they maintain the usual parametric form of the systematic component but place only non-parametric restrictions on the stochastic component.

The parametric methods are equally well-established for multinomial choice and rank-ordered choice data.
In most cases, an analysis of multinomial choice data involves the maximum (simulated) likelihood estimation of one of four models: multinomial logit (MNL), nested MNL, multinomial probit (MNP) and random coefficient or mixed MNL. Each model assumes a different parametric distribution of the stochastic component, and has its own rank-ordered choice counterpart which shares the same assumption: rank-ordered logit (ROL) of Beggs et al. (1981), nested ROL of Dagsvik and Liu (2009), rank-ordered probit (ROP) of Layton and Levine (2003), and mixed ROL of Layton (2000) and Calfee et al. (2001). Building on Falmagne (1978) and Barberá and Pattanaik (1986), McFadden (1986) provides a technique which can be applied to translate any parametric multinomial choice model into the corresponding rank-ordered choice model.

The literature on the semiparametric methods is more lopsided.
For multinomial choice data, several alternative methods exist, including Manski (1975), Ruud (1986), Lee (1995), Lewbel (2000) and Fox (2007). The special case of binomial choice data has attracted even greater attention, and the respectable menagerie includes Ruud (1983), Manski (1985), Han (1987), Horowitz (1992), Klein and Spady (1993) and Sherman (1993), to name a few. When it comes to rank-ordered choice data, we are aware of only one study that aimed at the semiparametric estimation of the preference parameters, namely Hausman and Ruud (1987). In that study, the weighted M-estimator (WME) of Ruud (1983, 1986) is generalized for use with rank-ordered choice data, whereas the original WME was intended for use with binomial and multinomial choice data. The generalized WME allows the consistent estimation of the ratios of coefficients despite stochastic misspecification, but there are two drawbacks affecting its empirical applicability. As the authors acknowledge, the estimator's consistency is confined to the ratios of the coefficients on continuous explanatory variables, and its asymptotic distribution is unknown outside a special case of Newey (1986).

This paper proposes a pair of new semiparametric methods for rank-ordered choice data. We call them the generalized maximum score (GMS) estimator and the smoothed generalized maximum score (SGMS) estimator, respectively. Both estimators are consistent under more general assumptions concerning explanatory variables than the generalized WME of Hausman and Ruud (1987). Roughly speaking, if one of q explanatory variables is continuous, each estimator allows the consistent estimation of the ratios of all coefficients regardless of whether the other q − 1 variables are continuous or discrete. Moreover, the SGMS estimator is asymptotically normal, meaning that it is amenable to the application of usual Wald-type tests.
The GMS estimator follows a non-standard asymptotic distribution, but it does not require extra smoothness conditions. The GMS estimator generalizes the pairwise maximum score (MS) estimator of Fox (2007), which has been developed for use with multinomial choice data and is a modern extension of the classic MS estimator due to Manski (1975). Suppose that the individual faces J alternatives. A multinomial choice observation allows one to infer the outcomes of J − 1 pairwise comparisons where each pair comprises her actual choice and an unchosen alternative. A rank-ordered choice observation allows one to infer the outcomes of more pairwise comparisons. For example, in case the individual ranks all J alternatives from best to worst, her rank-ordered choice would allow one to learn the outcomes of all possible J(J − 1)/2 pairwise comparisons.
The GMS estimator extends the MS estimator by incorporating such extra information. The key identification condition comprises an intuitively plausible set of inequalities: in a pairwise comparison, if one alternative's systematic utility exceeds the other's, its chance of being ranked better also exceeds the other's. The GMS estimator inherits all attractive properties of the MS estimator, two of which are particularly relevant to empirical applications. First, the GMS estimator allows the econometrician to be agnostic about the form of interpersonal heteroskedasticity or scale heterogeneity (Hensher et al., 1999; Fiebig et al., 2010), referring to variations in the overall scale of utility across individuals.¹ This property is desirable because in most studies, the exact form of interpersonal heteroskedasticity matters only to the extent that its misspecification leads to the inconsistent estimation of the core preference parameters. Second, the GMS estimator is consistent when the data generating process (DGP) comprises an arbitrary mixture of different models, provided that it is consistent for each component model. The empirical evidence from behavioral economics (Harrison and Rutström, 2009; Conte et al., 2011) supports the notion that characterizing observed choices requires more than one model, but the parametric estimation of a mixture model demands exact knowledge of the number and composition of the component models.

In addition, when each individual ranks all alternatives from best to worst, the GMS estimator is substantively more flexible than the MS estimator. As we discuss in detail later, the GMS estimator is consistent for all popular parametric models exhibiting flexible substitution patterns, whereas the MS estimator is not.² The GMS estimator therefore delivers what the empiricist may expect from the use of a semiparametric method, namely the ability to estimate all popular parametric models consistently on top of other types of models. This is an interesting finding because in the parametric framework, the advantage of using rank-ordered choice data instead of multinomial choice data is limited to efficiency gains (Hausman and Ruud, 1987), and a multinomial choice model may be more robust to stochastic misspecification than its rank-ordered choice counterpart (Yan and Yoo, 2014). The efficiency-bias tradeoff does not apply in the semiparametric framework, where the advantage of using rank-ordered choice data also includes robustness to a wider variety of DGPs. We note that in most studies on rank-ordered choices, the complete rankings are elicited as required for this result (Hausman and Ruud, 1987; Calfee et al., 2001; Capparros et al., 2008; Scarpa et al., 2011; Yoo and Doiron, 2013; Oviedo and Yoo, 2016).

¹This property explains a major difference between the GMS estimator and the maximum rank correlation (MRC) estimator of Han (1987) and Sherman (1993). The GMS method utilizes the observed ranking information and does pairwise comparisons of alternatives within each individual, allowing the conditional joint distribution of the error terms to vary across individuals. In comparison, the MRC estimator does pairwise comparisons between individuals and requires the error terms to be independent of the explanatory variables, ruling out the possibility of heteroskedasticity across individuals.
²The difference arises because the complete ranking information allows us to replace the exchangeability assumption (Goeree et al., 2005; Fox, 2007) with a much weaker assumption of zero conditional median.
The SGMS estimator offers the same types of practical benefits as the GMS estimator, and addresses the latter's major drawbacks in return for requiring extra smoothness assumptions. The GMS estimator's rate of convergence is N^{−1/3}, which is slower than the usual rate of N^{−1/2}, and it follows the non-standard asymptotic distribution of Kim and Pollard (1990), which is inconvenient for use with conventional hypothesis tests. These properties are inherited from the MS estimator, and arise because the objective function is a sum of step functions. Horowitz (1992) develops the smoothed maximum score (SMS) estimator for binomial choice data, which replaces the step functions with smoothing functions, and Yan (2012) extends the method to multinomial choice data. Our smoothing technique follows this tradition. We show that the SGMS estimator's convergence rate can be made arbitrarily close to N^{−1/2} under extra smoothness conditions and that its asymptotic distribution is normal, with a covariance matrix which can be consistently estimated.

The remainder of this paper is organized as follows. Section 2 develops the GMS estimator and compares it with popular parametric methods. Section 3 develops the SGMS estimator. Section 4 presents the Monte Carlo evidence on the finite sample properties of the proposed estimators. Section 5 concludes.
2 The Model and the Generalized Maximum Score Estimator

2.1 A Random Utility Framework and Rank-Ordered Choice Data
Consider the standard random utility model. An individual in the population of interest faces a finite collection of alternatives. Let J = {1, . . . , J} denote the set of alternatives and let J ≥ 2 be the number of alternatives contained in J. The utility from choosing alternative j, u_j, is assumed as follows:

u_j = x_j′β + ε_j    ∀ j ∈ J,    (1)

where x_j ≡ (x_{j,1}, . . . , x_{j,q})′ ∈ R^q is an observed q-vector containing the attributes of alternative j and their interactions with the individual's characteristics, β ≡ (β_1, . . . , β_q)′ ∈ R^q is the preference parameter vector of interest, and ε_j is the component of utility that is unobserved to the econometrician. The utility index x_j′β is often called systematic (or deterministic) utility, as opposed to the error term ε_j, which is called unsystematic (or stochastic) utility.

Let X ≡ (x_1, . . . , x_J)′ ∈ R^{J×q} be the matrix of the explanatory variables and ε ≡ (ε_1, . . . , ε_J)′ ∈ R^J be the vector of the error terms. Let r(j, u) denote the latent or potentially unobserved ranking of alternative j based on the vector of underlying alternative-specific utilities u ≡ (u_1, u_2, . . . , u_J)′ ∈ R^J. We shall follow the notational convention that r(j, u) = q when j is the qth best alternative in the choice set J, meaning that a smaller ranking value indicates a more preferred alternative. For instance, suppose that J = 4 and u_3 > u_4 > u_1 > u_2. Then, r(1, u) = 3, r(2, u) = 4, r(3, u) = 1 and r(4, u) = 2. Purely for technical convenience, our notation handles any utility tie by assigning a better ranking to an alternative which happens to have a smaller numeric label. For instance, suppose instead that u_3 > u_4 = u_1 > u_2. Then, r(1, u) = 2 and r(4, u) = 3 since numeric label 1 is smaller than 4.

A more formal definition of the latent ranking that incorporates our notational convention is as follows. Let T(j, u) be the set of alternatives with the same utility as alternative j. A(k, T(j, u)) maps element k ∈ T(j, u) one-to-one onto the integers {0, . . . , |T(j, u)| − 1}, where |T| is the number of alternatives in the set T. For any two alternatives k, l ∈ T(j, u), A(k, T(j, u)) < A(l, T(j, u)) if and only if k < l. For any j ∈ J, define its latent ranking as

r(j, u) ≡ L(j, u) + 1 + A(j, T(j, u))    (2)

where L(j, u) denotes the number of alternatives that yield strictly larger utility than alternative j for the individual. Notice that when there is no utility tie, the last term is irrelevant to the latent ranking value since A(j, T(j, u)) = 0. By definition (2), there is a one-to-one mapping between the set {r(j, u) : j = 1, . . . , J} and the set {1, . . . , J}.
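Definition (2) is straightforward to implement. The sketch below is our own illustration (not part of the paper's formal apparatus) and uses 0-based Python indexes in place of the paper's 1-based alternative labels: L(j, u) is the count of strictly better alternatives, and A(j, T(j, u)) is the count of tied alternatives with a smaller label.

```python
def latent_ranking(u):
    """Latent rankings r(j, u) = L(j, u) + 1 + A(j, T(j, u)) from definition (2).

    L(j, u): number of alternatives with strictly larger utility than j.
    A(j, T(j, u)): among alternatives tied with j, the count with a smaller
    (0-based) label, so ties are broken in favor of smaller labels.
    """
    J = len(u)
    ranks = []
    for j in range(J):
        L = sum(1 for k in range(J) if u[k] > u[j])   # strictly better alternatives
        A = sum(1 for k in range(j) if u[k] == u[j])  # tied alternatives with smaller label
        ranks.append(L + 1 + A)
    return ranks

# The paper's examples with J = 4 (alternatives labeled 1..4 map to indexes 0..3):
print(latent_ranking([1.0, 0.0, 3.0, 2.0]))  # u3 > u4 > u1 > u2 -> [3, 4, 1, 2]
print(latent_ranking([2.0, 0.0, 3.0, 2.0]))  # u3 > u4 = u1 > u2 -> [2, 4, 1, 3]
```

The two printed examples reproduce the rankings derived in the text, including the tie-breaking case.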
Next, let r_j denote the reported or actually observed ranking of alternative j, and r ≡ (r_1, . . . , r_J)′ ∈ N^J be the vector of the reported rankings of all alternatives in J. We shall maintain that the reported ranking r_j coincides with the latent ranking r(j, u) in case the individual reports the complete ranking of alternatives, and is a censored version of the latent ranking in case she reports a partial ranking. To facilitate further discussion, suppose that the individual reports the ranking of her best M alternatives, where 1 ≤ M ≤ J − 1, and leaves that of the other J − M alternatives unspecified. As before, suppose that J = 4 and u_3 > u_4 > u_1 > u_2. In case M = 3, the complete ranking is observed since the individual reports her best, second-best and third-best alternatives, allowing the econometrician to infer that the only remaining alternative is her worst one: r = (r_1, r_2, r_3, r_4) = (3, 4, 1, 2), meaning that each alternative's reported ranking is identical to its latent ranking. In case M = 2, only a partial ranking is observed since the individual reports her best and second-best alternatives, and the econometrician cannot tell whether alternative 1 is preferred to alternative 2: r = (3, 3, 1, 2), so that reported ranking r_2 is no longer the same as latent ranking r(2, u). Finally, in case M = 1, the resulting partial ranking observation is identical to a multinomial choice observation since the individual reports only her best alternative: r = (2, 2, 1, 2).

A more formal definition of the reported ranking that incorporates the above discussion is as follows. Let the random set M (M ⊂ J) denote the set of the best M alternatives for the individual, that is, M ≡ {j : r(j, u) ≤ M}. The reported ranking of alternative j then follows the observation rule

r_j = { r(j, u)  if r(j, u) ≤ M, or equivalently, j ∈ M;
      { M + 1    if r(j, u) > M, or equivalently, j ∈ J \ M.    (3)

When M = J − 1, the complete ranking is observed. When M = 1, the resulting partial ranking is observationally equivalent to a multinomial choice. The intermediate cases of partial rankings, which result when 2 ≤ M < J − 1 and J > 3, are much less common in empirical studies, though not unprecedented.³

³See for example Layton (2000) and Train and Winston (2007), both of which analyze data on the best and second-best alternatives; their data structures are M = 2 and J > 3 according to our notations.
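The observation rule (3) is a simple censoring of the latent ranking: ranks up to M pass through, and everything else is pooled at M + 1. A minimal sketch (our own illustration, reusing the running example's latent ranking):

```python
def reported_ranking(r_latent, M):
    """Observation rule (3): report r(j, u) for the best M alternatives,
    and censor every remaining ranking to the common value M + 1."""
    return [rj if rj <= M else M + 1 for rj in r_latent]

# Running example: J = 4 with latent ranking r = (3, 4, 1, 2).
print(reported_ranking([3, 4, 1, 2], M=3))  # complete ranking:   [3, 4, 1, 2]
print(reported_ranking([3, 4, 1, 2], M=2))  # partial ranking:    [3, 3, 1, 2]
print(reported_ranking([3, 4, 1, 2], M=1))  # multinomial choice: [2, 2, 1, 2]
```

The three calls reproduce the M = 3, M = 2 and M = 1 cases discussed above.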
2.2 The Generalized Maximum Score Estimator
This section establishes the strong consistency of the generalized maximum score (GMS) estimator, the first of the two semiparametric methods that we propose. The GMS estimator is semiparametric in the sense that it allows the econometrician to estimate the preference parameters β consistently, without committing to a specific parametric form of the conditional distribution of the errors given the observed attributes, ε|X. Our first assumption pertains to sampling.

Assumption 1. {(r_n, X_n, ε_n) : n = 1, . . . , N} is a random sample of (r, X, ε), where r_n ≡ (r_{n1}, . . . , r_{nJ})′ ∈ N^J, X_n ≡ (x_{n1}, . . . , x_{nJ})′ ∈ R^{J×q}, and ε_n ≡ (ε_{n1}, . . . , ε_{nJ})′ ∈ R^J. For each individual n = 1, . . . , N, (r_n, X_n) is observed.

Assumption 1 states that we have N observations of (r, X), indexed by n, and that individuals are independently and identically distributed (i.i.d.).⁴ For the latter reason, we drop subscript n to avoid notational clutter except when it is needed for clarification. As usual in discrete choice modeling, identification of the parameters β requires scale normalization since they are unique only up to a scale.⁵ When a parametric form of the conditional distribution of ε|X is specified, identification is almost always achieved by normalizing a scale parameter of that distribution.⁶ But when no parametric form is specified, no scale parameter is available for normalization. In a semiparametric framework, identification is therefore achieved by normalizing β instead.

Subject to the prior knowledge that some element of β is non-zero, we can normalize the magnitude of that element. Economists may agree, for example, that the coefficient on the own price variable is negative. Without loss of generality, we assume that |β_1| = 1. Define β̃ ≡ (β_2, . . . , β_q)′ ∈ R^{q−1} as the vector containing the other q − 1 elements of β. The following assumption imposes a restriction on the parameter space.

Assumption 2. β ∈ B where B ≡ {−1, 1} × B̃ and B̃ is a compact subset of R^{q−1}, where q ≥ 2.
Next, we state Assumption 3, which presents a key identification condition pertaining to the strong consistency of the GMS estimator. This assumption implicitly places a restriction on the conditional distribution of ε|X, albeit a non-parametric restriction which is satisfied by a range of parametric functional forms, some of which we will discuss in the subsequent section. Denote the systematic utility of alternative j as v_j ≡ x_j′β for any alternative j ∈ J.

Assumption 3. For any individual, and for any pair of alternatives j, k ∈ J,

v_j > v_k if and only if P(r_j < r_k | X) > P(r_k < r_j | X).    (4)

⁴Throughout this paper, we use n to denote an individual, and j, k, l to denote alternatives.
⁵Multiplying both β and ε by any positive constant leads to the same rank-ordered choice data.
⁶For instance, in the binomial probit model, the variance of the conditional distribution is assumed to be one.
In words, alternative j generates more systematic utility than alternative k if and only if there is a higher chance that j is preferred to k (r_j < r_k) than the reverse (r_k < r_j), conditional on all explanatory variables. Assumption 3 immediately implies that alternatives j and k have the same systematic utility if and only if the probability that alternative j is ranked above alternative k is the same as the probability that alternative k is ranked above alternative j, i.e., P(r_j < r_k | X) = P(r_j > r_k | X) if and only if v_j = v_k.

Two special types of rank-ordered choice data are worth highlighting. First, when M = 1, the individual reports only her best alternative and we have multinomial choice data. In this case, alternative j is ranked above alternative k (r_j < r_k) if and only if j is ranked as the best alternative (r_j = 1), so we have

P(r_j < r_k | X) = P(r_j = 1 | X).    (5)

When we replace P(r_j < r_k | X) with P(r_j = 1 | X) and P(r_k < r_j | X) with P(r_k = 1 | X) in (4), Assumption 3 becomes the monotonicity property of the choice probabilities (Manski, 1975), i.e., the ranking of the choice probability of an alternative is the same as the ranking of the systematic utility of the alternative for any given individual.⁷ Second, when M = J − 1, the individual ranks all alternatives from best to worst, and we have fully rank-ordered choice data.

⁷See Fox (2007) for a detailed discussion of the sufficient conditions for the monotonicity property of the choice probabilities.
With this complete ranking information, we can compare the utilities between any two alternatives. Without loss of generality, let us focus on a pair of alternatives (j, k) such that j < k. Alternative j is ranked above alternative k if and only if the utility from choosing alternative j is larger than the utility from choosing alternative k,⁸ so we have

P(r_j < r_k | X) = P(u_j ≥ u_k | X)
                = P(ε_k − ε_j ≤ v_j − v_k | X).    (6)

The only if part holds under the definition of the ranking r, and the if part is a direct result of complete ranking. The first equality of (6) may not hold if we only observe a partial ranking, i.e., M < J − 1. This is because the event r_j < r_k naturally implies u_j ≥ u_k, but the event u_j ≥ u_k may not imply r_j < r_k. When neither alternative j nor alternative k belongs to the set M, both of them are observed with the same ranking, M + 1, even if u_j > u_k.

For any pair of alternatives, assume that the conditional distribution function of ε_k − ε_j is strictly increasing. Then the well-known (pairwise) zero conditional median (ZCM) restriction, median(ε_k − ε_j | X) = 0, is a necessary and sufficient condition for Assumption 3 when a complete ranking of all the alternatives is available.⁹ The proof is straightforward. Notice that P(r_j < r_k | X) + P(r_k < r_j | X) = 1 when the choice set is fully rank-ordered. For necessity, Assumption 3 implies that v_j − v_k = 0 holds if and only if P(r_j < r_k | X) = 1/2, or equivalently, P(ε_k − ε_j ≤ v_j − v_k | X) = 1/2 by (6). For sufficiency, the ZCM assumption implies that v_j > v_k if and only if P(r_j < r_k | X) > 1/2 by (6), or equivalently, P(r_j < r_k | X) > P(r_k < r_j | X).
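The sufficiency direction can also be checked by simulation. In the sketch below (our own illustration, with hypothetical values), the error difference ε_k − ε_j is skewed and heteroskedastic across individuals but has median zero; by (6), the simulated frequency of r_j < r_k under complete rankings exceeds 1/2 exactly as Assumption 3 requires when v_j > v_k.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000
v_j, v_k = 0.3, 0.0  # systematic utilities with v_j > v_k (hypothetical values)

# epsilon_k - epsilon_j: median zero but skewed (shifted log-normal), with an
# individual-specific scale (interpersonal heteroskedasticity)
sigma = rng.uniform(0.5, 2.0, size=N)
diff = sigma * (rng.lognormal(0.0, 1.0, size=N) - 1.0)  # median of LN(0,1) is 1, so median(diff) = 0

# By (6), P(r_j < r_k | X) = P(eps_k - eps_j <= v_j - v_k | X) under complete rankings
share_j_better = np.mean(diff <= v_j - v_k)
print(share_j_better > 0.5)  # True: the Assumption 3 inequality holds
```

Note that no parametric knowledge of the error distribution is used beyond the median-zero property, which is the point of the semiparametric framework.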
Next, we describe the intuition of applying Assumption 3 to construct the GMS estimator for β. Let 1(·) be the indicator function that equals one if the event in the parenthesis is true and zero otherwise, and let b ≡ (b_1, b̃′)′ be any vector in the parameter space B. Under Assumption 3, if x_j′β > x_k′β is true, then the event r_j < r_k is more likely to occur than the event r_k < r_j; if x_k′β > x_j′β is true, then the event r_k < r_j is more likely to be true than the event r_j < r_k; and if x_j′β = x_k′β holds, then the event r_j < r_k has the same chance to be true as the event r_k < r_j. Therefore, the expected value of the following match

m_jk(b) = 1(r_j < r_k) · 1(x_j′b > x_k′b) + 1(r_k < r_j) · 1(x_k′b > x_j′b) + 1(r_j < r_k) · 1(x_j′b = x_k′b)
        = 1(r_j < r_k) · 1(x_j′b ≥ x_k′b) + 1(r_k < r_j) · 1(x_k′b > x_j′b)    (7)

should be maximized at the true preference parameter vector β. Define x_{nj}′b as the b-utility index of alternative j for individual n. Applying the analogy principle, we propose a semiparametric estimator, b_N ≡ (b_{N,1}, b̃_N′)′ ∈ B, for β as follows:

b_N ∈ argmax_{b∈B} Q_N(b),    (8)

where

Q_N(b) = N^{−1} Σ_{n=1}^{N} Σ_{1≤j<k≤J} [1(r_{nj} < r_{nk}) · 1(x_{nj}′b ≥ x_{nk}′b) + 1(r_{nk} < r_{nj}) · 1(x_{nk}′b > x_{nj}′b)].    (9)

⁸If j > k, then P(r_j < r_k | X) = P(u_j > u_k | X). This is because we break ties using the function A(·, T(j)), and rank alternative k above alternative j if k < j when k ∈ T(j).
⁹This proof does not apply to partially rank-ordered choice data, e.g., multinomial choice data, because the first equality in (6) does not hold. Goeree et al. (2005) give an example showing that the ZCM assumption is not sufficient for the monotonicity property of the choice probabilities.
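To make the estimator concrete, the sample objective in (9) can be evaluated directly. The sketch below is our own illustration (the data and the coarse grid are hypothetical): it codes Q_N(b) and maximizes it by brute force over b = (b_1, b̃)′ with b_1 ∈ {−1, 1}, mirroring the normalization in Assumption 2.

```python
import itertools
import numpy as np

def gms_objective(b, R, X):
    """Sample objective Q_N(b) from (9). R: (N, J) reported rankings;
    X: (N, J, q) explanatory variables; b: (q,) parameter vector."""
    N, J, _ = X.shape
    V = X @ b  # b-utility indexes, shape (N, J)
    Q = 0.0
    for n in range(N):
        for j, k in itertools.combinations(range(J), 2):
            Q += float(R[n, j] < R[n, k] and V[n, j] >= V[n, k])
            Q += float(R[n, k] < R[n, j] and V[n, k] > V[n, j])
    return Q / N

def gms_grid_search(R, X, grid):
    """Brute-force maximizer over b = (b1, b2)' with |b1| = 1 (Assumption 2)."""
    candidates = [np.array([b1, b2]) for b1 in (-1.0, 1.0) for b2 in grid]
    return max(candidates, key=lambda b: gms_objective(b, R, X))

# Tiny noiseless demo with hypothetical data: J = 3, q = 2, true beta = (1, 0.5)'
X = np.array([[[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
              [[1.0, 1.0], [0.0, 2.0], [2.0, 0.0]]])
beta = np.array([1.0, 0.5])
U = X @ beta                                        # utilities without noise
R = np.argsort(np.argsort(-U, axis=1), axis=1) + 1  # complete rankings, best = 1
print(gms_objective(beta, R, X))  # 3.0 = J(J-1)/2: every pair is matched at the truth
```

In this noiseless demo, Q_N(β) attains its upper bound J(J − 1)/2 = 3, but so does any b̃ in an interval around 0.5: the step-function objective is flat over sets of observationally equivalent parameters, which is why point identification requires a continuous covariate as in Assumption 4 below.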
In the special case of M = 1, i.e., when we have multinomial choice data, the estimator b_N defined by (8) becomes the pairwise maximum score (MS) estimator of Fox (2007). When J = 2, i.e., when we have binomial choice data, the estimator b_N becomes the MS estimator of Manski (1985). For this reason, the estimator b_N will be called the generalized maximum score (GMS) estimator.

When all the explanatory variables are discrete, we can always find another parameter vector in the neighborhood of β which generates the same ranking of utility indexes as the true parameter vector.
To achieve point identification, we need to impose an extra assumption on the explanatory variables; namely, we need a continuous explanatory variable conditional on the other explanatory variables. Next, we define a few notations and state the restrictions on the explanatory variables formally in Assumption 4.

Since only the difference in utilities matters to the observed outcome of random utility maximization, we shall assume x_J = 0 without any loss of generality.¹⁰ Next, let x_{jk} ≡ (x_{jk,1}, . . . , x_{jk,q})′ ∈ R^q denote the difference between the explanatory variable vectors of alternatives j and k, that is, x_{jk} = x_j − x_k. In Assumption 2, we assumed that the first parameter has a nonzero value. For each alternative j ∈ J, partition the vector x_j into (x_{j,1}, x̃_j′)′, where x_{j,1} is the first element of x_j and x̃_j ≡ (x_{j,2}, . . . , x_{j,q})′ ∈ R^{q−1} refers to the remainder. So the first element of x_{jk} is x_{jk,1} = x_{j,1} − x_{k,1}, and its remaining elements are x̃_{jk} ≡ (x_{jk,2}, . . . , x_{jk,q})′ = x̃_j − x̃_k. Denote X̃ ≡ (x̃_1, . . . , x̃_J)′ ∈ R^{J×(q−1)}. Vectors x_{nj}, x_{njk}, and x̃_{njk} are the nth observation of vectors x_j, x_{jk}, and x̃_{jk}, respectively. Matrices X_n and X̃_n are the nth observation of matrices X and X̃, respectively.

Assumption 4. The following statements are true.
(a) For any pair of alternatives j, k ∈ J, g_{jk}(x_{jk,1} | x̃_{jk}) denotes the density function of x_{jk,1} conditional on x̃_{jk}, and g_{jk}(x_{jk,1} | x̃_{jk}) is nonzero everywhere on R for almost every x̃_{jk}.
(b) For any constant vector c ≡ (c_1, . . . , c_q)′ ∈ R^q, Xc = 0 with probability one if and only if c = 0.

Assumption 4 is sufficient to show that any other vector b ∈ B would yield a different value of the probability limit of the objective function Q_N(b) than the true parameter vector β does. Assumption 4(a) rules out the local failure of identification, which is important in the semiparametric setting. Assumption 4(b) is analogous to the full-rank condition for the binomial choice model, which prevents the global failure of identification. The following theorem establishes the strong consistency of the GMS estimator. The Appendix provides the proofs of all theorems stated in the main text.

¹⁰If x_J ≠ 0 initially, one can recode x_j as x_j − x_J for all j ∈ J including j = J.
Theorem 1. Let Assumptions 1-4 hold. The GMS estimator b_N defined in (8) converges almost surely to β, the true preference parameter vector in the data generating process.
2.3 Comparisons with Parametric Methods

From the empiricist's perspective, the question of paramount interest would be how flexible the semiparametric model is in comparison with parametric models that one may consider. Modern desktop computing power makes this question especially relevant. Standard computing resources of today can handle the estimation of models that feature fairly flexible, albeit parametric, error structures.

When applied to data on complete rankings (i.e. M = J − 1), the GMS estimator postulates a semiparametric model which nests all popular parametric models and any finite mixture of those models, provided that the explanatory variables satisfy regularity conditions such as Assumption 4. In most studies on rank-ordered choices, the complete rankings are elicited as required for this result.¹¹ Such a degree of flexibility
is not something to be taken for granted. For instance, the MS estimator (Manski, 1975; Fox, 2007) using multinomial choice data is consistent for a family of parametric models featuring exchangeable errors (e.g. multinomial logit and multinomial probit with equicorrelated errors), but not for those parametric models that feature more flexible error structures (e.g. nested multinomial logit, multinomial probit with a general error covariance matrix, and mixed logit).

This section elaborates on the semiparametric model that the GMS estimator postulates, and its comparisons with popular parametric models. To clarify the notion of interpersonal heteroskedasticity here (and later, unobserved interpersonal heterogeneity), we reinstate individual subscript n. With a slight abuse of notations, an observationally equivalent form of equation (1) may be specified to express the utility that individual n derives from alternative j as

u_{nj} = σ_n × (x_{nj}′β) + ε_{nj} for n = 1, 2, . . . , N and j ∈ J,    (10)

where the new parameter σ_n ∈ R^1_+ captures that portion of the overall scale of utility which varies across individuals.¹² Equivalently, σ_n may also be described as a parameter that is inversely proportional to that portion of the error variance which varies across individuals. Consistent estimation of a parametric model requires the correct specification of both the joint density of the errors ε_n|X_n and the functional form of σ_n. The GMS estimator allows both requirements to be relaxed substantially.

Regardless of the depth of rankings observed (i.e. for every M such that 1 ≤ M ≤ J − 1), the GMS estimator is consistent for the semiparametric model that accommodates any form of interpersonal heteroskedasticity via σ_n. For verification, note that when v_{nj} ≡ x_{nj}′β and v_{nk} ≡ x_{nk}′β satisfy the inequality stated in Assumption 3, so does any positive multiple of this pair, σ_n × v_{nj} and σ_n × v_{nk}. The GMS estimator, therefore, allows the empiricist to be agnostic about the exact functional form of σ_n. This is a desirable property because in most studies, σ_n demands attention only to the extent that it must be correctly specified for the consistent estimation of the preference parameters β.

¹¹See, for example, Hausman and Ruud (1987), Calfee et al. (2001), Capparros et al. (2008), Scarpa et al. (2011), Yoo and Doiron (2013) and Oviedo and Yoo (2016).
¹²Since an affine transformation of utilities does not alter observed behavior, the random utility specification (10) is observationally equivalent to u_{nj} = x_{nj}′β + ε_{nj}/σ_n. The slight abuse of notations refers to the fact that ε_j in equation (1) corresponds to ε_{nj}/σ_n, rather than ε_{nj} alone. Note that the presence of a parameter like σ_n does not affect any of our earlier results because they do not rely on ε_{nj} having a standardized scale.
The remainder of this section assumes the use of complete rankings ($M = J - 1$). This allows the semiparametric model to accommodate any model that satisfies the pairwise zero conditional median (ZCM) restriction, i.e.

$$\operatorname{median}(\varepsilon_{nk} - \varepsilon_{nj} \mid X_n) = 0 \quad \text{for any } j, k \in J, \tag{11}$$

which is then a necessary and sufficient condition for Assumption 3 as long as the distribution of $(\varepsilon_{nk} - \varepsilon_{nj}) \mid X_n$ is a strictly increasing function: see Section 2.2. In comparison, any parametric model involves a much stronger set of restrictions affecting other moments too, since the density of $\varepsilon_n \mid X_n$ is specified in full detail.
The semiparametric model based on equation (11) offers considerable flexibility not only over possible distributions of idiosyncratic errors, but also over possible distributions of random coefficients. To see this latter aspect, note that one may view $\varepsilon_n$ as composite errors comprising individual-specific coefficient heterogeneity $\eta_n$ (that has the same dimension as $\beta$) and purely idiosyncratic errors $\epsilon_n$ (that has the same dimension as $\varepsilon_n$) such that

$$\varepsilon_n \equiv X_n \eta_n + \epsilon_n, \tag{12}$$

where a typical entry in $\varepsilon_n$ is $\varepsilon_{nj} \equiv x_{nj}'\eta_n + \epsilon_{nj}$. Suppose now that the idiosyncratic errors $\epsilon_n$ satisfy the pairwise ZCM restriction, $\operatorname{median}(\epsilon_{nk} - \epsilon_{nj} \mid X_n) = 0$ for any $j, k \in J$, and that the usual random coefficient modeling assumption, $(\eta_n \perp \epsilon_n) \mid X_n$, holds. Then, as long as individual heterogeneity has ZCM, i.e. $\operatorname{median}(\eta_n \mid X_n) = 0$, the composite errors $\varepsilon_n$ satisfy the pairwise ZCM restriction in equation (11) too: differencing two composite errors results in a linear combination of conditionally independent random variables, $(x_{nk} - x_{nj})'\eta_n$ and $(\epsilon_{nk} - \epsilon_{nj})$, each of which has a conditional median of zero.^{13} In comparison, a parametric random coefficient model places more rigid restrictions on the distribution of individual heterogeneity $\eta_n$, because the density of $\eta_n \mid X_n$ needs to be specified in full detail much as that of $\epsilon_n \mid X_n$.
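The ZCM property of the differenced composite errors is easy to check numerically. The sketch below is our own minimal simulation (not part of the paper), assuming standard normal heterogeneity $\eta_n$ and i.i.d. type 1 extreme value idiosyncratic errors, with one arbitrarily fixed attribute pair $(x_j, x_k)$ to mimic conditioning on $X_n$:

```python
import numpy as np

rng = np.random.default_rng(1)
S = 200_000

# Fix one attribute pair (x_j, x_k) to mimic conditioning on X_n;
# the numerical values are arbitrary illustrations.
x_j = np.array([1.0, -0.5])
x_k = np.array([-0.3, 2.0])

eta = rng.normal(size=(S, 2))    # eta_n | X_n ~ N(0, I): conditional median zero
e_j = rng.gumbel(size=S)         # i.i.d. type 1 extreme value draws, so that
e_k = rng.gumbel(size=S)         # e_k - e_j is standard logistic, median zero

# Differenced composite error: (x_k - x_j)' eta_n + (e_k - e_j)
diff = eta @ (x_k - x_j) + (e_k - e_j)

print(np.median(diff))           # close to zero, as the ZCM argument implies
```

Both summands are symmetric around zero and conditionally independent, so the empirical median of `diff` shrinks toward zero as `S` grows.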
It is easy to verify that the semiparametric model accommodates the classic troika of parametric random utility models: logit, nested logit and probit. All three models assume away interpersonal heteroskedasticity by setting $\sigma_n = 1$ for all $n = 1, 2, \ldots, N$, and assume an idiosyncratic error density $\varepsilon_n \mid X_n$ that implies the pairwise ZCM condition. In the case of logit, the idiosyncratic errors are i.i.d. extreme value type 1 over alternatives and, as the celebrated result of McFadden (1974) shows, differencing two errors results in a standard logistic random variable that is symmetric around 0. The nested logit directly generalizes the logit model by specifying the joint density of $\varepsilon_n \mid X_n$ as a generalized extreme value (GEV) distribution. This distribution allows for a positive correlation between $\varepsilon_{nj}$ and $\varepsilon_{nk}$ in case alternatives $j$ and $k$ belong to the same nest, or pre-specified subset of $J$. Differencing two GEV errors still results in a logistic random variable that is symmetric around 0, though it may not have the unit scale. Finally, in its unrestricted form, the probit model generalizes the nested logit model by specifying the multivariate normal density $\varepsilon_n \mid X_n \sim N(0, V_\varepsilon)$ that allows for heteroskedasticity of $\varepsilon_{nj}$ over alternatives $j$, and also for any sign of correlation between $\varepsilon_{nj}$ and $\varepsilon_{nk}$. Differencing two zero-mean multivariate normal variables results in a zero-mean normal variable, which is symmetric around its mean.

Random coefficient, or mixed, logit models have become the workhorse of empirical modeling in the recent decade. The semiparametric model accommodates the most popular variant of mixed logit models, as well as its extensions. In the context of the error decomposition (12), a mixed logit model has idiosyncratic errors $\epsilon_n \mid X_n$ that are i.i.d. extreme value type 1 over alternatives, and incorporates a non-degenerate mixing distribution of random heterogeneity $\eta_n \mid X_n$. While the mixing distribution may take any parametric form, specifying $\eta_n \mid X_n \sim N(0, V_\eta)$ is by far the most popular choice, so much so that the generic name mixed logit is often associated with this normal-mixture logit model. Differencing the normal-mixture logit model's composite errors results in a linear combination of conditionally independent zero-mean normal and standard logistic random variables, which has a conditional median of zero. Fiebig et al. (2010) augment the normal-mixture logit model with a log-normally distributed interpersonal heteroskedasticity parameter $\sigma_n$, and find that the resulting Generalized Multinomial Logit model is capable of capturing the multimodality of preferences. Because the semiparametric model allows for any form of $\sigma_n$, it nests the Generalized Multinomial Logit model too. Greene et al. (2006) extend the normal-mixture model in another direction, by allowing the variance-covariance of the random coefficients, $Var(\eta_n \mid X_n)$, to vary with $X_n$. The semiparametric model nests their heteroskedastic normal-mixture logit model too, since this type of generalization does not affect the conditional median of $\eta_n$.

^{13} The coefficients $\beta$ may be interpreted as the median of population preference parameters, vis-à-vis $\eta_n$ that measure individual-specific deviations around them.
The semiparametric model also accommodates any finite mixture of the aforementioned parametric models, and more generally that of all parametric models satisfying the pairwise ZCM restriction. In other words, it allows for the possibility that the data generating process comprises different parametric models for different individuals.^{14} This flexibility comes from the fact that the GMS estimator does not require the density of $\varepsilon_n \mid X_n$ to be identical across all individuals $n = 1, 2, \ldots, N$, as long as each individual's density satisfies the pairwise ZCM restriction. While the finite mixture of parametric models approach has not been applied to the analysis of multinomial choice or rank-ordered choice data, it has motivated influential studies in the binomial choice analysis of decision making under risk (Harrison and Rutström, 2009; Conte et al., 2011). The findings from that literature unambiguously suggest that postulating only one parametric model for all individuals may be an unduly restrictive assumption.
3 The Smoothed GMS Estimator
The maximum score (MS) type estimator is $N^{1/3}$-consistent, and its asymptotic distribution is studied in Cavanagh (1987) and Kim and Pollard (1990). Kim and Pollard have shown that $N^{1/3}$ times the centered MS estimator converges in distribution to the random variable that maximizes a certain Gaussian process for binomial choice data. Their general theorem can be applied to multinomial choice data and rank-ordered choice data too. However, the resulting asymptotic distribution is too complicated to be used for inference in empirical applications. Abrevaya and Huang (2005) prove that the standard bootstrap is not consistent for the MS estimator. Delgado et al. (2001) show that subsampling consistently estimates the asymptotic distribution of the test statistic of the MS estimator for binomial choice data. But subsampling involves an efficiency loss, and its computational cost is very high for the MS or GMS estimator because a global search method is needed to solve the maximization problem for each subsample. In this section, we propose an estimator that complements the GMS estimator by addressing these practical limitations, in return for making some additional assumptions. In the context of Manski's (1985) binomial choice MS estimator, Horowitz (1992) develops a smoothed maximum score (SMS) estimator that replaces the step functions with smooth functions. Yan (2012) applies this technique to derive a smoothed version of Fox's (2007) multinomial choice MS estimator. We use the same approach to derive a smoothed GMS (SGMS) estimator, which offers similar benefits as its SMS predecessors. Specifically, we show that the SGMS estimator has a convergence rate faster than $N^{-1/3}$ under extra smoothness conditions, and also that it is asymptotically normal.

^{14} For example, the nested logit model may generate 1/3 of the sample while the normal-mixture logit may generate the rest.
3.1 The Smoothed GMS Estimator and its Asymptotic Properties
The objective function in (9) can be rewritten as

$$Q_N(b) = N^{-1} \sum_{n=1}^{N} \sum_{1 \le j < k \le J} \left\{ [1(r_{nj} < r_{nk}) - 1(r_{nk} < r_{nj})] \cdot 1(x_{njk}'b \ge 0) + 1(r_{nk} < r_{nj}) \right\} \tag{13}$$

by replacing $1(x_{nkj}'b > 0)$ with $[1 - 1(x_{njk}'b \ge 0)]$. The indicator function of $b$ in (13) can be replaced by a sufficiently smooth function $K(\cdot)$, where $K(\cdot)$ is analogous to a cumulative distribution function. Let $h_N$ be a positive bandwidth that goes to zero when the sample size $N$ goes to infinity. Application of the smoothing idea in Horowitz (1992) to the right-hand side of (13) yields a smoothed version of the GMS (SGMS) estimator

$$b_N^S \in \operatorname*{argmax}_{b \in B} Q_N^S(b, h_N), \tag{14}$$

where

$$Q_N^S(b, h_N) = N^{-1} \sum_{n=1}^{N} \sum_{1 \le j < k \le J} \left\{ [1(r_{nj} < r_{nk}) - 1(r_{nk} < r_{nj})] \cdot K\!\left(x_{njk}'b/h_N\right) + 1(r_{nk} < r_{nj}) \right\}. \tag{15}$$
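As an illustration, the smoothed objective (15) can be coded directly. The sketch below is our own minimal implementation rather than the authors' code, taking $K$ to be the standard normal CDF (the choice used later in the Monte Carlo section) and interpreting $x_{njk}$ as the attribute difference $x_{nj} - x_{nk}$:

```python
import numpy as np
from math import erf, sqrt

def K(t):
    """Smoothing function: the standard normal CDF satisfies Condition 1."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def sgms_objective(b, X, R, h):
    """Q^S_N(b, h) as in equation (15).

    X : (N, J, q) attributes; R : (N, J) ranks (smaller = better);
    b : (q,) parameter vector; h : bandwidth h_N. Tied ranks contribute
    zero, so partial rankings are handled automatically.
    """
    N, J, _ = X.shape
    total = 0.0
    for n in range(N):
        for j in range(J):
            for k in range(j + 1, J):
                d_jk = float(R[n, j] < R[n, k]) - float(R[n, k] < R[n, j])
                x_njk = X[n, j] - X[n, k]      # attribute difference
                total += d_jk * K(x_njk @ b / h) + float(R[n, k] < R[n, j])
    return total / N
```

Replacing `K` with the indicator `x_njk @ b >= 0` recovers the unsmoothed GMS objective (13).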
The next condition states the requirements that the smoothing function $K(\cdot)$ should satisfy for the SGMS estimator $b_N^S$ to be consistent.

Condition 1. Let $\{h_N : N = 1, 2, \ldots\}$ be a sequence of strictly positive real numbers satisfying $\lim_{N \to \infty} h_N = 0$, and let $K(x)$ be a function on $\mathbb{R}$ such that: (a) $|K(x)| < C$ for some finite $C$ and all $x \in (-\infty, \infty)$; and (b) $\lim_{x \to -\infty} K(x) = 0$ and $\lim_{x \to \infty} K(x) = 1$.

Theorem 2. Let Assumptions 1-4 and Condition 1 hold. The SGMS estimator $b_N^S \in B$ defined in (14) converges almost surely to the true preference parameter vector $\beta$.

By Theorem 2, the consistency of the SGMS estimator holds under the same set of assumptions as the GMS estimator, as long as the smoothing function is properly chosen.
Since any cumulative distribution function (e.g. the standard normal distribution function) satisfies Condition 1, the SGMS estimator does not require more assumptions to achieve consistency than the GMS estimator does. Extra assumptions, however, are required in order to derive the asymptotic distribution of the SGMS estimator. Assume that $K(\cdot)$ is twice differentiable. Next, define the first- and second-order derivatives of $Q_N^S(b, h_N)$ with respect to $\tilde{b}$ as $t_N(b, h_N)$ and $H_N(b, h_N)$, respectively, where the vector

$$t_N(b, h_N) = (N h_N)^{-1} \sum_{n=1}^{N} \sum_{1 \le j < k \le J} [1(r_{nj} < r_{nk}) - 1(r_{nk} < r_{nj})] \cdot K'\!\left(x_{njk}'b/h_N\right) \tilde{x}_{njk} \tag{16}$$

and the matrix

$$H_N(b, h_N) = (N h_N^2)^{-1} \sum_{n=1}^{N} \sum_{1 \le j < k \le J} [1(r_{nj} < r_{nk}) - 1(r_{nk} < r_{nj})] \cdot K''\!\left(x_{njk}'b/h_N\right) \tilde{x}_{njk} \tilde{x}_{njk}'. \tag{17}$$
Let $b_{N,1}^S$ denote the first element of $b_N^S \in B$, and $\tilde{b}_N^S$ denote the vector of the other elements. The objective function (15) of the SGMS estimator $b_N^S$ is a smooth function. To derive the first order condition, we make the following assumption:

Assumption 5. $\tilde{\beta}$ is an interior point of $\tilde{B}$.

By Theorem 2 and Assumption 5, $b_{N,1}^S = \beta_1$, $\tilde{b}_N^S$ is an interior point of $\tilde{B}$, and $t_N(b_N^S, h_N) = 0$ with probability approaching 1 as $N \to \infty$. A Taylor series expansion of $t_N(b_N^S, h_N)$ around the true parameter $\beta$ yields

$$t_N(b_N^S, h_N) = t_N(\beta, h_N) + H_N(b_N^*, h_N)(\tilde{b}_N^S - \tilde{\beta}), \tag{18}$$

where $b_N^* \equiv \{b_{N,1}^*, \tilde{b}_N^*\}$, $b_{N,1}^* = b_{N,1}^S = \beta_1$, and $\tilde{b}_N^*$ is a vector between $\tilde{b}_N^S$ and $\tilde{\beta}$. Suppose there is a function $\rho(N)$ such that $\rho(N) t_N(\beta, h_N)$ converges in distribution and also that $H_N(b_N^*, h_N)$ converges in probability to a nonsingular, nonstochastic matrix $H$. Then,

$$\rho(N)(\tilde{b}_N^S - \tilde{\beta}) = -H^{-1} \rho(N) t_N(\beta, h_N) + o_p(1). \tag{19}$$

It is essential to derive the limiting distribution of $\rho(N) t_N(\beta, h_N)$ and the probability limit of $H_N(b_N^*, h_N)$
by (18) and (19) to obtain the asymptotic distribution of the SGMS estimator. Later, we will show that $\rho(N) t_N(\beta, h_N)$ is asymptotically normal if the bandwidth $h_N$ is properly chosen according to the smoothness conditions imposed on the distributions of the continuous explanatory variable and the error terms. Roughly put, the fastest convergence rate of $\tilde{b}_N^S - \tilde{\beta}$ to 0 is $\rho(N)^{-1} = N^{-d/(2d+1)}$ when the conditional probability of a ranking comparison in (4) is $d$th ($d \ge 2$) order differentiable with respect to the systematic utility and the conditional density of the continuous explanatory variable is $(d-1)$th order differentiable. Therefore, a higher convergence rate (corresponding to a larger $d$) is achieved at the cost of making stronger smoothness assumptions on the distributions of the continuous explanatory variable and the error terms. By properly choosing the bandwidth $h_N \propto N^{-1/(2d+1)}$ and the smoothing function $K(\cdot)$ (according to Condition 2 given below), we can conclude that $\rho(N) t_N(\beta, h_N)$ is asymptotically normal. We require the integer $d$ to be no less than 2. If $d = 1$, the random matrix $H_N(b_N^*, h_N)$ does not converge to a non-stochastic matrix $H$, and has an unknown limiting distribution instead; it follows that the limiting distribution of $\rho(N)(\tilde{b}_N^S - \tilde{\beta})$ is also unknown by (18).

In the binomial choice setting, the SMS estimator is derived from a single latent variable equation, where the conditional choice probability of alternative 1,

$$P(r_1 = 1 \mid x) = P(-\bar{\varepsilon} \le x'\beta \mid x), \tag{20}$$

can be expressed as the conditional distribution of the error term $\bar{\varepsilon}$ given a single vector $x$.^{15} This conditional distribution function plays an important role in expressing the limiting distribution of the properly normalized SMS estimator.

^{15} Equation (20) uses the common notation adopted in binomial choice analysis. To connect with our notation, $x$ should be interpreted as $x_1 - x_2$ and $\bar{\varepsilon}$ should be interpreted as $\varepsilon_1 - \varepsilon_2$.
The SGMS estimator is derived from a model with multiple latent vectors. Outside the special case of complete rankings, calculating the probability of a ranking comparison, e.g. $P(r_1 < r_2 \mid X)$, is even more complicated than calculating a choice probability. Consider an example where the individual only reveals her best and second best alternatives from a set with four alternatives. By the definition of the ranking $r$ in (3), we have

$$\begin{aligned} P(r_1 < r_2 \mid X) &= P(u_1 \ge u_2 \ge \max\{u_3, u_4\} \mid X) \\ &\quad + P(u_1 \ge u_3 > \max\{u_2, u_4\} \mid X) + P(u_1 \ge u_4 > \max\{u_2, u_3\} \mid X) \\ &\quad + P(u_3 > u_1 \ge \max\{u_2, u_4\} \mid X) + P(u_4 > u_1 \ge \max\{u_2, u_3\} \mid X). \end{aligned} \tag{21}$$

Calculating $P(r_1 < r_2 \mid X)$ by (21) using the joint distribution (or density) function of the error terms $\varepsilon$ is not an easy task. Fortunately, it is not needed for deriving the asymptotic distribution of $\rho(N) t_N(\beta, h_N)$.

By (16), the convergence rate of $t_N(\beta, h_N)$ to 0 depends on the product of the kernel function $K'(x_{jk}'\beta/h_N)$ and the difference $P(r_j < r_k \mid X) - P(r_k < r_j \mid X)$, the pairwise difference between ranking comparisons. For each pair of alternatives $(j, k)$, the kernel function $K'(x_{jk}'\beta/h_N)$ approaches 0 as $N$ goes to infinity as long as $x_{jk}'\beta$ is nonzero. The difference $P(r_j < r_k \mid X) - P(r_k < r_j \mid X)$ is 0 if $x_{jk}'\beta$ is 0, by Assumption 3. If the difference is $d$th order differentiable with respect to $x_{jk}'\beta$, we choose a $d$th order kernel $K'(\cdot)$ and an appropriate bandwidth $h_N$. Analogous to the results on kernel density estimation, the SGMS estimator's bias is $O(h_N^d)$, its variance is $O[(N h_N)^{-1}]$, and its fastest convergence rate is $N^{-d/(2d+1)}$.
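The decomposition (21) is straightforward to verify by simulation. The following check is ours, assuming (purely for illustration) i.i.d. standard normal utilities over $J = 4$ alternatives, with only the best and second best alternatives observed:

```python
import numpy as np

rng = np.random.default_rng(0)
S = 10_000
u = rng.normal(size=(S, 4))                 # utilities u_1, ..., u_4

# Observed ranking with M = 2: best gets rank 1, second best rank 2,
# and the two unranked alternatives share the worst rank 3.
order = np.argsort(-u, axis=1)
r = np.full((S, 4), 3)
r[np.arange(S), order[:, 0]] = 1
r[np.arange(S), order[:, 1]] = 2

lhs = r[:, 0] < r[:, 1]                     # the event {r_1 < r_2}

u1, u2, u3, u4 = u.T                        # the five cases in (21)
rhs = ((u1 >= u2) & (u2 >= np.maximum(u3, u4))
       | (u1 >= u3) & (u3 > np.maximum(u2, u4))
       | (u1 >= u4) & (u4 > np.maximum(u2, u3))
       | (u3 > u1) & (u1 >= np.maximum(u2, u4))
       | (u4 > u1) & (u1 >= np.maximum(u2, u3)))

print((lhs == rhs).all())                   # the five cases exhaust {r_1 < r_2}
```

The five events are mutually exclusive (each pins down which alternative is best and which is second best), so their union reproduces $\{r_1 < r_2\}$ draw by draw.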
To facilitate a formal derivation of the asymptotic distribution of the SGMS estimator, we first introduce a series of extra notations. Let $v_j \equiv x_j'\beta$ be the systematic utility of choosing alternative $j$ for fixed $\beta$. Denote $v \equiv (v_1, \ldots, v_{J-1}, v_J)'$. For example, $v_J$ is 0 since $x_J$ is normalized to be 0. There is a one-to-one correspondence between $X$ and $(v, \tilde{X})$. Define $\iota_J \equiv (1, \ldots, 1)' \in \mathbb{R}^J$. For any alternative $j \in J$, let $v_{-j}$ be the vector $v - \iota_J v_j$. For example, when $1 < j < J$,

$$v_{-j} = (v_1 - v_j, \ldots, v_{j-1} - v_j, 0, v_{j+1} - v_j, \ldots, v_J - v_j)'.$$

In words, $v_{-j}$ is computed by subtracting the systematic utility of alternative $j$ from the raw vector of systematic utilities. For any pair of alternatives $j, k \in J$, define $v_{-j,k} = v_k - v_j$ and $\tilde{v}_{-j,k}$ as the vector that consists of all elements of $v_{-j}$ excluding $v_{-j,k}$. For example, when $1 < j < k < J$,

$$\tilde{v}_{-j,k} \equiv (v_1 - v_j, \ldots, v_{k-1} - v_j, v_{k+1} - v_j, \ldots, v_J - v_j)'.$$

If $J > 2$, for any three different alternatives $j, k, l \in J$, define $\tilde{v}_{-j,kl}$ as the vector that consists of all of the elements of $v_{-j}$ excluding $v_{-j,k}$ and $v_{-j,l}$. For example, when $1 < j < k < l < J$,

$$\tilde{v}_{-j,kl} \equiv (v_1 - v_j, \ldots, v_{k-1} - v_j, v_{k+1} - v_j, \ldots, v_{l-1} - v_j, v_{l+1} - v_j, \ldots, v_J - v_j)'.$$

Let $p_{jk}(v_{-j,k} \mid \tilde{v}_{-j,k}, \tilde{X})$ denote the conditional density of $v_{-j,k}$ given $(\tilde{v}_{-j,k}, \tilde{X})$. Define the derivatives

$$p_{jk}^{(i)}(v_{-j,k} \mid \tilde{v}_{-j,k}, \tilde{X}) = \partial^i p_{jk}(v_{-j,k} \mid \tilde{v}_{-j,k}, \tilde{X})/\partial(v_{-j,k})^i$$

and $p_{jk}^{(0)}(v_{-j,k} \mid \tilde{v}_{-j,k}, \tilde{X}) \equiv p_{jk}(v_{-j,k} \mid \tilde{v}_{-j,k}, \tilde{X})$. Let $p_{jkl}(v_{-j,k}, v_{-j,l} \mid \tilde{v}_{-j,kl}, \tilde{X})$ denote the joint density of $(v_{-j,k}, v_{-j,l})$ conditional on $(\tilde{v}_{-j,kl}, \tilde{X})$.

Given any pair of alternatives $j, k \in J$, there is a one-to-one correspondence between $X$ and $(v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X})$ for fixed $\beta \in B$. The conditional probability that alternative $j$ is ranked better than alternative $k$ depends on the explanatory matrix $X$, or equivalently, $(v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X})$. Define

$$F_{jk}(v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X}) \equiv P(r_j < r_k \mid v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X}) \tag{22}$$

and

$$\bar{F}_{jk}(v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X}) \equiv P(r_j < r_k \mid v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X}) - P(r_k < r_j \mid v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X}). \tag{23}$$
Next, for any integer $i > 0$, define the following derivatives:

$$\bar{F}_{jk}^{(i)}(v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X}) \equiv \partial^i \bar{F}_{jk}(v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X})/\partial(v_{-j,k})^i$$

whenever the derivatives exist. Likewise, define the scalar constants $k_d$ and $k_\Omega$ by

$$k_d = \int_{-\infty}^{\infty} x^d K'(x)\,dx \quad \text{and} \quad k_\Omega = \int_{-\infty}^{\infty} [K'(x)]^2\,dx,$$

whenever these quantities exist. Finally, define the $(q-1)$ vector $a$ and the $(q-1) \times (q-1)$ matrices $\Omega$ and $H$ as follows:

$$a = k_d \sum_{1 \le j < k \le J} \sum_{i=1}^{d} \frac{1}{i!(d-i)!}\, E\!\left[\bar{F}_{jk}^{(i)}(0, \tilde{v}_{-j,k}, \tilde{X})\, p_{jk}^{(d-i)}(0 \mid \tilde{v}_{-j,k}, \tilde{X})\, \tilde{x}_{jk}\right], \tag{24}$$

$$\Omega = \sum_{1 \le j < k \le J} 2 k_\Omega\, E\!\left[F_{jk}(0, \tilde{v}_{-j,k}, \tilde{X})\, p_{jk}(0 \mid \tilde{v}_{-j,k}, \tilde{X})\, \tilde{x}_{jk} \tilde{x}_{jk}'\right], \tag{25}$$

and

$$H = \sum_{1 \le j < k \le J} E\!\left[\bar{F}_{jk}^{(1)}(0, \tilde{v}_{-j,k}, \tilde{X})\, p_{jk}(0 \mid \tilde{v}_{-j,k}, \tilde{X})\, \tilde{x}_{jk} \tilde{x}_{jk}'\right] \tag{26}$$
whenever these quantities exist. Now, we turn to the derivation of the asymptotic distribution of the SGMS estimator $b_N^S$. We start off by making the following requirements on the smoothing function $K(\cdot)$, in addition to Condition 1.

Condition 2. The following statements are true.

(a) $K(x)$ is twice differentiable for $x \in \mathbb{R}$, $|K'(x)|$ and $|K''(x)|$ are uniformly bounded, and the integrals $\int_{-\infty}^{\infty} [K'(x)]^4\,dx$, $\int_{-\infty}^{\infty} x^2 |K''(x)|\,dx$, and $\int_{-\infty}^{\infty} [K''(x)]^2\,dx$ are finite.

(b) For some integer $d \ge 2$, $\int_{-\infty}^{\infty} |x^d K'(x)|\,dx < \infty$, $k_d \in (0, \infty)$, and $k_\Omega \in (0, \infty)$. For any integer $i$ ($1 \le i < d$), $\int_{-\infty}^{\infty} |x^i K'(x)|\,dx < \infty$ and $\int_{-\infty}^{\infty} x^i K'(x)\,dx = 0$.

(c) For any integer $i$ ($0 \le i \le d$), any $\eta > 0$, and any sequence $\{h_N\}$ converging to 0,

$$\lim_{N \to \infty} h_N^{i-d} \int_{|h_N x| > \eta} |x^i K'(x)|\,dx = 0 \quad \text{and} \quad \lim_{N \to \infty} h_N^{-1} \int_{|h_N x| > \eta} |K''(x)|\,dx = 0.$$
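For instance, the standard normal CDF used later as our smoothing function satisfies Condition 2(b) with $d = 2$: $K'(x) = \phi(x)$ has a vanishing first moment, $k_2 = \int x^2 \phi(x)\,dx = 1$, and $k_\Omega = \int \phi(x)^2\,dx = 1/(2\sqrt{\pi})$. A quick numerical confirmation (our own check, not part of the paper):

```python
import numpy as np

# Trapezoid-rule check of Condition 2(b) for K = standard normal CDF,
# so K'(x) = phi(x) and d = 2. The grid covers [-12, 12], where the
# neglected tails of phi are numerically negligible.
x = np.linspace(-12.0, 12.0, 1_000_001)
dx = x[1] - x[0]
phi = np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)

def integral(f):
    """Composite trapezoid rule on the grid above."""
    return (f[:-1] + f[1:]).sum() * dx / 2.0

k1 = integral(x * phi)            # i = 1 < d: odd moment of K' must vanish
k2 = integral(x**2 * phi)         # k_d for d = 2: the N(0,1) variance, i.e. 1
k_omega = integral(phi**2)        # k_Omega = 1 / (2 sqrt(pi))

assert abs(k1) < 1e-8
assert abs(k2 - 1.0) < 1e-6
assert abs(k_omega - 1.0 / (2.0 * np.sqrt(np.pi))) < 1e-8
```

A kernel with $d > 2$ would require $\int x^2 K'(x)\,dx = 0$, which forces $K'$ to take negative values somewhere; such higher-order kernels cannot come from a monotone CDF.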
Next, we state extra assumptions which are needed for the derivation, with brief comments on the implications of each assumption.

Assumption 6. For any pair of alternatives $j < k$, and for $v_{-j,k}$ in a neighborhood of zero, $\bar{F}_{jk}^{(i)}(v_{-j,k}, \tilde{v}_{-j,k}, \tilde{X})$ exists, is a continuous function of $v_{-j,k}$, and is bounded by a constant $C$ for almost every $(\tilde{v}_{-j,k}, \tilde{X})$, where $C < \infty$ and $i$ is an integer ($1 \le i \le d$).

By definition (23), the function $\bar{F}_{jk}(\cdot)$ can be derived from the conditional distribution of the error terms. Assumption 6 in essence imposes the differentiability requirement on the conditional distribution function of the vector $\varepsilon$ with respect to systematic utilities.

Assumption 7. The following statements are true.

(a) For any pair of alternatives $j, k \in J$, $p_{jk}^{(i)}(v_{-j,k} \mid \tilde{v}_{-j,k}, \tilde{X})$ exists and is a continuous function of $v_{-j,k}$ satisfying $|p_{jk}^{(i)}(v_{-j,k} \mid \tilde{v}_{-j,k}, \tilde{X})| < C$ for $v_{-j,k}$ in a neighborhood of zero, almost every $(\tilde{v}_{-j,k}, \tilde{X})$, some constant $C < \infty$, and any integer $i$ ($1 \le i \le d-1$). In addition, $|p_{jk}(v_{-j,k} \mid \tilde{v}_{-j,k}, \tilde{X})| < C$ for all $v_{-j,k}$ and almost every $(\tilde{v}_{-j,k}, \tilde{X})$.

(b) For any three different alternatives $j, k, l \in J$, $p_{jkl}(v_{-j,k}, v_{-j,l} \mid \tilde{v}_{-j,kl}, \tilde{X}) < C$ for all $(v_{-j,k}, v_{-j,l})$ and almost every $(\tilde{v}_{-j,kl}, \tilde{X})$.

(c) The components of the matrices $\tilde{X}$, $\mathrm{vec}(\tilde{X})\mathrm{vec}(\tilde{X})'$, and $\mathrm{vec}(\tilde{X})\mathrm{vec}(\tilde{X})'\mathrm{vec}(\tilde{X})\mathrm{vec}(\tilde{X})'$ have finite first absolute moments.

Assumption 7 imposes regularity conditions on the explanatory variables. In addition to the continuity requirement imposed by Assumption 4, Assumption 7 further requires that the conditional probability density function of the first explanatory variable, $x_{jk,1}$, given the other explanatory variables is $(d-1)$th order differentiable.

Assumption 8. $(\log N)/(N h_N^4) \to 0$ as $N \to \infty$.
Assumptions 6-8, together with Condition 2, are analogous to typical assumptions made in kernel density estimation. A higher convergence rate of the SGMS estimator can be achieved using a higher order kernel $K'(\cdot)$ when the required derivatives of $\bar{F}(\cdot)$ and $p(\cdot)$ exist.

Assumption 9. The matrix $H$, defined by (26), is negative definite.

Note that the matrix $H$ is analogous to the Hessian information matrix in the quasi-MLE. The following theorem presents the main results concerning the asymptotic distribution of the SGMS estimator.
Theorem 3. Let Assumptions 1-9 and Conditions 1-2 hold for some integer $d \ge 2$ and let $\{b_N^S\}$ be a sequence of solutions to problem (14).

(a) If $N h_N^{2d+1} \to \infty$ as $N \to \infty$, then $h_N^{-d}(\tilde{b}_N^S - \tilde{\beta})$ converges in probability to $-H^{-1} a$.

(b) If $N h_N^{2d+1}$ has a finite limit $\lambda$ as $N \to \infty$, then $(N h_N)^{1/2}(\tilde{b}_N^S - \tilde{\beta})$ converges in distribution to $MVN\!\left(-\lambda^{1/2} H^{-1} a,\; H^{-1} \Omega H^{-1}\right)$.

(c) Let $h_N = (\lambda/N)^{1/(2d+1)}$ with $0 < \lambda < \infty$; let $W$ be any nonstochastic, positive semidefinite matrix such that $a' H^{-1} W H^{-1} a \ne 0$; let $E_A$ denote the expectation with respect to the asymptotic distribution of $N^{d/(2d+1)}(\tilde{b}_N^S - \tilde{\beta})$; and let $MSE \equiv E_A[(\tilde{b}_N^S - \tilde{\beta})' W (\tilde{b}_N^S - \tilde{\beta})]$. The $MSE$ is minimized by setting

$$\lambda = \lambda^* \equiv \mathrm{trace}\!\left(\Omega H^{-1} W H^{-1}\right) \big/ \left(2d\, a' H^{-1} W H^{-1} a\right), \tag{27}$$

in which case $N^{d/(2d+1)}(\tilde{b}_N^S - \tilde{\beta})$ converges in distribution to $MVN\!\left(-(\lambda^*)^{d/(2d+1)} H^{-1} a,\; (\lambda^*)^{-1/(2d+1)} H^{-1} \Omega H^{-1}\right)$.
By Theorem 3, if $N h_N^{2d+1} \to \infty$ as $N \to \infty$, then $h_N^{-d}/N^{d/(2d+1)} = (N h_N^{2d+1})^{-d/(2d+1)} \to 0$; if $N h_N^{2d+1} \to 0$ as $N \to \infty$, then $(N h_N)^{1/2}/N^{d/(2d+1)} = (N h_N^{2d+1})^{1/(4d+2)} \to 0$. Therefore, Theorem 3 implies that the fastest rate of convergence of the SGMS estimator is $N^{-d/(2d+1)}$. Choosing the bandwidth $h_N = (\lambda/N)^{1/(2d+1)}$ with $\lambda \in (0, \infty)$ achieves this fastest rate of convergence. Theorem 3(c) shows that $\lambda^*$, defined by (27), minimizes the $MSE$ of the asymptotic distribution of $N^{d/(2d+1)}(\tilde{b}_N^S - \tilde{\beta})$.
To make the results of Theorem 3 useful in applications, it is necessary to be able to estimate the parameters in the limiting distribution, $a$, $\Omega$, and $H$, consistently from observations of $(r, X)$. The next theorem shows how this can be done.

Theorem 4. Let Assumptions 1-9 and Conditions 1-2 hold for some integer $d \ge 2$, and let the vector $b_N^S$ be a consistent estimator based on $h_N \propto N^{-1/(2d+1)}$. Let $h_N^* \propto N^{-\delta/(2d+1)}$, where $\delta \in (0, 1)$. Then

(a) $\hat{a}_N \equiv (h_N^*)^{-d} t_N(b_N^S, h_N^*)$ converges in probability to $a$.

(b) For $b \in B$ and $n = 1, \ldots, N$, define

$$t_{Nn}(b, h_N) = \sum_{1 \le j < k \le J} h_N^{-1} [1(r_{nj} < r_{nk}) - 1(r_{nk} < r_{nj})]\, K'\!\left(x_{njk}'b/h_N\right) \tilde{x}_{njk};$$

the matrix

$$\hat{\Omega}_N \equiv (h_N/N) \sum_{n=1}^{N} t_{Nn}(b_N^S, h_N)\, t_{Nn}(b_N^S, h_N)'$$

converges in probability to $\Omega$.

(c) $H_N(b_N^S, h_N)$ converges in probability to $H$.

By Theorem 3(c), the asymptotic bias of $N^{d/(2d+1)}(\tilde{b}_N^S - \tilde{\beta})$ is $-\lambda^{d/(2d+1)} H^{-1} a$ when the bandwidth is $h_N = (\lambda/N)^{1/(2d+1)}$. It follows from Theorem 4 that the bias term $-\lambda^{d/(2d+1)} H^{-1} a$ can be estimated consistently by $-\lambda^{d/(2d+1)} H_N(b_N^S, h_N)^{-1} \hat{a}_N$. Therefore, define

$$\tilde{b}_N^u = \tilde{b}_N^S + (\lambda/N)^{d/(2d+1)} H_N(b_N^S, h_N)^{-1} \hat{a}_N \tag{28}$$

as the bias-corrected SGMS estimator.
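For concreteness, the sample quantities in (16)-(17) and Theorem 4(b) can be assembled as below. This is our own illustrative implementation (again with $K$ the standard normal CDF, so $K'(t) = \phi(t)$ and $K''(t) = -t\,\phi(t)$), not the authors' code; the first component of $x_{njk}$ is the scale-normalized one, so $\tilde{x}_{njk}$ drops it:

```python
import numpy as np

def phi(t):
    """K'(t) when K is the standard normal CDF."""
    return np.exp(-t**2 / 2.0) / np.sqrt(2.0 * np.pi)

def sgms_moments(b, X, R, h):
    """Return t_N of (16), H_N of (17) and Omega_N of Theorem 4(b).

    X : (N, J, q) attributes, R : (N, J) ranks (1 = best), b : (q,).
    """
    N, J, q = X.shape
    tn = np.zeros((N, q - 1))                 # per-individual scores t_Nn
    Hsum = np.zeros((q - 1, q - 1))
    for n in range(N):
        for j in range(J):
            for k in range(j + 1, J):
                d_jk = float(R[n, j] < R[n, k]) - float(R[n, k] < R[n, j])
                xjk = X[n, j] - X[n, k]
                z = xjk @ b / h
                xt = xjk[1:]                             # x_tilde_{njk}
                tn[n] += d_jk * phi(z) * xt / h          # builds t_Nn(b, h)
                Hsum += d_jk * (-z * phi(z)) * np.outer(xt, xt)   # K''(z)
    tN = tn.mean(axis=0)                      # equation (16)
    HN = Hsum / (N * h**2)                    # equation (17)
    OmegaN = (h / N) * tn.T @ tn              # Theorem 4(b)
    return tN, HN, OmegaN
```

Theorem 4(a) then gives `a_hat = h_star**(-d) * tN_star`, where `tN_star` is the first output evaluated at the auxiliary bandwidth `h_star`.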
3.2 A Small-Sample Correction
In this subsection, we apply a method proposed by Horowitz (1992) to remove part of the finite sample bias of $\hat{a}_N$. By Theorem 2, $b_{N,1}^S = \beta_1$ with probability approaching 1 as $N$ goes to infinity. A Taylor expansion of $\hat{a}_N$ around $\tilde{b}_N^S = \tilde{\beta}$ yields

$$\hat{a}_N - a = (h_N^*)^{-d} t_N(\beta, h_N^*) - a + (h_N^*)^{-d} H_N(b_N^*, h_N^*)(\tilde{b}_N^S - \tilde{\beta}) \tag{29}$$

with probability approaching 1 as $N$ goes to infinity, where $b_N^*$ is a vector between $b_N^S$ and $\beta$. The right-hand side of (29) shows that the finite sample bias of $\hat{a}_N$ has two components. The first component, $(h_N^*)^{-d} t_N(\beta, h_N^*) - a$, has a nonzero mean due to the use of a nonzero bandwidth $h_N^*$ in estimating $a$. The second component, $(h_N^*)^{-d} H_N(b_N^*, h_N^*)(\tilde{b}_N^S - \tilde{\beta})$, has a nonzero mean due to the use of an estimate of the true parameter vector $\beta$ in estimating $a$.

The bias correction method described here is aimed at removing the second component of bias, of order $N^{-(1-\delta)d/(2d+1)}$. Note that the second component of the right-hand side of (29) can be written as

$$(h_N^*)^{-d} H_N(b_N^*, h_N^*)(\tilde{b}_N^S - \tilde{\beta}) = \left[N h_N (h_N^*)^{2d}\right]^{-1/2} H_N(b_N^*, h_N^*)\,(N h_N)^{1/2}(\tilde{b}_N^S - \tilde{\beta}).$$

The probability limit of $H_N(b_N^*, h_N^*)$ is $H$ by Lemmas 8-9 of Appendix B, and $(N h_N)^{1/2}(\tilde{b}_N^S - \tilde{\beta})$ converges in distribution to $MVN(-\lambda^{1/2} H^{-1} a, H^{-1} \Omega H^{-1})$ by Theorem 3. Therefore, $H_N(b_N^*, h_N^*)(N h_N)^{1/2}(\tilde{b}_N^S - \tilde{\beta})$ converges in distribution to $MVN(-\lambda^{1/2} a, \Omega)$. By this result, we treat $\hat{a}_N$ as an estimator of $a - [N h_N (h_N^*)^{2d}]^{-1/2} \lambda^{1/2} a$ rather than of $a$. Thus, the bias corrected estimator of $a$ is

$$\hat{a}_N^c = \hat{a}_N \Big/ \left\{1 - \left[\lambda^{-1} N h_N (h_N^*)^{2d}\right]^{-1/2}\right\}. \tag{30}$$

3.3 Bandwidth Selection
Theorem 3(c) provides a way to choose the bandwidth for the SGMS estimator. To achieve the minimum $MSE$, the optimal $\lambda^*$ can be consistently estimated by the conclusion of Theorem 4. Therefore, one possible way of choosing the bandwidth is to set $h_N = (\hat{\lambda}/N)^{1/(2d+1)}$ given the integer $d$, where $\hat{\lambda}$ is a consistent estimator for $\lambda^*$. Specifically, the choice of bandwidth can be implemented by taking the following steps.

Step 1. Given $d$, choose $h_N \propto N^{-1/(2d+1)}$ and $h_N^* \propto N^{-\delta/(2d+1)}$ for $\delta \in (0, 1)$.

Step 2. Compute the SGMS estimator $b_N^S$ using $h_N$. Use $b_N^S$ and $h_N^*$ to compute $\hat{a}_N^c$. Use $b_N^S$ and $h_N$ to compute $\hat{\Omega}_N$ and $H_N(b_N^S, h_N)$.

Step 3. Estimate $\lambda^*$ by

$$\hat{\lambda}_N = \mathrm{trace}\!\left\{\hat{\Omega}_N H_N(b_N^S, h_N)^{-1} H_N(b_N^S, h_N)^{-1}\right\} \cdot \left[2d\, (\hat{a}_N^c)' H_N(b_N^S, h_N)^{-1} H_N(b_N^S, h_N)^{-1}\, \hat{a}_N^c\right]^{-1}. \tag{31}$$

Step 4. Calculate the estimated bandwidth $h_N^e = (\hat{\lambda}_N/N)^{1/(2d+1)}$.

Step 5. Compute the SGMS estimator using $h_N^e$.

Note that this approach is analogous to the plug-in method of kernel density estimation. As usual in applications of the plug-in method, the choice of the initial bandwidth $h_N$ and the parameter $\delta$ would require some exploration, because the estimated bandwidth $h_N^e$ may be sensitive to that choice. In our Monte Carlo experiments in the next section, the bandwidth has been initialized by setting $h_N = N^{-1/5}$ and $\delta = 0.1$.
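Step 3 amounts to evaluating (27) with $W$ set to the identity matrix. A compact sketch of Steps 3-4 (our own illustration, taking $\hat{\Omega}_N$, $H_N$ and $\hat{a}_N^c$ as already computed):

```python
import numpy as np

def lambda_hat(Omega_hat, H_hat, a_hat_c, d):
    """Plug-in estimate of lambda* as in equation (31), i.e. (27) with W = I."""
    Hinv = np.linalg.inv(H_hat)
    num = np.trace(Omega_hat @ Hinv @ Hinv)
    den = 2.0 * d * (a_hat_c @ Hinv @ Hinv @ a_hat_c)
    return num / den

def plug_in_bandwidth(lam, N, d):
    """Step 4: h_N^e = (lambda_hat / N)^(1 / (2d + 1))."""
    return (lam / N) ** (1.0 / (2 * d + 1))
```

Because $H$ enters only through $H^{-1} H^{-1}$, the sign convention of the negative definite $H$ (Assumption 9) does not affect the result.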
4 Monte Carlo Experiments
In this section, we provide Monte Carlo simulation results to explore finite-sample properties of the GMS estimator $b_N$ and the SGMS estimator $b_N^S$. We consider six data generating processes (DGPs). In each DGP, individual $n$'s utility from alternative $j$, $u_{nj}$, is specified as

$$u_{nj} = x_{nj,1} \beta_1 + x_{nj,2} \beta_{n2} + \varepsilon_{nj} \quad \text{for } n = 1, 2, \ldots, N \text{ and } j = 1, 2, \ldots, 5. \tag{32}$$

Each DGP is used to simulate two sets of 1000 random samples of $N$ individuals, where $N = 100$ in the first set and $500$ in the second set.

In all DGPs, the first preference parameter $\beta_1$ is a deterministic coefficient and takes the value of 1 for all individuals: $\beta_1 = 1$. In DGPs 1-4, the second preference parameter $\beta_{n2}$ is also a deterministic coefficient and takes the value of 1 for all individuals: $\beta_{n2} = \beta_2 = 1$ for all $n$. In DGPs 5-6, however, $\beta_{n2}$ is a random coefficient that varies across individuals, and each individual's coefficient value is a draw from $N(1, 1)$: $\beta_{n2} = \beta_2 + \eta_n$, where $\beta_2 = 1$ and $\eta_n$ is a $N(0, 1)$ draw.^{16} Each DGP specifies its own distribution of the error terms $\varepsilon_{nj}$; we provide more details below.^{17}

The econometrician observes a utility-based ranking $r_n$ of $J = 5$ alternatives in $J$, as well as the attributes $x_{nj,1}$ and $x_{nj,2}$.^{18} As usual, the depth of observed rankings would influence the finite sample precision of an estimator; and in the context of our semiparametric estimators, it also influences the degree of flexibility that the semiparametric models offer. Recall that when the complete rankings ($M = J - 1 = 4$) are observed, the semiparametric model nests all popular parametric models as special cases; when only partial rankings ($M < 4$) are available, this is not the case because then the semiparametric model cannot accommodate alternative-specific heteroskedasticity and flexible correlation patterns. We will therefore explore the finite sample behavior of the estimators at three depth levels: $M = 1$ when only the best alternative is observed, $M = 2$ when the best and second best alternatives are observed, and $M = 4$ when the complete ranking is observed.

^{16} In random coefficient models, we are often interested in discovering a certain central tendency of the random preference parameter, such as its mean or its median. The mixed logit estimator will consistently estimate $E(\beta_{n2})$ under correct parametric specifications and the proposed semiparametric estimators can consistently estimate $\operatorname{median}(\beta_{n2})$ under Assumptions 1-4. For simplicity of demonstration, we choose $\beta_{n2} \sim N(1, 1)$ such that $E(\beta_{n2}) = \operatorname{median}(\beta_{n2}) = 1$.

^{17} In all DGPs, we normalize the variance of $\varepsilon_{nj}$ to be $\pi^2/6$, subject to rounding errors.

^{18} Here we use a relatively small choice set mainly because the probit and the mixed logit specifications yield objective functions that require multivariate integration, and consequently a lot of computation time. The computation time of the GMS and SGMS estimators per se is affordable even if the choice set is very large.
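To fix ideas, the sketch below simulates one sample from specification (32) with complete rankings. It is our own illustration rather than the paper's code: the attribute distributions follow the description below, while the i.i.d. type 1 extreme value errors (variance $\pi^2/6$) stand in for a logit-type DGP; the other DGPs swap in different error distributions:

```python
import numpy as np

rng = np.random.default_rng(2017)
N, J = 100, 5
beta1, beta2 = 1.0, 1.0

x1 = rng.normal(0.0, np.sqrt(2.0), size=(N, J))   # x_nj,1 ~ N(0, 2)
q = rng.uniform(0.0, 3.0, size=(N, J))            # q_nj ~ U(0, 3)
z = rng.uniform(0.2, 5.0, size=(N, 1))            # z_n ~ U(1/5, 5), one per individual
x2 = q / z                                        # x_nj,2 = q_nj / z_n

eps = rng.gumbel(size=(N, J))                     # illustrative i.i.d. EV1 errors

u = beta1 * x1 + beta2 * x2 + eps                 # equation (32)

r = np.empty((N, J), dtype=int)                   # complete ranking, rank 1 = best
for n in range(N):
    r[n, np.argsort(-u[n])] = np.arange(1, J + 1)
```

Truncating each row of `r` (keeping ranks up to $M$ and pooling the rest at the worst rank) produces the partial-ranking designs with $M = 1$ or $M = 2$.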
In all DGPs, observed attribute uniform draws: specically,
xnj,1
and
qnj
xnj,1
is a draw from
xnj,2 ≡ qnj /zn
where
N (0, 2) and xnj ,2
qnj
is from
U (0, 3)
vary across both individuals and alternatives, whereas
is generated as a ratio of two dierent and
zn
zn
is from
U ( 51 , 5).19
Note that
varies only across individuals. All
three distributions that generate the observed attributes are independent of one another, and
i.i.d.
across
the subscripted dimension(s). For comparison with our GMS and SGMS estimates, we also compute maximum likelihood estimates using three popular parametric models summarized in Section 2.3, namely rank-ordered logit (ROL), rank-ordered probit (ROP), and mixed ROL (MROL). We do not estimate the nested ROL model, primarily because our analysis already includes the ROP model, which is a more flexible parametric method for incorporating correlated errors. In the case of ROP and MROL, we opt to place no constraint on the variance-covariance parameters of the underlying multivariate normal densities.²⁰ This allows us to compare our semiparametric methods with both restrictive (ROL) and very flexible (ROP and MROL) parametric methods. Our discussion focuses on the coefficient ratio β2/β1, which is identified in both parametric and semiparametric models. In the discrete choice analysis of individual preferences, the main parameter of interest often takes the form of a ratio between coefficients on non-price and price attributes; this type of ratio is known as, inter alia, equivalent prices (Hausman and Ruud, 1987), implicit prices (Calfee et al., 2001) and willingness-to-pay (Small et al., 2005). In parametric models, we normalize the scale of the error terms in the usual manner to identify β1 and β2 separately, and we derive the ratio of the relevant slope coefficient estimates. In semiparametric models, we normalize |β1| = 1 to identify β2 and the sign of β1, and we compute the ratio of interest β2/β1 = β2/sign(β1) using the relevant estimates.²¹

Since the GMS estimator entails maximizing a sum of step functions, we use a global search method to compute the GMS estimates: specifically, the differential evolution algorithm of Storn and Price (1995), which was also Fox's (2007) preferred method for computing his multinomial MS estimates. In this Monte Carlo study, we implement a particular version of the SGMS estimator which uses the standard normal distribution function as the smoothing kernel K(·). The resulting objective function is differentiable, and can be maximized by starting any of the usual gradient-based algorithms from several initial search points. For the SGMS estimator, the bandwidth has been initialized by setting hN = N^(−1/5) and δ = 0.1, and optimized subsequently by applying the method in Section 3.3.²²

¹⁹ This pair of uniform distributions ensures that the second observed attribute has approximately the same variance as the first attribute, i.e. Var(qnj/zn) ≃ 2.
²⁰ Our ROP specification requires estimating two slope coefficients (β1 and β2) and eight identified variance-covariance parameters of pairwise error differences. Our MROL specification assumes that both slope coefficients are random and bivariate normal: we estimate two mean (β1 and β2) and three variance-covariance parameters of the bivariate normal density. The ROP (MROL) model has been estimated in Stata using the command -asroprobit- (-mixlogit-); the likelihood function has been simulated by taking 250 pseudo-random draws from Hammersley (Halton) sequences.
²¹ The estimate of the sign will converge at an extremely fast rate, such that there is no need to analyze its finite-sample property.
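The computational strategy just described can be sketched in a few lines. The following is a minimal illustration, not the authors' code: the two-attribute design, sample size, and variable names are our own assumptions, the scale is normalized by β1 = 1, and scipy's implementation of the Storn-Price differential evolution algorithm stands in for the global search, with the standard normal distribution function as the smoothing kernel.

```python
import numpy as np
from scipy.optimize import differential_evolution, minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
N, J = 200, 4
beta = np.array([1.0, 1.5])                 # true (beta1, beta2), |beta1| = 1
X = rng.uniform(-1, 1, size=(N, J, 2))      # two observed attributes
U = X @ beta + rng.gumbel(size=(N, J))      # ROL-type random utilities
ranks = np.argsort(np.argsort(-U, axis=1), axis=1)  # 0 = most preferred

def gms_score(b2):
    """GMS objective: count pairs (j,k) whose observed order matches sign(x_jk'b)."""
    b = np.array([1.0, b2])                 # scale normalization beta1 = 1
    s = 0.0
    for j in range(J):
        for k in range(j + 1, J):
            v = (X[:, j] - X[:, k]) @ b     # x_jk' b
            s += np.sum(np.where(ranks[:, j] < ranks[:, k], v >= 0, v < 0))
    return s

def sgms_score(b2, h):
    """SGMS objective: the step 1(v >= 0) replaced by the normal cdf K(v/h)."""
    b = np.array([1.0, b2])
    s = 0.0
    for j in range(J):
        for k in range(j + 1, J):
            v = (X[:, j] - X[:, k]) @ b
            s += np.sum(np.where(ranks[:, j] < ranks[:, k],
                                 norm.cdf(v / h), norm.cdf(-v / h)))
    return s

# GMS: the objective is a sum of step functions, so use a global search
res = differential_evolution(lambda t: -gms_score(t[0]), bounds=[(-10, 10)], seed=1)
b2_gms = res.x[0]

# SGMS: smooth and differentiable, so a gradient-based search works
h = N ** (-1 / 5)                           # pilot bandwidth h_N = N^(-1/5)
b2_sgms = minimize(lambda t: -sgms_score(t[0], h), x0=[b2_gms], method="BFGS").x[0]
```

Because the GMS objective is piecewise constant in b2, gradient-based routines stall on its flat regions; that is why a derivative-free global search is used for GMS, while a standard quasi-Newton search from a few starting points suffices for the smoothed objective.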
Table 1 summarizes the true distribution of the error terms in each DGP and whether particular methods can estimate β2/β1 consistently. The summary presents a strong case for the importance of considering semiparametric methods for rank-ordered choice data: the GMS/SGMS estimator using complete rankings is the only method that remains consistent throughout all DGPs. The GMS/SGMS estimator using partial rankings is consistent when the error terms are i.i.d. (DGPs 1-2) or heteroskedastic across individuals (DGP 3), but becomes inconsistent in the presence of alternative-specific heteroskedasticity (DGP 4) and/or random coefficients (DGPs 5-6). As usual, a parametric method is consistent only when the DGP happens to coincide with the postulated parametric model itself or its special cases.

Tables 2-7 report each method's bias and RMSE across 1,000 samples of size N simulated from each DGP. In each table, the top and bottom panels summarize the results for sample sizes N = 100 and N = 500, respectively. Efficiency gains from the use of deeper rankings, alongside the usual play of asymptotics, are apparent from the tables. When a method is consistent for a particular DGP, increasing the depth of rankings M holding the sample size N fixed reduces its bias and RMSE. Increasing the sample size holding the depth of rankings fixed has qualitatively the same effects. The GMS estimator using complete rankings is consistent under all DGPs, and displays negligible finite sample bias in most cases. The associated bias is approximately 6% of the true parameter value in DGPs 1 and 2 when N = 100, and 2% or less in all other DGP and/or sample size configurations. These results illustrate a considerable benefit that the use of deeper rankings offers for semiparametric estimation: the partial rankings GMS estimator is consistent for only the first three DGPs (DGPs 1-3), and even under those DGPs, the estimator exhibits larger bias, which sometimes exceeds 10% of the true value when the sample size N = 100 (though the bias stays below 4% when N = 500). Across all depth levels and sample size configurations, the SGMS estimator behaves similarly to its GMS counterpart but tends to display a small increase in bias
²² When the sample size is small, the parameter λ∗ may be estimated with a large standard error due to the slow convergence rate of the bias estimator λ̂N, sometimes resulting in a very large estimate of the bandwidth. We apply a trimming procedure to avoid this situation. The estimated λ∗ is trimmed at a large constant (1000) for all DGPs.
Table 1: Consistency of estimators by Monte Carlo DGPs

DGP  Distribution of ε_nj                                    ROL  ROP  MROL  (S)GMS
1    ε_nj is i.i.d. EV(0,1,0)                                Yes  No   Yes   Yes
2    ε_nj is i.i.d. N(0.577, π²/6)                           No   Yes  No    Yes
3    ε_nj = 0.0055(z_n⁴ + 2z_n²)η_nj, η_nj i.i.d. N(0,1)     No   No   No    Yes
4    ε_nj = 0.75 x_nj,2 η_nj, η_nj i.i.d. N(0,1)             No   No   No    No when M < 4; Yes when M = 4
5    ε_nj is i.i.d. EV(0,1,0)                                No   No   Yes   No when M < 4; Yes when M = 4
6    ε_nj = 0.75 x_nj,2 η_nj, η_nj i.i.d. N(0,1)             No   No   No    No when M < 4; Yes when M = 4

Note: EV(0,1,0) stands for the extreme value type 1 distribution, assumed by the ROL model, with a mean of 0.577 and a variance of π²/6. Where relevant, the error component η_nj is i.i.d. for n = 1, ..., N and j = 1, ..., J. M = 4 (M < 4) refers to an estimator that incorporates the complete (partial) rankings. Yes (No) means the estimator of β2/β1 is (not) consistent given the DGP.
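The stochastic specifications in Table 1 are straightforward to simulate. As a minimal sketch, the following generates rank-ordered choice data under DGP 4's alternative-specific heteroskedasticity, ε_nj = 0.75 x_nj,2 η_nj with η_nj i.i.d. N(0,1); the uniform attribute design, the coefficient values, and the variable names are our own assumptions rather than the authors' exact design.

```python
import numpy as np

rng = np.random.default_rng(1)
N, J = 500, 4
beta = np.array([1.0, 1.5])                  # assumed true (beta1, beta2)

x = rng.uniform(-1, 1, size=(N, J, 2))       # two observed attributes
eta = rng.standard_normal((N, J))            # eta_nj i.i.d. N(0, 1)
eps = 0.75 * x[:, :, 1] * eta                # DGP 4: error scale depends on x_nj,2
U = x @ beta + eps                           # random utilities
ranking = np.argsort(-U, axis=1)             # ranking[n, 0] = most preferred alternative

M = 2                                        # a partial ranking keeps the M most preferred
partial = ranking[:, :M]
```

Because the error scale varies with the alternative's own attribute x_nj,2, the induced variance-covariance structure differs across alternatives, which is exactly the feature that breaks the partial rankings estimators but not the complete rankings GMS/SGMS estimator.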
Table 2: Monte Carlo results on β̂2/β̂1 of DGP 1
[Bias and RMSE of the ROL, ROP, MROL, GMS and SGMS estimators of β2/β1, by ranking depth M = 1, 2, 4; top panel N = 100, bottom panel N = 500.]
Table 3: Monte Carlo results on β̂2/β̂1 of DGP 2
[Bias and RMSE of the ROL, ROP, MROL, GMS and SGMS estimators of β2/β1, by ranking depth M = 1, 2, 4; top panel N = 100, bottom panel N = 500.]
Table 4: Monte Carlo results on β̂2/β̂1 of DGP 3
[Bias and RMSE of the ROL, ROP, MROL, GMS and SGMS estimators of β2/β1, by ranking depth M = 1, 2, 4; top panel N = 100, bottom panel N = 500.]
Table 5: Monte Carlo results on β̂2/β̂1 of DGP 4
[Bias and RMSE of the ROL, ROP, MROL, GMS and SGMS estimators of β2/β1, by ranking depth M = 1, 2, 4; top panel N = 100, bottom panel N = 500.]
Table 6: Monte Carlo results on β̂2/β̂1 of DGP 5
[Bias and RMSE of the ROL, ROP, MROL, GMS and SGMS estimators of β2/β1, by ranking depth M = 1, 2, 4; top panel N = 100, bottom panel N = 500.]
Table 7: Monte Carlo results on β̂2/β̂1 of DGP 6
[Bias and RMSE of the ROL, ROP, MROL, GMS and SGMS estimators of β2/β1, by ranking depth M = 1, 2, 4; top panel N = 100, bottom panel N = 500.]
and a reduction in RMSE, the expected trade-offs from using a smoothing kernel to construct a surrogate objective function.

For DGPs 1, 2 and 5, at least one parametric method allows consistent maximum likelihood estimation. The results suggest that the efficiency gains (as measured by the reduction in RMSE) that a consistent SGMS estimator offers over a consistent GMS estimator are comparable to what a consistent parametric estimator offers over the SGMS estimator itself. The results pertaining to DGPs 3, 4 and 6 present a particularly strong case for considering the use of the semiparametric methods in empirical practice. While none of the popular parametric methods is consistent under these DGPs, at least one parametric method arguably comes close to getting each DGP approximately right; yet, even in the larger sample configuration (N = 500), an approximately correct parametric method may still exhibit a substantial amount of bias. In the context of DGP 3, for instance, the ROP model is a correct specification apart from its failure to capture interpersonal heteroskedasticity; yet, the ROP method's bias stays in the neighbourhood of 20% of the true parameter value. In DGP 4 and DGP 6, there is alternative-specific heteroskedasticity induced via a normal error component which multiplies the second attribute x_nj,2; this error component can be absorbed into the normal random coefficient on x_nj,2, and the MROL model is therefore a correct specification apart from the fact that it postulates the presence of a redundant extreme value error component. While the MROL method's bias is indeed negligible when only information on the most preferred alternative is used (M = 1), it becomes amplified as deeper ranking information is used and may reach 28% with complete rankings (M = 4).

While our experiments were designed to illustrate the properties of the semiparametric methods, the results also add some cautionary notes to the debate over the reliability of rank-ordered choice data. Based on the intuitively convincing premise that ranking is a more cognitively demanding task than making a choice, some researchers contend that in case a parametric method yields systematically different estimates from the first preference (M = 1) and the complete rankings (M = J − 1), the econometrician should not make use of complete rankings: see Chapman and Staelin (1982) and Ben-Akiva et al. (1992) for the influential and earliest proponents of this view. The results pertaining to DGPs 3-6, however, caution against basing data and model selection on comparisons of the first preference estimates and the complete rankings estimates. Inconsistent parametric methods may or may not be equally biased at all depth levels, and it is not always the case that the first preference estimates are subject to smaller misspecification bias than the
complete rankings estimates.
5    Conclusions
To collect more preference information from a given sample of individuals, multinomial choice surveys can be readily modified to elicit rank-ordered choices. All parametric methods for multinomial choices have rank-ordered choice counterparts that exploit the extra information to estimate the underlying random utility model more efficiently. But semiparametric methods for rank-ordered choices remain undeveloped, apart from the seminal work of Hausman and Ruud (1987), which is only applicable to continuous regressors. We develop two semiparametric methods for rank-ordered choices: the generalized maximum score (GMS) estimator and the smoothed generalized maximum score (SGMS) estimator. The GMS estimator builds on the maximum score (MS) estimator (Manski, 1975; Fox, 2007) for multinomial choices. Like its predecessor, the GMS estimator allows the consistent estimation of coefficients on both continuous and discrete regressors when there is a suitable continuous regressor for normalizing the scale of utility. We establish conditions for the strong consistency of the GMS estimator, which follows a non-standard asymptotic distribution and displays an N^(−1/3) convergence rate. The SGMS estimator complements the GMS estimator, much as Horowitz's (1992) smoothed MS estimator complements Manski's (1985) MS estimator in the context of binomial choices. By adding mild regularity conditions, we show that the SGMS estimator is also strongly consistent, and that it is asymptotically normal with a convergence rate approaching N^(−1/2) as the strength of the smoothing conditions increases. Our results are fairly general and cover data on complete rankings (i.e. a full preference ranking of J alternatives is observed) as well as partial rankings (i.e. up to the Mth most preferred out of J alternatives are observed). Our study finds that rank-ordered choices provide an interesting data environment which can facilitate and benefit from the development of semiparametric methods.
Most interestingly, our results show that using the extra information from rank-ordered choices is not just a matter of efficiency gains, contrary to what parametric analyses might lead one to anticipate. For our semiparametric estimators, it is also a matter of consistency, in the sense that using complete rankings instead of partial rankings allows the semiparametric estimators to become robust to wider classes of stochastic specifications. More specifically, the MS estimator for multinomial choices and the GMS/SGMS estimators for partial rankings are robust to any form of interpersonal heteroskedasticity, but they are not robust to any error variance-covariance structure that varies across alternatives, meaning that they cannot consistently estimate flexible parametric models including nested logit, unrestricted probit and random coefficient logit. By contrast, the GMS/SGMS estimators for complete rankings can accommodate such error structures, fulfilling the usual expectation that a semiparametric method be more flexible than popular parametric methods. The main intuition behind this contrast is that the use of complete rankings allows one to infer which alternative is more preferred in every possible pair of alternatives in a choice set. The strong consistency of the GMS/SGMS estimators for complete rankings can therefore be shown under almost the same assumptions as that of the MS estimator for binomial choices, without invoking the stronger assumptions needed to address the more analytically complex cases of multinomial choices or partial rankings. Together with our Monte Carlo evidence on the bias of parametric methods under misspecification, this finding calls for a reconsideration of the conventional wisdom prevailing in the empirical literature.
Since Chapman and Staelin (1982), several studies have contended that in case the estimates using complete rankings diverge from the estimates using information on the best alternative alone (or other types of partial rankings), one should have more faith in the latter set of estimates and question the reliability of data on deeper preference ranks. But with our semiparametric methods, it is the former set of estimates that is consistent under a wider variety of true models. And with parametric methods, the discrepancy may arise even when the reliability of the data is beyond any doubt, as in simulated samples, because the amount of misspecification bias may vary (non-monotonically) with the depth of rankings used. While the premise that an individual finds it easier to tell her best alternative than, say, her third- or fourth-best alternative is intuitively appealing, testing the validity of the conventional wisdom would require a semiparametric method which offers the same degree of robustness regardless of the depth of rankings used in estimation. In our view, the development of such a method is a promising avenue for future research.
A    Proof of Theorem 1
In Appendix A, we provide the proofs of Theorem 1 and of Lemmas 1-3. Lemma 1 establishes the identification condition. Lemma 2 verifies the continuity property of the probability limit of the GMS estimator's objective function QN(b). Lemma 3 shows the uniform convergence of QN(b) to its probability limit. Throughout, for b ∈ R^q, let

Q∗(b) ≡ E[ Σ_{1≤j<k≤J} { 1(r_j < r_k)·1(x′_jk b ≥ 0) + 1(r_k < r_j)·1(x′_kj b > 0) } ]    (A1)

denote the probability limit of QN(b) in (9).

Lemma 1. Under Assumptions 2-4, the true preference parameter vector β uniquely maximizes Q∗(b) for b ∈ B.
Proof. Applying the law of iterated expectations to the right-hand side of (A1) yields

Q∗(b) = E[ Σ_{1≤j<k≤J} { P(r_j < r_k | X)·1(x′_jk b ≥ 0) + P(r_k < r_j | X)·1(x′_kj b > 0) } ]
      = E[ Σ_{1≤j<k≤J} { [P(r_j < r_k | X) − P(r_k < r_j | X)]·1(x′_jk b ≥ 0) + P(r_k < r_j | X) } ].

It follows from Assumption 3 that β globally maximizes Q∗(b), because the sign of P(r_j < r_k | X) − P(r_k < r_j | X) is the same as the sign of x′_jk β. Next, we show that β is a unique global maximizer of Q∗(b). Consider a different parameter vector β− ∈ B. If, for values of X with positive probability, x′_jk β and x′_jk β− have opposite signs for some pair of alternatives j, k ∈ J, then we can conclude Q∗(β) > Q∗(β−). In other words, if we observe that β and β− yield different rankings of systematic utilities with positive probability, then β− will not maximize Q∗(b). We will show this argument for β1 = 1; the argument for β1 = −1 is similar. If β1− = 1, the set of points where β and β− yield different rankings of systematic utility is

D(β, β−) = {X | x′_jk β < 0 < x′_jk β− for some j, k ∈ J}
         = {X | x̃′_jk β̃ < −x_jk,1 < x̃′_jk β̃− for some j, k ∈ J}.

By Assumption 4(a), the probability of D(β, β−) equals zero if and only if x̃′_jk β̃ = x̃′_jk β̃− with probability one for any pair of alternatives j, k ∈ J, which implies that Xβ = Xβ− with probability one. This contradicts Assumption 4(b). If β1− = −1, the set of points where β and β− give different predictions is

D(β, β−) = {X | x_jk,1 < min(x̃′_jk β̃−, −x̃′_jk β̃) for some j, k ∈ J}.

The probability of D(β, β−) is positive by Assumption 4(a). Thus, we have proved that the true preference parameter vector β uniquely maximizes Q∗(b) for b ∈ B. ∎

Lemma 2. Under Assumptions 2 and 4, Q∗(b) is continuous in b ∈ B.

Proof. For any pair of alternatives j < k, define

Q∗_jk(b) ≡ E{ [1(r_j < r_k) − 1(r_k < r_j)]·1(x′_jk b ≥ 0) + 1(r_k < r_j) }.    (A2)

Assume that b1 = 1; the argument for b1 = −1 is symmetric. By the law of iterated expectations,

Q∗_jk(b) = E{ [P(r_j < r_k | x_jk) − P(r_k < r_j | x_jk)]·1(x′_jk b ≥ 0) + P(r_k < r_j | x_jk) }
         = ∫ { ∫_{−x̃′_jk b̃}^{∞} [P(r_j < r_k | x_jk) − P(r_k < r_j | x_jk)] g_jk(x_jk,1 | x̃_jk) dx_jk,1 } dP(x̃_jk) + P(r_k < r_j),    (A3)

where P(x̃_jk) denotes the cumulative distribution function of x̃_jk. The inner integral in curly brackets on the right-hand side of (A3) is a function of x̃_jk and b that is continuous in b ∈ B. Therefore, Q∗(b) = Σ_{1≤j<k≤J} Q∗_jk(b) is also continuous in b ∈ B. ∎

Lemma 3. Under Assumptions 1-2 and 4, QN(b) converges almost surely to Q∗(b) uniformly over b ∈ B.

Proof. For any pair of alternatives j, k ∈ J, define

QNjk(b) ≡ N^(−1) Σ_{n=1}^{N} { [1(r_nj < r_nk) − 1(r_nk < r_nj)]·1(x′_njk b ≥ 0) + 1(r_nk < r_nj) }.

By definitions (A2), (9) and (A1), we have

QN(b) = Σ_{1≤j<k≤J} QNjk(b),   Q∗_jk(b) = E[QNjk(b)],   and   Q∗(b) = Σ_{1≤j<k≤J} Q∗_jk(b).

By Lemma 4 of Manski (1985), with probability one, lim_{N→∞} sup_{b∈R^q} |QNjk(b) − Q∗_jk(b)| = 0. Because QN(b) is the sum of a finite number of terms QNjk(b), QN(b) converges almost surely to Q∗(b) uniformly over b ∈ B. ∎

Proof.
(THEOREM 1) The proof of strong consistency involves verifying the conditions of Theorem 2.1 in
Newey and McFadden (1994):

(1) Q∗(b) is uniquely maximized at β;
(2) the parameter space B is compact;
(3) Q∗(b) is continuous in b; and
(4) QN(b) converges almost surely to its probability limit Q∗(b) uniformly over b ∈ B.

Conditions (1), (3), and (4) are verified by Lemmas 1, 2, and 3, respectively. Condition (2) is guaranteed by Assumption 2. Therefore, the GMS estimator that maximizes QN(b) converges to β almost surely under Assumptions 1-4. ∎
B    Proof of Theorems 2-4
In Appendix B, we provide the proofs of Theorems 2-4 and of Lemmas 4-9. Lemma 4 establishes the uniform convergence of the SGMS objective function to its probability limit. Lemmas 5-6 establish the asymptotic distribution of the normalized forms of t_N(β, h_N). Lemmas 7-9 justify that H_N(b∗_N, h_N) converges to a nonstochastic matrix in probability. By applying a Taylor series expansion, Lemmas 5-7 can be used to derive the asymptotic distribution of the centered, properly normalized SGMS estimator for the random utility model.

Lemma 4. Under Assumptions 1-4 and Condition 1, Q^S_N(b, h_N) converges almost surely to Q∗(b) uniformly over b ∈ B.

Proof. First, we show that Q^S_N(b, h_N) converges almost surely to QN(b) uniformly over b ∈ B, following the method in Lemma 4 of Horowitz (1992). We calculate

|Q^S_N(b, h_N) − QN(b)| ≤ N^(−1) Σ_{n=1}^{N} Σ_{1≤j<k≤J} |1(x′_njk b > 0) − K(x′_njk b / h_N)|.    (B1)

The right-hand side of (B1) is the sum of c_N1(η) and c_N2(η), where

c_N1(η) ≡ N^(−1) Σ_{n=1}^{N} Σ_{1≤j<k≤J} |1(x′_njk b > 0) − K(x′_njk b / h_N)| · 1(|x′_njk b| ≥ η),

c_N2(η) ≡ N^(−1) Σ_{n=1}^{N} Σ_{1≤j<k≤J} |1(x′_njk b > 0) − K(x′_njk b / h_N)| · 1(|x′_njk b| < η),

and η ∈ R is a positive number. Condition 1(b) implies that for any δ > 0, there exists c > 0 such that |K(x) − 1| < δ·J^(−2) and |K(−x)| < δ·J^(−2) for any x > c. As h_N → 0, there exists N0 ∈ N such that η/h_N > c for any N > N0. Therefore, c_N1(η) < δ for any N > N0. We have shown that for each η > 0, c_N1(η) → 0 uniformly over b ∈ B as N → ∞. Next consider c_N2(η). By Condition 1(a), there is a finite C such that

c_N2(η) ≤ (C/N) Σ_{1≤j<k≤J} [ Σ_{n=1}^{N} 1(|x′_njk b| < η) ].    (B2)

Horowitz (1992) shows that the inner-bracket part of the right-hand side of (B2) converges to 0 uniformly over b ∈ B. Because J is finite, c_N2(η) also converges to 0 uniformly over b ∈ B as N → ∞. The right-hand side of (B1) thus converges to 0 uniformly over b ∈ B. Because

sup_{b∈B} |Q^S_N(b, h_N) − Q∗(b)| ≤ sup_{b∈B} |Q^S_N(b, h_N) − QN(b)| + sup_{b∈B} |QN(b) − Q∗(b)|,    (B3)

and we have proved that the right-hand side of (B3) converges to 0 almost surely, Q^S_N(b, h_N) converges almost surely to Q∗(b) uniformly over b ∈ B. ∎

Proof.
(THEOREM 2) The proof of strong consistency involves verifying the conditions of Theorem 2.1 in Newey and McFadden (1994):

(1) Q∗(b) is uniquely maximized at β;
(2) the parameter space B is compact;
(3) Q∗(b) is continuous in b; and
(4) Q^S_N(b, h_N) converges uniformly almost surely to its probability limit Q∗(b).

Conditions (1), (3), and (4) are verified by Lemmas 1, 2, and 4, respectively. Condition (2) is guaranteed by Assumption 2. Therefore, the SGMS estimator that maximizes Q^S_N(b, h_N) converges to β almost surely under Assumptions 1-4 and Condition 1. ∎

Lemma 5. Let Assumptions 1, 3-4, 6-7 and Conditions 1-2 hold. Then

(a) lim_{N→∞} E[h_N^(−d) t_N(β, h_N)] = a; and
(b) lim_{N→∞} Var[(N h_N)^(1/2) t_N(β, h_N)] = Ω.

Proof.
First, under Assumption 1 we calculate that

E[h_N^(−d) t_N(β, h_N)] = Σ_{1≤j<k≤J} E{ [1(r_j < r_k) − 1(r_k < r_j)] K′(x′_jk β / h_N) x̃_jk h_N^(−d−1) } = Σ_{1≤j<k≤J} d_jk,    (B4)

where

d_jk ≡ E{ [1(r_j < r_k) − 1(r_k < r_j)] K′(x′_jk β / h_N) x̃_jk h_N^(−d−1) }.    (B5)

By the law of iterated expectations,

d_jk = E{ [P(r_j < r_k | v_{−j,k}, ṽ_{−j,k}, X̃) − P(r_k < r_j | v_{−j,k}, ṽ_{−j,k}, X̃)] · K′(−v_{−j,k}/h_N) x̃_jk h_N^(−d−1) }
     = E[ F̄_jk(v_{−j,k}, ṽ_{−j,k}, X̃) · K′(−v_{−j,k}/h_N) x̃_jk h_N^(−d−1) ].    (B6)

By Assumption 3, F̄_jk(0, ṽ_{−j,k}, X̃) = 0. Taylor expansion of F̄_jk(v_{−j,k}, ṽ_{−j,k}, X̃) around v_{−j,k} = 0 yields

F̄_jk(v_{−j,k}, ṽ_{−j,k}, X̃) = Σ_{i=1}^{d−1} (1/i!) F̄_jk^(i)(0, ṽ_{−j,k}, X̃)(v_{−j,k})^i + (1/d!) F̄_jk^(d)(ξ, ṽ_{−j,k}, X̃)(v_{−j,k})^d,    (B7)

where ξ is between 0 and v_{−j,k}. Application of the Taylor series expansion for p_jk(v_{−j,k} | ṽ_{−j,k}, X̃) around v_{−j,k} = 0 yields

p_jk(v_{−j,k} | ṽ_{−j,k}, X̃) = Σ_{c=0}^{d−i−1} (1/c!) p_jk^(c)(0 | ṽ_{−j,k}, X̃)(v_{−j,k})^c + (1/(d−i)!) p_jk^(d−i)(ξ_i | ṽ_{−j,k}, X̃)(v_{−j,k})^(d−i),    (B8)

where ξ_i is between 0 and v_{−j,k}. Combining (B7) and (B8) yields

F̄_jk(v_{−j,k}, ṽ_{−j,k}, X̃) p_jk(v_{−j,k} | ṽ_{−j,k}, X̃)
 = Σ_{i=1}^{d−1} [1/(i!(d−i)!)] F̄_jk^(i)(0, ṽ_{−j,k}, X̃) p_jk^(d−i)(ξ_i | ṽ_{−j,k}, X̃)(v_{−j,k})^d
 + (1/d!) F̄_jk^(d)(ξ, ṽ_{−j,k}, X̃) p_jk(v_{−j,k} | ṽ_{−j,k}, X̃)(v_{−j,k})^d
 + Σ_{i=1}^{d−1} Σ_{c=0}^{d−i−1} [1/(i!c!)] F̄_jk^(i)(0, ṽ_{−j,k}, X̃) p_jk^(c)(0 | ṽ_{−j,k}, X̃)(v_{−j,k})^(i+c),    (B9)

whenever the derivatives exist. Assumptions 6-7 imply that all of the derivatives on the right-hand side of (B9) exist and are uniformly bounded for almost every (ṽ_{−j,k}, X̃) if |v_{−j,k}| ≤ η for some η > 0. Let ζ_jk = −v_{−j,k}/h_N. Decompose d_jk into two parts: d_jk ≡ d_jk1 + d_jk2, where

d_jk1 = h_N^(−d) ∫_{|h_N ζ_jk|>η} F̄_jk(−ζ_jk h_N, ṽ_{−j,k}, X̃) p_jk(−ζ_jk h_N | ṽ_{−j,k}, X̃) x̃_jk K′(ζ_jk) dζ_jk dP(ṽ_{−j,k}, X̃)    (B10)

and

d_jk2 = h_N^(−d) ∫_{|h_N ζ_jk|≤η} F̄_jk(−ζ_jk h_N, ṽ_{−j,k}, X̃) p_jk(−ζ_jk h_N | ṽ_{−j,k}, X̃) x̃_jk K′(ζ_jk) dζ_jk dP(ṽ_{−j,k}, X̃).    (B11)

Under Assumption 7 and Condition 2,

|d_jk1| ≤ C h_N^(−d) ∫_{|h_N ζ_jk|>η} |x̃_jk| · |K′(ζ_jk)| dζ_jk dP(ṽ_{−j,k}, X̃) → 0,

where |d_jk1| denotes the vector of the absolute values of d_jk1. Plugging (B9) into (B11) and using the assumption that K′(·) is a dth order kernel yield the result that, as N → ∞,

d_jk2 → k_d ≡ Σ_{i=1}^{d} [1/(i!(d−i)!)] E[ F̄_jk^(i)(0, ṽ_{−j,k}, X̃) p_jk^(d−i)(0 | ṽ_{−j,k}, X̃) x̃_jk ]    (B12)

by Lebesgue's dominated convergence theorem. Therefore, by (B4) we have proved part (a).

Next consider Var[(N h_N)^(1/2) t_N(β, h_N)]. By Assumption 1,

Var[(N h_N)^(1/2) t_N(β, h_N)] = h_N Var{ Σ_{1≤j<k≤J} [1(r_j < r_k) − 1(r_k < r_j)] K′(x′_jk β / h_N) x̃_jk h_N^(−1) }.

Denote

e_N ≡ Σ_{1≤j<k≤J} [1(r_j < r_k) − 1(r_k < r_j)] K′(x′_jk β / h_N) x̃_jk h_N^(−1),    (B13)

then

Var[(N h_N)^(1/2) t_N(β, h_N)] = h_N E(e_N e′_N) − h_N E(e_N) E(e′_N).    (B14)

In part (a), we show that E[h_N^(−d) e_N] = O(1), so h_N E(e_N) E(e′_N) = o(1). Since the binomial choice setting J = 2 has been discussed in Horowitz (1992), the following discussion focuses on the case where J ≥ 3. Define

h_N E(e_N e′_N) ≡ L_N1 + L_N2,    (B15)

where

L_N1 = Σ_{1≤j<k<l≤J} 2 h_N^(−1) E{ [1(r_j < r_k) − 1(r_k < r_j)][1(r_j < r_l) − 1(r_l < r_j)] K′(x′_jk β/h_N) K′(x′_jl β/h_N) x̃_jk x̃′_jl
 + [1(r_j < r_k) − 1(r_k < r_j)][1(r_k < r_l) − 1(r_l < r_k)] K′(x′_jk β/h_N) K′(x′_kl β/h_N) x̃_jk x̃′_kl
 + [1(r_j < r_l) − 1(r_l < r_j)][1(r_k < r_l) − 1(r_l < r_k)] K′(x′_jl β/h_N) K′(x′_kl β/h_N) x̃_jl x̃′_kl },    (B16)

and

L_N2 = Σ_{1≤j<k≤J} h_N^(−1) E{ [1(r_j < r_k) − 1(r_k < r_j)]² K′(x′_jk β/h_N)² x̃_jk x̃′_jk }.    (B17)

Let ζ_jk = −v_{−j,k}/h_N for any pair of alternatives j, k ∈ J. By the law of iterated expectations,

L_N2 = Σ_{1≤j<k≤J} ∫ [ F_jk(−h_N ζ_jk, ṽ_{−j,k}, X̃) + F_kj(h_N ζ_jk, ṽ_{−j,k} + h_N ζ_jk ι_{J−1}, X̃) ] K′(ζ_jk)² x̃_jk x̃′_jk p_jk(−h_N ζ_jk | ṽ_{−j,k}, X̃) dζ_jk dP(ṽ_{−j,k}, X̃).    (B18)

By Assumptions 3, 6-7, Condition 2, and Lebesgue's dominated convergence theorem, L_N2 → Ω when N → ∞. By Assumption 7,

|L_N1| ≤ Σ_{1≤j<k<l≤J} 2C h_N [ ∫ |K′(ζ_jk) K′(ζ_jl) x̃_jk x̃′_jl| dζ_jk dζ_jl dP(ṽ_{−j,kl}, X̃)
 + ∫ |K′(ζ_jk) K′(ζ_kl) x̃_jk x̃′_kl| dζ_kj dζ_kl dP(ṽ_{−k,jl}, X̃)
 + ∫ |K′(ζ_jl) K′(ζ_kl) x̃_jl x̃′_kl| dζ_lj dζ_lk dP(ṽ_{−l,jk}, X̃) ].    (B19)

Thus, by Assumption 7 and Condition 2, L_N1 → 0 when N → ∞. We have proved part (b). ∎

Lemma 6.
Let Assumptions 1, 3-4, 6-7 and Conditions 1-2 hold. Then
(a) If N h_N^(2d+1) → ∞ as N → ∞, then h_N^(−d) t_N(β, h_N) →p a.
(b) If N h_N^(2d+1) → λ, where λ ∈ (0, ∞), as N → ∞, then (N h_N)^(1/2) t_N(β, h_N) →d MVN(λ^(1/2) a, Ω).

Proof. If N h_N^(2d+1) → ∞ as N → ∞, then

Var[h_N^(−d) t_N(β, h_N)] = (N h_N^(2d+1))^(−1) Var[(N h_N)^(1/2) t_N(β, h_N)] → 0

by Lemma 5(b). Lemma 5(b) together with Chebyshev's Theorem implies Lemma 6(a). Next consider part (b). Define

w_N = (N h_N)^(1/2) { t_N(β, h_N) − E[t_N(β, h_N)] }.

Lemma 5(a) implies that

(N h_N)^(1/2) E[t_N(β, h_N)] = (N h_N^(2d+1))^(1/2) E[h_N^(−d) t_N(β, h_N)] → λ^(1/2) a,

so it suffices to prove that γ′w_N is asymptotically distributed as N(0, γ′Ωγ) for any nonstochastic (q−1)-dimensional vector γ such that γ′γ = 1. Denote

t_N(β, h_N) ≡ N^(−1) Σ_{n=1}^{N} t_Nn,

where

t_Nn ≡ t_Nn(β, h_N) = Σ_{1≤j<k≤J} [1(r_nj < r_nk) − 1(r_nk < r_nj)] K′(x′_njk β / h_N) x̃_njk h_N^(−1).

So we have

γ′w_N = (h_N / N)^(1/2) γ′ Σ_{n=1}^{N} [t_Nn − E(t_Nn)].

Let CF_N(·) denote the characteristic function of γ′w_N. Using the proof of Lemma 6 in Horowitz (1992) yields the result that

lim_{N→∞} CF_N(τ) = exp(−γ′Ωγ τ²/2),

which is the same as the characteristic function of N(0, γ′Ωγ). ∎

Lemma 7.
Let Assumptions 1, 3-4, 6-8 and Conditions 1-2 hold. For any pair of alternatives j, k ∈ J,
˜ , assume that ||˜xjk || ≤ c for some c > 0. Let η be some positive real number such that p(1) v −j,k , X) jk (v−j,k |˜ (1) ˜ , ˜ −j,k , X) F¯jk (v−j,k , v
(2) ˜ exist and are uniformly bounded for almost every (˜ ˜ ˜ −j,k , X) and F¯jk (v−j,k , v v −j,k , X)
if |v−j,k | ≤ η. For θ ∈ Rq−1 , dene t∗N (θ)
=
(N h2N )−1
N X
X
˜ 0njk θ)˜ [1(rnj < rnk ) − 1(rnk < rnj )] K 0 (x0njk β/hN + x xnjk .
n=1 1≤j
Dene the sets ΘN , N = 1, 2, . . ., by ΘN = θ : θ ∈ Rq−1 , hN kθk ≤ η/2c .
51
(a)
Then plim sup kt∗N (θ) − E [t∗N (θ)]k = 0.
(B20)
N →∞ θ∈ΘN
(b)
There are nite numbers α1 and α2 such that for all θ ∈ ΘN kE[t∗N (θ)] − Hθk ≤ o(1) + α1 hN kθk + α2 hN kθk
2
(B21)
uniformly over θ ∈ ΘN . Proof.
Dene
g N n (θ)
X
=
n ˜ 0njk θ)˜ xnjk [1(rnj < rnk ) − 1(rnk < rnj )] K 0 (x0njk β/hN + x (B22)
1≤j
h
−E [1(rnj < rnk ) − 1(rnk < rnj )] K
0
(x0njk β/hN
+
˜ 0njk θ)˜ xnjk x
io
.
The remaining part of the proof of (B20) follows the proof of (A15) in Lemma 7 of Horowitz (1992). Next, we prove (B21). Dene
X
E [t∗N (θ)] ≡
t∗N jk (θ),
1≤j
where
t∗N jk (θ)
Decompose
o n 0 0 ˜ 0jk θ)˜ xjk . = h−2 N E [1(rj < rk ) − 1(rk < rj )] K (xjk β/hN + x h i 0 ˜ ¯ ˜ −j,k , X)K ˜ 0jk θ)˜ = h−2 (−v−j,k /hN + x xjk . N E Fjk (v−j,k , v
t∗N jk (θ)
t∗N jk1 = h−2 N
into two parts:
´ |v−j,k |>η
t∗N jk (θ) ≡ t∗N jk1 + t∗N jk2 ,
where
0 ˜ ˜ −j,k , X)K ˜ 0jk θ) F¯jk (v−j,k , v (−v−j,k /hN + x
˜ ˜ ·˜ xjk pjk (v−j,k |˜ v −j,k , X)dv v −j,k , X) −j,k dP (˜
52
and
t∗N jk2 = h−2 N
´ |v−j,k |≤η
0 ˜ ˜ −j,k , X)K ˜ 0jk θ) F¯jk (v−j,k , v (−v−j,k /hN + x
˜ ˜ ·˜ xjk pjk (v−j,k |˜ v −j,k , X)dv v −j,k , X). −j,k dP (˜ For some nite
C > 0,
by Assumption 7(a) and
ˆ
∗
tN jk1 ≤ Ch−2 N
Let
||˜ xjk || ≤ c,
0 ˜ K (−v−j,k /hN + x ˜ 0jk θ) dv−j,k dP (˜ v −j,k , X).
|v−j,k |>η
˜ 0jk θ . ζjk = −v−j,k /hN + x
Since
hN ||θ|| ≤ η/2c
and
||˜ xjk || ≤ c, |v−j,k | > η
implies that
|ζjk | > | − v−j,k /hN | − |˜ x0jk θ| > η/2hN
and
ˆ
∗
tN jk1 ≤ Ch−1 N
|K 0 (ζjk )| dζjk .
(B23)
|ζjk |>η/2hN
We have
lim sup t∗N jk1 = 0,
(B24)
N →∞θ∈ΘN
because the term on the right-hand side of (B23) converges to 0 by Condition 2. Next, we consider
|v−j,k | ≤ η ,
substitution of
d=2
If
into the right-hand side of (B9) yields
˜ pjk (v−j,k |˜ ˜ F¯jk (v−j,k , −˜ v −j,k , X) v −j,k , X) =
t∗N jk2 .
(1) ˜ pjk (0|˜ ˜ −j,k ˜ −j,k , X) F¯jk (0, v v −j,k , X)v (1) 2 ˜ p(1) (ξ1 |ξ1 , v ˜ ˜ −j,k , X) ˜ −j,k , X)(v +F¯jk (0, v −j,k ) jk (2) 2 ˜ pjk (v−j,k |˜ ˜ ˜ −j,k , X) +(1/2)F¯jk (ξ, v v −j,k , X)(v −j,k ) ,
(B25)
53
where
ξ
and
ξ1
are between 0 and
v−j,k .
Decompose
t∗N jk2
into two parts
t∗N jk2 ≡ sN jk1 + sN jk2 ,
where

$$s_{Njk1} = h_N^{-2}\int_{|v_{-j,k}|\le\eta} \bar{F}^{(1)}_{jk}(0,\tilde{v}_{-j,k},\tilde{X})\, p_{jk}(0|\tilde{v}_{-j,k},\tilde{X})\,\tilde{x}_{jk}\, v_{-j,k}\, K'\!\left(-v_{-j,k}/h_N+\tilde{x}'_{jk}\theta\right) dv_{-j,k}\, dP(\tilde{v}_{-j,k},\tilde{X}),$$

and

$$s_{Njk2} = h_N^{-2}\int_{|v_{-j,k}|\le\eta} \left[\bar{F}^{(1)}_{jk}(0,\tilde{v}_{-j,k},\tilde{X})\, p^{(1)}_{jk}(\xi_1|\tilde{v}_{-j,k},\tilde{X}) + (1/2)\,\bar{F}^{(2)}_{jk}(\xi,\tilde{v}_{-j,k},\tilde{X})\, p_{jk}(v_{-j,k}|\tilde{v}_{-j,k},\tilde{X})\right]\tilde{x}_{jk}\,(v_{-j,k})^2\, K'\!\left(-v_{-j,k}/h_N+\tilde{x}'_{jk}\theta\right) dv_{-j,k}\, dP(\tilde{v}_{-j,k},\tilde{X}).$$

Let $\zeta_{jk} = -v_{-j,k}/h_N + \tilde{x}'_{jk}\theta$; then

$$s_{Njk1} = \int_{|\zeta_{jk}-\tilde{x}'_{jk}\theta|\le\eta/h_N} \bar{F}^{(1)}_{jk}(0,\tilde{v}_{-j,k},\tilde{X})\, p_{jk}(0|\tilde{v}_{-j,k},\tilde{X})\,\tilde{x}_{jk}\,(\zeta_{jk}-\tilde{x}'_{jk}\theta)\, K'(\zeta_{jk})\, d\zeta_{jk}\, dP(\tilde{v}_{-j,k},\tilde{X}).$$

Because $\int \zeta K'(\zeta)\, d\zeta = 0$ and $|\tilde{x}'_{jk}\theta h_N| \le \eta/2$,

$$\left|\int_{|\zeta_{jk}-\tilde{x}'_{jk}\theta|\le\eta/h_N} \zeta_{jk} K'(\zeta_{jk})\, d\zeta_{jk}\right| = \left|\int_{|\zeta_{jk}-\tilde{x}'_{jk}\theta|>\eta/h_N} \zeta_{jk} K'(\zeta_{jk})\, d\zeta_{jk}\right| \le \int_{|\zeta_{jk}|>\eta/2h_N} \left|\zeta_{jk} K'(\zeta_{jk})\right| d\zeta_{jk}. \tag{B26}$$

By Condition 2, the right-hand term of (B26) is bounded uniformly over $\theta\in\Theta_N$ and it converges to 0. Therefore, by Lebesgue's dominated convergence theorem,

$$\lim_{N\to\infty}\sup_{\theta\in\Theta_N}\left|\int_{|\zeta_{jk}-\tilde{x}'_{jk}\theta|\le\eta/h_N} \bar{F}^{(1)}_{jk}(0,\tilde{v}_{-j,k},\tilde{X})\, p_{jk}(0|\tilde{v}_{-j,k},\tilde{X})\,\tilde{x}_{jk}\,\zeta_{jk}\, K'(\zeta_{jk})\, d\zeta_{jk}\, dP(\tilde{v}_{-j,k},\tilde{X})\right| = 0. \tag{B27}$$
In addition,

$$\left\|\tilde{x}'_{jk}\theta \int_{|\zeta_{jk}-\tilde{x}'_{jk}\theta|\le\eta/h_N} K'(\zeta_{jk})\, d\zeta_{jk} - \tilde{x}'_{jk}\theta\right\| \le \left|\tilde{x}'_{jk}\theta h_N\right| h_N^{-1}\int_{|\zeta_{jk}-\tilde{x}'_{jk}\theta|>\eta/h_N} \left|K'(\zeta_{jk})\right| d\zeta_{jk} \le (\eta/2)\, h_N^{-1}\int_{|\zeta_{jk}-\tilde{x}'_{jk}\theta|>\eta/h_N} \left|K'(\zeta_{jk})\right| d\zeta_{jk}. \tag{B28}$$

By Condition 2, the right-hand side of (B28) is bounded uniformly over $\theta\in\Theta_N$ and it converges to 0. Next, by Lebesgue's dominated convergence theorem and the definition of $H$,

$$\lim_{N\to\infty}\sup_{\theta\in\Theta_N}\left\|\sum_{1\le j<k\le J}\int_{|\zeta_{jk}-\tilde{x}'_{jk}\theta|\le\eta/h_N} \bar{F}^{(1)}_{jk}(0,\tilde{v}_{-j,k},\tilde{X})\, p_{jk}(0|\tilde{v}_{-j,k},\tilde{X})\,\tilde{x}_{jk}\tilde{x}'_{jk}\theta\, K'(\zeta_{jk})\, d\zeta_{jk}\, dP(\tilde{v}_{-j,k},\tilde{X}) - H\theta\right\| = 0. \tag{B29}$$

For some finite $C > 0$,

$$\|s_{Njk2}\| \le C h_N \int_{|\zeta_{jk}-\tilde{x}'_{jk}\theta|\le\eta/h_N} (\zeta_{jk}-\tilde{x}'_{jk}\theta)^2\, \left|K'(\zeta_{jk})\right| d\zeta_{jk}\, dP(\tilde{v}_{-j,k},\tilde{X}) \le o(1) + \alpha_{jk1}\, h_N \|\theta\| + \alpha_{jk2}\, h_N \|\theta\|^2 \tag{B30}$$

for some finite $\alpha_{jk1}$ and $\alpha_{jk2}$. So part (b) is established by combining (B24), (B27), (B29), and (B30).

Lemma 8.
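The kernel facts used in (B23)-(B28) — $\int K'(\zeta)\,d\zeta = 1$, $\int \zeta K'(\zeta)\,d\zeta = 0$, and vanishing tail integrals of $K'$ — are easy to verify numerically for a concrete smooth choice of $K$. A minimal sketch, assuming the logistic CDF $K(\zeta) = 1/(1+e^{-\zeta})$ (an illustrative choice only; the kernel in the paper is restricted by Conditions 1-2 and is not pinned down in this appendix):

```python
import numpy as np

def trapezoid(y, x):
    """Plain trapezoidal rule (avoids NumPy-version issues with np.trapz)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

# Logistic-CDF kernel: K(z) = 1/(1 + exp(-z)), so K'(z) is the logistic density.
z = np.linspace(-60.0, 60.0, 1_200_001)
e = np.exp(-np.abs(z))            # numerically stable: K'(z) = e^{-|z|}/(1+e^{-|z|})^2
Kp = e / (1.0 + e) ** 2

total = trapezoid(Kp, z)          # integral of K' over the real line: equals 1
first_moment = trapezoid(z * Kp, z)  # integral of z*K'(z): equals 0 by symmetry

def tail(cutoff):
    """Integral of |K'| outside [-cutoff, cutoff]; the (B23)-type tail term."""
    inner = np.abs(z) <= cutoff
    return total - trapezoid(Kp[inner], z[inner])

assert abs(total - 1.0) < 1e-6
assert abs(first_moment) < 1e-8
assert tail(10.0) < tail(5.0) < 0.02   # tails vanish as eta/(2*h_N) grows
```

The tail integral here plays the role of the right-hand side of (B23): as $h_N \to 0$, the cutoff $\eta/2h_N$ grows and the bound vanishes.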
Let Assumptions 1-9 and Conditions 1-2 hold and define $\theta^S_N = (\tilde{b}^S_N - \tilde{\beta})/h_N$, where $b^S_N$ is a SGMS estimator. Then the probability limit of $\theta^S_N$ is zero.

Proof. Given any $\delta > 0$, choose $\gamma$ to be a finite number such that $\Pr(\|\tilde{x}_{jk}\| \le \gamma \text{ for any } 1\le j<k\le J) \ge 1-\delta$. Define $C_\gamma \equiv \{X : \|\tilde{x}_{jk}\| \le \gamma \text{ for any } 1\le j<k\le J\}$ and let $C^0_\gamma$ denote the complement of $C_\gamma$. Let $P_\delta$ be the probability distribution of $X$ conditional on the event $C_\gamma$. The remaining part of the proof of Lemma 8 follows the proof of Lemma 8 of Horowitz (1992).
Lemma 9. Let Assumptions 1-8 and Conditions 1-2 hold. Let $\{\beta_N = (\beta_{N1}, \tilde{\beta}'_N)'\}$ be any sequence in $B$ such that $(\beta_N - \beta)/h_N \to 0$ as $N\to\infty$. Then the probability limit of $H_N(\beta_N, h_N)$ is $H$.

Proof.
Assume that $\beta_{N1} = \beta_1 \in \{-1, 1\}$, because this is true for sufficiently large $N$ if $\beta_{N1} - \beta_1 \to 0$. Denote $\theta_N = (\tilde{\beta}_N - \tilde{\beta})/h_N$. Let $\{a_N\}$ be a sequence such that $a_N \to \infty$ and $a_N \theta_N \to 0$ as $N\to\infty$. Define $\tilde{X}_N = \{\tilde{X} : \|\tilde{x}_{jk}\| \le a_N \text{ for any } 1\le j<k\le J\}$. For any $\epsilon > 0$,

$$\lim_{N\to\infty} P\left[\left|H_N(\beta_N, h_N) - H\right| > \epsilon\right] = \lim_{N\to\infty} P\left[\left|H_N(\beta_N, h_N) - H\right| > \epsilon \,\Big|\, \tilde{X}_N\right].$$

Therefore, by Chebyshev's Theorem, it suffices to show that $E[H_N(\beta_N, h_N)|\tilde{X}_N] \to H$ and $Var[H_N(\beta_N, h_N)|\tilde{X}_N] \to 0$. Consider $E[H_N(\beta_N, h_N)|\tilde{X}_N]$ first. Define $E_N \equiv E[H_N(\beta_N, h_N)|\tilde{X}_N]$; then $E_N = \sum_{1\le j<k\le J} E_{Njk}$, where

$$E_{Njk} = h_N^{-2}\int \bar{F}_{jk}(v_{-j,k},\tilde{v}_{-j,k},\tilde{X})\, p_{jk}(v_{-j,k}|\tilde{v}_{-j,k},\tilde{X})\,\tilde{x}_{jk}\tilde{x}'_{jk}\, K''\!\left(-v_{-j,k}/h_N + \tilde{x}'_{jk}\theta_N\right) dv_{-j,k}\, dP_{Njk}(\tilde{v}_{-j,k},\tilde{X}), \tag{B31}$$

and
$P_{Njk}$ denotes the distribution of $(\tilde{v}_{-j,k},\tilde{X})$ conditional on $\tilde{X}_N$. By Assumptions 6-7, there exists an $\eta$ such that $\bar{F}^{(1)}_{jk}(v_{-j,k},\tilde{v}_{-j,k},\tilde{X})$, $\bar{F}^{(2)}_{jk}(v_{-j,k},\tilde{v}_{-j,k},\tilde{X})$, and $p^{(1)}_{jk}(v_{-j,k}|\tilde{v}_{-j,k},\tilde{X})$ exist and are almost surely uniformly bounded if $|v_{-j,k}| \le \eta$. Therefore, substitution of (B25) into (B31) yields $E_{Njk} = I_{Njk1} + I_{Njk2} + I_{Njk3}$, where
$$I_{Njk1} = h_N^{-2}\int_{|v_{-j,k}|\le\eta} \bar{F}^{(1)}_{jk}(0,\tilde{v}_{-j,k},\tilde{X})\, p_{jk}(0|\tilde{v}_{-j,k},\tilde{X})\,\tilde{x}_{jk}\tilde{x}'_{jk}\, v_{-j,k}\, K''\!\left(-v_{-j,k}/h_N + \tilde{x}'_{jk}\theta_N\right) dv_{-j,k}\, dP_{Njk}(\tilde{v}_{-j,k},\tilde{X}),$$

$$I_{Njk2} = h_N^{-2}\int_{|v_{-j,k}|\le\eta} \left[\bar{F}^{(1)}_{jk}(0,\tilde{v}_{-j,k},\tilde{X})\, p^{(1)}_{jk}(\xi_1|\tilde{v}_{-j,k},\tilde{X}) + (1/2)\,\bar{F}^{(2)}_{jk}(\xi,\tilde{v}_{-j,k},\tilde{X})\, p_{jk}(v_{-j,k}|\tilde{v}_{-j,k},\tilde{X})\right]\tilde{x}_{jk}\tilde{x}'_{jk}\,(v_{-j,k})^2\, K''\!\left(-v_{-j,k}/h_N + \tilde{x}'_{jk}\theta_N\right) dv_{-j,k}\, dP_{Njk}(\tilde{v}_{-j,k},\tilde{X}),$$

and

$$I_{Njk3} = h_N^{-2}\int_{|v_{-j,k}|>\eta} \bar{F}_{jk}(v_{-j,k},\tilde{v}_{-j,k},\tilde{X})\, p_{jk}(v_{-j,k}|\tilde{v}_{-j,k},\tilde{X})\,\tilde{x}_{jk}\tilde{x}'_{jk}\, K''\!\left(-v_{-j,k}/h_N + \tilde{x}'_{jk}\theta_N\right) dv_{-j,k}\, dP_{Njk}(\tilde{v}_{-j,k},\tilde{X}). \tag{B32}$$
Define $\zeta_{jk} = -v_{-j,k}/h_N + \tilde{x}'_{jk}\theta_N$. Then

$$I_{Njk1} = \int_{|\zeta_{jk}-\tilde{x}'_{jk}\theta_N|\le\eta/h_N} \bar{F}^{(1)}_{jk}(0,\tilde{v}_{-j,k},\tilde{X})\, p_{jk}(0|\tilde{v}_{-j,k},\tilde{X})\,\tilde{x}_{jk}\tilde{x}'_{jk}\,(\zeta_{jk}-\tilde{x}'_{jk}\theta_N)\, K''(\zeta_{jk})\, d\zeta_{jk}\, dP_{Njk}(\tilde{v}_{-j,k},\tilde{X}).$$

Because $|\tilde{x}'_{jk}\theta_N| \le a_N\|\theta_N\| \to 0$, by Assumptions 6-7 and Conditions 1-2,

$$I_{Njk1} \to E\left[\bar{F}^{(1)}_{jk}(0,\tilde{v}_{-j,k},\tilde{X})\, p_{jk}(0|\tilde{v}_{-j,k},\tilde{X})\,\tilde{x}_{jk}\tilde{x}'_{jk}\right]. \tag{B33}$$

For some finite $C > 0$, by Assumptions 6-7 and Conditions 1-2,

$$|I_{Njk2}| \le C h_N \int_{|\zeta_{jk}-\tilde{x}'_{jk}\theta_N|\le\eta/h_N} \left|\tilde{x}_{jk}\tilde{x}'_{jk}\right| (\zeta_{jk}-\tilde{x}'_{jk}\theta_N)^2\, \left|K''(\zeta_{jk})\right| d\zeta_{jk}\, dP_{Njk}(\tilde{v}_{-j,k},\tilde{X}) \to 0. \tag{B34}$$

Finally, we calculate $I_{Njk3}$ in (B32):
$$|I_{Njk3}| \le C h_N^{-1}\int_{|\zeta_{jk}-\tilde{x}'_{jk}\theta_N|>\eta/h_N} \left|\tilde{x}_{jk}\tilde{x}'_{jk}\right|\cdot\left|K''(\zeta_{jk})\right| d\zeta_{jk}\, dP_{Njk}(\tilde{v}_{-j,k},\tilde{X}). \tag{B35}$$

Under Assumptions 6-7 and Condition 2, the right-hand side of (B35) converges to 0 as $N$ goes to infinity. Combining (B33), (B34), and (B35) establishes that

$$E_N = \sum_{1\le j<k\le J} E_{Njk} = \sum_{1\le j<k\le J}\left(I_{Njk1} + I_{Njk2} + I_{Njk3}\right) \to H.$$
Next consider $Var[H_N(\beta_N, h_N)|\tilde{X}_N]$:

$$Var[H_N(\beta_N, h_N)|\tilde{X}_N] = N^{-1} Var\left\{\sum_{1\le j<k\le J}\left[1(r_{nj}<r_{nk}) - 1(r_{nk}<r_{nj})\right] K''\!\left(-v_{-j,k}/h_N + \tilde{x}'_{jk}\theta_N\right)\tilde{x}_{jk}\tilde{x}'_{jk}\, h_N^{-2} \,\Big|\, \tilde{X}_N\right\} = N^{-1} E\left[r_N(\theta_N)\, r_N(\theta_N)' \,\big|\, \tilde{X}_N\right] + O(N^{-1}), \tag{B36}$$

where

$$r_N(\theta_N) = \sum_{1\le j<k\le J}\left[1(r_{nj}<r_{nk}) - 1(r_{nk}<r_{nj})\right] K''\!\left(-v_{-j,k}/h_N + \tilde{x}'_{jk}\theta_N\right) vec(\tilde{x}_{jk}\tilde{x}'_{jk})\, h_N^{-2}.$$

Let $\zeta_{jk} = -v_{-j,k}/h_N + \tilde{x}'_{jk}\theta_N$. For some finite $C$,

$$\begin{aligned}
N^{-1} E\left[r_N(\theta_N)\, r_N(\theta_N)' \,\big|\, \tilde{X}_N\right]
\le\ & C h_N (N h_N^4)^{-1}\sum_{1\le j<k\le J}\int vec(\tilde{x}_{jk}\tilde{x}'_{jk})\, vec(\tilde{x}_{jk}\tilde{x}'_{jk})'\,\left[K''(\zeta_{jk})\right]^2 d\zeta_{jk}\, dP_{Njk}(\tilde{v}_{-j,k},\tilde{X}) \\
&+ C h_N^2 (N h_N^4)^{-1}\sum_{k\ne l;\ k,l\in J\setminus\{j\}}\int vec(\tilde{x}_{jk}\tilde{x}'_{jk})\, vec(\tilde{x}_{jl}\tilde{x}'_{jl})'\, K''(\zeta_{jk})\, K''(\zeta_{jl})\, d\zeta_{jk}\, d\zeta_{jl}\, dP_{Njkl}(\tilde{v}_{-j,kl},\tilde{X}),
\end{aligned} \tag{B37}$$

where $P_{Njkl}$ is the distribution of $(\tilde{v}_{-j,kl},\tilde{X})$ conditional on $\tilde{X}_N$. The right-hand side of (B37) converges to zero by Assumptions 6-8 and Condition 2. Therefore, it follows from (B36) that $Var[H_N(\beta_N, h_N)|\tilde{X}_N]$ converges to zero.
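The workhorse of this proof is the Chebyshev argument: a statistic whose conditional mean converges to the target and whose conditional variance vanishes has that target as its probability limit. A toy Monte Carlo illustration of this logic, using a plain sample mean with $E[Z_N]=\mu$ and $Var[Z_N]=1/N$ as a stand-in (not the paper's estimator):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, eps, reps = 2.0, 0.1, 500

freqs = []
for N in (100, 1_000, 10_000):
    # Z_N: sample mean of N iid N(mu, 1) draws, so E[Z_N] = mu, Var[Z_N] = 1/N.
    Z = rng.normal(mu, 1.0, size=(reps, N)).mean(axis=1)
    freq = float(np.mean(np.abs(Z - mu) > eps))
    bound = (1.0 / N) / eps ** 2    # Chebyshev: P(|Z_N - mu| > eps) <= Var/eps^2
    assert freq <= bound
    freqs.append(freq)

assert freqs[-1] <= freqs[0]        # deviation probability shrinks with N
```

The same two ingredients (mean to $H$, variance to 0) are exactly what the lemma verifies for $H_N(\beta_N, h_N)$ conditional on $\tilde{X}_N$.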
Proof. (THEOREM 3) By Theorem 2, $b^S_{N,1} = \beta_1$ and $\tilde{b}^S_N$ is an interior point of $\tilde{B}$ with probability approaching 1 as $N\to\infty$, and consequently $t_N(b^S_N, h_N) = 0$ with probability approaching 1. A Taylor series expansion of $t_N(b^S_N, h_N)$ around the true parameter vector $\beta$ yields

$$t_N(\beta, h_N) + H_N(b^{*}_N, h_N)(\tilde{b}^S_N - \tilde{\beta}) = 0, \tag{B38}$$

where $b^{*}_N$ is between $\beta$ and $b^S_N$.

Part (a): By (B38),

$$h_N^{-d}\, t_N(\beta, h_N) + H_N(b^{*}_N, h_N)\, h_N^{-d}(\tilde{b}^S_N - \tilde{\beta}) = 0$$

with probability approaching 1 as $N\to\infty$. Lemmas 8-9 imply that $\operatorname{plim} H_N(b^{*}_N, h_N) = H$. Because $H$ is nonsingular by Assumption 9, we have

$$h_N^{-d}(\tilde{b}^S_N - \tilde{\beta}) = -H^{-1} h_N^{-d}\, t_N(\beta, h_N) + o_p(1).$$

Part (a) now follows from Lemma 6(a).

Part (b): By (B38),

$$(N h_N)^{1/2}\, t_N(\beta, h_N) + H_N(b^{*}_N, h_N)(N h_N)^{1/2}(\tilde{b}^S_N - \tilde{\beta}) = 0$$

with probability approaching 1 as $N\to\infty$. So, by Lemmas 8-9 and Assumption 9,

$$(N h_N)^{1/2}(\tilde{b}^S_N - \tilde{\beta}) = -H^{-1}(N h_N)^{1/2}\, t_N(\beta, h_N) + o_p(1).$$

Part (b) now follows from Lemma 6(b).

Part (c): By the property of matrix trace,
$$E_A\left[(\tilde{b}^S_N - \tilde{\beta})' W (\tilde{b}^S_N - \tilde{\beta})\right] = Tr\left\{W\, E_A\left[(\tilde{b}^S_N - \tilde{\beta})(\tilde{b}^S_N - \tilde{\beta})'\right]\right\}.$$

Part (b) implies that

$$E_A\left[(\tilde{b}^S_N - \tilde{\beta})(\tilde{b}^S_N - \tilde{\beta})'\right] = N^{-2d/(2d+1)}\left[\lambda^{-1/(2d+1)} H^{-1}\Omega H^{-1} + \lambda^{2d/(2d+1)} H^{-1} a a' H^{-1}\right].$$

Therefore, by the definition of MSE,

$$MSE = N^{-2d/(2d+1)}\, Tr\left\{W H^{-1}\left[\lambda^{-1/(2d+1)}\Omega + \lambda^{2d/(2d+1)} a a'\right] H^{-1}\right\}. \tag{B39}$$

To minimize the MSE, differentiate the right-hand side of (B39) with respect to $\lambda$. From the first-order condition, the MSE is minimized by setting $\lambda = \lambda^{*}$, where

$$\lambda^{*} = \left[trace\left(W H^{-1}\Omega H^{-1}\right)\right]\Big/\left[trace\left(2d\, W H^{-1} a a' H^{-1}\right)\right]. \tag{B40}$$
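The first-order condition behind (B40) can be written out explicitly. Writing $T_1 \equiv trace(W H^{-1}\Omega H^{-1})$ and $T_2 \equiv trace(W H^{-1} a a' H^{-1})$ (shorthand introduced here for exposition), the MSE in (B39) is proportional to $g(\lambda) = \lambda^{-1/(2d+1)} T_1 + \lambda^{2d/(2d+1)} T_2$, and

```latex
g'(\lambda)
  = -\frac{1}{2d+1}\,\lambda^{-\frac{2d+2}{2d+1}}\,T_1
    + \frac{2d}{2d+1}\,\lambda^{-\frac{1}{2d+1}}\,T_2
  = 0
\;\Longleftrightarrow\;
T_1 = 2d\,\lambda\,T_2
\;\Longleftrightarrow\;
\lambda^{*} = \frac{T_1}{2d\,T_2}.
```

Since $g(\lambda)\to\infty$ both as $\lambda\to 0^{+}$ and as $\lambda\to\infty$ whenever $T_1$ and $T_2$ are positive, the unique critical point $\lambda^{*}$ is the global minimizer, matching (B40).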
It follows from Part (b) that $N^{d/(2d+1)}(\tilde{b}^S_N - \tilde{\beta})$ has the asymptotic distribution

$$MVN\left(-(\lambda^{*})^{d/(2d+1)} H^{-1} a,\ (\lambda^{*})^{-1/(2d+1)} H^{-1}\Omega H^{-1}\right).$$
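As a quick numerical sanity check of the minimizer in (B40), one can minimize $g(\lambda) = \lambda^{-1/(2d+1)} T_1 + \lambda^{2d/(2d+1)} T_2$ by brute force on a grid and compare with $T_1/(2d\,T_2)$. The scalars below are arbitrary stand-ins for the two traces:

```python
import numpy as np
from math import isclose

# Stand-in values: d is the bias order, T1 and T2 replace the traces in (B40).
d, T1, T2 = 2, 3.0, 1.5

def g(lam):
    # MSE profile in lambda, up to the factor N^{-2d/(2d+1)} in (B39).
    return lam ** (-1.0 / (2 * d + 1)) * T1 + lam ** (2.0 * d / (2 * d + 1)) * T2

lam_star = T1 / (2 * d * T2)            # closed-form minimizer from (B40)
grid = np.linspace(1e-3, 10.0, 200_001)
lam_grid = float(grid[np.argmin(g(grid))])  # brute-force minimizer on a fine grid

assert isclose(lam_grid, lam_star, abs_tol=1e-3)
assert g(lam_star) <= g(0.1) and g(lam_star) <= g(5.0)
```

With these stand-ins, $\lambda^{*} = 3/(4\cdot 1.5) = 0.5$, and the grid search recovers it to grid precision.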
Proof. (THEOREM 4)

Part (a): Applying a Taylor expansion to $t_N(b^S_N, h^{*}_N)$ around $\beta$ yields

$$(h^{*}_N)^{-d}\, t_N(b^S_N, h^{*}_N) = (h^{*}_N)^{-d}\, t_N(\beta, h^{*}_N) + \left[\partial t_N(b^{*}_N, h^{*}_N)/\partial\tilde{b}'\right](h^{*}_N)^{-d}(\tilde{b}^S_N - \tilde{\beta}) \tag{B41}$$

with probability approaching one as $N\to\infty$, where $b^{*}_N$ is between $b^S_N$ and $\beta$. By Lemma 8, $(\tilde{b}^S_N - \tilde{\beta})/h_N = o_p(1)$. Lemma 9 implies that $\operatorname{plim}\left[\partial t_N(b^{*}_N, h^{*}_N)/\partial\tilde{b}'\right] = H$. In addition, $(h^{*}_N)^{-d}(\tilde{b}^S_N - \tilde{\beta}) = o_p(1)$ because $h_N^{-d}(\tilde{b}^S_N - \tilde{\beta}) = O_p(1)$ by Theorem 3 and $(h^{*}_N/h_N)^{-d} \to 0$. Finally, $\operatorname{plim}\left[(h^{*}_N)^{-d}\, t_N(\beta, h^{*}_N)\right] = a$ by Lemma 5(a). Part (a) now follows by taking probability limits as $N\to\infty$ of each side of (B41).

Part (b): By Chebyshev's Theorem, it suffices to show that $E(\hat{\Omega}_N) \to \Omega$ and $Var(\hat{\Omega}_N) \to 0$. First consider $E(\hat{\Omega}_N)$:
$$E(\hat{\Omega}_N) = h_N\, E\left[t_{Nn}(b^S_N, h_N)\, t_{Nn}(b^S_N, h_N)'\right] \equiv L^{*}_{N1} + L^{*}_{N2}, \tag{B42}$$

where

$$L^{*}_{N1} = h_N^{-1}\sum_{1\le j<k\le J} E\left\{\left[\left(1(r_j<r_k) - 1(r_k<r_j)\right) K'\!\left(x'_{jk} b^S_N/h_N\right)\right]^2 \tilde{x}_{jk}\tilde{x}'_{jk}\right\},$$

and

$$\begin{aligned}
L^{*}_{N2} = 2 h_N^{-1}\sum_{1\le j<k<l\le J} E\big\{ &\left[1(r_j<r_k) - 1(r_k<r_j)\right]\left[1(r_j<r_l) - 1(r_l<r_j)\right] K'\!\left(x'_{jk} b^S_N/h_N\right) K'\!\left(x'_{jl} b^S_N/h_N\right)\tilde{x}_{jk}\tilde{x}'_{jl} \\
+\ &\left[1(r_j<r_k) - 1(r_k<r_j)\right]\left[1(r_k<r_l) - 1(r_l<r_k)\right] K'\!\left(x'_{jk} b^S_N/h_N\right) K'\!\left(x'_{kl} b^S_N/h_N\right)\tilde{x}_{jk}\tilde{x}'_{kl} \\
+\ &\left[1(r_j<r_l) - 1(r_l<r_j)\right]\left[1(r_k<r_l) - 1(r_l<r_k)\right] K'\!\left(x'_{jl} b^S_N/h_N\right) K'\!\left(x'_{kl} b^S_N/h_N\right)\tilde{x}_{jl}\tilde{x}'_{kl}\big\}.
\end{aligned}$$

Let $\theta_N = (\tilde{b}^S_N - \tilde{\beta})/h_N$ and $\zeta_{jk} = -v_{-j,k}/h_N + \tilde{x}'_{jk}\theta_N$. Then

$$\begin{aligned}
L^{*}_{N1} = \sum_{1\le j<k\le J}\int \big\{ &F_{jk}\left[h_N(-\zeta_{jk} + \tilde{x}'_{jk}\theta_N),\, \tilde{v}_{-j,k},\, \tilde{X}\right] \\
+\ &F_{kj}\left[h_N(-\zeta_{jk} + \tilde{x}'_{jk}\theta_N),\, \tilde{v}_{-j,k} + h_N(-\zeta_{jk} + \tilde{x}'_{jk}\theta_N)\iota'_{J-1},\, \tilde{X}\right]\big\} \\
\cdot\ &p_{jk}\left[h_N(-\zeta_{jk} + \tilde{x}'_{jk}\theta_N)\,\big|\,\tilde{v}_{-j,k},\, \tilde{X}\right]\tilde{x}_{jk}\tilde{x}'_{jk}\left[K'(\zeta_{jk})\right]^2 d\zeta_{jk}\, dP(\tilde{v}_{-j,k},\tilde{X}).
\end{aligned} \tag{B43}$$

By Assumptions 3, 6-7, Condition 2, and Lebesgue's dominated convergence theorem, the right-hand side of (B43) converges to $\Omega$ when $N\to\infty$. Under Assumption 7,

$$\begin{aligned}
|L^{*}_{N2}| \le \sum_{1\le j<k<l\le J} 2 C h_N\Big[ &\int\left|K'(\zeta_{jk})\, K'(\zeta_{jl})\,\tilde{x}_{jk}\tilde{x}'_{jl}\right| d\zeta_{jk}\, d\zeta_{jl}\, dP(\tilde{v}_{-j,kl},\tilde{X}) \\
+\ &\int\left|K'(\zeta_{jk})\, K'(\zeta_{kl})\,\tilde{x}_{jk}\tilde{x}'_{kl}\right| d\zeta_{kj}\, d\zeta_{kl}\, dP(\tilde{v}_{-k,jl},\tilde{X}) \\
+\ &\int\left|K'(\zeta_{jl})\, K'(\zeta_{kl})\,\tilde{x}_{jl}\tilde{x}'_{kl}\right| d\zeta_{lj}\, d\zeta_{lk}\, dP(\tilde{v}_{-l,jk},\tilde{X})\Big].
\end{aligned} \tag{B44}$$

Therefore, the right-hand side of (B44) converges to 0 when $N\to\infty$ by Assumption 7 and Condition 2. So $E(\hat{\Omega}_N) \to \Omega$ by (B42).
Next consider $Var(\hat{\Omega}_N)$. By Assumption 1, we can calculate

$$\begin{aligned}
Var(\hat{\Omega}_N) &= \left(h_N^2/N\right) Var\left\{vec\left[t_{Nn}(b^S_N, h_N)\, t_{Nn}(b^S_N, h_N)'\right]\right\} \\
&= \left(h_N^2/N\right) E\left\{vec\left[t_{Nn}(b^S_N, h_N)\, t_{Nn}(b^S_N, h_N)'\right] vec\left[t_{Nn}(b^S_N, h_N)\, t_{Nn}(b^S_N, h_N)'\right]'\right\} + o(1) \\
&= \left(N h_N^2\right)^{-1} E\left[c\, c'\right] + o(1),
\end{aligned} \tag{B45}$$

where

$$c \equiv \sum_{1\le j<k\le J}\ \sum_{1\le l<m\le J} c_{jklm},$$

$$c_{jklm} \equiv \left[1(r_j<r_k) - 1(r_k<r_j)\right]\left[1(r_l<r_m) - 1(r_m<r_l)\right] K'\!\left(x'_{jk} b^S_N/h_N\right) K'\!\left(x'_{lm} b^S_N/h_N\right) vec\left(\tilde{x}_{jk}\tilde{x}'_{lm}\right).$$

The right-hand side of (B45) converges to 0 under Assumption 7 and Condition 2. Therefore, we have proved that $Var(\hat{\Omega}_N) \to 0$.
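The moment condition (B42) reflects the form of the plug-in estimator itself: $\hat{\Omega}_N$ is the bandwidth-scaled sample second moment of the per-observation terms $t_{Nn}(b^S_N, h_N)$. A minimal sketch of that construction, using random stand-in score vectors (the $t_{Nn}$ values, $h_N$, and dimensions below are placeholders, not output of the actual estimator):

```python
import numpy as np

rng = np.random.default_rng(42)
N, p, h_N = 500, 3, 0.2
t = rng.normal(size=(N, p))       # stand-in for t_Nn(b_N^S, h_N), n = 1, ..., N

# Omega_hat_N = h_N * N^{-1} * sum_n t_Nn t_Nn', matching E(Omega_hat_N) in (B42).
omega_hat = h_N * (t.T @ t) / N

assert np.allclose(omega_hat, omega_hat.T)            # symmetric by construction
assert np.all(np.linalg.eigvalsh(omega_hat) >= -1e-10)  # positive semidefinite
```

Symmetry and positive semidefiniteness hold by construction, which is what makes $\hat{\Omega}_N$ usable inside the sandwich form $H^{-1}\Omega H^{-1}$ of Theorem 3.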
Part (c): This is a result implied by Lemma 9.
References

[1] Abrevaya J and Huang J. 2005. On the Bootstrap of the Maximum Score Estimator. Econometrica 73: 1175-1204.

[2] Barberá S and Pattanaik P. 1986. Falmagne and the Rationalizability of Stochastic Choices in Terms of Random Orderings. Econometrica 54: 707-715.

[3] Beggs S, Cardell S, and Hausman J. 1981. Assessing the potential demand for electric cars. Journal of Econometrics 16: 1-19.

[4] Ben-Akiva M, Morikawa T, and Shiroishi F. 1992. Analysis of the reliability of preference ranking data. Journal of Business Research 24: 149-164.

[5] Calfee J, Winston C, and Stempski R. 2001. Econometric Issues in Estimating Consumer Preferences from Stated Preference Data: a Case Study of the Value of Automobile Travel Time. Review of Economics and Statistics 83: 699-707.

[6] Caparros A, Oviedo J, and Campos P. 2008. Would you choose your preferred alternative? Comparing choice and recoded ranking experiments. American Journal of Agricultural Economics 90: 843-855.

[7] Cavanagh CL. 1987. Limiting Behavior of Estimators Defined by Optimization. Unpublished manuscript, Department of Economics, Harvard University, Cambridge, MA.

[8] Chapman R and Staelin R. 1982. Exploiting rank ordered choice set data within the stochastic utility model. Journal of Marketing Research 19: 288-301.

[9] Conte A, Hey JD, and Moffatt PG. 2011. Mixture models of choice under risk. Journal of Econometrics 162: 79-88.

[10] Dagsvik J and Liu G. 2009. A framework for analyzing rank-ordered data with application to automobile demand. Transportation Research Part A 43: 1-12.

[11] Delgado M, Rodríguez-Poo J, and Wolf M. 2001. Subsampling Inference in Cube Root Asymptotics with an Application to Manski's Maximum Score Estimator. Economics Letters 73: 241-250.

[12] Falmagne J. 1978. A Representation Theorem for Finite Random Scale Systems. Journal of Mathematical Psychology 18: 52-72.

[13] Fiebig D, Keane M, Louviere J, and Wasi N. 2010. The Generalized Multinomial Logit Model: Accounting for Scale and Coefficient Heterogeneity. Marketing Science 29: 393-421.

[14] Fox J. 2007. Semiparametric Estimation of Multinomial Discrete-choice Models Using a Subset of Choices. RAND Journal of Economics 38: 1002-1019.

[15] Goeree JK, Holt C, and Palfrey T. 2005. Regular Quantal Response Equilibrium. Experimental Economics 8: 347-367.

[16] Greene WH, Hensher DA, and Rose J. 2006. Accounting for heterogeneity in the variance of unobserved effects in mixed logit models. Transportation Research Part B 40: 75-92.

[17] Han A. 1987. Non-parametric Analysis of a Generalized Regression Model. Journal of Econometrics 35: 303-316.

[18] Harrison GW and Rutström EE. 2009. Expected utility theory and prospect theory: one wedding and a decent funeral. Experimental Economics 12: 133-158.

[19] Hausman J and Ruud P. 1987. Specifying and testing econometric models for rank-ordered data. Journal of Econometrics 34: 83-104.

[20] Hensher DA, Louviere J, and Swait J. 1999. Combining sources of preference data. Journal of Econometrics 89: 197-221.

[21] Horowitz J. 1992. A Smoothed Maximum Score Estimator for the Binary Response Model. Econometrica 60: 505-531.

[22] Kim J and Pollard D. 1990. Cube Root Asymptotics. Annals of Statistics 18: 191-219.

[23] Klein RW and Spady RH. 1993. An Efficient Semiparametric Estimator for Binary Response Models. Econometrica 61: 387-421.

[24] Layton D. 2000. Random coefficient models for stated preference surveys. Journal of Environmental Economics and Management 40: 21-36.

[25] Layton D and Levine R. 2003. How Much Does the Far Future Matter? A Hierarchical Bayesian Analysis of the Public's Willingness to Mitigate Ecological Impacts of Climate Change. Journal of the American Statistical Association 98: 533-544.

[26] Lee L-F. 1995. Semiparametric maximum likelihood estimation of polychotomous and sequential choice models. Journal of Econometrics 65: 381-428.

[27] Lewbel A. 2000. Semiparametric Qualitative Response Model Estimation With Instrumental Variables and Unknown Heteroscedasticity. Journal of Econometrics 97: 145-177.

[28] Manski C. 1975. Maximum Score Estimation of the Stochastic Utility Model of Choice. Journal of Econometrics 3: 205-228.

[29] Manski C. 1985. Semiparametric Analysis of Discrete Response: Asymptotic Properties of the Maximum Score Estimator. Journal of Econometrics 27: 313-334.

[30] McFadden D. 1974. Conditional logit analysis of qualitative choice behavior. In: Zarembka P. (Ed.), Frontiers in Econometrics. Academic Press: New York, pp. 105-142.

[31] McFadden D. 1986. The Choice Theory Approach to Market Research. Marketing Science 5: 275-297.

[32] Newey W. 1986. Linear Instrumental Variable Estimation of Limited Dependent Variable Models with Endogenous Explanatory Variables. Journal of Econometrics 32: 127-141.

[33] Newey W and McFadden D. 1994. Large Sample Estimation and Hypothesis Testing. Handbook of Econometrics 4: 2111-2245.

[34] Oviedo J and Yoo H. 2016. A Latent Class Nested Logit Model for Rank-Ordered Data with Application to Cork Oak Reforestation. Environmental and Resource Economics. DOI: 10.1007/s10640-016-0058-7.

[35] Ruud P. 1983. Sufficient Conditions for the Consistency of Maximum Likelihood Estimation Despite Misspecification of Distribution in Multinomial Discrete Choice Models. Econometrica 51: 225-228.

[36] Ruud P. 1986. Consistent Estimation of Limited Dependent Variable Models Despite Misspecification of Distribution. Journal of Econometrics 32: 157-187.

[37] Scarpa R, Notaro S, Louviere J, and Raffaelli R. 2011. Exploring scale effects of best/worst rank ordered choice data to estimate benefits of tourism in Alpine grazing commons. American Journal of Agricultural Economics 93: 813-828.

[38] Sherman R. 1993. The Limiting Distribution of the Maximum Rank Correlation Estimator. Econometrica 61: 123-137.

[39] Small K, Winston C, and Yan J. 2005. Uncovering the distribution of motorists' preferences for travel time and reliability. Econometrica 73: 1367-1382.

[40] Storn R and Price K. 1997. Differential Evolution: A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces. Journal of Global Optimization 11: 341-359.

[41] Train K and Winston C. 2007. Vehicle choice behavior and the declining market share of U.S. automakers. International Economic Review 48: 1469-1496.

[42] Yan J. 2012. A smoothed maximum score estimator for multinomial discrete choice models. Working paper.

[43] Yan J and Yoo H. 2014. The seeming unreliability of rank-ordered data as a consequence of model misspecification. MPRA Paper No. 56285. http://mpra.ub.uni-muenchen.de/56285/

[44] Yoo H and Doiron D. 2013. The use of alternative preference elicitation methods in complex discrete choice experiments. Journal of Health Economics 32: 1166-1179.