Are University Admissions Academically Fair?

Debopam Bhattacharya (University of Cambridge)
Shin Kanaya (University of Aarhus)
Margaret Stevens (University of Oxford)

January 2, 2016

Abstract

High-profile universities often face public criticism for undermining academic merit and promoting social elitism or engineering through their admissions process. Popular statistical tests for detecting such biases, based on the significance of socioeconomic characteristics on admission probability after controlling for past test-scores, suffer from omitted characteristic bias. In this paper, we develop an empirical test that attempts to circumvent this problem. We assume that students who are better-qualified on standard observable indicators would on average, but not necessarily with certainty, appear academically stronger to admission-tutors based on characteristics observable to them but not us. This assumption can be used to reveal information about the sign of differences in admission standards across demographic groups which are robust to omitted characteristics. Using admissions-data from a selective British university, we provide empirical support for our identifying assumptions and then apply our analysis to show that male applicants face admission-standards that are significantly higher than female applicants, and private-school applicants possibly face a slightly higher threshold than state-school ones. In contrast, admission success rates are equal across gender and school type both before and after controlling for key covariates. Our method is the first practical application of econometric bounds-analysis to the problem of detecting taste-based discrimination; it can be used to test meritocratic fairness of other institutional decisions such as loan-approval and surgery-referrals, where allegations of bias are common.

Keywords: University admissions, affirmative action, economic efficiency, marginal admit, unobserved heterogeneity, threshold-crossing model, conditional stochastic dominance, partial identification.

Address for correspondence: Debopam Bhattacharya, Department of Economics, University of Cambridge, United Kingdom. Email: [email protected]

1 Introduction

Admission practices at selective universities generate considerable public interest and political controversy, owing to their close connection with issues of inter-generational mobility and social discrimination. For example, in the UK a highly publicized 2011 Sutton Trust report shows that nationally just 3% of schools – mostly expensive and independent (as opposed to state-run) institutions – account for 32% of undergraduate admissions to Oxford and Cambridge, while these universities claim to admit solely on the basis of academic merit. On the other hand, background-based admission quotas such as caste-based reservation in India's public universities and race-based affirmative action in American state-funded colleges have been the subject of intense public controversy, the latter recently re-surfacing in the high-profile "Fisher versus University of Texas" lawsuit. Despite significant public interest in these issues, rigorous methods for modelling and testing "fairness" of admissions based on empirical evidence are absent in the academic literature. In this paper, we develop an empirical framework to model meritocracy of admission decisions, and use it to infer whether all applicants are held to the same academic standard during admissions.

A simple approach to detecting discrimination in admissions, popular in the education literature, is to test if demographic or socioeconomic characteristics of applicants are significant determinants of admission, after controlling for commonly observed academic records such as past test-scores (c.f., Espenshade et al, 2004, Zimdars et al, 2009, Hurwitz, 2011). However, if admission officers observe more indices of academic ability than the researcher, and the relation between observable and unobservable (to the researcher) indices varies by demographics, then these naive tests become invalid, c.f., Heckman, 1998.
For instance, if female candidates ceteris paribus perform better on interviews, and interview scores are unobserved by a researcher, then an equal admission rate for observationally similar male and female candidates implies bias against female applicants. Indeed, in the empirical context investigated in the present paper, we find that socioeconomic backgrounds do not have statistically significant effects on admission rates, once we control for pre-admission test and interview scores. However, applying a more careful analysis that addresses the omitted characteristic problem, we find that male candidates face a higher admission threshold than female candidates, and that differences in thresholds across type of school attended by the applicant are less significant.

Beyond their obvious legal and political significance, such findings also have important policy implications. For example, knowing that one has to admit academically weaker female students to maintain gender balance in application success rates raises questions about what investments are needed at the school-level to improve the quality of female applicants; naive satisfaction with gender equality in admission success would conceal this important role for potential interventions. Conversely, despite equal success rates by school types, if our proposed test revealed that state school students were actually being held to higher academic standards during admissions, then that would suggest a role for reforming the admissions process, given that a tax-funded educational sector like that in Britain is expected to ensure equality of opportunity.

More broadly, if the distribution of academic abilities differs across social or demographic groups, then there is a trade-off between (a) maintaining equal success rate across groups, and (b) ensuring that all candidates are treated equally in the sense of being held to the same academic standard. Which of these conflicting notions of equality would be pursued then becomes a policy decision for the institution concerned. Indeed, as we will see below, equating admission standards across applicants is also a consequence of admitting the academically best applicants, and as such, a violation of this goal may also be viewed as bias away from efficiency. Application of the techniques proposed in this paper would enable one to understand the direction and extent of this bias.

Methodologically, our approach to bias detection is related to the productivity based view of optimal decisions, in the tradition of Becker (1957). Viewed in this light, if admissions are purely meritocratic, then the marginal admitted student from a state-school should be expected to perform equally well in post-admission assessments, e.g., college exams, as the marginal admit from a private school. But her expected performance would be worse under affirmative action. Conversely, taste-based discrimination against state-schools will lead the marginal state-school admit to perform better than the marginal independent school admit.
The difference between expected performances of marginal candidates across demographic groups can therefore be interpreted as a measure of deviation from meritocracy. A challenge in implementing this approach directly is that a researcher typically observes a subset of the relevant applicant characteristics used by admissions tutors and the distributions of the unobserved characteristics may – and usually do – differ across demographic groups. This "omitted characteristics" problem jeopardizes the researcher's attempt at reconstructing the decision-maker's perceptions and spotting who the marginal admits are and, therefore, assessing whether the decision-maker acted in an academically unbiased way. Problems of this type have been recognized by previous researchers, especially in the context of detecting taste-based discrimination in labor market hiring; see, for instance, Heckman, 1998, Blank et al., 2004, and the references therein.

In the present paper, we devise a test for meritocratic admissions – based on the differences in admission-thresholds faced by different demographic groups – which, under appropriate assumptions, is robust to the omitted characteristics problem. Specifically, we construct an empirical, threshold-crossing model of admissions involving observed applicant covariates and unobserved heterogeneity, i.e., applicant characteristics observed by admission-tutors but unobserved by the researcher. In our model, academic fairness corresponds to using identical thresholds of expected future performance across applicants from different demographic groups. Our key assumption – for which we will provide supporting empirical evidence – is that students who are significantly better in terms of easily observable indicators of academic potential should statistically – but not necessarily with certainty – be more likely to appear stronger to the admission tutor, based on characteristics observed by her but not by the researcher. The distribution of unobservables, conditional on observables, is otherwise allowed to be arbitrarily different across demographic groups. We show that using this assumption in conjunction with pre and post enrolment data, one can learn about the sign of the differences between admission thresholds applied to different demographic groups.

We apply our methods to admissions data from a popular undergraduate programme of study at a selective UK University on applicants who have cleared an initial, exam-based elimination round. We first provide evidence in support of our identifying assumption; we then apply our methods to show that male applicants face a higher admission standard than females,[1] whereas standards faced by private school applicants are possibly slightly higher than those faced by state school applicants. In contrast, the application success rates are very similar across gender and type of school attended by the candidate, both before and after controlling for key covariates – thereby illustrating the crux of our approach.

Literature: A large volume of research exists in educational statistics on the analysis of admissions to selective colleges and universities, focusing mainly on the United States.
For a broad, historical perspective on selectivity in US college admission, see Hoxby (2009). In this context, our goal is to assess the extent of meritocracy in prevalent admission practice by focusing on the marginal admits in different demographic groups. This enables us to demonstrate empirically that equal success rates in admissions across demographic groups are consistent with very different admission standards across these different groups. See Sander, 2004, for an early discussion of these issues in the context of US law-school admissions. This is in contrast to many other studies – both academic and policy-oriented – which compare either average pre-admission test-scores (c.f. Herrnstein and Murray, 1994) or average post-admission performance across all (as opposed to marginal) admitted students from different socioeconomic groups (c.f. Keith et al., 1985, Sackett et al., 2009, Kane and William, 1998).

Our paper also complements an existing literature on analyzing the consequences of affirmative action in college admissions. Fryer and Loury (2005) provide a critical review of the relevant theoretical literature and a comprehensive bibliography. On the empirical side, Arcidiacono (2005) uses a structural model of admissions to simulate the potential, counterfactual consequences of removing affirmative action in US college admission and financial aid on applicant earnings; Card and Krueger (2005) describe the reduced-form impact of eliminating affirmative action on minority students' application behavior in California; and Hinrichs (2012) examines effects of banning preferential admission policies on enrolment patterns of both minority and nonminority students. A comprehensive review of the empirical evidence on the effect of affirmative action on student-college mismatch is provided in Arcidiacono and Lovenheim (2015). The present paper, though substantively related to the above works, has a different goal, viz., here we construct a formal econometric model where affirmative action (or taste-based discrimination) and meritocracy have contradictory empirical implications, and use it in conjunction with admissions-related micro-data to detect deviations from meritocracy in prevalent admission practices.

To our knowledge, the only other work in this literature which focuses on marginal admits is Bertrand, Hanna and Mullainathan (2010), who examined the consequences of affirmative action in admission to an Indian college. In their setting, admission was based on score in a single entrance exam; admission thresholds differed by applicants' social caste and were publicly announced. This set-up removes a key empirical challenge – that of defining and identifying the marginal admits and rejects – arising in general admissions contexts where entrance is based on several background variables, there is unobserved heterogeneity across applicants and admission thresholds are not explicitly announced. Our context requires us to deal with this more general scenario.

[1] As a referee has pointed out, it remains possible that some academically stronger female candidates were erroneously eliminated in the first round; had they been retained, the gender gap may have appeared narrower.
Some other identification strategies proposed in the literature to detect bias in treatment regimes are discussed below in section 6.

Although this paper focuses on the issue of college admissions, the general methodology is applicable to many other settings of testing bias in institutional decision-making. Common examples include approval of business loan and mortgage applications, referrals to expensive surgery vis-a-vis cheaper medicine-based treatment, and hiring decisions. The data setting is one where a researcher has access to key characteristics of individual applicants, and the eventual decision made on their behalf by the approval agency. These "key" characteristics need not be exhaustive, and the present paper's methodology allows for the possibility that approval agencies may observe a richer set of applicant characteristics than the researcher. Applying the methods developed in this paper, one can then test whether the observed data are consistent with meritocratic approval processes, e.g., that all loan applicants face a common ceiling of default probability below which the application is approved, or that each patient has to clear the same hurdle of expected survival days following the surgery in order to qualify for the procedure.

The rest of the paper is organized as follows: Section 2 sets up a simple theoretical model, followed by the corresponding empirical model of meritocratic admissions; Section 3 describes the data. Section 4 states the assumptions, provides empirical evidence in support of the key identifying assumption, and lays out the identification analysis. Section 5 discusses inference. Section 6 reports the empirical findings, first from a simulation exercise and then from the real dataset, presents some robustness checks and discusses some caveats. Section 7 concludes. Technical material is collected in an Appendix.

2 Benchmark Optimization Model

We start by laying out a benchmark economic model of admissions to help fix ideas. Based on this economic model, in the next section we will develop a corresponding econometric model incorporating unobserved heterogeneity, which can be taken to admissions data.

Let $W$ denote an applicant's pre-admission characteristics, observed by the university. We let $W := (X, G)$, where $G$ denotes one or more discrete components of $W$ capturing the group identity of the applicant (such as sex, race or type of high school attended) which forms the basis of commonly alleged mistreatment. The variables in $X$ are the applicant's other characteristics observed prior to admission which include one or more continuously distributed components like standardized test-scores. Also, let the binary indicator $D$ denote whether the applicant received an admission offer and the binary indicator $A$ denote whether the admission offer was accepted by the applicant. Let $\mathcal{W}$ denote the support of $W$; $F_W(\cdot)$ denote the marginal cumulative distribution function (C.D.F.) of $W$; $\mu(w)$ denote a $w$-type student's expected outcome (e.g., expected future GPA) if he/she enrols; and let $\alpha(w)$ denote the probability that a $w$-type student upon being offered admission eventually enrols. Let $c \in (0, 1)$ be a constant denoting the fraction of applicants who are to be admitted, given the number of available spaces.

Admission protocols: We define an admission protocol as a probability $p(\cdot) : \mathcal{W} \to [0, 1]$ such that an applicant with characteristics $w$ is offered admission with probability $p(w)$. A generic objective of the university may be described as

$$\sup_{p(\cdot) \in \mathcal{F}} \int_{w \in \mathcal{W}} p(w)\, h(w)\, \alpha(w)\, \mu(w)\, dF_W(w) \quad \text{subject to} \quad \int_{w \in \mathcal{W}} p(w)\, \alpha(w)\, dF_W(w) \le c.$$

Here, $\mathcal{F}$ denotes the set of all possible $p$'s, and $h(w)$ denotes a non-negative welfare weight, capturing how much the outcome of a $w$-type applicant is worth to the university. For affirmative action policies, $h(\cdot)$ will be larger for applicants from disadvantaged socioeconomic backgrounds or under-represented demographic groups. The overall objective is thus to maximize total welfare-weighted expected outcome among the admitted applicants, subject to a capacity constraint. The solution to the above problem takes the form described below in Proposition 1, which holds under the following condition:

Condition C: $h(w) > 0$ and $\alpha(w) > 0$ for any $w \in \mathcal{W}$. Further, for some $\delta > 0$,

$$\int_{w \in \mathcal{W}} \alpha(w)\, 1\{\mu(w) \ge 0\}\, dF_W(w) \ge c + \delta,$$

i.e., admitting everyone with $\mu(w) \ge 0$ will exceed the capacity in expectation.

Proposition 1 Under Condition C, the solution to the problem

$$\sup_{p(\cdot) \in \mathcal{F}} \int_{w \in \mathcal{W}} p(w)\, h(w)\, \alpha(w)\, \mu(w)\, dF_W(w) \quad \text{subject to} \quad \int_{w \in \mathcal{W}} p(w)\, \alpha(w)\, dF_W(w) \le c$$

takes the form

$$p^{opt}(w) = \begin{cases} 1 & \text{if } \beta(w) > \gamma, \\ q & \text{if } \beta(w) = \gamma, \\ 0 & \text{if } \beta(w) < \gamma, \end{cases} \qquad (1)$$

where $\beta(w) := h(w)\, \mu(w)$,

$$\gamma := \inf\left\{ r : \int_{w \in \mathcal{W}} \alpha(w)\, 1\{\beta(w) > r\}\, dF_W(w) \le c \right\},$$

and $q \in [0, 1]$ satisfies

$$\int_{w \in \mathcal{W}} \alpha(w) \left[ 1\{\beta(w) > \gamma\} + q\, 1\{\beta(w) = \gamma\} \right] dF_W(w) = c.$$

Proof in appendix.

The result basically says that the planner should order individuals by their values of $\beta(W)$ and first admit applicants with those values of $W$ for which $\beta(W)$ is the largest, then those for whom it is the next largest and so on till all places are filled. If the distribution of $\beta(W)$ has point masses, then there could be a tie at the margin, which is then broken by randomization (hence the probability $q$). In the absence of any point masses in the distribution of $\beta(W)$, the optimal protocol is of a simple threshold-crossing form $p^{opt}(w) = 1\{\beta(w) \ge \gamma\}$. For the rest of the paper, we will assume that this is the case. It is useful to note that $\alpha(w)$ affects the admission rule only through its impact on $\gamma$; the intuition is that individuals who do not accept an offer of admission contribute nothing to the budget constraint and this is taken into account in the admission process.
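The ordering logic of Proposition 1 can be illustrated for a finite pool of applicants. The following is a minimal sketch under stated assumptions, not the authors' implementation: each applicant is summarized by a hypothetical triple `(h, mu, alpha)` (welfare weight, expected outcome, enrolment probability), `capacity` is the expected number of enrolled places, and the function name `optimal_protocol` is illustrative. Capacity is filled greedily in decreasing order of `beta = h * mu`, with a fractional admission probability only for the applicant at the margin.

```python
def optimal_protocol(applicants, capacity):
    """Return admission probabilities for a finite pool of applicants.

    applicants: list of (h, mu, alpha) triples (hypothetical representation):
        h     - welfare weight, assumed > 0
        mu    - expected outcome if the applicant enrols
        alpha - probability of enrolling if offered a place, assumed > 0
    capacity: expected number of enrolled places available.
    """
    # Rank applicants by beta = h * mu, largest first.
    order = sorted(
        range(len(applicants)),
        key=lambda i: applicants[i][0] * applicants[i][1],
        reverse=True,
    )
    probs = [0.0] * len(applicants)
    remaining = capacity
    for i in order:
        h, mu, alpha = applicants[i]
        # Stop when capacity is used up or only negative-beta applicants remain.
        if remaining <= 0 or h * mu < 0:
            break
        if alpha <= remaining:
            probs[i] = 1.0               # admitted with certainty
            remaining -= alpha           # uses alpha places in expectation
        else:
            probs[i] = remaining / alpha  # marginal applicant: randomize
            remaining = 0.0
    return probs


# Three applicants with equal welfare weights and enrolment probability 0.5.
pool = [(1.0, 3.0, 0.5), (1.0, 2.0, 0.5), (1.0, 1.0, 0.5)]
print(optimal_protocol(pool, capacity=0.75))
```

In this sketch the enrolment probability `alpha` never enters the ranking; it only determines how quickly capacity is used up, which mirrors the remark that the enrolment probability affects the rule only through the threshold.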

Academically fair admissions: We define an academically fair admission protocol as one which maximizes total performance of the incoming cohort subject to the restriction on the number of vacant places. Such an objective is also "academically fair" in the sense that the expected performance criterion gives equal weight to the outcomes of all applicants, regardless of their value of $W$, i.e., $h(w)$ is a constant. In this case, the previous solution takes the form $p^{opt}(w) = 1\{\mu(w) \ge \gamma\}$, where $\gamma$ solves

$$c = \int_{w \in \mathcal{W}} \alpha(w)\, 1\{\mu(w) \ge \gamma\}\, dF_W(w).$$

The key feature of the above rule is that $\gamma$ does not depend on $W$ and so the value of an applicant's $W$ affects the decision on his/her application only through its effect on $\mu(W)$. To get some intuition on this, consider the case where one of the covariates in $W$ is gender and assume that the admission threshold for women, $\gamma_{female}$, is strictly lower than that for men, $\gamma_{male}$. Then the marginal female, admitted with $w = (x, female)$, contributes $\alpha(x, female)\, \gamma_{female}$ to the expected aggregate outcome and takes up $\alpha(x, female)$ places, implying a contribution of $\gamma_{female}$ $(= \alpha(x, female)\, \gamma_{female} / \alpha(x, female))$ to the objective of average realized outcome. Similarly, the marginal rejected male, if admitted, would contribute $\gamma_{male}$ to the average outcome. Since $\gamma_{male} > \gamma_{female}$, we can increase the average outcome if we replaced the marginal female admit with the marginal male reject. Thus different thresholds cannot be consistent with the objective of maximizing the overall outcome.

Our goal is to use actual admissions data to understand whether admission officers use identical thresholds across socio-demographic groups. The key challenge is to allow for the possibility that admission-tutors' inference about academic merit was based on more characteristics than the researchers observe, so that one cannot infer the admission thresholds simply based on observed characteristics. Therefore, we now turn to the task of constructing an econometric model incorporating unobserved heterogeneity in an empirical model of admissions.
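The swap argument above can be checked with a small numerical example. This is only an illustrative sketch: the outcome values, the enrolment probabilities and the helper name `average_outcome` are all hypothetical, chosen so that the marginal female admit has expected outcome 55 (the lower threshold) and the marginal male reject has expected outcome 60 (the higher threshold).

```python
def average_outcome(admits):
    """Expected outcome per enrolled place for a list of (mu, alpha) pairs.

    mu is the expected outcome if enrolled, alpha the enrolment probability.
    """
    total_outcome = sum(mu * alpha for mu, alpha in admits)
    expected_places = sum(alpha for _, alpha in admits)
    return total_outcome / expected_places


inframarginal = [(70.0, 1.0), (65.0, 1.0)]   # admits well above either threshold
marginal_female_admit = (55.0, 0.5)          # sits at the lower (female) threshold
marginal_male_reject = (60.0, 0.5)           # sits at the higher (male) threshold

before = average_outcome(inframarginal + [marginal_female_admit])
after = average_outcome(inframarginal + [marginal_male_reject])
print(before, after)
```

Both marginal candidates use the same expected number of places here, so the swap leaves the capacity constraint unchanged while raising the average outcome.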

2.1 Econometric Model

To set up the empirical framework, we assume that we observe the covariates $X, G$ and the binary admission outcome $D$ ($= 1$ if admitted, and $= 0$ otherwise) for applicants in the current year and one or more past years. Our aim is to evaluate academic efficiency of current year's admission, given data on $(X, G, D)$ for all current year applicants and $(X, G \mid A = 1)$ for past years' (successful) applicants, where $A = 1$ denotes having enrolled in the university. Let $\mathcal{X}_g$, $\mathcal{X}_h$ denote the support of $X$ for applicants of type $g$ and $h$, respectively, in the current year. Also, let $\mathcal{X}^{*}_g$ denote the support of $X$ conditional on $G = g$ and $A = 1$, i.e.,

$$\mathcal{X}^{*}_g := \{x : \Pr[A = 1 \mid X = x, G = g] > 0\}.$$

This is the set of the values of $X$ which occur among the admits of type $g$ in past years.

Now, let $Z$ denote an index of academic ability of applicants, based on "soft" characteristics, such as evidence of enthusiasm, academic reference letters, etc., which are unobservable to the analyst but observed by the admission-tutor. This may also include any random idiosyncrasies in the tutors' expectation formation process.[2] We assume that larger values of $Z$, without loss of generality, denote higher perceived academic potential.

Under meritocratic admissions, admission tutors would decide on whether to admit applicant $i$ in the current year, based on $\mu(X_i, G_i, Z_i)$, their subjective assessment of $i$'s academic merit, e.g., how applicant $i$ will perform when admitted.[3] In accordance with our economic model, we assume that an applicant $i$ with $G_i = g$, $Z_i = z$ and $X_i = x \in \mathcal{X}_g$ is offered admission (i.e., $D_i = 1$) if and only if $\mu(x, g, z) \ge \gamma_g$, where $\gamma_g$ denotes the university-wide baseline threshold for applicants of type $g$. That is,

$$D_i = \begin{cases} 1 & \text{if } \mu(X_i, G_i, Z_i) \ge \gamma_{G_i}, \\ 0 & \text{otherwise.} \end{cases} \qquad (2)$$

An admission practice is academically fair if and only if $\gamma_g$ does not vary by demographics. The underlying intuition is that the only way covariates $G$ should influence the admission process is through their effect on the perceived academic merit. Having a larger $\gamma_g$ for, say, females than males implies that a male applicant with the same expected outcome as a female applicant is more likely to be admitted. Conversely, under affirmative action type policies, $\gamma_g$ will be lower for those demographics which represent historically disadvantaged groups. Therefore, we are interested in testing whether the values of the threshold $\gamma_g$ are identical across demographics. We will call $\gamma_g$ the "admission threshold".

[2] When there are multiple sources of soft information, $Z$ may be interpreted as a composite scalar index, e.g., a weighted average, of these characteristics.

[3] In line with the existing literature on bias-detection referenced above, we ignore issues about risk. Note, however, that for pursuing the method proposed here, one has to take a stance on what feature of future "productivity" is relevant for defining bias. With multiple objectives, one can define a set of weights so that the weighted average of these measures can be used as the final measure of merit, e.g., a "risk-adjusted" expected performance measure. It would be useful to explore these issues further in future research.

Thus in our set-up, a female applicant with identical $X$ as a male candidate can have a higher probability of being admitted and yet the admission process may be academically fair if females have a higher expected performance than males with identical $X$. This notion of fairness differs from one which requires that individuals who are identical on publicly verifiable variables (i.e., the $X$s) must have equal chances of getting in, no matter what their value of $G$ and no matter whether predicted future performance differs across $G$ for the same value of $X$.

Remark 1 It is important to note that we do not assume that tutors literally calculate expected future performance in order to admit candidates. Our goal is to assess whether the admission process, whatever its goal and however it is conducted, is consistent with the goal of admitting the academically strongest applicants.
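To see how a threshold-crossing rule with unobserved heterogeneity can produce equal admission rates alongside unequal thresholds, consider the following simulation sketch. Everything in it is an assumption made for illustration rather than part of the paper's model: the merit index is taken to be additive (observed score plus unobserved index), both components are standard normal, the unobserved index of one group is shifted up by 0.5, and that group also faces a threshold that is higher by 0.5. The function name `simulate_group` and all numerical values are hypothetical.

```python
import random


def simulate_group(n, z_shift, threshold, seed):
    """Simulate admissions for one group under an assumed additive merit index.

    Returns (admission rate, mean assessed merit among admitted applicants).
    """
    rng = random.Random(seed)
    admitted_merit = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)             # score observed by the researcher
        z = z_shift + rng.gauss(0.0, 1.0)   # index observed only by the tutor
        merit = x + z                       # assumed form of the merit index
        if merit >= threshold:              # threshold-crossing admission rule
            admitted_merit.append(merit)
    rate = len(admitted_merit) / n
    mean_merit = sum(admitted_merit) / len(admitted_merit)
    return rate, mean_merit


# Group F: stronger on unobservables, but held to a higher threshold.
rate_f, merit_f = simulate_group(20000, z_shift=0.5, threshold=1.5, seed=1)
# Group M: baseline unobservables and a lower threshold.
rate_m, merit_m = simulate_group(20000, z_shift=0.0, threshold=1.0, seed=2)

print("admission rates:", round(rate_f, 3), round(rate_m, 3))
print("mean merit of admits:", round(merit_f, 3), round(merit_m, 3))
```

By construction the shift in the unobserved index exactly offsets the gap in thresholds, so the two groups have the same admission probability at every value of the observed score, even though group F must clear a higher bar. A comparison of admission rates alone would therefore miss the difference in standards.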

3 Data

Our empirical analysis is based on admissions data for two recent cohorts of applicants to a competitive and popular undergraduate degree programme at a selective UK University. As in many other European and Asian countries, students enter British universities to study a specific subject from the start, in contrast to the US model of following a broad general curriculum in the beginning, followed by specialization in later years. Consequently, admissions are conducted primarily by faculty members (i.e., admission tutors) in the specific discipline to which the candidate has applied. An applicant competes with all others who apply to this specific subject and no switches are permitted across disciplines in later years. The admission process is held to be strictly academic where extra-curricular achievements are given no weight. In that sense, these admissions are more comparable with Ph.D. admissions in US universities. Furthermore, almost all UK applicants sit two common school-leaving examinations, viz., the GCSE and the A-levels before entering university. Each of these examinations requires the student to take written tests in specific subjects – e.g., Math, History, English, Physical and Biological Sciences etc. The examinations are centrally conducted and hence scores of individual students on these examinations are directly comparable, unlike high-school GPA in the US where candidates undergo school-specific assessments which may not be directly comparable across schools. In addition, all applicants take a multiple-choice aptitude test, similar to the SAT in the US, and write an essay that is graded.

Choice of sample: For our empirical analysis, we will focus on UK-domiciled applicants. The application process consists of an initial stage whereby a standardized "UCAS" form is filled by the applicant and submitted to the university.
This form contains the applicant's unique identifier number, gender, school type, prior academic performance record, personal statement and a letter of reference from the school. The aptitude-test and essay scores are separately recorded. All of this information is then entered into a spread-sheet held at a central database which all admission tutors can access. About one-third of all applicants are selected for interview by the university on the basis of the aptitude test and the rest rejected. Selected candidates are then assessed via a face-to-face interview and the interview scores are recorded in the central database. This sub-group of applicants who have been called to interview will constitute our sample of interest. Therefore, we are in effect testing the academic efficiency of the second round of the selection process, taking the first round as given. Accordingly, from now on, we will refer to those summoned for interview as the applicants. The final admission decision is made by considering all candidate-specific information from among the applicants called for interviews. For our application, we use anonymized data for two cohorts of applicants from their records held at the central admissions database at the university. To preserve anonymity, the data do not contain reference letters.

Choice of covariates: We chose a preliminary set of potential covariates to be the observables, based on the information recorded on UCAS forms and the university's application records. We use as observable components (i.e., X) the GCSE score, aptitude test scores, the examination essay-score and the interview score. A more detailed description of these covariates is provided in Table 0, below. The unobservable index of achievement Z pertains to information conveyed by recommendation letters. Given that those summoned for interview constitute our "population" of interest, we found that in terms of whether the applicant previously read two subjects recommended for entry, there is very little variation across these applicants and including these covariates makes no difference to our eventual results. Therefore, we eventually dropped these variables from the analysis.
Group identities G: We consider academic efficiency of admissions with regards to two different group identities, viz., type of school attended by the applicant and the applicant's gender. Selective universities in the UK are frequently criticized for the relatively high proportion of privately-educated students admitted (see the Introduction). The implication is that applicants from independent schools, where spending per student is very much higher than in state schools (Graddy and Stevens, 2005), have an unfair advantage in the admission process. This is of special concern in a country like the UK where most selective universities are largely funded by the taxpayer. The issue of gender differences in admission and academic performance is, of course, a more universal issue. In the UK, as in most OECD countries, the higher education participation rate is higher for women, having overtaken that for men in 1993. However, selective universities in the UK appear to have lagged behind the trend: in 2010-11, 55% of undergraduates across all UK universities were female, but 44% of students admitted to the university we are analyzing were female. Typically, gender imbalances are more pronounced in certain programmes, including the one we study, where male enrolment is nearly twice the female enrolment.

In our dataset, we can also match the post-admission academic performance of admitted students to their pre-admission characteristics. In principle, one can use this information for analyzing potential bias in admissions. Allowing for selection on unobservables, however, means that such data cannot be used without making more restrictive assumptions. For example, a regression of eventual academic performance on pre-admission covariates for admitted candidates does not yield a consistent estimate of the predictive power of these covariates for the pool of applicants, for whom the admission decision is made. Indeed, due to classical selection bias, one would expect such effects to be biased toward zero (c.f., Rothstein, 2004 for discussion of related issues). Therefore, it is not possible to use such predictive regressions directly to throw light on biases in the admission systems. A second potential limitation of such data is that academic performance as measured by the university's own exams may not be the sole index of academic ability sought by an admission-tutor. They might focus instead on a subjective measure of academic ability which may only be positively correlated with eventual performance in university exams. For these reasons, we did not include these data in our main analysis. Nonetheless, while interpreting our empirical results, we use these predictive regressions (see Fig. 3A and 3B and 6 below) as suggestive evidence of where these results might have arisen from.
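The attenuation caused by selection on unobservables can be illustrated with a short simulation. The sketch below rests entirely on assumed ingredients that are not taken from the paper's data: a hypothetical outcome equal to an observed score plus an unobserved index plus noise (all standard normal and independent in the applicant pool), and admission whenever the sum of the observed score and the unobserved index exceeds 1. The helper `ols_slope` is a plain least-squares slope written for this example.

```python
import random


def ols_slope(xs, ys):
    """Slope from a simple least-squares regression of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(0)
pool_x, pool_y, admit_x, admit_y = [], [], [], []
for _ in range(20000):
    x = rng.gauss(0.0, 1.0)            # observed pre-admission score
    z = rng.gauss(0.0, 1.0)            # index seen only by the tutor
    y = x + z + rng.gauss(0.0, 1.0)    # assumed later performance
    pool_x.append(x)
    pool_y.append(y)
    if x + z >= 1.0:                   # assumed admission rule
        admit_x.append(x)
        admit_y.append(y)

# In real data y is seen only for admits; the pool slope is a simulated benchmark.
slope_pool = ols_slope(pool_x, pool_y)
slope_admits = ols_slope(admit_x, admit_y)
print(round(slope_pool, 2), round(slope_admits, 2))
```

Among admits, a low observed score must have been offset by a high unobserved index, which induces a negative correlation between the two and pulls the estimated slope toward zero relative to the applicant pool.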

4 Assumptions

In order to develop a test of meritocratic admissions which can be applied to the above data, we will make a set of assumptions using the following notation. For any pair of individuals i and j, where i is of type g and has a value of X equal to $x_g$ and j is of type h and has $X = x_h$, with $x_g \in \mathcal{X}_g$ and $x_h \in \mathcal{X}_h$, the notation $x_g \succeq_{\varepsilon} x_h$ will mean that applicants i and j are identical with respect to all qualitative attributes and, moreover, every continuously-distributed component of $x_g$ is at least $\varepsilon$ standard deviations larger than the corresponding component of $x_h$. For example, if G = ‘school type’ and X = (SAT, GPA, male), then $x_g \succeq_{\varepsilon} x_h$ means that applicants i and j are both male or both female and that $SAT_i > SAT_j + \varepsilon\,\sigma_{SAT}$ and $GPA_i > GPA_j + \varepsilon\,\sigma_{GPA}$, where $\sigma_{SAT}$ and $\sigma_{GPA}$ are the standard deviations of SAT and GPA for the entire population of applicants. We will denote by $Q_{\tau}(Z|A)$ the $\tau$th quantile of the random variable Z given the random variable A. Throughout the rest of the paper, we will maintain the following assumption:

Assumption M (Median restriction) (i) There exists $\varepsilon > 0$ such that for any $e \geq \varepsilon$, if $x_g \in \mathcal{X}_g$ and $x_h \in \mathcal{X}_h$ and $x_g \succeq_{e} x_h$, then

$$\mathrm{Median}[Z | X = x_g, G = g] \geq \mathrm{Median}[Z | X = x_h, G = h],$$

for any g and h; (ii) $\mu(X_i, G_i, Z_i)$ (introduced just before equation (2)) is continuously distributed conditionally on any realization of $(X_i, G_i)$.

A stronger version of Assumption M is first-order stochastic dominance, which has the same intuitive interpretation as Assumption M (see immediately below):

Assumption SD (Stochastic Dominance) (i) There exists $\varepsilon > 0$ such that for any $e \geq \varepsilon$, if $x_g \in \mathcal{X}_g$ and $x_h \in \mathcal{X}_h$ with $x_g \succeq_{e} x_h$, then the distribution of Z conditional on $X = x_g$, $G = g$ first-order stochastically dominates that of Z conditional on $X = x_h$, $G = h$:

$$\Pr[Z \leq a | X = x_g, G = g] \leq \Pr[Z \leq a | X = x_h, G = h],$$

for any a and for all g, h; (ii) $\mu(X_i, G_i, Z_i)$ is continuously distributed conditionally on any realization of $(X_i, G_i)$.

Discussion: Crudely speaking, Assumption M/SD means that applicants who are better along standard, observable indicators of academic ability are also likely to be better, "on average", in terms of the index of unobserved characteristics which the tutors weigh positively in determining admissions. The motivation for this assumption comes from the fact that for meritocratic admissions, the outcome of interest may be thought of as a measure of future academic performance whereas the measures in X record past academic performance in high-school or admissions-related assessments. It is therefore likely that candidates who have performed significantly better in past assessments are statistically more likely to have performed better in those assessments (unobserved by the researcher) which admission tutors view as positive determinants of future performance and hence, under the assumption of being academically motivated, would weigh positively in the decision to admit.

While assumption M/SD is likely to hold for the population of all students, some of this positive dependence may be partially eroded for the population of applicants if the decision to apply depends on unobservables. Indeed, if applications are costly and a student applies despite having low scores on observable tests, she is likely to be stronger on unobservable attributes relative to the average student with low observable test-scores in the population. Such selective application will reduce the extent of positive dependence between observables and unobservables among the applicants relative to that in the population of all students. We address this concern below by providing evidence which strongly suggests that the aggregate impact of such "erosion" on the positive dependence is insignificant.

The magnitude of $\varepsilon$ controls the strength of Assumption M. Thus $\varepsilon = 0$ corresponds to the benchmark case where we are comparing a pair of g and h type applicants, such that the former has scored higher in each previous assessment than the latter. A strictly positive $\varepsilon$ leads to comparison of applicant-pairs with no overlap of pre-admission test-scores. The higher is $\varepsilon$, the more likely are assumptions M or SD to hold, but the lower will be the power of our test, since fewer pairs of students will satisfy the dominance condition $x_g \succeq_{\varepsilon} x_h$ with a higher $\varepsilon$. A practical method for choosing $\varepsilon$ in an application is suggested below.

Note also that assumption M is substantively much weaker than two informal arguments often used in applied work, viz., (i) when the distribution of the observable covariates is balanced across treatment and control groups in quasi-experimental designs, it is taken to imply that they are also balanced in terms of unobservables (e.g., Greenstone and Gayer, 2009) and (ii) orthogonality of an instrument with observed covariates is taken as suggestive evidence that it is orthogonal with unobserved covariates (e.g., Angrist and Evans, 1998, p. 458).

In our context, the type of variables typically unobservable to researchers but likely to affect admissions include achievements such as winning special academic prizes, participation in science or math olympiads, high intellectual enthusiasm conveyed by applicants’ personal essays and the subjective impressions of previous teachers implied via reference letters. Such specific information can identify individual applicants and is therefore most likely to be withheld from researchers owing to privacy considerations. However, while making admission decisions, tutors are likely to observe these characteristics for current applicants via their dossiers or through personal interactions.
It is intuitive that such achievements are statistically more likely to have occurred for individuals who score higher in terms of easily observable entrance assessments and aptitude tests than those who score lower. Finally, the continuity condition in Assumption M (ii) rules out "gaps" in the distribution of Z, which helps to relate the probability of admission to the admission thresholds. Such continuity is intuitive, especially when Z is a function of several underlying performance indicators which are themselves continuously distributed.

Remark 2 Note that assumption M/SD does not say that applicants with higher X have higher Z with probability one; it simply says that their values of Z tend to be higher in a stochastic sense.

Remark 3 The restriction on the median cannot be replaced by a restriction on the conditional expectation for identification purposes since we are considering a discrete-choice problem, viz., $D = 1\{\mu(X, G, Z) \geq \gamma_G\}$. See Manski (1975) for why a conditional quantile restriction is necessary for the identification of discrete-choice models.

Remark 4 Assumption M allows the distribution of the unobservable Z to differ by background variables; in particular, we allow both the location as well as the scale of Z to depend on G (conditional on X) and thus also allow for the realistic situation of larger uncertainty regarding applicants from historically under-represented communities.

Empirical evidence of median-dominance: Among the pre-admission variables that we observe in our dataset, only the score on the interview is assigned by tutors. This is the type of variable most likely to be missing in other datasets since it reflects subjective assessment by the admission tutors. We will first check our Assumption M for the applicants in our data by treating the interview score as the unobservable component. That is, we will verify whether the median interview score is higher for those types of applicants who are better in terms of all other "tutor-independent" test-scores X obtained in prior assessments. If applications are costly, a student with low scores on X will apply only if her potential performance on the interview is likely to be high, so that an applicant with low X is likely to be stronger on interview-skills relative to the average student with low X. The question is whether this negative relationship is strong enough to override the overall positive relationship in the population. Since the interview score is observed for the entire sample, we can test this hypothesis.4 The concrete steps leading to our test are as follows. Consider X = (GCSEscore, Aptitude_test_score, Exam_essay). First, run a median regression of the interview score (which now plays the role of Z) on X and quadratics in components of X plus G, where G represents gender or school-type, and compute the predicted values. These represent $\mathrm{Median}[Z | X, G]$.
We then compare these predicted values for pairs of applicants where the first applicant is of type G = g and the second applicant is of type G = h. In Figure 1, we depict histograms capturing the marginal distribution of the conditional median differences, for different combinations of g and h. The analog of our Assumption M here is that these histograms should have an entirely positive support, up to estimation error. For example, the histogram in the top left panel of Figure 1 shows the estimated marginal distribution of the variable

$$\mathrm{Median}[\text{interview} | X_g, g = \text{male}] - \mathrm{Median}[\text{interview} | X_h, h = \text{female}]$$

across all paired realizations $(X_g, X_h)$ satisfying $X_g \succeq_{\varepsilon} X_h$. We choose $\varepsilon = 0.0$; if we demonstrate median dominance for $\varepsilon = 0.0$, then dominance will obviously hold for all higher values of $\varepsilon$.

4 Since we use only those applicants who were summoned for interview, there is an additional level of selection which can further weaken the correlation between unobservables and observables. Our "test" (cf. Fig. 1, below) therefore assesses the extent of correlation remaining after both levels of selection.

[Figure 1: Evidence of Median Dominance. Four histogram panels: male_female, indep_state, female_male, state_indep. Vertical axis: density; horizontal axis: difference in conditional medians (0 to 8).]

It is evident that all four of these histograms have entirely positive support, suggesting that the median dominance conditions hold even for $\varepsilon = 0$. In the appendix, we also show analogous histograms for the 25th and 75th quantiles with $\varepsilon = 0.0$. There is overwhelming evidence that these histograms also have positive support and thus that the stronger SD condition is also likely to be true.

As a second piece of evidence, we calculate the correlation matrix among the various indicators of academic merit at the pre-admission stage. These are reported in the following table. It is evident that all correlations are strictly positive, which lends further support to assumption M/SD.

Score        GCSE    Essay   Apt-test   Interview
GCSE         1.00    0.10    0.26       0.11
Essay        0.10    1.00    0.21       0.09
Apt-test     0.26    0.21    1.00       0.25
Interview    0.11    0.09    0.25       1.00
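The pairwise check behind Figure 1 can be sketched in a few lines. The following is a minimal illustration, not the procedure used to produce the figure: it assumes the conditional medians have already been fitted by a median regression, that all qualitative attributes are already matched, and it uses hypothetical toy numbers. The names `dominates` and `median_gap_support` are our own.

```python
from itertools import product

def dominates(x_g, x_h, eps, sds):
    """True if every score in x_g exceeds the matching score in x_h
    by more than eps population standard deviations."""
    return all(a > b + eps * s for a, b, s in zip(x_g, x_h, sds))

def median_gap_support(group_g, group_h, eps, sds):
    """Return (min, max) of predicted-median differences over dominating pairs.

    Each applicant is a tuple (scores, predicted_median_interview).
    Returns None when no pair satisfies the dominance condition.
    """
    gaps = [m_g - m_h
            for (x_g, m_g), (x_h, m_h) in product(group_g, group_h)
            if dominates(x_g, x_h, eps, sds)]
    return (min(gaps), max(gaps)) if gaps else None

# Hypothetical data: (GCSE, aptitude test, essay) and a fitted conditional median.
males = [((8.0, 70.0, 65.0), 66.0), ((6.0, 55.0, 60.0), 61.0)]
females = [((7.0, 60.0, 62.0), 63.5), ((5.0, 50.0, 55.0), 58.0)]
sds = (1.0, 8.0, 5.0)  # assumed population standard deviations

support = median_gap_support(males, females, eps=0.0, sds=sds)
print(support)  # a strictly positive lower end is consistent with Assumption M
```

A positive minimum corresponds to a histogram with entirely positive support; in practice one would also allow for estimation error in the fitted medians.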

The evidence presented above is of course suggestive, rather than definitive. Indeed, if we had found a negative or no relation between the interview score and the observable test-scores, our assumption M would be highly suspect, since the characteristics observed only by the admission tutors are likely to be of a subjective type, like the interview score, rather than an "objective" one, like a multiple choice test score. The point of the above graph and table is to show that this is not the case.

Our next assumption relates to the structure of the $\mu(\cdot)$ function.

Assumption CM (Conditional Monotonicity) (i) $\mu(x, g, z)$ is strictly increasing in z for every x and g; (ii) if $x_g$ and $x_h$ satisfy $x_g \succeq_{\varepsilon} x_h$, then $\mu(x_g, g, z) > \mu(x_h, h, z)$ for any z, and any $g \neq h$.

Discussion: Part (i) of Assumption CM is essentially definitional (regarding Z) in that higher values of the index of ability based on unobserved characteristics are associated with higher values of the perceived expected outcome. Part (ii) says that if a g-type applicant is better than an h-type applicant along a set of key observable characteristics and is at least equally good along the ability index which is unobservable to us but observable to the decision-makers, then the g-type applicant will be perceived to have a higher expected outcome by the decision-maker. It is important for part (ii) that the g-type applicant is at least as good as the h-type applicant along the index Z; without this condition, it is easy to come up with counter-examples. For instance, suppose that admission tutors base their assessment on past written exams whose scores X are observed by us (researchers) and the quality of the reference letter Z, unobserved by us. Then a female candidate who has scored lower on every component of X than a male candidate but has a much better recommendation may or may not be perceived as having a lower potential than the male candidate. But a female candidate who has an equally strong recommendation Z as a male candidate but has scored lower on every X than him will likely be perceived to have lower academic potential in expectation.

A different way to interpret this assumption is that the characteristic Z observed by the decision-maker and unobserved by the analyst is predominantly what causes g and h type applicants to have different predicted outcomes despite having the same observable characteristics X. Thus a sufficient but not necessary condition for CM(ii) to hold is that (a) $\mu(x, g, z) = \mu(x, h, z) \equiv \mu(x, z)$ for all x, z for any $g \neq h$, i.e., conditional on the observable X and unobservable Z, the demographic characteristic G does not affect the outcome of interest, and, furthermore, (b) $\mu(x, z) \geq \mu(x', z)$ if $x \succeq_{\varepsilon} x'$.

As a referee has pointed out, there is some evidence from the US state of California that females with lower SAT scores and high school GPA than males have performed systematically better in college examinations (cf. Leonard and Jiang, 1999; Rothstein, 2004). This does seem somewhat unlikely in our application, given Figure 1 above and Figures 3A and 3B below. Nonetheless, for the sake of robustness in our empirical application, we consider a variant of assumption CM where instead of the raw scores $X_g$ and $X_h$, we use their standardized versions. That is, for group g, each performance measure $X_g$ is taken not to be the raw score, but as

$$X_g^{con} = \frac{\text{raw\_score} - m_g}{\sigma_g},$$

where $m_g$ and $\sigma_g$ are the mean and standard deviation of the raw score within group g. Accordingly, the condition $X^{con}_{male} \succeq_{\varepsilon} X^{con}_{female}$ refers to those male-female pairs where the males have higher relative scores than females, i.e.,

$$\frac{X_{male} - m_{male}}{\sigma_{male}} \geq \frac{X_{female} - m_{female}}{\sigma_{female}} + \varepsilon.$$

Then the contextual version of assumption CM (ii) is given by

Assumption CM’ (Conditional Contextual Monotonicity) (i) with $\mu^{con}(X^{con}, g, z) \equiv \mu(X, g, z)$ for all g, z, the function $\mu^{con}(x^{con}, g, z)$ is strictly increasing in z for every $x^{con}$ and g;5 (ii) if $x^{con}_g$ and $x^{con}_h$ satisfy $x^{con}_g \succeq_{\varepsilon} x^{con}_h$, then $\mu^{con}(x^{con}_g, g, z) > \mu^{con}(x^{con}_h, h, z)$ for any z, and any $g \neq h$.

This assumption means that an individual of group g whose group-standardized scores on each of the commonly observed, pre-entry performance measures are higher by $\varepsilon$ or more than those of another individual of group h, and who is identical to him in terms of the ability index observed by admission officers but not the researcher, will be perceived to have a higher expected outcome by the decision-maker. In other words, candidates whose performances are in the top echelons of their own sociodemographic group will be perceived to be academically stronger. This assumption allows for the possibility of "biased" performance measures, e.g., that female applicants with lower raw scores on pre-entry evaluations may perform better in college exams, on average, and may therefore be favoured by admission officers over males with higher initial scores. In our empirical work, we will report the results using both the raw and the standardized scores to compare pairs of applicants.

Choice of $\varepsilon$: A practical way of choosing $\varepsilon$ is to draw histograms based on observables like Figure 1 for a range of values of $\varepsilon$ and then choose the smallest value for which the corresponding histograms have entirely positive support. In the application reported below, we report results for $\varepsilon = 0.1$ and $\varepsilon = 0.25$ to ensure that there is no overlap in observable characteristics between the pairs of students compared. Indeed, from Figure 1, it is obvious that any value of $\varepsilon$ larger than 0 should be acceptable for this application. We also provide a robustness check by reporting results over a range of $\varepsilon$ in Figure 7, below.

5 Part (i) of this assumption is identical to CM(i), since one can always rewrite $\mu(x, g, z) = \mu(\sigma_g x^{con} + m_g, g, z) \equiv \mu^{con}(x^{con}, g, z)$, with the monotonicity of $\mu(x, g, z)$ in x carrying over to monotonicity of $\mu^{con}(x^{con}, g, z)$ in $x^{con}$.
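The within-group standardization behind the contextual comparison can be illustrated as follows. This is a small sketch under our own assumptions: the scores are hypothetical, and the population standard deviation (`pstdev`) is used for the within-group scale, whereas the text does not say which variant is applied.

```python
from statistics import mean, pstdev

def standardize_within_group(raw_scores):
    """Map the raw scores of one group to (score - group mean) / group std. dev."""
    m, s = mean(raw_scores), pstdev(raw_scores)
    return [(x - m) / s for x in raw_scores]

# Hypothetical raw scores on one assessment, by group.
male_raw = [50.0, 60.0, 70.0]
female_raw = [40.0, 50.0, 60.0]

male_con = standardize_within_group(male_raw)
female_con = standardize_within_group(female_raw)

eps = 0.1
# The second male has a higher raw score than the second female ...
raw_higher = male_raw[1] > female_raw[1]
# ... but does not dominate her contextually, since both sit at their group mean.
contextually_dominates = male_con[1] >= female_con[1] + eps
print(raw_higher, contextually_dominates)
```

The example shows why the raw and contextual comparisons can select different applicant pairs: a pair ordered in raw scores need not be ordered once each score is measured relative to its own group.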

4.1 Identification Analysis

We show how assumptions M/SD and CM can be used to identify the sign of threshold differences. To see this, denote the thresholds used for type g and type h applicants by $\gamma_g$ and $\gamma_h$, respectively. Under meritocratic admissions, one expects $\gamma_g = \gamma_h$. Define the function

$$p(x, g) := \Pr[D = 1 | X = x, G = g] = \Pr[\mu(X, G, Z) > \gamma_g | X = x, G = g],$$

and the set $M(g, h, \varepsilon)$ as

$$M(g, h, \varepsilon) := \{(x_g, x_h) \in \mathcal{X}_g \times \mathcal{X}_h : x_g \succeq_{\varepsilon} x_h,\ p(x_g, g) \leq 0.5 < p(x_h, h)\}. \qquad (3)$$

Note that the set $M(g, h, \varepsilon)$ can be directly computed from the data because it depends only on observables. Now, suppose that one finds that $M(g, h, \varepsilon)$ is non-empty. Then, for any $(x_g, x_h)$ in $M(g, h, \varepsilon)$, since $p(x_g, g) = \Pr[\mu(x_g, g, Z) > \gamma_g | x_g, g] \leq 0.5$, it must be true that

$$\begin{aligned}
\gamma_g &\geq \mathrm{Median}[\mu(X, G, Z) | X = x_g, G = g] \\
&= \mu(x_g, g, \mathrm{Median}[Z | x_g, g]), \text{ by assumption CM(i)} \\
&> \mu(x_h, h, \mathrm{Median}[Z | x_g, g]), \text{ by CM(ii)} \\
&\geq \mu(x_h, h, \mathrm{Median}[Z | x_h, h]), \text{ by assumption M} \\
&= \mathrm{Median}[\mu(X, G, Z) | X = x_h, G = h], \text{ by CM(i)} \\
&\geq \gamma_h, \text{ since } 0.5 < p(x_h, h).
\end{aligned}$$

Thus, the non-emptiness of the set $M(g, h, \varepsilon)$ leads to the inequality $\gamma_g > \gamma_h$.
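Because the set in (3) depends only on observables, it can be enumerated directly once admission probabilities have been estimated. The sketch below assumes the probabilities are already fitted and that qualitative attributes are matched; the data, the helper names and the standard deviations are hypothetical.

```python
def dominates(x_g, x_h, eps, sds):
    """True if each score in x_g exceeds the matching score in x_h
    by more than eps standard deviations."""
    return all(a > b + eps * s for a, b, s in zip(x_g, x_h, sds))

def median_set(group_g, group_h, eps, sds):
    """Pairs (x_g, x_h) with x_g dominating x_h and p(x_g, g) <= 0.5 < p(x_h, h).

    Each applicant type is a tuple (scores, estimated_admission_probability).
    """
    return [(x_g, x_h)
            for x_g, p_g in group_g
            for x_h, p_h in group_h
            if dominates(x_g, x_h, eps, sds) and p_g <= 0.5 < p_h]

# Hypothetical (SAT-like score, GPA-like score) with fitted admission probabilities.
group_g = [((70.0, 3.8), 0.45), ((60.0, 3.2), 0.30)]
group_h = [((62.0, 3.4), 0.55), ((55.0, 3.0), 0.20)]
sds = (10.0, 0.5)

pairs = median_set(group_g, group_h, eps=0.1, sds=sds)
print(pairs)  # a non-empty list points to a higher threshold for group g
```

This ignores sampling error in the estimated probabilities, which the formal inference procedure is designed to handle.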

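Under the stronger SD assumption, non-emptiness of the set

$$SD(g, h, \varepsilon) := \{(x_g, x_h) \in \mathcal{X}_g \times \mathcal{X}_h : x_g \succeq_{\varepsilon} x_h,\ p(x_g, g) < p(x_h, h)\} \qquad (4)$$

would analogously imply that $\gamma_g > \gamma_h$. This is because if $(x_g, x_h) \in SD(g, h, \varepsilon)$, then because $1 - p(x_g, g) = \Pr[\mu(X, G, Z) < \gamma_g | X = x_g, G = g]$, we have that

$$\begin{aligned}
\gamma_g &= Q_{1 - p(x_g, g)}[\mu(X, G, Z) | X = x_g, G = g] \\
&= \mu(x_g, g, Q_{1 - p(x_g, g)}[Z | x_g, g]), \text{ since } \mu(x_g, g, \cdot) \text{ is increasing} \\
&> \mu(x_g, g, Q_{1 - p(x_h, h)}[Z | x_g, g]), \text{ since } p(x_g, g) < p(x_h, h) \\
&\geq \mu(x_g, g, Q_{1 - p(x_h, h)}[Z | x_h, h]), \text{ by assumption SD since } x_g \succeq_{\varepsilon} x_h \\
&> \mu(x_h, h, Q_{1 - p(x_h, h)}[Z | x_h, h]), \text{ by assumption CM(ii) since } x_g \succeq_{\varepsilon} x_h \\
&= Q_{1 - p(x_h, h)}\{\mu(x_h, h, Z) | x_h, h\}, \text{ since } \mu(x_h, h, \cdot) \text{ is increasing} \\
&= \gamma_h, \text{ since } 1 - p(x_h, h) = \Pr\{\mu(X, G, Z) < \gamma_h | X = x_h, G = h\}.
\end{aligned}$$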

Intuitively speaking, here the identification-relevant information comes from those pairs of g-type and h-type applicants for whom the dominance condition $x_g \succeq_{\varepsilon} x_h$ holds and yet the g-type’s probability of being accepted is lower. Assumption M (or SD) guarantees that these g-type applicants are also better, in a stochastic sense, in terms of unobservables. Note that these identifying pairs include applicants who are close to each other (albeit at least $\varepsilon$ standard deviations apart) in terms of observables and also those that are farther apart. Also, when $\gamma_g - \gamma_h > 0$, it must be the case that $SD(h, g, \varepsilon)$ is empty. Therefore, if one finds that $SD(g, h, \varepsilon)$ is empty, then one may test if $SD(h, g, \varepsilon)$ is non-empty. If so, then one can conclude that $\gamma_g < \gamma_h$.

Remark 5 The logical structure of our analysis is that if $SD(g, h, \varepsilon)$ is non-empty, then we can conclude that $\gamma_g > \gamma_h$. But it is possible that although $\gamma_g > \gamma_h$, we find that $SD(g, h, \varepsilon)$ is empty. This is a generic feature of any analysis based on partially identified parameters: such analyses are conclusive in fewer instances than when model parameters are point-identified. In other words, the cost of allowing for unobservables is that we may lose the ability to detect very small but positive threshold differences, but when we detect a difference, we can be certain about its existence. Indeed, without our proposed methods and the underlying assumptions justifying them, one cannot in general detect any threshold difference, however large it might be.
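The one-directional logic of Remark 5 can be made concrete with a small decision rule. The sketch below works at the population level, i.e., it takes the admission probabilities as known and ignores sampling error; the function names and the numbers are hypothetical.

```python
def dominates(x_a, x_b, eps, sds):
    """True if each score in x_a exceeds the matching score in x_b
    by more than eps standard deviations."""
    return all(a > b + eps * s for a, b, s in zip(x_a, x_b, sds))

def sd_set_nonempty(group_a, group_b, eps, sds):
    """True if some pair has x_a dominating x_b and yet p(x_a, a) < p(x_b, b)."""
    return any(dominates(x_a, x_b, eps, sds) and p_a < p_b
               for x_a, p_a in group_a
               for x_b, p_b in group_b)

def threshold_sign(group_g, group_h, eps, sds):
    """Sign of the threshold difference, or 'inconclusive' if both sets are empty."""
    if sd_set_nonempty(group_g, group_h, eps, sds):
        return "gamma_g > gamma_h"
    if sd_set_nonempty(group_h, group_g, eps, sds):
        return "gamma_g < gamma_h"
    return "inconclusive"

sds = (10.0, 0.5)
strong_g = [((70.0, 3.8), 0.40)]   # better observables, lower admission probability
weak_h = [((62.0, 3.4), 0.60)]
favoured_g = [((70.0, 3.8), 0.70)]  # better observables, higher admission probability

print(threshold_sign(strong_g, weak_h, 0.1, sds))
print(threshold_sign(weak_h, strong_g, 0.1, sds))
print(threshold_sign(favoured_g, weak_h, 0.1, sds))
```

The third call returns "inconclusive": both sets are empty, which is compatible with equal thresholds but also with a difference too small to be revealed by the bounds.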

Alternative Identification Strategies: The above methodology may be contrasted with some alternative strategies proposed in the literature in non-educational contexts. For instance, in the context of healthcare, Chandra and Staiger (2009) attempt to identify differences in expected outcome thresholds for surgery by assuming an index restriction on the unobservable’s distribution. This approach fails when the distribution of the unobservables differs across G, conditional on observables, which is known to be a key difficulty in detecting who the marginal treatment recipients are. For example, in the admission context, it is quite likely that students from disadvantaged backgrounds have larger mean and variance in academic ability, conditional on having obtained the same score in school-leaving examinations as students from wealthier backgrounds. Our analysis imposes no such restriction on the unobservables’ distribution. In the healthcare context, Bhattacharya (2013) suggests an alternative approach to testing outcome-oriented treatment assignment via a partial identification analysis using a combination of observational data and prior experimental findings from randomized controlled trials. Such experimental results are typically difficult to come by in the college admission context. In other contexts involving law-enforcement and healthcare provision, several researchers have used economic optimization based reasoning to detect racial prejudice (cf. Persico, 2009, for a survey). For instance, Knowles, Persico and Todd (2004)6 invoke the assumption that potential criminals respond optimally to drug-enforcement protocols by adjusting the amount of contraband they carry. This leads to the equating of the unobservable marginal with the observable average outcomes across "treated" individuals (i.e., motorists who are apprehended) and thus can be used to test whether marginal outcomes are equated across demographic groups. However, these approaches rely on the specifics of the context and do not generalize to situations involving university admissions. For example, it is both difficult for university applicants to alter their potential academic outcomes in response to admission protocols and impractical for them to want to do this, given the one-shot nature of the admission exercise.

6 See Antonovics and Knight, 2011, Brock et al, 2012, Anwar and Fang, 2012, Ayres and Waldfogel, 1994 for related work in law-enforcement and medical treatment contexts.

5 Estimation and Inference

Given the identification analysis above, our next task is to develop a formal inference method for testing threshold-differences. For this purpose, we will make the stronger assumption of SD, rather than M. Indeed, these two assumptions have the same intuitive interpretation; the evidence for SD (see section 6 and also part B of the Appendix) is strong and conducting statistical inference under it is slightly simpler. The key task regarding inference, corresponding to Assumptions SD and CM, is to test whether $SD(g, h, \varepsilon)$ defined in equation (4), viz.,

$$SD(g, h, \varepsilon) := \{(x_g, x_h) \in \mathcal{X}_g \times \mathcal{X}_h : x_g \succeq_{\varepsilon} x_h,\ p(x_g, g) < p(x_h, h)\},$$

is nonempty. Observe that the null hypothesis of an empty $SD(g, h, \varepsilon)$ is equivalent to the hypothesis that $\theta_0 \geq 0$, where

$$\theta_0 := \inf_{(x_g, x_h) \in \mathcal{X}_g \times \mathcal{X}_h,\ x_g \succeq_{\varepsilon} x_h} [p(x_g, g) - p(x_h, h)].$$

We will now outline how to test the emptiness of $SD(g, h, \varepsilon)$, based on an inference method developed for "intersection bounds" by CLR (2013). Although our identification method is nonparametric in the sense of not requiring functional form specifications, estimation and inference for the nonparametric case is complicated. Due to the relatively small sample-size, the two-sample nature of the problem and the complicated construction of "intersection bounds" for nonparametric estimates (requiring subjective choice of various tuning parameters), we do not consider such methods here. Instead, we focus on the case where $p(\cdot, \cdot)$ is parametrically specified as a probit. That is,

$$p(x_g, g) = \Pr[D = 1 | (X, G) = (x_g, g)] = \Phi(x_g' \beta_{0,g}); \text{ and } p(x_h, h) = \Phi(x_h' \beta_{0,h}),$$

where $(\beta_{0,g}, \beta_{0,h})$ are the probit coefficients; and $\Phi$ is the C.D.F. of the standard normal. Note that under our parametric specification, $\Phi(x_g' \beta_{0,g}) < \Phi(x_h' \beta_{0,h})$ is equivalent to $x_g' \beta_{0,g} < x_h' \beta_{0,h}$ and thus

$$SD(g, h, \varepsilon) = \{(x_g, x_h) : x_g \succeq_{\varepsilon} x_h,\ x_g' \beta_{0,g} - x_h' \beta_{0,h} < 0\},$$

and thus emptiness of $SD(g, h, \varepsilon)$ is equivalent to the hypothesis that $\theta_0 \geq 0$, where

$$\theta_0 := \inf_{(x_g, x_h) \in \mathcal{X}_g \times \mathcal{X}_h,\ x_g \succeq_{\varepsilon} x_h} [x_g' \beta_{0,g} - x_h' \beta_{0,h}].$$

The quantity $\theta_0$ is exactly of the form analyzed in CLR (2013). We construct a one-sided 95% confidence interval $\hat{C}_n(0.95) = (-\infty, \hat{\theta}_{n0}(0.95)]$ for $\theta_0$ by adapting the CLR method, as outlined in part C of the Appendix, for each choice of g and h. If $\hat{\theta}_{n0}(0.95) < 0$, then we conclude that $SD(g, h, \varepsilon)$ is non-empty.
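To fix ideas, the sample analogue of the infimum over dominating pairs can be computed as below. This is only a naive plug-in sketch under assumed, hypothetical probit coefficients: it is not the CLR procedure, it carries no correction for sampling error, and a negative plug-in value alone does not establish non-emptiness at any confidence level.

```python
from math import erf, sqrt

def std_normal_cdf(t):
    """C.D.F. of the standard normal, written with the error function."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def probit_index(x, beta):
    """Linear index x'beta; x carries a leading 1.0 for the intercept."""
    return sum(a * b for a, b in zip(x, beta))

def dominates(x_g, x_h, eps, sds):
    """Dominance on the continuous scores only (the intercept entry is skipped)."""
    return all(a > b + eps * s for a, b, s in zip(x_g[1:], x_h[1:], sds))

def plug_in_theta(xs_g, xs_h, beta_g, beta_h, eps, sds):
    """Smallest index difference over dominating pairs; None if there is no such pair."""
    gaps = [probit_index(x_g, beta_g) - probit_index(x_h, beta_h)
            for x_g in xs_g for x_h in xs_h
            if dominates(x_g, x_h, eps, sds)]
    return min(gaps) if gaps else None

# Hypothetical coefficients (intercept, score 1, score 2) and covariate values.
beta_g = (-6.0, 0.05, 1.0)
beta_h = (-5.0, 0.05, 1.0)
xs_g = [(1.0, 70.0, 3.8), (1.0, 80.0, 3.9)]
xs_h = [(1.0, 62.0, 3.4)]
sds = (10.0, 0.5)

theta_hat = plug_in_theta(xs_g, xs_h, beta_g, beta_h, eps=0.1, sds=sds)
print(theta_hat)
```

Working with the index difference rather than the difference in probabilities is harmless for the sign, because the standard normal C.D.F. is strictly increasing.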

6 Empirical Analysis

Summary statistics: We provide summary statistics for our sample in Table 1. The left half of Table 1 shows that male applicants have better aptitude test scores and interview averages. They perform slightly worse on average in their GCSE and A-levels. These differences are statistically significant at the 5% level. Note that there is no significant difference in offer rates between male and female candidates. The independent and state school applicants are quite similar in terms of most characteristics except for a slightly higher GCSE score for the former. In Table 2 we report the results of a probit regression of receiving an offer across all applicants. Table 2 strengthens the findings from Table 1 by showing that even after controlling for covariates, gender and school-type do not affect the average admission-success rate among applicants. The value of McFadden’s pseudo-R2 for the probit model is about 50% and the corresponding R2 for a linear probability model (not reported here) is about 45%, which are about 10 times higher than the goodness-of-fit measures typically reported by applied researchers working with cross-sectional data. This suggests that the commonly observed covariates explain a very large fraction of admission outcomes. Moreover, Table 2 also shows that the aptitude test and interview scores have the largest impact upon receiving an offer for the applicant population (in terms of the t-statistics).
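For reference, McFadden’s pseudo-R2 compares the log-likelihood of the fitted binary-choice model with that of an intercept-only model. The sketch below uses hypothetical outcomes and fitted probabilities, and relies on the fact that the intercept-only fit assigns every applicant the sample offer rate.

```python
from math import log

def bernoulli_loglik(outcomes, probs):
    """Log-likelihood of 0/1 outcomes given fitted success probabilities."""
    return sum(log(p) if d == 1 else log(1.0 - p)
               for d, p in zip(outcomes, probs))

def mcfadden_pseudo_r2(outcomes, fitted_probs):
    """1 - lnL(model) / lnL(intercept-only model)."""
    p_bar = sum(outcomes) / len(outcomes)  # intercept-only fitted probability
    ll_model = bernoulli_loglik(outcomes, fitted_probs)
    ll_null = bernoulli_loglik(outcomes, [p_bar] * len(outcomes))
    return 1.0 - ll_model / ll_null

# Hypothetical offers (1 = offer) and fitted probit probabilities.
offers = [1, 0, 1, 0]
fitted = [0.9, 0.1, 0.9, 0.1]
r2 = mcfadden_pseudo_r2(offers, fitted)
print(round(r2, 3))
```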

6.1 A thought experiment

Before performing empirical analysis of the actual data, we conduct a thought experiment where we investigate the usefulness of our approach in a situation where the "truth" is known. The idea is to treat one of the observed covariates, viz., the interview score, as unobserved, note that this "missing" covariate satisfies our assumption of median monotonicity (see Figure 1) and then run a simulation experiment where tutors accept applicants based on all characteristics including the interview score but the researcher does not observe it. In this simulation experiment, we vary the acceptance thresholds and check how small a difference in thresholds our bounds-based method can detect when the interview score remains "unobserved" to us. The purpose is to investigate how well our method works when we a priori know the admission thresholds.

In order to implement this, we use the above dataset where we treat school type as G, and the GCSE score, aptitude test and examination essay scores and gender as the commonly observed covariates, X. The interview score is taken to be unobserved by us (researchers) but observed by the admission tutors who make the admission decision. This will play the role of Z. We generate artificial observations on admissions in the following way. Using academic performance in the first year examination as the outcome, we estimate a regression model where X are used as regressors. We then generate the predicted outcomes for each current year applicant by using coefficient estimates from the previous regression and adding a contribution from the "unobserved" interview score Z (normalized to have mean zero across the entire sample). If this sum plus a stochastic slippage error exceeds a threshold value of 61.5 for state-school students (G = h) and $61.5 + \delta$ for independent school applicants (G = g), then the student is assumed to have been offered admission, i.e., the admission-dummy D is set to be 1. It is set to be 0 otherwise. That is, we set

$$\hat{\beta}_g = \left[\sum_{i=1}^{n} 1\{G_i = g\} X_i X_i'\right]^{-1} \sum_{i=1}^{n} 1\{G_i = g\} X_i Y_i,$$

$$\hat{\beta}_h = \left[\sum_{i=1}^{n} 1\{G_i = h\} X_i X_i'\right]^{-1} \sum_{i=1}^{n} 1\{G_i = h\} X_i Y_i,$$

$$D_i = 1\left\{X_i' \hat{\beta}_g 1\{G_i = g\} + X_i' \hat{\beta}_h 1\{G_i = h\} + 0.05 Z_i + u_i \geq 61.5 + \delta\, 1\{G_i = g\}\right\},$$

where $0.05 Z_i$ is the contribution from an "unobserved" interview score; $u_i$ is the stochastic slippage component drawn from the normal distribution $N(0, 1\{G_i = g\} + 2 \times 1\{G_i = h\})$ and thus the sum $0.05 Z_i + u_i$ plays the role of the unobserved index variable Z in our model; and finally, $\delta$, which is set externally by us, is the extent of affirmative action. A positive value of $\delta$ indicates that independent school applicants are being held to a higher threshold of expected performance. For each value of $\delta$, we then perform our bounds analysis by pretending that we observe X but not the interview score. This is meant to capture the situation that admission tutors may base their decision on some subjectively assessed performances Z, unobserved by the researcher, in addition to the prediction based on the commonly observed covariates. Since the interview-score satisfies Assumption M (see Figure 1), our bounds analysis is applicable in this case. The following table reports true values of $\delta$ and the corresponding upper limit of the one-sided confidence interval for testing emptiness of $SD(g, h, \varepsilon)$.

True Difference    RHS of CLR 95% CI
 4                 -4.14
 3                 -4.45
 2                 -3.57
 1                 -1.66
 0                  0.13
-2                  1.86

It can be seen from the above table that threshold differences of 2 or more points out of 100 (the overall standard deviation of the outcome distribution is about 5 points) are clearly detected; a positive difference as small as 1 point still yields a nonempty $SD(g, h, \varepsilon)$, since the upper limit (-1.66) remains negative. For a zero or negative value of $\delta$, an empty $SD(g, h, \varepsilon)$ cannot be rejected, as expected. Overall, this table presents strong evidence that our method works well in practice.
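The data-generating step of this thought experiment can be mimicked with synthetic inputs. The sketch below is only illustrative: the applicants, the coefficient vector and the seeds are hypothetical stand-ins for the actual data and regression estimates, while the base threshold of 61.5, the weight 0.05 on Z and the slippage variances (1 for g, 2 for h) follow the description above.

```python
import random

def simulate_admissions(applicants, beta_g, beta_h, delta, seed, base_threshold=61.5):
    """Generate admission dummies D_i for a given threshold difference delta.

    applicants: list of (x, is_g, z) with x a covariate tuple, is_g a group flag
    and z the mean-zero 'unobserved' interview score.
    """
    rng = random.Random(seed)
    decisions = []
    for x, is_g, z in applicants:
        beta = beta_g if is_g else beta_h
        predicted = sum(a * b for a, b in zip(x, beta))
        sd = 1.0 if is_g else 2.0 ** 0.5  # slippage variance: 1 for g, 2 for h
        u = rng.gauss(0.0, sd)
        threshold = base_threshold + (delta if is_g else 0.0)
        decisions.append(1 if predicted + 0.05 * z + u >= threshold else 0)
    return decisions

# Hypothetical applicants: x = (intercept, score), alternating group membership.
data_rng = random.Random(0)
applicants = [((1.0, data_rng.gauss(60.0, 5.0)), i % 2 == 0, data_rng.gauss(0.0, 10.0))
              for i in range(200)]
beta = (0.0, 1.0)  # assumed coefficients: predicted outcome equals the score

d0 = simulate_admissions(applicants, beta, beta, delta=0.0, seed=1)
d4 = simulate_admissions(applicants, beta, beta, delta=4.0, seed=1)
print(sum(d0), sum(d4))
```

Reusing the same seed keeps the slippage draws fixed across values of delta, so any change in admissions is due to the threshold shift alone.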

24

6.2

Results

We now turn to the real application, where we use the aptitude test score, the essay score and the interview score as the covariates $X$ for defining dominance. That is, if a $g$-type candidate has scored $\varepsilon$ standard deviations higher on each of these three key assessment scores than an $h$-type candidate, then the conditional distribution (or median) of the unobservable component of assessment for the former is assumed to dominate that for the latter for all $g$ and $h$, as per Assumption M or SD above. In accordance with the discussion in Section 5, the first step is to examine emptiness of $SD(g,h,\varepsilon)$ using data on only $X$ and $D$. We first investigate this graphically in Figure 2 by plotting the marginal C.D.F. of the difference in admission probabilities $p(X_g,g) - p(X_h,h)$ for pairs $(X_g,X_h)$ satisfying $X_g \succeq_{\varepsilon} X_h$ for $\varepsilon = 0.1$, for various combinations of $g$ and $h$. The predicted probabilities $p(\cdot,\cdot)$ are calculated separately for each group $g$, via standard probit using gcsescore, aptitude test score, the examination essay score and the interview score as regressors. Since we concluded dominance with $\varepsilon = 0.0$, with $Z$ being the interview score, we chose a slightly higher (i.e., more conservative) value of $\varepsilon = 0.1$ to investigate emptiness of $SD(g,h,\varepsilon)$. When the event $\{X_g \succeq_{\varepsilon} X_h\}$ happens with positive probability, an empty $SD(g,h,\varepsilon)$ is equivalent to

$$\Pr\left[X_g \succeq_{\varepsilon} X_h,\; p(X_g,g) < p(X_h,h)\right] = 0,$$

where the probability is taken with respect to the distributions of $X_g$ and $X_h$. Therefore, a positive mass at and below zero for these C.D.F.'s indicates that $SD(g,h,\varepsilon)$ is nonempty. In the left panel, when $g$ = male, $h$ = female, the C.D.F. is represented by the solid curve labelled male_fem; and when $g$ = female and $h$ = male, it is the dashed curve, labelled fem_male. A positive height at zero indicates that applicants with higher observables in the first group have lower admission probabilities than applicants with lower observables in the second. Clearly, the first curve has significant mass below zero and the dashed curve has almost no mass below zero, suggesting a positive probability that $p(X_{male}, male) < p(X_{female}, female)$ although $X_{male} \succeq_{\varepsilon} X_{female}$. This evidence is still present in the right panel with independent and state schools replacing male and female, respectively, but to a slightly lesser extent, suggesting that $\gamma_{indep}$ may be only slightly larger than $\gamma_{state}$.
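The graphical check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`dominates`, `dominated_pair_differences`, `share_below_zero`) and the toy data are hypothetical, dominance is read as "at least ε higher on every standardized score", and the predicted admission probabilities are taken as given rather than estimated by probit.

```python
def dominates(x_g, x_h, eps):
    """True if x_g is at least eps higher than x_h on every score
    (an assumed reading of the eps-dominance condition)."""
    return all(a >= b + eps for a, b in zip(x_g, x_h))


def dominated_pair_differences(group_g, group_h, eps):
    """Differences p(x_g, g) - p(x_h, h) over all cross-group pairs
    in which the g-type applicant dominates the h-type applicant."""
    return [p_g - p_h
            for x_g, p_g in group_g
            for x_h, p_h in group_h
            if dominates(x_g, x_h, eps)]


def share_below_zero(diffs):
    """Share of dominating pairs with a strictly negative difference,
    i.e. the empirical C.D.F. of the differences just below zero."""
    if not diffs:
        return 0.0
    return sum(d < 0 for d in diffs) / len(diffs)


# Hypothetical toy data: each applicant is (scores, predicted probability),
# with scores = (aptitude, essay, interview) in standard-deviation units.
males = [((1.0, 0.8, 0.9), 0.40), ((0.2, 0.1, 0.3), 0.25)]
females = [((0.5, 0.4, 0.6), 0.55), ((-0.5, -0.2, 0.0), 0.20)]

diffs = dominated_pair_differences(males, females, eps=0.1)
print(share_below_zero(diffs))
```

A strictly positive share corresponds to mass below zero in the plotted C.D.F.; whether that mass is statistically distinguishable from zero is a separate inference question that this sketch does not address.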

To perform the test formally, in Table 3 we report $\hat{\theta}_{0n}(0.95)$, the upper limit of a one-sided confidence interval, calculated using the method of CLR, as explained in Section 5. We report results for $\varepsilon = 0.1$ (recall that we concluded dominance even with $\varepsilon = 0$, cf. Fig. 2). A negative upper limit indicates that the set $SD(g,h,\varepsilon)$ is nonempty and consequently we reject the null of $\gamma_g \leq \gamma_h$ in favour of $\gamma_g > \gamma_h$. It is evident from the first four rows of Table 3 that we reject emptiness for $g$ = male, $h$ = female and for $g$ = indep, $h$ = state but do not reject emptiness in the other cases. This suggests that males and private school applicants face higher admission thresholds. The exact upper limits of confidence intervals reported above vary slightly across functional specifications (e.g., whether higher order terms and interactions in the test scores are or are not used to estimate $p(\cdot,\cdot)$), but two empirical findings are robust across all specifications: (a) the gender gap is large, persistent and statistically significant in every case (footnote 7), and (b) the independent-state school difference is less persistent across specifications but is always negative. Given the evidence of a large gender-gap, we investigated it further by breaking the data up by schooltype. Results reported in the last two rows of Table 3 show that the gender-gap is large within both state and private school categories, indicating that male applicants are held to a higher standard whether they come from a state or a private school background.

Interpretation of the empirical findings: It would be natural to conjecture that the threshold differences arise primarily from the implicit or explicit practice of affirmative action, viz., the overweighting of outcomes for historically disadvantaged groups. A second possibility is that, in the face of political and/or media pressure, admission tutors try to equate the application success rate for, say, males with that for females, which is also consistent with our empirical findings (see Tables 1 and 2). This would make the effective male threshold higher if, say, the conditional male outcome distribution has a thicker right tail. A third possibility is that female applicants are set a lower admission threshold in order to encourage more female candidates to apply in future. Note from Table 1 that the number of female applications is well below the number of male ones (241 versus 394).
Regardless of what the underlying determinants of the tutors' behavior are, we can conclude from our analysis that the admission practice under study deviates from the outcome-oriented benchmark and makes male and independent school applicants face significantly higher admission thresholds. In order to gain some further insight into how the threshold discrepancies arise, we plot the empirical C.D.F.s of predicted academic performance based on the observable characteristics. This is done by regressing first-year and then final year examination scores in university on gcsescore, aptitude test and essay score, interview grades and gender/schooltype for enrolled students. The regression output appears in Table 4. The estimated CDFs of predicted performance by gender and schooltype are plotted in Figure 3B.

Footnote 7: As noted by a referee, this finding is somewhat curious, given that girls routinely outperform boys in the majority of high school and college tests across the world, including the PISA assessments, cf. Goldin et al. (2006) and Niederle and Vesterlund (2010). Indeed in our data, the performance of the average (as opposed to marginal) female admit is also lower than that of the average male admit (in particular, see Fig. 3A), although this has nothing to do with admission-thresholds and fair admission, per se.

It is clear that in both graphs, the male distribution first-order stochastically dominates the female distribution. This means that if admissions were determined solely by predicted performance based on observables (i.e., there is no unobserved heterogeneity), any common acceptance rate across gender will result in a higher predicted outcome for the marginal accepted male than for the marginal accepted female. The dominance is less pronounced in the case of school-type, since female independent school candidates appear to face a lower threshold than male state-school candidates. Our results in Table 3 imply that allowing for unobserved heterogeneity does not change this scenario substantively, and suggest that equating the application success-rates (see Table 1) leads to the use of higher admission thresholds for male and, to a lesser extent, for private school candidates. Indeed, as a referee has pointed out, if admission-officers believe that eventual exam performance is not the relevant measure of merit, then one needs to repeat the analysis with whichever performance measure "meritocracy" is defined with respect to. Taking the attainment of at least a 2.1, i.e., a "high second class" mark of 64% (a minimum requirement for entry into most postgraduate programmes), as the relevant outcome produces a very similar result, presented in Fig. 6.

At this point, it is worth considering whether our findings could be consistent with two other alternative explanations, as follows.

G-blind admissions: The first possibility is where admission tutors ignore $G$ completely in forming their assessment and use a common admission cut-off across $G$, thereby generating insignificant effects of gender and school-type on admission probabilities, both unconditionally (cf. Table 1) and conditionally on past test-scores (cf. Table 2). Such behavior could arise either from an institutional norm banning any conditioning on demographic characteristics, or from the tutors' belief that such characteristics have no explanatory power beyond the pre-admission test scores. Therefore, the question is whether by including $G$ in our analysis, we are detecting threshold differences that are not "intentional". Even if that were the case, we would argue that in order for admissions to be meritocratic, admission tutors should take $G$ into account. For example, suppose $G$ denotes school type, and state-school students are more able than independent school students with the same test score and therefore perform better in post-admission exams. If tutors ignore $G$, then an independent and a state school student with identical pre-admission test scores will have equal probability of admission, even though the state-school student is more meritorious, which would contradict the notion of meritocratic admissions.

Biased interview scores: A second issue concerns the use of interview scores in calculating the lower bounds. Suppose that tutors are biased in favour of type-$g$ applicants and award them higher interview marks (relative to true performance) than type $h$. But as we saw in Figure 1, the interview score does appear to satisfy Assumption M (with $\varepsilon = 0$), which would be unlikely if one type of candidate was systematically awarded higher interview scores relative to their performance in the other more "objective" tests. For example, for $g$ = male and $h$ = female, if males are awarded systematically higher interview scores, then we would expect to see a significant mass in the negative orthant of the top right histogram in Figure 1, which is clearly not the case.

6.3 Some Robustness Checks

Biased test-scores: One feature of our approach is that we are taking the pre-admission test scores as true indicators of academic merit. However, students from privileged backgrounds might perform well in these tests simply on account of having been coached extensively. It is not possible to conduct any analysis of meritocracy if no previous measure of achievement can be taken to be accurate. As mentioned above, post-admission performance is not observed for non-admitted candidates, and thus cannot be incorporated in the analysis without strong assumptions. Therefore, it is important to examine whether our substantive conclusions are affected if we use "contextualized", i.e., standardized scores within each demographic group as an alternative measure of merit. Accordingly, we repeat the above analysis by replacing each test-score by its standardized version and invoking Assumption CM', above. Recall the condition $X_{male} \succeq_{\delta} X_{female}$, which refers to those male-female pairs where the males have higher relative scores than females. Then we can conclude that group $g$ faces a higher threshold than group $h$ if $\theta_0 < 0$, where

$$\theta_0 \equiv \inf_{(x_g,x_h)\in \mathcal{X}_g\times\mathcal{X}_h;\; x_g \succeq_{\delta} x_h} \left[x_g'\beta_{0,g} - x_h'\beta_{0,h}\right].$$

The results from this exercise are shown in Table 3, in the last column titled "Standardized Scores", corresponding to $\delta = 1.25$ (the smallest $\delta$ for which histograms analogous to those in Figure 1, above, have positive support). As before, a negative upper limit of the CLR confidence interval indicates that group $g$ faces a higher threshold than group $h$, since some group $g$ members with high relative test-scores have a lower probability of admission than some group $h$ members with lower relative test-scores. As is apparent from the last column of Table 3, it still remains the case that male applicants face a higher admission-threshold than female candidates. However, the test for a threshold difference between independent and state school students now becomes inconclusive. This confirms the previous substantive finding that threshold differences by schooltype are insignificant, but the gender differences are pronounced.
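To make the "contextualized" scores concrete, the sketch below standardizes raw scores within each demographic group. It is only an illustration under assumptions: the function name `standardize_within_group`, the toy scores, and the use of the population standard deviation are ours, and the paper does not state which variance estimator was used.

```python
from statistics import mean, pstdev


def standardize_within_group(scores):
    """Convert raw scores to Z-scores using the group's own mean and
    (population) standard deviation; the estimator is an assumption."""
    m, s = mean(scores), pstdev(scores)
    if s == 0:
        raise ValueError("scores have zero variance within the group")
    return [(x - m) / s for x in scores]


# Hypothetical raw aptitude scores by school type.
raw = {"state": [60.0, 65.0, 70.0], "indep": [64.0, 66.0, 68.0]}
z = {group: standardize_within_group(values) for group, values in raw.items()}

# The top state-school applicant (70) and the top independent-school
# applicant (68) have the same standing relative to their own group.
print(z["state"][2], z["indep"][2])
```

Under this transformation the dominance comparison is made on relative standing within the group rather than on raw marks, which is why a raw score of 70 in one group and 68 in another can count as equally strong.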

First-stage Selection: In principle, we can repeat our analysis to test meritocracy in the first stage selection process as well. However, the first stage selection in our empirical context is based entirely on the ranking in the aptitude test-scores; there is effectively no selection on unobservables at this stage. In particular, all applicants are classified into bands by their overall aptitude test score. Then private school students in approximately the top half and state school students in the top two-thirds are interviewed. Figure 5 presents suggestive evidence regarding first-stage selection of candidates. The left panel plots the CDF of aptitude test scores for those making it to the interview stage. The right graph plots the CDF of predicted interview scores based on the aptitude test score (analogous to Figure 3B for the second stage of selection). A common success rate for entry to the interview stage would imply a lower threshold for female and state school candidates, but with male state school candidates facing a higher threshold than female independent school candidates. Thus, in fact, one sees a very similar overall picture as in the second stage selection (see Fig. 3B).

Choice of $\varepsilon$: Finally, in Figure 7, we plot the upper limits of the CLR confidence intervals across a range of $\varepsilon$ for both the overall gender-gap and the gender gap within each school-type. The persistence of the negative upper limits in Figure 7 reinforces the conclusion that female candidates face lower thresholds than males both on average and within each type of school-background.

6.4 Caveats

Several caveats apply to our methods and data. The first is that we ignore peer-effects, both at the individual level and at the institutional level. For example, it is possible that an applicant is admitted (rejected) because he/she is deemed to have the potential to create positive (negative) externalities on his/her peers' performance. But it seems unlikely to us that admission tutors can be confident enough in predicting peer-effects for this consideration to play a significant role in admissions. Nonetheless, there remains a possibility that some students get admitted simply because they come from demographic groups that "fit better" with the institution, although their test-scores might be lower. Indeed, if future academic performance is an index of that fit, then Figures 3A, 3B and 6 do not support these possibilities. But of course, the fit may be judged with respect to other indices, and thus this caveat remains.

The second caveat pertains to the data we use. In reality, different applicants in our context are assessed by different tutors, each assessing a set of applications. But there is significant reallocation of files across tutors so that meritorious candidates are not excluded simply because the tutor assessing their files happened to have assessed a disproportionately large number of strong applicants. However, the reallocation of files need not be perfectly managed. Therefore, our test should be viewed as one of meritocratic admission at the level of the university "as a whole", and deviations from it should be interpreted as having arisen from a variety of possible sources including explicit affirmative action, inefficient reallocation of files, and systematically incorrect beliefs of tutors.

A third possibility, raised by a referee, is that in other contexts (notably in the US), it has been found that female students perform better in college exams than males with the same pre-admission test-scores. If that were true, admission officers may admit female applicants who have scored relatively lower on pre-admission assessments. This is unlikely to be the case in our application; indeed, Fig. 3A shows that post-entry college performance of males first-order stochastically dominates that of females, which is inconsistent with the superior female performance explanation. Moreover, Fig. 3B shows that predicted college performance on the basis of observables is also stochastically higher for males, which provides further evidence against that explanation. In general, however, when applying our methods to other contexts, it would be advisable to draw graphs analogous to Figs. 3A and 3B as a preliminary check.

7 Summary and Conclusion

This paper has proposed an empirical approach to testing, on the basis of micro-data, whether an existing admission protocol is meritocratic, when a researcher observes some but not all applicant-specific information observed by admission tutors. The approach works by defining meritocratic admissions through a threshold-crossing model and then using admission data to learn the sign of the difference in admission thresholds for different demographic groups. These quantities are robust to the unobserved characteristics problem, under an intuitive assumption about the ranking of applicants by unobservable attributes. Applying our methods to admissions data for a selective UK university, we find that admission thresholds faced by male applicants are significantly higher than those faced by female applicants, while those for private-school applicants are possibly only slightly higher than for state school applicants. In contrast, average admission rates are nearly identical across gender and across school-type. These conclusions hold up to a large variety of robustness checks.

Beyond the application to college-admissions, our methods are potentially useful for testing fairness of other binary decisions such as mortgage-approval, surgery-referrals etc., where allegations of bias are common. For example, in the mortgage application case, the typical observable covariates $X$ would be the applicant's income, education, years of work, current debt obligation and monthly expenditure, while relevant group identities $G$ would be gender, nationality (native versus foreigner) and ethnicity. The dominance assumption would be that applicants who have higher income, education and work tenure, and lower debt and expenditure, will be statistically more preferable along dimensions that loan-officers, but not the researcher, may observe. Then a higher probability of approval for, say, a minority applicant who is "dominated" by a non-minority applicant along observable characteristics, would imply discrimination in favour of the former.
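The mortgage example can be sketched as follows. This is a hypothetical illustration rather than a method taken from the paper's application: the field names (`income`, `education`, `work_years`, `debt`, `expenditure`, `p_approve`), the toy records and the helper `flagged_pairs` are all assumptions, and approval probabilities are taken as given.

```python
HIGHER_IS_BETTER = ("income", "education", "work_years")
LOWER_IS_BETTER = ("debt", "expenditure")


def dominates(a, b):
    """True if applicant a is weakly better than applicant b on every
    observable covariate (higher income etc., lower debt etc.)."""
    return (all(a[k] >= b[k] for k in HIGHER_IS_BETTER)
            and all(a[k] <= b[k] for k in LOWER_IS_BETTER))


def flagged_pairs(group_g, group_h):
    """Index pairs (i, j) where g-applicant i dominates h-applicant j
    on observables yet has a lower approval probability."""
    return [(i, j)
            for i, a in enumerate(group_g)
            for j, b in enumerate(group_h)
            if dominates(a, b) and a["p_approve"] < b["p_approve"]]


# Hypothetical records; p_approve is a predicted approval probability.
non_minority = [
    {"income": 80, "education": 16, "work_years": 10,
     "debt": 20, "expenditure": 3.0, "p_approve": 0.55},
]
minority = [
    {"income": 60, "education": 14, "work_years": 6,
     "debt": 25, "expenditure": 3.5, "p_approve": 0.70},
    {"income": 90, "education": 18, "work_years": 12,
     "debt": 10, "expenditure": 2.0, "p_approve": 0.90},
]

print(flagged_pairs(non_minority, minority))
```

Only the first minority applicant is dominated on every covariate while still being more likely to be approved, so only that pair is flagged; a formal test would additionally need to account for sampling error in the estimated probabilities.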

Table 0: Variable Labels

Variable        Label
gcsescore       Overall score in GCSE, 0-4
alevelscore     Average A-level scores, 80-120
aptitude test   Overall score in Aptitude Test, 0-100
essay           Score on Substantive Essay, 0-100
interview       Performance score in interview, 0-100
prelim_avg      Average score in first year university exam, 0-100
offer           Whether offered admission

Note: The gcsescore is an average of the GCSE grades achieved by the candidate for eight subjects, where A* = 4, A = 3, B = 2, C = 1, D or below = 0. The grades used are mathematics plus the other seven best grades. The alevelscore is an average of the A-levels achieved by or predicted for the candidate by his/her school, excluding general studies. Scores are calculated on the scale A = 120, A/B = 113, B/A = 107, B = 100, C = 80, D = 60, E = 40, as per England-wide UCAS norm.
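The two score constructions in the note to Table 0 can be written out as a short Python sketch. The point mappings are taken from the note; the function names and the handling of edge cases (raising an error when fewer than seven non-mathematics grades are supplied, and expecting the caller to have removed general studies) are assumptions, since the note does not say how such cases were treated.

```python
GCSE_POINTS = {"A*": 4, "A": 3, "B": 2, "C": 1}  # D or below scores 0
ALEVEL_POINTS = {"A": 120, "A/B": 113, "B/A": 107, "B": 100,
                 "C": 80, "D": 60, "E": 40}


def gcse_score(maths_grade, other_grades):
    """Average over eight subjects: mathematics plus the seven best others."""
    if len(other_grades) < 7:
        # Assumption: the note does not cover candidates with fewer subjects.
        raise ValueError("need at least seven non-mathematics grades")
    best_seven = sorted((GCSE_POINTS.get(g, 0) for g in other_grades),
                        reverse=True)[:7]
    return (GCSE_POINTS.get(maths_grade, 0) + sum(best_seven)) / 8


def alevel_score(grades):
    """Average A-level points; general studies is assumed already excluded."""
    return sum(ALEVEL_POINTS[g] for g in grades) / len(grades)


# Hypothetical candidate: A* in mathematics, eight other GCSEs, three A-levels.
print(gcse_score("A*", ["A*"] * 5 + ["A", "A", "B"]))  # the B is dropped
print(alevel_score(["A", "A", "A/B"]))
```

For this hypothetical candidate the GCSE average is taken over mathematics (4 points) and the seven best other grades (five A* and two A), so the lowest grade does not enter the score.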

Table 1. Means by Gender and by Schooltype

Variable        Female (N=241)  Male (N=394)  pvalue_diff  State (N=355)  Indep (N=280)  pvalue_diff
gcsescore       3.79            3.72          0            3.67           3.85           0
alevelscore     119.73          119.59        0.01         119.60         119.73         0.02
aptitude test   62.02           65.09         0            63.16          64.85          0.0015
essay           61.77           63.38         0            62.98          64.42          0.5
interview       63.74           64.69         0.04         64.24          64.43          0.65
prelim_avg      61.02           62.33         0.04         61.83          61.83          0.03
offer           0.33            0.37          0.14         0.34           0.35           0.24
accept          0.33            0.37          0.5          0.34           0.35           0.46

Note: The data pertain to two cohorts of applicants. The variable names are explained in Table 0. Columns 3 and 6 record the p-value corresponding to a test of equal means against a one-sided alternative. Differences in unconditional offer rates across school-types (highlighted) are seen to be statistically indistinguishable from zero at 5%.

Table 2: Probit Regression of Admission

Variable        Coef.    Marginal Effect  Marg.Eff/Std.Err  Coef/std.err
gcsescore       0.188    0.055            0.75              0.76
interview       0.225    0.066            11.72             10.43
aptitude test   0.087    0.026            6.76              6.99
essay           0.007    0.002            0.59              0.59
male            -0.210   0.062            -1.31             -1.33
independent     -0.129   0.037            -0.84             -0.84

Note: Probit regression of eventual admission for all UK-based applicants, together with two-sided p-value; the highlighted fields show insignificant effect of gender and school background on admission probabilities, controlling for aptitude test-scores. Data pertain to two cohorts of UK-based applicants. Marginal effects are calculated at mean values of covariates and for moving from 0 to 1 for male and independent. Gender and schooltype remain insignificant (highlighted in yellow) even after controlling for past test-scores.

Figure 2: Graphical evidence of different admission thresholds

[Figure: two panels. Horizontal axis: difference in admission probability (-0.3 to 0.2); vertical axis: cumulative distribution function (0 to 0.25). Left panel curves: fem_male and male_fem. Right panel curves: state_indep and indep_state.]

Note: The above graphs plot the marginal C.D.F. of the difference in admission probabilities $p(X_g,g) - p(X_h,h)$ for pairs of $(X_g,X_h)$ satisfying $X_g \succeq_{\varepsilon} X_h$ for $\varepsilon = 0.1$, for various combinations of $g$ and $h$. A positive height at zero indicates that applicants with higher observables in the first group ($g$) have lower admission probabilities than those with lower observables in the second group ($h$). The solid curve on the left panel shows, for example, that a subgroup of males with higher observables have lower admission probability than a subgroup of females with lower observables.

Table 3: Testing Unequal Thresholds

Difference                       ε=0.1    ε=0.25   Quadratics in Pre-Admission Scores (ε=0.1)   Standardized scores (δ=1.5)
g=male, h=female                 -1.73    -2.02    -3.49                                        -2.01
g=female, h=male                 0.57     0.67     0.684                                        0.43
g=indep, h=state                 -1.29    -0.58    -2.75                                        0.012
g=state, h=indep                 0.92     0.04     0.635                                        1.87
g=state_male, h=state_female     -1.36    -1.01    -6.85                                        -1.19
g=indep_male, h=indep_female     -1.11    -3.39    -2.7                                         -3.56

Note: This table reports the upper limit of the one-sided 95% Confidence Interval for testing whether group g is facing a higher admission threshold than group h, with a negative upper limit indicating that it is. The first two columns with ε=0.1 and ε=0.25 correspond to evaluating the difference in (indices of) admission probability, as a function of pre-admission performance measures, between a g-type and an h-type applicant, where the former has performed ε standard deviations higher on each of the measures. The final column corresponds to the same difference but where the former has scored 1.5 points or higher on standardized Z-score versions of the pre-admission performance measures, as explained in the text in sections 8.2 and 8.5, respectively. The last-but-one column shows the results when quadratics and second-order interactions between all pre-admission performance measures are used as additional controls, beyond the linear versions of them, to predict admission probabilities, as a robustness check.
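The decision rule in the note can be read off mechanically. The sketch below applies it to the ε = 0.1 column of Table 3; the numbers are copied from the table, while the dictionary layout and the function name `higher_threshold_pairs` are illustrative assumptions.

```python
# Upper limits of the one-sided 95% confidence intervals,
# copied from the eps = 0.1 column of Table 3.
UPPER_LIMITS = {
    ("male", "female"): -1.73,
    ("female", "male"): 0.57,
    ("indep", "state"): -1.29,
    ("state", "indep"): 0.92,
    ("state_male", "state_female"): -1.36,
    ("indep_male", "indep_female"): -1.11,
}


def higher_threshold_pairs(upper_limits):
    """Pairs (g, h) with a negative upper limit, i.e. cases where group g
    is found to face a higher admission threshold than group h."""
    return [pair for pair, limit in upper_limits.items() if limit < 0]


rejected = higher_threshold_pairs(UPPER_LIMITS)
print(rejected)
```

The same function can be applied to any other column of the table by swapping in that column's values.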

Table 4: Regression of first year performance on observable covariates

Variable        Coefficient  Std error  t-value
gcsescore       3.33         1.77       1.88
aptitude test   0.19         0.04       4.31
essay           -0.004       0.047      -0.08
interview       0.06         0.03       1.78
male            1.14         0.69       1.66
indep           0.41         0.68       0.75

Note: Regression of admitted candidates' performance in first year examinations on pre-admission performance measures, gender and school-type. Highlighted fields show significant positive impact of being male but insignificant effect of being from private-school on subsequent performance, conditional on admission.

Figure 3A: CDF of first year and third year performance, by gender

[Figure: two panels of empirical CDFs (vertical axis 0 to 1). Left panel: Prelim_Score (50 to 75), curves Male and Female. Right panel: Finals_Score (50 to 75), curves Male and Female.]

Note: CDF of first-year (left panel) and final year (right panel) aggregate performance (percentage of total scored by the student) in college for admitted candidates. The male CDFs are seen to lie almost entirely to the right of the female CDFs, with dominance more pronounced for first-year exams.

Figure 3B: CDF of predicted first year and third year performance based on observables, by gender and schooltype

[Figure: two panels of empirical CDFs (vertical axis 0 to 1). Left panel: Prelim_Score (50 to 75). Right panel: Final_Score (50 to 75). Curves in each panel: male_indep, male_state, female_indep, female_state.]

Note: CDF of predicted first-year (left panel) and final year (right panel) performance in college for admitted candidates, based on observable pre-admission performance measures. The male CDFs are seen to lie almost entirely to the right of the female CDFs, implying that under a common admission rate, marginal male entrants will have a significantly higher expected score on first and final year exams.
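The link between a common admission rate and the predicted score of the marginal entrant can be illustrated with a small Python sketch. It assumes, purely for illustration, that admission within each group is by rank on predicted performance; the function name `marginal_predicted_score` and the toy scores are hypothetical.

```python
import math


def marginal_predicted_score(predicted_scores, acceptance_rate):
    """Predicted score of the weakest admit when the top `acceptance_rate`
    share of the group is admitted by rank on predicted performance."""
    if not 0 < acceptance_rate <= 1:
        raise ValueError("acceptance_rate must lie in (0, 1]")
    ranked = sorted(predicted_scores, reverse=True)
    n_admit = max(1, math.floor(acceptance_rate * len(ranked)))
    return ranked[n_admit - 1]


# Hypothetical predicted scores: the male distribution is the female
# distribution shifted two points to the right (first-order dominance).
male = [60, 62, 64, 66, 68, 70, 72, 74]
female = [58, 60, 62, 64, 66, 68, 70, 72]

rate = 0.25  # common acceptance rate applied to both groups
print(marginal_predicted_score(male, rate),
      marginal_predicted_score(female, rate))
```

With the same share admitted from each group, the weakest admitted applicant in the dominating distribution has the higher predicted score, which is the sense in which equal success rates translate into unequal thresholds.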

Figure 4: Testing Monotonicity of Median Interview Score in "Contextualised" Pre-admission Performance Measures

[Figure: "Monotonicity under Normalized Test-scores". Two histogram panels, labelled delta=1.25 and delta=1.0; vertical axis: density; horizontal axes: indep_state and male_fem.]

Note: The above graphs plot histograms of the difference in admission probabilities $p(X_{male}, male) - p(X_{female}, female)$ for pairs of $(X_{male}, X_{female})$ satisfying $X_{male} \geq X_{female} + \delta$, for $\delta = 1.0$ and $\delta = 1.25$, where $X_{male}$, $X_{female}$ are the standardized performance measures observed prior to admissions. The smallest $\delta$ for which these histograms have positive support is $\delta = 1.25$. We use this value of $\delta$ to do our robustness checks, as explained in the paper in Section 8.3.

Figure 5: First-stage Selection

[Figure: two panels of empirical CDFs (vertical axis 0 to 1). Left panel: Aptitude_Test (50 to 90). Right panel: Predicted_Interview (55 to 75). Curves in each panel: male_indep, male_state, female_indep, female_state.]

Note: The above graphs present suggestive evidence regarding first-stage selection of candidates. The left panel plots the CDF of the raw aptitude test-scores for those making it to the interview stage. The right graph plots the CDF of predicted interview scores based on aptitude test score and GCSE score, and is analogous to Figure 3 above which pertains to the second stage of selection. A common success rate across gender and schooltype for entry to the interview stage would imply a lower threshold for female and state school candidates, but with male state school candidates facing a higher threshold than female independent school candidates.

Figure 6: Predicted Probability of attaining a 2.1 level mark

[Figure: single panel of empirical CDFs. Horizontal axis: Two_one (0 to 1); vertical axis from 0 to 1. Curves: male_indep, female_indep, male_state, female_state.]

Note: The above graph plots the CDF of the predicted probability of getting at least a high second class level mark (64%) in the first year exams, based on pre-admission performance measures. The horizontal axis marks the probability of getting at least a 2.1, and the vertical axis is the admission probability. A common success rate for entry would imply a lower threshold for female and state school candidates, but with male state school candidates facing a higher threshold than female independent school candidates. For instance, a 30% success rate across schooltype and gender would imply that about 63% of female candidates from state-schools and about 75% of male private-school candidates would get at least a 2.1 degree in expectation. This figure is a robustness check on Figure 3, above.

Figure 7: Effect of ε on gender-gap in admission thresholds

[Figure: single panel. Horizontal axis: epsilon (0.1 to 0.35); vertical axis: upper limit of 95% CI (-2 to 0). Series: state_male-state_female, male_female, indep_male-indep_female, zero.]

Note: In this figure, we examine how the overall male-female gap in thresholds differs by school-type, and also how the results are affected by one's choice of ε. We plot upper limits of 95% CLR confidence intervals, with a negative upper limit implying that the first group faces a higher threshold than the second. These limits are plotted across a range of ε.

References

[1] Altonji, J.G. & Blank, R.M. (1999) Race and gender in the labor market, Handbook of Labor Economics Vol. 3C (O. Ashenfelter and D. Card, eds.), 3143-259. Elsevier, New York.
[2] Antonovics, K. & Brian G. Knight (2009) A New Look at Racial Profiling: Evidence from the Boston Police Department. The Review of Economics and Statistics, MIT Press, vol. 91(1), pages 163-177.
[3] Arcidiacono, P. (2005) Affirmative action in higher education: How do admission and financial aid rules affect future earnings?, Econometrica, 73-5, 1477-1524.
[4] Arcidiacono, P., E. M. Aucejo & K. Spenner (2011) What happens after enrollment? An analysis of the time path of racial differences in GPA and major choice?, working paper, Duke University.
[5] Arcidiacono, P., and M. Lovenheim (2015) Affirmative action and the quality-fit tradeoff. No. w20962. National Bureau of Economic Research.
[6] Becker, G. (1957) The economics of discrimination, University of Chicago Press.
[7] Bertrand, M., R. Hanna & S. Mullainathan (2010) Affirmative action in education: Evidence from engineering college admissions in India, Journal of Public Economics, 94, 1-2, 16-29.
[8] Bhattacharya, D. & P. Dupas (2012) Inferring efficient treatment assignment under budget constraints, Journal of Econometrics, 167, 168-196.
[9] Bhattacharya, D. (2013) Evaluating treatment protocols using data combination, Journal of Econometrics, 173, 160-174.
[10] Blank, R., M. Dabady & C. Citro (2004) Measuring Racial Discrimination, Washington, D.C.: National Research Council, National Academy Press.
[11] Card, D. & A.B. Krueger (2005) Would the elimination of affirmative action affect highly qualified minority applicants? Evidence from California and Texas, Industrial and Labor Relations Review, 58-3, 416-434.
[12] Chandra, A. & D. Staiger (2009) Identifying provider prejudice in medical care, Mimeo, Harvard University and Dartmouth College.
[13] Chernozhukov, V., S. Lee & A. Rosen (2013) Intersection bounds: Estimation and inference, Econometrica, 81-2, 667-737.
[14] Brock, William, Jane Cooley, Steven Durlauf, S. Navarro (2012) On The Observational Implications of Taste-Based Discrimination in Racial Profiling, Journal of Econometrics, 166(1).
[15] Espenshade, Thomas J., Chang Y. Chung, and Joan L. Walling (2004) Admission Preferences for Minority Students, Athletes, and Legacies at Elite Universities. Social Science Quarterly 85.5: 1422-1446.
[16] Fryer Jr., R.G. & G.C. Loury (2005) Affirmative action and Its mythology, Journal of Economic Perspectives, 19-3, 147-162.
[17] Goldin, C., L. F. Katz, and Ilyana Kuziemko (2006) The Homecoming of American College Women: The Reversal of the College Gender Gap. Journal of Economic Perspectives, 20(4): 133-56.
[18] Graddy, K. & M. Stevens (2005) The Impact of School Inputs on Student Performance: An Empirical Study of Private Schools in the United Kingdom, Industrial and Labor Relations Review, 58-3, 435-451.
[19] Greenstone, M. & T. Gayer (2001) Quasi-experimental and experimental approaches to environmental economics, Journal of Environmental Economics and Management, 57, 21-44.
[20] Heckman, J. (1998) Detecting discrimination, Journal of Economic Perspectives, 12-2, 101-116.
[21] Hinrichs, Peter (2012) The effects of affirmative action bans on college enrollment, educational attainment, and the demographic composition of universities. Review of Economics and Statistics 94.3: 712-722.
[22] Hoxby, C.M. (2009) The changing selectivity of American colleges, Journal of Economic Perspectives, American Economic Association, 23-4, 95-118.
[23] Hurwitz, M. (2011) The impact of legacy status on undergraduate admissions at elite colleges and universities, Economics of Education Review, vol. 30, issue 3, pages 480-492.
[24] Kane, T. J. & W.T. William (1998) Racial and ethnic preference in college admissions, in Christopher Jencks and Meredith Phillips (eds.), The Black-White Test Score Gap, Washington: Brookings Institution.
[25] Keith, S., R.M. Bell, A.G. Swanson & A.P. Williams (1985) Effects of affirmative action in medical schools: A study of the class of 1975, The New England Journal of Medicine, 313, 1519-1525.
[26] Kobrin, J.L., B.F. Patterson, E.J. Shaw, K.D. Mattern & S.M. Barbuti (2008) Validity of the SAT for predicting first-year college grade point average, College Board, New York.
[27] Kuncel, N. R., S.A. Hezlett & D.S. Ones (2001) A comprehensive meta-analysis of the predictive validity of the Graduate Record Examinations: Implications for graduate student selection and performance. Psychological Bulletin, 127, 162-181.
[28] Leonard, D., and J. Jiang (1999) Gender bias and the college predictions of the SATs: A cry of despair. Research in Higher Education 40.4: 375-407.
[29] Manski, C. (1988) Identification of binary response models, Journal of the American Statistical Association, 83, 729-738.
[30] Manski, C. (2009) Identification for Prediction and Decision, Cambridge, Massachusetts: Harvard University Press.
[31] Niederle, Muriel, and Lise Vesterlund (2010) Explaining the gender gap in math test scores: The role of competition. The Journal of Economic Perspectives, 129-144.
[32] Ogg, T., A. Zimdars & A. Heath (2009) Schooling effects on degree performance: a comparison of the predictive validity of aptitude testing and secondary school grades at Oxford University, British Educational Research Journal, 35-5.
[33] Persico, N. (2009) Racial profiling? Detecting bias using statistical evidence, Annual Review of Economics, 1, 229-254.
[34] Pope, Devin G., and Justin R. Sydnor (2011) What's in a Picture? Evidence of Discrimination from Prosper.com. Journal of Human Resources 46.1: 53-92.
[35] Rothstein, J.M. (2004) College performance predictions and the SAT. Journal of Econometrics, Elsevier, vol. 121(1-2), pages 297-317.
[36] Sackett, P., N. Kuncel, J. Arneson, G. Cooper & S. Waters (2009) Socioeconomic status and the relationship between the SAT and freshman GPA: An analysis of data from 41 colleges and universities, available online at: http://professionals.collegeboard.com/data-reports-research/cb/SES-SAT-FreshmanGPA
[37] Sander, R. (2004) A Systemic Analysis of Affirmative Action in American Law Schools, 57 Stanford Law Review 367-483.
[38] Sawyer, R. (2010) Usefulness of high school average and ACT scores in making college admission decisions, available online at: http://www.act.org/research/researchers/reports/pdf/ACT_RR2010-2.pdf
[39] Tamer, Elie (2010) Partial Identification in Econometrics, Annual Reviews of Economics, Vol. 2, No. 1, pp. 167-195.
[40] Zimdars, A., A. Sullivan & A. Heath (2009) Elite higher education admissions in the arts and sciences: Is cultural capital the key?, Sociology, 4, 648-66.

47

Technical Appendix

Part A: Proof of Proposition 1

Consider any feasible rule $p(\cdot)$ satisfying the budget constraint. Since $p^{opt}(\cdot)$ satisfies the budget constraint with equality (recall the definition of $\gamma$ and $q$) and $p(\cdot)$ is feasible, we must have

$$\int_{w\in\mathcal{W}} \alpha(w)\,p^{opt}(w)\,dF_W(w) = c \ge \int_{w\in\mathcal{W}} \alpha(w)\,p(w)\,dF_W(w), \tag{5}$$

implying that

$$\int_{w\in\mathcal{W}} \alpha(w)\,\bigl[p^{opt}(w) - p(w)\bigr]\,dF_W(w) \ge 0. \tag{6}$$

Let $W(p) := \int_{w\in\mathcal{W}} p(w)\,\alpha(w)\,\beta(w)\,dF_W(w)$. Now, the productivity resulting from $p(\cdot)$ differs from that from $p^{opt}(\cdot)$ by

$$
\begin{aligned}
W(p^{opt}) - W(p)
&= \int_{w\in\mathcal{W}} \bigl[p^{opt}(w) - p(w)\bigr]\,\alpha(w)\,[\beta(w) - \gamma]\,dF_W(w) + \gamma \int_{w\in\mathcal{W}} \bigl[p^{opt}(w) - p(w)\bigr]\,\alpha(w)\,dF_W(w) \\
&\ge \int_{w\in\mathcal{W}} \bigl[p^{opt}(w) - p(w)\bigr]\,\alpha(w)\,[\beta(w) - \gamma]\,dF_W(w) \\
&= \int_{\beta(w)>\gamma} \bigl[p^{opt}(w) - p(w)\bigr]\,\alpha(w)\,[\beta(w) - \gamma]\,dF_W(w) + \int_{\beta(w)<\gamma} \bigl[p^{opt}(w) - p(w)\bigr]\,\alpha(w)\,[\beta(w) - \gamma]\,dF_W(w) \\
&= \int_{\beta(w)>\gamma} [1 - p(w)]\,[\beta(w) - \gamma]\,\alpha(w)\,dF_W(w) + \int_{\beta(w)<\gamma} p(w)\,[\gamma - \beta(w)]\,\alpha(w)\,dF_W(w) \ge 0,
\end{aligned}
\tag{7}
$$

where the first inequality holds by (6) and the fact that $\gamma > 0$. Therefore, we have $W(p^{opt}) \ge W(p)$ for any feasible $p(\cdot)$, and the solution $p^{opt}(\cdot)$ given in (1) is optimal.

To show uniqueness, consider any feasible rule $p(\cdot)$ which differs from $p^{opt}(\cdot)$ on some set whose measure is not zero, i.e., $\int_{w\in S(p)} dF_W(w) > 0$ for $S(p) := \{w \in \mathcal{W} \mid p^{opt}(w) \ne p(w)\}$. Now, assume that (7) holds with equality for this $p(\cdot)$, i.e., $W(p^{opt}) = W(p)$. In this case, since the last inequality on the RHS of (7) holds with equality, $p(\cdot)$ must take the following form:

$$
p(w) =
\begin{cases}
1 & \text{if } \beta(w) > \gamma, \\
0 & \text{if } \beta(w) < \gamma,
\end{cases}
$$

for almost every $w$ (with respect to $F_W$). This implies that $p(w) = p^{opt}(w)$ for almost every $w$ except when $\beta(w) = \gamma$. Since the measure of $S(p)$ is not zero, we must have $p^{opt}(w) \ne p(w)$ for $\beta(w) = \gamma$, and $S(p) = \{w \in \mathcal{W} \mid \beta(w) = \gamma\}$, which, together with the budget constraint, implies that $q > p(w)$ when $\beta(w) = \gamma$. However, this in turn implies that we have a strict inequality in the third line on the RHS of (7), which contradicts our assumption. Therefore, we have now shown that $W(p^{opt}) > W(p)$ for any feasible $p(\cdot)$ with $\int_{w\in S(p)} dF_W(w) > 0$, leading to the desired uniqueness property of $p^{opt}(\cdot)$.
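The optimality argument can be checked numerically on a small discrete example. The sketch below is illustrative only: the four applicant types, their acceptance probabilities (`alpha`), expected outcomes (`beta`), population shares (`mass`) and the capacity `c` are made-up numbers, and all function names are hypothetical. It assumes distinct values of `beta` and that total demand exceeds capacity, builds the threshold rule by admitting types in decreasing order of expected outcome until the budget binds, and compares its productivity with that of randomly drawn feasible rules.

```python
import random

def budget(p, alpha, mass):
    """Expected number of places used: sum of p * alpha * mass over types."""
    return sum(pi * a * m for pi, a, m in zip(p, alpha, mass))

def productivity(p, alpha, beta, mass):
    """Expected total outcome of the entering class, W(p) in the proof."""
    return sum(pi * a * b * m for pi, a, b, m in zip(p, alpha, beta, mass))

def threshold_rule(alpha, beta, mass, c):
    """Admit types in decreasing order of beta until capacity c is exhausted.

    The marginal type receives a fractional admission probability q.
    Assumes distinct beta values and total demand exceeding c.
    """
    p = [0.0] * len(beta)
    remaining = c
    for i in sorted(range(len(beta)), key=lambda j: -beta[j]):
        cost = alpha[i] * mass[i]
        if cost <= remaining:
            p[i] = 1.0
            remaining -= cost
        else:
            p[i] = remaining / cost
            break
    return p

def random_feasible_rule(alpha, mass, c, rng):
    """Draw admission probabilities at random, then scale down to respect the budget."""
    p = [rng.random() for _ in alpha]
    used = budget(p, alpha, mass)
    if used > c:
        p = [pi * c / used for pi in p]
    return p

# Made-up illustration with four applicant types.
alpha = [0.9, 0.8, 0.7, 0.6]
beta = [3.0, 2.5, 2.0, 1.0]
mass = [0.25, 0.25, 0.25, 0.25]
c = 0.5

p_opt = threshold_rule(alpha, beta, mass, c)
w_opt = productivity(p_opt, alpha, beta, mass)

# Productivity shortfall of 1000 random feasible rules relative to the threshold rule.
rng = random.Random(0)
gaps = [w_opt - productivity(random_feasible_rule(alpha, mass, c, rng), alpha, beta, mass)
        for _ in range(1000)]
print(p_opt, w_opt, min(gaps))
```

In this example the two types with the highest expected outcome are admitted with certainty, the third is admitted with a fractional probability, and no randomly drawn feasible rule attains higher productivity, in line with inequality (7).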


Part B: Evidence of dominance: Other quantiles

The following histograms substantiate assumption SD. They are analogous to those reported in Figure 1 but for quantiles other than the median. For example, the top left histogram in Fig. 4 corresponds to

$$Q_{0.25}[\text{interview} \mid X_{male}, male] - Q_{0.25}[\text{interview} \mid X_{female}, female],$$

computed across all pairs of males and females satisfying $X_{male} \succeq_{\varepsilon=0} X_{female}$. The strictly positive support of these histograms implies dominance with respect to quantiles other than the median.

[Figure 8: Dominance for 25th percentile. Histogram panels: male_female, indep_state, female_male, state_indep. Vertical axis: Density (0 to .25); horizontal axis: 0 to 8.]

[Figure 9: Dominance for 75th percentile. Histogram panels: male_female, indep_state, female_male, state_indep. Vertical axis: Density (0 to .25); horizontal axis: 0 to 8.]
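The quantile differences summarized in these histograms can be computed along the following lines. This is a minimal sketch under stated assumptions, not the code used for the figures: the interview scores and covariate cells are made-up, the function names are hypothetical, and the dominance relation is assumed here to mean that one covariate vector is at least as large as the other in every component (plus a margin `eps`, set to zero).

```python
import math
from itertools import product

def quantile(values, tau):
    """Empirical tau-quantile: the smallest order statistic whose CDF is at least tau."""
    ordered = sorted(values)
    index = max(0, math.ceil(tau * len(ordered)) - 1)
    return ordered[index]

def dominates(x, y, eps=0.0):
    """Assumed dominance relation: x is at least y + eps in every component."""
    return all(xi >= yi + eps for xi, yi in zip(x, y))

def quantile_gaps(cells_g, cells_h, tau, eps=0.0):
    """Quantile differences Q_tau[score | x_g, g] - Q_tau[score | x_h, h]
    over all covariate pairs in which x_g dominates x_h."""
    return [quantile(scores_g, tau) - quantile(scores_h, tau)
            for (x_g, scores_g), (x_h, scores_h)
            in product(cells_g.items(), cells_h.items())
            if dominates(x_g, x_h, eps)]

# Made-up interview scores, keyed by a two-component covariate vector.
male = {(2, 3): [5, 6, 7, 8], (3, 3): [6, 7, 8, 9]}
female = {(2, 2): [3, 4, 5, 6], (2, 3): [4, 5, 6, 7]}

gaps_25 = quantile_gaps(male, female, 0.25)
gaps_75 = quantile_gaps(male, female, 0.75)
print(gaps_25, gaps_75)
```

A histogram of such differences whose support lies strictly above zero is the pattern the figures are read as evidence of.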


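Part C: Test of emptiness

The null hypothesis of an empty $SD(g, h, \varepsilon)$ can be stated as $\theta_0 \ge 0$, where

$$\theta_0 = \inf_{(x_g, x_h) \in \mathcal{X}_g \times \mathcal{X}_h;\; x_g \succeq_{\varepsilon} x_h} \bigl[p(x_g, g) - p(x_h, h)\bigr].$$

The quantity $\theta_0$ is of a form analyzed in Chernozhukov, Lee and Rosen (2013, CLR). We consider constructing a 95% confidence interval for $\theta_0$ in the parametric case $p(x_g, g) = \Phi(x_g'\beta_{0,g})$ and $p(x_h, h) = \Phi(x_h'\beta_{0,h})$ by following the CLR method. Accordingly, denote the dimension of $(\beta_g', \beta_h')'$ by $k$, a $k$-variate standard normal by $N_k$ and the asymptotic variance of $(\hat\beta_g', \hat\beta_h')'$ by $\Omega$, that is, $\mathrm{AVar}[(\hat\beta_g', \hat\beta_h')'] = \Omega$. Denote the $\tau$th quantile of a random variable $W$ by $Q_\tau(W)$. Now the null hypothesis is equivalent to

$$\inf_{(x_g, x_h) \in \mathcal{X}_g \times \mathcal{X}_h;\; x_g \succeq_{\varepsilon} x_h} \bigl[x_g'\beta_{0,g} - x_h'\beta_{0,h}\bigr] \ge 0.$$

In order to map the notation of this paper into the CLR notation, let

$$
\begin{aligned}
v &= (x_g, x_h), \qquad \gamma = (\beta_g, \beta_h), \\
\mathcal{V} &= \{(x_g, x_h) \in \mathcal{X}_g \times \mathcal{X}_h : x_g \succeq_{\varepsilon} x_h\}, \\
\hat\theta_n(v) &= x_g'\hat\beta_g - x_h'\hat\beta_h, \\
s_n(v) &= \bigl\|(x_g', -x_h')\,\hat\Omega^{1/2}\bigr\|, \\
Z_n^{\star}(v) &= \frac{(x_g', -x_h')\,\hat\Omega^{1/2} N_k}{\bigl\|(x_g', -x_h')\,\hat\Omega^{1/2}\bigr\|}, \\
k_{n,\mathcal{V}}(p) &= Q_p\Bigl[\sup_{v\in\mathcal{V}} Z_n^{\star}(v)\Bigr], \\
\hat\theta_{n0}(p) &= \inf_{v\in\mathcal{V}} \bigl[\hat\theta_n(v) + k_{n,\mathcal{V}}(p)\,s_n(v)\bigr].
\end{aligned}
$$

Then a $100p\%$ one-sided confidence interval (CI) for $\theta_0$ is given by $\hat C_n(p) = (-\infty, \hat\theta_{n0}(p)]$. If $\hat\theta_{n0}(p) < 0$, then we conclude that $SD(g, h, \varepsilon)$ is non-empty. In the application, we use $p = 0.95$ and report the CI, $\hat C_n(0.95)$, for each choice of $g, h$.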
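The construction above can be sketched by simulation as follows. This is an illustrative sketch, not the implementation used in the application: the function names, the coefficient estimates and the variance matrix are all hypothetical, the supplied matrix is assumed to be the variance of the stacked coefficient estimator already scaled for sample size, and a Cholesky factor is used in place of the symmetric square root (which leaves both the norm and the distribution of the simulated process unchanged). It takes the set of dominating covariate pairs as given and does not implement any refinement of the CLR procedure beyond the formulas stated here.

```python
import math
import random

def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

def cholesky(omega):
    """Lower-triangular L with L L' = omega (omega symmetric positive definite)."""
    k = len(omega)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = math.sqrt(omega[i][i] - s)
            else:
                L[i][j] = (omega[i][j] - s) / L[j][j]
    return L

def clr_upper_bound(pairs, beta_g, beta_h, omega, p=0.95, draws=2000, seed=0):
    """Upper end of the one-sided CI: inf over v of theta_hat(v) + k(p) * s(v).

    pairs: (x_g, x_h) covariate vectors assumed to satisfy the dominance restriction.
    omega: assumed variance matrix of the stacked estimator (beta_g, beta_h).
    """
    rng = random.Random(seed)
    k = len(omega)
    L = cholesky(omega)
    # Gradient of x_g'beta_g - x_h'beta_h with respect to (beta_g, beta_h).
    grads = [list(x_g) + [-x for x in x_h] for x_g, x_h in pairs]
    # Row vector g'L for each pair; its norm is s_n(v).
    rows = [[sum(L[r][col] * g[r] for r in range(k)) for col in range(k)] for g in grads]
    s = [math.sqrt(dot(row, row)) for row in rows]
    theta_hat = [dot(x_g, beta_g) - dot(x_h, beta_h) for x_g, x_h in pairs]
    # Simulate the supremum of the standardized Gaussian process over all pairs.
    sups = []
    for _ in range(draws):
        normal = [rng.gauss(0.0, 1.0) for _ in range(k)]
        sups.append(max(dot(row, normal) / sv for row, sv in zip(rows, s)))
    sups.sort()
    crit = sups[min(draws - 1, math.ceil(p * draws) - 1)]
    return min(t + crit * sv for t, sv in zip(theta_hat, s))

# Hypothetical inputs: two dominating pairs, two covariates per group.
pairs = [((1.0, 2.0), (1.0, 1.0)), ((1.0, 3.0), (1.0, 2.0))]
omega = [[0.01 if i == j else 0.0 for j in range(4)] for i in range(4)]
upper = clr_upper_bound(pairs, (-0.5, 0.2), (0.1, 0.3), omega)
print("CI upper end:", upper, "-> non-empty" if upper < 0 else "-> cannot reject emptiness")
```

A negative upper end corresponds to the decision rule stated above: the confidence interval excludes zero, so the set is concluded to be non-empty.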
