Gender Differences: Evidence from Field Tournaments∗
José de Sousa and Guillaume Hollard†
January 26, 2018

Abstract

Women are under-represented in top positions, such as in Business or Politics. The traditional explanations of gender differences in ability and discrimination have now been complemented by psychological explanations based on laboratory experiments. In this paper, we attempt to assess the comparative importance of the psychological and traditional explanations in a natural field setting, namely chess competitions. Controlling for discrimination and ability, we find that women suffer from a systematic drop in performance when competing against men. This drop is only marginally smaller when we consider the most experienced players or the most woman-friendly countries. The gender difference is further amplified through the tournament structure, which prevents women from reaching top positions in the chess hierarchy.



∗ We thank Ghazala Azmat, Raicho Bojilov, Thomas Buser, Pierre Cahuc, Andrew Clark, Bruno Decreuse, Habiba Djebbari, Uri Gneezy, Julio González-Díaz, Olivier Gossner, Emeric Henry, Nagore Iriberri, Koen Jochmans, Emir Kamenica, Henrik Kleven, Hela Maafi, Alan Manning, Muriel Niederle, Antoinette Schoar, Tanguy van Ypersele, and Antoine Terracol, as well as seminar participants at the Eighth Transatlantic Theory Workshop at Northwestern University, Aix Marseille School of Economics, Barcelona GSE Forum, the 7th Maastricht Behavioral and Experimental Economics Symposium (M-BEES 2014), Ivry INRA, GATE Lyon, Paris Dauphine, Paris School of Economics and Ecole Polytechnique. We are very grateful to Jeff Sonas for providing us with the database on FIDE chess games.
† José de Sousa: Université Paris Sud, RITM and Sciences Po Paris, LIEPP, [email protected]. Guillaume Hollard: Ecole Polytechnique and CNRS, [email protected].

1 Introduction

Around the world, women are massively under-represented at the top of hierarchies in Business, Politics and Science (Hausman et al., 2013).1 In our profession, while women represent 19% of RePEc authors, there is only one woman in the world’s top 100.2 Considerable attrition is observed along the hierarchical ladder, resulting in systematic vertical segregation that has traditionally been attributed to women having lower productivity, facing different career-family trade-offs, or being discriminated against in the most rewarding social activities (Altonji and Blank, 1999). However, new explanations have been proposed using laboratory and field experiments.3 For instance, gender differences in taste for competition have been shown to explain part of the horizontal segregation, i.e. the fact that women self-select into different study tracks or jobs than men do.4 This paper focuses on gender differences in performance generated by competitive situations to explain vertical segregation, i.e. the fact that women are under-represented at the top of hierarchies. Over the last decade, two psychological phenomena have been argued to hold women back on their way to the top. The first hinges on attitudes towards competition. When competing against men, women may experience a drop in their relative performance

1 For instance, women account for 47% of PhD graduates, 37% of Associate Professors but only 21% of Full Professors in Europe (Commission, 2016). Similar patterns are observed for Lawyers in the US (women represent 45% of associates but under 20% of partners – NALP, 2016) and among Corporate Directors in Europe (women represent only 12% of the membership of Boards of Directors, despite being 45% of the labor force – Pande and Ford, 2011). These figures for Academics, Lawyers and Corporate Directors are fairly similar in other parts of the world.
2 Women’s representation among RePEc authors is found here. The list of the top 100 RePEc economists as of December 2017 is here.
3 See Azmat and Petrongolo (2014), Bertrand (2011), Croson and Gneezy (2009), Niederle (2016), Niederle and Vesterlund (2011).
4 Differences in competitiveness, as measured by Niederle and Vesterlund (2007), are associated with gender differences in various outcomes. For instance, Buser et al. (2014) show that gender differences in competitiveness explain about 20% of the gender difference in the choice of academic track. In the same spirit, Flory et al. (2015) show that women avoid applying to jobs in which wages are set based on competition. Closer to our purpose here, Reuben et al. (2015) find that “competitive” individuals earn 9% more than their less-competitive counterparts. Moreover, they show that gender differences in the taste for competition explain around 10% of the overall gender pay gap among former MBA students.

(Gneezy and Rustichini, 2004).5 The second relates to stereotype threats.6 Negative stereotypes, when gender is explicitly recalled and made salient, can trigger a stereotype threat, resulting again in a drop in relative performance (Iriberri and Rey-Biel, 2017; Walton et al., 2015). Both psychological explanations underline the gender composition of the pool of competitors as an important driver of relative performance. Detrimental gender effects are thus expected to become more prevalent further up the hierarchical ladder, since women increasingly compete against men as they move to the top. However, as stated by Bertrand (2011), “whether this body of psychological research will be more than just a decade-long fad and have a long-lasting impact on how labor economists think about gender differences will crucially depend on further demonstration of its economic significance in real markets.” Following this agenda for vertical segregation requires finding a real hierarchical organization that offers enough controls to single out psychological explanations, while controlling for the classic factors (ability, the career-family trade-off and discrimination). Chess competitions are a good fit here. Chess competitions share a number of important characteristics with other hierarchical organizations. Individuals self-select into competitions, and devote time and effort to increasing their rank in the hierarchy.7 Like other competitive organizations, chess competitions exhibit severe vertical segregation: women represent 12% of our sample, but only 3% of the top 1%, and there is only one woman in the top 100.8 But unlike other competitive organizations, chess competitions offer unique controls. Chess rankings are based on transparent and observable performance, using a publicly-available mechanism called the Elo system. This produces a clean measure of ability and excludes discrimination,

5 Note that the drop in performance is not found when women perform in teams (Dargnies, 2012) and in gender-balanced groups (Lavy, 2013), or when they perform tasks usually carried out by women (Dreber et al., 2014). A difference in performance is only observed when women compete with men in tasks that are usually performed by men, i.e. “male” tasks.
6 In a typical stereotype-threat experiment, subjects are asked to perform a test and to indicate their “group affiliation”, i.e., whether they belong to a negatively-stereotyped group. In the control condition, the test is performed first and the affiliation question is asked afterwards, while this order is reversed in the treatment condition (i.e., subjects indicate their affiliation before the task is performed). Members of social groups associated with negative stereotypes perform worse when the affiliation question comes first.
7 It is important to bear in mind here that we look at official competitions and do not consider chess played as a hobby.
8 These figures are based on the ranked active players in January 2017.

since promotions are based solely on performance. Another considerable advantage of chess competitions is the size of the sample: our sample covers all official games played around the world over a four-year period, producing over three million pairings of players of all ages (from ages 5 to 90). Spanning all ages, our sample allows us to check for gender differences in performance at ages where the career-family trade-off is presumably less prevalent, i.e., below 21 and above 55. Moreover, as our sample covers over 150 countries, we can carry out cross-country comparisons. One contribution of our work here is to reveal robust gender performance differences in the field: women under-perform when competing against men. In various econometric specifications, we consistently find that the expected score of a woman is about 2.5% lower when playing against a man instead of an otherwise identical woman with the same Elo rating, age, and nationality. Furthermore, the same difference is found for those under 21, above 55 and in all countries for which data is available, including the most woman-friendly countries (raising intriguing questions about the role of culture). In addition, chess experience mitigates the gender-competition difference, but does not eliminate it. We check the robustness of our findings in three ways. We first ensure that any imperfections in Elo ratings do not produce gender bias. To this end, we use exogenous variation in the frequency of Elo-rating updates by the World Chess Federation, from every three months to every month, and find no impact on the size of the gender-competition effect. Second, we check whether the gender-competition effect may be an artifact caused by gender differences in rating acquisition. Women can, for example, self-select into women-only tournaments. The competition effect remains unchanged when we control for the proportion of games played against men.
Third, we use a variety of estimation strategies, including matching estimators, to show that the size of the effect does not depend on any particular parametric assumption. How does this modest gender difference in performance tie in with massive vertical segregation? Another contribution of our work is to propose a simple model which shows that, under fairly general conditions, small differences in performance accumulate over time and prevent women from reaching top positions. As a result, once a gender gap


appears, it will increase over time. Also note that if women do not anticipate the penalty they suffer when playing against men, they will perform below their expectations. Women are thus more likely than men to be disappointed by their performance, leading to a greater likelihood of dropping out. In line with this, we find that women who faced a greater gender disadvantage at the beginning of our sample period are also more likely to have dropped out three years later. Combining women’s higher propensity to drop out with gender differences that increase over time, a small initial gender performance difference may yield considerable vertical segregation. The remainder of the article is organized as follows. Section 2 briefly describes chess competitions and our data, and Section 3 presents the descriptive statistics. Section 4 then describes our estimation strategy and presents the benchmark regression estimates. We then check the sensitivity of our results regarding the Elo rating system (Section 5), experience (Section 6), and the career-family trade-off (Section 7). In Section 8 we ask whether the gender-competition effect is cultural. In Section 9 we first link dropouts to the competition effect, and then set out a simple theoretical model explaining how the competition effect can produce a large gender gap. Last, Section 10 discusses the importance of the present findings for the labor market.

2 Context and data

Chess is played all around the world, and we here focus on internationally-rated players who participate in official chess tournaments registered at the World Chess Federation (FIDE). FIDE governs international chess competitions in 181 member federations and publishes, at regular intervals, the ratings of all international chess players. A typical tournament consists of nine games, spread out over a week. As games last for about four hours on average, playing in a tournament is time consuming. As a result, most players only play in a few tournaments per year and are highly motivated.

Rating and updates. A player’s rating depends on the ratings of their opponents and their results against them. FIDE applies a simple numerical system to update the rating of a player $j$, which compares the player’s expected score in a game, $E_j$, to their actual score $S_j$ (0 for a loss, 0.5 for a draw and 1 for a win).9 The expected score, $E_j$, is calculated from the rating difference between player $j$ and their opponent, $\Delta_j$, using the following “winning-expectation” formula (a logistic curve):

$$E_j = \frac{1}{1 + 10^{-\Delta_j/400}} \qquad (1)$$

From (1), player j’s rating after each game is then updated as

$$\mathrm{Elo}_t = \mathrm{Elo}_{t-1} + K_j\,(S_j - E_j) \qquad (2)$$

where $\mathrm{Elo}_t$ and $\mathrm{Elo}_{t-1}$ are the player’s new and previous ratings. The K-factor determines how much a single game affects the player’s rating. It is a critical element in maintaining an accurate rating and is player-specific: FIDE gives newcomers higher K values so that their rating adjusts more quickly to their current level.10 An example helps to clarify how ratings are updated. Consider a game in which player $j$ is rated 20 points higher than their opponent ($\Delta_j = 20$), so that, from the winning-expectation formula (1), their expected score is about $E_j = 0.529$. If the game is drawn, $j$’s Elo rating changes according to equation (2) by $K_j(0.5 - 0.529)$, i.e. $j$ loses about $0.029\,K_j$ points. The Elo update can be carried out after each game or tournament, or after any suitable rating period. Below, we exploit an exogenous variation in the frequency of Elo updating in our empirical analysis.
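Equations (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration of the two formulas as written above, not FIDE’s official implementation; the function names, and the pre-game rating of 2000 and K-factor of 20 used in the usage lines, are illustrative assumptions.

```python
def expected_score(delta: float) -> float:
    """Winning expectation E_j of equation (1); delta is j's rating minus the opponent's."""
    return 1.0 / (1.0 + 10.0 ** (-delta / 400.0))


def update_rating(elo_prev: float, k: float, score: float, delta: float) -> float:
    """Equation (2): Elo_t = Elo_{t-1} + K_j * (S_j - E_j)."""
    return elo_prev + k * (score - expected_score(delta))


# Worked example from the text: j is rated 20 points higher and the game is drawn.
print(round(expected_score(20), 3))  # 0.529
# Illustrative (assumed) values: pre-game rating 2000 and K_j = 20.
print(round(update_rating(2000.0, 20.0, 0.5, 20.0), 2))  # 1999.42, about 0.58 points lost
```

Because $E_j$ for the two players always sums to one, a draw between unequally rated players transfers points from the higher-rated to the lower-rated player.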

Data. Our data set, kindly provided by the chess statistician Jeff Sonas, covers all games registered by the World Chess Federation played between February 2008 and April 2013. Apart from the Elo ratings, FIDE provides each player’s name, year of birth, nationality and gender, as well as the result of each game (win, loss or draw). We here use data from 3,272,577 rated games from 154 different countries (see Table A1 in the Appendix for a complete list of countries). In what follows, we refer to the players as player 1 and player 2. These roles are randomly assigned.

9 Note that an unrated player receives an Elo rating after playing a minimum of five games against rated opponents. Games against unrated opponents are not rated.
10 According to FIDE rules, K = 40 for a player who is new to the rating list or before their 18th birthday (as long as their rating remains under 2300). Afterwards, K = 20 once the player has played at least 30 games or reached the age of 18. Finally, K = 10 once a player’s published rating has reached 2400, and remains at this level even if their rating subsequently drops back below 2400.
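The K-factor rule of footnote 10 can be written as a small function. The sketch below encodes one reading of that footnote, not the official regulation text: it assumes that “new to the rating list” means fewer than 30 rated games, and that the 2300 ceiling applies to the under-18 clause. The function name and its arguments are hypothetical.

```python
def k_factor(games_played: int, age: int, rating: float, ever_reached_2400: bool) -> int:
    """Hypothetical K-factor helper following one reading of footnote 10."""
    # K = 10 once the published rating has reached 2400, even if it later drops back.
    if ever_reached_2400:
        return 10
    # K = 40 for newcomers (assumed: fewer than 30 games) and for under-18s rated below 2300.
    if games_played < 30 or (age < 18 and rating < 2300):
        return 40
    # Otherwise K = 20.
    return 20


print(k_factor(games_played=12, age=35, rating=1800, ever_reached_2400=False))  # 40
print(k_factor(games_played=200, age=16, rating=2350, ever_reached_2400=False))  # 20
```

Checking the 2400 condition first reflects the footnote’s statement that K = 10 is kept even if the rating later drops back below 2400.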

3 Descriptive statistics

The descriptive statistics show significant raw gender differences. Women remain underrepresented among chess players overall, participating in about 12% of the games recorded in our database. Chess is a man’s world, as revealed by three stylized facts. First, the gender distribution of ratings reveals a substantial gap. Women are on average rated about 123 points lower, as depicted in Figure 1: women’s mean Elo rating is 1921 (with a standard deviation of 272) versus 2044 (261) for men. This difference is significant at every conventional level. Its size, with a Cohen’s d of .46, shows that gender differences in chess competition are substantial, very much in line with other social organizations such as firms.11 This difference cannot be readily attributed to gender differences in some psychological or cognitive trait, as we are not comparing random samples of men and women. Our research design to look for psychological gender differences should therefore control at least for ratings.

Figure 1: Density of ratings by gender

[Kernel density estimate of Elo rating (x-axis, 1000 to 3000) for male and female players; y-axis: density. Kernel = Epanechnikov, bandwidth = 22.2546.]
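The Cohen’s d reported above can be checked from the published means and standard deviations. The sketch below assumes an unweighted pooled standard deviation (the square root of the average of the two variances); which pooling convention was actually used is an assumption on our part.

```python
import math


def cohens_d(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> float:
    """Standardized mean difference with an unweighted pooled SD (assumed convention)."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    return (mean_a - mean_b) / pooled_sd


# Men: mean 2044, SD 261; women: mean 1921, SD 272 (figures from the text).
d = cohens_d(2044, 261, 1921, 272)
print(round(d, 2))  # 0.46
```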

Second, the 12% of women in our sample are not uniformly distributed across the ranking. There are very few women at the top of the hierarchy: there is only one woman in the top 100 and 30 in the top 1000 (as of May 2015). Although low, these numbers do indicate that it is possible for women to beat even the best men.12 This is in sharp contrast to other sports, e.g., track and field, where no woman has ever performed anywhere close to the best men. Third, as can be seen in Figure 2, female players are on average much younger than male players: the average age is 22.4 for women (with a standard deviation of 12.6) as against 36.3 (18) for men. This age difference can be explained by two factors. First, a significant number of women drop out before age 30; second, some men enter official competitions for the first time as adults, while very few women do so. This considerable age difference will be controlled for in our empirical analysis.

11 See Niederle (2016) for a discussion of the use of Cohen’s d in the gender literature.

Figure 2: Density of birth year by gender

[Two panels, Male and Female: density of birth year (x-axis, 1920 to 2000); y-axis: density. Graphs by gender.]

4 Empirical analysis

We here want to test whether the probabilities of a win, draw, or loss vary by gender. However, since the women in our data set are on average younger and have lower ratings, we require an identification strategy that controls for these differences. Our strategy amounts to comparing the scores of thousands of games in which a woman plays against either a man or an otherwise identical woman. In other words, the female player’s opponent is a man or a woman of the same age and ability (as proxied by the Elo rating). Conditional on age and ability, does the opponent’s gender affect the outcome of the game? To answer this question, we first use a parametric estimator (an ordered logit model) and then a non-parametric matching approach.

12 As an example, Judit Polgar, the first and, to date, only woman to break into the top 10, defeated ten World Chess Champions, including Garry Kasparov and Anatoly Karpov.

4.1 Parametric estimations

The score in a chess game can take three values, depending on the outcome i: loss (i = 1), draw (i = 2), and win (i = 3). As these outcomes are ordered, the ordered logit model is a natural choice for the analysis.13 Estimating this model produces both coefficients on the regressors and the cutoff points that separate adjacent values of the score. These cutoff points divide the density function of the standard logistic distribution into three parts, whose areas give the predicted probabilities of a loss, a draw and a win when all of the explanatory variables (here age and rating) are set to zero. As the values of the regressors change, the location of the density function shifts relative to the fixed cutoff points. Interpreting a given estimated coefficient therefore requires knowledge of both the size of the estimate and the values of the cutoff points. For a game between players 1 and 2, the probability of observing outcome i (= 1, 2, 3) is the probability that the estimated linear function, plus the logistically-distributed error ε, lies between the cutoff points estimated for that outcome:

$$\Pr(\text{outcome}_{12} = i) = \Pr\left(c_{i-1} < x_1\beta + x_2\gamma + \varepsilon_{12} \le c_i\right) \qquad (3)$$

where $\text{outcome}_{12}$ is the score of the game between players 1 and 2, and $x_1$ and $x_2$ are the vectors of player-1 and player-2 regressors, respectively. The error term $\varepsilon_{12}$ is assumed to be logistically distributed. The model estimates the coefficients $\beta$ and $\gamma$ together with the cutoff points $c_1$ and $c_2$, where $c_0$ is taken as $-\infty$ and $c_3$ as $+\infty$. We use a heteroskedastic ordered-logit model, which allows the variance of the unobservables to vary by gender.14 One reason why we may expect gender differences in the variance of unobservables is that women may self-select into some women-only tournaments.

13 Estimations using ordered probit and OLS are available upon request. Our results are robust to these alternative estimators, as the relations described appear fairly linear in practice.
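Equation (3) maps a linear index into three outcome probabilities. The sketch below makes that mapping explicit for a given index $x_1\beta + x_2\gamma$ and cutoff points $(c_1, c_2)$. The optional `scale` argument is our own stand-in for the heteroskedastic variant, in which the logistic error is rescaled by a group-specific standard deviation; how that scale is parameterized in the actual estimation is not specified here. The sketch only evaluates probabilities: fitting β, γ and the cutoffs would require a maximum-likelihood routine (for instance, statsmodels’ `OrderedModel` with `distr="logit"`), which is not reproduced.

```python
import math


def logistic_cdf(z: float) -> float:
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-z))


def ordered_logit_probs(index: float, c1: float, c2: float, scale: float = 1.0) -> dict:
    """Pr(loss), Pr(draw), Pr(win) from equation (3), with c0 = -inf and c3 = +inf.

    `index` is x1*beta + x2*gamma; `scale` rescales the logistic error
    (1.0 gives the homoskedastic model).
    """
    f1 = logistic_cdf((c1 - index) / scale)  # Pr(index + eps <= c1)
    f2 = logistic_cdf((c2 - index) / scale)  # Pr(index + eps <= c2)
    return {"loss": f1, "draw": f2 - f1, "win": 1.0 - f2}


# Cutoffs from column 2 of Table 1; the index values are purely illustrative.
for idx in (0.0, 0.109):
    p = ordered_logit_probs(idx, c1=-0.600, c2=0.868)
    print(idx, {k: round(v, 3) for k, v in p.items()})
```

Raising the index by 0.109 (the Female 2 coefficient in column 2) moves probability mass from a loss towards a win while the cutoffs stay fixed, which is why coefficients are only interpretable together with the cutoff points.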

Benchmark results. Table 1 shows the ordered-logit results. The dependent variable is the score of player 1 (win, draw, loss). Female 1 and Female 2 are dummies indicating the gender of players 1 and 2. The first column does not control for differences in age and ability. The estimates show that when a man plays against a woman (rather than another man), he gains a considerable advantage: the score of player 1 is lower when player 1 is a woman and higher when player 2 is a woman. To compare the results from different specifications of the ordered logit regressions, we follow Buser et al. (2014) and standardize the coefficients on the female dummies: we divide each coefficient by the difference between the estimated ordered-logit thresholds of the highest and the lowest scores. As shown at the foot of column 1, playing against a woman spans 33% of the gap between a loss and a victory (.372/(.571 + .570) = .326). The results in column 2 indicate that part of this gender difference is explained by differences in Elo rating (ability) and age (women have, on average, lower ratings and are younger). However, a substantial gender difference remains after controlling for ability and age: it spans 7.4% of the gap between a loss and a victory. In other words, almost one quarter of the raw gender difference cannot be accounted for by ability and age (.074/.326 = .23). The other estimates are as expected: higher ratings increase the score, while older players do worse. We also find a substantial first-mover advantage, well documented in chess, as shown by the coefficient on “Player 1 has white”. Column 3 shows the estimates from the heteroskedastic model, which allows the variance of unobservables to vary by gender. This is our preferred model for the following regressions. The estimates in column 3 are slightly larger, except for the age coefficients, which are unchanged.

14 See Neumark (2012) for a simple presentation of the heteroskedastic (probit) model in the case of race discrimination.
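The standardization borrowed from Buser et al. (2014) is simple arithmetic on the reported estimates. The sketch below reproduces the figures quoted in the text for columns 1 and 2; the helper name is ours. It uses the rounded values printed in Table 1, so the last digit can differ slightly from values computed on unrounded estimates (as happens for column 3).

```python
def standardized_effect(coef: float, cut1: float, cut2: float) -> float:
    """Coefficient expressed as a share of the gap between the two ordered-logit cutoffs."""
    return coef / (cut2 - cut1)


col1 = standardized_effect(0.372, cut1=-0.571, cut2=0.570)  # Female 2, column 1
col2 = standardized_effect(0.109, cut1=-0.600, cut2=0.868)  # Female 2, column 2
print(round(col1, 3), round(col2, 3))  # 0.326 0.074
# Share of the raw gender difference not explained by ability and age.
print(round(0.074 / 0.326, 2))  # 0.23
```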

Table 1: The determinants of outcomes in chess competitions
Ordered logit (Win > Draw > Loss)

                          (1)               (2)               (3)
Female 1                  -0.368a (0.004)   -0.109a (0.004)   -0.120a (0.005)
Female 2                   0.372a (0.004)    0.109a (0.004)    0.121a (0.005)
Player 1’s rating                            0.561a (0.001)    0.567a (0.001)
Player 2’s rating                           -0.562a (0.001)   -0.568a (0.001)
Player 1’s age                              -0.015a (0.000)   -0.015a (0.000)
Player 2’s age                               0.015a (0.000)    0.015a (0.000)
Player 1 has white                           0.316a (0.002)    0.319a (0.002)
Cut 1 (C1)                -0.571a           -0.600a           -0.606a
Cut 2 (C2)                 0.570a            0.868a            0.878a
Female 1/(C2 − C1)        -0.322a           -0.074a           -0.081a
Female 2/(C2 − C1)         0.326a            0.074a            0.081a

Based on the estimates in column 3: average marginal effects of a man playing vs. a
                          Man      Woman
Pr(score1 = 1)            .322a    .358a
Pr(score1 = .5)           .355a    .335a
Pr(score1 = 0)            .323a    .307a

Notes: There are 3,272,577 observations in all regressions. The dependent variable is the score of player 1 (win = 1, draw = .5, loss = 0). The coefficients are from ordered-logit regressions. In column 3, the estimation is carried out using a heteroskedastic model, which allows the variance of the unobservables to vary by gender. Female 1 and Female 2 are dummies for players 1 and 2 being female. Player 1 may have the white or black pieces. Robust standard errors are in parentheses, with a denoting significance at the 1% level. The p-values for Female/(C2 − C1) and the marginal effects are calculated using the delta method.


The effect of competition on women’s performance thus remains statistically and economically significant. How large is it? From the estimates in column 3, women, ceteris paribus, suffer a systematic disadvantage when confronted with men. For comparison, consider a man playing against a man of the same age and rating: the estimated probabilities at the foot of Table 1 are 32.3% for a win, 35.5% for a draw and 32.2% for a loss. When the man instead plays against an otherwise identical woman, these figures shift to 35.8%, 33.5%, and 30.7%, respectively: on average, men win more and lose less against a woman than against a comparable man. The expected score of a man playing against a woman (instead of a comparable man) is 2.5% higher.15 Converted into Elo points, the gender difference is about 20 Elo points.16 To illustrate the impact of this 20-point difference,17 consider the following situation. Two chess players, one man and one woman, of exactly the same age and initial Elo (i.e. ability) repeatedly compete for promotion (e.g. the world champion title). Given the gender-competition effect, the winning expectation of the man is 0.529 (see equation (1)), meaning that he will win the contest 52.9% of the time and will be promoted in a contest-winning system. Moreover, a simple simulation based on equations (1) and (2) shows that the man will be better ranked 60% of the time, so a system that promotes the best-ranked player will select the man 60% of the time. A modest 20-point difference can thus turn into a sizable effect.
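The simulation behind the 60% figure is not spelled out in the text. The sketch below is one plausible version under explicit assumptions of our own: the man and the woman have identical underlying ability, the man’s true expected score against her is that of a player 20 points stronger (about 0.529), games have no draws, both players use K = 20, and both start at 2000. After each game it records whether the man’s rating is strictly above the woman’s. With these settings the share should come out above one half, but the sketch is not guaranteed to reproduce the exact figure reported here.

```python
import random


def expected_score(delta: float) -> float:
    """Winning expectation of equation (1)."""
    return 1.0 / (1.0 + 10.0 ** (-delta / 400.0))


def share_man_ranked_higher(n_games: int = 200_000, gap: float = 20.0,
                            k: float = 20.0, seed: int = 1) -> float:
    """Fraction of games after which the man's Elo is strictly above the woman's.

    Assumptions (ours): equal underlying ability, a constant gender gap worth `gap`
    Elo points, no draws, the same K for both players, equal starting ratings.
    """
    rng = random.Random(seed)
    man = woman = 2000.0
    p_true = expected_score(gap)  # man's true expected score, constant over time
    ahead = 0
    for _ in range(n_games):
        score_man = 1.0 if rng.random() < p_true else 0.0  # win or loss only
        e_man = expected_score(man - woman)                 # what the ratings predict
        man += k * (score_man - e_man)                      # equation (2) for the man
        woman += k * ((1.0 - score_man) - (1.0 - e_man))    # equation (2) for the woman
        if man > woman:
            ahead += 1
    return ahead / n_games


print(share_man_ranked_higher())
```

In this setup the rating gap drifts towards 20 points but keeps fluctuating around it, so the man is ahead more often than not without being ahead all the time.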

4.2 Non-parametric estimations

The above ordered-logit regressions make assumptions about the functional form linking our variable of interest, i.e. game outcomes, to the observable covariates. We therefore also estimate the size and significance of the gender effect using a less parametric approach based on matching estimators.

15 This difference in the expected score is calculated as (35.8 + 33.5 × 0.5) − (32.3 + 35.5 × 0.5) = 2.5.
16 This difference is obtained by comparing the coefficient on rating in the regression in column 3 (with the Elo rating expressed in hundreds of points), namely 0.567, to that on the gender difference, namely 0.120. A fall in the woman’s rating of about 20 points (100 × 0.120/0.567 ≈ 21) offsets the gender difference.
17 To fix ideas, a 20-point difference separates the world numbers 2 and 9 in the FIDE list of October 2017.
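The arithmetic behind footnotes 15 and 16 can be written out directly. The sketch uses the predicted probabilities quoted in the text and the column-3 coefficients of Table 1; the function and variable names are ours.

```python
def expected_score_pct(win_pct: float, draw_pct: float) -> float:
    """Expected score (in %) counting a win as 1 and a draw as 0.5."""
    return win_pct + 0.5 * draw_pct


vs_woman = expected_score_pct(35.8, 33.5)  # man playing an otherwise identical woman
vs_man = expected_score_pct(32.3, 35.5)    # man playing a comparable man
print(round(vs_woman - vs_man, 2))  # 2.5

# Footnote 16: rating enters in hundreds of Elo points, so the gender coefficient
# (0.120) is worth 100 * 0.120 / 0.567 Elo points.
print(round(100 * 0.120 / 0.567, 1))  # 21.2
```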


The basic principle of matching here is to find, for each game played by a man against a woman, a “twin” or counterfactual game played between two men (i.e., a game involving identical players in terms of Elo ratings and age). Formally, consider a game g between a man and an opponent who can be a man or a woman. Denote the game status by a dummy variable with two possible values {M, F}, where M indicates a male-male pairing and F a male-female pairing. Ideally, for each male-female game g with observed score $S_{Fg}$, we want to establish the counterfactual score, $S_{Mg}$, had the man played against another man with the same age and Elo rating as the female opponent. There is a gender effect if the average difference $S_{Fg} - S_{Mg}$ across games is statistically different from zero. Using the terminology of difference-in-differences estimations (Imbens, 2004), we consider gender as our treatment variable and the difference $S_{Fg} - S_{Mg}$ as our treatment effect. A simple comparison of average scores in F and M games would estimate this effect without bias if male and female opponents were randomly drawn from populations with the same distribution of covariates. The game status, M or F, would then be independent of the covariates $X_g$, such as age and player ratings. However, as noted in the stylized facts section, there are some significant differences. For instance, women are 14 years younger than men on average and are less numerous. The sets of male-male and male-female games are thus not balanced, which may produce a biased estimate of the average treatment effect. Matching techniques are a way of overcoming this selection bias. The principle here is to create two balanced groups by finding a counterfactual game for each male-female (F) game in the large set of M games, so that the distribution of covariates is the same in the treatment and matched control groups. There are a number of ways of creating these two samples, depending, for instance, on the matching technique and the number of allowed matches for each observation.
We here use two techniques:18 the Nearest-Neighbor Matching (NNM) estimator as our main non-parametric technique, and Propensity Score Matching (PSM) as a robustness check. The NNM looks for the closest game using the Euclidean or Mahalanobis distance in the covariate space, i.e. the age and rating of both opponents, and who has the white pieces.19 The PSM calculates a propensity score of belonging to the set of M games, using a logit regression, and then matches games via their propensity scores. The effect of rebalancing covariates can be assessed in quantile-quantile (Q-Q) plots comparing the empirical distributions of each covariate. These Q-Q plots appear in Figure 3 for the matched and unmatched samples. In these samples, player 1 is always male, and player 2 can be female or male. The unmatched sample contains 2,942,407 games, of which 167,833 are male-female pairings, while the matched sample is restricted to these 167,833 treated games plus 167,833 twin control games. In Figure 3, we compare the matched and unmatched samples according to two key variables: the age and the rating of player 2, who can be either male or female. When the empirical distributions are the same in the treatment and control groups, the points in the Q-Q plots all lie on the 45-degree line. As expected, this is not what we find in the unmatched sample, due to the age and rating differences (see panels a and c). In particular, the age distribution is not the same for men and women in the whole sample. However, matching the treatment and control groups removes these differences (panels b and d). A crucial feature of our dataset is that there is considerable overlap between the two sets of games: although men are older and better ranked on average, there are still enough observations to produce high-quality matches. Indeed, no observation had to be excluded. Table 2 reports the results of our matching estimations based on Nearest-Neighbor Matching with Euclidean (col. 1) and Mahalanobis (col. 2) distances and on Propensity Score Matching (col. 3). The gender effect is significant at all conventional levels (Z > 12). The effect size is remarkably similar to that in the ordered-logit regressions: the expected score of a man playing against a woman (instead of a comparable man) is 2.4% higher on average, compared to the 2.5% figure from the estimates in column 3 of Table 1. The propensity-score estimate reported in column 3 of Table 2 is slightly larger. In sum, the parametric and non-parametric estimations yield a consistent message: there is a robust, highly significant gender effect that is similar in size across specifications.

18 Additional variations are available upon request, but none produced any significantly different results.
19 We follow Abadie and Imbens (2006, 2011), and correct for the large-sample bias arising when matching on more than one continuous covariate.

Figure 3: Checking for Balance: Age and Rating
[Q-Q plots. (a) Age, unmatched sample; (b) Age, matched sample; (c) Rating, unmatched sample; (d) Rating, matched sample.]
Note: In the matched sample (panels b and d), a twin control male-male game is found for each treated game in the NNM estimator via the Mahalanobis distance on the covariates X (age, rating and a white pieces dummy).
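Nearest-neighbor matching on a Mahalanobis distance, followed by a quantile-based balance check in the spirit of the Q-Q plots, can be sketched as follows. Everything here is illustrative and is not the estimation code behind Table 2: the data are synthetic (drawn with the descriptive moments reported in Section 3 and a made-up score model with a built-in effect of 0.025), only three covariates are used, each treated game is matched to a single control with replacement, and the Abadie-Imbens bias correction used in the paper is omitted.

```python
import numpy as np


def nnm_att(x_treat, y_treat, x_ctrl, y_ctrl):
    """One-to-one nearest-neighbor matching (with replacement) on Mahalanobis distance.

    Returns the average treatment effect on the treated and the matched control indices.
    """
    # Inverse covariance of the pooled covariates defines the Mahalanobis metric.
    vinv = np.linalg.inv(np.cov(np.vstack([x_treat, x_ctrl]), rowvar=False))
    matched = np.empty(len(x_treat), dtype=int)
    for i, xi in enumerate(x_treat):
        diff = x_ctrl - xi
        dist2 = np.einsum("ij,jk,ik->i", diff, vinv, diff)  # squared distances to all controls
        matched[i] = int(np.argmin(dist2))
    return float(np.mean(y_treat - y_ctrl[matched])), matched


rng = np.random.default_rng(0)


def make_games(n, age_mu, age_sd, rating_mu, rating_sd):
    """Synthetic covariates: opponent age, opponent rating, player-1-has-white dummy."""
    age = np.clip(rng.normal(age_mu, age_sd, n), 5, 90)
    rating = rng.normal(rating_mu, rating_sd, n)
    white = rng.integers(0, 2, n)
    return np.column_stack([age, rating, white]).astype(float)


def score_model(x):
    """Made-up score model: stronger opponents lower the score, having white raises it."""
    return 0.5 - 0.0002 * (x[:, 1] - 2000) + 0.05 * x[:, 2] + rng.normal(0, 0.05, len(x))


x_ctrl = make_games(5000, 36.3, 18.0, 2044, 261)  # male opponents (control games)
x_treat = make_games(300, 22.4, 12.6, 1921, 272)  # female opponents (treated games)
y_ctrl = score_model(x_ctrl)
y_treat = score_model(x_treat) + 0.025            # built-in effect of facing a woman

att, matched = nnm_att(x_treat, y_treat, x_ctrl, y_ctrl)
print(round(att, 3))

# Balance check: compare opponent-rating quantiles before and after matching.
qs = [0.1, 0.25, 0.5, 0.75, 0.9]
q_treat = np.quantile(x_treat[:, 1], qs)
gap_before = np.abs(q_treat - np.quantile(x_ctrl[:, 1], qs)).max()
gap_after = np.abs(q_treat - np.quantile(x_ctrl[matched, 1], qs)).max()
print(round(gap_before, 1), round(gap_after, 1))
```

Using the inverse covariance matrix makes the distance scale-free, so a one-year age difference and a one-point rating difference are weighted by how much each covariate varies; the binary white-pieces dummy is then almost always matched exactly.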

Table 2: Matching estimates of the effect of female on outcomes

                          (1)               (2)               (3)
Matching                  NNM               NNM               PSM
Distance                  Euclidean         Mahalanobis
Score (diff-in-diff)      0.024a (0.001)    0.024a (0.001)    0.030a (0.001)

Notes: The dependent variable is the score of player 1 (win, draw or loss). Standard errors appear in parentheses, with a denoting significance at the 1% level. The table shows the estimates of the average treatment effect in the treated group, which is the difference between the outcomes of player 1 when playing a woman and when playing a man. The matching estimates control for both players’ ages and Elo ratings, as well as player 1 having the white pieces. There are 167,833 male-female games matched with male-male games from a sample of 2,774,574 potential matches. NNM stands for Nearest-Neighbor Matching and PSM for Propensity Score Matching.

5

Sensitivity checks on the Elo rating

The validity of our identification strategy depends on the accuracy of the Elo rating. Any imprecision in the latter will add noise to our analysis. A more serious concern, however, arises if Elo ratings are gender-biased. Most players participate in mixed open tournaments in which they cannot choose their opponents, so that female-male pairings are random. However, women are able to self-select into specific women-only tournaments, which may bias their Elo ratings. We address this concern in three ways. We first control for women’s history via the proportion of games they played against other women (Section 5.1). We then identify a subset of countries in which self-selection into women-only tournaments is almost impossible or very restricted, ensuring that male-female matching is random. These countries are compared to others in which there is obvious self-selection (Section 5.2). Last, we exploit exogenous variation in the frequency of Elo rating updates, which substantially increases rating accuracy (Section 5.3).

5.1

Gender differences in rating acquisition

We are concerned about a potential gender difference in rating acquisition. In particular, women may self-select into tournaments where female players are more numerous, or even into women-only tournaments. This selection could be far from marginal. If matching were gender-blind, and given that women represent approximately 10% of the chess population, we should find only about 1% of games in which both players are women (0.1 × 0.1); in practice this figure is closer to 5%, suggesting considerable self-selection. We analyze the potential impact of this self-selection by defining two groups of female players. In the first group, women play 50% or more of their games against other women, while in the second group they play less than 50% of their games against women. We create a dummy variable for each group and interact these with the female dummies. The results appear in Table 3.

Table 3: Females in mixed tournaments. Ordered logit (Win > Draw > Loss)

| | Coefficient |
|---|---|
| Female 1 playing mostly against women (F1) | -0.067a (0.006) |
| Female 1 playing mostly against men (F2) | -0.126a (0.005) |
| Female 2 playing mostly against women (F3) | 0.068a (0.006) |
| Female 2 playing mostly against men (F4) | 0.127a (0.005) |
| Player 1’s rating | 0.561a (0.001) |
| Player 2’s rating | -0.561a (0.001) |
| Player 1’s age | -0.015a (0.000) |
| Player 2’s age | 0.015a (0.000) |
| Player 1 has white | 0.316a (0.002) |
| Cut 1 (C1) | -0.576a |
| Cut 2 (C2) | 0.892a |
| F1/(C2 - C1) | -0.045a |
| F2/(C2 - C1) | -0.086a |
| F3/(C2 - C1) | 0.046a |
| F4/(C2 - C1) | 0.086a |
| Observations | 3,272,577 |

Notes: The dependent variable is the score of player 1 (0, .5, 1). The coefficients come from heteroskedastic ordered-logit regressions, in which the variance of unobservables varies by gender and segregation. F1 to F4 are dummies for the player being a woman who plays mostly against women (50% or more of her games) or mostly against men (less than 50%). Player 1 may have the white or black pieces. Robust standard errors are in parentheses, with a denoting significance at the 1% level. The p-values for F/(C2 − C1) are calculated using the Delta method.

The size of the gender gap for women who play mostly against men is about 8.6% (calculated as the coefficient on F2 divided by the distance between the two cut points). This gap is close to our benchmark value of 7.4%. The corresponding figure for women who play mostly against other women is actually lower, at about 4.5%. If tournament segregation distorts ratings at all, it thus seems to leave women who play mostly in female events under-rated rather than over-rated. Our observed gender difference therefore cannot be attributed to women being over-rated due to tournament segregation.
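As a worked check using the rounded Table 3 point estimates, the standardized effect divides a female coefficient by the width of the draw interval between the two cut points (the small discrepancy with the reported −0.045 for F1 presumably reflects rounding of the published estimates):

$$\frac{F2}{C2 - C1} = \frac{-0.126}{0.892 - (-0.576)} = \frac{-0.126}{1.468} \approx -0.086, \qquad \frac{F1}{C2 - C1} = \frac{-0.067}{1.468} \approx -0.046.$$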

5.2

Gender as a treatment variable

As noted above, women may choose to participate in tournaments with a larger proportion of female players, leading to a higher-than-normal proportion of female-female matches. However, this choice is not the same worldwide. In some countries, the proportion of female-female tournament matches corresponds to the prediction from random matching. In a mixed open tournament, players do not choose their opponents: pairings are generated at random. If women do not or cannot self-select into “women-only events”, the proportion of female-female matches will then be purely random. In these countries, gender can be considered as a treatment variable: playing against a woman is just a random event that affects all players in the same way. For each country in our dataset, we first calculate the expected proportion of mixed-gender games under purely random matching. We then calculate the difference between the expected and observed proportions and rank countries accordingly. Roughly one third of observations come from countries in which no difference is found (according to a χ2 test), suggesting random gender matching. In the second third there are only small differences (the χ2 statistic is significant at the 10% level, but only for some countries), while these differences are significant in the last third. We thus split our sample into three groups, corresponding to the three columns in Table 4. Column 1 refers to countries in which there is no difference between the expected and observed proportions, and where gender can be considered as a treatment variable. Column 2 corresponds to an intermediate situation, while in column 3 we find the subset of countries in which self-selection is more pronounced. The coefficients on our variables of interest, Female 1 and Female 2, fall slightly in absolute value from column 1 to column 3. However, once they are normalized by the distance between the cut points C1 and C2, as at the foot of Table 4, the effects are very similar across subsamples. Self-selection into women-only tournaments therefore does not seem to bias our estimates: the gender difference in performance is found in each subsample.

Table 4: Gender as a treatment variable. Ordered logit (Win > Draw > Loss)

| | (1) | (2) | (3) |
|---|---|---|---|
| Female 1 | -0.134a (0.007) | -0.126a (0.008) | -0.107a (0.009) |
| Female 2 | 0.133a (0.007) | 0.113a (0.008) | 0.117a (0.009) |
| Player 1’s rating | 0.562a (0.001) | 0.570a (0.001) | 0.573a (0.001) |
| Player 2’s rating | -0.563a (0.001) | -0.569a (0.001) | -0.573a (0.001) |
| Player 1’s age | -0.015a (0.000) | -0.016a (0.000) | -0.015a (0.000) |
| Player 2’s age | 0.015a (0.000) | 0.016a (0.000) | 0.015a (0.000) |
| Player 1 has White | 0.362a (0.004) | 0.335a (0.004) | 0.264a (0.004) |
| Cut 1 (C1) | -0.604a | -0.610a | -0.535a |
| Cut 2 (C2) | 0.872a | 0.969a | 0.816a |
| Female 1/(C2 - C1) | -0.082a | -0.071a | -0.073a |
| Female 2/(C2 - C1) | 0.081a | 0.064a | 0.080a |
| Observations | 1,145,821 | 1,056,869 | 1,069,391 |

Notes: The dependent variable is the score of player 1 (0, .5, 1). The coefficients are from heteroskedastic ordered-logit regressions, in which the variance of unobservables varies with gender and segregation. Column 1 refers to countries in which there is no difference between the expected and observed proportions of mixed-gender games. Column 2 corresponds to an intermediate situation, while column 3 focuses on the subset of countries in which self-selection is more pronounced. Robust standard errors are in parentheses, with a denoting significance at the 1% level. The p-values for Female/(C2 − C1) are calculated using the Delta method.
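The country-level comparison of expected and observed pairings can be sketched as a χ2 goodness-of-fit test. The paper does not spell out its exact procedure, so this is an assumption: under gender-blind matching, each of the two seats in a game is filled independently by a woman with probability p, estimated from the data, which gives expected shares p², 2p(1 − p) and (1 − p)² for female-female, mixed and male-male games (with p = 0.1, this is the 1% female-female benchmark of Section 5.1). The function name `random_matching_test` and the country counts are hypothetical.

```python
import math


def random_matching_test(n_ff, n_fm, n_mm):
    """Compare observed pairings with those expected under gender-blind matching.

    n_ff, n_fm, n_mm: counts of female-female, mixed and male-male games.
    Returns (expected mixed share, observed mixed share, chi2 statistic, p-value).
    """
    n = n_ff + n_fm + n_mm
    # Female share of player seats (each game has two seats)
    p = (2 * n_ff + n_fm) / (2 * n)
    if not 0 < p < 1:
        raise ValueError("need both female and male players")
    expected = (n * p * p, 2 * n * p * (1 - p), n * (1 - p) ** 2)
    observed = (n_ff, n_fm, n_mm)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # 3 cells - 1 - 1 estimated parameter (p) = 1 degree of freedom;
    # the chi-square(1) survival function equals erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return expected[1] / n, n_fm / n, stat, p_value


# Hypothetical country A: pairings consistent with random matching (p = 0.10)
print(random_matching_test(100, 1800, 8100))
# Hypothetical country B: same female share, but 5% female-female games
print(random_matching_test(500, 1000, 8500))
```

Because p is estimated from the same games, one degree of freedom is lost, which is why the closed-form χ2(1) tail via `math.erfc` suffices and no statistics library is needed.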

5.3

Imperfections in Elo ratings

Elo ratings may be inaccurate. At a given point in time, some players may be under- or over-rated relative to their “true” strength. The frequency of updates matters especially for fast-improving or fast-deteriorating players. Consider an example with ratings published every six months. Suppose that a young player has a published rating of 2300 on 1 January 2017 and that her true strength has risen to 2350 by the end of February. Because she is improving fast, she remains under-rated by 50 points until the next list is published on 1 July 2017. In this case, more frequent updates would reduce the inaccuracy of the Elo rating. Our concern is that this inaccuracy may be gender-specific. Assume that women on average devote less effort to chess than men. Men would then improve faster and would be under-rated relative to women. Our competition effect would thus be an artifact of not controlling for male ability. If so, the size of the gender difference should vary with the frequency of updates: infrequent updates should produce a larger gender difference, which should shrink as updates become more frequent. To rule out this possibility, we exploit naturally-occurring variation in the frequency of Elo rating lists. From 2000 to the first half of 2009, FIDE published four lists per year. From the second half of 2009, there were six lists per year, and in July 2012 FIDE started publishing monthly ratings. Our database covers all FIDE games played from February 2008 to April 2013, so the frequency of updates tripled over this period. As we know the date of each game, we can exploit the sizable changes in frequency in July 2009 (from 4 to 6 lists) and July 2012 (from 6 to 12). Table 5 shows the results. We first run a regression for each period (columns 1 to 3) and then a pooled regression in which the female dummies are interacted with the periods (column 4). We expect the Elo rating to become more accurate as the frequency of updates increases. However, this greater accuracy does not seem to be gender-specific: our gender competition gap continues to hold despite the exogenous variation in update frequency. The standardized effects at the foot of the table show that being a woman accounts for between 7.3% and 9.7% of the gap between a loss and a win. The regression on the whole period (column 4) allows us to test the equality of the interaction terms for each female dummy. The interaction estimates in the first period (-0.144 and 0.136) appear to be statistically different from those in the other periods. However, the separate regressions clearly show that the cut points vary from one period to another, so that the differences in the standardized effects are insignificant. Overall, our identified gender competition effect is robust to changes in the frequency of rating updates and to self-selection into women-only tournaments.
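To gauge what a 50-point under-rating means for game outcomes, the sketch below uses the common logistic form of the Elo expected score. This functional form is an assumption for illustration: it is not necessarily the paper's equation 2, and FIDE's rating regulations may use a slightly different conversion table, so the figures are approximate.

```python
def expected_score(rating_diff):
    """Logistic Elo expected score of a player rated `rating_diff` points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


# Two players with the same published rating: each is expected to score 0.5.
published = expected_score(0)
# If one of them is in fact 50 points stronger (the under-rated player in the example),
# her expected score against the other exceeds what the published ratings imply.
true_score = expected_score(50)
print(f"published: {published:.3f}, true: {true_score:.3f}, gap: {true_score - published:.3f}")
# prints: published: 0.500, true: 0.571, gap: 0.071
```

A systematic under-rating of this size among men would inflate their apparent overperformance against women, which is why the stability of the gender effect across update regimes is informative.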


Table 5: The size of the gender difference by the frequency of Elo updates. Ordered logit (Win > Draw > Loss)

| | (1) Period 1: 02.2008 to 06.2009 | (2) Period 2: 07.2009 to 06.2012 | (3) Period 3: 07.2012 to 04.2013 | (4) Whole period |
|---|---|---|---|---|
| Female 1 (F1) | -0.135a (0.010) | -0.116a (0.006) | -0.113a (0.010) | |
| Female 1 × Period 1 (F1.1) | | | | -0.144a (0.009) |
| Female 1 × Period 2 (F1.2) | | | | -0.115a (0.006) |
| Female 1 × Period 3 (F1.3) | | | | -0.108a (0.011) |
| Female 2 (F2) | 0.125a (0.010) | 0.103a (0.006) | 0.129a (0.010) | |
| Female 2 × Period 1 (F2.1) | | | | 0.136a (0.009) |
| Female 2 × Period 2 (F2.2) | | | | 0.115a (0.006) |
| Female 2 × Period 3 (F2.3) | | | | 0.122a (0.010) |
| Player 1’s rating | 0.587a (0.002) | 0.567a (0.001) | 0.547a (0.002) | 0.567a (0.001) |
| Player 2’s rating | -0.589a (0.002) | -0.567a (0.001) | -0.548a (0.001) | -0.568a (0.001) |
| Player 1’s age | -0.014a (0.001) | -0.015a (0.001) | -0.016a (0.001) | -0.015a (0.001) |
| Player 2’s age | 0.014a (0.001) | 0.015a (0.001) | 0.016a (0.001) | 0.015a (0.001) |
| Player 1 has white | 0.333a (0.005) | 0.319a (0.003) | 0.301a (0.005) | 0.319a (0.002) |
| Cut 1 (C1) | -0.650a | -0.599a | -0.576a | -0.607a |
| Cut 2 (C2) | 0.902a | 0.884a | 0.826a | 0.878a |
| F1/(C2 - C1) | -0.087a | -0.078a | -0.081a | |
| F1.1/(C2 - C1) | | | | -0.097a |
| F1.2/(C2 - C1) | | | | -0.077a |
| F1.3/(C2 - C1) | | | | -0.073a |
| F2/(C2 - C1) | 0.081a | 0.078a | 0.092a | |
| F2.1/(C2 - C1) | | | | 0.091a |
| F2.2/(C2 - C1) | | | | 0.077a |
| F2.3/(C2 - C1) | | | | 0.082a |
| Observations | 725,341 | 1,966,361 | 580,875 | 3,272,577 |

Notes: The dependent variable is the score of player 1 (0, .5, 1). The coefficients come from heteroskedastic ordered-logit regressions, in which the variance of unobservables varies by gender and period. F1 and F2 are dummies for the player being a woman. These dummies are interacted with the three periods in column 4, which includes period dummies. The frequency of rating updates differs across the three periods. Player 1 may have the white or black pieces. Robust standard errors are in parentheses, with a denoting significance at the 1% level. The p-values for F/(C2 − C1) are calculated using the Delta method.


6

Does the competition effect fall with experience?

The gender differences observed in the laboratory are generally found when subjects perform very specific tasks that are rather unusual for them, e.g. solving mazes or adding numbers. We may legitimately wonder whether experienced subjects, who repeatedly carry out tasks with which they are familiar, also exhibit gender competition effects. To answer this question, we focus on experienced players, defined as those who have played over 100 official games (the average number of games per player per year is 38) and have reached an Elo rating of 2100 (placing them in the top 24.5% of rated players).20 We then ask whether experienced players also exhibit gender differences in performance. The results in Table 6 come from the heteroskedastic ordered-logit model, where the variance of the errors varies by gender and experience. Column 1 reproduces the benchmark estimation from column 3 of Table 1. Column 2 shows the results with the experience variables. The female dummies are interacted with a dummy for both players being experienced, so that the gender gap can depend on experience. The interaction terms are both statistically and economically significant: inexperienced women face a larger gender difference than do experienced women. However, experienced women still have a gap of about 5.1% of the distance between a win and a loss (shown at the foot of Table 6), as opposed to 8.1% for inexperienced women.

20 These thresholds play no particular role: they were chosen so as to produce enough observations. The results with alternative thresholds are available upon request. Note also that, as experience acquired before 2008 is not observed, we do not claim to capture all experienced players. However, every player in our group is certainly experienced.

Table 6: The effect of experience. Ordered logit (Win > Draw > Loss)

| | (1) | (2) |
|---|---|---|
| Both players experienced (BothExp) | | -0.003 (0.003) |
| Female 1 | -0.120a (0.005) | -0.114a (0.005) |
| Female 1 × BothExp | | 0.041a (0.011) |
| Female 2 | 0.121a (0.005) | 0.117a (0.005) |
| Female 2 × BothExp | | -0.053a (0.011) |
| Player 1’s rating | 0.567a (0.001) | 0.540a (0.001) |
| Player 2’s rating | -0.568a (0.001) | -0.541a (0.001) |
| Player 1’s age | -0.015a (0.000) | -0.014a (0.000) |
| Player 2’s age | 0.015a (0.000) | 0.014a (0.000) |
| Player 1 has white | 0.319a (0.002) | 0.305a (0.002) |
| Cut 1 (C1) | -0.606a | -0.565a |
| Cut 2 (C2) | 0.878a | 0.841a |
| Female 1/(C2 - C1) | -0.081a | -0.081a |
| Female 2/(C2 - C1) | 0.081a | 0.083a |
| (Female 1 + Female 1 × BothExp)/(C2 - C1) | -0.048a | -0.051a |
| (Female 2 + Female 2 × BothExp)/(C2 - C1) | 0.058a | 0.045a |
| Observations | 3,272,577 | 3,272,577 |

Notes: The dependent variable is the score of player 1 (loss, draw or win). The coefficients are from heteroskedastic ordered-logit regressions. In column 2, the variance of unobservables varies with gender and experience. Female 1 and Female 2 are dummies for the players being female. Column 2 interacts the female regressors with a dummy for both players being experienced. A player is considered “experienced” if she/he has played at least 100 games between 2008 and 2013 and has obtained an Elo rating of at least 2100. Player 1 may have the white or black pieces. Robust standard errors are in parentheses, with a denoting significance at the 1% level. The p-values for Female/(C2 − C1) are calculated using the Delta method.
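The standardized effects at the foot of these tables divide a coefficient, or a sum of coefficients such as Female 1 + Female 1 × BothExp, by C2 − C1, with inference from the Delta method. The sketch below shows that calculation under stated assumptions: the helper name `standardized_effect` is hypothetical, and because the tables report neither covariances nor cut-point standard errors, the covariance matrix in the example is hypothetical (diagonal, with an assumed 0.002 standard error for each cut point). With real estimates, the full covariance matrix of the fitted model would be used.

```python
import numpy as np


def standardized_effect(params, cov, weights, i_c1, i_c2):
    """Delta-method estimate and standard error of (weights @ params) / (C2 - C1).

    params: point estimates (coefficients and cut points); cov: their covariance matrix;
    weights: linear combination of parameters in the numerator;
    i_c1, i_c2: positions of the cut points C1 and C2 in params.
    """
    b = np.asarray(params, dtype=float)
    v = np.asarray(cov, dtype=float)
    w = np.asarray(weights, dtype=float)
    num = w @ b
    den = b[i_c2] - b[i_c1]
    # Gradient of num / den with respect to every parameter
    grad = w / den
    grad[i_c2] -= num / den**2
    grad[i_c1] += num / den**2
    return float(num / den), float(np.sqrt(grad @ v @ grad))


# Table 6, column 2 point estimates: Female 1, Female 1 x BothExp, C1, C2
params = [-0.114, 0.041, -0.565, 0.841]
# Hypothetical covariance: reported SEs on the diagonal for the coefficients,
# assumed SEs of 0.002 for the cut points, and zero covariances
cov = np.diag([0.005**2, 0.011**2, 0.002**2, 0.002**2])
ratio, se = standardized_effect(params, cov, [1, 1, 0, 0], i_c1=2, i_c2=3)
print(round(ratio, 3), round(se, 4))
```

With rounded inputs the ratio comes out at about −0.052, close to the −0.051 reported for experienced women; the standard error depends entirely on the assumed covariance matrix and is shown only to illustrate the mechanics.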


7

Checking for the career-family trade-off: from young players to retirees

In the United States, women between the ages of 21 and 55 spend roughly twice as much time on child care as do men (Guryan et al., 2008). Similar figures are found in many other countries. There are at least two reasons to consider these gender asymmetries in the career-family trade-off. The first is that, because of domestic responsibilities, women may be less focused than men on their task, i.e. playing chess, during a game. Women may need to check, for example, whether they have received urgent messages on their phone regarding their children. In this case, the gender performance difference that we find would reflect the unequal sharing of domestic tasks across genders, rather than psychological effects. The second is that, when the burden of domestic tasks increases, women may devote less time and energy to chess than men do. Women are then more likely to experience a relative fall in their playing strength. If Elo ratings are slow to adjust, the resulting small gender difference in performance could be wrongly attributed to a psychological effect. A similar point was raised in Section 5. We propose a simple robustness check: we split our sample by age to look at gender performance differences at ages where the career-family trade-off is presumably less prevalent, i.e. under 16 or 21 and over 55 or 64. The results appear in Table 7, for four age thresholds: under 21 (column 1), under 16 (column 2), over 55 (column 3) and over 64 (column 4). In all of these age categories, in which the career-family trade-off is presumably less relevant, we find a gender performance difference. Girls under the age of 16 or 21 competing against boys of the same age face a competition effect, and this effect does not disappear among players over 55 or 64.

Table 7: The size of the gender difference by age group. Ordered logit (Win > Draw > Loss)

| | (1) Below 21 | (2) Below 16 | (3) Above 55 | (4) Above 64 |
|---|---|---|---|---|
| Female 1 | -0.163a (0.008) | -0.180a (0.013) | -0.216a (0.027) | -0.200a (0.038) |
| Female 2 | 0.156a (0.008) | 0.160a (0.013) | 0.188a (0.027) | 0.160a (0.039) |
| Player 1’s rating | 0.522a (0.002) | 0.499a (0.003) | 0.602a (0.003) | 0.618a (0.005) |
| Player 2’s rating | -0.520a (0.002) | -0.499a (0.003) | -0.604a (0.003) | -0.618a (0.005) |
| Player 1’s age | -0.019a (0.001) | 0.038a (0.003) | -0.015a (0.001) | -0.015a (0.001) |
| Player 2’s age | 0.018a (0.001) | -0.041a (0.003) | 0.015a (0.001) | 0.016a (0.002) |
| Player 1 has White | 0.303a (0.005) | 0.287a (0.009) | 0.285a (0.009) | 0.286a (0.014) |
| Cut 1 (C1) | -0.508a | -0.542a | -0.646a | -0.516a |
| Cut 2 (C2) | 0.812a | 0.709a | 0.870a | 1.062a |
| Female 1/(C2 - C1) | -0.124a | -0.143a | -0.142a | -0.127a |
| Female 2/(C2 - C1) | 0.118a | 0.128a | 0.124a | 0.101a |
| Observations | 527,493 | 195,523 | 179,331 | 75,173 |

Notes: The dependent variable is the score of player 1 (0, .5, 1). The coefficients come from heteroskedastic ordered-logit regressions, in which the variance of the unobservables varies by gender and age. Female 1 and Female 2 are dummies for the players being female. Each column refers to a different part of the age distribution. Robust standard errors are in parentheses, with a denoting significance at the 1% level. The p-values for Female/(C2 − C1) are calculated using the Delta method.

8

Is the competition effect cultural?

Does culture matter for our identified gender-competition effect? “Culture” is certainly hard to define, but may be understood as a body of shared knowledge, understanding, and practice (Fernández, 2010). We make the simplifying but convenient assumption that players of the same nationality share a common culture, whatever its definition. First, we test whether more women-friendly countries succeed in eliminating the gap, assuming that the relevant definition of culture can be captured by country indicators. Second, in order to avoid relying on a particular country index, we simply add country fixed effects to our regression. Last, we group countries into what could be considered culturally-homogeneous areas. These three approaches are all intended to detect cultural effects.

8.1

Do women-friendly countries eliminate the competition effect?

Our dataset allows us to compare the size of the gender-competition effect across many countries. Gender gaps in various outcomes, such as wages, labor-force participation and educational attainment, differ significantly across countries. For instance, the World Economic Forum constructs the Gender Gap Index (GGI), which ranks countries by their gender gaps (see Hausmann et al., 2013). The GGI measures gender-based gaps in access to resources and opportunities. For example, the four highest-ranked countries (Iceland, Finland, Sweden and Norway) have closed over 80% of their gender gaps, while the figure for the lowest-ranked countries is only a little over 50%. To assess the robustness of our gender effect, we focus on the countries with the highest GGI values. Our assumption is that these countries may have been successful in avoiding most of the stereotypes that are detrimental to female performance. We thus concentrate on games played between players from the Top-10 or Top-20 countries in the GGI ranking (listed in Table A1 in the Appendix). Columns 1 and 2 of Table 8 show the results for the Top-10 and Top-20 samples respectively. Overall, the estimates are very similar to those in the full sample (see Table 1, column 3). Surprisingly, the gender differences in competition are almost the same in the most women-friendly countries as in the rest of the world: we move from a figure of 7.4% (whole sample) to 6.9% for countries in the Top 20 (column 2) and 6.2% for those in the Top 10 (column 1).

Table 8: Cultural effects. Ordered logit (Win > Draw > Loss)

| | (1) TOP10 GGI | (2) TOP20 GGI | (3) Country Fixed Effects |
|---|---|---|---|
| Female 1 | -0.124a (0.035) | -0.118a (0.014) | -0.132a (0.005) |
| Female 2 | 0.131a (0.035) | 0.134a (0.013) | 0.134a (0.005) |
| Player 1’s rating | 0.570a (0.004) | 0.589a (0.002) | 0.565a (0.001) |
| Player 2’s rating | -0.568a (0.004) | -0.590a (0.002) | -0.566a (0.001) |
| Player 1’s age | -0.015a (0.001) | -0.016a (0.001) | -0.015a (0.001) |
| Player 2’s age | 0.015a (0.001) | 0.016a (0.001) | 0.015a (0.000) |
| Player 1 has white | 0.258a (0.011) | 0.315a (0.005) | 0.319a (0.002) |
| Cut 1 (C1) | -0.617a | -0.692a | -0.564a |
| Cut 2 (C2) | 0.960a | 1.005a | 0.922a |
| Female 1/(C2 - C1) | -0.078a | -0.069a | -0.089a |
| Female 2/(C2 - C1) | 0.083a | 0.078a | 0.080a |
| Observations | 116,619 | 515,541 | 3,272,577 |
| Player 1’s Country Fixed Effects | no | no | yes |
| Player 2’s Country Fixed Effects | no | no | yes |

Notes: The dependent variable is the score of player 1 (0, .5, 1). The coefficients come from heteroskedastic ordered-logit regressions, in which the variance of the unobservables varies by gender. Female 1 and Female 2 are dummies for the players being female. Player 1 may have the white or black pieces. Column 3 adds country fixed effects (FE). Robust standard errors are in parentheses, with a denoting significance at the 1% level. The p-values for Female/(C2 − C1) are calculated using the Delta method. GGI stands for the Gender Gap Index. Countries in the Top 10 and Top 20 GGI are listed in Table A1. In columns 1 and 2, both players are nationals of countries in the Top-10 and Top-20 samples respectively.

8.2

Does nationality count?

It may be argued that countries differ along a number of dimensions that are not captured by the Gender Gap Index. For instance, the gender gap in math (measured using standardized tests) also differs greatly across countries, but with a world ranking that differs from that for the GGI: Iran has one of the lowest GGI values in the world, but no gender math gap (see Fryer and Levitt, 2010, for detailed evidence and a discussion). Rather than relying on a country-specific index of gender differences, we introduce country fixed effects to capture unobservable time-invariant country characteristics such as culture.21 The results in column 3 of Table 8 confirm the gender difference found in the first two columns and in the benchmark estimations of Table 1. The estimated gender difference in competition is thus not driven by player nationality.

8.3

Is the competition effect region-specific?

We consider here eleven regions that can be thought of as culturally homogeneous. Countries that do not appear in one of the eleven categories fall into a catch-all group (the regions and countries are listed in Table A1). Our purpose here is not to classify as many countries as possible, but rather to create culturally-homogeneous regions, with the additional requirement that these contain sufficient observations. For example, countries with a very large number of players, like Russia, form a region of their own. The female dummy is then interacted with each region dummy to evaluate the gender performance difference in each area. Figure 4 depicts the standardized effects of the interaction terms with the Female 1 dummy.22 We find a substantial gender difference in each region, comparable to the benchmark effect of −0.074 in column 3 of Table 1 and represented by the vertical line in Figure 4. Regional heterogeneity is small, and there is no clear pattern explaining the differences (what, for example, lies behind the difference between Eastern and Southern Asia?). In conclusion, under the assumptions that our regions are culturally homogeneous and that cultural effects can be proxied by player nationality, we do not find any notable cultural effects.

21 Note that the incidental-parameter problem is not a concern here, as the number of countries is fixed and does not grow with the number of observations. The 154 countries are listed in Table A1.

22 For a given region, the standardized effect is calculated as the estimate on Female 1 plus the estimate on its interaction with the region, divided by (C2 − C1). The estimates appear in Table A2.

Figure 4: Comparison across culturally-homogeneous regions

[Plot of standardized Female 1 effects with confidence intervals, horizontal axis from −0.2 to 0, for: Eastern Asia, Southern Asia, South America, Southern Europe, Russia, Post-Soviet Europe, Post-Soviet Asia, Central Europe, Belgium-France, Scandinavia, Northern America.]

Notes: This figure shows the standardized effects of the interaction terms between the region of player 1 and Female 1, calculated as (Female 1 + Region 1 × Female 1)/(C2 − C1), where C1 and C2 are the cut points of the heteroskedastic ordered-logit model. The vertical line is drawn at −0.081, the value reported at the bottom of Table 1. The confidence intervals are calculated using the Delta method. The regions are defined in the Appendix.

9

The competition effect and the overall gender gap

Men have an advantage when competing against women. In addition, this competition effect may accumulate and produce a more profound impact on the overall gender gap. We investigate two possible long-run effects. First, women may be discouraged and stop participating in official competitions. Second, the competition effect may lead women to reduce their effort.

9.1

Dropouts versus stayers

Does our gender-competition effect have any significant impact on the probability of quitting official competition? In the absence of any measure of outside options, we cannot establish a causal link between the competition effect and dropping out of chess. However, we can obtain valuable insights by comparing two groups of women: those who were active throughout our sample period and those who left. Table 9 provides this comparison. We compare “stayers”, women who were active in 2009 and 2012 (column 1), to “dropouts”, women who were active in 2009 but who had dropped out by 2012 (column 2).

Table 9: Female Dropouts versus Female Stayers. Ordered logit (Win > Draw > Loss)

| Female players: | (1) Stayers | (2) Dropouts |
|---|---|---|
| Female 1 | -0.098a (0.013) | -0.287a (0.030) |
| Female 2 | 0.073a (0.013) | 0.266a (0.030) |
| Player 1’s rating | 0.579a (0.002) | 0.578a (0.002) |
| Player 2’s rating | -0.580a (0.002) | -0.579a (0.002) |
| Player 1’s age | -0.015a (0.000) | -0.015a (0.000) |
| Player 2’s age | 0.015a (0.000) | 0.015a (0.000) |
| Player 1 has white | 0.334a (0.005) | 0.328a (0.006) |
| Cut 1 (C1) | -0.596a | -0.614a |
| Cut 2 (C2) | 0.943a | 0.930a |
| Female 1/(C2 - C1) | -0.064a | -0.186a |
| Female 2/(C2 - C1) | 0.047a | 0.173a |
| Observations | 527,326 | 476,761 |

Notes: The dependent variable is the score of player 1 (loss, draw or win). The coefficients come from heteroskedastic ordered-logit regressions. Columns (1) and (2) show results by dropout status. A woman is considered to “drop out” if she is present in 2009 but not in 2012, and to be a “stayer” if she is present in both 2009 and 2012. Female 1 and Female 2 are dummies for the players being female. Player 1 may have the white or black pieces. All regressions control for month fixed effects. Robust standard errors are in parentheses, with a denoting significance at the 1% level. The p-values for Female/(C2 − C1) are calculated using the Delta method.

As shown in Table 9, the estimates on rating, age and white pieces are very similar between the two groups, as are the cut points. However, the gender difference is much larger for dropouts (0.186 versus 0.064 in standardized form). In other words, the women who dropped out had faced a much greater gender disadvantage at the beginning of our sample period. This correlation cannot, of course, be interpreted as causal. However, the difference is significant, and it is not easy to find reasonable alternative explanations. For instance, why should women who have better outside options also be more sensitive to gender effects? We therefore believe that the competition effect is likely to reduce the pool of women and, as such, the probability that more women will make it to the top.

9.2

A simple model of gender differences in chess competition

We consider a simple multi-period model to show that the gender competition effect may have a considerable impact on the overall gender gap. At the beginning of each period $t$, player $i$ has a rating $\mathrm{Elo}_{i,t-1}$ and decides how much effort $e_{i,t}$ to allocate to chess (e.g., hours spent studying). Effort is converted into Elo points based on a variant of equation 2 and an effort function $f(e)$:

$$\mathrm{Elo}_{i,t} = \mathrm{Elo}_{i,t-1} + f(e_{i,t}). \qquad (4)$$

A realistic effort function should have the following properties: $f(0) < 0$, i.e., exerting no effort leads to a fall in the Elo rating, while $f' > 0$ and $f'' < 0$. As women may experience a gender competition effect, the share of games played against male opponents during period $t$, $\mathrm{ShareMen}_{i,t}$, enters Eq. (4) as

$$\mathrm{Elo}_{i,t} = \mathrm{Elo}_{i,t-1} + f(e_{i,t}) - C_i \cdot \mathrm{ShareMen}_{i,t}. \qquad (5)$$

Players can thus differ in two ways: (1) the share of men they play against, and (2) the size of the competition effect $C_i$. Based on our estimates, we assume that $C_i = 0$ if player $i$ is a man and $C_i > 0$ if $i$ is a woman.

To choose their optimal effort level, players maximize utility in each period:

$$U(e_{i,t}) = g(\mathrm{Elo}_{i,t}) - h(e_{i,t}), \qquad (6)$$

where $g$ denotes the rewards associated with a given rating, and $h$ the disutility of effort. It is natural to assume that $g' > 0$ and $g'' > 0$, so that rewards rise with the rating, and increasingly so. For instance, the higher the rating, the higher the prizes and the lower the tournament entry fees. Moreover, titles in chess are awarded after reaching a certain required rating.23 We also assume that the cost of effort has the following properties: $h(0) = 0$, $h' > 0$ and $h'' < 0$.

Assumptions. For simplicity, our model hinges on two main assumptions:

1. Women are not aware of the competition effect, and act as if it does not exist. Our motivation here is that the effect is small and difficult to detect for a single player who only tracks her own results.
2. A technical assumption is required to ensure that optimal effort is positive and finite: $f$, $g$ and $h$ are chosen so that $U'(0) > 0$ and $U'' < 0$ for all positive values of effort and rating ($\mathrm{Elo}_{i,t}$), with $U'$ becoming negative for large effort.

A first obvious result is that a man and a woman with the same Elo rating who exert the same effort will end the period with different Elo ratings. This difference reflects the gender difference in competition. From this effect, we can further analyze the evolution of the rating difference between men and women. We establish the following proposition:

Proposition. Consider two players $i$ and $j$ such that $\mathrm{Elo}_{i,t-1} < \mathrm{Elo}_{j,t-1}$, with $C_i \geq 0$ and $C_j = 0$. Then $e^*_{i,t} < e^*_{j,t}$ and $\mathrm{Elo}_{j,t-1} - \mathrm{Elo}_{i,t-1} < \mathrm{Elo}_{j,t} - \mathrm{Elo}_{i,t}$.

Proof. By combining equations 5 and 6, we have for player $i$

$$U'(e_{it}) = g'\big(\text{Elo}_{i,t-1} + f(e_{it}) - C_i \cdot \text{ShareMen}_{it}\big) \cdot f'(e_{it}) - h'(e_{it}).$$

23 The required rating is 2500 Elo for the international grandmaster title and 2400 for the international master title.


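Consider player $i$'s optimal effort, $e^*_{i,t}$. Under Assumption 2, it is characterized by

$$U'(e^*_{i,t}) = g'\big(\text{Elo}_{i,t-1} + f(e^*_{i,t}) - C_i \cdot \text{ShareMen}_{it}\big) \times f'(e^*_{i,t}) - h'(e^*_{i,t}) = 0.$$

Now suppose that player $j$ exerts the same effort. Since $\text{Elo}_{i,t-1} < \text{Elo}_{j,t-1}$, $C_i \geq 0$ and $g'$ is increasing ($g'' > 0$), we have $g'(\text{Elo}_{j,t-1} + f(e^*_{i,t})) > g'(\text{Elo}_{i,t-1} + f(e^*_{i,t}) - C_i \cdot \text{ShareMen}_{it})$, and therefore

$$g'\big(\text{Elo}_{j,t-1} + f(e^*_{i,t})\big) \times f'(e^*_{i,t}) - h'(e^*_{i,t}) > 0.$$

The marginal utility of player $j$ at this level of effort is positive; since $U'' < 0$, his optimal level of effort, $e^*_{j,t}$, is larger than $e^*_{i,t}$. Using Eq. (5),

$$\text{Elo}_{jt} - \text{Elo}_{it} = \left(\text{Elo}_{j,t-1} - \text{Elo}_{i,t-1}\right) + \left[f(e^*_{j,t}) - f(e^*_{i,t})\right] + C_i \cdot \text{ShareMen}_{it},$$

where the bracket is positive because $f' > 0$ and the last term is non-negative. This amplifies the Elo difference: $\text{Elo}_{j,t-1} - \text{Elo}_{i,t-1} < \text{Elo}_{jt} - \text{Elo}_{it}$.

This proposition establishes that once an Elo difference between a man and a woman exists, it will become larger over time. Furthermore, the competition effect will accumulate, in the sense that each new game against a male opponent will widen the gender gap.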
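The proposition can be checked numerically. The sketch below assumes hypothetical functional forms that satisfy the sign restrictions stated above: $f(e) = a\ln(1+e) - b$, $g(E) = e^{kE}$ and $h(e) = c(\sqrt{1+e} - 1)$. With $k \cdot a < 1/2$ they make $U$ single-peaked, which delivers the positive, finite optimum that Assumption 2 is meant to guarantee, even though $U''$ need not be negative far beyond the optimum. Optimal effort solves $U'(e^*) = 0$ by bisection, and ratings then evolve according to Eq. (5). The names (`Params`, `optimal_effort`, `next_period`), the parameter values and the starting ratings are illustrative assumptions, not estimates from the paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Params:
    """Hypothetical functional forms (not estimated in the paper).

    f(e) = a*ln(1+e) - b      rating change: f(0) = -b < 0, f' > 0, f'' < 0
    g(E) = exp(k*E)           reward from rating: g' > 0, g'' > 0
    h(e) = c*(sqrt(1+e) - 1)  effort cost: h(0) = 0, h' > 0, h'' < 0
    k*a < 1/2 makes U single-peaked, so the interior optimum is unique.
    """
    k: float = 0.002
    a: float = 5.0
    b: float = 10.0
    c: float = 0.1


def f(e: float, p: Params) -> float:
    return p.a * math.log1p(e) - p.b


def f_prime(e: float, p: Params) -> float:
    return p.a / (1.0 + e)


def g_prime(elo: float, p: Params) -> float:
    return p.k * math.exp(p.k * elo)


def h_prime(e: float, p: Params) -> float:
    return p.c / (2.0 * math.sqrt(1.0 + e))


def marginal_utility(e: float, elo_prev: float, penalty: float, p: Params) -> float:
    """U'(e) = g'(Elo_{t-1} + f(e) - penalty) * f'(e) - h'(e), penalty = C_i * ShareMen_it."""
    return g_prime(elo_prev + f(e, p) - penalty, p) * f_prime(e, p) - h_prime(e, p)


def optimal_effort(elo_prev: float, penalty: float, p: Params, tol: float = 1e-12) -> float:
    """Solve the first-order condition U'(e*) = 0 by bisection."""
    if marginal_utility(0.0, elo_prev, penalty, p) <= 0.0:
        return 0.0  # corner solution: no effort
    lo, hi = 0.0, 1.0
    while marginal_utility(hi, elo_prev, penalty, p) > 0.0:
        lo, hi = hi, 2.0 * hi  # expand the bracket until U' turns negative
    while hi - lo > tol * (1.0 + hi):
        mid = 0.5 * (lo + hi)
        if marginal_utility(mid, elo_prev, penalty, p) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def next_period(elo_prev: float, c_i: float, share_men: float, p: Params,
                aware: bool = False) -> tuple:
    """One step of Eq. (5): Elo_it = Elo_{i,t-1} + f(e*) - C_i * ShareMen_it.

    aware=False follows Assumption 1 (the penalty is ignored when choosing effort);
    aware=True uses the first-order condition exactly as written in the proof.
    """
    penalty = c_i * share_men
    e_star = optimal_effort(elo_prev, penalty if aware else 0.0, p)
    return elo_prev + f(e_star, p) - penalty, e_star


if __name__ == "__main__":
    p = Params()
    man, woman = 2000.0, 1950.0  # hypothetical starting ratings with Elo_j > Elo_i
    for t in range(1, 6):
        man, e_man = next_period(man, c_i=0.0, share_men=0.5, p=p)
        woman, e_woman = next_period(woman, c_i=20.0, share_men=0.5, p=p)
        print(f"t={t}  e*_man={e_man:.1f}  e*_woman={e_woman:.1f}  gap={man - woman:.1f}")
```

With either value of `aware`, the woman's optimal effort is lower and the gap widens every period: her lower starting rating already reduces her marginal reward when $g'' > 0$, and the penalty $C_i \cdot \text{ShareMen}_{it}$ is subtracted on top of it.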

10

Discussion and conclusion

Reaching the top requires women to compete in a man’s world. This is precisely the situation in which experimental evidence has predicted that women’s relative performance will be worse. However, the existing evidence is based on small and particular subject pools (e.g., young children or college students) observed over short periods of time. Our work instead provides cross-country evidence based on thousands of individuals of all ages observed over a number of years. Controlling for the traditional explanations of gender gaps, we find that women competing against men perform worse. The effect is found in all countries, at both younger and older ages, and persists even after a number of years of competition.

What exactly triggers psychological differences in performance is still an open question. For instance, Paserman (2007) analyzes tennis data and finds that women are more likely than men to commit unforced errors at critical points in the game. However, Paserman (2007) only looks at behavioral responses to competitive pressure in single-sex environments. Closer to our purpose, some papers address this question by looking at the sequence of chess moves. Gerdes and Gränsmark (2010) find that male players choose more aggressive strategies when playing against women, even though these strategies reduce their winning probability. Dreber et al. (2013) show that male players choose riskier strategies when playing against attractive women, without improving their performance. Backus et al. (2016) show that the gender of the opponent does not affect the quality of play of male players, while it does for women: they play worse against men. Antonovics et al. (2009) find the opposite pattern when examining data from the television game show The Weakest Link: a man’s probability of answering a question correctly in the final round increases from 48.7% when facing a man to 55.5% when facing a woman, whereas no such gender effect exists for female contestants. It is thus possible, as in Gneezy and Rustichini (2004), that part of the effect is due to men being boosted by competition.

Our results also suggest that the women who are the most sensitive to the gender competition effect stop competing and drop out. This finding should be interpreted with care. We speculate that this observed drop-out could reflect that the competition effect is ‘small enough’ to pass under the radar in the short run, while these small effects accumulate over time and end up in drop-out. The women who suffer the most from the gender competition effect will thus drop out relatively more.

What kinds of policies can be proposed to mitigate this effect? Our gender competition effect on performance appears to be persistent: it is found at all ages, in every country, and does not disappear with substantial experience or market segregation. All of the evidence points to the competition effect being universal: we have found no particular group in which this effect does not hold. It is hence difficult to think of an easy way of reducing the gender gap in the context of one-on-one competition.
However, experimental results suggest that organizations which promote competition among teams, rather than individuals, may well be able to avoid the gender effects observed in chess (see for instance Flory et al., 2015).

References

Abadie, A. and G. W. Imbens (2006): “Large sample properties of matching estimators for average treatment effects,” Econometrica, 74, 235–267.
——— (2011): “Bias-corrected matching estimators for average treatment effects,” Journal of Business & Economic Statistics, 29, 1–11.


Altonji, J. G. and R. M. Blank (1999): “Race and gender in the labor market,” Handbook of Labor Economics, 3, 3143–3259.
Antonovics, K., P. Arcidiacono, and R. Walsh (2009): “The effects of gender interactions in the lab and in the field,” Review of Economics and Statistics, 91, 152–162.
Azmat, G. and B. Petrongolo (2014): “Gender and the labor market: What have we learned from field and lab experiments?” Labour Economics, 30, 32–40.
Backus, P., M. Cubel, M. Guid, S. Sanchez-Pages, and E. Lopez Manas (2016): “Gender, Competition and Performance: Evidence from Real Tournaments,” Tech. rep., IEB n. 2016/27.
Bertrand, M. (2011): “New Perspectives on Gender,” Handbook of Labor Economics, 4, 1543–1590.
Buser, T., M. Niederle, and H. Oosterbeek (2014): “Gender, Competitiveness and Career Choices,” Quarterly Journal of Economics, 129, 1409–1447.
Croson, R. and U. Gneezy (2009): “Gender differences in preferences,” Journal of Economic Literature, 47, 448–474.
Dargnies, M.-P. (2012): “Men too sometimes shy away from competition: The case of team competition,” Management Science, 58, 1982–2000.
Dreber, A., C. Gerdes, and P. Gränsmark (2013): “Beauty queens and battling knights: Risk taking and attractiveness in chess,” Journal of Economic Behavior & Organization, 90, 1–18.
Dreber, A., E. von Essen, and E. Ranehill (2014): “Gender and competition in adolescence: task matters,” Experimental Economics, 17, 154–172.
European Commission (2016): “She Figures 2015: Gender in Research and Innovation,” Tech. rep., Luxembourg: Publications Office of the European Union.
Fernández, R. (2010): “Does culture matter?” Tech. Rep. 16277, National Bureau of Economic Research.
Flory, J. A., A. Leibbrandt, and J. A. List (2015): “Do Competitive Workplaces Deter Female Workers? A Large-Scale Natural Field Experiment on Job-Entry Decisions,” Review of Economic Studies, 82, 122–155.
Fryer, R. G. and S. Levitt (2010): “An Empirical Analysis of the Gender Gap in Mathematics,” American Economic Journal: Applied Economics, 2, 210–240.
Gerdes, C. and P. Gränsmark (2010): “Strategic behavior across gender: a comparison of female and male expert chess players,” Labour Economics, 17, 766–775.
Gneezy, U. and A. Rustichini (2004): “Gender and competition at a young age,” American Economic Review, 94, 377–381.
Guryan, J., E. Hurst, and M. Kearney (2008): “Parental Education and Parental Time with Children,” Journal of Economic Perspectives, 22, 23–46.
Hausmann, R., Y. Bekhouche, L. Tyson, and S. Zahidi (2013): “The Global Gender Gap Report,” World Economic Forum.


Imbens, G. W. (2004): “Nonparametric estimation of average treatment effects under exogeneity: A review,” Review of Economics and Statistics, 86, 4–29.
Iriberri, N. and P. Rey-Biel (2017): “Stereotypes are only a threat when beliefs are reinforced: On the sensitivity of gender differences in performance under competition to information provision,” Journal of Economic Behavior & Organization, 135, 99–111.
Lavy, V. (2013): “Gender Differences in Market Competitiveness in a Real Workplace: Evidence from Performance-based Pay Tournaments among Teachers,” Economic Journal, 123, 540–573.
NALP (2016): “Report on Diversity in U.S. Law Firms,” Tech. rep., National Association for Law Placement.
Neumark, D. (2012): “Detecting Discrimination in Audit and Correspondence Studies,” Journal of Human Resources, 47, 1128–1157.
Niederle, M. (2016): “Gender,” in Handbook of Experimental Economics, ed. by J. Kagel and A. E. Roth, Princeton University Press, 481–553.
Niederle, M. and L. Vesterlund (2007): “Do Women Shy away from Competition? Do Men Compete too Much?” Quarterly Journal of Economics, 122, 1067–1101.
——— (2011): “Gender and Competition,” Annual Review of Economics, 3, 601–630.
Pande, R. and D. Ford (2011): “Gender quotas and female leadership: A review,” Tech. rep., Washington DC: World Bank.
Paserman, M. D. (2007): “Gender differences in performance in competitive environments: evidence from professional tennis players,” Tech. rep., IZA n. 2834.
Reuben, E., P. Sapienza, and L. Zingales (2015): “Taste for competition and the gender gap among young business professionals,” Tech. Rep. 21695, National Bureau of Economic Research.
Walton, G. M., M. C. Murphy, and A. M. Ryan (2015): “Stereotype threat in organizations: implications for equity and performance,” Annual Review of Organizational Psychology and Organizational Behavior, 2, 523–550.


Table A1: List of countries and regions

| (1) Region | (2) Countries | (3) No. of countries | (4) % of obs. |
|---|---|---|---|
| Russia | Russia | 1 | 8.88 |
| Belgium-France | Belgium†, France, Monaco | 3 | 9.52 |
| Northern America | Bermuda, Canada†, United States | 3 | 1.79 |
| Central Europe | Austria†, Germany†, Netherlands†, Liechtenstein, Luxembourg†, Switzerland⋆ | 6 | 12.52 |
| Scandinavia | Denmark⋆, Finland⋆, Faroe Islands, Iceland⋆, Norway⋆, Sweden⋆ | 6 | 3.52 |
| Post-Soviet Asia | Armenia, Azerbaijan, Georgia, Kazakhstan, Kyrgyzstan, Tajikistan, Turkmenistan, Uzbekistan | 8 | 2.20 |
| Eastern Asia | China, Hong Kong, Macao, Mongolia, South Korea, Thailand, Taiwan, Vietnam | 8 | 0.83 |
| Post-Soviet Europe | Belarus, Bulgaria, Czech Republic, Estonia, Hungary, Latvia†, Lithuania, Moldavia, Poland, Romania, Slovakia, Ukraine | 12 | 20.17 |
| Southern Asia | Afghanistan, Bangladesh, Brunei, India, Iran, Malaysia, Maldives, Myanmar, Nepal, Pakistan, Singapore, Sri Lanka | 12 | 7.16 |
| South America | Argentina, Bolivia, Brazil, Chile, Colombia, Ecuador, Guyana, Paraguay, Peru, Surinam, Uruguay, Venezuela | 12 | 4.31 |
| Southern Europe | Albania, Andorra, Bosnia-Herzegovina, Croatia, Cyprus, Greece, Italy, Macedonia, Malta, Montenegro, Portugal, San Marino, Serbia, Slovenia, Spain | 15 | 22.16 |
| Rest of the World | Algeria, Angola, Aruba, Australia, Bahamas, Bahrain, Barbados, Botswana, Cameroon, Costa Rica, Cuba†, Dominican Republic, Egypt, Ethiopia, Fiji, United Kingdom†, Ghana, Guam, Guatemala, Haiti, Indonesia, Iraq, Ireland⋆, Israel, Jamaica, Japan, Jordan, Kenya, Kuwait, Lebanon, Libya, Morocco, Madagascar, Mexico, Mali, Mozambique, Mauritania, Mauritius, Malawi, Namibia, Nigeria, Nicaragua⋆, New Zealand⋆, Panama, Philippines⋆, Palau, Papua New Guinea, Palestine, Qatar, Rwanda, Sudan, Sierra Leone, El Salvador, Somalia, Sao Tome and Principe, Seychelles, South Africa†, Syrian Arab Republic, Trinidad and Tobago, Tunisia, Turkey, Uganda, British Virgin Islands, Virgin Islands U.S., Yemen, Zambia, Zimbabwe | 68 | 6.94 |
| Total | | 154 | 100 |

Notes: The columns show (1) the name of the region, (2) the list of countries included, (3) the number of countries per region and (4) the percentage of observations for which player 1 comes from the region. The star (⋆) indicates the 10 countries in the Top-10 Gender Gap Index, while the dagger (†) indicates the additional 10 countries in the Top-20 Gender Gap Index.


Table A2: Regional effects, ordered logit (Win > Draw > Loss)

| Variable | Coefficient |
|---|---|
| Player 1's rating | 0.566ᵃ (0.001) |
| Player 2's rating | -0.567ᵃ (0.001) |
| Player 1's age | -0.015ᵃ (0.001) |
| Player 2's age | 0.015ᵃ (0.001) |
| Player 1 has white | 0.319ᵃ (0.002) |
| Cut 1 (C1) | -0.613ᵃ |
| Cut 2 (C2) | 0.924ᵃ |
| Observations | 3,272,577 |
| Player 1's region fixed effects | yes |
| Player 2's region fixed effects | yes |

Regional estimates of:

| Region | (1) Female 1 | (2) Female 2 |
|---|---|---|
| Rest of the World (benchmark) | -0.192ᵃ (0.015) | 0.196ᵃ (0.016) |
| North America | -0.147ᵃ (0.037) | 0.106ᵃ (0.036) |
| Eastern Asia | -0.217ᵃ (0.031) | 0.157ᵃ (0.030) |
| Southern Asia | -0.084ᵃ (0.014) | 0.093ᵃ (0.014) |
| South America | -0.233ᵃ (0.020) | 0.223ᵃ (0.020) |
| Southern Europe | -0.114ᵃ (0.011) | 0.119ᵃ (0.011) |
| Russia | -0.116ᵃ (0.012) | 0.140ᵃ (0.012) |
| Post-Soviet Europe | -0.119ᵃ (0.009) | 0.117ᵃ (0.009) |
| Post-Soviet Asia | -0.093ᵃ (0.020) | 0.106ᵃ (0.020) |
| Central Europe | -0.088ᵃ (0.014) | 0.097ᵃ (0.014) |
| Belgium-France | -0.157ᵃ (0.016) | 0.155ᵃ (0.016) |
| Scandinavia | -0.123ᵃ (0.031) | 0.087ᵃ (0.030) |

Notes: The dependent variable is the score of Player 1 (0, 0.5, 1), who may have the white or black pieces. The coefficients come from heteroskedastic ordered-logit regressions, except the regional effects in columns (1) and (2), which are calculated as the estimate of ($\text{Female}_i + \text{Female}_i \times \text{Reg}_j$). $\text{Female}_i$ indicates whether player $i = 1, 2$ is female, and $\text{Reg}_j$ is region $j$; e.g., if $j$ = North America, the effect in column (1) is $-0.147 = -0.192 + 0.045$. All regressions control for regional fixed effects. Standard errors are in parentheses, with ᵃ denoting significance at the 1% level. The standard errors of the regional effects are calculated using the delta method.
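The notes above describe two computations that a short sketch can make concrete. The first function turns an ordered-logit index $x'\beta$ into loss, draw and win probabilities for player 1 using the usual cut-point parametrization $P(\text{score} \le k) = \Lambda(C_k - x'\beta)$ with the reported cut points C1 and C2. It ignores the heteroskedastic scale term, and the baseline index of zero is a hypothetical value, so the resulting probabilities are illustrative only. The second function reproduces a regional effect $\text{Female}_i + \text{Female}_i \times \text{Reg}_j$ and its standard error; because this is a linear combination of two coefficients, the delta method reduces to $\text{Var}(b_F) + \text{Var}(b_{F \times R}) + 2\,\text{Cov}$. Table A2 does not report the interaction's variance or the covariance, so those inputs are hypothetical placeholders, as are the function names.

```python
import math

# Cut points reported in Table A2 (outcomes ordered Loss < Draw < Win for player 1).
CUT1, CUT2 = -0.613, 0.924


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def outcome_probs(index: float, cut1: float = CUT1, cut2: float = CUT2) -> dict:
    """Loss/draw/win probabilities with P(score <= k) = logistic(cut_k - index).

    Assumption: the heteroskedastic scale term is ignored (scale fixed at 1).
    """
    p_loss = logistic(cut1 - index)
    p_loss_or_draw = logistic(cut2 - index)
    return {"loss": p_loss, "draw": p_loss_or_draw - p_loss, "win": 1.0 - p_loss_or_draw}


def expected_score(index: float) -> float:
    """Expected score on the 0 / 0.5 / 1 scale used as the dependent variable."""
    p = outcome_probs(index)
    return 0.5 * p["draw"] + p["win"]


def combined_effect(b_female: float, b_inter: float,
                    var_female: float, var_inter: float, cov: float) -> tuple:
    """Estimate and standard error of b_female + b_inter.

    For a linear combination the delta method gives Var = var_female + var_inter + 2*cov exactly.
    """
    var = var_female + var_inter + 2.0 * cov
    if var < 0.0:
        raise ValueError("inconsistent variance/covariance inputs")
    return b_female + b_inter, math.sqrt(var)


if __name__ == "__main__":
    base = 0.0  # hypothetical index x'b for a male player 1 in the benchmark region
    for label, idx in (("male", base), ("female", base - 0.192)):
        probs = {k: round(v, 3) for k, v in outcome_probs(idx).items()}
        print(label, probs, "expected score:", round(expected_score(idx), 3))

    # North America, column (1): -0.192 + 0.045 = -0.147, as in the notes.
    # HYPOTHETICAL: the interaction's variance and the covariance are not reported in Table A2.
    est, se = combined_effect(-0.192, 0.045, 0.015 ** 2, 0.040 ** 2, -0.0002)
    print(f"regional effect {est:.3f}, delta-method SE {se:.3f} (illustrative inputs)")
```

Shifting the index by the benchmark Female 1 coefficient (-0.192) lowers player 1's win probability and expected score, which is how the negative entries in column (1) should be read; the size of the shift in probability terms depends on the baseline index, which this sketch does not estimate.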

