The Effects of a Teacher Performance-Pay Program on Student Achievement: A Regression Discontinuity Approach

Yusuke Jinnai∗

Abstract

This paper presents evidence from a regression-discontinuity analysis of North Carolina's teacher performance-pay program, in which teachers are awarded school-wide cash bonuses for improving their students' achievement. Results show that schools that failed to reach an expected benchmark for their students' achievement, and therefore received no bonuses, performed significantly better in the subsequent year than those that reached this benchmark and thus received bonuses. This finding highlights that the presence of performance-pay incentives affects student achievement not only in current years but also in future years. Moreover, the results demonstrate that this impact disappeared once the state government repealed the pay scheme, another indication that teachers actively respond to monetary bonuses.

∗ Department of Economics, University of Rochester, e-mail: [email protected]. I am grateful to Greg Caetano, Josh Kinsler, and Ronni Pavan for their guidance and extensive feedback. I benefited from discussions with Tom Ahn, Jacob Vigdor, and Jeffrey Zabel. I would like to thank seminar participants at the University of Rochester and at the 37th annual conference of the Association for Education Finance and Policy for their helpful comments and suggestions. I also thank Kara Bonneau for her help in obtaining data from the North Carolina Education Research Data Center. All errors are my own.


Keywords: School accountability, Performance pay, Teacher incentive
JEL classification: I21, H4

1 Introduction

Coupled with school choice, school accountability has been a centerpiece of public education reform in the United States for the last two decades. Accountability programs, such as the one used under the federal No Child Left Behind Act of 2001 (NCLB), are designed to provide useful information to parents and legislators so that they can effectively monitor the performance of each school. These programs typically evaluate schools based on student achievement on statewide standardized tests and assign simple ratings (e.g., A to F) for public reporting. With a variety of rewards for high-performing schools as well as sanctions for low-performing ones, accountability programs have increased student achievement as policymakers anticipated (Ladd, 1999; Hanushek and Raymond, 2005; Jacob, 2005; Figlio and Rouse, 2006; Reback, 2008).[1] At the same time, however, a number of studies have raised the question of whether observed test score gains under accountability pressure are primarily the result of educational reforms or of "gaming" behaviors. These studies found that some low-performing schools removed low-achieving students from school performance calculations by reclassifying them into special education (Cullen and Reback, 2006) or by subjecting them to longer disciplinary suspensions close to testing dates (Figlio, 2006). Other schools focused only on marginal students who were just under the qualifying level and who could contribute to increases in school performance more easily than other students (Neal and Schanzenbach, 2010).

[1] Figlio and Getzler (2006) and Figlio and Loeb (2011) provide detailed reviews of the relationship between accountability programs and student achievement, as well as of the "gaming" behaviors of schools under accountability pressure.

In this study, I address the effectiveness of North Carolina's accountability program, established in 1996. In contrast to NCLB and many other programs that set level targets, North Carolina's system sets growth targets that take prior scores into account to adjust for students' diverse characteristics and family backgrounds.[2] Moreover, the program provides each teacher with up to $1,500 per year in school-wide incentives for improving student achievement. The main question in this study is how receiving incentive bonuses affects teachers and schools in the following year. To answer this question, I estimate the impact of receiving bonuses on student achievement by exploiting a regression discontinuity (RD) design with a threshold that separates bonus-qualified schools from non-qualified ones. Schools around the threshold can be considered almost identical in school quality and other characteristics; however, only the teachers at schools that exceed the threshold receive bonuses. Therefore, the difference in the following year's student achievement between schools just above and just below the threshold can be attributed to the difference in bonus receipt.

[2] Ladd and Lauen (2010) argue that growth models are less vulnerable to gaming behaviors than level models.

In the analysis, I explore detailed panel data provided by the North Carolina Education Research Data Center (NCERDC) and the North Carolina Department of Public Instruction (DPI). The datasets include school-level growth scores as well as student-level test scores in math and reading from all public schools in North Carolina. Furthermore, I divide the data into three stages. The first period consists of the two school years 2005-06 and 2006-07, when qualified teachers were awarded the maximum bonus of $1,500. In 2007-08, the second period, North Carolina reduced the maximum amount to $1,053. Finally, because of its economic situation, the state repealed its bonus system in the 2008-09 school year, which is the third period of my study. By separating the sample into these three stages, this study estimates the impact of the accountability program with (i) full, (ii) reduced, and (iii) no incentives, respectively. In particular, I use the period with no bonuses for a placebo test, differentiating between the impact of monetary bonuses and the impact of school ratings.

Estimation results show that schools where teachers did not receive bonuses performed significantly better in the following year than schools where teachers did. In practice, bonus non-qualified schools increased their average academic growth by 0.06-0.08 standard deviations compared to qualified schools. Moreover, the placebo test demonstrates that this impact disappeared once the state government repealed the pay scheme, another indication that teachers are responsive to cash bonuses but not to school ratings. Bonus non-qualified schools also improved their student-teacher ratios and increased the proportion of teachers with advanced degrees. These findings suggest that financial incentives, in effect, improved school quality at low-performing schools, while high-achieving schools maintained their previously high standards.

This paper contributes to the literature on school accountability programs by providing two new empirical findings. First, this study builds on Vigdor (2008) and shows that the positive impact on bonus non-qualified schools persists over time even when incentives are reduced. Second, the finding from the placebo test illustrates that teachers are sensitive to monetary rewards but not to school ratings in the absence of such bonuses. This study is closely related to Ahn and Vigdor (2012), who also evaluate the teacher performance-pay program in North Carolina. My paper provides the above empirical findings with consistent estimates based on accurate information on the measures of school-level growth scores, while Ahn and Vigdor (2012) provide a theoretical perspective on the findings. Other related studies include Chiang (2009), Rockoff and Turner (2010), and Craig, Imberman and Perdue (2011), who also employ an RD design; instead of bonus incentives for high-performing schools, these three papers examine the threat of sanctions on low-performing schools. By comparing my estimation results with theirs, I conclude that North Carolina's teacher performance-pay program has considerable room for improving its incentive design as well as its cost-effectiveness.

This paper is also in line with other studies on teacher incentives. Teacher performance-pay programs are, in part, designed to identify productive teachers whose capabilities are unobserved and to induce more effort from teachers.[3] However, empirical results are inconclusive. While the introduction of monetary incentives for teachers significantly increased teacher effort and student achievement in Israel (Lavy, 2002, 2009), Kenya (Glewwe, Ilias and Kremer, 2010), and India (Muralidharan and Sundararaman, 2011), the impact in the U.S. is less clear. Although Figlio and Kenny (2007) document a positive relationship between individual-based teacher incentives and student achievement, there is no evidence that teacher incentives increased student achievement in New York City (Goodman and Turner, 2010; Fryer, 2011).

[3] Rockoff (2004), Rivkin, Hanushek and Kain (2005), Kane, Rockoff and Staiger (2008), Kane and Staiger (2008), and Rockoff et al. (2008) discuss the relationship between teachers' observable characteristics and their effectiveness.

The rest of the paper is organized as follows. The next section describes the accountability program in North Carolina. Section 3 details the datasets and provides graphical evidence for the RD design. Section 4 presents the identification strategy, and Section 5 reports estimation results. Section 6 discusses possible explanations of the results, and Section 7 concludes.

2 North Carolina's accountability program

North Carolina provides a particularly good setting for examining the effects of an accountability system because it has had a carefully designed educational accountability system in place since academic year 1996-97.[4] Of particular significance, the North Carolina accountability program evaluates schools primarily on the annual achievement gains of their students from one year to the next. This growth approach to accountability aims at leveling the playing field for all students; for instance, students from economically disadvantaged and minority families tend to perform worse on tests than those from more affluent families. Because of its focus on individual growth, North Carolina's model is considered more sophisticated than level models, which judge schools on the average level of test scores, even though the level model is the basis for the federal NCLB and many other accountability programs. In 2005-06, new formulas for determining student performance were introduced to account for reversion to the mean.[5] The new formula calculates each student's academic growth, using standardized test scores for each grade and each year. In practice, student i's growth in year t at grade g is calculated as:

growth_{igt} = Z_{igt} − δ · (Z_{i,g−1,t−1} + Z_{i,g−2,t−2}) / 2,        (1)

where Z_{igt} is a normalized test score based on the mean and standard deviation from the first year a particular test was used in the state. The discount factor δ accounts for mean reversion: δ = 0.92 when two years of prior scores are available, and δ = 0.82 when only a single year is available.

[4] The background of North Carolina's accountability program is described in Appendix A.1.
[5] See Kane and Staiger (2002) for reversion to the mean and for measurement errors in test scores.

For each school, average growth is calculated across all students in all subjects. In elementary and middle schools, if a school's average growth is equal to or greater than zero, the school is said to have met "Expected Growth," and all of its certified teachers receive the same bonus of $750 per person each year. If a school met Expected Growth (i.e., average growth ≥ 0) and at least 60% of its students achieved their required growth (which is defined as change ratio ≥ 60/(100−60) = 1.5), the school is said to have met "High Growth" and teachers are eligible for $1,500 in bonuses.[6] Thus, schools that achieve strong test score growth by raising the performance of only a limited number of students generally do not receive the full bonus. This two-dimensional evaluation is shown in Figure 1.

[6] Detailed definitions of "Expected Growth" and "High Growth" are given in Appendix A.2 and A.3.

In 2007-08, however, the bonus amounts were reduced to $1,053 at High Growth schools and to $527 at Expected Growth schools. Although teachers had taught their classes expecting the full bonus of $1,500, they were notified of this reduction at the end of the academic year. Since 2008-09, incentive bonuses have been suspended because of North Carolina's economic condition.

Figure 2 shows the percentages of High Growth, Expected Growth, and Less than Expected schools over time while teachers were awarded cash bonuses. In 2005-06, only 11.5% of all public regular schools achieved High Growth, 42.8% made Expected Growth, and 45.8% were Less than Expected. In 2007-08, however, as many as 56.4% achieved High Growth, 25.8% made Expected Growth, and the proportion of Less than Expected schools decreased to 17.8%. As described, teachers at High Growth or Expected Growth schools received bonuses, while those at Less than Expected schools received none.
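To make the growth formula in equation (1) and the two rating thresholds concrete, the following sketch computes a school's average growth and the resulting rating from student-level standardized scores. It is a minimal illustration under my own assumptions: the column names (z_now, z_lag1, z_lag2) are hypothetical, the change ratio is simplified to the number of students meeting their growth targets divided by the number not meeting them, and the official ABCs calculation includes further adjustments described in Appendix A.

    import pandas as pd

    DELTA_TWO_YEARS = 0.92  # discount factor when two prior scores are available
    DELTA_ONE_YEAR = 0.82   # discount factor when only one prior score is available

    def student_growth(z_now, z_lag1, z_lag2=None):
        """Equation (1): growth = Z_igt - delta * (average of prior standardized scores)."""
        if z_lag2 is not None:
            return z_now - DELTA_TWO_YEARS * (z_lag1 + z_lag2) / 2.0
        return z_now - DELTA_ONE_YEAR * z_lag1

    def school_rating(students: pd.DataFrame) -> str:
        """Assign an ABCs-style rating from student-level scores (simplified sketch)."""
        growth = students.apply(
            lambda r: student_growth(
                r.z_now, r.z_lag1, None if pd.isna(r.z_lag2) else r.z_lag2
            ),
            axis=1,
        )
        average_growth = growth.mean()
        met = int((growth >= 0).sum())        # students meeting their growth target
        not_met = int((growth < 0).sum())     # students failing their growth target
        change_ratio = met / max(not_met, 1)  # simplified change ratio

        if average_growth >= 0 and change_ratio >= 1.5:
            return "High Growth"        # teachers eligible for the $1,500 bonus
        if average_growth >= 0:
            return "Expected Growth"    # teachers eligible for the $750 bonus
        return "Less than Expected"     # no bonus

For example, a school with positive average growth where 65 of 100 students meet their growth targets would be rated High Growth under this simplified rule (change ratio 65/35 ≈ 1.86 ≥ 1.5), whereas the same average growth with only 55 students meeting their targets would yield Expected Growth.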


3 Data

3.1 Summary statistics

In this study I use data for school years 2005-06 through 2009-10, because the new growth formulas implemented in 2005-06 make comparisons to earlier years inappropriate. Detailed data sets on students, teachers, and schools are provided by the NCERDC, and school-level average growth scores are provided by the DPI. To begin with, Table 1 shows summary statistics for public schools in North Carolina in the 2005-06 school year. The figures do not include high schools or non-regular schools because they follow different rules for school ratings and bonus eligibility. While 914 schools qualified for bonuses (i.e., average growth ≥ 0), with mean average growth equal to 0.082, the other 851 schools did not qualify (i.e., average growth < 0), with a mean of -0.082. As expected, there are also marked differences between qualified and non-qualified schools in academic achievement as captured by other measures. The mean change ratio, which is based on the percentage of students who achieved the required academic growth, is 1.35 for qualified schools and 0.81 for non-qualified schools. In addition, the proportion of schools that met Adequate Yearly Progress (AYP), the primary measure used under the federal NCLB, is 0.58 for qualified schools and 0.31 for non-qualified schools. Although the differences in school characteristics are relatively small, the differences in student characteristics between the two kinds of schools are noticeable. Bonus-qualified schools tend to have more white students, fewer black students, and fewer students who are eligible for free or reduced-price lunch programs. Regarding teacher characteristics, bonus-qualified schools have more teachers with advanced degrees and experience lower turnover rates.


3.2 Graphical evidence for RD design

This subsection illustrates graphical evidence for the RD design. Since this study is primarily interested in whether a school achieved at least Expected Growth (i.e., average growth ≥ 0), rather than in the distinction between High Growth and Expected Growth, I focus on the data around the threshold where average growth equals zero (Figure 1).[7]

[7] Vigdor (2008) argues that the distinction between receiving $750 or $1,500 is closer to a random draw and that knowing whether a school received any bonus is more meaningful than knowing the amount of the bonus.

Figure 3 shows each school's average growth for (a) 2005-06, (b) 2006-07, and (c) 2007-08, respectively. I do not show corresponding figures for 2008-09 and later because no schools have received bonuses since then. The left-hand side of Figure 3 shows the number of schools at each level of average growth. Clearly, the densities of schools just below and just above the threshold are similar in each year, indicating that it is unlikely that schools in the proximity of the threshold actively manipulated their average growth scores.[8]

[8] This observation is in the spirit of the density test suggested by McCrary (2008).

The right-hand side of Figure 3 shows an indicator for bonus receipt, which equals one for qualified schools and zero for non-qualified schools. As expected, most schools with average growth above zero qualify for bonuses and those with average growth below zero do not. A few schools, however, do not follow this rule; for instance, in the right panel of Figure 3(c), some schools qualified for bonuses despite below-zero average growth. This imperfect compliance results from the complex formulas that determine High Growth and Expected Growth, as described in Appendix A.

The following four figures illustrate the impact of bonus receipt on schools' average growth in 2006-07 (Figure 4), 2007-08 (Figure 5), 2008-09 (Figure 6), and 2009-10 (Figure 7), respectively. Note again that the first two years are associated with full

bonus incentives, the third year with reduced bonus incentives, and the fourth year with no bonus incentives (i.e., the placebo test). These four figures show graphical evidence of the discontinuity that I examine further in the next section. The x-axis of Figure 4, for instance, represents school-level average growth in 2005-06, while the y-axis represents it in the following 2006-07 school year. Each point in the graph represents a mean for schools in a 0.02-point bin of 2005-06 average growth, while the curves represent a quadratic fit on each side of the threshold of zero with 95% confidence intervals (CI). Since schools whose average growth is below zero did not receive bonuses, the discontinuity at the threshold measures the impact of not having bonuses on school-level average growth in the following year. The gap at the threshold in the figure suggests that there was a significant effect, indicating that bonus non-qualified schools performed better than qualified schools in the following year. This evidence, however, does not imply that the performance level of bonus-qualified schools fell in the following year. In practice, they performed as well in 2006-07 as they did in 2005-06. As shown in Figure 4, schools that qualified for bonuses in 2005-06 (i.e., average growth ≥ 0), on average, also qualified for bonuses again in 2006-07.

Produced in the same manner, Figure 5 shows a pattern similar to Figure 4. Schools just below the threshold of zero, on average, performed better than those just above zero; however, the difference at the threshold appears smaller than in the previous year. Figure 6 shows a different pattern in which the observations below zero become volatile, resulting in a wider confidence interval. This pattern is associated with reduced bonus incentives. Figure 7 shows yet another pattern for under-performing (i.e., below-zero) schools. The lowest-performing schools, on average, achieved lower growth than they did in the previous year. This pattern is associated with no bonus incentives, and the gap at the threshold now appears insignificant.

In total, the patterns in school performance are almost identical under the full-bonus program (Figures 4 and 5), relatively similar under the reduced-bonus program (Figure 6), but different under the no-bonus program (Figure 7). Thus, schools and teachers seem to have responded systematically to monetary rewards. One interesting observation is that high-achieving (i.e., above-zero) schools have exhibited a strong positive correlation in performance over time despite the changes in the bonus amount; low-achieving schools, on the other hand, seem to have responded quickly to such changes.

In order to apply the RD estimation, I also check the validity of the underlying assumption that other school, student, and teacher characteristics are continuous at the threshold; otherwise, the difference in outcomes in the following year could not be attributed solely to the receipt of bonuses. The corresponding figures are shown in Appendix B.
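Figures 4 through 7 are constructed by grouping schools into 0.02-point bins of the lagged average growth, plotting bin means, and fitting a separate quadratic on each side of the zero threshold. The short sketch below illustrates that construction under assumed column names (growth_t for the running variable, growth_t1 for the following year's growth); it omits the 95% confidence bands shown in the actual figures and is not the code used to produce them.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    def rd_plot(df: pd.DataFrame, run="growth_t", out="growth_t1", bin_width=0.02):
        """Binned scatter with separate quadratic fits on each side of the zero cutoff."""
        d = df.dropna(subset=[run, out]).copy()
        # assign each school to a 0.02-point bin and plot bin means
        d["bin"] = (d[run] // bin_width) * bin_width + bin_width / 2
        bin_means = d.groupby("bin")[out].mean()
        plt.scatter(bin_means.index, bin_means.values, s=12)
        # fit a quadratic separately below and above the threshold
        for side in (d[d[run] < 0], d[d[run] >= 0]):
            coefs = np.polyfit(side[run], side[out], deg=2)
            grid = np.linspace(side[run].min(), side[run].max(), 100)
            plt.plot(grid, np.polyval(coefs, grid))
        plt.axvline(0, linestyle="--")
        plt.xlabel("Average growth in year t")
        plt.ylabel("Average growth in year t+1")
        plt.show()

The same routine, with the outcome replaced by a school, student, or teacher characteristic, would produce plots in the spirit of the continuity checks in Appendix B.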

4 Identification

Following Imbens and Lemieux (2008), the regression equation for a sharp RD design in this study is given as follows:

y_{s,t+1} = α_B + β_B · Growth_{st} + ε,        (2)

for the observations that satisfy −h < Growth_{st} < 0 for a given bandwidth h. In equation (2), Growth_{st} is the average growth of school s in year t, y_{s,t+1} is an outcome variable of school s in the following year, and ε is an i.i.d. error term. The estimate of α_B captures the impact of being categorized just below the threshold of zero (i.e., bonus non-qualified) on the outcome. Likewise, the impact of being categorized just above the threshold of zero (i.e., bonus qualified) is estimated by limiting the sample to 0 ≤ Growth_{st} < h for the same bandwidth h as follows:

y_{s,t+1} = α_A + β_A · Growth_{st} + ε.        (3)

The estimated difference α̂_A − α̂_B measures the effect of having average growth just above zero compared to just below zero (i.e., the impact of receiving bonuses). In practice, the two equations are combined into one:[9]

y_{s,t+1} = β_0 + β_1 D_{st} + β_2 Growth_{st} + β_3 D_{st} · Growth_{st} + X_{s,t+1} γ + ε,        (4)

where the sample is limited to −h < Growth_{st} < h and D_{st} = 1 if school s receives bonuses in year t (treatment); otherwise D_{st} = 0 (control). School characteristics in year t + 1 are represented by X_{s,t+1}, and I show estimation results with and without X_{s,t+1}. The estimated coefficient β_1 captures the impact of teachers' receiving bonuses on the outcome variable. Critical to the estimation is the choice of the bandwidth h, which is informed by Silverman's rule of thumb, suggested as an initial choice by Imbens and Kalyanaraman (2009).

[9] This approach is equivalent to a local linear regression using a rectangular kernel. While other kernels (triangular, Epanechnikov, etc.) can also be used, Lee and Lemieux (2010) argue that those kernels provide only marginal improvements in efficiency.
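To illustrate how equations (2) through (4) translate into an estimation routine, the sketch below restricts the sample to a rectangular-kernel window around the threshold, with the bandwidth set by the Silverman rule of thumb described in Section 5.1, and fits the pooled specification (4) by OLS with standard errors clustered by school. It is a minimal sketch under assumed variable names (school_id, growth_t, growth_t1), not the code behind the paper's estimates; the covariates X_{s,t+1} could be appended to the formula in the same way.

    import pandas as pd
    import statsmodels.formula.api as smf

    def silverman_bandwidth(x: pd.Series) -> float:
        """Rule-of-thumb bandwidth from Section 5.1: h0 = 1.06 * sd * N^(-1/5)."""
        return 1.06 * x.std() * len(x) ** (-0.2)

    def rd_estimate(df: pd.DataFrame, outcome="growth_t1", h=None):
        """Equation (4): local linear RD with a rectangular kernel of half-width h."""
        if h is None:
            h = silverman_bandwidth(df["growth_t"])
        sample = df[df["growth_t"].abs() < h].copy()
        sample = sample.dropna(subset=[outcome, "growth_t", "school_id"])
        # D_st = 1 for schools that qualified for bonuses in year t
        sample["bonus"] = (sample["growth_t"] >= 0).astype(int)
        formula = f"{outcome} ~ bonus + growth_t + bonus:growth_t"
        fit = smf.ols(formula, data=sample).fit(
            cov_type="cluster", cov_kwds={"groups": sample["school_id"]}
        )
        return fit.params["bonus"], fit.bse["bonus"], int(fit.nobs)

Under this coding of D_{st}, a negative coefficient on bonus corresponds to the paper's finding that non-qualified schools achieve higher growth in the following year; the alternative bandwidths reported in the tables are simple multiples of the rule-of-thumb value, h1 = (3/2)h0 and h2 = (2/3)h0.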


5 Results

5.1 Main results

This section provides estimation results for three periods: (i) full bonus incentives, (ii) reduced bonus incentives, and (iii) no bonus incentives. First, I show the results for school-level average growth, that is, I set y_{s,t+1} = Growth_{s,t+1} in equation (4). This estimation assesses the difference at the threshold in Figures 4 to 7. For the bandwidth h, I employ Silverman's rule of thumb, mentioned in Imbens and Kalyanaraman (2009): h = h0 = 1.06 · sd · N^{−1/5}, where sd denotes the standard deviation of the treatment-defining variable (i.e., Growth_{st}) and N denotes the number of observations.

Table 2 presents the RD estimates for the period under full bonus incentives: the average treatment effect on 2006-07 and 2007-08. Columns (1) and (2) show the results with the widest bandwidth h = h1 = (3/2)h0, columns (3) and (4) show those with h = h0 chosen by Silverman's rule, and columns (5) and (6) show those with the narrowest bandwidth h = h2 = (2/3)h0. As a result, the sample size is largest for h1 and smallest for h2. Column (1) shows an estimate of -0.0565, significant at the 1% level, implying that schools that did not receive bonuses increased their average growth by 0.0565 standard deviations in the following year compared to bonus-qualified schools. Column (2) shows the estimation result with the control variables X_{s,t+1} in equation (4), which include school characteristics such as class size and school enrollment, student characteristics such as the proportions of white and black students, and teacher characteristics such as the proportion of teachers with advanced degrees. The estimate of -0.0513 captures the impact of bonus receipt that affected school performance through channels other than these control variables. The estimates suggest that North Carolina's teacher performance-pay scheme, in effect, induced additional effort from teachers at low-performing schools, while high-achieving schools maintained their high standards, as shown in Figures 4 to 7.

Columns (3) to (6) present the results in a similar manner. Although the RD method is sensitive to the bandwidth selection, the estimates are significant and negative for every bandwidth, suggesting that bonus non-qualified schools performed significantly better than bonus-qualified schools.

Table 3 shows the estimates under the reduced bonus incentives: the impact on 2008-09. Similar to Table 2, all of the estimates in Table 3, except the one in column (5), are significant and negative. This finding suggests that even after the large reduction in the maximum bonus (from $1,500 to $1,053), teachers at bonus non-qualified schools kept exerting additional effort compared to those at qualified schools. Since the magnitude of the impact is similar between the full-bonus program (Table 2) and the reduced-bonus program (Table 3), I use these three years to estimate the average impact of bonus receipt regardless of its amount, which is the primary result of this study.

First, in Table 4, I show that the estimation samples drawn using h0 = 0.0338 are well balanced between the treatment group (D_{st} = 1) and the control group (D_{st} = 0) in school, student, and teacher characteristics from 2005-06 to 2007-08. On average, schools in the treatment group are larger in both enrollment and class size, but the differences in means are not significant even at the 15% level. Those schools also have more white students, fewer black students, and fewer free-lunch-eligible students; as expected, however, the differences in means are not significant. The differences in teacher characteristics between the two groups are not significant either. These comparisons verify that schools are well balanced around the threshold, which allows this study to apply the quasi-experimental RD design.

Table 5 demonstrates the main results: the average treatment effect of bonus receipt regardless of its amount. With the largest bandwidth h1 = 0.0507, the estimates are highly significant at around -0.060, with or without controls. With Silverman's bandwidth h0 = 0.0338, they are also highly significant at around -0.075. Those with the narrowest bandwidth h2 = 0.0225 are less significant, but this may reflect larger standard errors due to the smaller sample size compared to the estimates with h1 or h0. Although the magnitude ranges from roughly -0.060 to -0.095, the impact of bonus receipt is significant, and bonus non-qualified schools performed better than qualified schools in the following year.

By contrast, Table 6 shows insignificant results: the impact under no bonus incentives (i.e., the placebo test), which corresponds to Figure 7. Strikingly, none of the estimates in Table 6 is significant for any bandwidth. This finding suggests that once monetary incentives are removed, low-achieving schools do not exert any more effort than high-achieving schools. Another implication is that school ratings themselves do not induce more effort from teachers or schools. Even without bonuses, schools still receive one of the three ratings: "High Growth," "Expected Growth," or "Less than Expected." The insignificant estimates in Table 6, however, illustrate that schools and teachers are not sensitive to school ratings. In sum, Tables 5 and 6 shed light on the fact that low-performing schools expend additional effort only when they are provided with bonus incentives.

5.2 Results for subsamples

To further examine the impact of bonus receipt, I also show estimation results conditional on each school's bonus receipt in the previous year. In practice, I separate the sample into two groups: one consists of schools that received bonuses in the previous year, and the other of schools that did not. Panel A in Table 7 shows the estimated impact of receiving bonuses in

2006-07 (D_{st}) on school-level growth scores in 2007-08 (Growth_{s,t+1}), conditional on bonus receipt in 2005-06 (D_{s,t−1}). Panel B shows the impact on 2008-09, also conditional on bonus receipt in 2006-07. The results in Panel A show that all of the estimates are negative for every bandwidth choice; however, none is significant. Those in Panel B are also negative, with one significant estimate (the case with D_{s,t−1} = 1 and bandwidth h1). The magnitude of this estimate is larger than that in the main results (Table 5). This means that schools which received bonuses in the previous year but failed in the current year increased their achievement more than those that failed to receive bonuses in consecutive years.

6 Discussion

Although it is clear from the previous section that bonus non-qualified schools significantly increase their average growth in the following year, the reason remains unclear because both qualified and non-qualified schools are subject to the same incentive program every year. To provide a possible explanation, in this section I further estimate the impact of bonus receipt on outcomes other than school-level average growth scores.[10]

[10] See Ahn and Vigdor (2012) for a theoretical viewpoint on the effects of bonus receipt.

If low-performing schools are concerned about future sanctions, including cutbacks in staff and potential school closure, they would have greater motivation to raise student achievement than their high-performing counterparts (Rockoff and Turner, 2010). Empirically, sanction threats have not only increased student achievement but also raised school spending on instructional technology, curricular development, and teacher training (Chiang, 2009; Craig, Imberman and Perdue, 2011). I follow the same argument and examine whether bonus non-qualified schools in North Carolina also improved education quality in part because failing to qualify for bonuses raised the perceived risk of future sanctions.

Table 8 shows the estimation results where school characteristics (Panel A) and teacher characteristics (Panel B) are used as school-level outcomes y_{s,t+1} in year t + 1. Regarding school characteristics, there are no significant differences in enrollment or class size between bonus-qualified schools and non-qualified ones in the following year. By contrast, the student-teacher ratio significantly increased at bonus-qualified schools (estimates for bandwidths h1 and h0), which is equivalent to saying that teachers at bonus non-qualified schools have, on average, fewer students than those at qualified schools. This observation points to the possibility that bonus non-qualified schools hired more new teachers in the following year than qualified schools, leading to a lower student-teacher ratio.

This explanation is strengthened by the findings when the proportion of teachers with advanced degrees is used as the outcome variable in the first row of Panel B. The negative estimates suggest that this proportion dropped at bonus-qualified schools (and thus relatively increased at bonus non-qualified schools), though the results are significant only at the 10% level with bandwidth h0. Combined with the change in the student-teacher ratio, this impact appears to be driven by hiring new teachers who hold advanced degrees.

In total, compared to bonus-qualified schools, non-qualified ones improved education quality in the following year by reducing student-teacher ratios and raising the proportion of better-educated teachers. These changes have contributed to better performance among non-qualified schools, but as shown in Table 5 the estimates remain highly significant even after controlling for observed characteristics such as X_{s,t+1}. This fact illustrates that unobserved characteristics such as teacher effort are still key to explaining achievement gains.
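Operationally, the Table 8 estimates reuse the specification in equation (4) with a different dependent variable. In terms of the illustrative sketch from Section 4 (whose function and column names are hypothetical), this amounts to looping over school-level outcomes:

    # Same specification as equation (4), with outcomes other than average growth;
    # rd_estimate and the column names come from the hypothetical sketch in Section 4.
    for outcome in ["enrollment_t1", "class_size_t1", "student_teacher_ratio_t1",
                    "advanced_degree_share_t1", "turnover_t1"]:
        coef, se, n = rd_estimate(df, outcome=outcome)
        print(outcome, round(coef, 4), round(se, 4), n)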

7 Conclusion

Although a growing number of states and countries have introduced school accountability programs and teacher performance-pay systems, recent studies have found that these policies do not always accomplish their expected results. In this paper, I examined the impact of teachers' receiving incentive bonuses on student achievement in the following year. Despite the fact that all schools and teachers are subject to the same incentive scheme each year, this study finds that (1) bonus non-qualified schools performed significantly better than qualified schools in the following year, (2) the same pattern was observed even under the reduced bonus incentives, and (3) this impact disappeared when the state government repealed the bonus program.

The reason for these empirical findings is less clear. However, this paper also finds that schools which did not receive bonuses improved school quality by increasing the proportion of teachers with advanced degrees and by reducing the student-teacher ratio. Since these results are consistent with those from related studies on the threat of sanctions, it is likely that bonus non-qualified schools and teachers exerted more effort to avoid potential sanctions in the future.

Another important implication of this study is that, although the estimated impacts of bonus receipt (0.06-0.08 standard deviations) are statistically significant, their economic significance is relatively small. Compared to the findings from Chiang (2009) and Rockoff and Turner (2010) that the threat of sanctions led low-achieving schools to increase their performance by 0.10-0.12 standard deviations, North Carolina's bonus incentive program has much room for improvement in its incentive design and cost-effectiveness.


References

Ahn, T., and J. Vigdor. 2012. "How salient are performance incentives in education? Evidence from North Carolina." Working Paper at the University of Kentucky.
Chiang, H. 2009. "How accountability pressure on failing schools affects student achievement." Journal of Public Economics, 93(9-10): 1045–1057.
Craig, S., S. Imberman, and A. Perdue. 2011. "Does it pay to get an A? School resource allocations in response to accountability ratings." Working Paper at the University of Houston.
Cullen, J., and R. Reback. 2006. "Tinkering toward accolades: School gaming under a performance accountability system." In Advances in Applied Microeconomics, Vol. 14, ed. T. Gronberg and D. Jansen, Chapter 1, 1–34. Emerald Group Publishing Limited.
Figlio, D. 2006. "Testing, crime and punishment." Journal of Public Economics, 90(4-5): 837–851.
Figlio, D., and C. Rouse. 2006. "Do accountability and voucher threats improve low-performing schools?" Journal of Public Economics, 90(1-2): 239–255.
Figlio, D., and L. Getzler. 2006. "Accountability, ability, and disability: Gaming the system?" In Advances in Applied Microeconomics, ed. T. Gronberg and D. Jansen, Chapter 2, 35–49. Emerald Group Publishing Limited.
Figlio, D., and L. Kenny. 2007. "Individual teacher incentives and student performance." Journal of Public Economics, 91(5-6): 901–914.
Figlio, D., and S. Loeb. 2011. "School accountability." In Handbook of the Economics of Education, Vol. 3, ed. E. Hanushek, S. Machin and L. Woessmann, Chapter 8, 383–421. North Holland.
Fryer, R. 2011. "Teacher incentives and student achievement: Evidence from New York City public schools." NBER Working Paper No. 16850.
Glewwe, P., N. Ilias, and M. Kremer. 2010. "Teacher incentives." American Economic Journal: Applied Economics, 2(July): 205–227.
Goodman, S., and L. Turner. 2010. "Teacher incentive pay and educational outcomes: Evidence from the New York City bonus program." Working Paper at Columbia University.
Hanushek, E., and M. Raymond. 2005. "Does school accountability lead to improved student performance?" Journal of Policy Analysis and Management, 24(2): 297–327.
Imbens, G., and K. Kalyanaraman. 2009. "Optimal bandwidth choice for the regression discontinuity estimator." NBER Working Paper No. 14726.
Imbens, G., and T. Lemieux. 2008. "Regression discontinuity designs: A guide to practice." Journal of Econometrics, 142(2): 615–635.
Jacob, B. 2005. "Accountability, incentives and behavior: The impact of high-stakes testing in the Chicago Public Schools." Journal of Public Economics, 89(5-6): 761–796.
Kane, T., and D. Staiger. 2002. "The promise and pitfalls of using imprecise school accountability measures." Journal of Economic Perspectives, 16(4): 91–114.
Kane, T., and D. Staiger. 2008. "Estimating teacher impacts on student achievement: An experimental evaluation." NBER Working Paper No. 14607.
Kane, T., J. Rockoff, and D. Staiger. 2008. "What does certification tell us about teacher effectiveness? Evidence from New York City." Economics of Education Review, 27(6): 615–631.
Ladd, H. 1999. "The Dallas school accountability and incentive program: An evaluation of its impacts on student outcomes." Economics of Education Review, 18: 1–16.
Ladd, H., and D. Lauen. 2010. "Status versus growth: The distributional effects of school accountability policies." Journal of Policy Analysis and Management, 29(3): 426–450.
Lavy, V. 2002. "Evaluating the effect of teachers' group performance incentives on pupil achievement." Journal of Political Economy, 110(6): 1286–1317.
Lavy, V. 2009. "Performance pay and teachers' effort, productivity, and grading ethics." American Economic Review, 99(5): 1979–2011.
Lee, D., and T. Lemieux. 2010. "Regression discontinuity designs in economics." Journal of Economic Literature, 48(June): 281–355.
McCrary, J. 2008. "Manipulation of the running variable in the regression discontinuity design: A density test." Journal of Econometrics, 142(2): 698–714.
Muralidharan, K., and V. Sundararaman. 2011. "Teacher performance pay: Experimental evidence from India." Journal of Political Economy, 119(1): 39–77.
Neal, D., and D. Schanzenbach. 2010. "Left behind by design: Proficiency counts and test-based accountability." Review of Economics and Statistics, 92(2): 263–283.
Reback, R. 2008. "Teaching to the rating: School accountability and the distribution of student achievement." Journal of Public Economics, 92(5-6): 1394–1415.
Rivkin, S., E. Hanushek, and J. Kain. 2005. "Teachers, schools, and academic achievement." Econometrica, 73(2): 417–458.
Rockoff, J. 2004. "The impact of individual teachers on student achievement: Evidence from panel data." American Economic Review, 94(2): 247–252.
Rockoff, J., and L. Turner. 2010. "Short-run impacts of accountability on school quality." American Economic Journal: Economic Policy, 2(November): 119–147.
Rockoff, J., B. Jacob, T. Kane, and D. Staiger. 2008. "Can you recognize an effective teacher when you recruit one?" NBER Working Paper No. 14485.
Vigdor, J. 2008. "Teacher salary bonuses in North Carolina." Working Paper at National Center for Analysis of Longitudinal Data in Education Research.


Figure 1: The evaluation of schools under North Carolina’s accountability program

Note: The receipt of a bonus depends only on each school's average growth. Conditional on bonus receipt, the amount of the bonus depends on each school's change ratio.

Figure 2: The proportion of public schools in North Carolina from 2005-06 to 2007-08

Note: The figures are restricted to regular schools and do not include high schools because they follow different rules for school accountability.


Figure 3: The number of schools (left) and bonus receipt (right): (a) 2005-06, (b) 2006-07, (c) 2007-08.

[Figure: for each year, the left panel plots the number of schools against average growth, and the right panel plots the bonus-receipt indicator (0/1) against average growth.]

Figure 4: The impact of full bonus incentives on 2006-07.
[Figure: average growth in 2006-07 against average growth in 2005-06, with a quadratic fit and 95% CI on each side of the threshold.]

Figure 5: The impact of full bonus incentives on 2007-08.
[Figure: average growth in 2007-08 against average growth in 2006-07, with a quadratic fit and 95% CI on each side of the threshold.]

Figure 6: The impact of reduced bonus incentives on 2008-09.
[Figure: average growth in 2008-09 against average growth in 2007-08, with a quadratic fit and 95% CI on each side of the threshold.]

Figure 7: The impact of no bonus incentives on 2009-10.
[Figure: average growth in 2009-10 against average growth in 2008-09, with a quadratic fit and 95% CI on each side of the threshold.]

Table 1: Summary statistics for public schools in North Carolina in 2005-06

                                Bonus qualified              Bonus non-qualified
                                (average growth ≥ 0)         (average growth < 0)
                                mean       s.d.              mean       s.d.
Academic achievement
  Average growth                0.082      0.091             -0.082     0.062
  Change ratio                  1.35       0.49              0.81       0.15
  AYP met (%)                   0.58       0.49              0.31       0.46
School characteristics
  Enrollment                    558.1      248.3             530.7      231.5
  Class size                    20.1       2.66              19.7       2.69
  Student-teacher ratio         14.9       4.24              14.9       4.68
Student characteristics
  White (%)                     0.617      0.262             0.467      0.290
  Black (%)                     0.264      0.227             0.400      0.271
  Hispanic (%)                  0.085      0.089             0.097      0.099
  Free lunch eligible (%)       0.372      0.190             0.495      0.194
Teacher characteristics
  Advanced degree (%)           0.269      0.095             0.228      0.089
  Turnover (%)                  0.202      0.101             0.225      0.106
N                               914                          851

Note: The figures do not include high schools or non-regular schools, which follow different rules for school ratings and bonus receipt.


Table 2: RD estimates with full bonus incentives (impact on 2006-07 and 2007-08)

                     bandwidth h1                bandwidth h0                bandwidth h2
                     (1)           (2)           (3)           (4)           (5)           (6)
Dst                  -0.0565***    -0.0513**     -0.0619**     -0.0524*      -0.0274**     -0.0249*
                     (0.0209)      (0.0207)      (0.0303)      (0.0299)      (0.0134)      (0.0133)
Control Xs,t+1       No            Yes           No            Yes           No            Yes
N                    858           822           558           535           286           274

Note: The dependent variable is school-level average growth. Standard errors clustered by school in parentheses. *, **, and *** denote significance at the 10%, 5%, and 1% levels, respectively.

Table 3: RD estimates with reduced bonus incentives (impact on 2008-09)

                     bandwidth h1                bandwidth h0                bandwidth h2
                     (1)           (2)           (3)           (4)           (5)           (6)
Dst                  -0.0536*      -0.0594**     -0.0640*      -0.0811**     -0.0984       -0.119*
                     (0.0293)      (0.0293)      (0.0373)      (0.0377)      (0.0627)      (0.0627)
Control Xs,t+1       No            Yes           No            Yes           No            Yes
N                    148           140           102           95            70            65

Note: The dependent variable is school-level average growth. Standard errors in parentheses. *, **, and *** denote significance at the 10%, 5%, and 1% levels, respectively.


Table 4: Summary statistics for estimation samples from 2005-06 to 2007-08

                                Dst = 0        Dst = 1        Difference in means
School characteristics
  Enrollment                    568.3          596.7          -28.4
                                (253.9)        (252.6)        [0.151]
  Class size                    19.8           20.0           -0.14
                                (2.74)         (2.75)         [0.512]
Student characteristics
  White (%)                     0.505          0.537          -0.032
                                (0.280)        (0.277)        [0.138]
  Black (%)                     0.335          0.312          0.024
                                (0.251)        (0.241)        [0.214]
  Free lunch eligible (%)       0.435          0.417          0.018
                                (0.194)        (0.188)        [0.253]
Teacher characteristics
  Advanced degree (%)           0.249          0.242          0.007
                                (0.093)        (0.087)        [0.332]
  Turnover (%)                  0.210          0.199          0.010
                                (0.101)        (0.096)        [0.198]
N                               296            371            –

Note: Standard deviations in parentheses, and p-values in square brackets.

Table 5: RD estimates with bonus incentives (impact on 2006-07 through 2008-09)

                     bandwidth h1                bandwidth h0                bandwidth h2
                     (1)           (2)           (3)           (4)           (5)           (6)
Dst                  -0.0612***    -0.0594***    -0.0790***    -0.0721***    -0.0956**     -0.0847*
                     (0.0147)      (0.0145)      (0.0238)      (0.0236)      (0.0396)      (0.0437)
Control Xs,t+1       No            Yes           No            Yes           No            Yes
N                    1238          1188          667           637           364           347

Note: The dependent variable is school-level average growth. Standard errors clustered by school in parentheses. *, **, and *** denote significance at the 10%, 5%, and 1% levels, respectively.


Table 6: RD estimates with no bonus incentives (impact on 2009-10)

                     bandwidth h1                bandwidth h0                bandwidth h2
                     (1)           (2)           (3)           (4)           (5)           (6)
Dst                  0.0176        0.00569       0.0168        0.0116        0.0368        0.0439
                     (0.0230)      (0.0234)      (0.0319)      (0.0322)      (0.0466)      (0.0481)
Control Xs,t+1       No            Yes           No            Yes           No            Yes
N                    305           288           203           191           137           129

Note: The dependent variable is school-level average growth. Standard errors in parentheses. *, **, and *** denote significance at the 10%, 5%, and 1% levels, respectively.

Table 7: Estimation results conditional on Ds,t−1

                                  bandwidth h1    bandwidth h0    bandwidth h2
Panel A. Impact on 2007-08
  Dst (when Ds,t−1 = 1)           -0.0141         -0.0408         -0.1350
                                  (0.0396)        (0.0569)        (0.0915)
  N                               241             157             97
  Dst (when Ds,t−1 = 0)           -0.0460         -0.0344         -0.0440
                                  (0.0286)        (0.0430)        (0.0635)
  N                               359             228             146
Panel B. Impact on 2008-09
  Dst (when Ds,t−1 = 1)           -0.1210**       -0.0356         -0.2480
                                  (0.0522)        (0.0456)        (0.1550)
  N                               62              36              24
  Dst (when Ds,t−1 = 0)           -0.0148         -0.0408         -0.0892
                                  (0.0368)        (0.0569)        (0.0750)
  N                               80              62              43

Note: Standard errors in parentheses. *, **, and *** denote significance at the 10%, 5%, and 1% levels, respectively.


Table 8: Estimation results for other dependent variables ys,t+1

Dependent variable                bandwidth h1    bandwidth h0    bandwidth h2
Panel A. School characteristics
  Enrollment                      34.64           28.29           -82.15
                                  (33.20)         (54.72)         (106.6)
  N                               1238            667             364
  Class size                      0.288           0.076           -1.882
                                  (0.352)         (0.586)         (1.231)
  N                               1237            667             364
  Student-teacher ratio           0.817**         1.310**         0.112
                                  (0.347)         (0.549)         (1.017)
  N                               811             450             250
Panel B. Teacher characteristics
  Advanced degree (%)             -0.012          -0.040*         -0.009
                                  (0.013)         (0.021)         (0.051)
  N                               1189            637             347
  Turnover (%)                    0.009           0.015           0.003
                                  (0.013)         (0.021)         (0.043)
  N                               1175            629             343

Note: Standard errors in parentheses. *, **, and *** denote significance at the 10%, 5%, and 1% levels, respectively.


Appendices

A Accountability program in North Carolina

A.1 Background

In 1996, the North Carolina State Board of Education (SBE) developed a school accountability program, referred to as the ABCs of Public Education, which focused on strong Accountability, teaching the Basics with an emphasis on high educational standards, and maximum local Control. In 2002-03, the ABCs program was expanded to incorporate the new statutory accountability requirements of the federal NCLB. In 2005-06, new growth formulas were implemented that make comparisons to previous years inappropriate.

The ABCs accountability program sets growth and performance standards for each elementary, middle, and high school in the state. End-of-Grade (EOG) and End-of-Course (EOC) test results, and other selected components, are used to measure a school's growth and performance. Schools that attain the standards are normally eligible for incentive awards or other recognition. Schools where growth and performance fall below specified levels are designated as low-performing, and may receive mandated assistance based on action by the SBE.

A.2 Definition of Expected Growth

A school’s ABCs growth status is determined by its growth calculation and its change ratio (a measure of the percent of students meeting their individual growth targets). A school’s grade span and/or courses determine the composition of these measures, as described below. The average growth for a school may include:


(1) Average growth on EOG reading and mathematics for grades 3-8 and any EOC tests.
(2) Change over a two-year baseline in the percent of students completing the college/university prep and college tech prep courses of study.
(3) Change in the competency passing rate.
(4) Change in the ABCs dropout rate.

Schools whose average growth is equal to the growth expectation (shown by an average difference of 0.00 or better) are said to have met Expected Growth.

A.3 Definition of High Growth

The change ratio used to determine the attainment of high growth is calculated as follows. The factors are arranged such that the number of students meeting their individual growth standards is in the numerator along with the change in competency pass rate and college/university prep and college tech prep courses of study. Students not meeting their individual growth standard are in the denominator and the decrease in dropout rate is subtracted from the denominator. Schools that have an average growth of 0.00 or better (met expected growth) and have a change ratio of 1.50 or better are said to have met High Growth.
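A schematic version of this ratio is sketched below. The function and variable names are mine, and the official calculation involves data and weighting details not reproduced here; the sketch only mirrors the verbal description above.

    def change_ratio(n_met, n_not_met, d_competency, d_courses_of_study, d_dropout_decrease):
        """Schematic ABCs change ratio, following the verbal definition in the text."""
        # students meeting their growth standards, plus the credited changes, in the numerator
        numerator = n_met + d_competency + d_courses_of_study
        # students not meeting their standards, minus the decrease in the dropout rate
        denominator = n_not_met - d_dropout_decrease
        return numerator / denominator

    def met_high_growth(average_growth, ratio):
        # High Growth: expected growth (average growth of 0.00 or better) and a ratio of 1.50 or better
        return average_growth >= 0.0 and ratio >= 1.5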

A.4 Bonus amount

In the academic years 2005-06 and 2006-07, certified staff members each received up to $1,500 and teacher assistants up to $500 at High Growth schools. Similarly, certified staff members each received up to $750 and teacher assistants up to $375 at Expected Growth schools. In 2007-08, the bonus amounts were reduced to $1,053 and $351, respectively, at High Growth schools, and to $527 and $263, respectively, at Expected Growth schools. Staff members and assistants were notified of this reduction after the corresponding academic year ended. Since 2008-09, the attainment of ABCs growth standards has been calculated as usual, but incentive awards have been suspended because of the state's economic condition.

B Continuity at the threshold

The following three figures illustrate the continuity at the threshold of zero for class size, racial proportions, and teacher quality, respectively. As in Figure 4, the x-axis represents school-level average growth in 2005-06, with each point being a mean for schools in a 0.02-point bin. The y-axis, in contrast, represents school, student, or teacher characteristics in the same 2005-06 school year. It is clear from the figures that, in each case, there is no discontinuity at the threshold. Under this assumption that other variables are continuously distributed across the threshold, the RD method consistently estimates the impact of teachers' receiving bonuses.


Figure 8: Continuity for class size.
[Figure: class size in 2005-06 against average growth in 2005-06, with a quadratic fit and 95% CI on each side of the threshold.]

Figure 9: Continuity for racial proportion.
[Figure: proportions of white and black students in 2005-06 against average growth in 2005-06, with quadratic fits and 95% CI.]

Figure 10: Continuity for teacher quality.
[Figure: proportions of teachers with advanced degrees and teacher turnover in 2005-06 against average growth in 2005-06, with quadratic fits and 95% CI.]
