How Sensitive are Sales Prices to Online Price Estimates in the Real Estate Market?
Yong Suk Lee and Yuya Sasaki∗
March 12, 2014
Abstract
This paper investigates how sensitive sales prices are to online price estimates in the real estate market. In our preliminary national and MSA-level analysis, we reject, for most MSAs and at the national level, the null hypothesis that online price estimates do not Granger-cause actual transaction prices. This macro-level evidence is followed by microeconometric analysis using house-level data. To account for correlated house-specific unobservables, we use the differences between listing prices and online price estimates as proxies to form a partially linear model. To account for correlated neighborhood-specific unobservables, we use neighborhood first differencing. Using house price estimates and sales prices collected from Zillow.com, we find that the elasticity of sales price with respect to the Zillow estimate is one, controlling for the aforementioned unobservables as well as observed house attributes. Our results imply that online price estimates can have a large impact on real estate price dynamics.
Keywords: real estate pricing, online price estimates, hedonic valuation, neighborhood panel data, proxies
JEL Codes: D82, R21, R31, R32
∗ Lee: Williams College; Sasaki: Johns Hopkins University. The authors thank Danny Guo and Simmon Kim for data collection and research assistance.
1 Introduction
Like other types of assets, the price of real estate is determined by the observed and unobserved attributes of the asset. Houses, especially single family houses, exhibit unobserved heterogeneity across various dimensions. Same sized bedrooms can be valued differently depending on the location of the window. The topography of same sized lots can affect the value of the property. Neighborhood amenities, like nearby schools and parks, are important determinants of property prices. While no two houses in general are alike, houses have been priced based on what appraisers or brokers refer to as comparables: observationally similar houses that were recently sold in the same or a nearby neighborhood. Pricing adjustments are made to reflect the differences between the house of interest and the comparable houses. In other words, the pricing of real estate takes into account the information in other real estate prices. With the advancement of the internet, one can easily search sales price information for a large number of properties. Furthermore, there are online services that provide their own property estimates for free based on the property and neighborhood attributes, as well as the sales prices of comparable properties. Does the availability of such price information impact actual sales prices? How large is this impact?

Economists have long been interested in how information, or the lack of information, impacts asset prices. Easley and O’Hara (1987, 2004) show that large trades in the securities market reflect better information and impact security prices, and that investors demand higher returns on stocks for which there is less public information. Researchers have found evidence that information impacts sales prices in the real estate market as well. Levitt and Syverson (2008) show that informational advantage translates to higher sales prices by examining properties owned by real estate brokers. They find that realtors sell their own houses at about 4 percent higher prices. The recent financial crisis has triggered interest in the impact of foreclosure on house prices. Foreclosures can potentially impact the price of non-foreclosure houses by conveying new information about unobserved neighborhood attributes, or more directly by being included as comparables. Campbell et al. (2011) find that foreclosed homes
lower prices of nearby houses by about 1 percent.1 The real estate market is prone to imperfect information and market participants will likely value information that can help price the asset. Sellers may generally have the informational advantage over the buyers about the condition of properties.2 In many cases, the buyer and seller may simply not know the exact values of certain heterogeneous components of the property. In this paper, we analyze the value of real estate price information and estimate how sensitive sales prices are to online price estimates in the real estate market. In order to estimate the impact of information on prices in the real estate market, we propose a reduced-form pricing equation as the convex combination of third-party price estimate and self valuation of a property. The main challenge for estimation is to control for the unobserved house and neighborhood attributes in the model. We propose a method that nonparametrically proxies for unobserved house specific attributes by using the difference between the listing price and the online price estimate. Our method is similar to the approaches used in the production literature where researchers have used observed inputs and investments to nonparametrically proxy for unobserved technologies. We also control for unobserved neighborhood attributes by first differencing properties within same neighborhoods. The literature has dealt with unobserved area specific attributes by using boundary discontinuities (Black 1999, Bayer et al. 2005) or quasi-experimental research designs (Chay and Greenstone 2005). Bajari et al. (2012) propose a method that relies less on the research design but on the structural assumption that prior sales prices can be used to control for time-varying unobservable attributes in a hedonic regression. 
Similarly, we estimate the extended hedonic model by relying on the structural assumption that prior list prices contain unobserved house specific information and on the data requirement of having at least two properties per neighborhood. We collect home value estimates, list prices, sales prices, and house and neighborhood attributes from Zillow.com, an online real estate information provider, for 1,200 houses across 30 Metropolitan Statistical Areas (MSAs) in the US. We find that the elasticity of sales prices with respect to the Zillow home price estimates is one. The results are robust to how we calculate the proxy variable that controls for unobserved house attributes. These results imply that online price estimates can have a big impact on real estate price dynamics. Additionally, we explore possible factors that might explain the variation in the elasticity estimates across the 30 MSAs. The percent of the population with a bachelor's degree or above significantly explains the variation in the elasticity estimates across MSAs.

The paper is organized as follows. Section 2 presents the MSA level analysis. Section 3 presents the microeconometric model and its estimation strategies. Section 4 explains the house level data collected from the real estate website, Zillow.com. In Section 5, we present our elasticity estimates. Section 6 concludes and discusses the implications.

1 Real life examples of markets for information, like car reports for used cars or online reviews for restaurants, more directly speak to the value people put on information.
2 The seller may know of a problem that may not be apparent to one who has not lived in the house for multiple days or even a home inspector (e.g. seasonal drafts, neighbor issues, etc.).
2 The MSA Level Analysis
As a preliminary step, we first examine the hypothesis that online property price estimates impact actual sales prices at the aggregate level. If real estate price information directly impacts house prices, we expect the relation to hold at an aggregate level as well. Specifically, we test whether Zillow’s median price estimates Granger cause the median sales price as reported by Zillow across 30 MSAs in the US. The 30 MSAs were chosen based on Zillow’s MSA level report and the availability of individual sales price information.3 Table 1 lists the 30 MSAs and the summary statistics of the median sales price and Zillow estimates for three bedroom single family houses. The MSA level data is available at Zillow’s research division and we collect monthly data from October 2008 to April 2013.4 The following two subsections introduce the empirical methodology for the MSA level analysis, and the third subsection presents empirical results.

3 Section 4 describes the selection of MSAs in more detail.
4 The data is available at http://www.zillow.com/blog/research/data/
2.1 Granger Causality in VAR
For each MSA, we denote Zillow’s median log house price estimate at time $t$ by $Z_t$. The median log sales price at time $t$ is denoted by $Y_t$. We assume that they jointly follow the $p$-th order vector autoregressive (VAR($p$)) process:

\[
\begin{pmatrix} Z_t \\ Y_t \end{pmatrix}
=
\begin{pmatrix} A_{0,1} \\ A_{0,2} \end{pmatrix}
+ \sum_{q=1}^{p}
\begin{pmatrix} A_{q,1,1} & A_{q,1,2} \\ A_{q,2,1} & A_{q,2,2} \end{pmatrix}
\begin{pmatrix} Z_{t-q} \\ Y_{t-q} \end{pmatrix}
+
\begin{pmatrix} \varepsilon_{t,1} \\ \varepsilon_{t,2} \end{pmatrix}
\tag{2.1}
\]

We say that $Z_t$ does not Granger cause $Y_t$ if $A_{q,2,1} = 0$ for all $q = 1, \dots, p$. This null hypothesis can be tested with a Wald test on $(A_{1,2,1}, \dots, A_{p,2,1})$. Let $A_2 = (A_{0,2}, A_{1,2,1}, A_{1,2,2}, \dots, A_{p,2,1}, A_{p,2,2})'$ be the $(2p+1)$-dimensional vector of the coefficients in the second row of the VAR model (2.1). Let $\hat{A}_2$ denote its consistent estimate, and let $\hat{\Sigma}_{A_2}$ denote a consistent estimate of the variance matrix of the coefficient estimate $\hat{A}_2$. The Wald statistic is computed by

\[
\hat{W} = \hat{A}_2' R' \left( R \hat{\Sigma}_{A_2} R' \right)^{-1} R \hat{A}_2,
\]

where $R$ is the $p \times (2p+1)$ restriction matrix whose entry in the $2r$-th column is one for each row $r = 1, \dots, p$, and all the other elements are zero. Under the null hypothesis $H_0 : R A_2 = \vec{0}$, the Wald statistic $\hat{W}$ asymptotically follows the chi-square distribution with $p$ degrees of freedom. We report this statistic and the associated p-value for the test of Granger causality.
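The Wald test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are simulated rather than taken from Zillow, the variance matrix is the homoskedastic OLS estimate (the text only requires some consistent estimate), and the names `granger_wald` and `chi2_sf` are hypothetical helpers introduced here.

```python
import math
import numpy as np

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    half = x / 2.0
    if df % 2 == 1:
        k, tail = 1, math.erfc(math.sqrt(half))
    else:
        k, tail = 2, math.exp(-half)
    while k < df:  # recurrence on the regularised upper incomplete gamma function
        tail += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return tail

def granger_wald(z, y, p):
    """Wald test that the p lags of z do not enter the y equation of a VAR(p)."""
    T = len(y)
    # Regressors ordered as in A_2: constant, then (Z_{t-q}, Y_{t-q}) for q = 1..p.
    cols = [np.ones(T - p)]
    for q in range(1, p + 1):
        cols.append(z[p - q:T - q])
        cols.append(y[p - q:T - q])
    X = np.column_stack(cols)
    target = y[p:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    sigma2 = resid @ resid / (len(target) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)  # assumption: homoskedastic errors
    R = np.zeros((p, 2 * p + 1))
    for r in range(1, p + 1):
        R[r - 1, 2 * r - 1] = 1.0          # the 2r-th column in 1-based indexing
    Ra = R @ coef
    W = float(Ra @ np.linalg.solve(R @ cov @ R.T, Ra))
    return W, chi2_sf(W, p)

# Simulated example (not the paper's data): z feeds into y with a one-period lag.
rng = np.random.default_rng(0)
T = 300
z = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    z[t] = 0.6 * z[t - 1] + rng.normal()
    y[t] = 0.5 * y[t - 1] + 0.8 * z[t - 1] + rng.normal()

W, p_value = granger_wald(z, y, p=2)
print(f"Wald statistic = {W:.2f}, p-value = {p_value:.4f}")
```

A small p-value leads to rejecting the null that $Z_t$ does not Granger cause $Y_t$. With short monthly series, a heteroskedasticity-robust variance estimate may be a safer choice than the simple one used in this sketch.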
2.2 Model Selection
The choice of the order p of the VAR model (2.1) is somewhat arbitrary. Commonly used approaches to selecting p include the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). We conduct the hypothesis tests after selecting the order p of the VAR process by choosing the minimum AIC or BIC. However, we remark that these approaches have some drawbacks in terms of consistency of model selection and validity of post-selection inference.
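As an illustration of order selection by information criteria, the sketch below fits a bivariate VAR by least squares for each candidate order and picks the order that minimises each criterion. It assumes the common textbook forms AIC = log det(Σ̂) + 2k/T and BIC = log det(Σ̂) + k log(T)/T, where k is the number of estimated coefficients; the paper does not state which variant it uses. The function names and the simulated data are hypothetical.

```python
import numpy as np

def var_design(data, p, pmax):
    """Constant plus p lags of all series, using rows t = pmax..T-1 for every p."""
    T = data.shape[0]
    lags = [data[pmax - q:T - q] for q in range(1, p + 1)]
    return np.column_stack([np.ones(T - pmax)] + lags)

def select_var_order(data, pmax):
    """Return (order chosen by AIC, order chosen by BIC) among p = 1..pmax."""
    T_eff = data.shape[0] - pmax      # common sample so criteria are comparable
    target = data[pmax:]
    n_series = data.shape[1]
    aic, bic = {}, {}
    for p in range(1, pmax + 1):
        X = var_design(data, p, pmax)
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ coef
        sigma = resid.T @ resid / T_eff
        _, logdet = np.linalg.slogdet(sigma)
        k = n_series * X.shape[1]     # number of estimated coefficients
        aic[p] = logdet + 2.0 * k / T_eff
        bic[p] = logdet + k * np.log(T_eff) / T_eff
    return min(aic, key=aic.get), min(bic, key=bic.get)

# Simulated stationary VAR(2), purely for illustration.
rng = np.random.default_rng(1)
T = 400
A1 = np.array([[0.5, 0.0], [0.2, 0.3]])
A2 = np.array([[-0.4, 0.0], [0.0, -0.4]])
data = np.zeros((T, 2))
for t in range(2, T):
    data[t] = A1 @ data[t - 1] + A2 @ data[t - 2] + rng.normal(size=2)

aic_order, bic_order = select_var_order(data, pmax=6)
print("AIC order:", aic_order, "BIC order:", bic_order)
```

Because the BIC penalty per coefficient exceeds the AIC penalty once log(T) > 2, the BIC never selects a larger order than the AIC on the same sample.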
A recently popular method of model selection in econometrics is the least absolute shrinkage and selection operator (LASSO: Tibshirani, 1996). In particular, the adaptive LASSO (Zou, 2006) enjoys the oracle property as well as consistency of the model selection. This method works as follows. Let $\hat{A}$ denote a preliminary consistent estimate of the parameters $A$ in model (2.1) without model restriction, e.g., the least squares estimate under a choice of large order $p$. The adaptive LASSO estimate $\hat{A}^*$ is obtained by solving the $L_1$-penalized least squares problem

\[
\hat{A}^* = \arg\min_{A} \sum_{t=p+1}^{T}
\left\|
\begin{pmatrix} Z_t \\ Y_t \end{pmatrix}
- \begin{pmatrix} A_{0,1} \\ A_{0,2} \end{pmatrix}
- \sum_{q=1}^{p}
\begin{pmatrix} A_{q,1,1} & A_{q,1,2} \\ A_{q,2,1} & A_{q,2,2} \end{pmatrix}
\begin{pmatrix} Z_{t-q} \\ Y_{t-q} \end{pmatrix}
\right\|^2
+ \lambda_T \sum_{r=1}^{2} \left(
\frac{|A_{0,r}|}{|\hat{A}_{0,r}|^{\kappa}}
+ \sum_{c=1}^{2} \sum_{q=1}^{p} \frac{|A_{q,r,c}|}{|\hat{A}_{q,r,c}|^{\kappa}}
\right).
\]
The theory (Zou, 2006) requires that the tuning parameters $\kappa > 0$ and $\lambda_T > 0$ satisfy the asymptotic orders $\lambda_T / \sqrt{T} \to 0$ and $\lambda_T T^{(\kappa-1)/2} \to \infty$ as $T \to \infty$. In practice, however, $T$ is fixed given a finite sample, and thus this asymptotic guideline may not be useful. Therefore, we present empirical estimation results for each of several values of the tuning parameter, and examine their robustness.
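To make the adaptive LASSO objective concrete, the following sketch solves it for a single equation by coordinate descent. Since both the squared norm and the penalty are sums over the rows r, the problem separates across the two equations, so treating the Y equation on its own is enough for illustration. The data are simulated, the tuning values are arbitrary, and the helpers `lagged_design`, `adaptive_lasso` and `implied_order` are hypothetical names, not the authors' implementation.

```python
import numpy as np

def lagged_design(z, y, pmax):
    """Columns: constant, then (Z_{t-q}, Y_{t-q}) for q = 1..pmax."""
    T = len(y)
    cols = [np.ones(T - pmax)]
    for q in range(1, pmax + 1):
        cols += [z[pmax - q:T - q], y[pmax - q:T - q]]
    return np.column_stack(cols), y[pmax:]

def soft_threshold(value, cut):
    return np.sign(value) * max(abs(value) - cut, 0.0)

def adaptive_lasso(X, y, lam, kappa, n_sweeps=200):
    """Minimise ||y - X b||^2 + lam * sum_j |b_j| / |b_ols_j|**kappa."""
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)  # preliminary consistent estimate
    weights = 1.0 / np.abs(b_ols) ** kappa
    b = b_ols.copy()
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for j in range(X.shape[1]):
            partial = y - X @ b + X[:, j] * b[j]   # residual excluding regressor j
            rho = X[:, j] @ partial
            b[j] = soft_threshold(rho, lam * weights[j] / 2.0) / col_ss[j]
    return b

def implied_order(b):
    """Largest lag q with a non-zero coefficient on Z_{t-q} or Y_{t-q}."""
    lag_coefs = b[1:].reshape(-1, 2)
    nonzero = np.flatnonzero(np.any(lag_coefs != 0.0, axis=1))
    return int(nonzero[-1]) + 1 if nonzero.size else 0

# Simulated data: only the first lags matter in the y equation.
rng = np.random.default_rng(0)
T = 300
z = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    z[t] = 0.6 * z[t - 1] + rng.normal()
    y[t] = 0.5 * y[t - 1] + 0.8 * z[t - 1] + rng.normal()

X, target = lagged_design(z, y, pmax=6)
for lam in (0.5, 1.0, 2.0):  # illustrative tuning values only
    b = adaptive_lasso(X, target, lam=lam, kappa=1.0)
    print(f"lambda = {lam}: implied lag order = {implied_order(b)}")
```

Reporting the selected order for several tuning values, as in the loop above, mirrors the robustness check described in the text.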
2.3 Results from the MSA level analysis
Table 2 presents the tests of Granger causality of the median sales price $Y_t$ by Zillow’s median price estimates $Z_t$ for each MSA. The results are based on the order p that is best under each selection criterion, with the maximum p set to 10. Column (1) presents results when the optimal p is chosen using the minimum AIC. The optimal lag ranges from 7 to 10, with 10 being the most common. In all 30 cities but one, Boston, the joint hypothesis that all the coefficients on $Z_{t-q}$ are zero is rejected at the 10 percent level. Column (2) uses the minimum BIC to choose p, and the joint hypothesis tests reject the null for all cities except Boston, Denver, and Philadelphia. Results from our preferred LASSO method with tuning parameters of 0.5, 1.0, and 2.0 are presented in columns (3) through (5). The optimal p tends to be smaller in these columns than in columns (1) and (2), but the
general results are similar. Other than in three to five cities, we reject the null hypothesis that the online price estimates do not Granger cause sales prices. The last row of Table 2 presents results for the entire US. The selected lag orders are 5 and 6 in the LASSO models of columns (4) and (5) and 10 in the other models. The p-values imply that Zillow’s median price estimates Granger cause actual sales prices at the national level in all five models. The MSA and national level aggregate results indicate that real estate price information may impact actual transaction prices.
3 The Methods of Micro Level Analysis
Our analysis based on aggregate data and the VAR model in the previous section is limited in its ability to support causal inference. In order to convincingly illustrate the relationship between online price estimates and sales prices, we seek consistent and robust evidence based on microeconometric analysis using house-level data. This section presents a model and empirical methods to this end.
3.1 The Extended Hedonic Model
We propose a microeconometric method that estimates the impact of real estate price information on sales prices. Specifically, we extend the traditional hedonic framework to one that incorporates the potential effects of real estate price information, in particular, the individual house level price estimate provided by Zillow. The following is a list of economic factors that may potentially affect transaction prices for house $i$ in a neighborhood $N_i$:

• $X_i$: A vector of house-specific amenities including: lot size, square footage, number of bedrooms, number of bathrooms, and year built.
• $U_i$: The value of unobserved house-specific amenities including: floor plans and appliances.
• $V_{N_i}$: The value of unobserved neighborhood-specific amenities including: public schools, crime, curb appeal, environmental quality, and other public services.
• $Z_i$: Home price information, i.e. Zillow’s price estimates, that real estate market participants can observe.

The standard hedonic pricing models forecast the transaction price as follows:

\[
Y_i = \underbrace{\alpha + X_i \beta}_{\text{Regression}}
+ \underbrace{\overbrace{V_{N_i}}^{\text{Residual 1}} + \overbrace{U_i}^{\text{Residual 2}} + \overbrace{\varepsilon_i}^{\text{Residual 3}}}_{\text{Residual}}.
\tag{3.1}
\]
For the purpose of elucidating the problem that we face in our study, we decompose the usual residual into three components, the first one reflecting the value VNi of neighborhood-specific amenities, the second one reflecting the value Ui of house-specific amenities, and the third one representing idiosyncratic errors εi . The standard hedonic pricing model (3.1) assumes that sellers and/or buyers take the vector of house-specific amenities (Xi , Ui ) and the value of neighborhood-specific amenities VNi into account when making decisions about transaction prices Yi in the equilibrium. Econometricians estimate the reduced-form marginality coefficient β, called contributory values, for the part Xi of the house-specific amenities that they can observe in data. We hypothesize that agents may also take into account the home price information Zi , the one that is produced by real estate information providers like Zillow, when proposing to set transaction prices. This hypothesis may reflect that both buyers and sellers may not be so confident of their own home evaluation based on the information of the house and the neighborhood, and therefore tend to use the measure Zi provided by third parties. In this light, we propose an extended reduced-form equilibrium pricing model simply as the convex combination of outside and self valuations:
\[
Y_i = \gamma Z_i + (1 - \gamma)\left[\alpha + X_i \beta + V_{N_i} + U_i + \varepsilon_i\right].
\tag{3.2}
\]
The expression in the square brackets in the second term, $\alpha + X_i \beta + V_{N_i} + U_i + \varepsilon_i$, constitutes the factors used in the traditional hedonic pricing model (3.1). Further, we add the first term $\gamma Z_i$ to reflect the potential effects of the home price information $Z_i$ on transaction prices $Y_i$. As such, the parameter $\gamma$ may be interpreted as the degree of agents’ reliance on the third-party information. Our null hypothesis that the home price estimates $Z_i$ do not impact the actual transaction prices is thus represented by the equality $\gamma = 0$, which is readily testable once a $\sqrt{N}$-consistent estimate of $\gamma$ is obtained.

The OLS estimators of the parameters $\alpha$, $\beta$, and $\gamma$ would be consistent if $(V_{N_i}, U_i)$ were mean independent of both $Z_i$ and $X_i$. However, this mean independence assumption is hard to justify for at least two reasons. First, the unobserved house-specific amenities $U_i$ are likely to be correlated with the observed house-specific amenities $X_i$. Second, and more importantly in our study, the introduction of $Z_i$ in the extended pricing model (3.2) creates another source of endogeneity. To see this, it may help to think of how the home price information $Z_i$ is generated by real estate information providers. Although these service agencies do not disclose their formulas, the estimates $Z_i$ are constructed using the recent transaction data in the neighborhood $N_i$ of house $i$. (See Section A.1 in the appendix for the case of Zillow.) As such, the statistical independence $Z_i \perp\!\!\!\perp V_{N_i}$ between the price estimate and unobserved neighborhood characteristics, or the corresponding mean independence, will probably not hold even if we control for the observed house-specific amenities $X_i$. We therefore propose two approaches to handle these two sources of endogeneity in the subsequent sections.
3.2 Proxy Variable
To control for the endogenous unobserved house-specific amenities $U_i$, we follow the proxy variable approach often taken in the production literature (Olley and Pakes, 1992; Levinsohn and Petrin, 2003), which is formalized by Wooldridge (2009). Specifically, we construct a proxy variable using listing prices, denoted by $L_i$. Sellers can perceive house-specific amenities $U_i$ that econometricians cannot observe. They may add these values to benchmark hedonic valuations $H_i$ when proposing their listing prices, i.e.,

\[
L_i = H_i + g(U_i).
\]
List prices may differ from the online hedonic estimates $H_i$ for various reasons. List prices tend to start high since the seller predicts that the negotiation process will ultimately result in a lower sales price. How quickly the seller needs to sell the property could also impact the list price. The function $g$ thus captures the seller’s adjustment of the self-valuation of $U_i$. Note that the identity function $g(u) = u$ implies that there is no markup or markdown in the listing prices relative to the observed and unobserved value of the house.5 Finally, to bring this structure into the estimation of the parameters, we assume that $g$ is strictly increasing so that its inverse $g^{-1}$ exists. With this inverse function, we can recover the unobserved house-specific amenities $U_i$ by

\[
U_i = g^{-1}(L_i - H_i).
\]

Substituting this expression in (3.2) yields

\[
\begin{aligned}
Y_i &= \gamma Z_i + (1 - \gamma)\left[\alpha + X_i \beta + V_{N_i} + g^{-1}(L_i - H_i) + \varepsilon_i\right] \\
    &= \gamma Z_i + \tilde{\alpha} + X_i \tilde{\beta} + \tilde{\gamma} V_{N_i} + \tilde{g}(L_i - H_i) + \tilde{\varepsilon}_i,
\end{aligned}
\tag{3.3}
\]
where $\tilde{\alpha} := (1 - \gamma)\alpha$, $\tilde{\beta} := (1 - \gamma)\beta$, $\tilde{\gamma} := (1 - \gamma)$, $\tilde{g} := (1 - \gamma) g^{-1}(\cdot)$, and $\tilde{\varepsilon}_i := (1 - \gamma)\varepsilon_i$ are shorthand notations. This operation removes one of the two sources of endogeneity, namely $U_i$, and it thus remains to handle the other unobserved variable $V_{N_i}$. To estimate the parameters in the presence of the additive nonparametric function $\tilde{g}$, provided $V_{N_i}$ is known, we can use the estimator of Robinson (1988), which Olley and Pakes (1992) and Levinsohn and Petrin (2003) use for the similar purpose of handling proxy variables nonparametrically.

5 We find that initial list prices are higher than the prior hedonic estimates in about 68 percent and lower in about 32 percent of the observations in our sample.
This method works as follows. If the mean independence $E[\varepsilon_i \mid U_i] = 0$ is true, then

\[
E[Y_i \mid \tilde{U}_i] = \gamma E[Z_i \mid \tilde{U}_i] + \tilde{\alpha} + E[X_i \mid \tilde{U}_i]\tilde{\beta} + \tilde{\gamma} E[V_{N_i} \mid \tilde{U}_i] + \tilde{g}(\tilde{U}_i)
\]

follows, where $\tilde{U}_i = L_i - H_i$ is shorthand notation. Subtracting this from (3.3), we obtain

\[
Y_i - E[Y_i \mid \tilde{U}_i] = \gamma (Z_i - E[Z_i \mid \tilde{U}_i]) + (X_i - E[X_i \mid \tilde{U}_i])\tilde{\beta} + \tilde{\gamma}(V_{N_i} - E[V_{N_i} \mid \tilde{U}_i]) + \tilde{\varepsilon}_i.
\]

If the contributory value $V_{N_i}$ of neighborhood $N_i$ were observed, then $\gamma$ may be $\sqrt{N}$-consistently estimated by the OLS of $Y_i - E[Y_i \mid L_i - H_i]$ on $Z_i - E[Z_i \mid L_i - H_i]$, $X_i - E[X_i \mid L_i - H_i]$, and $V_{N_i} - E[V_{N_i} \mid L_i - H_i]$, where the nonparametric regressions $E[Y_i \mid L_i - H_i]$, $E[Z_i \mid L_i - H_i]$, $E[X_i \mid L_i - H_i]$ and $E[V_{N_i} \mid L_i - H_i]$ are pre-estimated using the kernel method.
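The two-step procedure above can be illustrated with a small simulation. The sketch below is not the authors' code: it uses a Gaussian Nadaraya-Watson smoother with a hand-picked bandwidth, sets the neighborhood term to zero so that only the house-specific unobservable matters, and assumes a linear $g$ and a quadratic $\tilde{g}$ purely for illustration. All names (`nw_fit`, `robinson`) and parameter values are hypothetical.

```python
import numpy as np

def nw_fit(u, target, bandwidth):
    """Nadaraya-Watson estimate of E[target | u] at each sample point."""
    diff = (u[:, None] - u[None, :]) / bandwidth
    weights = np.exp(-0.5 * diff ** 2)            # Gaussian kernel
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ target

def robinson(y, regressors, u, bandwidth):
    """OLS of kernel-demeaned y on kernel-demeaned regressors."""
    y_res = y - nw_fit(u, y, bandwidth)
    x_res = regressors - nw_fit(u, regressors, bandwidth)
    coef, *_ = np.linalg.lstsq(x_res, y_res, rcond=None)
    return coef

# Simulated houses; every number below is an illustrative assumption.
rng = np.random.default_rng(2)
n, gamma, beta = 1000, 0.7, 0.2
u = rng.normal(size=n)                # unobserved house attribute U_i
proxy = 2.0 * u                       # L_i - H_i = g(U_i), assuming g(u) = 2u
z = 0.8 * u + rng.normal(size=n)      # online estimate, correlated with U_i
x = 0.5 * u + rng.normal(size=n)      # observed attribute, correlated with U_i
y = gamma * z + beta * x + u + 0.5 * u ** 2 + 0.5 * rng.normal(size=n)

# Naive OLS of y on (1, z, x) ignores U_i and is biased.
naive, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), z, x]), y, rcond=None)
gamma_naive = naive[1]

# Partially linear estimator using the proxy L_i - H_i.
gamma_robinson = robinson(y, np.column_stack([z, x]), proxy, bandwidth=0.4)[0]
print(f"true gamma = {gamma}, naive OLS = {gamma_naive:.3f}, "
      f"Robinson = {gamma_robinson:.3f}")
```

In this simulated design the naive estimate is pushed well above the true value because the online estimate is positively correlated with the omitted attribute, while the proxy-based estimate stays close to it. In applied work the bandwidth would be chosen by a data-driven rule rather than fixed by hand.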
3.3 Local First Differencing
The previous section introduced a way to control for house-specific unobservables $U_i$, provided that the contributory value $V_{N_i}$ of neighborhood $N_i$ is observed. If we have multiple observations per neighborhood, however, we do not need to observe $V_{N_i}$, since we can take first differences within a neighborhood to eliminate the $V_{N_i}$ terms. Note that $N_i = N_j$ clearly implies $V_{N_i} = V_{N_j}$. Hence, we can take the difference of (3.3) between two properties $i$ and $j$ to obtain the equation

\[
Y_i - Y_j = \gamma (Z_i - Z_j) + (X_i - X_j)\tilde{\beta} + \tilde{g}(L_i - H_i) - \tilde{g}(L_j - H_j) + \tilde{\varepsilon}_i - \tilde{\varepsilon}_j
\tag{3.4}
\]

for any pair $(i, j)$ such that $N_i = N_j$, i.e., within the same neighborhood. This operation, mechanically identical to the method of first differencing for panel data analysis, removes the neighborhood fixed effect $V_{N_i}$. For this sort of first-differenced partially linear equation, Li and Stengos (1996) extend Robinson’s method (see the previous section). Specifically, $\gamma$ may be $\sqrt{N}$-consistently estimated by the OLS of $Y_i - Y_j - E[Y_i - Y_j \mid L_i - H_i, L_j - H_j]$ on $Z_i - Z_j - E[Z_i - Z_j \mid L_i - H_i, L_j - H_j]$ and $X_i - X_j - E[X_i - X_j \mid L_i - H_i, L_j - H_j]$, where the nonparametric regressions $E[Y_i - Y_j \mid L_i - H_i, L_j - H_j]$, $E[Z_i - Z_j \mid L_i - H_i, L_j - H_j]$, and $E[X_i - X_j \mid L_i - H_i, L_j - H_j]$ are pre-estimated using the kernel method.6
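A simulated sketch of neighborhood first differencing combined with the partially linear estimator is given below. It assumes four houses per neighborhood, forms all within-neighborhood pairs (the text does not specify how pairs are formed), and uses a product Gaussian kernel on the pair of proxies with a hand-picked bandwidth. The names and the data-generating values are hypothetical and chosen only to show the mechanics.

```python
import numpy as np
from itertools import combinations

def nw_fit_2d(points, target, bandwidth):
    """Nadaraya-Watson fit with a product Gaussian kernel on two variables."""
    d0 = (points[:, None, 0] - points[None, :, 0]) / bandwidth
    d1 = (points[:, None, 1] - points[None, :, 1]) / bandwidth
    weights = np.exp(-0.5 * (d0 ** 2 + d1 ** 2))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ target

# Simulated data; every number below is an illustrative assumption.
rng = np.random.default_rng(3)
n_nbhd, per_nbhd, gamma, beta = 200, 4, 0.7, 0.2
nbhd = np.repeat(np.arange(n_nbhd), per_nbhd)
n = n_nbhd * per_nbhd
v = rng.normal(size=n_nbhd)[nbhd]               # unobserved neighborhood value V_{N_i}
u = rng.normal(size=n)                          # unobserved house value U_i
proxy = 2.0 * u                                 # L_i - H_i = g(U_i), assuming g(u) = 2u
z = 0.8 * u + 0.8 * v + rng.normal(size=n)      # online estimate depends on both
x = 0.5 * u + rng.normal(size=n)
y = gamma * z + beta * x + v + u + 0.5 * u ** 2 + 0.5 * rng.normal(size=n)

# Naive OLS in levels ignores both unobservables.
naive, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), z, x]), y, rcond=None)
gamma_naive = naive[1]

# All pairs (i, j) of houses within the same neighborhood.
pairs = np.array([(i, j) for g in range(n_nbhd)
                  for i, j in combinations(range(per_nbhd * g, per_nbhd * (g + 1)), 2)])
i_idx, j_idx = pairs[:, 0], pairs[:, 1]
dy = y[i_idx] - y[j_idx]                        # V_{N_i} cancels in every difference
dw = np.column_stack([z[i_idx] - z[j_idx], x[i_idx] - x[j_idx]])
points = np.column_stack([proxy[i_idx], proxy[j_idx]])

# Partial out the two proxies from the differenced variables, then run OLS.
dy_res = dy - nw_fit_2d(points, dy, bandwidth=0.6)
dw_res = dw - nw_fit_2d(points, dw, bandwidth=0.6)
coef, *_ = np.linalg.lstsq(dw_res, dy_res, rcond=None)
gamma_fd = coef[0]
print(f"true gamma = {gamma}, naive OLS = {gamma_naive:.3f}, "
      f"first-differenced = {gamma_fd:.3f}")
```

Pairs from the same neighborhood share houses, so their errors are correlated; standard errors in an actual application would need to account for that dependence, which this sketch does not attempt.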
4 Data
We collect house level data from Zillow, one of the major online real estate information providers. There are many websites providing real estate information. Zillow provides individual house price estimates called Zestimates. The estimates are available regardless of whether the property is on the market or not.7 Zillow does not disclose the formula that they use to generate their price estimates, but they mention that they use the physical attributes of the property, tax assessments, and prior and current transactions of the property itself and the comparable properties nearby (see Appendix A). In addition to their current house price estimates, Zillow provides their past estimates, current and past listing prices when available, the most recent sales price, and past sales prices when available. We collect the sales date and price, the Zillow estimate at the time of sale, the estimates one, two, three, and six months before the sale, the initial listing price, and historical sales and listing prices when available. In addition, a rich set of house specific and neighborhood/town specific information is available for $X_i$. We collect the address of the house, square footage, number of bedrooms, number of bathrooms, lot size, year built, and property tax. Zillow also provides nearby school names and the school ratings from GreatSchools.org.

In collecting a sample, we make sure to include multiple houses from each neighborhood for neighborhood first differencing. The following procedure is employed to collect our sample of house level data. We first choose 30 MSAs where Zillow provides both their price estimates and the sales price information.8 For each MSA we find 10 neighborhoods with median Zestimates closest to the MSA median Zestimate and collect data on four houses per neighborhood. If there are fewer than 10 neighborhoods in an MSA, we additionally collect four more houses from existing neighborhoods, starting with neighborhoods that have median Zestimates closest to the MSA median Zestimate. Within each neighborhood, we restrict the search to single family homes that are 2,000 sqft or above, have 3 bedrooms or more and 2 bathrooms or more, and were last sold between July 2012 and July 2013. For each neighborhood, we narrow down to houses that have Zestimates that are closest to its ZIP code median Zestimate and that list the same set of nearby public elementary schools. We then randomly select the first four houses that have non-missing information on Zestimate at time of sale, sales price, initial listing price, number of beds and baths, house square footage, lot size, and year built. We also record the number of skips for each MSA, if there were any skips due to missing information.9 This procedure returns 40 houses in each of the 30 MSAs for a total of 1,200 observations. Table 3 Panel A summarizes the characteristics of these houses.

We also collect information on the accuracy of Zillow’s estimates by MSA. This information is provided by Zillow Research and is available online. The variables provided are the number of homes on Zillow, the number of homes with Zillow estimates, and the error margin between Zillow’s estimates and sales price. Furthermore, we collect MSA level population, land area, number of households, and educational attainment from the census. Table 3 Panel B presents the summary statistics for the additional MSA level variables.

6 Baltagi and Li (2002a,b) propose alternative methods to estimate first-differenced partially linear models with discussions on asymptotic properties of the estimators; they suggest that the nonparametric pre-estimations be done with series approximation instead of the kernel method in order to take advantage of the additivity between $\tilde{g}(L_i - H_i)$ and $\tilde{g}(L_j - H_j)$.
7 Currently there are many real estate websites. Many of these websites are brokerage websites where listing and selling of properties on the market is the main business model. These websites belong to the local multiple listing services (MLS), which are local associations where real estate agents share their property listings. Other websites, for example Zillow and Trulia, are not real estate brokerage firms but mainly serve as information providers to various parties interested in the real estate market. Their business model aims to gain a wide audience and profit from advertisement fees, not through brokerage fees.
8 We first refer to the MSA Zestimate accuracy file. The MSA file contains 30 MSAs. However, in a few MSAs sale prices were not reported and we replaced those MSAs with other MSAs where accuracy estimates and sales prices were available. The Zestimate accuracy file is accessible at http://www.zillow.com/howto/DataCoverageZestimateAccuracy.htm.
9 On average there were about 16 skips per MSA. We examined whether the number of skips impacts the distribution of γ estimates across cities by including the number of skips in the OLS regression in Table 7. Including the number of skips does not alter the coefficient estimate on education, and the coefficient estimate on the number of skips is statistically indistinguishable from zero.
5 Evidence from the Micro Level Analysis
5.1 The Impact of Real Estate Information on Sales Prices
We first implement our procedure on the full sample and then by MSA. Table 4 presents the full sample results. Each cell in columns (1) through (6) displays the estimate of γ. Columns (1) through (3) report results when we do not perform the neighborhood first differencing, so we are not controlling for the unobserved neighborhood component. Column (1) presents estimates when we do not proxy for the unobserved house specific characteristics, column (2) controls for the unobserved house specific characteristics by including a linear proxy, which is the difference between the listing price and the previous Zillow estimate, and column (3) uses the nonparametric nonlinear proxy in place of the linear proxy, as described in Section 3.2. Columns (4) through (6) report results for specifications that parallel columns (1) through (3) but are also based on the neighborhood first-differencing procedure in order to control for the unobserved neighborhood characteristics, as described in Section 3.3. Each row represents the combination of the Zillow estimate of interest, $Z_i$ in equation (3.2), and the Zillow estimate used in calculating the proxy variable, $H_i$ in equation (3.3). In columns (1) through (3), when the neighborhood unobservables are not controlled for, the estimate of γ ranges from 0.825 to 1.013. The estimates decrease when we control for the endogenous unobserved house-specific amenities $U_i$ using a proxy. In general, the estimates based on the nonparametric nonlinear proxy are smaller than the estimates based on the linear proxy. When we further control for the unobserved neighborhood specific amenity $V_{N_i}$ by first differencing observations within neighborhoods, the estimates of γ decrease relative to the parallel specifications in columns (1) through (3). Overall, the estimates indicate that controlling for unobserved house and neighborhood attributes decreases the estimated partial effects of Zillow’s online estimates on sales prices.
The estimates in column (6), which control for both $U_i$ and $V_{N_i}$, are our preferred estimates for γ. Columns (7) and (8) conduct hypothesis tests where the null hypotheses are γ=0 and γ=1. All estimates of γ are statistically different from zero at the 1 percent level. For some specifications, we cannot reject the null that γ=1 at the 5 percent level. The result that the estimates are close to one implies that online house price estimates may well be a major factor in determining sales prices.

If the underlying model indeed follows equation (3.2), so that the coefficient on the observable attributes $X_i$ is $(1 - \gamma)\beta$, then γ=1 implies that the estimated coefficients on $X_i$ would equal zero. We report test results on the joint hypothesis that all the coefficients on the observable attributes $X_i$ are zero. The p-values are reported in Table 5, where each column represents the same specification as described in Table 4. We focus on the column (5) and (6) results, which use proxy variables and implement the neighborhood first differencing. Even though we are testing a very restrictive hypothesis, we cannot reject the null hypothesis that all the beta coefficients are zero at the 5 percent level in any of the specifications in column (5). Similarly, in column (6), we cannot reject the null hypothesis that all the beta coefficients are zero in any of the specifications at the 1 percent level, and we cannot reject the null in all but one specification at the 5 percent level. Given the availability of online price estimates, participants in the real estate market no longer seem to rely on their own hedonic assessments of house or neighborhood attributes. Another interpretation of these results is that the scalar index, namely $Z_i$, is sufficient for the decisions of the market players. The multiple dimensions of the information, $X_i$, $U_i$ and $V_{N_i}$, are available for them to look at, but they are redundant given the scalar index $Z_i$. The results indicate that information is the prime driver of house prices.

The magnitude of one for γ may initially seem surprising. However, once one thinks about how property transactions are made, this finding may seem less surprising.
A hedonic framework is often used to estimate the marginal valuation of one attribute in a composite good. It is an ex post revelation of what people’s marginal willingness to pay for an attribute is. For instance, we can back out the marginal value of a bedroom from a hedonic regression. However, the market participant, be it the seller, buyer, or broker, would rarely use the estimates from a hedonic method to price the composite good, the house. One of the most common methods to value
property is to use recent sales prices of comparable properties in the neighborhood and then to make marginal adjustments based on the different attributes of the houses. Hence, the finding that online third party price estimates can serve as an important determinant of sales prices is not that surprising.

We next examine how the estimates of γ vary across MSAs. Table 6 presents the impact of one month prior Zillow estimates on sales prices in each of the 30 MSAs using various benchmark online estimates in calculating the nonlinear proxy variable. For each estimate we conduct hypothesis tests where the null hypotheses are γ=0 and γ=1. Focusing on the column (2) results, which use 2 month prior online estimates in the nonlinear proxy, all estimates are statistically different from zero at the 10 percent level except for Charlotte and Riverside. The estimates vary considerably across MSAs, ranging from 0.377 in Riverside to 1.276 in Nashville. Furthermore, many of the estimates are statistically indistinguishable from one even at the MSA level. Using 3 month or 6 month prior Zillow estimates in the nonlinear proxy returns similar results across the different MSAs.10
5.2 What Explains the Variation in the Elasticity Estimates?
What might explain the variation in the value of information in the real estate market, and hence in the elasticity estimates across MSAs? The size of the housing market, the availability of third-party estimates, or the precision of the estimates could affect people's reliance on online house price estimates. One could also hypothesize that the education level or income level of the population affects the use of online real estate information. A simple correlation analysis finds that the educational attainment of the MSA population, specifically the share of the population aged 25 or above with a bachelor's degree or higher, explains the variation in the γ estimates. Figure 1 presents a scatter plot of the two variables. The γ estimates in the figure are from the specification used in Table 6 column (1). The positive correlation is stark. MSAs with a higher share of the population with a college education or above tend to value online real estate price estimates more.
Table 7 examines the statistical significance of this relation in an OLS regression. Column (1) presents the bivariate regression. The coefficient estimate on the educational attainment variable is 1.82 and statistically significant at the 1 percent level. A 10 percentage point increase in the share with a bachelor's degree or above is associated with a γ estimate that is higher by about 0.18. The coefficient estimates on the education variable are robust to additionally including the set of MSA size variables in column (2) or the set of Zillow coverage variables in column (3). In column (4), we add the median error between Zillow estimates and sales prices, and the coefficient estimate on education barely changes. In column (5), we control for a categorical variable used by Zillow to indicate how accurate its estimates are. The variable ranges from one to four, with four indicating the most accurate and two the least accurate. Category one is for cities where accuracy is unknown. The coefficient estimate on education drops slightly and is statistically significant at the 10 percent level. Finally, in column (6) we additionally include median family income. Median family income is highly correlated with the share of college educated or above. The coefficient estimate on the college share variable increases but the standard error increases as well, most likely due to the collinearity with income. Overall, Table 7 indicates that real estate sales prices are more sensitive to online price estimates in MSAs with a more educated population.
¹⁰ Appendix Table 1 presents the MSA results with the different specifications used in Table 4.
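The magnitude of the education effect can be worked through directly from the column (1) coefficient and standard error in Table 7. The following is a minimal sketch, not the authors' computation: it assumes a normal approximation for the confidence interval (with only 30 MSAs a t-based interval would be somewhat wider), and `effect_with_interval` is a hypothetical helper name.

```python
from statistics import NormalDist


def effect_with_interval(coef, std_err, delta, level=0.95):
    """Implied change in the gamma estimate for a change `delta` in the education
    share, with a normal-approximation confidence interval (an assumption)."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    point = coef * delta
    half_width = z_crit * std_err * abs(delta)
    return point, point - half_width, point + half_width


# Table 7, column (1): coefficient 1.820 with robust standard error 0.484.
# delta = 0.10 corresponds to a 10 percentage point higher bachelor-degree share.
point, low, high = effect_with_interval(1.820, 0.484, 0.10)
print(f"implied change in gamma: {point:.3f} (95% interval {low:.3f} to {high:.3f})")
```

The point value matches the "about 0.18" quoted in the text; the interval is only indicative, since it inherits the approximation stated above.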
6 Conclusion
We investigate how sensitive sales prices are to online price estimates in the real estate market. For our micro data analysis, we propose a reduced-form equilibrium pricing model in which the price is a convex combination of a third-party price estimate and the self valuation of the property. Our method nonparametrically proxies for unobservable house attributes using the difference between the listing price and the online price estimate, and controls for unobserved neighborhood attributes by neighborhood first differencing. We collect house price estimates, sales prices, and list prices, in addition to various house and neighborhood attributes, from Zillow.com across 30 MSAs in the US. The empirical results obtained in this paper show that the elasticity of sales price with respect to the Zillow estimate is one. In addition, we find that the population share of college educated and above significantly explains the variation in the elasticity estimates across MSAs, although this simple correlation analysis is not the main feature of our analysis.
The literature has found that information impacts asset prices, in particular in the securities market. We add to this literature evidence that information is valued to a large extent in the real estate market as well. Our finding of unit elasticity between sales prices and online price estimates in the real estate market may have significant implications. If information is more important than fundamentals in determining real estate prices, then how information is generated could have a big impact on real estate price dynamics. One may conjecture that the prevalence of online real estate information, and the reliance on such information, contributed partially to the recent boom and bust in the real estate market.
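The link between γ = 1 and zero coefficients on the observable attributes can be illustrated with a small simulation. This is a sketch under stated assumptions, not the paper's estimator: it assumes the simplified form Y = γZ + (1 − γ)βX + noise with a single attribute X, omits the house-level and neighborhood-level unobservables that the paper handles with proxies and first differencing, and uses made-up parameter values. The function name `simulate_and_fit` is hypothetical.

```python
import random


def simulate_and_fit(gamma, beta=2.0, n=5000, seed=0):
    """Simulate Y = gamma*Z + (1 - gamma)*beta*X + noise (an assumed, simplified
    model) and return the no-intercept OLS coefficients on Z and X."""
    rng = random.Random(seed)
    szz = szx = sxx = szy = sxy = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)             # observed attribute X_i
        z = beta * x + rng.gauss(0.0, 1.0)  # online estimate Z_i: hedonic value plus its own error
        y = gamma * z + (1.0 - gamma) * beta * x + rng.gauss(0.0, 0.1)
        szz += z * z
        szx += z * x
        sxx += x * x
        szy += z * y
        sxy += x * y
    # Solve the 2x2 normal equations for the coefficients on Z and X
    det = szz * sxx - szx * szx
    coef_z = (szy * sxx - sxy * szx) / det
    coef_x = (sxy * szz - szy * szx) / det
    return coef_z, coef_x


for g in (1.0, 0.5):
    coef_z, coef_x = simulate_and_fit(g)
    print(f"true gamma = {g}: coefficient on Z = {coef_z:.3f}, coefficient on X = {coef_x:.3f}")
```

When the simulated γ equals one, the fitted coefficient on X is close to zero, which mirrors the logic behind the joint test reported in Table 5; with γ = 0.5 the coefficient on X is close to (1 − γ)β = 1.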
References
Bajari, Patrick, Jane Cooley, Kyoo il Kim, and Christopher Timmins. 2012. “A Rational Expectations Approach to Hedonic Price Regressions with Time-Varying Unobserved Product Attributes: The Price of Pollution,” American Economic Review, 102(5): 1898-1926.
Baltagi, Badi and Dong Li. 2002a. “Series Estimation of Partially Linear Panel Data Models with Fixed Effects,” Annals of Economics and Finance, 3: 103-116.
Baltagi, Badi and Qi Li. 2002b. “On Instrumental Variable Estimation of Semiparametric Dynamic Panel Data Models,” Economics Letters, 76: 1-9.
Bayer, Patrick, Fernando Ferreira, and Robert McMillan. 2007. “A Unified Framework for Measuring Preferences for Schools and Neighborhoods,” Journal of Political Economy, 114(4): 588-638.
Black, Sandra. 1999. “Do Better Schools Matter? Parental Valuation of Elementary Education,” Quarterly Journal of Economics, 114(2): 577-599.
Campbell, John Y., Stefano Giglio, and Parag Pathak. 2011. “Forced Sales and House Prices,” American Economic Review, 101: 2108-2131.
Chay, Kenneth and Michael Greenstone. 2005. “Does Air Quality Matter? Evidence from the Housing Market,” Journal of Political Economy, 113(2): 376-424.
Easley, David, Soeren Hvidkjaer, and Maureen O’Hara. 2002. “Is Information Risk a Determinant of Asset Returns?” Journal of Finance, 57(5): 2185-2221.
Easley, David and Maureen O’Hara. 1987. “Price, Trade Size, and Information in Securities Markets,” Journal of Financial Economics, 19: 69-90.
Easley, David and Maureen O’Hara. 2004. “Information and the Cost of Capital,” Journal of Finance, 59(4): 1553-1583.
Levinsohn, James and Amil Petrin. 2003. “Estimating Production Functions Using Inputs to Control for Unobservables,” Review of Economic Studies, 70: 317-341.
Levitt, Steven D. and Chad Syverson. 2008. “Market Distortions When Agents Are Better Informed: The Value of Information In Real Estate Transactions,” Review of Economics and Statistics, 90(4): 599-611.
Li, Qi and Thanasis Stengos. 1996. “Semiparametric Estimation of Partially Linear Panel Data Models,” Journal of Econometrics, 71: 389-397.
Olley, Steven and Ariel Pakes. 1992. “The Dynamics of Productivity in the Telecommunications Equipment Industry,” Econometrica, 64: 1263-1297.
Robinson, Peter. 1988. “Root-N-Consistent Semiparametric Regression,” Econometrica, 56: 931-954.
Rosen, Sherwin. 1974. “Hedonic Prices and Implicit Markets: Product Differentiation in Pure Competition,” Journal of Political Economy, 82(1): 34-55.
Stiglitz, Joseph E. 2000. “The Contributions of The Economics of Information to Twentieth Century Economics,” The Quarterly Journal of Economics.
Tibshirani, Robert. 1996. “Regression Shrinkage and Selection via the Lasso,” Journal of the Royal Statistical Society: Series B, 58: 267-288.
Wooldridge, Jeffrey M. 2009. “On Estimating Firm-Level Production Functions Using Proxy Variables to Control for Unobservables,” Economics Letters, 104: 112-114.
Zou, Hui. 2006. “The Adaptive Lasso and Its Oracle Properties,” Journal of the American Statistical Association, 101: 1418-1429.
A Appendix
A.1 Zillow estimates
Zillow does not disclose the formula for how their hedonic price estimates are produced, but they mention which data they use. According to their website, some of the data that they use include:
• Physical attributes: Location, lot size, square footage, number of bedrooms and bathrooms and many other details.
• Tax assessments: Property tax information, actual property taxes paid, exceptions to tax assessments and other information provided in the tax assessors’ records.
• Prior and current transactions: Actual sale prices over time of the home itself and comparable recent sales of nearby homes.
Table 1. The list of Metropolitan Statistical Areas and the summary statistics of variables used in the MSA level VAR analysis

                          Median Sales Price                   Median Zillow Estimates
MSA                       Oct. 2008   Oct. 2010   Oct. 2012    Oct. 2008   Oct. 2010   Oct. 2012
Atlanta                      194240      148650      154635       155200      131700      112600
Baltimore                    270735      274823      259375       255700      230000      220800
Boston                       321025      331100      322985       323300      313500      315000
Charlotte                    176225      180775      176900       150600      137200      136000
Chicago                      238100      210125      195925       221000      183900      161600
Cincinnati                   136300      127400      123600       147700      143900      141460
Columbus                     154200      148200      157175       137200      128900      126700
Denver                       239645      240925      246325       213600      208200      224400
Las Vegas                    226500      138105      132500       199100      127700      122500
Los Angeles                  458752      405300      400000       435300      415200      405600
Miami-Fort Lauderdale        240200      157523      161050       196400      143300      149800
Minneapolis                  220000      195965      195750       207500      180300      173400
Nashville                    160885      171750      171725       155100      148500      148400
New York                     399750      386652      366450       396400      364500      343100
Orlando                      200500      121250      130900       175500      127400      123700
Philadelphia                 224748      223575      213550       216300      199900      186500
Phoenix                      215933      149125      165250       189300      133700      154100
Pittsburgh                   126800      121800      131425       102800      105900      111100
Portland                     258036      239100      233750       259500      226300      226800
Providence                   242250      231225      207965       237600      227200      211000
Riverside                    286500      198900      202500       236300      193200      192000
Sacramento                   289950      228050      221750       267700      226400      217900
St. Louis                    163075      150825      154057       141200      135400      127100
San Diego                    401500      355750      358900       373900      364900      362800
San Francisco                572925      483250      480800       536600      499000      512400
San Jose                     606100      556425      564000       605700      561800      610400
Seattle                      330875      309225      296612       330400      278900      267300
Tampa                        168475      130196      124165       147000      117700      111700
Virginia Beach               220441      222750      215125       223100      210900      195700
Washington DC                348750      358245      339717       329500      307700      320200

Notes: The median sales price and Zillow estimates are for three bedroom single family houses.
Table 2. MSA level VAR results

                          (1) Minimum AIC   (2) Minimum BIC   (3) LASSO (0.5)   (4) LASSO (1.0)   (5) LASSO (2.0)
MSA                       p (df)   p-val    p (df)   p-val    df       p-val    df       p-val    df       p-val
Atlanta                     10     0.000       9     0.000      10     0.000       8     0.000      10     0.000
Baltimore                   10     0.000      10     0.000      10     0.000       9     0.000      10     0.006
Boston                      10     0.159      10     0.159      10     0.153      10     0.186      10     0.159
Charlotte                   10     0.010       9     0.017      10     0.010      10     0.009       6     0.367
Chicago                      9     0.000       9     0.000       9     0.000       9     0.000       9     0.000
Cincinnati                  10     0.001      10     0.001      10     0.000       9     0.000       9     0.000
Columbus                    10     0.000       5     0.000       9     0.000       2     0.276       7     0.000
Denver                      10     0.084       7     0.602      10     0.142      10     0.084       9     0.077
Las Vegas                   10     0.003      10     0.003       7     0.000       4     0.000       7     0.000
Los Angeles                  9     0.000       9     0.000      10     0.000       8     0.000       9     0.000
Miami-Fort Lauderdale       10     0.000      10     0.000      10     0.000      10     0.000      10     0.000
Minneapolis                  7     0.000       5     0.000      10     0.005      10     0.003       9     0.003
Nashville                   10     0.061      10     0.061       8     0.114       9     0.029       6     0.135
New York                    10     0.044       6     0.000       9     0.007      10     0.021      10     0.000
Orlando                     10     0.000      10     0.000      10     0.000      10     0.000      10     0.000
Philadelphia                10     0.003       4     0.239      10     0.003       9     0.010       9     0.001
Phoenix                     10     0.000       9     0.000      10     0.000       9     0.000      10     0.000
Pittsburgh                  10     0.000      10     0.000       9     0.000      10     0.002      10     0.000
Portland                    10     0.065      10     0.065       9     0.039       4     0.000       2     0.232
Providence                  10     0.000      10     0.000      10     0.000       9     0.000      10     0.000
Riverside                   10     0.000      10     0.000       8     0.000      10     0.000       8     0.000
Sacramento                  10     0.000      10     0.000       9     0.000      10     0.000       8     0.000
St. Louis                   10     0.014       9     0.011      10     0.025       8     0.007       6     0.002
San Diego                    9     0.000       6     0.000      10     0.000      10     0.000       9     0.000
San Francisco               10     0.000      10     0.000      10     0.042      10     0.005       9     0.000
San Jose                    10     0.000      10     0.000       9     0.000       8     0.000       4     0.000
Seattle                     10     0.000      10     0.000       9     0.000       8     0.000       9     0.000
Tampa                       10     0.000       8     0.000      10     0.000      10     0.000      10     0.000
Virginia Beach              10     0.034       4     0.054       9     0.021       4     0.566       3     0.415
Washington DC               10     0.000      10     0.000      10     0.000      10     0.000      10     0.000
United States               10     0.009      10     0.009      10     0.006       5     0.000       6     0.015

Notes: The analysis was performed on monthly data over the period between Oct. 2008 and Apr. 2013.
Table 3. Summary statistics

Variable                                                            Mean       Std. Dev.   Min       Max        Obs
Panel A: House level data
Sales price                                                         320977     243894      10000     2950000    1200
Zillow estimate when sold                                           322056     230858      36000     2600000    1200
Zillow estimate 1 month prior to sale                               319858     230901      38000     2600000    1200
Zillow estimate 1 month prior to sale                               320484     246462      37000     3.34E+06   1200
Zillow estimate 1 month prior to sale                               315433     230222      16100     2600000    1200
Zillow estimate 1 month prior to sale                               313439     239434      26000     3080000    1200
List price                                                          340065     250090      19900     2900000    1199
Number of bedrooms                                                  3.85       0.87        3         9          1200
Number of bathrooms                                                 2.68       0.64        2         6          1199
Square footage                                                      2373       547         2000      10890      1200
Year built                                                          1960       37          1810      2013       1200
Panel B: MSA level data
Percent of 25 and above population with bachelor degree or above    0.34       0.07        0.187     0.475      30
Median family income, 2007                                          69083      11047       51554     97095      30
Number of households, 2007                                          1353754    1348206     46675     6717007    30
Land area                                                           6040.7     4783.9      1600.9    27259.9    30
Population, 2008                                                    2763774    2836117     165829    1.29E+07   30
Houses on Zillow                                                    1332274    1043634     159506    4976087    30
Houses with Zillow Estimates                                        1248442    977717.8    155126    4542490    30
Zillow estimate accuracy                                            3.17       0.87        1         4          30
Table 4. The micro-econometric analysis: estimates of γ (effect of online price estimates on sales prices) for the pooled sample Estimates of γ (Effect of online price estimates on sales prices) Y - Sales Prices
No Local First Differencing
All 30 MSAs Pooled
No Proxy (1)
Linear Proxy (2)
Nonlinear Proxy (3)
Z - 1 Month Before H - 2 Months Before
1.002 (0.028)
Z - 1 Month Before H - 3 Months Before
Z - 1 Month Before H - 6 Months Before
1.013 (0.029)
Z - 2 Months Before H - 3 Months Before
Local First Differencing No Proxy (4)
Linear Proxy (5)
Nonlinear Proxy (6)
1.005 (0.031)
0.805 (0.120)
0.988 (0.027)
0.999 (0.032)
0.994 (0.029)
0.985 (0.031)
0.856 (0.117)
0.833 (0.142)
0.808 (0.125)
p-Value
Local First Diff Nonlinear Proxy
H0: γ = 0 (7)
H0: γ = 1 (8)
0.993 (0.047)
0.000
0.887
0.832 (0.113)
1.003 (0.041)
0.000
0.944
0.805 (0.120)
0.893 (0.053)
0.000
0.043
0.292 (0.201)
0.684 (0.225)
0.002
0.160
Z - 2 Months Before H - 6 Months Before
0.874 (0.121)
0.856 (0.119)
0.825 (0.133)
0.266 (0.190)
0.273 (0.192)
0.570 (0.203)
0.005
0.034
Z - 3 Months Before H - 6 Months Before
1.001 (0.031)
0.982 (0.299)
0.970 (0.031)
0.582 (0.100)
0.604 (0.095)
0.857 (0.055)
0.000
0.009
Notes: The γ estimates measure the impact of online price estimates on actual sales prices. Z is the online price estimate of interest and H is the online price estimate used in calculating the nonlinear proxy. Different numbers of months prior to sales were used for the different Z and H combinations. Standard errors are in parentheses.
Table 5. The micro-econometric analysis: tests of the joint hypothesis that β’ = 0 based on the Wald statistic using the pooled sample p-value for the joint hypothesis that β’ = 0 based on the Wald statistics Y - Sales Prices
No Local First Differencing Linear Proxy (2)
Nonlinear
Z - 1 Month Before H - 2 Months Before
0.015
Z - 1 Month Before H - 3 Months Before
All 30 MSAs Pooled
Z - 1 Month Before H - 6 Months Before
No Proxy (1)
0.019
Z - 2 Months Before H - 3 Months Before
Local First Differencing Linear Proxy (5)
Nonlinear
0.011
0.089
0.050
0.018
0.045
0.264
0.016
0.065
0.138
0.200
0.371
0.101
0.075
0.723
0.474
Proxy (3)
No Proxy (4)
0.078
Proxy (6)
Z - 2 Months Before H - 6 Months Before
0.024
0.131
0.181
0.300
0.522
0.874
Z - 3 Months Before H - 6 Months Before
0.003
0.009
0.005
0.036
0.134
0.291
Notes: The γ estimates measure the impact of online price estimates on actual sales prices. Z is the online price estimate of interest and H is the online price estimate used in calculating the nonlinear proxy. Different numbers of months prior to sales were used for the different Z and H combinations. Standard errors are in parentheses.
Table 6. The micro-econometric analysis: estimates of γ (effect of online price estimates on sales prices) by MSA
Local First Differencing with Nonparametric Proxy. Estimates of γ

                          Benchmark Hedonic Value:              Benchmark Hedonic Value:              Benchmark Hedonic Value:
                          2 Months Prior to Sales               3 Months Prior to Sales               6 Months Prior to Sales
                          Est of γ         p-value   p-value    Est of γ         p-value   p-value    Est of γ         p-value   p-value
                                           H0: γ=0   H0: γ=1                     H0: γ=0   H0: γ=1                     H0: γ=0   H0: γ=1
MSA                       (1)              (2)       (3)        (4)              (5)       (6)        (7)              (8)       (9)
Atlanta                   0.675 (0.149)    0.000     0.029      0.698 (0.153)    0.000     0.048      0.698 (0.154)    0.000     0.049
Baltimore                 0.773 (0.087)    0.000     0.009      0.748 (0.090)    0.000     0.005      0.793 (0.135)    0.000     0.126
Boston                    1.036 (0.089)    0.000     0.683      1.013 (0.126)    0.000     0.920      0.765 (0.240)    0.001     0.326
Charlotte                 0.170 (0.405)    0.675     0.040      0.168 (0.426)    0.693     0.051      0.401 (0.313)    0.200     0.056
Chicago                   0.922 (0.080)    0.000     0.328      0.927 (0.082)    0.000     0.377      0.890 (0.084)    0.000     0.192
Cincinnati                0.873 (0.084)    0.000     0.131      0.828 (0.082)    0.000     0.037      0.841 (0.098)    0.000     0.106
Columbus                  1.255 (0.171)    0.000     0.136      1.180 (0.186)    0.000     0.334      1.045 (0.183)    0.000     0.806
Denver                    1.065 (0.137)    0.000     0.635      0.849 (0.151)    0.000     0.319      0.491 (0.152)    0.001     0.001
Las Vegas                 0.583 (0.354)    0.100     0.239      0.475 (0.344)    0.167     0.127      0.431 (0.313)    0.169     0.069
Los Angeles               0.505 (0.208)    0.015     0.017      0.607 (0.258)    0.019     0.128      0.408 (0.129)    0.002     0.000
Miami-Fort Lauderdale     0.592 (0.211)    0.005     0.053      0.483 (0.226)    0.032     0.022      0.289 (0.188)    0.125     0.000
Minneapolis               1.019 (0.139)    0.000     0.890      1.019 (0.165)    0.000     0.909      1.003 (0.165)    0.000     0.984
Nashville                 1.276 (0.088)    0.000     0.002      1.156 (0.141)    0.000     0.271      1.199 (0.119)    0.000     0.094
New York                  1.201 (0.315)    0.000     0.519      1.168 (0.335)    0.000     0.615      1.018 (0.361)    0.005     0.961
Orlando                   0.865 (0.158)    0.000     0.395      1.025 (0.195)    0.000     0.897      0.827 (0.176)    0.000     0.327
Philadelphia              1.282 (0.258)    0.000     0.273      1.203 (0.252)    0.000     0.421      1.200 (0.245)    0.000     0.415
Phoenix                   1.131 (0.165)    0.000     0.427      1.184 (0.207)    0.000     0.376      1.080 (0.181)    0.000     0.658
Pittsburgh                1.022 (0.071)    0.000     0.755      0.994 (0.082)    0.000     0.942      0.939 (0.084)    0.000     0.471
Portland                  0.725 (0.146)    0.000     0.059      0.801 (0.138)    0.000     0.151      0.569 (0.197)    0.004     0.029
Providence                0.611 (0.239)    0.011     0.104      0.648 (0.225)    0.004     0.118      0.809 (0.197)    0.000     0.332
Riverside                 0.377 (0.263)    0.152     0.018      0.271 (0.261)    0.298     0.005      0.112 (0.254)    0.660     0.000
Sacramento                0.703 (0.247)    0.004     0.230      0.674 (0.245)    0.006     0.183      0.520 (0.238)    0.029     0.044
San Diego                 0.867 (0.164)    0.000     0.416      0.754 (0.158)    0.000     0.121      0.703 (0.148)    0.000     0.050
San Francisco             1.211 (0.311)    0.000     0.467      1.186 (0.305)    0.000     0.543      0.889 (0.369)    0.016     0.765
San Jose                  1.091 (0.193)    0.000     0.637      0.937 (0.217)    0.000     0.770      0.761 (0.242)    0.002     0.323
Seattle                   1.086 (0.146)    0.000     0.555      1.117 (0.137)    0.000     0.395      0.994 (0.144)    0.000     0.970
St. Louis                 0.853 (0.113)    0.000     0.190      0.951 (0.101)    0.000     0.630      0.778 (0.126)    0.000     0.079
Tampa                     1.223 (0.069)    0.000     0.001      1.229 (0.078)    0.000     0.003      1.203 (0.067)    0.000     0.002
Virginia Beach            0.434 (0.323)    0.179     0.080      0.521 (0.295)    0.077     0.105      0.405 (0.242)    0.094     0.014
Washington DC             0.986 (0.055)    0.000     0.797      0.994 (0.056)    0.000     0.920      0.968 (0.239)    0.000     0.895

Notes: The γ estimates measure the impact of one month prior online price estimates on actual sales prices. Online estimates two, three, and six months prior to sales were used in calculating the nonlinear proxy. Robust standard errors are in parentheses.
Table 7. Determinants of the γ estimate
Dependent variable: Estimate of γ (Effect of online price estimates on sales prices)

                                              (1)         (2)         (3)         (4)         (5)         (6)
Percent bachelor degree or higher             1.820***    1.858***    1.761**     1.816**     1.635*      1.924
                                              (0.484)     (0.603)     (0.722)     (0.839)     (0.906)     (1.396)
Ln(number of households 2007)                             -0.00422    -0.0250     -0.0218     -0.0281     -0.0397
                                                          (0.0354)    (0.0309)    (0.0537)    (0.0618)    (0.0791)
Ln(land area)                                             0.0219      -0.0553     -0.0484     -0.0578     -0.0642
                                                          (0.0979)    (0.124)     (0.129)     (0.134)     (0.142)
Ln(population 2008)                                       0.00957     -0.00256    -0.00372    0.00142     -0.00912
                                                          (0.0383)    (0.0475)    (0.0589)    (0.0635)    (0.0813)
Ln(number of houses on Zillow)                                        1.038       0.950       1.094       1.014
                                                                      (1.223)     (1.107)     (1.253)     (1.373)
Ln(number of houses with Zillow estimates)                            -0.958      -0.877      -1.007      -0.903
                                                                      (1.225)     (1.143)     (1.268)     (1.422)
Zillow estimate median error                                                      1.104
                                                                                  (3.105)
Log(median family income 2007)                                                                            -0.252
                                                                                                          (0.871)
Zillow estimate accuracy dummy variables                                                      Y           Y
Observations                                  30          30          30          29          30          30
R-squared                                     0.189       0.191       0.230       0.232       0.235       0.238

Notes: Estimates are based on the specification that uses nonlinear proxy and local first differencing. The γ estimates measure the impact of one month prior online price estimates on actual sales prices. The observation drops by one in column (4) because Zillow did not report their median error for St. Louis. The Zillow estimate accuracy dummy variables are four categorical variables based on the degree of error between actual sales price and Zillow’s price estimate. Four implies the most accurate (lowest error) and two implies the least accurate (highest error). Category one is for cities where accuracy is unknown, i.e. St. Louis. The two month prior online estimates were used in calculating the nonlinear proxy. Robust standard errors are in parentheses. *** p<0.01, ** p<0.05, * p<0.1.
Figure 1. Scatterplot between the elasticity estimate and educational attainment across MSAs
[Scatter plot. Vertical axis: estimate of gamma. Horizontal axis: percent bachelor degree or above in 2010. Each point is labeled with its MSA name.]
Notes: Estimates are based on the specification that uses nonlinear proxy and local first-differencing. The gamma estimates measure the impact of one month prior online price estimates on actual sales prices. The two month prior online estimates were used in calculating the nonlinear proxy.
Appendix Table 1. Estimates of γ (effect of online price estimates on sales prices) by MSA using the 6 months prior estimate in calculating the proxy variable
Estimates of γ (Effect of Online Estimates 1 Month Prior to Sales)

                          No Local First Differencing                          Local First Differencing                             p-Value (Local First Diff,
                                                                                                                                    Nonlinear Proxy)
MSA                       No Proxy         Linear Proxy     Nonlinear Proxy    No Proxy         Linear Proxy     Nonlinear Proxy    H0: γ = 0   H0: γ = 1
Atlanta                   0.822 (0.090)    0.811 (0.077)    0.827 (0.069)      0.573 (0.160)    0.488 (0.140)    0.698 (0.154)      0.000       0.049
Baltimore                 1.196 (0.095)    1.010 (0.116)    0.941 (0.059)      1.036 (0.197)    0.968 (0.211)    0.793 (0.135)      0.000       0.126
Boston                    1.113 (0.071)    1.110 (0.070)    1.095 (0.079)      1.385 (0.563)    1.384 (0.563)    0.765 (0.240)      0.001       0.326
Charlotte                 0.648 (0.569)    1.248 (0.420)    0.776 (0.462)      -1.004 (1.558)   -0.656 (1.173)   0.401 (0.313)      0.200       0.056
Chicago                   0.958 (0.548)    0.901 (0.053)    0.919 (0.049)      0.887 (0.102)    0.860 (0.102)    0.890 (0.084)      0.000       0.192
Cincinnati                1.024 (0.073)    0.982 (0.065)    0.956 (0.064)      0.854 (0.097)    0.798 (0.095)    0.841 (0.098)      0.000       0.106
Columbus                  1.121 (0.065)    1.035 (0.051)    1.059 (0.056)      1.212 (0.271)    1.170 (0.343)    1.045 (0.183)      0.000       0.806
Denver                    0.900 (0.208)    0.608 (0.139)    0.625 (0.120)      0.550 (0.376)    0.510 (0.320)    0.491 (0.152)      0.001       0.001
Las Vegas                 0.818 (0.210)    0.715 (0.182)    0.834 (0.166)      0.698 (0.355)    0.627 (0.347)    0.431 (0.313)      0.169       0.069
Los Angeles               0.793 (0.152)    0.791 (0.142)    0.793 (0.108)      0.157 (0.201)    0.147 (0.189)    0.408 (0.129)      0.002       0.000
Miami-Fort Lauderdale     1.130 (0.108)    0.737 (0.098)    0.793 (0.139)      0.781 (0.462)    0.577 (0.317)    0.289 (0.188)      0.125       0.000
Minneapolis               0.903 (0.153)    1.098 (0.090)    1.090 (0.110)      0.578 (0.290)    0.692 (0.278)    1.003 (0.165)      0.000       0.984
Nashville                 1.342 (0.070)    0.917 (0.076)    0.994 (0.092)      1.471 (0.146)    1.089 (0.138)    1.199 (0.119)      0.000       0.094
New York                  0.980 (0.106)    0.932 (0.098)    0.953 (0.100)      1.264 (0.335)    1.388 (0.346)    1.018 (0.361)      0.005       0.961
Orlando                   1.173 (0.105)    0.872 (0.082)    0.878 (0.088)      1.187 (0.211)    1.125 (0.152)    0.827 (0.176)      0.000       0.327
Philadelphia              1.131 (0.131)    0.988 (0.131)    0.993 (0.133)      1.178 (0.297)    1.062 (0.321)    1.200 (0.245)      0.000       0.415
Phoenix                   0.775 (0.189)    0.734 (0.170)    0.756 (0.170)      0.875 (0.157)    0.886 (0.153)    1.080 (0.181)      0.000       0.658
Pittsburgh                0.942 (0.071)    0.888 (0.051)    0.934 (0.041)      1.053 (0.106)    0.954 (0.139)    0.939 (0.084)      0.000       0.471
Portland                  1.256 (0.133)    1.103 (0.108)    1.081 (0.100)      0.521 (0.224)    0.557 (0.222)    0.569 (0.197)      0.004       0.029
Providence                0.939 (0.092)    0.891 (0.080)    0.887 (0.093)      0.876 (0.202)    0.838 (0.292)    0.809 (0.197)      0.000       0.332
Riverside                 0.740 (0.179)    0.720 (0.210)    0.384 (0.174)      0.121 (0.252)    0.051 (0.234)    0.112 (0.254)      0.660       0.000
Sacramento                1.131 (0.281)    0.710 (0.193)    0.676 (0.159)      0.900 (0.295)    0.626 (0.234)    0.520 (0.238)      0.029       0.044
San Diego                 0.891 (0.127)    0.699 (0.106)    0.727 (0.107)      1.307 (0.159)    1.264 (0.147)    0.703 (0.148)      0.000       0.050
San Francisco             0.810 (0.186)    0.894 (0.140)    1.033 (0.169)      0.509 (0.292)    0.676 (0.285)    0.889 (0.369)      0.016       0.765
San Jose                  0.856 (0.078)    0.897 (0.057)    0.940 (0.052)      0.678 (0.252)    0.761 (0.279)    0.761 (0.242)      0.002       0.323
Seattle                   0.947 (0.103)    0.934 (0.086)    0.939 (0.086)      1.074 (0.128)    1.059 (0.139)    0.994 (0.144)      0.000       0.970
St. Louis                 1.014 (0.160)    0.873 (0.108)    0.926 (0.107)      0.694 (0.195)    0.756 (0.138)    0.778 (0.126)      0.000       0.079
Tampa                     1.163 (0.108)    0.856 (0.073)    1.086 (0.063)      1.423 (0.145)    1.177 (0.123)    1.203 (0.067)      0.000       0.002
Virginia Beach            0.932 (0.129)    1.001 (0.062)    1.040 (0.085)      0.137 (0.261)    0.133 (0.243)    0.405 (0.242)      0.094       0.014
Washington DC             0.951 (0.048)    0.965 (0.032)    0.974 (0.030)      0.351 (0.394)    0.453 (0.349)    0.968 (0.239)      0.000       0.895

Notes: The γ estimates measure the impact of one month prior online price estimates on actual sales prices. The six month prior online estimates were used in calculating the nonlinear proxy. Robust standard errors are in parentheses.
Appendix Table 2. Determinants of the γ estimate (3 month prior estimates used in the nonlinear proxy)
Dependent variable: Estimate of γ (Effect of online price estimates on sales prices)

                                              (1)         (2)         (3)         (4)         (5)
Percent bachelor degree or higher             1.584**     1.579**     1.361*      1.451*      1.496
                                              (0.623)     (0.670)     (0.724)     (0.833)     (0.888)
Ln(number of households 2007)                             -0.00979    -0.0417     -0.0231     -0.0199
                                                          (0.0399)    (0.0317)    (0.0519)    (0.0552)
Ln(land area)                                             -0.00427    -0.114      -0.109      -0.126
                                                          (0.106)     (0.128)     (0.133)     (0.132)
Ln(population 2008)                                       0.0117      -0.0120     -0.00572    -0.00400
                                                          (0.0364)    (0.0453)    (0.0568)    (0.0594)
Ln(number of houses on Zillow)                                        1.084       1.095       1.194
                                                                      (1.290)     (1.218)     (1.309)
Ln(number of houses with Zillow estimates)                            -0.960      -0.990      -1.103
                                                                      (1.291)     (1.238)     (1.311)
Zillow estimate median error                                                      0.306
                                                                                  (2.997)
Zillow estimate accuracy dummy variables                                                      Y
Observations                                  30          30          30          29          30
R-squared                                     0.151       0.153       0.226       0.227       0.243

Notes: Estimates are based on the specification that uses nonlinear proxy and local first differencing. The γ estimates measure the impact of one month prior online price estimates on actual sales prices. The observation drops by one in column (4) because Zillow did not report their median error for St. Louis. The Zillow estimate accuracy dummy variables are four categorical variables based on the degree of error between actual sales price and Zillow’s price estimate. Four implies the most accurate (lowest error) and two implies the least accurate (highest error). Category one is for cities where accuracy is unknown, i.e. St. Louis. The three month prior online estimates were used in calculating the nonlinear proxy. Robust standard errors are in parentheses. *** p<0.01, ** p<0.05, * p<0.1.