ACHIEVING A HYPERLOCAL HOUSING PRICE INDEX: OVERCOMING DATA SPARSITY BY BAYESIAN DYNAMICAL MODELING OF MULTIPLE DATA STREAMS

By You Ren, Emily B. Fox, and Andrew Bruce

University of Washington

Understanding how housing values evolve over time is important to policy makers, consumers and real estate professionals. Existing methods for constructing housing indices are computed at a coarse spatial granularity, such as metropolitan regions, which can mask or distort price dynamics apparent in local markets, such as neighborhoods and census tracts. A challenge in moving to estimates at, for example, the census tract level is the sparsity of spatiotemporally localized house sales observations. Our work aims at addressing this challenge by leveraging observations from multiple census tracts discovered to have correlated valuation dynamics. Our proposed Bayesian nonparametric approach builds on the framework of latent factor models to enable a flexible, data-driven method for inferring the clustering of correlated census tracts. We explore methods for scalability and parallelizability of computations, yielding a housing valuation index at the level of census tract rather than zip code, and on a monthly basis rather than quarterly. Our analysis is provided on a large Seattle metropolitan housing dataset.

1. Introduction. The housing market is a large part of the global economy. In the United States, roughly fifty percent of household wealth is in residential real estate, according to a Federal Reserve study (Iacoviello, 2011). Between 15% and 17% of the U.S. gross domestic product is spent on housing and housing-related services, according to GDP statistics published by the U.S. Bureau of Economic Analysis. Understanding how the value of housing changes over time is important to policy makers, consumers, real estate professionals and mortgage lenders. Valuation is relatively straightforward for commoditized sectors of the economy, such as energy or non-discretionary spending.
By contrast, valuation of residential real estate is intrinsically difficult due to the individual nature of houses. Since the composition of the houses sold changes from one time period to the next, the change in the reported prices does not necessarily reflect the overall change in value. Consequently, economists and public policy researchers have devoted considerable effort to developing a meaningful index to measure the change in housing prices over time.

arXiv:1505.01164v1 [stat.AP] 5 May 2015

The most common approach to constructing a housing price index is the repeat sales model, first proposed by Bailey et al. (1963). The main idea is to use a pair of sales for the same house to model the price trend over time. Assuming the house remains in the same condition, the first sales price serves as a surrogate for the house hedonics (house-level covariates) and the difference in the subsequent sales price captures the change in value over that intra-sales period. This approach largely circumvents the problem caused by the change in composition of houses sold. A large body of literature extends the original repeat sales model with numerous modifications and improvements (cf., Case and Shiller, 1987, 1989; Gatzlaff and Haurin, 1997; Shiller, 1991; Goetzmann and Peng, 2002). The repeat sales model is the basis for the Case-Shiller home value index, published by Core-Logic and widely disseminated by the media.

One drawback of a repeat sales model is that houses with only a single sales transaction get discarded from the dataset. Case and Shiller (1987) report that, over a study period of 16 years, single sales make up as much as 93%-97% of total transactions for metropolitan areas such as Atlanta, Dallas, Chicago and San Francisco. As such, studies based on repeat sales data rely on only a fraction of all transactions and may not be a good representation of the entire housing market.
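
The repeat sales idea described above, and the way it discards single-sale houses, can be illustrated with a minimal sketch. It assumes the common textbook formulation in the spirit of Bailey et al. (1963): for a house sold at periods s < t, the log price ratio equals the difference of log index values, log(P_t / P_s) = beta_t - beta_s + noise, with beta_0 = 0 as the baseline. The function name `repeat_sales_index` and the toy sales records are hypothetical and are not taken from this paper or its dataset.

```python
import numpy as np

def repeat_sales_index(sales, n_periods):
    """Least-squares repeat sales index (illustrative sketch only).

    sales: iterable of (house_id, period, price), with periods in 0..n_periods-1
           and at most one sale per house per period (an assumption of this sketch).
    Returns (log_index, n_pairs_used, n_single_sale_houses_discarded).
    """
    by_house = {}
    for house_id, period, price in sales:
        by_house.setdefault(house_id, []).append((period, price))

    rows, y, discarded = [], [], 0
    for records in by_house.values():
        if len(records) < 2:
            discarded += 1          # single-sale houses carry no pair information
            continue
        records.sort()              # order each house's sales by period
        for (s, p_s), (t, p_t) in zip(records, records[1:]):
            row = np.zeros(n_periods - 1)   # period 0 is the baseline (beta_0 = 0)
            if s > 0:
                row[s - 1] = -1.0
            if t > 0:
                row[t - 1] = 1.0
            rows.append(row)
            y.append(np.log(p_t / p_s))     # log price ratio for the sale pair

    beta, *_ = np.linalg.lstsq(np.array(rows), np.array(y), rcond=None)
    return np.concatenate([[0.0], beta]), len(y), discarded

# Made-up, noise-free toy data generated from a log index of [0.0, 0.1, 0.3].
sales = [
    ("A", 0, 100.0), ("A", 1, 100.0 * np.exp(0.1)),
    ("B", 1, 200.0 * np.exp(0.1)), ("B", 2, 200.0 * np.exp(0.3)),
    ("C", 0, 150.0), ("C", 2, 150.0 * np.exp(0.3)),
    ("D", 1, 300.0),                # sold only once, so it is dropped
]
log_index, n_pairs, n_discarded = repeat_sales_index(sales, n_periods=3)
print(log_index, n_pairs, n_discarded)
```

In this toy setting the regression uses three sale pairs and ignores house D entirely, which is the data loss the text describes. With real data, where most houses sell only once in the study window, the discarded share can be far larger.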
Englund and Redfearn (1999) and Meese and Wallace (1997) detected a sampling selection bias in which the repeat sales properties are older, smaller and more modest than single-sale properties. Furthermore, small samples lead to less precise parameter estimation. To overcome this, Case and Quigley (1991) propose a hybrid model that combines repeat sales with hedonic information to make use of all sales. Recently, Nagaraja et al. (2011) propose an autoregressive repeat sales model that utilizes all sales data without the need for hedonic information. Their approach leads to an index estimated quarterly at the zip code level.

Existing repeat sales models, even those using all of the transactions, perform the best when fit to relatively l