By You Ren, Emily B. Fox, and Andrew Bruce

University of Washington

Understanding how housing values evolve over time is important
to policy makers, consumers and real estate professionals. Existing
methods for constructing housing indices are computed at a coarse
spatial granularity, such as metropolitan regions, which can mask
or distort price dynamics apparent in local markets, such as
neighborhoods and census tracts. A challenge in moving to estimates at,
for example, the census tract level is the sparsity of
spatiotemporally localized house sales observations. Our work aims at
addressing this challenge by leveraging observations from multiple census
tracts discovered to have correlated valuation dynamics. Our
proposed Bayesian nonparametric approach builds on the framework of
latent factor models to enable a flexible, data-driven method for
inferring the clustering of correlated census tracts. We explore methods
for scalability and parallelizability of computations, yielding a
housing valuation index at the level of census tract rather than zip code,
and on a monthly basis rather than quarterly. Our analysis is
provided on a large Seattle metropolitan housing dataset.
1. Introduction.
The housing market is a large part of the global economy.
In the United States, roughly fifty percent of household wealth is in
residential real estate, according to a Federal Reserve study (Iacoviello, 2011).
Between 15% and 17% of the U.S. gross domestic product is attributable to
housing and housing-related services, according to GDP statistics published by
the U.S. Bureau of Economic Analysis. Understanding how the value of housing
changes over time is important to policy makers, consumers, real estate
professionals and mortgage lenders. Valuation is relatively straightforward for
commoditized sectors of the economy, such as energy or non-discretionary
spending. By contrast, valuation of residential real estate is intrinsically
difficult due to the individual nature of houses. Since the composition of
the houses sold changes from one time period to the next, the change in
the reported prices does not necessarily reflect the overall change in value.
Consequently, economists and public policy researchers have devoted
considerable effort to developing a meaningful index to measure the change in
housing prices over time.
arXiv:1505.01164v1 [stat.AP] 5 May 2015
The most common approach to constructing a housing price index is the
repeat sales model, first proposed by Bailey et al. (1963). The main idea
is to use a pair of sales for the same house to model the price trend over
time. Assuming the house remains in the same condition, the first sales price
serves as a surrogate for the house characteristics (house-level covariates) and the
difference in the subsequent sales price captures the change in value over that
intra-sales period. This approach largely circumvents the problem caused
by the change in composition of houses sold. A large body of literature
extends the original repeat sales model with numerous modifications and
improvements (cf. Case and Shiller, 1987, 1989; Gatzlaff and Haurin, 1997;
Shiller, 1991; Goetzmann and Peng, 2002). The repeat sales model is the
basis for the Case-Shiller home value index, published by Core-Logic and
widely disseminated by the media.
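
To make the repeat sales idea concrete, the following is a minimal sketch under stated assumptions, not the model proposed in this paper. It assumes the classical log-price formulation, in which the log price ratio of each sale pair is regressed on time-period indicators (-1 at the first sale period, +1 at the second), with period 0 as the base. The function name `repeat_sales_index`, its arguments, and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np


def repeat_sales_index(t_first, t_second, p_first, p_second, n_periods):
    """Estimate a price index from repeat sale pairs by ordinary least squares.

    Assumes each pair has distinct sale periods (t_first != t_second) and
    that period 0 is the base period, whose index is fixed at 1.0.
    """
    t_first = np.asarray(t_first)
    t_second = np.asarray(t_second)

    # Response: change in log price between the two sales of the same house.
    y = np.log(np.asarray(p_second, dtype=float)) - np.log(
        np.asarray(p_first, dtype=float)
    )

    # Design matrix: -1 in the first-sale period, +1 in the second-sale period.
    n_pairs = len(y)
    rows = np.arange(n_pairs)
    X = np.zeros((n_pairs, n_periods))
    X[rows, t_first] = -1.0
    X[rows, t_second] = 1.0

    # Drop the base-period column so the remaining coefficients are identified.
    X = X[:, 1:]

    beta, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Prepend the base period (log index 0) and return to the price scale.
    return np.exp(np.concatenate([[0.0], beta]))


# Toy, noise-free data (hypothetical): four houses, four time periods.
true_index = np.array([1.00, 1.10, 1.21, 1.331])
base_value = np.array([300e3, 450e3, 520e3, 610e3])
t1 = np.array([0, 1, 2, 0])
t2 = np.array([1, 2, 3, 3])

index = repeat_sales_index(
    t1, t2, base_value * true_index[t1], base_value * true_index[t2], n_periods=4
)
print(np.round(index, 3))
```

Because each house's own base value cancels in the log price difference, the regression recovers the index in this noise-free example without any house-level covariates. This is the sense in which the first sale acts as a surrogate for them.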
One drawback of a repeat sales model is that houses with only a single
sales transaction get discarded from the dataset. Case and Shiller (1987)
report that, over a study period of 16 years, single sales make up as much
as 93%-97% of total transactions for metropolitan areas such as Atlanta,
Dallas, Chicago and San Francisco. As such, studies based on repeat sales
data rely on only a fraction of all transactions and may not be
representative of the entire housing market. Englund and Redfearn (1999)
and Meese and Wallace (1997) detected a sampling selection bias in which
the repeat sales properties are older, smaller and more modest than single-
sale properties. Furthermore, small samples lead to less precise parameter
estimation. To overcome this, Case and Quigley (1991) propose a hybrid
model that combines repeat sales with hedonic information to make use of all
sales. Recently, Nagaraja et al. (2011) propose an autoregressive repeat sales
model that utilizes all sales data without the need for hedonic information.
Their approach leads to an index estimated quarterly at the zip code level.
Existing repeat sales models, even those using all of the transactions,
perform the best when fit to relatively l
