3 Generalized regression estimation
Jan de Haan and Rens Hendriks
3.1 A simple GREG method
In this section we outline an alternative approach to measuring house price change that makes use of appraisal data. The appraisals now serve as auxiliary information in a generalized regression (GREG) framework. Consider the following simple two-variable linear regression model for the base period selling price $p_n^0$ of property $n$, given its appraisal $a_n^0$:

$$p_n^0 = \alpha^0 + \beta^0 a_n^0 + \varepsilon_n^0, \tag{3.1}$$

where $\varepsilon_n^0$ is the error term. Unlike hedonic regression models, which postulate a causal relation between the selling price and a set of characteristics relating to the structure and the location of the housing units, this model does not say anything about how house prices are generated; equation (3.1) is merely a descriptive model.
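Estimating model (3.1) by least squares regression on the data of sample $S^0$ yields predicted prices

$$\hat{p}_n^0 = \hat{\alpha}^0 + \hat{\beta}^0 a_n^0. \tag{3.2}$$

The regression residuals for $n \in S^0$ are $e_n^0 = p_n^0 - \hat{p}_n^0$. Assuming random sampling, as before, we can write the Horvitz-Thompson estimator of the mean value as

$$\hat{\bar{p}}^0_{\text{HT}} = \frac{1}{n^0} \sum_{n \in S^0} p_n^0 = \hat{\alpha}^0 + \hat{\beta}^0 \bar{a}^0_{S^0} + \frac{1}{n^0} \sum_{n \in S^0} e_n^0, \tag{3.3}$$

where $n^0$ is the size of $S^0$. Replacing the sample average of appraisals, $\bar{a}^0_{S^0} = \frac{1}{n^0} \sum_{n \in S^0} a_n^0$, by its population counterpart $\bar{a}^0 = \frac{1}{N^0} \sum_{n \in U^0} a_n^0$ yields the generalized regression (GREG) estimator:

$$\hat{\bar{p}}^0_{\text{GREG}} = \hat{\alpha}^0 + \hat{\beta}^0 \bar{a}^0 + \frac{1}{n^0} \sum_{n \in S^0} e_n^0. \tag{3.4}$$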
Model-assisted sampling theory shows that GREG estimators are asymptotically design unbiased (Särndal et al. 1992), irrespective of the choice of regressors. Unless the sample is small, the bias can be neglected. The GREG estimator (3.4) can be expected to be more efficient, in the sense of having a lower variance, than the Horvitz-Thompson estimator (3.3) whenever the appraisals explain a substantial part of the variation in selling prices. As a result, the GREG estimator will usually outperform the Horvitz-Thompson estimator in terms of the mean square error (the sum of the variance and the squared bias).
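The same procedure can be applied to the comparison period $t$. After estimating the model

$$p_n^t = \alpha^t + \beta^t a_n^0 + \varepsilon_n^t \tag{3.5}$$

through least squares regression on the data of the current period sample $S^t$, we obtain predicted prices

$$\hat{p}_n^t = \hat{\alpha}^t + \hat{\beta}^t a_n^0, \tag{3.6}$$

which lead to the GREG estimator of the mean value of the housing stock in period $t$:

$$\hat{\bar{p}}^t_{\text{GREG}} = \hat{\alpha}^t + \hat{\beta}^t \bar{a}^0_{U^t} + \frac{1}{n^t} \sum_{n \in S^t} e_n^t, \tag{3.7}$$

where $e_n^t = p_n^t - \hat{p}_n^t$ denote the period $t$ regression residuals, $n^t$ is the size of $S^t$, and $\bar{a}^0_{U^t}$ is the mean appraisal of the period $t$ housing stock $U^t$. For a fixed housing stock we have $U^t = U^0$, hence $\bar{a}^0_{U^t} = \bar{a}^0$, and it follows that

$$\hat{\bar{p}}^t_{\text{GREG}} = \hat{\alpha}^t + \hat{\beta}^t \bar{a}^0 + \frac{1}{n^t} \sum_{n \in S^t} e_n^t. \tag{3.8}$$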
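The GREG estimator of house price change results simply from taking the ratio of equations (3.8) and (3.4):

$$\hat{P}^{0t}_{\text{GREG}} = \frac{\hat{\alpha}^t + \hat{\beta}^t \bar{a}^0 + \bar{e}^t}{\hat{\alpha}^0 + \hat{\beta}^0 \bar{a}^0 + \bar{e}^0}, \tag{3.9}$$

where $\bar{e}^0 = \frac{1}{n^0} \sum_{n \in S^0} e_n^0$ and $\bar{e}^t = \frac{1}{n^t} \sum_{n \in S^t} e_n^t$. Some additional small sample bias will be introduced due to the non-linear (ratio) structure. When using Ordinary Least Squares (OLS) regression to estimate the models (3.1) and (3.5), the unweighted sample means of regression residuals in (3.9), $\bar{e}^0$ and $\bar{e}^t$, will be equal to 0 and the GREG index reduces to

$$\hat{P}^{0t}_{\text{GREG}} = \frac{\sum_{n \in U^0} \hat{p}_n^t}{\sum_{n \in U^0} \hat{p}_n^0} = \frac{\hat{\alpha}^t + \hat{\beta}^t \bar{a}^0}{\hat{\alpha}^0 + \hat{\beta}^0 \bar{a}^0}. \tag{3.10}$$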
As the first expression on the right-hand side of (3.10) indicates, the (OLS) GREG approach essentially imputes prices pertaining to the base period and the current period using equations (3.2) and (3.6). It differs from the hedonic double imputation method in two respects. First, a descriptive model, not a hedonic one, is used to estimate predicted prices, so that we cannot speak of unbiased predicted prices. Second, prices are imputed for all houses in the housing stock instead of the sub-set of sampled houses.
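The computation of the OLS GREG index can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the function names `ols_fit` and `greg_index` are hypothetical, and the housing stock is simulated under the assumption that selling prices are roughly proportional to appraisals with about 10% price growth between the two periods.

```python
import numpy as np

def ols_fit(appraisals, prices):
    """OLS regression of prices on appraisals; returns (intercept, slope)."""
    a = np.asarray(appraisals, dtype=float)
    p = np.asarray(prices, dtype=float)
    X = np.column_stack([np.ones_like(a), a])
    coef, *_ = np.linalg.lstsq(X, p, rcond=None)
    return coef[0], coef[1]

def greg_index(a_bar, sample_0, sample_t):
    """OLS GREG index: ratio of predicted mean prices at the population mean appraisal.

    a_bar    : mean appraisal over the whole housing stock
    sample_0 : (appraisals, prices) of properties sold in the base period
    sample_t : (appraisals, prices) of properties sold in the comparison period
    """
    alpha_0, beta_0 = ols_fit(*sample_0)
    alpha_t, beta_t = ols_fit(*sample_t)
    return (alpha_t + beta_t * a_bar) / (alpha_0 + beta_0 * a_bar)

# Simulated data (assumption): appraisals are informative about selling prices.
rng = np.random.default_rng(0)
stock = rng.lognormal(mean=12.0, sigma=0.4, size=5000)  # appraisals of all houses
sold_0 = rng.choice(stock, size=400, replace=False)     # houses sold in period 0
sold_t = rng.choice(stock, size=400, replace=False)     # houses sold in period t
prices_0 = sold_0 * rng.normal(1.00, 0.05, size=400)    # base period selling prices
prices_t = sold_t * rng.normal(1.10, 0.05, size=400)    # prices about 10% higher

index = greg_index(stock.mean(), (sold_0, prices_0), (sold_t, prices_t))
print(f"GREG index: {index:.3f}")
```

Only the population mean of appraisals enters the index; the individual appraisals of unsold houses are never needed once that mean is known.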
3.2 Properties of the GREG index
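The (OLS) GREG index has several properties worth mentioning. First, the computation of the GREG index is very simple. Once the population mean of appraisals $\bar{a}^0$ and the base period regression coefficients $\hat{\alpha}^0$ and $\hat{\beta}^0$ have been calculated, all that is needed is running a regression each month of selling prices against appraisals and plugging the coefficients $\hat{\alpha}^t$ and $\hat{\beta}^t$ into (3.10). Note that the GREG index can be written as a pseudo chain index:

$$\hat{P}^{0t}_{\text{GREG}} = \hat{P}^{0,t-1}_{\text{GREG}} \times \frac{\hat{\alpha}^t + \hat{\beta}^t \bar{a}^0}{\hat{\alpha}^{t-1} + \hat{\beta}^{t-1} \bar{a}^0}. \tag{3.11}$$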
This can be helpful in practice, particularly when new appraisal data become available. New appraisal data often reach the statistical agency with a considerable time lag, sometimes of more than a year. There are two reasons for using the latest appraisal information. First, the quality of the appraisals may improve over time, which seems to have been the case in the Netherlands (de Vries et al. 2009). Second, the assumption of a fixed housing stock can be relaxed so that newly-built properties can be incorporated through chaining; the resulting chained GREG index takes the dynamics of the housing stock into account. The same advantages of chaining apply to the SPAR method. Suppose new appraisals $a_n^s$, relating to period $s$, are available in period $t$. The time series can then be updated through chain-linking, i.e., by multiplying $\hat{P}^{0,t-1}_{\text{GREG}}$ by the month-to-month change $(\hat{\alpha}^t + \hat{\beta}^t \bar{a}^s) / (\hat{\alpha}^{t-1} + \hat{\beta}^{t-1} \bar{a}^s)$, where $\bar{a}^s$ is the population mean of the period $s$ appraisals and the coefficients now pertain to a regression of selling prices on the period $s$ appraisals.
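The chain-linking step can be illustrated with a small sketch. All coefficient values and mean appraisals below are made up for illustration, and the helper names `predicted_mean` and `chain_link` are hypothetical. The example shows that, as long as the same appraisals are used, the monthly links telescope to the direct index, and that a switch to new appraisals only requires re-estimating the two periods involved in the link.

```python
def predicted_mean(coef, a_bar):
    """Predicted mean price alpha + beta * a_bar for one period's coefficients."""
    alpha, beta = coef
    return alpha + beta * a_bar

def chain_link(index_prev, coef_prev, coef_curr, a_bar):
    """Multiply the previous index level by the month-to-month change."""
    return index_prev * predicted_mean(coef_curr, a_bar) / predicted_mean(coef_prev, a_bar)

# Hypothetical (intercept, slope) pairs for periods 0..3, all estimated on
# regressions of selling prices on the base period appraisals.
coefs_old = [(10.0, 1.00), (12.0, 1.01), (11.0, 1.04), (15.0, 1.05)]
a_bar_old = 100.0  # population mean of the base period appraisals

series = [1.0]
for prev, curr in zip(coefs_old, coefs_old[1:]):
    series.append(chain_link(series[-1], prev, curr, a_bar_old))

# With unchanged appraisals the links telescope to the direct index.
direct = predicted_mean(coefs_old[3], a_bar_old) / predicted_mean(coefs_old[0], a_bar_old)

# Period 4: new appraisals have arrived. Periods 3 and 4 are both regressed
# on the new appraisals and the resulting change is linked onto the series.
coef_3_new, coef_4_new = (5.0, 0.95), (6.0, 0.97)
a_bar_new = 115.0  # population mean of the new appraisals
series.append(chain_link(series[-1], coef_3_new, coef_4_new, a_bar_new))
print(series)
```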
Second, standard errors of the GREG index can be estimated rather easily using the variance-covariance matrix of the regression coefficients, which is standard output of most statistical packages. An expression for the approximate standard error is derived in the Appendix. The standard error of the GREG index depends on the goodness of fit of the regression model. It is most likely that $R^2$ for the base period regression is higher than that for the current period regressions. This is because we expect to find a strong linear relation between appraisals and sale prices in the appraisal reference period, while in later periods this relation will probably be weaker due to differing price trends across types of houses or regions. The derivation of approximate standard errors for the SPAR index is a bit more complex because there is an additional source of sampling error, namely the sampling variability of the mean appraisals; see de Haan (2007).
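To illustrate how the coefficient covariance matrices feed into a standard error, the sketch below applies a first-order (delta method) approximation for a ratio. This is an assumption made for illustration and is not necessarily the expression derived in the Appendix: it treats the population mean of appraisals as known, treats the two samples as independent, and uses the usual homoskedastic OLS covariance estimate. The function names are hypothetical.

```python
import numpy as np

def ols_with_cov(appraisals, prices):
    """OLS of prices on appraisals; returns coefficients and their covariance matrix."""
    a = np.asarray(appraisals, dtype=float)
    p = np.asarray(prices, dtype=float)
    X = np.column_stack([np.ones_like(a), a])
    xtx_inv = np.linalg.inv(X.T @ X)
    coef = xtx_inv @ X.T @ p
    resid = p - X @ coef
    sigma2 = resid @ resid / (len(p) - 2)  # residual variance, 2 parameters
    return coef, sigma2 * xtx_inv

def greg_index_se(a_bar, a_0, p_0, a_t, p_t):
    """OLS GREG index and an approximate standard error (delta method sketch)."""
    x = np.array([1.0, a_bar])             # evaluation point: population mean appraisal
    coef_0, cov_0 = ols_with_cov(a_0, p_0)
    coef_t, cov_t = ols_with_cov(a_t, p_t)
    den, num = x @ coef_0, x @ coef_t      # predicted mean prices in periods 0 and t
    index = num / den
    # Relative variance of a ratio of two independent estimators.
    rel_var = (x @ cov_t @ x) / num**2 + (x @ cov_0 @ x) / den**2
    return index, index * np.sqrt(rel_var)

# Illustrative data (assumption): noisy linear relation in both periods.
rng = np.random.default_rng(1)
a_0 = rng.uniform(50, 250, size=200)
a_t = rng.uniform(50, 250, size=200)
p_0 = 10 + 1.0 * a_0 + rng.normal(0, 5, size=200)
p_t = 20 + 1.2 * a_t + rng.normal(0, 15, size=200)  # weaker fit in period t
index, se = greg_index_se(150.0, a_0, p_0, a_t, p_t)
print(f"index = {index:.4f}, approximate s.e. = {se:.4f}")
```

A poorer fit in either regression raises the corresponding residual variance and therefore the approximate standard error, which is the dependence on goodness of fit described above.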
The latter point brings us to the third property of the GREG index, namely its dependence on the quality of the appraisal data. For at least two reasons the appraisals may not exactly represent the transaction prices during the base period, so that the model fit is not perfect. First, the assessment authorities may not have (real time) access to the actual sale prices and therefore have to make their own judgements based on other information. Second, even if they knew the selling prices, the authorities may still decide to make adjustments when determining the property values. It can be argued that selling prices do not always properly measure the unknown market values, which can be seen as a latent variable, and tend to be more volatile. In this respect, Francke (2010) and others have used the term transaction noise.
The way in which the appraisals have been determined will affect the standard error of the GREG index. As long as the quality of the appraisal data is the same for all houses in stock, no bias arises since the appraisals only serve as an auxiliary variable in regressions run on the samples $S^0$ and $S^t$ of properties sold in periods 0 and $t$. However, in general we expect the quality of the appraisals to be higher for properties belonging to the appraisal reference (base) period sample $S^0$, although this will most likely differ across valuation methods. In the Netherlands the properties are assessed for tax purposes, both for income tax and local taxes. The municipalities are responsible for the valuations. Several municipalities value the houses which are sold during the reference period (January) at the selling price. Houses which were not sold are sometimes valued by comparing them to similar traded houses. Some municipalities apparently use a form of hedonic regression to value the houses, but the methodology is unfortunately not made publicly available. For more information on the Dutch appraisal system, see de Vries et al. (2009).
So far we have assumed that the quality of the individual houses stays the same over time. This is a strong assumption. The fourth property, and the most important drawback, of the GREG method is therefore that the resulting price index suffers from quality change bias, since explicit quality adjustments are not carried out. The same drawback holds true for the SPAR method and for the standard repeat sales method. In principle, hedonic regression methods can deal with the quality change problem, although it may prove difficult to control for all relevant price determining characteristics, in particular micro location. The SPAR method automatically controls for micro location, provided of course that the appraisals sufficiently account for this, as it is based on the matched-model methodology where the matching is done at the address level.
3.3 Alternative GREG estimators
Statistics Netherlands not only computes house price
indexes for the whole country but also for segments of the housing market,
according to type of house (family dwellings and apartments) and region
(provinces and large cities), mainly because of user needs. Another motivation
behind stratifying the sample can be to mitigate the effect of sample selection bias. This type of bias
may arise if the set of houses sold in a particular period is not a random
selection from the housing stock. The nationwide index should then be
indirectly computed as a weighted average of the stratum indexes instead of
directly from all observations.
Suppose the total housing stock $U^0$ is sub-divided into $K$ non-overlapping strata $U_k^0$ of size $N_k^0$ $(k = 1, \ldots, K)$. The target price index (2.3) can now be rewritten as

$$P^{0t} = \sum_{k=1}^{K} w_k^0 P_k^{0t}, \qquad w_k^0 = \frac{\sum_{n \in U_k^0} p_n^0}{\sum_{n \in U^0} p_n^0}, \tag{3.12}$$

where $P_k^{0t}$ is the target price index for stratum $k$. The base period stock value shares $w_k^0$, which serve as weights for the stratum indexes, are unknown and have to be estimated. Assuming the variables that define the strata are known for all $n \in U^0$, a natural choice for the weights would be the appraisal shares $\hat{w}_k^0 = \sum_{n \in U_k^0} a_n^0 / \sum_{n \in U^0} a_n^0$. Obviously, the stratum-defining housing variables should be included in the appraisal data set. In the Netherlands address and type of dwelling are included. This allows a sub-division of the population into cross classifications of location and type of dwelling. Appraisals may not always be accurate estimates of the 'true' market values of the individual properties, but at the stratum level we expect the accuracy of the average appraisals to be sufficient for the computation of the weights.
Statistical techniques such as GREG estimation are typically applied to estimate totals or means for small domains for which the number of observations is so small that the standard errors using traditional (Horvitz-Thompson) estimators, in our case the ratio of sample means, would become unacceptably high. It should be mentioned that, even with the GREG method, the stratification scheme should not be too detailed since that might unduly raise the variance of the stratum indexes and hence of the aggregate index. More importantly perhaps, small sample bias will increase and may become non-negligible with very small samples.
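OLS regressions of selling prices on appraisals should now be run in every time period for each stratum in order to compute the aggregate GREG index. The stratified (OLS) GREG index is

$$\hat{P}^{0t}_{\text{GREG,strat}} = \sum_{k=1}^{K} \hat{w}_k^0 \, \frac{\hat{\alpha}_k^t + \hat{\beta}_k^t \bar{a}_k^0}{\hat{\alpha}_k^0 + \hat{\beta}_k^0 \bar{a}_k^0}, \tag{3.13}$$

where $\hat{w}_k^0$ is the appraisal share of stratum $k$ and $\bar{a}_k^0$ its population mean of appraisals. Differences in the slope coefficients $\hat{\beta}_k^t$ across the strata could be the result of sampling error or reflect a real phenomenon. The latter can be of particular importance for periods $t$ which are very distant from period 0, as different housing market segments tend to show varying price trends. Whether any differences in the slope coefficients reflect a real phenomenon could be tested.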
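The stratified computation can be sketched as follows: an OLS GREG index is computed per stratum and the stratum indexes are then weighted with appraisal shares. The data layout (dictionaries keyed by stratum) and the function names are assumptions made for this illustration, and the tiny data set is made up.

```python
import numpy as np

def ols_fit(appraisals, prices):
    """OLS regression of prices on appraisals; returns (intercept, slope)."""
    a = np.asarray(appraisals, dtype=float)
    p = np.asarray(prices, dtype=float)
    X = np.column_stack([np.ones_like(a), a])
    coef, *_ = np.linalg.lstsq(X, p, rcond=None)
    return coef[0], coef[1]

def stratified_greg_index(stock, sample_0, sample_t):
    """Appraisal-share weighted average of stratum OLS GREG indexes.

    stock    : dict stratum -> appraisals of all properties in that stratum
    sample_0 : dict stratum -> (appraisals, prices) of base period sales
    sample_t : dict stratum -> (appraisals, prices) of period t sales
    """
    total_appraisal = sum(np.sum(a) for a in stock.values())
    index = 0.0
    for k, a_stock in stock.items():
        a_bar_k = np.mean(a_stock)                 # stratum mean appraisal
        alpha_0, beta_0 = ols_fit(*sample_0[k])
        alpha_t, beta_t = ols_fit(*sample_t[k])
        stratum_index = (alpha_t + beta_t * a_bar_k) / (alpha_0 + beta_0 * a_bar_k)
        weight = np.sum(a_stock) / total_appraisal  # appraisal share of stratum k
        index += weight * stratum_index
    return index

# Made-up example: two strata with different price trends.
stock = {"houses": np.array([80.0, 100.0, 120.0]), "apartments": np.array([40.0, 60.0])}
sample_0 = {
    "houses": (np.array([80.0, 120.0]), np.array([80.0, 120.0])),
    "apartments": (np.array([40.0, 60.0]), np.array([40.0, 60.0])),
}
sample_t = {
    "houses": (np.array([100.0, 120.0]), 1.1 * np.array([100.0, 120.0])),
    "apartments": (np.array([40.0, 60.0]), 1.3 * np.array([40.0, 60.0])),
}
index = stratified_greg_index(stock, sample_0, sample_t)
print(f"stratified GREG index: {index:.4f}")
```

In this sketch each stratum needs enough sales in every period to fit its own regression, which is the practical reason why the stratification should not be too detailed.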
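An alternative model, to be estimated on the entire data set, is one with a single intercept term, but where the slope coefficients $\beta_k^t$ are allowed to differ across the strata. Let $D_{nk}$ be a dummy variable that has the value 1 if property $n$ belongs to stratum $k$ and 0 otherwise. In period $t$ the model

$$p_n^t = \alpha^t + \sum_{k=1}^{K} \beta_k^t D_{nk} a_n^0 + \varepsilon_n^t \tag{3.14}$$

is estimated by OLS regression on the data of the sample $S^t$, yielding predicted prices $\hat{p}_n^t = \hat{\alpha}^t + \sum_{k=1}^{K} \hat{\beta}_k^t D_{nk} a_n^0$ for $n \in U^0$. The residuals again sum to zero and the new (unstratified) OLS GREG index becomes

$$\hat{P}^{0t}_{\text{GREG}} = \frac{\sum_{n \in U^0} \hat{p}_n^t}{\sum_{n \in U^0} \hat{p}_n^0} = \frac{\hat{\alpha}^t + \sum_{k=1}^{K} \hat{\beta}_k^t (N_k^0 / N^0) \bar{a}_k^0}{\hat{\alpha}^0 + \sum_{k=1}^{K} \hat{\beta}_k^0 (N_k^0 / N^0) \bar{a}_k^0}, \tag{3.15}$$

where $\bar{a}_k^0$ is the population mean of appraisals in stratum $k$ and $N_k^0 / N^0$ is the share of stratum $k$ in the number of properties.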
Model (3.14) is more flexible than the original model given by equations (3.1) and (3.5), and could be useful if the proportionality between sale prices and appraisals fails. Estimator (3.15) reduces to the original GREG index (3.10) if, within each period, the $\hat{\beta}_k^t$ are all equal. In practice this will not happen, and (3.15) and (3.10) will give different answers. A common justification for the use of GREG estimators is that, being asymptotically unbiased, they are relatively robust to model choice. So we would expect the impact of the alternative model specification (3.14) to be moderate. On the other hand, it is well recognized in the literature that model dependence can be an issue under specific circumstances, notably when dealing with highly variable and outlier-prone populations. For example, Hedlin, Falvey, Chambers and Kokic (2001) stress the importance of a careful model specification search, while Beaumont and Alavi (2004) focus on the treatment of outliers. It would therefore be worthwhile to examine the effect of this alternative model specification.
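A minimal sketch of the single-intercept, stratum-specific-slope model is given below. It builds a design matrix with one column of ones and one appraisal column per stratum, fits it by OLS in both periods, and takes the ratio of summed predicted prices over the whole stock. Function names, the integer coding of strata and the data are assumptions made for illustration.

```python
import numpy as np

def design_matrix(appraisals, stratum, n_strata):
    """Columns: intercept, then appraisal times stratum dummy for each stratum."""
    a = np.asarray(appraisals, dtype=float)
    s = np.asarray(stratum, dtype=int)       # stratum codes 0, ..., n_strata - 1
    X = np.zeros((len(a), n_strata + 1))
    X[:, 0] = 1.0
    X[np.arange(len(a)), s + 1] = a
    return X

def pooled_greg_index(stock, sample_0, sample_t, n_strata):
    """OLS GREG index from a pooled model with stratum-specific slopes.

    stock              : (appraisals, stratum) for all properties in the stock
    sample_0, sample_t : (appraisals, stratum, prices) of properties sold
    """
    X_stock = design_matrix(*stock, n_strata)
    coefs = []
    for a, s, p in (sample_0, sample_t):
        X = design_matrix(a, s, n_strata)
        coef, *_ = np.linalg.lstsq(X, np.asarray(p, dtype=float), rcond=None)
        coefs.append(coef)
    coef_0, coef_t = coefs
    # Ratio of summed predicted prices over the whole housing stock.
    return (X_stock @ coef_t).sum() / (X_stock @ coef_0).sum()

# Made-up example: the slope for stratum 1 grows faster than for stratum 0.
stock = (np.array([50.0, 100.0, 150.0, 100.0]), np.array([0, 0, 1, 1]))
a_s = np.array([50.0, 100.0, 150.0, 80.0])
s_s = np.array([0, 0, 1, 1])
p_0 = 10 + 1.0 * a_s
p_t = 10 + np.where(s_s == 0, 1.1, 1.3) * a_s
index = pooled_greg_index(stock, (a_s, s_s, p_0), (a_s, s_s, p_t), n_strata=2)
print(f"pooled GREG index: {index:.4f}")
```

When the fitted slopes happen to be equal across strata, the result coincides with the simple index based on a single slope, which is a convenient check on an implementation.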