# 2 Horvitz-Thompson estimators and the SPAR index

Jan de Haan and Rens Hendriks

The typical aim of survey sampling is to estimate the total or (arithmetic) mean of some variable for a finite population. In a housing context we may want to estimate the total value of the housing stock in, say, period 0. Let ${U}^{0}$ denote the housing stock of size ${N}^{0}$ and ${p}_{n}^{0}$ the value of house $n(n=1,\dots ,{N}^{0}).$ The target to be estimated is

$${V}^{0}={\displaystyle \sum _{n\in {U}^{0}}{p}_{n}^{0}}.\qquad \left(2.1\right)$$

Suppose we have a sample ${S}^{0}$ consisting of ${n}^{0}$ houses sold in the base period. If the houses were selected by simple random sampling from the housing stock ${U}^{0},$ where each house had the same inclusion probability, then the Horvitz-Thompson estimator

$${\widehat{V}}^{0}=({N}^{0}/{n}^{0}){\displaystyle \sum _{n=1}^{{n}^{0}}{p}_{n}^{0}}\qquad \left(2.2\right)$$

is an unbiased estimator of (2.1); see *e.g.*, Cochran (1977).
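As a minimal sketch of estimator (2.2), the following Python snippet scales a sample total by ${N}^{0}/{n}^{0}$. The function name `ht_total_srs` and the sample values are hypothetical and serve only as an illustration; equal inclusion probabilities (simple random sampling) are assumed.

```python
def ht_total_srs(sample_prices, population_size):
    """Horvitz-Thompson estimate of a population total under simple random sampling."""
    n = len(sample_prices)
    if n == 0:
        raise ValueError("sample must contain at least one house")
    # Under simple random sampling every house has inclusion probability n / N,
    # so each sampled value receives the weight N / n, as in (2.2).
    return (population_size / n) * sum(sample_prices)


# Hypothetical values (in thousands) of 4 sampled houses from a stock of 1000 houses.
sample = [200.0, 250.0, 300.0, 450.0]
v_hat = ht_total_srs(sample, population_size=1000)
print(v_hat)
```

With unequal inclusion probabilities the common weight ${N}^{0}/{n}^{0}$ would have to be replaced by house-specific weights, which this sketch does not cover.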

A natural target, though not the only possibility, for a house price index would be the value
change of a fixed housing stock. Conditioning on the *base period stock* has two implications: additions to the stock
(mostly newly-built houses) should be excluded and the price changes of
existing properties should be adjusted for quality changes, *i.e.*, for the impact of depreciation,
renovations and extensions. For convenience we assume that such quality changes
are negligible. In that case the target price index going from the base period
0 to the comparison period $t(>0)$ is defined as

$${P}^{0t}=\frac{{\displaystyle \sum _{n\in {U}^{0}}{p}_{n}^{t}}}{{\displaystyle \sum _{n\in {U}^{0}}{p}_{n}^{0}}},\qquad \left(2.3\right)$$

with obvious notation. Suppose that we also have a sample ${S}^{t},$ consisting of ${n}^{t}$ houses sold in period $t$ and assume that it is an independent random draw from the base period stock. The ratio of the Horvitz-Thompson estimators (the sample means) in both periods

$${\widehat{P}}^{0t}=\frac{({N}^{0}/{n}^{t}){\displaystyle \sum _{n\in {S}^{t}}{p}_{n}^{t}}}{({N}^{0}/{n}^{0}){\displaystyle \sum _{n\in {S}^{0}}{p}_{n}^{0}}}=\frac{{\displaystyle \sum _{n\in {S}^{t}}{p}_{n}^{t}/{n}^{t}}}{{\displaystyle \sum _{n\in {S}^{0}}{p}_{n}^{0}/{n}^{0}}}\qquad \left(2.4\right)$$

might seem a natural estimator of our target index (2.3). However, if the samples ${S}^{0}$ and ${S}^{t}$ are independently drawn, the variance of estimator (2.4) can be substantial. Moreover, an estimated ratio such as (2.4) has a bias that depends on the variance of the denominator and the covariance of the numerator and the denominator (Cochran 1977). From an index number perspective the issue at stake is that the mix of properties traded in period $t$ differs from that in period 0. That is, we are not comparing like with like.
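The mix problem can be illustrated with a small, deliberately artificial example. In the sketch below all prices are constant over time, so the target index (2.3) equals 1, yet the ratio of sample means (2.4) moves because more large houses happen to be sold in period $t.$ The function names and the data are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def ratio_of_means_index(prices_t, prices_0):
    """Estimator (2.4): mean sale price in period t over mean sale price in period 0."""
    return mean(prices_t) / mean(prices_0)


# Hypothetical stock with two house types whose prices do not change over time:
# small houses sell for 100 and large houses for 300 in both periods.
sales_0 = [100.0, 100.0, 100.0, 300.0]  # period 0: mostly small houses sold
sales_t = [100.0, 300.0, 300.0, 300.0]  # period t: mostly large houses sold

index = ratio_of_means_index(sales_t, sales_0)
print(index)  # well above 1 although no individual price has changed
```

The whole deviation from 1 in this example is a compositional effect, which is what "not comparing like with like" amounts to.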

The standard approach to estimating price indexes relies on the matched model methodology where prices ${p}_{n}^{0}$ and ${p}_{n}^{t}$ are observed for a fixed panel of items. The use of panel data ensures that like is compared with like and will reduce the variance of the ratio estimator because ${p}_{n}^{0}$ and ${p}_{n}^{t}$ are typically positively correlated. However, unless the samples ${S}^{0}$ and ${S}^{t}$ are extraordinarily large, there will be only a few matched houses, if any. Hence, while prices ${p}_{n}^{t}$ are observed for the houses belonging to ${S}^{t},$ for most of those houses the base period prices ${p}_{n}^{0}$ are 'missing'. What may be available instead are government assessments ${a}_{n}^{0}.$ We could use these as base period values and construct the following (pseudo) matched-model estimator of house price change:

$${\tilde{P}}^{0t}=\frac{{\displaystyle \sum _{n\in {S}^{t}}{p}_{n}^{t}}/{n}^{t}}{{\displaystyle \sum _{n\in {S}^{t}}{a}_{n}^{0}}/{n}^{t}}.\qquad \left(2.5\right)$$

A problem associated with estimator (2.5) is that the base period index number will differ from 1 because the appraisals ${a}_{n}^{0}$ differ from the selling prices ${p}_{n}^{0}.$ Rescaling (2.5) by dividing it by its base period value is an obvious solution, yielding

$${\widehat{P}}_{\text{SPAR}}^{0t}=\frac{{\displaystyle \sum _{n\in {S}^{t}}{p}_{n}^{t}}/{n}^{t}}{{\displaystyle \sum _{n\in {S}^{t}}{a}_{n}^{0}}/{n}^{t}}{\left[\frac{{\displaystyle \sum _{n\in {S}^{0}}{p}_{n}^{0}}/{n}^{0}}{{\displaystyle \sum _{n\in {S}^{0}}{a}_{n}^{0}}/{n}^{0}}\right]}^{-1}=\frac{{\displaystyle \sum _{n\in {S}^{t}}{p}_{n}^{t}}/{n}^{t}}{{\displaystyle \sum _{n\in {S}^{0}}{p}_{n}^{0}}/{n}^{0}}\left[\frac{{\displaystyle \sum _{n\in {S}^{0}}{a}_{n}^{0}}/{n}^{0}}{{\displaystyle \sum _{n\in {S}^{t}}{a}_{n}^{0}}/{n}^{t}}\right].\qquad \left(2.6\right)$$

Note that the rescaling factor is stochastic, as it is a ratio of sample means for the base period, and will increase the variance of (2.6) as compared to the estimator given by (2.5), depending on the correlations between the appraisals and the selling prices. Details can be found in de Haan (2007). But we cannot circumvent rescaling since a price index that does not start at the value 1 would be meaningless.
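The rescaling step can be sketched in a few lines of Python. The code follows the first expression in (2.6): the appraisal ratio of the period $t$ sample, which is estimator (2.5), divided by the appraisal ratio of the base period sample. All function names and figures are hypothetical; the example assumes appraisals that lie exactly 10% below base period prices and a uniform price rise of 20%.

```python
def mean(values):
    return sum(values) / len(values)


def appraisal_ratio(prices, appraisals):
    """Mean sale price divided by the mean base period appraisal of the same houses."""
    return mean(prices) / mean(appraisals)


def spar_index(prices_t, appraisals_t, prices_0, appraisals_0):
    """SPAR index (2.6): the period t appraisal ratio rescaled by the base period ratio."""
    return appraisal_ratio(prices_t, appraisals_t) / appraisal_ratio(prices_0, appraisals_0)


# Hypothetical base period sample; appraisals are 10% below the sale prices.
prices_0 = [100.0, 100.0, 100.0, 300.0]
appraisals_0 = [90.0, 90.0, 90.0, 270.0]

# Hypothetical period t sample with a different mix; prices are 20% above
# their base period level, and the appraisals still refer to period 0.
prices_t = [120.0, 360.0, 360.0, 360.0]
appraisals_t = [90.0, 270.0, 270.0, 270.0]

unscaled = appraisal_ratio(prices_t, appraisals_t)                    # estimator (2.5)
spar = spar_index(prices_t, appraisals_t, prices_0, appraisals_0)     # estimator (2.6)
base = spar_index(prices_0, appraisals_0, prices_0, appraisals_0)     # base period value

print(unscaled, spar, base)
```

In this idealized setting the unscaled estimator overstates the price change because the appraisals are systematically too low, while the rescaled index recovers the 20% increase and equals 1 in the base period.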

Expression (2.6) is called a Sale Price Appraisal Ratio
(SPAR) index. The SPAR method has been applied in the *housing stock*, which is a measure of the change in wealth. In the
context of the Harmonized Index of Consumer Prices on the other hand, the house
price index should measure the price change of the *houses sold* during the base period (Makaronidis and Hayes 2006;
Eurostat 2010). Under the latter concept there would be no sampling involved if
all transactions are recorded and used in the compilation of the index, as is
the case in the

The second expression on the right-hand side of (2.6)
writes the SPAR index as the product of two factors, the ratio of sample means
and a factor between brackets. As the SPAR index is essentially based on the
matched model methodology (using base period appraisals instead of sale
prices), this factor adjusts the ratio of sample means for changes in the
quality mix of the samples that occur between period 0 and period $t.$ A potential problem is that the SPAR index is *not a panel-type estimator*. A SPAR time
series, say for periods $t=0,\dots ,T,$ might therefore suffer from short-term
volatility due to mix changes, especially when the number of sales is low.
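The decomposition in the second expression of (2.6) can be made explicit for a short time series. The sketch below returns the ratio of sample means, the bracketed mix adjustment factor and their product for each period. The function name `spar_decomposition` and the data are hypothetical, and the appraisals are assumed to be exactly proportional to base period prices, which is an idealization.

```python
def mean(values):
    return sum(values) / len(values)


def spar_decomposition(prices_t, appraisals_t, prices_0, appraisals_0):
    """Return (ratio of sample means, mix adjustment factor, SPAR index)."""
    ratio_of_means = mean(prices_t) / mean(prices_0)
    # Bracketed factor in (2.6): mean base period appraisal of the period 0 sample
    # over the mean base period appraisal of the period t sample.
    mix_factor = mean(appraisals_0) / mean(appraisals_t)
    return ratio_of_means, mix_factor, ratio_of_means * mix_factor


# Hypothetical samples for t = 0, 1, 2: (sale prices, base period appraisals).
samples = [
    ([100.0, 100.0, 100.0, 300.0], [90.0, 90.0, 90.0, 270.0]),    # t = 0
    ([110.0, 330.0, 330.0, 330.0], [90.0, 270.0, 270.0, 270.0]),  # t = 1, prices up 10%
    ([120.0, 120.0, 360.0, 360.0], [90.0, 90.0, 270.0, 270.0]),   # t = 2, prices up 20%
]

prices_0, appraisals_0 = samples[0]
series = [spar_decomposition(p, a, prices_0, appraisals_0) for p, a in samples]

for t, (ratio_of_means, mix_factor, spar) in enumerate(series):
    print(t, round(ratio_of_means, 3), round(mix_factor, 3), round(spar, 3))
```

Here the ratio of sample means swings with the sample mix, and the adjustment factor offsets those swings completely only because the appraisals are exactly proportional to prices. With noisy appraisals and few sales, part of the mix effect would remain in the SPAR series.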
