# Register-based sampling for household panels 5. Linear weighting

For household surveys like the RIS, estimates are required for person characteristics as well as household characteristics. Let ${t}_{y}$  denote the total of a target variable $y.$  With linear weighting, an estimator for a person based target variable is defined as

${\stackrel{^}{t}}_{y}=\sum _{h=1}^{H}\sum _{k=1}^{{m}_{h}}\sum _{j\in k}{w}_{kj}{y}_{kjh},\text{ }\text{ }\text{ }\text{ }\text{ }\left(5.1\right)$

with ${y}_{kjh}$ the value of the target variable for person $\left(k,j,h\right)$ and ${w}_{kj}$ a weight for person $j$ belonging to household $k.$ An estimator for a household based target variable is given by

${\stackrel{^}{t}}_{y}=\sum _{h=1}^{H}\sum _{k=1}^{{m}_{h}}{w}_{k}{y}_{kh},\text{ }\text{ }\text{ }\text{ }\text{ }\left(5.2\right)$

with ${y}_{kh}$ the value of the target variable for household $k$ from stratum $h$ and ${w}_{k}$ a weight for the corresponding household.
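As a minimal sketch of how (5.1) and (5.2) are evaluated, the following Python example computes both totals for a small sample. All data values, array names and the two helper functions are hypothetical and serve only as illustration; the strata are flattened because the double sum over $h$ and $k$ is a single sum over all sampled households.

```python
import numpy as np

# Hypothetical sample of three households; strata are flattened because the
# double sum over h and k is simply a sum over all sampled households.
household_weights = np.array([10.0, 20.0, 15.0])   # w_k
household_y = np.array([1.0, 0.0, 1.0])            # y_kh (illustrative indicator)

# Six persons; household_of_person[i] is the household index k of person i.
household_of_person = np.array([0, 0, 1, 2, 2, 2])
person_weights = np.array([10.0, 10.0, 20.0, 15.0, 15.0, 15.0])  # w_kj
person_y = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 1.0])              # y_kjh


def household_total(weights, y):
    """Estimator (5.2): weighted sum over sampled households."""
    return float(np.sum(weights * y))


def person_total(weights, y, household_index):
    """Estimator (5.1): sum within each household first, then over households."""
    within_household = np.bincount(household_index, weights=weights * y)
    return float(within_household.sum())


t_hat_households = household_total(household_weights, household_y)
t_hat_persons = person_total(person_weights, person_y, household_of_person)
```

The inner `np.bincount` call mirrors the sum over $j\in k$ in (5.1); the outer sum then runs over households.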

The weights are obtained with the generalized regression (GREG) estimator, which uses auxiliary variables that are observed in the sample and for which the population totals are known from other sources (Särndal et al. 1992). Consequently, the weights reflect the (unequal) inclusion expectations of the sampling units as well as an adjustment that makes the weighted observations of the auxiliary variables sum to the known population totals. Categorical variables such as gender, age, marital status or region are often used as auxiliary variables. Because the values of auxiliary variables differ from person to person within the same household, persons from the same household can receive different weights. To ensure that relationships between household variables and person variables are reflected in the estimated totals, it is relevant to apply a weighting method that yields a single weight for the household and all its members. If the weights for persons within a household are the same, then household and person based estimates of the same target variable are consistent with each other (for example, the total income estimated from households and that estimated from persons). This can be achieved with so-called integrated weighting methods.
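The consistency argument can be illustrated with a small numerical sketch. The incomes and weights below are invented for illustration only. With one shared weight per household, the income total estimated from households equals the total estimated from persons; with weights that differ within a household, the two estimates generally diverge.

```python
import numpy as np

# Hypothetical data: six persons in three households.
household_of_person = np.array([0, 0, 1, 2, 2, 2])
person_income = np.array([30.0, 20.0, 40.0, 10.0, 0.0, 25.0])

# Household income is the sum of the incomes of its members.
household_income = np.bincount(household_of_person, weights=person_income)

household_weights = np.array([10.0, 20.0, 15.0])

# Integrated weighting: every person inherits the weight of the household.
integrated_person_weights = household_weights[household_of_person]

# Separate person-level weighting: weights may differ within a household.
separate_person_weights = np.array([9.0, 11.0, 20.0, 14.0, 15.0, 16.0])

total_from_households = np.sum(household_weights * household_income)
total_from_persons_integrated = np.sum(integrated_person_weights * person_income)
total_from_persons_separate = np.sum(separate_person_weights * person_income)

# Consistent only when the weights are shared within each household.
is_consistent_integrated = np.isclose(total_from_households,
                                      total_from_persons_integrated)
is_consistent_separate = np.isclose(total_from_households,
                                    total_from_persons_separate)
```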

Lemaître and Dufour (1987) apply an integrated weighting method at the person level, replacing the original auxiliary variables defined at the person level by the corresponding household means. In this way, members of the same household have the same inclusion expectation and share the same auxiliary information, so the resulting regression weights are forced to be equal. Nieuwenbroek (1993) proposes a slightly more general approach by applying the linear weighting method at the household level, with the auxiliary information on person based characteristics aggregated to the household level. Nieuwenbroek (1993) notes that linear weighting at the household level is equivalent to the person-level method of Lemaître and Dufour (1987) if the residual variance of the regression model at the household level is chosen proportional to the number of persons in the household. Steel and Clark (2007) and Estevao and Särndal (2006) further generalize integrated weighting for person and household surveys. Steel and Clark (2007) address the question of whether the cosmetic benefits of integrated weighting come at the cost of an increased design variance of the GREG estimates. They show that the large-sample design variance obtained by linear weighting at the household level is less than or equal to the design variance obtained with linear weighting at the person level. For small samples, integrated weighting can lead to a small increase in the design variance. As a result, there is little or no loss in efficiency from applying an integrated weighting method.
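The two ways of preparing the auxiliary information can be sketched as follows. The example assumes a hypothetical person-level design matrix with two dummy columns (male, female); the variable names are illustrative and do not come from the cited papers. Household totals correspond to the household-level approach, and household means assigned to every member correspond to the person-level approach of Lemaître and Dufour (1987).

```python
import numpy as np

# Hypothetical person-level auxiliary dummies, columns = (male, female).
household_of_person = np.array([0, 0, 1, 2, 2, 2])
person_x = np.array([[1, 0],
                     [0, 1],
                     [1, 0],
                     [0, 1],
                     [0, 1],
                     [1, 0]], dtype=float)

n_households = household_of_person.max() + 1
household_size = np.bincount(household_of_person, minlength=n_households)

# Household-level approach: aggregate person characteristics to household totals.
household_x = np.zeros((n_households, person_x.shape[1]))
np.add.at(household_x, household_of_person, person_x)

# Person-level approach: replace each person's auxiliaries by the household mean,
# so that all members of a household share the same auxiliary vector.
household_mean_x = household_x / household_size[:, None]
person_mean_x = household_mean_x[household_of_person]
```

Both constructions preserve the sample sums of the auxiliary variables: summing `person_mean_x` over persons gives the same result as summing `household_x` over households.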

In this paper the integrated weighting approach at the household level is applied. Let ${x}_{kh}$ denote a $q$-vector containing $q$ auxiliary variables for household $k$ from stratum $h.$ Person based characteristics are aggregated to household totals. The GREG estimator is derived from a linear regression model that specifies the relation between the target variable and the auxiliary variables for which population totals are known. This model is defined as

${y}_{kh}={x}_{kh}^{t}\beta +{e}_{kh},\text{ }\text{with}\text{ }{\text{E}}_{m}\left({e}_{kh}\right)=0,\text{ }{\text{V}}_{m}\left({e}_{kh}\right)={\sigma }_{kh}^{2}.\text{ }\text{ }\text{ }\text{ }\text{ }\left(5.3\right)$

In (5.3), $\beta$ denotes the vector containing the $q$ regression coefficients of the regression of ${y}_{kh}$ on ${x}_{kh},$ ${e}_{kh}$ denotes the residuals, and ${\text{E}}_{m}$ and ${\text{V}}_{m}$ denote the expectation and variance with respect to the regression model. In this application, the variance structure is taken proportional to the household size, i.e., ${\sigma }_{kh}^{2}={g}_{k}{\sigma }^{2},$ with ${g}_{k}$ the number of persons in household $k.$ Nieuwenbroek (1993) shows that in this case the weighting applied at the household level is equal to the method of Lemaître and Dufour (1987).

Regression weights for the households are finally obtained by

${w}_{k}=\frac{1}{{\pi }_{k}}\left(1+{\left({t}_{x}-{\stackrel{^}{t}}_{x\pi }\right)}^{t}{\left(\sum _{k=1}^{m}\frac{{x}_{kh}{x}_{kh}^{t}}{{\pi }_{k}{g}_{k}}\right)}^{-1}\frac{{x}_{kh}}{{g}_{k}}\right),$

with ${t}_{x}$ a $q$-vector containing the known population totals of the auxiliary variables $x,$ and ${\stackrel{^}{t}}_{x\pi }$ the Horvitz-Thompson (HT) estimator for ${t}_{x}.$ The weights calculated at the household level can also be used to weight person based characteristics of the corresponding household members with formula (5.1), since ${w}_{kj}={w}_{k}$ for all persons belonging to the same household $k.$
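The weight formula can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the helper `greg_weights`, the five households, the inclusion expectations and the population totals are all hypothetical. The sketch checks two properties discussed above: the weighted auxiliary variables reproduce the known totals ${t}_{x},$ and household-level weighting with variance proportional to ${g}_{k}$ gives the same weights as person-level weighting with household means.

```python
import numpy as np


def greg_weights(x, pi, g, t_x):
    """Regression weights w = (1/pi) * (1 + (t_x - t_ht)' T^{-1} x / g),
    with T = sum_k x_k x_k' / (pi_k * g_k). Hypothetical helper."""
    t_ht = (x / pi[:, None]).sum(axis=0)        # HT estimator of t_x
    T = (x / (pi * g)[:, None]).T @ x           # q x q matrix
    lam = np.linalg.solve(T, t_x - t_ht)        # T^{-1} (t_x - t_ht)
    return (1.0 + (x / g[:, None]) @ lam) / pi


# Hypothetical household totals of person dummies, columns = (males, females).
household_x = np.array([[1, 1],
                        [1, 0],
                        [1, 2],
                        [0, 1],
                        [2, 1]], dtype=float)
household_size = household_x.sum(axis=1)         # g_k
pi = np.array([0.10, 0.05, 0.20, 0.10, 0.25])    # assumed inclusion expectations
t_x = np.array([50.0, 40.0])                     # assumed population totals

# Household-level weighting with variance proportional to household size.
w_household = greg_weights(household_x, pi, household_size, t_x)
calibrated_totals = w_household @ household_x    # should reproduce t_x

# Person-level weighting with household means and constant variance.
household_of_person = np.repeat(np.arange(len(pi)), household_size.astype(int))
person_x = (household_x / household_size[:, None])[household_of_person]
w_person = greg_weights(person_x,
                        pi[household_of_person],
                        np.ones(len(household_of_person)),
                        t_x)

# Every person should carry the weight of his or her household.
weights_agree = np.allclose(w_person, w_household[household_of_person])
```

Solving the linear system with `np.linalg.solve` avoids forming the matrix inverse explicitly; this is a numerical design choice of the sketch and not something prescribed by the formula.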
