Register-based sampling for household panels

2. Sampling design
The target population of the RIS is all natural persons residing in the Netherlands. The sampling frame is a register containing all natural persons aged 15 years and over residing in the Netherlands, as far as they are known to the Tax Office. From this register a stratified simple random sample of so-called core persons is drawn with a sampling fraction of 0.16. Neighbourhoods are used as the stratification variable. Although an equal probability design is used, stratified sampling is useful to eliminate the variation between strata and to meet minimum precision requirements for the individual strata. The Netherlands is divided into about 2,830 neighbourhoods with an average size of 5,000 persons aged 15 years and over.
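As an illustration of the selection step described above, the following minimal Python sketch draws a simple random sample without replacement within each neighbourhood at a fixed fraction of 0.16. The frame layout (pairs of person identifier and neighbourhood), the function name `stratified_srs`, and the rounding of the stratum sample sizes are assumptions made for this sketch; they are not taken from the production system of the RIS.

```python
import random
from collections import defaultdict

def stratified_srs(frame, fraction=0.16, seed=0):
    """Draw a simple random sample without replacement within each stratum.

    frame: iterable of (person_id, neighbourhood) pairs (hypothetical layout).
    Returns the set of selected person ids, i.e. the core persons.
    """
    rng = random.Random(seed)

    # Group the frame records by stratum (neighbourhood).
    by_stratum = defaultdict(list)
    for person_id, neighbourhood in frame:
        by_stratum[neighbourhood].append(person_id)

    core_persons = set()
    for neighbourhood, persons in by_stratum.items():
        # Rounding the stratum sample size is an assumption of this sketch.
        n_h = round(fraction * len(persons))
        core_persons.update(rng.sample(persons, n_h))
    return core_persons

# Toy frame: 3 neighbourhoods of 5,000 persons aged 15 years and over each.
frame = [(f"p{h}_{i}", f"nb{h}") for h in range(3) for i in range(5000)]
sample = stratified_srs(frame)
print(len(sample))  # 2400, i.e. 800 core persons per neighbourhood
```

Because the same fraction is applied in every stratum, each person in the frame has (up to rounding) the same inclusion probability, which is the equal probability property referred to in the text.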
The RIS has been conducted as a panel since 1994. A first requirement for correct cross-sectional inference with this panel is to have correct first and second order inclusion expectations for the sampling units, which are derived in Section 3. A second requirement is to keep the panel representative of the target population. To this end, it is determined on a yearly basis which part of the population has entered the target population of the RIS through birth and immigration. From this subpopulation, a stratified simple random sample of core persons is selected with a sampling fraction of 0.16. These core persons are added to the panel of the RIS in order to maintain a representative sample.
Neighbourhoods are the most detailed level of publication for the RIS and are therefore used as strata. In Section 4 expressions for minimum sample sizes based on precision requirements are derived. Core persons remain in the panel indefinitely. On each survey occasion, all members of the core person’s household are also included in the sample. Persons who leave the household of a core person also leave the panel. New persons entering the household of a core person are followed in the panel as long as they stay in the household of a core person. Information about the household composition of the core persons is obtained from the Municipal Basis Administration (MBA), which is the Dutch government’s registry of all residents in the country. Dutch citizens are required by law to report changes in their demographics to their municipalities. The MBA is used in combination with the information from tax administrations to identify household members of the core persons in the sample.
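The following rule described above can be sketched in a few lines of Python. The mapping `household_of` is a hypothetical stand-in for the household composition derived from the registers on one survey occasion, and the function name `persons_in_sample` is likewise an assumption. Core persons who no longer appear in the mapping are simply skipped in this sketch; how such cases are treated in practice is not specified here.

```python
def persons_in_sample(core_persons, household_of):
    """Return all persons in the sample on one survey occasion.

    core_persons: set of person ids that are followed in the panel.
    household_of: dict person_id -> household_id for this occasion
                  (hypothetical stand-in for register-based household data).
    """
    # Households that currently contain at least one core person.
    core_households = {household_of[p] for p in core_persons if p in household_of}
    # Every current member of such a household is in the sample.
    return {p for p, h in household_of.items() if h in core_households}

core = {"A"}

# Occasion 1: A, B and C share a household; D lives elsewhere.
occasion_1 = {"A": "hh1", "B": "hh1", "C": "hh1", "D": "hh2"}
# Occasion 2: C has moved out, E has joined the household of A.
occasion_2 = {"A": "hh1", "B": "hh1", "C": "hh3", "D": "hh2", "E": "hh1"}

print(sorted(persons_in_sample(core, occasion_1)))  # ['A', 'B', 'C']
print(sorted(persons_in_sample(core, occasion_2)))  # ['A', 'B', 'E']
```

In the toy example, C leaves the sample after moving out of the household of core person A, while E enters the sample on joining it; the core person A is retained on both occasions.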
The sample design results in a sample of households where the households are selected with probabilities proportional to the number of persons aged 15 years or older belonging to the household in the current period. Households can be selected more than once, but with a maximum that equals the number of household members aged 15 years or older. In this paper the term core persons is used to refer to the persons that are initially included in the sample and are followed over time in the panel. The term persons is used to refer to the sample obtained if all the household members at a particular period are included in the sample.
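The size-proportional selection of households can be made concrete with a small simulation. The sketch below is an illustration under simplifying assumptions (a single stratum, a toy population of 20,000 households with 1 to 4 members aged 15 years or older, and hypothetical function and variable names); it is not the derivation of the inclusion expectations. It draws persons by simple random sampling and counts how often each household is hit by a core person.

```python
import random
from collections import Counter

def simulate_household_hits(sizes, fraction=0.16, seed=1):
    """Count the core persons drawn in each household.

    sizes: list with the number of members aged 15+ per household
           (toy data; a single stratum is assumed for simplicity).
    """
    rng = random.Random(seed)
    # One frame record per person aged 15+, labelled by household index.
    frame = [hh for hh, m in enumerate(sizes) for _ in range(m)]
    n = round(fraction * len(frame))
    # Sampling frame positions without replacement = sampling distinct persons.
    drawn = rng.sample(range(len(frame)), n)
    return Counter(frame[i] for i in drawn)

# Toy population: household sizes 1, 2, 3, 4 repeated, 5,000 of each.
sizes = [1 + (hh % 4) for hh in range(20000)]
hits = simulate_household_hits(sizes)

# Average number of selections per household, by household size.
mean_hits = {}
for m in range(1, 5):
    households = [hh for hh, s in enumerate(sizes) if s == m]
    mean_hits[m] = sum(hits[hh] for hh in households) / len(households)

for m, avg in mean_hits.items():
    print(m, round(avg, 3), "expected about", round(0.16 * m, 2))
```

Under these assumptions the average number of selections of a household with m members aged 15 years or older is close to 0.16 × m, and no household is selected more often than it has such members, in line with the description above.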
The IPS applies a similar sample design with a substantially smaller sampling fraction. The RIS, like the IPS, is a register-based sample, which implies that for each person included in the sample, the necessary information for the RIS variables is obtained from the registers of the Tax Office. Core persons and their household members are therefore not aware that they are included in these samples. This has the advantage that there are no problems with selective non-response and panel attrition. This also makes it possible to include the core persons indefinitely. In the case of a panel where sampling units must complete a questionnaire, some kind of rotating design would be required in order to avoid selectivity bias due to panel attrition. Also, problems with measurement bias associated with data collection where sampling units are asked to complete a questionnaire do not occur. Of course, other types of measurement errors are encountered with a survey that is based on registrations (Wallgren and Wallgren 2007). It is assumed that all the information about income required to estimate the target parameters of the RIS and the IPS is available in these registers. Since all the required information is available in a register, a complete enumeration of the population is possible. In the past, however, the IT infrastructure was insufficient to produce timely regional income statistics based on a complete enumeration of the Dutch population. Therefore the RIS was traditionally based on a large sample with a sampling fraction of 0.16 core persons. For the same reason the IPS is traditionally based on a sample of about 80,000 core persons. With the current computational capacity a complete enumeration would still be very demanding but not impossible. The main rationale for conducting this survey as a sample is to maintain the panel for longitudinal analyses that cover time periods from the past for which a census was impossible.