Section 5
Processing errors

Errors can arise in all types of data handling. The main stages of data processing are response coding, data entry, editing, imputation of partial nonresponse and weighting. In the Survey of Household Spending (SHS), different procedures are applied at each stage in order to minimize processing errors, and the survey estimates are compared with other data sources prior to release. Errors related to the adjustments made at the weighting stage are described in sections 2 and 3. The other types of processing errors are covered in this section.

Because of the shift to a computer-assisted collection method in 2006, data processing and quality control procedures were altered. Automated edits built into the questionnaire replaced the balance edit checks and regional office edits previously carried out. For the 2008 SHS, interviewers entered responses on a portable computer and conducted an initial edit at the same time. For example, interval controls, which set minimum and maximum values for certain purchases, were triggered if the interviewer entered an unusual amount. Other edits targeted inconsistent responses, such as a household renting its dwelling but paying no rent.
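The two kinds of collection edits described above (interval controls and consistency edits) can be sketched as simple rules evaluated on a response. This is a minimal illustration, not the SHS application: the variable names, the tenure codes and the dollar bounds in `INTERVAL_BOUNDS` are hypothetical, since the actual limits are not given here.

```python
from dataclasses import dataclass, field

# Hypothetical (minimum, maximum) bounds in dollars for selected purchases;
# the actual SHS interval controls are not published in this section.
INTERVAL_BOUNDS = {
    "rent": (0.0, 60_000.0),
    "food_from_stores": (0.0, 50_000.0),
}


@dataclass
class Response:
    tenure: str                                  # hypothetical codes: "rented" or "owned"
    amounts: dict = field(default_factory=dict)  # variable name -> reported amount


def edit_failures(resp: Response) -> list:
    """Return the edit messages the interviewer would be asked to resolve."""
    failures = []
    # Interval controls: flag amounts outside the expected range.
    for var, amount in resp.amounts.items():
        bounds = INTERVAL_BOUNDS.get(var)
        if bounds is not None and not bounds[0] <= amount <= bounds[1]:
            failures.append(f"{var}: {amount} is outside {bounds}")
    # Consistency edit: a household that rents its dwelling should report rent.
    if resp.tenure == "rented" and resp.amounts.get("rent", 0.0) <= 0.0:
        failures.append("household rents its dwelling but reports no rent")
    return failures
```

A renting household reporting no rent and an unusually large food amount would trigger both an interval control and the consistency edit, while a plausible response passes with no messages.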

The processing of SHS data also involves imputation for partial nonresponse. Partial nonresponse occurs when the respondent refuses to answer or does not know the answer to certain questions. The imputation approach differs depending on whether the data are categorical or continuous. Categorical data take on only specific values (as in yes/no questions or type of dwelling questions), while continuous data can take any numerical value (such as for income and expenditure data).

Income and expenditure data are imputed using the nearest neighbour technique. The imputation is done on one group of variables at a time, with the groups formed on the basis of the relationships among the variables. A group generally corresponds to a section of the questionnaire. For each group, the missing values for a recipient (a household with missing data for at least one of the group's variables) are imputed from the most similar record among all donors (households with no missing values for these variables). For each recipient, the closest donor is the one that minimizes a particular distance function. This function is based on matching variables, which are chosen because they are correlated with the variables to be imputed. For example, total household income is used as a matching variable for all sections pertaining to expenditures. In addition, after receiving the donor values, the recipient household must satisfy certain consistency rules. In general, the imputation is done at the household level, but in some groups (e.g., income and clothing expenditures), it is done at the person level, since the original data for these variables are collected at that level.
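The donor search for one group of variables can be sketched as follows. This is a simplified illustration, not the CANCEIS algorithm: the distance function (a weighted sum of absolute differences on the matching variables), the weights, and the `is_consistent` callback standing in for the consistency rules are all assumptions made for the example.

```python
def distance(rec: dict, donor: dict, matching: dict) -> float:
    """Weighted sum of absolute differences on the matching variables.

    `matching` maps each matching variable to a weight; both the weights and
    this functional form are illustrative assumptions, not the SHS/CANCEIS ones.
    """
    return sum(w * abs(rec[v] - donor[v]) for v, w in matching.items())


def nn_impute(recipient, donors, group, matching, is_consistent):
    """Fill missing (None) values in `group` from the closest acceptable donor."""
    missing = [v for v in group if recipient.get(v) is None]
    if not missing:
        return dict(recipient)
    # Donors must have no missing values for any variable in the group.
    eligible = [d for d in donors if all(d.get(v) is not None for v in group)]
    # Try donors from nearest to farthest; keep the first consistent result.
    for donor in sorted(eligible, key=lambda d: distance(recipient, d, matching)):
        candidate = dict(recipient)
        for v in missing:
            candidate[v] = donor[v]
        if is_consistent(candidate):
            return candidate
    raise ValueError("no donor yields a consistent record")
```

For example, with total income and household size as matching variables, a recipient with income 52,000 and three members receives its missing electricity amount from the donor with income 50,000 and three members; if that value violated a consistency rule, the next-closest eligible donor would be tried instead.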

Note that since 2001, all expenditure and income data have been imputed using the Canadian Census Edit and Imputation System (CANCEIS) of Statistics Canada. Its methodology differs slightly from that of the system used previously: it makes better use of categorical variables as matching fields when selecting a donor, and it can impute both continuous and categorical data. The new system was tested prior to its implementation, and its results were similar to those of the old system. Since 2003, categorical data, which are found mainly in the dwelling characteristics and facilities sections of the questionnaire, have also been imputed with CANCEIS. Previously, categorical data were imputed with a "hot deck" technique that randomly chooses a donor from a group of respondent households with similar characteristics.
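The earlier "hot deck" approach for categorical data can be sketched as random donor selection within imputation classes. This is an illustration under an assumption: the text does not say how "similar characteristics" were defined, so here a class is simply the set of records sharing the same values of some hypothetical classing variables (`class_vars`).

```python
import random


def hot_deck_impute(records, var, class_vars, rng=None):
    """Random hot-deck imputation of `var` within imputation classes.

    Records with identical values of `class_vars` form a class (an assumed
    definition of "similar characteristics"). A missing `var` (None) is filled
    with the value of a randomly chosen respondent from the same class.
    """
    rng = rng or random.Random()
    # Build the donor pool of reported values for each class.
    pools = {}
    for r in records:
        if r[var] is not None:
            key = tuple(r[c] for c in class_vars)
            pools.setdefault(key, []).append(r[var])
    out = []
    for r in records:
        new = dict(r)
        if new[var] is None:
            pool = pools.get(tuple(new[c] for c in class_vars))
            if pool:  # leave the value missing if the class has no respondents
                new[var] = rng.choice(pool)
        out.append(new)
    return out
```

Unlike the nearest neighbour sketch, no distance is computed: any respondent in the class is an equally likely donor.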

The bias caused by imputation of partial nonresponse is difficult to evaluate. It depends on the differences between respondents and nonrespondents as well as the ability of the imputation method to produce unbiased estimates. However, the imputation rates give an indication of the importance of partial nonresponse. They are presented in the following section.

5.1 Proportion of households or individuals requiring imputation, at the national and provincial levels

A first indication of the magnitude of partial nonresponse is the proportion of households requiring imputation and the number of variables imputed per household. The questionnaire can be divided into two major groups of variables: those collected at the household level and those collected at the individual level (such as income and clothing expenditure). For the latter, it is important to note that the respondent may provide only the total income or total clothing expenditures if he/she is unable to provide the breakdowns by source of income or type of expenditure. This increases the level of imputation for the components of income and clothing expenditure, but it does not affect total income, total clothing expenditure or total expenditure.

The percentage of households requiring imputation for household expenditure (excluding clothing expenditures and expenditures in the section on Personal Taxes, Security and Money Gifts) is presented in the next subsection. The subsection after that presents the percentages of persons requiring imputation for a clothing expenditure variable, for an income variable and for a variable in the section on Personal Taxes, Security and Money Gifts. The last subsection presents the percentage of households requiring imputation of at least one categorical variable. After imputation by the system, some corrections to both imputed and non-imputed variables may have been needed to ensure data consistency; in practice, these corrections affect only a very small percentage of the data. The results are provided at the national and provincial levels, which indicates which provinces are most affected by imputation.

5.1.1 Household expenditure imputation by province

Table 5.1-1 shows the percentage of usable households requiring imputation of at least one expenditure variable. Usable households are all households living in eligible dwellings, excluding households that could not be contacted, refused to participate in the survey or provided incomplete data (see definitions in section 2.1). The table is broken down by the number of imputed variables (out of 246) for a household.
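The tabulation behind a table of this kind can be sketched as follows. It assumes unweighted counts of usable households and reports the distribution by exact number of imputed variables; the actual ranges used to group counts in Table 5.1-1 are not reproduced here.

```python
from collections import Counter


def imputation_profile(imputed_flags):
    """Summarize expenditure imputation across usable households.

    `imputed_flags` holds one set per usable household: the names of the
    variables imputed for it (an empty set if none). Returns the percentage of
    households with at least one imputed variable, and the percentage of
    households by number of imputed variables (unweighted, an assumption).
    """
    n = len(imputed_flags)
    counts = Counter(len(flags) for flags in imputed_flags)
    pct_any = 100.0 * (n - counts[0]) / n
    by_count = {k: 100.0 * c / n for k, c in sorted(counts.items()) if k > 0}
    return pct_any, by_count
```

For four households where two need no imputation, one needs one imputed variable and one needs two, the sketch reports 50% requiring imputation, split 25% and 25%.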

Note that regular mortgage payments and mortgage insurance premiums are included under shelter costs and thus under total expenditure. Starting with 2002, these two variables were added to the calculation of imputation rates shown in Table 5.1-1. The impact of this change is a higher overall imputation rate.

Starting in 2004, a change was made to the questionnaire regarding expenditures on residential communication services (telephone, cell phone and Internet access), cable television services, satellite distribution services and security systems. Because of the growing use of packages (bundled services), a household may be billed for combined services and therefore be unable to report expenditures for individual services. In such cases, the respondent household may provide only the total expenditure for the package while indicating which services it includes. Expenditures for individual services are then imputed in two stages. First, households for which only a few services are missing are imputed; then, households for which only the total package expenditure is available are imputed. For the latter households, the imputed expenditures for the services included in the package are adjusted proportionally so that their sum equals the total package expenditure reported by the household. Since this change has had a major impact on the overall imputation rate for expenditures, the imputation rates in Table 5.1-1 are shown separately with and without the costs of residential communications services, rental of cable television services, rental of satellite distribution services and rental of security services. Also, since this change has affected the level of imputation for these six services, Table 5.1-2 shows the imputation rate and a measure of the impact of imputation for each of them.
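The proportional adjustment for households that reported only a package total can be sketched as a single scaling factor applied to the imputed amounts of the services included in the package. The service names are illustrative, and raising an error when the imputed amounts sum to zero is an assumption; the text does not say how that case is handled.

```python
def prorate_to_package(imputed: dict, package_total: float) -> dict:
    """Scale imputed service amounts so that they sum to the reported package total.

    `imputed` maps each service included in the package to its imputed amount.
    """
    s = sum(imputed.values())
    if s <= 0:
        # Assumed handling: proportional scaling is undefined without a positive sum.
        raise ValueError("imputed amounts must have a positive sum")
    factor = package_total / s
    return {service: amount * factor for service, amount in imputed.items()}
```

If the imputed telephone, Internet and cable amounts are 300, 500 and 400 dollars (sum 1,200) and the household reported a package total of 1,500 dollars, each amount is multiplied by 1.25, giving 375, 625 and 500 dollars. The relative split comes from the donors, while the total stays equal to what the household reported.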

Table 5.1-1 Households requiring expenditure imputation by province

Since expenditures related to residential communications services, cable television services, satellite distribution services and security systems have a higher level of imputation, it is important to measure the effect of imputation on the estimated totals for these six variables. This measure, along with the imputation rate, can be used to track how the amount of imputation for these variables changes over time. Given the growing popularity of packages (bundled services), the imputation level is expected to increase over time. To measure the impact of imputation, the weighted total of the imputed data is divided by the total estimate (the sum of all weighted values). This measure represents the proportion of the estimated total that comes from imputed data.
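The impact measure described above is the sum of w_i * y_i over imputed values divided by the sum of w_i * y_i over all values, where w_i is a household's survey weight and y_i its final (reported or imputed) amount for the service. A minimal sketch, with illustrative argument names:

```python
def imputation_impact(weights, values, imputed):
    """Share of the weighted total estimate that comes from imputed values.

    weights: survey weights w_i; values: final amounts y_i for one service
    variable; imputed: True where y_i was imputed.
    """
    total = sum(w * y for w, y in zip(weights, values))
    from_imputed = sum(w * y for w, y, flag in zip(weights, values, imputed) if flag)
    return from_imputed / total
```

With weights 100, 200 and 50, amounts 600, 300 and 1,200 dollars, and the last two amounts imputed, each record contributes 60,000 to the weighted total, so two thirds of the estimate comes from imputed data even though only a handful of records are involved.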

Table 5.1-2 Impact of imputation of residential communications services, cable television services, satellite distribution services and security systems at the national level

5.1.2 Individual expenditure and income imputation for the provinces

Since some respondents provide only totals for clothing expenditure and income variables, a two-step procedure is used to impute these variables at the individual level. Individuals who require imputation of only certain components are imputed first, followed by those for whom only totals are available and all components must therefore be imputed. See reference [1] for a more detailed description of this process.

The percentage of usable individuals (members of usable households) requiring imputation of at least one income variable is presented for the provinces in Table 5.2. The table shows the percentage of persons who had exactly one variable imputed, the percentage who had two or more variables (but not all) imputed and the percentage of persons for whom only total income was available (and hence required having all their components imputed). The total percentage of persons requiring some form of income imputation is also provided. The second to last column of Table 5.2 indicates the total percentage of persons requiring some form of imputation for clothing expenditure variables. The last column of Table 5.2 indicates the total percentage of persons requiring some form of imputation for the Personal Taxes, Security and Money Gifts section of the questionnaire.
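The income columns of Table 5.2 can be sketched as a classification of each usable person followed by percentages. This assumes each person is summarized by the number of imputed income components and a flag saying whether only total income was reported; the sketch places any other multi-component case in "two or more", since the text does not cover a person with all components imputed but no reported total.

```python
from collections import Counter


def income_category(n_imputed: int, total_only: bool) -> str:
    """Classify one person according to the income columns of the table."""
    if total_only:
        return "total only"  # all components imputed from the reported total
    if n_imputed == 0:
        return "none"
    if n_imputed == 1:
        return "exactly one"
    return "two or more"


def income_table_row(persons):
    """persons: iterable of (n_imputed, total_only) pairs for usable persons."""
    persons = list(persons)
    n = len(persons)
    cats = Counter(income_category(k, t) for k, t in persons)
    row = {c: 100.0 * cats[c] / n for c in ("exactly one", "two or more", "total only")}
    row["any"] = sum(row.values())  # total percentage requiring some income imputation
    return row
```

Because the three categories are mutually exclusive, the "any" column is simply their sum.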

Note that only household members aged 15 or over on December 31 of the reference year must answer the questions relating to personal income, personal taxes, security and money gifts. Thus, since the 2003 reference year, the percentage of persons requiring some form of imputation for income variables as well as for the Personal Taxes, Security and Money Gifts section has been calculated using only persons aged 15 or over, rather than all persons as in previous years. This modification resulted in a slightly higher imputation rate for those variables. As in previous years, the percentage of persons requiring imputation for clothing expenditure variables is based on all persons, since those expenditure questions are asked for each household member.
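The change in denominator can be sketched by restricting the rate to persons aged 15 or over on December 31 of the reference year. Representing each person by a birth date is an assumption made for illustration; the survey's actual age variable is not described here.

```python
from datetime import date


def age_on(birth: date, ref: date) -> int:
    """Completed years of age on date `ref`."""
    return ref.year - birth.year - ((ref.month, ref.day) < (birth.month, birth.day))


def income_imputation_rate(persons, ref_year: int) -> float:
    """Percentage of persons aged 15+ on Dec. 31 of ref_year requiring income imputation.

    persons: iterable of (birth_date, needs_income_imputation) pairs.
    """
    dec31 = date(ref_year, 12, 31)
    eligible = [needs for birth, needs in persons if age_on(birth, dec31) >= 15]
    return 100.0 * sum(eligible) / len(eligible)
```

Excluding children, who are never asked the income questions, removes persons who could not require income imputation from the denominator, which is why the rate rises slightly compared with a rate based on all persons.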

Table 5.2 Persons requiring income imputation, persons requiring clothing expenditure imputation and persons requiring imputation for variables in Personal Taxes, Security and Money Gifts section by province

5.1.3 Imputation of categorical variables by province

Table 5.3 shows the percentage of usable households requiring imputation of at least one categorical variable. The table is broken down by the number of imputed variables (out of 41) for a household. Categorical variables that required imputation are found in the following sections of the questionnaire: Dwelling Characteristics (except the dwelling type variable); Facilities Associated with the Dwelling; Tenure (except variables related to a tenure change during the reference year); and Tobacco and Miscellaneous (variables on purchases through direct sales, which are yes/no questions). Other categorical variables from the questionnaire, such as household composition variables or questionnaire skips, are edited and validated by subject-matter experts from Income Statistics Division and are therefore not imputed using the nearest neighbour technique.

Table 5.3 Households requiring imputation of categorical variables, Canada and the provinces
