Section 5
Processing errors

Errors can arise in all types of data handling. The main stages of data processing are response coding, data entry, editing, imputation of partial nonresponse and weighting. In the Survey of Household Spending (SHS), different procedures are applied at each stage in order to minimize processing errors, and the survey estimates are compared with other data sources prior to release. Errors related to the adjustments made at the weighting stage have been described in sections 2 and 3. The other types of processing errors are covered in this section.

Because of the shift to a computer-assisted collection method in 2006, data processing and quality control procedures were altered. Automated edits incorporated into the questionnaire replaced the balance edit checks and regional office edits that were conducted previously. For the 2007 SHS, interviewers entered responses on a portable computer and conducted an initial edit at the same time. For example, interval controls, which showed minimums and maximums for certain purchases, were applied if the interviewer entered an unusual amount. Other edits targeted inconsistent responses, such as a household that reported renting its dwelling but paying no rent.
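The two kinds of edits described above can be illustrated with a minimal sketch. The field names, the tenure codes and the bounds below are hypothetical and are not taken from the SHS questionnaire.

```python
from dataclasses import dataclass

# Hypothetical bounds for one purchase category; the actual SHS limits
# are not given in this document.
RANGE_LIMITS = {"furniture": (0.0, 20000.0)}


@dataclass
class Response:
    tenure: str        # "rent" or "own" (illustrative codes)
    rent_paid: float   # rent reported for the reference year
    furniture: float   # reported furniture purchases


def run_edits(r):
    """Return the list of edit flags raised by one response."""
    flags = []
    low, high = RANGE_LIMITS["furniture"]
    # Interval control: flag an amount outside the expected range.
    if not (low <= r.furniture <= high):
        flags.append("range:furniture")
    # Consistency edit: a renting household should report some rent.
    if r.tenure == "rent" and r.rent_paid == 0:
        flags.append("consistency:renter_without_rent")
    return flags
```

In a computer-assisted interview, a flag of this kind would prompt the interviewer to confirm or correct the entry during the interview.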

The processing of SHS data also involves imputation for partial nonresponse. Partial nonresponse occurs when the respondent refuses to answer or does not know the answer to certain questions. The imputation approach differs depending on whether the data are categorical or continuous. Categorical data take on only specific values (as in yes/no questions or type of dwelling questions), while continuous data can take any numerical value (as for income and expenditure data).

Income and expenditure data are imputed using the nearest neighbour technique. The imputation is done on one group of variables at a time, with the groups chosen by taking the relationships among the variables into account. A group generally corresponds to a section of the questionnaire. For each group, the missing values for a recipient (a household that has some missing data for at least one of these variables) are imputed from data on the most similar record among all donors (households that have no missing values for these variables). For each recipient, the closest donor is chosen as the one that minimizes a particular distance function. This function is based on matching variables that are chosen because they are correlated with the variables to be imputed. For example, the total income of a household is chosen as a matching variable for all sections pertaining to expenditures. It must also be ensured that, after receiving the donor values, the recipient household satisfies certain consistency rules. In general, the imputation is done at the household level, but in some groups (e.g., income and clothing expenditures), the imputation is done at the person level since the original data are collected at that level for these variables.
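The donor selection described above can be sketched as follows. This is an illustration under stated assumptions, not the production method: the scaled Euclidean distance, the variable names and the fallback to the next-nearest donor when a consistency rule fails are all assumptions, since the text does not specify the distance function or how consistency is enforced.

```python
import math


def distance(recipient, donor, match_vars, scales):
    """Scaled Euclidean distance over the matching variables (assumed form)."""
    return math.sqrt(sum(((recipient[v] - donor[v]) / scales[v]) ** 2
                         for v in match_vars))


def impute_group(recipient, households, group_vars, match_vars, scales,
                 is_consistent=lambda record: True):
    """Fill the recipient's missing group_vars from the closest acceptable donor."""
    # Donors have no missing values among the variables of this group.
    donors = [h for h in households
              if all(h[v] is not None for v in group_vars)]
    # Try donors from nearest to farthest (assumed fallback strategy).
    ranked = sorted(donors,
                    key=lambda d: distance(recipient, d, match_vars, scales))
    for donor in ranked:
        candidate = dict(recipient)
        for v in group_vars:
            if candidate[v] is None:
                candidate[v] = donor[v]  # only missing values are replaced
        if is_consistent(candidate):
            return candidate
    return None  # no donor satisfied the consistency rules


# Hypothetical records: total income is used as a matching variable.
households = [
    {"income": 40000, "size": 2, "food": 6000, "transport": 8000},
    {"income": 90000, "size": 4, "food": 11000, "transport": 12000},
]
recipient = {"income": 85000, "size": 4, "food": None, "transport": 11500}
result = impute_group(recipient, households, ["food", "transport"],
                      ["income", "size"], {"income": 10000, "size": 1})
```

Here the second household is the closest donor, so the recipient receives its food expenditure while keeping its own reported transport expenditure.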

Note that since 2001, the imputation of all expenditure and income data has been done using the Canadian Census Edit and Imputation System (CANCEIS) of Statistics Canada. This system is based on a methodology that is slightly different from that of the system used previously. It allows better use of categorical variables as matching fields when selecting a donor, and it lends itself to the imputation of both continuous and categorical data. The system was tested prior to its implementation, and the results it gave were similar to those of the old system. Starting with 2003, categorical data, which are found mainly in the dwelling characteristics and facilities sections of the questionnaire, have also been imputed with CANCEIS. The categorical data were previously imputed with a "hot deck" imputation technique that randomly chooses a donor from a group of respondent households with similar characteristics.
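For comparison, the earlier "hot deck" approach for categorical data can be sketched in a few lines. The class variables and field names are hypothetical; the text does not say which characteristics defined the groups of similar households.

```python
import random


def hot_deck_value(recipient, respondents, class_vars, target, rng):
    """Draw the target value from a random respondent in the same class."""
    # The donor pool: respondents with a reported target value and the
    # same values as the recipient on every class variable.
    pool = [r for r in respondents
            if r[target] is not None
            and all(r[v] == recipient[v] for v in class_vars)]
    if not pool:
        return None  # no donor available in this class
    return rng.choice(pool)[target]


# Hypothetical records for illustration only.
respondents = [
    {"province": "NS", "tenure": "own", "dishwasher": "yes"},
    {"province": "NS", "tenure": "rent", "dishwasher": "no"},
    {"province": "NS", "tenure": "own", "dishwasher": None},
]
recipient = {"province": "NS", "tenure": "own", "dishwasher": None}
value = hot_deck_value(recipient, respondents, ["province", "tenure"],
                       "dishwasher", random.Random(0))
```

Unlike the nearest neighbour approach, the donor here is chosen at random within the class rather than by minimizing a distance.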

The bias caused by imputation of partial nonresponse is difficult to evaluate. It depends on the differences between respondents and nonrespondents as well as the ability of the imputation method to produce unbiased estimates. However, the imputation rates give an indication of the importance of partial nonresponse. They are presented in the following section.

5.1 Proportion of households or individuals requiring imputation, at the national, provincial and territorial levels

A first indication of the magnitude of partial nonresponse is the proportion of households requiring imputation and the number of variables imputed per household. The questionnaire can be divided into two major groups of variables: those collected at the household level and those collected at the individual level (such as income and clothing expenditure). For the latter, it is important to note that the respondent may provide only the total income or total clothing expenditures if he/she is unable to provide the breakdowns by source of income or type of expenditure. The level of imputation for the components of income and clothing expenditure is then larger, but this does not affect total income, total clothing expenditure or total expenditure.

The percentage of households requiring imputation for household expenditure (excluding clothing expenditures and expenditures in the section on Personal Taxes, Security and Money Gifts) is presented in the next sub-section. The subsequent sub-section presents the percentage of persons requiring imputation for a clothing expenditure variable, the percentage of persons requiring imputation for an income variable and the percentage of persons requiring imputation for a variable in the section on Personal Taxes, Security and Money Gifts. Finally, the last sub-section presents the results for the percentage of households requiring imputation for at least one of the categorical variables. After data imputation by the system, some corrections might have been needed on both imputed and non-imputed variables, in order to ensure data consistency. In reality, these changes constitute only a very small percentage. The results are provided at the national and provincial levels. This gives an indication of which provinces are most affected by imputation.

5.1.1 Household expenditure imputation by province or territory

Table 5.1-1 shows the percentage of usable households requiring imputation of at least one expenditure variable. Usable households are all households living in eligible dwellings, excluding those that could not be contacted, that refused to participate in the survey, that provided incomplete data or that were out of balance (see definitions in Section 2.1). The table is broken down by the number of imputed variables (out of 242) for a household.
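A breakdown of this kind can be computed from a count of imputed variables per usable household. The sketch below assumes unweighted percentages and a non-empty list of households; the text does not state whether the published rates are weighted, and the function name is illustrative.

```python
from collections import Counter


def imputation_breakdown(imputed_counts):
    """Percentage of households with 1, 2, or 3+ imputed variables, and with any."""
    n = len(imputed_counts)  # number of usable households (assumed > 0)
    buckets = Counter()
    for k in imputed_counts:
        if k == 1:
            buckets["1"] += 1
        elif k == 2:
            buckets["2"] += 1
        elif k >= 3:
            buckets["3+"] += 1
    rates = {label: 100.0 * buckets[label] / n for label in ("1", "2", "3+")}
    # Overall rate: households with at least one imputed variable.
    rates["any"] = 100.0 * sum(buckets.values()) / n
    return rates
```

Each entry in `imputed_counts` is the number of expenditure variables imputed for one usable household, with zero for households that needed no imputation.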

Note that regular mortgage payments and mortgage insurance premiums are included under shelter costs and thus under total expenditure. Starting with 2002, these two variables were added to the calculation of imputation rates shown in Table 5.1-1. The impact of this change is a higher overall imputation rate.

Starting in 2004, a change was made to the questionnaire regarding expenditures on communications services in the home (telephone, cell phone and Internet access), cable television services, satellite distribution services and security systems. Because of the growing use of packages (bundled services), a household may be billed for combined services and may therefore be unable to provide expenditures for individual services. In such a case, the respondent household may provide only the total expenditure for these services while indicating which services are included in the package. Expenditures for individual services are then imputed in two stages. First, we impute households for which only a few services are missing, followed by households for which only the total expenditure for the package is available. For the latter households, the imputed expenditures for services (those included in the package) are adjusted proportionally so that their sum corresponds to the total expenditure on the package as provided by the respondent household. Since this change has had a major impact on the overall imputation rate for expenditures, the imputation rates in Table 5.1-1 are shown separately with and without the costs of communications services in the home, rental of cable television services, rental of satellite distribution services and rental of security services. Also, since this change has had an impact on the level of imputation of expenditures for these six services, Table 5.1-2 is provided, showing the imputation rate and a measure of the impact of imputation for each of these services.
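The proportional adjustment applied to households that reported only a package total can be sketched as follows. The function and service names are illustrative, and the sketch assumes the imputed components sum to a positive amount.

```python
def prorate_to_total(imputed, reported_total):
    """Scale imputed component amounts so that they sum to the reported total."""
    imputed_sum = sum(imputed.values())
    if imputed_sum <= 0:
        raise ValueError("imputed components must sum to a positive amount")
    factor = reported_total / imputed_sum
    return {service: amount * factor for service, amount in imputed.items()}


# Hypothetical package covering two services, with a reported total of 1,000.
adjusted = prorate_to_total({"internet": 300.0, "cable": 500.0}, 1000.0)
```

In this example the donor values sum to 800, so each component is multiplied by 1.25, giving 375 for Internet access and 625 for cable. The relative shares come from the donor, while the total remains the one reported by the household.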

Table 5.1-1 Households requiring expenditure imputation by province or territory

Table 5.1-1 shows that it was necessary to impute expenditures for 49.5% of households nationally. Since 2004, this rate has been higher because of the change made to the questionnaire regarding expenditures related to communications services in the home (telephone, cell phone and Internet access), cable television services, satellite distribution services and security systems. Approximately 39% (data not shown) of usable households required imputation of at least one of these six services. In almost all of these cases, the household had reported paying for a package (bundled services) and the expenditures associated with the services included in the package were imputed. The higher imputation rates when these six variables are taken into account, such as shown in the column "2 variables imputed" and the column "3 or more variables imputed," are due to the fact that a package usually includes two or more services. Excluding expenditures related to communications services in the home, cable television services, satellite distribution services and security systems, the overall imputation rate is 19.6% at the national level. Just for the variable representing mortgage insurance premiums, imputation is required for 5.4% of usable households (or 14.6% of households when selecting only households that reported mortgages on dwellings that they owned and occupied) (data not shown).

When expenditures related to communications services in the home (telephone, cell phone and Internet access), cable television services, satellite distribution services and security systems are excluded, nearly 62% of the usable households that required imputation needed it for only a single variable. Also, very few households had more than one variable imputed (7.4%). The provinces or territories with the lowest proportions of households requiring imputation of at least one expenditure variable are Nunavut (10.4%) and Yukon (15.2%). The highest rates are in Quebec (22.2%), Alberta (23.1%) and Nova Scotia (23.5%). Nova Scotia and British Columbia have the highest percentages of households that required imputation for more than one expenditure variable. In those two provinces, more than 40% of the households that required imputation had two or more expenditure variables imputed.
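The share of about 62% can be checked against the national rates quoted above. This is a rough check, assuming the 7.4% is expressed as a share of all usable households (like the 19.6% overall rate) and that both figures are rounded as published; the symbols are introduced only for this illustration.

```latex
\[
p_{\text{any}} = 19.6\%, \qquad p_{\geq 2} = 7.4\%, \qquad
p_{1} = p_{\text{any}} - p_{\geq 2} = 12.2\%
\]
\[
\frac{p_{1}}{p_{\text{any}}} = \frac{12.2}{19.6} \approx 0.62
\]
```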

If we exclude regular mortgage payments, mortgage insurance premiums, expenditures related to communications services in the home, cable television services, satellite distribution services and security systems, then the low percentage of households for which variables had to be imputed, combined with a generally low number of variables to be imputed when imputation is required, suggests that the impact of imputed values on the estimates should not be too high.

Since there is a higher level of imputation for expenditures related to communications services in the home, cable television services, satellite distribution services and security systems, it is important to measure the effect of imputation on the estimates of totals for these six variables. This measure, along with the imputation rate, can be used to see how the amount of imputation done for these variables changes over time. Owing to the growing popularity of packages (bundled services) within the population, the imputation level should increase over time. To measure the impact of imputation, the weighted total of the imputed data is divided by the total estimate (sum of weighted values). This measure represents the proportion of the total value of the estimate that is obtained from imputed data.
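The impact measure defined above can be written as a short function. This is a minimal sketch; the argument names are illustrative, and the inputs are assumed to be parallel lists with one entry per household.

```python
def imputation_impact(values, weights, imputed):
    """Proportion of a weighted total estimate that comes from imputed values."""
    # Total estimate: sum of weighted values over all households.
    total = sum(w * v for v, w in zip(values, weights))
    # Weighted total restricted to households whose value was imputed.
    imputed_total = sum(w * v
                        for v, w, flag in zip(values, weights, imputed)
                        if flag)
    return imputed_total / total
```

For example, with values of 100, 200 and 300, weights of 10, 10 and 20, and only the second value imputed, the weighted imputed total is 2,000 out of a total estimate of 9,000, an impact of about 22%.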

Table 5.1-2 Impact of imputation on communications services, cable television services, satellite distribution services and security systems at the national level

According to Table 5.1-2, the imputation rate and the impact of imputation are greater for expenditures related to Internet access services and the rental of cable television services. This is mainly due to the fact that among households that reported paying for a package, a large proportion of packages included these two services. The high level of imputation performed on the components in Table 5.1-2 suggests that the estimates of these components might be greatly affected by imputation, while the effect on the estimate of the total of these six services combined will be negligible, since households must provide the total expenditure associated with the package. While the imputation rate and the impact are high for expenditures on Internet access services, the increase that occurred in 2007 for average Internet access expenditures was consistent with the trends observed from other independent sources of information. Internet access services accounted for 19.8% of all household expenditures on communications. Total expenditures on the six services in Table 5.1-2 combined represent only 2.8% of total household expenditure.

5.1.2 Person expenditure and income imputation by province or territory

Since some respondents provide only totals for clothing expenditure and income variables, a two-step procedure is used to impute these variables (at the individual level). Individuals who require imputation of only certain components are imputed first, followed by those for whom only totals are available but imputation on all components is required. (See reference [1] for a more detailed description of this process.)

The percentage of usable individuals (persons who are members of usable households) requiring imputation for an income variable is presented by province in Table 5.2. The table shows the percentage of persons who had exactly one variable imputed, the percentage of those who had two or more variables (but not all) imputed and the percentage of persons for whom only total income was available (and hence required having all their components imputed). The total percentage of persons requiring some form of income imputation is also provided. The second to last column of Table 5.2 indicates the total percentage of persons requiring some form of imputation for clothing expenditure variables. The last column of Table 5.2 indicates the total percentage of persons requiring some form of imputation for the Personal Taxes, Security and Money Gifts section of the questionnaire.

Note that questions related to personal income, personal taxes, security and money gifts are asked for each household member aged 15 or over on December 31 of the reference year. Thus, since the 2003 reference year, the percentage of persons requiring some form of imputation for income variables as well as for the Personal Taxes, Security and Money Gifts section has been calculated using only persons aged 15 or over and not all persons, as was done in previous years. This modification resulted in a slightly higher imputation rate for those variables. As in previous years, the percentage of persons requiring imputation for clothing expenditure variables is based on all persons, since those expenditure questions are asked for each household member.

Table 5.2 Persons requiring income imputation, persons requiring clothing expenditure imputation and persons requiring imputation for variables in personal taxes, security and money gifts section by province or territory

These results show that 5.4% of persons from usable households had imputation performed on at least one income variable. For nearly 60% of them, exactly one variable was imputed. Provincially, the percentages of persons requiring imputation on at least one income variable range from a low of 3.9% for Newfoundland and Labrador to a high of 7.5% for British Columbia.

From the second to last column of the table, it can be seen that 9.6% of persons required imputation for at least one of the clothing expenditure variables. The provincial rates range from 3.7% for Newfoundland and Labrador to 15.0% for Nova Scotia. Almost all these people provided their total expenditure on clothing but required imputation of the components. The higher level of imputation required on clothing expenditure components suggests that the estimates for these components could be greatly affected by imputation, while the effect on the estimates for total clothing expenditure will be negligible.

From the last column of the table, results show that 9.9% of persons had some imputation performed on at least one variable in the Personal Taxes, Security and Money Gifts section. Provincially, this percentage ranges from a low of 7.9% in Saskatchewan to a high of 12.9% in New Brunswick.

5.1.3 Imputation of categorical variables by province or territory

Table 5.3 shows the percentage of usable households requiring imputation of at least one categorical variable. The table is broken down by the number of imputed variables (out of 41) for a household. Categorical variables that required imputation can be found in the following sections of the questionnaire: Dwelling Characteristics (with the exception of the dwelling type variable); Facilities Associated with the Dwelling; Tenure (with the exception of variables related to a tenure change during the reference year); Tobacco and Miscellaneous for variables pertaining to purchases through direct sales (yes/no questions). Note that other categorical variables from the questionnaire, such as the household composition variables or questionnaire skips, are edited and validated by subject matter experts from the Income Statistics Division. Therefore, the latter variables are not imputed using the nearest neighbour technique.

Table 5.3 Households requiring imputation of categorical variables by province or territory

Table 5.3 indicates that at the national level, 8.9% of households required imputation of at least one categorical variable relating to dwelling characteristics, facilities associated with the dwelling, tenure and purchases through direct sales. However, approximately 74% of those households had only one variable imputed. Provincially, the total imputation rate ranges from a low of 4.8% for Newfoundland and Labrador to a high of 10.4% for Manitoba and Alberta.
