User Guide for the Survey of Household Spending, 2014
3. Data quality

Like all surveys, the SHS is subject to error, despite all the precautions taken in each step of the survey to prevent error or reduce its impact. There are two types of error: sampling and non-sampling.

3.1 Sampling errors

Sampling errors occur because inferences about the entire population are based on information obtained from only a sample of the population. The sample design, estimation method, sample size and data variability determine the size of the sampling error. The data variability for an expenditure item refers to the differences between members of the population in spending on that item. In general, the greater the differences between households, the larger the sampling error will be.

A common measure of sampling error is the standard error (SE). The SE measures the degree of variation in the estimates that results from selecting one particular sample rather than another. The SE expressed as a percentage of the estimate is called the coefficient of variation (CV). The CV is used to indicate the degree of uncertainty associated with an estimate. For example, if the estimated number of households with a given dwelling characteristic is 10,000 with a CV of 5%, then the SE is 500 households, and the actual number can be expected to lie between 9,500 and 10,500 households (one SE on either side of the estimate) for about 68% of possible samples, and between 9,000 and 11,000 households (two SEs on either side) for about 95% of possible samples.
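The arithmetic in this example can be reproduced with a short Python sketch. The function name `cv_interval` and its parameters are illustrative and are not part of the SHS documentation. The sketch assumes the usual approximation in which one SE on either side of the estimate corresponds to about 68% of samples and two SEs to about 95%.

```python
def cv_interval(estimate, cv_percent, n_se=1):
    """Return (low, high) for estimate +/- n_se standard errors.

    The SE is recovered from the CV, which is the SE expressed
    as a percentage of the estimate.
    """
    se = estimate * cv_percent / 100.0
    return estimate - n_se * se, estimate + n_se * se


# Example from the text: 10,000 households with a CV of 5%.
print(cv_interval(10_000, 5, n_se=1))  # (9500.0, 10500.0), about 68%
print(cv_interval(10_000, 5, n_se=2))  # (9000.0, 11000.0), about 95%
```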

The standard errors for the SHS are estimated using the bootstrap method (see reference [1] in Section 7). CVs are available for the national and provincial estimates as well as for the estimates by household type, age of reference person, household income quintile, household tenure and size of area of residence.
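As a rough illustration of how bootstrap replicates yield a standard error, the following Python sketch computes a weighted mean with the full-sample weights and again with each set of replicate weights, then takes the root mean squared deviation of the replicate estimates from the full-sample estimate. This is a generic textbook form and not the SHS procedure itself, which is described in reference [1]. The function names, the data and the replicate weights are all hypothetical.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean of values using survey weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def bootstrap_se(values, full_weights, replicate_weights):
    """Return (estimate, SE) from full-sample and replicate weights.

    Assumption: the variance is the mean squared deviation of the
    replicate estimates around the full-sample estimate.
    """
    theta = weighted_mean(values, full_weights)
    reps = [weighted_mean(values, w) for w in replicate_weights]
    variance = sum((r - theta) ** 2 for r in reps) / len(reps)
    return theta, math.sqrt(variance)


# Hypothetical data: spending for four households, with three
# hypothetical sets of bootstrap weights.
spending = [1200.0, 800.0, 1500.0, 950.0]
full = [250.0, 300.0, 200.0, 250.0]
replicates = [
    [500.0, 0.0, 200.0, 300.0],
    [250.0, 600.0, 0.0, 150.0],
    [0.0, 300.0, 400.0, 300.0],
]
estimate, se = bootstrap_se(spending, full, replicates)
cv = 100.0 * se / estimate  # CV as a percentage of the estimate
print(round(estimate, 1), round(se, 1), round(cv, 1))
```

In practice a much larger number of replicates would be used. Three are shown only to keep the example readable.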

3.2 Data suppression

To ensure accuracy, estimates with a CV greater than or equal to 35% have been suppressed.

Data for suppressed items do contribute to summary-level estimates. For example, if the expenditure estimate for a particular item of clothing were suppressed, this amount would still be included in the total estimate for clothing expenditure.
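The suppression rule and its effect on totals can be sketched in Python as follows. The item names, amounts and CVs are hypothetical, and the function `publish` is illustrative only. The sketch shows the accounting described above and makes no claim about how the published tables are actually produced.

```python
CV_SUPPRESSION_THRESHOLD = 35.0  # percent, as stated in Section 3.2


def publish(estimates):
    """estimates: list of (item, value, cv_percent) tuples.

    Returns the published table, with None for suppressed items,
    and the summary-level total, which includes suppressed items.
    """
    published = {}
    total = 0.0
    for item, value, cv_percent in estimates:
        total += value  # suppressed items still count toward the total
        suppressed = cv_percent >= CV_SUPPRESSION_THRESHOLD
        published[item] = None if suppressed else value
    return published, total


# Hypothetical clothing items: (name, average expenditure, CV in %).
clothing = [("item A", 400.0, 12.0), ("item B", 50.0, 35.0), ("item C", 30.0, 48.5)]
table, clothing_total = publish(clothing)
print(table)           # items B and C are suppressed (None)
print(clothing_total)  # 480.0, including the suppressed amounts
```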

3.3 Non-sampling errors

Non-sampling errors occur because certain factors make it difficult to obtain accurate responses and to ensure that these responses retain their accuracy throughout processing. Unlike sampling errors, non-sampling errors are not easily quantified. Four sources of non-sampling error can be identified: coverage error, response error, non-response error and processing error.

3.3.1 Coverage error

Coverage error arises when sampling frame units do not adequately represent the target population. This error may occur during sample design or selection, or during data collection or processing.

3.3.2 Response error

Response error occurs when respondents provide inaccurate information. This error may be due to many factors, including flawed design of the questionnaire, misinterpretation of questions by interviewers or respondents, or faulty reporting by respondents.

Response error is the most difficult aspect of data quality to measure. In general, the accuracy of SHS data depends largely on respondents’ ability to remember (recall) household expenditures and their willingness to consult records.

3.3.3 Non-response error

Errors due to non-response occur when potential respondents do not provide the required information or when the information they provide is unusable. The main impact of non-response on data quality is that it can cause a bias in the estimates if the characteristics of non-respondents differ from those of respondents in a way that impacts the expenditures studied. While non-response rates can be calculated, they provide only an indication of data quality, since they do not measure the degree of bias present in the estimates. The magnitude of non-response can be considered a simple indicator of the risks of bias in the estimates.

For the 2014 SHS, the interview response rate is 66.7%; provincial response rates are shown in Table 1. The table also shows the number of non-responding households by reason for non-response. Reasons include the inability to contact the household, the household’s refusal to participate in the survey and the inability to conduct an interview because of special circumstances (e.g., the respondent speaks neither official language or has a physical condition that precludes an interview). Households in this last category are referred to as residual non-respondents.

Table 1
Interview response rates, Canada and provinces, 2014
Columns, in order: Eligible sampled households (number); No contacts (number); Refusals (number); Residual non-respondents (number); Respondents (number); Response rate (percentage; see Table 1, Note 1)
Canada 17,109 1,232 3,918 546 11,413 66.7
Atlantic provinces 5,364 286 1,221 196 3,661 68.3
Newfoundland and Labrador 1,529 96 313 37 1,083 70.8
Prince Edward Island 769 31 171 37 530 68.9
Nova Scotia 1,567 48 396 61 1,062 67.8
New Brunswick 1,499 111 341 61 986 65.8
Quebec 2,226 112 521 57 1,536 69.0
Ontario 2,407 205 594 118 1,490 61.9
Prairie provinces 4,981 474 1,072 122 3,313 66.5
Manitoba 1,682 171 335 57 1,119 66.5
Saskatchewan 1,495 154 341 38 962 64.3
Alberta 1,804 149 396 27 1,232 68.3
British Columbia 2,131 155 510 53 1,413 66.3
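The rates in Table 1 can be checked with a short Python sketch. It assumes that the response rate is the number of respondents divided by the number of eligible sampled households, and that the eligible households are the sum of the four outcome columns. Both assumptions reproduce the published figures, but the function name and structure are illustrative only.

```python
def interview_response_rate(no_contacts, refusals, residual, respondents):
    """Return (eligible households, response rate in %) from outcome counts."""
    eligible = no_contacts + refusals + residual + respondents
    return eligible, round(100.0 * respondents / eligible, 1)


# Counts taken from Table 1.
print(interview_response_rate(1232, 3918, 546, 11413))  # Canada: (17109, 66.7)
print(interview_response_rate(205, 594, 118, 1490))     # Ontario: (2407, 61.9)
```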

Some of the households selected to fill out a diary did not complete it or provided a diary that was considered unusable under the criteria outlined in section 2.5. For the 2014 SHS, the diary response rate among the interview-respondent households who were selected to fill out a diary is 66.1%. Provincial rates are provided in Appendix A. The final diary response rate (defined as the percentage of usable diaries relative to the number of households selected to fill out the diary) is 43.6% at the national level, and provincial rates are shown in Table 2.

Table 2
Diary response rates, Canada and provinces, 2014
Columns, in order: Eligible sampled households (number; see Table 2, Note 1); Interview non-respondents (number; see Table 2, Note 2); Diaries (see Table 2, Note 3): Refusal (number; see Table 2, Note 4), Unusable (number), Usable (number); Response rate (percentage; see Table 2, Note 5)
Canada 8,625 2,943 1,789 135 3,758 43.6
Atlantic provinces 2,711 870 543 50 1,248 46.0
Newfoundland and Labrador 760 221 147 17 375 49.3
Prince Edward Island 390 105 97 6 182 46.7
Nova Scotia 805 267 182 15 341 42.4
New Brunswick 756 277 117 12 350 46.3
Quebec 1,122 357 237 13 515 45.9
Ontario 1,214 479 267 14 454 37.4
Prairie provinces 2,508 863 504 43 1,098 43.8
Manitoba 843 294 169 10 370 43.9
Saskatchewan 758 270 135 10 343 45.3
Alberta 907 299 200 23 385 42.4
British Columbia 1,070 374 238 15 443 41.4
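The two diary response rates quoted above can be derived from the counts in Table 2, as the following Python sketch shows. It assumes that the rate among interview respondents uses eligible households minus interview non-respondents as its denominator, which reproduces the published 66.1%. The function name is illustrative only.

```python
def diary_response_rates(eligible, interview_nonrespondents, usable):
    """Return (rate among interview respondents, final rate), both in %."""
    interview_respondents = eligible - interview_nonrespondents
    among_respondents = round(100.0 * usable / interview_respondents, 1)
    final = round(100.0 * usable / eligible, 1)
    return among_respondents, final


# Canada row of Table 2: 8,625 eligible households, 2,943 interview
# non-respondents and 3,758 usable diaries.
print(diary_response_rates(8625, 2943, 3758))  # (66.1, 43.6)
```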

The response rates vary from month to month. Monthly response rates for the interview and diary can be found in Appendix B. Interview and diary response rates by size of area of residence and dwelling type are shown in Appendix C.

The diary response rate of interview respondents can be found in Appendix D, broken down by various household characteristics, including household type, household tenure, age of the reference person, and before-tax income quintile.

Cases for which the respondent fails to answer some of the questions are referred to as partial non-response. Imputing missing values compensates for this partial non-response. Imputation rates are described in Section 3.3.5.

There are also cases in which a household fails to complete the diary for all 14 days as required, leaving days with no data. Adjustment factors were thus calculated to take into consideration these days with no data.

3.3.4 Processing error

Processing errors may occur in any of the data processing stages, including data entry, coding, editing, imputation of partial non-response, weighting and tabulation. Steps taken to reduce processing error are described in Section 2.5.

3.3.5 Imputation of partial non-response

The residual bias remaining after the imputation of partial non-response is difficult to measure. Its magnitude depends on the imputation method’s ability to produce unbiased estimates. The imputation rates provide an indication of the magnitude of partial non-response.

Partial interview non-response may result from a lack of information or from an invalid response to a question. The national and provincial percentages of households for which certain expenditure categories had to be imputed due to partial interview non-response are shown in Table 3. These percentages are presented by number of imputed expenditure variables per household (out of all consumer expenditure data collected during the interview). The table contains two series of results, one including and the other excluding expenditures on communication services (telephone, cell phone and Internet), television services (via cable, a satellite dish or a phone line), satellite radio services, and home security services. This distinction has been made because these services are increasingly being purchased as a package. Households are often billed for bundled services, making it difficult or impossible for them to provide separate expenditure amounts for each service. Therefore, the total amount paid for the package is allocated to individual services through imputation, which significantly increases the number of households for which expenditures must be imputed.

Table 3
Percentage of households requiring imputation for consumer expenses collected during the interview, Canada and provinces, 2014
Columns, in order (all values are percentages of households):
Number of variables imputed, out of 188 (see Table 3, Note 1): 1; 2 to 9; 10 or more; Total
Number of variables imputed, out of 193 (see Table 3, Note 2): 1; 2 to 9; 10 or more; Total
Canada 18.9 34.3 2.9 56.1 8.8 66.0 4.9 79.7
Newfoundland and Labrador 17.3 33.1 1.8 52.3 4.3 75.1 4.2 83.7
Prince Edward Island 21.5 33.4 2.1 57.0 7.5 73.0 4.3 84.9
Nova Scotia 18.4 34.2 1.2 53.8 6.4 75.0 2.4 83.8
New Brunswick 19.5 29.8 2.0 51.3 8.1 69.4 3.8 81.2
Quebec 17.5 32.3 2.8 52.6 7.5 67.4 4.8 79.6
Ontario 19.3 32.1 2.9 54.3 11.5 58.3 4.2 74.0
Manitoba 17.6 44.7 6.0 68.3 11.9 58.7 9.4 80.0
Saskatchewan 19.6 34.4 2.8 56.9 11.9 60.3 5.1 77.2
Alberta 17.2 35.4 3.2 55.8 9.3 60.1 5.2 74.6
British Columbia (see Table 3, Note 3) 21.9 34.3 3.3 59.6 8.8 68.4 5.2 82.3

Users of expenditure estimates relating to communication, television, satellite radio or home security services should therefore take into account the high level of imputation of the expenditure data when examining these individual services. A measure of the impact of imputation on each individual service has been produced and is discussed in Appendix E. This measure represents the proportion of the total value of the estimate obtained from imputed data.
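A measure of this kind can be sketched in Python as the weighted value of imputed amounts divided by the weighted value of all amounts. The exact definition used for the SHS is given in Appendix E. The formula below is an assumption based on the description above, and the record layout, weights and amounts are hypothetical.

```python
def imputed_share(records):
    """records: list of (weight, amount, was_imputed) tuples.

    Returns the percentage of the weighted total that comes
    from imputed amounts.
    """
    total = sum(w * amount for w, amount, _ in records)
    imputed = sum(w * amount for w, amount, flag in records if flag)
    return 100.0 * imputed / total


# Hypothetical Internet service expenditures for four households.
internet = [
    (250.0, 600.0, False),
    (300.0, 540.0, True),   # amount allocated from a bundled bill
    (200.0, 720.0, True),
    (250.0, 480.0, False),
]
print(round(imputed_share(internet), 1))
```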

The percentages of households that responded to the interview and for which dwelling characteristics or household equipment had to be imputed can be found in Appendix F.

The imputation rates for all expenditures reported in the diary are shown in Tables 4 and 5. Table 4 deals with expenditures on goods and services including food from stores, which are reported in the first section of the diary. Table 5 shows the imputation rates for restaurant expenditures, which are reported in the second section of the diary.

For expenditure data from the diaries, imputation is used primarily to assign a value when the amount of a reported expenditure is missing, to assign a list of expenditure items (with individual costs) when only the total cost is provided (e.g., to assign grocery items and their individual costs when the respondent has provided only the total amount of the grocery bill), or to assign an expenditure code that is more detailed than the one that could be assigned using the information from the respondent (e.g., the type of bakery product). The imputation rate for each of these three types of imputation is shown in Table 4. Each rate represents the proportion of imputed items relative to all expenditure items from the diaries.

Table 4
Imputation rates by type of imputation for the section of the diary on goods and services including food from stores, Canada, 2014
Columns, in order: Type of imputation; Imputation rate (percentage)
Imputation of a missing cost for a reported expense  
Food from stores 1.1
Other goods and services 2.3
All expenditures 1.5
Imputation of expenditure items (and their individual cost) from a total expense  
Food from stores 20.1
Other goods and services 12.4
All expenditures 17.6
Imputation of detailed expenditure code  
Food from stores 5.8
Other goods and services 5.5
All expenditures 5.7
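The rates in Table 4 are proportions of imputed items relative to all diary items. A minimal Python sketch of that calculation follows. The labels for the three types of imputation and the list of items are hypothetical, and the sketch assumes that each item carries at most one imputation label.

```python
from collections import Counter


def imputation_rates(item_labels):
    """item_labels: one entry per diary item, None if nothing was imputed.

    Returns {imputation type: rate in %} relative to all items.
    """
    counts = Counter(label for label in item_labels if label is not None)
    n_items = len(item_labels)
    return {label: 100.0 * count / n_items for label, count in counts.items()}


# Hypothetical diary section with eight reported items.
items = [None, "missing_cost", None, "from_total", "from_total",
         None, "detailed_code", None]
print(imputation_rates(items))
```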

The risks of bias associated with the imputed data depend largely on the level of detail at which the SHS data are used. For example, food expenditure data in the SHS are produced at a high level of detail to meet the needs of users of the Food Expenditure Survey (last conducted in 2001). Food expenditures are categorized using a hierarchical system of more than 200 expenditure codes. For some reported expenditure items, the food product may have been known (e.g., dairy products or even milk), but the level of detail required (e.g., skim milk, 1% milk or 2% milk) had to be imputed. This type of imputation creates a risk of bias only in expenditure estimates at a very detailed level. In other cases, however, almost no information on the type of expenditure was available before imputation (e.g., it was known only that the expenditure was for a good). When so little information is available, the risks of bias in the estimates of the expenditure categories are more significant.

Restaurant expenditures are reported using a slightly different format in the second section of the diary. Imputation is used primarily to assign a value when the total amount of the restaurant expenditure or the cost of alcoholic beverages is missing, or when the type of meal (breakfast, lunch, dinner or snack and beverage) has not been specified. The imputation rate for each of these three types of imputation is shown in Table 5.

Table 5
Imputation rates by type of imputation for the section of the diary on snacks, beverages and meals purchased from restaurants or fast-food outlets, Canada, 2014
Columns, in order: Type of imputation; Imputation rate (percentage)
Imputation of total cost 1.01
Imputation of costs for alcoholic beverages 4.28
Imputation of meal type (breakfast, lunch, dinner, or snacks and beverages) 8.07

Lastly, households have the option of either providing receipts or recording their expenditure information in the diary. Table 6 shows the percentage of expenditures reported using each method for food expenditures, restaurant expenditures, and expenditures for other goods and services.

Table 6
Methods for recording expenses in the diary, Canada, 2014
Columns, in order: Expenditure category; Transcriptions (percentage); Receipts (percentage)
Food 21.3 78.7
Restaurant 83.5 16.5
Other goods and services 45.2 54.8

Imputation rates vary depending on the expenditure reporting method. The rates in Tables 4 and 5 are presented by the expenditure reporting method in Appendix G.

3.4 The effect of large values

For any sample, estimates of totals, averages and standard errors can be affected by the presence or absence of large values in the sample. Large values are more likely to arise from positively skewed populations. Such values are found in the SHS and are taken into account when the final estimates are generated.
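The sensitivity of averages and standard errors to a single large value can be illustrated with a small Python sketch. It uses an unweighted simple random sample formula for the SE, which is a simplification and not the SHS estimation method, and the expenditure values are hypothetical.

```python
import statistics


def mean_and_se(sample):
    """Return (mean, SE of the mean) under simple random sampling."""
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / len(sample) ** 0.5
    return mean, se


# Hypothetical expenditures, without and with one large value.
typical = [100.0, 120.0, 90.0, 110.0, 80.0]
with_large_value = typical + [2000.0]

print(mean_and_se(typical))
print(mean_and_se(with_large_value))
```

Adding the single large value raises both the mean and its standard error substantially, which is why such values need to be taken into account when estimates are produced.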
