User Guide for the Survey of Household Spending, 2014
3. Data quality
Like all surveys, the SHS is subject to error, despite all the precautions taken in each step of the survey to prevent error or reduce its impact. There are two types of error: sampling and non-sampling.
3.1 Sampling errors
Sampling errors occur because inferences about the entire population are based on information obtained from only a sample of the population. The sample design, estimation method, sample size and data variability determine the size of the sampling error. The data variability for an expenditure item refers to the differences between members of the population in spending on that item. In general, the greater the differences between households, the larger the sampling error will be.
A common measure of sampling error is the standard error (SE), which reflects how much an estimate would vary from one possible sample to another. The SE expressed as a percentage of the estimate is called the coefficient of variation (CV). The CV is used to indicate the degree of uncertainty associated with an estimate. For example, if the estimated number of households with a given dwelling characteristic is 10,000 with a CV of 5%, the SE is 500 households. The actual number is then within one SE of the estimate (between 9,500 and 10,500 households) about 68% of the time, and within two SEs (between 9,000 and 11,000 households) about 95% of the time.
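The arithmetic behind this example can be written out as a short sketch. The function name `cv_interval` is illustrative only and is not part of any SHS tool; the 68% and 95% coverage levels are those quoted in the text for one and two standard errors.

```python
def cv_interval(estimate, cv_percent, n_se=1):
    """Return (low, high) = estimate minus/plus n_se standard errors."""
    # The CV is the SE expressed as a percentage of the estimate.
    se = estimate * cv_percent / 100.0
    return estimate - n_se * se, estimate + n_se * se


# Example from the text: 10,000 households with a CV of 5%.
low68, high68 = cv_interval(10_000, 5, n_se=1)  # about 68% of the time
low95, high95 = cv_interval(10_000, 5, n_se=2)  # about 95% of the time
print(low68, high68)  # 9500.0 10500.0
print(low95, high95)  # 9000.0 11000.0
```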
The standard errors for the SHS are estimated using the bootstrap method (see reference [1] in Section 7). CVs are available for the national and provincial estimates as well as for the estimates by household type, age of reference person, household income quintile, household tenure and size of area of residence.
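This guide does not spell out the bootstrap variance formula, so the following is only a minimal sketch under an assumption: that an estimate has been recomputed once per bootstrap replicate, and that the variance is taken as the average squared deviation of the replicate estimates from the full-sample estimate. The function names and the replicate values are hypothetical.

```python
import math


def bootstrap_se(full_estimate, replicate_estimates):
    """SE as the root mean squared deviation of replicates from the full estimate."""
    b = len(replicate_estimates)
    variance = sum((r - full_estimate) ** 2 for r in replicate_estimates) / b
    return math.sqrt(variance)


def cv_percent(estimate, se):
    """CV: the SE expressed as a percentage of the estimate."""
    return 100.0 * se / estimate


# Hypothetical values: a full-sample estimate and five replicate estimates.
# Real applications use many more replicates.
full = 1000.0
replicates = [980.0, 1010.0, 1025.0, 990.0, 1005.0]

se = bootstrap_se(full, replicates)
cv = cv_percent(full, se)
print(round(se, 2), round(cv, 2))
```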
3.2 Data suppression
To ensure accuracy, estimates with a CV greater than or equal to 35% have been suppressed.
Data for suppressed items do contribute to summary-level estimates. For example, if the expenditure estimate for a particular item of clothing were suppressed, this amount would still be included in the total estimate for clothing expenditure.
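The suppression rule and its interaction with summary-level totals can be illustrated as follows. This is a sketch only: the item names, amounts and CVs are made up, and `publish` is a hypothetical helper, not an SHS function. Only the 35% threshold comes from the text.

```python
CV_SUPPRESSION_THRESHOLD = 35.0  # percent, as stated in Section 3.2


def publish(items):
    """items maps an item name to (estimate, cv_percent).

    Returns the published values (None where suppressed) and the
    summary total, which still includes the suppressed items.
    """
    published = {
        name: (None if cv >= CV_SUPPRESSION_THRESHOLD else estimate)
        for name, (estimate, cv) in items.items()
    }
    total = sum(estimate for estimate, _ in items.values())
    return published, total


# Hypothetical clothing items: (estimate in dollars, CV in percent).
clothing = {
    "item_a": (420.0, 8.0),
    "item_b": (35.0, 41.0),   # CV above the threshold: suppressed
    "item_c": (150.0, 35.0),  # CV equal to the threshold: suppressed
}

published, total = publish(clothing)
print(published)  # item_b and item_c appear as None
print(total)      # 605.0, including the suppressed amounts
```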
3.3 Non-sampling errors
Non-sampling errors occur because certain factors make it difficult to obtain accurate responses and to ensure that these responses retain their accuracy throughout processing. Unlike sampling errors, non-sampling errors are not easily quantified. Four sources of non-sampling error can be identified: coverage error, response error, non-response error and processing error.
3.3.1 Coverage error
Coverage error arises when sampling frame units do not adequately represent the target population. This error may occur during sample design or selection, or during data collection or processing.
3.3.2 Response error
Response error occurs when respondents provide inaccurate information. This error may be due to many factors, including flawed design of the questionnaire, misinterpretation of questions by interviewers or respondents, or faulty reporting by respondents.
Response error is the most difficult aspect of data quality to measure. In general, the accuracy of SHS data depends largely on respondents’ ability to remember (recall) household expenditures and their willingness to consult records.
3.3.3 Non-response error
Errors due to non-response occur when potential respondents do not provide the required information or when the information they provide is unusable. The main impact of non-response on data quality is that it can cause a bias in the estimates if the characteristics of non-respondents differ from those of respondents in a way that impacts the expenditures studied. While non-response rates can be calculated, they provide only an indication of data quality, since they do not measure the degree of bias present in the estimates. The magnitude of non-response can be considered a simple indicator of the risks of bias in the estimates.
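A simple way to see why the response rate indicates only the risk of bias is the standard identity for an unadjusted respondent mean: its bias equals the non-response rate multiplied by the difference between the respondent and non-respondent means. The sketch below assumes an unweighted mean with no non-response adjustment, which is a simplification of the SHS estimation method, and uses hypothetical spending figures.

```python
def nonresponse_bias(response_rate, mean_respondents, mean_nonrespondents):
    """Bias of the respondent mean relative to the mean of all sampled units."""
    return (1.0 - response_rate) * (mean_respondents - mean_nonrespondents)


# Same response rate, different (hypothetical) non-respondent behaviour.
no_bias = nonresponse_bias(0.667, 1000.0, 1000.0)
some_bias = nonresponse_bias(0.667, 1000.0, 900.0)
print(no_bias)              # 0.0
print(round(some_bias, 1))  # 33.3
```

With the same response rate, the bias is zero when non-respondents spend like respondents and grows as the two groups diverge, which the response rate alone cannot reveal.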
For the 2014 SHS, the interview response rate is 66.7%, and provincial response rates are shown in Table 1. The table also shows the number of non-responding households by reason for non-response. Reasons include the inability to contact the household, the household’s refusal to participate in the survey and the inability to conduct an interview because of special circumstances (e.g., the respondent speaks neither official language or has a physical condition that precludes an interview). Respondents in the latter category are referred to as residual non-respondents.
| | Eligible sampled households | No contacts | Refusals | Residual non-respondents | Respondents | Response rate (Table 1 Note 1) |
|---|---|---|---|---|---|---|
| | number | number | number | number | number | percentage |
| Canada | 17,109 | 1,232 | 3,918 | 546 | 11,413 | 66.7 |
| Atlantic provinces | 5,364 | 286 | 1,221 | 196 | 3,661 | 68.3 |
| Newfoundland and Labrador | 1,529 | 96 | 313 | 37 | 1,083 | 70.8 |
| Prince Edward Island | 769 | 31 | 171 | 37 | 530 | 68.9 |
| Nova Scotia | 1,567 | 48 | 396 | 61 | 1,062 | 67.8 |
| New Brunswick | 1,499 | 111 | 341 | 61 | 986 | 65.8 |
| Quebec | 2,226 | 112 | 521 | 57 | 1,536 | 69.0 |
| Ontario | 2,407 | 205 | 594 | 118 | 1,490 | 61.9 |
| Prairie provinces | 4,981 | 474 | 1,072 | 122 | 3,313 | 66.5 |
| Manitoba | 1,682 | 171 | 335 | 57 | 1,119 | 66.5 |
| Saskatchewan | 1,495 | 154 | 341 | 38 | 962 | 64.3 |
| Alberta | 1,804 | 149 | 396 | 27 | 1,232 | 68.3 |
| British Columbia | 2,131 | 155 | 510 | 53 | 1,413 | 66.3 |
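The figures in Table 1 are consistent with a response rate computed as respondents divided by eligible sampled households, where the eligible households are the sum of the four outcome columns. The sketch below checks this for two rows; the function name is illustrative.

```python
def interview_response_rate(no_contacts, refusals, residual, respondents):
    """Return (eligible households, response rate in percent, one decimal)."""
    eligible = no_contacts + refusals + residual + respondents
    return eligible, round(100.0 * respondents / eligible, 1)


# Counts taken from the Canada and Ontario rows of Table 1.
canada = interview_response_rate(1_232, 3_918, 546, 11_413)
ontario = interview_response_rate(205, 594, 118, 1_490)
print(canada)   # (17109, 66.7)
print(ontario)  # (2407, 61.9)
```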
Some of the households selected to fill out a diary did not complete it or provided a diary that was considered unusable under the criteria outlined in Section 2.5. For the 2014 SHS, the diary response rate among the interview-respondent households that were selected to fill out a diary is 66.1%. Provincial rates are provided in Appendix A. The final diary response rate (defined as the percentage of usable diaries relative to the number of households selected to fill out the diary) is 43.6% at the national level, and provincial rates are shown in Table 2.
| | Eligible sampled households (Table 2 Note 1) | Interview non-respondents (Table 2 Note 2) | Diaries (Table 2 Note 3): refusal (Table 2 Note 4) | Diaries: unusable | Diaries: usable | Response rate (Table 2 Note 5) |
|---|---|---|---|---|---|---|
| | number | number | number | number | number | percentage |
| Canada | 8,625 | 2,943 | 1,789 | 135 | 3,758 | 43.6 |
| Atlantic provinces | 2,711 | 870 | 543 | 50 | 1,248 | 46.0 |
| Newfoundland and Labrador | 760 | 221 | 147 | 17 | 375 | 49.3 |
| Prince Edward Island | 390 | 105 | 97 | 6 | 182 | 46.7 |
| Nova Scotia | 805 | 267 | 182 | 15 | 341 | 42.4 |
| New Brunswick | 756 | 277 | 117 | 12 | 350 | 46.3 |
| Quebec | 1,122 | 357 | 237 | 13 | 515 | 45.9 |
| Ontario | 1,214 | 479 | 267 | 14 | 454 | 37.4 |
| Prairie provinces | 2,508 | 863 | 504 | 43 | 1,098 | 43.8 |
| Manitoba | 843 | 294 | 169 | 10 | 370 | 43.9 |
| Saskatchewan | 758 | 270 | 135 | 10 | 343 | 45.3 |
| Alberta | 907 | 299 | 200 | 23 | 385 | 42.4 |
| British Columbia | 1,070 | 374 | 238 | 15 | 443 | 41.4 |
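The two diary response rates quoted in the text can be reproduced from the counts in Table 2. The sketch below uses the Canada row; the function name is illustrative, and the calculation simply follows the definitions given above (usable diaries relative to interview respondents, and usable diaries relative to all households selected for the diary).

```python
def diary_response_rates(eligible, interview_nonrespondents, usable):
    """Return (rate among interview respondents, final rate), in percent."""
    interview_respondents = eligible - interview_nonrespondents
    conditional = 100.0 * usable / interview_respondents
    final = 100.0 * usable / eligible
    return round(conditional, 1), round(final, 1)


# Counts taken from the Canada row of Table 2.
rates = diary_response_rates(8_625, 2_943, 3_758)
print(rates)  # (66.1, 43.6)
```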
The response rates vary from month to month. Monthly response rates for the interview and diary can be found in Appendix B. Interview and diary response rates by size of area of residence and dwelling type are shown in Appendix C.
The diary response rate of interview respondents can be found in Appendix D, broken down by various household characteristics, including household type, household tenure, age of the reference person, and before-tax income quintile.
Cases for which the respondent fails to answer some of the questions are referred to as partial non-response. Imputing missing values compensates for this partial non-response. Imputation rates are described in Section 3.3.5.
There are also cases in which a household fails to complete the diary for all 14 days as required, leaving days with no data. Adjustment factors were thus calculated to take into consideration these days with no data.
3.3.4 Processing error
Processing errors may occur in any of the data processing stages, including data entry, coding, editing, imputation of partial non-response, weighting and tabulation. Steps taken to reduce processing error are described in Section 2.5.
3.3.5 Imputation of partial non-response
The residual bias remaining after the imputation of partial non-response is difficult to measure. Its magnitude depends on the imputation method’s ability to produce unbiased estimates. The imputation rates provide an indication of the magnitude of partial non-response.
Partial interview non-response may result from a lack of information or from an invalid response to a question. The national and provincial percentages of households for which certain expenditure categories had to be imputed due to partial interview non-response are shown in Table 3. These percentages are presented by number of imputed expenditure variables per household (out of all consumer expenditure data collected during the interview). The table contains two series of results, one including and the other excluding expenditures on communication services (telephone, cell phone and Internet), television services (via cable, a satellite dish or a phone line), satellite radio services, and home security services. This distinction has been made because these services are increasingly being purchased as a package. Households are often billed for bundled services, making it difficult or impossible for them to provide separate expenditure amounts for each service. Therefore, the total amount paid for the package is allocated to individual services through imputation, which significantly increases the number of households for which expenditures must be imputed.
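| | Number of variables imputed out of 188 (Table 3 Note 1): 1 | Out of 188: 2 to 9 | Out of 188: 10 or more | Out of 188: total | Number of variables imputed out of 193 (Table 3 Note 2): 1 | Out of 193: 2 to 9 | Out of 193: 10 or more | Out of 193: total |
|---|---|---|---|---|---|---|---|---|
| | percentage | percentage | percentage | percentage | percentage | percentage | percentage | percentage |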
| Canada | 18.9 | 34.3 | 2.9 | 56.1 | 8.8 | 66.0 | 4.9 | 79.7 |
| Newfoundland and Labrador | 17.3 | 33.1 | 1.8 | 52.3 | 4.3 | 75.1 | 4.2 | 83.7 |
| Prince Edward Island | 21.5 | 33.4 | 2.1 | 57.0 | 7.5 | 73.0 | 4.3 | 84.9 |
| Nova Scotia | 18.4 | 34.2 | 1.2 | 53.8 | 6.4 | 75.0 | 2.4 | 83.8 |
| New Brunswick | 19.5 | 29.8 | 2.0 | 51.3 | 8.1 | 69.4 | 3.8 | 81.2 |
| Quebec | 17.5 | 32.3 | 2.8 | 52.6 | 7.5 | 67.4 | 4.8 | 79.6 |
| Ontario | 19.3 | 32.1 | 2.9 | 54.3 | 11.5 | 58.3 | 4.2 | 74.0 |
| Manitoba | 17.6 | 44.7 | 6.0 | 68.3 | 11.9 | 58.7 | 9.4 | 80.0 |
| Saskatchewan | 19.6 | 34.4 | 2.8 | 56.9 | 11.9 | 60.3 | 5.1 | 77.2 |
| Alberta | 17.2 | 35.4 | 3.2 | 55.8 | 9.3 | 60.1 | 5.2 | 74.6 |
| British Columbia (Table 3 Note 3) | 21.9 | 34.3 | 3.3 | 59.6 | 8.8 | 68.4 | 5.2 | 82.3 |
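The grouping used in Table 3 can be sketched as follows: each household is classified by how many of its expenditure variables were imputed, and the share of households in each class is reported. The example is unweighted and uses hypothetical counts; whether the published percentages are weighted is not stated here, so treat this as an illustration of the grouping only.

```python
from collections import Counter

BIN_LABELS = ("1", "2 to 9", "10 or more")


def imputation_bin(n_imputed):
    """Map a household's number of imputed variables to a Table 3 class."""
    if n_imputed == 0:
        return "none"
    if n_imputed == 1:
        return "1"
    if n_imputed <= 9:
        return "2 to 9"
    return "10 or more"


def bin_percentages(counts):
    """Percentage of households in each class with at least one imputation."""
    tally = Counter(imputation_bin(n) for n in counts)
    return {label: 100.0 * tally[label] / len(counts) for label in BIN_LABELS}


# Hypothetical numbers of imputed variables for ten households.
counts = [0, 0, 1, 3, 0, 12, 1, 5, 0, 2]
shares = bin_percentages(counts)
total = sum(shares.values())
print(shares)  # {'1': 20.0, '2 to 9': 30.0, '10 or more': 10.0}
print(total)   # 60.0
```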
Users of expenditure estimates relating to communication, television, satellite radio or home security services should therefore take into account the high level of imputation of the expenditure data when examining these individual services. A measure of the impact of imputation on each individual service has been produced and is discussed in Appendix E. This measure represents the proportion of the total value of the estimate obtained from imputed data.
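As an illustration of this kind of measure, the sketch below computes the share of a weighted expenditure total that comes from imputed amounts. The record layout (weight, amount, imputed flag) and the values are hypothetical, and the exact calculation used in Appendix E may differ.

```python
def imputed_value_share(records):
    """records: list of (survey_weight, amount, was_imputed) tuples.

    Returns the percentage of the weighted total that comes from
    imputed amounts.
    """
    total = sum(w * a for w, a, _ in records)
    imputed = sum(w * a for w, a, flag in records if flag)
    return 100.0 * imputed / total if total else 0.0


# Hypothetical household records for one service.
records = [
    (100.0, 50.0, False),
    (200.0, 30.0, True),
    (150.0, 40.0, True),
    (50.0, 80.0, False),
]

share = imputed_value_share(records)
print(round(share, 1))  # 57.1
```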
The percentages of households that responded to the interview and for which dwelling characteristics or household equipment had to be imputed can be found in Appendix F.
The imputation rates for all expenditures reported in the diary are shown in Tables 4 and 5. Table 4 deals with expenditures on goods and services including food from stores, which are reported in the first section of the diary. Table 5 shows the imputation rates for restaurant expenditures, which are reported in the second section of the diary.
For expenditure data from the diaries, imputation is used primarily to assign a value when the amount of a reported expenditure is missing, to assign a list of expenditure items (with individual costs) when only the total cost is provided (e.g., to assign grocery items and their individual costs when the respondent has provided only the total amount of the grocery bill), or to assign an expenditure code that is more detailed than the one that could be assigned using the information from the respondent (e.g., the type of bakery product). The imputation rate for each of these three types of imputation is shown in Table 4. Each rate represents the proportion of imputed items relative to all expenditure items from the diaries.
| Type of imputation | Imputation rate |
|---|---|
| | percentage |
| Imputation of a missing cost for a reported expense | |
| Food from stores | 1.1 |
| Other goods and services | 2.3 |
| All expenditures | 1.5 |
| Imputation of expenditure items (and their individual cost) from a total expense | |
| Food from stores | 20.1 |
| Other goods and services | 12.4 |
| All expenditures | 17.6 |
| Imputation of detailed expenditure code | |
| Food from stores | 5.8 |
| Other goods and services | 5.5 |
| All expenditures | 5.7 |
The risks of bias associated with the imputed data depend largely on the level of detail at which the SHS data are used. For example, food expenditure data in the SHS are produced at a high level of detail to meet the needs of users of the Food Expenditure Survey (last conducted in 2001). Food expenditures are categorized using a hierarchical system of more than 200 expenditure codes. For some reported expenditure items, the food product may have been known (e.g., dairy products or even milk), but the level of detail required (e.g., skim milk, 1% milk or 2% milk) had to be imputed. This type of imputation creates a risk of bias only in expenditure estimates at a very detailed level. In other cases, however, almost no information on the type of expenditure was available before imputation (e.g., it was known only that the expenditure was for a good). When so little information is available, the risks of bias in the estimates of the expenditure categories are more significant.
Restaurant expenditures are reported using a slightly different format in the second section of the diary. Imputation is used primarily to assign a value when the total amount of the restaurant expenditure or the cost of alcoholic beverages is missing, or when the type of meal (breakfast, lunch, dinner or snack and beverage) has not been specified. The imputation rate for each of these three types of imputation is shown in Table 5.
| Type of imputation | Imputation rate |
|---|---|
| | percentage |
| Imputation of total cost | 1.01 |
| Imputation of costs for alcoholic beverages | 4.28 |
| Imputation of meal type (breakfast, lunch, dinner, or snacks and beverages) | 8.07 |
Lastly, households have the option of either providing receipts or recording their expenditure information in the diary. Table 6 shows the percentage of expenditures reported using each method for food expenditures, restaurant expenditures, and expenditures for other goods and services.
| Expenditure category | Transcriptions | Receipts |
|---|---|---|
| | percentage | percentage |
| Food | 21.3 | 78.7 |
| Restaurant | 83.5 | 16.5 |
| Other goods and services | 45.2 | 54.8 |
Imputation rates vary depending on the expenditure reporting method. The rates in Tables 4 and 5 are presented by the expenditure reporting method in Appendix G.
3.4 The effect of large values
For any sample, estimates of totals, averages and standard errors can be affected by the presence or absence of large values in the sample. Large values are more likely to arise from positively skewed populations. Such values are found in the SHS and are taken into account when the final estimates are generated.
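The sensitivity described here can be seen in a small numerical sketch. It uses hypothetical spending values and the textbook standard error of a mean under simple random sampling, which is an assumption made for illustration and not the SHS design or its treatment of large values.

```python
import statistics


def mean_and_se(values):
    """Sample mean and its standard error under simple random sampling."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / len(values) ** 0.5
    return mean, se


# Hypothetical spending on one item for eight households.
base = [40.0, 55.0, 60.0, 45.0, 50.0, 65.0, 35.0, 50.0]
# The same sample with one large value added.
with_large = base + [2_000.0]

mean_base, se_base = mean_and_se(base)
mean_large, se_large = mean_and_se(with_large)
print(mean_base, round(se_base, 2))
print(round(mean_large, 2), round(se_large, 2))
```

A single large value raises both the average and its standard error substantially, which is why the presence or absence of such values in a sample matters.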