Section 1
Sampling errors

Sampling errors arise because inferences about the population are drawn from information collected from a sample rather than from the entire population. In addition to the sample design and the estimation method used in the Survey of Household Spending (SHS), the sample size and the variability of each characteristic determine the sampling error. Characteristics that are rare or distributed very unevenly in the population will have greater sampling error than characteristics that are observed more frequently or are more homogeneous in the population.

1.1 Measures of sampling error

The standard error is a commonly used measure of sampling error. The standard error (SE) is the degree of variation of the estimate due to the fact that one sample was selected rather than another, among all possible samples of the same size under the same sample design. Since the SHS uses a complex sample design and estimation method, the standard error is estimated using a resampling method known as the bootstrap technique. Prior to the 2003 reference year, the jackknife resampling method was used to produce standard error estimates for the SHS. The decision was made to use the bootstrap method for the 2003 and subsequent reference years, mainly because Income Statistics Division (ISD) was planning to publish median expenditure estimates and needed the coefficients of variation of those estimates. The bootstrap method is suitable for variance estimation of non-smooth statistics such as quantiles. For more details on this method, see reference [2].
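As a rough illustration of the resampling idea, the sketch below estimates the standard error of a weighted average from a set of bootstrap replicate weights. It is a generic outline, not the SHS production method: the function name, the input layout (one weight vector per replicate) and the choice to divide by the number of replicates are all assumptions made for this example.

```python
import math


def bootstrap_standard_error(values, bootstrap_weights):
    """Estimate the SE of a weighted mean from replicate weights.

    values: one reported value per sampled household.
    bootstrap_weights: list of replicate weight vectors (hypothetical
        input; each vector holds one weight per household).
    """
    replicate_estimates = []
    for weights in bootstrap_weights:
        total_weight = sum(weights)
        # Weighted mean recomputed with this replicate's weights.
        estimate = sum(w * y for w, y in zip(weights, values)) / total_weight
        replicate_estimates.append(estimate)

    b = len(replicate_estimates)
    mean_estimate = sum(replicate_estimates) / b
    # Assumed convention: average squared deviation over the B replicates.
    variance = sum((e - mean_estimate) ** 2 for e in replicate_estimates) / b
    return math.sqrt(variance)


# Toy data, invented for illustration only.
toy_values = [10.0, 20.0, 30.0]
toy_weights = [[1, 1, 1], [2, 1, 0], [0, 1, 2]]
toy_se = bootstrap_standard_error(toy_values, toy_weights)
```

The same replicate-by-replicate recomputation applies to a median or another quantile: only the line that computes the estimate changes, which is why this approach suits non-smooth statistics.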

The coefficient of variation (CV) is also a frequently used measure of the reliability of an estimate. It simply expresses the standard error as a percentage of the estimate. Thus, if an estimate Y is obtained for a certain characteristic and SE is the estimated standard error, then the CV will be (SE/Y) x 100.
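The definition above can be written as a one-line function. The function name and the figures in the usage line are hypothetical and serve only to show the arithmetic.

```python
def coefficient_of_variation(estimate, standard_error):
    """Return the CV in percent: (SE / Y) x 100."""
    if estimate == 0:
        raise ValueError("The CV is undefined when the estimate is zero.")
    return 100.0 * standard_error / estimate


# Hypothetical example: an average expenditure of 2,500 with an SE of 75.
example_cv = coefficient_of_variation(2500.0, 75.0)
```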

Finally, either the standard error or the coefficient of variation may be used to derive another measure of the precision of estimates, namely the confidence interval. This measure indicates the level of confidence one can have that the true value of an observed characteristic for the population lies within the interval. An interval with a confidence level of 95% corresponds approximately to the estimate obtained from the sample ± 2 standard errors: (Y ± 2 SE) (see note 1). This means that if the sampling were repeated a large number of times, each sample would provide a different interval and about 95% of the intervals would contain the true value of the characteristic. Similarly, if the sampling were repeated, the interval Y ± SE would contain the true value in about 68% of cases.
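A minimal sketch of both routes to the interval follows: one from the standard error, and one from the CV using the expression given in note 1, Y ± 2 (CV x Y)/100. The function names and the example figures are assumptions made for illustration.

```python
def confidence_interval_from_se(estimate, standard_error, multiplier=2.0):
    """Return (lower, upper) as estimate ± multiplier x SE."""
    half_width = multiplier * standard_error
    return (estimate - half_width, estimate + half_width)


def confidence_interval_from_cv(estimate, cv_percent, multiplier=2.0):
    """Return (lower, upper) using SE = (CV x Y) / 100."""
    standard_error = cv_percent * estimate / 100.0
    return confidence_interval_from_se(estimate, standard_error, multiplier)


# Hypothetical example: estimate of 2,500 with SE 75, i.e. a CV of 3%.
interval_se = confidence_interval_from_se(2500.0, 75.0)
interval_cv = confidence_interval_from_cv(2500.0, 3.0)
```

With `multiplier=1.0` the same functions give the narrower Y ± SE interval mentioned above.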

1.2 Coefficients of variation

Estimates of coefficients of variation are calculated for estimates of many characteristics collected in the SHS. The CVs of detailed average household expenditure, as well as the CVs of dwelling characteristics and household facilities and equipment, are available at the national and provincial levels from the Income Statistics Division (1-888-297-7355; income@statcan.gc.ca).

It should be noted that the estimated CVs do not take account of the fact that some of the data were imputed and thus may underestimate the true CVs. For most variables, the imputation rates are low (see Section 5) and the provided CVs represent good estimates of the true CVs. However, to assess the reliability of detailed expenditures with a high imputation rate, the CV and the imputation rate should both be considered.

Table 1.1 presents an overview of the CVs of estimates of household averages for a few of the summary-level expenditure categories and for income at the provincial and national levels.

Table 1.1 Coefficients of variation (%) by province and at the national level for the estimation of average household expenditures for several summary level expenditure categories and for the estimation of average income

Table 1.2 presents an overview of the CVs for some dwelling characteristics and household equipment estimates at the provincial and national levels.

Table 1.2 Coefficients of variation (%) by province and at the national level for some dwelling characteristics and household equipment

1.3 Model for deriving an approximation of the CV

Estimates for different domains of interest (for example, by income quintile) for the summary level expenditure categories are available in the publication Spending Patterns in Canada (see reference [4]). Estimates for different domains of interest for detailed expenditures are available upon request from the Income Statistics Division. (For more details on tables available upon request from the Income Statistics Division, see reference [3] or [4].) For operational reasons, it is not possible to produce CVs for all the characteristics collected by the survey at all the different levels of aggregation that may interest users.

1.3.1 Approximation of the CV for domain estimates

It is, however, possible to calculate an approximation of the CV by using a relationship between the number of households in the sample that reported expenditures for a given category and the CV at an aggregated level. This relationship, based on the CV's tendency to vary inversely with the square root of the number of households reporting an expenditure, is illustrated below.

Formula for approximating the CV for a domain (subgroup of the population)

If CV(Y) represents the CV for the estimate of the average per household of a certain characteristic for the entire population, then an approximation of the CV of the estimate of that characteristic can be calculated for a domain (which may be considered as a subgroup of the population, such as a household type, an income quintile or an urbanization level) according to the following equation:

Formula 1

CV(Yd) ≈ CV(Y) x sqrt[ (n x P) / (nd x Pd) ]

where
n: number of households in the sample
P: estimate of the proportion of households reporting a value > 0 for this characteristic in the population
nd: number of households in the sample in domain d
Pd: estimate of the proportion of households reporting a value > 0 for this characteristic in domain d

Generally, approximations for the different domains are calculated using the CV, size n and proportion P at the national level. If an approximation of a CV is desired for a domain that is entirely contained within a single province (for example, a metropolitan area), then it is preferable to use these values at the provincial level, since provincial CVs are published for the 2008 SHS (reference [3]). It should be noted that a CV obtained using this approach is only an approximation of the actual value.
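The domain approximation can be sketched in a few lines, assuming it takes the form CV(Yd) ≈ CV(Y) x sqrt[(n x P) / (nd x Pd)], which is what the inverse-square-root relationship described in this section implies. The function name and all numeric inputs below are hypothetical.

```python
import math


def approximate_domain_cv(cv_overall, n, p, n_d, p_d):
    """Approximate the CV (in percent) of a domain estimate.

    cv_overall: published CV at the national or provincial level.
    n, p: sample size and proportion reporting a value > 0 at that level.
    n_d, p_d: the same two quantities within domain d.
    """
    reporting_overall = n * p      # estimated reporting households overall
    reporting_domain = n_d * p_d   # estimated reporting households in d
    if reporting_domain <= 0:
        raise ValueError("No reporting households in the domain.")
    return cv_overall * math.sqrt(reporting_overall / reporting_domain)


# Hypothetical example: a national CV of 2% and a domain holding a quarter
# of the sample with the same reporting proportion.
domain_cv = approximate_domain_cv(2.0, n=10000, p=0.5, n_d=2500, p_d=0.5)
```

In this invented case the domain has one quarter as many reporting households, so the approximate CV doubles. For a domain contained in a single province, the provincial CV, n and P would be passed instead of the national values.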

1.3.2 Approximation of the CV from the microdata file

Microdata file users can obtain an approximation of the CV of the estimates using another method that will generally provide better results than the method described in the previous section for the CVs of detailed expenditure categories. This approach is described in detail in the documentation provided with the 2008 microdata file. This method of approximation can be used only with the microdata file, since it requires having data and weights for each household.

The 1997 SHS data quality report contains the results of the performance evaluation of these two CV approximation methods.

1.4 Suppression of unreliable data in estimation tables

Since the coefficient of variation is an indicator of data reliability, it would ideally be used to determine whether or not an estimate should be published. Estimates whose estimated CV is greater than 33% are not considered sufficiently reliable to be published. However, estimated CVs are not calculated for many of the published estimates. The suppression rule for expenditure estimates is therefore based on the number of households reporting a value greater than zero (see note 2).

It can be shown that CVs are usually less than 33% when the number of households reporting an expenditure is greater than 30. Since this is an approximate rule, some estimates may be published even though the CV is greater than 33%, and some estimates will not be published even though the CV is less than 33%. The 1997 SHS data quality report contains the results of the evaluation of the risk of error due to the suppression rule.
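A minimal sketch of such a rule is given below. It follows note 2 in estimating the number of reporting households as the proportion reporting multiplied by the sample size. The exact cutoff and its direction (suppress when the count is 30 or fewer) are assumptions drawn from the wording above, and the function name and inputs are hypothetical.

```python
def should_suppress(sample_size, proportion_reporting, min_reporting=30):
    """Return True if an expenditure estimate would be suppressed.

    The estimated number of reporting households is the proportion
    reporting a value > 0 multiplied by the sample size (see note 2).
    Assumption: suppress unless that count is greater than min_reporting.
    """
    estimated_reporting = sample_size * proportion_reporting
    return estimated_reporting <= min_reporting


# Hypothetical domains of 200 sampled households each.
rare_item_suppressed = should_suppress(200, 0.10)    # about 20 reporting
common_item_suppressed = should_suppress(200, 0.50)  # about 100 reporting
```

Because the count is only a proxy for the CV, a rule of this kind can publish an estimate whose CV exceeds 33% or withhold one whose CV is below it, as the paragraph above notes.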


Notes

  1. The confidence interval is calculated directly from the CV in similar fashion, namely Y ± 2 (CV x Y)/100.
  2. In practice, we use the estimate of the proportion of households reporting an expenditure, which is multiplied by the sample size.