Section 1
Sampling errors

Measures of sampling error
Coefficients of variation
Model for deriving an approximation of the CV
Suppression of unreliable data in estimation tables

Sampling errors exist when inferences about the population are drawn from the survey using information collected from a sample, rather than from the entire population. In addition to the sample design and the estimation method used in the Survey of Household Spending (SHS), the sample size and the variability of each characteristic are factors that determine sampling error. Characteristics that are rare or are distributed very unevenly in the population will have greater sampling error than characteristics that are observed more frequently or are more homogeneous in the population.

1.1 Measures of sampling error

Standard error is a commonly used measure of sampling error. It measures how much an estimate would vary across all possible samples of the same size drawn under the same sample design, given that only one of those samples was actually selected. Since the SHS uses a complex sample design and estimation method, the standard error is estimated using a resampling method known as the bootstrap technique. Prior to the 2003 reference year, the jackknife resampling method was used to estimate standard errors for the SHS. Starting with the 2003 SHS, a decision was made to use the bootstrap resampling method, mainly because the Income Statistics Division was going to publish median expenditure estimates and needed the coefficient of variation of those estimates. The bootstrap resampling method is suitable for variance estimation of non-smooth statistics such as quantiles. For more details on this method, see reference [2].
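The general idea of bootstrap standard error estimation can be illustrated with a minimal sketch. It is not the SHS procedure: it assumes a simple random sample with equal weights, whereas the survey's bootstrap accounts for the complex sample design and the estimation method (see reference [2]). The function name `bootstrap_se` and the spending values are hypothetical and serve only as an illustration.

```python
import random
import statistics


def bootstrap_se(values, statistic, n_replicates=500, seed=0):
    """Estimate the standard error of `statistic` by simple bootstrap resampling.

    Assumption: `values` is an equal-weight simple random sample. A survey
    bootstrap for a complex design would resample within the design and
    recompute the weights for each replicate.
    """
    rng = random.Random(seed)
    n = len(values)
    replicates = []
    for _ in range(n_replicates):
        # Draw n observations with replacement and recompute the statistic.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    # The spread of the replicate estimates approximates the standard error.
    return statistics.stdev(replicates)


# Hypothetical household expenditures (illustrative values only).
spending = [120.0, 340.0, 95.0, 410.0, 230.0, 180.0, 560.0, 75.0, 300.0, 150.0]
se_median = bootstrap_se(spending, statistics.median)
print(f"Bootstrap SE of the median: {se_median:.1f}")
```

The same function works for a non-smooth statistic such as the median, which is the property that motivated the move away from the jackknife.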

The coefficient of variation (CV) is also a frequently used measure of the reliability of an estimate. It simply expresses the standard error as a percentage of the estimate. Thus, if an estimate Y is obtained for a certain characteristic and SE is the estimated standard error, then the CV will be (SE/Y) × 100.

Finally, either the standard error or the coefficient of variation may be used to derive another measure of the accuracy of estimates, namely the confidence interval. This measure indicates the level of confidence that, for a characteristic observed, the true value for the population lies within the interval. An interval with a confidence level of 95% corresponds approximately to the estimate obtained from the sample ± 2 standard errors: Y ± 2 SE (see note 1). This means that if the sampling were repeated a large number of times, each sample would provide a different interval and about 95% of the intervals would contain the true value of the characteristic. Similarly, if the sampling were repeated, the interval Y ± SE would contain the true value in about 68% of cases.
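The relationships among the estimate, the standard error, the CV and the confidence interval can be sketched as follows. The function names and the numerical values are hypothetical; the multiplier of 2 is the rounded value used in the text for a confidence level of about 95%.

```python
def coefficient_of_variation(estimate, standard_error):
    """Return the CV in percent: (SE / Y) x 100."""
    return standard_error / estimate * 100.0


def confidence_interval(estimate, standard_error, multiplier=2.0):
    """Return (lower, upper) for Y +/- multiplier x SE."""
    half_width = multiplier * standard_error
    return (estimate - half_width, estimate + half_width)


def confidence_interval_from_cv(estimate, cv_percent, multiplier=2.0):
    """Return (lower, upper) for Y +/- multiplier x (CV x Y) / 100."""
    standard_error = cv_percent * estimate / 100.0
    return confidence_interval(estimate, standard_error, multiplier)


# Hypothetical estimate of average household expenditure and its SE.
estimate = 50000.0
standard_error = 500.0

cv = coefficient_of_variation(estimate, standard_error)
print(cv)                                          # CV in percent
print(confidence_interval(estimate, standard_error))   # about 95% interval
print(confidence_interval_from_cv(estimate, cv))       # same interval, from the CV
```

Setting `multiplier=1.0` gives the narrower interval Y ± SE, which covers the true value in about 68% of repeated samples.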

1.2 Coefficients of variation

Estimates of coefficients of variation are calculated for estimates of many characteristics collected in the SHS. The CVs of detailed average household expenditure, as well as the CVs of dwelling characteristics and household facilities and equipment, are available at the national and provincial levels in the publication User Guide—Survey of Household Spending (see reference [3]).

It should be noted that the estimated CVs do not take account of the fact that some of the data were imputed and thus may underestimate the true CVs. For most variables, the imputation rates are low (see Section 5) and the provided CVs represent good estimates of the true CVs. However, to assess the reliability of detailed expenditures with a high imputation rate, the CV and the imputation rate should both be considered.

Table 1.1 gives an overview of the CVs of estimates of household averages for a few of the summary-level expenditure categories and for income at the provincial and national levels.

Table 1.1 Coefficients of variation (%) by province and at the national level for the estimation of average household expenditures for several summary level expenditure categories and for the estimation of average income

The coefficients of variation (CVs) of the estimates of average total expenditure per household vary between 1.3% and 2.7% for the provinces, and the national figure is 0.7%.

For summary-level expenditure categories, the CVs at the national level are less than or equal to 1.9%, except for the following categories: furnishings, education, games of chance, miscellaneous expenditures and gifts of money and contributions. These expenditure categories represent respectively 2.9%, 1.7%, 0.4%, 1.6% and 2.6% of total expenditure. Also, with the exception of these categories, the CVs are generally less than or equal to 5% at the provincial level. Since the sample size was smaller in Prince Edward Island, the CVs tend to be higher than those of other provinces.

Table 1.2 gives an overview of the CVs for some dwelling characteristics and household equipment estimates at the provincial level as well as the national level.

Table 1.2 Coefficients of variation (%) by province and at the national level for some dwelling characteristics and household equipment

The coefficients of variation for dwelling characteristics and household equipment are generally below 4% at the provincial level, except in the following categories: renter, satellite dish, regular telephone connection to a computer (modem), high-speed telephone connection to a computer and cable connection to a computer. Prince Edward Island is also an exception: because its sample size is smaller, its CVs tend to be higher than those of the other provinces. Only Quebec has a CV below 4% for the renter category; it is also the province with the largest proportion of renters (43.4%). The CVs for connections to a computer vary according to the type of connection. In Prince Edward Island, the CV is lowest (6.8%) for the regular telephone connection to a computer, with 30.0% of PEI households reporting that they have this type of connection. In British Columbia, by contrast, the CV is lowest (4.2%) for the cable connection to a computer, with 33.6% of households in that province reporting this type of connection.

The CVs for dwelling characteristics and household equipment at the national level are less than or equal to 2.0%, with the exception of the following categories: satellite dish, regular telephone connection to a computer, high-speed telephone connection to a computer and cable connection to a computer. A smaller proportion of households have the equipment in these four categories. At the national level, the proportions are respectively 22.0%, 15.7%, 21.3% and 22.0%.

1.3 Model for deriving an approximation of the CV

Estimates for different domains of interest (for example, by income quintile) for the summary-level expenditure categories are available in the publication Spending Patterns in Canada (see reference [4]). Estimates for different domains of interest for detailed expenditures are available upon request from the Income Statistics Division. (For more details on tables available upon request from the Income Statistics Division, see reference [3] or [4].) For operational reasons, it is not possible to produce CVs for all the different levels of aggregation that may interest users.

1.3.1 Approximation of the CV for domain estimates

It is, however, possible to calculate an approximation of the CV by using a relationship between the number of households in the sample who reported expenditures for a given category and the CV at an aggregated level. This relationship is based on the tendency of the CV to be inversely proportional to the square root of the number of households reporting an expenditure: the fewer households report an expenditure, the higher the CV. It is illustrated below.

Formula for approximating the CV for a domain (subgroup of the population)

If CV(Y) represents the CV for the estimate of the average per household of a certain characteristic for the entire population, then an approximation of the CV of the estimate of that characteristic can be calculated for a domain (which may be considered as a subgroup of the population, such as a household type, an income quintile, an urbanization level) according to the following equation:

Formula 1

\[
\mathrm{CV}(Y_d) \approx \mathrm{CV}(Y) \times \sqrt{\frac{n \times P}{n_d \times P_d}}
\]

where
CV(Y_d): approximate CV of the estimate of the characteristic for domain d
n: number of households in the sample
P: estimate of the proportion of households reporting a value > 0 for this characteristic in the population
n_d: number of households in the sample in domain d
P_d: estimate of the proportion of households reporting a value > 0 for this characteristic in domain d

Generally, approximations for the different domains are calculated using the CV, size n and proportion P at the national level. If an approximation of a CV is desired for a domain that is entirely contained within a single province (for example, a metropolitan area), then it is preferable to use these values at the provincial level, since provincial CVs are published for the 2004 SHS (reference [3]). It should be noted that a CV obtained using this approach is only an approximation of the real value.
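The approximation can be sketched in a few lines of code. The sketch assumes that the relationship described above takes the form CV(Y_d) ≈ CV(Y) × sqrt((n × P) / (n_d × P_d)), which follows from the CV being inversely proportional to the square root of the number of households reporting an expenditure. The function name and all numerical values are hypothetical and do not come from SHS tables.

```python
import math


def approximate_domain_cv(cv_percent, n, p, n_d, p_d):
    """Approximate the CV (in percent) of a domain estimate.

    Assumed relationship: CV_d ~= CV * sqrt((n * p) / (n_d * p_d)), where
    n * p and n_d * p_d are the estimated numbers of sampled households
    reporting a value > 0 overall and in domain d respectively.
    """
    reporting_overall = n * p
    reporting_domain = n_d * p_d
    if reporting_domain <= 0:
        raise ValueError("No household reports this characteristic in the domain.")
    return cv_percent * math.sqrt(reporting_overall / reporting_domain)


# Hypothetical inputs: aggregate CV of 2.0% from a sample of 16,000 households,
# and a domain of 1,000 sampled households; half report the expenditure in both.
cv_domain = approximate_domain_cv(cv_percent=2.0, n=16000, p=0.5, n_d=1000, p_d=0.5)
print(f"Approximate domain CV: {cv_domain:.1f}%")
```

For a domain contained within a single province, the provincial CV, sample size and proportion would be passed in place of the national values.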

1.3.2 Approximation of the CV from the microdata file

Microdata file users can obtain an approximation of the CV of the estimates using another method that will generally provide better results than the method described in the previous section for the CVs of detailed expenditure categories. This approach is described in detail in the documentation provided with the 2004 microdata file. This method of approximation can be used only with the microdata file, since it requires having data and weights for each household.

The document on data quality for the 1997 SHS contains the results from the performance evaluation of these two CV approximation methods.

1.4 Suppression of unreliable data in estimation tables

Since the coefficient of variation is an indicator of the reliability of data, it would ideally be used to determine whether or not the estimates should be published. Estimates for which the CV is more than 33% are not considered sufficiently reliable to be published. However, CV estimates are not calculated for many of the published estimates. The suppression rule for expenditure estimates is therefore based on the number of households reporting a value greater than zero (see note 2).

It can be shown that CVs are usually below 33% when the number of households reporting an expenditure is greater than 30. Since this is an approximate rule, some estimates may be published even though the CV is greater than 33%, and some estimates will not be published even though the CV is less than 33%. The document on data quality for the 1997 SHS gives the results from the evaluation of the risk of error in the use of the suppression rule.
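The two criteria discussed in this section can be sketched as follows. The 33% limit is stated in the text; the threshold of 30 reporting households is an assumption drawn from the rule of thumb above, since the exact cut-off used in production is not given here. The count of reporting households is estimated as the proportion reporting an expenditure multiplied by the sample size. All function names are hypothetical.

```python
CV_LIMIT_PERCENT = 33.0        # stated limit for publishable estimates
MIN_REPORTING_HOUSEHOLDS = 30  # assumption based on the rule of thumb


def estimated_reporting_households(sample_size, proportion_reporting):
    """Estimate the number of sampled households reporting a value > 0."""
    return proportion_reporting * sample_size


def publishable_by_count(sample_size, proportion_reporting,
                         minimum=MIN_REPORTING_HOUSEHOLDS):
    """Approximate rule: publish when more than `minimum` households report."""
    reporting = estimated_reporting_households(sample_size, proportion_reporting)
    return reporting > minimum


def publishable_by_cv(cv_percent, limit=CV_LIMIT_PERCENT):
    """Direct rule, usable only when a CV has been calculated."""
    return cv_percent <= limit


# Hypothetical domain of 1,000 sampled households.
print(publishable_by_count(1000, 0.05))  # about 50 reporting households
print(publishable_by_count(1000, 0.02))  # about 20 reporting households
```

Because the count-based rule is only a proxy for the CV, the two functions can disagree for a given estimate, which is the risk of error described above.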


Notes

  1. The confidence interval is calculated directly from the CV in similar fashion, namely Y ± 2 (CV × Y)/100.
  2. In practice, we use the estimate of the proportion of households reporting an expenditure, which is multiplied by the sample size.