Information identified as archived is provided for reference, research or recordkeeping purposes. It is not subject to the Government of Canada Web Standards and has not been altered or updated since it was archived.
Sampling errors arise from estimating a population characteristic by looking at only one portion of the population rather than the entire population. Sampling error refers to the difference between the estimate derived from a sample survey and the 'true' value that would result if a census of the whole population were taken under the same conditions. There are no sampling errors in a census because the calculations are based on the entire population.
A common measure of sampling error is the standard error (SE). The standard error measures the degree of variation introduced in estimates by selecting one particular sample rather than another of the same size and design. The standard error may also be used to calculate confidence intervals associated with an estimate (Y).
Confidence intervals (CI) are used to express the precision of the estimate. It has been demonstrated mathematically that, if the sampling were repeated many times, the true population value would lie within the Y +/- 2SE confidence interval 95 times out of 100 and within the narrower confidence interval defined by Y +/- SE, 68 times out of 100.
Another important measure of sampling error is given by the coefficient of variation (CV). The coefficient of variation is the standard error of an estimate, expressed as a ratio or percentage of the estimate (i.e. 100 x SE / Y).
To illustrate the relationship between the standard error, the confidence intervals and the coefficient of variation, let us take the following example. Suppose that the estimated median net worth from a given source is $10,000, and that its corresponding standard error is $200. The coefficient of variation is therefore equal to 2%. The 95% confidence interval estimated from this sample ranges from $9,600 to $10,400, i.e. $10,000 +/- $400. This means that with a 95% degree of confidence, it can be asserted that the median net worth of the target population is between $9,600 and $10,400.
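The arithmetic in the example above can be sketched as follows, using the same hypothetical figures (median net worth Y = $10,000, standard error SE = $200):

```python
# Sketch of the standard error / coefficient of variation / confidence
# interval relationships described above, using the example's figures.
Y = 10_000.0   # estimate (median net worth)
SE = 200.0     # standard error of the estimate

cv = 100 * SE / Y                  # coefficient of variation, in percent
ci_95 = (Y - 2 * SE, Y + 2 * SE)   # approximate 95% confidence interval
ci_68 = (Y - SE, Y + SE)           # approximate 68% confidence interval

print(f"CV = {cv:.1f}%")           # CV = 2.0%
print(f"95% CI = {ci_95}")         # 95% CI = (9600.0, 10400.0)
```

The 95% interval uses the conventional "plus or minus two standard errors" approximation given in the text.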
Estimates with a coefficient of variation less than 16.6% are considered reliable for general use. Estimates with coefficients of variation between 16.6% and 33.3% should be accompanied by a warning to caution users about the high levels of error. Estimates with coefficients of variation higher than 33.3% are deemed to be unreliable. For estimates of net worth in this survey, CVs greater than 33.3% generally occur when the sample size contributing to an estimate is 25 or less. This affects the level of detail in published tables and, in particular, limits the availability of provincial statistics.
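The release thresholds above can be expressed as a small classification function. This is an illustrative sketch of the guideline logic; the category labels are descriptive, not official Statistics Canada terminology:

```python
def cv_quality_flag(cv_percent: float) -> str:
    """Classify an estimate by its coefficient of variation (in percent),
    following the thresholds described in the text: under 16.6% reliable,
    16.6% to 33.3% use with caution, above 33.3% unreliable.
    Illustrative only; labels are not official terminology."""
    if cv_percent < 16.6:
        return "reliable for general use"
    elif cv_percent <= 33.3:
        return "use with caution"
    else:
        return "unreliable"

print(cv_quality_flag(2.0))    # reliable for general use
print(cv_quality_flag(25.0))   # use with caution
print(cv_quality_flag(40.0))   # unreliable
```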
Table 5-1 provides quality level guidelines used at Statistics Canada.
Table 5-2 shows the precision of the SFS estimates. At the Canada level, the estimates are generally reliable. However, users should exercise caution when producing detailed estimates at the regional level. Additional variance estimates can be calculated by Statistics Canada on a cost-recovery basis.
The bootstrap approach, a pseudo-replication technique, is used for the calculation of the coefficients of variation of the estimates presented in table 5-2. Many Statistics Canada surveys use complex sampling designs when selecting their samples. As variance estimation for these sampling schemes cannot be accomplished using simple formulae, we must use approximate methods to estimate variances. Resampling methods, and in particular the bootstrap method, figure among these. The bootstrap approach possesses many interesting properties and is the method employed by many Statistics Canada surveys.
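The resampling idea behind the bootstrap can be illustrated with a minimal sketch. Note that this is a simplified nonparametric bootstrap on hypothetical data; the survey bootstrap used for complex designs such as the SFS resamples within design strata and clusters and applies adjusted replicate survey weights, which this sketch omits:

```python
import random
import statistics

def bootstrap_cv(sample, stat=statistics.median, n_reps=1000, seed=12345):
    """Estimate the coefficient of variation (in percent) of a statistic
    by resampling the data with replacement. Simplified illustration of
    the bootstrap idea; real survey bootstraps resample within the
    sample design and use replicate weights."""
    rng = random.Random(seed)
    estimate = stat(sample)
    replicates = []
    for _ in range(n_reps):
        # Draw a resample of the same size, with replacement
        resample = [rng.choice(sample) for _ in sample]
        replicates.append(stat(resample))
    se = statistics.stdev(replicates)   # bootstrap standard error
    return 100 * se / estimate          # CV, as a percentage

# Hypothetical data: 200 simulated net worth values (lognormal shape)
gen = random.Random(1)
data = [gen.lognormvariate(11, 1) for _ in range(200)]
cv = bootstrap_cv(data)
print(f"Bootstrap CV of the median: {cv:.1f}%")
```

The spread of the replicate statistics approximates the sampling variability that would arise from drawing different samples of the same size and design.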
For more information on the bootstrap approach, refer to the Statistics Canada publication (Catalogue 12-002-XIE), The Research Data Centres Information and Technical Bulletin, Fall 2004, vol. 1 no. 2.
Non-sampling errors can be defined as errors arising during the course of all survey activities other than sampling. Unlike sampling errors, they can be present in both sample surveys and censuses.
Non-sampling errors can be classified into two groups: random errors and systematic errors.
Non-sampling errors are extremely difficult, if not impossible, to measure. Since random errors tend to cancel out, systematic errors are the principal cause for concern. Unlike sampling variance, bias caused by systematic errors cannot be reduced by increasing the sample size.
Non-sampling errors can occur because of problems in coverage, response, non-response, data processing, estimation and analysis.
An error in coverage occurs when there is an omission, duplication or wrongful inclusion of the units in the population or sample. Omissions are referred to as undercoverage, while duplication and wrongful inclusions are called overcoverage. These errors are caused by defects in the survey frame: inaccuracy, incompleteness, duplication, inadequacy and obsolescence. Coverage errors may also occur in field procedures (e.g., a survey is conducted, but the interviewer misses several households or persons).
Response errors result from data that have been requested, provided, received or recorded incorrectly. Response errors may occur because of weaknesses in the questionnaire, the interviewer, the respondent or the survey process.
Non-response errors are the result of not having obtained sufficient answers to survey questions. There are two types of non-response: complete non-response, where no usable information is obtained for a sampled unit, and partial non-response, where only some questions are answered. The overall response rate for the 2005 Survey of Financial Security was 67.7%.
Processing errors sometimes emerge during the preparation of the final data files. For example, errors can occur while data are being coded, captured, edited or imputed. Coder bias usually results from poor training, incomplete instructions, variation in coder performance (e.g., tiredness or illness), data entry errors, or machine malfunction (some processing errors are caused by errors in the computer programs). The same applies to data capture errors. Sometimes errors are incorrectly identified during the editing phase; even when errors are discovered, they can be corrected improperly because of poor imputation procedures. To minimize such errors, diagnostic tests are carried out periodically to ensure that expected results are obtained.
Statistics Canada and other data-collecting agencies devote much effort to designing and monitoring surveys in order to make them as error-free as possible. However, if an inappropriate estimation method is used, bias can still be introduced, no matter how error-free the survey was up to that point.
Analysis errors include any errors that occur when using the wrong analytical tools or when the preliminary results are used instead of the final ones. Errors that occur during the publication of these data results are also considered analysis errors.
For any sample, estimates can be affected disproportionately by the presence or absence of extreme values from the population. In an asset and debt survey, a few extreme values are expected in the sample, as valid extreme values do exist in the population. Values outside defined bounds were identified and reviewed in relation to other information reported for that respondent. If the value was judged to be the result of a reporting or processing error, it was adjusted. Otherwise, it was retained.
Due to the combined effect of these errors, the quality of net worth data is judged to be lower than the quality of income data. This is largely because records of the current value of assets and the outstanding amount of debt are not as readily available as records of income. For example, respondents with numerous bank accounts and investments may receive several different statements, with different reference periods. Compiling this information can be difficult; most income information, on the other hand, would be available in one document, if the respondent had completed an income tax return for the year in question.
It is important to realize that there are no other sources for much of the data collected by SFS. Of the variables that do have sources, comparison is often difficult because of differences in defining concepts, grouping of items, and how these items are valued.
Direct comparisons with outside sources, such as the National Balance Sheet Accounts (NBSA) of the System of National Accounts (SNA), do yield certain differences. Such comparisons are difficult because of differences in definitions, coverage and treatment between the two sources.
Based on rough comparisons between the NBSA and the SFS, the following general conclusions can be drawn:
In theory – given similar valuation procedures and groupings – SNA data should be the same as that collected by an asset and debt survey. The SNA collects individual wealth data from institutional sources such as banks and insurance companies, net of corporations and governments. One major problem has been the SNA categorization of individuals and unincorporated businesses. Because data for individuals cannot be separated from data for unincorporated businesses, these estimates will always be higher than the survey estimates alone.
The Census and other surveys are important sources for ensuring that the SFS sample is representative of the Canadian population. Despite conceptual differences with the SNA estimates, ensuring a representative sample is extremely important to the validity of the data. It was determined that, with respect to characteristics such as sex, age, marital status and education, the 2005 SFS data were very comparable to data from the 2001 Census. SFS estimates for pension variables such as membership and contributions were found to be very close to data produced by Statistics Canada’s Pension Plans in Canada Survey.
The overall response rate for the 2005 Survey of Financial Security was 67.7%. Table 5-3 gives a breakdown of response rates by province for the area sample and the high income sample.