Methodology

The statistics contained in this publication were derived from the Survey of Consumer Finances (SCF) and the Survey of Labour and Income Dynamics (SLID). For many years, the SCF was the primary source of data on family income in Canada. In 1993, Statistics Canada introduced a new survey, SLID, with much the same objectives but longitudinal rather than cross-sectional in nature. Statistics Canada closely monitored the comparability of the two surveys and determined that they produce comparable results. Starting with the 1998 reference year, the SCF was no longer conducted. Additional information on the comparability of SLID and the SCF can be found in Bridging Two Surveys: An Integrated Series of Income Data from SCF and SLID, 1989-1997 or in A Comparison of the Results of the Survey of Labour and Income Dynamics (SLID) and the Survey of Consumer Finances (SCF) 1993-1997: Update (see also “related products and services”).

Survey content
Survey universe
The sample
Data collection
Data quality
Sampling errors
Standard error and coefficient of variation
Suppression
Non-sampling errors
Weighting
Cross-sectional representativeness of SLID
Response rates
Imputation for non-response
Comparability with other income data sources

Survey content

The SCF was an annual survey, conducted each April (but discontinued after April 1998) as a supplement to the Labour Force Survey (LFS), and designed to produce cross-sectional statistics on income by detailed sources. Information on labour force experience and demographic characteristics such as education, family relationships and household composition was also collected, primarily by using data collected for the LFS.

SLID was designed to capture changes in the economic well-being of individuals and families over time and the determinants of labour market and income changes. The survey supports analysis on transitions into and out of the labour force associated with the life cycle or with the business cycle; on the impact of family events on labour market activity and remuneration; on the determinants of income instability; on what triggers shifts into and out of low income and on changes in the composition of income through time. Since SLID additionally carries a broad selection of human capital variables, it is also used for studies of such topics as gender wage and earnings gaps.

The major content themes of SLID are illustrated in the following chart.

Chart A - Organization of content

* Not yet included in survey content.

Survey universe

SCF and SLID are household surveys that target essentially the same population. Both surveys cover all individuals in Canada, excluding residents of the Yukon, the Northwest Territories and Nunavut, residents of institutions and persons living on Indian reserves. Overall, these exclusions amount to less than 3 percent of the population.

The sample

The samples for SLID and SCF are selected from the monthly Labour Force Survey (LFS) and thus share the latter’s sample design. The LFS sample is drawn from an area frame and is based on a stratified, multi-stage design that uses probability sampling. The sample is composed of six independent samples. These samples are called rotation groups because each month one sixth of the sample (or one rotation group) is replaced.

The SCF was conducted each year as a supplement to the April LFS using two-thirds of the regular sample (four rotation groups). In total, approximately 35,000 households were surveyed. The SLID sample is composed of two panels. Each panel consists of two LFS rotation groups and includes roughly 15,000 households. A panel is surveyed for a period of six consecutive years. A new panel is introduced every three years. Thus two panels are always overlapping, resulting in a combined cross-sectional sample comparable in size to that of the SCF. The following diagram illustrates how and when panels overlap.

Chart B - Overlapping design of SLID sample

Panel	Years surveyed
Panel 1	1993 to 1998
Panel 2	1996 to 2001
Panel 3	1999 to 2004
Panel 4	2002 onward (chart ends at 2004)
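
The overlap rule described above (a new panel every three years, each followed for six years) can be sketched in a few lines. This is only an illustration: the constant and function names are invented for the example, and the 1993 start year is read off Chart B.

```python
FIRST_PANEL_START = 1993   # Panel 1 start year, as shown in Chart B
PANEL_INTERVAL = 3         # a new panel is introduced every three years
PANEL_LENGTH = 6           # each panel is surveyed for six consecutive years


def active_panels(year: int) -> list[int]:
    """Return the numbers of the panels surveyed in the given reference year."""
    panels = []
    number = 1
    start = FIRST_PANEL_START
    while start <= year:
        # A panel is active from its start year through start + 5.
        if year < start + PANEL_LENGTH:
            panels.append(number)
        number += 1
        start += PANEL_INTERVAL
    return panels


for y in (1993, 1996, 1999, 2002):
    print(y, active_panels(y))
```

Under these assumptions, only Panel 1 is active from 1993 to 1995, and two panels overlap in every year from 1996 onward.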

Data collection

The reference period for the SCF was the previous calendar year. Income questionnaires were mailed out to selected households prior to the April LFS. Information collected through this supplementary income survey, along with demographic and labour market data amassed by the LFS that month, constituted the SCF database.

For each sampled household in SLID, up to 12 interviews are conducted over a six-year period. Every year in January, interviewers collect information regarding respondents’ labour market experiences during the previous calendar year. Information on educational activity and family relationships is also collected at that time. The demographic characteristics of family and household members represent a snapshot of the population as of the end of each calendar year.

Every May, information on income is collected from the same sampled households. The income interview is deferred until May to take advantage of income tax time, when respondents are more familiar with their income situation. As in the SCF, the reference period for income is the previous calendar year.

To reduce response burden, respondents can give Statistics Canada permission to use their T1 tax information for the purposes of SLID. Those who do so are only contacted for the labour interviews. Over 80% of SLID’s respondents give their consent to use their administrative records.

Both SCF and SLID interviews are conducted over the telephone using computer-assisted telephone interviewing (CATI). The interviewer reads the questions as they appear on the computer screen and keys in the reported information. Skip patterns and edits are built into the collection software, allowing interviewers to detect and resolve response inconsistencies immediately. Collection of date-related information (e.g., employment spells, jobless spells, interruptions of work) is greatly improved by this interactive data capture technique. Another advantage of CATI is that details from the previous interview can be fed back to respondents, helping them to recall past events.

Proxy response is accepted in the SCF and SLID. This procedure allows one household member to answer questions on behalf of any or all other members of the household, provided he or she is willing to do so and is knowledgeable.

Data quality

There are two types of errors inherent in sample survey data, namely, sampling errors and non-sampling errors. The reliability of survey estimates depends on the combined impact of sampling and non-sampling errors.

Sampling errors

Sampling errors occur because inferences about the entire population are based on information obtained from only a sample of the population. The results are usually different from those that would be obtained if information were collected from the whole population. Errors due to the extension of conclusions based on the sample to the entire population are known as sampling errors. The sample design, the variability of the population characteristics measured by the survey, and the sample size determine the magnitude of the sampling error. In addition, for a given sample design, different methods of estimation will result in sampling errors of different sizes.

Standard error and coefficient of variation

A common measure of sampling error is the standard error (SE). The standard error measures the degree of variation introduced in estimates by selecting one particular sample rather than another of the same size and design. The standard error may also be used to calculate confidence intervals associated with an estimate (Y). Confidence intervals are used to express the precision of the estimate. It has been demonstrated mathematically that, if the sampling were repeated many times, the true population value would lie within the confidence interval Y ± 2SE 95 times out of 100 and within the narrower confidence interval defined by Y ± SE, 68 times out of 100. Another important measure of sampling error is given by the coefficient of variation, which is computed as the estimated standard error as a percentage of the estimate Y (i.e. 100 × SE / Y).

To illustrate the relationship between the standard error, the confidence intervals and the coefficient of variation, let us take the following example. Suppose that the estimated average income from a given source is $10,000, and that its corresponding standard error is $200. The coefficient of variation is therefore equal to 2%. The 95% confidence interval estimated from this sample ranges from $9,600 to $10,400, i.e. $10,000 ± $400. Thus it is assumed with a 95% degree of confidence that the average income of the target population is between $9,600 and $10,400.
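
The arithmetic in this example can be reproduced with a short sketch. The function names are illustrative only; the formulas are the ones given above (CV = 100 × SE / Y, and Y ± 2SE for the 95% interval).

```python
def coefficient_of_variation(estimate: float, standard_error: float) -> float:
    """Standard error expressed as a percentage of the estimate."""
    return 100.0 * standard_error / estimate


def confidence_interval(estimate: float, standard_error: float,
                        multiplier: float = 2.0) -> tuple[float, float]:
    """Interval estimate ± multiplier × SE (multiplier 2 gives about 95%)."""
    half_width = multiplier * standard_error
    return estimate - half_width, estimate + half_width


# Values from the worked example: average income $10,000, standard error $200.
cv = coefficient_of_variation(10_000, 200)
low, high = confidence_interval(10_000, 200)
print(f"CV = {cv}%, 95% interval = ${low:,.0f} to ${high:,.0f}")
```

Passing `multiplier=1.0` gives the narrower interval Y ± SE, which covers the true value about 68 times out of 100.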

The bootstrap approach is used for the calculation of the standard errors of the estimates. For more information on standard errors and coefficients of variation, refer to the Statistics Canada publication, Methodology of the Canadian Labour Force Survey (Catalogue number 71-526-XPB).

Standard errors and coefficients of variation of the estimates presented in Income Trends in Canada are available on request.
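
To give a sense of how a bootstrap standard error is obtained, the following is a deliberately simplified sketch. It resamples records with replacement as if they came from a simple random sample, which is an assumption made for illustration: the bootstrap used for a stratified, multi-stage design such as this one must respect the strata and stages, and that is not shown here. All names and data values are hypothetical.

```python
import random
import statistics


def bootstrap_standard_error(values, weights, replicates=500, seed=12345):
    """Approximate the standard error of a weighted mean by naive resampling."""
    rng = random.Random(seed)          # fixed seed so the sketch is repeatable
    n = len(values)
    estimates = []
    for _ in range(replicates):
        # Draw n record indices with replacement to form one replicate sample.
        sample = rng.choices(range(n), k=n)
        total_weight = sum(weights[i] for i in sample)
        weighted_sum = sum(weights[i] * values[i] for i in sample)
        estimates.append(weighted_sum / total_weight)
    # The spread of the replicate estimates approximates the standard error.
    return statistics.stdev(estimates)


# Hypothetical incomes and survey weights, for illustration only.
incomes = [12_000, 18_500, 9_000, 30_000, 22_000, 15_500, 27_000, 11_000]
weights = [250, 300, 275, 310, 290, 260, 305, 280]
print(round(bootstrap_standard_error(incomes, weights), 2))
```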

Suppression

Data reliability cutoffs were established based on variances of a number of different variables. In general, data values that have a coefficient of variation of less than 33% are not suppressed and can be used. Suppressed estimates have a coefficient of variation greater than 33% and are not reliable.

The suppression cutoffs are listed below. Weighted person, family and household estimates that fall below these cutoffs are withheld.

Table D - Suppression cutoffs

Geography	Weighted counts
Canada	13,000
Newfoundland and Labrador	2,500
Prince Edward Island	1,500
Nova Scotia	4,000
New Brunswick	2,500
Quebec	14,000
Ontario	14,500
Manitoba	6,500
Saskatchewan	2,500
Alberta	6,000
British Columbia	11,000
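
As a rough illustration of how the cutoffs in Table D might be applied, the sketch below withholds any weighted estimate that falls below the cutoff for its geography. The function name is hypothetical, and treating "falls below" as a strict comparison is an assumption.

```python
# Cutoffs copied from Table D (weighted counts).
SUPPRESSION_CUTOFFS = {
    "Canada": 13_000,
    "Newfoundland and Labrador": 2_500,
    "Prince Edward Island": 1_500,
    "Nova Scotia": 4_000,
    "New Brunswick": 2_500,
    "Quebec": 14_000,
    "Ontario": 14_500,
    "Manitoba": 6_500,
    "Saskatchewan": 2_500,
    "Alberta": 6_000,
    "British Columbia": 11_000,
}


def is_suppressed(geography: str, weighted_count: float) -> bool:
    """Return True if the estimate falls below the cutoff for its geography."""
    # Assumption: an estimate exactly at the cutoff is not suppressed.
    return weighted_count < SUPPRESSION_CUTOFFS[geography]


print(is_suppressed("Ontario", 12_000))   # below the Ontario cutoff
print(is_suppressed("Manitoba", 8_000))   # above the Manitoba cutoff
```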

Non-sampling errors

Non-sampling errors generally result from human errors such as inattention, misunderstanding or misinterpretation. The impact of randomly occurring errors over a large number of observations will be minimal. Errors occurring systematically can, on the other hand, have a major impact on the reliability of estimates. Considerable time and effort are invested in reducing non-sampling errors in SLID and SCF.

Non-sampling errors may arise from a variety of sources such as coverage, response, non-response and processing errors.

Coverage error arises when sampling frame units do not exactly represent the target population. Units may have been omitted from the sampling frame (undercoverage), units not in the target population may have been included (overcoverage), or units may have been included more than once (duplicates). Undercoverage is the most common coverage problem.

Slippage is a measure of survey coverage error. It is defined as the percentage difference between control totals (Census population projections) and weighted sample counts. Slippage rates for household surveys are generally positive because some people that should be enumerated are missed. Slippage rates have been revised back to 1996 using the 1996 Census population projections. According to the numbers in the table below, in 2001, SLID covered 86.6% of its target population. SLID estimation procedures use Census population projections to compensate for determined slippage.

Slippage rates are also available on request by sex, province and age group.

Table E - Slippage rates in SLID

Year	1996	1997	1998	1999	2000	2001
Canada (%)	10.28	11.12	11.85	12.02	12.64	13.40
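
The relationship between slippage and coverage can be sketched as follows. The formula takes the control total as the denominator, which is an assumption consistent with the statement that a 13.40% slippage rate corresponds to 86.6% coverage. The function names and the population figures in the usage lines are hypothetical.

```python
# Slippage rates for Canada, copied from Table E (percent).
SLIPPAGE_BY_YEAR = {
    1996: 10.28, 1997: 11.12, 1998: 11.85,
    1999: 12.02, 2000: 12.64, 2001: 13.40,
}


def slippage_rate(control_total: float, weighted_count: float) -> float:
    """Percentage shortfall of the weighted sample count from the control total."""
    return 100.0 * (control_total - weighted_count) / control_total


def coverage_rate(slippage: float) -> float:
    """Share of the target population covered, given a slippage rate in percent."""
    return 100.0 - slippage


# Hypothetical counts: a control total of 1,000,000 and a weighted count of 866,000.
print(round(slippage_rate(1_000_000, 866_000), 2))
print(round(coverage_rate(SLIPPAGE_BY_YEAR[2001]), 1))
```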

Response errors may be due to many factors, such as faulty questionnaire design, interviewers’ or respondents’ misinterpretation of questions, or respondents’ faulty reporting. Great effort is invested in SCF and SLID to reduce the occurrence of response error. Measures undertaken to minimize response errors include the use of highly-skilled and well-trained interviewers, and supervision of interviewers to detect misinterpretation of instructions or problems with the questionnaire design. Response error can also be brought about by respondents who, willingly or not, provide inaccurate responses.

Income data are especially prone to misreporting, as income is a sensitive issue and includes many items with which respondents are not always familiar. To obtain more accurate information, income data for the SCF and SLID are collected after the income tax “season” when respondents are more familiar with their tax records. Respondents receive information about the income interview prior to the interviewer’s telephone call. This gives them time to consult documents and have information available at the time of the interview. Nevertheless, a comparison of data produced from the SCF with other sources suggests that certain income components, such as EI benefits and self-employment earnings, are under-reported in an income interview. For respondents who grant Statistics Canada permission to access their tax files (the majority of respondents), SLID collects income data directly from administrative files. This procedure reduces misreporting of income in SLID.

Non-response errors occur to some extent in any survey for reasons such as household members being on vacation during the interview period or refusing to supply requested information, despite attempts to obtain complete response from sampled units. For these individuals, the missing data are imputed either explicitly by assigning data to each non-respondent on the basis of a similar respondent record, or implicitly by redistributing the weight of the non-respondent individual to other responding individuals. The bias introduced by non-response increases with the differences between respondent and non-respondent characteristics. Methods employed to compensate for non-response make use of information available for both respondents and non-respondents in an attempt to minimize this bias.

Processing errors can occur at various stages in the survey: data capture, editing, coding, weighting or tabulation. The computer-assisted collection method used for SLID and SCF reduces the chance of introducing capture errors because checks for consistency and completeness of the data are built into the computer application. To minimize coding, weighting or tabulation errors, diagnostic tests are carried out periodically. These tests include comparisons of results with other data sources.

Weighting

The estimation of population characteristics from a survey is based on the premise that each sampled unit represents, in addition to itself, a certain number of unsampled units in the population. A basic survey weight is attached to each record to indicate the number of units in the population that are represented by that unit in the sample. Two types of adjustment are then applied to the basic survey weights in order to improve the reliability of the estimates. The basic weights are first inflated to compensate for non-response. The non-response adjusted weights are then further adjusted to ensure that estimates on relevant population characteristics would respect population totals from sources other than the survey. The population totals used for SCF and SLID are based on Statistics Canada’s Demography Division population counts for different province-age-sex groups as well as counts by household and family size. In SLID, different weights apply for cross-sectional and longitudinal estimates.
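
The two adjustments described above can be illustrated with a minimal sketch. It assumes a single non-response adjustment class and a single post-stratification dimension, whereas the surveys adjust to several sets of totals (province-age-sex groups, household size, family size). The record layout, field names and figures are hypothetical.

```python
from collections import defaultdict


def adjust_for_nonresponse(records):
    """Inflate respondents' basic weights to absorb the non-respondents' weight."""
    total = sum(r["basic_weight"] for r in records)
    responding = sum(r["basic_weight"] for r in records if r["responded"])
    factor = total / responding
    return [
        {**r, "weight": r["basic_weight"] * factor}
        for r in records if r["responded"]
    ]


def poststratify(records, control_totals):
    """Scale weights within each group so they sum to the external control total."""
    group_sums = defaultdict(float)
    for r in records:
        group_sums[r["group"]] += r["weight"]
    return [
        {**r, "weight": r["weight"] * control_totals[r["group"]] / group_sums[r["group"]]}
        for r in records
    ]


# Hypothetical sample: four units with a basic weight of 100, one non-respondent.
sample = [
    {"id": 1, "group": "A", "basic_weight": 100.0, "responded": True},
    {"id": 2, "group": "A", "basic_weight": 100.0, "responded": True},
    {"id": 3, "group": "B", "basic_weight": 100.0, "responded": True},
    {"id": 4, "group": "B", "basic_weight": 100.0, "responded": False},
]
controls = {"A": 300.0, "B": 150.0}   # hypothetical external population totals

final = poststratify(adjust_for_nonresponse(sample), controls)
for r in final:
    print(r["id"], r["group"], round(r["weight"], 2))
```

After both steps, the final weights in each group sum to that group's control total, which is the property the second adjustment is meant to guarantee.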

Cross-sectional representativeness of SLID

Each longitudinal sample, or “panel” in SLID initially constitutes a representative cross-sectional sample of the population. However, because the real population changes each year, whereas by design the longitudinal sample does not, the sample must be modified to properly reflect these changes to the composition of the population. This is done by adding to the sample all new people in the population who are found to be living with the initial respondents (and likewise dropping them from the sample if they leave at later time-points). Conversely, any original respondents who leave the target population (by moving abroad, into institutions, etc.) are given a zero weight for cross-sectional purposes. In this way, the cross-sectional sample, composed of the original respondents minus those who left the target population plus those who have entered it, is virtually fully representative of the population at each subsequent time-point. The missing group is composed of persons who have newly entered the target population and are not living with anyone who was in the target population when the most recent panel was selected. Since SLID introduces a new panel every three years, however, this group is quite small.

Response rates

High response rates are essential for the data quality of any survey and thus considerable effort is invested to encourage effective participation from SCF and SLID respondents.

For the SCF, response is calculated at the family level whereas in SLID it is calculated at the household level. In SLID, a household is considered to be “respondent” if at least one of its members responds to either the January or the May interview. There is the additional stipulation that the information on the household’s composition cannot be missing for more than one year.

Within a respondent household, all members are assigned identical, positive final weights, and those members (if any) who did not respond to one or both of the collection phases will have final data that is either shown as “missing” on the final database or imputed, depending on the variable.

In the SCF, response rates ranged from 78.1% (1989) to 82.1% (1995), while the cross-sectional response rates in SLID range between 79.1% (2001) and 85.5% (1996).

The updated definition of respondent was introduced starting with the release of data for 2000, and applied retroactively to 1996. It had relatively little impact on response rates – the SLID response rates for 1996 to 2000 are now one to two percentage points lower than they were based on the old definition.

Response rates given in Table F have been revised back to 1996, using the new definition of a respondent household.

Table F - Response rate in SCF (1990-1995) and SLID (1996-2001)

Year	1990	1991	1992	1993	1994	1995	1996	1997	1998	1999	2000	2001
Response Rate (%)	79.0	80.0	80.7	80.0	79.5	82.1	85.5	83.6	82.3	82.8	80.8	79.1

Imputation for non-response

Income data are imputed in SCF – and in some cases in SLID – using a “nearest neighbour” approach. This method involves identifying another individual with certain similar characteristics, who becomes the “donor” for the imputed value. SLID also uses other imputation techniques. In fact, the primary method employed for imputing income data in this survey is to use the previous year’s data, updated for any changes in circumstances. Only in the absence of such data are income figures imputed using the “nearest neighbour” technique in SLID.

Amounts received through certain government programs, such as child tax benefits, the Goods and Services Tax/Harmonized Sales Tax Credit, and the Guaranteed Income Supplement, are also derived from other information. Data obtained from the tax route are complete and do not need imputation.
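
The order of imputation methods described for SLID (previous year's data first, nearest neighbour only as a fallback) can be sketched as below. This is an illustration under stated assumptions: the matching characteristics (age and province), the distance measure and all names are hypothetical, since the text does not specify them, and the step that updates the previous year's value for changes in circumstances is omitted.

```python
def distance(person, donor):
    """Hypothetical dissimilarity: age gap plus a penalty for a different province."""
    penalty = 0 if person["province"] == donor["province"] else 100
    return abs(person["age"] - donor["age"]) + penalty


def impute_income(person, previous_year_income, donors):
    """Use last year's income if available; otherwise borrow from the nearest donor."""
    if previous_year_income is not None:
        # Primary method: carry the previous year's value forward (update step omitted).
        return previous_year_income
    # Fallback: the most similar responding individual becomes the donor.
    donor = min(donors, key=lambda d: distance(person, d))
    return donor["income"]


# Hypothetical non-respondent and candidate donors.
person = {"age": 40, "province": "ON"}
donors = [
    {"age": 38, "province": "ON", "income": 42_000},
    {"age": 40, "province": "QC", "income": 55_000},
    {"age": 60, "province": "ON", "income": 30_000},
]
print(impute_income(person, 39_000, donors))   # previous-year value is used
print(impute_income(person, None, donors))     # nearest-neighbour donor is used
```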

Comparability with other income data sources

Comparisons of figures produced from the SCF with other sources of data (Census of Population, Longitudinal Administrative Data, National Economic and Financial Accounts) reveal that certain income components, such as investment income, self-employment earnings, social assistance payments and EI benefits, are under-reported in the SCF.

SLID’s estimates of the number of income recipients, aggregate individual income and average family income are higher than the corresponding estimates from SCF data.

Differences between SCF and SLID income figures can be attributed to the different procedures for editing, imputation, and data collection (entirely by questionnaire for the former versus partially by linkage with T1 income tax files for the latter).

Date Modified: 2008-10-17