Methodology of the Canadian Labour Force Survey
Chapter 8 Data quality

8.0 Introduction

Data quality evaluation is the process of assessing the final product of a survey against its original objectives, in particular in terms of data accuracy and reliability. Such information allows users to make more informed interpretation and use of the survey results. Users must be provided with information allowing them to assess the degree to which data limitations restrict the use of the data. Data quality evaluations also benefit the statistical agency: when data limitations can be traced to specific steps in the survey process, the evaluations can be used to improve the quality of subsequent occasions of the survey and of other similar surveys.

The accuracy of statistical information is the degree to which the information correctly describes the phenomena it was designed to measure. It is usually characterized in terms of statistical error and is traditionally decomposed into bias (systematic error) and variance (random error) components. It may also be described in terms of the major sources of error that potentially cause inaccuracy (e.g., sampling errors and non-sampling errors). This is the approach that will be used here.

In a sample survey, inferences are made about the target population based on the data collected from only a portion of this population. The results will probably differ from those obtainable from a complete census of this population under the same conditions. The error caused by applying conclusions to the entire population based on only a sample is called a sampling error. Some factors that contribute to sampling errors include sample size, variability of the characteristics examined, the sampling plan, and the estimation method.

Non-sampling error, as its name indicates, is not caused by the sampling process and can take place in a census or a sample survey. This type of error can occur at any step of the survey (planning, design, data collection, coding, data capture, editing, estimation, analysis, and dissemination of data) and is mainly caused by human error. Interviewers may misunderstand instructions, respondents may make errors in answering questions, the answers may be incorrectly entered and errors may be introduced in the processing and tabulation of the data. These are all examples of non-sampling errors. Non-sampling error is also associated with other types of errors, such as errors in the information sources, the methods used to obtain population projections, seasonal adjustment errors, etc.

To monitor and ensure the quality of its data, the LFS adopted a program to measure data quality. A range of quality indicators are regularly produced and carefully analyzed. If there are unusual values, the LFS managers are immediately notified so they can make the necessary corrections as quickly as possible. Some indicators are merely monitored, since their role is to detect trends or long-term effects. For example, some measure the consequences of certain operational changes, while others measure the impact of minor changes to the sample design. This long-term information on data reliability can be used to make changes that are likely to improve the overall quality of the results and to help analysts and data users at Statistics Canada and elsewhere with their work.

The quality indicators produced for the LFS are described below. Section 8.1 presents indicators related to sampling errors. Indicators related to non-sampling errors are discussed in Section 8.2. Section 8.3 describes the committees monitoring various aspects of the LFS to ensure the quality of the data released. Section 8.4 informs users of other resources available regarding LFS data quality.

8.1 Quality indicators related to sampling errors

Sampling error was defined earlier as the error that results from estimating a population parameter by measuring a portion of the population, the sample, rather than the entire population. The effect sampling errors have on survey estimates depends on several factors including the sample size, the sample design, the estimation method and the variability of the characteristic of interest.

If all other factors are constant, the sampling error is expected to decrease as the sample size increases. This is consistent with the fact that the sampling error should become zero once the entire population is sampled. For a given sample size, the sampling error is linked to the relative efficiency of various design characteristics. The stratification, the allocation and the selection method at each stage all have some impact on the magnitude of the sampling error. The estimation method used also plays an important role for a given sample design. For example, the composite estimation method used by the LFS significantly reduces the sampling errors (See Chapter 6).

Finally, sampling errors differ from one variable to another since the degree of variability differs from one variable to another. These errors are generally greater for relatively rare characteristics and when the characteristic of interest is not distributed evenly in the population. Therefore, although they are based on the same sample, unemployment estimates generally have a higher sampling error than employment estimates.

For probability sample surveys, like the LFS, methods exist to calculate sampling errors. The most commonly used measure to quantify sampling error is sampling variance. The methods used for variance estimation in the case of the LFS have been presented in Chapter 7.

Three key measurements are derived from the sampling variance: the standard error (SE), the coefficient of variation (CV) and the design effect.

8.1.1 Standard error

The standard error, defined as the square root of the sampling variance, can be used to calculate a confidence interval associated with an estimate. The confidence interval is built around the resulting estimate and its width depends on the standard error and on a confidence level parameter.

To illustrate, consider the following example. In May 2015, the LFS estimate for the unemployment rate of the Canadian population 15 years of age and up was 6.8%, and the standard error associated with this estimate was 0.001395. An approximate 68% confidence interval for the true unemployment rate is then given by 0.068 ± 1 × 0.001395, or between 6.66% and 6.94%. The confidence level means that if the same selection and estimation process were repeated several times (leading to different samples and different estimates), 68% of the confidence intervals built this way would contain the true population value.
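The interval arithmetic above can be sketched in a few lines. This is a minimal illustration using the normal approximation; the figures are those from the May 2015 example in the text:

```python
# Confidence interval from an estimate and its standard error,
# using the May 2015 figures quoted above.
estimate = 0.068   # unemployment rate, 6.8%
se = 0.001395      # standard error of the estimate

# z-multipliers for common confidence levels (normal approximation)
z = {0.68: 1.0, 0.95: 1.96}

def confidence_interval(estimate, se, level=0.68):
    """Return the (lower, upper) bounds of the confidence interval."""
    half_width = z[level] * se
    return estimate - half_width, estimate + half_width

lower, upper = confidence_interval(estimate, se)
print(f"68% CI: [{lower:.4f}, {upper:.4f}]")  # [0.0666, 0.0694]
```

Raising the level to 95% multiplies the half-width by about 1.96, which is why higher confidence comes at the cost of a wider interval.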

The estimates of change from one month to the next have become more important to users over time. In response, the monthly LFS release now provides the standard errors (SEs) for the provincial and national month-to-month changes for employed and unemployed.

Given their stability, the SEs included in the monthly LFS publication are not updated every month. Instead, an estimate of the SE that corresponds to the average of the SEs from the twelve previous months is provided. These estimates are updated twice a year (usually in January and July). The table below provides the SEs observed for the month-to-month change in employment and unemployment estimates for Canadians 15 years of age and up.

Table 8.1
Standard error (SE) of the variation from one month to the next, Employed and Unemployed
Province Employed Unemployed
thousands
Newfoundland and Labrador 2.1 2.1
Prince Edward Island 0.6 0.6
Nova Scotia 2.7 2.5
New Brunswick 2.3 2.1
Quebec 15.9 13.7
Ontario 19.3 17.0
Manitoba 2.6 2.1
Saskatchewan 2.7 2.1
Alberta 9.8 8.1
British Columbia 10.6 8.5
Canada 29.5 25.3

8.1.2 Coefficient of variation

The Coefficient of Variation (CV), which is defined as the standard error divided by the estimate, is a relative measure of variation and is usually expressed as a percentage. In the example used earlier, the CV for the May 2015 unemployment rate is 2.05% ((0.001395/0.068)x100%). It gives an indication of the uncertainty associated with the estimates. Small CVs are desirable because they indicate that the sampling variability is small relative to the estimate.
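As a quick check of the figures above, the CV is simply the ratio of the standard error to the estimate, expressed as a percentage:

```python
# CV of the May 2015 unemployment-rate estimate quoted in the text.
estimate = 0.068
se = 0.001395

cv_percent = se / estimate * 100
print(f"CV = {cv_percent:.2f}%")  # CV = 2.05%
```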

In order to obtain CVs, users are provided with approximate CV tables. These tables give approximate CVs according to the observed values of the estimates, for various domains. The values are conservative in the sense that, if many survey estimates were produced for the same domain, around 75% of the approximate CVs obtained from the tables would be larger than the actual CVs calculated with precise methods; the remaining 25% would be somewhat lower than the precise calculation. The net effect is quality indicators that show lower quality of the survey estimates than is actually the case: confidence intervals are wider and statistical tests show fewer significant differences. These approximate CV tables are updated annually and provided in the Guide to the Labour Force Survey (71-543-G).

8.1.3 Design effect

A third measure derived from the sampling variance is the design effect, a relative measure calculated by dividing the sampling variance of an estimate obtained under the survey design by the sampling variance of a Simple Random Sample (SRS) of the same sample size.  It can also be used to compare the effectiveness of one sample design to another.  In the case of the LFS, it is particularly useful as an indicator of the deterioration of the sample design over time, or as a comparison establishing the gain/loss in efficiency obtained by redesigning the survey or associated with modifying some components of the design.

Different types of design effects can be computed, and each one depends on the data used to establish it.  Below, the term unadjusted design effect will be used to refer to design effects based on non-calibrated weights, meaning without the adjustment that takes the population counts and estimated totals into consideration. The term adjusted design effect will be used to refer to design effects that are based on the final weights, after composite calibration. As a result, the unadjusted design effects are indicative of the effectiveness of the sample design, while the adjusted design effects provide a more general evaluation of the overall strategy adopted by combining all the characteristics of the survey plan (stratification, multi-stage sampling, post-stratification and estimation). The smaller the design effect is, the more effective the design with regard to sampling variance. It should be noted that the unadjusted design effects (sample design) are generally greater than the adjusted design effects (survey plan) based on the final weights, since they do not benefit from the gain in precision from calibration.
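As an illustration of the definition (not the LFS production code), the design effect for an estimated proportion can be sketched as follows, approximating the SRS variance of a proportion p by p(1-p)/n. The input numbers are invented for illustration:

```python
def design_effect(var_design, p, n):
    """Design effect for an estimated proportion: the sampling variance
    under the actual design divided by the variance of a simple random
    sample of the same size, approximated here by p * (1 - p) / n."""
    var_srs = p * (1 - p) / n
    return var_design / var_srs

# Illustrative (invented) inputs: a proportion of 6.8% estimated from
# a sample of 50,000 persons, with a design-based variance of 2.6e-6.
deff = design_effect(var_design=2.6e-6, p=0.068, n=50_000)
print(f"design effect = {deff:.2f}")  # above 1: less efficient than SRS
```

A value below 1, as for the adjusted design effects for Employed in Table 8.2, indicates that the overall strategy (design plus calibrated estimation) beats a simple random sample of the same size.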

The table below presents some values representing the averaged adjusted and unadjusted design effects for the characteristics employment and unemployment at the national and provincial levels, based on survey data from January to August 2015.

Table 8.2
Design effects, Employed and Unemployed, 2015
Province Employed Unemployed
Adjusted Unadjusted Adjusted Unadjusted
Newfoundland and Labrador 0.40 1.78 1.08 1.00
Prince Edward Island 0.31 1.28 1.00 1.03
Nova Scotia 0.35 1.85 1.08 1.17
New Brunswick 0.36 2.20 1.17 1.17
Quebec 0.50 2.70 1.66 1.96
Ontario 0.42 2.92 1.39 1.64
Manitoba 0.32 3.03 1.01 1.19
Saskatchewan 0.34 4.87 1.10 1.11
Alberta 0.48 4.25 1.44 1.66
British Columbia 0.44 3.52 1.44 1.59
Canada 0.54 3.73 1.77 2.08

In the LFS, unadjusted design effects, together with other information, are used to identify regions where the sample design has lost a significant portion of its effectiveness over time. In some cases, a mini-redesign is performed in these regions to remedy this problem.

8.2 Quality indicators related to non-sampling errors

Non-sampling errors are errors that arise during the course of virtually all survey activities, apart from sampling. The impact on the estimates can be seen in the bias and/or variability of the estimates. If these errors are random errors, then their effects will approximately cancel out over a large enough domain, leading solely to increased variability. However, the effect can still be large for small domains or when the characteristics being studied are rare. If the errors are systematic, in the sense that they tend to go in the same direction, this will lead to a bias in the final results. And unlike random errors, the bias linked to systematic errors cannot be reduced by increasing the size of the sample.

The most common sources of non-sampling errors are coverage error, nonresponse, measurement or response errors and processing errors.  Each one is discussed separately in the following sections.

8.2.1 Coverage errors

Coverage errors consist of omissions, erroneous inclusions, duplications and misclassifications of units on the survey frame.  In the case of the LFS, those errors may happen when the list of dwellings associated with a PSU is established or uploaded, when listing maintenance is performed to identify growth, when the dwellings and/or the persons to include in the survey are contacted, or when data are collected and processed. In the LFS, three main indicators are used to measure and monitor the coverage errors: the slippage rate, the vacancy rate and the PSU yield evaluation.

The slippage rate is the relative difference between the population size estimates produced from the pre-calibration weights and the most recent population projection estimates used as calibration totals.

The population projection estimates used to determine the slippage rate can also contain errors, and these errors are one of the factors that contribute to slippage. In the LFS, undercoverage is typically observed, as indicated by a positive slippage rate. To reduce the resulting bias as much as possible, the weight of each respondent is modified by the composite calibration adjustment factor (see Chapter 6).
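Under the sign convention described above (undercoverage gives a positive rate), the slippage computation can be sketched as follows; the population figures are invented for illustration:

```python
def slippage_rate(pre_calibration_total, projection_total):
    """Slippage: relative difference between the survey-based population
    total (from pre-calibration weights) and the demographic projection
    used as a calibration total, expressed as a percentage.  With this
    sign convention, survey undercoverage gives a positive rate."""
    return (projection_total - pre_calibration_total) / projection_total * 100

# Invented illustration: the survey weights account for 25.6 million
# people while the projection says 29.0 million.
rate = slippage_rate(pre_calibration_total=25.6e6, projection_total=29.0e6)
print(f"slippage = {rate:.1f}%")  # positive, i.e. undercoverage
```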

Undercoverage is the result of omitting dwellings or persons from the target population. An occupied dwelling may not be on the PSU list for various reasons: it was omitted when the list was being established, the building was under construction when it was last verified, there were errors in the cluster delineations, or it was wrongly classified as vacant. It is also possible that persons in the household were overlooked, either because the respondent did not make their existence known or because they were classified as members of a usual place of residence other than the sampled dwelling. Students are often overlooked because they live elsewhere during their studies, even though their usual place of residence is the sampled dwelling. Errors can therefore slip into the survey estimates if the characteristics of the individuals not covered by the survey differ from those of the individuals covered. For example, if the survey misses a part of the population that is young and highly mobile, with higher unemployment rates than the covered population of the same age, the undercoverage biases the unemployment estimates downward.

Slippage is also affected by population growth and nonresponse adjustments.  The population grows between redesigns, and usually in specific places and unevenly.  The selected sample can over- or underestimate this growth, or accurately account for it. For instance, the selected PSUs in an area may experience no growth, but other PSUs on the frame in the same area could be facing significant growth.  In such a case, growth would be underestimated by the selected sample, and if the projected population estimates are in line with the growth that is actually occurring, the slippage rates would become larger for that area.

The adjustments to account for nonresponse (see Chapters 5 and 6) can also influence slippage. For instance, if non-respondent households have fewer members but are represented in the sample, via imputation or nonresponse adjustment factors, by large households, this can affect the slippage rate.

Lastly, as mentioned earlier, the population estimates also play a role in slippage.  The more accurate they are, the more informative the slippage rates are.

The slippage rates are produced monthly at the national (excluding the territories) and provincial levels, and for 12 age-sex groups (the six age groups 15-19, 20-24, 25-29, 30-39, 40-54 and 55+, crossed with sex), and are thoroughly analyzed each month. The table below provides the average slippage rates for the 2015 calendar year.

Table 8.3
Average slippage rates - Canada by age group and province, 2015
%
Canada
All ages 11.7
15 to 19 8.2
20 to 24 21.3
25 to 29 21.3
30 to 39 16.3
40 to 54 9.8
55+ 7.0
Newfoundland and Labrador 11.6
Prince Edward Island 16.2
Nova Scotia 12.3
New Brunswick 11.7
Quebec 8.6
Ontario 12.0
Manitoba 9.5
Saskatchewan 13.7
Alberta 15.1
British Columbia 12.8

Dwellings correctly identified as being vacant or invalid do not introduce a bias into the LFS estimates. However, the estimation variance is higher because the sample contains fewer valid households. The LFS interviewers return to selected vacant dwellings every month to interview any persons targeted by the survey who may have moved in since the previous month. Non-existent dwellings are simply removed from the survey frame. Special attention must be given when determining vacant dwellings since they have a direct influence on two other indicators. If a dwelling is coded as vacant but its occupants are just temporarily absent, the nonresponse rate produced for the LFS will be underestimated. Furthermore, the slippage rate will be overestimated since this wrongly coded dwelling should have been considered when determining the rate. It is therefore important for interviewers to do a thorough job when determining whether a dwelling is vacant, and therefore out of the scope of the survey, or quite simply occupied by a temporarily absent household, and therefore within the scope of the survey. Vacancy rates are also produced and monitored on a monthly basis.

The table below presents the average vacancy rates and the minimum and maximum values for 2015 at the provincial and national levels.

Table 8.4
Vacancy rate (unweighted), Canada and the Provinces, 2015
Province Average Maximum Minimum
%
Newfoundland and Labrador 15.4 16.4 14.5
Prince Edward Island 21.1 23.4 19.8
Nova Scotia 18.1 18.9 17.4
New Brunswick 17.1 18.1 16.2
Quebec 12.8 13.3 12.2
Ontario 11.5 11.8 11.2
Manitoba 14.1 15.5 12.0
Saskatchewan 14.3 15.8 12.6
Alberta 14.6 15.2 13.7
British Columbia 12.3 13.2 11.9
Canada 13.7 14.1 13.0

For this quality indicator, there is some variability between provinces, linked to the proportion of seasonal dwellings, which varies from province to province. Seasonal dwellings are always considered vacant, because they are not the usual place of residence of any of their occupants.

The yields of the PSUs are monitored monthly to detect any large differences between the number of dwellings surveyed in the field and the number of dwellings anticipated by the sample design. Any significant discrepancy between a DUF extract and the survey field results, such as 50% in either direction, is reviewed. First, all clusters with an unexpected count are brought to the attention of the unit in Ottawa responsible for controlling the sample, which verifies the cluster boundaries and the expected number of dwellings. If the discrepancy cannot be explained at the central office, the cluster is sent to the regional office in question for an in-depth analysis. All the causes that explain the discrepancies are filed for future reference.
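The yield check described above amounts to comparing observed and expected dwelling counts per cluster and flagging large relative discrepancies. A minimal sketch (the cluster IDs and counts are invented):

```python
def flag_yield_discrepancies(clusters, threshold=0.5):
    """Return the IDs of clusters whose observed dwelling count differs
    from the expected (design) count by at least `threshold` (50% by
    default), in either direction.  `clusters` maps a cluster ID to an
    (expected_count, observed_count) pair."""
    flagged = []
    for cluster_id, (expected, observed) in clusters.items():
        if expected > 0 and abs(observed - expected) / expected >= threshold:
            flagged.append(cluster_id)
    return flagged

# Invented example: cluster A grew sharply, C shrank, B is as expected.
clusters = {"A": (40, 62), "B": (40, 41), "C": (30, 12)}
print(flag_yield_discrepancies(clusters))  # ['A', 'C']
```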

This control plays an important role: if the sample size requires changes, it is vital to know which regions are undersampled or oversampled. In addition, the recorded discrepancies can point to problems with the survey that taint the quality of the LFS data.

All of these indicators (slippage rate, vacancy rate and PSU yield) serve to detect potential problems with the sample coverage and to assist in taking any necessary action. Examples of possible actions are to put together training tools for interviewers to increase their knowledge of the household composition rules, to distribute a newsletter explaining slippage or the concept of multiple dwellings, or to establish a program to relist a certain number of PSUs considered to be growing.

8.2.2 Nonresponse

Every month during the survey week, the interviewers determine which selected dwellings contain persons eligible for the survey. Dwellings may be identified as ineligible for the survey month for various reasons, for example because they are vacant or no longer exist (see Section 8.2.1).

When a dwelling is identified as eligible for the survey, it is not always possible to do an interview. This is called household nonresponse and can occur due to any number of reasons such as: no one at home, temporary absence, interview impossible (inclement weather, unusual circumstances in the household, etc.), technical problems, or refusal.

The magnitude of the bias due to nonresponse is usually not known, but it is directly linked to the differences in characteristics between the groups of responding units and the groups of non-responding units. Since the effect of this bias grows as the nonresponse rate increases, efforts are made to maintain the response rate as high as possible during collection.
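The mechanism described above is often written as a simple approximation: the bias of a respondent-only estimate is roughly the nonresponse rate times the gap between respondents and non-respondents. A sketch with invented figures:

```python
def nonresponse_bias(mean_respondents, mean_nonrespondents, nonresponse_rate):
    """Approximate bias of a respondent-only mean: the nonresponse rate
    times the respondent/non-respondent gap.  The bias grows with the
    nonresponse rate and, unlike sampling error, does not shrink as the
    sample size increases."""
    return nonresponse_rate * (mean_respondents - mean_nonrespondents)

# Invented illustration: respondents have a 6.8% unemployment rate,
# non-respondents 9.0%, and 12% of the sample does not respond.
bias = nonresponse_bias(0.068, 0.090, 0.12)
print(f"approximate bias = {bias:.5f}")  # negative: the estimate is too low
```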

The table below presents the average nonresponse rates as well as the minimum and maximum rates for 2015.

Table 8.5
Nonresponse rates (unweighted), Canada and the Provinces, 2015
Province Average Maximum Minimum
%
Newfoundland and Labrador 11.2 13.0 9.9
Prince Edward Island 10.9 12.2 9.0
Nova Scotia 10.3 11.3 9.7
New Brunswick 11.4 12.5 10.5
Quebec 10.2 11.9 8.2
Ontario 13.9 15.4 12.6
Manitoba 11.7 12.8 10.3
Saskatchewan 11.9 12.8 11.1
Alberta 12.8 14.0 11.5
British Columbia 11.7 12.6 10.6
Canada 12.0 13.1 11.2

Every month, the LFS produces nonresponse rates by cause (simple refusal, no contact, temporary absence, technical problem or other reason) and also by collection mode. These rates are carefully analyzed to identify the major causes of the nonresponse and to make any necessary corrections.

Refusal rates for the LFS are usually very low, with monthly Canadian rates varying between 1% and 2%. The refusal rates are usually similar across provinces, but can dip as low as 0.5% or climb as high as 3%.  To a certain extent, the collection system makes it possible to get more information on the reason for refusal, and thus allows tracking of the changes in respondents’ attitudes toward the survey over time.

8.2.3 Measurement or Response errors

Measurement or response errors can be the result of the questionnaire design, how the questions are formulated, the respondent’s comprehension, the way the interview is conducted, or the general survey conditions. They can occur when the information is provided, received, or entered into the computer. However, with the computerized collection method, it is possible to reduce some of these errors, since some verification rules are integrated into the collection instrument and conflicts must be resolved during the interview. Nevertheless, the respondent may incorrectly interpret the question, not know the answer, or have forgotten or altered the facts for personal reasons. In addition, interviewers can unintentionally re-interpret responses. As in the other error categories, response errors may lead to an increase in the variance and/or the presence of bias.

The proxy responses provided by one household member when information is collected about another household member can also lead to response errors. However, those errors are considered preferable to the nonresponse errors that would arise if responses were accepted only from the person concerned. Currently, about 60% of the LFS information is provided by proxy, and this rate remains fairly stable over time.

In repeated surveys, in which the sample consists of a certain number of panels or rotation groups, the expected value of estimates varies slightly from one rotation group to another. This is called rotation group bias. With regard to the LFS, this bias is at its highest level for the sixth of the sample in its first interview. It is possible to calculate the rotation effect by taking the ratio between an estimate calculated for the part of the sample participating in the survey a certain number of times (first month, second month, etc.) and the estimate calculated for the entire sample.
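The ratio described in the last sentence can be sketched directly (the estimates below are invented):

```python
def rotation_effect(estimate_group, estimate_full):
    """Rotation-group effect: the estimate computed from one rotation
    group (e.g. households in their first interview) divided by the
    estimate computed from the entire sample.  A value above 1 means
    the group yields systematically higher estimates."""
    return estimate_group / estimate_full

# Invented illustration: first-interview households show a 7.3%
# unemployment rate versus 6.8% for the full sample.
effect = rotation_effect(estimate_group=0.073, estimate_full=0.068)
print(f"rotation effect = {effect:.3f}")  # above 1
```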

Brisebois and Mantel (1996) calculated a modified rotation effect that takes into account the differences in the effects of sampling errors for the six rotation groups. Their study revealed several statistically significant differences between the rotation groups, but the overall effect was determined to be minor.

8.2.4 Processing errors

Processing errors can occur at various stages of the survey, such as input, validation, verification, coding, imputation, weighting and data tabulation.

The computerized collection method helps to prevent skip errors during data input, since the application determines the flow of questions. Similarly, certain verification rules are integrated into the collection system to detect and correct discrepancies at the time of the interview.

The variables “occupation” and “industry” are coded to classification standards at the central office. In the first month of interviews, the interviewer collects information that accurately describes the type of company, industry or service in which the person works and that clearly and accurately indicates the type of work or nature of his/her duties. The first type of information is used to determine the industry, while the second type serves to identify the occupation. One of the first processing steps at the central office consists of coding the descriptive information collected for the variables “occupation” and “industry” based on the standard classification for these variables, NOC and NAICS. Monthly quality control processes are in place to evaluate the precision of this coding process.

The imputation rate is also a quality indicator with regard to data processing. Every month, diagnostics evaluating the results of the imputation process are produced and carefully examined. The diagnostics give information on the number of records treated by each imputation method and at each level of collapsing (see Chapter 5). The respective profiles of the non-imputed records and of the imputed ones are compared, as well as their respective contribution to the survey key estimates. This makes it possible to control the imputation quality and take the necessary actions.

To avoid errors likely to occur at the estimation and tabulation steps, a pre-release evaluation tool has been built.  With the help of this tool, it is possible to highlight variables, subgroups and/or domains for which the estimates and/or the standard errors are unusually distant from their respective historical averages. These estimates can then be more specifically investigated to see if any error is responsible for the sudden change. In addition to this, comparisons with other data sources are performed regularly, to verify if the LFS data is in line with other economic developments.
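One simple way to implement such a screen (a sketch, not the actual LFS tool) is to flag any estimate that sits more than a few historical standard deviations away from its own historical average:

```python
from statistics import mean, stdev

def is_unusual(current, history, k=3.0):
    """Flag an estimate whose distance from the average of its own
    history exceeds k historical (sample) standard deviations."""
    mu, sigma = mean(history), stdev(history)
    return abs(current - mu) > k * sigma

# Invented illustration: monthly employment estimates, in millions.
history = [17.90, 17.95, 18.00, 18.05, 18.10]
print(is_unusual(18.02, history))  # False: close to the recent average
print(is_unusual(19.50, history))  # True: worth investigating
```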

8.2.5 Monitoring of collection procedures  

The collection application produces paradata files that contain a host of information on the activities of interviewers in the field and in call centres. Using these files, it is possible to produce quality indicators on the interviewers' activities. The LFS regularly analyzes the calls and visits made by the interviewers. Reports produced include, among others, information on the duration of the interviews (in person and over the telephone), the number of attempts to reach a respondent, and the number of cases transferred from one collection mode to another. Using this source of information, it is relatively easy to check whether the interviewers strictly follow the collection procedures and to take action in questionable cases. These indicators can also be used to improve the training program for interviewers and strengthen certain components, such as task planning or the work schedule.

8.3 LFS committees

The LFS relies on several coordination groups to ensure that the survey runs smoothly. Two permanent committees are described below. Their mandates include overseeing ongoing operations and evaluating the survey on a regular basis.

8.3.1 Operations Committee

The mandate of this committee is to monitor the activities that occur during each survey month and the circumstances surrounding the conduct of the survey, to ensure that the operations run smoothly, and to examine proposed changes and recommend whether they should be adopted. The Operations Committee is chaired by a senior member of Labour Statistics Division and meets every week.

8.3.2 Data Quality Committee

The committee, which was officially created in the spring of 1972, has the mandate of examining, evaluating and documenting monthly survey quality, and advising on any aspect of quality that needs attention.  It also initiates and reviews ad hoc studies and investigations related to methods and procedures affecting data quality, and makes recommendations based on its findings. This committee is chaired by a member of the Household Survey Methods Division.

To ensure the best data quality possible, the Data Quality Committee periodically examines the different quality indicators described earlier. It meets every month to examine and assess the quality of the monthly data and to make suggestions and recommendations on any survey aspect likely to improve quality. By closely following the evolution of the quality indicators, the committee can intervene immediately with those in charge of the LFS activities in question to control the quality of the monthly data. The committee also discusses new developments that are likely to influence the quality of data that has just been collected or will be collected in the future, especially changes to the collection methods or the questionnaire, unusual problems in the field, ongoing testing of processes and methods, etc.

8.4 Resources available regarding LFS data quality

There are multiple other resources with information on various aspects of LFS data quality.  This section will describe a few of them.

8.4.1 The Daily

The Labour Force Survey measures the current state of the Canadian labour market. Thanks to the data collected by the LFS, it is possible to produce various types of estimates (monthly estimates, estimates of change from one month to the next, three-month moving averages, etc.) for many different characteristics (labour force status, hours worked, multiple job holders, etc.), over thousands of domains (national, provincial, subprovincial, age-sex groups, etc.). Statistics Canada publishes the LFS estimates on a monthly basis, only ten days after the end of collection. The publication of new LFS estimates, which usually takes place on the first Friday of the month, is announced through The Daily, Statistics Canada's official release bulletin, and is accompanied by a short analysis of the current labour market. The press release includes information on specific aspects of the survey, such as upcoming revisions, newly available reports and products, and the date of the next release.

8.4.2 The Labour Force Survey web page

The Labour Force Survey web page, on the Statistics Canada website, has detailed information on many aspects of the survey, including quality. In particular, it contains information on the quality evaluation process and the various sources of data to which the LFS estimates are compared to see if labour market trends are in line with general economic performance. It also features a summary of the changes that occurred to the data or to the estimates through the years.

8.4.3 The Guide to the Labour Force Survey

The Guide to the Labour Force Survey (71-543-G) is a valuable source of information on survey concepts, classifications and definitions. It also provides guidelines and assistance on the comparison of LFS estimates across surveys (such as with the Survey of Employment, Payrolls and Hours) or across countries (such as with the USA). Appendix C also contains the Labour Force questionnaire. 

8.4.4 Access to LFS data

For users interested in the most common LFS estimates, the CANSIM tables are likely to have the information that is sought. Various types of estimates are provided for various domains and disclosure rules are applied to protect confidentiality.

For more specific situations, users may want to use the monthly released Public Use Micro-data File (71M0001X). This product is for users who prefer to do their own analysis and allows them to focus on specific subgroups in the population or cross-classify variables that are not in the catalogued products. Users can then submit requests on a cost-recovery basis to obtain variance estimates associated with their particular needs.

A Research Data Centre (RDC) provides access to Statistics Canada’s confidential microdata files. They are accessible only to researchers with approved projects who have been sworn in as "deemed employees" of Statistics Canada. The RDC confidential microdata files contain most of the original information collected during the survey interview with the subject as well as derived variables added to the dataset afterwards. They also contain the bootstrap weights used to calculate the variance estimates, which are available only in the Master file. RDCs are located throughout the country. The following web site has more information: www.statcan.gc.ca/eng/rdc/index.

The Real Time Remote Access (RTRA) complements existing methods of access to confidential microdata. Using a secure username and password, the RTRA provides around-the-clock access to survey results from any computer with Internet access. Confidentiality of the microdata is automated in the RTRA system, eliminating the need for manual intervention and allowing for rapid access to results. In order to use the RTRA program, applicants must complete an application form. More information is provided on this web page: www.statcan.gc.ca/eng/rtra/rtra.

 