12-539 Data Quality Guidelines

Data collection and capture operations

Scope and purpose

Data collection is any process whose purpose is to acquire or assist in the acquisition of data. Collection is achieved by requesting and obtaining pertinent data from individuals or organizations via an appropriate vehicle (see section on Questionnaire design). If no information is obtained initially, or if the data are deemed unsuitable as identified by preliminary editing, follow-up contacts may be initiated as part of data collection (see section on Editing).

Data capture refers to any process that converts the information provided by a respondent into an electronic format suitable for use by subsequent processes. Sometimes data are captured as part of the collection process, as in surveys that use computer-assisted personal interviewing (CAPI), computer-assisted telephone interviewing (CATI) or electronic data reporting (EDR). At other times, a separate operation must be set up to capture data by manual key entry or by automated means (e.g., intelligent character recognition, ICR). Often this conversion involves manual or automated coding, and sometimes it includes transmitting the data to another location.

Data collection and capture operations have a direct and critical impact on data quality, because the data they produce are the primary inputs of a survey-taking agency. The quality of these operations thus has a very high impact on the quality of the final product.

Principles

Respondents, or data suppliers, especially the individuals and organizations who complete questionnaires, invariably without payment, are a survey-taking organization’s most valuable resource. To ensure their continuing cooperation, it is essential to minimize the burden placed on them. Gaps or inconsistencies in the data are best corrected by consulting the respondents themselves during data collection or very soon afterwards. Because data collection and capture operations have such a high impact on data quality, it is highly recommended that appropriate quality and performance measurement tools be used to manage these processes and to provide objective measures to supervisors and clients. Throughout the process, appropriate steps must be taken to preserve the confidentiality of the information collected (see section on Disclosure control).

Guidelines

Interviewers and data capture operators are critical to the success of most data collection and capture operations. Ensure that they have appropriate training and tools (e.g., training manuals, see Burgess and Brierley, 1995).

Exploit available technology to improve the efficiency and quality of data collection and capture processes. Advances in communications and computing technology offer opportunities to greatly reduce the costs and risks associated with these processes. For example, computer-assisted survey interviewing (e.g., CAPI and CATI), electronic data reporting (EDR) via the Internet, automated data entry (using ICR) and automated coding by text recognition (ACTR) are approaches that take advantage of available technologies.

Carefully control paper questionnaire delivery operations in mail surveys to ensure that each unit that has been selected to be in the survey receives the appropriate questionnaire. Once the questionnaire is returned, verify the accuracy of the coverage information and the quality of the data provided. Follow-up interviews may be needed in some cases. When no questionnaire is received, follow-up activities are necessary to establish the status of the unit (e.g., occupied or unoccupied; in business or out of business) and to obtain the missing information. Through all these steps, put in place a system to report on the completion status of each unit.

Establish appropriate sample control procedures for all data collection operations. Such procedures track the status of sampled units from the beginning through the completion of data collection so that data collection managers and interviewers can assess progress at any point in time. Sample control procedures and feedback from them are also used to ensure that every sampled unit is processed through all data collection and capture steps, with a final status being recorded.
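As an illustration only, the sample control idea described above can be sketched in a few lines of Python. The status codes, the choice of which statuses count as final, and the names SampleControl, progress and unresolved are all assumptions made for this sketch; they are not taken from any Statistics Canada system.

```python
from collections import Counter
from enum import Enum


class Status(Enum):
    # Hypothetical status codes; a real survey would define its own.
    SELECTED = "selected"
    CONTACTED = "contacted"
    COMPLETED = "completed"
    REFUSED = "refused"
    OUT_OF_SCOPE = "out_of_scope"


# Statuses treated as final in this sketch (an assumption).
FINAL = {Status.COMPLETED, Status.REFUSED, Status.OUT_OF_SCOPE}


class SampleControl:
    """Keeps the latest status of every sampled unit."""

    def __init__(self, unit_ids):
        self.status = {uid: Status.SELECTED for uid in unit_ids}

    def update(self, unit_id, new_status):
        # Reject updates for units that were never selected.
        if unit_id not in self.status:
            raise KeyError(f"unit {unit_id!r} is not in the sample")
        self.status[unit_id] = new_status

    def progress(self):
        """Count of units by current status, for progress reports."""
        return Counter(s.value for s in self.status.values())

    def unresolved(self):
        """Units that have not yet been given a final status."""
        return sorted(u for u, s in self.status.items() if s not in FINAL)


control = SampleControl(["A001", "A002", "A003"])
control.update("A001", Status.COMPLETED)
control.update("A002", Status.CONTACTED)
print(control.progress())
print(control.unresolved())
```

At the close of collection, an empty list from unresolved() would indicate that every sampled unit has been processed through to a final status.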

Institute effective control systems to ensure the security of data capture, transmission and handling. Prevent loss of information and the resulting loss in quality due to system failures or human errors.

When collecting data, ensure that the respondent or the appropriate person within the responding household or organization is contacted at the appropriate time so that the information is readily available. Allow respondents to provide the data by a method and in a format that is convenient for them or their organization. This will help increase response rates and improve the quality of the information obtained.

In designing data collection processes, especially editing and coding, make sure that the procedures are applied to all units of study as consistently and in as error-free a manner as possible. Automation is desirable. Enable the staff or systems to refer difficult cases to a small number of knowledgeable experts. Centralize the processing in order to reduce costs and make it simpler to take advantage of available expert knowledge. Because the collected information can produce unexpected results, use processes that can be adapted when changes prove necessary for the sake of efficiency.

Monitor the frequency of edit rejects and the number and type of corrections applied, broken down by stratum, collection mode, processing type, data item and language of collection. This will help in evaluating the quality of the data and the efficiency of the editing function.
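One simple way to produce such breakdowns is to keep a log with one record per edit failure and tally it by whichever dimensions are of interest. The sketch below assumes a hypothetical log layout (the field names stratum, mode, item and action, and the sample records, are invented for illustration).

```python
from collections import Counter

# Hypothetical edit log: one record per edit failure.
edit_log = [
    {"stratum": "urban", "mode": "CATI", "item": "income", "action": "corrected"},
    {"stratum": "urban", "mode": "CATI", "item": "income", "action": "rejected"},
    {"stratum": "rural", "mode": "paper", "item": "age", "action": "corrected"},
    {"stratum": "rural", "mode": "paper", "item": "income", "action": "corrected"},
]


def tally(log, *keys):
    """Count log records by the given combination of fields."""
    return Counter(tuple(record[k] for k in keys) for record in log)


by_mode_item = tally(edit_log, "mode", "item", "action")
by_stratum = tally(edit_log, "stratum")

for group, count in sorted(by_mode_item.items()):
    print(group, count)
```

Because tally accepts any combination of fields, the same log can be summarized by stratum, by collection mode, by data item, or by several of these at once.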

Expenditure, performance and quality measures gathered during the data collection operation enable the survey manager to make decisions regarding the need for modification or redesign of the process. Track actual costs of postage, telephone calls, collection vehicle production, computing, and person-day consumption. Important quality measures include response rates, processing error rates, follow-up rates and counts of nonresponse by reason. When these measures are available at all levels at which estimates are produced and at various stages of the process, they can serve both as performance measures and measures of data quality (see section on Response and nonresponse).
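As a minimal sketch of how some of these quality measures could be derived from final status codes, the example below computes an unweighted response rate and counts of nonresponse by reason. The status codes are hypothetical, and the rate used here (completed units divided by in-scope units) is only one of several possible definitions; a survey would apply its own.

```python
from collections import Counter

# Hypothetical final status codes, one per sampled unit.
final_status = [
    "complete", "complete", "refusal", "no_contact",
    "complete", "out_of_scope", "refusal", "complete",
]


def collection_measures(statuses):
    """Return (unweighted response rate, nonresponse counts by reason)."""
    counts = Counter(statuses)
    in_scope = len(statuses) - counts["out_of_scope"]
    if in_scope == 0:
        raise ValueError("no in-scope units")
    # Assumed definition: completed units over in-scope units.
    response_rate = counts["complete"] / in_scope
    nonresponse = {
        reason: n
        for reason, n in counts.items()
        if reason not in ("complete", "out_of_scope")
    }
    return response_rate, nonresponse


rate, reasons = collection_measures(final_status)
print(f"response rate: {rate:.3f}")
print(reasons)
```

Running the same function separately for each stratum or collection stage would give the measures at the levels at which estimates are produced.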

Manual data capture from paper questionnaires or scanned images is subject to keying errors. Incorporate on-line edits for error conditions that the data capture operator can correct (i.e., edits that will identify keying errors). Record these cases for later review and analysis.
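A rough sketch of such an edit at the point of key entry follows. The fields, their valid ranges, and the names in_range, key_field and failure_log are assumptions chosen for illustration; an actual capture system would define its own edits.

```python
def in_range(low, high):
    """Build an edit that accepts whole numbers from low to high."""
    def check(value):
        return value.isascii() and value.isdigit() and low <= int(value) <= high
    return check


# Hypothetical edits; the fields and ranges are illustrative only.
EDITS = {
    "age": in_range(0, 120),
    "household_size": in_range(1, 20),
}

# Failures are recorded here for later review and analysis.
failure_log = []


def key_field(operator_id, field, value):
    """Return True if the keyed value passes its edit; otherwise log it."""
    passed = EDITS[field](value)
    if not passed:
        failure_log.append(
            {"operator": operator_id, "field": field, "value": value}
        )
    return passed


print(key_field("op07", "age", "230"))  # fails the range edit and is logged
print(key_field("op07", "age", "23"))   # passes
print(failure_log)
```

In this sketch the operator would be prompted to re-key whenever key_field returns False, while the log preserves the original failure for analysis.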

Implement verification procedures to assess how well operators are meeting the pre-established levels of keying error rates.
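One possible form of verification is to have a sample of records keyed a second time, independently, and to compare the two versions field by field. The sketch below assumes that approach; the records and the 2% target are invented for illustration, and treating the verification keying as the reference is itself a simplifying assumption.

```python
def keying_error_rate(original, verified):
    """Share of fields where the original keying differs from the
    independent verification keying (treated here as the reference)."""
    if len(original) != len(verified):
        raise ValueError("record lists must have the same length")
    fields = 0
    errors = 0
    for first, second in zip(original, verified):
        for name, value in second.items():
            fields += 1
            if first.get(name) != value:
                errors += 1
    return errors / fields if fields else 0.0


# Hypothetical sample of two records keyed twice.
original = [
    {"id": "001", "age": "34", "income": "52000"},
    {"id": "002", "age": "71", "income": "18000"},
]
verified = [
    {"id": "001", "age": "34", "income": "52000"},
    {"id": "002", "age": "17", "income": "18000"},
]

TARGET_ERROR_RATE = 0.02  # hypothetical pre-established level

error_rate = keying_error_rate(original, verified)
print(f"error rate: {error_rate:.3f}")
print("meets target:", error_rate <= TARGET_ERROR_RATE)
```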

Use statistical quality control methods to assess and improve the quality of collection and capture operations. Collect and analyze quality control measures and results in a manner that helps identify the major root causes of error. Provide feedback reports to managers, staff, subject-matter specialists and methodologists. These reports should contain information on the frequencies and sources of error (see Mudryk et al., 1994, 1996 and 2002; Mudryk and Xiao, 1996). Various software tools are available to help in this regard. These include the Quality Control Data Analysis System (QCDAS) and NWA Quality Analyst (see Mudryk, Bougie and Xie, 2002).
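As one example of a statistical quality control method, a p-chart compares the error proportion in each inspected batch with three-sigma control limits around the overall proportion. The sketch below illustrates the general technique only; it is not necessarily the method used in the reports cited above or in the software tools named, and the batch figures are invented.

```python
import math


def p_chart(errors, sizes):
    """Return the overall error proportion and, for each batch, a tuple
    (proportion, lower limit, upper limit, out_of_control)."""
    p_bar = sum(errors) / sum(sizes)
    rows = []
    for e, n in zip(errors, sizes):
        sigma = math.sqrt(p_bar * (1 - p_bar) / n)
        lower = max(0.0, p_bar - 3 * sigma)
        upper = min(1.0, p_bar + 3 * sigma)
        p = e / n
        rows.append((p, lower, upper, not (lower <= p <= upper)))
    return p_bar, rows


# Hypothetical inspection results: errors found per batch of 200 fields.
errors = [4, 6, 5, 30]
sizes = [200, 200, 200, 200]

p_bar, rows = p_chart(errors, sizes)
print(f"overall error proportion: {p_bar:.4f}")
for batch, (p, lower, upper, flagged) in enumerate(rows, start=1):
    print(batch, f"{p:.3f}", f"[{lower:.3f}, {upper:.3f}]", flagged)
```

A flagged batch is a signal to look for a root cause, such as a particular operator, questionnaire item or procedure, rather than a conclusion in itself.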

Use measures of quality and productivity to provide feedback at the interviewer or operator level, as well as to identify error-causing elements in the design of the collection vehicle or its processing procedures.

Use subsequent survey processes to gather useful information regarding quality that can serve as signals that collection and capture procedures and tools may require changes for future survey cycles. For example, the editing or data analysis stages may suggest the possibility of response bias or other collection-related problems.

Conduct a post-mortem evaluation of all data collection and capture operations, and document the results for future use.

References

Burgess, M.J. and Brierley, R. (1995). A self-directed training course for monitors of CATI operations. Operations Research and Development Division, Statistics Canada.

Couper, M.P., Baker, R.P., Bethlehem, J., Clark, C.Z.F., Martin, J., Nicholls II, W.L. and O’Reilly, J. (eds.) (1998). Computer Assisted Survey Information Collection. Wiley, New York.

Dielman, L. and Couper, M.P. (1995). Data quality in a CAPI survey: keying errors. Journal of Official Statistics, 11, 141-146.

Dufour, J. (1996). Labour Force Survey data quality. Statistics Canada, Methodology Branch Working Paper No. HSMD-96-002E/F.

Dufour, J., Kaushal, R., Clark, C. and Bench, J. (1995). Converting the Labour Force Survey to computer-assisted interviewing. Statistics Canada, Methodology Branch Working Paper No. HSMD-95-009E.

Groves, R.M. (1989). Survey Errors and Survey Costs. Wiley, New York.

Groves, R.M., Biemer, P., Lyberg, L., Massey, J., Nicholls, W. and Waksberg, J. (eds.) (1988). Telephone Survey Methodology. Wiley, New York.

Lyberg, L., Biemer, P., Collins, M., de Leeuw, E., Dippo, C., Schwarz, N. and Trewin, D. (eds.) (1997). Survey Measurement and Process Quality. Wiley, New York.

Mudryk, W. and Xie, H. (2002). Quality control application in ICR data capture for the 2001 Census of Agriculture. Proceedings of the Section on Quality and Productivity, American Statistical Association, 2424-2429.

Mudryk, W. and Xiao, P. (1996). Quality control methodology for LFS industry and occupation coding operations. Statistics Canada technical report.

Mudryk, W., Bougie, B. and Xie, H. (2002). Some guidelines for data analysis in quality control. Statistics Canada technical report.

Mudryk, W., Burgess, M.J. and Xiao, P. (1996). Quality control of CATI operations in Statistics Canada. Proceedings of the Section on Survey Research Methods, American Statistical Association, 150-159.

Mudryk, W., Croal, J. and Bougie, B. (1994). Generalized Data Collection and Capture (DC2): Release 2.5.1, Sample Verification (SV). Statistics Canada technical report.

Williams, K., Denyes, C., March, M. and Mudryk, W. (1996). Quality measurement in survey processing. Proceedings of Symposium 96: Nonsampling Errors, Statistics Canada, 119-128.

Date Modified: 2014-04-10