Scope and purpose
Data collection is any process whose purpose is to acquire or
assist in the acquisition of data. Collection is achieved by requesting
and obtaining pertinent data from individuals or organizations via an
appropriate vehicle (see section on Questionnaire
design). If no information is obtained initially, or if the data are
deemed unsuitable as identified by preliminary editing, follow-up contacts
may be initiated as part of data collection (see section on Editing).
Data capture refers to any process that converts the information
provided by a respondent into an electronic format suitable for use by subsequent
processes. Sometimes data are captured as part of the collection process,
as in surveys that use computer-assisted personal interviewing (CAPI),
computer-assisted telephone interviewing (CATI) or electronic data reporting
(EDR). At other times, a separate operation needs to be set up to capture
data by manual key entry or by automated means (e.g., intelligent character
recognition, ICR). Often this conversion involves either manual or automated
coding, and sometimes it includes transmitting the data to another location.
Data collection and capture operations have a direct and critical impact
on data quality, because the data they produce are the primary inputs of a
survey-taking agency. The quality of these operations therefore largely
determines the quality of the final product.
Principles
Respondents, or data suppliers, especially individuals and organizations
who complete questionnaires, invariably without payment, are a survey-taking
organization’s most valuable resource. To ensure continuing cooperation,
it is essential to minimize the burden on respondents. Gaps or inconsistencies
in the data are best corrected by consulting respondents themselves during
data collection or very soon afterwards. Because data collection and capture
operations have such a high impact on data quality, it is highly recommended
to use appropriate quality and performance measurement tools to manage these
processes and to provide objective measures to supervisors and clients. Throughout
the process, appropriate steps must be taken to preserve the confidentiality
of the information collected (see section on Disclosure
control).
Guidelines
- Interviewers and data capture operators are critical to the success
of most data collection and capture operations. Ensure that they have
appropriate training and tools (e.g., training manuals, see Burgess
and Brierley, 1995).
- Exploit available technology to improve the efficiency and quality
of data collection and capture processes. Advances in communications
and computing technology offer opportunities to greatly reduce the costs
and risks associated with these processes. For example, computer-assisted
survey interviewing (e.g., CAPI and CATI) and electronic data reporting
(EDR) via the Internet, automated data entry (using ICR) and automated
coding by text recognition (ACTR) are approaches that take advantage
of available technologies.
- Carefully control paper questionnaire delivery operations in mail
surveys to ensure that each unit that has been selected to be in the
survey receives the appropriate questionnaire. Once the questionnaire
is returned, verify the accuracy of the coverage information and the
quality of the data provided. Follow-up interviews may be needed in
some cases. When no questionnaire is received, follow-up activities
are necessary to establish the status of the unit (e.g., occupied or
unoccupied; in business or out of business) and to obtain the missing
information. Through all these steps, put in place a system to report
on the completion status of each unit.
- Establish appropriate sample control procedures for all data collection
operations. Such procedures track the status of sampled units from the
beginning through the completion of data collection so that data collection
managers and interviewers can assess progress at any point in time.
Sample control procedures and feedback from them are also used to ensure
that every sampled unit is processed through all data collection and
capture steps, with a final status being recorded.
- Institute effective control systems to ensure the security of data
capture, transmission and handling. Prevent loss of information and
the resulting loss in quality due to system failures or human errors.
- When collecting data, ensure that the respondent or the appropriate
person within the responding household or organization is contacted
at the appropriate time so that the information is readily available.
Allow the respondent to provide the data using a method and format that
is convenient for them or their organization. This will help increase
response rates and improve the quality of the information obtained from
the respondents.
- In designing data collection processes, especially editing and coding,
make sure that the procedures are applied to all units of study as consistently
and in as error-free a manner as possible. Automation is desirable.
Enable the staff or systems to refer difficult cases to a small number
of knowledgeable experts. Centralize the processing in order to reduce
costs and make it simpler to take advantage of available expert knowledge.
Because the collected information can contain unexpected results,
use processes that can be adapted if changes prove necessary for
reasons of efficiency.
- Monitor the frequency of edit rejects and the number and type of corrections
applied, by stratum, collection mode, processing type, data item and
language of the collection. This will help in evaluating the quality
of the data and the efficiency of the editing function.
- Expenditure, performance and quality measures gathered during the
data collection operation enable the survey manager to make decisions
regarding the need for modification or redesign of the process. Track
actual costs of postage, telephone calls, collection vehicle production,
computing, and person-day consumption. Important quality measures include
response rates, processing error rates, follow-up rates and counts of
nonresponse by reason. When these measures are available at all levels
at which estimates are produced and at various stages of the process,
they can serve both as performance measures and measures of data quality
(see section on Response and nonresponse).
- Manual data capture from paper questionnaires or scanned images is
subject to keying errors. Incorporate on-line edits for error conditions
that the data capture operator can correct (i.e., edits that will identify
keying errors). Record these cases for later review and analysis.
- Implement verification procedures to assess how well operators are
meeting the pre-established levels of keying error rates.
- Use statistical quality control methods to assess and improve the
quality of collection and capture operations. Collect and analyze quality
control measures and results in a manner that would help identify the
major root causes of error. Provide feedback reports to managers, staff,
subject-matter specialists and methodologists. These reports should
contain information on frequencies and sources of error (see Mudryk
et al., 1994, 1996 and 2002; Mudryk and Xiao, 1996). Various software
tools are available to help in this regard. These include the Quality
Control Data Analysis System (QCDAS) and NWA Quality Analyst (see Mudryk,
Bougie and Xie, 2002).
- Use measures of quality and productivity to provide feedback at the
interviewer or operator level, as well as to identify error-causing
elements in the design of the collection vehicle or its processing procedures.
- Use subsequent survey processes to gather useful information regarding
quality that can serve as signals that collection and capture procedures
and tools may require changes for future survey cycles. For example,
the editing or data analysis stages may suggest the possibility of response
bias or other collection-related problems.
- Conduct a post-mortem evaluation of all data collection and capture
operations, and document the results for future use.
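Several of the guidelines above (sample control with a final status for every unit, response rates, counts of nonresponse by reason, and verification of keying error rates against a pre-established level) can be illustrated with a small sketch. The following Python example is only one possible arrangement, not a description of any agency system: the status codes, the Unit record, the function names, the sample data and the 1% keying target are all hypothetical, and the response rate shown is a simple unweighted ratio chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical status codes, assumed for illustration only.
RESPONDED = "responded"
REFUSAL = "refusal"
NO_CONTACT = "no_contact"
OUT_OF_SCOPE = "out_of_scope"  # e.g., unoccupied dwelling or business closed
PENDING = "pending"            # no final status recorded yet


@dataclass
class Unit:
    """One sampled unit tracked by the (hypothetical) sample control file."""
    unit_id: str
    stratum: str
    status: str = PENDING


def unresolved(units):
    """Units that have not yet received a final status."""
    return [u for u in units if u.status == PENDING]


def response_rates(units):
    """Unweighted response rate per stratum.

    Assumption: the denominator is every unit not known to be out of
    scope, so pending units count as not (yet) responded.
    """
    responded, in_scope = Counter(), Counter()
    for u in units:
        if u.status == OUT_OF_SCOPE:
            continue
        in_scope[u.stratum] += 1
        if u.status == RESPONDED:
            responded[u.stratum] += 1
    return {s: responded[s] / n for s, n in in_scope.items()}


def nonresponse_by_reason(units):
    """Counts of nonresponding units, broken down by recorded reason."""
    return Counter(u.status for u in units if u.status in (REFUSAL, NO_CONTACT))


def keying_error_rate(fields_verified, fields_in_error, target_rate):
    """Observed keying error rate and whether it meets the target level."""
    rate = fields_in_error / fields_verified
    return rate, rate <= target_rate


# Made-up sample for demonstration.
units = [
    Unit("A1", "urban", RESPONDED),
    Unit("A2", "urban", REFUSAL),
    Unit("A3", "urban", RESPONDED),
    Unit("A4", "urban", OUT_OF_SCOPE),
    Unit("B1", "rural", RESPONDED),
    Unit("B2", "rural", NO_CONTACT),
    Unit("B3", "rural"),  # still pending: needs follow-up
]

print("Units without a final status:", [u.unit_id for u in unresolved(units)])
print("Response rate by stratum:", response_rates(units))
print("Nonresponse by reason:", dict(nonresponse_by_reason(units)))
print("Keying check (rate, meets target):", keying_error_rate(2000, 12, 0.01))
```

In practice such measures would be produced at each level at which estimates are published and at several stages of collection, so that they can serve both as performance measures and as indicators of data quality.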
References
Burgess, M.J. and Brierley, R. (1995). A self-directed training course for
monitors of CATI operations. Operations Research and Development Division,
Statistics Canada.
Couper, M.P., Baker, R.P., Bethlehem, J., Clark, C.Z.F., Martin, J.,
Nicholls II, W.L. and O’Reilly, J. (eds.) (1998). Computer
Assisted Survey Information Collection. Wiley, New York.
Dielman, L. and Couper, M.P. (1995). Data quality in a CAPI survey: keying
errors. Journal of Official Statistics, 11, 141-146.
Dufour, J. (1996). Labour Force Survey data quality. Statistics Canada,
Methodology Branch Working Paper No. HSMD-96-002E/F.
Dufour, J., Kaushal, R., Clark, C. and Bench, J. (1995). Converting the
Labour Force Survey to computer-assisted interviewing. Statistics Canada,
Methodology Branch Working Paper No. HSMD-95-009E.
Groves, R.M. (1989). Survey Errors and Survey Costs.
Wiley, New York.
Groves, R.M., Biemer, P., Lyberg, L., Massey, J., Nicholls, W. and Waksberg,
J. (eds.) (1988). Telephone Survey Methodology. Wiley,
New York.
Lyberg, L., Biemer, P., Collins, M., de Leeuw, E., Dippo, C., Schwarz,
N. and Trewin, D. (eds.) (1997). Survey Measurement and Process
Quality. Wiley, New York.
Mudryk, W. and Xie, H. (2002). Quality control application in ICR data
capture for the 2001 Census of Agriculture. Proceedings of the
Section on Quality and Productivity, American Statistical Association,
2424-2429.
Mudryk, W. and Xiao, P. (1996). Quality control methodology for LFS industry
and occupation coding operations. Statistics Canada technical report.
Mudryk, W., Bougie, B. and Xie, H. (2002). Some guidelines for data analysis
in quality control. Statistics Canada technical report.
Mudryk, W., Burgess, M.J. and Xiao, P. (1996). Quality control of CATI
operations in Statistics Canada. Proceedings of the Section on
Survey Research Methods, American Statistical Association, 150-159.
Mudryk, W., Croal, J. and Bougie, B. (1994). Generalized Data Collection
and Capture (DC2): Release 2.5.1, Sample Verification (SV). Statistics
Canada technical report.
Williams, K., Denyes, C., March, M. and Mudryk, W. (1996). Quality measurement
in survey processing. Proceedings of Symposium 96: Nonsampling
Errors, Statistics Canada, 119-128.