Chapter 3.4: Data collection planning and management
Data collection is typically the greatest expense of any statistical program. To optimize the use of resources, collection operations should be organized, planned and conducted as efficiently as possible. Burden should also be minimized for both respondents and the organization. Interviewing practices should be consistent across statistical programs and should reflect the highest quality standards, and timeliness should always be seen as an underlying objective of data collection enhancement initiatives.
In the context of modernization, increased demand for statistics, and declining response rates, national statistical offices (NSOs) must also find innovative ways to improve how they manage their data collection. They must take full advantage of new technologies to gain in efficiency while improving the quality of the data collected. These realities do not come without challenges. Investments and changes in data collection management practices, tools and infrastructures usually have a significant impact on the data production process as a whole. For example, the introduction of computer-assisted technology will modify traditional questionnaire design and data processing practices.
This chapter focuses on strategies, mechanisms and tools that could be implemented to improve efficiency in collection activities and to modernize data collection operations. Please note that questions related to respondent relations are presented in Chapter 4.5: Relations with survey respondents.
Strategies, mechanisms and tools
This section describes the four major components of data collection that, when efficiently managed, allow statistical offices to modernize their organizations and improve efficiencies:
- Collection governance
- Collection planning
- Collection management practices and tools
- Collection monitoring
1. Collection governance
There are many governance approaches to organizing collection operations. The ultimate indicator for the adequacy of the chosen governance structure and mechanisms, however, is how well they satisfy the following strategic goals:
- Effective planning and coordination of capacity and capabilities across the collection infrastructure;
- Continuous improvement of efficiency, quality and timeliness;
- Effective reduction of response burden and costs; and
- Modernization of collection tools and processes with minimal or acceptable levels of risk.
The most appropriate governance structure is probably one that provides an optimal balance of centralization and decentralization in the statistical system context. Decentralized collection activities at the regional level or by statistical domain can sometimes lead to better quality and, to some extent, reduced costs because local knowledge can be used to improve respondent relations and increase efficiency by reducing the cost of follow-up of incomplete questionnaires, refusals, other non-response, etc. On the other hand, a centralized structure tends to facilitate planning and coordination while reducing duplication of effort. It also facilitates the implementation of consistent and standard concepts and procedures across regions and survey programs, which has an ultimate impact on data quality.
At Statistics Canada, all collection activities have been centralized under the Collection and Regional Services Branch (CRSB). Creating this centre of expertise for collection planning, development, and research, and for collection management was one of the first priorities of the Corporate Business Architecture (see Chapter 3.1: Corporate Business Architecture). Under this modernization initiative, all collection activities that were being conducted in various subject-matter areas were integrated under the CRSB, allowing Statistics Canada to find improvements and savings in its collection process, strengthen the links between all collection partners, enhance respondent relations and facilitate the modernization of collection processes and systems regardless of mode.
The CRSB includes the following units:
- The Collection Planning and Research Division (CPRD), at head office, is the centre of expertise for survey planning, survey development, and survey-collection research. CPRD's responsibilities include (1) the planning of capacity and capabilities across the collection infrastructure; (2) the delivery of coordinated collection services for household and business surveys (initial consultations, feasibility assessments, and cost estimates); (3) the development of collection strategies for all collection methods, including electronic questionnaires (EQ), computer-assisted telephone interviewing (CATI), computer-assisted personal interviewing (CAPI), paper and pencil interviewing (PAPI), and multi-mode collection; (4) the testing and support of survey-collection applications and IT-related infrastructures; and (5) survey-collection research to improve the quality, timeliness and cost of survey collection and to identify ways to reduce respondent burden.
- Regional offices coordinate all modes of survey collection activities, in accordance with negotiated budgets and survey requirements. Regional offices are responsible for day-to-day management of collection operations, respondent relations, and the identification and resolution of problems related to collection as they arise.
- Statistical Survey Operations (SSO) is a separate employer managed by the staff of the Regional Offices. This organization collects information from survey respondents on their business, household or institution on behalf of Statistics Canada. SSO interviewers play a crucial role in explaining the importance of Statistics Canada surveys and engaging respondents to participate in them. Approximately 2,000 employees work for SSO and their workload depends on the survey collection volume and schedule. SSO has one employee classification group, which consists of two levels: interviewers and senior interviewers. There are also two types of interviewers: (1) field interviewers, who carry out survey activities in person outside Statistics Canada offices (CAPI), and (2) telephone interviewers, who conduct interviews by telephone (CATI) and work primarily from Statistics Canada regional offices.
In terms of coordination within the overall organizational governance, the CRSB seeks advice and guidance from the Agency's Collection Planning Committee. The mandate of this committee is to review and recommend strategies for generating efficiencies in collection, respondent relations, and operations in terms of cost, quality and timeliness. The primary objectives are to
- review and recommend approaches to optimize Statistics Canada's collection services in terms of methods, operations, cost-effectiveness, and minimizing of response burden;
- review and recommend actions or best practices to optimize response rates, data quality, and respondent outreach in support of data collection activities;
- review demand and priorities for collection services and recommend necessary capacity adjustments; and
- review and recommend internal rates for collection services to the Executive Management Board.
The committee also develops an annual work program and reports progress directly to the Executive Management Board, with a view to ensuring coherence between all management strategies and practices within the organization.
2. Collection planning
A good collection planning system should allow NSOs to effectively prioritize collection activities, coordinate capacity and capabilities across the collection infrastructure, and establish roles and responsibilities regarding all aspects linked to collection, including execution, assessment, monitoring, contingency planning and security, as well as the related communications strategy. The requirements of the subject-matter divisions—and the means by which, and the extent to which, these requirements are expected to be met—are examined and agreed on. It is during this phase that the resources, the funding requirements, and the schedule of activities are developed. The quality of the planning phase is crucial to the quality of the entire project. Good planning requires good management skills and knowledgeable and experienced people.
The following are key considerations in planning survey collection activities:
- Survey objectives (information needs, primary uses and users of the data, concepts, survey content, quality requirements by domains of interest and analysis plan);
- Survey frame (target population, area vs. list frames, use of existing frame vs. cost to build new one and quality of frame, including contact information);
- Sample design (statistical units, sample size, sample distribution);
- Questionnaire design (questionnaire content, types of questions (closed vs. open answers), question sensitivity, order of questions, average length of questionnaire);
- Collection method (self-enumerated vs. interview-assisted, face-to-face vs. telephone interviews, paper-and-pencil, computer-assisted or electronic questionnaires, use of electronic digital assistance devices). Choosing the appropriate collection method is crucial and should take into account the participation rate, respondent burden, and budgetary and operational constraints. Refer to box 3.4.1 (at the end of this chapter), which identifies the collection methods used by statistical organizations and their advantages and disadvantages with regard to the entire data collection and production process;
- General considerations related to collection management (staff availability; labour market; remoteness of the target population; respondent cooperation; transportation costs; average costs related to the printing or design of collection tools; language of respondents and translation costs, etc.).
Once the key planning decisions are taken, the project schedule must be created. Creating the project schedule consists of preparing the list of activities and tasks along with timelines, key milestones, and the name(s) of the person(s) responsible for completing them. Steps include integrating lessons from the last collection exercise through a review of the post-mortem document and a peer review of the new survey procedures by the collection committee. Steps also include an analysis of interviewer capacity to determine where the work will be allocated, which Management Information System reports should be used to monitor collection, and which parameters, conditions, and key performance indicators will identify critical collection milestones. Finally, a Collection and Operations Service Agreement is drafted and signed off by all service areas involved in all stages of the process, from survey development to collection.
At Statistics Canada, planning has been greatly improved by the creation of a single organizational point of entry for all subject-matter divisions requiring collection services: the Collection Front Door. This service is responsible for the initial collection feasibility assessment stage, during which survey specifications are reviewed, data-collection process flows are identified, required capacity is assessed, and preliminary cost estimates are prepared. Having collection experts assist survey managers in designing their surveys and establishing collection-related costs improves quality and efficiency in the general planning and conduct of surveys.
3. Collection management practices and tools
Within the collection structure, interview-based collection activities are usually performed through a direct hierarchy involving collection project managers, supervisors and interviewers. Project managers play a liaison role with the survey programs and ensure that collection requirements, such as quality standards and response rates, are met. Collection schedules, timetables for data collection, details of survey milestones, targeted response rates, and reporting periods can facilitate their work.
Supervisors are usually responsible for hiring, training and monitoring interviewers; distributing the workload among the available resources; and addressing day-to-day management problems. Supervisors' manuals assist them in performing their duties. These useful tools typically include guidelines on the following: hiring and training interviewers; creating interviewer assignments; occupational health and safety; quality and performance controls; logistics, security and privacy protection; conversion of refusals to respondents; and methods for dealing with disabled persons, language problems, or special considerations applicable to respondents.
Interviewers can be responsible for several of the following functions: listing, interviewing and tracing respondents; following up with non-respondents; editing and coding. Although this type of work does not require advanced education, interviewers' strategic role in data production requires certain minimum abilities and personal qualities. Criteria to consider when hiring interviewers include interpersonal skills, fluency in local languages, organizational skills, computer literacy, trustworthiness, subject-matter knowledge, and knowledge of the area where the survey is being conducted. Interviewers' training should focus on (1) the development of soft skills, such as empathy, active listening, and refusal conversion; (2) the understanding of survey concepts, content and particularities; and (3) the collection process, including the use of listing, collection and coding tools. Cognitive studies show that classroom training combined with home study, mock interviews, and early feedback on performance tends to produce better results. Regular refreshers and interviewers' manuals can also facilitate the retention of the information provided during training.
Self-administered surveys do not require the same hierarchy but may nevertheless require listing, coding, editing, and non-response follow-up, which can be centrally managed or not. Proper training, supporting documentation, and monitoring are therefore necessary.
Recent improvements in conducting survey collection were implemented further to the analysis of collection process data. Paradata research has been focused on improving the understanding of data collection processes and response patterns leading to the identification of strategic improvement opportunities. For example, at Statistics Canada, research findings on paradata produced from CATI surveys have indicated that the same data collection approach does not work effectively throughout an entire data collection cycle. There is therefore a need to develop a more flexible, responsive and efficient data collection strategy.
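As a rough illustration of the kind of paradata analysis described above, the sketch below groups call attempts by week of collection and computes the share that ended in a completed interview. A falling rate over the cycle is one simple signal that a single, fixed approach is losing effectiveness. The record layout (`CallAttempt`, its fields, and the outcome codes) is hypothetical and does not reflect the structure of any Statistics Canada system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallAttempt:
    """One row of call-history paradata (hypothetical layout)."""
    case_id: str
    week: int      # week of the collection period, starting at 1
    outcome: str   # e.g. "complete", "no_answer", "refusal"


def completion_rate_by_week(attempts):
    """Return {week: completed interviews / call attempts} for each week."""
    totals = defaultdict(int)
    completes = defaultdict(int)
    for attempt in attempts:
        totals[attempt.week] += 1
        if attempt.outcome == "complete":
            completes[attempt.week] += 1
    return {week: completes[week] / totals[week] for week in sorted(totals)}


# Illustrative records only, not real survey data.
sample_attempts = [
    CallAttempt("A", 1, "complete"),
    CallAttempt("B", 1, "no_answer"),
    CallAttempt("B", 2, "no_answer"),
    CallAttempt("C", 2, "refusal"),
    CallAttempt("D", 2, "complete"),
    CallAttempt("B", 3, "no_answer"),
]
rates = completion_rate_by_week(sample_attempts)
```

In this made-up sample the rate drops from one week to the next, which is the sort of pattern that would prompt a review of the collection approach.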
One strategy is to offer a multi-modal collection option to all respondents. Under this approach, a survey collection could start in one mode and be pursued using any other combination of modes, with a view to achieving the best possible quality in the most efficient way. For example, the approach used in the Canadian census is one that starts collection with the least expensive mode (electronic questionnaire) and ends with the more expensive mode that typically yields higher response rates (personal interview). To enable this, the statistical organization needs to be able to securely and easily move cases between its electronic questionnaire platforms, its multiple call centres, and its workforce of personal interviewers. Calls must be made at times when they are most likely to generate a response. As well, information about individual cases must be available in real time, so that the organization can detect issues related to interviewer or questionnaire performance. All of this work must be accomplished with the smallest possible number of systems and processes to achieve economies of scale.
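The idea of starting in the least expensive mode and moving outstanding cases to more expensive ones can be sketched as follows. This is a minimal illustration under stated assumptions: the three-step ladder (EQ, then CATI, then CAPI), the case structure, and the function names are all hypothetical, and a real system would also handle security, scheduling, and case history.

```python
# Hypothetical mode ladder, ordered from least to most expensive per case.
MODE_LADDER = ["EQ", "CATI", "CAPI"]


def next_mode(current_mode):
    """Return the next, more expensive mode, or None if none remains."""
    position = MODE_LADDER.index(current_mode)
    if position + 1 < len(MODE_LADDER):
        return MODE_LADDER[position + 1]
    return None


def escalate_open_cases(cases):
    """Move every non-responding case one step up the ladder.

    `cases` maps a case id to a dict with "mode" and "responded" keys.
    Returns the ids of the cases that were moved.
    """
    moved = []
    for case_id, case in cases.items():
        if case["responded"]:
            continue
        target = next_mode(case["mode"])
        if target is not None:
            case["mode"] = target
            moved.append(case_id)
    return moved


# Illustrative cases only.
cases = {
    "0001": {"mode": "EQ", "responded": True},
    "0002": {"mode": "EQ", "responded": False},
    "0003": {"mode": "CATI", "responded": False},
    "0004": {"mode": "CAPI", "responded": False},
}
moved = escalate_open_cases(cases)
```

Keeping the ladder in one shared list reflects the point made above about using as few systems and processes as possible: every mode reads and updates the same case record.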
Statistics Canada started implementing this vision in 2011, through the Integrated Collection and Operation Systems (ICOS) project. This initiative's objective is to develop an integrated collection systems environment to achieve the targeted level of flexibility between modes and sites, making full use of the Internet for electronic questionnaires. In 2014, through Phase 1 of the ICOS Business Collection Portal (ICOS-BCP), data collection started to make use of a single electronic questionnaire for both respondents and interviewers to collect data for the annual surveys of the Integrated Business Survey Program. Over the next few years, the migration of business surveys to the ICOS-BCP will continue as they align with the adoption of common data processing tools and the implementation of a single electronic questionnaire to support both self-response and interviewer-administered response.
The integrated collection system environment also relies on the use of the ICOS Collection Management Portal (ICOS-CMP) to meet the needs of the census programs, social surveys, and some business surveys requiring a CAPI component. The 2016 Census of Population and Census of Agriculture will be conducted through this portal, which will also support the recruitment of Statistics Act census employees. In parallel, work will continue on addressing the business requirements of other survey programs relying on the ICOS-CMP for their data collection. In particular during this period, a prototype of an off-line solution will be implemented to allow data collection for the consumer price index program and personal interviewing for social surveys. This work will position the organization for the introduction of live data collection for surveys by means of the ICOS-CMP, beginning in 2017/2018.
4. Collection monitoring
Collection monitoring should be considered as a strategic tool to improve data quality and efficiency in collection operations. The adoption of computer-based collection technologies helps ensure (1) that the questionnaire is being used properly, (2) that the interviewing techniques are effective and consistent across interviewers, and (3) that respondents' answers adequately reflect what is intended to be measured. For example, CATI technologies often allow supervisors to listen in on live interviews. Some CAPI systems can now record parts of interviews for quality assurance purposes. Electronic questionnaires can also be programmed to prompt additional verification questions through built-in edits. Finally, personal digital assistants (PDAs) or tablets equipped with a global positioning system have also been used to verify that interviewers actually collected their information from the expected location. This technology is not yet available at Statistics Canada; however, the agency is currently exploring its possible use and feasibility for the Consumer Price Index.
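The location check mentioned above can be sketched as a distance test between the expected coordinates of a dwelling or outlet and the GPS fix recorded by the device. The haversine formula used here is a standard great-circle approximation. The function names and the 200-metre tolerance are assumptions made for illustration, not an operational standard.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def collected_at_expected_location(expected, recorded, tolerance_m=200.0):
    """True if the recorded GPS fix lies within `tolerance_m` of the expected point.

    `expected` and `recorded` are (latitude, longitude) pairs.
    The 200 m default is an arbitrary placeholder.
    """
    return haversine_m(*expected, *recorded) <= tolerance_m
```

A supervisor report could then list only the cases for which the check returns `False`, leaving the decision on follow-up to a person rather than to the device.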
More recently, the relationship between the quality, cost, productivity, and responding potential of outstanding cases over the course of collection has been investigated. Additional tools have been developed to better assess and monitor progress, quality and performance during collection and to allow the development and implementation of a Responsive Collection Design (RCD) strategy for CATI surveys. RCD is an approach to survey data collection that uses both information available before collection and paradata accumulated during collection (e.g., the sequence of calls) to identify when changes to collection approaches are needed in response to the progress of collection. The main idea is to constantly assess the data collection process on the basis of the most recent information available, with a view to making the most efficient use of the remaining resources (adaptive collection).
In practice, the RCD approach monitors and analyzes collection progress against a pre-determined set of indicators for two purposes: to identify critical data collection milestones that require significant changes to the collection approach and to adjust collection strategies to make the most efficient use of remaining available resources. This type of monitoring is central to the Integrated Business Survey Program's approach adopted by Statistics Canada (see Chapter 3.3: Enhancing how surveys are conducted).
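To make the monitoring logic concrete, the sketch below checks daily indicators against pre-determined thresholds and flags when a change of collection approach might be considered. It is a simplified illustration, not the RCD method as implemented at Statistics Canada: the indicator set, the definition of productivity, the thresholds, and the three-day window are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DailyIndicators:
    """Collection indicators for one day (hypothetical set)."""
    response_rate: float  # cumulative completed cases / sample size
    productivity: float   # assumed: share of interviewer time spent on completed interviews


def milestone_reached(history, min_response_rate=0.5,
                      max_productivity=0.4, window=3):
    """Flag a milestone once the response rate is high enough and
    productivity has stayed below the threshold for `window` straight days."""
    if len(history) < window:
        return False
    recent = history[-window:]
    return (
        recent[-1].response_rate >= min_response_rate
        and all(day.productivity < max_productivity for day in recent)
    )


# Illustrative values only, not real survey data.
history = [
    DailyIndicators(0.30, 0.60),
    DailyIndicators(0.45, 0.38),
    DailyIndicators(0.50, 0.35),
    DailyIndicators(0.53, 0.30),
]
change_needed = milestone_reached(history)
```

Requiring several consecutive days below the threshold is a design choice that avoids reacting to a single unusual day; an office could equally use a moving average or another smoothing rule.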
Key success factors
The planning and management of collection operations are crucial to a survey's success. Without clear and effective collection tools, an established and consistent governance structure, and an engaging respondent relations program, there are no consistent and efficient ways to understand what is to be achieved and how it is to be achieved.
In improving the planning and management of collection, Statistics Canada aims to enhance and consolidate the following key elements.
First, the collection planning and management governance structure is supported by the close relationship and collaboration between CPRD (as planning and coordinating authority) and regional offices (responsible for survey collection operations). Because both are part of the same branch in Statistics Canada, the collection exercise is more defined, better structured, and more efficient.
Secondly, focusing efforts and synergy on building and enhancing an integrated common collection tool is an effective way to ensure consistency and generate savings while improving client relations.
Finally, continually investing in research and development and communications strategies that will improve efficiencies and respondent relations is key to improving timeliness, response rates and, consequently, the quality of surveys. For details about respondent relations, refer to Chapter 4.5.
While the modernization of collection tools represents an opportunity to improve collection practices, quality and efficiency, it still represents a cost for any NSO interested in adopting them. The acquisition of the technology, the development of skills to use this technology, and the impact and dependency on the rest of the statistical production process must be factored in when one is building an investment business case. Adopting a one-type-fits-all collection tool, such as ICOS, is one solution to optimize the cost-efficiency of a new tool. Another innovative solution to reduce associated costs and risks is to adopt a coordinated approach to the acquisition and implementation of a new technology. In this regard, the experience of Cape Verde in adopting PDAs in its 2010 General Population and Housing Census and the south-south co-operation between Brazil, Cape Verde, Côte d'Ivoire and Senegal described in box 3.4.2 is certainly worth mentioning.
The general way forward for a statistical organization is to continue seeking and seizing the opportunities offered by new technologies to improve the efficiency of collection operations as well as the timeliness and quality of the data collected.
For Statistics Canada, this will be achieved by the integration of all surveys into one multi-mode collection platform and ongoing research on emerging collection modes, such as electronic questionnaires and mobile applications. Where possible, administrative data will also be used in order to reduce respondent burden and the costs associated with collection activities.
The method of data collection should be chosen for the purpose of achieving a high participation rate and collecting data that are as complete and accurate as possible while minimizing the burden to the respondent, taking account of privacy considerations, and satisfying the client's budget and operational constraints.
There are two types of collection methods for NSOs to use:
- Self-response, which means that respondents complete the questionnaire on their own and can return it electronically, by mail, fax, or pick-up;
- Interview-assisted, where interviewers assist respondents in completing the questionnaire either in person or by telephone.
In terms of collection modes and the way data are collected, several options exist:
- Internet collection through an electronic questionnaire, either self-completed or interviewer-assisted;
- CAPI, CATI, or computer-assisted self-interview (CASI); and
- Paper collection, which is known as PAPI.
Here are the most common collection methods used at Statistics Canada.
|Characteristics of the collection methods and mode|Advantages|Disadvantages|
|1. Self-response requires a well-structured, easy-to-follow questionnaire with clear instructions for respondents| | |
|2. Interview-assisted methods are particularly useful for survey concepts or questionnaires that are complex, or in any instance where self-enumeration could be difficult| | |
|2.1 Personal interviews are conducted face-to-face with respondents, usually at their residence or place of work| | |
|2.2 Telephone interviews are conducted over the phone from centralized call centres| | |
Other forms of data collection include (1) direct observations (Canadian Health Measures Survey or Consumer Price Index), (2) Electronic Data Reporting, used for business surveys, (3) administrative data, (4) supplementary surveys, where questions are added to existing surveys, and (5) omnibus surveys, where several surveys are combined into one questionnaire.
The graphic below shows the percentage of the use of each collection mode for all surveys (household and business) in 2015 at Statistics Canada.
As shown in the graphic, CATI is by far the most used collection mode (54% of the total), followed by multi-mode (15%), CAPI (13%), handheld (7%), and electronic questionnaires (4%). Please note that, although the percentage for electronic questionnaires may seem low, 46% of business surveys have converted to electronic questionnaires with some form of follow-up, by mail, fax or CATI.
The impact of the introduction of mobile devices in the data collection process
The National Institute of Statistics (INE) of Cape Verde used mobile devices in the collection of data for the General Population and Housing Census in 2010. This was a turning point for the institution, and also for the African continent as a whole, since it was the first digital census for both the INE and the continent. The investment in innovation resulted from the need to show that it is possible to conduct a census in a different way on our continent, innovating in all its phases, bringing higher quality to data collection and greater possibilities for analysis, thanks to the combination of alphanumeric and geographic components. Moreover, it was imperative to reduce existing costs and to shorten the time for finalizing the collection and publication of results.
The introduction of mobile devices, in general, requires the restructuring of all collection operation phases, specifically: cartography, survey design, awareness, pilot census, collection, processing, and dissemination. Therefore, members of INE must be prepared and open to change.
During the cartography phase, numbering all locations and delimiting census districts according to the latitude–longitude coordinate system were necessary. In order to increase efficiency, the majority of houses were georeferenced at headquarters, with the validation, description and collection of new attributes taking place in the field.
During the survey design phase, consistency edits and controls were designed so that they could be built into the application installed on the mobile devices, allowing part of the data verification to be done at the time of collection. However, being able to enter more information on the mobile devices does not necessarily mean an improvement in quality. For example, it would have been possible to enter occupation and economic activity directly into the system, since the enumerator would only have had to choose the appropriate code or description. However, because those classifications were not in the application, recording answers to a set of questions for later coding remained the more precise methodology. In subsequent operations, both questions and classifications were used, and together they worked as a better consistency control.
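A consistency edit applied at the time of collection, together with a simple skip rule, might look like the sketch below. The questions, the age threshold of 15, and the function names are invented for illustration and are not taken from the Cape Verde census application.

```python
def next_question(answers):
    """Skip rule (hypothetical): employment is asked only of persons aged 15 or over."""
    if "age" not in answers:
        return "age"
    if answers["age"] < 15:
        return "end"
    if "employed" not in answers:
        return "employed"
    return "end"


def consistency_errors(answers):
    """Return messages for answers that are out of range or contradict each other."""
    errors = []
    age = answers.get("age")
    if age is not None and not 0 <= age <= 120:
        errors.append("age out of range")
    if age is not None and age < 15 and answers.get("employed") == "yes":
        errors.append("employment reported for a person under 15")
    return errors
```

Returning a list of messages, rather than stopping at the first problem, lets the enumerator resolve every inconsistency while still with the respondent.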
During the awareness or public-relations phase, introducing the collection devices was important considering that smartphones were not common in 2010. An explanation of their purpose and importance to future respondents was necessary.
The pilot census, on the other hand, required the constant presence of the IT unit because this operational phase constituted an important testing period for the application, regarding both its specific functionalities and the methodology functionalities (consistency controls and auto-skip). The collection also required IT presence to guarantee the normal operation of the application, for example to ensure that collected information was saved. The continuous transmission of data also allowed stricter control of the operation's progress, with constant follow-up of the collection by both headquarters and the population of Cape Verde. This was possible via a website built for this purpose.
The data processing was easier since part of its verification was done in the field. This reduced the time spent on this phase. It is worth mentioning that the data entry phase was eliminated completely (except for 102 interviews with homeless people done on paper).
Lastly, during the dissemination phase, a considerable amount of time was gained in publishing the data, considering that the collection of georeferenced data allowed geostatistical analysis (for example, a study on a location with a cluster of children and existing kindergarten facilities, which allowed local officers to identify the needs for building / establishing education facilities according to the number of children). It is appropriate to highlight that, in a digital census, it is important to invest extensively in training field representatives and to test the application and methodology.
In order to successfully complete all phases of Census 2010, INE received support from the Brazilian Institute of Geography and Statistics (IBGE) in the form of technical expertise and supply of equipment used for data collection, as provided in the collaboration agreement signed by INE and IBGE.
The continuous support from IBGE was essential for the success of the operation. However, the greatest benefit came afterwards: the collaboration model allowed knowledge transfer for the purpose of enabling the technical independence of the institution in the programming domain and use of mobile devices. At the end of 2011, the first application was in the field for a scheduled survey and involved only in-house know-how. As a result, there has been constant improvement. In addition, INE has worked with partners to share acquired knowledge and explore new areas.
In this context, INE has invested in experience sharing, especially in terms of the south-south co-operation. In 2013, INE supported the National Statistics and Demographic Agency of Senegal to perform its General Census of Population, Agriculture and Cattle. The technical collaboration was provided essentially by INE, and the supply of mobile devices was made by IBGE. These devices were further used by INE in Côte d'Ivoire. This shows once again the need for the African continent to have this type of equipment, common in other countries, in the 2020 Census round, given its proven benefits.
There is a need for a great number of mobile devices during census rounds; however, after these rounds, the use of these devices is uncertain. Although a portion of them are retained by the institution for survey purposes, surveys are carried out on a smaller scale in Cape Verde, and the result is a surplus of mobile devices, which could be managed by a centre of excellence to benefit the continent. Furthermore, the centre would develop technical expertise to be available to countries that needed it. This is a position that INE of Cape Verde has been supporting since the Fifth Africa Symposium on Statistical Development, held in Senegal in 2009.
In the last few years, INE received study visits from the institutes of countries such as Tunisia, Ethiopia, Togo, Burkina Faso, Madagascar, Mauritania, and Niger. The establishment of a formal structure focused on supporting counterparts would allow for a more effective and closer follow-up.
Note that the introduction of new technologies is a challenge in any country, and African countries are no exception. Cost reduction as well as advocacy and constant training are necessary—this applies to all countries. Investment in the statistical needs of NSOs is one that yields high returns.