3.4 Processing
3.4.3 Editing
In an ideal world, data would be collected without any errors. Unfortunately, responses, either from surveys or from administrative files, may be missing, incomplete or incorrect. Data editing is the application of checks to detect missing, invalid or inconsistent entries or to point to data records that are potentially in error. No matter what type of data you are working with, certain edits are performed at different stages or phases of data collection and processing. Data editing is described and illustrated here by focusing on surveys, but it is also widely applied to other data sources, such as administrative data, to ensure data quality.
Data editing begins by asking the question, “What could cause errors in our files?” Errors can be introduced into the data in several situations, including the following:
- A respondent could have misunderstood a question.
- A respondent or an interviewer could have checked the wrong response.
- A coder could have miscoded or misunderstood a written response.
- An interviewer could have forgotten to ask a question or to record the answer.
- A respondent could have provided inaccurate responses.
- A respondent could have left some questions blank.
Always keep in mind the objectives of data editing:
- to ensure the accuracy of data;
- to establish the consistency of data;
- to determine whether the data are complete;
- to ensure the coherence of aggregated data;
- to obtain the best possible data available.
Applying editing rules
So, how do we edit? The first step is to apply rules, or factors to be taken into consideration, to the data. These rules are determined by the expert knowledge of a subject-matter specialist, the structure of the questionnaire, the history of the data, and any other related surveys or data sets.
Expert knowledge can come from a variety of sources. The specialist could be an analyst who has extensive experience with the type of data being edited. An expert could also be one of the survey sponsors who is familiar with the relationships between the data.
The layout and structure of the questionnaire will also impact the rules for editing data. For example, sometimes respondents are instructed to skip certain questions if the questions do not apply to them or their situation. This specification must be respected and incorporated into the editing rules.
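As an illustration, a skip instruction can be written as an editing rule. The sketch below is a minimal example, not a prescribed method. It assumes a hypothetical questionnaire in which respondents who answer "no" to question 1 (employed last week) are told to skip question 2 (hours worked). The field names `q1_employed` and `q2_hours` and the function name are invented for the example.

```python
def check_skip_pattern(record):
    """Return a list of edit failures for one record.

    Hypothetical skip rule: if q1_employed is "no", q2_hours must be
    left blank (None); if q1_employed is "yes", q2_hours is required.
    """
    failures = []
    employed = record.get("q1_employed")
    hours = record.get("q2_hours")

    if employed == "no" and hours is not None:
        failures.append("q2_hours answered but should have been skipped")
    if employed == "yes" and hours is None:
        failures.append("q2_hours missing although q1_employed is 'yes'")
    return failures


# Hypothetical records, for illustration only.
sample_records = [
    {"id": 1, "q1_employed": "yes", "q2_hours": 40},
    {"id": 2, "q1_employed": "no", "q2_hours": 35},
    {"id": 3, "q1_employed": "yes", "q2_hours": None},
]

for rec in sample_records:
    for failure in check_skip_pattern(rec):
        print(rec["id"], failure)
```

The rule only flags records. Deciding how a flagged record should be corrected is a separate step.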
Lastly, other data sources relating to the same sort of variables or characteristics are used in order to establish some of the rules for editing data. For example, business surveys usually collect financial data of businesses. The same information could be available from the tax returns of the company. Thus, the tax data can be used to develop editing rules for validating survey data.
Data editing types
There are several types of commonly used data editing, which include:
- Validity edits look at one question field or cell at a time. They check that record identifiers and values have been accounted for and contain no invalid characters; that essential fields have been completed (e.g. no quantity field is left blank where a number is required); that specified units of measure have been properly used; and that the reported data lie within an allowed range of values (e.g. the reporting time is within the specified limits). In computer-assisted data collection, such as web surveys, real-time data editing is typically built into the data collection system so that the validity of the data is evaluated as the data are collected.
- Duplication edits examine one full record at a time. These types of edits check for duplicated records, making certain that a respondent or a survey unit has only been recorded once. A duplication edit also checks to ensure that the respondent does not appear in the survey universe more than once, especially if there has been a name change. Finally, it ensures that the data have been entered in the system only once.
- Consistency edits compare different answers from the same record to ensure that they are coherent with one another. For example, if a person is declared to be in the 0 to 14 age group, but also claims that he or she is retired, there is a consistency problem between the two answers. Inter-field edits are another form of consistency edit. These edits verify that if a figure is reported in one section, a corresponding figure is reported in another.
- Historical edits are used to compare survey answers in current and previous surveys. For example, any dramatic changes since the last survey will be flagged. The ratios and calculations are also compared, and any percentage variance that falls outside the established limits will be noted and questioned.
- Statistical edits look at the entire set of data. This type of edit is performed only after all other edits have been applied and the data have been corrected. The data are compiled and all extreme values, suspicious data and outliers are rejected.
- Miscellaneous edits fall in the range of special-reporting arrangements; dynamic edits particular to the survey; correct classification checks; changes to physical addresses, locations or contacts; and legibility edits (i.e. making sure the figures or symbols are recognizable and easy to read).
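Several of the edit types listed above can be sketched as small functions. This is one possible illustration under stated assumptions, not a standard implementation. The field names, the age-group code list, the 0 to 168 hours range, the 50% change limit and the median-based outlier rule are all hypothetical choices made for the example. The statistical edit here only flags values for review. Whether to reject them is left to the analyst.

```python
import statistics
from collections import Counter

VALID_AGE_GROUPS = {"0-14", "15-64", "65+"}  # hypothetical code list


def validity_edit(record):
    """Single-field checks: required fields present and within allowed values."""
    failures = []
    if record.get("age_group") not in VALID_AGE_GROUPS:
        failures.append("age_group missing or invalid")
    hours = record.get("hours_worked")
    if not isinstance(hours, (int, float)) or not 0 <= hours <= 168:
        failures.append("hours_worked missing or out of range 0-168")
    return failures


def duplication_edit(records):
    """Return the identifiers that appear on more than one record."""
    counts = Counter(r["id"] for r in records)
    return sorted(i for i, n in counts.items() if n > 1)


def consistency_edit(record):
    """Cross-field check within one record (the age group / retired example)."""
    if record.get("age_group") == "0-14" and record.get("status") == "retired":
        return ["age_group 0-14 is inconsistent with status 'retired'"]
    return []


def historical_edit(current, previous, max_change=0.5):
    """Flag a value whose relative change since the last cycle exceeds max_change."""
    if previous == 0:
        return current != 0
    return abs(current - previous) / abs(previous) > max_change


def statistical_edit(values, k=3.0):
    """Flag values lying more than k median absolute deviations from the median."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []
    return [v for v in values if abs(v - med) > k * mad]


# Hypothetical usage.
record = {"id": "A", "age_group": "0-14", "status": "retired", "hours_worked": 200}
print(validity_edit(record))
print(consistency_edit(record))
print(duplication_edit([{"id": "A"}, {"id": "B"}, {"id": "A"}]))
print(historical_edit(current=180, previous=100))
print(statistical_edit([10, 12, 11, 13, 12, 95]))
```

Keeping each edit type in its own function reflects how the checks differ in scope: one field, one record, several records, two survey cycles, or the whole data set.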
Data editing is influenced by the complexity of the questionnaire. Complexity refers to the length, as well as the number of questions asked. It also includes the detail of questions and the range of subject matter that the questionnaire may cover. In some cases, the terminology of a question can be very technical. For these types of surveys, special reporting arrangements and industry-specific edits may occur.
Data editing levels
Data editing can be performed manually, with the assistance of computer programming, or a combination of both techniques. Depending on the medium (electronic, paper) by which the data are submitted, there are two levels of data editing: micro-editing and macro-editing.
- Micro-editing corrects the data at the record level. This process detects errors in data through checks of the individual data records. The intent at this point is to determine the consistency of the data and correct the individual data records.
- Macro-editing also detects errors in data, but does this through the analysis of aggregate data (totals). The data are compared with data from other surveys, administrative files, or earlier versions of the same data. This process determines the comparability of data.
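The difference between the two levels can be shown with a short sketch. It assumes hypothetical business records in which reported wages and other costs should add up to reported total costs, and a hypothetical benchmark total taken from another source such as administrative files. The field names, the 10% tolerance and both function names are assumptions made for the example. The sketch covers detection only, not the correction step.

```python
def micro_edit(records):
    """Record-level check: return ids whose components do not sum to the total."""
    return [
        r["id"]
        for r in records
        if r["wages"] + r["other_costs"] != r["total_costs"]
    ]


def macro_edit(records, benchmark_total, tolerance=0.10):
    """Aggregate-level check: compare the survey total with a benchmark total.

    Returns (flagged, survey_total), where flagged is True when the
    relative gap between the two totals exceeds the tolerance.
    """
    survey_total = sum(r["total_costs"] for r in records)
    relative_gap = abs(survey_total - benchmark_total) / benchmark_total
    return relative_gap > tolerance, survey_total


# Hypothetical records, for illustration only.
records = [
    {"id": "A", "wages": 60, "other_costs": 40, "total_costs": 100},
    {"id": "B", "wages": 50, "other_costs": 30, "total_costs": 90},
    {"id": "C", "wages": 70, "other_costs": 30, "total_costs": 100},
]

print(micro_edit(records))
print(macro_edit(records, benchmark_total=300))
```

A data set can pass the aggregate check while individual records still fail the record-level check, which is one reason both levels are used.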