3.5 Estimation
3.5.3 Non-sampling error

Non-sampling error refers to all sources of error that are unrelated to sampling. Non-sampling errors are present in all types of survey, including censuses and administrative data. They arise for a number of reasons: the frame may be incomplete, some respondents may not accurately report data, data may be missing for some respondents, etc.

Non-sampling errors can be classified into two groups: random errors and systematic errors.

  • Random errors are errors whose effects approximately cancel out when the sample is large enough. Their main effect is to increase the variability of the estimates.
  • Systematic errors are errors that tend to go in the same direction, and thus accumulate over the entire sample, leading to a bias in the final results. Unlike random errors, this bias is not reduced by increasing the sample size. Systematic errors are the principal cause of concern in terms of a survey’s data quality. Unfortunately, non-sampling errors are often extremely difficult, if not impossible, to measure.
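
The contrast between the two groups can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name, the noise level and the bias of 2 units are hypothetical values chosen for illustration, and reporting errors are assumed to be normally distributed.

```python
import random
import statistics

def mean_error(n, bias, noise_sd, seed=0):
    """Average reporting error over n simulated respondents.

    Each respondent's error is a fixed systematic component (bias)
    plus a random component with mean zero (assumed normal).
    """
    rng = random.Random(seed)
    errors = [bias + rng.gauss(0.0, noise_sd) for _ in range(n)]
    return statistics.fmean(errors)

# Hypothetical settings: random noise with SD 5, systematic shift of 2.
for n in (100, 10_000, 100_000):
    random_only = mean_error(n, bias=0.0, noise_sd=5.0)
    with_bias = mean_error(n, bias=2.0, noise_sd=5.0)
    print(f"n={n:>7}  random only: {random_only:+.3f}  with systematic error: {with_bias:+.3f}")
```

As n grows, the average of the purely random errors should move toward zero, while the average error in the systematic case should stay close to the assumed shift of 2.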

Types of non-sampling error

Non-sampling error can occur in all aspects of the survey process, and can be classified into the following categories: coverage error, measurement error, nonresponse error and processing error.

Coverage error

Coverage error consists of omissions (undercoverage), erroneous inclusions, duplications and misclassifications (overcoverage) of units in the survey frame. Because it affects every estimate produced by the survey, it is one of the most important types of error. In the case of a census, it may be the main source of error. Coverage error can have both spatial and temporal dimensions, and may cause bias in the estimates. The effect can vary for different subgroups of the population. This error tends to be systematic and is usually due to undercoverage, which is why it is important to reduce it as much as possible.
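
As a rough illustration, omissions, erroneous inclusions and duplications can be detected by comparing a frame with a reference list of the target population. The sketch below assumes such a reference list exists, which is rarely the case in practice. The function name and the unit identifiers are hypothetical.

```python
from collections import Counter

def coverage_check(frame_ids, target_ids):
    """Compare a survey frame with a reference list of the target population."""
    frame_set, target_set = set(frame_ids), set(target_ids)
    counts = Counter(frame_ids)
    return {
        # Units in the target population that the frame omits.
        "undercoverage": sorted(target_set - frame_set),
        # Units on the frame that are not in the target population.
        "erroneous_inclusions": sorted(frame_set - target_set),
        # Units listed more than once on the frame.
        "duplicates": sorted(u for u, c in counts.items() if c > 1),
    }

# Hypothetical unit identifiers, made up for illustration.
target = ["A", "B", "C", "D", "E"]
frame = ["A", "B", "B", "D", "X"]
report = coverage_check(frame, target)
print(report)
```

In this made-up frame, units C and E are omitted, X is an erroneous inclusion and B is duplicated.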

Measurement error

Measurement error, also called response error, is the difference between measured values and true values. It consists of bias and variance, and it results when data are incorrectly requested, provided, received or recorded. These errors may occur because of problems with the questionnaire, the interviewer, the respondent or the survey process.

  • Poor questionnaire design
    It is essential that sample survey or census questions are worded carefully in order to avoid introducing bias. If questions are misleading or confusing, then the responses may end up being distorted.
  • Interviewer bias
    An interviewer can influence how a respondent answers the survey questions. This may occur when the interviewer is too friendly, is too aloof, or prompts the respondent. To prevent this, interviewers must be trained to remain neutral throughout the interview. They must also pay close attention to the way they ask each question. If an interviewer changes the way a question is worded, it may affect the respondent’s answer.
  • Respondent error
    Respondents can also provide incorrect answers. Faulty recollections, tendencies to exaggerate or underplay events, and inclinations to give answers that appear more socially acceptable are several reasons why a respondent may provide a false answer.
  • Problems with the survey process
    Errors can also occur because of a problem with the actual survey process. Using proxy responses (that is, taking answers from someone other than the respondent) and lacking control over the survey procedures are just two of the factors that increase the risk of response errors.

Nonresponse error

Estimates produced after nonresponse has occurred, and after imputation has been used to deal with it, are usually not equal to the estimates that would have been obtained had all the desired values been observed without error. The difference between these two estimates is called the nonresponse error. There are two types of nonresponse error: total and partial.

  • Total nonresponse error occurs when all or almost all data for a sampling unit are missing. This can happen if the respondent is unavailable or temporarily absent, the respondent is unable to participate or refuses to participate in the survey, or if the dwelling is vacant. If a significant number of sampled units do not respond to a survey, then the results may be biased, since the characteristics of the non-respondents may differ from those of the respondents.
  • Partial nonresponse error occurs when respondents provide incomplete information. Some respondents may find certain questions difficult to understand, or may refuse or forget to answer a question. A poorly designed questionnaire or poor interviewing techniques can also result in partial nonresponse error. To reduce this form of error, care should be taken in designing and testing questionnaires. Adequate interviewer training and appropriate edit and imputation strategies will also help minimize this error.
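
The way nonresponse can bias results when non-respondents differ from respondents can be sketched with a small numerical example. It assumes the simplest case, where the estimate is the unweighted mean of the respondents only and no imputation is applied. The function name and all values are hypothetical.

```python
from statistics import fmean

def nonresponse_bias(respondents, nonrespondents):
    """Bias of the respondent mean relative to the full-sample mean.

    Returns the bias computed directly, and the same quantity written as
    (nonresponse rate) * (respondent mean - non-respondent mean).
    """
    n_r, n_m = len(respondents), len(nonrespondents)
    full_mean = fmean(respondents + nonrespondents)
    resp_mean = fmean(respondents)
    direct = resp_mean - full_mean
    decomposed = n_m / (n_r + n_m) * (resp_mean - fmean(nonrespondents))
    return direct, decomposed

# Hypothetical values of a survey variable, made up for illustration.
# In a real survey the non-respondents' values would be unknown.
respondents = [40.0, 50.0, 60.0, 50.0]
nonrespondents = [80.0, 100.0]
direct, decomposed = nonresponse_bias(respondents, nonrespondents)
print(f"bias of respondent mean: {direct:.2f} (decomposition: {decomposed:.2f})")
```

The decomposition shows that the bias depends on both the nonresponse rate and the gap between the two groups: if respondents and non-respondents had the same mean, the respondent mean would be unbiased whatever the response rate.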

Processing error

Processing error occurs during data processing. It covers errors in all data processing activities after collection and prior to estimation, such as data capture, coding, editing and tabulation of the data, as well as the assignment of survey weights.

  • Coding errors occur when different coders code the same answer differently. This can be caused by poor training, incomplete instructions, variation in coder performance (for example, due to tiredness or illness), data entry errors, or machine malfunction (some processing errors are caused by errors in the computer programs).
  • Data capture errors result when data are not entered into the computer exactly as they appear on the questionnaire. This can be caused by the complexity of alphanumeric data and by the lack of clarity in the answer provided. The physical layout of the questionnaire itself or the coding documents can cause data capture errors. The method of data capture, manual or automated (for example, using an optical scanner), can also result in errors.
  • Editing and imputation errors can be caused by the poor quality of the original data or by its complex structure. When the editing and imputation processes are automated, errors can also be the result of faulty programs that were insufficiently tested. The choice of an inappropriate imputation method can introduce bias. Errors can also result from incorrectly changing data that were found to be in error, or from erroneously changing correct data.
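
One way an imputation method can distort results is illustrated below with mean imputation, used here only as a convenient example and not as a method the text recommends or attributes to any particular survey. The function name and data values are hypothetical.

```python
from statistics import fmean, variance

def mean_impute(values):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = fmean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical reported values; None marks a missing answer.
reported = [10.0, 20.0, 30.0, None, None]
observed = [v for v in reported if v is not None]
completed = mean_impute(reported)

print("mean:    ", fmean(observed), "->", fmean(completed))
print("variance:", variance(observed), "->", variance(completed))
```

The mean is unchanged, but the sample variance of the completed data is smaller than that of the observed values, because every imputed value sits exactly at the centre of the distribution. Estimates that depend on the spread of the data would therefore be understated.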
