3.5 Estimation
3.5.2 Sampling error
An important part of estimation is estimating the magnitude of the sampling error in the estimate. This provides a measure of the precision of the survey’s estimates for the specific sample design. Sampling error can only be estimated if probability sampling is used.
The sampling error is the error caused by observing a sample instead of the whole population. It arises from estimating a population characteristic by looking at only one portion of the population rather than the entire population, and refers to the difference between the estimate derived from a sample survey and the true value that would result if a census of the whole population were taken under the same conditions. There is no sampling error in a census because the calculations are based on the entire population.
Estimating the sampling error
As mentioned before, any estimate derived from a sample is subject to sampling error because only part of the population was observed. A different sample could have produced different estimates. Sampling error causes variability among estimates derived from different samples, even when the sample size, the sample design and the estimation method are the same. It is commonly measured by the sampling variance, which depends on many things, including the sampling method, the estimation method, the sample size and the variability of the estimated characteristic.
Sampling variance
In simple sample designs, such as simple random sampling, the sampling variance can be calculated directly using a formula. However, such a formula usually doesn’t exist for more complex designs. In this case, an estimate of the sampling variance can be calculated using methods such as Taylor linearization or resampling methods such as the jackknife and the bootstrap.
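As a minimal sketch of the two approaches, the Python code below first computes the sampling variance of a sample mean from the usual formula for simple random sampling without replacement, then estimates it with a naive bootstrap. The data values, the population size of 40 and the function names are hypothetical. The naive bootstrap shown here ignores the finite population correction and would need to be adapted before being applied to a stratified or clustered design.

```python
import random
import statistics


def srs_variance_of_mean(sample, population_size):
    """Formula-based estimate of the sampling variance of the sample mean
    under simple random sampling without replacement."""
    n = len(sample)
    s2 = statistics.variance(sample)      # sample variance (divisor n - 1)
    fpc = 1 - n / population_size         # finite population correction
    return fpc * s2 / n


def bootstrap_variance_of_mean(sample, replicates=2000, seed=12345):
    """Naive bootstrap: resample with replacement, recompute the mean each
    time, and take the variance of the replicate means."""
    rng = random.Random(seed)
    n = len(sample)
    replicate_means = [
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(replicates)
    ]
    return statistics.variance(replicate_means)


# Hypothetical sample of 4 units drawn from a population of 40 units.
sample = [2, 4, 4, 6]
print(srs_variance_of_mean(sample, population_size=40))
print(bootstrap_variance_of_mean(sample))
```

The two results are not expected to match exactly: besides resampling noise, the naive bootstrap does not apply the finite population correction.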
Regardless of which method is used for variance estimation, it has to incorporate sample design properties such as stratification, clustering and multistage or multi-phase selection, if applicable.
Other factors affecting the magnitude of the sampling variance include the following:
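- In general, sampling variance decreases as sample size increases, but precision does not improve in proportion. Under simple random sampling from a large population, for example, quadrupling the sample size only halves the standard error.
- Population size has an impact on the sampling variance for small to moderate-sized populations. For large populations, its impact is minor.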
- Variability of the characteristic of interest in the population also affects the size of the sampling error. The greater the difference between the population units, the larger the sample size required to achieve a specific level of precision.
- The sampling plan, which includes a sample design and an estimation procedure, also affects the magnitude of the sampling error. The method of sampling, called the “sample design,” can greatly affect the size of the sampling error. A survey with a complex sample design could have a larger sampling error than one with a simpler design. The estimation procedure also has a major impact on the sampling error. These concepts are examined in greater detail in the section on sampling.
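The first three factors can be illustrated with a short Python sketch. It assumes simple random sampling without replacement and the standard formula for the standard error of a sample mean. The function name and all numeric values are hypothetical.

```python
import math


def srs_standard_error(unit_variance, n, population_size=None):
    """Standard error of a sample mean under simple random sampling.
    population_size=None treats the population as effectively infinite."""
    fpc = 1.0 if population_size is None else 1 - n / population_size
    return math.sqrt(fpc * unit_variance / n)


unit_variance = 100.0  # hypothetical variability of the characteristic

# Sample size: quadrupling n from 100 to 400 only halves the standard error.
print(srs_standard_error(unit_variance, 100))
print(srs_standard_error(unit_variance, 400))

# Population size: same n = 500, small versus large population.
print(srs_standard_error(unit_variance, 500, population_size=1_000))
print(srs_standard_error(unit_variance, 500, population_size=1_000_000))

# Variability: a more variable characteristic gives a larger standard error.
print(srs_standard_error(4 * unit_variance, 100))
```

With a sample of 500, the population size matters when the population has only 1,000 units, because half of it is observed. It matters very little once the population is large.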
Other measures of sampling error
Besides the sampling variance, other measures are frequently used to express sampling error, including the standard error, the coefficient of variation, the margin of error and the confidence interval.
The standard error is the square root of the sampling variance. This measure is easier to interpret since it provides an indication of sampling error using the same scale as the estimate whereas the variance is based on squared differences.
The coefficient of variation (CV) assesses the size of the standard error relative to the estimate of the characteristic being measured. It is the ratio of the standard error of the estimate to the average value of the estimate itself. The CV is very useful for comparing the precision of sample estimates whose sizes or scales differ from one another. Even though the CV is widely used in Statistics Canada’s official releases, it is not recommended for measuring the precision of proportions, especially when the estimated proportions are close to 0 or 1. In this case, a confidence interval is more appropriate.
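The sketch below computes the standard error and the CV from a sampling variance, and shows one reason the CV can behave awkwardly for proportions. It assumes simple random sampling from a large population, so that the variance of an estimated proportion is approximately p(1 - p)/n. The function names and the values of n and p are hypothetical.

```python
import math


def standard_error(sampling_variance):
    """Square root of the sampling variance."""
    return math.sqrt(sampling_variance)


def coefficient_of_variation(estimate, sampling_variance):
    """Standard error expressed relative to the estimate."""
    return standard_error(sampling_variance) / estimate


def srs_proportion_variance(p_hat, n):
    # Assumption: simple random sampling from a large population.
    return p_hat * (1 - p_hat) / n


n = 1000
for p_hat in (0.02, 0.98):
    var = srs_proportion_variance(p_hat, n)
    print(p_hat, standard_error(var), coefficient_of_variation(p_hat, var))
```

The proportions 0.02 and 0.98 describe the same split of the population and have the same standard error, yet their CVs differ by a factor of about 49, which makes the CV hard to interpret in this situation.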
The confidence interval (CI) gives a range of values around the estimate that is likely to include the unknown population value with a given probability. This probability is the confidence level of the CI. For a given estimate in a given sample, using a higher confidence level generates a wider CI, meaning a less precise CI. The most commonly used confidence level is 95%, but confidence levels of 99% or 90% are also used in certain circumstances.
The margin of error is half the width of the CI. The larger the margin of error, the less confidence one should have that a result would reflect the result of a survey of the entire population. It is often used to report sampling error by pollsters or journalists.
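As an illustration, the following sketch builds a CI as the estimate plus or minus a multiple of the standard error, using a normal approximation. That approximation is an assumption made here for simplicity, not a method prescribed in the text, and the estimate and standard error are hypothetical values.

```python
from statistics import NormalDist


def confidence_interval(estimate, standard_error, level=0.95):
    """Normal-approximation confidence interval around an estimate."""
    # Two-sided critical value from the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin_of_error = z * standard_error
    return estimate - margin_of_error, estimate + margin_of_error


estimate, se = 50.0, 2.0  # hypothetical estimate and standard error
for level in (0.90, 0.95, 0.99):
    lower, upper = confidence_interval(estimate, se, level)
    margin = (upper - lower) / 2  # margin of error = half the CI width
    print(level, lower, upper, margin)
```

For the same estimate and standard error, the interval widens as the confidence level goes from 90% to 95% to 99%.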
Example 1
It is common to see the results of a survey reported in a newspaper as follows:
According to a recent survey, 15% of Ottawa residents attend religious services every week. The results, based on a sample of 1,345 residents, are considered accurate within plus or minus 3 percentage points 19 times out of 20.
In this example, the expression “19 times out of 20” means that if the survey were repeated many times, the confidence interval would cover the true population value 19 times out of 20. This is equivalent to a 95% confidence level. The expression “plus or minus 3 percentage points” means that the margin of error is 3 percentage points. Therefore, the estimate is 15% and the corresponding 95% CI is 12% to 18%.
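The “19 times out of 20” interpretation can be checked with a small simulation. The sketch below is hypothetical: it assumes a large population in which the true proportion is exactly 15%, simple random sampling, and a normal-approximation 95% CI computed from each simulated sample. It does not attempt to reproduce the 3-point margin quoted in the newspaper example, which depends on details of the actual survey that are not given.

```python
import math
import random
from statistics import NormalDist

# Interval implied by the reported estimate and margin (percentage points).
reported_estimate, reported_margin = 15, 3
print(reported_estimate - reported_margin, reported_estimate + reported_margin)


def coverage_rate(true_p=0.15, n=1345, surveys=1000, level=0.95, seed=2024):
    """Share of simulated surveys whose CI contains the true proportion."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    covered = 0
    for _ in range(surveys):
        # One simulated survey: n independent draws from a large population.
        successes = sum(rng.random() < true_p for _ in range(n))
        p_hat = successes / n
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= true_p <= p_hat + margin:
            covered += 1
    return covered / surveys


print(coverage_rate())
```

Under these assumptions the printed coverage rate should be close to 0.95, that is, roughly 19 intervals out of 20 contain the true value.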