Information identified as archived is provided for reference, research or recordkeeping purposes. It is not subject to the Government of Canada Web Standards and has not been altered or updated since it was archived. Please contact us to request a format other than those available.
Self-weighting designs
Adjusting the weights
Other estimation methods
Estimating the sampling error
Examples of estimation using a simple random sampling design
Estimation of the population mean
Estimation of the population total
As we now know, the goal of conducting surveys is to obtain information about a particular population. When the sample has been selected and the information collected (see the Data collection chapter) and processed (see the Data processing chapter), there still remains the task of linking the information gathered from the sample back to the overall population.
Estimation is the process of determining a likely value for a variable in the survey population, based on information collected from the sample. Researchers are usually interested in looking at estimates of many statistics—totals, averages and proportions being the most frequent—for different variables. For example, a sample survey could be used to produce any of the following statistics: estimates for the proportion of smokers among all people aged 15 to 24 in the population; the average earnings of men and women with a university degree; or the total number of cars possessed by the whole survey population.
Underpinning the estimation process is the sampling weight of a unit, which indicates the number of units in the population (including the sampled unit itself) that are represented by this sampled unit. The sampling weight is the inverse of the unit's probability of selection.
Household number | Number of persons | Number of cars | Probability of selection | Sampling weight |
---|---|---|---|---|
1 | 1 | 0 | 1/4 | 4 |
2 | 4 | 2 | 1/4 | 4 |
3 | 2 | 1 | 1/4 | 4 |
4 | 2 | 1 | 1/4 | 4 |
5 | 3 | 2 | 1/4 | 4 |
The selection probability of 1 in 4 comes from the fact that systematic sampling gives each household on your street an equal chance of being selected. The sampling weight of 4 is just the inverse of that probability. When estimating, each sampled household is treated as representing 4 households with the same characteristics, so the 5 sampled households together represent the population of 20 households on your street.
In order to estimate the total number of persons living on your street, multiply the number of persons in each sampled household by that household's sampling weight, then add up the results. For example, there are 4 one-person households (represented by Household number 1), 4 four-person households, 8 two-person households (four households represented by Household number 3 and four households represented by Household number 4) and 4 three-person households. The estimate of the total number of persons would then be:
Estimated number of persons living on your street
= (4 x 1) + (4 x 4) + (8 x 2) + (4 x 3)
= 48 people
To estimate the average number of cars per household, you proceed in the same manner. Get an estimate of the total number of cars owned by households on your street, then divide that estimate by the actual number of households on the street. For example, there are 4 households without a car (represented by Household number 1), 8 households with two cars (represented by Household number 2 and Household number 5) and 8 households with one car each (represented by Household number 3 and Household number 4).
Estimated number of cars
= (4 x 0) + (8 x 2) + (8 x 1)
= 24 cars
Estimated average
= 24 ÷ 20
= 1.2 cars per household
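The two calculations above can be reproduced with a short Python sketch. The data come from the table of sampled households; the function name `weighted_total` and the variable names are illustrative choices, not part of the original text.

```python
from fractions import Fraction

# Sampled households from the table: (persons, cars, probability of selection).
sample = [
    (1, 0, Fraction(1, 4)),
    (4, 2, Fraction(1, 4)),
    (2, 1, Fraction(1, 4)),
    (2, 1, Fraction(1, 4)),
    (3, 2, Fraction(1, 4)),
]

def weighted_total(values, weights):
    """Sum of each sampled value multiplied by its sampling weight."""
    return sum(w * v for v, w in zip(values, weights))

# The sampling weight is the inverse of the probability of selection.
weights = [1 / p for _, _, p in sample]
persons = [row[0] for row in sample]
cars = [row[1] for row in sample]

population_size = 20  # actual number of households on the street

est_persons = weighted_total(persons, weights)
est_cars = weighted_total(cars, weights)
est_cars_per_household = est_cars / population_size

print(est_persons)                    # 48
print(est_cars)                       # 24
print(float(est_cars_per_household))  # 1.2
```

Exact fractions are used here only to keep the arithmetic free of rounding; ordinary floating-point numbers would serve equally well for this example.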
It is not always the case that all sampled units have the same sampling weight. Some designs give unequal probability of selection to units, resulting in units within the same sample having different sampling weights. Answers from one household or business could represent the answers for 200 units of the population, while the answers from another could represent only 50 units in the population.
When every unit in the sample has the same sampling weight, the sampling design is said to be self-weighted. This kind of design is time-saving and operationally convenient, particularly for large samples. Because every unit has the same weight, those weights can be ignored when estimating averages and proportions. The average for the sample gives an appropriate estimate of the average for the whole population.
Simple random sampling and systematic sampling are examples of self-weighted designs. In that sense, calculations could have been made easier in Example 1. For instance, to estimate the average number of cars per household in the population, we could have used the sample average directly. The 5 sampled households own a total of 6 cars, an average of 1.2 cars per household. This is the same result as that obtained using the sampling weight procedure.
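A short derivation shows why equal weights can be ignored when estimating an average. The notation is chosen here for illustration: w is the common sampling weight, x an observed value and n the sample size. Dividing the weighted total by the sum of the weights (which equals the population size in a self-weighted design) gives:

```latex
\frac{\sum w\,x}{\sum w} = \frac{w \sum x}{n\,w} = \frac{\sum x}{n}
```

With the household data, w = 4, n = 5 and the sampled households own 6 cars, so (4 x 6) ÷ (4 x 5) = 24 ÷ 20 = 6 ÷ 5 = 1.2 cars per household.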
Sometimes, the sampling weights are adjusted prior to estimation. There are basically two reasons for weight adjustment: adjusting for non-response and adjusting for external information.
Adjusting for external information: Sometimes, we know the actual total for one or more variables measured in the sample. In Example 3 of the Probability sampling section, a population of the 1,000 best horror movies was equally divided into 500 classic movies and 500 modern movies. Even though you knew this prior to sampling, you decided to select a simple random sample of 100 movies and ended up with 77 classic movies and 23 modern movies. Each of these movies has a weight of 10 (because you selected 1 movie out of every 10 titles). Using the answers from the survey and the sampling weight, your sample would represent a population of 770 classic movies and 230 modern movies. This could lead to inaccurate estimates. One solution would be to decrease the weight of every sampled classic and increase the weight of every sampled modern movie so that your sample gives an estimate of 500 classics and 500 modern films in the population. This should reduce the distortion caused by a 'bad' sample.
Of course, stratifying by release date prior to sampling would have solved this problem. However, in a lot of cases, we have totals at the population level, but we don't know the attribute of each unit on the sampling frame. For example, from the Census of Population, we know how many men and women there are in a specific city, but all we have for sampling is a list of households. Thus, stratifying our population by sex would not be possible. Demographic projections by age and sex for each province are often used in social surveys to adjust sampling weights.
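The weight adjustment described for the horror movie sample can be sketched as follows. This kind of adjustment to known population counts is commonly called post-stratification, a name the text above does not use. The variable names are illustrative, and the sketch assumes that the only external information is the 500/500 split between classic and modern movies.

```python
# Movie example: a simple random sample of 100 titles out of 1,000.
design_weight = 1000 / 100  # every sampled movie starts with a weight of 10

sample_counts = {"classic": 77, "modern": 23}    # observed in the sample
known_totals = {"classic": 500, "modern": 500}   # known population counts

adjusted_weights = {}
for group, n_sampled in sample_counts.items():
    # What the unadjusted weights would estimate: 770 classics, 230 moderns.
    estimated_total = design_weight * n_sampled
    # Scale the weight so the group estimate matches the known total.
    adjustment = known_totals[group] / estimated_total
    adjusted_weights[group] = design_weight * adjustment

for group, w in adjusted_weights.items():
    print(group, round(w, 2), round(w * sample_counts[group]))
# classic 6.49 500
# modern 21.74 500
```

Each sampled classic movie now represents about 6.5 titles instead of 10, and each sampled modern movie about 21.7, so the weighted counts reproduce the known totals.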
The weights adjusted for non-response and/or external counts are used for estimation, in the same way as the sampling weight was used in Example 1.
Using the weights to inflate the sample results is not the only estimation method that exists, but it is the simplest one and the only one that we will cover. Nevertheless, it is important to know that there exist some other methods that could lead to more precise estimates (e.g., using auxiliary information). The estimation process has to take into account the sampling design that was used. Otherwise, the resulting estimates could be severely biased.
As mentioned before, any estimates derived from samples are subject to what is called the sampling error. This comes from the fact that only a part of the population was observed, instead of the whole. A different sample could have come up with different results. The amount of variation that exists among the estimates from the different possible samples is what makes the sampling error. (There are roughly 14 million different combinations of 6 numbers from 1 to 49, so imagine how many ways there are to select a sample of 25,000 Canadian households!) Of course, this sampling error is unknown, since we would need to know the answer for each unit of the population in order to calculate it. Nevertheless, it can be estimated by using the survey data. The extent of the sampling error depends on many things, including the sampling method, the estimation method, the sample size and the variability of the estimated characteristic. This is why each sample estimate has its own sampling error. This error should thus be approximated for each estimated total, average, proportion, etc. produced by the survey.
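The idea that different samples give different estimates can be made concrete with a tiny enumeration. The sketch below first checks the lottery figure quoted above, then treats five car counts as if they were an entire population. That population and the sample size of 2 are assumptions made purely for illustration.

```python
from itertools import combinations
from math import comb
from statistics import fmean

# Number of ways to choose 6 numbers out of 49.
print(comb(49, 6))  # 13983816

# Hypothetical tiny population: cars owned by each of five households.
population = [0, 2, 1, 1, 2]
sample_size = 2

# Mean of every possible simple random sample of 2 households.
sample_means = [fmean(s) for s in combinations(population, sample_size)]

print(len(sample_means))                       # 10
print(min(sample_means), max(sample_means))    # 0.5 2.0
print(fmean(sample_means), fmean(population))  # 1.2 1.2
```

The individual estimates range from 0.5 to 2.0 cars per household even though their average over all possible samples equals the population mean of 1.2. That spread among possible samples is the sampling error; in a real survey only one sample is observed, so the spread has to be estimated from the survey data.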
Simple random sampling is the simplest of all sampling methods, and estimation under it has been studied extensively. There are simple formulas to estimate the sampling error for many statistics when simple random sampling is used, especially since it is a self-weighting design. We present here the most common estimators for a population average (mean) and total under simple random sampling.
In a simple random sample, the estimate of the population mean is identical to the mean of the sample:
$$\bar{x} = \frac{\sum x}{n}$$
where
x = an observed value
$\bar{x}$ = estimate of the population mean
$\sum x$ = sum of all observed x values in the sample
n = number of observations in the sample.
Note: Lowercase x and n should be used if you are referring to a sample survey and upper case X and N should be used when referring to a population.
If the sample results have been summarized in a frequency table, then the estimate for the population mean is still the sample mean, computed from the table. Thus,
$$\bar{x} = \frac{\sum xf}{\sum f}$$
where
x = an observed value
f = the frequency of the value (the number of times that this value has been observed in the sample)
$\bar{x}$ = estimate of the population mean
$\sum xf$ = sum of all observed xf values (each observed value multiplied by its frequency) in the sample
$\sum f$ = sum of the frequencies in the sample.
Example 2: A farmer randomly selects 10 eggs from a gross of 12 dozen eggs (144 eggs) he finds in his hen house. He carefully weighs each egg.
The following weights were recorded in grams:
0.75, 0.70, 0.55, 0.50, 0.60, 0.65, 0.75, 0.65, 0.75, 0.50
What is the mean weight of the gross of eggs?
Using the above formula, we can determine the mean weight of all of the eggs:
$$\bar{x} = \frac{\sum x}{n} = \frac{6.40}{10} = 0.64 \text{ grams}$$
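The egg calculation can be checked with a few lines of Python, using both the direct form of the estimator and the frequency-table form. The variable names are illustrative, and results are rounded to two decimals because binary floating-point numbers cannot represent values such as 0.70 exactly.

```python
from collections import Counter

# Recorded egg weights from the example, in grams.
weights_g = [0.75, 0.70, 0.55, 0.50, 0.60, 0.65, 0.75, 0.65, 0.75, 0.50]

# Direct form: sum of observed values divided by the number of observations.
mean_direct = sum(weights_g) / len(weights_g)

# Frequency-table form: sum of (value x frequency) divided by the sum of frequencies.
freq = Counter(weights_g)
mean_from_table = sum(x * f for x, f in freq.items()) / sum(freq.values())

print(round(mean_direct, 2))      # 0.64
print(round(mean_from_table, 2))  # 0.64
```

Both forms give the same estimate because the frequency table is only a compact way of writing the same ten observations.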
For a simple random sample, the estimation formula of a total for the population is
$$\hat{X} = N \times \frac{\sum x}{n}$$
where
x = an observed value
$\hat{X}$ = estimated population total
$\sum x$ = sum of all observed x values in the sample
n = number of observations in the sample
N = total number of observations in the population.
It is just the estimate for the mean value multiplied by the number of units in the population. In the previous example, the mean weight of an egg is 0.64 grams, so it is logical to think that the total weight of the 144 eggs would be 92.16 grams (144 x 0.64 = 92.16 grams).
If the sample results have been summarized in a frequency table, then the estimation formula for the population total is
$$\hat{X} = N \times \frac{\sum xf}{\sum f}$$
where
x = an observed value
$\hat{X}$ = estimated population total
$\sum xf$ = sum of all observed xf values in the sample
$\sum f$ = sum of frequencies in the sample
N = total number of observations in the population.
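As a minimal sketch, both forms of the total estimator can be applied to the egg example. The function names `estimate_total` and `estimate_total_from_table` are hypothetical, and the frequency table is simply the ten recorded weights tallied by hand.

```python
def estimate_total(sample_values, population_size):
    """Population total under simple random sampling: N x (sum of x / n)."""
    return population_size * sum(sample_values) / len(sample_values)

def estimate_total_from_table(freq_table, population_size):
    """Same estimator, starting from a {value: frequency} table."""
    sum_xf = sum(x * f for x, f in freq_table.items())
    sum_f = sum(freq_table.values())
    return population_size * sum_xf / sum_f

# Egg weights in grams and the same data summarized as a frequency table.
egg_weights = [0.75, 0.70, 0.55, 0.50, 0.60, 0.65, 0.75, 0.65, 0.75, 0.50]
egg_table = {0.50: 2, 0.55: 1, 0.60: 1, 0.65: 2, 0.70: 1, 0.75: 3}

N = 144  # a gross of eggs

print(round(estimate_total(egg_weights, N), 2))           # 92.16
print(round(estimate_total_from_table(egg_table, N), 2))  # 92.16
```

Both calls return the estimated mean of 0.64 grams multiplied by the 144 eggs in the population.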