3.5 Estimation

3.5.1 Weighting


The principle behind estimation in a probability survey is that each sample unit represents not only itself, but also several units of the survey population. The design weight of a unit usually refers to the average number of units in the population that each sampled unit represents. This weight is determined by the sampling method and is an important part of the estimation process.

While the design weights can be used for estimation, most surveys produce a set of estimation weights by adjusting the design weights to improve the accuracy of the final estimates. The two most common reasons for making adjustments are to account for nonresponse and to make use of pertinent data available from other sources. Once the final estimation weights have been calculated, they are applied to the sample data to compute estimates.

Design weight

The first step in estimation is to assign a weight to each sampled unit. The design weight ($w_d$), which is the average number of units in the population that each sampled unit represents, is the inverse of the unit's inclusion probability ($\pi$), that is, the probability that the unit is selected in the sample.

$w_d = \frac{1}{\pi}$

If the inclusion probability is 1/50, then each selected unit represents, on average, 50 units in the population and the design weight is $w_d = 50$.

Some sample designs assign the same design weight to all units in the sample, while others give different design weights to sampled units for various reasons, such as improving precision or reducing cost.
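Because $w_d = 1/\pi$ is applied unit by unit, a design with unequal inclusion probabilities automatically yields unequal design weights. The sketch below assumes a hypothetical helper, `design_weight`, and hypothetical inclusion probabilities; only the 1/50 case comes from the text. It uses Python's `fractions.Fraction` so the weights stay exact.

```python
from fractions import Fraction

def design_weight(inclusion_probability):
    """Hypothetical helper: return the design weight w_d = 1 / pi for one unit."""
    if not 0 < inclusion_probability <= 1:
        raise ValueError("inclusion probability must be in (0, 1]")
    return 1 / inclusion_probability

# Hypothetical inclusion probabilities for three sampled units:
# the first is the 1/50 case from the text; the others show unequal weights.
probabilities = [Fraction(1, 50), Fraction(1, 10), Fraction(1, 2)]
weights = [design_weight(p) for p in probabilities]
print(weights)  # [Fraction(50, 1), Fraction(10, 1), Fraction(2, 1)]
```

A unit selected with probability 1/2 stands in for only 2 population units, while one selected with probability 1/50 stands in for 50.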

Example 1: Simple Random Sample

Suppose there are $N = 100$ Grade 12 (or secondary 5) students in a high school. A simple random sample of $n = 25$ students is selected, and the selected students are invited to complete a questionnaire about their career plans.

• The inclusion probability is:
$\pi = \frac{n}{N} = \frac{25}{100} = \frac{1}{4}$
• The design weight is:
$w_d = \frac{1}{\pi} = 4$

Each student selected in the sample represents four students in the school.
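The Example 1 arithmetic can be reproduced in a few lines. This is a minimal sketch; the variable names are illustrative. It also shows a property worth noting: under simple random sampling, the $n$ design weights of $N/n$ each add up exactly to the population size $N$.

```python
from fractions import Fraction

N = 100  # Grade 12 students in the school
n = 25   # students in the simple random sample

pi = Fraction(n, N)   # inclusion probability, the same for every student
w_d = 1 / pi          # design weight
weights = [w_d] * n   # one design weight per sampled student

print(pi, w_d, sum(weights))  # 1/4 4 100
```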

Production of simple estimates

Estimates can be produced once the weights have been calculated. Only simple estimates, such as totals, averages and proportions, are covered here.

Estimating a population total

The estimate of a population total ($\hat{Y}$) is calculated by multiplying the weight of each sampled unit by its value for the variable of interest and then summing these products over all units in the sample $s$: $\hat{Y} = \sum_{i \in s} w_i y_i$. For categorical variables, the estimated number of units in a category is simply the sum of the weights of the responding units that fall into that category.

Example 2: Simple Random Sample (Continued)

Suppose that, of the 25 students selected in the sample, 10 have applied to science programs. Then the estimated total number of students in the school who applied to science programs is:
$\hat{Y} = 4 \times 10 = 40$
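A minimal sketch of the weighted total, assuming a hypothetical helper `estimate_total`. For the categorical case, each student's value is a 0/1 indicator, so the sum simply adds up the weights of the students who applied to science programs.

```python
def estimate_total(weights, values):
    """Hypothetical helper: multiply each unit's weight by its value, then sum."""
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    return sum(w * y for w, y in zip(weights, values))

# Example 2: 25 sampled students, each with design weight 4.
# y = 1 if the student applied to a science program, 0 otherwise.
weights = [4] * 25
applied_science = [1] * 10 + [0] * 15

total_science = estimate_total(weights, applied_science)
print(total_science)  # 40
```

With design weights equal to $1/\pi$, this weighted sum is the estimator commonly known as the Horvitz-Thompson estimator of a total.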

Estimating a population average

The estimate of the population average ($\hat{\bar{Y}}$) is the estimated total of the variable of interest ($\hat{Y}$) divided by the estimated total number of units in the population ($\hat{N}$).
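Writing both totals as weighted sums over the sample $s$ (notation assumed here, with $w_i$ the weight and $y_i$ the value of unit $i$) shows that the estimated average is a weighted mean. $\hat{N}$ is just the weighted total obtained by setting $y_i = 1$ for every unit:

$$\hat{\bar{Y}} = \frac{\hat{Y}}{\hat{N}} = \frac{\sum_{i \in s} w_i y_i}{\sum_{i \in s} w_i}$$

When every unit has the same weight, as in a simple random sample, the weights cancel and $\hat{\bar{Y}}$ reduces to the ordinary sample mean.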

Example 3: Simple Random Sample (Continued)

Students usually apply to more than one program when applying for university. Suppose that, of the 25 students selected in the sample, 5 apply to only 1 program, 10 apply to 2 programs and 10 apply to 3 programs. The average number of applications per student is then calculated as follows:

• Total number of applications is given by:
$\hat{Y} = (4 \times 5 \times 1) + (4 \times 10 \times 2) + (4 \times 10 \times 3) = 220$
• Total number of students is given by:
$\hat{N} = 4 \times 25 = 100$
• Average number of applications per student is given by:
$\hat{\bar{Y}} = \frac{\hat{Y}}{\hat{N}} = \frac{220}{100} = 2.2$
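The Example 3 calculation can be sketched as the ratio of two weighted totals. The helper name `estimate_average` is hypothetical; the data reproduce the counts given in the example.

```python
def estimate_average(weights, values):
    """Hypothetical helper: estimated total of y divided by estimated N."""
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    y_hat = sum(w * y for w, y in zip(weights, values))  # estimated total of y
    n_hat = sum(weights)                                 # estimated number of units
    return y_hat / n_hat

# Example 3: 25 sampled students, each with design weight 4.
weights = [4] * 25
applications = [1] * 5 + [2] * 10 + [3] * 10  # programs applied to, per student

average = estimate_average(weights, applications)
print(average)  # 2.2
```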

Estimating a population proportion

Estimating the proportion of the survey population that has a given characteristic uses almost the same formula as estimating a population average: it is also the quotient of two estimated totals. The difference lies in the numerator. When estimating a proportion ($\hat{P}$), the numerator is the estimated total number of units possessing the characteristic $C$ ($\hat{N}_C$); when estimating an average, it is the estimated total of a quantitative variable ($\hat{Y}$).

$\hat{P} = \frac{\hat{N}_C}{\hat{N}}$

Example 4: Simple Random Sample (Continued)

Suppose that, of the 25 students selected in the sample, 10 are female and 15 are male. Of the 10 students who applied to science programs, 5 are female and 5 are male. The proportion of students who applied to science programs, by gender, is calculated as follows:

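1. Total number of students who applied to science programs, by gender, is given by:
$\hat{N}_{C,\text{male}} = 5 \times 4 = 20$
$\hat{N}_{C,\text{female}} = 5 \times 4 = 20$

2. Total number of students by gender is given by:
$\hat{N}_{\text{male}} = 15 \times 4 = 60$
$\hat{N}_{\text{female}} = 10 \times 4 = 40$
3. Proportion of students who applied to science programs, by gender, is given by:
$\hat{P}_{\text{male}} = \frac{20}{60} \approx 0.33$
$\hat{P}_{\text{female}} = \frac{20}{40} = 0.5$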
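A minimal sketch of Example 4. The text only gives counts, so the per-student `(gender, applied_to_science)` records below are a hypothetical layout that reproduces those counts, and `estimate_proportion` is a hypothetical helper. Each proportion is computed within one gender group, so both the numerator and the denominator are restricted to that group.

```python
def estimate_proportion(weights, has_characteristic):
    """Hypothetical helper: P-hat = N_C-hat / N-hat from weights and 0/1 flags."""
    n_c = sum(w for w, c in zip(weights, has_characteristic) if c)  # N_C hat
    n = sum(weights)                                                # N hat
    return n_c / n

# Example 4 as (gender, applied_to_science) records; every design weight is 4.
sample = (
    [("female", True)] * 5 + [("female", False)] * 5
    + [("male", True)] * 5 + [("male", False)] * 10
)
WEIGHT = 4

proportions = {}
for gender in ("male", "female"):
    applied = [a for g, a in sample if g == gender]  # keep only this gender
    proportions[gender] = estimate_proportion([WEIGHT] * len(applied), applied)

print(proportions)  # {'male': 0.3333333333333333, 'female': 0.5}
```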

Other estimation methods

The estimation method described above for simple random sampling is the simplest one; more advanced methods are also available and are widely used in many surveys. The most appropriate method depends on several factors, such as the characteristics to be estimated, the types of data involved, and requirements for reliability, cost and timeliness. At Statistics Canada, specialized estimation systems are used to carry out complex estimation procedures and produce estimates in a timely manner.

Quite often, design weights have to be adjusted before estimation. There are two main types of adjustment: nonresponse adjustment and adjustment for external information.

Almost all surveys suffer from nonresponse, which occurs when all or some of the key information requested from a sampled unit is unavailable, for example because the unit refuses to participate, no contact is made, the unit cannot be located or the information obtained is unusable. The easiest way to deal with nonresponse is to ignore it, but this can lead to biased estimates.

Two common ways of dealing with this kind of nonresponse are to impute the missing answers or to adjust the design weights on the assumption that the responding units represent both responding and nonresponding units. In the latter approach, the design weights of the nonrespondents are redistributed among the respondents.
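The weight redistribution can be sketched as a single adjustment factor: the sum of all design weights divided by the sum of the respondents' design weights. The sketch assumes one adjustment group covering the whole sample (in practice, surveys often make this adjustment within groups of similar units), a hypothetical helper `adjust_for_nonresponse`, and hypothetical data in which 5 of the 25 students from Example 1 do not respond. It also shows what ignoring nonresponse does to an estimated total.

```python
def adjust_for_nonresponse(design_weights, responded):
    """Hypothetical helper: move nonrespondents' weight onto respondents.

    Assumes a single adjustment group: every respondent's weight is scaled
    by (sum of all design weights) / (sum of respondents' design weights).
    """
    total_weight = sum(design_weights)
    respondent_weight = sum(w for w, r in zip(design_weights, responded) if r)
    if respondent_weight == 0:
        raise ValueError("no respondents to carry the weight")
    factor = total_weight / respondent_weight
    return [w * factor if r else 0.0 for w, r in zip(design_weights, responded)]

# Hypothetical: 5 of the 25 sampled students in Example 1 do not respond.
design_weights = [4] * 25
responded = [True] * 20 + [False] * 5

ignored = sum(w for w, r in zip(design_weights, responded) if r)
adjusted = adjust_for_nonresponse(design_weights, responded)

print(ignored)        # 80: ignoring nonresponse underestimates N = 100
print(adjusted[0])    # 5.0: each respondent now represents 5 students
print(sum(adjusted))  # 100.0
```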