# 3.5 Estimation

3.5.1 Weighting

The principle behind estimation in a probability survey is that each sample unit represents not only itself, but also several units of the survey population. The design weight of a unit usually refers to the average number of units in the population that each sampled unit represents. This weight is determined by the sampling method and is an important part of the estimation process.

While the design weights can be used for estimation, most surveys produce a set of estimation weights by adjusting the design weights to improve the precision of the final estimates. The two most common reasons for making adjustments are to account for nonresponse and to make use of pertinent data available from other sources. Once the final estimation weights have been calculated, they are applied to the sample data in order to compute estimates.

## Design weight

The first step in estimation is assigning a weight to each sampled unit. The **design weight** ( ${w}_{d}$ ), which is the average number of units in the population that each sampled unit represents, is the inverse of the unit's inclusion probability ( $\pi$ ) in the sample.

If the inclusion probability is 1/50, then each selected unit represents on average 50 units in the population and the design weight is ${w}_{d}=50$ .

Some sample designs assign the same design weights for all units in the sample, while others give different design weights to sampled units for various reasons, such as improving precision or reducing cost.

### Example 1: Simple Random Sample

Suppose there are *N* = 100 Grade 12 (or Secondary 5) students in a high school. A simple random sample of *n* = 25 students is selected, and the selected students are invited to complete a questionnaire about their career plans.

- The inclusion probability is:

  $\pi = n/N = 25/100 = 1/4$

- The design weight is:

  ${w}_{d} = 1/\pi = 1/\frac{1}{4} = 4$

Each student selected in the simple random sample represents four students of the school.
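The calculation in Example 1 can be sketched in a few lines of Python. This is a minimal illustration, assuming simple random sampling without replacement; the function name `design_weight` is hypothetical and chosen only for this example.

```python
from fractions import Fraction


def design_weight(n: int, N: int) -> Fraction:
    """Design weight under simple random sampling: the inverse of pi = n / N."""
    if not 0 < n <= N:
        raise ValueError("sample size must satisfy 0 < n <= N")
    inclusion_probability = Fraction(n, N)  # pi = n / N
    return 1 / inclusion_probability        # w_d = 1 / pi


# Example 1: n = 25 students sampled from N = 100
w_d = design_weight(25, 100)
print(w_d)  # 4
```

Exact fractions are used here only so that the result matches the hand calculation without rounding.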

## Production of simple estimates

Once the weights have been calculated, estimates can be produced. Only simple estimates, such as totals, averages and proportions, are covered here.

### Estimating a population total

The estimate of a population total ( $\widehat{Y}$ ) is calculated by multiplying the weight of each selected unit by its value for the variable of interest, then summing these products over all units in the sample. For categorical variables, the estimated number of units having a given characteristic is calculated by adding together the weights of the responding units that have that characteristic.
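In symbols, the estimated total can be written as follows. The notation is an assumption introduced here for illustration: $s$ denotes the set of sampled units, ${w}_{i}$ the weight of unit $i$ and ${y}_{i}$ its value for the variable of interest.

$$\widehat{Y}=\sum_{i\in s}{w}_{i}\,{y}_{i}$$

For a categorical characteristic, ${y}_{i}$ can be taken as 1 if unit $i$ has the characteristic and 0 otherwise, so the sum reduces to the sum of the weights of the units that have it.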

### Example 2: Simple Random Sample (Continued)

Suppose that, of the 25 students selected in the sample, 10 have applied to science programs. Then the estimated total number of students who applied to science programs is:

$\widehat{Y}=\text{}4\text{}\times \text{}10\text{}=\text{}40$
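The same estimate can be reproduced as a weighted sum over the sampled units. The sketch below assumes every student carries the design weight of 4 from Example 1 and codes the characteristic as 1 (applied to science) or 0 (did not); the function name `estimate_total` is hypothetical.

```python
def estimate_total(weights, values):
    """Weighted total: the sum of weight * value over all sampled units."""
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    return sum(w * y for w, y in zip(weights, values))


# Example 2: 25 sampled students, each with design weight 4;
# 10 of them applied to science programs (coded 1), 15 did not (coded 0).
weights = [4] * 25
applied_to_science = [1] * 10 + [0] * 15

print(estimate_total(weights, applied_to_science))  # 40
```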

### Estimating a population average

The estimate of the average ( $\widehat{\overline{Y}}$ ) in the population is the estimate of the total value for the variable of interest ( $\widehat{Y}$ ) divided by the estimate of the total number of units ( $\widehat{N}$ ) in the population.

$$\widehat{\overline{Y}}=\frac{\widehat{Y}}{\widehat{N}}$$

### Example 3: Simple Random Sample (Continued)

Students usually apply to more than one program when applying for university study. Suppose that, of the 25 students selected in the sample, 5 apply to only 1 program, 10 apply to 2 programs and 10 apply to 3 programs. Then the average number of applications per student is calculated as follows:

- The total number of applications is given by:

  $\widehat{Y}=\left(4\times 5\times 1\right)+\left(4\times 10\times 2\right)+\left(4\times 10\times 3\right)=220$

- The total number of students is given by:

  $\widehat{N}=4\times 25=100$

- The average number of applications per student is given by:

  $\widehat{\overline{Y}}=\frac{\widehat{Y}}{\widehat{N}}=\frac{220}{100}=2.2$
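As a quick check of Example 3, the average can be computed as a ratio of two weighted totals. This is a sketch under the same assumption of a common design weight of 4; the function name `estimate_mean` is hypothetical.

```python
def estimate_mean(weights, values):
    """Estimated average: weighted total of values divided by the sum of weights."""
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    y_hat = sum(w * y for w, y in zip(weights, values))  # estimated total
    n_hat = sum(weights)                                 # estimated population size
    if n_hat == 0:
        raise ValueError("sum of weights must be positive")
    return y_hat / n_hat


# Example 3: 5 students apply to 1 program, 10 to 2 programs, 10 to 3 programs.
weights = [4] * 25
applications = [1] * 5 + [2] * 10 + [3] * 10

print(estimate_mean(weights, applications))  # 2.2
```

Because every weight is equal in this design, the weighted average coincides with the plain sample average; the two differ as soon as the weights vary across units.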

### Estimating a population proportion

Estimating the proportion of the survey population that has a given characteristic is mathematically similar to estimating a population average: both are calculated as the quotient of two estimated totals. The difference lies in the numerator. When estimating a proportion ( $\widehat{P}$ ), the numerator is the estimate of the total number of units possessing the given characteristic ( $C$ ); when estimating an average, it is the estimate of the total value of a quantitative variable.

$$\widehat{P}=\frac{\widehat{{N}_{C}}}{\widehat{N}}$$

### Example 4: Simple Random Sample (Continued)

Suppose that, of the 25 students selected in the sample, 10 are female and 15 are male. Overall, 10 students apply to science programs: 5 females and 5 males. The proportion of students applying to science programs, by gender, is calculated as follows:

- The total number of students applying to science programs, by gender, is given by:

  ${\widehat{N}}_{male,science}=5\times 4=20$

  ${\widehat{N}}_{female,science}=5\times 4=20$

- The total number of students, by gender, is given by:

  ${\widehat{N}}_{male}=15\times 4=60$

  ${\widehat{N}}_{female}=10\times 4=40$

- The proportion of students applying to science programs, by gender, is given by:

  ${\widehat{P}}_{male,science}=\frac{{\widehat{N}}_{male,science}}{{\widehat{N}}_{male}}=\frac{20}{60}=1/3$

  ${\widehat{P}}_{female,science}=\frac{{\widehat{N}}_{female,science}}{{\widehat{N}}_{female}}=\frac{20}{40}=1/2$
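The proportions in Example 4 can be reproduced by accumulating weights within each group. The sketch below is illustrative only: the record layout `(weight, group, has_characteristic)` and the function name are assumptions made for this example, and integer weights are assumed so that exact fractions can be used.

```python
from collections import defaultdict
from fractions import Fraction


def estimate_proportions_by_group(records):
    """Return, for each group, the estimated proportion having the characteristic.

    Each record is a tuple (weight, group, has_characteristic); weights must be integers.
    """
    n_hat = defaultdict(int)    # estimated number of units in each group
    n_hat_c = defaultdict(int)  # estimated number with the characteristic
    for weight, group, has_characteristic in records:
        n_hat[group] += weight
        if has_characteristic:
            n_hat_c[group] += weight
    return {group: Fraction(n_hat_c[group], n_hat[group]) for group in n_hat}


# Example 4: 10 females (5 apply to science), 15 males (5 apply to science),
# each with design weight 4.
records = (
    [(4, "female", True)] * 5 + [(4, "female", False)] * 5
    + [(4, "male", True)] * 5 + [(4, "male", False)] * 10
)

proportions = estimate_proportions_by_group(records)
print(proportions["male"], proportions["female"])  # 1/3 1/2
```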

## Other estimation methods

The estimation method described above for simple random sampling is the simplest one. More advanced methods are available and are widely applied in many surveys. The most appropriate method depends on several factors, such as the characteristics to be estimated, the types of data, and requirements for reliability, cost and timeliness. At Statistics Canada, specialized estimation systems are used to produce estimates involving complicated procedures in a timely manner.

## Adjusting the weights

Quite often design weights have to be adjusted prior to estimation, and there are two main types of adjustment: nonresponse adjustment and adjustment for external information.

### Adjusting for nonresponse

Almost all surveys suffer from nonresponse, which occurs when all or some of the key information requested from sampled units is unavailable for some reason: the sample unit refuses to participate, no contact is made, the unit cannot be located, or the information obtained is unusable. The easiest way to deal with such nonresponse is to ignore it, but this leads to inaccurate estimates.

Two common ways of dealing with this kind of nonresponse are to impute the missing answers or to adjust the design weights, on the assumption that the responding units represent both responding and nonresponding units. In the latter case, the design weights of the nonrespondents are redistributed among the respondents.
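The redistribution of weights can be sketched as follows. This is a minimal illustration, assuming a single weighting class in which respondents and nonrespondents are treated as alike; surveys often apply the adjustment separately within several classes. The response pattern (20 of the 25 sampled students responding) is hypothetical and is not part of the examples above, and the function name is chosen only for this sketch.

```python
def nonresponse_adjusted_weights(design_weights, responded):
    """Redistribute the design weights of nonrespondents among respondents.

    Assumes one weighting class: each respondent's weight is multiplied by
    (sum of all design weights) / (sum of respondents' design weights),
    and nonrespondents receive a weight of zero.
    """
    if len(design_weights) != len(responded):
        raise ValueError("design_weights and responded must have the same length")
    total_all = sum(design_weights)
    total_respondents = sum(w for w, r in zip(design_weights, responded) if r)
    if total_respondents == 0:
        raise ValueError("at least one unit must respond")
    factor = total_all / total_respondents
    return [w * factor if r else 0.0 for w, r in zip(design_weights, responded)]


# Hypothetical scenario: 25 students sampled with design weight 4, 20 respond.
design_weights = [4.0] * 25
responded = [True] * 20 + [False] * 5

adjusted = nonresponse_adjusted_weights(design_weights, responded)
print(adjusted[0])    # 5.0 (each respondent now represents five students)
print(sum(adjusted))  # 100.0 (the adjusted weights still sum to the population size)
```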

### Adjusting for external information

Sometimes information about the survey population is available from other sources, for example from a census or an administrative file. This information can also be incorporated in the weighting process.

There are two main reasons for using external (auxiliary) data at estimation. The first reason is that it is often important for the survey estimates to match known population totals or estimates from another, more reliable, survey. For example, many social surveys adjust their survey estimates in order to be consistent with estimates (age, sex distributions, etc.) of the most recent census of the population. External information may also be obtained from administrative data or from another survey that is considered to be more reliable because of its larger sample size or because the published estimates must be respected.

The second reason is to improve the precision of the estimates. This is possible as long as the values of the auxiliary variables are collected for the surveyed units and population totals or estimates for these variables are available from another reliable source.
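One simple way of making estimates match known totals is post-stratification, in which the weights within each group are scaled so that they sum to the known group total. The text above does not name a specific method, so this is only one possible approach among several. The sketch assumes hypothetical known counts of 45 female and 55 male students (for instance from school records); these figures are invented for illustration.

```python
def poststratify(weights, groups, known_totals):
    """Scale weights so that, within each group, they sum to the known total."""
    if len(weights) != len(groups):
        raise ValueError("weights and groups must have the same length")
    # Estimated group sizes from the current weights
    estimated_totals = {}
    for weight, group in zip(weights, groups):
        estimated_totals[group] = estimated_totals.get(group, 0.0) + weight
    # Adjustment factor for each group: known total / estimated total
    return [
        weight * known_totals[group] / estimated_totals[group]
        for weight, group in zip(weights, groups)
    ]


# Sample of 10 females and 15 males, each with design weight 4,
# giving estimated totals of 40 females and 60 males.
weights = [4.0] * 25
groups = ["female"] * 10 + ["male"] * 15
known_totals = {"female": 45, "male": 55}  # hypothetical external counts

adjusted = poststratify(weights, groups, known_totals)
print(adjusted[0])             # 4.5 (female weight)
print(round(adjusted[-1], 3))  # 3.667 (male weight)
```

After the adjustment, the estimated numbers of females and males agree with the external counts, and any estimate computed with the adjusted weights is consistent with them.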
