# 3.2 Sampling

3.2.1 Selection of a sample

Sampling allows the characteristics of a population to be estimated by directly observing only a portion of that population. Researchers are not interested in the sample itself, but in what can be learned from it about the entire population. It is essential that a sample survey be correctly defined and organized. If the wrong questions are asked, the collected data will not help in fulfilling the survey objectives. If the questions are put to the wrong people, the data will not represent the population well and the results will be biased.

The following steps describe how to select a sample that will allow you to meet the survey's objectives.

## Establish the survey's objectives

Specifying the objectives of a survey in as much detail as possible is critical to its ultimate success. The initial users and uses of the data should be identified at this stage. It is also at this stage that a decision should be made on the type of data source to use: a census, a sample survey, administrative data or an alternative source of data.

**Define the target population**

No matter which type of data is used, the target population must be well defined. It is the total population for which the information is required. In order to achieve this, the units that make up the population must be described in terms of characteristics that clearly identify them. The following characteristics define the target population:

- Nature of units: persons, hospitals, schools, etc.
- Geographic location: the geographic boundaries of the population have to be determined, as well as the level of geographic detail required for the survey estimate (by province, by city, etc.).
- Reference period: the period covered by the survey.
- Other characteristics, such as socio-demographic characteristics (a particular age group, for example) or type of industry.

**Decide on the data to be collected**

The data requirements of the survey must be established. It is also necessary to define the terms related to the data and to ensure that these definitions can be applied operationally while still meeting the data requirements.

**Set the level of precision**

Estimates from a sample carry a level of uncertainty known as the sampling error. When designing a survey, the acceptable level of uncertainty in the survey estimates has to be established. This level depends on the end use of the results and on the overall budget and time available: the bigger the budget, the more resources available to control quality. The level of uncertainty is also determined by the sample size, since increasing the sample size decreases the sampling error. For example, if you sample 24 out of 25 students in your class, there will be much less sample-to-sample variation than if you sampled only 5 of the 25 students.
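The classroom example can be made concrete with a short calculation. The sketch below assumes simple random sampling without replacement and a hypothetical standard deviation of 10 marks among the 25 students; neither assumption comes from the text, and the function name is illustrative only.

```python
import math

def srs_standard_error(pop_std_dev, sample_size, population_size):
    """Standard error of a sample mean under simple random sampling
    without replacement, including the finite population correction."""
    fpc = 1 - sample_size / population_size  # shrinks to 0 as n approaches N
    return pop_std_dev / math.sqrt(sample_size) * math.sqrt(fpc)

# Hypothetical class: N = 25 students, standard deviation of 10 marks.
se_5 = srs_standard_error(10.0, 5, 25)
se_24 = srs_standard_error(10.0, 24, 25)

print(f"n = 5:  standard error = {se_5:.2f}")   # 4.00
print(f"n = 24: standard error = {se_24:.2f}")  # 0.41
```

Under these assumptions, sampling 24 of the 25 students leaves roughly one tenth of the sample-to-sample variation that a sample of 5 would have.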

## The sample design

The following steps lead to the determination of the sample design:

- Determine what the survey population will be (e.g. students, men aged 20 to 35, newborn babies, etc.).
- Choose the most appropriate survey time frame.
- Define the survey units.
- Establish the sample size (e.g. a sample of 100 from a population of 1,000).
- Select a sampling method.

Estimation techniques to be used, that is, how results will be generalized to the entire population and how sampling error will be calculated, will result directly from the sampling design and will be discussed in the upcoming section on estimation.
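As a rough illustration of how the sample size and the sampling method fit together, the sketch below draws a sample of 100 from a population of 1,000, as in the example in the list above. It assumes simple random sampling without replacement and a hypothetical frame of unit identifiers numbered 1 to 1,000; an actual survey may well use a different method.

```python
import random

population_size = 1000  # N, from the example in the list above
sample_size = 100       # n, from the example in the list above

# Hypothetical frame: unit identifiers 1..N.
frame = list(range(1, population_size + 1))

rng = random.Random(2024)                # fixed seed so the draw is reproducible
sample = rng.sample(frame, sample_size)  # simple random sample, no repeats

sampling_fraction = sample_size / population_size  # share of the population observed
design_weight = population_size / sample_size      # units represented by each sampled unit

print(f"sampling fraction: {sampling_fraction}")  # 0.1
print(f"design weight: {design_weight}")          # 10.0
```

The design weight gives a first idea of how results are generalized: under this assumed design, each sampled unit stands for 10 units of the population.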

## The survey population

Some members of the target population may have to be excluded because of operational constraints, such as the high cost of collecting data in some remote areas or the difficulty of identifying and contacting certain components of the target population. The population that is effectively included is called the survey population or the **observed population**. The target population is the population we **want to observe** while the survey population is the population we **can observe**.

The goal is to have the survey population as close as possible to the target population. It is also very important to inform the users of the data of the differences between the two populations, as the results of the survey will apply only to the survey population.

For example, a target population for a survey could be all Canadians aged 15 years and over (on a particular reference date), while the survey population could exclude residents of the Yukon, Nunavut and the Northwest Territories, persons living on Aboriginal reserves, full-time members of the Canadian Armed Forces and residents of institutions. These Canadians might be excluded for various reasons: to survey people in the territories might prove to be difficult and expensive, military personnel may not be available for surveying if they are out on a mission, etc. Using this example, about 2% of the target population would be excluded from the survey population.

## The survey frame

The survey frame, also called the sampling frame, is the tool used to gain access to the population. There are two types of frames: list frames and area frames. A list frame is just a list of the units in a population. Each unit can be identified and the frame includes the information needed to access these units. A good frame should be complete and up-to-date. No member of the survey population should be excluded from the frame or appear more than once and no unit that is not part of the population (e.g. deceased persons) should be on the frame. The chosen frame will impact the selected survey population. For instance, if a list of telephone numbers is used to select a sample of households, then all households without telephones are excluded from the survey population.
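The frame-quality criteria described above (no duplicates, no out-of-scope units, no missing units) can be expressed as simple checks. The sketch below is illustrative only: the function `check_frame` and the unit identifiers are hypothetical, and in practice a complete independent list of the survey population is rarely available for comparison.

```python
from collections import Counter

def check_frame(frame_ids, population_ids):
    """Compare a list frame with the survey population it should cover.
    Both arguments are iterables of hypothetical unit identifiers."""
    counts = Counter(frame_ids)
    population = set(population_ids)
    return {
        # Units listed more than once on the frame.
        "duplicates": sorted(u for u, c in counts.items() if c > 1),
        # Units on the frame that are not part of the population (overcoverage).
        "out_of_scope": sorted(set(counts) - population),
        # Population units absent from the frame (undercoverage).
        "missing": sorted(population - set(counts)),
    }

population_ids = ["A", "B", "C", "D", "E"]
frame_ids = ["A", "B", "B", "C", "Z"]

report = check_frame(frame_ids, population_ids)
print(report)
# {'duplicates': ['B'], 'out_of_scope': ['Z'], 'missing': ['D', 'E']}
```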

An area frame is a list of geographic areas. Instead of selecting units directly from the frame as one would with a list frame, geographic areas are selected and a means to access units located in these areas is identified, like visiting the units in person, for instance. Suppose that you were surveying a rural town in Quebec to see what percentage of residents are farmers. If you were provided with an area frame, then you would be able to locate which roads to visit, but you would still have to find out the names and addresses of the residents on each road.

## The survey units

There are three types of units that have to be accurately identified in order to avoid problems during the selection, data collection and data analysis stages. They are as follows:

- The sampling unit is the unit that is part of the frame and is therefore subject to being selected.
- The respondent unit, or reporting unit, is the unit that provides the information needed by the survey.
- The unit of reference, or unit of analysis, is the unit about which information is provided and which is used to analyze the survey results.

For example, in a survey about newborns in Edmonton, the sampling unit might be a household, the reporting unit one of the parents or a legal guardian, and the unit of reference the baby.

The sampling units may differ depending on the survey frame used. This is why the survey population, survey frame and survey units are defined in conjunction with one another.

## The sample size

The level of precision needed for the survey estimates will impact the sample size. However, it is not as easy to determine the sample size as one may think. Generally, the actual sample size of a survey is a compromise between the level of precision to be achieved, the survey budget and any other operational constraints. In order to achieve a certain level of precision, the sample size will depend, among other things, on the following factors:

- The variability of the characteristics being observed: If every person in a population had the same salary, then a sample of one person would be all you need to estimate the average salary of the population. If the salaries are very different, then you would need a bigger sample in order to produce a reliable estimate.
- The population size: To a certain extent, the bigger the population, the bigger the sample needed. But once you reach a certain size, an increase in population no longer affects the sample size. For instance, the necessary sample size to achieve a certain level of precision will be about the same for a population of one million as for a population twice that size.
- The sampling and estimation methods: Not all sampling and estimation methods have the same level of efficiency. The more efficient the method, the smaller the sample size needed to obtain a given precision of the estimates. However, because of operational constraints and limitations of the survey frames, you cannot always use the most efficient method.
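The effect of variability and population size can be sketched with a common textbook formula for estimating a mean under simple random sampling without replacement. This is only one possible approach, not necessarily the one used in any given survey; the function name, the 95% confidence multiplier `z = 1.96` and the salary figures are all assumptions made for illustration.

```python
import math

def sample_size_for_mean(std_dev, margin_of_error, population_size, z=1.96):
    """Approximate sample size needed to estimate a mean within
    +/- margin_of_error under simple random sampling without replacement."""
    n0 = (z * std_dev / margin_of_error) ** 2  # size needed for a very large population
    n = n0 / (1 + n0 / population_size)        # adjustment for a finite population
    return max(1, math.ceil(n))                # at least one unit must be sampled

# Hypothetical salaries: standard deviation of 20,000, margin of error of 1,000.
for population_size in (1_000, 1_000_000, 2_000_000):
    n = sample_size_for_mean(20_000, 1_000, population_size)
    print(f"N = {population_size:>9,}: n = {n}")

# No variability at all: a single person is enough.
print(sample_size_for_mean(0, 1_000, 1_000_000))  # 1
```

Under these assumptions the required sample is 606 for a population of 1,000, but 1,535 and 1,536 for populations of one and two million: beyond a certain point, doubling the population barely changes the sample size.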

## The sampling method

There are two types of sampling methods: probability sampling and non-probability sampling. The difference between them is that in probability sampling, every unit has a probability of being selected that can be quantified. This is not true for non-probability sampling. The next section will describe features of both types of sampling and detail some of the methods related to each type.
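To show what a quantifiable selection probability means, the sketch below assumes simple random sampling without replacement, one probability method among several, applied to a hypothetical population of 20 units. Each unit then has a selection probability of n/N, which the simulation checks empirically for one unit.

```python
import random

population = list(range(20))  # hypothetical population of N = 20 units
sample_size = 5               # n

# Under simple random sampling, every unit has the same known probability n/N.
theoretical = sample_size / len(population)

# Empirical check: how often is unit 0 selected over many repeated draws?
rng = random.Random(1)  # fixed seed so the simulation is reproducible
draws = 20_000
hits = sum(0 in rng.sample(population, sample_size) for _ in range(draws))
empirical = hits / draws

print(f"theoretical selection probability: {theoretical}")  # 0.25
print(f"empirical selection frequency:     {empirical:.3f}")
```

With a non-probability method, such as surveying whoever volunteers, no comparable calculation is possible because the chance of any given unit ending up in the sample is unknown.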
