# Selection of a sample


Sampling allows statisticians to draw conclusions about a whole by examining a part. It enables us to estimate characteristics of a population by directly observing a portion of the entire population. Researchers are not interested in the sample itself, but in what can be learned from the survey—and how this information can be applied to the entire population.

It is essential that a sample survey be correctly defined and organized. If the wrong questions are posed to the wrong people, statisticians will not receive information that will be useful when applied to the entire population.

In the context of a national statistical agency like Statistics Canada, the following steps are needed to select a sample and ensure that this sample will fulfill its goals.

## Establish the survey's objectives

The first step in planning a useful and efficient survey is to specify the objectives with as much detail as possible. Without objectives, the survey is unlikely to generate usable results. Clarifying the aims of the survey is critical to its ultimate success. The initial users and uses of the data should be identified at this stage.

The pros and cons of a census versus a sample survey or the use of administrative records should be evaluated and a decision made as to the most appropriate method. (At this point, we will assume that a sample survey is the best way to proceed in order to obtain the information we need. This assumption will hold true for the remainder of the sample selection steps, even though many of the steps mentioned will also apply to the other methods.)

## Define the target population

The target population is the total population for which the information is required. For example, if you were to conduct a survey about the most popular types of cars in Saskatchewan, then the target population would be every car in Saskatchewan. The units that make up the population must be described in terms of characteristics that clearly identify them. Specifically, the target population is defined by the following characteristics:

• Nature of data required: about persons, hospitals, schools, etc.
• Geographic location: the geographic boundaries of the population have to be determined, as well as the level of geographic detail required for the survey estimate (by province, by city, etc.).
• Reference period: the time period covered by the survey.
• Other characteristics, such as socio-demographic characteristics (interest in a particular age group, for example) or type of industry.

## Decide on the data to be collected

The data requirements of the survey must be established. To ensure that the requirements are operationally sound, the necessary data terms and definitions also need to be determined.

## Set the level of precision

As mentioned in the section on Sampling error, there is a level of uncertainty associated with estimates coming from a sample. For example, if you are trying to estimate the average distance between home and school for students in your class of 25 from a sample of 5 students, your estimate will depend on who the 5 sampled students are. If the 5 sampled students all happen to live close to the school, the estimate will not represent the class accurately. This sample-to-sample variation is what causes the sampling error. Statisticians can estimate the sampling error associated with a particular sampling plan, and try to minimize it.

When designing a survey, the acceptable level of uncertainty in the survey estimates has to be established. This level depends on what the end use of the results will be and on the size of the overall budget. The bigger the budget, the more resources are available and the smaller the sampling error that can be achieved. If the results are to serve a specific purpose, the acceptable level of uncertainty will be smaller than if the survey is simply looking for general trends.

The level of uncertainty will also be determined by the sample size. Increasing the sample size will decrease the sampling error. (If you sample 24 out of 25 students in your class, there will not be as much sample-to-sample variation as there would be if you sampled only 5 of the 25 students.)
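The classroom example can be illustrated with a short simulation. This is only a sketch: the distances are made-up values, and the function name `spread_of_sample_means` and its parameters are hypothetical, chosen for illustration. It repeatedly draws samples of 5 and of 24 students and compares how much the sample averages vary from one sample to the next.

```python
import random
import statistics

def spread_of_sample_means(distances, sample_size, n_draws=2000, seed=0):
    """Draw many samples without replacement and return the
    standard deviation of the resulting sample means."""
    rng = random.Random(seed)
    means = [
        statistics.mean(rng.sample(distances, sample_size))
        for _ in range(n_draws)
    ]
    return statistics.pstdev(means)

# Hypothetical home-to-school distances (km) for a class of 25 students.
data_rng = random.Random(42)
class_distances = [round(data_rng.uniform(0.2, 10.0), 1) for _ in range(25)]

spread_5 = spread_of_sample_means(class_distances, 5)
spread_24 = spread_of_sample_means(class_distances, 24)

print(f"Spread of sample means with n=5:  {spread_5:.2f} km")
print(f"Spread of sample means with n=24: {spread_24:.2f} km")
```

With 24 of the 25 students sampled, each sample leaves out only one student, so the averages barely change between samples. With 5 students, the averages vary much more.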

## The sample design

Once the objectives, guidelines and definitions have been worked out, the statistician can work on the survey plan. The survey plan is divided into three parts:

• Sample design: how the sample will be collected.
• Estimation techniques: how the results from the sample will be extended to the whole population.
• Measures of precision: how the sampling error will be measured.

The estimation techniques and measures of precision are discussed in a later section. For the moment, we will look at the sample design. The following steps lead to the complete determination of the sample design:

1. Determine what the survey population will be (e.g., students, men aged 20 to 35, newborn babies, etc.).
2. Choose the most appropriate survey frame.
3. Define the survey units.
4. Establish the sample size (e.g., a sample of 100 from a population of 1,000).
5. Select a sampling method.

## The survey population

The target population must be defined early in the survey-designing process. This is the population for which information is required. However, some members of the population have to be excluded because of operational constraints: the high cost of collecting data in some remote areas, the difficulty of identifying and contacting certain components of the target population, etc. For example, because it would be too difficult to locate and survey every car owned by residents of Saskatchewan, the survey might be limited to cars in the major cities and towns. When some members of the target population are excluded, the population that remains is called the survey population, or sometimes the observed population. The target population is the population we want to observe, while the survey population is the population we can observe.

The goal of this process is to have the survey population as close as possible to the target population. It is also very important that the users of the data be informed of the differences between the two populations, as the results of the survey will apply only to the survey population.

For example, a target population for a survey could be all Canadians aged 15 years and over (on a particular reference date), while the survey population could exclude residents of the Yukon, Nunavut and Northwest Territories, persons living on Aboriginal reserves, full-time members of the Canadian Armed Forces and residents of institutions. These Canadians might be excluded for various reasons: to survey people in the territories might prove to be difficult and expensive, military personnel may not be available for surveying if they are out on a mission, etc. Using this example, about 2% of the target population would be excluded from the survey population.

## The survey frame

The survey frame, also called the sampling frame, is the tool used to gain access to the population. There are two types of frames: list frames and area frames. A list frame is a list of names and addresses that provides direct access to 'individuals' (e.g., a list of hospitals, a list of restaurants, a list of students at a university). An area frame is a list of geographic areas that provides indirect access to individuals (e.g., the neighbourhoods in a city). This type of access is called indirect because geographic areas must be selected first, and then access to the individuals within each selected area must be worked out.

For instance, suppose that you were surveying a rural town in Quebec to see what percentage of residents are farmers. If you were provided with an area frame, then you would be able to locate which roads to visit, but you would still have to find out the names and addresses of the residents on each road.

When there is no single frame that is appropriate, multiple frames can be used. Some sampling techniques using both types of frames will be discussed later.

A good frame should be complete and up-to-date; no member of the survey population should be excluded from the frame or duplicated on the frame (represented more than once); and no unit that is not part of the population (e.g., deceased persons) should be on the frame. The frame chosen will impact the selected survey population. For instance, if a list of telephone numbers is used to select a sample of households, then all households without telephones are excluded from the survey population.
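The three frame problems described above (missing members, duplicates and units that do not belong) can be sketched as simple set comparisons. The function `check_frame`, the household identifiers and the labels "undercoverage" and "overcoverage" are assumptions made for this illustration. In practice, the full survey population is rarely available as a list, so checks like this are usually made against another reference source.

```python
def check_frame(frame_ids, population_ids):
    """Compare a frame with the survey population and report
    missing, out-of-scope and duplicated units."""
    frame_set = set(frame_ids)
    population_set = set(population_ids)

    # A unit is a duplicate if it appears more than once on the frame.
    seen = set()
    duplicates = set()
    for unit in frame_ids:
        if unit in seen:
            duplicates.add(unit)
        seen.add(unit)

    return {
        # Members of the population that the frame cannot reach.
        "undercoverage": sorted(population_set - frame_set),
        # Frame entries that are not part of the population.
        "overcoverage": sorted(frame_set - population_set),
        "duplicates": sorted(duplicates),
    }

# Hypothetical household identifiers.
population = ["H01", "H02", "H03", "H04", "H05"]
frame = ["H01", "H02", "H02", "H04", "H99"]

report = check_frame(frame, population)
print(report)
```

Here H03 and H05 cannot be selected at all, H02 has two chances of selection, and H99 could be selected even though it is not part of the population.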

## The survey units

There are three types of units that have to be accurately identified in order to avoid problems during the selection, data collection and data analysis stages. They are as follows:

• The sampling unit is part of the frame and therefore subject to being selected.
• The respondent unit or reporting unit provides the information needed by the survey.
• The unit of reference or unit of analysis—the unit about which information is provided—is used to analyse the survey results.

For example, in a survey about newborns in Edmonton, the sampling unit might be a household, the reporting unit one of the parents or a legal guardian, and the unit of reference the baby.

The sampling units may differ depending on the frame used. This is why the survey population, survey frame and survey units are defined in conjunction with one another.

## The sample size

The level of precision needed for the survey estimates will impact the sample size. However, it is not as easy to determine the sample size as one may think. Generally, the actual sample size of a survey is a compromise between the level of precision to be achieved, the survey budget and any other operational constraints, such as time. In order to achieve a certain level of precision, the sample size will depend, among other things, on the following factors:

• The variability of the characteristics being observed: If every person in a population had the same salary, then a sample of one person would be all you would need to estimate the average salary of the population. If the salaries are very different, then you would need a bigger sample in order to produce a reliable estimate.
• The population size: To a certain extent, the bigger the population, the bigger the sample needed. But once you reach a certain level, an increase in population no longer affects the sample size. For instance, the necessary sample size to achieve a certain level of precision will be about the same for a population of one million as for a population twice that size.
• The sampling and estimation methods: Not all sampling and estimation methods have the same level of efficiency. You will need a bigger sample if your method is not the most efficient. But because of operational constraints and the unavailability of an adequate frame, you cannot always use the most efficient technique.
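The effect of variability and population size can be illustrated with a commonly used textbook formula for estimating a proportion under simple random sampling, with a finite population correction. This formula is not given in the text above and is only one of several possible approaches. The function name, the 5% margin of error and the 95% confidence level (z = 1.96) are assumptions made for this sketch.

```python
import math

def sample_size_for_proportion(population_size, margin_of_error=0.05,
                               p=0.5, z=1.96):
    """Approximate sample size needed to estimate a proportion p
    within the given margin of error, assuming simple random sampling."""
    # Sample size for a very large population; p * (1 - p) measures variability.
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    # Finite population correction: matters for small populations only.
    n = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n)

for size in (500, 5_000, 1_000_000, 2_000_000):
    print(f"Population {size:>9,}: sample of {sample_size_for_proportion(size)}")

# Less variability in the characteristic (p far from 0.5) needs a smaller sample.
print("Low variability:", sample_size_for_proportion(1_000_000, p=0.1))
```

Under these assumptions, the required sample grows with the population at first, but it is the same for a population of one million as for a population of two million.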

## The sampling method

There are two types of sampling methods: probability sampling and non-probability sampling. The difference between them is that in probability sampling, every unit has a 'chance' of being selected, and that chance can be quantified. This is not true for non-probability sampling, where the chance of selecting a given unit cannot be quantified and some units may have no chance of being selected at all. The next section will describe features of both types of sampling and detail some of the methods related to each type.
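As a minimal sketch of what a quantifiable chance of selection means, the example below uses simple random sampling, one probability method among several. The frame of ten units and the function name are hypothetical. Each unit's chance of selection is the sample size divided by the frame size, and repeating the selection many times shows the observed selection rates settling near that value.

```python
import random
from collections import Counter

def simple_random_sample(frame, n, rng):
    """Select n units from the frame without replacement,
    giving every unit the same known chance of selection."""
    return rng.sample(frame, n)

# Hypothetical frame of 10 units.
frame = [f"unit_{i}" for i in range(10)]
n = 3

# Under simple random sampling the chance of selection is n / N.
inclusion_probability = n / len(frame)

# Repeat the selection many times and count how often each unit is chosen.
rng = random.Random(1)
n_repeats = 20_000
counts = Counter()
for _ in range(n_repeats):
    counts.update(simple_random_sample(frame, n, rng))

observed = {unit: counts[unit] / n_repeats for unit in frame}

print(f"Known chance of selection: {inclusion_probability:.2f}")
for unit in frame:
    print(f"{unit}: selected in {observed[unit]:.3f} of the repeats")
```

Other probability methods may give units unequal chances of selection. What makes them probability methods is that those chances are known.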