Appendix A - Methodology

The Survey on the Vitality of Official-language Minorities methodology
Warnings and limits pertaining to the interpretation and the use of the data

The Survey on the Vitality of Official-language Minorities methodology

Survey population
Survey instrument
Sampling plan
Data sources
Error detection and verification
Estimation
Evaluation of quality
Disclosure control
Measure of data quality
Significant difference between two estimates

Survey population

The survey's target population includes children under 18 years of age who have at least one parent (aged 15 years or over) who belongs to the official-language minority. It also includes adults 18 years of age or over who belong to the official-language minority. The survey covers the 10 provinces and 3 territories and excludes all persons living in collective dwellings, in institutions, on Indian reserves and in northern Inuit communities in Quebec. The survey also excludes non-permanent residents (persons with work or study permits and those with refugee status).

Official-language minorities are essentially defined as Francophones outside of Quebec and Anglophones in Quebec. People with a non-official language as their mother tongue are also part of the survey population based on their knowledge and use of French or English. The variables used to determine whether a person was included in the survey's target population were mother tongue, knowledge of official languages, and the language spoken most often at home. A more detailed description of the criteria used to define the survey population can be found in Section 1: Context and survey information.

Survey instrument

Two questionnaires were developed in consultation with external clients: an adult questionnaire and a child questionnaire. There were several waves of testing during the development of each questionnaire. Qualitative testing was done during several stages of development, and a pilot test took place one year before the actual survey.

Sampling plan

The SVOLM is a post-censal survey. This means that the sample for the survey was selected from individuals who completed the long questionnaire in the 2006 Census, which is systematically distributed to approximately one in five households. Answers to the questions on mother tongue, knowledge of official languages, and the language spoken most often at home allowed for the identification of the target population.

Next, a stratified sample of people in the target population was taken. The strata are defined by the cross-classification of the ten provinces and some sub-provincial regions (in New Brunswick, Quebec and Ontario) and seven age groups (0 to 4 years, 5 to 11 years, 12 to 17 years, 18 to 24 years, 25 to 44 years, 45 to 64 years, and 65 years and over). The territories were regrouped and only two age groups were considered, age 0 to 17 years and 18 years and over. Other stratification variables were used to allocate the sample; these were the concentration of the target population in the individual's region, being Allophone or not, as well as geographical sub-regions for certain areas.

Thus, a sample of 30,794 adults and 22,362 children was selected for a total of 53,156 people in the survey.

Data sources

Data collection started on October 10, 2006 and ended on January 15, 2007. Participation in the SVOLM was voluntary. Data collection was done by computer-assisted telephone interviewing (CATI). Data were collected directly from selected respondents. Proxy interviews were not permitted for the adult sample. For the child sample, a respondent for the child was chosen a priori from the sampling frame. This was usually one of the child's parents or, on rare occasions, one of the child's grandparents if the child lived with their grandparents rather than their parents. Since the child's belonging to the target population depends on the parents' (or grandparents') belonging to the official-language minority, it was important to contact the selected parent for the interview and not just any adult in the household. If the selected parent was absent for the duration of the survey, it was possible to conduct the interview with the other parent, but only if the other parent was also part of the official-language minority. The questionnaire allowed for these situations to be identified.

Error detection and verification

The computer system used by the interviewers to collect respondent data allowed for the prevention of a number of errors. When an impossible, improbable or incoherent answer was entered into the system by the interviewer, the system displayed a message which allowed the interviewer to correct typing errors or verify information with the respondent, without permitting them to continue until the error was fixed. Checking of some interviews was done by the interviewer supervisors and feedback was provided in order to avoid the repetition of errors.

Once collection was finished, a data processing system was implemented. This included data validation, checking the consistency between sections, coding of write-in responses, checking the consistency of the relationships among members of the household, derivation of a response status, and validating the flow of the questionnaire.

Estimation

After data processing, the next step consists of attributing a weight to each record in the sample. The weight calculation consists of three main steps: (1) calculation of the initial weight, (2) adjustment for non-response, and (3) post-stratification.

  1. In the first step, the inverse of the probability of selection was attributed as the initial weight to each record in the sample. Therefore, this weight reflects the sampling design that was used.
  2. The correction of the weights for total non-response was done using a method that predicts the probability of response. The probability of response of the respondents and the non-respondents was estimated using a logistic regression model. Response classes based on the response probabilities predicted by the model were then formed with the help of a classification algorithm. Once the classes were formed, the mass of the weights of the non-respondents was transferred to the respondents within each class. The correction for non-response was done in three parts for each of the two samples: adjustment for "non-contact", adjustment for refusals and adjustment for out of scope records. Since the variables explaining each type of non-response are different, it was preferable to construct different models.
  3. Post-stratification consists of correcting the weights of respondent records in such a way that totals for certain variables such as province, region and age group are consistent with the corresponding Census totals.
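The three weighting steps above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the SVOLM production method: the function names are hypothetical, the response classes are taken as given (the survey derived them from a logistic regression model and a classification algorithm), and the separate adjustments for non-contact, refusals and out of scope records are collapsed into a single adjustment.

```python
from collections import defaultdict

def initial_weights(selection_probs):
    """Step 1: initial weight = inverse of the probability of selection."""
    return [1.0 / p for p in selection_probs]

def nonresponse_adjust(weights, responded, response_class):
    """Step 2: within each response class, transfer the weight of the
    non-respondents to the respondents (non-respondents end with weight 0).
    The classes are assumed to be given; building them is not shown."""
    class_total = defaultdict(float)
    respondent_total = defaultdict(float)
    for w, r, c in zip(weights, responded, response_class):
        class_total[c] += w
        if r:
            respondent_total[c] += w
    return [w * class_total[c] / respondent_total[c] if r else 0.0
            for w, r, c in zip(weights, responded, response_class)]

def post_stratify(weights, post_stratum, control_totals):
    """Step 3: scale the weights so that each post-stratum sums to its
    control total (for example, a Census count)."""
    current = defaultdict(float)
    for w, s in zip(weights, post_stratum):
        current[s] += w
    return [w * control_totals[s] / current[s] if current[s] > 0 else 0.0
            for w, s in zip(weights, post_stratum)]

# Hypothetical four-person sample.
w0 = initial_weights([0.1, 0.1, 0.2, 0.2])
w1 = nonresponse_adjust(w0, [True, False, True, True], ["a", "a", "b", "b"])
w2 = post_stratify(w1, ["x", "x", "y", "y"], {"x": 30.0, "y": 8.0})
```

After the last step, the weights in each post-stratum add up to the corresponding control total, while the non-respondent keeps a weight of zero.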

The sampling error was estimated using the "bootstrap" method. Bootstrap weights were calculated and adjusted using the same steps as the survey weights. Hence, it is possible to estimate the sampling variance of each estimate and present it as a coefficient of variation (CV).
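As a rough illustration of how bootstrap weights lead to a CV, the sketch below recomputes a weighted total once per set of bootstrap weights and measures the spread of those replicate estimates around the full-sample estimate. The function names and the toy data are hypothetical, and the variance formula used here (mean squared deviation from the full-sample estimate) is a common convention that is assumed rather than taken from the SVOLM documentation.

```python
import math

def weighted_total(values, weights):
    """Weighted estimate of a total."""
    return sum(v * w for v, w in zip(values, weights))

def bootstrap_cv(values, survey_weights, bootstrap_weights):
    """CV of a weighted total, given one list of weights per bootstrap replicate."""
    estimate = weighted_total(values, survey_weights)
    replicates = [weighted_total(values, bw) for bw in bootstrap_weights]
    # Assumed convention: average squared deviation from the full-sample estimate.
    variance = sum((r - estimate) ** 2 for r in replicates) / len(replicates)
    return math.sqrt(variance) / estimate

# Hypothetical data: three records, two bootstrap replicates.
cv = bootstrap_cv(
    values=[1, 1, 0],
    survey_weights=[10.0, 10.0, 10.0],
    bootstrap_weights=[[12.0, 10.0, 10.0], [8.0, 10.0, 10.0]],
)
```

In practice many more replicates than two would be used; the toy example only keeps the arithmetic easy to follow.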

Evaluation of quality

Qualitative testing took place at different stages during the development of the survey. In this way, it was assured that questions were understood by respondents and that questions allowed a correct measurement of the concepts. A pilot survey also took place about one year before the survey to evaluate all procedures, from the content of the questionnaire to the analysis of the data.

In order to limit non-response and to minimize measurement error, interviewers were trained by members of the SVOLM team. Interviewers were given manuals clearly describing all the procedures and were supervised by experienced people who could correct them and help them at any time. Follow-up for people who refused to answer the survey was also done by senior interviewers to try and reduce non-response. Furthermore, during interviews, interviewers used a function built into the system to record any comments which helped in resolving certain invalid responses or incorrect interpretations. These notes were very useful during data processing.

Disclosure control

Statistics Canada is prohibited by law from releasing any data that would divulge information obtained under the Statistics Act that relates to any identifiable person, business or organization without the prior knowledge or the consent in writing of that person, business or organization. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data.

In order to ensure confidentiality, estimates based on a group of fewer than 10 people in the sample cannot be published. Rounding is an additional measure used to ensure confidentiality. Hence, all population counts and totals are rounded to the nearest multiple of ten, whereas all ratios and percentages are rounded to the nearest unit.
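The suppression and rounding rules just described could be applied along the following lines. The function names are hypothetical, and the handling of exact ties (rounding halves up) is an assumption, since the text does not say how ties are treated.

```python
import math

MIN_SAMPLE_SIZE = 10  # estimates based on fewer sampled people are suppressed

def round_half_up(x):
    """Round to the nearest integer, with halves rounded up (assumed tie rule)."""
    return math.floor(x + 0.5)

def publishable_count(weighted_count, sample_size):
    """Return the count rounded to the nearest multiple of ten, or None if suppressed."""
    if sample_size < MIN_SAMPLE_SIZE:
        return None
    return round_half_up(weighted_count / 10.0) * 10

def publishable_percent(percent, sample_size):
    """Return the percentage rounded to the nearest unit, or None if suppressed."""
    if sample_size < MIN_SAMPLE_SIZE:
        return None
    return round_half_up(percent)
```

For instance, a weighted count of 1,234 based on 25 sampled people would be released as 1,230, while any estimate based on 9 sampled people would not be released at all.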

Measure of data quality

The errors that occur in surveys can be separated into two categories, depending on whether they were caused by sampling or not.

Sampling errors are mainly due to the fact that only a sample, not the entire population, is being used for analysis. Therefore, they cannot be avoided completely, but it is possible to measure their magnitude. A measure is thus provided for each cell in all disseminated data tables. For a given estimate, this measure is presented as the coefficient of variation (CV), which is the ratio of the standard error of the estimate (the square root of its variance) to the estimate itself. The CV thus expresses the standard error as a proportion of the estimate. So, the smaller the CV, the more reliable the corresponding estimate. The CVs that accompany the estimates in the SVOLM tables were calculated using the "bootstrap" method.

According to the guidelines that govern Statistics Canada's publications, all disseminated data must be accompanied by a quality measure. Using the CV, an estimate can be classified into one of the following dissemination categories:

  • If the CV is less than or equal to 16.5%, the estimate can be disseminated.
  • If the CV is greater than 16.5% but less than or equal to 33.3%, the estimate should be used with caution since it has a higher error associated with it. The estimates in this category are accompanied by the letter "E".
  • If the CV is greater than 33.3%, it is preferable not to disseminate the estimate since the associated error is too high.
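The three dissemination categories above translate directly into a small classification function. The function name and the category labels are hypothetical; only the 16.5% and 33.3% thresholds come from the text. The CV is expressed in percent here.

```python
def dissemination_category(cv_percent):
    """Classify an estimate by its CV (in percent) using the thresholds above."""
    if cv_percent <= 16.5:
        return "disseminate"
    if cv_percent <= 33.3:
        return "use with caution (E)"
    return "do not disseminate"
```

An estimate with a CV of 20%, for example, would fall in the middle category and be flagged with the letter "E".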

Non-sampling errors cannot be easily estimated, but they can be avoided. These errors can occur during any stage of a survey. They include coverage, non-response, measurement and processing errors.

Concerning potential coverage errors, it was possible to avoid most such cases by using the Census as the sampling frame, which provides very good coverage of the Canadian population. However, because the Census is a self-completed survey and it allows proxy responses, it is possible that errors occurred in the responses to the language questions. For example, an adult in a given household could have answered the three language questions on behalf of the other adults in the household without having sufficient knowledge to do so. Such a situation could cause the inclusion of individuals in the target population when they should not be there (over-coverage) or the exclusion of individuals who should be in the target population (under-coverage). Because the three language "filter" questions were asked of respondents during the survey, part of the over-coverage could be corrected. Once it was determined that an individual was not part of the target population, the individual was excluded from the sample and the weighting was adjusted accordingly. On the other hand, it was impossible to correct for under-coverage, and this error is difficult to quantify.

The delay between the Census and the collection of a post-censal survey is also an important factor in coverage error. Answers to certain questions are, in fact, likely to change over time. For example, the answer to the question on the knowledge of official languages may change if the individual learned the other official language during the period. Equally, the language spoken at home may also have changed. Even though for most cases the information is relatively stable over a short period of time, it is still important to minimize the delay between the creation of the sampling frame and the collection of data. Thanks to changes in the methodology and operations of the Census, it was possible to select samples quickly and thus reduce the risk of changes in the answers to the filter questions.

The response rate for the survey is approximately 73% (for the adult and child samples combined). The units which are excluded from the target population, the out of scopes, are not included in the calculation of this rate since the sample size was already inflated to take these losses into account. If, however, one is interested in the proportion of individuals who completed the entire questionnaire from everyone selected for the survey, a rate of 67% is observed; in this case the out of scopes are considered to be non-respondents. From another point of view, the out of scopes could also be considered as respondents since they were contacted and interviewed. However, this option was not retained here.
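The three ways of treating the out of scopes described above can be written as simple ratios. The sketch below is one reading of the paragraph, with a hypothetical function name and made-up counts; the actual SVOLM counts are not reproduced here.

```python
def response_rates(selected, respondents, out_of_scope):
    """Response rate under three treatments of out of scope units."""
    return {
        # Out of scopes removed from the calculation (the rate retained in the text).
        "excluding_out_of_scope": respondents / (selected - out_of_scope),
        # Out of scopes counted as non-respondents.
        "out_of_scope_as_nonrespondents": respondents / selected,
        # Out of scopes counted as respondents (option not retained in the text).
        "out_of_scope_as_respondents": (respondents + out_of_scope) / selected,
    }

# Hypothetical counts, for illustration only.
rates = response_rates(selected=1000, respondents=600, out_of_scope=100)
```

With these made-up counts, the three rates are about 67%, 60% and 70% respectively, which shows how much the choice of convention can move the reported figure.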

In general, response rates were relatively similar between the different regions in Canada. Nevertheless, some regions or provinces had a response rate well below the others: Newfoundland and Labrador, Toronto, British Columbia and the territories. These regions had both higher out of scope and non-response rates than the other regions. Because there were fewer respondents, the potential for analysis is more limited in certain tables, and the precision of the estimates for these regions is not as good as for the other regions. Also, for certain age groups in these regions, the rates of out of scopes and non-response are so high that it is difficult to guarantee that the estimates are without bias. Hence, caution is advised when analyzing the results for the following groups: 18 to 24 year olds in Newfoundland and Labrador, Toronto and British Columbia, as well as people aged 65 years and over in Newfoundland and Labrador, Toronto and the region called the rest of New Brunswick.

Measurement and processing errors are difficult to quantify, but steps were taken so that they were minimized during the development of the CATI application. This application was tested and corrected during the various stages of the development of the survey.

Significant difference between two estimates

When comparing two estimates, one must determine whether the difference between them is statistically significant before drawing conclusions. Since there is an error associated with each estimate, it is possible that, although two estimates seem to be different, their associated errors are so high that one cannot affirm that they are in fact different. The recommended method when one has access to the estimates and their CVs is the confidence interval overlap method. For each estimate, the 95% confidence interval is calculated. If the two intervals overlap, then one cannot conclude that the two estimates are different (or, in more technical terms, with a 95% confidence level, the null hypothesis that there is no difference between the two estimates cannot be rejected). If the two intervals do not overlap, then it is possible to conclude that the two estimates are different (in more technical terms, with a 95% confidence level, the null hypothesis that there is no difference between the two estimates can be rejected).

When an estimate and its standard error are available, a 95% confidence interval (CI95) is constructed as follows:

CI95 = estimate ± 1.96 × standard error

Since it is the CV that is provided with the estimates and the CV is obtained using the standard error, the formula can be rewritten as follows:

CI95 = estimate ± 1.96 × [ CV × estimate ]
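The confidence interval overlap method can be sketched as follows. The function names are hypothetical, and the CV is assumed to be expressed as a proportion (for example, 0.10 for 10%) rather than in percent.

```python
def ci95(estimate, cv):
    """95% confidence interval from an estimate and its CV (as a proportion)."""
    half_width = 1.96 * cv * abs(estimate)
    return (estimate - half_width, estimate + half_width)

def significantly_different(est1, cv1, est2, cv2):
    """True if the two 95% confidence intervals do not overlap."""
    low1, high1 = ci95(est1, cv1)
    low2, high2 = ci95(est2, cv2)
    return high1 < low2 or high2 < low1

# Hypothetical estimates: 1,000 (CV 10%) compared with 1,500 and 1,300 (CV 5%).
clearly_apart = significantly_different(1000, 0.10, 1500, 0.05)
overlapping = significantly_different(1000, 0.10, 1300, 0.05)
```

In the first comparison the intervals are roughly (804, 1196) and (1353, 1647), so the estimates are declared different; in the second, the interval (1172.6, 1427.4) overlaps (804, 1196), so no difference can be concluded.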

Warnings and limits pertaining to the interpretation and the use of the data

Two distinct samples: Adults and children
Sampling frame for Newfoundland and Labrador and Prince Edward Island
Allophones using French in Montreal
Large weights
Allophones and out of scopes (adult sample)
Out of scopes
Incomplete coverage of the 0 to 4 year old age group

Two distinct samples: Adults and children

The SVOLM data comes from two distinct and non-complementary samples, a sample of adults (individuals aged 18 years and over) who are part of the official-language minority and a sample of children who are under 18 years of age but who have at least one parent (aged 15 years and over) who is part of the official-language minority. Thus, it is not necessary for a child to be part of the minority in order for them to be included in the sample. For this reason, the results from the two samples cannot be combined. Therefore, no grand total of individuals who are part of the minority will be published.

Sampling frame for Newfoundland and Labrador and Prince Edward Island

Since the target population is relatively small in Newfoundland and Labrador and Prince Edward Island, the Census long form questionnaire distributed to one in five households did not lead to the identification of enough people to guarantee good quality estimates from the SVOLM sample. Hence, it was necessary to also make use of the short form questionnaire in order to have access to the entire target population. The drawback of using the short form is that it is less precise in identifying the target population than the long form, since it contains only one question about language (the mother tongue), whereas three language questions from the long form are used.

Allophones using French in Montreal

In the Montreal region, where the survey was primarily interested in the situation of allophones, a sample of allophones who use French (and not English, the language of the minority) was also selected. This part of the population is only used to obtain a complete portrait of allophones in the Montreal region and can only be used for this. These individuals will always form a separate group in the tables and cannot be aggregated with the rest of the sample at any time.

Large weights

It is not very efficient to select a large number of people from a homogeneous environment where the answers obtained would be very similar. It is preferable to have diversity in the environments from which people are selected. Compared to a purely proportional sampling strategy, the adopted sampling strategy increased the sampling fraction in the less homogeneous environments. This measure was used in order to increase the efficiency of the sampling. However, the further the sampling strategy is from being proportional, the more the variability of the weights increases. Hence, it is possible, in certain tables, that one or several observations with relatively large weights have a big influence on the estimate. The weighting strategy was designed to limit this sort of situation, but it is always preferable to study the distribution of the weights when surprising results are observed.
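One possible way to study the distribution of the weights behind a surprising estimate is sketched below. The diagnostics chosen here (the share of the total weight carried by the largest weight, and Kish's approximate design effect due to unequal weighting) are common in survey practice but are not prescribed by the SVOLM documentation, and the function name is hypothetical.

```python
def weight_diagnostics(weights):
    """Simple summaries of how unequal a set of survey weights is."""
    n = len(weights)
    total = sum(weights)
    return {
        # Share of the weighted total carried by the single largest weight.
        "max_share": max(weights) / total,
        # Kish's approximation: n * sum(w^2) / (sum(w))^2; equals 1 for equal weights.
        "kish_deff": n * sum(w * w for w in weights) / total ** 2,
    }

# Hypothetical table cells: one with equal weights, one dominated by a single record.
balanced = weight_diagnostics([10.0, 10.0, 10.0, 10.0])
dominated = weight_diagnostics([1.0, 1.0, 1.0, 97.0])
```

In the second cell a single record carries 97% of the weighted total, which is the kind of situation in which an estimate should be interpreted with particular care.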

Allophones and out of scopes (adult sample)

Allophones who use the language of the minority were covered proportionally within every geographic domain. So, approximately the same proportion of allophones is found in the sample as in the population. In Toronto and in the province of British Columbia, allophones represent a large proportion of the target population (more than 50%) and, as a consequence, of the sample. A relatively high rate of out of scopes was observed among the allophones in the survey (about 26%). The main reason that such a high rate of out of scopes is observed for these two regions (20% and 14% respectively) is the number of allophones in the sample. In fact, allophones who ended up as out of scope represent 72% of all out of scopes in Toronto and 51% of out of scopes in British Columbia.

Out of scopes

Some domains of estimation have a high enough rate of out of scopes that the results associated with those domains should be used with caution. This is the case for the 5 to 11 year old (21%) and the 18 to 24 year old (28%) age groups in Newfoundland and Labrador. Part of the problem can be explained by the fact that a portion of the sample in that province was selected from the Census short form for which a higher rate of out of scopes was observed than for people who completed the long form. In Toronto and British Columbia, the out of scopes represent a non-negligible proportion of the sample for all age groups (see the preceding section), but it is in the 18 to 24 year old age group where the situation is the most problematic, with a rate of 32% in Toronto and 23% in British Columbia. Finally, in the territories, the children's group is most affected with 20% of cases being out of scopes, compared to 13% for the adult group.

Incomplete coverage of the 0 to 4 year old age group

Given that approximately five months went by between Census day and the beginning of collection for the SVOLM, and that the date of reference for the calculation of age was the beginning of collection, it is impossible to state that children under 5 months of age were covered by the survey. Hence, even though there are a number of babies aged 5 months or less in the sample, they should not be the object of a specific analysis since the coverage of this group is incomplete.