# 6. Weighting

To calculate initial weights, “false Aboriginal individuals” were treated as though they never existed, that is, they were excluded from the survey frame and the sample (please refer to section 3.2.3 “Sampling design and allocation of the sample” for more information).

In a sample survey, each selected person represents not only himself or herself, but also other persons who were not sampled. Consequently, a weight is associated with each selected person to indicate the number of persons that he or she represents. This weight must be used for all estimations. For example, in a simple random sample of 2% of the population, each person represents 50 persons in the population. The initial weight is then adjusted for such things as non-response and discrepancies between the characteristics of the sample and known totals for the target population (post-stratification). In all, six steps were used in the weighting process, as described in sections 6.1 to 6.6.

## 6.1 Initial weights

The initial weight of a unit in a given APS stratum corresponds to the product of two components: the inverse of the stratum sampling fraction and the NHS weight corrected for non-response to the NHS for the unit in question. The stratum sampling fraction is calculated as the number of people selected for the APS in each stratum divided by the total number of available NHS respondents for that stratum. The NHS weight used is the NHS sampling weight corrected for non-response, then capped to the 99th percentile, as calculated by the methodology team working on NHS estimation.
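The product described above can be sketched in a few lines. This is only an illustration: the function name, argument names and figures are hypothetical and are not taken from the APS processing system.

```python
def initial_weight(n_selected: int, n_available: int, nhs_weight: float) -> float:
    """Initial weight = (1 / stratum sampling fraction) * adjusted NHS weight.

    n_selected  : people selected for the APS in the stratum (hypothetical name)
    n_available : NHS respondents available in the stratum (hypothetical name)
    nhs_weight  : NHS weight already corrected for non-response and capped
    """
    if n_selected <= 0 or n_available <= 0:
        raise ValueError("stratum counts must be positive")
    # Inverse sampling fraction is n_available / n_selected.
    return nhs_weight * n_available / n_selected


# Illustrative figures only: 200 of 1,000 available NHS respondents selected,
# and an adjusted NHS weight of 3.0 for the unit in question.
print(initial_weight(n_selected=200, n_available=1000, nhs_weight=3.0))  # 15.0
```

With an NHS weight of 1 and a 2% sampling fraction, the same function returns 50, which matches the simple random sample example given in the introduction.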

## 6.2 Adjustment for units not sent to collection

A relatively small number of sampled units were not sent to the field, for various reasons. These included:

• cases where three members of the same household had already been selected;
• units without a name or date of birth;
• “wave 2 ineligible units”, that is, individuals selected in wave 2 in households where at least one individual had refused to participate in the survey in wave 1.

In the first two instances, a ratio adjustment was made by NHS region and Aboriginal group. In the third instance, a ratio adjustment was made by NHS region, Aboriginal group and education group. Within a region and Aboriginal group (or a region, Aboriginal group and education group in the third case), the weights of units removed were set to zero and the weights of the remaining units were increased proportionally (ratio adjustment).
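A ratio adjustment of this kind can be sketched as follows. The record layout (`weight`, `dropped`, `region`, `group`) and the sample figures are assumptions made for illustration; the sketch only shows that the total weight of each adjustment class is preserved.

```python
from collections import defaultdict


def ratio_adjust(units, class_key):
    """Zero the weights of dropped units and scale up the kept units in the
    same adjustment class so that the class weight total is unchanged."""
    total = defaultdict(float)  # weight of all units, by class
    kept = defaultdict(float)   # weight of units sent to collection, by class
    for u in units:
        k = class_key(u)
        total[k] += u["weight"]
        if not u["dropped"]:
            kept[k] += u["weight"]

    adjusted = []
    for u in units:
        k = class_key(u)
        # Kept units absorb the weight of the dropped units in their class.
        w = 0.0 if u["dropped"] else u["weight"] * total[k] / kept[k]
        adjusted.append({**u, "weight": w})
    return adjusted


# Hypothetical records: one class defined by region and Aboriginal group.
sample = [
    {"region": "ON", "group": "Métis", "weight": 10.0, "dropped": True},
    {"region": "ON", "group": "Métis", "weight": 10.0, "dropped": False},
    {"region": "ON", "group": "Métis", "weight": 20.0, "dropped": False},
]
result = ratio_adjust(sample, class_key=lambda u: (u["region"], u["group"]))
```

For the third case, the same function would apply with a class key that also includes the education group.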

## 6.3 Adjustment for non-response

Two adjustments were made, one for each of two types of non-response: “non-contact”, where no contact was made with the selected person or, for a selected child, with the child’s parent or guardian (2,981 adults and 770 children); and “non-response with contact”, where the person contacted did not (or could not) provide the information for themselves or their child (6,263 adults and 1,763 children). The second type of non-response is mainly associated with refusals or “disguised refusals”. An example of a “disguised refusal” might be a person contacted several times who continually postpones the interview. Two separate adjustments were made because the characteristics of people who could not be contacted often differ from those of people who refused when contacted.

The distinction between children and adults is made here based on age according to the NHS (and not age as of February 1, 2012 as measured on the APS), that is, under 15 years of age for children and 15 and over for adults. This is an important distinction because fewer characteristics explaining non-response are available for children than for adults on the NHS. Among children, it is not the characteristics of the child that influence response or non-response but rather the characteristics of the person responding for the child (parent or guardian). Consequently, for each child under 15 on the NHS, it was necessary to determine the person most likely to respond for the child, based on the child’s situation in the census family and regardless of whether or not a response to the APS was obtained for this child. In situations where the child’s parents or guardians lived as opposite-sex couples, preference was given to the female person (mother, grandmother, aunt, for example).

It should be mentioned that the definition of “non-contact” changed from the definition used at the time of the 2006 APS. Because the 2012 interviews were computer-assisted interviews (CAI) rather than the paper questionnaire used in 2006, a series of collection variables, referred to as “paradata”, were available for all units of the sample. In particular, information was collected for each contact attempt. A unit was deemed “non-contact” if none of the attempts resulted in contacting the person selected or the parent or guardian of the child selected. In 2006, non-contact was established based on the last contact attempt only. Consequently, in 2012 compared to 2006, there were proportionally fewer “non-contacts” and more “non-responses with contact”.

Weights were first adjusted for non-contact cases and then for non-response with contact, for adults and children separately. In what follows, the term “non-response” will be used for both types of non-response. The term “respondent” refers to the person completing the information for the selected person (usually themselves for the adults or a parent or guardian for the children).

Each non-response adjustment was done in three steps. First, a logistic regression model was used to predict the response probability (probability of obtaining a response) for each selected unit (for both responding and non-responding units) from a series of explanatory variables. These explanatory variables are divided into two groups. The first group consists of the “person” or “household” characteristics from the NHS for the person selected or for the parent or guardian of the child selected (for example, the Aboriginal group of that person, the number of people in the household of the person selected, etc.). The second group of explanatory variables consists of collection variables called “paradata”. The number of attempts to contact a subject and whether tracing was required are examples of paradata variables used in the logistic regression models. The paradata were found to be particularly good predictors of response or non-response, as many of these variables measure the effort required to contact a person or to obtain a response from a contacted person. For instance, individuals requiring a large number of attempts to be contacted were found to be very similar to individuals for whom no contact was made (all attempts failed).

In the second step, individuals (respondents and non-respondents) with similar response probabilities were grouped in adjustment classes using cluster analysis. A simulation was carried out to determine approximately the optimal number of classes and the minimum number of respondents per class. The response rate was derived for each class based on the number of respondents and non-respondents in the class. The derived response rate was weighted using the weights from the previous adjustment step.

In the third step, the inverse of the weighted response rate in a class was used as the adjustment factor for that class and the weights of the responding units within the class were adjusted accordingly. The weights of the non-responding units were set at zero.
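The second and third steps can be sketched as below, assuming the response probabilities from the first step have already been predicted and stored on each record. Two simplifications are made and should be read as assumptions: classes are formed by cutting the units, sorted by predicted probability, into groups of equal size (a stand-in for the cluster analysis used in the survey), and the record keys `weight`, `p_response` and `responded` are hypothetical.

```python
def nonresponse_adjust(units, n_classes):
    """Group units into classes of similar predicted response probability, then
    multiply respondent weights by the inverse of the weighted response rate
    of their class and set non-respondent weights to zero."""
    if not units:
        return []
    order = sorted(range(len(units)), key=lambda i: units[i]["p_response"])
    size = -(-len(order) // n_classes)  # ceiling division: units per class
    adjusted = [dict(u) for u in units]

    for start in range(0, len(order), size):
        members = order[start:start + size]
        w_all = sum(units[i]["weight"] for i in members)
        w_resp = sum(units[i]["weight"] for i in members if units[i]["responded"])
        if w_resp == 0:
            raise ValueError("class without respondents; use fewer classes")
        factor = w_all / w_resp  # inverse of the weighted response rate
        for i in members:
            if units[i]["responded"]:
                adjusted[i]["weight"] = units[i]["weight"] * factor
            else:
                adjusted[i]["weight"] = 0.0
    return adjusted


# Hypothetical units: weights come from the previous adjustment step.
sample = [
    {"weight": 10.0, "p_response": 0.2, "responded": False},
    {"weight": 10.0, "p_response": 0.3, "responded": True},
    {"weight": 5.0, "p_response": 0.8, "responded": True},
    {"weight": 5.0, "p_response": 0.9, "responded": True},
]
result = nonresponse_adjust(sample, n_classes=2)
```

In this illustration the low-probability class has a weighted response rate of 50%, so its one respondent's weight doubles, while the class with full response is left unchanged. The total weight is the same before and after the adjustment.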

It is important to note that at this stage, all units considered to be out of scope were classified as respondents. Indeed, all the required information was collected from these individuals to determine that they were out of scope. The weights of these out-of-scope units were set to 0 in the last step of the weight adjustment and these units were eliminated from the analytical file. Retaining them until the last step makes it possible to produce internally weighted estimates of different groups of units outside the target population. This will be very useful, for example, in estimating certain parameters at the time of the next survey.

## 6.4 Adjustment for partial respondents

Partial respondents are individuals who reported Aboriginal identity on the APS but did not complete enough information to meet the definition of respondent given in section 5. There were 157 partial respondents, which means their impact on the estimates should be minimal.

The adjustment was made by region, Aboriginal group and education group as measured on the NHS. A number of groupings were made by cross-tabulating these variables in order to obtain enough observations to calculate the adjustment factor. Knowing that these partial respondents had reported Aboriginal identity, only the weights for respondents of Aboriginal identity were increased to reflect partial respondents (the out-of-scope weights, including non-Aboriginal individuals on the APS, were not adjusted). The weights of partial respondents were then set at zero.
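The following sketch illustrates this adjustment under assumed record keys (`weight`, `status`, `cls`) and made-up figures. Within each adjustment class, only respondents with Aboriginal identity absorb the weight of partial respondents, and out-of-scope units are left untouched.

```python
from collections import defaultdict


def absorb_partials(units):
    """Transfer the weight of partial respondents to full respondents of the
    same class; out-of-scope units keep their weight. The 'status' values
    'respondent', 'partial' and 'out_of_scope' are hypothetical labels."""
    resp = defaultdict(float)
    part = defaultdict(float)
    for u in units:
        if u["status"] == "respondent":
            resp[u["cls"]] += u["weight"]
        elif u["status"] == "partial":
            part[u["cls"]] += u["weight"]

    adjusted = []
    for u in units:
        k = u["cls"]
        if u["status"] == "partial":
            w = 0.0
        elif u["status"] == "respondent":
            w = u["weight"] * (resp[k] + part[k]) / resp[k]
        else:
            w = u["weight"]  # out-of-scope weights are not adjusted here
        adjusted.append({**u, "weight": w})
    return adjusted


sample = [
    {"cls": "X", "status": "respondent", "weight": 10.0},
    {"cls": "X", "status": "respondent", "weight": 30.0},
    {"cls": "X", "status": "partial", "weight": 4.0},
    {"cls": "X", "status": "out_of_scope", "weight": 7.0},
]
result = absorb_partials(sample)
```

Here the respondents' weights are multiplied by 44/40, the partial respondent's weight becomes zero and the out-of-scope weight stays at 7.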

## 6.5 Post-stratification

Post-stratification ensures that the sum of the adjusted weights for the responding units corresponds to the NHS estimates according to different groups called post-strata.

In the case of the APS, two separate post-stratifications were carried out. The first post-stratification adjusts the weights of the Aboriginal identity or ancestry population from the NHS by post-stratum using the identity and ancestry variables from the RDB survey frame (see section 3.1.3) at the time of sample selection (and not the APS-measured variables, which are the subject of the second post-stratification). The post-strata are defined from certain combinations of region, Aboriginal type (identity or ancestry-only), Aboriginal group (Status First Nations, Non-Status First Nations, Métis, Inuit, other) and age group (6-14, 15-44, and 45 and over). The distinction between Status and Non-Status First Nations was used only for the provinces between Ontario and British Columbia. It is important to point out that the NHS estimates on which the weights were adjusted correspond exactly to the APS coverage, specifically, the identity or ancestry-only population aged 6 and over as of February 1, 2012, excluding people living on reserves and certain First Nations communities in the territories.

The weights were adjusted according to the ratio of the NHS weighted estimate to the sample weighted estimate for each post-stratum. As a result, the sample did not under- or over-represent certain combinations of Aboriginal groups, regions and age groups of the NHS.
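As a minimal sketch of this ratio, the function below rescales weights so that they sum to a control total in each post-stratum. The post-stratum labels and control totals are invented for the example; in the survey, the control totals are the NHS weighted estimates.

```python
def post_stratify(units, control_totals):
    """Multiply each weight by (control total / weighted sample total) of the
    unit's post-stratum. 'stratum' and 'weight' are hypothetical keys."""
    sample_totals = {}
    for u in units:
        s = u["stratum"]
        sample_totals[s] = sample_totals.get(s, 0.0) + u["weight"]

    return [
        {**u, "weight": u["weight"] * control_totals[u["stratum"]]
                        / sample_totals[u["stratum"]]}
        for u in units
    ]


sample = [
    {"stratum": "A", "weight": 10.0},
    {"stratum": "A", "weight": 30.0},
    {"stratum": "B", "weight": 20.0},
]
# Illustrative control totals only.
result = post_stratify(sample, control_totals={"A": 60.0, "B": 10.0})
```

Post-stratum A is under-represented in this example (40 against a control total of 60), so its weights are multiplied by 1.5. Post-stratum B is over-represented, so its weight is halved.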

Given that the responses to the questions defining the Aboriginal identity population (presented in section 3.1.1) may differ between the APS and NHS, a second post-stratification was carried out. Note that the APS questions defining the identity population are slightly different from those asked in the NHS (see Table 1 in section 2 and section 3.1.1). The second post-stratification ensured that the Aboriginal identity population estimated from the APS questions corresponded to the Aboriginal identity population defined according to the NHS within each post-stratum. Unlike the first post-stratification, the second one was not a “classical” post-stratification where weights were readjusted to address under- or over-representation of certain groups in the sample. Indeed, the answers to the questions on Aboriginal identity in the APS may have differed from those obtained by the NHS for a variety of reasons (section 8.1). This second post-stratification was more of a “practical” one that ensured that the Aboriginal identity population counts according to the APS were the same as those obtained on the NHS. After this step, only respondents with Aboriginal identity according to the APS had positive weights.

It is important to note that the 2012 APS processing and imputation system eliminated one category of Aboriginal identity, namely, the “Status Indian or member of a First Nation / Indian band only” group (see section 3.2.1). Respondents in this group were imputed as First Nations people in the 2012 survey. During the second post-stratification, individuals in this NHS group were also combined with First Nations people. Because it was impossible to preserve the multiple identity counts between the APS and NHS (counts too small or discrepancies too large), individuals reporting an identity of First Nations and Métis, First Nations and Inuit, or First Nations, Métis and Inuit were combined with individuals reporting a First Nations identity during the second post-stratification. Individuals reporting a Métis and Inuit identity were combined with Métis. The post-strata for the second post-stratification were formed from specific combinations of region, Aboriginal identity group (Status First Nations, Non-Status First Nations, Métis, Inuit) and age group (6-14, 15-44 and 45 and over).

## 6.6 Adjustment for extreme weights – Sigma gap method

Once the above weight adjustments were completed, some weights were very large compared to others, which could have created problems during estimation if the observations with large weights also had characteristics very distinct from those of the observations with smaller weights. A method referred to as the “sigma gap” method was used to detect these extreme weights within each post-stratum, the post-strata being closely linked to the survey’s domains of estimation (see section 3.2.1). Bernier and Nobrega (1998; see Note 1) describe one application of the sigma gap method. The sigma gap method used here was intended to detect “outlier values” (excessively large weights) by sorting the weights in descending order and calculating the difference between each pair of successive weights. This difference was compared to n times the standard deviation of the weights within the post-stratum. If the difference exceeded this threshold, the larger of the two weights was identified as an outlier. Once a weight was identified as an outlier, all weights larger than it in its post-stratum were automatically identified as outliers. These weights were then reduced to the value of the first (that is, largest) non-outlier weight. The mass removed from the reduced weights was then redistributed within the post-stratum by a ratio adjustment. After a number of scenarios were examined, a value of 2 was selected for n. This particular value for n made it possible to identify the weights that would intuitively have been considered outliers.
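The procedure can be sketched for a single post-stratum as follows. Several details are assumptions, since the text does not specify them: the sample standard deviation (`statistics.stdev`) is used, every gap between successive sorted weights is examined, and the removed mass is redistributed over all units of the post-stratum, including the capped ones. The function name and the figures are hypothetical.

```python
import statistics


def sigma_gap_trim(weights, n=2.0):
    """Cap weights lying above the first gap larger than n standard deviations,
    then rescale so the post-stratum total is unchanged."""
    if len(weights) < 2:
        return list(weights)
    threshold = n * statistics.stdev(weights)  # assumption: sample std. dev.
    ordered = sorted(weights)

    cap = None
    for lower, upper in zip(ordered, ordered[1:]):
        if upper - lower > threshold:
            cap = lower  # largest non-outlier weight
            break
    if cap is None:
        return list(weights)  # no outlier detected

    capped = [min(w, cap) for w in weights]
    # Ratio adjustment: redistribute the removed mass within the post-stratum.
    factor = sum(weights) / sum(capped)
    return [w * factor for w in capped]


# Illustrative post-stratum: the gap between 13 and 100 exceeds 2 standard
# deviations, so 100 is flagged, capped at 13, and the total is restored.
trimmed = sigma_gap_trim([10.0, 11.0, 12.0, 13.0, 100.0], n=2.0)
```

Scanning the sorted weights from the smallest upward gives the same set of outliers as the descending scan described above, because every weight above the lowest qualifying gap is flagged in either case.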

## Note

1. Bernier, J. and Nobrega, K. (1998). Outlier detection in asymmetric samples: A comparison of an inter-quartile range method and a variation of a sigma-gap method. Statistical Society of Canada Annual Meeting, June 1998.
