Canadian Survey on Disability, 2012: Concepts and Methods Guide
- 1. Introduction
- 2. Survey content: concepts and questions
- 3. CSD Sample design
- 4. Data collection
- 5. Data processing
- 6. Weighting and creation of final data files
- 7. Data quality
- 8. Differences between the 2012 Canadian Survey on Disability (CSD) and the 2006 Participation and Activity Limitation Survey (PALS)
- 9. Data dissemination
1.1 Survey overview
The 2012 Canadian Survey on Disability (CSD) is a national survey of Canadian adults whose everyday activities are limited because of a long-term condition or health-related problem. The CSD was developed by Statistics Canada in collaboration with Employment and Social Development Canada (ESDC) (formerly Human Resources and Skills Development Canada). Input for the survey was obtained from the ESDC Persons with Disabilities Technical Advisory Group which consists of experts in the field of disability, including academics and representatives from various community associations across Canada, as well as members from ESDC and Statistics Canada. The survey was conducted from September 24, 2012 to January 13, 2013.
The CSD is based on a social model of disabilities rather than a medical model. The social model is based on the premise that disability is the result of the interaction between a person’s functional limitations and barriers in the environment, including social and physical barriers that make it harder to function day-to-day. Thus, disability is a social disadvantage that an unsupportive environment imposes on top of an individual’s impairment (Mackenzie et al, 2009).
The 2012 CSD provides a range of data on different disability types and severities. It measures how often the daily activities of Canadian adults are limited by long-term conditions, health problems and task-based difficulties. The CSD also collects data on use of aids and assistive devices as well as help received or required by respondents. The survey includes questions on the education and employment experiences of persons with disabilities, accommodations made in these areas and ability to get around the community.
Data from the CSD will serve disability and social policy analysts at all levels of government, as well as associations for persons with disabilities and researchers working in the field of disability policy and programs. The CSD will be used to plan and evaluate services, programs and policies for Canadian adults with disabilities to help enable their full participation in society. In particular, information on adults with disabilities is essential for the effective development and operation of the Employment Equity Program. Data on disability are also used to fulfill Canada’s international agreement relating to the United Nations Convention on the Rights of Persons with Disabilities.
The 2012 Canadian Survey on Disability (CSD) was based on a sample of persons who reported an activity limitation on the 2011 National Household Survey (NHS) and who were 15 years of age or older as of the date of the NHS, May 10, 2011. Since the NHS excludes the institutionalized population and other collective dwellings, the CSD only covers persons living in private dwellings in Canada. For operational reasons, the population living on First Nation reserves is also excluded. Total sample size for the CSD was approximately 45,500 individuals. The overall response rate was 75%. The CSD provides reliable data on persons with disabilities for each province and territory of Canada. Detailed age breakdowns are also available by province.
Data on disability in Canada have been collected for over 30 years, reflecting an evolving recognition of the importance of data to support the goal of full participation of persons with disabilities. Concepts and methods used in the production of data on disability have also evolved over time. The first survey on disability in Canada was conducted in 1981, The International Year of the Disabled, shortly after the Canadian Parliamentary Committee on the Disabled and the Handicapped published its report entitled "Obstacles". Among the report’s recommendations was that Statistics Canada produce data on persons with disabilities. This launched the Canadian Health and Disability Survey, which was conducted as a supplement to the October 1983 and June 1984 Labour Force Survey. In addition, the 1986 Census included a question about activity limitations that would help to identify persons who were likely to have a disability. Later that year, Statistics Canada used that Census base to select a sample for the Health and Activity Limitation Survey (HALS). The 1986 HALS was conducted to identify Canadians with disabilities and to determine what limitations they experienced and the barriers they faced. A second HALS took place in 1991.
In 1996, no post-censal surveys were conducted. However, in 1998, the federal, provincial and territorial governments released their common disability framework, In Unison, calling for the promotion of greater inclusion of persons with disabilities in all aspects of Canadian society. Their 1998 report noted the importance of developing a reliable statistical database on disability and underlined the key role survey data would play in supporting policy development and research in this area.
In 2001, the International Classification of Functioning, Disability and Health (ICF) was approved by all World Health Organization (WHO) member states, including Canada. The ICF defined disability as the relationship between body structures and functions, daily activities and social participation, while recognizing the role of environmental factors. Reflecting this new definition of disability driven by a social model approach, HALS was renamed the Participation and Activity Limitation Survey (PALS) and was conducted in 2001 and 2006. The new name reflected the fact that the new survey would increase the focus on the participation of persons with activity limitations. As with HALS, PALS was a joint effort by Employment and Social Development Canada and Statistics Canada.
More recently, in 2010, Canada ratified the United Nations Convention on the Rights of Persons with Disabilities. In keeping with Article 31 on Statistics and Data Collection, ESDC launched the New Disability Data Strategy and began developing a new set of questions to identify persons with disabilities, called the Disability Screening Questions (DSQ). The DSQ sought to move more fully towards the social model of disability, to achieve greater consistency in disability identification by type, and to improve coverage of the full range of disability types, especially mental/psychological, learning and memory disabilities. The DSQ were extensively tested qualitatively and quantitatively and were then used for the first time to identify adults with a disability on the 2012 Canadian Survey on Disability (CSD). The CSD also provided a portrait of adult Canadians with disabilities in relation to their participation in society.
1.3 A Caution to users
The concepts and methods used to measure disability in the 2012 Canadian Survey on Disability (CSD) represent a significant departure from those used in the 2006 Participation and Activity Limitation Survey (PALS). The most important change is that the two surveys used a different definition of disability. In the CSD, the new definition was applied by using the new set of disability screening questions (DSQ), reflecting a fuller implementation of the social model of disability. Other significant changes introduced for the CSD (see Chapter 8 below) also affect time series comparability. Because of the major differences in concepts and methods between the 2006 PALS and the 2012 CSD, it is neither possible nor recommended to compare the prevalence of disability over time between these two sources.
1.4 Purpose of the Concepts and Methods Guide
This Concepts and Methods Guide is intended to provide an understanding of the 2012 Canadian Survey on Disability (CSD) with respect to its subject-matter content and its methodological approaches. It is designed to assist CSD data users by serving as a guide to the concepts and questions used in the survey as well as the technical details of survey design, field work and data processing. The guide provides helpful information on how to use and interpret survey results. Its discussion of data quality also allows users to review the strengths and limitations of the data for their particular needs.
Chapter 1 of this guide provides an overview of the 2012 CSD by introducing the survey’s background and objectives. Chapter 2 discusses the development and testing of the survey’s content, explaining the key concepts and definitions used for the survey. This chapter introduces the CSD questionnaire modules as well as data linkages with the National Household Survey (NHS). Chapters 3 to 6 cover important aspects of the survey methodology, from sampling design, through data collection and processing, to the creation of final data files. Chapters 7 and 8 cover issues of data quality and caution users against making comparisons with data from previous Participation and Activity Limitation Surveys (PALS). Chapter 9 outlines the survey products that are more widely available to the public, including data tables, a fact sheet and reference material. Appendices provide more detail on questionnaire indicators, the disability severity score, questions used for assessing employment equity, special coding categories for the survey and standard classifications used. A glossary of survey terms is also provided.
2.1 Identifying persons with a disability
In the context of the New Disability Data Strategy of 2010, efforts to better identify persons with disabilities began with the creation of an instrument called the “Disability Screening Questions” (DSQ). The DSQ form the first component of the more comprehensive Canadian Survey on Disability.
The goal in developing the new screening questions was to create a new, comprehensive identification module for persons with disabilities which would:
- move more fully toward the social model of disability,
- identify both type and severity of disability,
- achieve greater consistency in disability identification by type,
- improve coverage of the full range of disability types, in particular, mental/psychological, learning and memory disabilities, and
- be short enough for adoption on a range of general population surveys.
It should be noted that the DSQ were specifically developed for the measurement of disability among adult Canadians but not among children.
Disability screening questions (DSQ)
The social model of disability implies that the presence of a difficulty alone is not sufficient for the identification of a disability - a limitation in daily activities must also be declared. Therefore, the DSQ measure the type and severity of disabilities of Canadian adults by asking questions about how often respondents’ daily activities are limited by long-term conditions, health problems and task-based difficulties.
Screener questions on the DSQ evaluate the presence and severity of 10 distinct types of disabilities related to a health problem or condition that has lasted or is expected to last for six months or more. Screening questions emphasize consistency of measurement across disability types. The questions address the following disability types:
- seeing,
- hearing,
- mobility,
- flexibility,
- dexterity,
- pain,
- learning,
- developmental,
- mental/psychological, and
- memory.
The DSQ also contain a question concerning any other health problem or condition that has lasted or is expected to last for six months or more. This question is associated with the “other” type. However, this type will be counted only if no other limitation has been reported under the 10 types of disability listed above. If there is both a limitation under one of the 10 types and an “other” limitation, the latter will be ignored. An “unknown” type is therefore created for persons who reported only an “other” type of limitation and no other limitation.
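The handling of the “other” type described above can be illustrated with a short sketch. It is not taken from the CSD processing system; the function name `resolve_types` and its inputs are hypothetical, and the sketch only restates the rule that an “other” limitation is kept (as “unknown”) when no limitation is reported under the 10 named types.

```python
def resolve_types(limited_types, other_limited):
    """Return the disability types retained for one respondent.

    limited_types: set of named types (among the 10 DSQ types) for which
                   a limitation was reported.
    other_limited: True if a limitation was reported on the "other" question.
    Both inputs are hypothetical representations of DSQ responses.
    """
    if limited_types:
        # A limitation under any named type takes precedence;
        # the "other" limitation is ignored.
        return set(limited_types)
    if other_limited:
        # Only an "other" limitation was reported: classify as "unknown".
        return {"unknown"}
    return set()


print(resolve_types({"hearing"}, other_limited=True))   # the "other" report is dropped
print(resolve_types(set(), other_limited=True))         # classified as "unknown"
```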
Qualitative and quantitative testing of disability screening questions
The new set of disability screening questions was drafted following an extensive review of existing disability indicators used in Canada and internationally. Next, the DSQ were tested both qualitatively and quantitatively. Different iterations of qualitative testing were conducted by Statistics Canada's Questionnaire Design Resource Centre (QDRC) in September 2010, March 2011 and May 2011. Qualitative testing was carried out in English and French, in six different locations across Canada. Overall, a total of approximately 125 individual cognitive interviews took place as well as four focus group discussions. Adjustments were made to question wording based on these results.
During the Fall of 2011, the DSQ were also tested quantitatively on two Statistics Canada surveys: the Labour Force Survey and the Canadian Community Health Survey. These tests established the reliability and the validity of the DSQ as an instrument for estimating the prevalence of disability in the adult population within the context of the social model of disability. Following final adjustments to question wording and interviewer instructions, the DSQ were approved for use in the Canadian Survey on Disability (CSD).
It should be noted that an additional shorter version of the DSQ has also been developed as a minimum-content option for identifying persons with a disability on general population surveys at Statistics Canada. This will allow for the reliable identification of persons with disabilities on national surveys of the general population on a more frequent basis, in addition to the more specialized Canadian Survey on Disability (CSD). As a result, more varied data on persons with disabilities in Canada can be produced.
Operational definition of a disability using the DSQ
For each of the 10 disability types, the DSQ always have at least one question on the associated level of difficulty (no difficulty, some difficulty, a lot of difficulty, cannot do) and a question on the frequency of the limitation of daily activities (never, rarely, sometimes, often, always). To meet the definition of a disability for a particular type, the frequency for the corresponding limitation in daily activities must be “sometimes”, “often” or “always”, or it must be “rarely” combined with a difficulty level of “a lot” or “cannot do”.
Table 2.1 below summarizes the combination of answers to the DSQ that are generally used to identify a disability. This classification applies to the majority of disability types measured on the DSQ.
|How much difficulty do you have...?|How often are your daily activities limited by...?|
| |Never|Rarely|Sometimes|Often|Always|
|No difficulty|No disability|No disability|Disability|Disability|Disability|
|Some difficulty|No disability|No disability|Disability|Disability|Disability|
|A lot of difficulty|No disability|Disability|Disability|Disability|Disability|
|Cannot do|No disability|Disability|Disability|Disability|Disability|
|Source: Statistics Canada, Canadian Survey on Disability, 2012.|
It should be noted that in some situations, these criteria were modified. In particular, a person who reports having a developmental disorder is identified as disabled if the respondent has been diagnosed with this condition, regardless of the level of difficulty or the frequency of the activity limitation reported.
Another noteworthy exception is the “unknown” type, where we do not ask about level of difficulty. A person will have an “unknown” disability only if he or she reports being limited in terms of daily activities “sometimes”, “often” or “always” because of this health problem or condition and if he or she reports no other limitation under the 10 disability types.
Lastly, for disabilities involving seeing, hearing, mobility, flexibility and dexterity, which are measured with task-based questions, a response of “no difficulty” does not result in the follow-up question on daily activity limitations. Thus, all “no difficulty” responses for these disability types are classified as “no disability”.
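The operational definition and the exceptions described above can be summarized in a minimal sketch. The function name, argument names and response labels are assumptions chosen for illustration; the actual derivation in the CSD files may differ in detail. The “unknown” type, which has no difficulty question, is not covered here.

```python
# Response labels are illustrative; the real DSQ variables may be coded differently.
FREQUENT = {"sometimes", "often", "always"}
HIGH_DIFFICULTY = {"a lot of difficulty", "cannot do"}
TASK_BASED = {"seeing", "hearing", "mobility", "flexibility", "dexterity"}


def has_disability(difficulty, frequency, disability_type=None, diagnosed=False):
    """Apply the general rule of Table 2.1 plus the documented exceptions."""
    # Exception: a diagnosed developmental disorder counts regardless of
    # difficulty level or frequency of limitation.
    if disability_type == "developmental" and diagnosed:
        return True
    # Exception: for task-based types, "no difficulty" skips the frequency
    # question and is always classified as no disability.
    if disability_type in TASK_BASED and difficulty == "no difficulty":
        return False
    # General rule: limited sometimes, often or always ...
    if frequency in FREQUENT:
        return True
    # ... or limited rarely with a lot of difficulty or cannot do.
    return frequency == "rarely" and difficulty in HIGH_DIFFICULTY


print(has_disability("some difficulty", "rarely"))        # False
print(has_disability("a lot of difficulty", "rarely"))    # True
print(has_disability("no difficulty", None, "seeing"))    # False
```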
2.2 Calculating rates of disability in Canada
One of the primary objectives of the 2012 CSD is to produce rates of disability in Canada. These can be calculated by province and territory, for example, or by age group. Rates of disability are calculated with the following formula:
(Persons with a disability / (Persons with a disability + Persons without a disability)) x 100
In order to provide such statistics, the methodology of the CSD required not only identifying persons with a disability but also producing estimates of the number of persons without a disability in Canada. Thus, the CSD drew two distinct samples of persons from the 2011 National Household Survey (NHS):
- Those who were filtered-in by the NHS activity limitation questions (called the YES sample) and who would proceed through the Disability Screening Questions (DSQ) of the Canadian Survey on Disability (CSD) for the more precise identification of persons with a disability, and
- Those who were filtered-out by the NHS activity limitation questions (called the NO sample) and who were automatically considered persons without a disability.
Details about these methods are provided in Chapter 3 of this guide.
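As an illustration of the rate formula given above, the following sketch computes a disability rate from two population estimates. The function name and the example figures are hypothetical; in practice the inputs would be weighted estimates from the CSD analytical file rather than raw sample counts.

```python
def disability_rate(with_disability, without_disability):
    """Rate of disability, in percent, from two population estimates."""
    total = with_disability + without_disability
    if total <= 0:
        raise ValueError("the combined population estimate must be positive")
    return with_disability / total * 100


# Hypothetical estimates for one province and age group.
print(disability_rate(25, 75))  # 25.0
```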
2.3 Measuring the severity of disabilities
Usefulness of a severity score
It is clear from previous research using the 2006 Participation and Activity Limitation Survey that disability severity is a strong predictor of the reduced participation of people with disabilities in several domains of everyday life (Federal Disability Report, 2010). People with severe disabilities are less likely than their counterparts with milder disabilities to participate in the labour force, to attend and complete post-secondary education programs, and to participate in community activities. Those with severe disabilities are also more likely to be in need of supports and services, such as aids, devices, caregiving, medical specialists and income supports.
The inclusion of disability severity is an important consideration in analyses of the participation of people with disabilities. The ready-to-use, consistent disability severity score included in the CSD file enables analysts to develop more accurate inferences about the situation currently faced by persons with disabilities.
The Severity Score
A severity score was developed using the Disability Screening Questions (DSQ). For each of the 10 disability types, a score is assigned using a scoring grid that takes into account both the frequency of the activity limitations (never, rarely, sometimes, often, or always) and the intensity of the difficulties (no difficulty, some difficulty, a lot of difficulty, or cannot do). The score increases with the frequency of the limitation and the level of difficulty.
A global severity score is derived based on all disability types. A person’s global severity score is calculated by taking the average of the scores for the 10 disability types. Consequently, the more types of disability a person has, the higher his or her score will be.
Overall, the global score meets the following three criteria:
- it increases with the number of disability types;
- it increases with the level of difficulty associated with the disability;
- it increases with the frequency of the activity limitation.
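The averaging step behind the global score can be sketched as follows. The per-type scores used here are placeholders, since the actual scoring grid is documented in Appendix B and is not reproduced in this section. The sketch also assumes that a type with no disability contributes a score of zero, which is consistent with the statement that the global score rises with the number of disability types.

```python
N_TYPES = 10  # number of disability types covered by the DSQ


def global_severity_score(type_scores):
    """Average the per-type scores over all 10 types.

    type_scores: dict mapping a disability type to its score. Types that are
    absent from the dict are assumed to score 0 (an assumption of this sketch).
    """
    if len(type_scores) > N_TYPES:
        raise ValueError("more scores supplied than disability types")
    return sum(type_scores.values()) / N_TYPES


# Placeholder scores, not values from the Appendix B grid.
one_type = global_severity_score({"pain": 2})
two_types = global_severity_score({"pain": 2, "mobility": 2})
print(one_type < two_types)  # True: more types give a higher global score
```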
To make the severity score easier to use, severity classes were established. It is important to understand that the name assigned to each class is simply intended to facilitate its use. It is not a label or judgement concerning the person’s level of disability. In other words, the classes should be interpreted as follows: people in class 1 have a less severe disability than people in class 2; the latter have a less severe disability than people in class 3; and so on:
- 1 = mild disability
- 2 = moderate disability
- 3 = severe disability
- 4 = very severe disability
The breakdown of persons with a disability across the four severity classes is shown in the table below.
|Severity class|Persons with a disability (number)|Persons with a disability (%)|
|Class 1 = mild|1,195,590|31.7|
|Class 2 = moderate|747,980|19.8|
|Class 3 = severe|849,540|22.5|
|Class 4 = very severe|982,810|26.0|
|Source: Statistics Canada, Canadian Survey on Disability, 2012.|
For additional information on the methods used to derive severity scores, see Appendix B.
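For readers who wish to reproduce the percentages in the severity table, the short sketch below derives each class’s share from the published counts. It uses only the four figures shown in the table; the variable names are illustrative.

```python
# Published counts of persons with a disability by severity class.
counts = {
    "mild": 1_195_590,
    "moderate": 747_980,
    "severe": 849_540,
    "very severe": 982_810,
}

total = sum(counts.values())

# Share of each class, in percent, rounded to one decimal as in the table.
shares = {name: round(n / total * 100, 1) for name, n in counts.items()}

print(total)
print(shares)
```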
2.4 Creating a portrait of Canadian adults with disabilities
In addition to the Disability Screening Questions (DSQ), detailed content was identified for the Canadian Survey on Disability (CSD) which provided a picture of the issues affecting Canadian adults who have difficulty carrying out their daily activities. The CSD covered the use of aids and assistive devices as well as help received or required by persons with disabilities. Data priorities also included the education and employment experiences of adults with disabilities, accommodations made in these areas, the ability to get around the community using public transportation, and sources of income.
Survey indicators in these content areas were drawn from a number of sources, including the 2006 Participation and Activity Limitation Survey (PALS). Relevant standardized and well-established measures used on other Statistics Canada surveys were also gathered and reviewed as potential indicators. Those indicators considered most important were then modified for increased relevance to the population of persons with disabilities.
A draft questionnaire was tested in March 2012. Qualitative testing involved 50 cognitive interviews in both English and French, in three locations across Canada. Results of testing led to recommendations towards a final set of robust questions for the full Canadian Survey on Disability. Improvements to the wording of questions, interviewer instructions and the flow of the questionnaire were implemented.
Questions in the 2012 CSD were designed for use in a Computer Assisted Interviewing (CAI) environment. CAI enables more complex question flows to be built into the questionnaire and includes many features to maximize data quality, such as consistency verifications across questions. Specifically for the CSD, the questionnaire consisted of a Computer Assisted Telephone Interview (CATI). The new CATI questionnaire application underwent extensive modular and end-to-end testing.
One final step was taken with respect to content development for the 2012 CSD: linkage to data from the 2011 National Household Survey (NHS). Since the 2012 CSD drew its sample from the 2011 NHS (see Chapter 3 for details), relevant information from the NHS can be combined with information provided during the 2012 CSD interview. This approach reduces the number of questions that need to be asked on the CSD and provides for a richer portrait of persons with disabilities for CSD data users. In addition, since a sample of persons without a disability was drawn from the NHS for the purpose of calculating rates of disability in Canada (see Chapter 3 for more detail), NHS data linkage also provides a rich source of data on people without disabilities, allowing important comparisons of characteristics to be made between the two groups.
More details of this process of NHS data linkage are provided in Section 2.6 below.
A special note about age data
Age is a core demographic factor of interest in the analysis of disability in Canada. When using age as a component of research with 2012 CSD data, or in combination with linked data from the 2011 NHS, it is important for users to keep in mind the different reference periods involved. Section 6.2 of this guide provides an explanation of these survey reference periods. With respect to age, it is important to note that data collected from respondents in the context of the NHS were collected on May 10, 2011 while data from the CSD were collected 16 to 20 months later, between September 2012 and January 2013. So, for example, CSD respondents who were 15 years of age at the time of the NHS were 16 to 20 months older at the time of the CSD. With respect to particular research studies that may be sensitive to this time lag, data users will have the option of selecting an age indicator based on the 2011 NHS reference period or an age variable based on the date of the 2012 CSD interview. Section 6.2 provides an understanding of the use of survey reference periods in relation to different types of data analyses that may be of interest to users.
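The effect of the two reference periods on age can be illustrated with a small date calculation. The helper names, the birth date and the interview date below are hypothetical; only NHS Day (May 10, 2011) and the collection window (September 24, 2012 to January 13, 2013) come from this guide.

```python
from datetime import date

NHS_DAY = date(2011, 5, 10)
CSD_START = date(2012, 9, 24)
CSD_END = date(2013, 1, 13)


def age_on(birth_date, ref_date):
    """Age in completed years on a reference date."""
    years = ref_date.year - birth_date.year
    if (ref_date.month, ref_date.day) < (birth_date.month, birth_date.day):
        years -= 1  # birthday not yet reached in the reference year
    return years


def months_between(start, end):
    """Whole months elapsed between two dates."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1
    return months


# The lag between NHS Day and the start and end of CSD collection.
print(months_between(NHS_DAY, CSD_START), months_between(NHS_DAY, CSD_END))

# A hypothetical respondent: age differs depending on the reference date used.
birth = date(1996, 1, 1)
print(age_on(birth, NHS_DAY), age_on(birth, date(2012, 11, 1)))
```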
2.5 Questionnaire modules
Listed below are the questionnaire modules found on the 2012 Canadian Survey on Disability. Each is described in detail in Appendix A.
The Canadian Survey on Disability – 2012
- Purpose of the survey
- Voluntary nature of survey
- Explanation of data linkage with NHS
Disability screening questions
- Filter questions
- Main underlying condition
- Aids and assistive devices
- Recurring aids and assistive devices questions
- Medication use
- Received and needed help with everyday activities
- Recent school attendance
- Past school attendance
- Educational experiences
- Educational background
- Labour force status
- Employment details
- The unemployed
- Not in labour force
- Workplace training
- Employment modifications
- Labour force discrimination
Getting around the community
Source of income
In addition to information on the questionnaire modules in Appendix A, Appendix D lists the extra question categories created during survey coding as well as the standard classifications used to create indicators for open-ended survey questions. The Canadian Survey on Disability questionnaire can be found on the Statistics Canada website.
A comprehensive description of all the variables available from the survey data is provided in the 2012 CSD Data Dictionaries (codebooks). For details on how to obtain the data dictionaries, contact Statistics Canada Client Services.
2.6 Linked content from the 2011 National Household Survey
The Canadian Survey on Disability (CSD) draws its sample from among 2011 National Household Survey (NHS) respondents (see Chapter 3 for details). At the outset of the 2012 CSD interview, all respondents were told about the plans to link the CSD survey data with the information that they provided on the NHS. All linked information is kept confidential and used for statistical purposes only.
The specific benefits of a CSD-NHS record linkage are reduced response burden for the target population of the CSD, the establishment of survey weights which are crucial to providing valid estimates, and the creation of a comprehensive microdata file on persons with disabilities in Canada. Together, data from these two sources provide a detailed statistical portrait of persons with disabilities in Canada - data which are not available from any other source.
As explained in more detail in Chapter 3, the CSD also drew a sample of persons without a disability from the NHS. NHS linkage thus allows data users to compare characteristics of persons with a disability and persons without a disability.
Approximately 200 NHS variables were linked to the final CSD file for 2012, both for persons with a disability and for persons without a disability. The list below shows the type of NHS variables that have been appended to the CSD analytical files.
- Household level variables
- Geography, including census metropolitan areas
- Housing, including tenure, number of rooms in dwelling and need for repairs
- Family, including presence of spouse/partner in household, number of children in census family
- Person-level variables
- Activity limitations, including activity difficulties/reductions at home, work and school
- Aboriginal identity and ancestry
- Demographics, including age, marital status and common-law status, place of birth
- Immigration, citizenship and ethnicity
- Education, including school attendance, location of study, postsecondary certificates, diplomas and degrees, types obtained, major field of study
- Employment, including labour force status, weeks worked in 2010, class of worker, full-time or part-time work
- Place of work, including place of work status, type of commute and distance
- Mobility, including mobility status 1 and 5 years ago
- Income, including family income, employment income, low-income status (before and after tax)
- Language, including knowledge of official languages, mother tongue, language spoken at home and language of work
It is important to note that these NHS variables refer to each respondent’s situation on the day of the 2011 NHS, that is, as of May 10, 2011. Thus, for 2012 CSD respondents, users should be aware that in some cases, the respondent may have moved, had a change in the composition of their household, or had a change in employment between the date of the 2011 NHS and the date of the 2012 CSD interview. In other words, some of the information provided by the NHS may not be reflective of the respondent’s situation when the CSD interview took place.
A complete list of the NHS variables and their specifications is provided in the 2012 CSD Data Dictionaries.
The 2012 Canadian Survey on Disability (CSD) was designed to produce reliable data for each of the provinces and territories. Other geographic variables are also available in the 2012 CSD database, based on geographies from the National Household Survey, such as Census Metropolitan Areas (CMAs). In addition, geographies will include health regions across Canada which represent administrative areas or regions as used by health authorities. However, users should note that not all CSD survey data can be cross-tabulated or analyzed at these more detailed levels of geography. Some data tables will be possible but the reliability of data estimates at these levels of geography will need to be examined on a case by case basis.
The NHS Dictionary defines geographies relevant to the CSD.
More details on health regions can be found on the Statistics Canada website.
3.1 Target population and coverage
The population covered by the Canadian Survey on Disability (CSD) consists of all persons aged 15 and over (on Census/NHS Day, May 10, 2011) who have an activity limitation or a participation restriction associated with a physical or mental condition or health problem and were living in Canada at the time of the Census/NHS. This includes persons living in private dwellings in the 10 provinces and three territories. However, for operational reasons, the population of Indian reserves is excluded.
The CSD’s sample is based on responses to questions in the 2011 National Household Survey (NHS). All persons who answered yes to at least one of the NHS filter questions on activity limitations (see Figure 1) were included in the CSD frame. These persons are members of what is referred to as the YES population. Within this group, people whose daily activities are limited due to a long-term condition or health problem are members of the target population of persons with a disability.
Although the CSD does not cover the population that answered no to the NHS filter questions (referred to as the NO population), a sample of these persons is included in the CSD analytical file (see Section 3.8). These persons are all considered persons without a disability. There are also a number of persons without a disability in the YES population; they did not report any activity limitations in the CSD’s DSQ module. As we will see later, the sample of persons without a disability is used in two ways: to calculate disability rates and to compare the NHS characteristics of persons with a disability and persons without a disability.
Activities of daily living
National Household Survey – N1 – question 7
Does this person have any difficulty hearing, seeing, communicating, walking, climbing stairs, bending, learning or doing any similar activities?
- Yes, sometimes
- Yes, often
National Household Survey – N1 – question 8
Does a physical condition or mental condition or health problem reduce the amount or kind of activity this person can do:
- (a) at home?
- Yes, sometimes
- Yes, often
- (b) at work or at school?
- Yes, sometimes
- Yes, often
- Not applicable
- (c) in other activities, for example, transportation or leisure?
- Yes, sometimes
- Yes, often
3.2 Reference period
The Canadian Survey on Disability (CSD) represents the population aged 15 and over on Census/NHS Day, May 10, 2011. However, because of the time lag between the Census/NHS and CSD collection, all the information collected in the CSD represents the population’s characteristics as measured in the fall of 2012. To understand how these two reference periods affect the use and interpretation of CSD data, refer to section 6.2.
3.3 Frame drawn from the NHS
The sampling frame was constructed from the Census and NHS response database (RDB). The database contained all responses received via the various reporting modes (Internet, paper questionnaires, personal interviews, etc.).
The frame had to go through a number of processing steps to ensure that all the information would be as complete as possible:
- when the age was missing for a person on the NHS questionnaire, it had to be imputed on the basis of responses to NHS questions (for example, data on education, employment and income);
- various data sources were used to detect errors in telephone numbers or find missing telephone numbers or addresses;
- preliminary weights for the NHS had to be derived so that estimates of the population size in each stratum could be computed for sample allocation.
3.4 Sample design
The CSD’s sample design can be viewed as a three-phase design, where the first two phases are for the selection of the NHS sample itself and the third phase is for the selection of the CSD sub-sample.
The initial phase was the sample selection of the NHS itself. The NHS questionnaire comes in two main versions, the N1 form and the N2 form.
The N1 form is completed by self-enumeration and is administered to approximately one in three households in most parts of Canada (N1 regions). Other than the basic census demographic questions (name, sex, date of birth, legal marital status, common-law status, relationship to person 1, various language questions and the consent question to release the data in 92 years), the NHS N1 form includes questions on labour market activity, income, education, activity limitations, citizenship, housing, ethnic origin, and so on.
The N2 form is identical in content to the N1 form, except for some adapted examples and some excluded questions (Note 2). It is administered by personal interview to all households in remote areas and on Indian reserves (N2 regions).
In sampling terminology, the NHS sample design is a stratified systematic sample of occupied private dwellings with a constant sampling rate of 1/3 in N1 regions, and with complete enumeration in N2 regions.
Non-response to the NHS was not inconsequential, and a new strategy was put in place to mitigate the potential effect of non-response bias: a subsample of non-respondent households was selected for non-response follow-up (NRFU). This subsample represents about 1 out of 3 non-responding households and therefore carries a greater sampling weight, reflecting the probabilities of selection at both phases. The subsample was selected in such a way as to ensure good representation of collection units (CUs) with higher concentrations of groups at risk of not responding (Note 3). It is important to note that non-response follow-up was not performed in canvasser areas (remote regions, Indian reserves and Indian settlements, also known as N2 regions). The design in this second phase is a model-assisted random sample.
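The effect of the first two phases on sampling weights can be illustrated with a minimal sketch. It assumes that the design weight is simply the inverse of the product of the selection probabilities, and it treats the NRFU subsampling rate as exactly 1/3, whereas the guide states only "about 1 out of 3". The function and constant names are hypothetical, and the actual NHS weights involve further adjustments not shown here.

```python
from fractions import Fraction


def design_weight(*selection_probabilities):
    """Return the inverse of the product of the phase selection probabilities."""
    weight = Fraction(1)
    for probability in selection_probabilities:
        weight /= Fraction(probability)
    return weight


# Phase 1: constant 1/3 sampling rate in N1 regions (stated in the guide).
N1_SAMPLING_RATE = Fraction(1, 3)
# Phase 2: "about 1 out of 3" non-responding households; exact 1/3 is an assumption.
NRFU_SUBSAMPLING_RATE = Fraction(1, 3)

initial_respondent_weight = design_weight(N1_SAMPLING_RATE)
nrfu_respondent_weight = design_weight(N1_SAMPLING_RATE, NRFU_SUBSAMPLING_RATE)
n2_weight = design_weight(1)  # complete enumeration in N2 regions

print(initial_respondent_weight, nrfu_respondent_weight, n2_weight)
```

Under these assumptions an NRFU respondent represents three times as many households as an initial respondent, which is why the two groups are kept in separate strata in the CSD design.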
In the third phase, the CSD sample was selected from the group of individuals who responded to the NHS (including the NRFU subsample) and reported activity limitations to the NHS. Excluded from selection were persons living on Indian reserves, as well as persons less than 15 years old on May 10, 2011.
The CSD sample was selected so that there would be a sufficiently large sample in each estimation domain, as explained in section 3.5.
3.5 Estimation domains and stratification
Domains of estimation are groups of units for which estimates are targeted with an “acceptable” level of precision. These domains of estimation for the CSD consist of the provinces cross-classified with the following age groups:
- 15 to 24
- 25 to 44
- 45 to 64
- 65 to 74
- 75 and over
For each of the three territories, the estimation domain includes a single age group (15+). For Prince Edward Island, the first two age groups had to be combined because of their very small population sizes.
Each estimation domain was further sub-divided into strata to take the severity (Note 4) of the activity limitation and the NHS sample design into account. To control for severity, those people who answered “yes, often” at least once on the NHS filter questions were placed in the “Often” stratum, while those who answered “yes, sometimes” at least once but never “yes, often” were placed in the “Sometimes” stratum. To control for the NHS sample design, two characteristics were considered: being among the NHS’s initial respondents (the IR strata) as opposed to the non-response follow-up respondents (the NRFU strata), and living in a remote region (the N2 strata) or not (the N1 strata). Note that no non-response follow-up was carried out in remote regions. Controlling for these characteristics effectively grouped people with similar preliminary weights.
Hence, each estimation domain was divided into six possible strata defined as follows:
- N1 region - initial respondent - “often”
- N1 region - initial respondent - “sometimes”
- N1 region - NRFU respondent - “often”
- N1 region - NRFU respondent - “sometimes”
- N2 region - initial respondent - “often”
- N2 region - initial respondent - “sometimes”
Note that not all 6 strata necessarily occurred in each estimation domain. All persons meeting the conditions for the CSD frame were then classified into these estimation domains and strata prior to sample selection.
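The classification rules in this section can be sketched as follows. This is an illustration only: the function names, the string labels and the input layout are assumptions, and age refers to age on Census/NHS Day.

```python
TERRITORIES = {"Yukon", "Northwest Territories", "Nunavut"}
AGE_GROUPS = [(15, 24), (25, 44), (45, 64), (65, 74), (75, None)]


def age_group(age):
    """Return the label of the provincial age group containing `age`."""
    for low, high in AGE_GROUPS:
        if age >= low and (high is None or age <= high):
            return f"{low} to {high}" if high is not None else f"{low} and over"
    raise ValueError("the CSD frame covers persons aged 15 and over")


def estimation_domain(jurisdiction, age):
    """Return (jurisdiction, age group) following the rules of section 3.5."""
    if age < 15:
        raise ValueError("the CSD frame covers persons aged 15 and over")
    if jurisdiction in TERRITORIES:
        return (jurisdiction, "15 and over")  # single age group per territory
    if jurisdiction == "Prince Edward Island" and age <= 44:
        return (jurisdiction, "15 to 44")  # first two age groups combined
    return (jurisdiction, age_group(age))


def severity_stratum(filter_answers):
    """'often' if any answer is 'yes, often', else 'sometimes' if any is 'yes, sometimes'."""
    answers = [answer.lower() for answer in filter_answers]
    if "yes, often" in answers:
        return "often"
    if "yes, sometimes" in answers:
        return "sometimes"
    raise ValueError("person is not in the YES population")


def stratum(region, respondent_type, filter_answers):
    """Return (region, respondent type, severity), e.g. ('N1', 'NRFU', 'often')."""
    if region == "N2" and respondent_type == "NRFU":
        raise ValueError("no non-response follow-up was carried out in N2 regions")
    return (region, respondent_type, severity_stratum(filter_answers))


print(estimation_domain("Prince Edward Island", 20))
print(stratum("N1", "NRFU", ["No", "Yes, sometimes", "Yes, often"]))
```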
3.6 Sample allocation
The sample sizes were determined in such a way that, for each estimation domain, one could estimate a minimum proportion with a maximum coefficient of variation (CV) of 16.5%. At Statistics Canada, 16.5% is often used as the upper limit for the CV of an acceptable estimate. The minimum proportion to estimate in each estimation domain is shown in the table below. A design effect of 1.2 was assumed for these calculations. In other words, it was assumed that in the estimation domains, the variance that would be obtained with the CSD’s sample design would be 20% higher than the variance that would be obtained if a simple random sample of the same size in each domain was selected.
|Province||15 to 24||25 to 44||45 to 64||65 to 74||75 and over|
|Newfoundland and Labrador||9.0||8.0||8.0||11.0||11.0|
|Prince Edward Island||9.0||8.0||11.0||11.0|
|Source: Statistics Canada, Canadian Survey on Disability, 2012.|
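As an illustration of the precision requirement in section 3.6, the sketch below uses the standard approximation for the CV of an estimated proportion under a design effect, CV = sqrt(deff × (1 − p) / (n × p)), solved for n. The guide does not state the exact formula used, so this is an assumption. The sketch also assumes that the values in the table are percentages, and it ignores the finite population correction. The function name is hypothetical.

```python
import math


def respondents_needed(min_proportion, max_cv=0.165, design_effect=1.2):
    """Smallest n such that a proportion `min_proportion` has a CV of at most `max_cv`."""
    variance_factor = design_effect * (1.0 - min_proportion)
    return math.ceil(variance_factor / (min_proportion * max_cv ** 2))


for percent in (8.0, 9.0, 11.0):
    print(percent, respondents_needed(percent / 100.0))
```

The result is the number of responding persons with a disability required in a domain. As described in this section, the allocation then has to be inflated for the expected non-response and the expected false positive rate in each stratum.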
A method of optimal allocation among the strata in a particular domain was used, taking into account the expected non-response and the expected false positive rate (persons who reported activity limitations in the NHS but have no disability according to the DSQ) in each stratum. This allocation depended in part on the NHS weights adjusted for non-response. It should be noted that at the time of allocation, those weights had not yet been calculated. Consequently, preliminary weights were calculated solely for the purposes of the allocation. However, the final NHS weights were used in the CSD weighting process. For background information on NHS weighting, see Morel and Nambeu (2013) (Note 5).
Following sample allocation, it sometimes happened that more units were needed in one stratum than the number available in the sampling frame. In that case, all the units in the stratum had to be selected (making it a take-all stratum) and then the number of units needed to make up the shortfall was selected from the other strata in the same estimation domain in proportion to their allocation to those strata.
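The handling of take-all strata can be sketched as follows. The function name and the example figures are hypothetical. The loop repeats because a redistribution can push another stratum above its frame count, a case the guide does not discuss, and rounding to whole units is left out.

```python
def reallocate(allocation, frame_counts):
    """Cap each stratum at its frame count and spread the shortfall over the
    remaining strata of the domain in proportion to their allocation.

    Returns the adjusted allocation and the set of take-all strata.
    """
    adjusted = {name: float(size) for name, size in allocation.items()}
    take_all = set()
    while True:
        over = [name for name in adjusted
                if name not in take_all and adjusted[name] >= frame_counts[name]]
        if not over:
            return adjusted, take_all
        shortfall = 0.0
        for name in over:
            shortfall += adjusted[name] - frame_counts[name]
            adjusted[name] = float(frame_counts[name])  # select every unit
            take_all.add(name)
        remaining = [name for name in adjusted if name not in take_all]
        base = sum(adjusted[name] for name in remaining)
        if base == 0:
            return adjusted, take_all  # no stratum left to absorb the shortfall
        for name in remaining:
            adjusted[name] += shortfall * adjusted[name] / base


# Hypothetical domain: stratum A needs 100 units but only 55 are on the frame.
allocation = {"A": 100, "B": 300, "C": 600}
frame_counts = {"A": 55, "B": 1000, "C": 5000}
adjusted, take_all = reallocate(allocation, frame_counts)
print(adjusted, take_all)
```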
3.7 Sample sizes
The CSD’s final sample size (the YES sample) for each province and territory is shown in Table 3.2 below.
|Province/Territory||Sample sent to collection|
|Newfoundland and Labrador||3,792|
|Prince Edward Island||2,809|
|Source: Statistics Canada, Canadian Survey on Disability, 2012.|
3.8 The NO sample
As previously mentioned, the CSD analytical file is used in part to estimate disability rates for various geographies and to compare the characteristics of persons with a disability and persons without a disability. For that purpose, the analytical file must also include a representative sample of persons without a disability, which is not the case when only the YES sample is considered. A sample from the NO population therefore had to be selected, i.e. persons who reported no activity limitations in the NHS filter questions. The underlying assumption here is that the members of the NHS’s NO population are less likely to have a disability, or that if they have a disability, it is very mild. The decision not to cover the NO population in the CSD also has to do with the cost involved, since a very large sample of that population is needed to find persons with a disability, which is very expensive during collection.
The NO sample was not sent to the field, since each individual is considered not to have a disability. The analytical file contains a large number of characteristics from the NHS for this sample and for the YES sample, which makes it possible to compare persons with a disability and persons without a disability. With this NO sample, analysts are also able to produce the denominators required to calculate the disability rates for a number of subgroups of the Canadian population. To keep the analytical file from getting too large, a sample of the NO population was selected instead of taking the entire population. The size of this sample was set so that the same precision as is required for the YES sample would be achieved (see Table 3.1), with a maximum CV of 7% (Note 6). In selecting the sample, the NO population was stratified by province, sex and five-year age group, with the oldest group being 75 and over (thus, it was more detailed than the YES sample). The NHS design was also taken into account for the NO sample, where strata were further divided into N1 vs N2 regions and initial vs NRFU respondents. The sample sizes for the provinces are shown in the table below.
|Province||Sample (not sent to collection)|
|Newfoundland and Labrador||11,813|
|Prince Edward Island||6,643|
|Source: Statistics Canada, Canadian Survey on Disability, 2012.|
Since all units in the NO population are considered not to have a disability, the severity score for each unit in the file was set to 0, and the severity class was set to 0.
4.1 Time Frame
The Canadian Survey on Disability (CSD) was conducted from September 24, 2012 to January 13, 2013. As a post-censal / post-NHS survey, it followed the 2011 National Household Survey which was conducted on May 10, 2011. Thus, a time lag of 16 to 20 months existed between the two surveys.
In the months leading up to data collection for the 2012 Canadian Survey on Disability (CSD), promotional activities took place to raise awareness of the survey and to encourage participation.
A webpage appeared on the Statistics Canada website which included “Questions and Answers” for respondents, a survey description, background information on the survey and its methodology, and a link to the questionnaire. A special website icon was designed by Statistics Canada and distributed to the ESDC Persons with Disabilities Technical Advisory Group (TAG), and they helped to promote the survey by displaying the survey icon on their organizations’ websites, thus providing a link to Statistics Canada’s webpage for the CSD.
Prior to data collection, an introductory letter and survey brochure were sent to respondents to inform them about the upcoming survey and to impress upon them the importance of their participation. The letter was provided in both official languages. For survey collection in Nunavut, the letter was also provided in Inuktitut (Baffin dialect). In addition, a Braille insert was included with the introductory letters which provided a contact telephone number so that those with visual impairments could contact Statistics Canada for more information.
4.3 Mode of collection
The questions in the 2012 Canadian Survey on Disability (CSD) were administered using Computer Assisted Telephone Interviews (CATI). Accommodations were made to maximize direct participation of persons with disabilities by providing all respondents with an email address, a TTY telephone number (Telecommunications Device for the Hearing Impaired) and a contact telephone number written on a Braille insert that accompanied the introductory letter. Where alternate arrangements were requested, personal interviews were conducted using a paper and pencil interview (PAPI) version of the questionnaire.
A PAPI questionnaire was also used to conduct in-person interviews with some respondents in the Northwest Territories who did not have a telephone and could not otherwise have been contacted. It was known prior to the survey that a large proportion of dwellings in the western territories did not have telephones and this meant that coverage would be generally limited to NHS respondents who had a telephone. Prior to the start of collection, Statistics Canada was able to establish a partnership with the NWT Bureau of Statistics to hire a team of interviewers to help conduct personal interviews. As a result, a sample of dwellings which did not have telephones was selected and a PAPI version of the questionnaire was used for house-to-house data collection.
Across Canada, respondents were interviewed in the official language of their choice – English or French. The questionnaire was also translated into Inuktitut and interviews in that language were made available for respondents in Nunavut.
The time required to complete the survey varied from person to person but, on average, the survey took approximately 40 minutes to complete.
4.4 Supervision and quality control
All Statistics Canada interviewers were under the supervision of senior interviewers who were responsible for ensuring that interviewers were familiar with the concepts and procedures of the surveys to which they were assigned. Senior interviewers were also responsible for periodically monitoring the interviewers to ensure that standard procedures were being followed.
Interviewers were trained on the survey content and the computer-assisted interviewing application. In addition to classroom presentations and a period of self-study, the interviewers completed a series of mock interviews to become familiar with the survey and its concepts and definitions.
4.5 Proxy interviews
Since disability is difficult to measure and very subjective, interviewers were asked to make every effort to conduct the interview with the selected person. However, in the following circumstances, a proxy interview was acceptable:
- The selected person was away for the duration of the survey.
- The selected person spoke neither English nor French.
- The selected person was unable to participate because of mental or physical health problems.
- A parent insisted on responding for his or her child aged 15 to 17.
In order to be accepted as a proxy respondent, the person responding must be:
- an adult who speaks either English or French;
- reachable during the survey’s data collection period;
- the person most knowledgeable, or among the most knowledgeable, about the selected person’s difficulties and challenges related to activity limitations and participation restrictions.
A total of 5,164 proxy interviews were conducted, and 93% of them were completed.
The table below shows the distribution of proxy interviews by respondent’s age and reason for the proxy interview (excluding 320 persons for whom the reason is unknown).
|Age group||Health||Absent||Language||Parent insists on answering (Note 1)||Total|
|15 to 24||835||431||30||328||1,624|
|25 to 44||360||150||100||8||618|
|45 to 64||235||132||167||7||541|
|65 to 74||335||62||178||5||580|
|75 and over||863||47||216||7||1,133|
4.6 Special issues
Interviewers were instructed to make all reasonable attempts to obtain a completed interview with the selected member of the household. Those who refused to participate were re-contacted up to two more times to explain the importance of the survey and to encourage their participation. For cases in which the timing of the interviewer’s call was inconvenient, an appointment was arranged to call back at a more convenient time. For cases in which there was no one home, numerous call backs were made.
Special issues arose in relation to data collection for the Canadian Survey on Disability (CSD) which were addressed with extra coordination in the field and corrective adjustments to survey methods. For instance, due to the extended time lag between the 2011 NHS and the 2012 CSD, the contact information for respondents was sometimes out of date. This affected the successful delivery of introductory letters for respondents who were no longer at the address provided at the time of the NHS. In terms of out-of-date telephone numbers, interviewers were provided with special tracing instructions on ways to locate these respondents in order to complete the interview.
As occasionally happens, the collection of the 2012 CSD occurred during the same time period as several other surveys conducted by Statistics Canada, creating a potential risk of respondent burden. Careful planning and adjustments to the survey design were implemented to minimize the burden on responding households.
A comprehensive description of the adjustments applied to survey weights as a result of special issues arising during data collection is provided in Section 6.1.
4.7 Response rate
Collection for the Canadian Survey on Disability (CSD) ended with a response rate of 74.6%. This response rate is the number of complete respondents (with or without a disability) divided by the number of cases sent to collection minus the out-of-scope cases. Out-of-scope cases include people who have died, were institutionalized, have emigrated, have moved to an Indian reserve, are members of the Canadian Forces, are visitors to Canada (misclassified during the NHS) or have an invalid age. Hence, the response rate reflects the percentage of cases that completed the interview relative to the number of cases that should have completed it (which is why the out-of-scope cases are excluded from the denominator).
Response rate = Completed cases / (Cases sent to collection – Out-of-scope cases)
The tables below show the response rates by province and territory, and by age group.
|Province/Territory||Sent to collection||Out of scope||Completed||Response rate|
|Newfoundland and Labrador||3,792||156||2,733||75.2|
|Prince Edward Island||2,809||118||2,126||79.0|
|Source: Statistics Canada, Canadian Survey on Disability, 2012.|
|Age groups||Sent to collection||Out of scope||Completed||Response rate|
|15 to 24||9,944||121||6,633||67.5|
|25 to 44||11,672||119||7,831||67.8|
|45 to 64||10,487||180||8,182||79.4|
|65 to 74||6,481||244||5,282||84.7|
|75 and over||6,859||756||4,894||80.2|
|Source: Statistics Canada, Canadian Survey on Disability, 2012.|
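The response rate formula in section 4.7 can be checked against the age-group table above. The sketch below recomputes each published rate from the three count columns and derives the overall rate from the column totals. The function and variable names are hypothetical.

```python
def response_rate(sent, out_of_scope, completed):
    """Completed cases as a percentage of in-scope cases sent to collection."""
    return 100.0 * completed / (sent - out_of_scope)


# (sent to collection, out of scope, completed), copied from the age-group table.
AGE_GROUP_COUNTS = {
    "15 to 24": (9944, 121, 6633),
    "25 to 44": (11672, 119, 7831),
    "45 to 64": (10487, 180, 8182),
    "65 to 74": (6481, 244, 5282),
    "75 and over": (6859, 756, 4894),
}

rates = {group: round(response_rate(*counts), 1)
         for group, counts in AGE_GROUP_COUNTS.items()}

# Column totals across age groups give the overall counts.
totals = tuple(sum(column) for column in zip(*AGE_GROUP_COUNTS.values()))
overall_rate = round(response_rate(*totals), 1)

print(rates)
print(totals, overall_rate)
```

The column totals reproduce the overall response rate of 74.6% reported at the start of this section.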
5.1 Data capture
Responses to survey questions were captured directly by the interviewer using a computerized questionnaire. This involved an application developed using Blaise software in a Computer Assisted Telephone Interview (CATI) system. The computerized questionnaire reduces processing time and costs associated with data entry, transcription errors and data transmission. Data from the CSD paper questionnaires were also entered into the CATI system once forms were returned from the field and were thus electronically captured for further processing.
Some editing of data is done directly at the time of the interview. Specifically, where a particular response appears to be inconsistent with previous answers or outside of expected values, the interviewer is prompted, through message screens on the computer, to confirm answers with the respondent, and, if needed, to modify the information. The response data are subjected to further edit and imputation processes once they arrive in head office.
5.2 Processing steps
Data processing involves a series of steps to convert the questionnaire responses from their initial raw format to a high-quality, user-friendly database involving a comprehensive set of variables for analysis. A series of data operations are executed to clean files of inadvertent errors, edit the data for consistency, code open-ended questions, create useful variables for data analysis, and finally to systematize and document the variables for ease of analytical usage.
The 2012 CSD used a new set of social survey processing tools developed at Statistics Canada called the “Social Survey Processing Environment” (SSPE). The SSPE involves SAS software programs, custom applications and manual processes for performing the following systematic processing steps:
- Receipt of raw data
- Clean up
- Edits and imputations
- Derived variables
- Creation of final processing file
- Creation of dissemination files
5.3 Record clean up: in-scope and complete records
Following the receipt of raw data from the electronic questionnaire applications, a number of preliminary cleaning procedures were implemented for the 2012 CSD at the level of individual records. These included the removal of all personal identifier information from the files, such as names and addresses, as part of a rigorous set of ongoing mechanisms for protecting the confidentiality of respondents. Duplicate records were resolved at this stage. Also part of clean up procedures was the review of all respondent records to ensure each respondent was “in-scope” and had a sufficiently completed questionnaire. Specific criteria for respondents are outlined below.
- To be “in scope” for the 2012 CSD, respondents must be at least 15 years of age as of the day of the National Household Survey (NHS), May 10, 2011. In-scope respondents include two groups: 1) those who were screened-in upon completing the Disability Screening Questions (DSQ) and were therefore part of the disability population and 2) those who were screened-out by the DSQ and were thus considered non-disabled. Both groups remain in the survey database.
- To have a “complete” questionnaire, respondents who met the criteria of the disabled population must have completed the minimum number of critical questions on the survey: those required to produce data tables for persons with disabilities as required by the 1995 Employment Equity Act. These questions are listed in Appendix C.
- To have a “complete” questionnaire, respondents who were assigned to the non-disabled population must only have completed the DSQ. The rest of the survey questions did not apply to them.
Records that did not meet these criteria were removed from the database and classified as non-respondents.
5.4 Variable recodes and multiple response questions
This stage of processing involved changes at the level of individual variables. Variables could be dropped, recoded, re-sized or left as is. Formatting changes were intended to facilitate processing as well as analysis of the data by end-users. One such change is the dropping of the letter Q which appeared on every variable name on the questionnaire at the beginning of each question number (shown below).
Another change at the variable level was the conversion of multiple-response questions (“Mark-all-that-apply” questions) to corresponding sets of single-response variables which are easier to use. For each response category associated with the original question, a variable was created with “yes/no” response values. An example is provided below. This process is called “destringing” the variables and allows easier statistical tabulation of the variables by end-users.
Original multiple-response question:
DRV_Q06 What are the reasons you have difficulty using public transit or specialized transit service?
INTERVIEWER: Examples of specialized transit services: Handi-Transit, Wheel-Trans, Para-Transpo, etc.
Mark all that apply. Read categories to respondent
- 01 Service is not available when you need it
- 02 Booking rules don’t allow for last minute arrangements
- 03 Difficulty getting to or locating bus stops
- 04 Difficulty getting on or off the vehicle
- 05 Difficulty seeing signs or notices, stops or hearing announcements
- 06 Overcrowding
- 07 Difficulty requesting service
- 08 Difficulty interpreting schedules
- 09 Difficulty transferring or completing complicated transfers
- 10 Your condition or health problem is aggravated when you go out
- 11 Too expensive
- 12 Other reason
- DK, RF
Final variables in single-response “yes/no” format:
DRV_06A What are the reasons you have difficulty using public transit or specialized transit service?
- Service is not available when you need it
- 1 Yes
- 2 No
- DK, RF
DRV_06B What are the reasons you have difficulty using public transit or specialized transit service?
- Booking rules don’t allow for last minute arrangements
- 1 Yes
- 2 No
- DK, RF
DRV_06C What are the reasons you have difficulty using public transit or specialized transit service?
- Difficulty getting to or locating bus stops
- 1 Yes
- 2 No
- DK, RF
...additional “yes/no” questions for each response category, as indicated, from – Difficulty getting on or off the vehicle to - Too expensive... and including the last category:
DRV_06L What are the reasons you have difficulty using public transit or specialized transit service?
- Other reason
- 1 Yes
- 2 No
- DK, RF
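The conversion shown above can be sketched in a few lines. The function name and input layout are hypothetical. The code assumes that a "don't know" or "refusal" on the original question is carried to every derived variable, which the guide does not state explicitly.

```python
import string

YES, NO = 1, 2          # response values shown in the example above
DK, RF = "DK", "RF"     # placeholders for "don't know" and "refusal"


def destring(base_name, n_categories, marked):
    """Turn one mark-all-that-apply answer into a set of yes/no variables.

    `marked` is either a set of category numbers (1-based) or DK / RF.
    """
    letters = string.ascii_uppercase[:n_categories]
    if marked in (DK, RF):
        # Assumption: non-response applies to every derived variable.
        return {f"{base_name}{letter}": marked for letter in letters}
    return {f"{base_name}{letter}": (YES if number in marked else NO)
            for number, letter in enumerate(letters, start=1)}


# A respondent who marked categories 01 and 11 on DRV_Q06.
derived = destring("DRV_06", 12, {1, 11})
print(derived)
```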
In some cases, multiple-response questions on the survey have corresponding follow-up questions which are also multiple-response items. Thus, a single question on the questionnaire can become a large array of variables on the final database.
For example, AAD_Q10 asks: Which aids or assistive devices do you need but do not have? A total of 17 response categories of assistive devices are presented (and one additional category was created in coding – see discussion below – for a total of 18 final categories). Thus, AAD_Q10 will become 18 separate variables in a “yes/no” format: AAD_10A through AAD_10R. This question is then followed up with AAD_Q11 - Why do you not have (refers to AAD_Q10 response(s))? For this follow-up question, a total of 8 response categories of reasons are presented. Thus, for AAD_Q11, a total array of 144 (18 x 8) “yes/no” variables is created, where the 18 devices correspond to letters A through R, and the 8 reasons correspond to letters A through H: AAD_11AA, AAD_11AB, AAD_11AC... AAD_11AH through AAD_11RA, AAD_11RB, AAD_11RC... AAD_11RH.
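The naming pattern for this array can be reproduced with a short sketch. The variable names in the code are hypothetical, and only the naming scheme is taken from the text.

```python
from itertools import product
from string import ascii_uppercase

device_letters = ascii_uppercase[:18]  # A through R, one per device category
reason_letters = ascii_uppercase[:8]   # A through H, one per reason category

aad_10_variables = [f"AAD_10{device}" for device in device_letters]
aad_11_variables = [f"AAD_11{device}{reason}"
                    for device, reason in product(device_letters, reason_letters)]

print(len(aad_10_variables), len(aad_11_variables))
print(aad_11_variables[:3], aad_11_variables[-1])
```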
5.5 Flows: response paths, valid skips and question non-response
Another set of data processing procedures for the 2012 CSD was the verification of questionnaire flows or skip patterns. All response paths and skip patterns that were built into the questionnaire were verified to ensure that the universe or target population for each question was accurately captured during processing. Special attention was paid to distinctions between valid skips and non-response, an important distinction for statistical analysis. These concepts are explained below in order to assist users to better understand question universes as well as statistical outputs for CSD survey variables.
Response – an answer directly relevant to the content of the question that can be categorized into pre-existing answer categories, including ‘other-specify’.
Valid skip – indicates that the question was skipped because it did not apply to the respondent’s situation, as determined by valid answers to a previous question. In such cases, the respondent is not considered to be part of the target population or universe for that question. As noted below, where a question was skipped due to an undetermined path (that is, a “don’t know” or “refusal” to a previous question caused the skip), the respondent is coded to “Not stated” for that question.
Don’t know – the respondent was unable to provide a response for a number of reasons - for example, due to difficulty remembering or because they were responding for someone else.
Refusal – the respondent preferred not to respond, perhaps due to sensitivity of the question.
Not stated – this indicates that the question response is missing and there is an undetermined path for the respondent, such as when a respondent did not answer the previous filter question or where an inconsistency was found in a series of responses.
Suppressed - For NHS variables only, an additional type of non-response is the suppressed value as a result of consistency edit procedures conducted during CSD processing (see section 5.7).
Special codes have been assigned to each of these types of responses to facilitate user recognition and data analysis. For instance, “valid skip” codes are set to “6” as the last digit, with any preceding digits set to “9” (for example, the code would be “996” for a 3-digit variable). All “don’t know” responses end in “7”, with any preceding digits set to “9” (for example, “997”). Refusals end in “8”, with any preceding digits set to “9” (for example, “998”); and “not stated” values end in “9”, with any preceding digits also set to “9” (for example, “999”). Suppressed values are coded to “5” as the last digit, with any preceding digits set to “9” (for example, “995”).
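The coding rule above lends itself to a small helper that users could adapt when separating valid responses from reserved codes. The function names and labels are hypothetical, and the sketch assumes that variables are stored as integers of a known width.

```python
RESERVED_LAST_DIGIT = {
    "suppressed": 5,
    "valid skip": 6,
    "don't know": 7,
    "refusal": 8,
    "not stated": 9,
}


def reserved_code(kind, width):
    """Reserved code for a variable of `width` digits: leading 9s, then the last digit."""
    return int("9" * (width - 1) + str(RESERVED_LAST_DIGIT[kind]))


def classify(value, width):
    """Return the reserved-code label for `value`, or 'response' if it is not reserved."""
    for kind in RESERVED_LAST_DIGIT:
        if value == reserved_code(kind, width):
            return kind
    return "response"


print(reserved_code("valid skip", 3))   # 996, as in the text
print(classify(97, 2), classify(40, 2))
```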
Data processing also includes the coding of “Other-specify” items, sometimes referred to as “write-in responses”. For most questions on the CSD questionnaire, pre-coded answer categories were supplied and the interviewers were trained to assign a respondent’s answer to the appropriate category. However, in the event that a respondent’s answer could not be easily assigned to an existing category, many questions also allowed the interviewer to enter a long-answer text response in the “Other-specify” category.
All questions with “Other-specify” categories were examined during processing. In the case of questions where ‘Other-specify’ responses constituted less than 10% of overall responses to the question, coding was not performed and responses were left in ‘Other’. A total of 12 questionnaire items were coded for ‘other-specify’ responses. Based on coding guidelines, many of the long answers provided by respondents for these questions were re-coded back into one of the existing answer categories. Responses that were unique and qualitatively different from existing categories were kept as “Other”. Where counts warranted, new categories were created to capture emerging themes in the data that were not reflected in existing categories. Appendix D presents the extra categories added for the 2012 CSD. These will be taken into account when refining the answer categories for future cycles of the survey.
Open-ended questions and standard classifications
A few questions on the 2012 CSD questionnaire were recorded by interviewers in a completely open-ended format. These included questions related to the following:
- The respondent’s main medical condition which causes them the most difficulty or limits their activities the most;
- Occupation and industry of work; and
- Major field of post-secondary study, where applicable.
These responses were coded using a combination of automated and interactive (manual) coding procedures. Standardized classification systems were used to code these responses. Appendix D provides details of these classifications.
Coding for all classifications involved a team of experienced coders and quality control supervisors. Subject matter experts in data processing applied additional verification procedures.
5.7 Edits and imputations
After the coding stage of processing, analyses were performed on the data to identify gaps, inconsistencies, extreme outliers and other potential problems in the data. To resolve the problematic data identified, customized edits and imputations were performed.
CSD data were evaluated both for internal consistencies across variables and for external consistencies in relation to the respondents’ linked NHS data. With respect to internal consistencies, the data had already benefitted from extensive edits built into the electronic questionnaire.
Examples of electronic edits include several that ensured that responses related to age throughout the questionnaire were not contradictory. For example, edits ensured that the reported age when respondent first experienced limitation in daily activities or when the respondent completed their highest level of schooling was not greater than the age of the respondent at the time of the interview. Another questionnaire edit ensured that the number of hours a respondent reported that they usually worked per week did not exceed 168 (24 hours x 7 days) and prompted interviewers to confirm the hours with respondents whenever reported hours per week exceeded 80.
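The two kinds of questionnaire edit described above can be illustrated with a minimal Python sketch. The function names and the returned labels are hypothetical and are not taken from the CSD collection application; only the thresholds (168 and 80 hours) and the age rule come from the text. The rejection of negative hours is an added assumption.

```python
def check_hours_per_week(hours):
    """Classify usual weekly hours as 'reject', 'confirm' or 'accept'."""
    # 168 = 24 hours x 7 days; negative values are rejected by assumption.
    if hours < 0 or hours > 168:
        return "reject"
    # Values above 80 are possible but unusual, so the interviewer confirms them.
    if hours > 80:
        return "confirm"
    return "accept"


def check_reported_age(reported_age, age_at_interview):
    """An age reported for a past event cannot exceed the current age."""
    return reported_age <= age_at_interview
```

A hard edit (reject) blocks an impossible value, while a soft edit (confirm) only asks the interviewer to verify the answer with the respondent.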
Post-survey analyses were undertaken to review the frequency with which questionnaire edits were triggered during interviews for all respondents. As none of these counts were significant, further internal consistency editing was not required.
In addition to these analyses, two situations were identified where an error was found on the electronic questionnaire resulting in missing data for some respondents. In both cases, imputation was considered as a potential solution. Imputation involves replacing the missing data with a new value based on values from a donor respondent with similar characteristics or based on values determined through logical determinism.
In the first situation, a subgroup of respondents (n=581) in the module on employment modifications was not asked question EMO_Q02, which asked whether certain modifications to their work environment had been made available to the respondent. More specifically, due to an error in the programming of the skip pattern, the respondents missed were persons with a disability who were out of the labour force, who had worked since 2007, and who reported that their condition did not completely prevent them from working and that they would need an accommodation to be able to work. We considered imputing the missing responses for these individuals using donor imputation. However, these persons have a very specific profile, and after closely examining the potential donors, we were unable to find suitable donors with similar characteristics. As a result, we kept the responses as missing, rather than imputing responses from donors with very different (employment) profiles.
In the second situation, the responses to questions SNC_Q01D and SNC_Q01E (“In the past year, did you receive income from the Canada Pension Plan Disability Benefit?” and “In the past year, did you receive income from the Quebec Pension Plan Disability Benefit?”) were imputed for some respondents (n=466) because their year of birth was missing. These questions were normally asked when a condition on year of birth had been met (YOB>=1946). A missing year of birth (Note 7) resulted in the situation where, for certain respondents, the persons should have answered the questions, but instead the questions were not asked. Since the two possible answers were “yes” and “no,” we imputed the responses using two logistic regression models to predict the probability of a “yes” answer to these two questions, respectively. To build and validate these models, we used data from other respondents with a disability who had given a year of birth and who had answered the proper question flow. We then ran the models for the cases with missing data to impute a value, based on the predicted probability of a “yes” answer. Indicators to identify the imputed cases were created for users.
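As a rough illustration of this kind of model-based imputation, the Python sketch below turns a fitted logistic model into a predicted probability and then draws a yes/no value from it. The coefficients, the single predictor, the flag name and the use of a random draw are all assumptions made for illustration; the guide does not state which predictors were used or how the imputed value was derived from the predicted probability.

```python
import math
import random


def predicted_probability(intercept, coefficients, predictors):
    """Logistic model: P(yes) = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    return 1.0 / (1.0 + math.exp(-z))


def impute_yes_no(p_yes, rng):
    """Draw 'yes' with probability p_yes (one possible reading of the text)."""
    return "yes" if rng.random() < p_yes else "no"


# Hypothetical respondent with one predictor and made-up coefficients.
rng = random.Random(2012)
p = predicted_probability(-1.0, [0.5], [2.0])  # linear predictor is 0
record = {"SNC_Q01D": impute_yes_no(p, rng), "imputed_flag": 1}
```

Storing a flag next to the imputed value mirrors the indicators mentioned above, so that analysts can include or exclude imputed cases.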
In terms of external edits, analyses were performed comparing CSD and NHS data for those indicators where the concepts across surveys overlapped in meaningful ways and where inconsistencies would have the potential to present analytical problems of interpretability for users. Over 20 indicators were analyzed, primarily in the areas of labour force participation, educational attainment and family status. The basic assumption was that the CSD has more recent data than the NHS and that the NHS variables, in their release version, may have undergone a number of manipulations and imputations. Therefore, the general rule is as follows: when there is an inconsistency, the value of the particular NHS variable is deleted and replaced with the special value “95” to allow users to take it into account during their analyses.
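The general rule can be sketched in Python as follows. The variable names and the consistency rule are hypothetical examples, and the special value is stored here as the integer 95 by assumption; the actual edit specifications are not given in this guide.

```python
INCONSISTENT = 95  # special value named in the text


def apply_external_edit(record, nhs_variable, is_consistent):
    """Return a copy of record; the NHS value becomes 95 if the check fails."""
    edited = dict(record)
    if not is_consistent(edited):
        edited[nhs_variable] = INCONSISTENT
    return edited


# Hypothetical rule: a respondent who reports a degree in the CSD should not
# have "none" as the NHS educational attainment.
def education_consistent(r):
    return not (r["csd_has_degree"] and r["nhs_education"] == "none")


edited = apply_external_edit(
    {"csd_has_degree": True, "nhs_education": "none"},
    "nhs_education",
    education_consistent,
)
```

The CSD value is left untouched; only the linked NHS value is replaced, which matches the assumption that the CSD data are more recent.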
5.8 Derived variables and external NHS-linked variables
In order to facilitate more in-depth analysis of the rich CSD dataset, approximately 100 derived variables (DVs) were created by re-grouping or combining items on the questionnaire. In addition, approximately 200 NHS variables were linked to the final CSD processing file for 2012.
DVs were created to provide users with indicators of disability status and disability type, based on definitions used for the CSD (see section 2.1 for survey definitions of disability). DVs were also created to capture disability severity ratings and classes across disability types. Others included DVs for the use of assistive devices and help needed with daily activities which required combining several items on the questionnaire. Numerous important labour force concepts were captured by DVs, as were sources of income. Finally, several DVs reflected the coding of variables to standard classification systems such as the International Classification of Diseases and the Classification of Instructional Programs.
In constructing derived variables for respondents, if any component question was not answered (that is, had a value of “Don’t know”, “Refused”, “Not stated”, or “Valid skip”), the code assigned to the derived variable was labelled “Not stated”.
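A minimal Python sketch of this rule is shown below. The helper names and the example derived variable are hypothetical; only the list of unanswered values and the “Not stated” outcome come from the text.

```python
UNANSWERED = {"Don't know", "Refused", "Not stated", "Valid skip"}


def derive(components, combine):
    """Return 'Not stated' if any component is unanswered, else combine them."""
    if any(value in UNANSWERED for value in components):
        return "Not stated"
    return combine(components)


# Hypothetical DV: "Yes" if any of several component questions is "Yes".
def uses_any_device(answers):
    return "Yes" if "Yes" in answers else "No"
```

For example, `derive(["Yes", "Refused"], uses_any_device)` returns "Not stated" even though one component is "Yes", because the rule is applied before the components are combined.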
All derived variable names have a “D” in the first character position of the name. For all linked NHS variables, the NHS variable name was preserved as much as possible on the CSD database. Some exceptions applied since CSD variable names are restricted to eight characters whereas NHS variable names sometimes exceeded eight characters.
The 2012 CSD Data Dictionaries identify in detail which variables were derived and the specifications for their derivation (contact Statistics Canada Client Services for details). DVs are listed by theme in Appendix A along with other survey indicators. A complete list of DVs and NHS variables, along with their specifications, is provided in the 2012 CSD Data Dictionaries.
In a sample survey, each respondent represents not only himself or herself but also other people who have not been sampled. For that reason, each respondent is assigned a weight which indicates the number of people that he or she represents. To maintain data coherence and ensure that the results accurately represent the target population and not just the individuals sampled, that weight must be used to compute all estimates.
There are several steps in calculating the weights for the CSD. The first step is to assign each unit selected for the CSD an initial weight based on the sample design. The initial weight is the inverse of the probability of inclusion. For the CSD, the initial weight is the product of two factors: the NHS weight and the CSD subsampling weight (the inverse of the sampling fraction). Then a number of adjustments are made to the weights to control for non-response and exclusions during collection and to avoid extreme weights in the estimation domains. The final step is to calibrate the survey weights on the NHS estimated totals and make certain adjustments to account for units that were in scope during the May 2011 selection process but out of scope at the time of the survey in 2012. The main steps in the weighting process are described in the subsections below.
Calculation of the initial weights
Initial weights need to be calculated for both the YES sample and NO sample. Since the CSD sample design is based on the NHS design, the initial weight is the product of the NHS final weight (Note 8) and the inverse of the CSD sampling fraction. The NHS final weight takes into account the NHS sample design, non-response and other corrections. For more details on the NHS weighting strategy, refer to the NHS User Guide, section 4.3.
Hence the CSD initial weight is the product of the adjusted and corrected NHS weight and the inverse of the CSD sampling fraction. The CSD sampling fraction is the size of the sample selected in a stratum divided by the number of units available in the sampling frame in that stratum.
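The calculation can be written as a short Python sketch. The function and parameter names are hypothetical and the numbers in the example are invented.

```python
def sampling_fraction(n_selected, n_frame):
    """Sample size in a stratum divided by the frame count in that stratum."""
    return n_selected / n_frame


def initial_weight(nhs_final_weight, n_selected, n_frame):
    """NHS final weight multiplied by the inverse of the CSD sampling fraction."""
    return nhs_final_weight * (n_frame / n_selected)


# Invented example: 50 units selected out of 200 available in the stratum,
# for a person whose NHS final weight is 3.0.
w0 = initial_weight(3.0, 50, 200)
```

Here the sampling fraction is 0.25, so each selected unit stands for four frame units, and the initial weight is four times the NHS final weight.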
The weight adjustments described in the next subsections pertain only to the YES sample. No collection was done for the NO sample, so there was no need to adjust its weights for non-contact or non-response.
Adjustment for units not sent to collection
The sample selected for the CSD was expanded slightly in anticipation of the exclusion of some units from collection for the following reasons:
- selection of more than three members of the same household;
- selection of persons in households previously selected for other surveys in the North (Canadian Community Health Survey, Labour Force Survey or Survey of Household Spending);
- no telephone number available for a household (outside the NWT);
- no name or date of birth reported in the NHS for a person, so no way to identify the right respondent in the household.
These losses were taken into account in calculating the sample size, and oversampling in some strata was done to compensate for them. Units excluded from collection were thus treated as non-respondents and their weight was redistributed to the other units at the stratum level.
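A redistribution of this kind can be sketched in Python as below. The field names are hypothetical, and the sketch assumes the weight is redistributed proportionally to the existing weights of the retained units, which the guide does not specify.

```python
from collections import defaultdict


def redistribute_within_groups(units, group_key, keep_key, weight_key="weight"):
    """Drop units that are not kept and inflate the rest within each group."""
    total = defaultdict(float)
    kept = defaultdict(float)
    for u in units:
        total[u[group_key]] += u[weight_key]
        if u[keep_key]:
            kept[u[group_key]] += u[weight_key]
    adjusted = []
    for u in units:
        if not u[keep_key]:
            continue
        # Factor >= 1 that preserves the group's total weight.
        factor = total[u[group_key]] / kept[u[group_key]]
        adjusted.append({**u, weight_key: u[weight_key] * factor})
    return adjusted


units = [
    {"stratum": "A", "sent": True, "weight": 10.0},
    {"stratum": "A", "sent": True, "weight": 10.0},
    {"stratum": "A", "sent": False, "weight": 20.0},
    {"stratum": "B", "sent": True, "weight": 5.0},
]
sent_units = redistribute_within_groups(units, "stratum", "sent")
```

In stratum A the excluded unit's weight of 20 is shared by the two remaining units, so the stratum still sums to 40 after the adjustment.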
Non-contact and non-response adjustments
There were two major categories of non-response in the CSD: non-contact, and non-response after contact. These two types of non-response were treated separately, as they constituted two different phenomena. The factors that explain non-contact tended to be related to the characteristics of the household and the mobility of persons, while the factors that explained non-response in a contacted household tended to be related to the characteristics of individuals.
First, units sent to the field were separated into two large groups: units that were contacted, and units that were not contacted. Logistic regression was used to model the probability of being contacted. The explanatory variables for the model came either from the NHS frame or from paradata (Note 9). The variables selected for the non-contact model were province, owning or renting the dwelling, number of persons in the household, number of children in the household, number of bedrooms, total household income, 5-year mobility indicator, visible minority group, registered Indian status, class of worker, labour force status, type of occupation, reason for not being able to start work the week before the Census, age group, marital status, consenting to release one’s NHS data in 92 years, knowledge of official languages, status in the census family structure, and number of contact attempts.
With this model, for each unit (contacted or not), a probability of being contacted was obtained. Then, response homogeneity classes were formed by grouping units that had similar probabilities of being contacted. An automatic class formation method (Note 10) was used to generate classes that were homogeneous with respect to predicted probability of being contacted and contained a sufficient number of units contacted to avoid excessively large weight adjustment factors. A total of 37 classes were formed, and within each one, the weight of the non-contact units was redistributed to the contacted units.
Next, an adjustment was made for a subset of persons who were contacted but non-respondent, either those who had a disability or health condition that prevented them from responding, or those who completed the DSQ module (and, on the basis of their responses, had a disability) but not the rest of the CSD interview. Since there were very few of these cases (about 700), a relatively simple adjustment was made at the stratum level, redistributing the weight of non-respondents with a disability among respondents with a disability.
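The idea of response homogeneity classes can be illustrated with the Python sketch below. It is not the automatic class formation method cited in the guide: as a stand-in, it simply cuts the units into equal-size classes by rank of predicted probability, and all names and numbers are hypothetical.

```python
def form_classes(probabilities, n_classes):
    """Assign a class index by rank of predicted probability (equal sizes)."""
    order = sorted(range(len(probabilities)), key=lambda i: probabilities[i])
    classes = [0] * len(probabilities)
    for rank, i in enumerate(order):
        classes[i] = rank * n_classes // len(probabilities)
    return classes


def class_adjustment_factors(weights, contacted, classes):
    """Per class: total weight divided by the weight of contacted units."""
    total, reached = {}, {}
    for w, c, k in zip(weights, contacted, classes):
        total[k] = total.get(k, 0.0) + w
        if c:
            reached[k] = reached.get(k, 0.0) + w
    # Assumes every class holds at least one contacted unit, as the text requires.
    return {k: total[k] / reached[k] for k in total}


probs = [0.2, 0.9, 0.3, 0.8]
contacted = [False, True, True, True]
classes = form_classes(probs, 2)
factors = class_adjustment_factors([10.0] * 4, contacted, classes)
```

Contacted units in the low-probability class receive a factor of 2.0 and those in the high-probability class a factor of 1.0, which shows why classes need enough contacted units to keep the factors from becoming very large.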
The next step was to adjust the weights for other non-respondents (generally refusals). In this case, too, logistic regression was used to model the probability of responding given the fact that contact had been made at the household level. The variables selected for the non-response model were province, age group, presence of children in the household, number of children in the census family, number of maintainers in the household, living in a large urban centre or not, visible minority group, class of worker, occupational group, total household income, consenting to release of one’s NHS data in 92 years, knowledge of official languages, number of contact attempts, number of contacts, number of refusals, number of appointments, average number of days between contacts, date of last contact (we divided the collection period into four groups), day of the week of last contact, and time of last contact.
With this non-response model, for each unit (respondent or not; see Note 11), a probability of responding given contact was obtained. Response homogeneity classes were formed by grouping units that had similar probabilities of responding. The same procedure was used as for the contact model, which resulted in the formation of 14 classes. Within each class, the weight of the non-respondent units was redistributed among the respondent units.
Out-of-scope units (deaths, institutional admissions, persons who now live outside the country, etc.) were initially considered respondent units, in that we were able to speak with a household member who confirmed the unit’s out-of-scope status. Their weight was not set to 0; rather, it was retained since they represented units of the initial population (on May 10, 2011) that were out of scope in the fall of 2012. However, these units are excluded from the analytical file.
Adjustment for extreme weights
Following the non-contact and non-response adjustments, the distribution of respondents’ weights was examined to detect the presence of very large weights by province and then by estimation domain. Some adjustment factors may have generated very large weights for some individuals compared with others in some domains, which could have a detrimental effect on the estimates and their variance. The sigma-gap method was used to detect these extreme weights first within each province and then within each estimation domain. An example of how the sigma-gap method can be applied is given in Bernier and Nobrega (1998) (Note 12). As used here, the sigma-gap method is intended to detect large gaps between successive weights sorted in ascending order (when they are greater than the median). When an excessively large gap is found between two successive weights, the larger of the two weights and all subsequent weights are classified as outliers. To assess the size of a gap between two weights, it was compared with a certain number of standard deviations of the distribution of all weights. For the CSD, gaps between weights that were three times the distribution’s standard deviation within each province were identified. The choice to use three standard deviations was made because that corresponded to the gap that would have been used to identify outlier weights with a manual process. All the weights identified as outliers were set to the province’s highest non-outlier value. The same procedure was then followed for the estimation domains. The resulting weight reduction was offset in the post-stratification step, as described below.
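The detection and capping steps can be sketched in Python as follows. This is an illustration of the description above, not the production code. Two details are assumptions: the population standard deviation is used, and a gap is examined when the larger of the two successive weights is above the median.

```python
import statistics


def sigma_gap_cap(weights, n_sigma=3.0):
    """Cap weights that lie beyond the first large gap above the median."""
    if len(weights) < 2:
        return list(weights)
    ordered = sorted(weights)
    median = statistics.median(ordered)
    threshold = n_sigma * statistics.pstdev(ordered)
    cap = None
    for lower, upper in zip(ordered, ordered[1:]):
        if upper > median and upper - lower > threshold:
            cap = lower  # highest non-outlier weight
            break
    if cap is None:
        return list(weights)
    # The weight after the gap and all larger weights are set to the cap.
    return [min(w, cap) for w in weights]


# Invented weights for one province: twenty ordinary values and one extreme.
province_weights = list(range(10, 30)) + [500]
capped = sigma_gap_cap(province_weights)
```

In this invented example the gap between 29 and 500 exceeds three standard deviations, so the weight of 500 is reduced to 29 and the other weights are unchanged. Running the same function a second time on the weights of each estimation domain would mirror the two-pass procedure described above.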
Estimation domain jumpers
Before proceeding to post-stratification, we had to account for cases in which the age derived from the date of birth reported in the CSD did not match the age group to which the case was assigned on the basis of the sampling frame. This situation is possible since the date of birth in the NHS frame may have been missing and been imputed, or it may have been incorrectly reported or captured. We found just under 300 cases where this had occurred. We compared their weights following the adjustment for extreme weights in the previous step with the distribution of weights in their new domain. When the individual’s weight fell within the range of weights in the new domain, it was retained with no change. On the other hand, if it fell outside the range of weights in the new domain, it was changed to the new domain’s minimum value (if it was below the range) or maximum value (if it was above the range). In this step, we adjusted the weight of 35 individuals in the CSD sample.
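The rule for domain jumpers amounts to clamping the weight to the range observed in the new domain, as in the Python sketch below. The function name is hypothetical, and the sketch assumes the range is computed from the other units already in the new domain.

```python
def adjust_jumper_weight(weight, new_domain_weights):
    """Keep the weight if it is in the new domain's range, else clamp it."""
    lowest = min(new_domain_weights)
    highest = max(new_domain_weights)
    if weight < lowest:
        return lowest
    if weight > highest:
        return highest
    return weight
```

For instance, with new-domain weights between 40 and 120, a jumper with a weight of 150 would be set to 120, one with a weight of 25 would be set to 40, and one with a weight of 80 would be left unchanged.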
Post-stratification and out-of-scope adjustment
Two separate post-stratification steps were needed to obtain the final weights. These steps were done for both the YES sample and NO sample.
The first post-stratification involved adjusting the weights of CSD respondents (out-of-scope cases, respondents with a disability, and respondents without a disability) to obtain the same totals as in the NHS for the YES population by province, age group, sex and severity. Severity refers to the severity reported in the NHS: either “often” or “sometimes”. Post-stratification was performed on the following age groups: 15-19, 20-24, 25-29, …, 60-64, 65-74, 75+.
The same post-stratification was carried out on the NO sample starting from the initial weights computed earlier.
The first post-stratification was performed independently for the YES and NO populations. The control totals were calculated from the NHS dissemination database for the population aged 15 and over living in private dwellings, excluding reserves.
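Post-stratification of this kind scales the weights in each cell so that they sum to the control total for that cell. A minimal Python sketch is given below; the field names, the cell label and the control total are hypothetical.

```python
def post_stratify(units, control_totals, cell_key="cell", weight_key="weight"):
    """Scale weights so each cell's weighted total equals its control total."""
    sums = {}
    for u in units:
        sums[u[cell_key]] = sums.get(u[cell_key], 0.0) + u[weight_key]
    return [
        {**u, weight_key: u[weight_key] * control_totals[u[cell_key]] / sums[u[cell_key]]}
        for u in units
    ]


# Hypothetical cell defined by province, age group, sex and severity.
cell = ("NL", "15-19", "F", "often")
units = [{"cell": cell, "weight": 10.0}, {"cell": cell, "weight": 30.0}]
final = post_stratify(units, {cell: 80.0})
```

The two weights sum to 40 before the adjustment and to the control total of 80 afterwards, while their ratio (1 to 3) is preserved.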
The main purpose of the second post-stratification was to adjust the weights of the NO sample in order to estimate the proportion of the population that was out of scope in the fall of 2012. Nonetheless, this step was performed on both the YES sample and NO sample. Since CSD collection took place about a year and a half after the NHS, a large number of out-of-scope units were found in the YES sample, representing about 235,000 individuals in the YES population. Because these out-of-scope cases must be excluded from the disability rates, it is important to try to remove the out-of-scope cases from the denominator (which consists of the YES and NO populations) as well. Otherwise, we may underestimate the disability rates. As the NO sample was not sent to collection, we had to find an indirect method of estimating and excluding the out-of-scope cases. The strategy used to estimate the number of out-of-scope cases in the NO population, as we will see, depends on the type of out-of-scope case and the available information.
First, we calibrated the weights of the YES and NO samples on the population estimates (Note 13) produced by the Demography Division, adjusted for net undercoverage on May 10, 2011, excluding the under-15 age group, Indian reserves, collective dwellings, members of the Canadian Forces, and visitors to Canada. This calibration was necessary because the death and emigration totals we will use later to estimate the number of out-of-scope cases in the NO population relate to the total population, not just the enumerated population. More than 65% of the out-of-scope cases observed in the CSD were deaths. The adjustment for this type of out-of-scope case had to be as precise as possible.
Second, we made the various adjustments required to exclude units that would have been out of scope in the fall of 2012 from the NO population. The aim here is not to identify the individuals in the NO sample who are out of scope but to reduce the sum of the NO sample’s weights so that they match the population totals excluding the out-of-scope cases.
These adjustments were carried out separately for different types of out-of-scope cases, and using different data sources, as described below.
Estimates for the various types of out-of-scope cases in the YES population are presented in the table below.
|Type of out-of-scope||Unweighted count||Weighted count||Weighted %|
|Moved to an Indian reserve||4||120||0.05|
|Source: Statistics Canada, Canadian Survey on Disability, 2012.|
Deaths and emigrants (between May 10, 2011, and CSD collection)
The Demography Division produced cumulative totals of deaths and emigrants between May 10, 2011, and November 15, 2012 (the mid-point of CSD collection), by province/territory, age group and sex. These totals were adjusted to exclude people under age 15, Indian reserves, collective dwellings, members of the Canadian Forces and visitors to Canada. Since these totals cover the entire population (the YES and NO populations), we were able to subtract the number of deaths estimated for the YES population with the CSD sample (using the weights obtained following post-stratification on the Demography Division totals, as described above) and thus derive the number of deaths in the NO population. Then we were able to adjust the sampling weight of the NO population downward to produce an estimate of the population that was still alive and still in Canada in the fall of 2012.
|Demography Division estimate (total population)||Indirect estimate (NO population)|
|Sources: Statistics Canada, Canadian Survey on Disability, 2012 and Population Estimates Program.|
The downward adjustment of the weights of the NO population was carried out for deaths at the level of the provinces cross-classified by two age groups (15-64 and 65+). For emigration, the adjustment was made by age group only (15-24, 25-44 and 45+). It was important for the adjustment to be made as precisely as possible, but without creating inconsistencies in the adjustments to be carried out.
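The indirect estimate and the resulting weight factor can be sketched in Python as follows. The function names and all figures are invented for illustration; in practice the calculation would be repeated within each adjustment cell (for example, province by age group for deaths).

```python
def indirect_no_estimate(total_population_count, yes_population_estimate):
    """Out-of-scope count in the NO population = total minus YES estimate."""
    return total_population_count - yes_population_estimate


def downward_factor(no_weight_sum, no_out_of_scope):
    """Factor applied to NO weights so they sum to the in-scope population."""
    return (no_weight_sum - no_out_of_scope) / no_weight_sum


# Invented cell: 1,000 deaths in total, 400 of them estimated in the YES
# population, and NO-sample weights summing to 12,000.
no_deaths = indirect_no_estimate(1000, 400)
factor = downward_factor(12000, no_deaths)
```

With these invented numbers, 600 deaths are attributed to the NO population and every NO weight in the cell is multiplied by 0.95.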
Persons living in an institution after May 10, 2011
This type of out-of-scope case is more difficult to estimate for the NO population. There is no accurate data source that we can use to estimate the number of people admitted to institutions since the last census. In addition, we know that the proportion of people admitted to institutions in the YES population should be larger than the proportion in the NO population, but we do not know how much larger.
We obtained estimates of institutional admissions from the Survey of Labour and Income Dynamics (SLID, Panel 7) for the total population aged 15 and over (with the same exclusions of reserves, collective dwellings, etc., as the CSD, but also excluding the territories). These estimates were provided to us by age group (15-64, 65-74, 75+), and they cover a period of a year and a half, which is comparable to the gap between the Census and the CSD. Since the SLID has no estimates for the territories, it was impossible to estimate the number of institutional admissions in the territories’ NO population. An indirect estimate of the institutional admissions in the NO population was obtained by subtracting the number of institutional admissions estimated for the YES population in the CSD from the SLID’s estimated total for the entire population (as was done for deaths and emigration).
The number of institutional admissions in the NO population was negligible for the first two age groups (15-64 and 65-74). We therefore performed an adjustment only for the last age group in the NO population. We decreased the weight of the NO population aged 75 and over to reduce the population by just under 26,000 in the 10 provinces.
Persons who moved to an Indian reserve after May 10, 2011
The number of cases in the YES population that were out of scope because they moved to a reserve was so small that we saw no need to make an adjustment in the NO population.
Other out-of-scope cases
For the other types of out-of-scope cases—members of the Canadian Forces, visitors to Canada (incorrect classification in the NHS) and invalid age—we had to assume that the proportion for the NO population would be the same as the proportion estimated for the YES population. This assumption is less realistic for members of the Canadian Forces, but this adjustment for out-of-scope cases is very small.
The total number of members of the YES population who fall under these three types of out-of-scope cases makes up about 0.13% of the YES population (Note 14). We therefore assumed that 0.13% of the NO population also fell under these types of out-of-scope cases, and we reduced the weights of the NO population in proportion to the observed figures for the YES population in the following three age groups: 15-24, 25-44, 45+.
It is important to keep in mind that the out-of-scope cases in the NO population were excluded by adjusting the weight of the NO population downward to compensate for the losses. For the YES population, on the other hand, we simply excluded from the analytical file those individuals who were identified as out-of-scope cases and whose sampling weight enabled us to estimate the number of persons they initially represented in the YES population.
Weighted population counts
Table 6.3 shows counts for the YES population covered by the survey, that is, persons who reported an activity limitation in the NHS filter questions, after adjustment for net undercoverage and exclusion of units that were out of scope in the fall of 2012.
|Province/Territory||YES population covered||Persons with a disability|
|Newfoundland and Labrador||85,130||59,300|
|Prince Edward Island||26,370||18,840|
|Source: Statistics Canada, Canadian Survey on Disability, 2012.|
The next table shows weighted counts of the population covered by the NO sample, following adjustments for net undercoverage and exclusion of units that were out of scope in the fall of 2012.
|Newfoundland and Labrador||335,850|
|Prince Edward Island||91,070|
|Source: Statistics Canada, Canadian Survey on Disability, 2012.|
6.2 File structure and content
We created two analytical files for the CSD: a file with persons who have a disability, and a file with persons who do not have a disability. Depending on the type of analysis required, users will have to use either the disabled persons file only or both files together.
The disabled persons file contains persons selected for the CSD who, according to the definition of disability used in the CSD, are considered to have a disability. This file is the more comprehensive of the two. It contains all CSD data and several variables from the NHS. Any analysis that deals exclusively with disabled persons can be done with this file alone.
The non-disabled persons file contains three groups of people: two from the CSD’s YES sample, and the third from the NO sample. The three groups are as follows:
- (a) Persons interviewed for the CSD who reported being limited only “rarely” with “no difficulty” or “some difficulty”. For the purposes of the CSD interview, these persons were deemed to potentially have a disability and therefore had to answer all the questions in the CSD. However, they were excluded from the final definition of disability.
- (b) Persons interviewed for the CSD who reported that they were “never” limited (false positives). For the purposes of the CSD interview, these persons were deemed not to have a disability and therefore did not have to answer the rest of the questions in the CSD.
- (c) Persons from the NO sample. They reported no activity limitations in the NHS, were not sent to collection, and were automatically deemed not to have a disability.
Hence, the non-disabled persons file has different content depending on the group of people involved. For the persons in group (a), we have the same content as for the persons with a disability: all the CSD data and several variables from the NHS. For the persons in group (b), we have only the data from the CSD’s DSQ module, since the interview was terminated immediately after that module. However, we also have the NHS variables. For the persons in group (c), we have only the NHS variables, since no CSD collection was done for those units.
The non-disabled persons file should be used together with the disabled persons file for two types of analysis: calculation of disability rates, since the denominator must include both persons with a disability and persons without a disability, and comparison of the NHS characteristics (Note 15) of persons with a disability and persons without a disability.
To distinguish between the various groups in the files, we created a derived variable, CSDPOPFL, which takes a value of 1 for persons with a disability, 2 for group (a) persons without a disability, 3 for group (b) persons without a disability, and 4 for group (c) persons without a disability.
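As an illustration of how CSDPOPFL might be used when both files are combined, the Python sketch below computes a weighted disability rate. The weight field name and the records are hypothetical; the actual weight variable is documented in the data dictionaries.

```python
def disability_rate(records, weight_key="weight"):
    """Weighted persons with a disability divided by all weighted persons."""
    numerator = sum(r[weight_key] for r in records if r["CSDPOPFL"] == 1)
    denominator = sum(r[weight_key] for r in records)
    return numerator / denominator


# Invented records: one per CSDPOPFL group, from the two files combined.
combined = [
    {"CSDPOPFL": 1, "weight": 100.0},  # persons with a disability
    {"CSDPOPFL": 2, "weight": 50.0},   # group (a)
    {"CSDPOPFL": 3, "weight": 50.0},   # group (b)
    {"CSDPOPFL": 4, "weight": 800.0},  # group (c), NO sample
]
rate = disability_rate(combined)
```

The denominator includes all four groups, which is why the non-disabled persons file is needed for rates; with these invented weights the rate is 0.1.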
To make the two files easier to use, they both have the same variables. However, some variables will have missing values in the non-disabled persons file, because we do not have all the information for some groups. The table below summarizes the contents of the various files and the various groups, for the disabled persons file and the non-disabled persons file.
|Group||CSDPOPFL||Demographic variables||DSQ||Other CSD content and derived variables||NHS variables||Final weight and bootstrap weights (Note 1)|
|Persons WITH a disability||1|
|Persons WITHOUT a disability, group (a)||2|
|Persons WITHOUT a disability, group (b)||3|
|Persons WITHOUT a disability, group (c)||4|
Source: Statistics Canada, Canadian Survey on Disability, 2012.
A note on reference periods
When calculating disability rates or comparing the characteristics of persons with a disability and persons without a disability, the reference date is May 10, 2011. However, when one is only interested in persons with a disability, one will work with the CSD data collected and measured in the fall of 2012. In this case, the reference period will be the fall of 2012. This is equivalent to viewing the CSD as a sort of longitudinal survey where the first wave of data was collected in 2011 through the NHS (the initial population), for both the YES population and the NO population, and where the second wave was collected in 2012 through the CSD, for the subset of persons with a disability.
In other words, the CSD’s persons with a disability are individuals who reported an activity limitation in the NHS on May 10, 2011, and a disability in the CSD in 2012. Hence, the CSD’s characteristics for persons with a disability are based on 2012 information about a population defined in 2011.
6.3 Final datasets and data dictionaries
Final data files included the following:
- Final processing file
- Analytical files for use in Research Data Centres
The final processing file is an in-house file that includes a number of temporary variables used exclusively for processing purposes. The Analytical files are dissemination files processed further for release purposes. The analytical files are distributed in Research Data Centres across Canada. They are also used at Statistics Canada to produce data tables in response to client requests. Dissemination files are scheduled for distribution following the CSD release day of December 3, 2013 (see Chapter 9 for dissemination details).
In order to transform the final, cleaned processing file to final Analytical data files for researchers, a series of actions were performed. First, steps were taken for the enhanced protection of respondent confidentiality. Next, person-weights were added to the files. Weighting is described in more detail in Section 6.1. Finally, all temporary variables or variables used exclusively for processing purposes were removed from the files.
Accompanying the 2012 CSD Analytical files are the following supporting documents:
- the record layout,
- SAS (Statistical Analysis System), SPSS (Statistical Package for the Social Sciences) and Stata syntax to load the files, and
- metadata in the form of a data dictionary for each Analytical file that describe every variable and provide weighted and unweighted frequency counts.
6.4 Guidelines for analysis
A User Guide has been created for the Analytical files. This guide provides detailed step-by-step instructions for using the CSD data files. The User Guide includes guidelines for tabulation and statistical analysis, how to apply the necessary weights to the data, information on software packages available and guidelines for the release of data, such as rounding rules. The process of estimating the reliability of estimates, both quantitative and qualitative, is covered in detail. In addition, detailed data dictionaries are available covering all variables available.
7.1 Overview of data quality evaluation
The objective of the Canadian Survey on Disability is to produce quality estimates on the type and severity of disabilities of Canadians aged 15 years and over (as of May 10, 2011) as well as on a variety of other important indicators of the experiences and challenges of persons with disabilities. This chapter reviews the quality of the data for this survey.
Sections 7.2 and 7.3 below explain the two types of errors that occur in surveys - sampling and non-sampling errors. Each type of error is evaluated in the context of the CSD. Sampling error is the difference between the data obtained from the survey sample and the data that would have resulted from a complete census of the entire population taken under similar conditions. Thus, sampling error can be described as differences arising from sample-to-sample variability. Non-sampling errors are all other errors that are unrelated to sampling. Non-sampling errors can occur at any stage of the survey process, and include non-response for the survey as well as errors introduced during data collection or computer processing.
This chapter describes the various measures adopted to prevent errors from occurring wherever possible and to adjust for any errors found throughout the different stages of the CSD. Areas of caution for interpreting CSD data are noted. Readers may also refer to the National Household Survey User Guide for related information on data quality.
7.2 Sampling errors and bootstrap method
The estimates that can be produced with this survey are based on a sample of individuals. Somewhat different estimates might have been obtained if we had conducted a complete census with the same questionnaires, interviewers, supervisors, processing methods and so on, as those actually used. The difference between an estimate derived from the sample and an estimate based on a comprehensive enumeration under similar conditions is known as the estimate’s “sampling error”.
To produce estimates of the sampling error for statistics produced from the CSD, we used a particular type of bootstrap method. Several bootstrap methods exist in the literature, but none was appropriate for the CSD’s complex sample design. The following characteristics of the sample design make it difficult to estimate the sampling errors:
- A three-phase design in which households (or dwellings) are selected in the first two phases, and individuals in the third phase. In the first phase, a random sample of 4.5 million households stratified by collection unit (CU) was selected for the NHS. In mid-July 2011, a second-phase subsample of 400,000 households from the 1.2 million that had not yet responded was selected as part of the non-response follow-up (NRFU) operation. In the third phase, a sample of some 45,500 individuals with activity limitations according to the NHS was selected for the CSD.
- The sampling fraction of the first-phase sample (NHS) is non-negligible (about 1/3 in the N1 regions), and the sampling fraction of the CSD is rather high in some strata.
- The CSD strata (combinations of estimation domains, N1 or N2 regions, initial respondent vs. NRFU respondent, often limited vs. sometimes limited) are non-nested within the NHS strata (CUs or groups of CUs).
- The method used has to be flexible enough to produce standard statistics such as proportions, totals, averages and ratios, as well as more sophisticated statistics, including percentiles and logistic regression coefficients.
For the purposes of calculating the sampling error, NRFU respondents were treated as a third-phase sample, in which a dwelling’s probability of inclusion was equal to its probability of responding, independently for each dwelling.
As previously mentioned, several bootstrap methods exist in the literature for one-phase sampling. The most common method, known as “with replacement” bootstrap, involves selecting M with-replacement subsamples from the main sample and producing estimates for each subsample. The bootstrap variance estimate is then calculated as a function of the squared differences between the estimates from each of the M bootstrap subsamples and the estimate from the survey sample.
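The one-phase "with replacement" idea described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions (a simple random sample, resamples of the same size as the sample, an unweighted mean as the statistic); it is not the method used for the CSD, and the function names and example values are hypothetical.

```python
import random

def bootstrap_variance(sample, estimator, n_replicates=500, seed=0):
    """Naive with-replacement bootstrap variance of `estimator` over `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    full_estimate = estimator(sample)  # estimate from the survey sample itself
    squared_diffs = []
    for _ in range(n_replicates):  # n_replicates plays the role of M in the text
        # Draw a subsample of n units with replacement from the main sample.
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        squared_diffs.append((estimator(resample) - full_estimate) ** 2)
    # Average squared difference between replicate and full-sample estimates.
    return sum(squared_diffs) / n_replicates

def mean(values):
    return sum(values) / len(values)

# Hypothetical data, for illustration only.
observations = [12.0, 15.5, 9.0, 22.0, 18.5, 14.0, 30.0, 11.0]
var_hat = bootstrap_variance(observations, mean)
```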
The use of bootstrap weights makes calculating the variance much simpler. For each subsample (bootstrap replicate), the initial sampling weights are adjusted for bootstrap subsampling, which produces what is known as “initial bootstrap weights”. Since each bootstrap sample is the result of selecting units with replacement, a unit can be selected in a particular bootstrap sample more than once. It can be shown that the bootstrap weights are a function of the initial weight of the observation multiplied by what is referred to as the unit’s “multiplicity” in the bootstrap sample, which is the number of times the unit is selected in the bootstrap sample. A unit’s multiplicity in the bootstrap sample is a random variable that has what is known as a “multinomial distribution”. Hence, the bootstrap weights can be seen as the product of the initial sampling weights and a random adjustment factor (in this case, a function of the unit’s multiplicity). Once the initial bootstrap weights had been calculated, all weight adjustments applied to the initial sampling weights were also applied to the initial bootstrap weights to obtain the final bootstrap weights. The final bootstrap weights therefore capture the variance associated not only with the particular sample design but also with all weight adjustments applied to the full sample to derive the final weights.
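As a rough illustration of how a multiplicity turns into an initial bootstrap weight, the sketch below uses one common one-phase scheme, the Rao-Wu rescaled bootstrap, for a single stratum: n - 1 units are drawn with replacement from the n sampled units, and each weight is multiplied by n / (n - 1) times the unit's multiplicity. This scheme is an assumption chosen for the example and is not the general bootstrap used for the CSD; the function name and the weights are hypothetical.

```python
import random
from collections import Counter

def rao_wu_bootstrap_weights(weights, n_replicates, seed=0):
    """Initial bootstrap weights for one stratum (Rao-Wu rescaling)."""
    rng = random.Random(seed)
    n = len(weights)
    if n < 2:
        raise ValueError("at least two sampled units are needed in the stratum")
    factor = n / (n - 1)
    replicates = []
    for _ in range(n_replicates):
        # Multiplicity: number of times each unit is drawn in this replicate.
        multiplicity = Counter(rng.randrange(n) for _ in range(n - 1))
        # Bootstrap weight = initial weight x random adjustment factor.
        replicates.append([w * factor * multiplicity[i]
                           for i, w in enumerate(weights)])
    return replicates

# Hypothetical initial sampling weights for five units.
initial_weights = [10.0, 10.0, 10.0, 10.0, 10.0]
bootstrap_weights = rao_wu_bootstrap_weights(initial_weights, n_replicates=3)
```

Units that are not drawn in a replicate receive a weight of zero there, and with equal initial weights each replicate's weights add up to the original total. In practice the same weight adjustments applied to the full sample would then be applied to every replicate.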
In 2006, the Aboriginal Peoples Survey (APS) developed a general bootstrap method for two-phase sampling (Langlet, Beaumont and Lavallée, 2008). That method could not be used for the Participation and Activity Limitation Survey (PALS) because it was still being developed when the data were released. For 2011, the two-phase method used in the 2006 APS was adapted for the new complexities of the sample design. A review of the 2006 method follows.
As mentioned previously, the bootstrap weights can be seen as the product of the initial sampling weights and a random adjustment factor. This is the underlying idea of the general bootstrap method. In the case of a two-phase sample, the variance can be decomposed into two components, each one associated with a sampling phase. The two-phase general bootstrap method generates a random adjustment factor for each phase of sampling. In this case, the initial bootstrap weight of a given unit is the product of its initial sampling weight and the two random adjustment factors.
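In symbols, the product described above could be written as follows. The notation is introduced here purely for illustration and is not taken from the source: w_k is the initial sampling weight of unit k, a_k^{(1,b)} and a_k^{(2,b)} are the random adjustment factors generated for the first and second phases in bootstrap replicate b, and B is the number of replicates.

```latex
w_k^{*(b)} = w_k \, a_k^{(1,b)} \, a_k^{(2,b)}, \qquad b = 1, \dots, B
```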
In 2011, for variance estimation, the NHS was considered to have two additional phases: the NRFU subsample and the NRFU subsample non-response. To take these additional phases into account, the three phases of the NHS were combined into a single phase so that the two-phase general bootstrap method (an NHS phase and a CSD phase) could be used. In the general bootstrap method for two-phase designs, it can be shown that the random adjustment factors depend on the single and double inclusion probabilities associated with each phase. To combine the three phases into a single phase, the single and double inclusion probabilities for the three phases of the NHS need to be combined. The single and double inclusion probabilities of the three phases combined are given by the product of the single and double inclusion probabilities of each of the three phases. The details of the methodology used are provided in Haddou (2013) (Note 16).
Once the three phases of the NHS had been combined into a single phase, the general bootstrap method for two-phase sampling was used, which involved the derivation of two sets of random adjustment factors, one for each phase.
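The combination rule stated above can be written compactly. The notation is an assumption made for this illustration: \pi_k^{(j)} is the single inclusion probability of unit k at NHS phase j, and \pi_{kl}^{(j)} is the double (joint) inclusion probability of units k and l at that phase, with the probabilities for the second and third phases understood as conditional on selection in the preceding phases.

```latex
\pi_k = \pi_k^{(1)} \, \pi_k^{(2)} \, \pi_k^{(3)}, \qquad
\pi_{kl} = \pi_{kl}^{(1)} \, \pi_{kl}^{(2)} \, \pi_{kl}^{(3)}
```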
There is a major advantage in having two sets of random adjustment factors. The first set of adjustment factors can be used for estimates based on the first phase only, i.e., estimates based on the NHS sample. These estimates are used when the weights are adjusted to the NHS totals during post-stratification (Section 6.1). This produces variable NHS totals for each bootstrap sample, which reflects the fact that the NHS totals used are based on a sample and not on known fixed totals.
For the CSD, 1,000 sets of bootstrap weights were generated using the method described above. The method used is slightly biased in that it slightly overestimates the variance. The extent of the overestimation is considered negligible for the CSD. The method can also produce negative bootstrap weights. To overcome this problem, a transformation was performed on the bootstrap weights to reduce their variability. Consequently, the variance calculated with these transformed bootstrap weights has to be multiplied by a factor which is a function of a certain parameter, known as phi. The parameter’s value is chosen as the smallest integer that makes all bootstrap weights positive. For the CSD, this factor is 4. The variances calculated from the transformed bootstrap weights must therefore be multiplied by 4² = 16. Similarly, the coefficients of variation obtained (square root of the variance divided by the estimate itself) have to be multiplied by 4. However, most software applications that produce sampling error estimates from bootstrap weights have an option to specify this adjustment factor, so that the correct variance estimate is obtained without the extra step of multiplying by the constant.
It is extremely important to use the appropriate multiplicative factor for any estimate of sampling error such as variance, standard error or CV. Omission of this multiplicative factor will lead to erroneous results and conclusions. This factor is often specified as the “Fay adjustment factor” in software applications that produce sampling error estimates from bootstrap weights.
For examples of procedures using the Fay adjustment factor, see the 2012 Canadian Survey on Disability Data User Guide.
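To make the role of the multiplicative factor concrete, the sketch below computes the bootstrap variance of a weighted total and applies the factor of 4 as 4² = 16. The transformation used to shrink the replicate weights toward the final weights (dividing each deviation by phi) is an assumption made for this example, since this guide does not spell out the transformation; the function name and all values are hypothetical.

```python
import math

def bootstrap_variance_of_total(values, final_weights, replicate_weights,
                                fay_factor=1.0):
    """Bootstrap variance of a weighted total, rescaled by fay_factor squared."""
    def weighted_total(weights):
        return sum(w * y for w, y in zip(weights, values))
    estimate = weighted_total(final_weights)
    replicate_estimates = [weighted_total(rep) for rep in replicate_weights]
    raw_variance = sum((r - estimate) ** 2
                       for r in replicate_estimates) / len(replicate_estimates)
    # Omitting this multiplication would understate the variance by a factor of 16.
    return fay_factor ** 2 * raw_variance

# Hypothetical data: indicator values, final weights, untransformed replicates.
values = [1.0, 0.0, 1.0, 1.0]
final_weights = [100.0, 120.0, 80.0, 90.0]
untransformed = [
    [260.0, -20.0, 80.0, 70.0],
    [0.0, 240.0, 160.0, -10.0],
    [100.0, 120.0, -40.0, 210.0],
]

phi = 4
# Assumed transformation: shrink each replicate weight toward the final weight.
transformed = [[w + (b - w) / phi for w, b in zip(final_weights, rep)]
               for rep in untransformed]

var_direct = bootstrap_variance_of_total(values, final_weights, untransformed)
var_rescaled = bootstrap_variance_of_total(values, final_weights, transformed,
                                           fay_factor=phi)
estimate = sum(w * y for w, y in zip(final_weights, values))
cv_percent = 100 * math.sqrt(var_rescaled) / estimate
```

Under this assumed transformation all transformed weights are positive, and the rescaled variance matches the variance that the untransformed weights would have produced.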
The measure of sampling error used for the CSD is the coefficient of variation (CV) of the estimate, which is the standard error of the estimate divided by the estimate itself. For this survey, when the CV of an estimate is greater than 16.5% but less than or equal to 33.3%, the estimate is accompanied by the letter "E" to indicate that the data should be used with caution. When the CV of an estimate is greater than 33.3%, or if an estimate is based on 10 units or fewer, the cell estimate is replaced by the letter "F" to indicate that the entry is suppressed for reliability reasons.
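The release rule above can be expressed as a small function. This is a minimal sketch: the function name and the empty-string return value for unflagged estimates are hypothetical, and the treatment of a zero estimate (suppressed, since the CV is undefined) is an assumption not stated in this guide.

```python
def quality_flag(estimate, standard_error, n_units):
    """Return "" (no flag), "E" (use with caution) or "F" (suppressed)."""
    if n_units <= 10:
        return "F"  # based on 10 units or fewer
    if estimate == 0:
        return "F"  # assumption: CV is undefined, so the cell is suppressed
    cv_percent = 100 * standard_error / abs(estimate)
    if cv_percent > 33.3:
        return "F"
    if cv_percent > 16.5:
        return "E"
    return ""

# Hypothetical estimates, standard errors and unit counts.
flags = [
    quality_flag(1000, 100, 50),  # CV of 10%
    quality_flag(1000, 200, 50),  # CV of 20%
    quality_flag(1000, 400, 50),  # CV of 40%
    quality_flag(1000, 100, 10),  # too few units
]
```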
7.3 Non-sampling errors
Besides sampling, a number of factors at almost every phase of a survey can cause errors in survey results. Respondents may misunderstand the questions and answer them inaccurately, responses may be entered incorrectly during data capture and errors may be introduced in the processing and tabulation of data. These are all examples of non-sampling errors.
Over a large number of observations, randomly occurring errors will have little effect on estimates drawn from the survey. However, errors occurring systematically will contribute to biases in the survey estimates. Thus, much time and effort was devoted to reduce non-sampling errors in the survey. At the stage of content development, extensive activities were undertaken to develop questions that would be well-understood by respondents. The new questionnaire was tested thoroughly during several rounds of qualitative testing. In addition, many initiatives were taken in the field to encourage participation and reduce the number of non-response cases. Also important were the numerous quality assurance measures applied at the stages of data collection, coding and processing to verify and correct errors in the data. Weighting adjustments were used where appropriate to correct potential bias.
Non-sampling errors arise primarily from the following sources: coverage, non-response, measurement and processing. For each of these areas, the following sections discuss the various measures used to minimize and correct error.
Coverage errors occur when the sampled population excludes people intended to be in the target population. Because the CSD is an extension of the 2011 NHS, it inherits the coverage problems of that survey, which in turn inherits the coverage problems of the 2011 Census. For more information about coverage errors in the census please see the Final estimates of 2011 Census coverage on Statistics Canada’s website. For more information about the quality of NHS data, see the NHS User Guide on Statistics Canada’s website.
Non-response errors result from a failure to collect complete information on all units in the selected sample. Non-response produces errors in the survey estimates in two ways. First, non-respondents often have different characteristics from respondents, which can result in biased survey estimates if non-response is not corrected properly. In this case, the larger the non-response rate, the larger the bias may be. Second, if non-response is higher than expected, it reduces the effective size of the sample. As a result, the precision of the estimates decreases (the sampling error on the estimates will increase). This second aspect can be overcome by selecting a larger sample size initially. However, this will not reduce the potential bias in the estimates.
The scope of non-response varies. One level of non-response is item (partial) non-response, where the respondent does not respond to one or more questions, but has completed a significant pre-defined portion of the overall questionnaire. Generally, the extent of item non-response was small in the CSD as a result of extensive qualitative reviews and testing of questionnaire items. There is also total non-response, when the person selected to participate in the survey could not be contacted or did not participate once contacted. Weights of respondents were inflated in order to compensate for those who did not respond, as described in Section 6.1.
To reduce the number of non-response cases, many initiatives were undertaken prior to and during data collection.
The Statistics Canada website contained a CSD web page which included a series of questions and answers for respondents, as well as general information about the survey. In the months leading up to the survey, a special link to Statistics Canada’s website was made available on the websites of several organizations for persons with a disability, so that they could access information about the upcoming CSD. Prior to collection, a brochure was distributed to each selected respondent along with an introductory letter which provided an overview of the survey and explained the importance of participating. A small leaflet was also provided in Braille.
In addition, in-depth interviewer training was conducted. The interviewers were trained by experienced Statistics Canada training staff. In conjunction with the training, detailed interviewer manuals were provided as a reference. Furthermore, all of the interviewers were under the direction of interviewer supervisors, who oversaw activities in the field. Rigorous efforts to reach non-respondents through call-backs and follow-ups were also made by senior interviewers to encourage respondents to participate in the survey.
A table of final response rates obtained for the 2012 CSD is provided in Section 4.2 of this guide. The overall response rate for the survey was 74.6%. Response rates were highest in the older age groups.
Measurement errors occur when the response provided differs from the real value. Such errors may be attributable to the respondent, the interviewer, the questionnaire, the collection method or the respondent's record-keeping system. Extensive efforts were made for the 2012 CSD to develop questions which would be understood, relevant and sensitive for respondents.
Several rounds of qualitative testing were done for the CSD, and in particular for the new Disability Screening Questions (DSQ). Qualitative testing was carried out by Statistics Canada's Questionnaire Design Resource Centre (QDRC). To minimize measurement error, adjustments were made to question wording and flows based on those results.
Many other measures were also taken to specifically reduce measurement error, including the use of skilled interviewers, extensive training of interviewers with respect to the survey procedures and content, and observation and monitoring of interviewers to detect problems of questionnaire design or misunderstanding of instructions.
Processing errors may occur at various stages including programming of the CATI application, data capture by the interviewer, coding and editing. Quality control procedures were applied to every stage of the data processing to minimize this type of error. Interviews for the CSD were done by Computer-Assisted Interviewing and several edits were implemented in the system so that unusual values could be confirmed by respondents during the interview (Note 17) and errors corrected on the spot.
At the data processing stage, a detailed set of procedures and edit rules was used to identify and correct any inconsistencies between the responses provided. For every step of data cleaning, a set of thorough, systematized procedures was developed to assess the quality of every variable on file and to make corrections to every error found. A snapshot of the output files was taken at each step and verification was made comparing files at the current and previous step. The programming of all edit rules was exhaustively tested before being applied to the data. Examples of data processing verification included the review of all question flows, including very complex sequences, to ensure skip values were accurately assigned and distinguished from different types of missing values; quality control double-coding of ‘other-specify’ responses; experienced supervision of coding to standardized classifications; and the review of all derived variables against their component variables to ensure accurate programming of derivation logic, including very complex derivations. See Chapter 5 (Data processing) of this guide for more details.
8. Differences between the 2012 Canadian Survey on Disability (CSD) and the 2006 Participation and Activity Limitation Survey (PALS)
The 2012 Canadian Survey on Disability (CSD) includes a set of disability screening questions that were used for the first time to identify persons with a disability in Canada. Although some data users may be seeking to compare the prevalence of disability between surveys, and particularly with the CSD’s predecessor – the 2006 Participation and Activity Limitation Survey (PALS) – there are many reasons why this is not possible.
The concepts and methods used to measure disability in the 2012 CSD represent a significant change from those used in the 2006 PALS. The most important change is that the two surveys used a different definition of disability. In the CSD, the definition was applied by using the new set of disability screening questions (DSQ). These screening questions reflect a fuller implementation of the social model of disability, greater consistency in disability identification by type, and improved coverage of the full range of disability types, especially mental/psychological and cognitive (learning and memory) disabilities (Note 18). Differences are discussed in more detail below.
Because of the major differences in concepts and methods between the 2006 PALS and the 2012 CSD, it is neither possible nor recommended to compare the prevalence of disability over time between these two sources.
8.1 New method of screening for disability
In contrast to the PALS screening questions, which used a hybrid approach (a social model for identifying some types of disabilities and a medical model for other types), the CSD screening questions (DSQ) were designed to provide greater consistency in disability identification by type.
Based on their responses to the DSQ, respondents are identified as having a disability only if their daily activities are limited (Note 19) as a result of an impairment or difficulty with particular tasks.
The CSD (by adopting the DSQ) allows respondents to determine whether they face activity limitations as a result of these difficulties or impairments. Some people who indicate that they have some difficulty with certain tasks or have an impairment of some type go on to indicate that this never interferes with their daily activities. In PALS, these individuals were considered to have a disability, but in the CSD, they are not.
This change will have the greatest impact on the identification of persons with sensory and physical disabilities because the PALS identified disabilities in these areas solely on the basis of an indication of some difficulty. At the same time, for certain non-physical disability types, PALS is closer to the CSD because it did have the added requirement of a limitation of activities.
Other changes to screening questions may also have an impact on results. For example, the questions regarding mental/psychological disabilities have been altered somewhat by including examples of the more prevalent conditions (such as depression, anxiety, and bipolar disorder) and excluding examples of less prevalent conditions which are also more highly stigmatized (such as schizophrenia). Changes were also made to the list of examples in questions pertaining to learning disabilities and memory disabilities.
An additional difference between the surveys involves the identification of communication disabilities which was done in PALS but not in the CSD. For the DSQ, no question could be found in successive rounds of qualitative testing to properly identify persons with communication disabilities. Most iterations of a question to identify this small group (including the PALS question itself) yielded difficulties due to people having neither English nor French as their first language or to cultural difficulties (for example, not understanding colloquial references). As well, the advent of social media as a form of communication appears to have added new complexities to the concept of communication for the Canadian population.
Finally, changes were made to the concept of “agility” used by PALS. In the DSQ, this type of disability was split into two types: flexibility and dexterity, since qualitative tests showed that people find that these two tasks are quite different from each other and relate to different underlying conditions. This split was considered an improvement in the identification of different physical disabilities and was also in response to requests made by disability data users and by the Employment and Social Development Canada Persons with Disabilities Technical Advisory Group (TAG).
8.2 Other changes to CSD content
Just as the prevalence of disability and the prevalence of certain types of disabilities cannot be compared across surveys, data regarding other content of the CSD cannot be compared with PALS either. The content of the CSD has been streamlined and updated to a large extent. Some of the content from the PALS was cut due to operational constraints; however, every effort was made to ensure that most of the cuts were restricted to content that had been less informative and less utilized.
Survey questions were also updated to better reflect current realities and to correct known weaknesses in the PALS. For example, the section on aids and assistive devices has undergone major changes. Many of the items contained within the PALS were considered out of date in terms of their current usage by people with disabilities. Similarly, new items were added to better reflect technological advancements that have happened since the PALS questions on aids and assistive devices were originally developed in the late 1990s.
Efforts were also made to streamline the method by which CSD respondents were asked about their requirements and unmet needs. For example, questions regarding the need for and use of household fixtures (such as grab bars, etc.) by persons with certain types of physical disabilities are now combined with other aids and devices for the same disability type. The PALS separated questions about aids and devices that were portable and attached to the persons themselves from those attached to homes. This separation was unnatural for many individuals who wanted to indicate a need for items such as grab bars in the earlier section on aids/devices (often reported under “other”) and then reported them again in a later section when specifically prompted, possibly leading to some double counting.
These changes to questionnaire wording and flows mean that comparisons should not be made between the PALS and the CSD data.
8.3 NHS filter questions
While the CSD and the DSQ are considered to be a big step forward in improving the measurement of disability using the social model, it should be noted that the CSD sample was pre-filtered using the same filter questions on the 2011 National Household Survey (NHS) as those used on the 2006 Census long form for the PALS. Follow up studies have shown that these filter questions do not adequately identify people with mental/psychological or cognitive disabilities. This means that the CSD continues to have some of the weaknesses that the PALS had with respect to undercoverage of some disability types. Nevertheless, of those screened in by the NHS, the new method of screening on the CSD will help improve the identification of persons with mental/psychological, cognitive and “other” types of disabilities because they can now be better identified.
8.4 Change in lag time from the filtering survey
As mentioned above, PALS and the CSD derived their sampling frames from answers to the 2006 long-form Census and the 2011 NHS, respectively. The PALS questionnaire was administered between six and nine months after Census data were collected. The CSD, on the other hand, was in the field 16 to 20 months after the NHS data were collected. This difference in lag time not only made it more difficult to track selected respondents who had moved, but it also increased the possibility that a respondent who had reported an activity limitation at the time of the NHS may no longer have a disability, may have been institutionalized or may have died during that time (see Section 8.5). In addition, some information linked from the NHS to the CSD file (for example, information on income) may have changed during the period of time between the two surveys.
8.5 Other methodological changes
The CSD sampling frame was built based on answers to the 2011 NHS, while that of PALS was taken from the 2006 long-form Census. Although every effort was made by the NHS to minimize impacts due to a lower response rate, the CSD results may have been impacted by this change. For more details on the non-response follow-up and weighting adjustments for the NHS, please refer to Chapter 6.
A second methodological difference of the CSD compared to the PALS involved a change in the weighting strategy applied to the CSD to compensate for the longer time lag between the NHS collection and the CSD collection. As mentioned earlier, this time lag increased the likelihood of non-response in the CSD due to death or institutionalization. As many of these cases may have been persons with a disability, it was important to ensure that disability prevalence not be underestimated. The weights of the population who said NO to the NHS filter questions were, therefore, adjusted to take into account deaths and institutionalizations that would have occurred between the NHS collection and CSD collection. This required a calibration of the weights to population estimates adjusted for net undercoverage, which was not done in the 2006 PALS. For more details on the weighting used in the CSD, please refer to Chapter 6.
8.6 Summary and recommendation
As discussed above, the main differences between the PALS and the CSD can be summarized as follows:
- The definition of disability used in the CSD is different from PALS. The CSD has adopted the newly developed DSQ which is being used for the first time to identify disability in Canada.
- Screening questions in the CSD more closely reflect a social model of disability than do the PALS screening questions. They are also consistent across all types of disabilities, unlike the PALS questions.
- Questionnaire content has been streamlined and updated to reflect current technology and to correct weaknesses in question wording.
- The longer lag time between the NHS and CSD follow-up increased the possibility that selected respondents no longer had a disability, were institutionalized or died during that time. This required a different method for calibration of weights which was not done in the 2006 PALS.
- Finally, the sampling frame for the CSD was derived from the 2011 NHS rather than the 2006 Census and so CSD results may have been impacted by this change.
All of these changes should be assumed to affect comparability of the surveys. Comparison of CSD data to PALS data is, therefore, neither possible nor recommended.
9.1 Data products and services
Data for the 2012 Canadian Survey on Disability (CSD) were released publicly on December 3, 2013. The CSD release included a brief Fact Sheet about disability in Canada and a set of data tables on disability rates for adults in Canada, by age and sex, for each of the provinces and territories. Tables also included data on the types and severity of disabilities. These items are available to the public free of charge on Statistics Canada’s website.
Starting in 2014, researchers across the country will be able to conduct in-depth analyses using the CSD Analytical data files housed at Statistics Canada’s Research Data Centres (RDCs). In order to access the files, researchers must undergo a research and ethics committee review for approval. Their use of the data must be conducted according to Statistics Canada policies, guidelines and standards. For instance, only aggregate statistical estimates that conform to the confidentiality provisions of the Statistics Act will be released outside of Statistics Canada.
Later releases will include an analytical report addressing important subjects like the use of aids and assistive devices, help received or required, and the employment and educational experiences of persons with disabilities.
In addition to these data products and services, clients can request custom data tables from Statistics Canada. All such requests are screened for confidentiality and the aggregate data are rounded before being released to clients. Statistics Canada also delivers special CSD presentations to key stakeholders and at various conferences.
9.2 Reference products
Information about the 2012 Canadian Survey on Disability (CSD) is available on Statistics Canada’s website. Statistics Canada provides an Integrated Metadata Base (IMDB) on-line for all surveys that it conducts, including the 2012 CSD. The purpose of the IMDB is to provide information that will assist the public in interpreting Statistics Canada's published data. The information (also known as metadata) is provided to ensure an understanding of the basic concepts that define the data, including variables and classifications, the underlying statistical methods and surveys, and key aspects of the data quality. Direct access to the CSD questionnaire is also provided.
In addition to the IMDB, the present Concepts and Methods Guide is provided online for a detailed discussion of survey content, sampling design, data collection and processing, weighting of the data, data quality, differences between the 2012 CSD and the 2006 Participation and Activity Limitation Survey (PALS), and dissemination products for the CSD.
For researchers using the analytical files in Statistics Canada’s Research Data Centres (RDCs), an RDC User Guide is available with detailed step-by-step instructions for using the data file. The RDC User Guide describes the structure of the data files in detail, including all core variables, derived variables and linkages to the National Household Survey (NHS). Detailed data dictionaries provide information for all variables available. The RDC User Guide also provides detailed guidelines for tabulation and statistical analysis, how to apply the necessary weights to the data, information on software packages available and guidelines for the release of data, such as rounding rules. The process of estimating the reliability of estimates, both quantitative and qualitative, is covered in detail.
9.3 Disclosure control
Statistics Canada is prohibited by law from releasing any data that would divulge information obtained under the Statistics Act that relates to any identifiable person, business or organization without the prior knowledge or the consent in writing of that person, business or organization. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data.
References
Bernier, J., and Nobrega, K. (1998). “Outlier detection in asymmetric samples: A comparison of an inter-quartile range method and a variation of a sigma-gap method”. Annual meeting of the Statistical Society of Canada, June 1998.
Federal Disability Report: The Government of Canada’s Annual Report on Disability Issues (2010). Human Resources and Skills Development Canada. Gatineau, Québec.
Haddou, M. (2013). “Bootstrap Variance Estimation Specifications - Aboriginal Peoples Survey”. Internal document, January 2013.
Langlet, É., Beaumont, J.-F., and Lavallée, P. (2008). “Bootstrap Methods for Two-Phase Sampling Applicable to Postcensal Surveys”. Paper presented at Statistics Canada’s Advisory Committee on Statistical Methods, May 2008, Ottawa.
Mackenzie, A., Hurst, M., and Crompton, S. (2009). Living with disability series, “Defining disability in the Participation and Activity Limitation Survey”. Canadian Social Trends, 2009/12. Statistics Canada catalogue no. 11-008-X.
Morel, J., and Nambeu, C. (2013). “National Household Survey: Weighting and estimation update”. Presented at the SSC conference in May 2013.
Notes
- A decision was made to ignore the “other” type when there was already a limitation under one of the 10 disability types because it was observed that respondents with a disability that fell under one of the 10 types tended to report the disease that caused their disability under “other”. Double counting of disability types was thus avoided.
- The questions on citizenship (question 10), immigrant status received (question 11) and year of immigration (question 12) are not asked of people living on Indian reserves and settlements who were enumerated using the N2 form.
- Many simulations were carried out in preparation for NHS collection in an effort to identify the groups with the greatest risk of not responding to this survey. CUs with high concentrations of these target groups were oversampled for non-response follow-up.
- The concept of “severity” here is not the same as the one used for the severity score, which was described in section 2.3. Here, we take into account only the frequency of limitation reported in response to the NHS filter questions.
- Morel, J., and Nambeu, C. (2013). “National Household Survey: Weighting and estimation update”. Presented at the SSC conference in May 2013.
- For Prince Edward Island, we set the CV at 10% for the 65-to-74 age group and 15% for the 75-and-over age group to avoid selecting all available units in the sampling frame.
- The application used the field corresponding to the date of birth reported directly by the respondent at the time of the interview, but some respondents only reported their age.
- The NHS weight that was used here is a version that was corrected to avoid excessively large weights.
- Paradata are collection variables that are available for all the selected units, such as the number of contact attempts; the date, time and duration of each contact; and the result of each contact (refusal, appointment, completed interview, etc.).
- The SAS procedure PROC FASTCLUS was used here.
- Note that out-of-scope units were excluded from this adjustment (but included in the non-contact adjustment). Since the household had already been contacted in all these cases, we would have known if the selected person was out of scope. We therefore assumed here that all non-respondents with contact were in-scope, and their weight was redistributed among in-scope respondents only.
- Bernier, J., and Nobrega, K. (1998). “Outlier detection in asymmetric samples: A comparison of an inter-quartile range method and a variation of a sigma-gap method”. Annual meeting of the Statistical Society of Canada, June 1998.
- When weighting was done for the CSD, the Demography Division was still producing its population estimates with a model based on the 2006 Census. The data needed to adjust the 2011 Census data for net undercoverage were not yet available. We nevertheless decided that it was better to make this adjustment to avoid underestimating the CSD disability rates.
- The proportion 0.13% is obtained by dividing 7,739 by the total of the CSD weights after the second post-stratification.
- NHS characteristics are used because, for most people without a disability (the NO sample), we have only those variables.
- Haddou, M. (2013). “Bootstrap Variance Estimation Specifications - Aboriginal Peoples Survey”. Internal document, January 2013.
- The only exception to this is the personal interviews conducted in the Northwest Territories using paper questionnaires, which were later captured in the CATI application at the regional office.
- This is true of the DSQ module on its own; however, some constraints arise from its being administered as a component of the 2012 CSD, a post-censal survey. See Section 8.3 for more detail.
- The only exception to this is for developmental disabilities where a person is considered to be disabled if the respondent has been diagnosed with this condition.