Canadian Survey on Disability, 2022: Concepts and Methods Guide
5. Processing


5.1 Pre-processing: data capture

All responses to the 2022 Canadian Survey on Disability (CSD) questions were captured directly in the electronic questionnaire (EQ) application, both for the interviewer-led (iEQ) component and the respondent self-reporting (rEQ) component. Additional case management information for the iEQ was captured and transmitted to head office. Data from the rEQ were transmitted directly to head office. Paradata were also collected in the form of audit trail files, which provided information on aspects such as survey length and time spent on difficult questions. These electronic systems create many efficiencies in both time and costs associated with data capture and transmission. All survey responses were kept highly secure through industry-standard encryption protocols, firewalls and encryption layers.

For some CSD questions, data underwent a preliminary verification process when respondents were completing the survey. This was accomplished by means of a series of soft edits programmed into the EQ. That is, where a particular response appeared to be inconsistent with previous answers or outside of expected values, the interviewer or the self-reporting respondent was notified with an on-screen warning message, providing them with an opportunity to modify the response provided. The response data were subjected to more in-depth processing once they were transmitted to head office, as described in the sections below.
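The soft-edit behaviour described above can be sketched as follows. This is a minimal illustration, not the EQ application's actual logic: the `SoftEdit` class, the `capture` function, the "hours worked" example and its 0 to 100 range are all hypothetical. The point it shows is that a soft edit warns without blocking, so the response can still be kept once it is confirmed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SoftEdit:
    """A non-blocking check: it warns, but the answer can be kept as is."""
    message: str
    low: int
    high: int

    def check(self, value: int) -> Optional[str]:
        # Return the warning text when the value is outside the expected range.
        if value < self.low or value > self.high:
            return self.message
        return None


# Hypothetical edit: weekly hours are expected to fall between 0 and 100.
hours_edit = SoftEdit("Please verify the number of hours entered.", 0, 100)


def capture(value: int, confirmed: bool = False) -> dict:
    """Record a response; a soft edit never rejects it outright."""
    warning = hours_edit.check(value)
    # Accepted when no warning was raised, or when the user confirms the value.
    accepted = warning is None or confirmed
    return {"value": value, "warning": warning, "accepted": accepted}
```

Calling `capture(150)` returns a warning and leaves the response unaccepted until it is either changed or resubmitted with `confirmed=True`.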

5.2 Survey processing steps

Once survey responses were transmitted to head office, more extensive data processing for the CSD began. This involved a series of steps to convert the questionnaire responses from their initial raw format to a high-quality, user-friendly database involving a comprehensive set of variables for analysis. A series of data operations were executed to clean files of inadvertent errors, edit the data for consistency, code open-ended questions, create useful variables for data analysis, and finally to systematize and document the variables for ease of analytical usage. 

The CSD uses a set of social survey processing tools developed at Statistics Canada called the “Social Survey Processing Environment” (SSPE). The SSPE involves statistical software programs (SAS-based), custom applications and manual processes for performing a systematic sequence of processing steps.

Each step of processing, from the initial clean-up to the construction of derived variables, is described in more detail in the sections of this chapter below. Chapter 6 provides the details related to final database creation.

5.3 Record clean up: in-scope and complete records

Following the receipt of raw data from the electronic questionnaire applications, a number of preliminary cleaning procedures were implemented for the 2022 CSD at the individual records level. These included the removal of all personal identifier information from the files, such as names and addresses, as part of a rigorous set of ongoing mechanisms for protecting the confidentiality of respondents. In addition, where duplicates (i.e., two entries for a single respondent) were found at this stage, only one copy was kept. Each pair was examined individually in order to ascertain the best record to keep. The only exceptions to this rule were when the first record obviously contained errors.

Also part of clean-up procedures was the review of all respondent records to ensure each respondent was “in-scope” and had a sufficiently completed questionnaire. Specific criteria for respondents are outlined below.

  1. To be “in scope” for the 2022 CSD, respondents must be at least 15 years of age on Census Day, May 11, 2021, and reside in a private household in Canada at the time of the survey. Specific questions in the entry module were used to confirm these criteria before beginning the interview. In-scope respondents include two groups: 1) those who were screened in upon completing the Disability Screening Questions (DSQ) and were therefore part of the disability population and 2) those who were screened out by the DSQ and were thus considered non-disabled. Both groups remain in the final survey database.
  2. To have a “complete” questionnaire, respondents who met the criteria of the population of persons with a disability must have provided an answer to the last question of the Labour Force Discrimination (LFD) module. This ensures that we get responses to a number of essential questions: those required to produce data tables for persons with disabilities as required by the 1995 Employment Equity Act. See Appendix D for more information.
  3. To have a “complete” questionnaire, respondents who were assigned to the population of persons without a disability must have provided an answer to the last question in the DSQ. Respondents without a disability were not required to complete the rest of the CSD questionnaire.
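The three criteria above can be sketched as a single status function. This is an illustration under stated assumptions: the function and parameter names are hypothetical, and the real determination relied on the entry module questions and case information rather than on pre-computed flags.

```python
from datetime import date
from typing import Optional

CENSUS_DAY = date(2021, 5, 11)  # Census Day, as stated in criterion 1


def age_on(birth: date, ref: date) -> int:
    """Age in completed years on the reference date."""
    years = ref.year - birth.year
    if (ref.month, ref.day) < (birth.month, birth.day):
        years -= 1  # birthday not yet reached in the reference year
    return years


def record_status(
    birth_date: date,
    in_private_household: bool,     # private household in Canada at survey time
    screened_in: Optional[bool],    # DSQ result; None if the DSQ was not finished
    answered_last_dsq: bool,
    answered_last_lfd: bool,
) -> str:
    # Criterion 1: age 15 or older on Census Day and living in a private household.
    if age_on(birth_date, CENSUS_DAY) < 15 or not in_private_household:
        return "out of scope"
    # Without a finished DSQ the respondent cannot be assigned to either group.
    if screened_in is None or not answered_last_dsq:
        return "incomplete"
    # Criterion 2: persons with a disability must reach the end of the LFD module.
    if screened_in:
        return "complete" if answered_last_lfd else "incomplete"
    # Criterion 3: persons without a disability only need to finish the DSQ.
    return "complete"
```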

During data collection, information was exchanged several times between headquarters and interviewers in the regional offices (ROs). Since this information could not always be adequately saved in the collection system, specific tickets were opened in a system called CTOC, through which information about respondents was sent to the ROs to coordinate input.

Once the final status of each respondent was determined, cases considered out of scope or incomplete were removed from the database. The weights of respondents with complete questionnaires were adjusted upward to compensate for these losses (see section 6.1 for more information on weighting).
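To make the idea of an upward weight adjustment concrete, here is a minimal sketch of a simple ratio adjustment. It is an assumption for illustration only: the actual CSD weighting method is the one described in section 6.1, and the function name, record layout and single adjustment class used here are hypothetical.

```python
def adjust_weights(records: list) -> list:
    """Keep complete records and scale their weights to preserve the total.

    Each record is a dict with a numeric 'weight' and a boolean 'complete'.
    """
    total = sum(r["weight"] for r in records)
    kept = [r for r in records if r["complete"]]
    kept_total = sum(r["weight"] for r in kept)
    if kept_total == 0:
        raise ValueError("no complete records to carry the weight")
    # Factor is at least 1, so the remaining weights are adjusted upward.
    factor = total / kept_total
    return [{**r, "weight": r["weight"] * factor} for r in kept]
```

With illustrative weights of 10, 30 and 10, where the last record is removed, the factor is 50 / 40 = 1.25 and the two remaining weights become 12.5 and 37.5.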

At this stage, a few edits are also applied to ensure validity of data. For example:

5.4 Recodes: variable changes and multiple-response questions

This stage of processing involved changes at the level of individual variables. Variables could be dropped, recoded, re-sized or left as is. Formatting changes were intended to facilitate processing as well as analysis of the data by end-users. One such change at the variable level was the conversion of multiple-response questions (“Select-all-that-apply” questions) to corresponding sets of single-response variables, which are easier to use. For each response category associated with the original question, a new variable was created with “yes/no” response values. This process is called “destringing” the variables. An example is provided below.

Start of text box

Original multiple-response question:

AADH_Q20 Why do you not have a hearing aid?
ON-SCREEN HELP: Select all that apply.

  1. Cost
  2. Do not want to or not willing to upgrade from current aid or assistive device
  3. Not available
  4. Available aids cannot be adapted
  5. Other reasons

End of text box

Start of text box

Final variables in single-response “yes/no” format:

AADH_20A Why do you not have a hearing aid?
 - Cost

  1. Yes
  2. No

AADH_20B Why do you not have a hearing aid?
 - Do not want to or not willing to upgrade from current aid or assistive device

  1. Yes
  2. No

AADH_20C Why do you not have a hearing aid?
 - Not available

  1. Yes
  2. No

AADH_20D Why do you not have a hearing aid?
 - Available aids cannot be adapted

  1. Yes
  2. No

AADH_20E Why do you not have a hearing aid?
 - Other reasons

  1. Yes
  2. No

End of text box
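The destringing shown in the text boxes above can be sketched in a few lines. The function name and the representation of a response as a set of selected category numbers are assumptions; the values 1 for “Yes” and 2 for “No” follow the numbering shown in the example.

```python
from string import ascii_uppercase

YES, NO = 1, 2  # values as numbered in the example above


def destring(stem: str, n_categories: int, selected: set) -> dict:
    """Turn one multiple-response answer into one yes/no variable per category.

    `selected` holds the category numbers the respondent marked (1-based).
    """
    variables = {}
    for i in range(1, n_categories + 1):
        # Category 1 becomes suffix A, category 2 becomes suffix B, and so on.
        name = f"{stem}{ascii_uppercase[i - 1]}"
        variables[name] = YES if i in selected else NO
    return variables


# A respondent who selected "Cost" (1) and "Not available" (3):
result = destring("AADH_20", 5, {1, 3})
```

Here `result` holds “Yes” for `AADH_20A` and `AADH_20C` and “No” for the other three variables.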

5.5 Flow edits: response paths, valid skips and question non-response

Another set of data processing procedures applied to the 2022 CSD was the verification of questionnaire flows or skip patterns. All response paths and skip patterns that were built into the questionnaire were verified to ensure that the universe (or coverage) for each question was accurately captured during processing.

Different category types for question response and non-response are explained below in order to assist users to better understand question universes as well as statistical outputs for CSD survey variables.

Question response and non-response categories

The electronic questionnaire items were identical for both interviewers (iEQ) and self-completing survey respondents (rEQ). Respondents or interviewers were generally invited to select a response from among a set of answer categories provided on the screen. In some instances, survey questions were open-ended, requiring a write-in response. An optional response category of “Don’t know” was provided in a limited number of questions. In some situations, a respondent may have skipped past the question by hitting the Next button without having provided a response. For certain critical survey questions, a missed question would elicit an automated reminder to the respondent to complete the missed question. However, respondents always had the option to skip over a question.

Special numeric codes have been designated for each type of non-response in order to facilitate user recognition and data analysis.

Response

Valid skip

Don’t know

Not stated
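One way to picture how a question's universe relates to the four categories listed above is the sketch below. It rests on assumptions: a "valid skip" is taken to mean that the questionnaire flow did not present the question to the respondent, and "not stated" that the question was presented but left unanswered. Category labels are used instead of numeric codes because the codes themselves are not reproduced here; the `"DK"` marker is hypothetical.

```python
from enum import Enum
from typing import Optional


class Category(Enum):
    RESPONSE = "response"
    VALID_SKIP = "valid skip"
    DONT_KNOW = "don't know"
    NOT_STATED = "not stated"


def classify(in_universe: bool, answer: Optional[str]) -> Category:
    """Assign a category to one question for one respondent."""
    if not in_universe:
        # The flow skipped this question because of earlier answers.
        return Category.VALID_SKIP
    if answer is None:
        # The question was asked, but the respondent moved on without answering.
        return Category.NOT_STATED
    if answer == "DK":  # hypothetical marker for the "Don't know" option
        return Category.DONT_KNOW
    return Category.RESPONSE
```

A flow edit can then verify, for each question, that every record outside the universe is a valid skip and every record inside it carries one of the other three categories.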

Non-response for derived variables (DVs)

The construction of derived variables (DVs) for the CSD database often involved combining or regrouping answers of more than one survey question. Among the component variables of a DV, it is possible that some may have had valid answers, while others may have had non-response values. Where components for a given DV included any non-response code of Don’t Know or Not Stated, DVs were coded to reflect the best possible understanding of the combination of responses involved.
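As a purely illustrative case, consider a hypothetical yes/no DV that indicates whether any of several component questions was answered “yes”. The rule below is an assumption, not the documented CSD logic: a single “yes” settles the DV even if other components are missing, while a missing component leaves the DV undetermined when no “yes” is present.

```python
YES, NO, NOT_STATED = "yes", "no", "not stated"


def derive_any(components: list) -> str:
    """Hypothetical DV: 'yes' if any component is 'yes'."""
    if not components:
        raise ValueError("a derived variable needs at least one component")
    if YES in components:
        # One valid 'yes' is enough, whatever the other components contain.
        return YES
    if all(c == NO for c in components):
        return NO
    # No 'yes' and at least one missing component: the result is unknown.
    return NOT_STATED
```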

Non-response for external census linked variables

In the case of external census variables linked to the CSD, it should be noted that these variables do not generally contain any missing data such as Don’t Know, Refusal or Not Stated responses, since census processing operations for most variables involved imputation of all missing responses before they were linked to the CSD. The only exception to this involves variables related to the Activities of Daily Living question on the census, where data were not imputed as these variables were intended only to provide a sampling frame for the post-censal CSD. As noted below, any missing values for these variables are coded to “Not stated”.

However, there were other categories of non-response for census variables as described below:

Not applicable
Suppressed
Not stated

More information on derived variables and census variables is provided in sections 5.9 and 5.10 below.

5.6 Coding

The next step of data processing involved the review and classification of write-in responses to questionnaire items, wherever applicable. This process is called coding. Two types of questions required the application of coding procedures: “Other-specify” items and questions that were completely open-ended. These are described in more detail below.

“Other-specify” items

For most questions on the CSD questionnaire, a list of answer category options was presented to respondents for their consideration. These often included on-screen help text with explanations and examples to assist with respondent selection of the most appropriate category for their situation. However, in the event that a respondent’s answer could not be easily assigned to an existing category, many questions also allowed respondents or interviewers to enter a long-answer text response in the “Other-specify” category.

All questions with “Other-specify” categories were examined and coded during processing. A total of 30 questions were coded for “Other-specify” responses. Twenty-five of these involved multiple-response questions (“mark all that apply”) and five involved single-response questions. Based on coding guidelines prepared by subject-matter specialists, many of the long answers provided by respondents for these questions were recoded back into one of the existing answer categories. Responses that were unique and qualitatively different from existing categories were kept as “Other”.
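The recoding of “Other-specify” text back into existing categories can be sketched with a keyword lookup. This is a simplified stand-in for the coding guidelines, which are not reproduced in this guide: the keywords, the category numbers (borrowed from the hearing aid example in section 5.4) and the function name are all hypothetical.

```python
OTHER = 5  # "Other reasons" in the hearing aid example

# Hypothetical guideline: keyword found in the write-in -> existing category.
KEYWORDS = {
    "expensive": 1,     # 1 = Cost
    "afford": 1,
    "out of stock": 3,  # 3 = Not available
}


def recode_other(text: str) -> int:
    """Return an existing category when the write-in matches, else keep 'Other'."""
    lowered = text.lower()
    for keyword, category in KEYWORDS.items():
        if keyword in lowered:
            return category
    # Unique answers that fit no existing category stay as "Other".
    return OTHER
```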

Open-ended questions and standard classifications

An additional 19 questions on the 2022 CSD questionnaire were recorded in a completely open-ended format. These included questions related to the following:

  1. The respondent’s main two medical conditions which caused them the most difficulty or limited their activities the most (up to two conditions may be reported) (ICD);
  2. Occupation and industry of work (NAICS and NOC);
  3. Major field of post-secondary study (CIP).

For most of these questions, responses were coded using a custom in-house tool called the Coding and Correction Environment (CCE). Standardized classification systems for all 4 fields were used and included the International Classification of Diseases (ICD), the North American Industry Classification System (NAICS), the National Occupation Classification (NOC), and the Classification of Instructional Programs (CIP).

Coding for standardized classifications involved a team of experienced coders and quality control supervisors. Subject matter experts in data processing applied additional verification procedures, which were particularly rigorous for the 2022 CSD given the need for comparability with the 2017 CSD.

5.7 Consistency edits

A number of edits and imputations are required to ensure that survey data are consistent and complete. Consistency edits target inconsistencies between survey variables. At this point, the data had already gone through various edits built into the electronic questionnaire, so the edits applied at this stage were targeted. To give a few examples, we programmed edits to:

5.8 Variable conversion

At this stage, final variable names are established on the file. For example, the letter Q which appears in all question acronyms is removed from final variable names. All final variable names must respect an 8-character limit.
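The naming rule can be sketched as follows. The pattern used here, a “Q” directly after the underscore and before the question number, is an assumption based on the single example `AADH_Q20` given in section 5.4; the function name is hypothetical.

```python
import re

MAX_LEN = 8  # final variable names are limited to eight characters


def final_name(question_name: str) -> str:
    """Drop the 'Q' from a question acronym and enforce the length limit."""
    # Assumed pattern: '_Q' followed by a digit, as in AADH_Q20.
    name = re.sub(r"_Q(?=\d)", "_", question_name)
    if len(name) > MAX_LEN:
        raise ValueError(f"{name!r} exceeds {MAX_LEN} characters")
    return name
```

Under this assumption, `final_name("AADH_Q20")` gives `AADH_20`, which leaves room for the one-letter suffix added when a multiple-response question is destrung.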

5.9 Derived variables

In order to facilitate more in-depth analysis of the rich CSD dataset, over 170 derived variables (DVs) were created by regrouping or combining answers from one or more questions on the questionnaire. This includes the creation of 39 new derived variables, specific to 2022. All DV names have a “D” in the first character position of the name for quick identification. The 2022 CSD Data Dictionaries identify all DVs.

5.10 External census-linked variables

A linkage between the CSD and the census was performed. This linkage not only supports the legal framework and basis for the existence of the CSD, but also provides tremendous benefit to the public by helping to inform disability and inclusion policies through sound analyses, and by increasing the accountability of the Government of Canada and the transparency of information for the Canadian public. The linkage between the CSD and the census allows for comparisons of the outcomes of persons with and without disabilities, specifically labour market status, income and education, often demonstrating socioeconomic gaps between persons with and without disabilities. Without the ability to compare between persons with and without disabilities, it would not be possible to fulfil related policy and programming commitments. Furthermore, the use of the census data for CSD respondents increases the range of available information which can support policy and programming on topics not included within the CSD, such as housing or journey to work, while reducing response burden.

In addition to the CSD variables, approximately 350 census variables were added to the final CSD processing file for 2022 through record linkage. Respondents were informed of the plan to link the CSD data to administrative data sources through the addition of the generic record linkage statement, which was added to the ‘Getting started’ pages of the survey/interview. All linked information is kept confidential and used for statistical purposes only.

For all census variables, the census variable name was preserved as much as possible on the CSD database. Some exceptions applied since CSD variable names are restricted to eight characters whereas census variable names sometimes exceeded eight characters in length. Consistency with variable names used in CSD 2017 was also considered. The 2022 CSD Data Dictionaries provide a complete listing of census variables.

The final structure and content of the data files are described in Chapter 6.

5.11 Data validation and confrontation against CSD 2017

As noted, the 2022 CSD was designed to have as much comparability as possible with the previous cycle in 2017. Until the 2022 CSD, there had not been two comparable cycles of a disability survey since the Participation and Activity Limitation Survey (PALS) in 2001 and 2006.

A time series will be especially important for understanding the potential long-term impact of the COVID-19 pandemic on persons with disabilities, including rates of disability by type, labour market activities, income and unmet needs, among others.

Additional data validation and confrontation was performed on CSD 2022 using CSD 2017 as a benchmark.

Z-scores could not be used, since there are not enough comparable historical data. Instead, validation involved a scan of all Statistics Canada literature and releases on persons with disabilities published since 2017.

It also involved comparing historical data for key variables over time to ensure that changes were well documented, were understood within their larger contexts, and were not tied to other factors such as discrepancies in processing methods.
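A comparison of key estimates between the two cycles can be sketched as a simple screening step. This is an illustration only: the function name, the 10% threshold and the figures in the usage note are hypothetical and are not CSD results. The step only identifies variables that merit a closer look at context and processing.

```python
def flag_changes(est_2017: dict, est_2022: dict, threshold: float = 0.10) -> dict:
    """Return the relative change for variables that moved more than `threshold`."""
    flagged = {}
    # Only variables present in both cycles can be compared.
    for name in est_2017.keys() & est_2022.keys():
        old, new = est_2017[name], est_2022[name]
        if old == 0:
            continue  # relative change is undefined for a zero baseline
        change = (new - old) / old
        if abs(change) > threshold:
            flagged[name] = change
    return flagged
```

With made-up estimates of 20.0 and 25.0 for one variable and 50.0 and 51.0 for another, only the first is flagged, with a relative change of 0.25.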

