Canadian Survey on Disability, 2022: Concepts and Methods Guide
7. Data quality

7.1 Overview of data quality evaluation

The objective of the Canadian Survey on Disability is to produce reliable estimates on the type and severity of disabilities of Canadians aged 15 years and over (as of May 11, 2021) as well as on a variety of other important indicators of the experiences and challenges of persons with disabilities. This chapter reviews the quality of the data for this survey.

Sections 7.2 and 7.3 below explain the two types of errors that occur in surveys—sampling and non-sampling errors. Each type of error is evaluated in the context of the CSD. Sampling error is the difference between the data obtained from the survey sample and the data that would have resulted from a complete census of the entire population taken under similar conditions. Thus, sampling error can be described as differences arising from sample-to-sample variability. Non-sampling errors refer to all other errors that are unrelated to sampling. Non-sampling errors can occur at any stage of the survey process, and include non-response for the survey as well as errors introduced before or during data collection or during data processing.

This chapter describes the various measures adopted to prevent errors from occurring wherever possible and to adjust for any errors found throughout the different stages of the CSD. Areas of caution for interpreting CSD data are noted. Readers may also refer to the Guide to the Census of Population, 2021 for related information on data quality.

7.2 Sampling errors and quality release rules

The estimates that can be produced with this survey are based on a sample of individuals. Somewhat different estimates might have been obtained if we had conducted a complete census with the same questionnaire, interviewers, supervisors, processing methods and so on, as those actually used. The difference between an estimate derived from the sample and an estimate based on a comprehensive enumeration under similar conditions is known as the estimate’s “sampling error”.

Sampling error estimation

To produce estimates of the sampling error for statistics produced from the CSD, we used a particular type of bootstrap method. Several bootstrap methods exist in the literature, but none was directly appropriate for the CSD's complex sample design: because the CSD sample is itself selected from the 2021 Census long-form sample, the design is a two-phase one, which makes sampling errors difficult to estimate with standard methods.

In 2006, a general bootstrap method for two-phase sampling was developed and applied to the Indigenous Peoples Survey (IPS) (Langlet, Beaumont and Lavallée, 2008). The underlying idea of the general bootstrap method is that the initial bootstrap weights can be seen as the product of the initial sampling weights and a random adjustment factor. In the case of a two-phase sample, the variance can be split into two components, each associated with one sampling phase. The two-phase general bootstrap method generates a random adjustment factor for each phase of sampling. In this case, the initial bootstrap weight of a given unit is the product of its initial sampling weight and the two random adjustment factors. Once the initial bootstrap weights had been calculated, all weight adjustments applied to the initial sampling weights were also applied to the initial bootstrap weights to obtain the final bootstrap weights. The final bootstrap weights therefore capture the variance associated not only with the particular sample design but also with all weight adjustments applied to the full sample to derive the final weights.
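As a rough sketch of this construction (illustrative only; the variable names, sample sizes and the distribution of the adjustment factors are hypothetical, not those used in production), the code below forms initial bootstrap weights as the product of the initial sampling weight and the two phase-specific random adjustment factors, then carries a subsequent weight adjustment through to every replicate.

```python
import numpy as np

rng = np.random.default_rng(2022)

# Hypothetical example: n responding units and B bootstrap replicates.
n, B = 5, 3
initial_weight = np.array([120.0, 95.0, 130.0, 80.0, 110.0])  # initial sampling weights

# One set of random adjustment factors per sampling phase (simulated here only
# to show the mechanics; the real factors come from the general bootstrap
# method, not from a uniform distribution).
adj_phase1 = rng.uniform(0.8, 1.2, size=(B, n))  # first phase (census long form)
adj_phase2 = rng.uniform(0.8, 1.2, size=(B, n))  # second phase (CSD sample)

# Initial bootstrap weight = initial sampling weight x both adjustment factors.
initial_boot_weight = initial_weight * adj_phase1 * adj_phase2

# Every adjustment applied to the full-sample weights (non-response,
# calibration, ...) is also applied to the replicates, so the final bootstrap
# weights reflect the variance added by those adjustments.
nonresponse_adj = np.array([1.3, 1.3, 1.1, 1.1, 1.2])  # illustrative factors
final_weight = initial_weight * nonresponse_adj
final_boot_weight = initial_boot_weight * nonresponse_adj
```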

For the 2022 CSD, the method developed for the 2006 IPS was adapted to reflect the 2021 Census sample design, which included the census long-form questionnaire. For variance calculation, the 2021 Census sample design is treated as a two-phase plan: the first phase involves the initial selection of approximately one in four households, while the second is the sample of census respondents. Although the 2021 Census had a very high collection response rate (97.4% for the long form), the second phase accounts for non-response in calculating variance for the census. To use the generalized two-phase method for the CSD, these two census phases were combined into a single first phase, and the 2022 CSD sample made up the second phase.

There is a major advantage in having two sets of random adjustment factors. The first set of adjustment factors can be used for estimates based on the first phase only, i.e., estimates based on the census long-form sample. These estimates are used when the weights are adjusted to the census totals during calibration (Section 6.1). This produces variable census totals for each bootstrap sample, which reflects the fact that the census totals used are based on a sample and not on known fixed totals.

For the CSD, 1,000 sets of bootstrap weights were generated using the general bootstrap method. The method is slightly biased in that it somewhat overestimates the variance, but the extent of the overestimation is considered negligible for the CSD. The method can also produce negative bootstrap weights. To overcome this problem, the bootstrap weights were transformed to reduce their variability. Consequently, the variance calculated with these transformed bootstrap weights has to be multiplied by a factor that is a function of a transformation parameter, whose value is chosen as the smallest integer that makes all bootstrap weights positive. For the CSD, this factor is 4. The variances calculated from the transformed bootstrap weights must therefore be multiplied by 4² = 16; similarly, the standard error (the square root of the variance) must be multiplied by 4. However, most software applications that produce sampling error estimates from bootstrap weights have an option to specify this adjustment factor, so that the correct variance estimate is obtained without the extra step of multiplying by the constant.
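As a minimal sketch of these calculations (assuming hypothetical data and the usual bootstrap variance formula; this is not the production procedure), the function below computes a weighted total from the final weights, recomputes it with each set of transformed bootstrap weights, and applies the multiplicative factor of 4² = 16 to the variance (equivalently, 4 to the standard error).

```python
import numpy as np

def bootstrap_variance(y, final_weight, boot_weights, factor=4):
    """Bootstrap variance of a weighted total from transformed bootstrap weights.

    y            : analysis variable, shape (n,)
    final_weight : final survey weights, shape (n,)
    boot_weights : transformed bootstrap weights, shape (B, n); B = 1,000 for the CSD
    factor       : transformation constant (4 for the 2022 CSD)
    """
    estimate = np.sum(final_weight * y)          # full-sample estimate
    replicates = boot_weights @ y                # one estimate per bootstrap replicate
    raw_variance = np.mean((replicates - estimate) ** 2)
    variance = factor ** 2 * raw_variance        # multiply the variance by 4^2 = 16
    std_error = factor * np.sqrt(raw_variance)   # multiply the standard error by 4
    ci_95 = (estimate - 1.96 * std_error, estimate + 1.96 * std_error)
    return estimate, variance, std_error, ci_95
```

The 95% confidence interval shown here uses the usual normal approximation (estimate ± 1.96 × standard error). Software that accepts an adjustment factor of this kind performs the equivalent multiplication internally.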

Start of text box

It is extremely important to use the appropriate multiplicative factor for any estimate of sampling error, such as the variance, standard error, confidence interval (CI) or coefficient of variation (CV). Omitting this multiplicative factor will lead to erroneous results and conclusions. This factor is often specified as the “Fay adjustment factor” in software applications that produce sampling error estimates from bootstrap weights.

For examples of procedures using the Fay adjustment factor, see the User Guide (Canadian Survey on Disability, 2022: A User Guide to the Analytical Data Files).

End of text box

Quality release rules

For the 2022 CSD, the quality indicator used to report the sampling error of estimates is the 95% confidence interval (CI). For more information on confidence intervals, please see Section 8.6.

In disseminated tables, rules are applied for issuing quality warnings and for suppressing estimates that are deemed unreliable. The quality release rules for the CSD are based mainly on the sample size, because the quality of any estimate from a sample survey is affected by the number of respondents who contribute to it. The length of the CI is also considered in the release rules, because estimates with a high sampling error are less reliable and should be used with caution. The rules differ depending on whether the estimate is a proportion or another type of statistic: for proportions, the CI length is expressed in percentage points (p.p.); for other statistics, the CI length is calculated relative to the estimate by dividing the CI length by the estimate. Finally, the rules are slightly different for estimates at the national level.

The tables below explain how estimates are categorized according to the quality release rules and which actions are taken for each category in disseminating 2022 CSD data.


Table 7.1
Quality rules for subnational estimates for the 2022 CSD

Releasable
Rule (proportions): n ≥ 90 and CI length ≤ 14 p.p.
Rule (other statistics): n ≥ 90 and CI relative length ≤ 1.4
Action: Release with no warning. Users should use the CI as a quality indicator.

Releasable with warning (E)
Rule (proportions): 45 ≤ n < 90 or CI length > 14 p.p.
Rule (other statistics): 45 ≤ n < 90 or CI relative length > 1.4
Action: Release with a quality warning (letter E). Users should use the CI as a quality indicator.

Not releasable (F)
Rule (proportions): n < 45, regardless of the CI length
Rule (other statistics): n < 45, regardless of the CI relative length
Action: Suppress the estimate and its CI for quality reasons (letter F).

Table 7.2
Quality rules for national estimates for the 2022 CSD

Releasable
Rule (proportions): n ≥ 180 and CI length ≤ 14 p.p.
Rule (other statistics): n ≥ 180 and CI relative length ≤ 1.4
Action: Release with no warning. Users should use the CI as a quality indicator.

Releasable with warning (E)
Rule (proportions): 90 ≤ n < 180 or CI length > 14 p.p.
Rule (other statistics): 90 ≤ n < 180 or CI relative length > 1.4
Action: Release with a quality warning (letter E). Users should use the CI as a quality indicator.

Not releasable (F)
Rule (proportions): n < 90, regardless of the CI length
Rule (other statistics): n < 90, regardless of the CI relative length
Action: Suppress the estimate and its CI for quality reasons (letter F).

For an estimated difference between two estimates, the release category of the difference is the most restrictive of the release categories of the two estimates. In other words, a difference is released without warning only if both estimates are releasable without warning; it is released with a quality warning (E) if either estimate carries a warning; and it is suppressed (F) if either estimate is suppressed.
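To make these rules concrete, the sketch below (a hypothetical helper, not a Statistics Canada tool) assigns a release category using the thresholds in Tables 7.1 and 7.2 and applies the most-restrictive-category rule for a difference between two estimates.

```python
def release_category(n, ci_length, is_proportion, national=False):
    """Return 'releasable', 'E' (warning) or 'F' (suppress) per Tables 7.1 and 7.2.

    n          : number of respondents contributing to the estimate
    ci_length  : CI length in percentage points for a proportion,
                 or CI length divided by the estimate for other statistics
    """
    n_low, n_high = (90, 180) if national else (45, 90)
    ci_max = 14 if is_proportion else 1.4
    if n < n_low:
        return "F"                      # suppress, regardless of CI length
    if n < n_high or ci_length > ci_max:
        return "E"                      # release with a quality warning
    return "releasable"                 # release with no warning

def difference_category(cat1, cat2):
    """A difference takes the most restrictive category of its two estimates."""
    order = {"releasable": 0, "E": 1, "F": 2}
    return cat1 if order[cat1] >= order[cat2] else cat2

# Example: a subnational proportion based on 60 respondents with a CI length
# of 9 p.p. falls in the 'releasable with warning (E)' category.
print(release_category(60, 9, is_proportion=True))   # -> E
print(difference_category("releasable", "E"))         # -> E
```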

7.3 Non-sampling errors

Besides sampling errors, non-sampling errors can occur at almost every step of a survey. Respondents may misunderstand the questions and answer them inaccurately, responses may be entered incorrectly during data capture, and errors may be introduced during data processing. These are all examples of non-sampling errors.

Over a large number of observations, randomly occurring errors will have little effect on estimates drawn from the survey. Errors that occur systematically, however, can contribute to biases in the survey estimates. Much time and effort were therefore devoted to reducing non-sampling errors in the CSD. At the content development stage, extensive work was done to develop questions and response categories that respondents would understand well, and the questionnaire was tested thoroughly during several rounds of qualitative testing. Many initiatives were also taken in the field to encourage participation and reduce the number of non-response cases. Equally important were the numerous quality assurance measures applied at the data collection, coding and processing stages to verify and correct errors in the data. Finally, weighting adjustments took into account the differing characteristics of non-respondents and respondents, thus minimizing any potential bias that non-response may have introduced.

The following paragraphs discuss the different types of non-sampling errors and the various measures used to minimize and correct these errors in the CSD.

Coverage errors

Coverage errors occur when the sampled population excludes people who are intended to be in the target population. Because the CSD is an extension of the 2021 Census long-form sample, it inherits the coverage problems of the long form, which in turn inherits those of the 2021 Census. For more information about coverage errors in the census, please see the 2021 Census Coverage Technical Report, to be released on the Statistics Canada website in 2024. For more information about the quality of census data, please consult Chapter 9 of the Guide to the Census of Population, 2021.

Non-response errors

Non-response errors result from the inability to collect complete information on all units in the selected sample. Non-response produces errors in the survey estimates in two ways. First, non-respondents often have different characteristics from respondents, which can result in biased survey estimates if the non-response is not corrected for properly; the larger the non-response rate, the larger this bias may be. Second, if non-response is higher than expected, it reduces the effective size of the sample, and the precision of the estimates decreases (the sampling error of the estimates increases). This second aspect can be addressed by selecting a larger initial sample, but doing so does not reduce the potential bias in the estimates.
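As a standard textbook approximation illustrating the first point (a general survey-sampling result, not a formula taken from the CSD documentation), if a proportion W_nr of the population would not respond, the bias of the unadjusted respondent mean is roughly

\[
\operatorname{Bias}(\bar{y}_r) \;\approx\; W_{nr}\left(\bar{Y}_r - \bar{Y}_{nr}\right),
\]

where \(\bar{Y}_r\) and \(\bar{Y}_{nr}\) are the population means for would-be respondents and non-respondents. The bias therefore grows both with the non-response rate and with how different non-respondents are from respondents, which is why the weighting adjustments described in Section 6.1 are important.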

The scope of non-response varies. One level is item non-response, where the respondent does not answer one or more questions but has completed a significant, pre-defined portion of the overall questionnaire. The extent of this partial non-response was generally small in the CSD, owing to the extensive qualitative reviews and testing of questionnaire items. There is also total non-response, which occurs when the person selected for the survey could not be contacted or did not participate once contacted. The weights of respondents were increased to compensate for those who did not respond, as described in Section 6.1.

To reduce the number of non-response cases, many initiatives were also undertaken before and during data collection (as mentioned in Chapter 4). The Statistics Canada website included a CSD web page that provided a series of questions and answers for respondents, as well as general information about the survey. At the outset of collection, each selected respondent received an introductory letter providing an overview of the survey and a coloured brochure explaining the importance of participating. Respondents could also request access to survey information in Braille. During data collection, tweets and messages containing graphics and information were regularly posted on Statistics Canada's social media profiles (e.g., Twitter, Facebook and Instagram) to promote the CSD.

In addition, in-depth interviewer training was conducted by experienced Statistics Canada staff, and detailed interviewer manuals were provided as a reference. All interviewers worked under the direction of senior interviewers, who oversaw collection activities. Interviewers also made rigorous efforts to reach non-respondents through call-backs and follow-ups. Whenever possible, more than one phone number was provided for each selected respondent to maximize the chance of reaching the person during the collection period.

During the collection period, several letters and email reminders were sent to respondents encouraging them to respond. When contacting respondents, interviewers also provided them with their personal secure access codes so that those who preferred could respond online rather than by telephone. A table of the final response rates obtained for the 2022 CSD is provided in Section 4.8 of this guide. The overall response rate for the survey was 61.1%. Response rates were highest in the older age groups, who were easier to reach by telephone. Approximately 64% of responses were obtained through self-reporting, compared with 36% through telephone interviews.

Measurement errors

Measurement errors occur when the response provided differs from the real value. Such errors may be attributable to the respondent, the interviewer, the questionnaire, the collection method or the data processing system. Extensive efforts were made for the 2022 CSD to develop questions which would be understood, relevant and sensitive to respondents’ needs.

Several rounds of qualitative testing were done for the CSD, in particular to test the electronic questionnaire format, new modules and certain questions that had been modified since 2017. Qualitative testing was carried out by Statistics Canada's Questionnaire Design Resource Centre (QDRC). To minimize measurement error, adjustments were made to question wording, response categories, help text and question flows.

Many other measures were also taken to specifically reduce measurement error, including the use of skilled interviewers, extensive training of interviewers with respect to the survey procedures and content, and reviewing the interviewers’ notes to detect problems due to questionnaire design or misunderstanding of instructions.

Processing errors

Processing errors may occur at various stages, including programming of the electronic questionnaire, data capture by the interviewer or the respondent, coding and data editing. Quality control procedures were applied to every stage of data processing to minimize this type of error. The CSD was conducted through an electronic questionnaire, either interviewer-led or via online self-reporting. A number of edits were built into the system to warn the respondent or the interviewer in the event of inconsistencies or unusual values, making it possible to correct them immediately (see Section 5.7).

At the data processing stage, a detailed set of procedures and edit rules was used to identify and correct inconsistencies between the responses provided. For every step of data cleaning, thorough, systematized procedures were developed to assess the quality of every variable on the file and correct every error found. A snapshot of the output files was taken at each step, and the files from the current and previous steps were compared for verification. The programming of all edit rules was tested before being applied to the data. Examples of data processing verification included:

1) the review of all question flows, including very complex sequences, to ensure that skip values were accurately assigned and distinguished from the different types of missing values;
2) an in-depth qualitative review of open-ended and ‘other-specify’ responses for accurate and rigorous coding;
3) experienced supervision of coding to standardized classifications; and
4) a review of all derived variables against their component variables to ensure accurate programming of the derivation logic, including very complex derivations.

For additional information on data processing, please consult Chapter 5 of this guide.
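As a small illustrative example of the fourth type of verification (the variable names and derivation rule are hypothetical, not the actual CSD file layout), a derived variable can be checked by recomputing it independently from its component variables and flagging any record where the two versions disagree.

```python
import pandas as pd

# Hypothetical records: two component indicators and a derived 'any_difficulty' flag.
df = pd.DataFrame({
    "seeing_difficulty":  [1, 0, 0, 1],
    "hearing_difficulty": [0, 0, 1, 1],
    "any_difficulty":     [1, 0, 0, 1],   # value on file, to be verified
})

# Recompute the derivation from its components and flag inconsistencies.
expected = ((df["seeing_difficulty"] == 1) | (df["hearing_difficulty"] == 1)).astype(int)
df["edit_flag"] = df["any_difficulty"] != expected

print(df[df["edit_flag"]])   # records failing the consistency edit (the third record here)
```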

