Quality assurance

Results

All (250)

All (250) (0 to 10 of 250 results)

  • Journals and periodicals: 75F0002M
    Description: This series provides detailed documentation on income developments, including survey design issues, data quality evaluation and exploratory research.
    Release date: 2024-04-26

  • Surveys and statistical programs – Documentation: 32-26-0007
    Description: Census of Agriculture data provide statistical information on farms and farm operators at fine geographic levels and for small subpopulations. Quality evaluation activities are essential to ensure that census data are reliable and that they meet user needs.

    This report provides data quality information pertaining to the Census of Agriculture, such as sources of error, error detection, disclosure control methods, data quality indicators, response rates and collection rates.
    Release date: 2024-02-06

  • Articles and reports: 13-604-M2024001
    Description: This documentation outlines the methodology used to develop the Distributions of household economic accounts published in January 2024 for the reference years 2010 to 2023. It describes the framework and the steps implemented to produce distributional information aligned with the National Balance Sheet Accounts and other national accounts concepts. It also includes a report on the quality of the estimated distributions.
    Release date: 2024-01-22

  • Articles and reports: 13-604-M2023001
    Description: This documentation outlines the methodology used to develop the Distributions of household economic accounts published in March 2023 for the reference years 2010 to 2022. It describes the framework and the steps implemented to produce distributional information aligned with the National Balance Sheet Accounts and other national accounts concepts. It also includes a report on the quality of the estimated distributions.
    Release date: 2023-03-31

  • Articles and reports: 13-604-M2022002
    Description:

    This documentation outlines the methodology used to develop the Distributions of household economic accounts published in August 2022 for the reference years 2010 to 2021. It describes the framework and the steps implemented to produce distributional information aligned with the National Balance Sheet Accounts and other national accounts concepts. It also includes a report on the quality of the estimated distributions.

    Release date: 2022-08-03

  • 19-22-0009
    Description:

    Join us as Statistics Canada’s Quality Secretariat gives a presentation on the importance of data quality. We are living in an exciting time for data: sources are more abundant, they are being generated in innovative ways, and they are available more quickly than ever. However, a data source that does not meet basic quality standards is not merely worthless – it can be misleading, and worse than having no data at all! Statistics Canada’s Quality Secretariat has a mandate to promote good quality practices within the agency, across the Government of Canada, and internationally. For quality to truly be present, it must be incorporated into each process (from design to analysis) and into the product itself – whether that product is a microdata file or estimates derived from it. We will address why data quality is important and how one can evaluate it in practice. We will cover some basic concepts in data quality (quality assurance vs. control, metadata, etc.), and present data quality as a multidimensional concept. Finally, we will show data quality in action by evaluating a data source together. All data quality literacy levels are welcome. After all, everybody plays a part in quality!

    https://www.statcan.gc.ca/en/services/webinars/19220009

    Release date: 2022-01-26

  • Articles and reports: 11-522-X202100100015
    Description: National statistical agencies such as Statistics Canada have a responsibility to convey the quality of statistical information to users. The methods traditionally used to do this are based on measures of sampling error. As a result, they are not adapted to the estimates produced using administrative data, for which the main sources of error are not due to sampling. A more suitable approach to reporting the quality of estimates presented in a multidimensional table is described in this paper. Quality indicators were derived for various post-acquisition processing steps, such as linkage, geocoding and imputation, by estimation domain. A clustering algorithm was then used to combine domains with similar quality levels for a given estimate. Ratings to inform users of the relative quality of estimates across domains were assigned to the groups created. This indicator, called the composite quality indicator (CQI), was developed and experimented with in the Canadian Housing Statistics Program (CHSP), which aims to produce official statistics on the residential housing sector in Canada using multiple administrative data sources.

    Keywords: Unsupervised machine learning, quality assurance, administrative data, data integration, clustering.

    Release date: 2021-10-22
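
    The CQI construction described in this abstract can be sketched roughly as follows. The per-step indicators, the composite score, the use of a simple 1-D k-means, and the A-to-C ratings are all illustrative assumptions made for this sketch, not the CHSP's actual choices.

    ```python
    # Hypothetical sketch of a composite quality indicator (CQI):
    # domains get per-step quality indicators, similar domains are
    # clustered, and each cluster receives a relative rating.
    # All names, values and thresholds are illustrative.

    def kmeans_1d(values, k, iters=50):
        """Tiny 1-D k-means: returns a cluster label per value and the centres."""
        lo, hi = min(values), max(values)
        centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
        labels = [0] * len(values)
        for _ in range(iters):
            labels = [min(range(k), key=lambda c: abs(v - centers[c]))
                      for v in values]
            for c in range(k):
                members = [v for v, l in zip(values, labels) if l == c]
                if members:
                    centers[c] = sum(members) / len(members)
        return labels, centers

    # Per-domain indicators for post-acquisition steps (linkage,
    # geocoding, imputation), each in [0, 1]; higher is better.
    domains = {
        "domain_A": (0.98, 0.95, 0.90),
        "domain_B": (0.97, 0.94, 0.88),
        "domain_C": (0.70, 0.65, 0.60),
        "domain_D": (0.40, 0.35, 0.50),
    }

    # One simple composite: the mean of the step indicators.
    names = list(domains)
    scores = [sum(v) / len(v) for v in domains.values()]
    labels, centers = kmeans_1d(scores, k=3)

    # Rank clusters by centre and map them to ratings A (best) to C.
    order = sorted(range(3), key=lambda c: -centers[c])
    rating = {c: "ABC"[rank] for rank, c in enumerate(order)}
    cqi = {n: rating[l] for n, l in zip(names, labels)}
    ```

    Domains A and B end up in the same (best) cluster, so users would see one rating for both, which is the point of grouping domains of similar quality.
    
    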

  • Articles and reports: 11-522-X202100100023
    Description:

    Our increasingly digital society provides multiple opportunities to maximise our use of data for the public good – using a range of sources, data types and technologies to enable us to better inform the public about social and economic matters and contribute to the effective development and evaluation of public policy. Ensuring use of data in ethically appropriate ways is an important enabler for realising the potential to use data for public good research and statistics. Earlier this year, the UK Statistics Authority launched the Centre for Applied Data Ethics to provide applied data ethics services, advice, training and guidance to the analytical community across the United Kingdom. The Centre has developed a framework and portfolio of services to empower analysts to consider the ethics of their research quickly and easily, at the research design phase, thus promoting a culture of ethics by design. This paper will provide an overview of this framework, the accompanying user support services and the impact of this work.

    Key words: Data ethics, data, research and statistics

    Release date: 2021-10-22

  • Articles and reports: 13-604-M2021001
    Description:

    This documentation outlines the methodology used to develop the Distributions of household economic accounts published in September 2021 for the reference years 2010 to 2020. It describes the framework and the steps implemented to produce distributional information aligned with the National Balance Sheet Accounts and other national accounts concepts. It also includes a report on the quality of the estimated distributions.

    Release date: 2021-09-07

  • Stats in brief: 89-20-00062020001
    Description:

    In this video, you will be introduced to the fundamentals of data quality, which can be summed up in six dimensions—or six different ways to think about quality. You will also learn how each dimension can be used to evaluate the quality of data.

    Release date: 2020-09-23
Data (0)

Data (0) (0 results)

No content available at this time.

Analysis (171)

Analysis (171) (0 to 10 of 171 results)

  • Stats in brief: 89-20-00062020008
    Description:

    Accuracy is one of the six dimensions of data quality used at Statistics Canada. Accuracy refers to how well the data reflect the truth or what actually happened. In this video we will present methods to describe accuracy in terms of validity and correctness. We will also discuss methods to validate and check the accuracy of data values.

    Release date: 2020-09-23

  • Articles and reports: 13-604-M2020002
    Description:

    This documentation outlines the methodology used to develop the Distributions of household economic accounts published in June 2020 for the reference years 2010 to 2019. It describes the framework and the steps implemented to produce distributional information aligned with the National balance sheet accounts and other national accounts concepts. It also includes a report on the quality of the estimated distributions.

    Release date: 2020-06-26
Reference (78)

Reference (78) (40 to 50 of 78 results)

  • Surveys and statistical programs – Documentation: 11-522-X19990015658
    Description:

    Radon, a naturally occurring gas found at some level in most homes, is an established risk factor for human lung cancer. The U.S. National Research Council (1999) has recently completed a comprehensive evaluation of the health risks of residential exposure to radon, and developed models for projecting radon lung cancer risks in the general population. This analysis suggests that radon may play a role in the etiology of 10-15% of all lung cancer cases in the United States, although these estimates are subject to considerable uncertainty. In this article, we present a partial analysis of uncertainty and variability in estimates of lung cancer risk due to residential exposure to radon in the United States using a general framework for the analysis of uncertainty and variability that we have developed previously. Specifically, we focus on estimates of the age-specific excess relative risk (ERR) and lifetime relative risk (LRR), both of which vary substantially among individuals.

    Release date: 2000-03-02

  • Surveys and statistical programs – Documentation: 11-522-X19990015660
    Description:

    There are many different situations in which one or more files need to be linked. With one file, the purpose of the linkage would be to locate duplicates within the file. When there are two files, the linkage is done to identify the units that are the same on both files and thus create matched pairs. Often, records that need to be linked do not have a unique identifier. Hierarchical record linkage, probabilistic record linkage and statistical matching are three methods that can be used when there is no unique identifier on the files that need to be linked. We describe the major differences between the methods. We consider how to choose variables to link, how to prepare files for linkage and how the links are identified. As well, we review tips and tricks used when linking files. Two examples will be illustrated: the probabilistic record linkage used in the reverse record check, and the hierarchical record linkage of the Business Number (BN) master file to the Statistical Universe File (SUF) of unincorporated tax filers (T1).

    Release date: 2000-03-02
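
    As a rough illustration of the probabilistic strategy mentioned above, the following sketch scores a candidate record pair Fellegi-Sunter style, summing log-likelihood-ratio agreement weights over comparison fields. The fields and the m- and u-probabilities are invented for illustration.

    ```python
    # Minimal Fellegi-Sunter-style scoring sketch. For each field:
    # m = P(values agree | records are a true match),
    # u = P(values agree | records are a non-match).
    # The values below are made up, not estimated from data.
    import math

    FIELDS = {
        "surname":    (0.95, 0.01),
        "birth_year": (0.90, 0.05),
        "postcode":   (0.85, 0.02),
    }

    def match_weight(rec_a, rec_b):
        """Sum of log2 likelihood ratios over the comparison fields."""
        total = 0.0
        for field, (m, u) in FIELDS.items():
            if rec_a.get(field) == rec_b.get(field):
                total += math.log2(m / u)              # agreement weight
            else:
                total += math.log2((1 - m) / (1 - u))  # disagreement weight
        return total

    a = {"surname": "Tremblay", "birth_year": 1970, "postcode": "K1A"}
    b = {"surname": "Tremblay", "birth_year": 1970, "postcode": "K2B"}
    c = {"surname": "Smith",    "birth_year": 1985, "postcode": "M5V"}

    # Pairs above an upper threshold are declared links, those below a
    # lower threshold non-links; the middle zone goes to clerical review.
    print(match_weight(a, b) > match_weight(a, c))  # True: a,b agree on more fields
    ```
    
    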

  • Surveys and statistical programs – Documentation: 11-522-X19990015664
    Description:

    Much work on probabilistic methods of linkage can be found in the statistical literature. However, although many groups undoubtedly still use deterministic procedures, not much literature is available on these strategies. Furthermore, there appears to be no documentation on the comparison of results for the two strategies. Such a comparison is pertinent in the situation where we have only non-unique identifiers like names, sex, race, etc. as common identifiers on which the databases are to be linked. In this work, we compare a stepwise deterministic linkage strategy with the probabilistic strategy, as implemented in AUTOMATCH, for such a situation. The comparison was carried out on a linkage between medical records from the Regional Perinatal Intensive Care Centers database and education records from the Florida Department of Education. Social security numbers, available in both databases, were used to decide the true status of the record pair after matching. Match rates and error rates for the two strategies are compared and a discussion of their similarities and differences, strengths and weaknesses is presented.

    Release date: 2000-03-02

  • Surveys and statistical programs – Documentation: 11-522-X19990015666
    Description:

    The fusion sample obtained by a statistical matching process can be considered a sample out of an artificial population. The distribution of this artificial population is derived. If the correlation between specific variables is the only focus, the strong requirement of conditional independence can be weakened. In a simulation study, the effects of violations of some assumptions leading to the distribution of the artificial population are examined. Finally, some ideas concerning how to establish the claimed conditional independence by latent class analysis are presented.

    Release date: 2000-03-02

  • Surveys and statistical programs – Documentation: 11-522-X19990015668
    Description:

    Following the problems with estimating underenumeration in the 1991 Census of England and Wales, the aim for the 2001 Census is to create a database that is fully adjusted for net underenumeration. To achieve this, the paper investigates a weighted donor imputation methodology that utilises information from both the census and the census coverage survey (CCS). The US Census Bureau has considered a similar approach for their 2000 Census (see Isaki et al. 1998). The proposed procedure distinguishes between individuals who are not counted by the census because their household is missed and those who are missed in counted households. Census data are linked to data from the CCS. Multinomial logistic regression is used to estimate the probabilities that households are missed by the census and the probabilities that individuals are missed in counted households. Household and individual coverage weights are constructed from the estimated probabilities and these feed into the donor imputation procedure.

    Release date: 2000-03-02
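
    The weight-construction step described above can be illustrated as follows, taking the two coverage probabilities as given rather than estimating them by multinomial logistic regression; the strata and probability values are invented.

    ```python
    # Sketch of turning estimated coverage probabilities into weights.
    # For each illustrative stratum:
    # (P(household counted), P(individual counted | household counted)).
    # In the methodology above these come from multinomial logistic
    # regression on census-CCS linked data; here they are simply assumed.
    strata = {
        "urban_renter": (0.92, 0.96),
        "urban_owner":  (0.98, 0.99),
        "rural_owner":  (0.99, 0.99),
    }

    def coverage_weight(stratum):
        """Inverse of the joint probability that a person is counted."""
        p_hh, p_ind = strata[stratum]
        return 1.0 / (p_hh * p_ind)

    # Each counted person then represents this many people, covering both
    # those missed with their whole household and those missed within a
    # counted household; such weights feed the donor imputation step.
    for s in strata:
        print(s, round(coverage_weight(s), 3))
    ```
    
    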

  • Surveys and statistical programs – Documentation: 11-522-X19990015670
    Description:

    To reach their target audience efficiently, advertisers and media planners need information on which media their customers use. For instance, they may need to know what percentage of Diet Coke drinkers watch Baywatch, or how many AT&T customers have seen an advertisement for Sprint during the last week. All the relevant data could theoretically be collected from each respondent. However, obtaining full detailed and accurate information would be very expensive. It would also impose a heavy respondent burden under current data collection technology. This information is currently collected through separate surveys in New Zealand and in many other countries. Exposure to the major media is measured continuously, and product usage studies are common. Statistical matching techniques provide a way of combining these separate information sources. The New Zealand television ratings database was combined with a syndicated survey of print readership and product usage, using statistical matching. The resulting Panorama service meets the targeting information needs of advertisers and media planners. It has since been duplicated in Australia. This paper discusses the development of the statistical matching framework for combining these databases, and the heuristics and techniques used. These included an experiment conducted using a screening design to identify important matching variables. Studies evaluating and validating the combined results are also summarized. The following three major evaluation criteria were used: accuracy of combined results, stability of combined results, and preservation of the currency of results from the component databases. The paper then discusses how the prerequisites for combining the databases were met. The biggest hurdle at this stage was the differences between the analysis techniques used on the two component databases. Finally, suggestions for developing similar statistical matching systems elsewhere will be given.

    Release date: 2000-03-02
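
    A toy version of the matching step might look like this: recipients observing (X, Z) borrow Y from the nearest donor on the common variables Z. The records, variables and distance weights are invented for illustration, not Panorama's actual design.

    ```python
    # Illustrative nearest-neighbour statistical matching: fuse a file
    # observing (X, Z) with one observing (Y, Z) by matching on the
    # common demographic variables Z. Real systems choose and weight
    # the matching variables carefully (e.g. via a screening design).

    # Recipient file: media exposure (X) plus demographics (Z).
    recipients = [
        {"age": 34, "income": 55, "watches_show": True},
        {"age": 61, "income": 40, "watches_show": False},
    ]
    # Donor file: product usage (Y) plus the same demographics (Z).
    donors = [
        {"age": 30, "income": 50, "buys_product": True},
        {"age": 65, "income": 38, "buys_product": False},
    ]

    def dist(r, d):
        """Simple weighted distance on the matching variables Z."""
        return abs(r["age"] - d["age"]) + 0.5 * abs(r["income"] - d["income"])

    # For each recipient, import Y from the closest donor, yielding a
    # fused record with both X and Y observed.
    fused = []
    for r in recipients:
        donor = min(donors, key=lambda d: dist(r, d))
        fused.append({**r, "buys_product": donor["buys_product"]})
    ```

    The fused file then supports cross-tabulations of X against Y (e.g. viewing by product usage) that neither source file could produce alone, at the cost of a conditional-independence assumption given Z.
    
    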

  • Surveys and statistical programs – Documentation: 11-522-X19990015672
    Description:

    Data fusion as discussed here means to create a set of data on not jointly observed variables from two different sources. Suppose for instance that observations are available for (X,Z) on a set of individuals and for (Y,Z) on a different set of individuals. Each of X, Y and Z may be a vector variable. The main purpose is to gain insight into the joint distribution of (X,Y) using Z as a so-called matching variable. At first, however, it is attempted to recover as much information as possible on the joint distribution of (X,Y,Z) from the distinct sets of data. Such fusions can only be done at the cost of implementing some distributional properties for the fused data. These are conditional independencies given the matching variables. Fused data are typically discussed from the point of view of how appropriate this underlying assumption is. Here we give a different perspective. We formulate the problem as follows: how can distributions be estimated in situations where only observations from certain marginal distributions are available? It can be solved by applying the maximum entropy criterion. We show in particular that data created by fusing different sources can be interpreted as a special case of this situation. Thus, we derive the needed assumption of conditional independence as a consequence of the type of data available.

    Release date: 2000-03-02

  • Surveys and statistical programs – Documentation: 11-522-X19990015674
    Description:

    The effect of the environment on health is of increasing concern, in particular the effects of the release of industrial pollutants into the air, the ground and into water. An assessment of the risks to public health of any particular pollution source is often made using the routine health, demographic and environmental data collected by government agencies. These datasets have important differences in sampling geography and in sampling epochs which affect the epidemiological analyses that draw them together. In the UK, health events are recorded for individuals, giving cause codes, a date of diagnosis or death, and the unit postcode as a geographical reference. In contrast, small area demographic data are recorded only at the decennial census, and released as area level data in areas distinct from postcode geography. Environmental exposure data may be available at yet another resolution, depending on the type of exposure and the source of the measurements.

    Release date: 2000-03-02

  • Surveys and statistical programs – Documentation: 11-522-X19990015678
    Description:

    A population needs-based health care resource allocation model was developed and applied using age, sex and health status of populations to measure population need for health care in Ontario. To develop the model, provincial data on self-assessed health and health service utilization by age and sex from 62,413 respondents to the 1990 Ontario Health Survey (OHS) were used in combination with provincial health care expenditure data for the fiscal year 1995/96 by age and sex. The model was limited to the services that were covered in the OHS (general practitioner, specialist physician, optometry, physiotherapy, chiropractic and acute hospital). The distribution of utilization and expenditures between age-sex-health status categories was used to establish appropriate health care resource shares for each age-sex-health status combination. These resource shares were then applied to geographic populations using age, sex and health status data from the OHS together with more recent population estimates to determine the needs-based health care resource allocation for each area. Total dollar allocations were restricted to sum to the 1995/96 provincial budget and were compared with 1995/96 allocations to determine the extent to which Ontario allocations are consistent with the relative needs of the area populations.

    Release date: 2000-03-02
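
    The allocation arithmetic described above can be sketched as follows. The cells, per-person shares, area populations and budget are invented; in the model itself the shares come from the OHS utilization and expenditure data.

    ```python
    # Hedged sketch of needs-based allocation: per-person resource
    # shares by age-sex-health-status cell, applied to each area's
    # population, then scaled so allocations sum to the budget.
    # All numbers are illustrative.

    # Relative per-person resource share by cell.
    share = {"young_good": 1.0, "young_poor": 3.0,
             "old_good":   2.5, "old_poor":   6.0}

    # Area populations broken down by the same cells.
    areas = {
        "area_1": {"young_good": 500, "young_poor": 50,
                   "old_good": 100, "old_poor": 30},
        "area_2": {"young_good": 200, "young_poor": 80,
                   "old_good": 150, "old_poor": 90},
    }

    budget = 1_000_000.0  # total budget to distribute

    # Raw need = sum over cells of population x per-person share.
    raw = {a: sum(pop[c] * share[c] for c in pop) for a, pop in areas.items()}
    total = sum(raw.values())

    # Scale so the area allocations sum exactly to the budget.
    alloc = {a: budget * r / total for a, r in raw.items()}
    ```

    Area 2 receives more despite a smaller population because its population is older and in poorer health, which is the intended behaviour of a needs-based formula.
    
    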

  • Surveys and statistical programs – Documentation: 11-522-X19990015680
    Description:

    To augment the amount of available information, data from different sources are increasingly being combined. These databases are often combined using record linkage methods. When there is no unique identifier, a probabilistic linkage is used. In that case, a record on a first file is associated with a probability that it is linked to a record on a second file, and then a decision is taken on whether a possible link is a true link or not. This usually requires a non-negligible amount of manual resolution. It might then be legitimate to evaluate whether manual resolution can be reduced or even eliminated. This issue is addressed in this paper, where one tries to produce an estimate of a total (or a mean) of one population when using a sample selected from another population linked somehow to the first population. In other words, having two populations linked through probabilistic record linkage, we try to avoid any decision concerning the validity of links and still be able to produce an unbiased estimate for a total of one of the two populations. To achieve this goal, we suggest the use of the Generalised Weight Share Method (GWSM) described by Lavallée (1995).

    Release date: 2000-03-02
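
    A minimal sketch of the GWSM idea follows, under the simplifying assumption that a sampled unit's design weight is shared equally across all of a linked unit's links, including links to unsampled units; the links, weights and y-values are invented.

    ```python
    # Rough sketch of the Generalised Weight Share Method (GWSM):
    # design weights from the A-sample are shared across all B units
    # they link to, so no decision is needed about which links are
    # "true". Data below are illustrative only.

    # Design weights of the units selected in population A.
    w_A = {"a1": 10.0, "a2": 20.0}

    # All links between A units (sampled or not) and B units:
    # links[j] = list of A units linked to B unit j.
    # "a3" was not sampled, so it contributes weight 0 but still
    # counts in the number of links when sharing.
    links = {"b1": ["a1"], "b2": ["a1", "a2"], "b3": ["a2", "a3"]}

    def gwsm_weight(j):
        """Share each linked A unit's weight equally across j's links."""
        ls = links[j]
        return sum(w_A.get(i, 0.0) for i in ls) / len(ls)

    y = {"b1": 5.0, "b2": 3.0, "b3": 8.0}  # variable of interest on B

    # Weighted total over B, with no link-validity decisions taken.
    estimate = sum(gwsm_weight(j) * y[j] for j in y)
    ```
    
    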