Analysis


Results

All (74) (11 to 20 of 74 results)

11. Examples of multiple imputation in large-scale surveys Archived
Articles and reports: 11-522-X20020016716
Description:
Missing data are a constant problem in large-scale surveys. Such incompleteness is usually dealt with either by restricting the analysis to the cases with complete records or by imputing, for each missing item, an efficiently estimated value. The deficiencies of these approaches will be discussed in this paper, especially in the context of estimating a large number of quantities. The main part of the paper will describe two examples of analyses using multiple imputation.
In the first, the International Labour Organization (ILO) employment status is imputed in the British Labour Force Survey by a Bayesian bootstrap method. This method, an adaptation of the hot-deck method, seeks to exploit the auxiliary information fully. The most important auxiliary information comes from the previous ILO status, when available, and from the standard demographic variables.
Missing data can be interpreted more generally, as in the framework of the expectation maximization (EM) algorithm. The second example is from the Scottish House Condition Survey, and its focus is on the inconsistency of the surveyors. The surveyors assess the sampled dwelling units on a large number of elements or features of the dwelling, such as internal walls, roof and plumbing, that are scored and converted to a summarizing 'comprehensive repair cost.' The level of inconsistency is estimated from the discrepancies between the pairs of assessments of doubly surveyed dwellings. The principal research questions concern the amount of information that is lost as a result of the inconsistency and whether the naive estimators that ignore the inconsistency are unbiased. The problem is solved by multiple imputation, generating plausible scores for all the dwellings in the survey.
Release date: 2004-09-13
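The two ingredients named in this abstract can be illustrated with a short Python sketch: a Bayesian bootstrap hot-deck that draws donors within imputation cells, and Rubin's rules for combining estimates from M completed data sets. The cell variable (for example, previous ILO status), the toy data and all function names are assumptions for illustration; this is not the authors' implementation.

```python
import random
from statistics import mean, variance


def bayesian_bootstrap_impute(values, rng):
    """Fill None entries of one imputation cell with donor values.

    Donor probabilities are one draw from Dirichlet(1, ..., 1), obtained by
    normalizing independent Exp(1) variates (the Bayesian bootstrap).
    """
    donors = [v for v in values if v is not None]
    if not donors:
        raise ValueError("cell has no observed donors")
    raw = [rng.expovariate(1.0) for _ in donors]
    total = sum(raw)
    probs = [r / total for r in raw]
    n_missing = sum(v is None for v in values)
    draws = iter(rng.choices(donors, weights=probs, k=n_missing))
    return [next(draws) if v is None else v for v in values]


def impute_within_cells(values, cells, rng):
    """Apply the Bayesian bootstrap separately inside each auxiliary cell."""
    out = list(values)
    for cell in sorted(set(cells)):  # sorted so RNG use is reproducible
        idx = [i for i, c in enumerate(cells) if c == cell]
        filled = bayesian_bootstrap_impute([values[i] for i in idx], rng)
        for i, v in zip(idx, filled):
            out[i] = v
    return out


def rubin_combine(estimates, variances):
    """Rubin's rules: pooled estimate and total variance from M >= 2 imputations."""
    m = len(estimates)
    q_bar = mean(estimates)      # average point estimate
    u_bar = mean(variances)      # within-imputation variance
    b = variance(estimates)      # between-imputation variance (divisor m - 1)
    return q_bar, u_bar + (1 + 1 / m) * b


if __name__ == "__main__":
    rng = random.Random(2002)
    status = [1, 0, None, 0, None, 0, 1, None]       # toy: 1 = employed
    prev = ["E", "U", "E", "E", "U", "U", "E", "E"]  # hypothetical cell variable
    ests, vars_ = [], []
    for _ in range(5):
        done = impute_within_cells(status, prev, rng)
        p = mean(done)
        ests.append(p)
        vars_.append(p * (1 - p) / len(done))  # SRS variance, illustration only
    print(rubin_combine(ests, vars_))
```

Because each imputation uses a fresh Dirichlet draw, the between-imputation variance reflects uncertainty about the donor distribution, which a single hot-deck imputation would ignore.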
12. Area-level models using data from multiple surveys Archived
Articles and reports: 11-522-X20020016717
Description:
In the United States, the National Health and Nutrition Examination Survey (NHANES) is linked to the National Health Interview Survey (NHIS) at the primary sampling unit level (the same counties, but not necessarily the same persons, are in both surveys). The NHANES examines about 5,000 persons per year, while the NHIS samples about 100,000 persons per year. In this paper, we present models that allow NHIS and administrative data to be used as auxiliary information for estimating quantities of interest in the NHANES, and we develop their properties. The methodology, related to Fay-Herriot (1979) small-area models and to the calibration estimators of Deville and Särndal (1992), accounts for the survey designs in the error structure.
Release date: 2004-09-13
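The basic Fay-Herriot (1979) area-level model that this work builds on can be sketched in Python. Each area's direct estimate is modelled as a regression prediction plus an area effect plus sampling error with known variance psi_i, and the empirical best linear unbiased predictor (EBLUP) shrinks the direct estimate toward the regression prediction by gamma_i = sigma2_v / (sigma2_v + psi_i). This is a minimal sketch of the standard model only, assuming one covariate and a model variance sigma2_v estimated elsewhere (for example by REML); it does not implement the paper's multi-survey error structure, and the function name is hypothetical.

```python
def fay_herriot_eblup(direct, x, psi, sigma2_v):
    """Area-level EBLUP with an intercept and one covariate.

    direct   : direct survey estimates, one per area
    x        : auxiliary covariate per area
    psi      : sampling variances of the direct estimates (treated as known)
    sigma2_v : between-area model variance (assumed already estimated)
    """
    w = [1.0 / (sigma2_v + p) for p in psi]  # GLS weights
    sw = sum(w)
    xb = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yb = sum(wi * yi for wi, yi in zip(w, direct)) / sw
    sxy = sum(wi * (xi - xb) * (yi - yb) for wi, xi, yi in zip(w, x, direct))
    sxx = sum(wi * (xi - xb) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    b0 = yb - b1 * xb
    out = []
    for yi, xi, p in zip(direct, x, psi):
        gamma = sigma2_v / (sigma2_v + p)  # weight on the direct estimate
        synthetic = b0 + b1 * xi           # regression (synthetic) prediction
        out.append(gamma * yi + (1 - gamma) * synthetic)
    return out


print(fay_herriot_eblup([12.0, 18.0, 30.0], [1.0, 2.0, 3.0], [4.0, 1.0, 9.0], 2.0))
```

Areas with noisy direct estimates (large psi_i) lean more on the regression prediction, which is how auxiliary data can sharpen estimates from a smaller survey such as the NHANES.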
13. Obtaining cancer risk factor prevalence estimates in small areas Archived
Articles and reports: 11-522-X20020016718
Description:
Cancer surveillance research requires accurate estimates of risk factors at the small-area level. These risk factors are often obtained from surveys such as the National Health Interview Survey (NHIS) or the Behavioral Risk Factor Surveillance System (BRFSS). Unfortunately, no single population-based survey provides ideal prevalence estimates of such risk factors. One strategy is to combine information from multiple surveys, using the complementary strengths of one survey to compensate for the weaknesses of another. The NHIS is a nationally representative, face-to-face survey with a high response rate; however, it cannot produce state or substate estimates of risk factor prevalence because sample sizes are too small. The BRFSS is a state-level telephone survey that excludes non-telephone households and has a lower response rate, but does provide reasonable sample sizes in all states and many counties. Several methods are available for constructing small-area estimators that combine information from both the NHIS and the BRFSS, including direct estimators, estimators under hierarchical Bayes models and model-assisted estimators. In this paper, we focus on the latter, constructing generalized regression (GREG) and 'minimum-distance' estimators and using existing and newly developed small-area smoothing techniques to smooth the resulting estimators.
Release date: 2004-09-13
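The GREG idea mentioned above can be sketched in Python for a single domain: start from the Horvitz-Thompson estimate of a total and correct it by how far the weighted sample misses known auxiliary totals. This is a minimal sketch assuming one auxiliary variable whose population total is known; in the abstract's setting the auxiliary information would come from the other survey, and neither the authors' construction nor their 'minimum-distance' variant is reproduced here. The function name is hypothetical.

```python
def greg_total(y, x, d, pop_n, pop_x):
    """GREG estimate of a total with working model y = b0 + b1 * x.

    y, x  : sample values
    d     : design weights (inverse inclusion probabilities)
    pop_n : known population size N
    pop_x : known population total of x
    """
    sd = sum(d)
    ht_y = sum(di * yi for di, yi in zip(d, y))  # Horvitz-Thompson totals
    ht_x = sum(di * xi for di, xi in zip(d, x))
    # Design-weighted least-squares fit of y on (1, x)
    xb, yb = ht_x / sd, ht_y / sd
    b1 = (sum(di * (xi - xb) * (yi - yb) for di, xi, yi in zip(d, x, y))
          / sum(di * (xi - xb) ** 2 for di, xi in zip(d, x)))
    b0 = yb - b1 * xb
    # Correct the HT estimate by the gaps between known and estimated totals
    return ht_y + b0 * (pop_n - sd) + b1 * (pop_x - ht_x)


print(greg_total([5.0, 8.0, 11.0], [1.0, 2.0, 3.0], [10.0, 10.0, 10.0], 40.0, 90.0))
```

When y is an exact linear function of x, the GREG estimate reproduces the true total regardless of the sample drawn, which is the intuition for why a well-chosen auxiliary variable reduces variance.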
14. A comparison of approaches to modelling health and environment Archived
Articles and reports: 11-522-X20020016719
Description:
This study examines the modelling methods used for public health data. There is renewed interest in public health in the impact of the environment on health. Ecological or contextual studies ideally investigate these relationships using public health data augmented with environmental characteristics in multilevel or hierarchical models. In these models, individual respondents in the health data form the first level and community data form the second level. Most public health data come from complex sample survey designs, which require analyses that account for clustering, nonresponse, and poststratification to obtain representative estimates of the prevalence of health risk behaviours.
This study uses the Behavioral Risk Factor Surveillance System (BRFSS), a state-specific US health risk factor surveillance system conducted by the Centers for Disease Control and Prevention, which assesses health risk factors in over 200,000 adults annually. BRFSS data are now available at the metropolitan statistical area (MSA) level and provide good-quality health information for studies of environmental effects. MSA-level analyses combining health and environmental data are further complicated by the joint requirements of the survey sample design and the multilevel analyses.
We compare three modelling methods in a study of physical activity and selected environmental factors using BRFSS 2000 data. Each of the methods described here is a valid way to analyse complex sample survey data augmented with environmental information, although each accounts for the survey design and multilevel data structure in a different manner and is thus appropriate for slightly different research questions.
Release date: 2004-09-13
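As an illustration of the two-level structure described above, a random-intercept logistic model for a binary outcome (such as meeting a physical-activity guideline) might be written as follows, where i indexes respondents, j indexes MSAs, x_ij are individual covariates, z_j are MSA-level environmental characteristics and u_j is an MSA random effect. This is an assumed generic specification, not one of the three models the authors compare.

```latex
\operatorname{logit} \Pr(y_{ij} = 1 \mid u_j)
  = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_{ij}
  + \boldsymbol{\gamma}^{\top}\mathbf{z}_j + u_j,
\qquad u_j \sim N(0, \sigma_u^2)
```

A design-based analysis would also have to bring in sampling weights and the survey's strata and clusters; the abstract's point is that each of the compared methods does this in a different way.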
15. Simulation study to assess the precision of the two-stage cluster survey for injection safety Archived
Articles and reports: 11-522-X20020016721
Description:
This paper examines the simulation study that was conducted to assess the sampling scheme designed for the World Health Organization (WHO) Injection Safety Assessment Survey. The objective of this assessment survey is to determine whether facilities in which injections are given meet the necessary safety requirements for injection administration, equipment, supplies and waste disposal. The main parameter of interest is the proportion of health care facilities in a country that have safe injection practices.
The objective of this simulation study was to assess the accuracy and precision of the proposed sampling design. To this end, two artificial populations were created based on the two African countries of Niger and Burkina Faso, in which the pilot survey was tested. To create a wide variety of hypothetical populations, the assignment of whether a health care facility was safe or not was based on the different combinations of the population proportion of safe health care facilities in the country, the homogeneity of the districts in the country with respect to injection safety, and whether the health care facility was located in an urban or rural district.
Using the results of the simulation, a multi-factor analysis of variance was used to determine which factors affect the outcome measures of absolute bias, standard error and mean-squared error.
Release date: 2004-09-13
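A simulation of this kind can be sketched in Python: build an artificial population of districts whose facilities are safe or unsafe, repeatedly draw a two-stage sample (districts, then facilities), and summarize the estimator by absolute bias, standard error and mean-squared error. This is a minimal sketch under stated assumptions: equal-sized districts, simple random sampling at both stages, an overall safe proportion p_safe and a district-level spread standing in for homogeneity; the urban/rural factor and the follow-up analysis of variance are omitted, and all names are hypothetical.

```python
import random
from statistics import fmean, pstdev


def make_population(n_districts, facilities_per_district, p_safe, spread, rng):
    """Toy population: each district has its own safe rate around p_safe.

    A larger `spread` makes districts less homogeneous.
    """
    population = []
    for _ in range(n_districts):
        p_d = min(1.0, max(0.0, rng.gauss(p_safe, spread)))
        population.append([1 if rng.random() < p_d else 0
                           for _ in range(facilities_per_district)])
    return population


def two_stage_estimate(population, n_districts, n_facilities, rng):
    """SRS of districts, then SRS of facilities; expansion estimate of the proportion."""
    big_n = len(population)
    total_hat = 0.0
    for district in rng.sample(population, n_districts):
        sampled = rng.sample(district, n_facilities)
        total_hat += (big_n / n_districts) * (len(district) / n_facilities) * sum(sampled)
    return total_hat / sum(len(d) for d in population)


def simulate(population, n_districts, n_facilities, reps, rng):
    """Absolute bias, standard error and MSE of the estimator over `reps` samples."""
    truth = sum(map(sum, population)) / sum(len(d) for d in population)
    est = [two_stage_estimate(population, n_districts, n_facilities, rng)
           for _ in range(reps)]
    return {
        "abs_bias": abs(fmean(est) - truth),
        "se": pstdev(est),
        "mse": fmean((e - truth) ** 2 for e in est),
    }


if __name__ == "__main__":
    rng = random.Random(42)
    pop = make_population(40, 25, p_safe=0.6, spread=0.2, rng=rng)
    print(simulate(pop, n_districts=8, n_facilities=5, reps=2000, rng=rng))
```

Repeating this over a grid of p_safe and spread values yields the scenario-level outcome measures that a multi-factor analysis of variance could then relate to the design factors. Because se is computed with divisor R, the identity MSE = bias^2 + se^2 holds exactly for the simulated replicates.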
16. Modelling the impacts of colorectal cancer screening in Canada using POHEM Archived
Articles and reports: 11-522-X20020016722
Geography: Canada
Description:
Colorectal cancer (CRC) is the second leading cause of cancer deaths in Canada. Randomized controlled trials (RCTs) have shown the efficacy of screening using faecal occult blood tests (FOBT). A comprehensive evaluation of the costs and consequences of CRC screening for the Canadian population is required before such a program is implemented. This paper evaluates whether CRC screening is cost-effective. The results of these simulations will be provided to the Canadian National Committee on Colorectal Cancer Screening to help formulate national policy recommendations for CRC screening.
Statistics Canada's Population Health Microsimulation Model (POHEM) was updated to incorporate a comprehensive CRC screening module based on Canadian data and RCT efficacy results. The module incorporated sensitivity and specificity of FOBT and colonoscopy, participation rates, incidence, staging, diagnostic and therapeutic options, disease progression, mortality and direct health care costs for different screening scenarios. The model was validated by reproducing the mortality reduction observed in the Funen screening trial.
Release date: 2004-09-13
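The abstract does not state which summary measure is used, but cost-effectiveness comparisons of screening scenarios are commonly expressed as an incremental cost-effectiveness ratio (ICER): the extra cost of screening per extra unit of health effect (for example, per life-year gained) relative to no screening. The Python sketch below assumes that measure and annual discounting of cost and effect streams; the function names and numbers are hypothetical and are not POHEM output.

```python
def icer(cost_new, cost_ref, effect_new, effect_ref):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    d_effect = effect_new - effect_ref
    if d_effect == 0:
        raise ValueError("no difference in effect; ICER is undefined")
    return (cost_new - cost_ref) / d_effect


def discounted(values, rate):
    """Present value of a yearly stream; year 0 is not discounted."""
    return sum(v / (1 + rate) ** t for t, v in enumerate(values))


# Hypothetical per-person totals for a screening and a no-screening scenario
print(icer(cost_new=150.0, cost_ref=100.0, effect_new=12.0, effect_ref=10.0))
```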
17. Performing logistic regression on survey data with the new SURVEYLOGISTIC procedure Archived
Articles and reports: 11-522-X20020016723
Description:
Categorical outcomes, such as binary, ordinal and nominal responses, occur often in survey research. Logistic regression investigates the relationship between such categorical response variables and a set of explanatory variables. The LOGISTIC procedure can be used to perform a logistic analysis on data from a simple random sample. However, this approach is not valid if the data come from other sample designs, such as complex survey designs with stratification, clustering and/or unequal weighting. In these cases, specialized techniques must be applied in order to produce the appropriate estimates and standard errors.
The SURVEYLOGISTIC procedure, experimental in Version 9, brings logistic regression for survey data to the SAS System and delivers much of the functionality of the LOGISTIC procedure. This paper describes the methodological approach and applications for this new software.
Release date: 2004-09-13
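To illustrate what the 'specialized techniques' involve, the following pure-Python sketch fits a weighted logistic regression by pseudo-maximum likelihood (Newton-Raphson on the weighted score) and computes a Taylor-linearization (sandwich) covariance from PSU-level score totals. It is not SAS code and not the SURVEYLOGISTIC algorithm itself; it assumes one covariate, a single stratum and PSUs treated as sampled with replacement, and all function names are hypothetical.

```python
import math


def _fit_pieces(x, y, w, b0, b1):
    """Weighted score, information matrix and per-observation score terms."""
    s = [0.0, 0.0]
    h = [[0.0, 0.0], [0.0, 0.0]]
    terms = []
    for xi, yi, wi in zip(x, y, w):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        r = wi * (yi - p)
        terms.append((r, r * xi))
        s[0] += r
        s[1] += r * xi
        v = wi * p * (1.0 - p)
        h[0][0] += v
        h[0][1] += v * xi
        h[1][1] += v * xi * xi
    h[1][0] = h[0][1]
    return s, h, terms


def _inv2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def _mul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def survey_logit(x, y, w, psu, iters=25):
    """Pseudo-ML logistic fit with a linearization (sandwich) covariance."""
    b = [0.0, 0.0]
    for _ in range(iters):  # Newton step: b <- b + H^{-1} s
        s, h, _ = _fit_pieces(x, y, w, b[0], b[1])
        hi = _inv2(h)
        b = [b[0] + hi[0][0] * s[0] + hi[0][1] * s[1],
             b[1] + hi[1][0] * s[0] + hi[1][1] * s[1]]
    _, h, terms = _fit_pieces(x, y, w, b[0], b[1])
    totals = {}  # score contributions summed within each PSU
    for c, (t0, t1) in zip(psu, terms):
        acc = totals.setdefault(c, [0.0, 0.0])
        acc[0] += t0
        acc[1] += t1
    n_c = len(totals)
    m0 = sum(t[0] for t in totals.values()) / n_c
    m1 = sum(t[1] for t in totals.values()) / n_c
    g = [[0.0, 0.0], [0.0, 0.0]]
    for t in totals.values():
        dev = (t[0] - m0, t[1] - m1)
        for i in range(2):
            for j in range(2):
                g[i][j] += n_c / (n_c - 1) * dev[i] * dev[j]
    hi = _inv2(h)
    return b, _mul2(_mul2(hi, g), hi)  # H^{-1} G H^{-1}
```

With unequal weights or strongly clustered PSUs, the point estimates and especially the covariance differ from an unweighted fit that treats observations as independent, which is the problem the abstract describes.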
18. The analysis of survey data using Stata: Some recent developments Archived
Articles and reports: 11-522-X20020016724
Description:
Some of the most commonly used statistical models are fitted using maximum likelihood (ML) or some extension of ML. Stata's ml command provides researchers and data analysts with a tool for writing estimation commands that fit their own models to their data. Such models may include multiple equations, clustered observations, sampling weights and other survey design characteristics. These elements are discussed in this paper.
Release date: 2004-09-13
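The pattern described above, in which the analyst supplies only the log-likelihood and a general tool handles the weights and the maximization, can be mimicked in a few lines of Python. This is an illustrative sketch, not Stata's ml implementation: it handles a single scalar parameter, uses finite-difference derivatives, and returns a point estimate only (design-based standard errors are omitted). The weighted Poisson example is a hypothetical choice made because its answer is known in closed form: the fitted rate equals the weighted mean of y.

```python
import math


def weighted_ml(loglik_i, data, weights, theta0, h=1e-4, tol=1e-8, max_iter=100):
    """Maximize sum_i w_i * loglik_i(theta, d_i) over a scalar theta by Newton's method.

    Derivatives are approximated by central finite differences, so the user
    only has to supply the per-observation log-likelihood.
    """
    def total(theta):
        return sum(w * loglik_i(theta, d) for d, w in zip(data, weights))

    theta = theta0
    for _ in range(max_iter):
        f_hi, f_mid, f_lo = total(theta + h), total(theta), total(theta - h)
        grad = (f_hi - f_lo) / (2 * h)
        hess = (f_hi - 2 * f_mid + f_lo) / (h * h)
        step = -grad / hess
        theta += step
        if abs(step) < tol:
            break
    return theta


def poisson_loglik(theta, y):
    """Poisson contribution with log-rate theta, dropping the log(y!) constant."""
    return y * theta - math.exp(theta)


theta_hat = weighted_ml(poisson_loglik, [1, 2, 3, 4], [1.0, 1.0, 2.0, 4.0], 0.0)
print(math.exp(theta_hat))
```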
19. Bridging multiple-race responses in the U.S. Census to single-race categories for the calculation of vital rates Archived
Articles and reports: 11-522-X20020016725
Description:
In 1997, the US Office of Management and Budget issued revised standards for the collection of race information within the federal statistical system. One revision allows individuals to choose more than one race group when responding to federal surveys and other federal data collections. This change presents challenges for analyses that involve data collected under both the old and new race-reporting systems, since the data on race are not comparable. The following paper discusses the problems posed by these changes and the methods developed to overcome them.
Since most people under both systems report only a single race, a commonly proposed solution is to bridge the transition by assigning a single-race category to each multiple-race reporter under the new system, and to conduct analyses using only the observed and assigned single-race categories. Thus, the problem can be viewed as a missing-data problem, in which single-race responses are missing for multiple-race reporters and need to be imputed.
The US Office of Management and Budget suggested several simple bridging methods to handle this missing-data problem. Schenker and Parker (Statistics in Medicine, forthcoming) analysed data from the National Health Interview Survey of the US National Center for Health Statistics, which allows multiple-race reporting but also asks multiple-race reporters to specify a primary race, and found that improved bridging methods could result from incorporating individual-level and contextual covariates into the bridging models.
While Schenker and Parker discussed only three large multiple-race groups, the current application requires predicting single-race categories for several small multiple-race groups as well. Thus, problems of sparse data arise in fitting the bridging models. We address these problems by building combined models for several multiple-race groups, thus borrowing strength across them. These and other methodological issues are discussed.
Release date: 2004-09-13
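Two bridging styles can be contrasted in a short Python sketch. The first is a simple deterministic scheme, equal-fractions assignment, in which a reporter of k races contributes 1/k to each. The second uses per-person predicted probabilities of a single race, assumed to come from a fitted model with individual-level and contextual covariates (the model fitting, including the combined models for sparse groups, is not shown). The race labels, probabilities and function names are hypothetical.

```python
import random
from collections import defaultdict


def equal_fraction_bridge(reporters):
    """Deterministic fractional assignment: 1/k of each k-race reporter to each race."""
    counts = defaultdict(float)
    for races in reporters:
        for race in races:
            counts[race] += 1.0 / len(races)
    return dict(counts)


def model_based_bridge(probabilities, rng=None):
    """Bridge using per-person predicted single-race probabilities.

    With rng=None, return expected counts (fractional assignment by the model);
    otherwise impute one single race per person by a multinomial draw.
    """
    counts = defaultdict(float)
    for probs in probabilities:
        if rng is None:
            for race, p in probs.items():
                counts[race] += p
        else:
            race = rng.choices(list(probs), weights=list(probs.values()))[0]
            counts[race] += 1.0
    return dict(counts)


reporters = [("White", "Black"), ("White", "Asian", "AIAN")]
print(equal_fraction_bridge(reporters))
probs = [{"White": 0.7, "Black": 0.3}, {"White": 0.2, "Asian": 0.8}]
print(model_based_bridge(probs), model_based_bridge(probs, random.Random(3)))
```

Repeating the random version several times yields multiple imputations of the single-race variable, consistent with treating bridging as a missing-data problem.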
20. An investigation into the development and testing of a methodology for updating census indicators Archived
Articles and reports: 11-522-X20020016727
Description:
Census data are widely used in the distribution and targeting of resources at national, regional and local levels. In the United Kingdom (UK), a population census is conducted every 10 years. As time elapses, the census data become outdated and less relevant, making the distribution of resources less equitable. This paper examines alternative methods for rectifying this.
A number of small-area methods have been developed for producing postcensal estimates, including the Structure Preserving Estimation (SPREE) technique of Purcell and Kish (1980). This paper develops an alternative approach, based on linear mixed models, for producing postcensal estimates. The validity of the methodology is tested on simulated data from the Finnish population register, and the technique is applied to produce updated estimates for a number of 1991 UK census variables.
Release date: 2004-09-13
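For comparison with the mixed-model approach, the classical SPREE idea mentioned above can be sketched with iterative proportional fitting (IPF): a census cross-classification (for example, areas by categories of an indicator) supplies the association structure, and it is rescaled until it matches more recent row and column margins. This is a minimal two-way Python sketch assuming both sets of updated margins are available and share the same grand total; it is not the paper's method, and the function name is hypothetical.

```python
def ipf(table, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Rescale a two-way table to new margins by iterative proportional fitting.

    The cross-product (odds) ratios of the starting table are preserved,
    which is the sense in which the updated table keeps the census structure.
    """
    t = [[float(v) for v in row] for row in table]
    for _ in range(max_iter):
        for i, target in enumerate(row_targets):  # match row margins
            s = sum(t[i])
            if s > 0:
                t[i] = [v * target / s for v in t[i]]
        for j, target in enumerate(col_targets):  # match column margins
            s = sum(row[j] for row in t)
            if s > 0:
                for row in t:
                    row[j] *= target / s
        # Columns now match exactly; stop once the rows also match
        if max(abs(sum(row) - g) for row, g in zip(t, row_targets)) < tol:
            break
    return t


census = [[10, 20], [30, 40]]  # hypothetical census counts
print(ipf(census, row_targets=[40.0, 60.0], col_targets=[50.0, 50.0]))
```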

Articles and reports (73) (11 to 20 of 73 results)

20. WesVar: Software for analyzing data from complex surveys Archived
Articles and reports: 11-522-X20020016728
Description:
Nearly all surveys use complex sampling designs to collect data, and these data are frequently used for statistical analyses beyond the estimation of simple descriptive parameters of the target population. Many procedures available in popular statistical software packages are not appropriate for this purpose because they assume that the sample has been drawn by simple random sampling. Therefore, the results of analyses conducted with these packages may not be valid when the sample design incorporates multistage sampling, stratification or clustering. Two commonly used methods for analysing data from complex surveys are replication and Taylor linearization. We discuss the use of the WesVar software to compute estimates and replicate variance estimates that properly reflect complex sampling and estimation procedures. We also illustrate WesVar's features using data from two Westat surveys that employ complex survey designs: the Third International Mathematics and Science Study (TIMSS) and the National Health and Nutrition Examination Survey (NHANES).
Release date: 2004-09-13
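Replication variance estimation of the kind WesVar provides can be illustrated with the delete-one-PSU jackknife, often labelled JK1: each replicate drops one PSU and reweights the remaining PSUs by n/(n - 1), and the variance is (n - 1)/n times the sum of squared deviations of the replicate estimates from the full-sample estimate. This Python sketch assumes a single stratum and a weighted mean as the statistic; it is not WesVar code, and the function names are hypothetical.

```python
def jk1_replicate_weights(weights, psu):
    """Delete-one-PSU jackknife replicate weights for a single stratum."""
    ids = sorted(set(psu))
    n = len(ids)
    return [[0.0 if p == drop else w * n / (n - 1) for w, p in zip(weights, psu)]
            for drop in ids]


def weighted_mean(y, w):
    return sum(yi * wi for yi, wi in zip(y, w)) / sum(w)


def jk1_variance(y, weights, psu):
    """JK1 variance: (n - 1) / n times the sum of squared replicate deviations."""
    full = weighted_mean(y, weights)
    reps = jk1_replicate_weights(weights, psu)
    n = len(reps)
    return (n - 1) / n * sum((weighted_mean(y, r) - full) ** 2 for r in reps)


print(jk1_variance([3.0, 5.0, 4.0, 8.0], [1.5, 2.0, 1.0, 2.5], ["a", "a", "b", "c"]))
```

For an unweighted sample in which every unit is its own PSU, this reproduces the familiar s^2/n variance of the sample mean; the same replicate weights can be reused for any statistic, which is the practical appeal of replication.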

Journals and periodicals (1)

Journals and periodicals (1) (1 result)

1. Sampling and Weighting, 2001 Census Technical Report (Reference Products: 2001 Census) Archived
Journals and periodicals: 92-395-X
Description:
This report describes sampling and weighting procedures used in the 2001 Census. It reviews the history of these procedures in Canadian censuses, provides operational and theoretical justifications for them, and presents the results of the evaluation studies of these procedures.
Release date: 2004-12-15

Date modified: 2024-07-27