Results

Articles and reports (73) (0 to 10 of 73 results)

  • Articles and reports: 75F0002M2004012
    Description:

    This study compares income estimates across several statistical programs at Statistics Canada. It examines how similar the estimates produced by different question sets are.

    Income data are collected by many household surveys. Some surveys have income as a major part of their content, and therefore collect income at a detailed level; others collect data from a much smaller set of income questions. No standard sets of income questions have been developed.

    Release date: 2004-12-23

  • Articles and reports: 75F0002M2004010
    Description:

    This document offers a set of guidelines for analysing income distributions. It focuses on the basic intuition behind the concepts and techniques rather than on the equations and technical details. (An illustrative sketch of one standard summary measure follows this entry.)

    Release date: 2004-10-08
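
    One of the standard summary measures such guidelines cover is the Gini coefficient. The following is a minimal sketch of its computation, assuming an unweighted income vector; the function name and NumPy usage are illustrative, not taken from the publication, and production code for survey data would incorporate survey weights.

    import numpy as np

    def gini(income):
        """Gini coefficient of an income vector.

        0 = perfect equality, 1 = maximal inequality. Computed from the
        sum of pairwise absolute differences via the sorted-ranks
        identity: sum_{i<j} |x_i - x_j| = sum_i (2i - n - 1) * x_(i)
        for x sorted ascending. Unweighted, illustrative version only.
        """
        x = np.sort(np.asarray(income, dtype=float))
        n = x.size
        ranks = np.arange(1, n + 1)
        return np.sum((2 * ranks - n - 1) * x) / (n * n * x.mean())

    # Example: a small, highly unequal income vector
    print(round(gini([10_000, 20_000, 30_000, 200_000]), 3))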

  • Articles and reports: 12-002-X20040027032
    Description:

    This article examines why many Statistics Canada surveys supply bootstrap weights with their microdata for the purpose of design-based variance estimation. Bootstrap weights are not supported by commercially available software such as SUDAAN and WesVar, but there are ways to use these applications to produce bootstrap variance estimates. (A sketch of the underlying computation follows this entry.)

    The paper concludes with a brief discussion of other design-based approaches to variance estimation as well as software, programs and procedures where these methods have been employed.

    Release date: 2004-10-05
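
    The computation the article refers to is standard: each set of bootstrap weights yields a replicate of the point estimate, and the variability of the replicates estimates the design variance. A minimal sketch for a weighted-mean estimator, assuming B sets of bootstrap weights stored as the columns of a matrix (all names and the toy replicate weights are illustrative):

    import numpy as np

    def bootstrap_variance(y, main_w, boot_w):
        """Design-based bootstrap variance of a weighted mean.

        y       : (n,)    observed values
        main_w  : (n,)    main survey weights
        boot_w  : (n, B)  bootstrap replicate weights, one column per replicate

        The variance estimate is the average squared deviation of the
        replicate estimates from the full-sample estimate.
        """
        theta_hat = np.average(y, weights=main_w)
        # Replicate estimates: weighted mean under each set of bootstrap weights
        theta_b = (boot_w * y[:, None]).sum(axis=0) / boot_w.sum(axis=0)
        return np.mean((theta_b - theta_hat) ** 2)

    # Illustrative use with fabricated data
    rng = np.random.default_rng(42)
    n, B = 200, 500
    y = rng.normal(50, 10, n)
    main_w = rng.uniform(1, 3, n)
    boot_w = main_w[:, None] * rng.poisson(1.0, (n, B))  # toy replicate weights
    print(bootstrap_variance(y, main_w, boot_w))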

  • Articles and reports: 12-002-X20040027034
    Description:

    The use of command files in Stat/Transfer can expedite the transfer of several data sets in an efficient, replicable manner. This note outlines a simple step-by-step method for creating command files and provides sample code. (A sketch of the general approach follows this entry.)

    Release date: 2004-10-05
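
    The note's own sample code is not reproduced here. The general idea is that a Stat/Transfer command file is a plain-text list of transfer commands executed in sequence, so a short script can generate one 'copy input output' line per data set. The command syntax and the .stcmd extension below reflect Stat/Transfer's batch facility as commonly documented, but should be checked against the version in use; the paths and the generating script are illustrative assumptions.

    from pathlib import Path

    # Data sets to convert from SPSS (.sav) to Stata (.dta) format.
    # All paths and names are illustrative.
    datasets = ["survey_2001", "survey_2002", "survey_2003"]
    in_dir, out_dir = Path("c:/data/sav"), Path("c:/data/dta")

    # Each line of the command file is one transfer command;
    # 'copy <input> <output>' transfers a file and 'quit' ends the session.
    lines = [f"copy {in_dir / name}.sav {out_dir / name}.dta" for name in datasets]
    lines.append("quit")

    Path("convert_all.stcmd").write_text("\n".join(lines) + "\n")
    # The file can then be run in batch mode with Stat/Transfer's
    # command processor, e.g.:  st convert_all.stcmd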

  • Articles and reports: 11-522-X20020016430
    Description:

    Linearization (or Taylor series) methods are widely used to estimate standard errors for the coefficients of linear regression models fit to multi-stage samples. When the number of primary sampling units (PSUs) is large, linearization can produce accurate standard errors under quite general conditions. However, when the number of PSUs is small or a coefficient depends primarily on data from a small number of PSUs, linearization estimators can have large negative bias.

    In this paper, we characterize features of the design matrix that produce large bias in linearization standard errors for linear regression coefficients. We then propose a new method, bias-reduced linearization (BRL), based on residuals adjusted to better approximate the covariance of the true errors. When the errors are independent and identically distributed (i.i.d.), the BRL estimator is unbiased for the variance. Furthermore, a simulation study shows that BRL can greatly reduce the bias, even if the errors are not i.i.d. We also propose using a Satterthwaite approximation to determine the degrees of freedom of the reference distribution for tests and confidence intervals about linear combinations of coefficients based on the BRL estimator. We demonstrate that the jackknife estimator also tends to be biased in situations where linearization is biased. However, the jackknife's bias tends to be positive. Our bias-reduced linearization estimator can be viewed as a compromise between the traditional linearization and jackknife estimators. (The estimators compared are summarized in the sketch after this entry.)

    Release date: 2004-09-13
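
    For reference, the estimators being compared can be written compactly. In one common notation (a reconstruction from the abstract's description, not text from the paper), the linearization (sandwich) variance estimator for the regression coefficients, with PSUs indexed by g, is

    \hat{V}_{\mathrm{lin}}(\hat{\beta}) = (X'X)^{-1} \Bigl( \sum_{g=1}^{G} X_g' e_g e_g' X_g \Bigr) (X'X)^{-1},

    where X_g and e_g are the design-matrix rows and OLS residuals for PSU g. A BRL-type adjustment replaces e_g with adjusted residuals

    \tilde{e}_g = (I - H_{gg})^{-1/2} e_g, \qquad H_{gg} = X_g (X'X)^{-1} X_g'.

    Under i.i.d. errors, E[e_g e_g'] = \sigma^2 (I - H_{gg}), so the adjustment gives E[\tilde{e}_g \tilde{e}_g'] = \sigma^2 I, and the sandwich formula above becomes exactly unbiased for the true variance \sigma^2 (X'X)^{-1}.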

  • Articles and reports: 11-522-X20020016708
    Description:

    In this paper, we discuss the analysis of complex health survey data using multivariate modelling techniques. Our main interest is in the various design-based and model-based methods that aim to account for the design complexities, including clustering, stratification and weighting. Methods covered include generalized linear modelling based on pseudo-likelihood and generalized estimating equations, linear mixed models estimated by restricted maximum likelihood, and hierarchical Bayes techniques using Markov chain Monte Carlo (MCMC) methods. The methods are compared empirically, using data from an extensive health interview and examination survey conducted in Finland in 2000 (Health 2000 Study).

    The data of the Health 2000 Study were collected using personal interviews, questionnaires and clinical examinations. A stratified two-stage cluster sampling design was used in the survey. The sampling design induced positive intra-cluster correlation for many study variables. For a closer investigation, we selected a small number of study variables from the health interview and health examination phases. In many cases, the different methods produced similar numerical results and supported similar statistical conclusions. Methods that failed to account for the design complexities sometimes led to conflicting conclusions. We also discuss how the methods in this paper can be applied using standard statistical software products. (A toy illustration of the clustering issue follows this entry.)

    Release date: 2004-09-13
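
    As a toy illustration of the software point at the end of the abstract: a naive fit that ignores clustering understates standard errors when intra-cluster correlation is positive, whereas a design-aware method such as GEE with robust (sandwich) standard errors accounts for it. A minimal sketch using statsmodels with fabricated clustered data (this is not the Health 2000 analysis; all names and values are illustrative):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Fabricated clustered data: 50 clusters ("PSUs") of 20 respondents each,
    # with a cluster-level random effect inducing intra-cluster correlation.
    rng = np.random.default_rng(0)
    n_clusters, m = 50, 20
    cluster = np.repeat(np.arange(n_clusters), m)
    u = rng.normal(0, 0.5, n_clusters)[cluster]        # cluster effect
    x = rng.normal(size=n_clusters * m)
    p = 1 / (1 + np.exp(-(-0.5 + 0.8 * x + u)))
    df = pd.DataFrame({"y": rng.binomial(1, p), "x": x, "cluster": cluster})

    # Naive logistic fit: ignores clustering, so its standard errors
    # understate the variance induced by the cluster effect.
    naive = smf.glm("y ~ x", data=df, family=sm.families.Binomial()).fit()

    # GEE with an exchangeable working correlation: robust (sandwich)
    # standard errors account for the intra-cluster correlation.
    gee = smf.gee("y ~ x", groups="cluster", data=df,
                  family=sm.families.Binomial(),
                  cov_struct=sm.cov_struct.Exchangeable()).fit()

    print(naive.bse["x"], gee.bse["x"])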

  • Articles and reports: 11-522-X20020016712
    Description:

    In this paper, we consider the effect of the interval censoring of cessation time on intensity parameter estimation with regard to smoking cessation and pregnancy. The three waves of the National Population Health Survey allow the methodology of event history analysis to be applied to smoking initiation, cessation and relapse. One issue of interest is the relationship between smoking cessation and pregnancy. If a longitudinal respondent who is a smoker at the first cycle ceases smoking by the second cycle, we know the cessation time to within an interval of length at most a year, since the respondent is asked for the age at which she stopped smoking, and her date of birth is known. We also know whether she is pregnant at the time of the second cycle, and whether she has given birth since the time of the first cycle. For many such subjects, we know the date of conception to within a relatively small interval. If we knew the time of smoking cessation and pregnancy period exactly for each member who experienced one or the other of these events between cycles, we could model their temporal relationship through their joint intensities. (The standard likelihood contribution for an interval-censored event time is sketched after this entry.)

    Release date: 2004-09-13
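
    The abstract stops short of stating the likelihood, but the standard contribution of an interval-censored event time is worth recording (a textbook construction, not quoted from the paper). If a respondent's cessation time T is known only to lie in the interval (a_i, b_i] and S is the survivor function implied by the cessation intensity, then

    L_i(\theta) = \Pr\{a_i < T \le b_i\} = S(a_i; \theta) - S(b_i; \theta).

    For example, under a constant cessation intensity \lambda, S(t) = e^{-\lambda t} and L_i(\lambda) = e^{-\lambda a_i} - e^{-\lambda b_i}; joint modelling of cessation and conception extends this to a bivariate intensity.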

  • Articles and reports: 11-522-X20020016714
    Description:

    In this highly technical paper, we illustrate the application of the delete-a-group jack-knife variance estimator to a particular complex multi-wave longitudinal study, demonstrating its utility for linear regression and other analytic models. The delete-a-group jack-knife variance estimator is proving to be a very useful tool for measuring variances under complex sampling designs. This technique divides the first-phase sample into mutually exclusive and nearly equal variance groups, deletes one group at a time to create a set of replicates, and makes weighting adjustments in each replicate analogous to those done for the sample as a whole. Variance estimation proceeds in the standard (unstratified) jack-knife fashion.

    Our application is to the Chicago Health and Aging Project (CHAP), a community-based longitudinal study examining risk factors for chronic health problems of older adults. A major aim of the study is the investigation of risk factors for incident Alzheimer's disease. The current design of CHAP has two components: (1) Every three years, all surviving members of the cohort are interviewed on a variety of health-related topics. These interviews include cognitive and physical function measures. (2) At each of these waves of data collection, a stratified Poisson sample is drawn from among the respondents to the full population interview for detailed clinical evaluation and neuropsychological testing. To investigate risk factors for incident disease, a 'disease-free' cohort is identified at the preceding time point and forms one major stratum in the sampling frame.

    We provide proofs of the theoretical applicability of the delete-a-group jack-knife for particular estimators under this Poisson design, paying particular attention to the distinction between finite-population and infinite-population (model) inference. In addition, we examine the issue of determining the 'right number' of variance groups. (A minimal version of the variance computation is sketched after this entry.)

    Release date: 2004-09-13
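
    A minimal version of the delete-a-group jack-knife computation described above, for a weighted-total estimator with G random groups. Rescaling the retained weights by G/(G-1) is the simplest form of the 'analogous weighting adjustment'; real applications re-run the full weighting adjustments within each replicate. All names are illustrative.

    import numpy as np

    def dagjk_variance(y, w, groups, G):
        """Delete-a-group jackknife variance of a weighted total.

        y      : (n,) observed values
        w      : (n,) full-sample weights
        groups : (n,) random group labels in {0, ..., G-1}
        G      : number of variance groups
        """
        theta_hat = np.sum(w * y)
        theta_reps = np.empty(G)
        for g in range(G):
            keep = groups != g
            # Simplest reweighting: scale retained weights by G/(G-1)
            # to compensate for the deleted group.
            theta_reps[g] = np.sum(w[keep] * y[keep]) * G / (G - 1)
        return (G - 1) / G * np.sum((theta_reps - theta_hat) ** 2)

    # Illustrative use with fabricated data
    rng = np.random.default_rng(1)
    n, G = 1000, 30
    y, w = rng.gamma(2.0, 10.0, n), rng.uniform(1, 4, n)
    groups = rng.integers(0, G, n)
    print(dagjk_variance(y, w, groups, G))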

  • Articles and reports: 11-522-X20020016715
    Description:

    This paper will describe the multiple imputation of income in the National Health Interview Survey and discuss the methodological issues involved. In addition, the paper will present empirical summaries of the imputations as well as results of a Monte Carlo evaluation of inferences based on multiply imputed income items.

    Analysts of health data are often interested in studying relationships between income and health. The National Health Interview Survey, conducted by the National Center for Health Statistics of the U.S. Centers for Disease Control and Prevention, provides a rich source of data for studying such relationships. However, the nonresponse rates on two key income items, an individual's earned income and a family's total income, are over 20%. Moreover, these nonresponse rates appear to be increasing over time. A project is currently underway to multiply impute individual earnings and family income along with some other covariates for the National Health Interview Survey in 1997 and subsequent years.

    There are many challenges in developing appropriate multiple imputations for such large-scale surveys. First, there are many variables of different types, with different skip patterns and logical relationships. Second, it is not known what types of associations will be investigated by the analysts of multiply imputed data. Finally, some variables, such as family income, are collected at the family level and others, such as earned income, are collected at the individual level. To make the imputations for both the family- and individual-level variables conditional on as many predictors as possible, and to simplify modelling, we are using a modified version of the sequential regression imputation method described in Raghunathan et al. (Survey Methodology, 2001).

    Besides issues related to the hierarchical nature of the imputations just described, there are other methodological issues of interest such as the use of transformations of the income variables, the imposition of restrictions on the values of variables, the general validity of sequential regression imputation and, even more generally, the validity of multiple-imputation inferences for surveys with complex sample designs. (Rubin's rules for combining the analyses of multiply imputed data are sketched after this entry.)

    Release date: 2004-09-13
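
    The inferential machinery behind analyses of multiply imputed items is Rubin's combining rules, which the abstract presupposes. A minimal sketch (the example estimates and variances are illustrative placeholders, not NHIS results):

    import numpy as np

    def combine_mi(estimates, variances):
        """Combine M complete-data results from multiply imputed data (Rubin's rules).

        estimates : (M,) point estimates, one per completed data set
        variances : (M,) corresponding complete-data variance estimates

        Returns the pooled estimate and its total variance
        T = W + (1 + 1/M) * B, where W is the within-imputation variance
        and B the between-imputation variance.
        """
        q = np.asarray(estimates, dtype=float)
        u = np.asarray(variances, dtype=float)
        M = q.size
        q_bar = q.mean()                      # pooled point estimate
        W = u.mean()                          # within-imputation variance
        B = q.var(ddof=1)                     # between-imputation variance
        T = W + (1 + 1 / M) * B               # total variance
        return q_bar, T

    # Illustrative: five completed-data estimates of mean family income
    est = [41_200, 40_800, 41_950, 41_400, 40_650]
    var = [350_000, 340_000, 360_000, 352_000, 338_000]
    print(combine_mi(est, var))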

  • Articles and reports: 11-522-X20020016716
    Description:

    Missing data are a constant problem in large-scale surveys. Such incompleteness is usually dealt with either by restricting the analysis to the cases with complete records or by imputing, for each missing item, an efficiently estimated value. The deficiencies of these approaches will be discussed in this paper, especially in the context of estimating a large number of quantities. The main part of the paper will describe two examples of analyses using multiple imputation.

    In the first, the International Labour Organization (ILO) employment status is imputed in the British Labour Force Survey by a Bayesian bootstrap method. It is an adaptation of the hot-deck method that seeks to fully exploit the auxiliary information. Important auxiliary information is given by the previous ILO status, when available, and the standard demographic variables. (A compact sketch of the Bayesian bootstrap hot deck follows this entry.)

    Missing data can be interpreted more generally, as in the framework of the expectation maximization (EM) algorithm. The second example is from the Scottish House Condition Survey, and its focus is on the inconsistency of the surveyors. The surveyors assess the sampled dwelling units on a large number of elements or features of the dwelling, such as internal walls, roof and plumbing, that are scored and converted to a summarizing 'comprehensive repair cost.' The level of inconsistency is estimated from the discrepancies between the pairs of assessments of doubly surveyed dwellings. The principal research questions concern the amount of information that is lost as a result of the inconsistency and whether the naive estimators that ignore the inconsistency are unbiased. The problem is solved by multiple imputation, generating plausible scores for all the dwellings in the survey.

    Release date: 2004-09-13
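
    The Bayesian bootstrap hot deck of the first example can be reduced to a compact sketch: within a donor class, draw a probability vector over the observed donors from a flat Dirichlet, then impute each missing value by sampling donors with those probabilities, once per imputation. The function below is an illustrative reduction of the idea, not the authors' code, and the construction of donor classes from auxiliary variables is omitted.

    import numpy as np

    def bayesian_bootstrap_hotdeck(donors, n_missing, M, rng):
        """Impute n_missing values M times from a pool of observed donor values.

        For each imputation, a probability vector over the donors is drawn
        from a flat Dirichlet (the Bayesian bootstrap), and missing values
        are filled by sampling donors with those probabilities. Returns an
        (M, n_missing) array of imputed values.
        """
        donors = np.asarray(donors)
        imputations = np.empty((M, n_missing), dtype=donors.dtype)
        for m in range(M):
            probs = rng.dirichlet(np.ones(donors.size))  # Bayesian bootstrap draw
            imputations[m] = rng.choice(donors, size=n_missing, p=probs)
        return imputations

    # Illustrative: impute 3 missing ILO-style status codes, 5 times,
    # from a donor class defined by auxiliary variables (not shown).
    rng = np.random.default_rng(7)
    donors = np.array(["employed", "unemployed", "inactive", "employed", "employed"])
    print(bayesian_bootstrap_hotdeck(donors, n_missing=3, M=5, rng=rng))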
Journals and periodicals (1) (1 result)

  • Journals and periodicals: 92-395-X
    Description:

    This report describes sampling and weighting procedures used in the 2001 Census. It reviews the history of these procedures in Canadian censuses, provides operational and theoretical justifications for them, and presents the results of the evaluation studies of these procedures.

    Release date: 2004-12-15