
Results

All (12) (0 to 10 of 12 results)

  • Articles and reports: 75F0002M2004010
    Description:

    This document offers a set of guidelines for analysing income distributions. It focuses on the basic intuition of the concepts and techniques instead of the equations and technical details.

    Release date: 2004-10-08

  • Articles and reports: 11-522-X20020016715
    Description:

    This paper will describe the multiple imputation of income in the National Health Interview Survey and discuss the methodological issues involved. In addition, the paper will present empirical summaries of the imputations as well as results of a Monte Carlo evaluation of inferences based on multiply imputed income items.

    Analysts of health data are often interested in studying relationships between income and health. The National Health Interview Survey, conducted by the National Center for Health Statistics of the U.S. Centers for Disease Control and Prevention, provides a rich source of data for studying such relationships. However, the nonresponse rates on two key income items, an individual's earned income and a family's total income, are over 20%. Moreover, these nonresponse rates appear to be increasing over time. A project is currently underway to multiply impute individual earnings and family income along with some other covariates for the National Health Interview Survey in 1997 and subsequent years.

There are many challenges in developing appropriate multiple imputations for such large-scale surveys. First, there are many variables of different types, with different skip patterns and logical relationships. Second, it is not known what types of associations will be investigated by the analysts of multiply imputed data. Finally, some variables, such as family income, are collected at the family level and others, such as earned income, are collected at the individual level. To make the imputations for both the family- and individual-level variables conditional on as many predictors as possible, and to simplify modelling, we are using a modified version of the sequential regression imputation method described in Raghunathan et al. (Survey Methodology, 2001).

    Besides issues related to the hierarchical nature of the imputations just described, there are other methodological issues of interest such as the use of transformations of the income variables, the imposition of restrictions on the values of variables, the general validity of sequential regression imputation and, even more generally, the validity of multiple-imputation inferences for surveys with complex sample designs.

    Release date: 2004-09-13
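The multiple-imputation inferences this abstract refers to are conventionally formed with Rubin's combining rules across the m completed datasets. A minimal sketch, using hypothetical estimates rather than actual National Health Interview Survey output:

```python
import numpy as np

def combine_mi(estimates, variances):
    """Combine m multiply-imputed point estimates and their
    within-imputation variances using Rubin's rules."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    q_bar = estimates.mean()         # combined point estimate
    w_bar = variances.mean()         # average within-imputation variance
    b = estimates.var(ddof=1)        # between-imputation variance
    t = w_bar + (1 + 1 / m) * b      # total variance of q_bar
    return q_bar, t

# Hypothetical estimates of mean family income from m = 5 imputed datasets
q, t = combine_mi([52.1, 51.8, 52.5, 52.0, 51.9],
                  [0.40, 0.42, 0.39, 0.41, 0.40])
```

The between-imputation component b is what distinguishes this from single imputation: it carries the extra uncertainty due to the missing income items.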

  • Articles and reports: 11-522-X20020016725
    Description:

In 1997, the US Office of Management and Budget issued revised standards for the collection of race information within the federal statistical system. One revision allows individuals to choose more than one race group when responding to federal surveys and other federal data collections. This change presents challenges for analyses that involve data collected under both the old and new race-reporting systems, since the data on race are not comparable. The following paper discusses the problems introduced by these changes and the methods developed to overcome them.

Since most people under both systems report only a single race, a common proposed solution is to try to bridge the transition by assigning a single-race category to each multiple-race reporter under the new system, and to conduct analyses using just the observed and assigned single-race categories. Thus, the problem can be viewed as a missing-data problem, in which single-race responses are missing for multiple-race reporters and need to be imputed.

    The US Office of Management and Budget suggested several simple bridging methods to handle this missing-data problem. Schenker and Parker (Statistics in Medicine, forthcoming) analysed data from the National Health Interview Survey of the US National Center for Health Statistics, which allows multiple-race reporting but also asks multiple-race reporters to specify a primary race, and found that improved bridging methods could result from incorporating individual-level and contextual covariates into the bridging models.

    While Schenker and Parker discussed only three large multiple-race groups, the current application requires predicting single-race categories for several small multiple-race groups as well. Thus, problems of sparse data arise in fitting the bridging models. We address these problems by building combined models for several multiple-race groups, thus borrowing strength across them. These and other methodological issues are discussed.

    Release date: 2004-09-13
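The bridging idea can be illustrated with a toy empirical version: within each multiple-race group, estimate single-race assignment probabilities from reporters who named a primary race. This is a hypothetical stand-in for the covariate-based models the paper describes, not their actual specification:

```python
# Hypothetical NHIS-style records: (multiple-race group, reported primary race)
reports = [("white-black", "black"), ("white-black", "black"),
           ("white-black", "white"), ("white-asian", "asian"),
           ("white-asian", "white"), ("white-asian", "white")]

def bridging_probs(reports):
    """Empirical bridging: within each multiple-race group, estimate the
    probability of each single-race assignment from the primary-race
    reports (a toy version of the covariate-based bridging models)."""
    probs = {}
    for group, primary in reports:
        probs.setdefault(group, {}).setdefault(primary, 0)
        probs[group][primary] += 1
    for group, counts in probs.items():
        total = sum(counts.values())
        probs[group] = {race: c / total for race, c in counts.items()}
    return probs

p = bridging_probs(reports)
```

The sparse-data problem the abstract raises shows up immediately here: a small multiple-race group yields few reports per cell, which is why the authors pool several groups in combined models rather than estimating each group separately.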

  • Articles and reports: 11-522-X20020016730
    Description:

    A wide class of models of interest in social and economic research can be represented by specifying a parametric structure for the covariances of observed variables. The availability of software, such as LISREL (Jöreskog and Sörbom 1988) and EQS (Bentler 1995), has enabled these models to be fitted to survey data in many applications. In this paper, we consider approaches to inference about such models using survey data derived by complex sampling schemes. We consider evidence of finite sample biases in parameter estimation and ways to reduce such biases (Altonji and Segal 1996) and associated issues of efficiency of estimation, standard error estimation and testing. We use longitudinal data from the British Household Panel Survey for illustration. As these data are subject to attrition, we also consider the issue of how to use nonresponse weights in the modelling.

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016748
    Description:

    Practitioners often use data collected from complex surveys (such as labour force and health surveys involving stratified cluster sampling) to fit logistic regression and other models of interest. A great deal of effort over the last two decades has been spent on developing methods to analyse survey data that take account of design features. This paper looks at an alternative method known as inverse sampling.

Specialized programs, such as SUDAAN and WESVAR, are also available to implement some of the methods developed to take the design features into account. However, these methods require additional information, such as the survey weights, design effects or cluster identifiers of the microdata, and thus an alternative method is of interest.

Inverse sampling (Hinkins et al., Survey Methodology, 1997) provides an alternative approach by undoing the complex data structures so that standard methods can be applied. Repeated subsamples with simple random structure are drawn, each subsample is analysed by standard methods, and the results are combined to increase efficiency. Although computer-intensive, this method has the potential to preserve confidentiality of microdata files. A drawback of the method is that it can lead to biased estimates of regression parameters when the subsample sizes are small (as in the case of stratified cluster sampling).

    In this paper, we propose using the estimating equation approach that combines the subsamples before estimation and thus leads to nearly unbiased estimates of regression parameters regardless of subsample sizes. This method is computationally less intensive than the original method. We apply the method to cluster-correlated data generated from a nested error linear regression model to illustrate its advantages. A real dataset from a Statistics Canada survey will also be analysed using the estimating equation method.

    Release date: 2004-09-13
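The inverse-sampling idea, in its original estimate-then-combine form, can be sketched for a linear model on cluster-correlated data. The data-generating model and the one-unit-per-cluster subsampling rule below are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cluster-correlated data: y = 2 + 3x + cluster effect + noise
n_clusters, per_cluster = 50, 4
cluster_effect = rng.normal(0, 1, n_clusters).repeat(per_cluster)
x = rng.normal(0, 1, n_clusters * per_cluster)
y = 2 + 3 * x + cluster_effect + rng.normal(0, 0.5, x.size)
cluster_id = np.arange(n_clusters).repeat(per_cluster)

def inverse_sampling_ols(x, y, cluster_id, n_subsamples=200):
    """Inverse sampling in the spirit of Hinkins et al.: repeatedly draw
    one unit per cluster (giving a subsample with simple random structure),
    fit ordinary least squares to each subsample, then average the
    estimates across subsamples."""
    betas = []
    for _ in range(n_subsamples):
        idx = np.array([rng.choice(np.where(cluster_id == c)[0])
                        for c in np.unique(cluster_id)])
        X = np.column_stack([np.ones(idx.size), x[idx]])
        betas.append(np.linalg.lstsq(X, y[idx], rcond=None)[0])
    return np.mean(betas, axis=0)

beta = inverse_sampling_ols(x, y, cluster_id)
```

The estimating-equation variant the paper proposes differs in the order of operations: it pools the subsample estimating equations before solving, rather than averaging per-subsample solutions, which is what removes the small-subsample bias.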

  • Articles and reports: 11-522-X20020016749
    Description:

    Survey sampling is a statistical domain that has been slow to take advantage of flexible regression methods. In this technical paper, two approaches are discussed that could be used to make these regression methods accessible: adapt the techniques to the complex survey design that has been used or sample the survey data so that the standard techniques are applicable.

In following the former route, we introduce techniques that account for the complex survey structure of the data for scatterplot smoothing and additive models. The use of penalized least squares in the sampling context is studied as a tool for the analysis of a general trend in a finite population. We focus on smooth regression with a normal error model. Ties in covariates abound in large-scale surveys, so scatterplot smoothers are in effect applied to means. The estimation of smooths (for example, smoothing splines) depends on the sampling design only via the sampling weights, meaning that standard software can be used for estimation. Inference for these curves is more challenging, as a result of correlations induced by the sampling design. We propose and illustrate tests that account for the sampling design. Illustrative examples are given using the Ontario Health Survey, including scatterplot smoothing, additive models and model diagnostics. In an attempt to resolve the problem by appropriate sampling of the survey data file, we discuss some of the hurdles that are faced when using this approach.

    Release date: 2004-09-13

  • Articles and reports: 12-001-X20040016996
    Description:

    This article studies the use of the sample distribution for the prediction of finite population totals under single-stage sampling. The proposed predictors employ the sample values of the target study variable, the sampling weights of the sample units and possibly known population values of auxiliary variables. The prediction problem is solved by estimating the expectation of the study values for units outside the sample as a function of the corresponding expectation under the sample distribution and the sampling weights. The prediction mean square error is estimated by a combination of an inverse sampling procedure and a re-sampling method. An interesting outcome of the present analysis is that several familiar estimators in common use are shown to be special cases of the proposed approach, thus providing them a new interpretation. The performance of the new and some old predictors in common use is evaluated and compared by a Monte Carlo simulation study using a real data set.

    Release date: 2004-07-14

  • Articles and reports: 12-001-X20040019186
    Description:

In this Issue is a column where the Editor briefly presents each paper of the current issue of Survey Methodology. It also sometimes contains information on structural or management changes in the journal.

    Release date: 2004-07-14

  • Articles and reports: 82-003-X20030036847
    Geography: Canada
    Description:

    This paper examines whether accepting proxy- instead of self-responses results in lower estimates of some health conditions. It analyses data from the National Population Health Survey and the Canadian Community Health Survey.

    Release date: 2004-05-18

  • Articles and reports: 12-001-X20030026784
    Description:

    Skinner and Elliot (2002) proposed a simple measure of disclosure risk for survey microdata and showed how to estimate this measure under sampling with equal probabilities. In this paper we show how their results on point estimation and variance estimation may be extended to handle unequal probability sampling. Our approach assumes a Poisson sampling design. Comments are made about the possible impact of departures from this assumption.

    Release date: 2004-01-27

