Results

All (18) (0 to 10 of 18 results)

  • Articles and reports: 12-001-X202300200012
    Description: In recent decades, many different uses of auxiliary information have enriched survey sampling theory and practice. Jean-Claude Deville contributed significantly to this progress. My comments trace some of the steps on the way to one important theory for the use of auxiliary information: Estimation by calibration.
    Release date: 2024-01-03

  • Articles and reports: 12-001-X201900200008
    Description:

    High nonresponse occurs in many sample surveys today, including important surveys carried out by government statistical agencies. Adaptive data collection can be advantageous under those conditions: Lower nonresponse bias in survey estimates can be gained, up to a point, by producing a well-balanced set of respondents. Auxiliary variables serve a twofold purpose: Used in the estimation phase, through calibrated adjustment weighting, they reduce, but do not entirely remove, the bias. In the preceding adaptive data collection phase, auxiliary variables also play a major role: They are instrumental in reducing the imbalance in the ultimate set of respondents. For such combined use of auxiliary variables, the deviation of the calibrated estimate from the unbiased estimate (under full response) is studied in the article. We show that this deviation is a sum of two components. The reducible component can be decreased through adaptive data collection, all the way to zero if perfectly balanced response is realized with respect to a chosen auxiliary vector. By contrast, the resisting component changes little or not at all with better balanced response; it represents a part of the deviation that adaptive design cannot remove. The relative size of the former component is an indicator of the potential payoff from an adaptive survey design.

    Release date: 2019-06-27

  • Articles and reports: 12-001-X201600214663
    Description:

    We present theoretical evidence that efforts during data collection to balance the survey response with respect to selected auxiliary variables will improve the chances for low nonresponse bias in the estimates that are ultimately produced by calibrated weighting. One of our results shows that the variance of the bias – measured here as the deviation of the calibration estimator from the (unrealized) full-sample unbiased estimator – decreases linearly as a function of the response imbalance that we assume measured and controlled continuously over the data collection period. An attractive prospect is thus a lower risk of bias if one can manage the data collection to get low imbalance. The theoretical results are validated in a simulation study with real data from an Estonian household survey.

    Release date: 2016-12-20
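
The balance idea running through the two abstracts above can be made concrete with a small sketch: one common way to quantify response imbalance is the distance between the design-weighted auxiliary means of the respondent set and of the full sample. This is an illustrative measure only; the exact imbalance statistic used in the papers may differ, and the function name and toy data below are ours.

```python
import numpy as np

def imbalance(x_sample, responded, d=None):
    """Distance between design-weighted auxiliary means of the
    respondent set and the full sample. Illustrative only; the
    papers' exact imbalance measure may differ.

    x_sample  : (n, p) auxiliary values for all sampled units
    responded : (n,) boolean response indicators
    d         : (n,) design weights (default: equal weights)
    """
    x = np.asarray(x_sample, dtype=float)
    r = np.asarray(responded, dtype=bool)
    d = np.ones(len(x)) if d is None else np.asarray(d, dtype=float)

    mean_sample = np.average(x, axis=0, weights=d)
    mean_resp = np.average(x[r], axis=0, weights=d[r])
    return float(np.linalg.norm(mean_resp - mean_sample))

# Toy data: response propensity depends on the first auxiliary
# variable, so the respondent set is skewed and imbalance > 0
rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 2))
resp = rng.random(1000) < 0.4 + 0.2 * (x[:, 0] > 0)
print(imbalance(x, resp))
```

Monitoring such a statistic continuously over the collection period is the kind of "measured and controlled" imbalance the abstract refers to.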

  • Articles and reports: 12-001-X201000211376
    Description:

    This article develops computational tools, called indicators, for judging the effectiveness of the auxiliary information used to control nonresponse bias in survey estimates, obtained in this article by calibration. This work is motivated by the survey environment in a number of countries, notably in northern Europe, where many potential auxiliary variables are derived from reliable administrative registers for households and individuals. Many auxiliary vectors can be composed. There is a need to compare these vectors to assess their potential for reducing bias. The indicators in this article are designed to meet that need. They are used in surveys at Statistics Sweden. General survey conditions are considered: There is probability sampling from the finite population, by an arbitrary sampling design; nonresponse occurs. The probability of inclusion in the sample is known for each population unit; the probability of response is unknown, causing bias. The study variable (the y-variable) is observed for the set of respondents only. No matter what auxiliary vector is used in a calibration estimator (or in any other estimation method), a residual bias will always remain. The choice of a "best possible" auxiliary vector is guided by the indicators proposed in the article. Their background and computational features are described in the early sections of the article. Their theoretical background is explained. The concluding sections are devoted to empirical studies. One of these illustrates the selection of auxiliary variables in a survey at Statistics Sweden. A second empirical illustration is a simulation with a constructed finite population; a number of potential auxiliary vectors are ranked in order of preference with the aid of the indicators.

    Release date: 2010-12-21

  • Articles and reports: 11-536-X200900110814
    Description:

    Calibration is the principal theme in many recent articles on estimation in survey sampling. Words such as "calibration approach" and "calibration estimators" are frequently used. As article authors like to point out, calibration provides a systematic way to incorporate auxiliary information in the procedure.

    Calibration has established itself as an important methodological instrument in large-scale production of statistics. Several national statistical agencies have developed software designed to compute weights, usually calibrated to auxiliary information available in administrative registers and other accurate sources.

    This paper presents a review of the calibration approach, with an emphasis on progress achieved in the past decade or so. The literature on calibration is growing rapidly; selected issues are discussed in this paper.

    The paper starts with a definition of the calibration approach. Its important features are reviewed. The calibration approach is contrasted with (generalized) regression estimation, which is an alternative but different way to take auxiliary information into account. The computational aspects of calibration are discussed, including methods for avoiding extreme weights. In the early sections of the paper, simple applications of calibration are examined: The estimation of a population total in direct, single-phase sampling. Generalizations to more complex parameters and sampling designs are then considered. A common feature of more complex designs (sampling in two or more phases or stages) is that the available auxiliary information may consist of several components or layers. The uses of calibration in such cases of composite information are reviewed. In later sections of the paper, some examples are given to illustrate how the results of the calibration thinking may contrast with answers given by earlier established approaches. Finally, applications of calibration in the presence of nonsampling error are discussed, in particular methods for nonresponse bias adjustment.

    Release date: 2009-08-11
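
The calibration approach reviewed above has a simple special case that is easy to sketch: linear (chi-square distance) calibration, where each design weight d_k is adjusted to w_k = d_k(1 + x_k'λ) with λ chosen so the weighted auxiliary totals reproduce known population totals. A textbook sketch, not any agency's production code; the function name and toy values below are ours.

```python
import numpy as np

def linear_calibration(d, X, totals):
    """Linear (chi-square distance) calibration weights:
    w_k = d_k * (1 + x_k' lam), with lam chosen so that
    sum_k w_k * x_k equals the known population totals.
    Illustrative sketch only."""
    d = np.asarray(d, dtype=float)
    X = np.asarray(X, dtype=float)
    t = np.asarray(totals, dtype=float)
    M = X.T @ (d[:, None] * X)            # sum_k d_k x_k x_k'
    lam = np.linalg.solve(M, t - X.T @ d)
    return d * (1.0 + X @ lam)

# Toy check: the calibrated weights reproduce the totals exactly
d = np.full(5, 2.0)
X = np.column_stack([np.ones(5), np.arange(5.0)])
totals = np.array([12.0, 25.0])
w = linear_calibration(d, X, totals)
print(X.T @ w)  # matches totals: [12., 25.]
```

With this distance function, the calibration estimator of a y-total is simply the w-weighted sample sum of y, which coincides with the GREG estimator the review contrasts it with; other distance functions (e.g., raking) give different weight adjustments.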

  • Articles and reports: 12-001-X200900110880
    Description:

    This paper provides a framework for estimation by calibration in two-phase sampling designs. This work grew out of the continuing development of generalized estimation software at Statistics Canada. An important objective in this development is to provide a wide range of options for effective use of auxiliary information in different sampling designs. This objective is reflected in the general methodology for two-phase designs presented in this paper.

    We consider the traditional two-phase sampling design. A phase-one sample is drawn from the finite population and then a phase-two sample is drawn as a subsample of the first. The study variable, whose unknown population total is to be estimated, is observed only for the units in the phase-two sample. Arbitrary sampling designs are allowed in each phase of sampling. Different types of auxiliary information are identified for the computation of the calibration weights at each phase. The auxiliary variables and the study variables can be continuous or categorical.

    The paper contributes to four important areas in the general context of calibration for two-phase designs: (1) Three broad types of auxiliary information for two-phase designs are identified and used in the estimation. The information is incorporated into the weights in two steps: a phase-one calibration and a phase-two calibration. We discuss the composition of the appropriate auxiliary vectors for each step, and use a linearization method to arrive at the residuals that determine the asymptotic variance of the calibration estimator. (2) We examine the effect of alternative choices of starting weights for the calibration. The two "natural" choices for the starting weights generally produce slightly different estimators. However, under certain conditions, these two estimators have the same asymptotic variance. (3) We re-examine variance estimation for the two-phase calibration estimator. A new procedure is proposed that can improve significantly on the usual technique of conditioning on the phase-one sample. A simulation in section 10 serves to validate the advantage of this new method. (4) We compare the calibration approach with the traditional model-assisted regression technique, which uses a linear regression fit at two levels. We show that the model-assisted estimator has properties similar to a two-phase calibration estimator.

    Release date: 2009-06-22
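
The two-step weighting idea in the abstract above can be sketched on toy data: calibrate the phase-one weights to known population totals of one auxiliary vector, then calibrate the phase-two starting weights to phase-one estimates of the totals of a richer vector observed only on the phase-one sample. This is a simplified illustration under simple random sampling at both phases; the paper's framework is far more general, and every name and number below is ours.

```python
import numpy as np

def calibrate(d, X, totals):
    # Linear calibration: w = d * (1 + X @ lam) so that X' w = totals
    M = X.T @ (d[:, None] * X)
    lam = np.linalg.solve(M, totals - X.T @ d)
    return d * (1.0 + X @ lam)

rng = np.random.default_rng(1)
N, n1, n2 = 10_000, 500, 100

# Phase one: SRS of n1 units; x1 totals known from a register
x1_pop = np.column_stack([np.ones(N), rng.gamma(2.0, 3.0, N)])
t_x1 = x1_pop.sum(axis=0)
s1 = rng.choice(N, n1, replace=False)
d1 = np.full(n1, N / n1)                 # phase-one design weights
w1 = calibrate(d1, x1_pop[s1], t_x1)     # phase-one calibration

# Phase two: SRS subsample of s1; calibrate to phase-one
# estimates of the totals of a richer vector x2 observed on s1
x2_s1 = np.column_stack([x1_pop[s1], x1_pop[s1, 1] ** 2])
t_x2_hat = x2_s1.T @ w1                  # estimated from phase one
s2 = rng.choice(n1, n2, replace=False)
d2 = w1[s2] * (n1 / n2)                  # one "natural" starting weight
w2 = calibrate(d2, x2_s1[s2], t_x2_hat)  # phase-two calibration

print(np.allclose(x2_s1[s2].T @ w2, t_x2_hat))  # constraints satisfied
```

The choice of d2 above (calibrated phase-one weight times the inverse phase-two sampling fraction) is one of the two "natural" starting weights the abstract mentions; the other would start from the raw design weights d1.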

  • Articles and reports: 12-001-X200700210488
    Description:

    Calibration is the principal theme in many recent articles on estimation in survey sampling. Words such as "calibration approach" and "calibration estimators" are frequently used. As article authors like to point out, calibration provides a systematic way to incorporate auxiliary information in the procedure.

    Calibration has established itself as an important methodological instrument in large-scale production of statistics. Several national statistical agencies have developed software designed to compute weights, usually calibrated to auxiliary information available in administrative registers and other accurate sources.

    This paper presents a review of the calibration approach, with an emphasis on progress achieved in the past decade or so. The literature on calibration is growing rapidly; selected issues are discussed in this paper. The paper starts with a definition of the calibration approach. Its important features are reviewed. The calibration approach is contrasted with (generalized) regression estimation, which is an alternative but conceptually different way to take auxiliary information into account. The computational aspects of calibration are discussed, including methods for avoiding extreme weights. In the early sections of the paper, simple applications of calibration are examined: The estimation of a population total in direct, single-phase sampling. Generalizations to more complex parameters and sampling designs are then considered. A common feature of more complex designs (sampling in two or more phases or stages) is that the available auxiliary information may consist of several components or layers. The uses of calibration in such cases of composite information are reviewed. Later in the paper, examples are given to illustrate how the results of the calibration thinking may contrast with answers given by earlier established approaches. Finally, applications of calibration in the presence of nonsampling error are discussed, in particular methods for nonresponse bias adjustment.

    Release date: 2008-01-03

  • Articles and reports: 12-001-X20030016605
    Description:

    In this paper, we examine the effects of model choice on different types of estimators for totals of domains (including small domains or small areas) for a sampled finite population. The paper asks how different estimator types compare for a common underlying model statement. We argue that estimator type - synthetic, generalized regression (GREG), composite, empirical best linear unbiased prediction (EBLUP), hierarchical Bayes, and so on - is one important aspect of domain estimation, and that the choice of the model, including its parameters and effects, is a second aspect, conceptually different from the first. Earlier work has not always made this distinction clear. For a given estimator type, one can derive different estimators, depending on the choice of model. In recent literature, a number of estimator types have been proposed, but relatively few impartial comparisons have been made among them. In this paper, we discuss three types: synthetic, GREG, and, to a limited extent, composite. We show that model improvement - the transition from a weaker to a stronger model - has very different effects on the different estimator types. We also show that the difference in accuracy between the different estimator types depends on the choice of model. For a well-specified model, the difference in accuracy between synthetic and GREG is negligible, but it can be substantial if the model is mis-specified. The synthetic type then tends to be highly inaccurate. We rely partly on theoretical results (for simple random sampling only) and partly on empirical results. The empirical results are based on simulations with repeated samples drawn from two finite populations, one artificially constructed, the other constructed from the real data of the Finnish Labour Force Survey.

    Release date: 2003-07-31
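
The contrast between the synthetic and GREG domain estimators discussed above can be sketched for a linear working model: the synthetic estimator uses model predictions alone, while GREG adds the design-weighted sample residuals from the domain, which is what protects it when the model is mis-specified. An illustrative sketch in our own notation, not the paper's.

```python
import numpy as np

def domain_estimates(y_s, X_s, d_s, dom_s, X_dom_pop):
    """Synthetic vs. GREG estimators of a domain total under a
    linear working model (illustrative sketch; notation is ours).

    y_s, X_s, d_s : study values, auxiliaries, design weights (sample)
    dom_s         : boolean, sample units belonging to the domain
    X_dom_pop     : auxiliaries for ALL population units in the domain
    """
    # Survey-weighted least-squares fit on the full sample
    W = d_s[:, None] * X_s
    beta = np.linalg.solve(X_s.T @ W, W.T @ y_s)

    synthetic = X_dom_pop.sum(axis=0) @ beta       # predictions only
    resid = y_s - X_s @ beta
    greg = synthetic + d_s[dom_s] @ resid[dom_s]   # + weighted residuals
    return synthetic, greg

# Toy use: a domain covering roughly 40% of a small population
rng = np.random.default_rng(2)
Xp = np.column_stack([np.ones(200), rng.normal(2.0, 1.0, 200)])
yp = Xp @ np.array([1.0, 3.0]) + rng.normal(0.0, 0.5, 200)
in_dom = rng.random(200) < 0.4
s = rng.choice(200, 50, replace=False)
syn, greg = domain_estimates(yp[s], Xp[s], np.full(50, 4.0),
                             in_dom[s], Xp[in_dom])
print(syn, greg)
```

When the model fits perfectly, the residuals vanish and the two estimators coincide, matching the abstract's finding that the difference in accuracy is negligible for a well-specified model.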

  • Articles and reports: 12-001-X20010015858
    Description:

    The objective of this paper is to study and measure the change (from the initial to the final weight) which results from the procedure used to modify weights. A breakdown of the final weights is proposed in order to evaluate the relative impact of the nonresponse adjustment, the correction for poststratification and the interaction between these two adjustments. This measure of change is used as a tool for comparing the effectiveness of the various methods for adjusting for nonresponse, in particular the methods relying on the formation of Response Homogeneity Groups. The measure of change is examined through a simulation study, which uses data from a Statistics Canada longitudinal survey, the Survey of Labour and Income Dynamics. The measure of change is also applied to data obtained from a second longitudinal survey, the National Longitudinal Survey of Children and Youth.

    Release date: 2001-08-22

  • Articles and reports: 12-001-X19990024884
    Description:

    In the final paper of this special issue, Estevao and Särndal consider two types of design-based estimators used for domain estimation. The first, a linear prediction estimator, is built on the principle of model fitting, requires known auxiliary information at the domain level, and results in weights that depend on the domain to be estimated. The second, a uni-weight estimator, has weights which are independent of the domain being estimated and has the clear advantage that it does not require the calculation of different weight systems for each different domain of interest. These estimators are compared and situations under which one is preferred over the other are identified.

    Release date: 2000-03-01