Editing and imputation
All (93 results, showing 30 to 40)
- Articles and reports: 12-001-X200700210493
Description:
In this paper, we study the problem of variance estimation for a ratio of two totals when marginal random hot deck imputation has been used to fill in missing data. We consider two approaches to inference. In the first approach, the validity of an imputation model is required. In the second approach, the validity of an imputation model is not required but response probabilities need to be estimated, in which case the validity of a nonresponse model is required. We derive variance estimators under two distinct frameworks: the customary two-phase framework and the reverse framework.
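For orientation, the parameter of interest is the ratio of two totals estimated from the completed (imputed) data; a minimal sketch with assumed notation, not taken from the paper:

```latex
% Ratio of two totals estimated from imputed data. Notation is
% assumed for illustration: w_k are survey weights, and y_k^* and
% x_k^* denote values after hot deck imputation of missing items.
\hat{R} = \frac{\hat{Y}}{\hat{X}}
        = \frac{\sum_{k \in s} w_k \, y_k^{*}}{\sum_{k \in s} w_k \, x_k^{*}}
```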
Release date: 2008-01-03

- 32. Imputing distributions in administrative tax data (Archived)
Articles and reports: 11-522-X20050019458
Description:
This paper presents an alternative methodology that lets the data define homogeneous groups, determined by a bottom-up classification of the values of observed details. The problem is then to assign a non-respondent business to one of these groups. Several assignment procedures, based on explanatory variables available in the tax returns, are compared using gross or distributed data: parametric and non-parametric classification analyses, log-linear models, etc.
Release date: 2007-03-02

- Articles and reports: 11-522-X20050019474
Description:
Missingness is a common feature of longitudinal studies. In recent years there has been considerable research devoted to the development of methods for the analysis of incomplete longitudinal data. One common practice is imputation by the "last observation carried forward" (LOCF) approach, in which values for missing responses are imputed using observations from the most recently completed assessment. In this talk I will first examine the performance of the LOCF approach when generalized estimating equations (GEE) are employed as the inferential procedure.
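As an illustration of the LOCF rule itself (not of the GEE analysis examined in the talk), a minimal sketch in Python; the data and column names are hypothetical:

```python
import pandas as pd

# Hypothetical longitudinal data: one row per subject per visit,
# with missing responses recorded as NaN.
df = pd.DataFrame({
    "subject": [1, 1, 1, 2, 2, 2],
    "visit":   [1, 2, 3, 1, 2, 3],
    "score":   [5.0, None, None, 7.0, 6.0, None],
})

# LOCF: within each subject, carry the most recently observed
# value forward into later missing assessments.
df = df.sort_values(["subject", "visit"])
df["score_locf"] = df.groupby("subject")["score"].ffill()
print(df)
```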
Release date: 2007-03-02

- Articles and reports: 11-522-X20050019494
Description:
Traditionally, the data quality indicators reported by surveys have been the sampling variance, coverage error, non-response rate and imputation rate. When survey data and administrative data are combined, one problem is how to compute the imputation rate itself. The presentation will discuss how to solve this problem. First, we will discuss the properties desired when developing a rate in a general context. Second, we will develop concepts and definitions that help in constructing combined rates. Third, we will propose combined rates for the case of imputation. We will then present three such rates and discuss the properties of each. We will end with some illustrative examples.
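One form such a combined rate can take (a sketch with assumed notation, not the authors' definition) is the weighted share of an estimate that comes from imputed values:

```latex
% Sketch of a combined imputation rate with assumed notation:
% w_k are weights, y_k the reported or imputed value of unit k,
% and I_k = 1 when the value of unit k was imputed, 0 otherwise.
\text{rate} = \frac{\sum_{k \in s} w_k \, I_k \, y_k}{\sum_{k \in s} w_k \, y_k}
```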
Release date: 2007-03-02

- Articles and reports: 12-001-X20060029548
Description:
The theory of multiple imputation for missing data requires that imputations be made conditional on the sampling design. However, most standard software packages for performing model-based multiple imputation assume simple random samples, leading many practitioners not to account for complex sample design features, such as stratification and clustering, in their imputations. Theory predicts that analyses of such multiply-imputed data sets can yield biased estimates from the design-based perspective. In this article, we illustrate through simulation that (i) the bias can be severe when the design features are related to the survey variables of interest, and (ii) the bias can be reduced by controlling for the design features in the imputation models. The simulations also illustrate that conditioning on irrelevant design features in the imputation models can yield conservative inferences, provided that the models include other relevant predictors. These results suggest a prescription for imputers: the safest course of action is to include design variables in the specification of imputation models. Using real data, we demonstrate a simple approach for incorporating complex design features that can be used with some of the standard software packages for creating multiple imputations.
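A minimal sketch of that prescription, with hypothetical variable names and a simple stochastic regression imputation model assumed purely for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical survey file: y has missing values; stratum is a
# design feature and x a substantive predictor.
df = pd.DataFrame({
    "y":       [2.1, np.nan, 3.5, np.nan, 1.8, 2.9],
    "x":       [1.0, 1.5, 2.0, 0.5, 1.2, 1.8],
    "stratum": [1, 1, 2, 2, 1, 2],
})

# Design matrix with intercept, x, and a stratum indicator, so the
# imputation model conditions on the design feature.
X = np.column_stack([
    np.ones(len(df)),
    df["x"],
    (df["stratum"] == 2).astype(float),
])
obs = df["y"].notna().to_numpy()

beta, *_ = np.linalg.lstsq(X[obs], df.loc[obs, "y"], rcond=None)
resid_sd = np.std(df.loc[obs, "y"] - X[obs] @ beta, ddof=X.shape[1])

# Stochastic regression imputation: predicted mean plus noise.
df.loc[~obs, "y"] = X[~obs] @ beta + rng.normal(0, resid_sd, (~obs).sum())
print(df)
```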
Release date: 2006-12-21

- 36. An evaluation of matrix sampling methods using data from the National Health and Nutrition Examination Survey (Archived)
Articles and reports: 12-001-X20060029555
Description:
Researchers and policy makers often use data from nationally representative probability sample surveys. The number of topics covered by such surveys, and hence the amount of interviewing time involved, have typically increased over the years, resulting in increased costs and respondent burden. A potential solution to this problem is to carefully form subsets of the items in a survey and administer one such subset to each respondent. Designs of this type are called "split-questionnaire" designs or "matrix sampling" designs. The administration of only a subset of the survey items to each respondent in a matrix sampling design creates what can be considered missing data. Multiple imputation (Rubin 1987), a general-purpose approach developed for handling data with missing values, is appealing for the analysis of data from a matrix sample, because once the multiple imputations are created, data analysts can apply standard methods for analyzing complete data from a sample survey. This paper develops and evaluates a method for creating matrix sampling forms, each form containing a subset of items to be administered to randomly selected respondents. The method can be applied in complex settings, including situations in which skip patterns are present. Forms are created in such a way that each form includes items that are predictive of the excluded items, so that subsequent analyses based on multiple imputation can recover some of the information about the excluded items that would have been collected had there been no matrix sampling. The matrix sampling and multiple-imputation methods are evaluated using data from the National Health and Nutrition Examination Survey, one of many nationally representative probability sample surveys conducted by the National Center for Health Statistics, Centers for Disease Control and Prevention. The study demonstrates the feasibility of the approach applied to a major national health survey with complex structure, and it provides practical advice about appropriate items to include in matrix sampling designs in future surveys.
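A minimal sketch of the form-assignment step alone; the forms and item names below are hypothetical, and the paper's actual method for constructing forms whose items predict the excluded ones is more involved:

```python
import random

# Hypothetical matrix-sampling forms: each form is a subset of the
# full item list, with common core items (age, income) on every form.
FORMS = {
    "A": ["age", "income", "diet", "blood_pressure"],
    "B": ["age", "income", "exercise", "cholesterol"],
    "C": ["age", "income", "smoking", "blood_pressure"],
}

def assign_form(respondent_id: int) -> str:
    """Randomly assign one form per respondent, reproducibly."""
    return random.Random(respondent_id).choice(sorted(FORMS))

for rid in range(5):
    form = assign_form(rid)
    print(rid, form, FORMS[form])
```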
Release date: 2006-12-21

- Articles and reports: 75F0002M2006007
Description:
This paper summarizes the data available from SLID on housing characteristics and shelter costs, with a special focus on the imputation methods used for this data. From 1994 to 2001, the survey covered only a few housing characteristics, primarily ownership status and dwelling type. In 2002, with the start of sponsorship from Canada Mortgage and Housing Corporation (CMHC), several other characteristics and detailed shelter costs were added to the survey. Several imputation methods were also introduced at that time, in order to replace missing values due to survey non-response and to provide utility costs, which contribute to total shelter costs. These methods take advantage of SLID's longitudinal design and also use data from other sources such as the Labour Force Survey and the Census. In June 2006, further improvements in the imputation methods were introduced for 2004 and applied to past years in a historical revision. This report also documents that revision.
Release date: 2006-07-26

- Articles and reports: 12-001-X20060019260
Description:
This paper considers the use of imputation and weighting to correct for measurement error in the estimation of a distribution function. The paper is motivated by the problem of estimating the distribution of hourly pay in the United Kingdom, using data from the Labour Force Survey. Errors in measurement lead to bias and the aim is to use auxiliary data, measured accurately for a subsample, to correct for this bias. Alternative point estimators are considered, based upon a variety of imputation and weighting approaches, including fractional imputation, nearest neighbour imputation, predictive mean matching and propensity score weighting. Properties of these point estimators are then compared both theoretically and by simulation. A fractional predictive mean matching imputation approach is advocated. It performs similarly to propensity score weighting, but displays slight advantages of robustness and efficiency.
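To make the advocated estimator concrete, a rough sketch of fractional predictive mean matching under assumed variable names: regress the accurate value on the error-prone measure within the validation subsample, then impute each remaining case with its k nearest donors on the predicted mean, each carrying weight 1/k:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: z is an error-prone measure known for all n
# cases; the accurate value y is observed only on a subsample.
n = 200
z = rng.normal(10, 2, n)
y = z + rng.normal(0, 1, n)          # simulated true values
validated = np.zeros(n, dtype=bool)
validated[:50] = True                # accurate y known for 50 cases

# Step 1: predicted means from a regression fit on the subsample.
A = np.column_stack([np.ones(n), z])
beta, *_ = np.linalg.lstsq(A[validated], y[validated], rcond=None)
pred = A @ beta

# Step 2: fractional PMM -- for each non-validated case, take the
# k donors whose predicted means are closest, each weighted 1/k.
k = 5
donor_pred, donor_y = pred[validated], y[validated]
frac = np.empty((n - validated.sum(), k))
for i, p in enumerate(pred[~validated]):
    frac[i] = donor_y[np.argsort(np.abs(donor_pred - p))[:k]]

# Estimated mean: observed values plus fractionally imputed cases.
print(np.concatenate([y[validated], frac.mean(axis=1)]).mean())
```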
Release date: 2006-07-20

- 39. Hot deck imputation for the response model (Archived)
Articles and reports: 12-001-X20050029041
Description:
Hot deck imputation is a procedure in which missing items are replaced with values from respondents. One model supporting such procedures assumes that response probabilities are equal within imputation cells. An efficient version of hot deck imputation is described for the cell response model, and a computationally efficient variance estimator is given. An approximation to the fully efficient procedure, in which a small number of values are imputed for each nonrespondent, is described. Variance estimation procedures are illustrated in a Monte Carlo study.
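A minimal sketch of the cell version of hot deck imputation (variable names assumed; the paper's efficient variant and its variance estimator are not reproduced here):

```python
import pandas as pd

# Hypothetical data: respondents and nonrespondents grouped into
# imputation cells, within which response is assumed to be random.
df = pd.DataFrame({
    "cell":  ["a", "a", "a", "b", "b", "b"],
    "value": [10.0, 12.0, None, 20.0, None, 24.0],
})

# Within each cell, fill each missing value with a donor value
# drawn at random (with replacement) from the cell's respondents.
for cell, group in df.groupby("cell"):
    donors = group["value"].dropna()
    missing = group.index[group["value"].isna()]
    draws = donors.sample(n=len(missing), replace=True, random_state=0)
    df.loc[missing, "value"] = draws.to_numpy()

print(df)
```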
Release date: 2006-02-17

- Articles and reports: 12-001-X20050029044
Description:
Complete data methods for estimating the variances of survey estimates are biased when some data are imputed. This paper uses simulation to compare the performance of the model-assisted, the adjusted jackknife, and the multiple imputation methods for estimating the variance of a total when missing items have been imputed using hot deck imputation. The simulation studies the properties of the variance estimates for imputed estimates of totals for the full population and for domains from a single-stage disproportionate stratified sample design when underlying assumptions, such as unbiasedness of the point estimate and item responses being randomly missing within hot deck cells, do not hold. The variance estimators for full population estimates produce confidence intervals with coverage rates near the nominal level even under modest departures from the assumptions, but this finding does not apply for the domain estimates. Coverage is most sensitive to bias in the point estimates. As the simulation demonstrates, even if an imputation method gives almost unbiased estimates for the full population, estimates for domains may be very biased.
Release date: 2006-02-17
Data (0 results)
No content available at this time.
Analysis (85 results, showing 70 to 80)
- Articles and reports: 12-001-X198600214450
Description:
From an annual sample of U.S. corporate tax returns, the U.S. Internal Revenue Service provides estimates of population and subpopulation totals for several hundred financial items. The basic sample design is highly stratified and fairly complex. Starting with the 1981 and 1982 samples, the design was altered to include a double sampling procedure. This was motivated by the need for better allocation of resources, in an environment of shrinking budgets. Items not observed in the subsample are predicted, using a modified hot deck imputation procedure. The present paper describes the design, estimation, and evaluation of the effects of the new procedure.
Release date: 1986-12-15

- Articles and reports: 12-001-X198600214451
Description:
The Canadian Census of Construction (COC) uses a complex plan for sampling small businesses (those having a gross income of less than $750,000). Stratified samples are drawn from overlapping frames. Two subsamples are selected independently from one of the samples, and more detailed information is collected on the businesses in the subsamples. There are two possible methods of estimating totals for the variables collected in the subsamples. The first approach is to determine weights based on sampling rates. A number of different weights must be used. The second approach is to impute values to the businesses included in the sample but not in the subsamples. This approach creates a complete “rectangular” sample file, and a single weight may then be used to produce estimates for the population. This “large-scale imputation” technique is presently applied for the Census of Construction. The purpose of the study is to compare the figures obtained using various estimation techniques with the estimates produced by means of large-scale imputation.
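The two approaches can be contrasted schematically (a sketch with assumed notation, not the study's formulas):

```latex
% Approach 1: weights reflecting the subsampling rates.
\hat{Y}_{\mathrm{wt}} = \sum_{k \in s_{\mathrm{sub}}} w_k^{\mathrm{sub}} \, y_k
% Approach 2: large-scale imputation -- impute y_k^* for sampled
% businesses outside the subsample, producing a rectangular file
% that needs only the single full-sample weight w_k.
\hat{Y}_{\mathrm{imp}} = \sum_{k \in s} w_k \, \tilde{y}_k,
\qquad
\tilde{y}_k =
\begin{cases}
y_k     & k \in s_{\mathrm{sub}} \\
y_k^{*} & k \in s \setminus s_{\mathrm{sub}}
\end{cases}
```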
Release date: 1986-12-15

- 73. The treatment of missing survey data (Archived)
Articles and reports: 12-001-X198600114404
Description:
Missing survey data occur because of total nonresponse and item nonresponse. The standard way to attempt to compensate for total nonresponse is by some form of weighting adjustment, whereas item nonresponses are handled by some form of imputation. This paper reviews methods of weighting adjustment and imputation and discusses their properties.
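For concreteness, the standard weighting-class adjustment for total nonresponse takes a form like the following (notation assumed for illustration):

```latex
% Within weighting class c, respondents' design weights w_k are
% inflated by the inverse of the weighted response rate, so that
% respondents represent the whole class (s_c = sampled units in c,
% r = responding units).
w_k^{\mathrm{adj}} = w_k \cdot
  \frac{\sum_{j \in s_c} w_j}{\sum_{j \in s_c \cap r} w_j},
\qquad k \in s_c \cap r
```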
Release date: 1986-06-16

- 74. Basic ideas of multiple imputation for nonresponse (Archived)
Articles and reports: 12-001-X198600114439
Description:
Multiple imputation is a technique for handling survey nonresponse that replaces each missing value created by nonresponse by a vector of possible values that reflect uncertainty about which values to impute. A simple example and brief overview of the underlying theory are used to introduce the general procedure.
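The combining rules at the heart of the procedure are short enough to state; these are Rubin's standard formulas, included here for orientation rather than drawn from the abstract:

```latex
% For m completed data sets, let \hat{Q}_l and U_l be the estimate
% and its variance computed from data set l.
\bar{Q} = \frac{1}{m} \sum_{l=1}^{m} \hat{Q}_l, \qquad
\bar{U} = \frac{1}{m} \sum_{l=1}^{m} U_l, \qquad
B = \frac{1}{m-1} \sum_{l=1}^{m} \bigl(\hat{Q}_l - \bar{Q}\bigr)^2
% Total variance combines within- and between-imputation parts:
T = \bar{U} + \Bigl(1 + \frac{1}{m}\Bigr) B
```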
Release date: 1986-06-16

- Articles and reports: 12-001-X198600114440
Description:
Statistics Canada has undertaken a project to develop a generalized edit and imputation system intended to meet the processing requirements of most of its surveys. The various approaches that have been proposed for imputing item non-response will be discussed. Important issues related to implementing these proposals in a generalized setting will also be addressed.
Release date: 1986-06-16

- Articles and reports: 12-001-X198600114441
Description:
The analysis of survey data becomes difficult in the presence of incomplete responses. Using the maximum likelihood method, estimators of the parameters of interest and test statistics can be derived. In this paper, maximum likelihood estimators are given for the case where the data are considered missing at random. A method for imputing the missing values is considered, along with the problem of estimating change points in the mean. Possible extensions of the results to structured covariances and to non-randomly incomplete data are also proposed.
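The missing-at-random condition invoked here can be stated compactly (standard notation, not taken from the paper):

```latex
% Missing at random: given the observed data, the response
% indicator R does not depend on the unobserved values.
P(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}) = P(R \mid Y_{\mathrm{obs}})
```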
Release date: 1986-06-16

- Articles and reports: 12-001-X198600114442
Description:
For periodic business surveys conducted on a monthly, quarterly or annual basis, the data for responding units must be edited and the data for non-responding units must be imputed. This paper reports on methods that can be used for editing and imputing data. The editing comprises consistency and statistical edits. Imputation is performed for both total non-response and partial non-response.
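As a small illustration of the consistency-edit idea, a sketch in Python; the rule and field names are hypothetical, not the paper's:

```python
# Hypothetical consistency edit for a business survey record:
# component revenues should sum to total revenue within a small
# relative tolerance, and no component may be negative.
def consistency_edit(record: dict, tol: float = 0.01) -> list[str]:
    failures = []
    components = ["sales", "services", "other_revenue"]
    if any(record[c] < 0 for c in components):
        failures.append("negative component")
    if abs(sum(record[c] for c in components) - record["total"]) > tol * record["total"]:
        failures.append("components do not sum to total")
    return failures

rec = {"sales": 820.0, "services": 150.0, "other_revenue": 60.0, "total": 1000.0}
print(consistency_edit(rec))   # -> ['components do not sum to total']
```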
Release date: 1986-06-16

- 78. A study of the effects of imputation groups in the nearest neighbour imputation method for the National Farm Survey (Archived)
Articles and reports: 12-001-X198600114444
Description:
A new processing system using the nearest neighbour (N-N) imputation method is being implemented for the National Farm Survey (NFS). An empirical study was conducted to determine if the NFS estimates would be affected by using imputation groups based on type of farm. For the specific imputation rule examined, the study showed evidence that the effect might be small.
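A minimal sketch of nearest-neighbour imputation within imputation groups; the farm types and matching variable are assumed for illustration:

```python
import pandas as pd

# Hypothetical records: impute missing expenses from the nearest
# neighbour on acreage within the same farm-type imputation group.
df = pd.DataFrame({
    "farm_type": ["dairy", "dairy", "dairy", "grain", "grain"],
    "acreage":   [120.0, 150.0, 140.0, 600.0, 580.0],
    "expenses":  [50.0, 65.0, None, 210.0, None],
})

for idx in df.index[df["expenses"].isna()]:
    group = df[(df["farm_type"] == df.at[idx, "farm_type"])
               & df["expenses"].notna()]
    # Donor: the group record closest in acreage to the recipient.
    donor = (group["acreage"] - df.at[idx, "acreage"]).abs().idxmin()
    df.at[idx, "expenses"] = df.at[donor, "expenses"]

print(df)
```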
Release date: 1986-06-16

- Articles and reports: 12-001-X198500114371
Description:
This paper presents an overview of the methodology used in the processing of the 1981 Census of Agriculture data. The edit and imputation techniques are stressed, with emphasis on the multivariate search algorithm. A brief evaluation of the system’s performance is given.
Release date: 1985-06-14

- 80. Imputation in surveys: Coping with reality (Archived)
Articles and reports: 12-001-X198100154934
Description:
In surveys a response may be incomplete or some items may be inconsistent or, as in the case of two-phase sampling, items may be unavailable. In these cases it may be expedient to impute values for the missing items. While imputation is not a particularly good solution to any specific estimation problem, it does permit the production of arbitrary estimates in a consistent way.
The survey statistician may have to cope with a mixture of numerical and categorical items, subject to a variety of constraints. He should evaluate his technique, especially with respect to bias. He should make sure that imputed items are clearly identified and summary reports produced.
A variety of imputation techniques in current use is described and discussed, with particular reference to the practical problems involved.
Release date: 1981-06-15
Reference (7 results)
- Surveys and statistical programs – Documentation: 71F0031X2005002
Description:
This paper introduces and explains modifications made to the Labour Force Survey estimates in January 2005. Some of these modifications include the adjustment of all LFS estimates to reflect population counts based on the 2001 Census, updates to industry and occupation classification systems and sample redesign changes.
Release date: 2005-01-26

- Surveys and statistical programs – Documentation: 92-397-X
Description:
This report covers concepts and definitions, the imputation method and data quality for this variable. The 2001 Census collected information on three types of unpaid work performed during the week preceding the Census: looking after children, housework and caring for seniors. The 2001 data on unpaid work are compared with the 1996 Census data and with the data from the General Social Survey (use of time in 1998). The report also includes historical tables.
Release date: 2005-01-11

- Surveys and statistical programs – Documentation: 92-388-X
Description:
This report contains basic conceptual and data quality information to help users interpret and make use of census occupation data. It gives an overview of the collection, coding (to the 2001 National Occupational Classification), edit and imputation of the occupation data from the 2001 Census. The report describes procedural changes between the 2001 and earlier censuses, and provides an analysis of the quality level of the 2001 Census occupation data. Finally, it details the revision of the 1991 Standard Occupational Classification used in the 1991 and 1996 Censuses to the 2001 National Occupational Classification for Statistics used in 2001. The historical comparability of data coded to the two classifications is discussed. Appendices to the report include a table showing historical data for the 1991, 1996 and 2001 Censuses.
Release date: 2004-07-15

- Surveys and statistical programs – Documentation: 92-398-X
Description:
This report contains basic conceptual and data quality information intended to facilitate the use and interpretation of census class of worker data. It provides an overview of the class of worker processing cycle including elements such as regional office processing, and edit and imputation. The report concludes with summary tables that indicate the level of data quality in the 2001 Census class of worker data.
Release date: 2004-04-22

- Surveys and statistical programs – Documentation: 85-602-X
Description:
The purpose of this report is to provide an overview of existing methods and techniques that make use of personal identifiers to support record linkage. Record linkage can be loosely defined as a methodology for manipulating and/or transforming personal identifiers from individual data records, drawn from one or more operational databases, and subsequently attempting to match these personal identifiers to create a composite record about an individual. Record linkage is not intended to uniquely identify individuals for operational purposes; however, it does provide probabilistic matches of varying degrees of reliability for use in statistical reporting. Techniques employed in record linkage may also be of use for investigative purposes, to help narrow the field of search against existing databases when some form of personal identification information exists.
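As a rough sketch of the probabilistic-matching idea described here, agreement weights in the Fellegi-Sunter spirit; the fields and the m- and u-probabilities are made-up illustrative values:

```python
import math

# Illustrative m/u probabilities per identifier field:
# m = P(field agrees | records are a true match),
# u = P(field agrees | records are not a match).
FIELDS = {
    "surname":     (0.95, 0.05),
    "birth_year":  (0.90, 0.10),
    "postal_code": (0.85, 0.02),
}

def match_weight(rec_a: dict, rec_b: dict) -> float:
    """Sum log-likelihood-ratio weights over identifier fields."""
    total = 0.0
    for field, (m, u) in FIELDS.items():
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)              # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight
    return total

a = {"surname": "tremblay", "birth_year": 1962, "postal_code": "H2X"}
b = {"surname": "tremblay", "birth_year": 1962, "postal_code": "H3A"}
print(match_weight(a, b))   # higher score -> stronger candidate match
```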
Release date: 2000-12-05

- Surveys and statistical programs – Documentation: 75F0002M1998012
Description:
This paper looks at the work of the task force responsible for reviewing Statistics Canada's household and family income statistics programs, and at one of the associated program changes, namely the integration of two major sources of annual income data in Canada: the Survey of Consumer Finances (SCF) and the Survey of Labour and Income Dynamics (SLID).
Release date: 1998-12-30

- Surveys and statistical programs – Documentation: 75F0002M1997006
Description:
This report documents the edit and imputation approach taken in processing Wave 1 income data from the Survey of Labour and Income Dynamics (SLID).
Release date: 1997-12-31