Results

All (53) (30 to 40 of 53 results)

  • Articles and reports: 11-522-X20050019443
    Description:

    A large part of sample survey theory has been directly motivated by practical problems encountered in the design and analysis of sample surveys. On the other hand, sample survey theory has influenced practice, often leading to significant improvements. This paper will examine this interplay over the past 60 years or so.

    Release date: 2007-03-02

  • Articles and reports: 11-522-X20050019461
    Description:

    We propose a generalization of the usual coefficient of variation (CV) to address some of the known problems when used in measuring quality of estimates. Some of the problems associated with CV include interpretation when the estimate is near zero, and the inconsistency in the interpretation about precision when computed for different one-to-one monotonic transformations.

    Release date: 2007-03-02
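
The transformation problem mentioned in this abstract can be illustrated with a small sketch. The numbers below are hypothetical and are not taken from the paper; the sketch uses only the usual CV (standard error divided by the absolute estimate), not the generalization the authors propose.

```python
def cv(estimate, std_error):
    """Usual coefficient of variation: standard error over |estimate|."""
    return std_error / abs(estimate)

# Hypothetical estimated proportion and its standard error (illustrative only).
p, se = 0.02, 0.01

# q = 1 - p is a one-to-one monotonic transformation of p
# and has the same standard error as p.
cv_p = cv(p, se)      # large, because p is close to zero
cv_q = cv(1 - p, se)  # small, although the precision is identical

print(f"CV of p:     {cv_p:.3f}")
print(f"CV of 1 - p: {cv_q:.3f}")
```

Both quantities carry the same information and the same standard error, yet the two CVs suggest very different levels of precision.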

  • Articles and reports: 11-522-X20050019491
    Geography: Canada
    Description:

    Evaluating the impact of changes to services on the health status of frail elderly adults calls for longitudinal studies. Many subjects, however, are lost during follow-up because of the high incidence of death in this population. Traditional methods of repeated measures analysis are thus inappropriate since they ignore subjects with incomplete follow-up data. This leads to a considerable reduction in sample size and to biases.

    Release date: 2007-03-02

  • Journals and periodicals: 92-134-X
    Description:

    This document summarizes the results of content analyses of the 2004 Census Test. The first section briefly explains the context of the content analyses by describing the nature of the sample, its limitations and the strategies used to evaluate data quality. The second section provides an overview of the results for questions that have not changed since the 2001 Census by describing the similarities between 2001 and 2004 distributions and non-response rates. The third section evaluates data quality of new census questions or questions that have changed substantially: same-sex married couples, ethnic origins, levels of schooling, location where highest diploma was obtained, school attendance, permission to access income tax files, and permission to make personal data publicly available 92 years after the census. The last section summarizes the overall results for questions whose content was coded and evaluated as part of the 2004 test, namely industry, occupation and place of work variables.

    Release date: 2006-03-21

  • Articles and reports: 12-001-X20050029040
    Description:

    A large part of sample survey theory has been directly motivated by practical problems encountered in the design and analysis of sample surveys. On the other hand, sample survey theory has influenced practice, often leading to significant improvements. This paper will examine this interplay over the past 60 years or so. Examples where new theory is needed or where theory exists but is not used will also be presented.

    Release date: 2006-02-17

  • Articles and reports: 12-001-X20040016993
    Description:

    The weighting cell estimator corrects for unit nonresponse by dividing the sample into homogeneous groups (cells) and applying a ratio correction to the respondents within each cell. Previous studies of the statistical properties of weighting cell estimators have assumed that these cells correspond to known population cells with homogeneous characteristics. In this article, we study the properties of the weighting cell estimator under a response probability model that does not require correct specification of homogeneous population cells. Instead, we assume that the response probabilities are a smooth but otherwise unspecified function of a known auxiliary variable. Under this more general model, we study the robustness of the weighting cell estimator against model misspecification. We show that, even when the population cells are unknown, the estimator is consistent with respect to the sampling design and the response model. We describe the effect of the number of weighting cells on the asymptotic properties of the estimator. Simulation experiments explore the finite sample properties of the estimator. We conclude with some guidance on how to select the size and number of cells for practical implementation of weighting cell estimation when those cells cannot be specified a priori.

    Release date: 2004-07-14
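
The ratio correction described in this abstract can be sketched as follows. This is a minimal illustration assuming the cells are already given. The function name, record layout and data are hypothetical, and the sketch does not implement the response probability model studied in the article.

```python
from collections import defaultdict

def weighting_cell_mean(units):
    """Estimate a mean with a within-cell ratio adjustment for unit nonresponse.

    units: list of dicts with keys 'cell', 'weight', 'responded' and 'y'
    (hypothetical layout; 'y' is ignored for nonrespondents).
    """
    sampled = defaultdict(float)    # sum of design weights, all sampled units
    responded = defaultdict(float)  # sum of design weights, respondents only
    for u in units:
        sampled[u["cell"]] += u["weight"]
        if u["responded"]:
            responded[u["cell"]] += u["weight"]

    num = den = 0.0
    for u in units:
        if not u["responded"]:
            continue
        # Ratio correction: inflate each respondent's weight so that the
        # respondents in a cell carry the weight of the whole sampled cell.
        adj = u["weight"] * sampled[u["cell"]] / responded[u["cell"]]
        num += adj * u["y"]
        den += adj
    # Note: a cell with no respondents contributes nothing here; in practice
    # such a cell would have to be collapsed with another one.
    return num / den

# Hypothetical sample: two cells with different response rates.
sample = [
    {"cell": "A", "weight": 10, "responded": True,  "y": 1.0},
    {"cell": "A", "weight": 10, "responded": True,  "y": 3.0},
    {"cell": "A", "weight": 10, "responded": False, "y": None},
    {"cell": "B", "weight": 20, "responded": True,  "y": 10.0},
    {"cell": "B", "weight": 20, "responded": False, "y": None},
]
estimate = weighting_cell_mean(sample)
print(round(estimate, 4))
```

In this toy sample, cell B has the lower response rate, so its single respondent receives the larger adjustment and the estimate moves toward that cell's value compared with the unadjusted respondent mean.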

  • Articles and reports: 12-001-X20040016996
    Description:

    This article studies the use of the sample distribution for the prediction of finite population totals under single-stage sampling. The proposed predictors employ the sample values of the target study variable, the sampling weights of the sample units and possibly known population values of auxiliary variables. The prediction problem is solved by estimating the expectation of the study values for units outside the sample as a function of the corresponding expectation under the sample distribution and the sampling weights. The prediction mean square error is estimated by a combination of an inverse sampling procedure and a re-sampling method. An interesting outcome of the present analysis is that several familiar estimators in common use are shown to be special cases of the proposed approach, thus providing them with a new interpretation. The performance of the new and some old predictors in common use is evaluated and compared by a Monte Carlo simulation study using a real data set.

    Release date: 2004-07-14

  • Articles and reports: 12-001-X20040016997
    Description:

    Multilevel models are often fitted to survey data gathered with a complex multistage sampling design. However, if such a design is informative, in the sense that the inclusion probabilities depend on the response variable even after conditioning on the covariates, then standard maximum likelihood estimators are biased. In this paper, following the Pseudo Maximum Likelihood (PML) approach of Skinner (1989), we propose a probability weighted estimation procedure for multilevel ordinal and binary models which eliminates the bias generated by the informativeness of the design. The reciprocals of the inclusion probabilities at each sampling stage are used to weight the log likelihood function and the weighted estimators obtained in this way are tested by means of a simulation study for the simple case of a binary random intercept model with and without covariates. The variance estimators are obtained by a bootstrap procedure. The maximization of the weighted log likelihood of the model is done by the NLMIXED procedure of SAS, which is based on adaptive Gaussian quadrature. The bootstrap estimation of variances is also implemented in the SAS environment.

    Release date: 2004-07-14

  • Journals and periodicals: 82F0077X
    Geography: Canada
    Description:

    The objective of this working paper series is to analyse the comparability of surveys conducted by Statistics Canada on smoking, to highlight the changes in the data among data years and to illustrate their statistical significance. The aim is to clarify any confusion regarding comparability of survey estimates of smoking prevalence and daily cigarette consumption over this period, as well as to provide the user-requested data in a technical but understandable format.

    Release date: 2002-12-16

  • Articles and reports: 11-522-X20010016252
    Description:

    This paper discusses in detail issues dealing with the technical aspects of designing and conducting surveys. It is intended for an audience of survey methodologists.

    The use of sample co-ordination in business surveys is crucial because it provides a way of smoothing out the survey burden. In many co-ordination methodologies, the random numbers representing the units are permanent and the sample selection method varies. In the microstrata methodology, however, it is the selection function that is permanent. On the other hand, random numbers are systematically rearranged between units for different co-ordination purposes: smoothing out the burden, updating panels or minimizing the overlap between two surveys. These rearrangements are made in the intersections of strata, known as microstrata. This microstrata method has good mathematical properties and demonstrates a general approach to sample co-ordination in which births, deaths and strata changes are automatically handled. There are no particular constraints on stratification and rotation rates of panels. Two software programs have been written to implement this method and its evolutions: SALOMON in 1998, and MICROSTRAT in 2001.

    Release date: 2002-09-12

Data (1) (1 result)

  • Public use microdata: 89M0018X
    Description:

    This is a CD-ROM product from the Ontario Adult Literacy Survey (OALS), conducted in the spring of 1998 with the goal of providing information on the ability of Ontario immigrants to use either English or French in their daily activities, and on their self-perceived literacy skills, training needs and barriers to training.

    In order to cover the majority of Ontario immigrants, the Census Metropolitan Areas (CMAs) of Toronto, Hamilton, Ottawa, Kitchener, London and St. Catharines were included in the sample. With these 6 CMAs, about 83% of Ontario immigrants were included in the sample frame. This sample of 7,107 dwellings covered the population of Ontario immigrants in general as well as specifically targeting immigrants with a mother tongue of Italian, Chinese, Portuguese, Polish, and Spanish and immigrants born in the Caribbean Islands with a mother tongue of English.

    Each interview was approximately 1.5 hours in duration and consisted of a half-hour questionnaire, asking demographic and literacy-related questions as well as a one-hour literacy test. This literacy test was derived from that used in the 1994 International Adult Literacy Survey (IALS) and covered the domains of document and quantitative literacy. An overall response rate to the survey of 76% was achieved, resulting in 4,648 respondents.

    Release date: 1999-10-29

Analysis (48) (0 to 10 of 48 results)

  • Articles and reports: 11-522-X202100100008
    Description:

    Non-probability samples are being increasingly explored by National Statistical Offices as a complement to probability samples. We consider the scenario where the variable of interest and auxiliary variables are observed in both a probability and non-probability sample. Our objective is to use data from the non-probability sample to improve the efficiency of survey-weighted estimates obtained from the probability sample. Recently, Sakshaug, Wisniowski, Ruiz and Blom (2019) and Wisniowski, Sakshaug, Ruiz and Blom (2020) proposed a Bayesian approach to integrating data from both samples for the estimation of model parameters. In their approach, non-probability sample data are used to determine the prior distribution of model parameters, and the posterior distribution is obtained under the assumption that the probability sampling design is ignorable (or not informative). We extend this Bayesian approach to the prediction of finite population parameters under non-ignorable (or informative) sampling by conditioning on appropriate survey-weighted statistics. We illustrate the properties of our predictor through a simulation study.

    Key Words: Bayesian prediction; Gibbs sampling; Non-ignorable sampling; Statistical data integration.

    Release date: 2021-10-29

  • Articles and reports: 12-001-X201500214230
    Description:

    This paper develops allocation methods for stratified sample surveys where composite small area estimators are a priority, and areas are used as strata. Longford (2006) proposed an objective criterion for this situation, based on a weighted combination of the mean squared errors of small area means and a grand mean. Here, we redefine this approach within a model-assisted framework, allowing regressor variables and a more natural interpretation of results using an intra-class correlation parameter. We also consider several uses of power allocation, and allow the placing of other constraints such as maximum relative root mean squared errors for stratum estimators. We find that a simple power allocation can perform very nearly as well as the optimal design even when the objective is to minimize Longford’s (2006) criterion.

    Release date: 2015-12-17

  • Articles and reports: 12-001-X201500214248
    Description:

    Unit level population models are often used in model-based small area estimation of totals and means, but the models may not hold for the sample if the sampling design is informative for the model. As a result, standard methods, assuming that the model holds for the sample, can lead to biased estimators. We study alternative methods that use a suitable function of the unit selection probability as an additional auxiliary variable in the sample model. We report the results of a simulation study on the bias and mean squared error (MSE) of the proposed estimators of small area means and on the relative bias of the associated MSE estimators, using informative sampling schemes to generate the samples. Alternative methods, based on modeling the conditional expectation of the design weight as a function of the model covariates and the response, are also included in the simulation study.

    Release date: 2015-12-17

  • Articles and reports: 12-001-X201300111823
    Description:

    Although weights are widely used in survey sampling, their ultimate justification from the design perspective is often problematical. Here we will argue for a stepwise Bayes justification for weights that does not depend explicitly on the sampling design. This approach will make use of the standard kind of information present in auxiliary variables; however, it will not assume a model relating the auxiliary variables to the characteristic of interest. The resulting weight for a unit in the sample can be given the usual interpretation as the number of units in the population which it represents.

    Release date: 2013-06-28

  • Articles and reports: 12-001-X201300111824
    Description:

    In most surveys all sample units receive the same treatment and the same design features apply to all selected people and households. In this paper, it is explained how survey designs may be tailored to optimize quality given constraints on costs. Such designs are called adaptive survey designs. The basic ingredients of such designs are introduced, discussed and illustrated with various examples.

    Release date: 2013-06-28

  • Articles and reports: 12-001-X201200111682
    Description:

    Sample allocation issues are studied in the context of estimating sub-population (stratum or domain) means as well as the aggregate population mean under stratified simple random sampling. A non-linear programming method is used to obtain "optimal" sample allocation to strata that minimizes the total sample size subject to specified tolerances on the coefficient of variation of the estimators of strata means and the population mean. The resulting total sample size is then used to determine sample allocations for the methods of Costa, Satorra and Ventura (2004) based on compromise allocation and Longford (2006) based on specified "inferential priorities". In addition, we study sample allocation to strata when reliability requirements for domains, cutting across strata, are also specified. Performance of the three methods is studied using data from Statistics Canada's Monthly Retail Trade Survey (MRTS) of single establishments.

    Release date: 2012-06-27

  • Articles and reports: 12-001-X201200111683
    Description:

    We consider alternatives to poststratification for doubly classified data in which at least one of the two-way cells is too small to allow the poststratification based upon this double classification. In our study data set, the expected count in the smallest cell is 0.36. One approach is simply to collapse cells. This is likely, however, to destroy the double classification structure. Our alternative approaches allow one to maintain the original double classification of the data. The approaches are based upon the calibration study by Chang and Kott (2008). We choose weight adjustments dependent upon the marginal classifications (but not full cross classification) to minimize an objective function of the differences between the population counts of the two-way cells and their sample estimates. In the terminology of Chang and Kott (2008), if the row and column classifications have I and J cells respectively, this results in IJ benchmark variables and I + J - 1 model variables. We study the performance of these estimators by constructing simulation simple random samples from the 2005 Quarterly Census of Employment and Wages which is maintained by the Bureau of Labor Statistics. We use the double classification of state and industry group. In our study, the calibration approaches introduced an asymptotically trivial bias, but reduced the MSE, compared to the unbiased estimator, by as much as 20% for a small sample.

    Release date: 2012-06-27

  • Articles and reports: 12-001-X201100111443
    Description:

    Dual frame telephone surveys are becoming common in the U.S. because of the incompleteness of the landline frame as people transition to cell phones. This article examines nonsampling errors in dual frame telephone surveys. Even though nonsampling errors are ignored in much of the dual frame literature, we find that under some conditions substantial biases may arise in dual frame telephone surveys due to these errors. We specifically explore biases due to nonresponse and measurement error in these telephone surveys. To reduce the bias resulting from these errors, we propose dual frame sampling and weighting methods. The compositing factor for combining the estimates from the two frames is shown to play an important role in reducing nonresponse bias.

    Release date: 2011-06-29

  • Articles and reports: 12-001-X201100111447
    Description:

    This paper introduces an R package for the stratification of a survey population using a univariate stratification variable X and for the calculation of stratum sample sizes. Non-iterative methods such as the cumulative root frequency method and the geometric stratum boundaries are implemented. Optimal designs, with stratum boundaries that minimize either the CV of the simple expansion estimator for a fixed sample size n or the n value for a fixed CV, can be constructed. Two iterative algorithms are available to find the optimal stratum boundaries. The design can feature a user-defined certainty stratum where all the units are sampled. Take-all and take-none strata can be included in the stratified design as they might lead to smaller sample sizes. The sample size calculations are based on the anticipated moments of the survey variable Y, given the stratification variable X. The package handles conditional distributions of Y given X that are either a heteroscedastic linear model, or a log-linear model. Stratum-specific non-response can be accounted for in the design construction and in the sample size calculations.

    Release date: 2011-06-29

  • Articles and reports: 12-001-X201100111449
    Description:

    We analyze the statistical and economic efficiency of different designs of cluster surveys collected in two consecutive time periods, or waves. In an independent design, two cluster samples in two waves are taken independently from one another. In a cluster-panel design, the same clusters are used in both waves, but samples within clusters are taken independently in two time periods. In an observation-panel design, both clusters and observations are retained from one wave of data collection to another. By assuming a simple population structure, we derive design variances and costs of the surveys conducted according to these designs. We first consider a situation in which the interest lies in estimation of the change in the population mean between two time periods, and derive the optimal sample allocations for the three designs of interest. We then propose the utility maximization framework borrowed from microeconomics to illustrate a possible approach to the choice of the design that strives to optimize several variances simultaneously. Incorporating the contemporaneous means and their variances tends to shift the preferences from observation-panel towards simpler cluster-panel and independent designs if the panel mode of data collection is too expensive. We present numeric illustrations demonstrating how a survey designer may want to choose the efficient design given the population parameters and data collection cost.

    Release date: 2011-06-29

Reference (3) (3 results)

  • Surveys and statistical programs – Documentation: 71-526-X
    Description:

    The Canadian Labour Force Survey (LFS) is the official source of monthly estimates of total employment and unemployment. Following the 2011 census, the LFS underwent a sample redesign to account for the evolution of the population and labour market characteristics, to adjust to changes in the information needs and to update the geographical information used to carry out the survey. The redesign program following the 2011 census culminated with the introduction of a new sample at the beginning of 2015. This report is a reference on the methodological aspects of the LFS, covering stratification, sampling, collection, processing, weighting, estimation, variance estimation and data quality.

    Release date: 2017-12-21

  • Surveys and statistical programs – Documentation: 92-370-X
    Description:

    Series description

    This series includes five general reference products - the Preview of Products and Services; the Catalogue; the Dictionary; the Handbook and the Technical Reports - as well as geography reference products - GeoSuite and Reference Maps.

    Product description

    Technical Reports examine the quality of data from the 1996 Census, a large and complex undertaking. While considerable effort was taken to ensure high quality standards throughout each step, the results are subject to a certain degree of error. Each report looks at the collection and processing operations and presents results from data evaluation, as well as notes on historical comparability.

    Technical Reports are aimed at moderate and sophisticated users but are written in a manner which could make them useful to all census data users. Most of the technical reports have been cancelled, with the exception of Age, Sex, Marital Status and Common-law Status, Coverage and Sampling and Weighting. These reports will be available as bilingual publications as well as being available in both official languages on the Internet as free products.

    This report deals with coverage errors, which occurred when persons, households, dwellings or families were missed by the 1996 Census or enumerated in error. Coverage errors are one of the most important types of error since they affect not only the accuracy of the counts of the various census universes but also the accuracy of all of the census data describing the characteristics of these universes. With this information, users can determine the risks involved in basing conclusions or decisions on census data.

    Release date: 1999-12-14

  • Surveys and statistical programs – Documentation: 75F0002M1993014
    Description:

    This paper presents the results from test 3A of the Survey of Labour and Income Dynamics (SLID), conducted in January 1993, with a view to identifying any necessary changes to the questions or to the algorithm used to derive labour force status.

    Release date: 1995-12-30