All (61) (0 to 10 of 61 results)

  • Articles and reports: 12-001-X202200100002
    Description:

    We consider an intercept-only linear random effects model for the analysis of data from a two-stage cluster sampling design. At the first stage, a simple random sample of clusters is drawn, and at the second stage, a simple random sample of elementary units is taken within each selected cluster. The response variable is assumed to consist of a cluster-level random effect plus an independent error term with known variance. The objects of inference are the mean of the outcome variable and the random effect variance. With a more complex two-stage sampling design, the use of an approach based on an estimated pairwise composite likelihood function has appealing properties. Our purpose is to use our simpler context to compare the results of likelihood inference with inference based on a pairwise composite likelihood function that is treated as an approximate likelihood, in particular as the likelihood component in Bayesian inference. In order to provide credible intervals having frequentist coverage close to nominal values, the pairwise composite likelihood function and corresponding posterior density need modification, such as a curvature adjustment. Through simulation studies, we investigate the performance of an adjustment proposed in the literature and find that it works well for the mean but provides credible intervals for the random effect variance that suffer from under-coverage. We propose possible future directions, including extensions to the case of a complex design.

    Release date: 2022-06-21
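
The pairwise composite likelihood discussed in this abstract sums bivariate-normal log-densities over all within-cluster pairs. Below is a minimal sketch for the intercept-only model y_ij = mu + u_i + e_ij with known error variance; the function name, signature, and data layout are our own assumptions, not the authors' code:

```python
import numpy as np
from itertools import combinations

def pairwise_cl(mu, sigma_u2, clusters, sigma_e2=1.0):
    """Pairwise composite log-likelihood for y_ij = mu + u_i + e_ij,
    u_i ~ N(0, sigma_u2), e_ij ~ N(0, sigma_e2) with sigma_e2 known.
    Each within-cluster pair is bivariate normal with common marginal
    variance v = sigma_u2 + sigma_e2 and covariance sigma_u2.
    `clusters` is a list of 1-D arrays, one per sampled cluster."""
    v = sigma_u2 + sigma_e2          # marginal variance of each y_ij
    rho = sigma_u2 / v               # within-cluster correlation
    total = 0.0
    for y in clusters:
        for y1, y2 in combinations(y, 2):
            z1, z2 = (y1 - mu) / np.sqrt(v), (y2 - mu) / np.sqrt(v)
            q = (z1**2 - 2*rho*z1*z2 + z2**2) / (1 - rho**2)
            total += (-np.log(2*np.pi) - np.log(v)
                      - 0.5*np.log(1 - rho**2) - 0.5*q)
    return total
```

Maximizing this surface over (mu, sigma_u2), or treating it as the likelihood component of a posterior, is the kind of inference the article compares; the curvature adjustment it studies modifies this function before use.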

  • Articles and reports: 12-001-X201900300004
    Description:

    Social or economic studies often need to have a global view of society. For example, in agricultural studies, the characteristics of farms can be linked to the social activities of individuals. Hence, studies of a given phenomenon should be done by considering variables of interest referring to different target populations that are related to each other. In order to get an insight into an underlying phenomenon, the observations must be carried out in an integrated way, in which the units of a given population have to be observed jointly with related units of the other population. In the agricultural example, this means that a sample of rural households should be selected that have some relationship with the farm sample to be used for the study. There are several ways to select integrated samples. This paper studies the problem of defining an optimal sampling strategy for this situation: the solution proposed minimizes the sampling cost, ensuring a predefined estimation precision for the variables of interest (of either one or both populations) describing the phenomenon. Indirect sampling provides a natural framework for this setting, since the units belonging to a population can become carriers of information on another population that is the object of a given survey. The problem is studied for different contexts that characterize the information concerning the links available in the sampling design phase, ranging from situations in which the links among the different units are known in the design phase to a situation in which the available information on links is very poor. An empirical study of agricultural data for a developing country is presented. It shows how controlling the inclusion probabilities at the design phase using the available information (namely the links) is effective and can significantly reduce the errors of the estimates for the indirectly observed population. The need for good models for predicting the unknown variables or the links is also demonstrated.

    Release date: 2019-12-17

  • Articles and reports: 12-001-X201900200007
    Description:

    When fitting an ordered categorical variable with L > 2 levels to a set of covariates using complex survey data, it is common to assume that the elements of the population fit a simple cumulative logistic regression model (proportional-odds logistic-regression model). This means the probability that the categorical variable is at or below some level is a binary logistic function of the model covariates. Moreover, except for the intercept, the values of the logistic-regression parameters are the same at each level. The conventional “design-based” method used for fitting the proportional-odds model is based on pseudo-maximum likelihood. We compare estimates computed using pseudo-maximum likelihood with those computed by assuming an alternative design-sensitive robust model-based framework. We show with a simple numerical example how estimates using the two approaches can differ. The alternative approach is easily extended to fit a general cumulative logistic model, in which the parallel-lines assumption can fail. A test of that assumption easily follows.

    Release date: 2019-06-27
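
The parallel-lines (proportional-odds) structure described in this abstract is easy to state in code: one slope vector shared across all L - 1 cutpoints. A minimal sketch of the category probabilities implied by the model (function name and interface are our own, not the paper's software):

```python
import numpy as np

def cum_logit_probs(x, alphas, beta):
    """Category probabilities under a proportional-odds model:
    P(Y <= l | x) = logistic(alpha_l - x @ beta), with one slope vector
    `beta` shared across the L-1 cutpoints (the parallel-lines assumption)
    and increasing cutpoints `alphas`."""
    alphas = np.asarray(alphas, dtype=float)
    eta = alphas - np.dot(x, beta)                # one linear predictor per cutpoint
    cum = 1.0 / (1.0 + np.exp(-eta))              # P(Y <= l) for l = 1..L-1
    cum = np.concatenate([[0.0], cum, [1.0]])     # add P(Y <= 0)=0 and P(Y <= L)=1
    return np.diff(cum)                           # P(Y = l) by differencing
```

The general cumulative logistic model mentioned at the end of the abstract relaxes this by letting `beta` vary with the cutpoint; testing whether the shared-slope restriction holds is the test of the parallel-lines assumption.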

  • Articles and reports: 12-001-X201900100004
    Description:

    In this paper, we make use of auxiliary information to improve the efficiency of the estimates of the censored quantile regression parameters. Utilizing the information available from previous studies, we compute empirical likelihood probabilities as weights and propose a weighted censored quantile regression. Theoretical properties of the proposed method are derived. Our simulation studies show that the proposed method has advantages over standard censored quantile regression.

    Release date: 2019-05-07

  • Articles and reports: 12-001-X201900100005
    Description:

    Small area estimation using area-level models can sometimes benefit from covariates that are observed subject to random errors, such as covariates that are themselves estimates drawn from another survey. Given estimates of the variances of these measurement (sampling) errors for each small area, one can account for the uncertainty in such covariates using measurement error models (e.g., Ybarra and Lohr, 2008). Two types of area-level measurement error models have been examined in the small area estimation literature. The functional measurement error model assumes that the underlying true values of the covariates with measurement error are fixed but unknown quantities. The structural measurement error model assumes that these true values follow a model, leading to a multivariate model for the covariates observed with error and the original dependent variable. We compare and contrast these two models with the alternative of simply ignoring measurement error when it is present (naïve model), exploring the consequences for prediction mean squared errors of use of an incorrect model under different underlying assumptions about the true model. Comparisons done using analytic formulas for the mean squared errors assuming model parameters are known yield some surprising results. We also illustrate results with a model fitted to data from the U.S. Census Bureau’s Small Area Income and Poverty Estimates (SAIPE) Program.

    Release date: 2019-05-07
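
As an illustration of the kind of shrinkage adjustment this abstract attributes to Ybarra and Lohr (2008), the sketch below folds the extra variance contributed by the error-prone covariate into the area-level shrinkage factor. This is an illustrative reconstruction from general knowledge of the measurement-error small area literature, not the published estimator verbatim; names and the exact form of the factor should be checked against the paper:

```python
import numpy as np

def me_shrinkage_predictor(y_i, x_i, beta, C_i, sigma_v2, D_i):
    """Small area predictor with an error-prone covariate, in the spirit
    of the Ybarra-Lohr adjustment (sketch, not the published formula):
    the variance beta' C_i beta added by the covariate's sampling error
    is folded into the shrinkage factor. C_i is the covariance matrix of
    the covariate's measurement error, sigma_v2 the model variance, and
    D_i the sampling variance of the direct estimate y_i."""
    extra = float(beta @ C_i @ beta)   # variance added by measurement error
    gamma = (sigma_v2 + extra) / (sigma_v2 + extra + D_i)
    return gamma * y_i + (1.0 - gamma) * float(x_i @ beta)
```

With C_i = 0 this reduces to the usual Fay-Herriot shrinkage; as the measurement error grows, the synthetic part x_i'beta is trusted less and the direct estimate more, which is the qualitative behaviour the abstract's functional/structural comparison is about.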

  • Articles and reports: 12-001-X201800254961
    Description:

    In business surveys, it is common to collect economic variables with highly skewed distributions. In this context, winsorization is frequently used to address the problem of influential values. In stratified simple random sampling, there are two methods for selecting the thresholds involved in winsorization. This article comprises two parts. The first reviews the notation and the concept of a winsorized estimator. The second part details the two methods and extends them to the case of Poisson sampling, and then compares them on simulated data sets and on the Labour Cost and Structure of Earnings Survey carried out by INSEE.

    Release date: 2018-12-20
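
Winsorization itself is simple to sketch. The function below computes a weighted total under two common value transformations: plain capping at a threshold K, and a variant that shrinks rather than truncates the influential units. The `kind` labeling and the shrinkage formula are stated from general knowledge of the winsorization literature, not taken from this article, whose contribution is how to choose K:

```python
import numpy as np

def winsorized_total(y, w, K, kind=1):
    """Weighted total with winsorized values.
    kind=1: cap each value at the threshold K.
    kind=2: replace y_i by K + (y_i - K) / w_i above K, so a unit's
    excess over the threshold enters the total only once rather than
    weighted (a common variant for influential values)."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    if kind == 1:
        y_star = np.minimum(y, K)
    else:
        y_star = np.where(y <= K, y, K + (y - K) / w)
    return float(np.sum(w * y_star))
```

Either way the estimator trades a small bias for a large variance reduction; the two threshold-selection methods the article reviews are two ways of optimizing that trade-off.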

  • Articles and reports: 12-001-X201700254888
    Description:

    We discuss developments in sample survey theory and methods covering the past 100 years. Neyman’s 1934 landmark paper laid the theoretical foundations for the probability sampling approach to inference from survey samples. Classical sampling books by Cochran, Deming, Hansen, Hurwitz and Madow, Sukhatme, and Yates, which appeared in the early 1950s, expanded and elaborated the theory of probability sampling, emphasizing unbiasedness, model-free features, and designs that minimize variance for a fixed cost. During the period 1960-1970, theoretical foundations of inference from survey data received attention, with the model-dependent approach generating considerable discussion. Introduction of general purpose statistical software led to the use of such software with survey data, which led to the design of methods specifically for complex survey data. At the same time, weighting methods, such as regression estimation and calibration, became practical and design consistency replaced unbiasedness as the requirement for standard estimators. A bit later, computer-intensive resampling methods also became practical for large-scale survey samples. Improved computer power led to more sophisticated imputation for missing data, use of more auxiliary data, some treatment of measurement errors in estimation, and more complex estimation procedures. A notable use of models was in the expanded use of small area estimation. Future directions in research and methods will be influenced by budgets, response rates, timeliness, improved data collection devices, and availability of auxiliary data, some of which will come from “Big Data”. Survey taking will be impacted by changing cultural behavior and by a changing physical-technical environment.

    Release date: 2017-12-21

  • Articles and reports: 12-001-X201600114546
    Description:

    Adjusting the base weights using weighting classes is a standard approach for dealing with unit nonresponse. A common approach is to create nonresponse adjustments that are weighted by the inverse of the assumed response propensity of respondents within weighting classes under a quasi-randomization approach. Little and Vartivarian (2003) questioned the value of weighting the adjustment factor. In practice the assumed models are misspecified, so it is critical to understand the impact that weighting might have in this case. This paper describes the effects on nonresponse-adjusted estimates of means and totals, for the population and for domains, computed using the weighted and unweighted inverse of the response propensities in stratified simple random sample designs. The performance of these estimators under different conditions, such as sample allocation, response mechanism, and population structure, is evaluated. The findings show that for the scenarios considered the weighted adjustment has substantial advantages for estimating totals, and that using an unweighted adjustment may lead to serious biases except in very limited cases. Furthermore, unlike the unweighted estimates, the weighted estimates are not sensitive to how the sample is allocated.

    Release date: 2016-06-22
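
The two adjustment factors compared in this article can be sketched directly: within each weighting class, the factor is the inverse of either the base-weighted or the unweighted response rate among sampled units. The function name and interface below are our own; a respondent's adjusted weight is then its base weight times the factor for its class:

```python
import numpy as np

def adjustment_factors(w, resp, cls, weighted=True):
    """Nonresponse adjustment factor per weighting class: the inverse of
    the (weighted or unweighted) response rate among sampled units.
    w: base weights; resp: 0/1 response indicator; cls: class label per
    sampled unit. Returns {class label: adjustment factor}."""
    w = np.asarray(w, dtype=float)
    resp = np.asarray(resp, dtype=int)
    cls = np.asarray(cls)
    factors = {}
    for c in np.unique(cls):
        m = cls == c
        if weighted:
            rate = np.sum(w[m] * resp[m]) / np.sum(w[m])  # weighted response rate
        else:
            rate = resp[m].mean()                          # unweighted response rate
        factors[c] = 1.0 / rate
    return factors
```

The article's finding is that the `weighted=True` version is the safer default for totals: when base weights vary within a class, the two rates differ, and the unweighted factor can leave substantial bias.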

  • Articles and reports: 12-001-X201500214238
    Description:

    Félix-Medina and Thompson (2004) proposed a variant of link-tracing sampling to sample hidden and/or hard-to-detect human populations such as drug users and sex workers. In their variant, an initial sample of venues is selected and the people found in the sampled venues are asked to name other members of the population to be included in the sample. Those authors derived maximum likelihood estimators of the population size under the assumption that the probability that a person is named by another in a sampled venue (link-probability) does not depend on the named person (homogeneity assumption). In this work we extend their research to the case of heterogeneous link-probabilities and derive unconditional and conditional maximum likelihood estimators of the population size. We also propose profile likelihood and bootstrap confidence intervals for the size of the population. The results of our simulation studies show that in the presence of heterogeneous link-probabilities the proposed estimators perform reasonably well, provided that relatively large sampling fractions, say larger than 0.5, are used, whereas the estimators derived under the homogeneity assumption perform poorly. The outcomes also show that the proposed confidence intervals are not very robust to deviations from the assumed models.

    Release date: 2015-12-17

  • Articles and reports: 12-001-X201500114149
    Description:

    This paper introduces a general framework for deriving the optimal inclusion probabilities for a variety of survey contexts in which disseminating survey estimates of pre-established accuracy for a multiplicity of both variables and domains of interest is required. The framework can define either standard stratified or incomplete stratified sampling designs. The optimal inclusion probabilities are obtained by minimizing costs through an algorithm that guarantees the bounding of sampling errors at the domain level, assuming that the domain membership variables are available in the sampling frame. The target variables are unknown, but can be predicted with suitable super-population models. The algorithm properly takes this model uncertainty into account. Some experiments based on real data show the empirical properties of the algorithm.

    Release date: 2015-06-29
Stats in brief (0) (0 results)

Articles and reports (61) (50 to 60 of 61 results)

  • Articles and reports: 12-001-X198900114580
    Description:

    Estimation of total numbers of hogs and pigs, sows and gilts, and cattle and calves in a state is studied using data obtained in the June Enumerative Survey conducted by the National Agricultural Statistics Service of the U.S. Department of Agriculture. It is possible to construct six different estimators using the June Enumerative Survey data. Three estimators involve data from area samples and three estimators combine data from list-frame and area-frame surveys. A rotation sampling scheme is used for the area frame portion of the June Enumerative Survey. Using data from the five years, 1982 through 1986, covariances among the estimators for different years are estimated. A composite estimator is proposed for the livestock numbers. The composite estimator is obtained by a generalized least-squares regression of the vector of different yearly estimators on an appropriate set of dummy variables. The composite estimator is designed to yield estimates for livestock inventories that are “at the same level” as the official estimates made by the U.S. Department of Agriculture.

    Release date: 1989-06-15
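
The composite estimator described above rests on a generalized least-squares regression of the stacked yearly estimators on dummy variables. A minimal sketch of the GLS step (the inputs and function name are our own illustration, not NASS code; `X` would encode the year dummies and `V` the estimated covariances among the estimators):

```python
import numpy as np

def gls_combine(est, V, X):
    """Generalized least squares: combine a vector of correlated
    estimators `est` with covariance matrix V via design matrix X,
    returning the GLS coefficients (X' V^-1 X)^-1 X' V^-1 est."""
    Vinv = np.linalg.inv(V)
    return np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ est)
```

With V proportional to the identity this collapses to ordinary least squares; the gain in the composite estimator comes precisely from exploiting the estimated between-year covariances in V.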

  • Articles and reports: 12-001-X198800214590
    Description:

    This paper presents results from a study of the causes of census undercount for a hard-to-enumerate, largely Hispanic urban area. A framework for organizing the causes of undercount is offered, and various hypotheses about these causes are tested. The approach is distinctive for its attempt to quantify the sources of undercount and isolate problems of unique importance by controlling for other problems statistically.

    Release date: 1988-12-15

  • Articles and reports: 12-001-X198800114600
    Description:

    A personal computer program for variance estimation with large scale surveys is described. The program, called PC CARP, will compute estimates and estimated variances for totals, ratios, means, quantiles, and regression coefficients.

    Release date: 1988-06-15

  • Articles and reports: 12-001-X198700114468
    Description:

    The National Agricultural Statistics Service, U.S. Department of Agriculture, conducts yield surveys for a variety of field crops in the United States. While field sampling procedures for various crops differ, the same basic survey design is used for all crops. The survey design and current estimators are reviewed. Alternative estimators of yield and production and of the variance of the estimators are presented. Current estimators and alternative estimators are compared, both theoretically and in a Monte Carlo simulation.

    Release date: 1987-06-15

  • Articles and reports: 12-001-X198500214372
    Description:

    The use of a multivariate clustering algorithm to perform stratification for the Labour Force Survey is described. The algorithm developed by Friedman and Rubin (1967) is modified to allow the formation of geographically contiguous strata and to delineate heterogeneous but compact primary sampling units (PSUs) within these strata. Studies dealing with stratification variables, stratification robustness over time, and type of stratification are described.

    Release date: 1985-12-16

  • Articles and reports: 12-001-X198400114351
    Description:

    Most sample surveys conducted by organizations such as Statistics Canada or the U.S. Bureau of the Census employ complex designs. The design-based approach to statistical inference, typically the institutional standard of inference for simple population statistics such as means and totals, may be extended to parameters of analytic models as well. Most of this paper focuses on application of design-based inferences to such models, but rationales are offered for use of model-based alternatives in some instances, by way of explanation for the author’s observation that both modes of inference are used in practice at his own institution.

    Within the design-based approach to inference, the paper briefly describes experience with linear regression analysis. Recently, variance computations for a number of surveys of the Census Bureau have been implemented through “replicate weighting”; the principal application has been for variances of simple statistics, but this technique also facilitates variance computation for virtually any complex analytic model. Finally, approaches and experience with log-linear models are reported.

    Release date: 1984-06-15

  • Articles and reports: 12-001-X198400114352
    Description:

    The paper presents different estimation methods for complex survey designs. Among others, estimation of the mean, ratio, and regression coefficient is presented. The standard errors are estimated by different methods: the ordinary least squares procedure, the stratified weighted sample procedure, the stratified unit weight procedure, etc. Large-sample theory and the conditions for applying it are also presented.

    Release date: 1984-06-15

  • Articles and reports: 12-001-X198300114335
    Description:

    The Canadian Labour Force Survey is a household survey conducted each month for the purpose of producing point-in-time estimates of the number of persons employed, unemployed and not in the labour force. The survey has a rotating panel design in which all individuals in a sampled household location are interviewed each month, for six consecutive months. In the past, little use has been made of this longitudinal structure, although considerable interest has been expressed in the month-to-month gross flows (transitions) amongst the labour force status categories. In this paper we discuss methods being considered by Statistics Canada for the production of gross flow estimates, but from a model-based perspective.

    Release date: 1983-06-15

  • Articles and reports: 12-001-X197900254833
    Description:

    This paper looks at the current state of development of social statistics in Canada. Some key concepts related to statistics and social information are defined and discussed. The availability and analysis of administrative data is highlighted, along with the need for social surveys. Suggestions are made about the types of data analysis needed for the development of social decision models to meet policy requirements. Finally, an outline of priorities for future work toward the effective use of social statistics is given.

    Release date: 1979-12-14

  • Articles and reports: 12-001-X197800154831
    Description:

    The impact on linear statistics of the sample design used in obtaining survey data is the subject of much of the sampling literature. Recently, more attention has been paid to the design’s impact on non-linear statistics; the major factor inhibiting these investigations has been the problem of estimating at least the first two moments of such statistics. The present article examines the problem of estimating the variances of non-linear statistics from complex samples, in the light of the existing literature. The behaviour of the chi-square statistic computed from a complex sample to test hypotheses of goodness of fit or independence is studied. Alternative tests are developed and their properties studied in simulation experiments.

    Release date: 1978-06-15
Journals and periodicals (0) (0 results)
