Results

All (61) (30 to 40 of 61 results)

  • Articles and reports: 12-001-X20000015178
    Description:

    Longitudinal observations consist of repeated measurements on the same units over a number of occasions, with fixed or varying time spells between the occasions. Each vector observation can be viewed therefore as a time series, usually of short length. Analyzing the measurements for all the units permits the fitting of low-order time series models, despite the short lengths of the individual series.

    Release date: 2000-08-30
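The pooling idea in this abstract, fitting a low-order model across many short series, can be sketched as follows. This is a minimal illustration assuming a common AR(1) coefficient estimated by pooled least squares; the function name and simulated data are hypothetical, not the paper's method.

```python
import numpy as np

def pooled_ar1(series_list):
    """Estimate a common AR(1) coefficient by pooling lagged pairs
    (y_{t-1}, y_t) across many short series."""
    x, y = [], []
    for s in series_list:
        s = np.asarray(s, dtype=float)
        x.extend(s[:-1])
        y.extend(s[1:])
    x, y = np.array(x), np.array(y)
    # Least-squares slope through the pooled lagged pairs (no intercept
    # for simplicity; a real analysis would demean each series first).
    return float(np.dot(x, y) / np.dot(x, x))

rng = np.random.default_rng(0)
phi = 0.6
series = []
for _ in range(200):          # many units...
    s = [rng.normal()]
    for _ in range(4):        # ...each observed on only 5 occasions
        s.append(phi * s[-1] + rng.normal())
    series.append(s)

phi_hat = pooled_ar1(series)  # recovers a value near 0.6
```

No individual series of length 5 would support this fit on its own; the 200 units together supply 800 lagged pairs.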

  • Articles and reports: 12-001-X20000015181
    Description:

    Samples from hidden and hard-to-access human populations are often obtained by procedures in which social links are followed from one respondent to another. Inference from the sample to the larger population of interest can be affected by the link-tracing design and the type of data it produces. The population with its social network structure can be modeled as a stochastic graph with a joint distribution of node values representing characteristics of individuals and arc indicators representing social relationships between individuals.

    Release date: 2000-08-30

  • Articles and reports: 12-001-X19990024875
    Geography: Canada
    Description:

    Dr. Fellegi considers the challenges facing government statistical agencies and strategies to prepare for these challenges. He first describes the environment of changing information needs and the social, economic and technological developments driving this change. He goes on to describe both internal and external elements of a strategy to meet these evolving needs. Internally, a flexible capacity for survey taking and information gathering must be developed. Externally, contacts must be developed to ensure continuing relevance of statistical programs while maintaining non-political objectivity.

    Release date: 2000-03-01

  • Articles and reports: 12-001-X19980013907
    Description:

    Least squares estimation for repeated surveys is addressed. Several estimators of current level, change in level and average level for multiple time periods are developed. The Recursive Regression Estimator, a recursive computational form of the best linear unbiased estimator based on all periods of the survey, is presented. It is shown that the recursive regression procedure converges, and that the dimension of the estimation problem is bounded as the number of periods increases indefinitely. The recursive procedure offers a solution to the problem of computational complexity associated with minimum variance unbiased estimation in repeated surveys. Data from the U.S. Current Population Survey are used to compare alternative estimators under two types of rotation designs: the intermittent rotation design used in the U.S. Current Population Survey, and two continuous rotation designs.

    Release date: 1998-07-31

  • Articles and reports: 12-001-X19970023617
    Description:

    Much research has been conducted into the modelling of ordinal responses. Some authors argue that, when the response variable is ordinal, inclusion of ordinality in the model to be estimated should improve model performance. Under the condition of ordinality, Campbell and Donner (1989) compared the asymptotic classification error rate of the multinomial logistic model to that of the ordinal logistic model of Anderson (1984). They showed that the ordinal logistic model had a lower expected asymptotic error rate than the multinomial logistic model. This paper also aims to compare the performance of ordinal and multinomial logistic models for ordinal responses. However, rather than focussing on classification efficiency, the assessment is made in the context of an application where the objective is to estimate small area proportions. More specifically, using multinomial and ordinal logistic models, the empirical Bayes approach proposed by Farrell, MacGibbon and Tomberlin (1997a) for estimating small area proportions based on binomial outcome data is extended to response variables consisting of more than two outcome categories. The properties of estimators based on these two models are compared via a simulation study in which the empirical Bayes methods proposed here are applied to data from the 1950 United States Census with the objective of predicting, for a small area, the proportion of individuals who belong to the various categories of an ordinal response variable representing income level.

    Release date: 1998-03-12
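The structural difference between the two models compared here is that the ordinal (proportional-odds) model shares a single slope across all category cut-points. A minimal sketch of that structure, with hypothetical thresholds and slope:

```python
import math

def cumulative_probs(x, thetas, beta):
    """P(Y <= j | x) under a proportional-odds (cumulative logistic) model:
    logit P(Y <= j | x) = theta_j - beta * x, with one slope beta shared
    across all thresholds theta_1 < ... < theta_{L-1}."""
    return [1.0 / (1.0 + math.exp(-(t - beta * x))) for t in thetas]

def category_probs(x, thetas, beta):
    """Difference adjacent cumulative probabilities to get per-category
    probabilities (a multinomial model would instead give each category
    its own free parameters)."""
    cum = cumulative_probs(x, thetas, beta) + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

# Hypothetical 3-category model: thresholds -1 and 1, slope 0.5.
p = category_probs(x=0.0, thetas=[-1.0, 1.0], beta=0.5)
```

The ordinal model therefore estimates far fewer parameters than the multinomial model, which is the source of the efficiency gains discussed above.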

  • Articles and reports: 12-001-X19960022981
    Description:

    Results from the Current Population Survey split panel studies indicated a centralized computer-assisted telephone interviewing (CATI) effect on labor force estimates. One hypothesis is that the CATI interviewing increased the probability of respondents' changing their reported labor force status. The two sample McNemar test is appropriate for testing this type of hypothesis: the hypothesis of interest is that the marginal changes in each of two independent samples' tables are equal. We show two adaptations of this test to complex survey data, along with applications from the Current Population Survey's Parallel Survey split data and from the Current Population Survey's CATI Phase-in data.

    Release date: 1997-01-30
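The hypothesis of equal marginal change in two independent tables can be sketched as a simple z-test on the discordant cells. This assumes simple random sampling and invented counts; the paper's contribution is precisely the adaptation of such tests to complex survey data, which would replace these variances with design-based estimates.

```python
import math

def net_change(b, c, n):
    """Net marginal change and its variance from one 2x2 change table,
    where b and c are the two discordant cell counts and n the sample size."""
    d = (b - c) / n
    v = (b + c) / n**2 - d**2 / n
    return d, v

def two_sample_mcnemar_z(table1, table2):
    """z-statistic for equal marginal change in two independent samples.
    Each table is (b, c, n). SRS version only; a complex-survey adaptation
    would use design-based variance estimates instead."""
    d1, v1 = net_change(*table1)
    d2, v2 = net_change(*table2)
    return (d1 - d2) / math.sqrt(v1 + v2)

# Hypothetical data: the second sample shows a larger net change in
# reported labor force status than the first.
z = two_sample_mcnemar_z((30, 28, 1000), (55, 25, 1000))
```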

  • Articles and reports: 12-001-X199600114385
    Description:

    The multiple capture-recapture census is reconsidered by relaxing the traditional perfect matching assumption. We propose matching error models to characterize error-prone matching mechanisms. The observed data take the form of an incomplete 2^k contingency table with one missing cell and follow a multinomial distribution. We develop a procedure for the estimation of the population size. Our approach applies to both standard log-linear models for contingency tables and log-linear models for heterogeneity of catchability. We illustrate the method and estimation using a 1988 dress rehearsal study for the 1990 census conducted by the U.S. Bureau of the Census.

    Release date: 1996-06-14
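The baseline this paper generalizes is the classical two-list capture-recapture estimator, which relies on the perfect matching assumption the authors relax. A minimal sketch with hypothetical counts:

```python
def lincoln_petersen(n1, n2, m):
    """Classical two-list capture-recapture estimate of population size,
    assuming perfect matching between lists (the assumption the paper
    relaxes with its matching error models):
        N_hat = n1 * n2 / m,
    where m is the number of units matched in both lists."""
    if m == 0:
        raise ValueError("no matches: estimator undefined")
    return n1 * n2 / m

# Hypothetical census / post-enumeration counts.
N_hat = lincoln_petersen(n1=900, n2=800, m=720)  # -> 1000.0
```

With k lists, the observed data form the incomplete 2^k table described above, and log-linear models replace this simple ratio.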

  • Articles and reports: 12-001-X199500214392
    Description:

    Although large scale surveys conducted in developing countries can provide an invaluable snapshot of the health situation in a community, results produced rarely reflect the current reality as they are often released several months or years after data collection. The time lag can be partially attributed to delays in entering, coding and cleaning data after it is collected in the field. Recent advances in computer technology have provided a means of directly recording data onto a hand-held computer. Errors are reduced because in-built checks triggered as the questionnaire is administered reject illogical or inconsistent entries. This paper reports the use of one such computer-assisted interviewing tool in the collection of demographic data in Kenya. Although initial costs of establishing computer-assisted interviewing are high, the benefits are clear: errors that can creep into data collected by experienced field staff can be reduced to negligible levels. In situations where speed is essential, a large number of staff are involved, or a pre-coded questionnaire is used to collect data routinely over a long period, computer-assisted interviewing could prove a means of saving costs in the long term, as well as producing a dramatic improvement in data quality in the immediate term.

    Release date: 1995-12-15
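The in-built consistency checks described above can be as simple as range and cross-field rules evaluated while the questionnaire is administered. The rules and field names below are hypothetical illustrations, not the Kenyan instrument's actual checks:

```python
def check_entry(record):
    """Reject illogical or inconsistent entries at interview time, in the
    spirit of the in-built checks described above. Returns a list of
    problems; an empty list means the entry passes."""
    problems = []
    age = record.get("age_years")
    children = record.get("children_ever_born")
    if age is None or not 0 <= age <= 120:
        problems.append("age out of range")
    if children is not None and age is not None and age < 10 and children > 0:
        problems.append("children reported for a respondent under 10")
    return problems

ok = check_entry({"age_years": 27, "children_ever_born": 2})   # passes
bad = check_entry({"age_years": 8, "children_ever_born": 1})   # rejected
```

Because the check fires before the interviewer moves on, the inconsistency is resolved with the respondent present rather than months later during data cleaning.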

  • Articles and reports: 12-001-X199500214398
    Description:

    We present empirical evidence from 14 surveys in six countries concerning the existence and magnitude of design effects (defts) for five designs of two major types. The first type concerns deft(p_i – p_j), the difference of two proportions from a polytomous variable of three or more categories. The second type uses Chi-square tests for differences from two samples. We find that for all variables in all designs deft(p_i – p_j) ≈ [deft(p_i) + deft(p_j)] / 2 is a good approximation. These are empirical results, and exceptions disprove the existence of mere analytical inequalities. These results hold despite great variations of defts between variables and also between categories of the same variables. They also show the need for sample survey treatment of survey data even for analytical statistics. Furthermore they permit useful approximations of deft(p_i – p_j) from more accessible deft(p_i) values.

    Release date: 1995-12-15
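The empirical rule reported above is just an average of the two category defts, which makes it easy to apply when only the per-category defts have been published. A one-line sketch with hypothetical deft values:

```python
def approx_deft_diff(deft_i, deft_j):
    """Approximate deft(p_i - p_j) by the average of the two category
    defts, the empirical approximation reported in the paper."""
    return (deft_i + deft_j) / 2.0

# With hypothetical category defts of 1.4 and 1.2, the difference of
# proportions has an approximate deft of 1.3.
d = approx_deft_diff(1.4, 1.2)
```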

  • Articles and reports: 12-001-X199500114416
    Description:

    Stanley Warner was widely known for the creation of the randomized response technique for asking sensitive questions in surveys. Over almost two decades he also formulated and developed statistical methodology for another problem, that of deriving balanced information in advocacy settings so that both positions regarding a policy issue can be fairly and adequately represented. We review this work, including two survey applications implemented by Warner in which he applied the methodology, and we set the ideas into the context of current methodological thinking.

    Release date: 1995-06-15
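Warner's randomized response technique, mentioned above as his best-known contribution, protects privacy by having a randomizing device decide whether the respondent answers the sensitive statement or its negation. A minimal sketch of the classic estimator:

```python
def warner_estimate(yes_fraction, p):
    """Warner's randomized response estimator of a sensitive proportion pi.
    Each respondent answers the sensitive statement with probability p and
    its negation with probability 1 - p, so
        P(yes) = p*pi + (1 - p)*(1 - pi),
    and hence
        pi_hat = (yes_fraction - (1 - p)) / (2*p - 1),
    which requires p != 0.5."""
    if abs(2 * p - 1) < 1e-9:
        raise ValueError("p = 0.5 makes pi unidentifiable")
    return (yes_fraction - (1 - p)) / (2 * p - 1)

# If the device selects the sensitive statement with p = 0.7 and 40% of
# respondents say yes, the estimated sensitive proportion is 0.25.
pi_hat = warner_estimate(0.4, 0.7)
```

No individual answer reveals the respondent's true status, yet the aggregate proportion is recoverable.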
Stats in brief (0) (0 results)

No content available at this time.

Articles and reports (61) (0 to 10 of 61 results)

  • Articles and reports: 12-001-X202200100002
    Description:

    We consider an intercept only linear random effects model for analysis of data from a two stage cluster sampling design. At the first stage a simple random sample of clusters is drawn, and at the second stage a simple random sample of elementary units is taken within each selected cluster. The response variable is assumed to consist of a cluster-level random effect plus an independent error term with known variance. The objects of inference are the mean of the outcome variable and the random effect variance. With a more complex two stage sampling design, the use of an approach based on an estimated pairwise composite likelihood function has appealing properties. Our purpose is to use our simpler context to compare the results of likelihood inference with inference based on a pairwise composite likelihood function that is treated as an approximate likelihood, in particular treated as the likelihood component in Bayesian inference. In order to provide credible intervals having frequentist coverage close to nominal values, the pairwise composite likelihood function and corresponding posterior density need modification, such as a curvature adjustment. Through simulation studies, we investigate the performance of an adjustment proposed in the literature, and find that it works well for the mean but provides credible intervals for the random effect variance that suffer from under-coverage. We propose possible future directions including extensions to the case of a complex design.

    Release date: 2022-06-21
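For the intercept-only model above, the two objects of inference (the mean and the random effect variance) have simple method-of-moments (ANOVA) estimators in the balanced case. The sketch below is that baseline only, not the paper's composite likelihood or curvature-adjusted Bayesian machinery, and assumes equal cluster sample sizes:

```python
import numpy as np

def anova_estimates(y):
    """Method-of-moments (ANOVA) estimates of the overall mean and the
    cluster-level random effect variance for a balanced two-stage sample.
    y has shape (k clusters, m units per cluster)."""
    k, m = y.shape
    cluster_means = y.mean(axis=1)
    grand_mean = y.mean()
    msb = m * ((cluster_means - grand_mean) ** 2).sum() / (k - 1)
    msw = ((y - cluster_means[:, None]) ** 2).sum() / (k * (m - 1))
    sigma_u2 = max((msb - msw) / m, 0.0)  # truncate negative estimates at zero
    return grand_mean, sigma_u2

# Simulated two-stage sample: cluster effects with variance 1, errors
# with variance 4, true mean 5.
rng = np.random.default_rng(1)
k, m = 500, 8
u = rng.normal(0.0, 1.0, size=(k, 1))
y = 5.0 + u + rng.normal(0.0, 2.0, size=(k, m))
mu_hat, sigma_u2_hat = anova_estimates(y)
```

A pairwise composite likelihood would instead sum bivariate-normal log-likelihood contributions over within-cluster pairs; the adjustment studied in the paper corrects the curvature of that surface so the resulting credible intervals have near-nominal coverage.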

  • Articles and reports: 12-001-X201900300004
    Description:

    Social or economic studies often need to have a global view of society. For example, in agricultural studies, the characteristics of farms can be linked to the social activities of individuals. Hence, studies of a given phenomenon should be done by considering variables of interest referring to different target populations that are related to each other. In order to get an insight into an underlying phenomenon, the observations must be carried out in an integrated way, in which the units of a given population have to be observed jointly with related units of the other population. In the agricultural example, this means that a sample of rural households should be selected that have some relationship with the farm sample to be used for the study. There are several ways to select integrated samples. This paper studies the problem of defining an optimal sampling strategy for this situation: the solution proposed minimizes the sampling cost, ensuring a predefined estimation precision for the variables of interest (of either one or both populations) describing the phenomenon. Indirect sampling provides a natural framework for this setting since the units belonging to a population can become carriers of information on another population that is the object of a given survey. The problem is studied for different contexts which characterize the information concerning the links available in the sampling design phase, ranging from situations in which the links among the different units are known in the design phase to a situation in which the available information on links is very poor. An empirical study of agricultural data for a developing country is presented. It shows how controlling the inclusion probabilities at the design phase using the available information (namely the links) is effective and can significantly reduce the errors of the estimates for the indirectly observed population. The need for good models for predicting the unknown variables or the links is also demonstrated.

    Release date: 2019-12-17

  • Articles and reports: 12-001-X201900200007
    Description:

    When fitting an ordered categorical variable with L > 2 levels to a set of covariates using complex survey data, it is common to assume that the elements of the population fit a simple cumulative logistic regression model (proportional-odds logistic-regression model). This means the probability that the categorical variable is at or below some level is a binary logistic function of the model covariates. Moreover, except for the intercept, the values of the logistic-regression parameters are the same at each level. The conventional “design-based” method used for fitting the proportional-odds model is based on pseudo-maximum likelihood. We compare estimates computed using pseudo-maximum likelihood with those computed by assuming an alternative design-sensitive robust model-based framework. We show with a simple numerical example how estimates using the two approaches can differ. The alternative approach is easily extended to fit a general cumulative logistic model, in which the parallel-lines assumption can fail. A test of that assumption easily follows.

    Release date: 2019-06-27
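The pseudo-maximum likelihood idea mentioned above amounts to weighting each unit's log-likelihood contribution by its survey weight before maximizing. A minimal sketch of the weighted objective for the proportional-odds model, with hypothetical data and cut-points (the maximization step is omitted):

```python
import math

def weighted_pseudo_loglik(data, thetas, beta):
    """Pseudo-log-likelihood for a proportional-odds model: each unit's
    contribution is multiplied by its survey weight. data is a list of
    (weight, x, category) with categories 0..L-1 and L-1 cut-points."""
    def cdf(j, x):
        # Cumulative probability P(Y <= j | x); -1 and L-1 are the edges.
        if j < 0:
            return 0.0
        if j >= len(thetas):
            return 1.0
        return 1.0 / (1.0 + math.exp(-(thetas[j] - beta * x)))
    return sum(w * math.log(cdf(y, x) - cdf(y - 1, x)) for w, x, y in data)

# Two hypothetical units with weights 2 and 1.
ll = weighted_pseudo_loglik([(2.0, 0.0, 0), (1.0, 0.5, 2)],
                            thetas=[-1.0, 1.0], beta=0.5)
```

Maximizing this objective over (thetas, beta) gives the design-based estimates the paper compares against its model-based alternative.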

  • Articles and reports: 12-001-X201900100004
    Description:

    In this paper, we make use of auxiliary information to improve the efficiency of the estimates of the censored quantile regression parameters. Utilizing the information available from previous studies, we computed empirical likelihood probabilities as weights and proposed weighted censored quantile regression. Theoretical properties of the proposed method are derived. Our simulation studies show that our proposed method has advantages compared to standard censored quantile regression.

    Release date: 2019-05-07

  • Articles and reports: 12-001-X201900100005
    Description:

    Small area estimation using area-level models can sometimes benefit from covariates that are observed subject to random errors, such as covariates that are themselves estimates drawn from another survey. Given estimates of the variances of these measurement (sampling) errors for each small area, one can account for the uncertainty in such covariates using measurement error models (e.g., Ybarra and Lohr, 2008). Two types of area-level measurement error models have been examined in the small area estimation literature. The functional measurement error model assumes that the underlying true values of the covariates with measurement error are fixed but unknown quantities. The structural measurement error model assumes that these true values follow a model, leading to a multivariate model for the covariates observed with error and the original dependent variable. We compare and contrast these two models with the alternative of simply ignoring measurement error when it is present (naïve model), exploring the consequences for prediction mean squared errors of use of an incorrect model under different underlying assumptions about the true model. Comparisons done using analytic formulas for the mean squared errors assuming model parameters are known yield some surprising results. We also illustrate results with a model fitted to data from the U.S. Census Bureau’s Small Area Income and Poverty Estimates (SAIPE) Program.

    Release date: 2019-05-07
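The measurement error adjustment cited above (Ybarra and Lohr, 2008) can be sketched, from memory and in scalar form, as a modified shrinkage weight in which the covariate's sampling variance inflates the model variance; treat the exact formula here as an assumption rather than a statement of the paper's method:

```python
def me_shrinkage(y_i, x_i, beta, sigma_v2, C_i, D_i):
    """Area-level predictor accounting for sampling error in the covariate,
    in the spirit of the Ybarra-Lohr modification of Fay-Herriot shrinkage
    (sketch, single covariate). C_i is the covariate's error variance,
    D_i the direct estimate's variance, sigma_v2 the model variance."""
    gamma = (sigma_v2 + beta**2 * C_i) / (sigma_v2 + beta**2 * C_i + D_i)
    return gamma * y_i + (1.0 - gamma) * beta * x_i

# With C_i = 0 (no covariate error) this reduces to ordinary
# Fay-Herriot shrinkage toward the synthetic estimate beta * x_i.
pred = me_shrinkage(y_i=10.0, x_i=4.0, beta=2.0,
                    sigma_v2=1.0, C_i=0.25, D_i=1.0)
```

Note that covariate error pushes the weight toward the direct estimate, the opposite of what the naïve model would do.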

  • Articles and reports: 12-001-X201800254961
    Description:

    In business surveys, it is common to collect economic variables with highly skewed distribution. In this context, winsorization is frequently used to address the problem of influential values. In stratified simple random sampling, there are two methods for selecting the thresholds involved in winsorization. This article comprises two parts. The first reviews the notations and the concept of a winsorization estimator. The second part details the two methods and extends them to the case of Poisson sampling, and then compares them on simulated data sets and on the labour cost and structure of earnings survey carried out by INSEE.

    Release date: 2018-12-20
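The winsorization estimator reviewed in the first part of the article cuts influential values back to a threshold before weighting. A minimal sketch with hypothetical data; choosing the threshold is the subject of the two methods the article compares:

```python
def winsorized_total(values, weights, threshold):
    """Winsorized estimate of a total: reported values above the threshold
    are cut back to it before applying the design weights, limiting the
    impact of influential units at the cost of a small bias."""
    return sum(w * min(v, threshold) for v, w in zip(values, weights))

# One influential business (5000) is pulled back to the threshold 1000.
t = winsorized_total([120.0, 300.0, 5000.0], [10.0, 10.0, 10.0],
                     threshold=1000.0)
```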

  • Articles and reports: 12-001-X201700254888
    Description:

    We discuss developments in sample survey theory and methods covering the past 100 years. Neyman’s 1934 landmark paper laid the theoretical foundations for the probability sampling approach to inference from survey samples. Classical sampling books by Cochran, Deming, Hansen, Hurwitz and Madow, Sukhatme, and Yates, which appeared in the early 1950s, expanded and elaborated the theory of probability sampling, emphasizing unbiasedness, model free features, and designs that minimize variance for a fixed cost. During the period 1960-1970, theoretical foundations of inference from survey data received attention, with the model-dependent approach generating considerable discussion. Introduction of general purpose statistical software led to the use of such software with survey data, which led to the design of methods specifically for complex survey data. At the same time, weighting methods, such as regression estimation and calibration, became practical and design consistency replaced unbiasedness as the requirement for standard estimators. A bit later, computer-intensive resampling methods also became practical for large scale survey samples. Improved computer power led to more sophisticated imputation for missing data, use of more auxiliary data, some treatment of measurement errors in estimation, and more complex estimation procedures. A notable use of models was in the expanded use of small area estimation. Future directions in research and methods will be influenced by budgets, response rates, timeliness, improved data collection devices, and availability of auxiliary data, some of which will come from “Big Data”. Survey taking will be impacted by changing cultural behavior and by a changing physical-technical environment.

    Release date: 2017-12-21

  • Articles and reports: 12-001-X201600114546
    Description:

    Adjusting the base weights using weighting classes is a standard approach for dealing with unit nonresponse. A common approach is to create nonresponse adjustments that are weighted by the inverse of the assumed response propensity of respondents within weighting classes under a quasi-randomization approach. Little and Vartivarian (2003) questioned the value of weighting the adjustment factor. In practice the models assumed are misspecified, so it is critical to understand the impact that weighting might have in this case. This paper describes the effects on nonresponse adjusted estimates of means and totals for the population and domains computed using the weighted and unweighted inverse of the response propensities in stratified simple random sample designs. The performance of these estimators under different conditions such as different sample allocation, response mechanism, and population structure is evaluated. The findings show that for the scenarios considered the weighted adjustment has substantial advantages for estimating totals and using an unweighted adjustment may lead to serious biases except in very limited cases. Furthermore, unlike the unweighted estimates, the weighted estimates are not sensitive to how the sample is allocated.

    Release date: 2016-06-22
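The weighted versus unweighted contrast studied above comes down to how the adjustment factor is formed within each weighting class: from base-weight sums or from raw counts. A minimal sketch with a hypothetical class:

```python
def adjustment_factors(base_weights, responded):
    """Weighted and unweighted nonresponse adjustment factors for one
    weighting class. The weighted factor is the ratio of base-weight
    sums (all eligible units over respondents); the unweighted factor
    is the same ratio computed from counts only."""
    total_w = sum(base_weights)
    resp_w = sum(w for w, r in zip(base_weights, responded) if r)
    n = len(responded)
    r = sum(responded)
    return total_w / resp_w, n / r

# Hypothetical class: the heavy-weight unit did not respond, so the two
# factors diverge (3.0 weighted vs 1.5 unweighted).
weighted, unweighted = adjustment_factors([1.0, 1.0, 4.0],
                                          [True, True, False])
```

When response propensity is related to the base weights, as in this toy class, the two factors differ sharply, which is the mechanism behind the biases reported in the paper.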

  • Articles and reports: 12-001-X201500214238
    Description:

    Félix-Medina and Thompson (2004) proposed a variant of link-tracing sampling to sample hidden and/or hard-to-detect human populations such as drug users and sex workers. In their variant, an initial sample of venues is selected and the people found in the sampled venues are asked to name other members of the population to be included in the sample. Those authors derived maximum likelihood estimators of the population size under the assumption that the probability that a person is named by another in a sampled venue (link-probability) does not depend on the named person (homogeneity assumption). In this work we extend their research to the case of heterogeneous link-probabilities and derive unconditional and conditional maximum likelihood estimators of the population size. We also propose profile likelihood and bootstrap confidence intervals for the size of the population. The results of simulation studies we carried out show that, in the presence of heterogeneous link-probabilities, the proposed estimators perform reasonably well provided that relatively large sampling fractions, say larger than 0.5, are used, whereas the estimators derived under the homogeneity assumption perform badly. The outcomes also show that the proposed confidence intervals are not very robust to deviations from the assumed models.

    Release date: 2015-12-17

  • Articles and reports: 12-001-X201500114149
    Description:

    This paper introduces a general framework for deriving the optimal inclusion probabilities for a variety of survey contexts in which disseminating survey estimates of pre-established accuracy for a multiplicity of both variables and domains of interest is required. The framework can define either standard stratified or incomplete stratified sampling designs. The optimal inclusion probabilities are obtained by minimizing costs through an algorithm that guarantees the bounding of sampling errors at the domain level, assuming that the domain membership variables are available in the sampling frame. The target variables are unknown, but can be predicted with suitable super-population models. The algorithm properly takes this model uncertainty into account. Some experiments based on real data show the empirical properties of the algorithm.

    Release date: 2015-06-29
Journals and periodicals (0) (0 results)

No content available at this time.
