# Inference and foundations

## All (82) (20 to 30 of 82 results)

- 21. Modelling of complex survey data: Why model? Why is it a problem? How can we approach it? (Archived) Articles and reports: 12-001-X201100211602. Description:
This article attempts to answer the three questions appearing in the title. It starts by discussing unique features of complex survey data not shared by other data sets, which require special attention but suggest a large variety of diverse inference procedures. Next a large number of different approaches proposed in the literature for handling these features are reviewed with discussion on their merits and limitations. The approaches differ in the conditions underlying their use, additional data required for their application, goodness of fit testing, the inference objectives that they accommodate, statistical efficiency, computational demands, and the skills required from analysts fitting the model. The last part of the paper presents simulation results, which compare the approaches when estimating linear regression coefficients from a stratified sample in terms of bias, variance, and coverage rates. It concludes with a short discussion of pending issues.
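The simulation target described at the end of this abstract (linear regression coefficients estimated from a stratified sample) can be sketched briefly. The following is a minimal illustration with invented strata, sample sizes, and coefficients, contrasting unweighted OLS with design-weighted least squares; it is not the paper's simulation design.

```python
import numpy as np

rng = np.random.default_rng(7)

# Invented two-stratum population, sampled at very different rates,
# so the design weights (1 / inclusion probability) differ tenfold.
N = [9000, 1000]   # stratum population sizes
n = [90, 100]      # stratum sample sizes
xs, ys, ws = [], [], []
for h, (N_h, n_h) in enumerate(zip(N, n)):
    x = rng.normal(loc=2.0 * h, scale=1.0, size=n_h)
    y = 1.0 + 0.5 * x + rng.normal(scale=0.5, size=n_h)
    xs.append(x); ys.append(y); ws.append(np.full(n_h, N_h / n_h))

x, y, w = map(np.concatenate, (xs, ys, ws))
X = np.column_stack([np.ones_like(x), x])

# Unweighted OLS ignores the design; the design-weighted estimator
# solves the weighted normal equations (X'WX) b = X'Wy.
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
b_wtd = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
print("OLS:", b_ols.round(3), "design-weighted:", b_wtd.round(3))
```

Because the same linear model holds in both strata here, the two estimators agree closely; when the model differs across strata, the weighted estimator targets the census fit while unweighted OLS does not.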

Release date: 2011-12-21

- Surveys and statistical programs – Documentation: 12-001-X201100211603. Description:
In many sample surveys there are items requesting binary responses (e.g., obese, not obese) from a number of small areas. Inference is required about the probability of a positive response (e.g., obese) in each area, the probability being the same for all individuals in each area and different across areas. Because of the sparseness of the data within areas, direct estimators are not reliable, and there is a need to use data from other areas to improve inference for a specific area. Essentially, a priori the areas are assumed to be similar, and a hierarchical Bayesian model, the standard beta-binomial model, is a natural choice. The innovation is that a practitioner may have much-needed additional prior information about a linear combination of the probabilities. For example, a weighted average of the probabilities is a parameter, and information can be elicited about this parameter, thereby making the Bayesian paradigm appropriate. We have modified the standard beta-binomial model for small areas to incorporate the prior information on the linear combination of the probabilities, which we call a constraint. Thus, there are three cases. The practitioner (a) does not specify a constraint, (b) specifies a constraint and the parameter completely, and (c) specifies a constraint and information which can be used to construct a prior distribution for the parameter. The griddy Gibbs sampler is used to fit the models. To illustrate our method, we use an example on obesity of children in the National Health and Nutrition Examination Survey, in which the small areas are formed by crossing school (middle, high), ethnicity (white, black, Mexican) and gender (male, female). We use a simulation study to assess some of the statistical features of our method. We show that both (b) and (c) yield a gain in precision over (a), with the gain under (b) larger than under (c).
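The full model places priors on the hyperparameters and is fit with the griddy Gibbs sampler, but the core of case (a), with no constraint, can be illustrated with the hyperparameters held fixed, where the beta-binomial model is conjugate area by area. All counts and hyperparameter values below are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented counts: y "positive" responses (e.g., obese) out of n sampled
# in each of six small areas; a Beta(a, b) prior is shared across areas.
y = np.array([3, 1, 7, 0, 5, 2])
n = np.array([20, 15, 30, 10, 25, 18])
a, b = 2.0, 8.0  # illustrative fixed hyperparameters

# With the hyperparameters fixed, p_i | y_i ~ Beta(a + y_i, b + n_i - y_i):
# each area is shrunk toward the common prior mean a / (a + b) instead of
# relying on the sparse direct estimate y_i / n_i alone.
direct = y / n
post_mean = (a + y) / (a + b + n)
print("direct:   ", np.round(direct, 3))
print("posterior:", np.round(post_mean, 3))

# Posterior draws also give inference for a weighted average of the p_i's,
# the kind of linear combination the paper's constraint concerns.
draws = rng.beta(a + y, b + n - y, size=(4000, len(y)))
wts = n / n.sum()
print("weighted avg, posterior mean:", float((draws @ wts).mean().round(3)))
```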

Release date: 2011-12-21

- Articles and reports: 12-001-X201100111446. Description:
Small area estimation based on linear mixed models can be inefficient when the underlying relationships are non-linear. In this paper we introduce SAE techniques for variables that can be modelled linearly following a non-linear transformation. In particular, we extend the model-based direct estimator of Chandra and Chambers (2005, 2009) to data that are consistent with a linear mixed model in the logarithmic scale, using model calibration to define appropriate weights for use in this estimator. Our results show that the resulting transformation-based estimator is both efficient and robust with respect to the distribution of the random effects in the model. An application to business survey data demonstrates the satisfactory performance of the method.

Release date: 2011-06-29

- Articles and reports: 12-001-X201100111451. Description:
In the calibration method proposed by Deville and Särndal (1992), the calibration equations take only exact estimates of auxiliary variable totals into account. This article examines other parameters besides totals for calibration. Parameters that are considered complex include the ratio, median or variance of auxiliary variables.

Release date: 2011-06-29

- Surveys and statistical programs – Documentation: 12-001-X201000111250. Description:
We propose a Bayesian Penalized Spline Predictive (BPSP) estimator for a finite population proportion in an unequal probability sampling setting. This new method allows the probabilities of inclusion to be directly incorporated into the estimation of a population proportion, using a probit regression of the binary outcome on the penalized spline of the inclusion probabilities. The posterior predictive distribution of the population proportion is obtained using Gibbs sampling. The advantages of the BPSP estimator over the Hájek (HK), Generalized Regression (GR), and parametric model-based prediction estimators are demonstrated by simulation studies and a real example in tax auditing. Simulation studies show that the BPSP estimator is more efficient, and its 95% credible interval provides better confidence coverage with shorter average width than the HK and GR estimators, especially when the population proportion is close to zero or one or when the sample is small. Compared to linear model-based predictive estimators, the BPSP estimators are robust to model misspecification and influential observations in the sample.
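The BPSP estimator itself requires a probit spline and Gibbs sampling, which is beyond a short sketch, but the Hájek (HK) baseline it is compared against is simple to state: a normalized inverse-probability-weighted proportion. The sample, inclusion probabilities, and the rule linking outcome to inclusion probability below are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented unequal-probability sample: inclusion probabilities pi are known,
# and the binary outcome y happens to depend on pi (an informative design).
pi = rng.uniform(0.05, 0.5, size=200)
y = (pi > 0.3).astype(float)  # toy rule linking outcome to inclusion prob

# Hajek estimator: normalized inverse-probability weighting of the outcome.
w = 1.0 / pi
p_hajek = np.sum(w * y) / np.sum(w)
p_naive = y.mean()  # ignores the design, so it over-represents high-pi units
print("naive:", round(float(p_naive), 3), "Hajek:", round(float(p_hajek), 3))
```

Since units with large pi are over-represented in the sample and here are exactly the y = 1 units, the naive proportion is biased upward and the Hájek estimator corrects it.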

Release date: 2010-06-29

- Articles and reports: 11-536-X200900110806. Description:
Recent work using a pseudo empirical likelihood (EL) method for finite population inferences with complex survey data focused primarily on a single survey sample, non-stratified or stratified, with considerable effort devoted to computational procedures. In this talk we present a pseudo empirical likelihood approach to inference from multiple surveys and multiple-frame surveys, two commonly encountered problems in survey practice. We show that inferences about the common parameter of interest and the effective use of various types of auxiliary information can be conveniently carried out through the constrained maximization of the joint pseudo EL function. We obtain asymptotic results which are used for constructing the pseudo EL ratio confidence intervals, either using a chi-square approximation or a bootstrap calibration. All related computational problems can be handled using existing algorithms on stratified sampling after suitable re-formulation.

Release date: 2009-08-11

- 27. A Bayesian allocation of undecided voters (Archived) Articles and reports: 12-001-X200800110606. Description:
Data from election polls in the US are typically presented in two-way categorical tables, and there are many polls before the actual election in November. For example, in the Buckeye State Poll in 1998 for governor there are three polls, January, April and October; the first category represents the candidates (e.g., Fisher, Taft and other) and the second category represents the current status of the voters (likely to vote and not likely to vote for governor of Ohio). There is a substantial number of undecided voters for one or both categories in all three polls, and we use a Bayesian method to allocate the undecided voters to the three candidates. This method permits modeling different patterns of missingness under ignorable and nonignorable assumptions, and a multinomial-Dirichlet model is used to estimate the cell probabilities which can help to predict the winner. We propose a time-dependent nonignorable nonresponse model for the three tables. Here, a nonignorable nonresponse model is centered on an ignorable nonresponse model to induce some flexibility and uncertainty about ignorability or nonignorability. As competitors we also consider two other models, an ignorable and a nonignorable nonresponse model. These latter two models assume a common stochastic process to borrow strength over time. Markov chain Monte Carlo methods are used to fit the models. We also construct a parameter that can potentially be used to predict the winner among the candidates in the November election.
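The paper's model is time-dependent and nonignorable; as a simplified illustration, here is the ignorable multinomial-Dirichlet base case that the nonignorable model is centered on, with invented poll counts. Each posterior draw of the candidate shares allocates the undecided voters multinomially.

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented poll: decided counts for three candidates, plus undecided voters.
decided = np.array([310, 280, 60])
undecided = 150
alpha = np.ones(3)  # flat Dirichlet prior on the cell probabilities

# Under an ignorable model the undecided are missing at random: the candidate
# shares have a Dirichlet(alpha + decided) posterior, and each posterior draw
# allocates the undecided voters multinomially with those probabilities.
theta = rng.dirichlet(alpha + decided, size=5000)
alloc = np.array([rng.multinomial(undecided, p) for p in theta])
totals = decided + alloc
shares = totals / totals.sum(axis=1, keepdims=True)
print("posterior mean shares:", shares.mean(axis=0).round(3))
print("P(candidate 1 leads): ",
      float((shares[:, 0] > shares[:, 1:].max(axis=1)).mean()))
```

The last quantity plays the role of the "predict the winner" parameter: the posterior probability that the first candidate's completed-table share exceeds the others.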

Release date: 2008-06-26

- Articles and reports: 11-522-X200600110392. Description:
We use a robust Bayesian method to analyze data with possibly nonignorable nonresponse and selection bias. A robust logistic regression model is used to relate the response indicators (Bernoulli random variables) to the covariates, which are available for everyone in the finite population. This relationship can adequately explain the difference between respondents and nonrespondents for the sample. This robust model is obtained by expanding the standard logistic regression model to a mixture of Student's t distributions, thereby providing propensity scores (selection probabilities) which are used to construct adjustment cells. The nonrespondents' values are filled in by drawing a random sample from a kernel density estimator, formed from the respondents' values within the adjustment cells. Prediction uses a linear spline rank-based regression of the response variable on the covariates by areas, sampling the errors from another kernel density estimator, thereby further robustifying our method. We use Markov chain Monte Carlo (MCMC) methods to fit our model. The posterior distribution of a quantile of the response variable is obtained within each sub-area using the order statistic over all the individuals (sampled and nonsampled). We compare our robust method with recent parametric methods.

Release date: 2008-03-17

- Articles and reports: 11-522-X200600110398. Description:
The study of longitudinal data is vital in terms of accurately observing changes in responses of interest for individuals, communities, and larger populations over time. Linear mixed effects models (for continuous responses observed over time) and generalized linear mixed effects models and generalized estimating equations (for more general responses such as binary or count data observed over time) are the most popular techniques used for analyzing longitudinal data from health studies, though, as with all modeling techniques, these approaches have limitations, partly due to their underlying assumptions. In this review paper, we will discuss some advances, including curve-based techniques, which make modeling longitudinal data more flexible. Three examples will be presented from the health literature utilizing these more flexible procedures, with the goal of demonstrating that some otherwise difficult questions can be reasonably answered when analyzing complex longitudinal data in population health studies.

Release date: 2008-03-17

- Articles and reports: 11-522-X200600110419. Description:
Health services research generally relies on observational data to compare outcomes of patients receiving different therapies. Comparisons of patient groups in observational studies may be biased, in that outcomes differ due to both the effects of treatment and the effects of patient prognosis. In some cases, especially when data are collected on detailed clinical risk factors, these differences can be controlled for using statistical or epidemiological methods. In other cases, when unmeasured characteristics of the patient population affect both the decision to provide therapy and the outcome, these differences cannot be removed using standard techniques. Use of health administrative data requires particular caution in undertaking observational studies since important clinical information does not exist. We discuss several statistical and epidemiological approaches to remove overt (measurable) and hidden (unmeasurable) bias in observational studies. These include regression model-based case-mix adjustment, propensity-based matching, redefining the exposure variable of interest, and the econometric technique of instrumental variable (IV) analysis. These methods are illustrated using examples from the medical literature including prediction of one-year mortality following heart attack; the return to health care spending in higher spending U.S. regions in terms of clinical and financial benefits; and the long-term survival benefits of invasive cardiac management of heart attack patients. It is possible to use health administrative data for observational studies provided careful attention is paid to addressing issues of reverse causation and unmeasured confounding.

Release date: 2008-03-17



## Analysis (69) (50 to 60 of 69 results)

- Articles and reports: 11-522-X19990015654. Description:
A meta-analysis was performed to estimate the proportion of liver carcinogens, the proportion of chemicals carcinogenic at any site, and the corresponding proportion of anticarcinogens among chemicals tested in 397 long-term cancer bioassays conducted by the U.S. National Toxicology Program. Although the estimator used was negatively biased, the study provided persuasive evidence for a larger proportion of liver carcinogens (0.43; 90% CI: 0.35, 0.51) than was identified by the NTP (0.28). A larger proportion of chemicals carcinogenic at any site was also estimated (0.59; 90% CI: 0.49, 0.69) than was identified by the NTP (0.51), although this excess was not statistically significant. A larger proportion of anticarcinogens (0.66) was estimated than carcinogens (0.59). Despite the negative bias, it was estimated that 85% of the chemicals were either carcinogenic or anticarcinogenic at some site in some sex-species group. This suggests that most chemicals tested at high enough doses will cause some sort of perturbation in tumor rates.

Release date: 2000-03-02

- Articles and reports: 92F0138M2000003. Description:
Statistics Canada's interest in a common delineation of the north for statistical analysis purposes evolved from research to devise a classification to further differentiate the largely rural and remote areas that make up 96% of Canada's land area. That research led to the establishment of the census metropolitan area and census agglomeration influenced zone (MIZ) concept. When applied to census subdivisions, the MIZ categories did not work as well in northern areas as in the south. Therefore, the Geography Division set out to determine a north-south divide that would differentiate the north from the south independent of any standard geographic area boundaries.

This working paper describes the methodology used to define a continuous line across Canada to separate the north from the south, as well as lines marking transition zones on both sides of the north-south line. It also describes the indicators selected to derive the north-south line and makes comparisons to alternative definitions of the north. The resulting classification of the north complements the MIZ classification. Together, census metropolitan areas, census agglomerations, MIZ and the North form a new Statistical Area Classification (SAC) for Canada.

Two related Geography working papers (catalogue no. 92F0138MPE) provide further details about the MIZ classification. Working paper no. 2000-1 (92F0138MPE00001) briefly describes MIZ and includes tables of selected socio-economic characteristics from the 1991 Census tabulated by the MIZ categories, and working paper no. 2000-2 (92F0138MPE00002) describes the methodology used to define the MIZ classification.

Release date: 2000-02-03

- 53. Comparative Study of Analytical Consumer Price Indexes for Different Subgroups of the Reference Population (Archived) Articles and reports: 62F0014M1998013. Geography: Canada. Description:
The reference population for the Consumer Price Index (CPI) has been represented, since the 1992 updating of the basket of goods and services, by families and unattached individuals living in private urban or rural households. The official CPI is a measure of the average percentage change over time in the cost of a fixed basket of goods and services purchased by Canadian consumers.

Because of the broadly defined target population of the CPI, the measure has been criticised for failing to reflect the inflationary experiences of certain socio-economic groups. This study examines this question for three sub-groups of the reference population of the CPI. It is an extension of earlier studies on the subject done at Statistics Canada.

In this document, analytical consumer price indexes for the sub-groups are compared to the analytical index for the whole reference population, calculated at the national geographic level.

The findings tend to support those of earlier Statistics Canada studies on sub-groups in the CPI reference population. Those studies have consistently concluded that a consumer price index established for a given sub-group does not differ substantially from the index for the whole reference population.

Release date: 1999-05-13

- 54. Inverse sampling design algorithms (Archived) Surveys and statistical programs – Documentation: 12-001-X19970013101. Description:
In the main body of statistics, sampling is often disposed of by assuming a sampling process that selects random variables such that they are independent and identically distributed (IID). Important techniques, like regression and contingency table analysis, were developed largely in the IID world; hence, adjustments are needed to use them in complex survey settings. Rather than adjust the analysis, however, what is new in the present formulation is to draw a second sample from the original sample. In this second sample, the first set of selections are inverted, so as to yield at the end a simple random sample. Of course, to employ this two-step process to draw a single simple random sample from the usually much larger complex survey would be inefficient, so multiple simple random samples are drawn and a way to base inferences on them is developed. Not all original samples can be inverted, but many practical special cases are discussed which cover a wide range of practices.

Release date: 1997-08-18

- Surveys and statistical programs – Documentation: 12-001-X19970013102. Description:
The selection of auxiliary variables is considered for regression estimation in finite populations under a simple random sampling design. This problem is a basic one for model-based and model-assisted survey sampling approaches and is of practical importance when the number of variables available is large. An approach is developed in which a mean squared error estimator is minimised. This approach is compared to alternative approaches using a fixed set of auxiliary variables, a conventional significance test criterion, a condition number reduction approach and a ridge regression approach. The proposed approach is found to perform well in terms of efficiency. It is noted that the variable selection approach affects the properties of standard variance estimators and thus leads to a problem of variance estimation.

Release date: 1997-08-18

- 56. A transformation method for finite population sampling calibrated with empirical likelihood (Archived) Surveys and statistical programs – Documentation: 12-001-X19960022980. Description:
In this paper, we study a confidence interval estimation method for a finite population average when some auxiliary information is available. As demonstrated by Royall and Cumberland in a series of empirical studies, naive use of existing methods to construct confidence intervals for population averages may result in very poor conditional coverage probabilities, conditional on the sample mean of the covariate. When this happens, we propose to transform the data to improve the precision of the normal approximation. The transformed data are then used to make inference on the original population average, and the auxiliary information is incorporated into the inference directly, or by calibration with empirical likelihood. Our approach is design-based. We apply our approach to six real populations and find that when transformation is needed, our approach performs well compared to the usual regression method.

Release date: 1997-01-30

- Articles and reports: 91F0015M1996001. Geography: Canada. Description:
This paper describes the methodology for fertility projections used in the 1993-based population projections by age and sex for Canada, provinces and territories, 1993-2016. A new version of the parametric model known as the Pearsonian Type III curve was applied to project the fertility age pattern. The Pearsonian Type III model is considered an improvement over the Type I curve used in past projections, because the Type III curve better portrays both the distribution of the age-specific fertility rates and the estimates of births. Since the 1993-based population projections are the first official projections to incorporate the net census undercoverage in the population base, it has been necessary to recalculate fertility rates based on the adjusted population estimates. This recalculation resulted in lowering the historical series of age-specific and total fertility rates, 1971-1993. The three sets of fertility assumptions and projections were developed with these adjusted annual fertility rates.

It is hoped that this paper will provide valuable information about the technical and analytical aspects of the current fertility projection model. Discussions on the current and future levels and age pattern of fertility in Canada, provinces and territories are also presented in the paper.

Release date: 1996-08-02

- 58. Multiple sample estimation of population and census undercount in the presence of matching errors (Archived) Articles and reports: 12-001-X199600114385. Description:
The multiple capture-recapture census is reconsidered by relaxing the traditional perfect matching assumption. We propose matching error models to characterize error-prone matching mechanisms. The observed data take the form of an incomplete 2^k contingency table with one missing cell and follow a multinomial distribution. We develop a procedure for the estimation of the population size. Our approach applies to both standard log-linear models for contingency tables and log-linear models for heterogeneity of catchability. We illustrate the method and estimation using a 1988 dress rehearsal study for the 1990 census conducted by the U.S. Bureau of the Census.
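The classical baseline that this paper relaxes is the perfect-matching, two-list capture-recapture (Lincoln-Petersen) estimator; a toy example with invented counts shows the idea the matching-error models generalize.

```python
# Invented two-list counts under the classical perfect-matching assumption.
n1 = 1200  # units captured by the first list
n2 = 1000  # units captured by the second list
m = 400    # units matched in both lists

# Lincoln-Petersen: with independent lists, m / n2 estimates n1 / N,
# so the population size estimate is N_hat = n1 * n2 / m.
N_hat = n1 * n2 / m
print(N_hat)  # 3000.0
```

With matching errors, m is misstated, which is why the paper replaces the perfect-matching assumption with explicit matching-error models and a multinomial likelihood over the incomplete contingency table.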

Release date: 1996-06-14

- 59. Design effects for correlated (P_i - P_j) (Archived) Articles and reports: 12-001-X199500214398. Description:
We present empirical evidence from 14 surveys in six countries concerning the existence and magnitude of design effects (defts) for five designs of two major types. The first type concerns deft(p_i - p_j), the difference of two proportions from a polytomous variable of three or more categories. The second type uses chi-square tests for differences from two samples. We find that for all variables in all designs, deft(p_i - p_j) ≈ [deft(p_i) + deft(p_j)] / 2 is a good approximation. These are empirical results, and exceptions disprove the existence of mere analytical inequalities. These results hold despite great variations of defts between variables and also between categories of the same variables. They also show the need for sample survey treatment of survey data even for analytical statistics. Furthermore they permit useful approximations of deft(p_i - p_j) from more accessible deft(p_i) values.
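The approximation is easy to apply in practice. A worked example with invented deft and proportion values (the simple-random-sampling variance of a difference of two multinomial proportions, Var = [p_i + p_j - (p_i - p_j)^2] / n, is standard):

```python
import math

# Invented category defts and proportions for illustration.
deft_pi, deft_pj = 1.40, 1.10
p_i, p_j, n = 0.45, 0.30, 2500

# The paper's empirical approximation: the deft of a difference of
# proportions is roughly the average of the two category defts.
deft_diff = (deft_pi + deft_pj) / 2
print(deft_diff)  # 1.25

# Scale the SRS standard error of p_i - p_j by the approximated deft
# to get a design-corrected standard error.
se_srs = math.sqrt((p_i + p_j - (p_i - p_j) ** 2) / n)
se_design = deft_diff * se_srs
print(round(se_design, 4))
```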

Release date: 1995-12-15

- 60. Median estimation using auxiliary information (Archived) Articles and reports: 12-001-X199500114408. Description:
The problem of estimating the median of a finite population when an auxiliary variable is present is considered. Point and interval estimators based on a non-informative Bayesian approach are proposed. The point estimator is compared to other possible estimators and is seen to perform well in a variety of situations.

Release date: 1995-06-15


## Reference (16) (10 to 16 of 16 results)

- Surveys and statistical programs – Documentation: 11-522-X19990015650. Description:
The U.S. Manufacturing Plant Ownership Change Database (OCD) was constructed using plant-level data taken from the Census Bureau's Longitudinal Research Database (LRD). It contains data on all manufacturing plants that have experienced ownership change at least once during the period 1963-92. This paper reports the status of the OCD and discusses its research possibilities. For an empirical demonstration, data taken from the database are used to study the effects of ownership changes on plant closure.

Release date: 2000-03-02

- Surveys and statistical programs – Documentation: 11-522-X19990015658. Description:
Radon, a naturally occurring gas found at some level in most homes, is an established risk factor for human lung cancer. The U.S. National Research Council (1999) has recently completed a comprehensive evaluation of the health risks of residential exposure to radon, and developed models for projecting radon lung cancer risks in the general population. This analysis suggests that radon may play a role in the etiology of 10-15% of all lung cancer cases in the United States, although these estimates are subject to considerable uncertainty. In this article, we present a partial analysis of uncertainty and variability in estimates of lung cancer risk due to residential exposure to radon in the United States using a general framework for the analysis of uncertainty and variability that we have developed previously. Specifically, we focus on estimates of the age-specific excess relative risk (ERR) and lifetime relative risk (LRR), both of which vary substantially among individuals.

Release date: 2000-03-02

- Geographic files and documentation: 92F0138M1993001. Geography: Canada. Description:
The Geography Divisions of Statistics Canada and the U.S. Bureau of the Census have commenced a cooperative research program in order to foster an improved and expanded perspective on geographic areas and their relevance. One of the major objectives is to determine a common geographic area to form a geostatistical basis for cross-border research, analysis and mapping.

This report, which represents the first stage of the research, provides a list of comparable pairs of Canadian and U.S. standard geographic areas based on current definitions. Statistics Canada and the U.S. Bureau of the Census have two basic types of standard geographic entities: legislative/administrative areas (called "legal" entities in the U.S.) and statistical areas.

The preliminary pairing of geographic areas is based on face-value definitions only. The definitions are based on the June 4, 1991 Census of Population and Housing for Canada and the April 1, 1990 Census of Population and Housing for the U.S.A. The important aspect is the overall conceptual comparability, not the precise numerical thresholds used for delineating the areas.

Data users should use this report as a general guide to compare the census geographic areas of Canada and the United States, and should be aware that differences in settlement patterns and population levels preclude a precise one-to-one relationship between conceptually similar areas. The geographic areas compared in this report provide a framework for further empirical research and analysis.

Release date: 1999-03-05

- 14. Inverse sampling design algorithms (Archived) Surveys and statistical programs – Documentation: 12-001-X19970013101. Description:
In the main body of statistics, sampling is often disposed of by assuming a sampling process that selects random variables such that they are independent and identically distributed (IID). Important techniques, like regression and contingency table analysis, were developed largely in the IID world; hence, adjustments are needed to use them in complex survey settings. Rather than adjust the analysis, however, what is new in the present formulation is to draw a second sample from the original sample. In this second sample, the first set of selections are inverted, so as to yield at the end a simple random sample. Of course, to employ this two-step process to draw a single simple random sample from the usually much larger complex survey would be inefficient, so multiple simple random samples are drawn and a way to base inferences on them is developed. Not all original samples can be inverted, but many practical special cases are discussed which cover a wide range of practices.

Release date: 1997-08-18

- Surveys and statistical programs – Documentation: 12-001-X19970013102. Description:
The selection of auxiliary variables is considered for regression estimation in finite populations under a simple random sampling design. This problem is a basic one for model-based and model-assisted survey sampling approaches and is of practical importance when the number of variables available is large. An approach is developed in which a mean squared error estimator is minimised. This approach is compared to alternative approaches using a fixed set of auxiliary variables, a conventional significance test criterion, a condition number reduction approach and a ridge regression approach. The proposed approach is found to perform well in terms of efficiency. It is noted that the variable selection approach affects the properties of standard variance estimators and thus leads to a problem of variance estimation.

Release date: 1997-08-18

- 16. A transformation method for finite population sampling calibrated with empirical likelihood (Archived) Surveys and statistical programs – Documentation: 12-001-X19960022980. Description:
In this paper, we study a confidence interval estimation method for a finite population average when some auxiliary information is available. As demonstrated by Royall and Cumberland in a series of empirical studies, naive use of existing methods to construct confidence intervals for population averages may result in very poor conditional coverage probabilities, conditional on the sample mean of the covariate. When this happens, we propose to transform the data to improve the precision of the normal approximation. The transformed data are then used to make inference on the original population average, and the auxiliary information is incorporated into the inference directly, or by calibration with empirical likelihood. Our approach is design-based. We apply our approach to six real populations and find that when transformation is needed, our approach performs well compared to the usual regression method.

Release date: 1997-01-30
