Inference and foundations

Filter results by

Search Help
Currently selected filters that can be removed

Keyword(s)

Geography

1 facets displayed. 0 facets selected.

Survey or statistical program

2 facets displayed. 0 facets selected.

Content

1 facets displayed. 0 facets selected.
Sort Help
entries

Results

All (82)

All (82) (70 to 80 of 82 results)

  • Articles and reports: 12-001-X199600114385
    Description:

    The multiple capture-recapture census is reconsidered by relaxing the traditional perfect matching assumption. We propose matching error models to characterize error-prone matching mechanisms. The observed data take the form of an incomplete 2^k contingency table with one missing cell and follow a multinomial distribution. We develop a procedure for the estimation of the population size. Our approach applies to both standard log-linear models for contingency tables and log-linear models for heterogeneity of catchability. We illustrate the method and estimation using a 1988 dress rehearsal study for the 1990 census conducted by the U.S. Bureau of the Census.

    Release date: 1996-06-14

  • Articles and reports: 12-001-X199500214398
    Description:

    We present empirical evidence from 14 surveys in six countries concerning the existence and magnitude of design effects (defts) for five designs of two major types. The first type concerns deft (p_i – p_j), the difference of two proportions from a polytomous variable of three or more categories. The second type uses Chi-square tests for differences from two samples. We find that for all variables in all designs deft (p_i – p_j) \cong [deft (p_i) + deft (p_j)] / 2 are good approximations. These are empirical results, and exceptions disprove the existence of mere analytical inequalities. These results hold despite great variations of defts between variables and also between categories of the same variables. They also show the need for sample survey treatment of survey data even for analytical statistics. Furthermore they permit useful approximations of deft (p_i – p_j) from more accessible deft (p_i) values.

    Release date: 1995-12-15

  • Articles and reports: 12-001-X199500114408
    Description:

    The problem of estimating the median of a finite population when an auxiliary variable is present is considered. Point and interval estimators based on a non-informative Bayesian approach are proposed. The point estimator is compared to other possible estimators and is seen to perform well in a variety of situations.

    Release date: 1995-06-15

  • Articles and reports: 12-001-X199200214487
    Description:

    This paper reviews the idea of robustness for randomisation and model-based inference for descriptive and analytic surveys. The lack of robustness for model-based procedures can be partially overcome by careful design. In this paper a robust model-based approach to analysis is proposed based on smoothing methods.

    Release date: 1992-12-15

  • Articles and reports: 12-001-X199200214488
    Description:

    In many finite population sampling problems the design that is optimal in the sense of minimizing the variance of the best linear unbiased estimator under a particular working model is bad in the sense of robustness - it leaves the estimator extremely vulnerable to bias if the working model is incorrect. However there are some important models under which one design provides both efficiency and robustness. We present a theorem that identifies such models and their optimal designs.

    Release date: 1992-12-15

  • Articles and reports: 12-001-X199100214504
    Description:

    Simple or marginal quota surveys are analyzed using two methods: (1) behaviour modelling (superpopulation model) and prediction estimation, and (2) sample modelling (simple restricted random sampling) and estimation derived from the sample distribution. In both cases the limitations of the theory used to establish the variance formulas and estimates when measuring totals are described. An extension of the quota method (non-proportional quotas) is also briefly described and analyzed. In some cases, this may provide a very significant improvement in survey precision. The advantages of the quota method are compared with those of random sampling. The latter remains indispensable in the case of large scale surveys within the framework of Official Statistics.

    Release date: 1991-12-16

  • Articles and reports: 12-001-X199100114521
    Description:

    Marginal and approximate conditional likelihoods are given for the correlation parameters in a normal linear regression model with correlated errors. This general likelihood approach is applied to obtain marginal and approximate conditional likelihoods for the correlation parameters in sampling on successive occasions under both simple random sampling on each occasion and more complex surveys.

    Release date: 1991-06-14

  • Articles and reports: 12-001-X199000114560
    Description:

    Early developments in sampling theory and methods largely concentrated on efficient sampling designs and associated estimation techniques for population totals or means. More recently, the theoretical foundations of survey based estimation have also been critically examined, and formal frameworks for inference on totals or means have emerged. During the past 10 years or so, rapid progress has also been made in the development of methods for the analysis of survey data that take account of the complexity of the sampling design. The scope of this paper is restricted to an overview and appraisal of some of these developments.

    Release date: 1990-06-15

  • Articles and reports: 12-001-X198900214568
    Description:

    The paper describes a Monte Carlo study of simultaneous confidence interval procedures for k > 2 proportions, under a model of two-stage cluster sampling. The procedures investigated include: (i) standard multinomial intervals; (ii) Scheffé intervals based on sample estimates of the variances of cell proportions; (iii) Quesenberry-Hurst intervals adapted for clustered data using Rao and Scott’s first and second order adjustments to X^2; (iv) simple Bonferroni intervals; (v) Bonferroni intervals based on transformations of the estimated proportions; (vi) Bonferroni intervals computed using the critical points of Student’s t. In several realistic situations, actual coverage rates of the multinomial procedures were found to be seriously depressed compared to the nominal rate. The best performing intervals, from the point of view of coverage rates and coverage symmetry (an extension of an idea due to Jennings), were the t-based Bonferroni intervals derived using log and logit transformations. Of the Scheffé-like procedures, the best performance was provided by Quesenberry-Hurst intervals in combination with first-order Rao-Scott adjustments.

    Release date: 1989-12-15

  • Articles and reports: 12-001-X198500114364
    Description:

    Conventional methods of inference in survey sampling are critically examined. The need for conditioning the inference on recognizable subsets of the population is emphasized. A number of real examples involving random sample sizes are presented to illustrate inferences conditional on the realized sample configuration and associated difficulties. The examples include the following: estimation of (a) population mean under simple random sampling; (b) population mean in the presence of outliers; (c) domain total and domain mean; (d) population mean with two-way stratification; (e) population mean in the presence of non-responses; (f) population mean under general designs. The conditional bias and the conditional variance of estimators of a population mean (or a domain mean or total), and the associated confidence intervals, are examined.

    Release date: 1985-06-14
Data (0)

Data (0) (0 results)

No content available at this time.

Analysis (69)

Analysis (69) (30 to 40 of 69 results)

  • Articles and reports: 11-522-X20030017700
    Description:

    This paper suggests a useful framework for exploring the effects of moderate deviations from idealized conditions. It offers evaluation criteria for point estimators and interval estimators.

    Release date: 2005-01-26

  • Articles and reports: 11-522-X20030017722
    Description:

    This paper shows how to adapt design-based and model-based frameworks to the case of two-stage sampling.

    Release date: 2005-01-26

  • Articles and reports: 11-522-X20020016708
    Description:

    In this paper, we discuss the analysis of complex health survey data by using multivariate modelling techniques. Main interests are in various design-based and model-based methods that aim at accounting for the design complexities, including clustering, stratification and weighting. Methods covered include generalized linear modelling based on pseudo-likelihood and generalized estimating equations, linear mixed models estimated by restricted maximum likelihood, and hierarchical Bayes techniques using Markov Chain Monte Carlo (MCMC) methods. The methods will be compared empirically, using data from an extensive health interview and examination survey conducted in Finland in 2000 (Health 2000 Study).

    The data of the Health 2000 Study were collected using personal interviews, questionnaires and clinical examinations. A stratified two-stage cluster sampling design was used in the survey. The sampling design involved positive intra-cluster correlation for many study variables. For a closer investigation, we selected a small number of study variables from the health interview and health examination phases. In many cases, the different methods produced similar numerical results and supported similar statistical conclusions. Methods that failed to account for the design complexities sometimes led to conflicting conclusions. We also discuss the application of the methods in this paper by using standard statistical software products.

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016717
    Description:

    In the United States, the National Health and Nutrition Examination Survey (NHANES) is linked to the National Health Interview Survey (NHIS) at the primary sampling unit level (the same counties, but not necessarily the same persons, are in both surveys). The NHANES examines about 5,000 persons per year, while the NHIS samples about 100,000 persons per year. In this paper, we present and develop properties of models that allow NHIS and administrative data to be used as auxiliary information for estimating quantities of interest in the NHANES. The methodology, related to Fay-Herriot (1979) small-area models and to calibration estimators in Deville and Sarndal (1992), accounts for the survey designs in the error structure.

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016719
    Description:

    This study takes a look at the modelling methods used for public health data. Public health has a renewed interest in the impact of the environment on health. Ecological or contextual studies ideally investigate these relationships using public health data augmented with environmental characteristics in multilevel or hierarchical models. In these models, individual respondents in health data are the first level and community data are the second level. Most public health data use complex sample survey designs, which require analyses accounting for the clustering, nonresponse, and poststratification to obtain representative estimates of prevalence of health risk behaviours.

    This study uses the Behavioral Risk Factor Surveillance System (BRFSS), a state-specific US health risk factor surveillance system conducted by the Center for Disease Control and Prevention, which assesses health risk factors in over 200,000 adults annually. BRFSS data are now available at the metropolitan statistical area (MSA) level and provide quality health information for studies of environmental effects. MSA-level analyses combining health and environmental data are further complicated by joint requirements of the survey sample design and the multilevel analyses.

    We compare three modelling methods in a study of physical activity and selected environmental factors using BRFSS 2000 data. Each of the methods described here is a valid way to analyse complex sample survey data augmented with environmental information, although each accounts for the survey design and multilevel data structure in a different manner and is thus appropriate for slightly different research questions.

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016727
    Description:

    The census data are widely used in the distribution and targeting of resources at national, regional and local levels. In the United Kingdom (UK), a population census is conducted every 10 years. As time elapses, the census data become outdated and less relevant, thus making the distribution of resources less equitable. This paper examines alternative methods in rectifying this.

    A number of small area methods have been developed for producing postcensal estimates, including the Structural Preserving Estimation technique as a result of Purcell and Kish (1980). This paper develops an alternative approach that is based on a linear mixed modelling approach to producing postcensal estimates. The validity of the methodology is tested on simulated data from the Finnish population register and the technique is applied to producing updated estimates for a number of the 1991 UK census variables.

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016730
    Description:

    A wide class of models of interest in social and economic research can be represented by specifying a parametric structure for the covariances of observed variables. The availability of software, such as LISREL (Jöreskog and Sörbom 1988) and EQS (Bentler 1995), has enabled these models to be fitted to survey data in many applications. In this paper, we consider approaches to inference about such models using survey data derived by complex sampling schemes. We consider evidence of finite sample biases in parameter estimation and ways to reduce such biases (Altonji and Segal 1996) and associated issues of efficiency of estimation, standard error estimation and testing. We use longitudinal data from the British Household Panel Survey for illustration. As these data are subject to attrition, we also consider the issue of how to use nonresponse weights in the modelling.

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016731
    Description:

    Behavioural researchers use a variety of techniques to predict respondent scores on constructs that are not directly observable. Examples of such constructs include job satisfaction, work stress, aptitude for graduate study, children's mathematical ability, etc. The techniques commonly used for modelling and predicting scores on such constructs include factor analysis, classical psychometric scaling and item response theory (IRT), and for each technique there are often several different strategies that can be used to generate individual scores. However, researchers are seldom satisfied with simply measuring these constructs. They typically use the derived scores in multiple regression, analysis of variance and numerous multivariate procedures. Though using predicted scores in this way can result in biased estimates of model parameters, not all researchers are aware of this difficulty. The paper will review the literature on this issue, with particular emphasis on IRT methods. Problems will be illustrated, some remedies suggested, and areas for further research will be identified.

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016733
    Description:

    While censuses and surveys are often said to measure populations as they are, most reflect information about individuals as they were at the time of measurement, or even at some prior time point. Inferences from such data therefore should take into account change over time at both the population and individual levels. In this paper, we provide a unifying framework for such inference problems, illustrating it through a diverse series of examples including: (1) estimating residency status on Census Day using multiple administrative records, (2) combining administrative records for estimating the size of the US population, (3) using rolling averages from the American Community Survey, and (4) estimating the prevalence of human rights abuses.

    Specifically, at the population level, the estimands of interest, such as the size or mean characteristics of a population, might be changing. At the same time, individual subjects might be moving in and out of the frame of the study or changing their characteristics. Such changes over time can affect statistical studies of government data that combine information from multiple data sources, including censuses, surveys and administrative records, an increasingly common practice. Inferences from the resulting merged databases often depend heavily on specific choices made in combining, editing and analysing the data that reflect assumptions about how populations of interest change or remain stable over time.

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016743
    Description:

    There is much interest in using data from longitudinal surveys to help understand life history processes such as education, employment, fertility, health and marriage. The analysis of data on the durations of spells or sojourns that individuals spend in certain states (e.g., employment, marriage) is a primary tool in studying such processes. This paper examines methods for analysing duration data that address important features associated with longitudinal surveys: the use of complex survey designs in heterogeneous populations; missing or inaccurate information about the timing of events; and the possibility of non-ignorable dropout or censoring mechanisms. Parametric and non-parametric techniques for estimation and for model checking are considered. Both new and existing methodology are proposed and applied to duration data from Canada's Survey of Labour and Income Dynamics (SLID).

    Release date: 2004-09-13
Reference (16)

Reference (16) (0 to 10 of 16 results)

  • Surveys and statistical programs – Documentation: 11-522-X201300014259
    Description:

    In an effort to reduce response burden on farm operators, Statistics Canada is studying alternative approaches to telephone surveys for producing field crop estimates. One option is to publish harvested area and yield estimates in September as is currently done, but to calculate them using models based on satellite and weather data, and data from the July telephone survey. However before adopting such an approach, a method must be found which produces estimates with a sufficient level of accuracy. Research is taking place to investigate different possibilities. Initial research results and issues to consider are discussed in this paper.

    Release date: 2014-10-31

  • Surveys and statistical programs – Documentation: 12-001-X201300211887
    Description:

    Multi-level models are extensively used for analyzing survey data with the design hierarchy matching the model hierarchy. We propose a unified approach, based on a design-weighted log composite likelihood, for two-level models that leads to design-model consistent estimators of the model parameters even when the within cluster sample sizes are small provided the number of sample clusters is large. This method can handle both linear and generalized linear two-level models and it requires level 2 and level 1 inclusion probabilities and level 1 joint inclusion probabilities, where level 2 represents a cluster and level 1 an element within a cluster. Results of a simulation study demonstrating superior performance of the proposed method relative to existing methods under informative sampling are also reported.

    Release date: 2014-01-15

  • Surveys and statistical programs – Documentation: 12-001-X201200211758
    Description:

    This paper develops two Bayesian methods for inference about finite population quantiles of continuous survey variables from unequal probability sampling. The first method estimates cumulative distribution functions of the continuous survey variable by fitting a number of probit penalized spline regression models on the inclusion probabilities. The finite population quantiles are then obtained by inverting the estimated distribution function. This method is quite computationally demanding. The second method predicts non-sampled values by assuming a smoothly-varying relationship between the continuous survey variable and the probability of inclusion, by modeling both the mean function and the variance function using splines. The two Bayesian spline-model-based estimators yield a desirable balance between robustness and efficiency. Simulation studies show that both methods yield smaller root mean squared errors than the sample-weighted estimator and the ratio and difference estimators described by Rao, Kovar, and Mantel (RKM 1990), and are more robust to model misspecification than the regression through the origin model-based estimator described in Chambers and Dunstan (1986). When the sample size is small, the 95% credible intervals of the two new methods have closer to nominal confidence coverage than the sample-weighted estimator.

    Release date: 2012-12-19

  • Surveys and statistical programs – Documentation: 12-001-X201200111688
    Description:

    We study the problem of nonignorable nonresponse in a two dimensional contingency table which can be constructed for each of several small areas when there is both item and unit nonresponse. In general, the provision for both types of nonresponse with small areas introduces significant additional complexity in the estimation of model parameters. For this paper, we conceptualize the full data array for each area to consist of a table for complete data and three supplemental tables for missing row data, missing column data, and missing row and column data. For nonignorable nonresponse, the total cell probabilities are allowed to vary by area, cell and these three types of "missingness". The underlying cell probabilities (i.e., those which would apply if full classification were always possible) for each area are generated from a common distribution and their similarity across the areas is parametrically quantified. Our approach is an extension of the selection approach for nonignorable nonresponse investigated by Nandram and Choi (2002a, b) for binary data; this extension creates additional complexity because of the multivariate nature of the data coupled with the small area structure. As in that earlier work, the extension is an expansion model centered on an ignorable nonresponse model so that the total cell probability is dependent upon which of the categories is the response. Our investigation employs hierarchical Bayesian models and Markov chain Monte Carlo methods for posterior inference. The models and methods are illustrated with data from the third National Health and Nutrition Examination Survey.

    Release date: 2012-06-27

  • Surveys and statistical programs – Documentation: 12-001-X201100211603
    Description:

    In many sample surveys there are items requesting binary response (e.g., obese, not obese) from a number of small areas. Inference is required about the probability for a positive response (e.g., obese) in each area, the probability being the same for all individuals in each area and different across areas. Because of the sparseness of the data within areas, direct estimators are not reliable, and there is a need to use data from other areas to improve inference for a specific area. Essentially, a priori the areas are assumed to be similar, and a hierarchical Bayesian model, the standard beta-binomial model, is a natural choice. The innovation is that a practitioner may have much-needed additional prior information about a linear combination of the probabilities. For example, a weighted average of the probabilities is a parameter, and information can be elicited about this parameter, thereby making the Bayesian paradigm appropriate. We have modified the standard beta-binomial model for small areas to incorporate the prior information on the linear combination of the probabilities, which we call a constraint. Thus, there are three cases. The practitioner (a) does not specify a constraint, (b) specifies a constraint and the parameter completely, and (c) specifies a constraint and information which can be used to construct a prior distribution for the parameter. The griddy Gibbs sampler is used to fit the models. To illustrate our method, we use an example on obesity of children in the National Health and Nutrition Examination Survey in which the small areas are formed by crossing school (middle, high), ethnicity (white, black, Mexican) and gender (male, female). We use a simulation study to assess some of the statistical features of our method. We have shown that the gain in precision beyond (a) is in the order with (b) larger than (c).

    Release date: 2011-12-21

  • Surveys and statistical programs – Documentation: 12-001-X201000111250
    Description:

    We propose a Bayesian Penalized Spline Predictive (BPSP) estimator for a finite population proportion in an unequal probability sampling setting. This new method allows the probabilities of inclusion to be directly incorporated into the estimation of a population proportion, using a probit regression of the binary outcome on the penalized spline of the inclusion probabilities. The posterior predictive distribution of the population proportion is obtained using Gibbs sampling. The advantages of the BPSP estimator over the Hájek (HK), Generalized Regression (GR), and parametric model-based prediction estimators are demonstrated by simulation studies and a real example in tax auditing. Simulation studies show that the BPSP estimator is more efficient, and its 95% credible interval provides better confidence coverage with shorter average width than the HK and GR estimators, especially when the population proportion is close to zero or one or when the sample is small. Compared to linear model-based predictive estimators, the BPSP estimators are robust to model misspecification and influential observations in the sample.

    Release date: 2010-06-29

  • Surveys and statistical programs – Documentation: 12-002-X20040027035
    Description:

    As part of the processing of the National Longitudinal Survey of Children and Youth (NLSCY) cycle 4 data, historical revisions have been made to the data of the first 3 cycles, either to correct errors or to update the data. During processing, particular attention was given to the PERSRUK (Person Identifier) and the FIELDRUK (Household Identifier). The same level of attention has not been given to the other identifiers that are included in the data base, the CHILDID (Person identifier) and the _IDHD01 (Household identifier). These identifiers have been created for the public files and can also be found in the master files by default. The PERSRUK should be used to link records between files and the FIELDRUK to determine the household when using the master files.

    Release date: 2004-10-05

  • Surveys and statistical programs – Documentation: 13F0026M2001003
    Description:

    Initial results from the Survey of Financial Security (SFS), which provides information on the net worth of Canadians, were released on March 15 2001, in The daily. The survey collected information on the value of the financial and non-financial assets owned by each family unit and on the amount of their debt.

    Statistics Canada is currently refining this initial estimate of net worth by adding to it an estimate of the value of benefits accrued in employer pension plans. This is an important addition to any asset and debt survey as, for many family units, it is likely to be one of the largest assets. With the aging of the population, information on pension accumulations is greatly needed to better understand the financial situation of those nearing retirement. These updated estimates of the Survey of Financial Security will be released in late fall 2001.

    The process for estimating the value of employer pension plan benefits is a complex one. This document describes the methodology for estimating that value, for the following groups: a) persons who belonged to an RPP at the time of the survey (referred to as current plan members); b) persons who had previously belonged to an RPP and either left the money in the plan or transferred it to a new plan; c) persons who are receiving RPP benefits.

    This methodology was proposed by Hubert Frenken and Michael Cohen. The former has many years of experience with Statistics Canada working with data on employer pension plans; the latter is a principal with the actuarial consulting firm William M. Mercer. Earlier this year, Statistics Canada carried out a public consultation on the proposed methodology. This report includes updates made as a result of feedback received from data users.

    Release date: 2001-09-05

  • Surveys and statistical programs – Documentation: 13F0026M2001002
    Description:

    The Survey of Financial Security (SFS) will provide information on the net worth of Canadians. In order to do this, information was collected - in May and June 1999 - on the value of the assets and debts of each of the families or unattached individuals in the sample. The value of one particular asset is not easy to determine, or to estimate. That is the present value of the amount people have accrued in their employer pension plan. These plans are often called registered pension plans (RPP), as they must be registered with Canada Customs and Revenue Agency. Although some RPP members receive estimates of the value of their accrued benefit, in most cases plan members would not know this amount. However, it is likely to be one of the largest assets for many family units. And, as the baby boomers approach retirement, information on their pension accumulations is much needed to better understand their financial readiness for this transition.

    The intent of this paper is to: present, for discussion, a methodology for estimating the present value of employer pension plan benefits for the Survey of Financial Security; and to seek feedback on the proposed methodology. This document proposes a methodology for estimating the value of employer pension plan benefits for the following groups:a) persons who belonged to an RPP at the time of the survey (referred to as current plan members); b) persons who had previously belonged to an RPP and either left the money in the plan or transferred it to a new plan; c) persons who are receiving RPP benefits.

    Release date: 2001-02-07

  • Surveys and statistical programs – Documentation: 11-522-X19990015642
    Description:

    The Longitudinal Immigration Database (IMDB) links immigration and taxation administrative records into a comprehensive source of data on the labour market behaviour of the landed immigrant population in Canada. It covers the period 1980 to 1995 and will be updated annually starting with the 1996 tax year in 1999. Statistics Canada manages the database on behalf of a federal-provincial consortium led by Citizenship and Immigration Canada. The IMDB was created specifically to respond to the need for detailed and reliable data on the performance and impact of immigration policies and programs. It is the only source of data at Statistics Canada that provides a direct link between immigration policy levers and the economic performance of immigrants. The paper will examine the issues related to the development of a longitudinal database combining administrative records to support policy-relevant research and analysis. Discussion will focus specifically on the methodological, conceptual, analytical and privacy issues involved in the creation and ongoing development of this database. The paper will also touch briefly on research findings, which illustrate the policy outcome links the IMDB allows policy-makers to investigate.

    Release date: 2000-03-02
Date modified: