Inference and foundations

Results

All (69) (20 to 30 of 69 results)

  • Articles and reports: 12-001-X200800110606
    Description:

    Data from election polls in the US are typically presented in two-way categorical tables, and there are many polls before the actual election in November. For example, the 1998 Buckeye State Poll for governor of Ohio had three polls (January, April and October); the first category represents the candidates (e.g., Fisher, Taft and other) and the second category represents the current status of the voters (likely or not likely to vote for governor of Ohio). There is a substantial number of undecided voters for one or both categories in all three polls, and we use a Bayesian method to allocate the undecided voters to the three candidates. This method permits modeling different patterns of missingness under ignorable and nonignorable assumptions, and a multinomial-Dirichlet model is used to estimate the cell probabilities, which can help to predict the winner. We propose a time-dependent nonignorable nonresponse model for the three tables. Here, a nonignorable nonresponse model is centered on an ignorable nonresponse model to induce some flexibility and uncertainty about ignorability or nonignorability. As competitors we also consider two other models, an ignorable and a nonignorable nonresponse model. These latter two models assume a common stochastic process to borrow strength over time. Markov chain Monte Carlo methods are used to fit the models. We also construct a parameter that can potentially be used to predict the winner among the candidates in the November election.

    Release date: 2008-06-26
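The multinomial-Dirichlet building block described in the abstract above can be sketched in a few lines. This is a minimal illustration under an ignorable-response assumption, with hypothetical poll counts; the paper's actual Buckeye State Poll data and its time-dependent nonignorable model are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical counts for one poll: decided voters for Fisher, Taft and
# "other", plus a pool of undecided voters (not the paper's actual data).
decided = np.array([310, 290, 45])
undecided = 120

# Ignorable-response building block: with a uniform Dirichlet(1, 1, 1)
# prior, the posterior over the candidate-share vector is
# Dirichlet(decided + 1).
draws = rng.dirichlet(decided + 1, size=10_000)

# Allocate the undecided voters proportionally to each posterior draw.
allocated = decided + undecided * draws
expected_counts = allocated.mean(axis=0)

# Posterior probability that the leading candidate wins this poll.
leader = int(np.argmax(expected_counts))
win_prob = float((allocated.argmax(axis=1) == leader).mean())
print(expected_counts.round(1), round(win_prob, 3))
```

The nonignorable variants in the paper would instead let the undecided voters' allocation probabilities differ from the decided voters' shares, with a prior centering them on the ignorable case.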

  • Articles and reports: 11-522-X200600110392
    Description:

    We use a robust Bayesian method to analyze data with possibly nonignorable nonresponse and selection bias. A robust logistic regression model is used to relate the response indicators (Bernoulli random variables) to the covariates, which are available for everyone in the finite population. This relationship can adequately explain the difference between respondents and nonrespondents for the sample. This robust model is obtained by expanding the standard logistic regression model to a mixture of Student's t distributions, thereby providing propensity scores (selection probabilities) which are used to construct adjustment cells. The nonrespondents' values are filled in by drawing a random sample from a kernel density estimator formed from the respondents' values within the adjustment cells. Prediction uses a linear spline rank-based regression of the response variable on the covariates by areas, sampling the errors from another kernel density estimator, thereby further robustifying our method. We use Markov chain Monte Carlo (MCMC) methods to fit our model. The posterior distribution of a quantile of the response variable is obtained within each sub-area using the order statistic over all the individuals (sampled and nonsampled). We compare our robust method with recent parametric methods.

    Release date: 2008-03-17

  • Articles and reports: 11-522-X200600110398
    Description:

    The study of longitudinal data is vital in terms of accurately observing changes in responses of interest for individuals, communities, and larger populations over time. Linear mixed effects models (for continuous responses observed over time) and generalized linear mixed effects models and generalized estimating equations (for more general responses such as binary or count data observed over time) are the most popular techniques used for analyzing longitudinal data from health studies, though, as with all modeling techniques, these approaches have limitations, partly due to their underlying assumptions. In this review paper, we will discuss some advances, including curve-based techniques, which make modeling longitudinal data more flexible. Three examples will be presented from the health literature utilizing these more flexible procedures, with the goal of demonstrating that some otherwise difficult questions can be reasonably answered when analyzing complex longitudinal data in population health studies.

    Release date: 2008-03-17

  • Articles and reports: 11-522-X200600110419
    Description:

    Health services research generally relies on observational data to compare outcomes of patients receiving different therapies. Comparisons of patient groups in observational studies may be biased, in that outcomes differ due to both the effects of treatment and the effects of patient prognosis. In some cases, especially when data are collected on detailed clinical risk factors, these differences can be controlled for using statistical or epidemiological methods. In other cases, when unmeasured characteristics of the patient population affect both the decision to provide therapy and the outcome, these differences cannot be removed using standard techniques. Use of health administrative data requires particular cautions in undertaking observational studies since important clinical information does not exist. We discuss several statistical and epidemiological approaches to remove overt (measurable) and hidden (unmeasurable) bias in observational studies. These include regression model-based case-mix adjustment, propensity-based matching, redefining the exposure variable of interest, and the econometric technique of instrumental variable (IV) analysis. These methods are illustrated using examples from the medical literature including prediction of one-year mortality following heart attack; the return to health care spending in higher spending U.S. regions in terms of clinical and financial benefits; and the long-term survival benefits of invasive cardiac management of heart attack patients. It is possible to use health administrative data for observational studies provided careful attention is paid to addressing issues of reverse causation and unmeasured confounding.

    Release date: 2008-03-17

  • Articles and reports: 92F0138M2008002
    Description:

    On November 26, 2006, the Organization for Economic Co-operation and Development (OECD) held an international workshop on defining and measuring metropolitan regions. The reasons the OECD organized this workshop are listed below.

    1. Metropolitan Regions have become a crucial economic actor in today's highly integrated world. Not only do they play their traditional role of growth poles in their countries but they function as essential nodes of the global economy.

    2. Policy makers, international organisations and research networks are increasingly called to compare the economic and social performances of Metropolitan Regions across countries. Examples of this work undertaken in international organisations and networks include the UN-Habitat, the EU Urban Audit, ESPON and the OECD Competitive Cities.

    3. The scope of what we can learn from these international comparisons, however, is limited by the lack of a comparable definition of Metropolitan Regions. Although most countries have their own definitions, these vary significantly from one country to another. Furthermore, in the search for higher cross-country comparability, international initiatives have - somewhat paradoxically - generated an even larger number of definitions.

    4. In principle, there is no clear reason to prefer one definition to another. As each definition has been elaborated for a specific analytical purpose, it captures some features of a Metropolitan Region while it tends to overlook others. The issue, rather, is that we do not know the pros and the cons of different definitions nor, most importantly, the analytical implications of using one definition rather than another.

    5. In order to respond to these questions, the OECD hosted an international workshop on 'Defining and Measuring Metropolitan Regions'. The workshop brought together major international organisations (the UN, Eurostat, the World Bank and the OECD), National Statistical Offices and researchers from this field. The aim of the workshop was to develop some 'guiding principles', which could be agreed upon among the participants and would eventually provide the basis for some form of 'International Guidance' for comparing Metropolitan Regions across countries.

    This working paper was presented at that workshop. It provides the conceptual and methodological basis for the definition of metropolitan areas in Canada and offers a detailed comparison of Canada's methodology with that of the USA. The intent was to encourage discussion of Canada's approach to defining metropolitan areas as part of the effort to identify the 'guiding principles'. It is being made available as a working paper to continue this discussion and to provide background for the user community, with the aim of encouraging dialogue and commentary regarding Canada's metropolitan area methodology.

    Release date: 2008-02-20

  • Articles and reports: 92F0138M2007001
    Description:

    Statistics Canada creates files that provide the link between postal codes and the geographic areas by which it disseminates statistical data. By linking postal codes to the Statistics Canada geographic areas, Statistics Canada facilitates the extraction and subsequent aggregation of data for selected geographic areas from files available to users. Users can then take data from Statistics Canada for their areas and tabulate it with other data for the same areas to create a combined statistical profile.

    An issue has been the methodology used by Statistics Canada to establish the linkage of postal codes to geographic areas. In order to address this issue, Statistics Canada decided to create a conceptual framework on which to base the rules for linking postal codes and Statistics Canada's geographic areas. This working paper presents the conceptual framework and the geocoding rules. The methodology described in this paper will be the basis for linking postal codes to the 2006 Census geographic areas. This paper is presented for feedback from users of Statistics Canada's postal codes related products.

    Release date: 2007-02-12

  • Articles and reports: 12-001-X20060019257
    Description:

    In the presence of item nonresponse, two approaches have been traditionally used to make inference on parameters of interest. The first approach assumes uniform response within imputation cells, whereas the second approach assumes ignorable response but makes use of a model on the variable of interest as the basis for inference. In this paper, we propose a third approach that assumes a specified ignorable response mechanism without having to specify a model on the variable of interest. In this case, we show how to obtain imputed values which lead to estimators of a total that are approximately unbiased under the proposed approach as well as the second approach. Variance estimators of the imputed estimators that are approximately unbiased are also obtained using an approach of Fay (1991) in which the order of sampling and response is reversed. Finally, simulation studies are conducted to investigate the finite sample performance of the methods in terms of bias and mean square error.

    Release date: 2006-07-20

  • Articles and reports: 11F0024M20050008805
    Description:

    This paper reports on the potential development of sub-annual indicators for selected service industries using Goods and Services Tax (GST) data. The services sector is now of central importance to advanced economies; however, our knowledge of this sector remains incomplete, partly due to a lack of data. The Voorburg Group on Service Statistics has been meeting for almost twenty years to develop and incorporate better measures for the services sector. Despite this effort, many sub-annual economic measures continue to rely on output data for the goods-producing sector and, with the exception of distributive trades, on employment data for service industries.

    The development of sub-annual indicators for service industries raises two questions regarding the national statistical program. First, is there a need for service output indicators to supplement existing sub-annual measures? And second, what service industries are the most promising for development? The paper begins by reviewing the importance of service industries and how they behave during economic downturns. Next, it examines considerations in determining which service industries to select as GST-based, sub-annual indicators. A case study of the accommodation services industry serves to illustrate improving timeliness and accuracy. We conclude by discussing the opportunities for, and limitations of, these indicators.

    Release date: 2005-10-20

  • Articles and reports: 12-002-X20050018030
    Description:

    People often wish to use survey micro-data to study whether the rate of occurrence of a particular condition in a subpopulation is the same as the rate of occurrence in the full population. This paper describes some alternatives for making inferences about such a rate difference and shows whether and how these alternatives may be implemented in three different survey software packages. The software packages illustrated - SUDAAN, WesVar and Bootvar - all can make use of bootstrap weights provided by the analyst to carry out variance estimation.

    Release date: 2005-06-23
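The bootstrap-weight variance estimation that SUDAAN, WesVar and Bootvar carry out can be sketched directly. Everything below is fabricated for illustration: production files supply the bootstrap weights, so the Poisson perturbation used here to create replicates is only a stand-in.

```python
import numpy as np

rng = np.random.default_rng(1)

# Fabricated survey microdata: condition indicator, subpopulation flag,
# and survey weights (all hypothetical).
n = 500
y = rng.integers(0, 2, n)        # 1 = has the condition
sub = rng.integers(0, 2, n)      # 1 = member of the subpopulation
w = rng.uniform(50.0, 150.0, n)  # final survey weights

# Stand-in bootstrap weights; real files provide these columns directly.
B = 200
boot_w = w * rng.poisson(1.0, size=(B, n))

def rate_diff(weights):
    # Subpopulation rate minus full-population rate, both weighted.
    p_sub = np.sum(weights * y * sub) / np.sum(weights * sub)
    p_all = np.sum(weights * y) / np.sum(weights)
    return p_sub - p_all

est = rate_diff(w)
reps = np.array([rate_diff(bw) for bw in boot_w])
se = float(np.sqrt(np.mean((reps - est) ** 2)))  # bootstrap SE
print(round(est, 4), round(se, 4))
```

The packages differ mainly in interface: each recomputes the statistic once per replicate weight column and averages the squared deviations, as above.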

  • Articles and reports: 12-001-X20040027753
    Description:

    Samplers often distrust model-based approaches to survey inference because of concerns about misspecification when models are applied to large samples from complex populations. We suggest that the model-based paradigm can work very successfully in survey settings, provided models are chosen that take into account the sample design and avoid strong parametric assumptions. The Horvitz-Thompson (HT) estimator is a simple design-unbiased estimator of the finite population total. From a modeling perspective, the HT estimator performs well when the ratios of the outcome values and the inclusion probabilities are exchangeable. When this assumption is not met, the HT estimator can be very inefficient. In Zheng and Little (2003, 2004) we used penalized splines (p-splines) to model smoothly varying relationships between the outcome and the inclusion probabilities in one-stage probability proportional to size (PPS) samples. We showed that p-spline model-based estimators are in general more efficient than the HT estimator, and can provide narrower confidence intervals with close to nominal confidence coverage. In this article, we extend this approach to two-stage sampling designs. We use a p-spline based mixed model that fits a nonparametric relationship between the primary sampling unit (PSU) means and a measure of PSU size, and incorporates random effects to model clustering. For variance estimation we consider the empirical Bayes model-based variance, the jackknife and balanced repeated replication (BRR) methods. Simulation studies on simulated data and samples drawn from public use microdata in the 1990 census demonstrate gains for the model-based p-spline estimator over the HT estimator and linear model-assisted estimators. Simulations also show the variance estimation methods yield confidence intervals with satisfactory confidence coverage. Interestingly, these gains can be seen for a common equal-probability design, where the first stage selection is PPS and the second stage selection probabilities are proportional to the inverse of the first stage inclusion probabilities, and the HT estimator leads to the unweighted mean. In situations that most favor the HT estimator, the model-based estimators have comparable efficiency.

    Release date: 2005-02-03
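The Horvitz-Thompson estimator at the heart of the abstract above is simple to state: the population total is estimated by summing y_i / pi_i over the sample. A minimal sketch under an assumed Poisson-PPS design (the p-spline mixed model itself is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical finite population with a size measure; inclusion
# probabilities proportional to size (expected sample size 300).
N = 1000
size = rng.uniform(1.0, 10.0, N)
pi = 300 * size / size.sum()
y = 5.0 * size + rng.normal(0.0, 2.0, N)  # outcome roughly proportional to size

# Poisson-PPS draw, used here only to obtain a sample with known pi_i.
sample = rng.random(N) < pi

# Horvitz-Thompson estimator of the population total: sum of y_i / pi_i.
# It is design-unbiased, and efficient when y_i / pi_i is nearly constant
# (as here); when that ratio varies a lot, it can be very inefficient.
ht_total = float(np.sum(y[sample] / pi[sample]))
true_total = float(y.sum())
print(round(ht_total), round(true_total))
```

In this simulation y_i / pi_i is close to constant, which is exactly the exchangeability condition under which the abstract says the HT estimator performs well.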
Data (0) (0 results)

Analysis (69) (40 to 50 of 69 results)

  • Articles and reports: 11-522-X20020016745
    Description:

    The attractiveness of the Regression Discontinuity Design (RDD) rests on its close similarity to a standard experimental design. On the other hand, it is of limited applicability since it is not often the case that units are assigned to the treatment group on the basis of a pre-program measure observable to the analyst. Besides, it only allows identification of the mean impact on a very specific subpopulation. In this technical paper, we show that the RDD generalizes straightforwardly to instances in which units' eligibility is established on an observable pre-program measure, with eligible units allowed to freely self-select into the program. This set-up also proves to be very convenient for building a specification test on conventional non-experimental estimators of the program mean impact. The data requirements are clearly described.

    Release date: 2004-09-13
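A sharp RDD estimate can be sketched as two local linear fits on either side of the cutoff; the jump between their fitted intercepts estimates the mean impact at the cutoff. All numbers below are simulated assumptions, not the paper's data, and the paper's self-selection extension is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated sharp RDD: treatment assigned exactly when the observable
# pre-program score crosses 0; the true impact at the cutoff is 0.5.
n = 2000
score = rng.uniform(-1.0, 1.0, n)
treated = score >= 0.0
y = 1.0 + 0.8 * score + 0.5 * treated + rng.normal(0.0, 0.3, n)

def intercept_at_cutoff(mask):
    # Local linear fit; the intercept is the fitted value at score = 0.
    slope, intercept = np.polyfit(score[mask], y[mask], 1)
    return intercept

# Fit separately on each side of the cutoff within bandwidth h.
h = 0.25
left = (score < 0.0) & (score > -h)
right = (score >= 0.0) & (score < h)
impact = intercept_at_cutoff(right) - intercept_at_cutoff(left)
print(round(float(impact), 3))
```

Note how the estimate applies only to units near the cutoff, which is the "very specific subpopulation" limitation the abstract mentions.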

  • Articles and reports: 11-522-X20020016750
    Description:

    Analyses of data from social and economic surveys sometimes use generalized variance function models to approximate the design variance of point estimators of population means and proportions. Analysts may use the resulting standard error estimates to compute associated confidence intervals or test statistics for the means and proportions of interest. In comparison with design-based variance estimators computed directly from survey microdata, generalized variance function models have several potential advantages, as will be discussed in this paper, including operational simplicity; increased stability of standard errors; and, for cases involving public-use datasets, reduction of disclosure limitation problems arising from the public release of stratum and cluster indicators.

    These potential advantages, however, may be offset in part by several inferential issues. First, the properties of inferential statistics based on generalized variance functions (e.g., confidence interval coverage rates and widths) depend heavily on the relative empirical magnitudes of the components of variability associated, respectively, with:

    (a) the random selection of a subset of items used in estimation of the generalized variance function model;
    (b) the selection of sample units under a complex sample design;
    (c) the lack of fit of the generalized variance function model;
    (d) the generation of a finite population under a superpopulation model.

    Second, under certain conditions, one may link each of components (a) through (d) with different empirical measures of the predictive adequacy of a generalized variance function model. Consequently, these measures of predictive adequacy can offer some insight into the extent to which a given generalized variance function model may be appropriate for inferential use in specific applications.

    Some of the proposed diagnostics are applied to data from the US Survey of Doctoral Recipients and the US Current Employment Survey. For the Survey of Doctoral Recipients, components (a), (c) and (d) are of principal concern. For the Current Employment Survey, components (b), (c) and (d) receive principal attention, and the availability of population microdata allows the development of especially detailed models for components (b) and (c).

    Release date: 2004-09-13
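A commonly used generalized variance function form is relvar(x) = a + b / x, fitted by least squares across a set of items; whether this particular form matches the surveys discussed above is an assumption of this sketch, and all numbers are simulated.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical (estimate, relvariance) pairs for 40 survey items,
# generated from a "true" GVF with a = 1e-4 and b = 50.
x = rng.uniform(1e3, 1e6, 40)
relvar = 1e-4 + 50.0 / x + rng.normal(0.0, 1e-5, 40)

# Fit the common GVF form  relvar(x) = a + b / x  by least squares.
A = np.column_stack([np.ones_like(x), 1.0 / x])
(a_hat, b_hat), *_ = np.linalg.lstsq(A, relvar, rcond=None)

# Model-based standard error for a new estimate x0: se = x0 * sqrt(relvar).
x0 = 2.5e5
se_x0 = x0 * np.sqrt(a_hat + b_hat / x0)
print(round(float(a_hat), 6), round(float(b_hat), 1))
```

An analyst then publishes (a_hat, b_hat) instead of item-level variances, which is the operational simplicity and disclosure advantage the abstract describes; the residual scatter around the fitted curve corresponds to its lack-of-fit component (c).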

  • Articles and reports: 12-001-X20030026785
    Description:

    To avoid disclosures, one approach is to release partially synthetic, public use microdata sets. These comprise the units originally surveyed, but some collected values, for example sensitive values at high risk of disclosure or values of key identifiers, are replaced with multiple imputations. Although partially synthetic approaches are currently used to protect public use data, valid methods of inference have not been developed for them. This article presents such methods. They are based on the concepts of multiple imputation for missing data but use different rules for combining point and variance estimates. The combining rules also differ from those for fully synthetic data sets developed by Raghunathan, Reiter and Rubin (2003). The validity of these new rules is illustrated in simulation studies.

    Release date: 2004-01-27
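The combining rules for partially synthetic data differ from Rubin's rules for missing data mainly in the between-imputation term. A small worked example with hypothetical estimates, using the rules in their commonly cited form:

```python
import numpy as np

# Hypothetical point estimates q_l and within-dataset variances u_l from
# m = 5 partially synthetic datasets.
q = np.array([10.2, 9.8, 10.5, 10.1, 9.9])
u = np.array([0.40, 0.38, 0.42, 0.41, 0.39])
m = len(q)

q_bar = q.mean()        # combined point estimate
b_m = q.var(ddof=1)     # between-dataset variance
u_bar = u.mean()        # average within-dataset variance

# Partially synthetic combining rule: T_p = b_m / m + u_bar.
# Contrast with Rubin's rule for missing data,
# T = (1 + 1/m) * b_m + u_bar, which is larger.
T_p = b_m / m + u_bar
print(round(float(q_bar), 3), round(float(T_p), 4))
```

Because only some values are synthesized, the between-dataset variance enters with the smaller factor 1/m, which is why using the missing-data rules on partially synthetic data would overstate uncertainty.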

  • Articles and reports: 12-001-X20030016610
    Description:

    In the presence of item nonresponse, unweighted imputation methods are often used in practice, but they generally lead to biased estimators under uniform response within imputation classes. Following Skinner and Rao (2002), we propose a bias-adjusted estimator of a population mean under unweighted ratio imputation and random hot-deck imputation, and derive linearization variance estimators. A small simulation study is conducted to study the performance of the methods in terms of bias and mean square error. Relative bias and relative stability of the variance estimators are also studied.

    Release date: 2003-07-31
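Unweighted ratio imputation, the starting point of the abstract above, can be sketched as follows. The data are simulated; the paper's bias-adjusted estimator and linearization variance estimators are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated data: auxiliary x observed for everyone, y missing for
# nonrespondents; response is uniform (probability 0.7 for all units).
n = 200
x = rng.uniform(1.0, 5.0, n)
y = 2.0 * x + rng.normal(0.0, 0.5, n)
resp = rng.random(n) < 0.7

# Unweighted ratio imputation: estimate the ratio B_hat from respondents
# only, then impute y_i = B_hat * x_i for each nonrespondent.
B_hat = y[resp].sum() / x[resp].sum()
y_imp = np.where(resp, y, B_hat * x)

imputed_mean = float(y_imp.mean())
true_mean = float(y.mean())
print(round(imputed_mean, 3), round(true_mean, 3))
```

In a real survey the sampling design weights would matter too; ignoring them is precisely what introduces the bias the paper's adjusted estimator corrects.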

  • Articles and reports: 92F0138M2003002
    Description:

    This working paper describes the preliminary 2006 census metropolitan areas and census agglomerations and is presented for user feedback. The paper briefly describes the factors that have resulted in changes to some of the census metropolitan areas and census agglomerations and includes tables and maps that list and illustrate these changes to their limits and to the component census subdivisions.

    Release date: 2003-07-11

  • Articles and reports: 92F0138M2003001
    Description:

    The goal of this working paper is to assess how well Canada's current method of delineating Census Metropolitan Areas (CMAs) and Census Agglomerations (CAs) reflects the metropolitan nature of these geographic areas according to the facilities and services they provide. The effectiveness of Canada's delineation methodology can be evaluated by applying a functional model to Statistics Canada's CMAs and CAs.

    As a consequence of the research undertaken for this working paper, Statistics Canada has proposed lowering the urban core population threshold it uses to define CMAs: a CA will be promoted to a CMA if it has a total population of at least 100,000, of which 50,000 or more live in the urban core. User consultation on this proposal took place in the fall of 2002 as part of the 2006 Census content determination process.

    Release date: 2003-03-31

  • Articles and reports: 11F0019M2003199
    Geography: Canada
    Description:

    Using a nationally representative sample of establishments, we have examined whether selected alternative work practices (AWPs) tend to reduce quit rates. Overall, our analysis provides strong evidence of a negative association between these AWPs and quit rates among establishments of more than 10 employees operating in high-skill services. We also found some evidence of a negative association in low-skill services. However, the magnitude of this negative association was reduced substantially when we added an indicator of whether the workplace has a formal policy of information sharing. There was very little evidence of a negative association in manufacturing. While establishments with self-directed workgroups have lower quit rates than others, none of the bundles of work practices considered yielded a negative and statistically significant effect. We surmise that key AWPs might be more successful in reducing labour turnover in technologically complex environments than in low-skill ones.

    Release date: 2003-03-17

  • Articles and reports: 12-001-X20020026428
    Description:

    The analysis of survey data from different geographical areas where the data from each area are polychotomous can be easily performed using hierarchical Bayesian models, even if there are small cell counts in some of these areas. However, there are difficulties when the survey data have missing information in the form of non-response, especially when the characteristics of the respondents differ from the non-respondents. We use the selection approach for estimation when there are non-respondents because it permits inference for all the parameters. Specifically, we describe a hierarchical Bayesian model to analyse multinomial non-ignorable non-response data from different geographical areas; some of them can be small. For the model, we use a Dirichlet prior density for the multinomial probabilities and a beta prior density for the response probabilities. This permits a 'borrowing of strength' of the data from larger areas to improve the reliability in the estimates of the model parameters corresponding to the smaller areas. Because the joint posterior density of all the parameters is complex, inference is sampling-based and Markov chain Monte Carlo methods are used. We apply our method to provide an analysis of body mass index (BMI) data from the third National Health and Nutrition Examination Survey (NHANES III). For simplicity, the BMI is categorized into 3 natural levels, and this is done for each of 8 age-race-sex domains and 34 counties. We assess the performance of our model using the NHANES III data and simulated examples, which show our model works reasonably well.

    Release date: 2003-01-29
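The "borrowing of strength" idea can be illustrated with a much simpler empirical-Bayes beta-binomial sketch for binary responses in small areas. The article's full multinomial-Dirichlet MCMC model is not reproduced, and all counts below are hypothetical.

```python
import numpy as np

# Hypothetical respondent counts r out of n for five areas, some small.
n = np.array([40, 15, 8, 120, 25])
r = np.array([30, 9, 7, 85, 16])
p = r / n  # raw area-level rates

# Method-of-moments beta prior from the pooled rates; a crude stand-in
# for the hierarchical Bayesian fit described in the article.
mean_p, var_p = p.mean(), p.var()
strength = mean_p * (1.0 - mean_p) / var_p - 1.0
a0, b0 = mean_p * strength, (1.0 - mean_p) * strength

# Posterior mean per area shrinks the raw rate toward the overall mean;
# the smallest areas are shrunk the most.
shrunk = (r + a0) / (n + a0 + b0)
print(shrunk.round(3))
```

The same mechanism, with Dirichlet rather than beta priors and MCMC rather than moment matching, is what stabilizes the small-county BMI estimates in the NHANES III analysis.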

  • Articles and reports: 11-522-X20010016277
    Description:

    This paper discusses in detail issues dealing with the technical aspects of designing and conducting surveys. It is intended for an audience of survey methodologists.

    The advent of computerized record-linkage methodology has facilitated the conduct of cohort mortality studies in which exposure data in one database are electronically linked with mortality data from another database. In this article, the impact of linkage errors on estimates of epidemiological indicators of risk, such as standardized mortality ratios and relative risk regression model parameters, is explored. It is shown that these indicators can be subject to bias and additional variability in the presence of linkage errors, with false links and non-links leading to positive and negative bias, respectively, in estimates of the standardized mortality ratio. Although linkage errors always increase the uncertainty in the estimates, bias can be effectively eliminated in the special case in which the false positive rate equals the false negative rate within homogeneous states defined by cross-classification of the covariates of interest.

    Release date: 2002-09-12
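The direction of the biases described above follows directly from the definition SMR = observed / expected. A worked example with hypothetical counts:

```python
# Hypothetical counts illustrating how linkage errors bias the
# standardized mortality ratio, SMR = observed deaths / expected deaths.
expected = 44.7            # deaths expected from reference-population rates
true_cohort_deaths = 50    # deaths the cohort actually experienced
false_links = 8            # outside deaths wrongly linked in  -> positive bias
missed_links = 6           # cohort deaths the linkage missed  -> negative bias

smr_true = true_cohort_deaths / expected
smr_observed = (true_cohort_deaths + false_links - missed_links) / expected
print(round(smr_true, 3), round(smr_observed, 3))
```

When the false positive and false negative counts balance, the two error terms cancel, which is the special case of eliminated bias noted in the abstract.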

  • Articles and reports: 89-552-M2000007
    Geography: Canada
    Description:

    This paper addresses the problem of statistical inference with ordinal variates and examines the robustness to alternative literacy measurement and scaling choices of rankings of average literacy and of estimates of the impact of literacy on individual earnings.

    Release date: 2000-06-02
Reference (3) (3 results)

  • Surveys and statistical programs – Documentation: 12-001-X19970013101
    Description:

    In the main body of statistics, sampling is often disposed of by assuming a sampling process that selects random variables such that they are independent and identically distributed (IID). Important techniques, like regression and contingency table analysis, were developed largely in the IID world; hence, adjustments are needed to use them in complex survey settings. Rather than adjust the analysis, however, what is new in the present formulation is to draw a second sample from the original sample. In this second sample, the first set of selections are inverted, so as to yield at the end a simple random sample. Of course, to employ this two-step process to draw a single simple random sample from the usually much larger complex survey would be inefficient, so multiple simple random samples are drawn and a way to base inferences on them developed. Not all original samples can be inverted; but many practical special cases are discussed which cover a wide range of practices.

    Release date: 1997-08-18

  • Surveys and statistical programs – Documentation: 12-001-X19970013102
    Description:

    The selection of auxiliary variables is considered for regression estimation in finite populations under a simple random sampling design. This problem is a basic one for model-based and model-assisted survey sampling approaches and is of practical importance when the number of variables available is large. An approach is developed in which a mean squared error estimator is minimised. This approach is compared to alternative approaches using a fixed set of auxiliary variables, a conventional significance test criterion, a condition number reduction approach and a ridge regression approach. The proposed approach is found to perform well in terms of efficiency. It is noted that the variable selection approach affects the properties of standard variance estimators and thus leads to a problem of variance estimation.

    Release date: 1997-08-18

  • Surveys and statistical programs – Documentation: 12-001-X19960022980
    Description:

    In this paper, we study a confidence interval estimation method for a finite population average when some auxiliary information is available. As demonstrated by Royall and Cumberland in a series of empirical studies, naive use of existing methods to construct confidence intervals for population averages may result in very poor conditional coverage probabilities, conditional on the sample mean of the covariate. When this happens, we propose to transform the data to improve the precision of the normal approximation. The transformed data are then used to make inference on the original population average, and the auxiliary information is incorporated into the inference directly, or by calibration with empirical likelihood. Our approach is design-based. We apply our approach to six real populations and find that when transformation is needed, our approach performs well compared to the usual regression method.

    Release date: 1997-01-30