Inference and foundations

Results

All (105) (10 to 20 of 105 results)

  • Articles and reports: 12-001-X202200200003
    Description:

    Non-probability surveys play an increasing role in survey research. Wu’s essay ably brings together the many tools available when assuming the non-response is conditionally independent of the study variable. In this commentary, I explore how to integrate Wu’s insights in a broader framework that encompasses the case in which non-response depends on the study variable, a case that is particularly dangerous in non-probabilistic polling.
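
    To show why non-response that depends on the study variable is dangerous, here is a minimal sketch using a small hypothetical population. The numbers are invented for illustration and are not taken from the commentary. When each unit's response propensity depends on its own value of y, the expected respondent mean is the propensity-weighted mean sum(p_i * y_i) / sum(p_i). That mean departs from the population mean even before any sampling error enters.

```python
# Hypothetical illustration: expected respondent mean when response
# propensities depend on the study variable itself (non-ignorable non-response).

def expected_respondent_mean(y, p):
    """Expected mean of y among respondents, given response propensities p."""
    return sum(pi * yi for pi, yi in zip(p, y)) / sum(p)

# Hypothetical population of 1,000 with a binary study variable (1 = "yes").
y = [1] * 400 + [0] * 600
population_mean = sum(y) / len(y)  # 0.4

# Ignorable case: every unit responds with the same propensity.
p_ignorable = [0.1] * len(y)

# Non-ignorable case: units with y = 1 respond twice as often as units with y = 0.
p_nonignorable = [0.2 if yi == 1 else 0.1 for yi in y]

print(expected_respondent_mean(y, p_ignorable))     # about 0.4, no bias
print(expected_respondent_mean(y, p_nonignorable))  # 80 / 140, about 0.571
```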

    Release date: 2022-12-15

  • Articles and reports: 12-001-X202200200004
    Description:

    This discussion attempts to add to Wu’s review of inference from non-probability samples and to highlight aspects that are likely avenues for useful additional work. It concludes with a call for an organized stable of high-quality probability surveys focused on providing adjustment information for non-probability surveys.

    Release date: 2022-12-15

  • Articles and reports: 12-001-X202200200005
    Description:

    Strong assumptions are required to make inferences about a finite population from a nonprobability sample. Statistics from a nonprobability sample should be accompanied by evidence that the assumptions are met and that point estimates and confidence intervals are fit for use. I describe some diagnostics that can be used to assess the model assumptions, and discuss issues to consider when deciding whether to use data from a nonprobability sample.

    Release date: 2022-12-15

  • Articles and reports: 12-001-X202200200006
    Description:

    Non-probability samples are deprived of the powerful design probability for randomization-based inference. This deprivation, however, encourages us to take advantage of a natural divine probability that comes with any finite population. A key metric from this perspective is the data defect correlation (ddc), which is the model-free finite-population correlation between the individual’s sample inclusion indicator and the individual’s attribute being sampled. A data generating mechanism is equivalent to probability sampling, in terms of design effect, if and only if its corresponding ddc is of N^(-1/2) (stochastic) order, where N is the population size (Meng, 2018). Consequently, existing valid linear estimation methods for non-probability samples can be recast as various strategies to miniaturize the ddc down to the N^(-1/2) order. The quasi design-based methods accomplish this task by diminishing the variability among the N inclusion propensities via weighting. The super-population model-based approach achieves the same goal by reducing the variability of the N individual attributes, replacing them with their residuals from a regression model. The doubly robust estimators enjoy their celebrated property because a correlation is zero whenever one of the variables being correlated is constant, regardless of which one. Understanding the commonality of these methods through ddc also helps us see clearly the possibility of “double-plus robustness”: a valid estimation without relying on the full validity of either the regression model or the estimated inclusion propensity, neither of which is guaranteed because both rely on device probability.

    The insight generated by ddc also suggests counterbalancing sub-sampling, a strategy aimed at creating a miniature of the population out of a non-probability sample, with a favorable quality-quantity trade-off because mean-squared errors are much more sensitive to ddc than to the sample size, especially for large populations.
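
    The identity behind the ddc argument (Meng, 2018) can be checked numerically. Suppose a sample is defined by inclusion indicators R_j. Then the error of the sample mean equals ddc × sqrt((1 - f) / f) × σ_Y, where f = n / N and σ_Y is the finite-population standard deviation (divisor N). This minimal sketch uses a tiny hypothetical population and sample and confirms that the two sides agree.

```python
import math


def pop_mean(v):
    return sum(v) / len(v)


def pop_sd(v):
    """Finite-population standard deviation (divisor N)."""
    m = pop_mean(v)
    return math.sqrt(sum((vi - m) ** 2 for vi in v) / len(v))


def ddc(R, Y):
    """Data defect correlation: finite-population Pearson correlation between
    the inclusion indicators R and the attribute Y."""
    mr, my = pop_mean(R), pop_mean(Y)
    cov = sum((r - mr) * (y - my) for r, y in zip(R, Y)) / len(R)
    return cov / (pop_sd(R) * pop_sd(Y))


# Hypothetical population (N = 10); units with larger Y got into the sample (n = 4).
Y = [3, 5, 2, 8, 7, 1, 4, 6, 9, 5]
R = [0, 0, 0, 1, 1, 0, 0, 1, 1, 0]

N, n = len(Y), sum(R)
f = n / N
sample_mean = sum(y for r, y in zip(R, Y) if r == 1) / n
actual_error = sample_mean - pop_mean(Y)                         # 7.5 - 5.0 = 2.5
identity_error = ddc(R, Y) * math.sqrt((1 - f) / f) * pop_sd(Y)  # same value
print(actual_error, round(identity_error, 10))
```

    For a fixed sampling fraction f, the error stays proportional to the ddc however large N becomes. This is why a ddc that does not shrink at the N^(-1/2) rate cannot be offset by sample size alone.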

    Release date: 2022-12-15

  • Articles and reports: 12-001-X202200200007
    Description:

    Statistical inference with non-probability survey samples is a notoriously challenging problem in statistics. We introduce two new nonparametric propensity score methods for weighting non-probability samples: the information projection approach and uniform calibration in the reproducing kernel Hilbert space.
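
    The paper's methods are nonparametric, and this sketch does not reproduce them. As a much simpler stand-in, it shows the basic idea of an information (Kullback-Leibler) projection for weighting on a single covariate. Among all weights on the non-probability sample, it picks those closest to uniform in Kullback-Leibler divergence whose weighted covariate mean matches a reference mean, assumed here to come from a probability survey. The solution has the exponential-tilting form w_i ∝ exp(λ x_i). λ is found by bisection because the weighted mean increases with λ. Function names and data are hypothetical.

```python
import math


def tilted_weights(x, lam):
    """Normalized weights proportional to exp(lam * x_i), shifted by the max for stability."""
    a = [lam * xi for xi in x]
    a_max = max(a)
    e = [math.exp(ai - a_max) for ai in a]
    s = sum(e)
    return [ei / s for ei in e]


def weighted_mean(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def entropy_balance(x, target, iterations=200):
    """Weights closest to uniform in Kullback-Leibler divergence whose weighted mean
    of x equals target. The weighted mean is increasing in lam (its derivative is
    the weighted variance of x), so bisection on lam is safe."""
    if not min(x) < target < max(x):
        raise ValueError("target must lie strictly between min(x) and max(x)")
    lo, hi = -1.0, 1.0
    while weighted_mean(tilted_weights(x, lo), x) > target:
        lo *= 2.0  # widen the bracket downward
    while weighted_mean(tilted_weights(x, hi), x) < target:
        hi *= 2.0  # widen the bracket upward
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if weighted_mean(tilted_weights(x, mid), x) < target:
            lo = mid
        else:
            hi = mid
    return tilted_weights(x, (lo + hi) / 2.0)


# Hypothetical covariate values in the non-probability sample and a reference
# mean assumed to come from a probability survey.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
w = entropy_balance(x, target=2.5)
print(round(weighted_mean(w, x), 6))  # 2.5
```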

    Release date: 2022-12-15

  • Articles and reports: 12-001-X202200200008
    Description:

    This response contains additional remarks on a few selected issues raised by the discussants.

    Release date: 2022-12-15

  • Articles and reports: 12-001-X202200200011
    Description:

    Two-phase sampling is a cost-effective sampling design employed extensively in surveys. In this paper, a method of most efficient linear estimation of totals in two-phase sampling is proposed, which optimally exploits auxiliary survey information. First, a best linear unbiased estimator (BLUE) of any total is formally derived in analytic form and shown to be also a calibration estimator. Then, a proper reformulation of such a BLUE and estimation of its unknown coefficients leads to the construction of an “optimal” regression estimator, which can also be obtained through a suitable calibration procedure. A distinctive feature of such calibration is the alignment of estimates from the two phases in a one-step procedure involving the combined first- and second-phase samples. Optimal estimation is feasible for certain two-phase designs that are often used in large-scale surveys. For general two-phase designs, an alternative calibration procedure gives a generalized regression estimator as an approximately optimal estimator. The proposed general approach to optimal estimation leads to the most effective use of the available auxiliary information in any two-phase survey. The advantages of this approach over existing methods of estimation in two-phase sampling are shown both theoretically and through a simulation study.
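
    For orientation, here is a minimal sketch of the classical double-sampling (two-phase) regression estimator of a mean, assuming simple random sampling at both phases. The second-phase mean of y is adjusted by the fitted slope times the gap between the first- and second-phase means of the auxiliary x. This is the textbook estimator, not the BLUE or calibration estimator derived in the paper. Data are hypothetical.

```python
def two_phase_regression_mean(x_phase1, x_phase2, y_phase2):
    """Classical double-sampling regression estimator of the mean of y.
    x_phase2 / y_phase2 belong to the second-phase subsample of the first phase."""
    n2 = len(x_phase2)
    xbar1 = sum(x_phase1) / len(x_phase1)  # auxiliary mean from the large first phase
    xbar2 = sum(x_phase2) / n2
    ybar2 = sum(y_phase2) / n2
    sxy = sum((x - xbar2) * (y - ybar2) for x, y in zip(x_phase2, y_phase2))
    sxx = sum((x - xbar2) ** 2 for x in x_phase2)
    b = sxy / sxx                          # OLS slope fitted on the second phase
    return ybar2 + b * (xbar1 - xbar2)


x1 = [2, 4, 6, 8, 10, 12]  # hypothetical first-phase auxiliary values
x2 = [2, 6, 10]            # second-phase subsample
y2 = [5, 13, 21]           # study variable observed only in phase two
estimate = two_phase_regression_mean(x1, x2, y2)
print(estimate)  # 15.0
```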

    Release date: 2022-12-15

  • Articles and reports: 12-001-X202200100004
    Description:

    When the sample size of an area is small, borrowing information from neighboring areas is a small area estimation technique for producing more reliable estimates. One well-known model in small area estimation is the multinomial-Dirichlet hierarchical model for multinomial counts. Owing to natural characteristics of the data, imposing a unimodal order restriction on the parameter space is relevant. In our application, body mass index is most likely to fall at the overweight level, which means the unimodal order restriction may be reasonable. However, the same unimodal order restriction for all areas may be too strong in some cases. To increase flexibility, we add uncertainty to the unimodal order restriction, so that each area has a similar, but not identical, unimodal pattern. Since the order restriction with uncertainty makes inference more difficult, we make comparisons using posterior summaries and the approximated log-pseudo marginal likelihood.
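
    The hard constraint that the uncertainty extension relaxes can be written as a simple check. Category probabilities p_1, ..., p_K are unimodal if they rise (weakly) to a single peak and then fall (weakly). This sketch assumes four ordered BMI categories with hypothetical probabilities. It illustrates the constraint only, not the hierarchical model or its posterior computation.

```python
def is_unimodal(probs):
    """True if probs weakly increases up to a peak and weakly decreases after it."""
    k = 0
    while k + 1 < len(probs) and probs[k + 1] >= probs[k]:
        k += 1  # climb while the sequence is non-decreasing
    return all(probs[j + 1] <= probs[j] for j in range(k, len(probs) - 1))


bmi_probs = [0.03, 0.35, 0.40, 0.22]    # hypothetical: underweight, normal, overweight, obese
print(is_unimodal(bmi_probs))           # True, peak at "overweight"
print(is_unimodal([0.30, 0.20, 0.30]))  # False, dips and rises again
```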

    Release date: 2022-06-21

  • Articles and reports: 12-001-X202200100009
    Description:

    In finite population estimation, the inverse probability or Horvitz-Thompson estimator is a basic tool. Even when auxiliary information is available to model the variable of interest, it is still used to estimate the model error. Here, the inverse probability estimator is generalized by introducing a positive definite matrix. The usual inverse probability estimator is a special case of the generalized estimator, where the positive definite matrix is the identity matrix. Since calibration estimation seeks weights that are close to the inverse probability weights, it too can be generalized by seeking weights that are close to those of the generalized inverse probability estimator. Calibration is known to be optimal, in the sense that it asymptotically attains the Godambe-Joshi lower bound. That lower bound has been derived under a model where no correlation is present. This, too, can be generalized to allow for correlation. With the correct choice of the positive definite matrix that generalizes the calibration estimators, this generalized lower bound can be asymptotically attained. There is often no closed-form formula for the generalized estimators. However, simple explicit examples are given here to illustrate how the generalized estimators take advantage of the correlation. This simplicity is achieved here by assuming a correlation of one between some population units. Those simple estimators can still be useful, even if the correlation is smaller than one. Simulation results are used to compare the generalized estimators to the ordinary estimators.
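
    Here is a minimal sketch of the two ordinary estimators that the paper generalizes, assuming a single auxiliary variable x with a known population total and no intercept. The first is the Horvitz-Thompson total sum(y_i / pi_i). The second uses linear calibration weights w_i = d_i (1 + λ x_i). These minimize the chi-square distance sum((w_i - d_i)^2 / d_i) from the design weights d_i = 1 / pi_i, subject to sum(w_i x_i) equalling the known total. The generalized versions with a positive definite matrix are not reproduced here. Data are hypothetical.

```python
def horvitz_thompson_total(y, pi):
    """Inverse-probability (Horvitz-Thompson) estimator of a population total."""
    return sum(yi / p for yi, p in zip(y, pi))


def linear_calibration_weights(pi, x, x_total):
    """Weights w_i = d_i * (1 + lam * x_i) minimizing sum((w_i - d_i)**2 / d_i)
    subject to sum(w_i * x_i) == x_total, with design weights d_i = 1 / pi_i
    and a single auxiliary variable x (no intercept)."""
    d = [1.0 / p for p in pi]
    ht_x = sum(di * xi for di, xi in zip(d, x))
    lam = (x_total - ht_x) / sum(di * xi ** 2 for di, xi in zip(d, x))
    return [di * (1.0 + lam * xi) for di, xi in zip(d, x)]


# Hypothetical sample of three units.
y = [10.0, 20.0, 30.0]
x = [1.0, 2.0, 3.0]
pi = [0.1, 0.2, 0.5]
w = linear_calibration_weights(pi, x, x_total=30.0)  # known population total of x

print(horvitz_thompson_total(y, pi))         # 260.0
print(sum(wi * yi for wi, yi in zip(w, y)))  # calibrated total, about 300
```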

    Release date: 2022-06-21

  • Articles and reports: 12-001-X202100200003
    Description:

    Calibration weighting is a statistically efficient way of handling unit nonresponse. Assuming the response (or output) model justifying the calibration-weight adjustment is correct, it is often possible to measure the variance of estimates in an asymptotically unbiased manner. One approach to variance estimation is to create jackknife replicate weights. Sometimes, however, the conventional method for computing jackknife replicate weights for calibrated analysis weights fails. In that case, an alternative method for computing jackknife replicate weights is usually available. That method is described here and then applied to a simple example.
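
    This sketch shows the conventional approach only; the paper's alternative method is not reproduced. It forms delete-one jackknife replicates. Each replicate drops one unit, inflates the remaining design weights by n / (n - 1), and recalibrates them to the known total. For simplicity, the calibration here is ratio calibration to a single known auxiliary total. The function names, the data, and the centering of the variance formula on the full-sample estimate are assumptions. The code raises an error when a replicate cannot be recalibrated.

```python
def ratio_calibrate(d, x, x_total):
    """Scale design weights d so that sum(w_i * x_i) equals the known total x_total."""
    sdx = sum(di * xi for di, xi in zip(d, x))
    if sdx == 0:
        raise ValueError("replicate cannot be calibrated: weighted x total is zero")
    return [di * x_total / sdx for di in d]


def weighted_total(w, y):
    return sum(wi * yi for wi, yi in zip(w, y))


def jackknife_variance(d, x, y, x_total):
    """Delete-one jackknife variance of the calibrated total of y, recalibrating
    every replicate to x_total."""
    n = len(d)
    full = weighted_total(ratio_calibrate(d, x, x_total), y)
    replicates = []
    for j in range(n):
        # Drop unit j and inflate the remaining design weights by n / (n - 1).
        d_rep = [0.0 if i == j else di * n / (n - 1) for i, di in enumerate(d)]
        replicates.append(weighted_total(ratio_calibrate(d_rep, x, x_total), y))
    return (n - 1) / n * sum((r - full) ** 2 for r in replicates)


# Hypothetical sample: design weights, auxiliary x with known total 110, study variable y.
d = [10.0, 10.0, 10.0, 10.0]
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 1.0, 4.0, 1.0]
v = jackknife_variance(d, x, y, x_total=110.0)
print(v)
```

    If y were exactly proportional to x, every replicate would reproduce the same calibrated total and the jackknife variance would be zero.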

    Release date: 2022-01-06

Data (0) (0 results)

No content available at this time.

Analysis (97) (50 to 60 of 97 results)

  • Articles and reports: 11-522-X200600110419
    Description:

    Health services research generally relies on observational data to compare outcomes of patients receiving different therapies. Comparisons of patient groups in observational studies may be biased, in that outcomes differ due to both the effects of treatment and the effects of patient prognosis. In some cases, especially when data are collected on detailed clinical risk factors, these differences can be controlled for using statistical or epidemiological methods. In other cases, when unmeasured characteristics of the patient population affect both the decision to provide therapy and the outcome, these differences cannot be removed using standard techniques. Use of health administrative data requires particular caution in undertaking observational studies since important clinical information is not available in such data. We discuss several statistical and epidemiological approaches to remove overt (measurable) and hidden (unmeasurable) bias in observational studies. These include regression model-based case-mix adjustment, propensity-based matching, redefining the exposure variable of interest, and the econometric technique of instrumental variable (IV) analysis. These methods are illustrated using examples from the medical literature including prediction of one-year mortality following heart attack; the return to health care spending in higher spending U.S. regions in terms of clinical and financial benefits; and the long-term survival benefits of invasive cardiac management of heart attack patients. It is possible to use health administrative data for observational studies provided careful attention is paid to addressing issues of reverse causation and unmeasured confounding.
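
    Instrumental variable analysis is the listed technique aimed at unmeasured confounding. With a binary instrument, the simplest IV estimator is the Wald ratio: the difference in mean outcome between instrument groups divided by the difference in treatment rates. This sketch uses hypothetical data. The estimate is only meaningful if the instrument shifts treatment and affects the outcome through treatment alone.

```python
def mean(values):
    return sum(values) / len(values)


def wald_iv_estimate(z, d, y):
    """Wald IV estimate with a binary instrument z, binary treatment d and outcome y."""
    def by_group(values, g):
        return [v for zi, v in zip(z, values) if zi == g]

    first_stage = mean(by_group(d, 1)) - mean(by_group(d, 0))   # effect of z on treatment
    if first_stage == 0:
        raise ValueError("the instrument does not shift treatment")
    reduced_form = mean(by_group(y, 1)) - mean(by_group(y, 0))  # effect of z on outcome
    return reduced_form / first_stage


z = [1, 1, 1, 1, 0, 0, 0, 0]  # hypothetical binary instrument
d = [1, 1, 1, 0, 1, 0, 0, 0]  # treatment received
y = [1, 1, 1, 0, 1, 0, 0, 1]  # outcome indicator
print(wald_iv_estimate(z, d, y))  # 0.25 / 0.5 = 0.5
```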

    Release date: 2008-03-17

  • Articles and reports: 92F0138M2008002
    Description:

    On November 26, 2006, the Organization for Economic Co-operation and Development (OECD) held an international workshop on defining and measuring metropolitan regions. The reasons the OECD organized this workshop are listed below.

    1. Metropolitan Regions have become a crucial economic actor in today's highly integrated world. Not only do they play their traditional role as growth poles in their countries, but they also function as essential nodes of the global economy.
    2. Policy makers, international organisations and research networks are increasingly called upon to compare the economic and social performance of Metropolitan Regions across countries. Examples of this work undertaken in international organisations and networks include UN-Habitat, the EU Urban Audit, ESPON and the OECD Competitive Cities.
    3. The scope of what we can learn from these international comparisons, however, is limited by the lack of a comparable definition of Metropolitan Regions. Although most countries have their own definitions, these vary significantly from one country to another. Furthermore, in search of higher cross-country comparability, international initiatives have, somewhat paradoxically, generated an even larger number of definitions.
    4. In principle, there is no clear reason to prefer one definition to another. As each definition has been elaborated for a specific analytical purpose, it captures some features of a Metropolitan Region while tending to overlook others. The issue, rather, is that we do not know the pros and cons of different definitions nor, most importantly, the analytical implications of using one definition rather than another.
    5. In order to respond to these questions, the OECD hosted an international workshop on 'Defining and Measuring Metropolitan Regions'. The workshop brought together major international organisations (the UN, Eurostat, the World Bank, and the OECD), National Statistical Offices and researchers from this field. The aim of the workshop was to develop some 'guiding principles', which could be agreed upon among the participants and would eventually provide the basis for some form of 'International Guidance' for comparing Metropolitan Regions across countries.

    This working paper was presented at this workshop. It provides the conceptual and methodological basis for the definition of metropolitan areas in Canada and a detailed comparison of Canada's methodology with that of the USA. The intent was to encourage discussion of Canada's approach to defining metropolitan areas as part of the effort to identify the 'guiding principles'. It is being made available as a working paper to continue this discussion, to provide background to the user community, and to encourage dialogue and commentary from users on Canada's metropolitan area methodology.

    Release date: 2008-02-20

  • Articles and reports: 92F0138M2007001
    Description:

    Statistics Canada creates files that provide the link between postal codes and the geographic areas by which it disseminates statistical data. By linking postal codes to the Statistics Canada geographic areas, Statistics Canada facilitates the extraction and subsequent aggregation of data for selected geographic areas from files available to users. Users can then take data from Statistics Canada for their areas and tabulate this with other data for these same areas to create a combined statistical profile for these areas.

    An issue has been the methodology used by Statistics Canada to establish the linkage of postal codes to geographic areas. In order to address this issue, Statistics Canada decided to create a conceptual framework on which to base the rules for linking postal codes and Statistics Canada's geographic areas. This working paper presents the conceptual framework and the geocoding rules. The methodology described in this paper will be the basis for linking postal codes to the 2006 Census geographic areas. This paper is presented for feedback from users of Statistics Canada's postal codes related products.

    Release date: 2007-02-12

  • Articles and reports: 12-001-X20060019257
    Description:

    In the presence of item nonresponse, two approaches have traditionally been used to make inference on parameters of interest. The first approach assumes uniform response within imputation cells, whereas the second approach assumes ignorable response but makes use of a model on the variable of interest as the basis for inference. In this paper, we propose a third approach that assumes a specified ignorable response mechanism without having to specify a model on the variable of interest. In this case, we show how to obtain imputed values that lead to estimators of a total that are approximately unbiased under the proposed approach as well as under the second approach. Approximately unbiased variance estimators of the imputed estimators are also obtained using an approach of Fay (1991) in which the order of sampling and response is reversed. Finally, simulation studies are conducted to investigate the finite-sample performance of the methods in terms of bias and mean squared error.
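
    For contrast with the proposed approach, the first traditional approach can be sketched as mean imputation within imputation cells. Under uniform response within a cell, respondents are treated as a random subsample of the cell, so a missing value is replaced by the cell's respondent mean. Cell labels and values are hypothetical, and the paper's own imputation method is not reproduced.

```python
from collections import defaultdict


def impute_cell_means(cells, y):
    """Replace missing values (None) by the respondent mean of their imputation cell."""
    sums, counts = defaultdict(float), defaultdict(int)
    for c, v in zip(cells, y):
        if v is not None:
            sums[c] += v
            counts[c] += 1
    imputed = []
    for c, v in zip(cells, y):
        if v is None:
            if counts[c] == 0:
                raise ValueError(f"no respondents in imputation cell {c!r}")
            imputed.append(sums[c] / counts[c])
        else:
            imputed.append(v)
    return imputed


cells = ["a", "a", "a", "b", "b"]  # hypothetical imputation cells
y = [2, None, 4, 10, None]         # None marks item nonresponse
print(impute_cell_means(cells, y))  # [2, 3.0, 4, 10, 10.0]
```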

    Release date: 2006-07-20

  • Articles and reports: 11F0024M20050008805
    Description:

    This paper reports on the potential development of sub-annual indicators for selected service industries using Goods and Services Tax (GST) data. The services sector is now of central importance to advanced economies; however, our knowledge of this sector remains incomplete, partly due to a lack of data. The Voorburg Group on Service Statistics has been meeting for almost twenty years to develop and incorporate better measures for the services sector. Despite this effort, many sub-annual economic measures continue to rely on output data for the goods-producing sector and, with the exception of distributive trades, on employment data for service industries.

    The development of sub-annual indicators for service industries raises two questions regarding the national statistical program. First, is there a need for service output indicators to supplement existing sub-annual measures? Second, which service industries are the most promising for development? The paper begins by reviewing the importance of service industries and how they behave during economic downturns. Next, it examines considerations in determining which service industries to select as GST-based sub-annual indicators. A case study of the accommodation services industry illustrates how timeliness and accuracy can be improved. We conclude by discussing the opportunities for, and limitations of, these indicators.

    Release date: 2005-10-20

  • Articles and reports: 12-002-X20050018030
    Description:

    People often wish to use survey micro-data to study whether the rate of occurrence of a particular condition in a subpopulation is the same as the rate of occurrence in the full population. This paper describes some alternatives for making inferences about such a rate difference and shows whether and how these alternatives may be implemented in three different survey software packages. The software packages illustrated (SUDAAN, WesVar and Bootvar) can all use bootstrap weights provided by the analyst to carry out variance estimation.
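
    Here is a minimal sketch of the replicate approach, assuming B sets of bootstrap weights are supplied with the micro-data. Compute the rate difference once with the full-sample weights and again with each set of bootstrap weights. Then average the squared deviations of the replicates from the full-sample estimate. This is one common form; some software centers on the replicate mean instead. The subpopulation is part of the full population, so the two rates are correlated. Recomputing the difference within each replicate captures that covariance automatically. Data and weights are toy values.

```python
def weighted_rate(w, cond, mask):
    """Weighted proportion with the condition among units where mask is True."""
    num = sum(wi for wi, c, m in zip(w, cond, mask) if m and c)
    den = sum(wi for wi, m in zip(w, mask) if m)
    return num / den


def rate_difference(w, cond, sub):
    """Subpopulation rate minus full-population rate."""
    everyone = [True] * len(w)
    return weighted_rate(w, cond, sub) - weighted_rate(w, cond, everyone)


def bootstrap_variance(full_w, boot_ws, cond, sub):
    """(1/B) * sum over replicates of (replicate estimate - full-sample estimate)**2."""
    est = rate_difference(full_w, cond, sub)
    reps = [rate_difference(bw, cond, sub) for bw in boot_ws]
    return sum((r - est) ** 2 for r in reps) / len(reps)


# Toy data: 4 respondents, condition indicator, subpopulation membership.
full_w = [1.0, 1.0, 1.0, 1.0]
cond = [1, 0, 1, 0]
sub = [True, True, False, False]
boot_ws = [[2.0, 0.0, 1.0, 1.0], [0.0, 2.0, 1.0, 1.0]]  # hypothetical bootstrap weights
print(rate_difference(full_w, cond, sub))              # 0.0
print(bootstrap_variance(full_w, boot_ws, cond, sub))  # 0.0625
```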

    Release date: 2005-06-23

  • Articles and reports: 12-001-X20040027753
    Description:

    Samplers often distrust model-based approaches to survey inference because of concerns about misspecification when models are applied to large samples from complex populations. We suggest that the model-based paradigm can work very successfully in survey settings, provided models are chosen that take into account the sample design and avoid strong parametric assumptions. The Horvitz-Thompson (HT) estimator is a simple design-unbiased estimator of the finite population total. From a modeling perspective, the HT estimator performs well when the ratios of the outcome values and the inclusion probabilities are exchangeable. When this assumption is not met, the HT estimator can be very inefficient. In Zheng and Little (2003, 2004) we used penalized splines (p-splines) to model smoothly varying relationships between the outcome and the inclusion probabilities in one-stage probability proportional to size (PPS) samples. We showed that p-spline model-based estimators are in general more efficient than the HT estimator, and can provide narrower confidence intervals with close to nominal confidence coverage. In this article, we extend this approach to two-stage sampling designs. We use a p-spline based mixed model that fits a nonparametric relationship between the primary sampling unit (PSU) means and a measure of PSU size, and incorporates random effects to model clustering. For variance estimation we consider the empirical Bayes model-based variance, the jackknife and balanced repeated replication (BRR) methods. Simulation studies on simulated data and samples drawn from public use microdata in the 1990 census demonstrate gains for the model-based p-spline estimator over the HT estimator and linear model-assisted estimators. Simulations also show the variance estimation methods yield confidence intervals with satisfactory confidence coverage.

    Interestingly, these gains can be seen for a common equal-probability design, where the first-stage selection is PPS and the second-stage selection probabilities are proportional to the inverse of the first-stage inclusion probabilities, and the HT estimator leads to the unweighted mean. In situations that most favor the HT estimator, the model-based estimators have comparable efficiency.
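
    The equal-probability design mentioned above can be checked directly. Suppose the first stage selects m PSUs with probability proportional to size M_i, and the second stage selects k elements by simple random sampling within each selected PSU. Then every element's overall inclusion probability is (m M_i / M)(k / M_i) = m k / M. The HT estimator of the mean therefore reduces to the unweighted sample mean. This sketch assumes hypothetical PSU sizes with m M_i ≤ M for every PSU.

```python
def overall_inclusion_probs(psu_sizes, m, k):
    """Overall element inclusion probabilities for PPS selection of m PSUs
    followed by SRS of k elements within each selected PSU."""
    M = sum(psu_sizes)
    probs = []
    for Mi in psu_sizes:
        first_stage = m * Mi / M  # PPS selection of the PSU
        second_stage = k / Mi     # SRS of k elements within the PSU
        probs.append(first_stage * second_stage)
    return probs


psu_sizes = [20, 40, 40, 100]  # hypothetical PSU sizes; M = N = 200
probs = overall_inclusion_probs(psu_sizes, m=2, k=5)
print(probs)  # each equal to m * k / M = 0.05, up to rounding

# Hypothetical sample of m * k = 10 elements: the HT mean equals the unweighted mean.
y_sample = [3, 7, 2, 5, 6, 4, 8, 1, 9, 5]
N = sum(psu_sizes)
ht_mean = sum(yi / probs[0] for yi in y_sample) / N
unweighted_mean = sum(y_sample) / len(y_sample)
```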

    Release date: 2005-02-03

  • Articles and reports: 11-522-X20030017700
    Description:

    This paper suggests a useful framework for exploring the effects of moderate deviations from idealized conditions. It offers evaluation criteria for point estimators and interval estimators.
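
    The paper's specific criteria are not listed in this description. As a generic sketch (an assumption, not the paper's framework), three commonly used Monte Carlo summaries for point and interval estimators are bias, root mean squared error, and empirical interval coverage. They are computed from replicated estimates under a chosen departure from idealized conditions. Inputs are hypothetical.

```python
import math


def evaluate(estimates, intervals, true_value):
    """Monte Carlo summaries: bias and RMSE of the point estimator, and
    empirical coverage of the interval estimator."""
    r = len(estimates)
    bias = sum(e - true_value for e in estimates) / r
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / r)
    coverage = sum(lo <= true_value <= hi for lo, hi in intervals) / r
    return bias, rmse, coverage


# Hypothetical results from four simulation replicates with true value 10.
estimates = [9, 11, 12, 8]
intervals = [(8, 10), (10, 12), (11, 13), (6, 9)]
bias, rmse, coverage = evaluate(estimates, intervals, 10)
print(bias, round(rmse, 4), coverage)  # 0.0 1.5811 0.5
```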

    Release date: 2005-01-26

  • Articles and reports: 11-522-X20030017722
    Description:

    This paper shows how to adapt design-based and model-based frameworks to the case of two-stage sampling.
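
    The basic design-based building block for two-stage sampling is a Horvitz-Thompson estimator applied in two steps. Each sampled PSU's total is first estimated from its second-stage sample. That estimate is then weighted by the inverse of the PSU's first-stage inclusion probability. Here is a minimal sketch with hypothetical inputs; it is not the paper's adaptation.

```python
def two_stage_ht_total(psu_samples):
    """Horvitz-Thompson estimator of a population total under two-stage sampling.
    psu_samples: list of (pi_psu, [(y, pi_within), ...]) for the sampled PSUs, where
    pi_psu is the PSU's first-stage inclusion probability and pi_within is the
    element's conditional second-stage inclusion probability."""
    total = 0.0
    for pi_psu, elements in psu_samples:
        psu_total_hat = sum(y / pi_within for y, pi_within in elements)  # estimated PSU total
        total += psu_total_hat / pi_psu
    return total


psu_samples = [
    (0.5, [(4.0, 0.5), (6.0, 0.5)]),  # hypothetical PSU A
    (0.5, [(10.0, 0.25)]),            # hypothetical PSU B
]
print(two_stage_ht_total(psu_samples))  # 120.0
```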

    Release date: 2005-01-26

  • Articles and reports: 11-522-X20020016708
    Description:

    In this paper, we discuss the analysis of complex health survey data using multivariate modelling techniques. Our main interest is in design-based and model-based methods that account for design complexities, including clustering, stratification and weighting. Methods covered include generalized linear modelling based on pseudo-likelihood and generalized estimating equations, linear mixed models estimated by restricted maximum likelihood, and hierarchical Bayes techniques using Markov chain Monte Carlo (MCMC) methods. The methods will be compared empirically, using data from an extensive health interview and examination survey conducted in Finland in 2000 (Health 2000 Study).

    The data of the Health 2000 Study were collected using personal interviews, questionnaires and clinical examinations. A stratified two-stage cluster sampling design was used in the survey. The sampling design involved positive intra-cluster correlation for many study variables. For a closer investigation, we selected a small number of study variables from the health interview and health examination phases. In many cases, the different methods produced similar numerical results and supported similar statistical conclusions. Methods that failed to account for the design complexities sometimes led to conflicting conclusions. We also discuss the application of the methods in this paper by using standard statistical software products.
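
    The role of positive intra-cluster correlation can be illustrated with Kish's approximate design effect for cluster sampling with roughly equal cluster sizes, deff ≈ 1 + (m - 1)ρ. Here m is the average number of sampled units per cluster and ρ the intra-cluster correlation. Methods that treat the sample as simple random understate variances by roughly this factor. The values are hypothetical, not from the Health 2000 Study.

```python
import math


def kish_cluster_deff(avg_cluster_size, icc):
    """Approximate design effect for cluster sampling: 1 + (m - 1) * rho."""
    return 1 + (avg_cluster_size - 1) * icc


def effective_sample_size(n, deff):
    return n / deff


def design_adjusted_se(srs_se, deff):
    """Inflate a standard error computed as if the sample were simple random."""
    return srs_se * math.sqrt(deff)


deff = kish_cluster_deff(avg_cluster_size=10, icc=0.05)  # about 1.45
print(effective_sample_size(1000, deff))                 # about 690
print(design_adjusted_se(0.010, deff))                   # about 0.012
```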

    Release date: 2004-09-13

Reference (8) (8 results)

No content available at this time.
