Inference and foundations

Filter results by

Search Help
Currently selected filters that can be removed

Keyword(s)

Geography

1 facets displayed. 0 facets selected.

Survey or statistical program

2 facets displayed. 0 facets selected.

Content

1 facets displayed. 0 facets selected.
Sort Help
entries

Results

All (82)

All (82) (10 to 20 of 82 results)

  • Articles and reports: 11-522-X201700014759
    Description:

    Many of the challenges and opportunities of modern data science have to do with dynamic aspects: evolving populations, the growing volume of administrative and commercial data on individuals and establishments, continuous flows of data and the capacity to analyze and summarize them in real time, and the deterioration of data absent the resources to maintain them. With its emphasis on data quality and supportable results, the domain of Official Statistics is ideal for highlighting statistical and data science issues in a variety of contexts. The messages of the talk include the importance of population frames and their maintenance; the potential for use of multi-frame methods and linkages; how the use of large scale non-survey data as auxiliary information shapes the objects of inference; the complexity of models for large data sets; the importance of recursive methods and regularization; and the benefits of sophisticated data visualization tools in capturing change.

    Release date: 2016-03-24

  • Articles and reports: 11-522-X201300014251
    Description:

    I present a modeller's perspective on the current status quo in official statistics surveys-based inference. In doing so, I try to identify the strengths and weaknesses of the design and model-based inferential positions that survey sampling, at least as far as the official statistics world is concerned, finds itself at present. I close with an example from adaptive survey design that illustrates why taking a model-based perspective (either frequentist or Bayesian) represents the best way for official statistics to avoid the debilitating 'inferential schizophrenia' that seems inevitable if current methodologies are applied to the emerging information requirements of today's world (and possibly even tomorrow's).

    Release date: 2014-10-31

  • Articles and reports: 11-522-X201300014252
    Description:

    Although estimating finite populations characteristics from probability samples has been very successful for large samples, inferences from non-probability samples may also be possible. Non-probability samples have been criticized due to self-selection bias and the lack of methods for estimating the precision of the estimates. The wide spread access to the Web and the ability to do very inexpensive data collection on the Web has reinvigorated interest in this topic. We review of non-probability sampling strategies and summarize some of the key issues. We then propose conditions under which non-probability sampling may be a reasonable approach. We conclude with ideas for future research.

    Release date: 2014-10-31

  • Surveys and statistical programs – Documentation: 11-522-X201300014259
    Description:

    In an effort to reduce response burden on farm operators, Statistics Canada is studying alternative approaches to telephone surveys for producing field crop estimates. One option is to publish harvested area and yield estimates in September as is currently done, but to calculate them using models based on satellite and weather data, and data from the July telephone survey. However before adopting such an approach, a method must be found which produces estimates with a sufficient level of accuracy. Research is taking place to investigate different possibilities. Initial research results and issues to consider are discussed in this paper.

    Release date: 2014-10-31

  • Articles and reports: 11-522-X201300014280
    Description:

    During the last decade, web panel surveys have been established as a fast and cost-efficient method in market surveys. The rationale for this is new developments in information technology, in particular the continued rapid growth of internet and computer use among the public. Also growing nonresponse rates and prices forced down in the survey industry lie behind this change. However, there are some serious inherent risks connected with web panel surveys, not least selection bias due to the self-selection of respondents. There are also risks of coverage and measurement errors. The absence of an inferential framework and of data quality indicators is an obstacle against using the web panel approach for high-quality statistics about general populations. Still, there seems to be increasing challenges for some national statistical institutes by a new form of competition for ad hoc statistics and even official statistics from web panel surveys.This paper explores the question of design and use of web panels in a scientifically sound way. An outline is given of a standard from the Swedish Survey Society for performance metrics to assess some quality aspects of results from web panel surveys. Decomposition of bias and mitigation of bias risks are discussed in some detail. Some ideas are presented for combining web panel surveys and traditional surveys to achieve controlled cost-efficient inference.

    Release date: 2014-10-31

  • Articles and reports: 12-001-X201400114004
    Description:

    In 2009, two major surveys in the Governments Division of the U.S. Census Bureau were redesigned to reduce sample size, save resources, and improve the precision of the estimates (Cheng, Corcoran, Barth and Hogue 2009). The new design divides each of the traditional state by government-type strata with sufficiently many units into two sub-strata according to each governmental unit’s total payroll, in order to sample less from the sub-stratum with small size units. The model-assisted approach is adopted in estimating population totals. Regression estimators using auxiliary variables are obtained either within each created sub-stratum or within the original stratum by collapsing two sub-strata. A decision-based method was proposed in Cheng, Slud and Hogue (2010), applying a hypothesis test to decide which regression estimator is used within each original stratum. Consistency and asymptotic normality of these model-assisted estimators are established here, under a design-based or model-assisted asymptotic framework. Our asymptotic results also suggest two types of consistent variance estimators, one obtained by substituting unknown quantities in the asymptotic variances and the other by applying the bootstrap. The performance of all the estimators of totals and of their variance estimators are examined in some empirical studies. The U.S. Annual Survey of Public Employment and Payroll (ASPEP) is used to motivate and illustrate our study.

    Release date: 2014-06-27

  • Surveys and statistical programs – Documentation: 12-001-X201300211887
    Description:

    Multi-level models are extensively used for analyzing survey data with the design hierarchy matching the model hierarchy. We propose a unified approach, based on a design-weighted log composite likelihood, for two-level models that leads to design-model consistent estimators of the model parameters even when the within cluster sample sizes are small provided the number of sample clusters is large. This method can handle both linear and generalized linear two-level models and it requires level 2 and level 1 inclusion probabilities and level 1 joint inclusion probabilities, where level 2 represents a cluster and level 1 an element within a cluster. Results of a simulation study demonstrating superior performance of the proposed method relative to existing methods under informative sampling are also reported.

    Release date: 2014-01-15

  • Articles and reports: 82-003-X201300611796
    Geography: Canada
    Description:

    The study assesses the feasibility of using statistical modelling techniques to fill information gaps related to risk factors, specifically, smoking status, in linked long-form census data.

    Release date: 2013-06-19

  • Surveys and statistical programs – Documentation: 12-001-X201200211758
    Description:

    This paper develops two Bayesian methods for inference about finite population quantiles of continuous survey variables from unequal probability sampling. The first method estimates cumulative distribution functions of the continuous survey variable by fitting a number of probit penalized spline regression models on the inclusion probabilities. The finite population quantiles are then obtained by inverting the estimated distribution function. This method is quite computationally demanding. The second method predicts non-sampled values by assuming a smoothly-varying relationship between the continuous survey variable and the probability of inclusion, by modeling both the mean function and the variance function using splines. The two Bayesian spline-model-based estimators yield a desirable balance between robustness and efficiency. Simulation studies show that both methods yield smaller root mean squared errors than the sample-weighted estimator and the ratio and difference estimators described by Rao, Kovar, and Mantel (RKM 1990), and are more robust to model misspecification than the regression through the origin model-based estimator described in Chambers and Dunstan (1986). When the sample size is small, the 95% credible intervals of the two new methods have closer to nominal confidence coverage than the sample-weighted estimator.

    Release date: 2012-12-19

  • Surveys and statistical programs – Documentation: 12-001-X201200111688
    Description:

    We study the problem of nonignorable nonresponse in a two dimensional contingency table which can be constructed for each of several small areas when there is both item and unit nonresponse. In general, the provision for both types of nonresponse with small areas introduces significant additional complexity in the estimation of model parameters. For this paper, we conceptualize the full data array for each area to consist of a table for complete data and three supplemental tables for missing row data, missing column data, and missing row and column data. For nonignorable nonresponse, the total cell probabilities are allowed to vary by area, cell and these three types of "missingness". The underlying cell probabilities (i.e., those which would apply if full classification were always possible) for each area are generated from a common distribution and their similarity across the areas is parametrically quantified. Our approach is an extension of the selection approach for nonignorable nonresponse investigated by Nandram and Choi (2002a, b) for binary data; this extension creates additional complexity because of the multivariate nature of the data coupled with the small area structure. As in that earlier work, the extension is an expansion model centered on an ignorable nonresponse model so that the total cell probability is dependent upon which of the categories is the response. Our investigation employs hierarchical Bayesian models and Markov chain Monte Carlo methods for posterior inference. The models and methods are illustrated with data from the third National Health and Nutrition Examination Survey.

    Release date: 2012-06-27
Data (0)

Data (0) (0 results)

No content available at this time.

Analysis (69)

Analysis (69) (50 to 60 of 69 results)

  • Articles and reports: 11-522-X19990015654
    Description:

    A meta analysis was performed to estimate the proportion of liver carcinogens, the proportion of chemicals carcinogenic at any site, and the corresponding proportion of anticarcinogens among chemicals tested in 397 long-term cancer bioassays conducted by the U.S. National Toxicology Program. Although the estimator used was negatively biased, the study provided persuasive evidence for a larger proportion of liver carcinogens (0.43,90%CI: 0.35,0.51) than was identified by the NTP (0.28). A larger proportion of chemicals carcinogenic at any site was also estimated (0.59,90%CI: 0.49,0.69) than was identified by the NTP (0.51), although this excess was not statistically significant. A larger proportion of anticarcinogens (0.66) was estimated than carcinogens (0.59). Despite the negative bias, it was estimated that 85% of the chemicals were either carcinogenic or anticarcinogenic at some site in some sex-species group. This suggests that most chemicals tested at high enough doses will cause some sort of perturbation in tumor rates.

    Release date: 2000-03-02

  • Articles and reports: 92F0138M2000003
    Description:

    Statistics Canada's interest in a common delineation of the north for statistical analysis purposes evolved from research to devise a classification to further differentiate the largely rural and remote areas that make up 96% of Canada's land area. That research led to the establishment of the census metropolitan area and census agglomeration influenced zone (MIZ) concept. When applied to census subdivisions, the MIZ categories did not work as well in northern areas as in the south. Therefore, the Geography Division set out to determine a north-south divide that would differentiate the north from the south independent of any standard geographic area boundaries.

    This working paper describes the methodology used to define a continuous line across Canada to separate the north from the south, as well as lines marking transition zones on both sides of the north-south line. It also describes the indicators selected to derive the north-south line and makes comparisons to alternative definitions of the north. The resulting classification of the north complements the MIZ classification. Together, census metropolitan areas, census agglomerations, MIZ and the North form a new Statistical Area Classification (SAC) for Canada.

    Two related Geography working papers (catalogue no. 92F0138MPE) provide further details about the MIZ classification. Working paper no. 2000-1 (92F0138MPE00001) briefly describes MIZ and includes tables of selected socio-economic characteristics from the 1991 Census tabulated by the MIZ categories, and working paper no. 2000-2 (92F0138MPE00002) describes the methodology used to define the MIZ classification.

    Release date: 2000-02-03

  • Articles and reports: 62F0014M1998013
    Geography: Canada
    Description:

    The reference population for the Consumer Price Index (CPI) has been represented, since the 1992 updating of the basket of goods and services, by families and unattached individuals living in private urban or rural households. The official CPI is a measure of the average percentage change over time in the cost of a fixed basket of goods and services purchased by Canadian consumers.

    Because of the broadly defined target population of the CPI, the measure has been criticised for failing to reflect the inflationary experiences of certain socio-economic groups. This study examines this question for three sub-groups of the reference population of the CPI. It is an extension of earlier studies on the subject done at Statistics Canada.

    In this document, analytical consumer price indexes sub-group indexes are compared to the analytical index for the whole population calculated at the national geographic level.

    The findings tend to point to those of earlier Statistics Canada studies on sub-groups in the CPI reference population. Those studies have consistently concluded that a consumer price index established for a given sub-group does not differ substantially from the index for the whole reference population.

    Release date: 1999-05-13

  • Surveys and statistical programs – Documentation: 12-001-X19970013101
    Description:

    In the main body of statistics, sampling is often disposed of by assuming a sampling process that selects random variables such that they are independent and identically distributed (IID). Important techniques, like regression and contingency table analysis, were developed largely in the IID world; hence, adjustments are needed to use them in complex survey settings. Rather than adjust the analysis, however, what is new in the present formulation is to draw a second sample from the original sample. In this second sample, the first set of selections are inverted, so as to yield at the end a simple random sample. Of course, to employ this two-step process to draw a single simple random sample from the usually much larger complex survey would be inefficient, so multiple simple random samples are drawn and a way to base inferences on them developed. Not all original samples can be inverted; but many practical special cases are discussed which cover a wide range of practices.

    Release date: 1997-08-18

  • Surveys and statistical programs – Documentation: 12-001-X19970013102
    Description:

    The selection of auxiliary variables is considered for regression estimation in finite populations under a simple random sampling design. This problem is a basic one for model-based and model-assisted survey sampling approaches and is of practical importance when the number of variables available is large. An approach is developed in which a mean squared error estimator is minimised. This approach is compared to alternative approaches using a fixed set of auxiliary variables, a conventional significance test criterion, a condition number reduction approach and a ridge regression approach. The proposed approach is found to perform well in terms of efficiency. It is noted that the variable selection approach affects the properties of standard variance estimators and thus leads to a problem of variance estimation.

    Release date: 1997-08-18

  • Surveys and statistical programs – Documentation: 12-001-X19960022980
    Description:

    In this paper, we study a confidence interval estimation method for a finite population average when some auxiliairy information is available. As demonstrated by Royall and Cumberland in a series of empirical studies, naive use of existing methods to construct confidence intervals for population averages may result in very poor conditional coverage probabilities, conditional on the sample mean of the covariate. When this happens, we propose to transform the data to improve the precision of the normal approximation. The transformed data are then used to make inference on the original population average, and the auxiliary information is incorporated into the inference directly, or by calibration with empirical likelihood. Our approach is design-based. We apply our approach to six real populations and find that when transformation is needed, our approach performs well compared to the usual regression method.

    Release date: 1997-01-30

  • Articles and reports: 91F0015M1996001
    Geography: Canada
    Description:

    This paper describes the methodology for fertility projections used in the 1993-based population projections by age and sex for Canada, provinces and territories, 1993-2016. A new version of the parametric model known as the Pearsonian Type III curve was applied for projecting fertility age pattern. The Pearsonian Type III model is considered as an improvement over the Type I used in the past projections. This is because the Type III curve better portrays both the distribution of the age-specific fertility rates and the estimates of births. Since the 1993-based population projections are the first official projections to incorporate the net census undercoverage in the population base, it has been necessary to recalculate fertility rates based on the adjusted population estimates. This recalculation resulted in lowering the historical series of age-specific and total fertility rates, 1971-1993. The three sets of fertility assumptions and projections were developed with these adjusted annual fertility rates.

    It is hoped that this paper will provide valuable information about the technical and analytical aspects of the current fertility projection model. Discussions on the current and future levels and age pattern of fertility in Canada, provinces and territories are also presented in the paper.

    Release date: 1996-08-02

  • Articles and reports: 12-001-X199600114385
    Description:

    The multiple capture-recapture census is reconsidered by relaxing the traditional perfect matching assumption. We propose matching error models to characterize error-prone matching mechanisms. The observed data take the form of an incomplete 2^k contingency table with one missing cell and follow a multinomial distribution. We develop a procedure for the estimation of the population size. Our approach applies to both standard log-linear models for contingency tables and log-linear models for heterogeneity of catchability. We illustrate the method and estimation using a 1988 dress rehearsal study for the 1990 census conducted by the U.S. Bureau of the Census.

    Release date: 1996-06-14

  • Articles and reports: 12-001-X199500214398
    Description:

    We present empirical evidence from 14 surveys in six countries concerning the existence and magnitude of design effects (defts) for five designs of two major types. The first type concerns deft (p_i – p_j), the difference of two proportions from a polytomous variable of three or more categories. The second type uses Chi-square tests for differences from two samples. We find that for all variables in all designs deft (p_i – p_j) \cong [deft (p_i) + deft (p_j)] / 2 are good approximations. These are empirical results, and exceptions disprove the existence of mere analytical inequalities. These results hold despite great variations of defts between variables and also between categories of the same variables. They also show the need for sample survey treatment of survey data even for analytical statistics. Furthermore they permit useful approximations of deft (p_i – p_j) from more accessible deft (p_i) values.

    Release date: 1995-12-15

  • Articles and reports: 12-001-X199500114408
    Description:

    The problem of estimating the median of a finite population when an auxiliary variable is present is considered. Point and interval estimators based on a non-informative Bayesian approach are proposed. The point estimator is compared to other possible estimators and is seen to perform well in a variety of situations.

    Release date: 1995-06-15
Reference (16)

Reference (16) (10 to 20 of 16 results)

  • Surveys and statistical programs – Documentation: 11-522-X19990015650
    Description:

    The U.S. Manufacturing Plant Ownership Change Database (OCD) was constructed using plant-level data taken from the Census Bureau's Longitudinal Research Database (LRD). It contains data on all manufacturing plants that have experienced ownership change at least once during the period 1963-92. This paper reports the status of the OCD and discuss its research possibilities. For an empirical demonstration, data taken from the database are used to study the effects of ownership changes on plant closure.

    Release date: 2000-03-02

  • Surveys and statistical programs – Documentation: 11-522-X19990015658
    Description:

    Radon, a naturally occurring gas found at some level in most homes, is an established risk factor for human lung cancer. The U.S. National Research Council (1999) has recently completed a comprehensive evaluation of the health risks of residential exposure to radon, and developed models for projecting radon lung cancer risks in the general population. This analysis suggests that radon may play a role in the etiology of 10-15% of all lung cancer cases in the United States, although these estimates are subject to considerable uncertainty. In this article, we present a partial analysis of uncertainty and variability in estimates of lung cancer risk due to residential exposure to radon in the United States using a general framework for the analysis of uncertainty and variability that we have developed previously. Specifically, we focus on estimates of the age-specific excess relative risk (ERR) and lifetime relative risk (LRR), both of which vary substantially among individuals.

    Release date: 2000-03-02

  • Geographic files and documentation: 92F0138M1993001
    Geography: Canada
    Description:

    The Geography Divisions of Statistics Canada and the U.S. Bureau of the Census have commenced a cooperative research program in order to foster an improved and expanded perspective on geographic areas and their relevance. One of the major objectives is to determine a common geographic area to form a geostatistical basis for cross-border research, analysis and mapping.

    This report, which represents the first stage of the research, provides a list of comparable pairs of Canadian and U.S. standard geographic areas based on current definitions. Statistics Canada and the U.S. Bureau of the Census have two basic types of standard geographic entities: legislative/administrative areas (called "legal" entities in the U.S.) and statistical areas.

    The preliminary pairing of geographic areas are based on face-value definitions only. The definitions are based on the June 4, 1991 Census of Population and Housing for Canada and the April 1, 1990 Census of Population and Housing for the U.S.A. The important aspect is the overall conceptual comparability, not the precise numerical thresholds used for delineating the areas.

    Data users should use this report as a general guide to compare the census geographic areas of Canada and the United States, and should be aware that differences in settlement patterns and population levels preclude a precise one-to-one relationship between conceptually similar areas. The geographic areas compared in this report provide a framework for further empirical research and analysis.

    Release date: 1999-03-05

  • Surveys and statistical programs – Documentation: 12-001-X19970013101
    Description:

    In the main body of statistics, sampling is often disposed of by assuming a sampling process that selects random variables such that they are independent and identically distributed (IID). Important techniques, like regression and contingency table analysis, were developed largely in the IID world; hence, adjustments are needed to use them in complex survey settings. Rather than adjust the analysis, however, what is new in the present formulation is to draw a second sample from the original sample. In this second sample, the first set of selections are inverted, so as to yield at the end a simple random sample. Of course, to employ this two-step process to draw a single simple random sample from the usually much larger complex survey would be inefficient, so multiple simple random samples are drawn and a way to base inferences on them developed. Not all original samples can be inverted; but many practical special cases are discussed which cover a wide range of practices.

    Release date: 1997-08-18

  • Surveys and statistical programs – Documentation: 12-001-X19970013102
    Description:

    The selection of auxiliary variables is considered for regression estimation in finite populations under a simple random sampling design. This problem is a basic one for model-based and model-assisted survey sampling approaches and is of practical importance when the number of variables available is large. An approach is developed in which a mean squared error estimator is minimised. This approach is compared to alternative approaches using a fixed set of auxiliary variables, a conventional significance test criterion, a condition number reduction approach and a ridge regression approach. The proposed approach is found to perform well in terms of efficiency. It is noted that the variable selection approach affects the properties of standard variance estimators and thus leads to a problem of variance estimation.

    Release date: 1997-08-18

  • Surveys and statistical programs – Documentation: 12-001-X19960022980
    Description:

    In this paper, we study a confidence interval estimation method for a finite population average when some auxiliairy information is available. As demonstrated by Royall and Cumberland in a series of empirical studies, naive use of existing methods to construct confidence intervals for population averages may result in very poor conditional coverage probabilities, conditional on the sample mean of the covariate. When this happens, we propose to transform the data to improve the precision of the normal approximation. The transformed data are then used to make inference on the original population average, and the auxiliary information is incorporated into the inference directly, or by calibration with empirical likelihood. Our approach is design-based. We apply our approach to six real populations and find that when transformation is needed, our approach performs well compared to the usual regression method.

    Release date: 1997-01-30
Date modified: