Inference and foundations

Results

All (82) (30 to 40 of 82 results)

  • Articles and reports: 92F0138M2008002
    Description:

    On November 26, 2006, the Organization for Economic Co-operation and Development (OECD) held an international workshop on defining and measuring metropolitan regions. The reasons the OECD organized this workshop are listed below.

    1. Metropolitan Regions have become a crucial economic actor in today's highly integrated world. Not only do they play their traditional role of growth poles in their countries but they function as essential nodes of the global economy.

    2. Policy makers, international organisations and research networks are increasingly called to compare the economic and social performances of Metropolitan Regions across countries. Examples of this work undertaken in international organisations and networks include the UN-Habitat, the EU Urban Audit, ESPON and the OECD Competitive Cities.

    3. The scope of what we can learn from these international comparisons, however, is limited by the lack of a comparable definition of Metropolitan Regions. Although most countries have their own definitions, these vary significantly from one country to another. Furthermore, in the search for higher cross-country comparability, international initiatives have - somewhat paradoxically - generated an even larger number of definitions.

    4. In principle, there is no clear reason to prefer one definition to another. As each definition has been elaborated for a specific analytical purpose, it captures some features of a Metropolitan Region while it tends to overlook others. The issue, rather, is that we do not know the pros and the cons of different definitions nor, most important, the analytical implications of using one definition rather than another.

    5. In order to respond to these questions, the OECD hosted an international workshop on 'Defining and Measuring Metropolitan Regions'. The workshop brought together major international organisations (the UN, Eurostat, the World Bank, and the OECD), National Statistical Offices and researchers from this field. The aim of the workshop was to develop some 'guiding principles', which could be agreed upon among the participants and would eventually provide the basis for some form of 'International Guidance' for comparing Metropolitan Regions across countries.

    This working paper was presented at this workshop. It provides the conceptual and methodological basis for the definition of metropolitan areas in Canada and provides a detailed comparison of Canada's methodology to that of the USA. The intent was to encourage discussion regarding Canada's approach to defining metropolitan areas in the effort to identify the 'guiding principles'. It is being made available as a working paper to continue this discussion, to provide background to the user community, and to encourage dialogue and commentary from users regarding Canada's metropolitan area methodology.

    Release date: 2008-02-20

  • Articles and reports: 92F0138M2007001
    Description:

    Statistics Canada creates files that provide the link between postal codes and the geographic areas by which it disseminates statistical data. By linking postal codes to the Statistics Canada geographic areas, Statistics Canada facilitates the extraction and subsequent aggregation of data for selected geographic areas from files available to users. Users can then take data from Statistics Canada for their areas and tabulate this with other data for these same areas to create a combined statistical profile for these areas.

    An issue has been the methodology used by Statistics Canada to establish the linkage of postal codes to geographic areas. In order to address this issue, Statistics Canada decided to create a conceptual framework on which to base the rules for linking postal codes and Statistics Canada's geographic areas. This working paper presents the conceptual framework and the geocoding rules. The methodology described in this paper will be the basis for linking postal codes to the 2006 Census geographic areas. This paper is presented for feedback from users of Statistics Canada's postal codes related products.

    Release date: 2007-02-12

  • Articles and reports: 12-001-X20060019257
    Description:

    In the presence of item nonresponse, two approaches have traditionally been used to make inference on parameters of interest. The first approach assumes uniform response within imputation cells, whereas the second approach assumes ignorable response but makes use of a model on the variable of interest as the basis for inference. In this paper, we propose a third approach that assumes a specified ignorable response mechanism without having to specify a model on the variable of interest. In this case, we show how to obtain imputed values which lead to estimators of a total that are approximately unbiased under the proposed approach as well as the second approach. Variance estimators of the imputed estimators that are approximately unbiased are also obtained using an approach of Fay (1991) in which the order of sampling and response is reversed. Finally, simulation studies are conducted to investigate the finite sample performance of the methods in terms of bias and mean square error.

    Release date: 2006-07-20

  • Articles and reports: 11F0024M20050008805
    Description:

    This paper reports on the potential development of sub-annual indicators for selected service industries using Goods and Services Tax (GST) data. The services sector is now of central importance to advanced economies; however, our knowledge of this sector remains incomplete, partly due to a lack of data. The Voorburg Group on Service Statistics has been meeting for almost twenty years to develop and incorporate better measures for the services sector. Despite this effort, many sub-annual economic measures continue to rely on output data for the goods-producing sector and, with the exception of distributive trades, on employment data for service industries.

    The development of sub-annual indicators for service industries raises two questions regarding the national statistical program. First, is there a need for service output indicators to supplement existing sub-annual measures? And second, what service industries are the most promising for development? The paper begins by reviewing the importance of service industries and how they behave during economic downturns. Next, it examines considerations in determining which service industries to select as GST-based, sub-annual indicators. A case study of the accommodation services industry serves to illustrate improving timeliness and accuracy. We conclude by discussing the opportunities for, and limitations of, these indicators.

    Release date: 2005-10-20

  • Articles and reports: 12-002-X20050018030
    Description:

    People often wish to use survey micro-data to study whether the rate of occurrence of a particular condition in a subpopulation is the same as the rate of occurrence in the full population. This paper describes some alternatives for making inferences about such a rate difference and shows whether and how these alternatives may be implemented in three different survey software packages. The software packages illustrated - SUDAAN, WesVar and Bootvar - all can make use of bootstrap weights provided by the analyst to carry out variance estimation.
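
    The replicate-weight calculation described above can be sketched roughly as follows. This is a minimal illustration only, not the procedure implemented in SUDAAN, WesVar or Bootvar: the function names, the toy data, and the choice to centre the squared deviations on the full-sample estimate are assumptions made for this sketch.

```python
def weighted_rate(flags, weights, in_domain=None):
    """Weighted proportion of units with the condition (flag == 1).

    If in_domain is given, only units whose in_domain entry is True are used.
    """
    numerator = 0.0
    denominator = 0.0
    for i, (flag, weight) in enumerate(zip(flags, weights)):
        if in_domain is not None and not in_domain[i]:
            continue
        numerator += weight * flag
        denominator += weight
    return numerator / denominator


def rate_difference(flags, in_subpop, weights):
    """Subpopulation rate minus full-population rate under one set of weights."""
    return weighted_rate(flags, weights, in_subpop) - weighted_rate(flags, weights)


def bootstrap_variance(flags, in_subpop, main_weights, replicate_weights):
    """Mean squared deviation of the replicate estimates from the main estimate.

    Assumption: deviations are taken around the full-sample estimate.
    """
    theta = rate_difference(flags, in_subpop, main_weights)
    replicates = [rate_difference(flags, in_subpop, w) for w in replicate_weights]
    return sum((r - theta) ** 2 for r in replicates) / len(replicates)


# Hypothetical toy data: four respondents, two of them in the subpopulation.
flags = [1, 0, 1, 0]
in_subpop = [True, True, False, False]
main_weights = [2.0, 2.0, 2.0, 2.0]
replicate_weights = [
    [4.0, 0.0, 2.0, 2.0],
    [0.0, 4.0, 2.0, 2.0],
]

difference = rate_difference(flags, in_subpop, main_weights)
variance = bootstrap_variance(flags, in_subpop, main_weights, replicate_weights)
print(difference, variance)
```

    Because the subpopulation is part of the full population, the two rates are correlated; recomputing the whole difference under each set of replicate weights accounts for that correlation without an explicit covariance term.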

    Release date: 2005-06-23

  • Articles and reports: 12-001-X20040027753
    Description:

    Samplers often distrust model-based approaches to survey inference because of concerns about misspecification when models are applied to large samples from complex populations. We suggest that the model-based paradigm can work very successfully in survey settings, provided models are chosen that take into account the sample design and avoid strong parametric assumptions. The Horvitz-Thompson (HT) estimator is a simple design-unbiased estimator of the finite population total. From a modeling perspective, the HT estimator performs well when the ratios of the outcome values and the inclusion probabilities are exchangeable. When this assumption is not met, the HT estimator can be very inefficient. In Zheng and Little (2003, 2004) we used penalized splines (p-splines) to model smoothly varying relationships between the outcome and the inclusion probabilities in one-stage probability proportional to size (PPS) samples. We showed that p-spline model-based estimators are in general more efficient than the HT estimator, and can provide narrower confidence intervals with close to nominal confidence coverage. In this article, we extend this approach to two-stage sampling designs. We use a p-spline based mixed model that fits a nonparametric relationship between the primary sampling unit (PSU) means and a measure of PSU size, and incorporates random effects to model clustering. For variance estimation we consider the empirical Bayes model-based variance, the jackknife and balanced repeated replication (BRR) methods. Simulation studies on simulated data and samples drawn from public use microdata in the 1990 census demonstrate gains for the model-based p-spline estimator over the HT estimator and linear model-assisted estimators. Simulations also show the variance estimation methods yield confidence intervals with satisfactory confidence coverage. Interestingly, these gains can be seen for a common equal-probability design, where the first stage selection is PPS and the second stage selection probabilities are proportional to the inverse of the first stage inclusion probabilities, and the HT estimator leads to the unweighted mean. In situations that most favor the HT estimator, the model-based estimators have comparable efficiency.
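
    For readers unfamiliar with the Horvitz-Thompson estimator used as the benchmark above, the following is a minimal sketch of the point estimate only. The function name and sample values are hypothetical, and none of the p-spline modelling from the paper is reproduced here.

```python
def horvitz_thompson_total(values, inclusion_probs):
    """Estimate a finite population total as the sum of y_i / pi_i over the sample."""
    if len(values) != len(inclusion_probs):
        raise ValueError("values and inclusion_probs must have the same length")
    total = 0.0
    for y, pi in zip(values, inclusion_probs):
        if not 0.0 < pi <= 1.0:
            raise ValueError("inclusion probabilities must lie in (0, 1]")
        total += y / pi  # each sampled unit stands in for 1/pi population units
    return total


# Hypothetical sample of three units with unequal inclusion probabilities.
sample_values = [10.0, 20.0, 30.0]
sample_probs = [0.5, 0.25, 0.125]

ht_estimate = horvitz_thompson_total(sample_values, sample_probs)
print(ht_estimate)

# The ratios y_i / pi_i are what the abstract calls exchangeable in the
# favourable case; here they differ widely (20, 80, 240).
ratios = [y / pi for y, pi in zip(sample_values, sample_probs)]
print(ratios)
```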

    Release date: 2005-02-03

  • Articles and reports: 11-522-X20030017700
    Description:

    This paper suggests a useful framework for exploring the effects of moderate deviations from idealized conditions. It offers evaluation criteria for point estimators and interval estimators.

    Release date: 2005-01-26

  • Articles and reports: 11-522-X20030017722
    Description:

    This paper shows how to adapt design-based and model-based frameworks to the case of two-stage sampling.

    Release date: 2005-01-26

  • Surveys and statistical programs – Documentation: 12-002-X20040027035
    Description:

    As part of the processing of the National Longitudinal Survey of Children and Youth (NLSCY) cycle 4 data, historical revisions have been made to the data of the first 3 cycles, either to correct errors or to update the data. During processing, particular attention was given to the PERSRUK (Person Identifier) and the FIELDRUK (Household Identifier). The same level of attention has not been given to the other identifiers that are included in the data base, the CHILDID (Person identifier) and the _IDHD01 (Household identifier). These identifiers have been created for the public files and can also be found in the master files by default. The PERSRUK should be used to link records between files and the FIELDRUK to determine the household when using the master files.

    Release date: 2004-10-05

  • Articles and reports: 11-522-X20020016708
    Description:

    In this paper, we discuss the analysis of complex health survey data by using multivariate modelling techniques. Main interests are in various design-based and model-based methods that aim at accounting for the design complexities, including clustering, stratification and weighting. Methods covered include generalized linear modelling based on pseudo-likelihood and generalized estimating equations, linear mixed models estimated by restricted maximum likelihood, and hierarchical Bayes techniques using Markov Chain Monte Carlo (MCMC) methods. The methods will be compared empirically, using data from an extensive health interview and examination survey conducted in Finland in 2000 (Health 2000 Study).

    The data of the Health 2000 Study were collected using personal interviews, questionnaires and clinical examinations. A stratified two-stage cluster sampling design was used in the survey. The sampling design involved positive intra-cluster correlation for many study variables. For a closer investigation, we selected a small number of study variables from the health interview and health examination phases. In many cases, the different methods produced similar numerical results and supported similar statistical conclusions. Methods that failed to account for the design complexities sometimes led to conflicting conclusions. We also discuss the application of the methods in this paper by using standard statistical software products.

    Release date: 2004-09-13

Analysis (69) (20 to 30 of 69 results)

  • Articles and reports: 12-001-X200800110606
    Description:

    Data from election polls in the US are typically presented in two-way categorical tables, and there are many polls before the actual election in November. For example, in the Buckeye State Poll in 1998 for governor there are three polls, January, April and October; the first category represents the candidates (e.g., Fisher, Taft and other) and the second category represents the current status of the voters (likely to vote and not likely to vote for governor of Ohio). There is a substantial number of undecided voters for one or both categories in all three polls, and we use a Bayesian method to allocate the undecided voters to the three candidates. This method permits modeling different patterns of missingness under ignorable and nonignorable assumptions, and a multinomial-Dirichlet model is used to estimate the cell probabilities which can help to predict the winner. We propose a time-dependent nonignorable nonresponse model for the three tables. Here, a nonignorable nonresponse model is centered on an ignorable nonresponse model to induce some flexibility and uncertainty about ignorability or nonignorability. As competitors we also consider two other models, an ignorable and a nonignorable nonresponse model. These latter two models assume a common stochastic process to borrow strength over time. Markov chain Monte Carlo methods are used to fit the models. We also construct a parameter that can potentially be used to predict the winner among the candidates in the November election.

    Release date: 2008-06-26

  • Articles and reports: 11-522-X200600110392
    Description:

    We use a robust Bayesian method to analyze data with possibly nonignorable nonresponse and selection bias. A robust logistic regression model is used to relate the response indicators (Bernoulli random variable) to the covariates, which are available for everyone in the finite population. This relationship can adequately explain the difference between respondents and nonrespondents for the sample. This robust model is obtained by expanding the standard logistic regression model to a mixture of Student's distributions, thereby providing propensity scores (selection probability) which are used to construct adjustment cells. The nonrespondents' values are filled in by drawing a random sample from a kernel density estimator, formed from the respondents' values within the adjustment cells. Prediction uses a linear spline rank-based regression of the response variable on the covariates by areas, sampling the errors from another kernel density estimator, thereby further robustifying our method. We use Markov chain Monte Carlo (MCMC) methods to fit our model. The posterior distribution of a quantile of the response variable is obtained within each sub-area using the order statistic over all the individuals (sampled and nonsampled). We compare our robust method with recent parametric methods.

    Release date: 2008-03-17

  • Articles and reports: 11-522-X200600110398
    Description:

    The study of longitudinal data is vital in terms of accurately observing changes in responses of interest for individuals, communities, and larger populations over time. Linear mixed effects models (for continuous responses observed over time) and generalized linear mixed effects models and generalized estimating equations (for more general responses such as binary or count data observed over time) are the most popular techniques used for analyzing longitudinal data from health studies, though, as with all modeling techniques, these approaches have limitations, partly due to their underlying assumptions. In this review paper, we will discuss some advances, including curve-based techniques, which make modeling longitudinal data more flexible. Three examples will be presented from the health literature utilizing these more flexible procedures, with the goal of demonstrating that some otherwise difficult questions can be reasonably answered when analyzing complex longitudinal data in population health studies.

    Release date: 2008-03-17

  • Articles and reports: 11-522-X200600110419
    Description:

    Health services research generally relies on observational data to compare outcomes of patients receiving different therapies. Comparisons of patient groups in observational studies may be biased, in that outcomes differ due to both the effects of treatment and the effects of patient prognosis. In some cases, especially when data are collected on detailed clinical risk factors, these differences can be controlled for using statistical or epidemiological methods. In other cases, when unmeasured characteristics of the patient population affect both the decision to provide therapy and the outcome, these differences cannot be removed using standard techniques. Use of health administrative data requires particular cautions in undertaking observational studies since important clinical information does not exist. We discuss several statistical and epidemiological approaches to remove overt (measurable) and hidden (unmeasurable) bias in observational studies. These include regression model-based case-mix adjustment, propensity-based matching, redefining the exposure variable of interest, and the econometric technique of instrumental variable (IV) analysis. These methods are illustrated using examples from the medical literature including prediction of one-year mortality following heart attack; the return to health care spending in higher spending U.S. regions in terms of clinical and financial benefits; and the long-term survival benefits of invasive cardiac management of heart attack patients. It is possible to use health administrative data for observational studies provided careful attention is paid to addressing issues of reverse causation and unmeasured confounding.

    Release date: 2008-03-17

Reference (16) (0 to 10 of 16 results)

  • Surveys and statistical programs – Documentation: 11-522-X201300014259
    Description:

    In an effort to reduce response burden on farm operators, Statistics Canada is studying alternative approaches to telephone surveys for producing field crop estimates. One option is to publish harvested area and yield estimates in September as is currently done, but to calculate them using models based on satellite and weather data, and data from the July telephone survey. However before adopting such an approach, a method must be found which produces estimates with a sufficient level of accuracy. Research is taking place to investigate different possibilities. Initial research results and issues to consider are discussed in this paper.

    Release date: 2014-10-31

  • Surveys and statistical programs – Documentation: 12-001-X201300211887
    Description:

    Multi-level models are extensively used for analyzing survey data with the design hierarchy matching the model hierarchy. We propose a unified approach, based on a design-weighted log composite likelihood, for two-level models that leads to design-model consistent estimators of the model parameters even when the within cluster sample sizes are small provided the number of sample clusters is large. This method can handle both linear and generalized linear two-level models and it requires level 2 and level 1 inclusion probabilities and level 1 joint inclusion probabilities, where level 2 represents a cluster and level 1 an element within a cluster. Results of a simulation study demonstrating superior performance of the proposed method relative to existing methods under informative sampling are also reported.

    Release date: 2014-01-15

  • Surveys and statistical programs – Documentation: 12-001-X201200211758
    Description:

    This paper develops two Bayesian methods for inference about finite population quantiles of continuous survey variables from unequal probability sampling. The first method estimates cumulative distribution functions of the continuous survey variable by fitting a number of probit penalized spline regression models on the inclusion probabilities. The finite population quantiles are then obtained by inverting the estimated distribution function. This method is quite computationally demanding. The second method predicts non-sampled values by assuming a smoothly-varying relationship between the continuous survey variable and the probability of inclusion, by modeling both the mean function and the variance function using splines. The two Bayesian spline-model-based estimators yield a desirable balance between robustness and efficiency. Simulation studies show that both methods yield smaller root mean squared errors than the sample-weighted estimator and the ratio and difference estimators described by Rao, Kovar, and Mantel (RKM 1990), and are more robust to model misspecification than the regression through the origin model-based estimator described in Chambers and Dunstan (1986). When the sample size is small, the 95% credible intervals of the two new methods have closer to nominal confidence coverage than the sample-weighted estimator.
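
    The idea of obtaining a quantile by inverting an estimated distribution function can be illustrated with the simplest comparator named above, the sample-weighted estimator. The sketch below is an assumption-laden illustration (the function name, the weights taken as inverse inclusion probabilities, and the toy data are all hypothetical); it does not implement either of the paper's Bayesian spline methods.

```python
def weighted_quantile(values, inclusion_probs, q):
    """Smallest sampled value whose weighted empirical CDF reaches q.

    Each unit is weighted by 1 / pi_i, so the weighted CDF estimates the
    finite population distribution function.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    pairs = sorted(zip(values, inclusion_probs))
    weights = [1.0 / pi for _, pi in pairs]
    total_weight = sum(weights)
    cumulative = 0.0
    for (value, _), weight in zip(pairs, weights):
        cumulative += weight
        if cumulative / total_weight >= q:
            return value
    return pairs[-1][0]  # guard against rounding in the cumulative sum


# Hypothetical sample: the largest value was the most likely to be selected,
# so it receives the smallest weight.
sample_values = [5.0, 1.0, 3.0]
sample_probs = [0.5, 0.25, 0.25]

median_estimate = weighted_quantile(sample_values, sample_probs, 0.5)
upper_estimate = weighted_quantile(sample_values, sample_probs, 0.9)
print(median_estimate, upper_estimate)
```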

    Release date: 2012-12-19
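
    As a point of reference for the abstract above, the sample-weighted quantile estimator it uses as a comparison baseline can be sketched by inverting a design-weighted empirical distribution function. This is a minimal illustration only, not the paper's Bayesian spline methods; the function name `weighted_quantile` and the toy data are assumptions introduced here.

```python
def weighted_quantile(values, pi, q):
    """Sample-weighted quantile: invert the design-weighted empirical CDF.

    values : sampled values of the continuous survey variable
    pi     : inclusion probabilities of the sampled units (each in (0, 1])
    q      : quantile level in (0, 1]
    """
    if len(values) != len(pi) or not values:
        raise ValueError("values and pi must be non-empty and of equal length")
    if not 0 < q <= 1:
        raise ValueError("q must lie in (0, 1]")
    if any(p <= 0 or p > 1 for p in pi):
        raise ValueError("inclusion probabilities must lie in (0, 1]")

    # Each sampled unit is weighted by the inverse of its inclusion probability.
    pairs = sorted(zip(values, (1.0 / p for p in pi)))
    total = sum(w for _, w in pairs)

    # Return the smallest value whose weighted CDF reaches q.
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative / total >= q:
            return value
    return pairs[-1][0]


# Toy data (hypothetical): units with smaller inclusion probabilities count more.
sample_values = [10, 20, 30, 40]
sample_pi = [0.5, 0.5, 0.25, 0.25]
median_estimate = weighted_quantile(sample_values, sample_pi, 0.5)
```

    With these toy weights (2, 2, 4, 4), the weighted distribution function first reaches 0.5 at the third ordered value.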

  • Surveys and statistical programs – Documentation: 12-001-X201200111688
    Description:

    We study the problem of nonignorable nonresponse in a two-dimensional contingency table which can be constructed for each of several small areas when there is both item and unit nonresponse. In general, the provision for both types of nonresponse with small areas introduces significant additional complexity in the estimation of model parameters. For this paper, we conceptualize the full data array for each area to consist of a table for complete data and three supplemental tables for missing row data, missing column data, and missing row and column data. For nonignorable nonresponse, the total cell probabilities are allowed to vary by area, cell and these three types of "missingness". The underlying cell probabilities (i.e., those which would apply if full classification were always possible) for each area are generated from a common distribution and their similarity across the areas is parametrically quantified. Our approach is an extension of the selection approach for nonignorable nonresponse investigated by Nandram and Choi (2002a, b) for binary data; this extension creates additional complexity because of the multivariate nature of the data coupled with the small area structure. As in that earlier work, the extension is an expansion model centered on an ignorable nonresponse model so that the total cell probability is dependent upon which of the categories is the response. Our investigation employs hierarchical Bayesian models and Markov chain Monte Carlo methods for posterior inference. The models and methods are illustrated with data from the third National Health and Nutrition Examination Survey.

    Release date: 2012-06-27

  • Surveys and statistical programs – Documentation: 12-001-X201100211603
    Description:

    In many sample surveys there are items requesting a binary response (e.g., obese, not obese) from a number of small areas. Inference is required about the probability of a positive response (e.g., obese) in each area, the probability being the same for all individuals in each area and different across areas. Because of the sparseness of the data within areas, direct estimators are not reliable, and there is a need to use data from other areas to improve inference for a specific area. Essentially, a priori the areas are assumed to be similar, and a hierarchical Bayesian model, the standard beta-binomial model, is a natural choice. The innovation is that a practitioner may have much-needed additional prior information about a linear combination of the probabilities. For example, a weighted average of the probabilities is a parameter, and information can be elicited about this parameter, thereby making the Bayesian paradigm appropriate. We have modified the standard beta-binomial model for small areas to incorporate the prior information on the linear combination of the probabilities, which we call a constraint. Thus, there are three cases. The practitioner (a) does not specify a constraint, (b) specifies a constraint and the parameter completely, or (c) specifies a constraint and information which can be used to construct a prior distribution for the parameter. The griddy Gibbs sampler is used to fit the models. To illustrate our method, we use an example on obesity of children in the National Health and Nutrition Examination Survey in which the small areas are formed by crossing school (middle, high), ethnicity (white, black, Mexican) and gender (male, female). We use a simulation study to assess some of the statistical features of our method. We show that, relative to (a), the gain in precision is larger under (b) than under (c).

    Release date: 2011-12-21

  • Surveys and statistical programs – Documentation: 12-001-X201000111250
    Description:

    We propose a Bayesian Penalized Spline Predictive (BPSP) estimator for a finite population proportion in an unequal probability sampling setting. This new method allows the probabilities of inclusion to be directly incorporated into the estimation of a population proportion, using a probit regression of the binary outcome on the penalized spline of the inclusion probabilities. The posterior predictive distribution of the population proportion is obtained using Gibbs sampling. The advantages of the BPSP estimator over the Hájek (HK), Generalized Regression (GR), and parametric model-based prediction estimators are demonstrated by simulation studies and a real example in tax auditing. Simulation studies show that the BPSP estimator is more efficient, and its 95% credible interval provides better confidence coverage with shorter average width than the HK and GR estimators, especially when the population proportion is close to zero or one or when the sample is small. Compared to linear model-based predictive estimators, the BPSP estimators are robust to model misspecification and influential observations in the sample.

    Release date: 2010-06-29
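
    For orientation, the Hájek (HK) estimator that the abstract above names as a comparison baseline is the ratio of the inverse-probability-weighted sum of outcomes to the sum of the weights. The sketch below illustrates only that baseline, not the BPSP estimator itself; the function name `hajek_proportion` and the toy data are assumptions introduced here.

```python
def hajek_proportion(y, pi):
    """Hajek estimate of a finite population proportion.

    y  : binary outcomes (0 or 1) for the sampled units
    pi : inclusion probabilities of the sampled units (each in (0, 1])
    """
    if len(y) != len(pi) or not y:
        raise ValueError("y and pi must be non-empty and of equal length")
    if any(p <= 0 or p > 1 for p in pi):
        raise ValueError("inclusion probabilities must lie in (0, 1]")

    # Design weights are the inverse inclusion probabilities.
    weights = [1.0 / p for p in pi]

    # Weighted total of the outcomes divided by the estimated population size.
    weighted_total = sum(w * yi for w, yi in zip(weights, y))
    return weighted_total / sum(weights)


# Toy data (hypothetical): two positive responses among four sampled units.
outcomes = [1, 0, 0, 1]
inclusion_probs = [0.5, 0.25, 0.25, 0.5]
estimate = hajek_proportion(outcomes, inclusion_probs)
```

    Here the design weights are 2, 4, 4 and 2, so the estimate is 4/12: the two positive units carry less weight than the unweighted sample proportion of 0.5 would suggest.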

  • Surveys and statistical programs – Documentation: 12-002-X20040027035
    Description:

    As part of the processing of the National Longitudinal Survey of Children and Youth (NLSCY) cycle 4 data, historical revisions have been made to the data of the first 3 cycles, either to correct errors or to update the data. During processing, particular attention was given to the PERSRUK (Person Identifier) and the FIELDRUK (Household Identifier). The same level of attention has not been given to the other identifiers that are included in the database, the CHILDID (Person Identifier) and the _IDHD01 (Household Identifier). These identifiers have been created for the public files and can also be found in the master files by default. When using the master files, the PERSRUK should be used to link records between files and the FIELDRUK to determine the household.

    Release date: 2004-10-05

  • Surveys and statistical programs – Documentation: 13F0026M2001003
    Description:

    Initial results from the Survey of Financial Security (SFS), which provides information on the net worth of Canadians, were released on March 15, 2001, in The Daily. The survey collected information on the value of the financial and non-financial assets owned by each family unit and on the amount of their debt.

    Statistics Canada is currently refining this initial estimate of net worth by adding to it an estimate of the value of benefits accrued in employer pension plans. This is an important addition to any asset and debt survey as, for many family units, it is likely to be one of the largest assets. With the aging of the population, information on pension accumulations is greatly needed to better understand the financial situation of those nearing retirement. These updated estimates of the Survey of Financial Security will be released in late fall 2001.

    The process for estimating the value of employer pension plan benefits is a complex one. This document describes the methodology for estimating that value, for the following groups: a) persons who belonged to an RPP at the time of the survey (referred to as current plan members); b) persons who had previously belonged to an RPP and either left the money in the plan or transferred it to a new plan; c) persons who are receiving RPP benefits.

    This methodology was proposed by Hubert Frenken and Michael Cohen. The former has many years of experience with Statistics Canada working with data on employer pension plans; the latter is a principal with the actuarial consulting firm William M. Mercer. Earlier this year, Statistics Canada carried out a public consultation on the proposed methodology. This report includes updates made as a result of feedback received from data users.

    Release date: 2001-09-05

  • Surveys and statistical programs – Documentation: 13F0026M2001002
    Description:

    The Survey of Financial Security (SFS) will provide information on the net worth of Canadians. In order to do this, information was collected - in May and June 1999 - on the value of the assets and debts of each of the families or unattached individuals in the sample. The value of one particular asset is not easy to determine, or to estimate. That is the present value of the amount people have accrued in their employer pension plan. These plans are often called registered pension plans (RPP), as they must be registered with Canada Customs and Revenue Agency. Although some RPP members receive estimates of the value of their accrued benefit, in most cases plan members would not know this amount. However, it is likely to be one of the largest assets for many family units. And, as the baby boomers approach retirement, information on their pension accumulations is much needed to better understand their financial readiness for this transition.

    The intent of this paper is to present, for discussion, a methodology for estimating the present value of employer pension plan benefits for the Survey of Financial Security, and to seek feedback on the proposed methodology. This document proposes a methodology for estimating the value of employer pension plan benefits for the following groups: a) persons who belonged to an RPP at the time of the survey (referred to as current plan members); b) persons who had previously belonged to an RPP and either left the money in the plan or transferred it to a new plan; c) persons who are receiving RPP benefits.

    Release date: 2001-02-07

  • Surveys and statistical programs – Documentation: 11-522-X19990015642
    Description:

    The Longitudinal Immigration Database (IMDB) links immigration and taxation administrative records into a comprehensive source of data on the labour market behaviour of the landed immigrant population in Canada. It covers the period 1980 to 1995 and will be updated annually starting with the 1996 tax year in 1999. Statistics Canada manages the database on behalf of a federal-provincial consortium led by Citizenship and Immigration Canada. The IMDB was created specifically to respond to the need for detailed and reliable data on the performance and impact of immigration policies and programs. It is the only source of data at Statistics Canada that provides a direct link between immigration policy levers and the economic performance of immigrants. The paper will examine the issues related to the development of a longitudinal database combining administrative records to support policy-relevant research and analysis. Discussion will focus specifically on the methodological, conceptual, analytical and privacy issues involved in the creation and ongoing development of this database. The paper will also touch briefly on research findings, which illustrate the policy outcome links the IMDB allows policy-makers to investigate.

    Release date: 2000-03-02