Results

All (472) (0 to 10 of 472 results)

  • Public use microdata: 89F0002X
    Description: The SPSD/M is a static microsimulation model designed to analyse financial interactions between governments and individuals in Canada. It can compute taxes paid to and cash transfers received from government. It comprises a database, a series of tax/transfer algorithms and models, analytical software and user documentation. A toy sketch of a simulation pass follows this entry.
    Release date: 2024-04-12
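
    As a toy illustration only (not the SPSD/M's actual algorithms or parameters), a static microsimulation pass amounts to applying deterministic tax and transfer rules to each record in a household database. All rates, thresholds and records in this Python sketch are invented:

      # Stylized static microsimulation pass: apply tax/transfer rules to
      # each household record. Every number here is invented for the example.
      HOUSEHOLDS = [
          {"id": 1, "employment_income": 35_000, "children": 2},
          {"id": 2, "employment_income": 90_000, "children": 0},
      ]

      def simulate(h, tax_rate=0.20, benefit_per_child=2_000, clawback=0.05):
          """Apply a stylized flat tax and an income-tested child benefit."""
          tax = tax_rate * h["employment_income"]
          benefit = max(0.0, benefit_per_child * h["children"]
                        - clawback * max(0.0, h["employment_income"] - 30_000))
          return {"id": h["id"], "tax": tax, "transfer": benefit,
                  "disposable": h["employment_income"] - tax + benefit}

      for h in HOUSEHOLDS:
          print(simulate(h))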

  • Articles and reports: 11-522-X202100100009
    Description:

    Use of auxiliary data to improve the efficiency of estimators of totals and means through model-assisted survey regression estimation has received considerable attention in recent years. Generalized regression (GREG) estimators, based on a working linear regression model, are currently used in establishment surveys at Statistics Canada and several other statistical agencies. GREG estimators use common survey weights for all study variables and calibrate to known population totals of auxiliary variables. Increasingly, many auxiliary variables are available, some of which may be extraneous. This leads to unstable GREG weights when all the available auxiliary variables, including interactions among categorical variables, are used in the working linear regression model. On the other hand, new machine learning methods, such as regression trees and lasso, automatically select significant auxiliary variables and lead to stable nonnegative weights and possible efficiency gains over GREG. In this paper, a simulation study, based on a real business survey sample data set treated as the target population, is conducted to study the relative performance of GREG, regression trees and lasso in terms of efficiency of the estimators. A toy sketch of the calibration step follows this entry.

    Key Words: Model assisted inference; calibration estimation; model selection; generalized regression estimator.

    Release date: 2021-10-29
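
    As a minimal sketch of the calibration step described above (assuming simple random sampling and one invented auxiliary variable; the paper's tree and lasso variants are not attempted here), GREG weights adjust the design weights so that weighted auxiliary totals reproduce known population benchmarks exactly:

      import numpy as np

      rng = np.random.default_rng(1)
      n, N = 200, 10_000
      x = np.column_stack([np.ones(n), rng.gamma(2.0, 5.0, n)])  # auxiliaries
      d = np.full(n, N / n)                     # design weights (SRS assumed)
      y = 3.0 + 1.5 * x[:, 1] + rng.normal(0.0, 2.0, n)     # study variable
      # Invented population x-totals, consistent with the generating mean.
      Tx = np.array([N, N * 10.0])

      # Linear calibration (GREG): adjust d so the weighted x-totals hit Tx
      # exactly; all study variables share the common weights w.
      lam = np.linalg.solve((d[:, None] * x).T @ x, Tx - d @ x)
      w = d * (1.0 + x @ lam)

      print("calibrated x-totals:", w @ x, "targets:", Tx)
      print("GREG total of y:", (w @ y).round(1),
            "HT total:", (d @ y).round(1))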

  • Articles and reports: 11-522-X202100100001
    Description:

    We consider regression analysis in the context of data integration. To combine partial information from external sources, we employ the idea of model calibration, which introduces a “working” reduced model based on the observed covariates. The working reduced model is not necessarily correctly specified but can be a useful device to incorporate the partial information from the external data. The actual implementation is based on a novel application of the empirical likelihood method. The proposed method is particularly attractive for combining information from several sources with different missingness patterns. The proposed method is applied to a real data example combining survey data from the Korean National Health and Nutrition Examination Survey and big data from the National Health Insurance Sharing Service in Korea. A toy sketch of the empirical likelihood machinery follows this entry.

    Key Words: Big data; Empirical likelihood; Measurement error models; Missing covariates.

    Release date: 2021-10-15
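
    The following is a minimal sketch of the empirical likelihood machinery the abstract refers to, reduced to a single benchmark constraint from a hypothetical external source; the paper's treatment of working reduced models and missingness patterns is not reproduced here:

      import numpy as np

      def el_weights(g, iters=50):
          """Empirical-likelihood weights p_i = 1 / (n * (1 + lam @ g_i))
          satisfying sum_i p_i g_i = 0, via Newton's method on lam.
          No step-halving safeguards; adequate for this toy example."""
          n, k = g.shape
          lam = np.zeros(k)
          for _ in range(iters):
              denom = 1.0 + g @ lam
              grad = (g / denom[:, None]).sum(axis=0)
              hess = -(g[:, :, None] * g[:, None, :]
                       / (denom**2)[:, None, None]).sum(axis=0)
              lam -= np.linalg.solve(hess, grad)
          return 1.0 / (n * (1.0 + g @ lam))

      rng = np.random.default_rng(0)
      x = rng.normal(1.0, 1.0, 500)
      g = (x - 1.1)[:, None]   # constraint: weighted mean of x equals 1.1,
                               # an invented benchmark from an external source
      p = el_weights(g)
      print(p.sum(), p @ x)    # ~1.0 and ~1.1 at the solution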

  • Surveys and statistical programs – Documentation: 11-633-X2021005
    Description:

    The Analytical Studies and Modelling Branch (ASMB) is the research arm of Statistics Canada, mandated to provide high-quality, relevant and timely information on economic, health and social issues that are important to Canadians. The branch strategically draws on expert knowledge and a wide range of data sources and modelling techniques to address the information needs of government, academic and public sector partners and stakeholders through analysis and research, modelling and predictive analytics, and data development. The branch strives to deliver relevant, high-quality, timely, comprehensive, horizontal and integrated research, and to enable the use of its research through capacity building and strategic dissemination that meets the needs of policy makers, academics and the general public.

    This Multi-year Consolidated Plan for Research, Modelling and Data Development outlines the priorities for the branch over the next two years.

    Release date: 2021-08-12

  • Articles and reports: 62F0026M2020001
    Description:

    Since the 2010 Survey of Household Spending redesign, statistics on the annual proportion of households reporting expenditures and the annual average expenditure per reporting household have not been available for many goods and services categories. To help fill this data gap for users, a statistical model was developed to produce approximations of these statistics. This product consists of data tables and a user guide.

    Release date: 2021-01-07

  • 36-23-0001
    Description: Input-output (IO) models are generally used to simulate the economic impacts of an expenditure on a given basket of goods and services or the output of one or several industries. The simulation results from a “shock” to an IO model will show the direct, indirect and induced impacts on Gross Domestic Product (GDP), which industries benefit the most, the number of jobs created, estimates of indirect taxes and subsidies generated, etc. The model also includes estimates of the impacts on energy use (expressed in terajoules) and greenhouse gas emissions (carbon dioxide equivalent, expressed in kilotonnes). IO price, energy, and tax models may also be available, depending on the availability of resources. For more details, ask us for the Guide to using the input-output simulation model, available upon request. A toy Leontief calculation follows this entry.
    Release date: 2020-11-23
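
    A toy version of the core calculation such IO simulations rest on, the Leontief inverse, with an invented three-industry coefficient matrix (induced impacts would additionally require closing the model with respect to households, which this sketch omits):

      import numpy as np

      # Stylized three-industry example; the technical coefficients below
      # are invented, not Statistics Canada's. A gives input requirements
      # per dollar of output.
      A = np.array([[0.20, 0.10, 0.05],
                    [0.15, 0.25, 0.10],
                    [0.05, 0.10, 0.15]])

      shock = np.array([100.0, 0.0, 0.0])  # $100 of final demand, industry 1

      # Leontief inverse: total (direct + indirect) output needed to
      # satisfy the final-demand shock.
      L = np.linalg.inv(np.eye(3) - A)
      output = L @ shock
      print("output by industry:", output.round(2))
      print("direct + indirect output multiplier:",
            (output.sum() / shock.sum()).round(3))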

  • 36-23-0002
    Description: Input-output (IO) models are generally used to simulate the economic impacts of an expenditure on a given basket of goods and services or the output of one or several industries. The simulation results from a “shock” to an IO model will show the direct, indirect and induced impacts on Gross Domestic Product (GDP), which industries benefit the most, the number of jobs created, estimates of indirect taxes and subsidies generated, etc. The model also includes an estimate of the impact on interprovincial trade flows. IO price, energy, and tax models may also be available, depending on the availability of resources. For more details, ask us for the Guide to using the input-output simulation model, available upon request.
    Release date: 2020-11-23

  • Articles and reports: 82-003-X202001100002
    Description:

    Using data from the 2003 to 2013 cycles of the Canadian Community Health Survey, this study’s objective was to characterize smoking history by sex using birth cohorts beginning in 1920. Smoking histories for each birth cohort included age at smoking initiation and cessation, which were used to construct smoking prevalence estimates for each calendar year from 1971 to 2041. A secondary objective was to characterize smoking history by socioeconomic status. A toy sketch of the prevalence reconstruction follows this entry.

    Release date: 2020-11-18
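
    A toy sketch of how smoking prevalence by calendar year can be reconstructed from reported initiation and cessation ages. The records below are invented and unweighted; the study itself works with weighted CCHS data by sex and birth cohort:

      from collections import defaultdict

      # Hypothetical records: (birth_year, age_started, age_quit or None).
      people = [(1950, 17, 45), (1950, 20, None), (1955, 16, 30),
                (1960, None, None)]

      counts = defaultdict(lambda: [0, 0])        # year -> [smokers, total]
      for birth, start, quit in people:
          for year in range(1971, 2042):
              age = year - birth
              if age < 0:
                  continue
              smoking = (start is not None and age >= start
                         and (quit is None or age < quit))
              counts[year][0] += smoking
              counts[year][1] += 1

      for year in (1971, 1990, 2010):
          s, t = counts[year]
          print(year, f"{s}/{t} smoking")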

  • Surveys and statistical programs – Documentation: 12-539-X
    Description:

    This document brings together guidelines and checklists on many issues that need to be considered in the pursuit of quality objectives in the execution of statistical activities. Its focus is on how to assure quality through effective and appropriate design or redesign of a statistical project or program from inception through to data evaluation, dissemination and documentation. These guidelines draw on the collective knowledge and experience of many Statistics Canada employees. It is expected that Quality Guidelines will be useful to staff engaged in the planning and design of surveys and other statistical projects, as well as to those who evaluate and analyze the outputs of these projects.

    Release date: 2019-12-04

  • Surveys and statistical programs – Documentation: 15F0004X
    Description:

    Input-output (IO) models are generally used to simulate the economic impacts of an expenditure on a given basket of goods and services or the output of one or several industries. The simulation results from a "shock" to an IO model will show the direct, indirect and induced impacts on GDP, which industries benefit the most, the number of jobs created, estimates of indirect taxes and subsidies generated, etc. For more details, ask us for the Guide to using the input-output simulation model, available free of charge upon request.

    At various times, clients have requested the use of IO price, energy, tax and market models. Given their availability, arrangements can be made to use these models on request.

    The national IO model was not released in 2015 or 2016.

    Release date: 2019-04-04
Data (3) (3 results)

  • Public use microdata: 12M0014X
    Geography: Province or territory
    Description:

    This report presents a brief overview of the information collected in Cycle 14 of the General Social Survey (GSS). Cycle 14 is the first cycle to collect detailed information on access to and use of information and communications technology in Canada. Topics include general use of technology and computers, technology in the workplace, development of computer skills, frequency of Internet and e-mail use, non-users, and the security of information on the Internet. The target population of the GSS is all individuals aged 15 and over living in a private household in one of the ten provinces.

    Release date: 2001-06-29

  • Public use microdata: 82M0009X
    Description:

    The National Population Health Survey (NPHS) used the Labour Force Survey sampling frame to draw the initial sample of approximately 20,000 households starting in 1994, and for the sample top-up in this third cycle. The survey is conducted every two years. Sample collection is distributed over four quarterly periods, followed by a follow-up period; the whole process takes a year. In each household, some limited health information is collected from all household members, and one person in each household is randomly selected for a more in-depth interview.

    The survey is designed to collect information on the health of the Canadian population and related socio-demographic information. The first cycle of data collection began in 1994 and continues every second year thereafter. The survey is designed to produce both cross-sectional and longitudinal estimates. The questionnaire includes content related to health status, use of health services, determinants of health, a health index, chronic conditions and activity restrictions. The use of health services is probed through visits to health care providers, both traditional and non-traditional, and the use of drugs and other medications. Health determinants include smoking, alcohol use and physical activity. Special focus content for this cycle includes family medical history, with questions about certain chronic conditions among immediate family members and when they were acquired. A section on self-care has also been included this cycle. The socio-demographic information includes age, sex, education, ethnicity, household income and labour force status.

    Release date: 2000-12-19
Analysis (435) (10 to 20 of 435 results)

  • Articles and reports: 12-001-X201600114539
    Description:

    Statistical matching is a technique for integrating two or more data sets when information available for matching records for individual participants across data sets is incomplete. Statistical matching can be viewed as a missing data problem where a researcher wants to perform a joint analysis of variables that are never jointly observed. A conditional independence assumption is often used to create imputed data for statistical matching. We consider a general approach to statistical matching using parametric fractional imputation of Kim (2011) to create imputed data under the assumption that the specified model is fully identified. The proposed method does not have a convergent EM sequence if the model is not identified. We also present variance estimators appropriate for the imputation procedure. We explain how the method applies directly to the analysis of data from split questionnaire designs and measurement error models. A toy sketch of the conditional-independence baseline follows this entry.

    Release date: 2016-06-22
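
    A minimal sketch of the conditional-independence imputation baseline the abstract mentions (not Kim's parametric fractional imputation itself), with simulated files A and B that never observe y and z together:

      import numpy as np

      rng = np.random.default_rng(2)

      # File A observes (x, y); file B observes (x, z).
      n = 1_000
      x = rng.normal(size=2 * n)
      y = 1.0 + 0.8 * x + rng.normal(size=2 * n)
      z = -0.5 + 1.2 * x + rng.normal(size=2 * n)
      xa, ya = x[:n], y[:n]          # file A
      xb, zb = x[n:], z[n:]          # file B

      # Fit z | x on file B, then impute z in file A, assuming y and z are
      # conditionally independent given x.
      b1, b0 = np.polyfit(xb, zb, 1)
      sigma = np.std(zb - (b0 + b1 * xb))
      z_imp = b0 + b1 * xa + rng.normal(0.0, sigma, n)

      # Joint analysis on the completed file A, e.g. corr(y, z).
      print("corr(y, z) on completed file:",
            np.corrcoef(ya, z_imp)[0, 1].round(3))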

  • Articles and reports: 12-001-X201600114540
    Description:

    In this paper, we compare the EBLUP and pseudo-EBLUP estimators for small area estimation under the nested error regression model, and three area level model-based estimators using the Fay-Herriot model. We conduct a design-based simulation study to compare the model-based estimators for unit level and area level models under informative and non-informative sampling. In particular, we are interested in the confidence interval coverage rates of the unit level and area level estimators. We also compare the estimators when the model is misspecified. Our simulation results show that estimators based on the unit level model perform better than those based on the area level model. The pseudo-EBLUP estimator is the best among the unit level and area level estimators. A minimal Fay-Herriot shrinkage sketch follows this entry.

    Release date: 2016-06-22
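
    For readers unfamiliar with the area-level side of this comparison, here is a minimal Fay-Herriot shrinkage sketch with invented inputs; a real application would estimate the model variance (e.g., by REML) rather than fix it:

      import numpy as np

      # Area-level Fay-Herriot composite: the EBLUP shrinks each direct
      # estimate toward a regression synthetic estimate.
      y_direct = np.array([12.0, 15.0, 9.0, 20.0])  # direct survey estimates
      psi = np.array([4.0, 1.0, 9.0, 2.0])          # known sampling variances
      X = np.column_stack([np.ones(4), [10, 14, 8, 19]])
      sigma2_v = 2.0                                # fixed here for brevity

      # Weighted least squares for beta, then shrinkage weights gamma.
      V = sigma2_v + psi
      beta = np.linalg.solve(X.T @ (X / V[:, None]), X.T @ (y_direct / V))
      gamma = sigma2_v / V
      eblup = gamma * y_direct + (1 - gamma) * (X @ beta)
      print("gamma:", gamma.round(2))
      print("EBLUP:", eblup.round(2))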

  • Articles and reports: 12-001-X201600114541
    Description:

    In this work, we compare nonparametric estimators for finite population distribution functions based on two types of fitted values: the fitted values from the well-known Kuo estimator and a modified version of them, which incorporates a nonparametric estimate for the mean regression function. For each type of fitted values we consider the corresponding model-based estimator and, after incorporating design weights, the corresponding generalized difference estimator. We show under fairly general conditions that the leading term in the model mean square error is not affected by the modification of the fitted values, even though it slows down the convergence rate for the model bias. Second order terms of the model mean square errors are difficult to obtain and are not derived in the present paper. It thus remains an open question whether the modified fitted values bring about some benefit from the model-based perspective. We also discuss design-based properties of the estimators and propose a variance estimator for the generalized difference estimator based on the modified fitted values. Finally, we perform a simulation study. The simulation results suggest that the modified fitted values lead to a considerable reduction of the design mean square error if the sample size is small.

    Release date: 2016-06-22

  • Articles and reports: 12-001-X201600114542
    Description:

    The restricted maximum likelihood (REML) method is generally used to estimate the variance of the random area effect under the Fay-Herriot model (Fay and Herriot 1979) to obtain the empirical best linear unbiased prediction (EBLUP) estimator of a small area mean. When the REML estimate is zero, the weight of the direct sample estimator is zero and the EBLUP becomes a synthetic estimator. This is often undesirable. As a solution to this problem, Li and Lahiri (2011) and Yoshimori and Lahiri (2014) developed adjusted maximum likelihood (ADM) consistent variance estimators, which always yield positive variance estimates. Some of the ADM estimators, however, have a large bias, and this affects the estimation of the mean squared error (MSE) of the EBLUP. We propose to use a MIX variance estimator, defined as a combination of the REML and ADM methods. We show that it is unbiased up to the second order and that it always yields a positive variance estimate. Furthermore, we propose an MSE estimator under the MIX method and show via a model-based simulation that, in many situations, it performs better than other ‘Taylor linearization’ MSE estimators proposed recently. A numerical sketch of the boundary problem and the adjustment idea follows this entry.

    Release date: 2016-06-22
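
    A minimal numerical sketch of the boundary problem and the adjustment idea: with invented data whose direct estimates cluster tightly, the residual likelihood is maximized at (essentially) zero, while multiplying the likelihood by the variance itself, in the spirit of the ADM estimators, forces a positive estimate:

      import numpy as np

      def resid_loglik(A, y, X, psi):
          """Residual (REML) log-likelihood of the Fay-Herriot model
          variance A, up to an additive constant."""
          V = A + psi
          XtVi = X.T / V
          Q = XtVi @ X
          beta = np.linalg.solve(Q, XtVi @ y)
          r = y - X @ beta
          return -0.5 * (np.log(V).sum() + np.log(np.linalg.det(Q))
                         + (r**2 / V).sum())

      # Invented data: directs cluster far more tightly than the sampling
      # variances psi suggest, so REML is maximized at the boundary.
      y = np.array([10.2, 9.8, 10.1, 9.9, 10.0])
      psi = np.array([1.0, 1.5, 0.8, 1.2, 1.0])
      X = np.ones((5, 1))

      grid = np.linspace(1e-6, 5.0, 4_000)
      rl = np.array([resid_loglik(A, y, X, psi) for A in grid])
      print("REML estimate:", grid[rl.argmax()].round(3))    # essentially 0
      # Adjusted likelihood: multiply by A, i.e. add log(A) on the log
      # scale; the maximizer is then strictly positive.
      print("adjusted estimate:", grid[(np.log(grid) + rl).argmax()].round(3))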

  • Articles and reports: 12-001-X201600114544
    Description:

    In the Netherlands, statistical information about income and wealth is based on two large scale household panels that are completely derived from administrative data. A problem with using households as sampling units in the sample design of panels is the instability of these units over time. Changes in the household composition affect the inclusion probabilities required for design-based and model-assisted inference procedures. Such problems are circumvented in the two aforementioned household panels by sampling persons, who are followed over time. At each period the household members of these sampled persons are included in the sample. This is equivalent to sampling with probabilities proportional to household size where households can be selected more than once but with a maximum equal to the number of household members. In this paper properties of this sample design are described and contrasted with the Generalized Weight Share method for indirect sampling (Lavallée 1995, 2007). Methods are illustrated with an application to the Dutch Regional Income Survey.

    Release date: 2016-06-22

  • Articles and reports: 12-001-X201600114546
    Description:

    Adjusting the base weights using weighting classes is a standard approach for dealing with unit nonresponse. A common approach is to create nonresponse adjustments that are weighted by the inverse of the assumed response propensity of respondents within weighting classes under a quasi-randomization approach. Little and Vartivarian (2003) questioned the value of weighting the adjustment factor. In practice the assumed models are misspecified, so it is critical to understand the impact weighting might have in this case. This paper describes the effects on nonresponse-adjusted estimates of means and totals for the population and domains, computed using the weighted and unweighted inverse of the response propensities in stratified simple random sample designs. The performance of these estimators under different conditions, such as different sample allocations, response mechanisms and population structures, is evaluated. The findings show that, for the scenarios considered, the weighted adjustment has substantial advantages for estimating totals, and that using an unweighted adjustment may lead to serious biases except in very limited cases. Furthermore, unlike the unweighted estimates, the weighted estimates are not sensitive to how the sample is allocated. A small simulation sketch of the two adjustments follows this entry.

    Release date: 2016-06-22
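
    A small simulation sketch of the two adjustments being compared, with invented weights and response indicators; when design weights vary within a class, only the weighted adjustment reproduces the full-sample weight total:

      import numpy as np

      rng = np.random.default_rng(3)
      n = 1_000
      d = rng.choice([1.0, 4.0], n)          # design weights (two strata)
      cls = rng.integers(0, 3, n)            # weighting classes
      resp = rng.random(n) < 0.6             # response indicator

      w_adj = np.empty(n)
      u_adj = np.empty(n)
      for c in range(3):
          in_c = cls == c
          # Weighted adjustment: class weight total / respondent weight total.
          w_adj[in_c] = d[in_c].sum() / d[in_c & resp].sum()
          # Unweighted adjustment: class count / respondent count.
          u_adj[in_c] = in_c.sum() / (in_c & resp).sum()

      print("weighted-adjusted weight total:", (d * w_adj)[resp].sum().round(1))
      print("unweighted-adjusted weight total:", (d * u_adj)[resp].sum().round(1))
      print("full-sample weight total:", d.sum().round(1))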

  • Articles and reports: 82-003-X201600314338
    Description:

    This paper describes the methods and data used in the development and implementation of the POHEM-Neurological meta-model.

    Release date: 2016-03-16

  • Articles and reports: 12-001-X201500214230
    Description:

    This paper develops allocation methods for stratified sample surveys where composite small area estimators are a priority, and areas are used as strata. Longford (2006) proposed an objective criterion for this situation, based on a weighted combination of the mean squared errors of small area means and a grand mean. Here, we redefine this approach within a model-assisted framework, allowing regressor variables and a more natural interpretation of results using an intra-class correlation parameter. We also consider several uses of power allocation, and allow the placing of other constraints such as maximum relative root mean squared errors for stratum estimators. We find that a simple power allocation can perform very nearly as well as the optimal design, even when the objective is to minimize Longford’s (2006) criterion. A minimal power-allocation sketch follows this entry.

    Release date: 2015-12-17
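
    A minimal sketch of power allocation as used above, with invented stratum sizes; the exponent q interpolates between equal allocation (favouring small-area precision) and proportional allocation (favouring the grand mean):

      import numpy as np

      def power_allocation(sizes, n_total, q):
          """Allocate n_total across strata proportionally to sizes**q.
          q = 1 gives proportional allocation, q = 0 equal allocation."""
          share = sizes**q / (sizes**q).sum()
          return np.maximum(1, np.round(n_total * share).astype(int))

      N_h = np.array([50_000, 20_000, 5_000, 1_000])  # invented stratum sizes
      for q in (0.0, 0.5, 1.0):
          print(f"q={q}:", power_allocation(N_h, 1_000, q))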

  • Articles and reports: 12-001-X201500214231
    Description:

    Rotating panels are widely applied by national statistical institutes, for example, to produce official statistics about the labour force. Estimation procedures are generally based on traditional design-based procedures known from classical sampling theory. A major drawback of this class of estimators is that small sample sizes result in large standard errors and that they are not robust to measurement bias. Two examples showing the effects of measurement bias are rotation group bias in rotating panels, and systematic differences in the outcome of a survey due to a major redesign of the underlying process. In this paper we apply a multivariate structural time series model to the Dutch Labour Force Survey to produce model-based figures about the monthly labour force. The model reduces the standard errors of the estimates by taking advantage of sample information collected in previous periods, accounts for rotation group bias and autocorrelation induced by the rotating panel, and models discontinuities due to a survey redesign. Additionally, we discuss the use of correlated auxiliary series in the model to further improve the accuracy of the model estimates. The method is applied by Statistics Netherlands to produce accurate official monthly statistics about the labour force that are consistent over time, despite a redesign of the survey process. A minimal local-level filtering sketch follows this entry.

    Release date: 2015-12-17
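
    The model in the paper is multivariate, with rotation-group-bias and discontinuity terms. As a minimal stand-in, a univariate local-level model filtered with invented variances already shows how borrowing strength over time reduces the error of noisy monthly direct estimates:

      import numpy as np

      def local_level_filter(y, var_level, var_obs):
          """Kalman filter for a local-level model: y_t = mu_t + e_t,
          mu_t = mu_{t-1} + u_t. Starts loosely at the first observation."""
          mu, P = y[0], var_obs
          out = []
          for obs in y:
              P += var_level                 # predict
              k = P / (P + var_obs)          # Kalman gain
              mu += k * (obs - mu)           # update
              P *= (1 - k)
              out.append(mu)
          return np.array(out)

      rng = np.random.default_rng(4)
      truth = np.cumsum(rng.normal(0.0, 0.1, 120)) + 5.0
      y = truth + rng.normal(0.0, 0.5, 120)  # noisy monthly direct estimates
      filtered = local_level_filter(y, 0.1**2, 0.5**2)
      print("RMSE direct:", np.sqrt(((y - truth)**2).mean()).round(3))
      print("RMSE filtered:", np.sqrt(((filtered - truth)**2).mean()).round(3))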

  • Articles and reports: 12-001-X201500214238
    Description:

    Félix-Medina and Thompson (2004) proposed a variant of link-tracing sampling to sample hidden and/or hard-to-detect human populations, such as drug users and sex workers. In their variant, an initial sample of venues is selected and the people found in the sampled venues are asked to name other members of the population to be included in the sample. Those authors derived maximum likelihood estimators of the population size under the assumption that the probability that a person is named by another in a sampled venue (link-probability) does not depend on the named person (homogeneity assumption). In this work, we extend their research to the case of heterogeneous link-probabilities and derive unconditional and conditional maximum likelihood estimators of the population size. We also propose profile likelihood and bootstrap confidence intervals for the size of the population. The results of simulation studies we carried out show that, in the presence of heterogeneous link-probabilities, the proposed estimators perform reasonably well provided that relatively large sampling fractions, say larger than 0.5, are used, whereas the estimators derived under the homogeneity assumption perform badly. The outcomes also show that the proposed confidence intervals are not very robust to deviations from the assumed models.

    Release date: 2015-12-17
Reference (32) (0 to 10 of 32 results)

  • Surveys and statistical programs – Documentation: 15F0009X
    Description:

    Input-output (IO) models are generally used to simulate the economic impacts of an expenditure on a given basket of goods and services or the output of one or several industries. The simulation results from a "shock" to an IO model will show the direct, indirect and induced impacts on GDP, which industries benefit the most, the number of jobs created, estimates of indirect taxes and subsidies generated, etc. For more details, ask us for the Guide to using the input-output simulation model, available free of charge upon request.

    At various times, clients have requested the use of IO price, energy, tax and market models. Given their availability, arrangements can be made to use these models on request.

    The interprovincial IO model was not released in 2015 or 2016.

    Release date: 2019-04-04

  • Surveys and statistical programs – Documentation: 71-526-X
    Description:

    The Canadian Labour Force Survey (LFS) is the official source of monthly estimates of total employment and unemployment. Following the 2011 Census, the LFS underwent a sample redesign to account for the evolution of the population and labour market characteristics, to adjust to changes in information needs, and to update the geographical information used to carry out the survey. The redesign program following the 2011 Census culminated with the introduction of a new sample at the beginning of 2015. This report is a reference on the methodological aspects of the LFS, covering stratification, sampling, collection, processing, weighting, estimation, variance estimation and data quality.

    Release date: 2017-12-21

  • Notices and consultations: 92-140-X2016001
    Description:

    The 2016 Census Program Content Test was conducted from May 2 to June 30, 2014. The Test was designed to assess the impact of any proposed content changes to the 2016 Census Program and to measure the impact of including a social insurance number (SIN) question on data quality.

    This quantitative test used a split-panel design involving 55,000 dwellings, divided into 11 panels of 5,000 dwellings each: five panels were dedicated to the Content Test, while the remaining six panels were for the SIN Test. Two models of test questionnaire were developed to meet the objectives: a model with all the proposed changes EXCEPT the SIN question, and a model with all the proposed changes INCLUDING the SIN question. A third, 'control' questionnaire model with the 2011 content was also developed. The population living in private dwellings in mail-out areas in one of the ten provinces was targeted for the test. Paper and electronic response channels were also part of the Test.

    This report presents the Test objectives, the design and a summary of the analysis in order to determine potential content for the 2016 Census Program. Results from the data analysis of the Test were not the only elements used to determine the content for 2016. Other elements were also considered, such as response burden, comparison over time and users’ needs.

    Release date: 2016-04-01

  • Surveys and statistical programs – Documentation: 62F0026M2005006
    Description:

    This report describes the quality indicators produced for the 2003 Survey of Household Spending. These quality indicators, such as coefficients of variation, nonresponse rates, slippage rates and imputation rates, help users interpret the survey data.

    Release date: 2005-10-06

  • Surveys and statistical programs – Documentation: 15-002-M2001001
    Description:

    This document describes the sources, concepts and methods utilized by the Canadian Productivity Accounts and discusses how they compare with their U.S. counterparts.

    Release date: 2004-12-24

  • Notices and consultations: 13-605-X20020038512
    Description:

    As of September 30, 2002, the monthly GDP by industry estimates will incorporate the Chain Fisher formula. This change will be applied from January 1997 and will be pushed back to January 1961 within a year. A toy chained Fisher calculation follows this entry.

    Release date: 2002-09-30
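
    For reference, the Chain Fisher formula links period-to-period Fisher indexes, each the geometric mean of a Laspeyres and a Paasche index. A toy calculation with invented prices and quantities:

      import numpy as np

      # Toy prices p and quantities q for two goods over three periods.
      p = np.array([[1.0, 2.0], [1.1, 1.9], [1.3, 2.1]])
      q = np.array([[10.0, 5.0], [11.0, 6.0], [12.0, 6.5]])

      def fisher_volume(p0, q0, p1, q1):
          laspeyres = (p0 @ q1) / (p0 @ q0)  # new quantities at old prices
          paasche = (p1 @ q1) / (p1 @ q0)    # new quantities at new prices
          return np.sqrt(laspeyres * paasche)

      # Chain the period-to-period links into an index series.
      chain = [1.0]
      for t in range(1, len(p)):
          chain.append(chain[-1] * fisher_volume(p[t-1], q[t-1], p[t], q[t]))
      print("chained Fisher volume index:", np.round(chain, 4))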

  • Notices and consultations: 13-605-X20010018529
    Description:

    As of May 31, 2001, the Quarterly Income and Expenditure Accounts have adopted the following change: the Chain Fisher formula.

    Release date: 2001-05-31