Results

All (81) (0 to 10 of 81 results)

  • Journals and periodicals: 12-206-X
    Description: This report summarizes the annual achievements of the Methodology Research and Development Program (MRDP) sponsored by the Modern Statistical Methods and Data Science Branch at Statistics Canada. This program covers research and development activities in statistical methods with potentially broad application in the agency’s statistical programs; these activities would otherwise be less likely to be carried out during the provision of regular methodology services to those programs. The MRDP also includes activities that provide support in the application of past successful developments in order to promote the use of the results of research and development work. Selected prospective research activities are also presented.
    Release date: 2023-10-11

  • Articles and reports: 11-522-X202100100001
    Description:

    We consider regression analysis in the context of data integration. To combine partial information from external sources, we employ the idea of model calibration, which introduces a “working” reduced model based on the observed covariates. The working reduced model is not necessarily correctly specified but can be a useful device to incorporate the partial information from the external data. The actual implementation is based on a novel application of the empirical likelihood method. The proposed method is particularly attractive for combining information from several sources with different missing patterns. The proposed method is applied to a real data example combining survey data from the Korean National Health and Nutrition Examination Survey with big data from the National Health Insurance Sharing Service in Korea. (A small illustrative sketch of the calibration idea follows this entry.)

    Key Words: Big data; Empirical likelihood; Measurement error models; Missing covariates.

    Release date: 2021-10-15
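
    The calibration step described above can be sketched, under simplifying assumptions, with a single auxiliary mean known from an external source. The sketch below is illustrative only (the data, the one-covariate constraint and the solver choice are assumptions, not the authors' implementation): empirical-likelihood weights are tilted so that the weighted sample mean of a covariate matches the externally known value.

```python
# Illustrative sketch: empirical-likelihood calibration of sample weights so
# that the weighted mean of covariate x matches an external mean mu_x.
import numpy as np
from scipy.optimize import brentq

def el_calibration_weights(x, mu_x):
    """Weights w_i maximizing sum(log w_i) subject to
    sum(w_i) = 1 and sum(w_i * x_i) = mu_x."""
    g = np.asarray(x, dtype=float) - mu_x        # centred constraint g_i = x_i - mu_x
    n = g.size
    if g.max() <= 0 or g.min() >= 0:
        raise ValueError("mu_x must lie strictly inside the range of x")

    # The Lagrange multiplier solves sum(g_i / (1 + lam*g_i)) = 0, with
    # 1 + lam*g_i > 0 for every i so that all weights stay positive.
    def score(lam):
        return np.sum(g / (1.0 + lam * g))

    lo, hi = -1.0 / g.max(), -1.0 / g.min()      # admissible interval for lam
    pad = 1e-6 * (hi - lo)
    lam = brentq(score, lo + pad, hi - pad)
    return 1.0 / (n * (1.0 + lam * g))

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=200)     # survey sample of a covariate
w = el_calibration_weights(x, mu_x=5.3)          # mean of x known from external data
print(w.sum(), (w * x).sum())                    # approximately 1.0 and 5.3
```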

  • Table: 98-508-X
    Description:

    The Census Profile Standard Error Supplement provides the standard error for each long-form estimate along with the standard Census Profile data for a selected aggregate dissemination area (ADA), its corresponding census division (CD) and province/territory, as well as for Canada. It can be downloaded for selected areas or the entire profile in a variety of commonly used formats (e.g., CSV, TAB or IVT). This product will be updated with additional content released on November 29, 2017.

    Release date: 2018-01-19

  • Articles and reports: 12-001-X201600114538
    Description:

    The aim of automatic editing is to use a computer to detect and amend erroneous values in a data set, without human intervention. Most automatic editing methods that are currently used in official statistics are based on the seminal work of Fellegi and Holt (1976). Applications of this methodology in practice have shown systematic differences between data that are edited manually and automatically, because human editors may perform complex edit operations. In this paper, a generalization of the Fellegi-Holt paradigm is proposed that can incorporate a large class of edit operations in a natural way. In addition, an algorithm is outlined that solves the resulting generalized error localization problem. It is hoped that this generalization may be used to increase the suitability of automatic editing in practice, and hence to improve the efficiency of data editing processes. Some first results on synthetic data are promising in this respect. (A toy error-localization example follows this entry.)

    Release date: 2016-06-22
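
    As a toy illustration of the classical Fellegi-Holt error-localization idea that the paper generalizes (the record, the edit rules and the field names below are hypothetical, and implied edits are ignored for brevity): flag the smallest set of fields that, if amended, can account for every violated edit.

```python
# Illustrative sketch: Fellegi-Holt-style error localization as a minimal
# set-covering problem over violated edit rules.
from itertools import combinations

record = {"age": 12, "marital_status": "married", "income": -500}

# Each edit: (name, fields it involves, predicate that must hold).
edits = [
    ("minors are never married",
     {"age", "marital_status"},
     lambda r: not (r["age"] < 16 and r["marital_status"] == "married")),
    ("income must be non-negative",
     {"income"},
     lambda r: r["income"] >= 0),
]

def minimal_fields_to_change(record, edits):
    """Smallest set of fields whose amendment can resolve all violated edits."""
    violated = [(name, fields) for name, fields, ok in edits if not ok(record)]
    if not violated:
        return set()
    candidates = sorted(set().union(*(fields for _, fields in violated)))
    for k in range(1, len(candidates) + 1):
        for cand in combinations(candidates, k):
            # every violated edit must involve at least one flagged field
            if all(set(cand) & fields for _, fields in violated):
                return set(cand)
    return set(candidates)

print(minimal_fields_to_change(record, edits))
# -> {'age', 'income'}: 'income' plus one field from the violated marriage edit
```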

  • Articles and reports: 12-001-X201600114539
    Description:

    Statistical matching is a technique for integrating two or more data sets when information available for matching records for individual participants across data sets is incomplete. Statistical matching can be viewed as a missing data problem where a researcher wants to perform a joint analysis of variables that are never jointly observed. A conditional independence assumption is often used to create imputed data for statistical matching. We consider a general approach to statistical matching using the parametric fractional imputation of Kim (2011) to create imputed data under the assumption that the specified model is fully identified. The proposed method does not have a convergent EM sequence if the model is not identified. We also present variance estimators appropriate for the imputation procedure. We explain how the method applies directly to the analysis of data from split questionnaire designs and measurement error models. (A minimal conditional-independence imputation sketch follows this entry.)

    Release date: 2016-06-22
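
    The baseline conditional-independence imputation that the abstract contrasts with its proposal can be sketched as follows (the simulated files, the linear working model and the residual draw are illustrative assumptions, not the paper's parametric fractional imputation):

```python
# Illustrative sketch: statistical matching under conditional independence.
# File A observes (x, y); file B observes (x, z); y and z are never jointly
# observed, so y is imputed in file B from a model fitted on file A.
import numpy as np

rng = np.random.default_rng(1)
n_a, n_b = 500, 400
x_a = rng.normal(size=n_a)
y_a = 2.0 + 1.5 * x_a + rng.normal(scale=0.5, size=n_a)   # file A: (x, y)
x_b = rng.normal(size=n_b)
z_b = 1.0 - 0.8 * x_b + rng.normal(scale=0.5, size=n_b)   # file B: (x, z)

# Fit y | x on file A; impute y in file B, adding a residual draw so the
# imputed values are not artificially smooth.
beta = np.polyfit(x_a, y_a, deg=1)
resid_sd = np.std(y_a - np.polyval(beta, x_a))
y_b_imputed = np.polyval(beta, x_b) + rng.normal(scale=resid_sd, size=n_b)

# A joint analysis of (y, z) is now possible on file B, but it is valid only
# if y and z really are independent given x.
print(np.corrcoef(y_b_imputed, z_b)[0, 1])
```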

  • Articles and reports: 12-001-X201500214230
    Description:

    This paper develops allocation methods for stratified sample surveys where composite small area estimators are a priority, and areas are used as strata. Longford (2006) proposed an objective criterion for this situation, based on a weighted combination of the mean squared errors of small area means and a grand mean. Here, we redefine this approach within a model-assisted framework, allowing regressor variables and a more natural interpretation of results using an intra-class correlation parameter. We also consider several uses of power allocation, and allow the placing of other constraints such as maximum relative root mean squared errors for stratum estimators. We find that a simple power allocation can perform very nearly as well as the optimal design even when the objective is to minimize Longford’s (2006) criterion. (A short power-allocation sketch follows this entry.)

    Release date: 2015-12-17
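
    For reference, a power allocation of the kind discussed above can be computed as below (the population sizes and the exponent q = 0.5 are illustrative; q = 1 gives proportional allocation and q = 0 equal allocation):

```python
# Illustrative sketch: power allocation of a fixed total sample across strata
# (here, small areas used as strata), with stratum sample sizes ~ N_h ** q.
import numpy as np

def power_allocation(stratum_sizes, n_total, q=0.5):
    shares = np.asarray(stratum_sizes, dtype=float) ** q
    n_h = n_total * shares / shares.sum()
    return np.round(n_h).astype(int)   # rounding can shift the total by a unit or two

N_h = [120_000, 45_000, 8_000, 1_500]  # hypothetical area population sizes
print(power_allocation(N_h, n_total=2_000, q=0.5))
```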

  • Articles and reports: 12-001-X201500114151
    Description:

    One of the main variables in the Dutch Labour Force Survey is the variable measuring whether a respondent has a permanent or a temporary job. The aim of our study is to determine the measurement error in this variable by matching the information obtained by the longitudinal part of this survey with unique register data from the Dutch Institute for Employee Insurance. Unlike previous approaches that confront such datasets, we take into account that the register data are also not error-free and that measurement error in these data is likely to be correlated over time. More specifically, we propose the estimation of the measurement error in these two sources using an extended hidden Markov model with two observed indicators for the type of contract. Our results indicate that neither of the two sources should be considered error-free. For both indicators, we find that workers in temporary contracts are often misclassified as having a permanent contract. Particularly for the register data, we find that measurement errors are strongly autocorrelated: once made, they tend to repeat themselves. In contrast, when the register is correct, the probability of an error at the next time period is almost zero. Finally, we find that temporary contracts are more widespread than the Labour Force Survey suggests, while transitions from temporary to permanent contracts are much less common than both datasets suggest.

    Release date: 2015-06-29

  • Notices and consultations: 12-002-X
    Description:

    The Research Data Centres (RDCs) Information and Technical Bulletin (ITB) is a forum by which Statistics Canada analysts and the research community can inform each other on survey data uses and methodological techniques. Articles in the ITB focus on data analysis and modelling, data management, and best or ineffective statistical, computational, and scientific practices. Further, ITB topics will include essays on data content, implications of questionnaire wording, comparisons of datasets, reviews on methodologies and their application, data peculiarities, problematic data and solutions, and explanations of innovative tools using RDC surveys and relevant software. All of these essays may provide advice and detailed examples outlining commands, habits, tricks and strategies used to make problem-solving easier for the RDC user.

    The main aims of the ITB are:

    - the advancement and dissemination of knowledge surrounding Statistics Canada's data;
    - the exchange of ideas among the RDC-user community;
    - the support of new users;
    - the co-operation with subject matter experts and divisions within Statistics Canada.

    The ITB is interested in quality articles that are worth publicizing throughout the research community, and that will add value to the quality of research produced at Statistics Canada's RDCs.

    Release date: 2015-03-25

  • Survey Quality (archived)
    Articles and reports: 12-001-X201200211751
    Description:

    Survey quality is a multi-faceted concept that originates from two different development paths. One path is the total survey error paradigm that rests on four pillars providing principles that guide survey design, survey implementation, survey evaluation, and survey data analysis. We should design surveys so that the mean squared error of an estimate is minimized given budget and other constraints. It is important to take all known error sources into account, to monitor major error sources during implementation, to periodically evaluate major error sources and combinations of these sources after the survey is completed, and to study the effects of errors on the survey analysis. In this context survey quality can be measured by the mean squared error and controlled by observations made during implementation and improved by evaluation studies. The paradigm has both strengths and weaknesses. One strength is that research can be defined by error sources and one weakness is that most total survey error assessments are incomplete in the sense that it is not possible to include the effects of all the error sources.

    The second path is influenced by ideas from the quality management sciences. These sciences concern business excellence in providing products and services with a focus on customers and competition from other providers. These ideas have had a great influence on many statistical organizations. One effect is the acceptance among data providers that product quality cannot be achieved without a sufficient underlying process quality, and process quality cannot be achieved without a good organizational quality. These levels can be controlled and evaluated by service level agreements, customer surveys, paradata analysis using statistical process control, and organizational assessment using business excellence models or other sets of criteria. All levels can be improved by conducting improvement projects chosen by means of priority functions. The ultimate goal of improvement projects is that the processes involved should gradually approach a state where they are error-free. Of course, this might be an unattainable goal, albeit one to strive for.

    It is not realistic to hope for continuous measurements of the total survey error using the mean squared error. Instead one can hope that continuous quality improvement using management science ideas and statistical methods can minimize biases and other survey process problems so that the variance becomes an approximation of the mean squared error. If that can be achieved we have made the two development paths approximately coincide. (The standard decomposition behind this argument is shown after this entry.)

    Release date: 2012-12-19
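
    The decomposition behind the closing argument above (that the variance approximates the mean squared error once biases are driven down) is the standard one:

```latex
\operatorname{MSE}(\hat{\theta})
  = \mathbb{E}\!\left[(\hat{\theta}-\theta)^{2}\right]
  = \operatorname{Var}(\hat{\theta}) + \operatorname{Bias}(\hat{\theta})^{2}
```

    If continuous quality improvement keeps the combined bias of the error sources near zero, the variance term alone becomes a workable proxy for the total survey error.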

  • Articles and reports: 12-001-X201100211604
    Description:

    We propose a method of mean squared error (MSE) estimation for estimators of finite population domain means that can be expressed in pseudo-linear form, i.e., as weighted sums of sample values. In particular, it can be used for estimating the MSE of the empirical best linear unbiased predictor, the model-based direct estimator and the M-quantile predictor. The proposed method represents an extension of the ideas in Royall and Cumberland (1978) and leads to MSE estimators that are simpler to implement, and potentially more bias-robust, than those suggested in the small area literature. However, it should be noted that the MSE estimators defined using this method can also exhibit large variability when the area-specific sample sizes are very small. We illustrate the performance of the method through extensive model-based and design-based simulation, with the latter based on two realistic survey data sets containing small area information.

    Release date: 2011-12-21
Data (6) (6 results)

  • Table: 98-508-X
    Description:

    The Census Profile Standard Error Supplement provides the standard error for each long-form estimate along with the standard Census Profile data for a selected aggregate dissemination area (ADA), its corresponding census division (CD) and province/territory, as well as for Canada. It can be downloaded for selected areas or the entire profile in a variety of commonly used formats (e.g., CSV, TAB or IVT). This product will be updated with additional content released on November 29, 2017.

    Release date: 2018-01-19

  • Public use microdata: 82M0011X
    Description:

    The main objective of the 2002 Youth Smoking Survey (YSS) is to provide current information on the smoking behaviour of students in grades 5 to 9 (in Quebec primary school grades 5 and 6 and secondary school grades 1 to 3), and to measure changes that occurred since the last time the survey was conducted in 1994. Additionally, the 2002 survey collected basic data on alcohol and drug use by students in grades 7 to 9 (in Quebec secondary 1 to 3). Results of the Youth Smoking Survey will help with the evaluation of anti-smoking and anti-drug use programs, as well as with the development of new programs.

    Release date: 2004-07-14

  • Table: 50-002-X20010015780
    Description:

    Section 1 describes results for small for-hire carriers whose operating revenues were between $30,000 and $1,000,000. Section 2 contains data for all owner operators included in the Annual Motor Carriers of Freight Survey of Small For-hire Carriers and Owner Operators, including some firms whose operating revenues exceeded $1,000,000. Section 3 provides a general discussion of the methodology and data quality of the Annual Motor Carriers of Freight Survey of Small For-hire Carriers and Owner Operators.

    Release date: 2001-06-29

  • Public use microdata: 82M0010X
    Description:

    The National Population Health Survey (NPHS) program is designed to collect information related to the health of the Canadian population. The first cycle of data collection began in 1994. The institutional component includes long-term residents (expected to stay longer than six months) in health care facilities with four or more beds in Canada, with the principal exclusion of the Yukon and the Northwest Territories. The document has been produced to facilitate the manipulation of the 1996-1997 microdata file containing survey results. The main variables include: demography, health status, chronic conditions, restriction of activity, socio-demographic characteristics, and others.

    Release date: 2000-08-02

  • Public use microdata: 89M0018X
    Description:

    This is a CD-ROM product from the Ontario Adult Literacy Survey (OALS), conducted in the spring of 1998 with the goal of providing information on the ability of Ontario immigrants to use either English or French in their daily activities, and on their self-perceived literacy skills, training needs and barriers to training.

    In order to cover the majority of Ontario immigrants, the Census Metropolitan Areas (CMAs) of Toronto, Hamilton, Ottawa, Kitchener, London and St. Catharines were included in the sample. With these 6 CMAs, about 83% of Ontario immigrants were included in the sample frame. This sample of 7,107 dwellings covered the population of Ontario immigrants in general, as well as specifically targeting immigrants with a mother tongue of Italian, Chinese, Portuguese, Polish, and Spanish and immigrants born in the Caribbean Islands with a mother tongue of English.

    Each interview was approximately 1.5 hours in duration and consisted of a half-hour questionnaire asking demographic and literacy-related questions, as well as a one-hour literacy test. This literacy test was derived from that used in the 1994 International Adult Literacy Survey (IALS) and covered the domains of document and quantitative literacy. An overall response rate to the survey of 76% was achieved, resulting in 4,648 respondents.

    Release date: 1999-10-29

  • Public use microdata: 82F0001X
    Description:

    The National Population Health Survey (NPHS) uses the Labour Force Survey sampling frame to draw a sample of approximately 22,000 households. The sample is distributed over four quarterly collection periods. In each household, some limited information is collected from all household members and one person, aged 12 years and over, in each household is randomly selected for a more in-depth interview.

    The questionnaire includes content related to health status, use of health services, determinants of health and a range of demographic and economic information. For example, the health status information includes self-perception of health, a health status index, chronic conditions, and activity restrictions. The use of health services is probed through visits to health care providers, both traditional and non-traditional, and the use of drugs and other medications. Health determinants include smoking, alcohol use, physical activity and in the first survey, emphasis has been placed on the collection of selected psycho-social factors that may influence health, such as stress, self-esteem and social support. The demographic and economic information includes age, sex, education, ethnicity, household income and labour force status.

    Release date: 1995-11-21
Analysis (65) (30 to 40 of 65 results)

  • Articles and reports: 11-522-X20050019463
    Description:

    Statisticians are developing additional concepts for communicating errors associated with estimates. Many of these concepts are readily understood by statisticians but are even more difficult to explain to users than the traditional confidence interval. The proposed solution, when communicating with non-statisticians, is to improve the estimates so that the requirement for explaining the error is minimised. The user is then not confused by having too many numbers to understand.

    Release date: 2007-03-02

  • Articles and reports: 11-522-X20050019477
    Description:

    Using probabilistic data linkage, an integrated database of injuries is obtained by linking on some subset of various key variables or their derivatives: names (given names, surnames and alternative names), age, sex, birthdate, phone numbers, injury date, unique identification numbers, diagnosis. To assess the quality of the links produced, false positive rates and false negative rates are computed. These rates, however, do not give an indication of whether the databases used for linking have undercounted injuries (bias). It is of interest to an injury researcher, moreover, to have some idea of the error margin for the figures generated from integrating various injury databases, similar to what one would get in a survey, for instance. (A small numerical illustration of these two rates follows this entry.)

    Release date: 2007-03-02
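
    A small numerical illustration of the two linkage-quality rates mentioned above (the counts are hypothetical, and the denominators shown, declared links and true matches, are common conventions rather than the authors' definitions):

```python
# Illustrative sketch: false-positive and false-negative rates for a record
# linkage evaluated against a clerically reviewed truth set.
declared_links = 1_000   # pairs the linkage declared to be matches
true_matches = 950       # pairs that are truly the same person
false_positives = 80     # declared links that are not true matches
false_negatives = 30     # true matches the linkage missed

fp_rate = false_positives / declared_links
fn_rate = false_negatives / true_matches
print(f"false-positive rate: {fp_rate:.1%}, false-negative rate: {fn_rate:.1%}")
```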

  • Articles and reports: 12-001-X20060019257
    Description:

    In the presence of item nonresponse, two approaches have been traditionally used to make inference on parameters of interest. The first approach assumes uniform response within imputation cells whereas the second approach assumes ignorable response but makes use of a model on the variable of interest as the basis for inference. In this paper, we propose a third approach that assumes a specified ignorable response mechanism without having to specify a model on the variable of interest. In this case, we show how to obtain imputed values which lead to estimators of a total that are approximately unbiased under the proposed approach as well as the second approach. Variance estimators of the imputed estimators that are approximately unbiased are also obtained using an approach of Fay (1991) in which the order of sampling and response is reversed. Finally, simulation studies are conducted to investigate the finite sample performance of the methods in terms of bias and mean square error.

    Release date: 2006-07-20

  • Articles and reports: 12-001-X20060019258
    Description:

    This paper proposes a cost-effective strategy to estimate the intercensal unemployment rate at the provincial level in Iran. Taking advantage of small area estimation (SAE) methods, this strategy is based on a single sampling at the national level. Three methods - synthetic, composite, and empirical Bayes estimators - are used to find the indirect estimates of interest for the year 1996. Findings not only confirm the adequacy of the suggested strategy, but also indicate that the composite and empirical Bayes estimators perform well and similarly. (The generic form of the composite estimator is shown after this entry.)

    Release date: 2006-07-20
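
    For reference, the composite estimator named above is conventionally a convex combination of a direct and a synthetic estimator for area i, with the weight chosen on mean-squared-error grounds (generic notation, not the paper's):

```latex
\hat{\theta}_i^{\text{comp}}
  = \phi_i\, \hat{\theta}_i^{\text{direct}} + (1-\phi_i)\, \hat{\theta}_i^{\text{synth}},
\qquad 0 \le \phi_i \le 1 .
```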

  • Articles and reports: 12-001-X20060019260
    Description:

    This paper considers the use of imputation and weighting to correct for measurement error in the estimation of a distribution function. The paper is motivated by the problem of estimating the distribution of hourly pay in the United Kingdom, using data from the Labour Force Survey. Errors in measurement lead to bias and the aim is to use auxiliary data, measured accurately for a subsample, to correct for this bias. Alternative point estimators are considered, based upon a variety of imputation and weighting approaches, including fractional imputation, nearest neighbour imputation, predictive mean matching and propensity score weighting. Properties of these point estimators are then compared both theoretically and by simulation. A fractional predictive mean matching imputation approach is advocated. It performs similarly to propensity score weighting, but displays slight advantages of robustness and efficiency. (A short predictive mean matching sketch follows this entry.)

    Release date: 2006-07-20
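
    Nearest-neighbour predictive mean matching, one ingredient of the approaches compared above, can be sketched as follows (the simulated pay data, the 20% validation subsample and the working model linear in log(x) are illustrative assumptions, not the paper's setup):

```python
# Illustrative sketch: predictive mean matching for an error-prone measure.
# y is the accurately measured pay, treated as observed only on a validation
# subsample; x is the error-prone survey measure, available for everyone.
import numpy as np

rng = np.random.default_rng(2)
n = 1_000
y = rng.lognormal(mean=2.5, sigma=0.4, size=n)       # true pay (simulated for all
x = y * np.exp(rng.normal(scale=0.2, size=n))        #   units only to check results)
validated = rng.random(n) < 0.2                      # 20% validation subsample

# Fit a working model for E[y | x] on the validated subsample.
beta = np.polyfit(np.log(x[validated]), y[validated], deg=1)
y_hat = np.polyval(beta, np.log(x))                  # predicted means for all units

# For each non-validated unit, donate the observed y of the validated unit
# whose predicted mean is closest (classic predictive mean matching).
donors = np.where(validated)[0]
y_imputed = np.where(validated, y, np.nan)
for i in np.where(~validated)[0]:
    j = donors[np.argmin(np.abs(y_hat[donors] - y_hat[i]))]
    y_imputed[i] = y[j]

print(y.mean(), y_imputed.mean())                    # the two means should be close
```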

  • Articles and reports: 12-001-X20060019263
    Description:

    In small area estimation, area level models such as the Fay-Herriot model (Fay and Herriot 1979) are widely used to obtain efficient model-based estimators for small areas. The sampling error variances are customarily assumed to be known in the model. In this paper we consider the situation where the sampling error variances are estimated individually by direct estimators. A full hierarchical Bayes (HB) model is constructed for the direct survey estimators and the sampling error variance estimators. The Gibbs sampling method is employed to obtain the small area HB estimators. The proposed HB approach automatically takes account of the extra uncertainty of estimating the sampling error variances, especially when the area-specific sample sizes are small. We compare the proposed HB model with the Fay-Herriot model through analysis of two survey data sets. Our results show that the proposed HB estimators perform quite well compared to the direct estimates. We also discuss the problem of priors on the variance components. (The basic Fay-Herriot model is recalled after this entry for reference.)

    Release date: 2006-07-20
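
    For context, the basic Fay-Herriot area-level model referred to above, written with known sampling variances D_i (the assumption the paper relaxes), together with its shrinkage estimator:

```latex
y_i = \theta_i + e_i, \qquad
\theta_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + v_i, \qquad
v_i \sim N(0, \sigma_v^{2}), \quad e_i \sim N(0, D_i),

\tilde{\theta}_i = \gamma_i\, y_i + (1 - \gamma_i)\, \mathbf{x}_i^{\top}\hat{\boldsymbol{\beta}},
\qquad \gamma_i = \frac{\sigma_v^{2}}{\sigma_v^{2} + D_i}.
```

    When the D_i must themselves be estimated from small area-specific samples, the shrinkage weights inherit that extra uncertainty, which is what the hierarchical Bayes approach above accounts for.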

  • Articles and reports: 75F0002M2006005
    Description:

    The Survey of Labour and Income Dynamics (SLID) is a longitudinal survey initiated in 1993. The survey was designed to measure changes in the economic well-being of Canadians as well as the factors affecting these changes.

    Sample surveys are subject to errors. As with all surveys conducted at Statistics Canada, considerable time and effort is taken to control such errors at every stage of the Survey of Labour and Income Dynamics. Nonetheless errors do occur. It is the policy at Statistics Canada to furnish users with measures of data quality so that the user is able to interpret the data properly. This report summarizes a set of quality measures that has been produced in an attempt to describe the overall quality of SLID data. Among the measures included in the report are sample composition and attrition rates, sampling errors, coverage errors in the form of slippage rates, response rates, tax permission and tax linkage rates, and imputation rates.

    Release date: 2006-04-06

  • Articles and reports: 75F0002M2005011
    Description:

    The Survey of Labour and Income Dynamics (SLID) is a longitudinal survey initiated in 1993. The survey was designed to measure changes in the economic well-being of Canadians as well as the factors affecting these changes.

    Sample surveys are subject to errors. As with all surveys conducted at Statistics Canada, considerable time and effort is taken to control such errors at every stage of the Survey of Labour and Income Dynamics. Nonetheless errors do occur. It is the policy at Statistics Canada to furnish users with measures of data quality so that the user is able to interpret the data properly. This report summarizes a set of quality measures that has been produced in an attempt to describe the overall quality of SLID data. Among the measures included in the report are sample composition and attrition rates, sampling errors, coverage errors in the form of slippage rates, response rates, tax permission and tax linkage rates, and imputation rates.

    Release date: 2005-09-15

  • Articles and reports: 75F0002M2005012
    Description:

    The Survey of Labour and Income Dynamics (SLID) is a longitudinal survey initiated in 1993. The survey was designed to measure changes in the economic well-being of Canadians as well as the factors affecting these changes.

    Sample surveys are subject to errors. As with all surveys conducted at Statistics Canada, considerable time and effort is taken to control such errors at every stage of the Survey of Labour and Income Dynamics. Nonetheless errors do occur. It is the policy at Statistics Canada to furnish users with measures of data quality so that the user is able to interpret the data properly. This report summarizes a set of quality measures that has been produced in an attempt to describe the overall quality of SLID data. Among the measures included in the report are sample composition and attrition rates, sampling errors, coverage errors in the form of slippage rates, response rates, tax permission and tax linkage rates, and imputation rates.

    Release date: 2005-09-15

  • Articles and reports: 12-001-X20050018083
    Description:

    The advent of computerized record linkage methodology has facilitated the conduct of cohort mortality studies in which exposure data in one database are electronically linked with mortality data from another database. This, however, introduces linkage errors due to mismatching an individual from one database with a different individual from the other database. In this article, the impact of linkage errors on estimates of epidemiological indicators of risk such as standardized mortality ratios and relative risk regression model parameters is explored. It is shown that the observed and expected numbers of deaths are affected in opposite directions and, as a result, these indicators can be subject to bias and additional variability in the presence of linkage errors.

    Release date: 2005-07-21
Reference (10) (10 results)

  • Notices and consultations: 12-002-X
    Description:

    The Research Data Centres (RDCs) Information and Technical Bulletin (ITB) is a forum by which Statistics Canada analysts and the research community can inform each other on survey data uses and methodological techniques. Articles in the ITB focus on data analysis and modelling, data management, and best or ineffective statistical, computational, and scientific practices. Further, ITB topics will include essays on data content, implications of questionnaire wording, comparisons of datasets, reviews on methodologies and their application, data peculiarities, problematic data and solutions, and explanations of innovative tools using RDC surveys and relevant software. All of these essays may provide advice and detailed examples outlining commands, habits, tricks and strategies used to make problem-solving easier for the RDC user.

    The main aims of the ITB are:

    - the advancement and dissemination of knowledge surrounding Statistics Canada's data;
    - the exchange of ideas among the RDC-user community;
    - the support of new users;
    - the co-operation with subject matter experts and divisions within Statistics Canada.

    The ITB is interested in quality articles that are worth publicizing throughout the research community, and that will add value to the quality of research produced at Statistics Canada's RDCs.

    Release date: 2015-03-25

  • Surveys and statistical programs – Documentation: 92-567-X
    Description:

    The Coverage Technical Report presents the errors in census data that result from persons missed by the 2006 Census or persons enumerated in error. Population coverage errors are one of the most important types of error because they affect not only the accuracy of population counts but also the accuracy of all of the census data describing characteristics of the population universe.

    Release date: 2010-03-25

  • Surveys and statistical programs – Documentation: 13F0026M2007001
    Description:

    This guide will be of assistance when using the public use microdata file (PUMF) of the Survey of Financial Security (SFS) conducted by the Pensions and Wealth Surveys Section of the Income Statistics Division.

    Release date: 2007-09-04

  • Surveys and statistical programs – Documentation: 62F0026M2005006
    Description:

    This report describes the quality indicators produced for the 2003 Survey of Household Spending. These quality indicators, such as coefficients of variation, nonresponse rates, slippage rates and imputation rates, help users interpret the survey data.

    Release date: 2005-10-06

  • Surveys and statistical programs – Documentation: 62F0026M2004001
    Description:

    This report describes the quality indicators produced for the 2002 Survey of Household Spending. These quality indicators, such as coefficients of variation, nonresponse rates, slippage rates and imputation rates, help users interpret the survey data.

    Release date: 2004-09-15

  • Surveys and statistical programs – Documentation: 92-391-X
    Description:

    This report contains basic conceptual and data quality information intended to facilitate the use and interpretation of census industry data. It provides an overview of the industry processing cycle, including elements such as regional processing, edit and imputation, and the tabulation of error rates. A detailed explanation of the automated coding systems used in the 2001 Census is also documented, in addition to notable changes in the imputation procedures. The report concludes with summary tables that indicate the level of data quality in the 2001 Census industry data. Appendices to the report contain historical data going back to the 1971 Census.

    Release date: 2004-06-02

  • Surveys and statistical programs – Documentation: 62F0026M2003001
    Description:

    This report describes the quality indicators produced for the 2001 Survey of Household Spending. It covers the usual quality indicators that help users interpret the data, such as coefficients of variation, nonresponse rates, slippage rates and imputation rates.

    Release date: 2003-11-26

  • Surveys and statistical programs – Documentation: 82-003-X20010036099
    Description:

    Cycle 1.1 of the Canadian Community Health Survey (CCHS) will provide information for 136 health regions. A brief overview of the CCHS design, sampling strategy, interviewing procedures, data collection and processing is presented.

    Release date: 2002-03-13

  • Surveys and statistical programs – Documentation: 62F0026M2001004
    Geography: Province or territory
    Description:

    This guide presents information of interest to users of data from the Survey of Household Spending. Data are collected via personal interview conducted in January, February and March after the reference year using a paper questionnaire. Information is gathered about the spending habits, dwelling characteristics and household equipment of Canadian households during the reference year. The survey covers private households in the ten provinces. (The three territories are surveyed every second year starting in 2001.)

    This guide includes definitions of survey terms and variables, as well as descriptions of survey methodology and data quality. There is also a section describing the various statistics that can be created using expenditure data (e.g., budget share, market share, and aggregates).

    Release date: 2001-12-12

  • Surveys and statistical programs – Documentation: 11-522-X19980015036
    Description:

    Multivariate logistic regression, introduced by Glonek and McCullagh (1995) as a generalisation of logistic regression, is useful in the analysis of longitudinal data as it allows for dependent repeated observations of a categorical variable and for incomplete response profiles. We show how the method can be extended to deal with data from complex surveys and we illustrate it on data from the Swiss Labour Force Survey. The effect of the sampling weights on the parameter estimates and their standard errors is considered.

    Release date: 1999-10-22