Results

All (81) (0 to 10 of 81 results)

  • Journals and periodicals: 12-206-X
    Description: This report summarizes the annual achievements of the Methodology Research and Development Program (MRDP) sponsored by the Modern Statistical Methods and Data Science Branch at Statistics Canada. This program covers research and development activities in statistical methods with potentially broad application in the agency’s statistical programs; these activities would otherwise be less likely to be carried out during the provision of regular methodology services to those programs. The MRDP also includes activities that provide support in the application of past successful developments in order to promote the use of the results of research and development work. Selected prospective research activities are also presented.
    Release date: 2023-10-11

  • Articles and reports: 11-522-X202100100001
    Description:

    We consider regression analysis in the context of data integration. To combine partial information from external sources, we employ the idea of model calibration which introduces a “working” reduced model based on the observed covariates. The working reduced model is not necessarily correctly specified but can be a useful device to incorporate the partial information from the external data. The actual implementation is based on a novel application of the empirical likelihood method. The proposed method is particularly attractive for combining information from several sources with different missing patterns. The proposed method is applied to a real data example combining survey data from Korean National Health and Nutrition Examination Survey and big data from National Health Insurance Sharing Service in Korea.

    Key Words: Big data; Empirical likelihood; Measurement error models; Missing covariates.

    Release date: 2021-10-15

  • Table: 98-508-X
    Description:

    The Census Profile Standard Error Supplement provides the standard error for each long-form estimate along with the standard Census Profile data for a selected ADA, its corresponding census division (CD) and province/territory, as well as for Canada. It can be downloaded for selected areas or the entire profile in a variety of commonly used formats (e.g., CSV, TAB or IVT). This product will be updated with additional content released on November 29, 2017.

    Release date: 2018-01-19

  • Articles and reports: 12-001-X201600114538
    Description:

    The aim of automatic editing is to use a computer to detect and amend erroneous values in a data set, without human intervention. Most automatic editing methods that are currently used in official statistics are based on the seminal work of Fellegi and Holt (1976). Applications of this methodology in practice have shown systematic differences between data that are edited manually and automatically, because human editors may perform complex edit operations. In this paper, a generalization of the Fellegi-Holt paradigm is proposed that can incorporate a large class of edit operations in a natural way. In addition, an algorithm is outlined that solves the resulting generalized error localization problem. It is hoped that this generalization may be used to increase the suitability of automatic editing in practice, and hence to improve the efficiency of data editing processes. Some first results on synthetic data are promising in this respect.
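    As a rough illustration of the classical Fellegi-Holt error localization problem that this paper generalizes, the sketch below finds a smallest set of fields whose values can be changed so that a record satisfies every edit rule. The edit rules, field domains and record are invented for illustration; production systems solve this via set-covering or integer programming rather than enumeration.

```python
from itertools import combinations, product

def error_localization(record, edits, domains):
    """Brute-force Fellegi-Holt error localization: return a smallest set
    of fields, and a revised record, such that changing only those fields
    makes every edit pass. Illustrative only - enumeration does not scale."""
    fields = list(record)
    for k in range(len(fields) + 1):
        for subset in combinations(fields, k):
            # try every combination of replacement values for this subset
            for values in product(*(domains[f] for f in subset)):
                candidate = dict(record, **dict(zip(subset, values)))
                if all(edit(candidate) for edit in edits):
                    return subset, candidate
    return None

# Hypothetical edits: a married respondent must be at least 16 years old.
edits = [
    lambda r: not (r["age"] < 16 and r["marital"] == "married"),
    lambda r: r["age"] >= 0,
]
domains = {"age": range(0, 100), "marital": ["single", "married"]}
fields, fixed = error_localization({"age": 5, "marital": "married"}, edits, domains)
```

    Here the minimal repair changes only the age field, which is the Fellegi-Holt principle: amend as few values as possible.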

    Release date: 2016-06-22

  • Articles and reports: 12-001-X201600114539
    Description:

    Statistical matching is a technique for integrating two or more data sets when information available for matching records for individual participants across data sets is incomplete. Statistical matching can be viewed as a missing data problem where a researcher wants to perform a joint analysis of variables that are never jointly observed. A conditional independence assumption is often used to create imputed data for statistical matching. We consider a general approach to statistical matching using parametric fractional imputation of Kim (2011) to create imputed data under the assumption that the specified model is fully identified. The proposed method does not have a convergent EM sequence if the model is not identified. We also present variance estimators appropriate for the imputation procedure. We explain how the method applies directly to the analysis of data from split questionnaire designs and measurement error models.

    Release date: 2016-06-22

  • Articles and reports: 12-001-X201500214230
    Description:

    This paper develops allocation methods for stratified sample surveys where composite small area estimators are a priority, and areas are used as strata. Longford (2006) proposed an objective criterion for this situation, based on a weighted combination of the mean squared errors of small area means and a grand mean. Here, we redefine this approach within a model-assisted framework, allowing regressor variables and a more natural interpretation of results using an intra-class correlation parameter. We also consider several uses of power allocation, and allow the placing of other constraints such as maximum relative root mean squared errors for stratum estimators. We find that a simple power allocation can perform very nearly as well as the optimal design even when the objective is to minimize Longford’s (2006) criterion.
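    The power allocation discussed above can be sketched numerically: stratum h receives a sample size proportional to N_h**p, so p = 1 gives proportional allocation, p = 0 equal allocation, and intermediate p a compromise between national and small-area precision. The stratum sizes and total sample size below are hypothetical, and rounding is handled naively.

```python
def power_allocation(n, sizes, p):
    """Power allocation sketch: allocate total sample n across strata
    with population sizes `sizes`, stratum h proportional to N_h**p."""
    weights = [N ** p for N in sizes]
    total = sum(weights)
    return [round(n * w / total) for w in weights]

sizes = [100_000, 10_000, 1_000]          # hypothetical stratum populations
prop = power_allocation(600, sizes, 1.0)  # proportional: [541, 54, 5]
equal = power_allocation(600, sizes, 0.0) # equal: [200, 200, 200]
mid = power_allocation(600, sizes, 0.5)   # compromise
```

    Note how p = 0.5 pulls sample toward the small strata relative to proportional allocation, which is what makes small-area (stratum) estimators viable.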

    Release date: 2015-12-17

  • Articles and reports: 12-001-X201500114151
    Description:

    One of the main variables in the Dutch Labour Force Survey measures whether a respondent has a permanent or a temporary job. The aim of our study is to determine the measurement error in this variable by matching the information obtained by the longitudinal part of this survey with unique register data from the Dutch Institute for Employee Insurance. Unlike previous approaches to confronting such datasets, we take into account that register data are also not error-free and that measurement error in these data is likely to be correlated over time. More specifically, we propose estimating the measurement error in these two sources using an extended hidden Markov model with two observed indicators for the type of contract. Our results indicate that neither of the two sources should be considered error-free. For both indicators, we find that workers in temporary contracts are often misclassified as having a permanent contract. Particularly for the register data, we find that measurement errors are strongly autocorrelated: once made, they tend to repeat themselves. In contrast, when the register is correct, the probability of an error at the next time period is almost zero. Finally, we find that temporary contracts are more widespread than the Labour Force Survey suggests, while transitions from temporary to permanent contracts are much less common than both datasets suggest.

    Release date: 2015-06-29

  • Notices and consultations: 12-002-X
    Description:

    The Research Data Centres (RDCs) Information and Technical Bulletin (ITB) is a forum by which Statistics Canada analysts and the research community can inform each other on survey data uses and methodological techniques. Articles in the ITB focus on data analysis and modelling, data management, and best or ineffective statistical, computational, and scientific practices. Further, ITB topics will include essays on data content, implications of questionnaire wording, comparisons of datasets, reviews on methodologies and their application, data peculiarities, problematic data and solutions, and explanations of innovative tools using RDC surveys and relevant software. All of these essays may provide advice and detailed examples outlining commands, habits, tricks and strategies used to make problem-solving easier for the RDC user.

    The main aims of the ITB are:

    - the advancement and dissemination of knowledge surrounding Statistics Canada's data;
    - the exchange of ideas among the RDC-user community;
    - the support of new users;
    - the co-operation with subject matter experts and divisions within Statistics Canada.

    The ITB is interested in quality articles that are worth publicizing throughout the research community, and that will add value to the quality of research produced at Statistics Canada's RDCs.

    Release date: 2015-03-25

  • Survey Quality (Archived)
    Articles and reports: 12-001-X201200211751
    Description:

    Survey quality is a multi-faceted concept that originates from two different development paths. One path is the total survey error paradigm that rests on four pillars providing principles that guide survey design, survey implementation, survey evaluation, and survey data analysis. We should design surveys so that the mean squared error of an estimate is minimized given budget and other constraints. It is important to take all known error sources into account, to monitor major error sources during implementation, to periodically evaluate major error sources and combinations of these sources after the survey is completed, and to study the effects of errors on the survey analysis. In this context survey quality can be measured by the mean squared error and controlled by observations made during implementation and improved by evaluation studies.

    The paradigm has both strengths and weaknesses. One strength is that research can be defined by error sources and one weakness is that most total survey error assessments are incomplete in the sense that it is not possible to include the effects of all the error sources.

    The second path is influenced by ideas from the quality management sciences. These sciences concern business excellence in providing products and services with a focus on customers and competition from other providers. These ideas have had a great influence on many statistical organizations. One effect is the acceptance among data providers that product quality cannot be achieved without a sufficient underlying process quality, and process quality cannot be achieved without a good organizational quality. These levels can be controlled and evaluated by service level agreements, customer surveys, paradata analysis using statistical process control, and organizational assessment using business excellence models or other sets of criteria. All levels can be improved by conducting improvement projects chosen by means of priority functions.

    The ultimate goal of improvement projects is that the processes involved should gradually approach a state where they are error-free. Of course, this might be an unattainable goal, albeit one to strive for. It is not realistic to hope for continuous measurements of the total survey error using the mean squared error. Instead one can hope that continuous quality improvement using management science ideas and statistical methods can minimize biases and other survey process problems so that the variance becomes an approximation of the mean squared error. If that can be achieved we have made the two development paths approximately coincide.
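    The mean-squared-error bookkeeping that both development paths rely on can be made concrete with a toy calculation: for repeated estimates of a known quantity, MSE = variance + bias², so when process improvement drives the bias toward zero the variance alone approximates the MSE. The repeated estimates below are invented.

```python
import statistics

def mse_decomposition(estimates, true_value):
    """Decompose the error of repeated estimates of a known quantity:
    returns (bias, variance, mse), where mse = variance + bias**2."""
    mean = statistics.fmean(estimates)
    bias = mean - true_value
    variance = statistics.fmean((e - mean) ** 2 for e in estimates)
    mse = statistics.fmean((e - true_value) ** 2 for e in estimates)
    return bias, variance, mse

# hypothetical repeated survey estimates of a known total of 100
bias, variance, mse = mse_decomposition([98, 101, 99, 102], 100)
# with zero bias, variance and MSE coincide
```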

    Release date: 2012-12-19

  • Articles and reports: 12-001-X201100211604
    Description:

    We propose a method of mean squared error (MSE) estimation for estimators of finite population domain means that can be expressed in pseudo-linear form, i.e., as weighted sums of sample values. In particular, it can be used for estimating the MSE of the empirical best linear unbiased predictor, the model-based direct estimator and the M-quantile predictor. The proposed method represents an extension of the ideas in Royall and Cumberland (1978) and leads to MSE estimators that are simpler to implement, and potentially more bias-robust, than those suggested in the small area literature. However, it should be noted that the MSE estimators defined using this method can also exhibit large variability when the area-specific sample sizes are very small. We illustrate the performance of the method through extensive model-based and design-based simulation, with the latter based on two realistic survey data sets containing small area information.

    Release date: 2011-12-21
Data (6) (6 results)

  • Table: 98-508-X
    Description:

    The Census Profile Standard Error Supplement provides the standard error for each long-form estimate along with the standard Census Profile data for a selected ADA, its corresponding census division (CD) and province/territory, as well as for Canada. It can be downloaded for selected areas or the entire profile in a variety of commonly used formats (e.g., CSV, TAB or IVT). This product will be updated with additional content released on November 29, 2017.

    Release date: 2018-01-19

  • Public use microdata: 82M0011X
    Description:

    The main objective of the 2002 Youth Smoking Survey (YSS) is to provide current information on the smoking behaviour of students in grades 5 to 9 (in Quebec primary school grades 5 and 6 and secondary school grades 1 to 3), and to measure changes that occurred since the last time the survey was conducted in 1994. Additionally, the 2002 survey collected basic data on alcohol and drug use by students in grades 7 to 9 (in Quebec secondary 1 to 3). Results of the Youth Smoking Survey will help with the evaluation of anti-smoking and anti-drug use programs, as well as with the development of new programs.

    Release date: 2004-07-14

  • Table: 50-002-X20010015780
    Description:

    Section 1 describes results for small for-hire carriers whose operating revenues were between $30,000 and $1,000,000. Section 2 contains data for all owner operators included in the Annual Motor Carriers of Freight Survey of Small For-hire Carriers and Owner Operators, including some firms whose operating revenues exceeded $1,000,000. Section 3 provides a general discussion of the Annual Motor Carriers of Freight Survey of Small For-hire Carriers and Owner Operators methodology and data quality.

    Release date: 2001-06-29

  • Public use microdata: 82M0010X
    Description:

    The National Population Health Survey (NPHS) program is designed to collect information related to the health of the Canadian population. The first cycle of data collection began in 1994. The institutional component includes long-term residents (expected to stay longer than six months) in health care facilities with four or more beds in Canada, with the principal exclusion of the Yukon and the Northwest Territories. The document has been produced to facilitate the manipulation of the 1996-1997 microdata file containing survey results. The main variables include: demography, health status, chronic conditions, restriction of activity, socio-demographic, and others.

    Release date: 2000-08-02

  • Public use microdata: 89M0018X
    Description:

    This is a CD-ROM product from the Ontario Adult Literacy Survey (OALS), conducted in the spring of 1998 with the goal of providing information on: the ability of Ontario immigrants to use either English or French in their daily activities; and on their self-perceived literacy skills, training needs and barriers to training.

    In order to cover the majority of Ontario immigrants, the Census Metropolitan Areas (CMAs) of Toronto, Hamilton, Ottawa, Kitchener, London and St. Catharines were included in the sample. With these 6 CMAs, about 83% of Ontario immigrants were included in the sample frame. This sample of 7,107 dwellings covered the population of Ontario immigrants in general, as well as specifically targeting immigrants with a mother tongue of Italian, Chinese, Portuguese, Polish, and Spanish and immigrants born in the Caribbean Islands with a mother tongue of English.

    Each interview was approximately 1.5 hours in duration and consisted of a half-hour questionnaire, asking demographic and literacy-related questions, as well as a one-hour literacy test. This literacy test was derived from that used in the 1994 International Adult Literacy Survey (IALS) and covered the domains of document and quantitative literacy. An overall response rate to the survey of 76% was achieved, resulting in 4,648 respondents.

    Release date: 1999-10-29

  • Public use microdata: 82F0001X
    Description:

    The National Population Health Survey (NPHS) uses the Labour Force Survey sampling frame to draw a sample of approximately 22,000 households. The sample is distributed over four quarterly collection periods. In each household, some limited information is collected from all household members and one person, aged 12 years and over, in each household is randomly selected for a more in-depth interview.

    The questionnaire includes content related to health status, use of health services, determinants of health and a range of demographic and economic information. For example, the health status information includes self-perception of health, a health status index, chronic conditions, and activity restrictions. The use of health services is probed through visits to health care providers, both traditional and non-traditional, and the use of drugs and other medications. Health determinants include smoking, alcohol use, physical activity and in the first survey, emphasis has been placed on the collection of selected psycho-social factors that may influence health, such as stress, self-esteem and social support. The demographic and economic information includes age, sex, education, ethnicity, household income and labour force status.

    Release date: 1995-11-21
Analysis (65) (40 to 50 of 65 results)

  • Articles and reports: 12-001-X20050018084
    Description:

    At national statistical institutes, experiments embedded in ongoing sample surveys are conducted occasionally to investigate possible effects of alternative survey methodologies on estimates of finite population parameters. To test hypotheses about differences between sample estimates due to alternative survey implementations, a design-based theory is developed for the analysis of completely randomized designs or randomized block designs embedded in general complex sampling designs. For both experimental designs, design-based Wald statistics are derived for the Horvitz-Thompson estimator and the generalized regression estimator. The theory is illustrated with a simulation study.

    Release date: 2005-07-21

  • Articles and reports: 12-001-X20050018085
    Description:

    Record linkage is a process of pairing records from two files and trying to select the pairs that belong to the same entity. The basic framework uses a match weight to measure the likelihood of a correct match and a decision rule to assign record pairs as "true" or "false" match pairs. Weight thresholds for selecting a record pair as matched or unmatched depend on the desired control over linkage errors. Current methods to determine the selection thresholds and estimate linkage errors can provide divergent results, depending on the type of linkage error and the approach to linkage. This paper presents a case study that uses existing linkage methods to link record pairs but a new simulation approach (SimRate) to help determine selection thresholds and estimate linkage errors. SimRate uses the observed distribution of data in matched and unmatched pairs to generate a large simulated set of record pairs, assigns a match weight to each pair based on specified match rules, and uses the weight curves of the simulated pairs for error estimation.
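    The match-weight machinery underlying such decision rules can be sketched in the Fellegi-Sunter style: each compared field contributes log2(m/u) on agreement and log2((1-m)/(1-u)) on disagreement, where m and u are agreement probabilities among true matches and true non-matches, and a weight threshold assigns the pair. The m/u parameters, records and threshold below are all invented for illustration.

```python
from math import log2

def match_weight(rec_a, rec_b, m, u):
    """Sum per-field agreement/disagreement log-likelihood-ratio weights
    for a candidate record pair (Fellegi-Sunter style sketch)."""
    w = 0.0
    for field, m_i in m.items():
        u_i = u[field]
        if rec_a[field] == rec_b[field]:
            w += log2(m_i / u_i)          # agreement weight
        else:
            w += log2((1 - m_i) / (1 - u_i))  # disagreement penalty
    return w

m = {"surname": 0.95, "birth_year": 0.90}  # hypothetical parameters
u = {"surname": 0.01, "birth_year": 0.05}
a = {"surname": "Tremblay", "birth_year": 1970}
b = {"surname": "Tremblay", "birth_year": 1971}
w = match_weight(a, b, m, u)
is_match = w > 3.0   # threshold chosen to control linkage error rates
```

    Approaches like SimRate then simulate many such pairs from the observed field-agreement distributions to see how often a given threshold would misclassify them.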

    Release date: 2005-07-21

  • Articles and reports: 12-001-X20050018087
    Description:

    In official statistics, the data editing process plays an important role in terms of timeliness, data accuracy, and survey costs. Techniques introduced to identify and eliminate errors from data must therefore consider all of these aspects simultaneously. Among others, a frequent and pervasive systematic error appearing in surveys that collect numerical data is the unity measure error. It strongly affects the timeliness, data accuracy and costs of the editing and imputation phase. In this paper we propose a probabilistic formalisation of the problem based on finite mixture models. This setting allows us to deal with the problem in a multivariate context, and also provides a number of useful diagnostics for prioritising cases to be investigated more deeply through clerical review. Prioritising units is important in order to increase data accuracy while avoiding wasted time following up units that are not genuinely critical.

    Release date: 2005-07-21

  • Articles and reports: 12-001-X20050018094
    Description:

    Nested error regression models are frequently used in small-area estimation and related problems. Standard regression model selection criteria, when applied to nested error regression models, may result in inefficient model selection methods. We illustrate this point by examining the performance of the C_P statistic through a Monte Carlo simulation study. The inefficiency of the C_P statistic may, however, be rectified by a suitable transformation of the data.

    Release date: 2005-07-21

  • Articles and reports: 75F0002M2005004
    Description:

    The Survey of Labour and Income Dynamics (SLID) is a longitudinal survey initiated in 1993. The survey was designed to measure changes in the economic well-being of Canadians as well as the factors affecting these changes.

    Sample surveys are subject to errors. As with all surveys conducted at Statistics Canada, considerable time and effort are taken to control such errors at every stage of the Survey of Labour and Income Dynamics. Nonetheless, errors do occur. It is the policy at Statistics Canada to furnish users with measures of data quality so that users are able to interpret the data properly. This report summarizes a set of quality measures produced in an attempt to describe the overall quality of SLID data. Among the measures included in the report are sample composition and attrition rates, sampling errors, coverage errors in the form of slippage rates, response rates, tax permission and tax linkage rates, and imputation rates.

    Release date: 2005-05-12

  • Articles and reports: 12-001-X20040027747
    Description:

    The reduced accuracy of the revised classification of unemployed persons in the Current Population Survey (CPS) was documented in Biemer and Bushery (2000). In this paper, we provide additional evidence of this anomaly and attempt to trace the source of the error through extended analysis of the CPS data before and after the redesign. The paper presents a novel approach to decomposing the error in a complex classification process, such as the CPS labor force status classification, using Markov Latent Class Analysis (MLCA). To identify the cause of the apparent reduction in unemployed classification accuracy, we identify the key question components that determine the classifications and estimate the contribution of each of these question components to the total error in the classification process. This work provides guidance for further investigation into the root causes of the errors in the collection of labor force data in the CPS, possibly through cognitive laboratory and/or field experiments.

    Release date: 2005-02-03

  • Articles and reports: 11-522-X20030017702
    Description:

    This paper proposes a procedure to test hypotheses about differences between sample estimates observed under alternative survey methodologies.

    Release date: 2005-01-26

  • Articles and reports: 11-522-X20030017719
    Description:

    This paper covers model determination, choice of priors on model parameters in hierarchical Bayes (HB) estimation, benchmarking to reliable direct large area estimators, use of survey weights in model-based estimation, and other practical issues related to model-based small area estimation.

    Release date: 2005-01-26

  • Articles and reports: 11-522-X20020016716
    Description:

    Missing data are a constant problem in large-scale surveys. Such incompleteness is usually dealt with either by restricting the analysis to the cases with complete records or by imputing, for each missing item, an efficiently estimated value. The deficiencies of these approaches will be discussed in this paper, especially in the context of estimating a large number of quantities. The main part of the paper will describe two examples of analyses using multiple imputation.

    In the first, the International Labour Organization (ILO) employment status is imputed in the British Labour Force Survey by a Bayesian bootstrap method. It is an adaptation of the hot-deck method, which seeks to fully exploit the auxiliary information. Important auxiliary information is given by the previous ILO status, when available, and the standard demographic variables.

    Missing data can be interpreted more generally, as in the framework of the expectation maximization (EM) algorithm. The second example is from the Scottish House Condition Survey, and its focus is on the inconsistency of the surveyors. The surveyors assess the sampled dwelling units on a large number of elements or features of the dwelling, such as internal walls, roof and plumbing, that are scored and converted to a summarizing 'comprehensive repair cost.' The level of inconsistency is estimated from the discrepancies between the pairs of assessments of doubly surveyed dwellings. The principal research questions concern the amount of information that is lost as a result of the inconsistency and whether the naive estimators that ignore the inconsistency are unbiased. The problem is solved by multiple imputation, generating plausible scores for all the dwellings in the survey.
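    A minimal sketch of the hot-deck flavour of multiple imputation described in the first example, assuming a single auxiliary variable (previous ILO status) defines the donor classes. Field names and records are invented, and the Bayesian bootstrap step (resampling the donor pool itself) is omitted for brevity.

```python
import random

def hot_deck_impute(records, key, aux, m=5, seed=1):
    """Hot-deck multiple imputation sketch: each record missing `key`
    draws a donor value from records sharing the same value of the
    auxiliary variable `aux`; repeated m times to give m completed
    datasets whose between-dataset spread reflects imputation uncertainty."""
    rng = random.Random(seed)
    donors = {}
    for r in records:
        if r[key] is not None:
            donors.setdefault(r[aux], []).append(r[key])
    completed = []
    for _ in range(m):
        dataset = []
        for r in records:
            value = r[key] if r[key] is not None else rng.choice(donors[r[aux]])
            dataset.append(dict(r, **{key: value}))   # originals untouched
        completed.append(dataset)
    return completed

records = [
    {"id": 1, "ilo_status": "employed", "prev_status": "employed"},
    {"id": 2, "ilo_status": "unemployed", "prev_status": "unemployed"},
    {"id": 3, "ilo_status": None, "prev_status": "employed"},
]
datasets = hot_deck_impute(records, "ilo_status", "prev_status")
```

    Analyses are then run on each completed dataset and combined, so the variability across the m imputations feeds into the final variance estimate.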

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016737
    Description:

    If the dataset available to machine learning results from cluster sampling (e.g., patients from a sample of hospital wards), the usual cross-validation error rate estimate can lead to biased and misleading results. In this technical paper, an adapted cross-validation is described for this case. Using a simulation, the sampling distribution of the generalization error rate estimate, under a cluster or simple random sampling hypothesis, is compared with the true value. The results highlight the impact of the sampling design on inference: clearly, clustering has a significant impact; the split between learning set and test set should result from a random partition of the clusters, not from a random partition of the examples. With cluster sampling, standard cross-validation underestimates the generalization error rate, and is deficient for model selection. These results are illustrated with a real application of automatic identification of spoken language.
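    The remedy pointed to above, partitioning clusters rather than examples, can be sketched as follows; the ward labels are invented.

```python
import random

def cluster_folds(cluster_ids, k=5, seed=0):
    """Cluster-aware cross-validation sketch: randomly partition the
    clusters (e.g., hospital wards) into k folds, so records from one
    cluster never straddle the learning and test sets. Returns, for each
    fold, the indices of the held-out test examples."""
    clusters = sorted(set(cluster_ids))
    rng = random.Random(seed)
    rng.shuffle(clusters)
    folds = []
    for f in range(k):
        held_out = set(clusters[f::k])   # every k-th cluster after shuffling
        folds.append([i for i, c in enumerate(cluster_ids) if c in held_out])
    return folds

# 12 examples drawn from 4 clusters (wards)
cluster_ids = ["w1", "w1", "w1", "w2", "w2", "w3",
               "w3", "w3", "w3", "w4", "w4", "w4"]
folds = cluster_folds(cluster_ids, k=4)
```

    Because whole clusters are held out together, the test error mimics prediction on unseen clusters, which is what standard example-level cross-validation fails to capture under cluster sampling.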

    Release date: 2004-09-13
Reference (10) (10 results)

  • Notices and consultations: 12-002-X
    Description:

    The Research Data Centres (RDCs) Information and Technical Bulletin (ITB) is a forum by which Statistics Canada analysts and the research community can inform each other on survey data uses and methodological techniques. Articles in the ITB focus on data analysis and modelling, data management, and best or ineffective statistical, computational, and scientific practices. Further, ITB topics will include essays on data content, implications of questionnaire wording, comparisons of datasets, reviews on methodologies and their application, data peculiarities, problematic data and solutions, and explanations of innovative tools using RDC surveys and relevant software. All of these essays may provide advice and detailed examples outlining commands, habits, tricks and strategies used to make problem-solving easier for the RDC user.

    The main aims of the ITB are:

    - the advancement and dissemination of knowledge surrounding Statistics Canada's data;
    - the exchange of ideas among the RDC-user community;
    - the support of new users;
    - the co-operation with subject matter experts and divisions within Statistics Canada.

    The ITB is interested in quality articles that are worth publicizing throughout the research community, and that will add value to the quality of research produced at Statistics Canada's RDCs.

    Release date: 2015-03-25

  • Surveys and statistical programs – Documentation: 92-567-X
    Description:

    The Coverage Technical Report will present the error included in census data that results from persons missed by the 2006 Census or persons enumerated in error. Population coverage errors are one of the most important types of error because they affect not only the accuracy of population counts but also the accuracy of all of the census data describing characteristics of the population universe.

    Release date: 2010-03-25

  • Surveys and statistical programs – Documentation: 13F0026M2007001
    Description:

    This guide will be of assistance when using the public use microdata file (PUMF) of the Survey of Financial Security (SFS) conducted by the Pensions and Wealth Surveys Section of the Income Statistics Division.

    Release date: 2007-09-04

  • Surveys and statistical programs – Documentation: 62F0026M2005006
    Description:

    This report describes the quality indicators produced for the 2003 Survey of Household Spending. These quality indicators, such as coefficients of variation, nonresponse rates, slippage rates and imputation rates, help users interpret the survey data.

    Release date: 2005-10-06

  • Surveys and statistical programs – Documentation: 62F0026M2004001
    Description:

    This report describes the quality indicators produced for the 2002 Survey of Household Spending. These quality indicators, such as coefficients of variation, nonresponse rates, slippage rates and imputation rates, help users interpret the survey data.

    Release date: 2004-09-15

  • Surveys and statistical programs – Documentation: 92-391-X
    Description:

    This report contains basic conceptual and data quality information intended to facilitate the use and interpretation of census industry data. It provides an overview of the industry processing cycle, including elements such as regional processing, edit and imputation, and the tabulation of error rates. A detailed explanation of the automated coding systems used in the 2001 Census is also documented, in addition to notable changes in the imputation procedures. The report concludes with summary tables that indicate the level of data quality in the 2001 Census industry data. Appendices to the report contain historical data going back to the 1971 Census.

    Release date: 2004-06-02

  • Surveys and statistical programs – Documentation: 62F0026M2003001
    Description:

    This report describes the quality indicators produced for the 2001 Survey of Household Spending. It covers the usual quality indicators that help users interpret the data, such as coefficients of variation, nonresponse rates, slippage rates and imputation rates.

    Release date: 2003-11-26

  • Surveys and statistical programs – Documentation: 82-003-X20010036099
    Description:

    Cycle 1.1 of the Canadian Community Health Survey (CCHS) will provide information for 136 health regions. A brief overview of the CCHS design, sampling strategy, interviewing procedures, data collection and processing is presented.

    Release date: 2002-03-13

  • Surveys and statistical programs – Documentation: 62F0026M2001004
    Geography: Province or territory
    Description:

    This guide presents information of interest to users of data from the Survey of Household Spending. Data are collected via personal interview conducted in January, February and March after the reference year using a paper questionnaire. Information is gathered about the spending habits, dwelling characteristics and household equipment of Canadian households during the reference year. The survey covers private households in the ten provinces. (The three territories are surveyed every second year starting in 2001.)

    This guide includes definitions of survey terms and variables, as well as descriptions of survey methodology and data quality. There is also a section describing the various statistics that can be created using expenditure data (e.g., budget share, market share, and aggregates).

    Release date: 2001-12-12

  • Surveys and statistical programs – Documentation: 11-522-X19980015036
    Description:

    Multivariate logistic regression, introduced by Glonek and McCullagh (1995) as a generalisation of logistic regression, is useful in the analysis of longitudinal data as it allows for dependent repeated observations of a categorical variable and for incomplete response profiles. We show how the method can be extended to deal with data from complex surveys and we illustrate it on data from the Swiss Labour Force Survey. The effect of the sampling weights on the parameter estimates and their standard errors is considered.

    Release date: 1999-10-22