Results

All (74) (30 to 40 of 74 results)

  • Articles and reports: 11-522-X20020016739
    Description:

    The Labour Force Survey (LFS) was not designed to be a longitudinal survey. However, given that respondent households typically remain in the sample for six consecutive months, it is possible to reconstruct six-month fragments of longitudinal data from the monthly records of household members. Such longitudinal data (altogether consisting of millions of person-months of individual- and family-level data) are useful for analyses of monthly labour market dynamics over relatively long periods of time, 20 years and more.

    We make use of these data to estimate hazard functions describing transitions among the labour market states: self-employed, paid employee and not employed. Data on job tenure for the employed, and data on the date last worked for the not employed - together with the date of survey responses - permit the estimated models to include terms reflecting seasonality and macro-economic cycles, as well as the duration dependence of each type of transition. In addition, the LFS data permit spouse labour market activity and family composition variables to be included in the hazard models as time-varying covariates. The estimated hazard equations have been included in the LifePaths socio-economic microsimulation model. In this setting, the equations may be used to simulate lifetime employment activity from past, present and future birth cohorts. Cross-sectional simulation results have been used to validate these models by comparisons with census data from the period 1971 to 1996.
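
    To make the hazard-estimation idea concrete, here is a minimal sketch (not the LifePaths implementation) that computes an empirical monthly exit hazard from paid employment by job-tenure band, using simulated person-month records shaped like six-month LFS fragments; all names and parameters are illustrative.

    import numpy as np

    rng = np.random.default_rng(1)

    def true_hazard(tenure):
        # exit hazard declines with job tenure (negative duration dependence)
        return 0.04 + 0.03 * np.exp(-tenure / 6.0)

    # Simulate person-month records: each person observed for six consecutive months.
    n_people, n_months = 5_000, 6
    rows = []  # (person, panel month, tenure in months, left employment this month)
    for pid in range(n_people):
        tenure = rng.integers(1, 60)          # job tenure at entry to the panel
        for m in range(n_months):
            left = rng.random() < true_hazard(tenure)
            rows.append((pid, m, tenure, int(left)))
            if left:
                break                          # transition observed; fragment ends
            tenure += 1

    data = np.array(rows)
    tenure_col, event_col = data[:, 2], data[:, 3]

    # Empirical hazard by tenure band: transitions / person-months at risk.
    bands = [(1, 6), (7, 12), (13, 24), (25, 60)]
    for lo, hi in bands:
        at_risk = (tenure_col >= lo) & (tenure_col <= hi)
        hazard = event_col[at_risk].mean()
        print(f"tenure {lo:>2}-{hi:<2} months: monthly exit hazard = {hazard:.3f}")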

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016741
    Description:

    Linearization and the jack-knife method are widely used to estimate standard errors for the coefficients of linear regression models fit to multi-stage samples. With few primary sampling units (PSUs) or when a few PSUs have high leverage, linearization estimators can have large negative bias, while the jack-knife method has a correspondingly large positive bias. We characterize the design factors that produce large biases in these standard error estimators. In this technical paper, we propose an alternative estimator, bias reduced linearization (BRL), based on residuals adjusted to better approximate the covariance of the true errors.

    When errors are independently and identically distributed (iid), the BRL estimator is unbiased. The BRL method applies to stratified samples with non-constant selection weights and to generalized linear models such as logistic regression. We also discuss BRL standard error estimators for generalized estimating equation models that explicitly model the dependence among observations from the same PSU in data from complex sample designs. Simulation study results show that BRL standard errors, combined with the Satterthwaite approximation to determine the reference distribution, yield tests with Type I error rates near nominal values. We contrast our method with alternatives proposed by Kott (1994 and 1996) and Mancl and DeRouen (2001).

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016742
    Geography: Canada
    Description:

    One of the most important advances brought about by life course and event history studies is the use of parallel or interdependent processes as explanatory factors in transition rate models. The purpose of this paper is to demonstrate a causal approach to the study of interrelated family events. Various types of interdependent processes are described first, followed by two event history perspectives: the 'system' and 'causal' approaches. The authors assert that the causal approach is more appropriate from an analytical point of view as it provides a straightforward solution to simultaneity, cause-effect lags, and temporal shapes of effects. Based on comparative cross-national applications in West and East Germany, Canada, Latvia and the Netherlands, the usefulness of the causal approach is demonstrated by analysing two highly interdependent family processes: entry into marriage (for individuals who are in a consensual union) as the dependent process, and first pregnancy/childbirth as the explanatory one. Both statistical and theoretical explanations are explored, emphasizing the need for conceptual reasoning.

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016743
    Description:

    There is much interest in using data from longitudinal surveys to help understand life history processes such as education, employment, fertility, health and marriage. The analysis of data on the durations of spells or sojourns that individuals spend in certain states (e.g., employment, marriage) is a primary tool in studying such processes. This paper examines methods for analysing duration data that address important features associated with longitudinal surveys: the use of complex survey designs in heterogeneous populations; missing or inaccurate information about the timing of events; and the possibility of non-ignorable dropout or censoring mechanisms. Parametric and non-parametric techniques for estimation and for model checking are considered. Both new and existing methodologies are proposed and applied to duration data from Canada's Survey of Labour and Income Dynamics (SLID).
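
    As a small illustration of design-weighted duration analysis (not code from the paper), the sketch below computes a survey-weighted Kaplan-Meier survivor curve for right-censored spell durations; the data and weights are simulated and the estimator is a standard weighted product-limit form.

    import numpy as np

    def weighted_km(time, event, weight):
        # time   : spell length (e.g., months in a state)
        # event  : 1 if the spell ended, 0 if right-censored (e.g., attrition)
        # weight : survey weight for each respondent
        order = np.argsort(time)
        time, event, weight = time[order], event[order], weight[order]
        surv, t_out, s_out = 1.0, [], []
        for t in np.unique(time[event == 1]):
            at_risk = weight[time >= t].sum()               # weighted risk set at t
            failed = weight[(time == t) & (event == 1)].sum()
            surv *= 1.0 - failed / at_risk
            t_out.append(t)
            s_out.append(surv)
        return np.array(t_out), np.array(s_out)

    rng = np.random.default_rng(0)
    n = 2_000
    durations = rng.exponential(scale=8.0, size=n)          # true spell lengths
    censor = rng.exponential(scale=20.0, size=n)            # censoring times
    time = np.minimum(durations, censor)
    event = (durations <= censor).astype(int)
    w = rng.uniform(0.5, 3.0, size=n)                       # hypothetical design weights

    t, s = weighted_km(time, event, w)
    print("weighted survivor function at t = 10:", round(float(np.interp(10.0, t, s)), 3))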

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016744
    Description:

    A developmental trajectory describes the course of a behaviour over age or time. This technical paper provides an overview of a semi-parametric, group-based method for analysing developmental trajectories. This methodology provides an alternative to assuming a homogeneous population of trajectories as is done in standard growth modelling.

    Four capabilities are described: (1) the capability to identify, rather than assume, distinctive groups of trajectories; (2) the capability to estimate the proportion of the population following each such trajectory group; (3) the capability to relate group membership probability to individual characteristics and circumstances; and (4) the capability to use the group membership probabilities for various other purposes, such as creating profiles of group members.

    In addition, two important extensions of the method are described: the capability to add time-varying covariates to trajectory models and the capability to estimate joint trajectory models of distinct but related behaviours. The former provides the statistical capacity for testing if a contemporary factor, such as an experimental intervention or a non-experimental event like pregnancy, deflects a pre-existing trajectory. The latter provides the capability to study the unfolding of distinct but related behaviours such as problematic childhood behaviour and adolescent drug abuse.
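
    A minimal sketch of the group-based idea follows: a two-group mixture of linear age trajectories with normal errors, fit by EM. This is a simplified stand-in for the semi-parametric method described above; the number of groups, the polynomial order and the simulated data are all illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(42)
    n_subjects, ages = 400, np.arange(6, 16)                # yearly measurements, ages 6 to 15
    X = np.column_stack([np.ones_like(ages, dtype=float), ages])  # design: intercept, age

    # Simulate: 60% "low stable" group, 40% "rising" group.
    rising = rng.random(n_subjects) < 0.4
    betas_true = np.where(rising[:, None], [0.5, 0.45], [2.0, 0.02])
    Y = betas_true[:, :1] + betas_true[:, 1:] * ages + rng.normal(0, 0.7, (n_subjects, len(ages)))

    # EM for K = 2 groups: group shares pi, trajectory coefficients beta_k, common variance.
    K = 2
    pi = np.full(K, 1.0 / K)
    beta = np.array([[0.0, 0.1], [3.0, 0.0]])               # crude starting values
    sigma2 = 1.0

    for _ in range(200):
        # E-step: posterior probability that each subject belongs to each group.
        logjoint = np.empty((n_subjects, K))
        for k in range(K):
            resid = Y - X @ beta[k]                         # (n_subjects, n_ages)
            logjoint[:, k] = (np.log(pi[k])
                              - 0.5 * (resid ** 2).sum(axis=1) / sigma2
                              - 0.5 * len(ages) * np.log(2 * np.pi * sigma2))
        logjoint -= logjoint.max(axis=1, keepdims=True)     # stabilise before exponentiating
        post = np.exp(logjoint)
        post /= post.sum(axis=1, keepdims=True)

        # M-step: weighted least squares per group, then update pi and sigma2.
        sse = 0.0
        Xl, yl = np.tile(X, (n_subjects, 1)), Y.ravel()
        for k in range(K):
            w = np.repeat(post[:, k], len(ages))            # person weights applied to each row
            WX = Xl * w[:, None]
            beta[k] = np.linalg.solve(Xl.T @ WX, WX.T @ yl)
            sse += (w * (yl - Xl @ beta[k]) ** 2).sum()
        pi = post.mean(axis=0)
        sigma2 = sse / (n_subjects * len(ages))

    print("estimated group shares:", np.round(pi, 2))
    print("estimated trajectories (intercept, slope per year):")
    print(np.round(beta, 2))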

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016745
    Description:

    The attractiveness of the Regression Discontinuity Design (RDD) rests on its close similarity to a normal experimental design. On the other hand, it is of limited applicability since it is not often the case that units are assigned to the treatment group on the basis of an observable (to the analyst) pre-program measure. Moreover, it allows identification of the mean impact only for a very specific subpopulation. In this technical paper, we show that the RDD straightforwardly generalizes to instances in which units' eligibility is established by an observable pre-program measure, with eligible units allowed to self-select freely into the program. This set-up also proves to be very convenient for building a specification test on conventional non-experimental estimators of the program mean impact. The data requirements are clearly described.

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016746
    Description:

    In 1961, the European Commission launched a harmonized program of qualitative surveys of consumers and business managers (industry, services, construction, retail trade, investment) that today covers more than 40 countries. These qualitative surveys are intended to capture the economic situation of the surveyed companies. Results are available a few days after the end of the reference period, well before the results of the quantitative surveys.

    Although qualitative, these surveys have quickly become an essential tool for cyclical diagnosis and short-term economic forecasting. This product shows how these surveys are used by the European Commission, in particular by the Directorate-General for Economic and Financial Affairs (DG ECFIN) and the Statistical Office of the European Communities (EUROSTAT), to evaluate the economic situation of the Euro zone.

    The first part of this product briefly presents the harmonized European business and consumer survey program. In the second part, we look at how DG ECFIN calculates a coincident indicator of economic activity using a dynamic factor analysis of the questions from the industry survey. This type of indicator also makes it possible to study the convergence of the economic cycles of the member states. The quantitative short-term indicators for the Euro zone are often criticized for the delay with which they are published. In the third part, we look at how EUROSTAT plans to publish flash estimates of the industrial producer price index (IPPI) based on econometric models that integrate the business survey series. Lastly, we show how these surveys can be used to forecast gross domestic product (GDP) and to define proxies for key indicators that are not yet available (new orders in industry, etc.).
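
    As a rough illustration of the indicator construction (not the DG ECFIN methodology), the sketch below extracts a coincident indicator as the first principal component of standardized survey balance series; this is a static simplification of the dynamic factor analysis mentioned above, applied to simulated series.

    import numpy as np

    rng = np.random.default_rng(7)
    n_months, n_questions = 240, 5

    # Simulate monthly balances (% positive - % negative answers) that share a
    # common cycle plus question-specific noise.
    cycle = np.sin(np.arange(n_months) * 2 * np.pi / 60)          # roughly a 5-year cycle
    loadings = rng.uniform(5, 15, size=n_questions)
    balances = cycle[:, None] * loadings + rng.normal(0, 4, (n_months, n_questions))

    # Standardize each series, then take the first principal component via SVD.
    Z = (balances - balances.mean(axis=0)) / balances.std(axis=0)
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    indicator = Z @ Vt[0]                                         # first principal component scores
    indicator *= np.sign(np.corrcoef(indicator, cycle)[0, 1])     # fix the sign

    print("correlation of indicator with the underlying cycle:",
          round(float(np.corrcoef(indicator, cycle)[0, 1]), 3))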

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016748
    Description:

    Practitioners often use data collected from complex surveys (such as labour force and health surveys involving stratified cluster sampling) to fit logistic regression and other models of interest. A great deal of effort over the last two decades has been spent on developing methods to analyse survey data that take account of design features. This paper looks at an alternative method known as inverse sampling.

    Specialized programs, such as SUDAAN and WESVAR, are also available to implement some of the methods developed to take into account the design features. However, these methods require additional information, such as survey weights, design effects or cluster identification in the microdata; when such information is not available, another method is needed.

    Inverse sampling (Hinkins et al., Survey Methodology, 1997) provides an alternative approach by undoing the complex data structures so that standard methods can be applied. Repeated subsamples with a simple random structure are drawn; each subsample is analysed by standard methods, and the results are combined to increase efficiency. Although computer-intensive, this method has the potential to preserve the confidentiality of microdata files. A drawback of the method is that it can lead to biased estimates of regression parameters when the subsample sizes are small (as in the case of stratified cluster sampling).

    In this paper, we propose using the estimating equation approach that combines the subsamples before estimation and thus leads to nearly unbiased estimates of regression parameters regardless of subsample sizes. This method is computationally less intensive than the original method. We apply the method to cluster-correlated data generated from a nested error linear regression model to illustrate its advantages. A real dataset from a Statistics Canada survey will also be analysed using the estimating equation method.
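
    The following sketch illustrates the inverse-sampling idea on hypothetical clustered data: draw repeated subsamples of one unit per cluster so each subsample has a simple random structure, then either average the per-subsample estimates or pool the subsamples before estimating, in the spirit of the estimating-equation approach. It illustrates the principle only, not the authors' method.

    import numpy as np

    rng = np.random.default_rng(3)
    n_clusters, cluster_size, p_true = 40, 25, 0.3

    # Cluster-correlated binary outcomes: cluster effects shift the probability.
    cluster_effect = rng.normal(0, 0.08, n_clusters)
    probs = np.clip(p_true + cluster_effect, 0.01, 0.99)
    y = rng.random((n_clusters, cluster_size)) < probs[:, None]

    B = 500
    per_subsample_logits, pooled_y = [], []
    for _ in range(B):
        # One randomly chosen unit per cluster: no within-cluster dependence remains.
        idx = rng.integers(0, cluster_size, size=n_clusters)
        sub = y[np.arange(n_clusters), idx]
        p_hat = sub.mean()
        per_subsample_logits.append(np.log(p_hat / (1 - p_hat)))  # nonlinear quantity
        pooled_y.append(sub)

    naive = np.mean(per_subsample_logits)           # average the B subsample estimates
    pooled_p = np.concatenate(pooled_y).mean()      # combine subsamples, then estimate
    pooled = np.log(pooled_p / (1 - pooled_p))
    print("true log-odds       :", round(float(np.log(p_true / (1 - p_true))), 3))
    print("average of estimates:", round(float(naive), 3))
    print("pooled estimate     :", round(float(pooled), 3))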

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016749
    Description:

    Survey sampling is a statistical domain that has been slow to take advantage of flexible regression methods. In this technical paper, two approaches are discussed that could be used to make these regression methods accessible: adapt the techniques to the complex survey design that has been used or sample the survey data so that the standard techniques are applicable.

    In following the former route, we introduce techniques that account for the complex survey structure of the data for scatterplot smoothing and additive models. The use of penalized least squares in the sampling context is studied as a tool for the analysis of a general trend in a finite population. We focus on smooth regression with a normal error model. Ties in covariates abound in large-scale surveys, so scatterplot smoothers are in effect applied to the mean response at each distinct covariate value. The estimation of smooths (for example, smoothing splines) depends on the sampling design only via the sampling weights, meaning that standard software can be used for estimation. Inference for these curves is more challenging, as a result of correlations induced by the sampling design. We propose and illustrate tests that account for the sampling design. Illustrative examples are given using the Ontario Health Survey, including scatterplot smoothing, additive models and model diagnostics. In an attempt to resolve the problem by appropriate sampling of the survey data file, we discuss some of the hurdles that are faced when using this approach.
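
    A minimal sketch of weight-dependent smoothing, under the setting above: a penalized regression spline fit by weighted least squares, with the sampling weights entering only through the weighting matrix. The basis, penalty and data are illustrative choices, not those of the paper.

    import numpy as np

    rng = np.random.default_rng(5)
    n = 1_000
    x = np.sort(rng.uniform(0, 10, n))
    y = np.sin(x) + 0.1 * x + rng.normal(0, 0.4, n)
    w = rng.uniform(0.5, 4.0, n)                      # hypothetical survey weights

    # Truncated-line spline basis: intercept, x, and (x - knot)_+ terms.
    knots = np.linspace(0.5, 9.5, 20)
    B = np.column_stack([np.ones(n), x] + [np.maximum(x - k, 0.0) for k in knots])

    # Ridge penalty on the knot coefficients only, not on the linear part.
    lam = 1.0
    D = np.diag([0.0, 0.0] + [1.0] * len(knots))

    # Weighted penalized least squares: (B'WB + lam*D) beta = B'Wy
    BW = B * w[:, None]
    beta = np.linalg.solve(B.T @ BW + lam * D, BW.T @ y)
    fitted = B @ beta

    print("weighted residual SD:",
          round(float(np.sqrt(np.average((y - fitted) ** 2, weights=w))), 3))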

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016750
    Description:

    Analyses of data from social and economic surveys sometimes use generalized variance function models to approximate the design variance of point estimators of population means and proportions. Analysts may use the resulting standard error estimates to compute associated confidence intervals or test statistics for the means and proportions of interest. In comparison with design-based variance estimators computed directly from survey microdata, generalized variance function models have several potential advantages, as will be discussed in this paper, including operational simplicity; increased stability of standard errors; and, for cases involving public-use datasets, reduction of disclosure limitation problems arising from the public release of stratum and cluster indicators.

    These potential advantages, however, may be offset in part by several inferential issues. First, the properties of inferential statistics based on generalized variance functions (e.g., confidence interval coverage rates and widths) depend heavily on the relative empirical magnitudes of the components of variability associated, respectively, with:

    (a) the random selection of a subset of items used in estimation of the generalized variance function model;
    (b) the selection of sample units under a complex sample design;
    (c) the lack of fit of the generalized variance function model; and
    (d) the generation of a finite population under a superpopulation model.

    Second, under certain conditions, one may link each of components (a) through (d) with different empirical measures of the predictive adequacy of a generalized variance function model. Consequently, these measures of predictive adequacy can offer us some insight into the extent to which a given generalized variance function model may be appropriate for inferential use in specific applications.

    Some of the proposed diagnostics are applied to data from the US Survey of Doctoral Recipients and the US Current Employment Survey. For the Survey of Doctoral Recipients, components (a), (c) and (d) are of principal concern. For the Current Employment Survey, components (b), (c) and (d) receive principal attention, and the availability of population microdata allows the development of especially detailed models for components (b) and (c).
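
    For concreteness, the sketch below fits one common generalized variance function form, relvariance(X) = a + b/X, to simulated (estimate, direct relvariance) pairs and uses it to approximate a standard error; it is not drawn from either survey named above.

    import numpy as np

    rng = np.random.default_rng(11)

    # Hypothetical direct estimates (totals) and their design-based relvariances.
    totals = np.exp(rng.uniform(np.log(5e3), np.log(5e6), 60))
    a_true, b_true = 2e-4, 900.0
    relvar = a_true + b_true / totals
    relvar *= np.exp(rng.normal(0, 0.25, totals.size))   # noise around the GVF

    # Fit relvar = a + b / X by ordinary least squares.
    Z = np.column_stack([np.ones_like(totals), 1.0 / totals])
    a_hat, b_hat = np.linalg.lstsq(Z, relvar, rcond=None)[0]

    # Use the fitted GVF: SE(X) = X * sqrt(a + b / X).
    x_new = 2.5e5
    se_new = x_new * np.sqrt(a_hat + b_hat / x_new)
    print(f"fitted GVF: relvar = {a_hat:.2e} + {b_hat:.1f} / X")
    print(f"approximate SE for an estimated total of {x_new:,.0f}: {se_new:,.0f}")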

    Release date: 2004-09-13
Stats in brief (0) (0 results)

No content available at this time.

Articles and reports (73) (0 to 10 of 73 results)

  • Articles and reports: 75F0002M2004012
    Description:

    This study compares income estimates across several statistical programs at Statistics Canada. It examines how similar the estimates produced by different question sets are.

    Income data are collected by many household surveys. Some surveys have income as a major part of their content, and therefore collect income at a detailed level; others collect data from a much smaller set of income questions. No standard sets of income questions have been developed.

    Release date: 2004-12-23

  • Articles and reports: 75F0002M2004010
    Description:

    This document offers a set of guidelines for analysing income distributions. It focuses on the basic intuition of the concepts and techniques instead of the equations and technical details.

    Release date: 2004-10-08

  • Articles and reports: 12-002-X20040027032
    Description:

    This article examines why many Statistics Canada surveys supply bootstrap weights with their microdata for the purpose of design-based variance estimation. Bootstrap weights are not supported by commercially available software such as SUDAAN and WesVar, but there are ways to use these applications to produce bootstrap variance estimates.

    The paper concludes with a brief discussion of other design-based approaches to variance estimation as well as software, programs and procedures where these methods have been employed.
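
    A minimal sketch of the bootstrap-weight approach follows: recompute the statistic under each set of bootstrap weights and measure the variability across replicates. The weights below are simulated stand-ins for the precomputed weights a survey would supply, and the combination rule shown is one common choice; consult the documentation of the specific survey file.

    import numpy as np

    rng = np.random.default_rng(2024)
    n, B = 1_500, 500

    y = rng.gamma(shape=2.0, scale=15_000, size=n)        # e.g., an income variable
    w_full = rng.uniform(50, 300, size=n)                 # full-sample survey weights
    # Hypothetical bootstrap weights (real files supply these precomputed).
    boot_w = w_full[:, None] * rng.poisson(1.0, size=(n, B))

    def weighted_mean(values, weights):
        return np.sum(values * weights) / np.sum(weights)

    theta_full = weighted_mean(y, w_full)
    theta_b = np.array([weighted_mean(y, boot_w[:, b]) for b in range(B)])

    var_boot = np.mean((theta_b - theta_full) ** 2)       # one common combination rule
    print(f"estimate: {theta_full:,.0f}   bootstrap SE: {np.sqrt(var_boot):,.0f}")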

    Release date: 2004-10-05

  • Articles and reports: 12-002-X20040027034
    Description:

    The use of command files in Stat/Transfer can expedite the transfer of several data sets in an efficient, replicable manner. This note outlines a simple step-by-step method for creating command files and provides sample code.

    Release date: 2004-10-05

  • Articles and reports: 11-522-X20020016430
    Description:

    Linearization (or Taylor series) methods are widely used to estimate standard errors for the coefficients of linear regression models fit to multi-stage samples. When the number of primary sampling units (PSUs) is large, linearization can produce accurate standard errors under quite general conditions. However, when the number of PSUs is small or a coefficient depends primarily on data from a small number of PSUs, linearization estimators can have large negative bias.

    In this paper, we characterize features of the design matrix that produce large bias in linearization standard errors for linear regression coefficients. We then propose a new method, bias reduced linearization (BRL), based on residuals adjusted to better approximate the covariance of the true errors. When the errors are independent and identically distributed (i.i.d.), the BRL estimator is unbiased for the variance. Furthermore, a simulation study shows that BRL can greatly reduce the bias, even if the errors are not i.i.d. We also propose using a Satterthwaite approximation to determine the degrees of freedom of the reference distribution for tests and confidence intervals about linear combinations of coefficients based on the BRL estimator. We demonstrate that the jackknife estimator also tends to be biased in situations where linearization is biased. However, the jackknife's bias tends to be positive. Our bias-reduced linearization estimator can be viewed as a compromise between the traditional linearization and jackknife estimators.
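
    The sketch below illustrates the flavour of the adjustment on simulated clustered data: an ordinary cluster-robust (linearization) sandwich variance next to a bias-reduced version that rescales each cluster's residuals by (I - H_gg)^(-1/2). The exact BRL estimator in the paper may differ in detail; treat this as an illustration of the idea, not the authors' code.

    import numpy as np

    rng = np.random.default_rng(9)
    G, m = 12, 8                                   # few PSUs, as in the setting above
    n = G * m
    cluster = np.repeat(np.arange(G), m)

    x = rng.normal(size=n) + rng.normal(size=G)[cluster]   # covariate with cluster structure
    u = rng.normal(size=G)[cluster] + rng.normal(size=n)   # cluster-correlated errors
    y = 1.0 + 0.5 * x + u

    X = np.column_stack([np.ones(n), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    e = y - X @ beta

    def sandwich(adjust):
        meat = np.zeros((2, 2))
        for g in range(G):
            idx = cluster == g
            Xg, eg = X[idx], e[idx]
            if adjust:
                # Bias-reducing adjustment: inflate residuals by (I - H_gg)^(-1/2).
                Hgg = Xg @ XtX_inv @ Xg.T
                vals, vecs = np.linalg.eigh(np.eye(idx.sum()) - Hgg)
                vals = np.clip(vals, 1e-10, None)          # guard against round-off
                eg = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T @ eg
            score = Xg.T @ eg
            meat += np.outer(score, score)
        return XtX_inv @ meat @ XtX_inv

    for name, adj in [("linearization (unadjusted)", False), ("bias-reduced (BRL-style)", True)]:
        se = np.sqrt(np.diag(sandwich(adj)))
        print(f"{name:28s} SE(slope) = {se[1]:.3f}")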

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016708
    Description:

    In this paper, we discuss the analysis of complex health survey data by using multivariate modelling techniques. Main interests are in various design-based and model-based methods that aim at accounting for the design complexities, including clustering, stratification and weighting. Methods covered include generalized linear modelling based on pseudo-likelihood and generalized estimating equations, linear mixed models estimated by restricted maximum likelihood, and hierarchical Bayes techniques using Markov Chain Monte Carlo (MCMC) methods. The methods will be compared empirically, using data from an extensive health interview and examination survey conducted in Finland in 2000 (Health 2000 Study).

    The data of the Health 2000 Study were collected using personal interviews, questionnaires and clinical examinations. A stratified two-stage cluster sampling design was used in the survey. The sampling design involved positive intra-cluster correlation for many study variables. For a closer investigation, we selected a small number of study variables from the health interview and health examination phases. In many cases, the different methods produced similar numerical results and supported similar statistical conclusions. Methods that failed to account for the design complexities sometimes led to conflicting conclusions. We also discuss the application of the methods in this paper by using standard statistical software products.

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016712
    Description:

    In this paper, we consider the effect of the interval censoring of cessation time on intensity parameter estimation with regard to smoking cessation and pregnancy. The three waves of the National Population Health Survey allow the methodology of event history analysis to be applied to smoking initiation, cessation and relapse. One issue of interest is the relationship between smoking cessation and pregnancy. If a longitudinal respondent who is a smoker at the first cycle ceases smoking by the second cycle, we know the cessation time to within an interval of length at most a year, since the respondent is asked for the age at which she stopped smoking, and her date of birth is known. We also know whether she is pregnant at the time of the second cycle, and whether she has given birth since the time of the first cycle. For many such subjects, we know the date of conception to within a relatively small interval. If we knew the time of smoking cessation and pregnancy period exactly for each member who experienced one or other of these events between cycles, we could model their temporal relationship through their joint intensities.
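
    To show how interval censoring enters a likelihood, here is a minimal sketch under an assumed exponential model: each cessation time is known only to fall in an interval (L, R], so its likelihood contribution is S(L) - S(R). The simulated data and the single-parameter model are illustrative; the joint intensity models discussed above are richer.

    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(6)
    n, true_rate = 800, 0.15                       # events per month, say

    t = rng.exponential(1.0 / true_rate, n)        # true (unobserved) cessation times
    # Observation scheme: we only learn which 12-month window the event fell into.
    L = np.floor(t / 12.0) * 12.0
    R = L + 12.0

    def neg_loglik(rate):
        # P(L < T <= R) = exp(-rate*L) - exp(-rate*R) under an exponential model.
        return -np.sum(np.log(np.exp(-rate * L) - np.exp(-rate * R)))

    res = minimize_scalar(neg_loglik, bounds=(1e-4, 2.0), method="bounded")
    print("true rate:", true_rate, " estimated rate:", round(res.x, 3))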

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016714
    Description:

    In this highly technical paper, we illustrate the application of the delete-a-group jack-knife variance estimator approach to a particular complex multi-wave longitudinal study, demonstrating its utility for linear regression and other analytic models. The delete-a-group jack-knife variance estimator is proving a very useful tool for measuring variances under complex sampling designs. This technique divides the first-phase sample into mutually exclusive and nearly equal variance groups, deletes one group at a time to create a set of replicates and makes analogous weighting adjustments in each replicate to those done for the sample as a whole. Variance estimation proceeds in the standard (unstratified) jack-knife fashion.

    Our application is to the Chicago Health and Aging Project (CHAP), a community-based longitudinal study examining risk factors for chronic health problems of older adults. A major aim of the study is the investigation of risk factors for incident Alzheimer's disease. The current design of CHAP has two components: (1) Every three years, all surviving members of the cohort are interviewed on a variety of health-related topics. These interviews include cognitive and physical function measures. (2) At each of these waves of data collection, a stratified Poisson sample is drawn from among the respondents to the full population interview for detailed clinical evaluation and neuropsychological testing. To investigate risk factors for incident disease, a 'disease-free' cohort is identified at the preceding time point and forms one major stratum in the sampling frame.

    We provide proofs of the theoretical applicability of the delete-a-group jack-knife for particular estimators under this Poisson design, paying needed attention to the distinction between finite-population and infinite-population (model) inference. In addition, we examine the issue of determining the 'right number' of variance groups.
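
    A minimal sketch of the delete-a-group mechanics described in the first paragraph (simulated data and a weighted mean stand in for the CHAP analyses): form G nearly equal random groups, delete one at a time, reweight the remaining records by G/(G-1), and take the variability of the replicate estimates as the variance.

    import numpy as np

    rng = np.random.default_rng(8)
    n, G = 3_000, 30

    y = rng.normal(50, 12, n)                      # outcome of interest
    w = rng.uniform(1, 5, n)                       # survey weights
    groups = rng.permutation(n) % G                # G mutually exclusive random groups

    def weighted_mean(values, weights):
        return np.sum(values * weights) / np.sum(weights)

    theta_full = weighted_mean(y, w)
    replicates = []
    for g in range(G):
        keep = groups != g
        w_rep = w[keep] * G / (G - 1)              # reweight so weighted totals stay comparable
        replicates.append(weighted_mean(y[keep], w_rep))
    replicates = np.array(replicates)

    # Delete-a-group jack-knife variance around the full-sample estimate.
    var_dag = (G - 1) / G * np.sum((replicates - theta_full) ** 2)
    print(f"estimate: {theta_full:.2f}   delete-a-group jack-knife SE: {np.sqrt(var_dag):.3f}")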

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016715
    Description:

    This paper will describe the multiple imputation of income in the National Health Interview Survey and discuss the methodological issues involved. In addition, the paper will present empirical summaries of the imputations as well as results of a Monte Carlo evaluation of inferences based on multiply imputed income items.

    Analysts of health data are often interested in studying relationships between income and health. The National Health Interview Survey, conducted by the National Center for Health Statistics of the U.S. Centers for Disease Control and Prevention, provides a rich source of data for studying such relationships. However, the nonresponse rates on two key income items, an individual's earned income and a family's total income, are over 20%. Moreover, these nonresponse rates appear to be increasing over time. A project is currently underway to multiply impute individual earnings and family income along with some other covariates for the National Health Interview Survey in 1997 and subsequent years.

    There are many challenges in developing appropriate multiple imputations for such large-scale surveys. First, there are many variables of different types, with different skip patterns and logical relationships. Second, it is not known what types of associations will be investigated by the analysts of multiply imputed data. Finally, some variables, such as family income, are collected at the family level and others, such as earned income, are collected at the individual level. To make the imputations for both the family- and individual-level variables conditional on as many predictors as possible, and to simplify modelling, we are using a modified version of the sequential regression imputation method described in Raghunathan et al. (Survey Methodology, 2001).

    Besides issues related to the hierarchical nature of the imputations just described, there are other methodological issues of interest such as the use of transformations of the income variables, the imposition of restrictions on the values of variables, the general validity of sequential regression imputation and, even more generally, the validity of multiple-imputation inferences for surveys with complex sample designs.

    Release date: 2004-09-13

  • Articles and reports: 11-522-X20020016716
    Description:

    Missing data are a constant problem in large-scale surveys. Such incompleteness is usually dealt with either by restricting the analysis to the cases with complete records or by imputing, for each missing item, an efficiently estimated value. The deficiencies of these approaches will be discussed in this paper, especially in the context of estimating a large number of quantities. The main part of the paper will describe two examples of analyses using multiple imputation.

    In the first, the International Labour Organization (ILO) employment status is imputed in the British Labour Force Survey by a Bayesian bootstrap method. It is an adaptation of the hot-deck method, which seeks to fully exploit the auxiliary information. Important auxiliary information is given by the previous ILO status, when available, and the standard demographic variables.

    Missing data can be interpreted more generally, as in the framework of the expectation maximization (EM) algorithm. The second example is from the Scottish House Condition Survey, and its focus is on the inconsistency of the surveyors. The surveyors assess the sampled dwelling units on a large number of elements or features of the dwelling, such as internal walls, roof and plumbing, that are scored and converted to a summarizing 'comprehensive repair cost.' The level of inconsistency is estimated from the discrepancies between the pairs of assessments of doubly surveyed dwellings. The principal research questions concern the amount of information that is lost as a result of the inconsistency and whether the naive estimators that ignore the inconsistency are unbiased. The problem is solved by multiple imputation, generating plausible scores for all the dwellings in the survey.
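
    Once the multiple imputations exist, completed-data analyses are typically combined with Rubin's rules; the sketch below shows that combination step on hypothetical estimates (it is not code from either study described above).

    import numpy as np

    # Hypothetical output from M = 5 completed-data analyses of the same quantity:
    # point estimates q_m and their squared standard errors u_m.
    q = np.array([2.31, 2.45, 2.28, 2.52, 2.40])
    u = np.array([0.040, 0.038, 0.041, 0.039, 0.042])
    M = len(q)

    q_bar = q.mean()                               # combined point estimate
    u_bar = u.mean()                               # within-imputation variance
    b = q.var(ddof=1)                              # between-imputation variance
    t = u_bar + (1 + 1 / M) * b                    # total variance

    # Degrees of freedom for the reference t distribution (Rubin, 1987).
    df = (M - 1) * (1 + u_bar / ((1 + 1 / M) * b)) ** 2

    print(f"combined estimate: {q_bar:.3f}  SE: {np.sqrt(t):.3f}  df: {df:.1f}")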

    Release date: 2004-09-13
Journals and periodicals (1) (1 result)

  • Journals and periodicals: 92-395-X
    Description:

    This report describes sampling and weighting procedures used in the 2001 Census. It reviews the history of these procedures in Canadian censuses, provides operational and theoretical justifications for them, and presents the results of the evaluation studies of these procedures.

    Release date: 2004-12-15