Results

All (71) (0 to 10 of 71 results)

  • Surveys and statistical programs – Documentation: 11-533-X
    Description:

    This guide has been created especially for users needing a step-by-step review on how to find, read and use data, with quick tips on locating information on the Statistics Canada website. Originally published in paper format in the 1980s, revised as part of the 1994 Statistics Canada Catalogue, and then transformed into an electronic version, this guide is continually being updated to maintain its currency and usefulness.

    Release date: 2007-11-19

  • Articles and reports: 12-001-X20070019847
    Description:

    We investigate the impact of cluster sampling on standard errors in the analysis of longitudinal survey data. We consider a widely used class of regression models for longitudinal data and a standard class of point estimators of a generalized least squares type. We argue theoretically that the impact of ignoring clustering in standard error estimation will tend to increase with the number of waves in the analysis, under some patterns of clustering which are realistic for many social surveys. The implication is that it is, in general, at least as important to allow for clustering in standard errors for longitudinal analyses as for cross-sectional analyses. We illustrate this theoretical argument with empirical evidence from a regression analysis of longitudinal data on gender role attitudes from the British Household Panel Survey. We also compare two approaches to variance estimation in the analysis of longitudinal survey data: a survey sampling approach based upon linearization and a multilevel modelling approach. We conclude that the impact of clustering can be seriously underestimated if it is simply handled by including an additive random effect to represent the clustering in a multilevel model.

    Release date: 2007-06-28
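
A toy sketch (ours, not the paper's analysis) of the basic phenomenon this abstract describes: when observations share a cluster-level effect, a standard error computed as if units were independent understates the true sampling variability. All names and numbers below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clusters, per_cluster = 50, 20

# cluster-level random effect shared by every unit in a cluster
cluster_effect = rng.normal(0.0, 1.0, n_clusters)
y = (cluster_effect[:, None] + rng.normal(0.0, 1.0, (n_clusters, per_cluster))).ravel()

n = y.size
naive_se = y.std(ddof=1) / np.sqrt(n)          # pretends units are independent

# the between-cluster variance of cluster means respects the design
cluster_means = y.reshape(n_clusters, per_cluster).mean(axis=1)
cluster_se = cluster_means.std(ddof=1) / np.sqrt(n_clusters)

print(naive_se, cluster_se)                    # cluster_se is far larger here
```

With a strong cluster effect, as here, the naive standard error can understate the design-based one severalfold; the paper's point is that this gap tends to widen as more longitudinal waves enter the analysis.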

  • Articles and reports: 12-001-X20070019848
    Description:

    We investigate some modifications of the classical single-spell Cox model in order to handle multiple spells from the same individual when the data are collected in a longitudinal survey based on a complex sample design. One modification is the use of a design-based approach for the estimation of the model coefficients and their variances; in the variance estimation each individual is treated as a cluster of spells, bringing an extra stage of clustering into the survey design. Other modifications to the model allow a flexible specification of the baseline hazard to account for possibly differential dependence of hazard on the order and duration of successive spells, and also allow for differential effects of the covariates on the spells of different orders. These approaches are illustrated using data from the Canadian Survey of Labour and Income Dynamics (SLID).

    Release date: 2007-06-28

  • Articles and reports: 12-001-X20070019849
    Description:

In sample surveys where units have unequal probabilities of inclusion in the sample, associations between the probability of inclusion and the statistic of interest can induce bias. Weights equal to the inverse of the probability of inclusion are often used to counteract this bias. Highly disproportional sample designs have large weights, which can introduce undesirable variability in statistics such as the population mean estimator or population regression estimator. Weight trimming reduces large weights to a fixed cutpoint value and adjusts weights below this value to maintain the untrimmed weight sum, reducing variability at the cost of introducing some bias. Most standard approaches are ad hoc in that they do not use the data to optimize bias-variance tradeoffs. Approaches described in the literature that are data-driven are a little more efficient than fully-weighted estimators. This paper develops Bayesian methods for weight trimming of linear and generalized linear regression estimators in unequal probability-of-inclusion designs. An application to estimate injury risk of children rear-seated in compact extended-cab pickup trucks using the Partners for Child Passenger Safety surveillance survey is considered.

    Release date: 2007-06-28
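
As context for the abstract above, here is a minimal sketch of the standard ad hoc trimming procedure it contrasts with the proposed Bayesian methods: weights above a cutpoint are reduced to it, and the removed mass is redistributed proportionally over the remaining weights so the total weight sum is preserved. The function name and the example weights are ours.

```python
import numpy as np

def trim_weights(w, cutpoint):
    """Trim weights at `cutpoint`, preserving the original weight sum."""
    w = np.asarray(w, dtype=float)
    trimmed = np.minimum(w, cutpoint)
    excess = w.sum() - trimmed.sum()       # mass removed from large weights
    below = trimmed < cutpoint
    # redistribute the excess proportionally over the untrimmed weights
    trimmed[below] += excess * trimmed[below] / trimmed[below].sum()
    return trimmed

w = np.array([1.0, 2.0, 3.0, 4.0, 20.0])
wt = trim_weights(w, cutpoint=10.0)
print(wt, wt.sum())                        # sum equals the original 30.0
```

Note that a single proportional redistribution can push some adjusted weights back above the cutpoint in extreme cases; practical implementations often iterate.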

  • Articles and reports: 12-001-X20070019850
    Description:

    Auxiliary information is often used to improve the precision of survey estimators of finite population means and totals through ratio or linear regression estimation techniques. Resulting estimators have good theoretical and practical properties, including invariance, calibration and design consistency. However, it is not always clear that ratio or linear models are good approximations to the true relationship between the auxiliary variables and the variable of interest in the survey, resulting in efficiency loss when the model is not appropriate. In this article, we explain how regression estimation can be extended to incorporate semiparametric regression models, in both simple and more complicated designs. While maintaining the good theoretical and practical properties of the linear models, semiparametric models are better able to capture complicated relationships between variables. This often results in substantial gains in efficiency. The applicability of the approach for complex designs using multiple types of auxiliary variables will be illustrated by estimating several acidification-related characteristics for a survey of lakes in the Northeastern US.

    Release date: 2007-06-28
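
For readers unfamiliar with the ratio-estimation baseline this abstract generalizes, a minimal sketch under simple random sampling, assuming the population total of the auxiliary variable is known. All data here are simulated and the setup is ours, not the article's.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n = 10_000, 200

# population where y is roughly proportional to the auxiliary variable x
x_pop = rng.uniform(1.0, 10.0, N)
y_pop = 3.0 * x_pop + rng.normal(0.0, 1.0, N)

idx = rng.choice(N, size=n, replace=False)     # simple random sample
x_s, y_s = x_pop[idx], y_pop[idx]

t_x = x_pop.sum()                              # known auxiliary total
ratio_est = y_s.sum() / x_s.sum() * t_x        # ratio estimator of the y total
expansion_est = N * y_s.mean()                 # plain expansion estimator

print(ratio_est, expansion_est, y_pop.sum())
```

When y is nearly proportional to x, the ratio estimator exploits the auxiliary total and is typically much more precise than the expansion estimator; the article extends this idea to semiparametric working models.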

  • Articles and reports: 12-001-X20070019851
    Description:

To model economic depreciation, a database is used that contains information on assets discarded by companies. The acquisition and resale prices are known along with the length of use of these assets. However, the assets for which prices are known are only those that were involved in a transaction. While an asset depreciates on a continuous basis during its service life, the value of the asset is only known when there has been a transaction. This article proposes an ex post weighting to offset the effect of this source of error in building econometric models.

    Release date: 2007-06-28

  • Articles and reports: 12-001-X20070019852
    Description:

    A common class of survey designs involves selecting all people within selected households. Generalized regression estimators can be calculated at either the person or household level. Implementing the estimator at the household level has the convenience of equal estimation weights for people within households. In this article the two approaches are compared theoretically and empirically for the case of simple random sampling of households and selection of all persons in each selected household. We find that the household level approach is theoretically more efficient in large samples and any empirical inefficiency in small samples is limited.

    Release date: 2007-06-28

  • Articles and reports: 12-001-X20070019853
    Description:

    Two-phase sampling is a useful design when the auxiliary variables are unavailable in advance. Variance estimation under this design, however, is complicated particularly when sampling fractions are high. This article addresses a simple bootstrap method for two-phase simple random sampling without replacement at each phase with high sampling fractions. It works for the estimation of distribution functions and quantiles since no rescaling is performed. The method can be extended to stratified two-phase sampling by independently repeating the proposed procedure in different strata. Variance estimation of some conventional estimators, such as the ratio and regression estimators, is studied for illustration. A simulation study is conducted to compare the proposed method with existing variance estimators for estimating distribution functions and quantiles.

    Release date: 2007-06-28
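
The following is only a generic with-replacement bootstrap for a quantile, shown for orientation; it does not implement the article's two-phase, high-sampling-fraction, no-rescaling method. The data are simulated.

```python
import numpy as np

rng = np.random.default_rng(2)
sample = rng.lognormal(mean=0.0, sigma=1.0, size=300)

# resample the observed sample B times and recompute the median each time
B = 1000
boot_medians = np.array([
    np.median(rng.choice(sample, size=sample.size, replace=True))
    for _ in range(B)
])
var_hat = boot_medians.var(ddof=1)   # bootstrap variance of the sample median

print(np.median(sample), var_hat)
```

The article's contribution is that, because its procedure involves no rescaling of resampled values, it remains valid for non-smooth statistics such as distribution functions and quantiles even when the without-replacement sampling fractions are high.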

  • Articles and reports: 12-001-X20070019854
    Description:

    We derive an estimator of the mean squared error (MSE) of the empirical Bayes and composite estimator of the local-area mean in the standard small-area setting. The MSE estimator is a composition of the established estimator based on the conditional expectation of the random deviation associated with the area and a naïve estimator of the design-based MSE. Its performance is assessed by simulations. Variants of this MSE estimator are explored and some extensions outlined.

    Release date: 2007-06-28

  • Articles and reports: 12-001-X20070019855
    Description:

    In surveys under cluster sampling, nonresponse on a variable is often dependent on a cluster level random effect and, hence, is nonignorable. Estimators of the population mean obtained by mean imputation or reweighting under the ignorable nonresponse assumption are then biased. We propose an unbiased estimator of the population mean by imputing or reweighting within each sampled cluster or a group of sampled clusters sharing some common feature. Some simulation results are presented to study the performance of the proposed estimator.

    Release date: 2007-06-28
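
A toy illustration of the idea (our own simulated data, not the paper's estimator): when response depends on a cluster-level effect, imputing the overall respondent mean ignores that dependence, whereas imputing within each sampled cluster respects it.

```python
import numpy as np

rng = np.random.default_rng(3)
n_clusters, per_cluster = 30, 10
effect = rng.normal(0.0, 2.0, n_clusters)
y = effect[:, None] + rng.normal(0.0, 1.0, (n_clusters, per_cluster))

# response probability falls with the cluster effect -> nonignorable nonresponse
respond = rng.random(y.shape) < 1.0 / (1.0 + np.exp(effect[:, None]))
respond[:, 0] = True        # guarantee at least one respondent per cluster

# overall-mean imputation (ignores clustering) vs. within-cluster imputation
overall_imp = np.where(respond, y, y[respond].mean()).mean()
cluster_means = np.array([row[r].mean() for row, r in zip(y, respond)])
within_imp = np.where(respond, y, cluster_means[:, None]).mean()

print(y.mean(), overall_imp, within_imp)
```

In this setup the overall-mean imputation inherits the bias of the respondent mean, while the within-cluster imputation uses only respondents from the same cluster to fill each gap, which is the intuition behind the proposed estimator.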
Data (0) (0 results)

No content available at this time.

Analysis (68) (0 to 10 of 68 results)

  • Articles and reports: 12-001-X20070019856
    Description:

The concept of 'nearest proportional to size sampling designs' introduced by Gabler (1987) is used to obtain an optimal controlled sampling design, ensuring zero selection probabilities for non-preferred samples. Variance estimation for the proposed optimal controlled sampling design using the Yates-Grundy form of the Horvitz-Thompson estimator is discussed. The true sampling variance of the proposed procedure is compared with that of the existing optimal controlled and uncontrolled high entropy selection procedures. The utility of the proposed procedure is demonstrated with the help of examples.

    Release date: 2007-06-28
Reference (3) (3 results)

  • Surveys and statistical programs – Documentation: 12-594-X
    Description:

    This Summary Report provides an overview of the findings of a Quality Assurance Review that was conducted for nine key statistical programs during the period September 2006 to February 2007. The review was commissioned by Statistics Canada's Policy Committee in order to assess the soundness of quality assurance processes for these nine programs and to propose improvements where needed. The Summary Report describes the principal themes that recur frequently throughout these programs, as well as providing guidance for future reviews of this type.

    Release date: 2007-06-20

  • Surveys and statistical programs – Documentation: 11-522-X20050019476
    Description:

This paper shows how, using data published by Statistics Canada and available from member libraries of the CREPUQ, a postal-code linkage approach makes it possible to link the data from the outcomes file to a set of contextual variables. These variables could then contribute, on an exploratory basis, to producing a better index for explaining the varied outcomes of students from schools. In terms of impact, the proposed index could show more effectively the limitations of ranking students and schools when this information is not given sufficient weight.

    Release date: 2007-03-02