Results

All (11)

  • Articles and reports: 12-001-X200900211045
    Description:

    In analysis of sample survey data, degrees-of-freedom quantities are often used to assess the stability of design-based variance estimators. For example, these degrees-of-freedom values are used in construction of confidence intervals based on t distribution approximations; and of related t tests. In addition, a small degrees-of-freedom term provides a qualitative indication of the possible limitations of a given variance estimator in a specific application. Degrees-of-freedom calculations sometimes are based on forms of the Satterthwaite approximation. These Satterthwaite-based calculations depend primarily on the relative magnitudes of stratum-level variances. However, for designs involving a small number of primary units selected per stratum, standard stratum-level variance estimators provide limited information on the true stratum variances. For such cases, customary Satterthwaite-based calculations can be problematic, especially in analyses for subpopulations that are concentrated in a relatively small number of strata. To address this problem, this paper uses estimated within-primary-sample-unit (within PSU) variances to provide auxiliary information regarding the relative magnitudes of the overall stratum-level variances. Analytic results indicate that the resulting degrees-of-freedom estimator will be better than modified Satterthwaite-type estimators provided: (a) the overall stratum-level variances are approximately proportional to the corresponding within-stratum variances; and (b) the variances of the within-PSU variance estimators are relatively small. In addition, this paper develops errors-in-variables methods that can be used to check conditions (a) and (b) empirically. For these model checks, we develop simulation-based reference distributions, which differ substantially from reference distributions based on customary large-sample normal approximations. The proposed methods are applied to four variables from the U.S. Third National Health and Nutrition Examination Survey (NHANES III).

    Release date: 2009-12-23
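
    The abstract above refers to the Satterthwaite approximation for degrees of freedom. As a point of reference only (this is the standard textbook form, not the paper's modified, within-PSU-based estimator): for a stratified variance estimator \hat{V} = \sum_h c_h s_h^2, the approximate degrees of freedom are

        \widehat{\mathrm{df}} \approx \frac{\left( \sum_h c_h s_h^2 \right)^2}{\sum_h (c_h s_h^2)^2 / (n_h - 1)}

    where s_h^2 is the estimated variance in stratum h, c_h is a design constant, and n_h is the number of primary units selected in stratum h. When n_h is small, each s_h^2 carries few degrees of freedom, which is the situation the paper addresses.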

  • Articles and reports: 11-522-X200800010956
    Description:

    The use of Computer Audio-Recorded Interviewing (CARI) as a tool to identify interview falsification is growing quickly in survey research (Biemer, 2000, 2003; Thissen, 2007). At the same time, survey researchers are starting to expand the usefulness of CARI by combining recordings with coding to address data quality (Herget, 2001; Hansen, 2005; McGee, 2007). This paper presents results from a study included as part of the establishment-based National Center for Health Statistics' National Home and Hospice Care Survey (NHHCS), which used CARI behavior coding and CARI-specific paradata to: 1) identify and correct problematic interviewer behavior or question issues early in the data collection period, before either could negatively affect data quality; and 2) identify ways to diminish measurement error in future implementations of the NHHCS. During the first 9 weeks of the 30-week field period, CARI recorded a subset of questions from the NHHCS application for all interviewers. Recordings were linked with the interview application and output and then coded in one of two modes: Code by Interviewer or Code by Question. The Code by Interviewer method provided visibility into problems specific to an interviewer as well as more generalized problems potentially applicable to all interviewers. The Code by Question method yielded data on the understandability of the questions and other response problems. In this mode, coders coded multiple implementations of the same question across multiple interviewers. Using the Code by Question approach, researchers identified issues with three key survey questions in the first few weeks of data collection and provided guidance to interviewers on how to handle those questions as data collection continued. Results from coding the audio recordings (which were linked with the survey application and output) will inform question wording and interviewer training in the next implementation of the NHHCS, and guide future enhancement of CARI and the coding system.

    Release date: 2009-12-03

  • Articles and reports: 11-522-X200800010970
    Description:

    RTI International is currently conducting a longitudinal education study. One component of the study involved collecting transcripts and course catalogs from the high schools that sample members attended. Information from the transcripts and course catalogs also needed to be keyed and coded. This presented a challenge because the transcripts and course catalogs were collected from different types of schools, including public, private and religious schools, from across the nation, and they varied widely in both content and format. The challenge called for a sophisticated system that could be used by multiple users simultaneously. RTI developed such a system: a high-end, multi-user, multitask, user-friendly and low-maintenance application for keying and coding high school transcripts and course catalogs. The system is web-based and has three major functions: transcript and catalog keying and coding; transcript and catalog keying quality control (keyer-coder end); and transcript and catalog coding QC (management end). Given the complex nature of transcript and catalog keying and coding, the system was designed to be flexible: it can transport keyed and coded data throughout the system to reduce keying time, logically guide users through all the pages that a given activity requires, display appropriate information to support keying performance, and track all keying, coding and QC activities. Hundreds of catalogs and thousands of transcripts were successfully keyed, coded, and verified using the system. This paper reports on the system requirements and design, implementation tips, problems faced and their solutions, and lessons learned.

    Release date: 2009-12-03

  • Articles and reports: 11-522-X200800010987
    Description:

    Over the last few years, there has been considerable progress in web data collection. Today, many statistical offices offer a web alternative in many different types of surveys. It is widely believed that web data collection may raise data quality while lowering data collection costs. Experience has shown that, when the web is offered as a second alternative to paper questionnaires, enterprises have been slow to embrace it. On the other hand, experiments have also shown that by promoting web over paper, it is possible to raise web take-up rates. However, there are still few studies on what happens when the contact strategy is changed radically and the web option is the only option given in a complex enterprise survey. In 2008, Statistics Sweden took the step of using a more or less web-only strategy in the survey of industrial production (PRODCOM). The web questionnaire was developed in the generalised tool for web surveys used by Statistics Sweden. The paper presents the web solution and some experiences from the 2008 PRODCOM survey, including process data on response rates and error ratios as well as the results of a cognitive follow-up of the survey. Some important lessons learned are also presented.

    Release date: 2009-12-03

  • Articles and reports: 11-522-X200800010991
    Description:

    In the evaluation of prospective survey designs, statistical agencies generally must consider a large number of design factors that may have a substantial impact on both survey costs and data quality. Assessments of trade-offs between cost and quality are often complicated by limitations on the amount of information available regarding fixed and marginal costs related to: instrument redesign and field testing; the number of primary sample units and sample elements included in the sample; assignment of instrument sections and collection modes to specific sample elements; and (for longitudinal surveys) the number and periodicity of interviews. Similarly, designers often have limited information on the impact of these design factors on data quality.

    This paper extends standard design-optimization approaches to account for uncertainty in the abovementioned components of cost and quality. Special attention is directed toward the level of precision required for cost and quality information to provide useful input into the design process; sensitivity of cost-quality trade-offs to changes in assumptions regarding functional forms; and implications for preliminary work focused on collection of cost and quality information. In addition, the paper considers distinctions between cost and quality components encountered in field testing and production work, respectively; incorporation of production-level cost and quality information into adaptive design work; as well as costs and operational risks arising from the collection of detailed cost and quality data during production work. The proposed methods are motivated by, and applied to, work with partitioned redesign of the interview and diary components of the U.S. Consumer Expenditure Survey.

    Release date: 2009-12-03
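
    As background for the cost-quality trade-off discussed above (illustrative only; the paper's contribution is to treat the cost and quality inputs as uncertain), the classical deterministic version of the problem fixes a cost model and minimizes variance subject to a budget. For stratified sampling with per-unit costs c_h, stratum sizes N_h and stratum standard deviations S_h, the textbook allocation is

        \min_{n_1, \dots, n_H} \sum_h \frac{N_h^2 S_h^2}{n_h}
        \quad \text{s.t.} \quad c_0 + \sum_h c_h n_h \le C,
        \qquad \text{giving} \quad n_h \propto \frac{N_h S_h}{\sqrt{c_h}}.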

  • Articles and reports: 11-522-X200800011014
    Description:

    In many countries, improved quality of economic statistics is one of the most important goals of the 21st century. First and foremost, the quality of the National Accounts is in focus, regarding both annual and quarterly accounts. To achieve this goal, data quality for the largest enterprises is of vital importance. To ensure that the quality of data for the largest enterprises is good, coherence analysis is an important tool. Coherence means that data from different sources fit together and give a consistent view of developments within these enterprises. Carrying out coherence analysis efficiently is normally a work-intensive task, consisting mainly of collecting data from different sources and comparing them in a structured manner. Over the last two years, Statistics Sweden has made great progress in improving its routines for coherence analysis. An IT tool that collects data for the largest enterprises from a large number of sources and presents it in a structured and logical manner has been built, and a systematic approach to analysing data for the National Accounts on a quarterly basis has been developed. The paper describes the work in both these areas and gives an overview of the IT tool and the agreed routines.

    Release date: 2009-12-03

  • Stats in brief: 88-001-X200900711026
    Description:

    The information in this document is intended primarily to be used by scientific and technological (S&T) policy makers, both federal and provincial, largely as a basis for inter-provincial and inter-sectoral comparisons. The statistics are aggregates of the provincial government and provincial research organization science surveys conducted by Statistics Canada under contract with the provinces, and cover the period 2002/2003 to 2006/2007.

    Release date: 2009-11-20

  • Articles and reports: 11-536-X200900110809
    Description:

    Cluster sampling and multi-stage designs involve sampling units from more than one population. Auxiliary information is usually available for the population and the sample at each of these levels. Calibration weights for a sample are generally produced using only the auxiliary information at that level. This approach ignores the auxiliary information available at the other levels. Moreover, it is often of practical interest to link the calibration weights between samples at different levels. Integrated weighting in cluster sampling ensures that the weights for the units in a cluster are all the same and equal to the cluster weight. This presentation discusses a generalization of integrated weighting to multi-stage sample designs, called linked weighting.

    Release date: 2009-08-11
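
    A minimal sketch of the integrated-weighting constraint mentioned above, for the single-stage cluster-sampling case (the presentation's multi-stage "linked weighting" generalization is not reproduced here): every element k in sampled cluster i receives the cluster weight, w_k = w_i, and the cluster weights are calibrated on cluster-level sums of the element auxiliaries,

        \sum_{i \in s} w_i \mathbf{x}_i = \mathbf{X}, \qquad \mathbf{x}_i = \sum_{k \in U_i} \mathbf{x}_k,

    so that element-level auxiliary totals are matched while all units in a cluster share one weight.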

  • Articles and reports: 12-001-X200900110880
    Description:

    This paper provides a framework for estimation by calibration in two-phase sampling designs. This work grew out of the continuing development of generalized estimation software at Statistics Canada. An important objective in this development is to provide a wide range of options for effective use of auxiliary information in different sampling designs. This objective is reflected in the general methodology for two-phase designs presented in this paper.

    We consider the traditional two-phase sampling design. A phase-one sample is drawn from the finite population, and a phase-two sample is then drawn as a subsample of the first. The study variable, whose unknown population total is to be estimated, is observed only for the units in the phase-two sample. Arbitrary sampling designs are allowed in each phase of sampling. Different types of auxiliary information are identified for the computation of the calibration weights at each phase. The auxiliary variables and the study variables can be continuous or categorical.

    The paper contributes to four important areas in the general context of calibration for two-phase designs: (1) Three broad types of auxiliary information for two-phase designs are identified and used in the estimation. The information is incorporated into the weights in two steps: a phase-one calibration and a phase-two calibration. We discuss the composition of the appropriate auxiliary vectors for each step, and use a linearization method to arrive at the residuals that determine the asymptotic variance of the calibration estimator. (2) We examine the effect of alternative choices of starting weights for the calibration. The two "natural" choices for the starting weights generally produce slightly different estimators. However, under certain conditions, these two estimators have the same asymptotic variance. (3) We re-examine variance estimation for the two-phase calibration estimator. A new procedure is proposed that can improve significantly on the usual technique of conditioning on the phase-one sample. A simulation in section 10 serves to validate the advantage of this new method. (4) We compare the calibration approach with the traditional model-assisted regression technique, which uses a linear regression fit at two levels. We show that the model-assisted estimator has properties similar to a two-phase calibration estimator.

    Release date: 2009-06-22
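
    For readers unfamiliar with calibration weighting, the sketch below shows generic single-phase linear calibration (chi-square distance, unit q-weights). It is illustrative only: it is not the paper's two-phase procedure and not the Statistics Canada software, and the function name and example data are made up for the illustration.

        import numpy as np

        def linear_calibration(d, X, totals):
            """Adjust design weights d (length n) so that the weighted totals of the
            auxiliary variables in X (n x p) equal the known population totals
            (length p).  Linear method: w_k = d_k * (1 + x_k' lambda)."""
            d = np.asarray(d, dtype=float)
            X = np.asarray(X, dtype=float)
            t_hat = X.T @ d                            # design-weighted auxiliary totals
            T = X.T @ (d[:, None] * X)                 # p x p calibration matrix
            lam = np.linalg.solve(T, np.asarray(totals, dtype=float) - t_hat)
            return d * (1.0 + X @ lam)                 # calibrated weights

        # Hypothetical example: calibrate equal design weights to a known
        # population size (10,000) and a known auxiliary total.
        rng = np.random.default_rng(0)
        x = rng.gamma(2.0, 5.0, size=200)
        d = np.full(200, 50.0)
        w = linear_calibration(d, np.column_stack([np.ones(200), x]),
                               totals=[10000.0, 101000.0])
        print(w.sum())                                 # 10000.0 by construction

    In the paper's framework, this kind of adjustment is carried out in two steps, a phase-one calibration followed by a phase-two calibration, each with its own auxiliary vector.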

  • Articles and reports: 11-621-M2009079
    Geography: Canada, Province or territory
    Description:

    The study reviews the performance of the wholesale trade sector nationally and provincially in 2008, along with the key factors affecting this outcome. The study also examines infra-annual trends in this sector.

    Release date: 2009-05-05

  • Articles and reports: 82-003-X200900110800
    Geography: Canada
    Description:

    This article provides more precise and detailed estimates of cancer prevalence than have been available previously.

    Release date: 2009-03-18