Analysis

Results

All (3 results)

1. A cautionary note on Clark Winsorization (Archived)
Articles and reports: 12-001-X201600214676
Description:
Winsorization procedures replace extreme values with less extreme values, effectively moving the original extreme values toward the center of the distribution. Winsorization therefore both detects and treats influential values. Mulry, Oliver and Kaputa (2014) compare the performance of the one-sided Winsorization method developed by Clark (1995) and described by Chambers, Kokic, Smith and Cruddas (2000) to the performance of M-estimation (Beaumont and Alavi 2004) in highly skewed business population data. One aspect of particular interest for methods that detect and treat influential values is the range of values designated as influential, called the detection region. The Clark Winsorization algorithm is easy to implement and can be extremely effective. However, the resultant detection region is highly dependent on the number of influential values in the sample, especially when the survey totals are expected to vary greatly by collection period. In this note, we examine the effect of the number and magnitude of influential values on the detection regions from Clark Winsorization using data simulated to realistically reflect the properties of the population for the Monthly Retail Trade Survey (MRTS) conducted by the U.S. Census Bureau. Estimates from the MRTS and other economic surveys are used in economic indicators, such as the Gross Domestic Product (GDP).
Release date: 2016-12-20
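
As a rough illustration of the idea in the description above, the following minimal sketch applies one-sided (upper) Winsorization with a fixed cutoff. The function name `winsorize_upper`, the sample values and the cutoff of 50.0 are assumptions made for illustration only. In particular, the cutoff here is simply supplied by hand; it is not the cutoff that the Clark (1995) method would derive, which the description does not spell out.

```python
def winsorize_upper(values, cutoff):
    """One-sided Winsorization: pull every value above `cutoff` down to `cutoff`.

    Returns the treated values and a parallel list of flags marking which
    observations fell in the detection region (here: strictly above `cutoff`).
    The cutoff is a hypothetical, user-supplied number, not a derived one.
    """
    treated = [min(v, cutoff) for v in values]
    flagged = [v > cutoff for v in values]
    return treated, flagged


# Hypothetical skewed sample with one influential value.
sales = [12.0, 15.0, 9.0, 480.0, 14.0]
treated, flagged = winsorize_upper(sales, cutoff=50.0)

print(treated)  # the extreme value is replaced by the cutoff
print(flagged)  # only the extreme value is marked as influential
```

The same step both detects (the flags) and treats (the replaced values) influential observations, which is the dual role the description attributes to Winsorization.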
2. Using Administrative Records to Evaluate Survey Data (Archived)
Articles and reports: 11-522-X201700014711
Description:
After the 2010 Census, the U.S. Census Bureau conducted two separate research projects matching survey data to databases. One study matched to the third-party database Accurint, and the other matched to U.S. Postal Service National Change of Address (NCOA) files. In both projects, we evaluated response error in reported move dates by comparing the self-reported move date to records in the database. We encountered similar challenges in the two projects. This paper discusses our experience using “big data” as a comparison source for survey data and our lessons learned for future projects similar to the ones we conducted.
Release date: 2016-03-24
3. Total error in the dual system estimator: The 1986 Census of Central Los Angeles County (Archived)
Articles and reports: 12-001-X198800214589
Description:
The U.S. Bureau of the Census uses dual system estimates (DSEs) for measuring census coverage error. The dual system estimate uses data from the original enumeration and a Post Enumeration Survey. In measuring the accuracy of the DSE, it is important to know that the DSE is subject to several components of nonsampling error, as well as sampling error. This paper gives models of the total error and the components of error in the dual system estimates. The models relate observed indicators of data quality, such as a matching error rate, to the first two moments of the components of error. The propagation of error in the DSE is studied and its bias and variance are assessed. The methodology is applied to the 1986 Census of Central Los Angeles County in the Census Bureau’s Test of Adjustment Related Operations. The methodology also will be useful to assess error in the DSE for the 1990 census as well as other applications.
Release date: 1988-12-15
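
For readers unfamiliar with dual system estimation (item 3 above), the sketch below shows the simplified textbook capture-recapture form of the estimate: the enumeration count times the Post Enumeration Survey count, divided by the number of persons matched in both. It is only a sketch under stated assumptions. The function name `dual_system_estimate` and all counts are hypothetical, and the error models described in the paper are not reproduced here.

```python
def dual_system_estimate(n_enumerated, n_survey, n_matched):
    """Simplified dual system (capture-recapture) population estimate.

    n_enumerated: persons counted in the original enumeration
    n_survey:     persons counted in the Post Enumeration Survey
    n_matched:    persons found in both sources
    All inputs are hypothetical illustration values, not census data.
    """
    if n_matched <= 0:
        raise ValueError("at least one matched person is required")
    return n_enumerated * n_survey / n_matched


# With correct matching: 900 enumerated, 100 in the survey, 90 matched.
estimate = dual_system_estimate(900, 100, 90)

# If matching error causes 5 true matches to be missed, the estimate rises.
estimate_with_missed_matches = dual_system_estimate(900, 100, 85)

print(estimate)
print(estimate_with_missed_matches)
```

The second call hints at why the description stresses matching error: a small change in the matched count shifts the estimate, so errors in matching propagate directly into the DSE.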

Date modified: 2024-04-16