Analysis
Results
All (14) (0 to 10 of 14 results)
- Articles and reports: 12-001-X202400200004
Description:
While we avoid specifying the parametric relationship between the study variable and covariates, we illustrate the advantage of including a spatial component to better account for the covariates in our models to make Bayesian predictive inference. We treat each unique covariate combination as an individual stratum, then we use small area estimation techniques to make inference about the finite population mean of the continuous response variable. The two spatial models used are the conditional autoregressive and simple conditional autoregressive models. We include the spatial effects by creating the adjacency matrix via the Mahalanobis distance between covariates. We also show how to incorporate survey weights into the spatial models when dealing with probability survey data. We compare the results of two non-spatial models including the Scott-Smith model and the Battese, Harter, and Fuller model to the spatial models. We illustrate the comparison between the aforementioned models with an application using BMI data from eight counties in California. Our goal is to have neighboring strata yield similar predictions, and to increase the difference between strata that are not neighbors. Ultimately, using the spatial models shows less global pooling compared to the non-spatial models, which was the desired outcome.
Release date: 2024-12-20
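The idea of building an adjacency matrix from Mahalanobis distances between covariate combinations can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how distances are turned into neighbor relations, so the fixed `threshold` rule, the function names, and the toy strata are all hypothetical.

```python
from itertools import combinations
from statistics import mean


def covariance_matrix(rows):
    """Sample covariance matrix (divisor n - 1) of a list of covariate vectors."""
    n, p = len(rows), len(rows[0])
    centers = [mean(r[j] for r in rows) for j in range(p)]
    return [
        [
            sum((r[a] - centers[a]) * (r[b] - centers[b]) for r in rows) / (n - 1)
            for b in range(p)
        ]
        for a in range(p)
    ]


def solve(matrix, rhs):
    """Solve matrix @ z = rhs by Gauss-Jordan elimination with partial pivoting."""
    p = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def mahalanobis(u, v, cov):
    """Mahalanobis distance sqrt((u - v)' cov^{-1} (u - v))."""
    diff = [a - b for a, b in zip(u, v)]
    z = solve(cov, diff)  # z = cov^{-1} (u - v)
    return sum(d * w for d, w in zip(diff, z)) ** 0.5


def adjacency_matrix(strata, threshold):
    """Assumed rule: two strata are neighbors when their distance <= threshold."""
    cov = covariance_matrix(strata)
    k = len(strata)
    adj = [[0] * k for _ in range(k)]
    for i, j in combinations(range(k), 2):
        if mahalanobis(strata[i], strata[j], cov) <= threshold:
            adj[i][j] = adj[j][i] = 1
    return adj


# Hypothetical strata, each a unique (covariate 1, covariate 2) combination.
strata = [(1, 1), (1, 2), (2, 1), (3, 3)]
adjacency = adjacency_matrix(strata, threshold=1.5)
```

With these toy values the first stratum is a neighbor of the second and third, while the fourth stratum has no neighbors. A spatial model such as a conditional autoregressive prior would then take `adjacency` as its neighborhood structure.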
- Articles and reports: 12-001-X202300200004
Description:
We present a novel methodology to benchmark county-level estimates of crop area totals to a preset state total subject to inequality constraints and random variances in the Fay-Herriot model. For planted area of the National Agricultural Statistics Service (NASS), an agency of the United States Department of Agriculture (USDA), it is necessary to incorporate the constraint that the estimated totals, derived from survey and other auxiliary data, are no smaller than administrative planted area totals prerecorded by other USDA agencies except NASS. These administrative totals are treated as fixed and known, and this additional coherence requirement adds to the complexity of benchmarking the county-level estimates. A fully Bayesian analysis of the Fay-Herriot model offers an appealing way to incorporate the inequality and benchmarking constraints, and to quantify the resulting uncertainties, but sampling from the posterior densities involves difficult integration, and reasonable approximations must be made. First, we describe a single-shrinkage model, shrinking the means while the variances are assumed known. Second, we extend this model to accommodate double shrinkage, borrowing strength across means and variances. This extended model has two sources of extra variation, but because we are shrinking both means and variances, it is expected that this second model should perform better in terms of goodness of fit (reliability) and possibly precision. The computations are challenging for both models, which are applied to simulated data sets with properties resembling the Illinois corn crop.
Release date: 2024-01-03
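To make the two ingredients named in this abstract concrete, the sketch below shows Fay-Herriot shrinkage of the means with known variances, followed by a simple ratio adjustment to a fixed state total. It is not the paper's fully Bayesian procedure: ratio benchmarking is a swapped-in, much simpler technique, the inequality constraints are not handled, and the model variance, synthetic values, and all numbers are hypothetical.

```python
def fay_herriot_shrinkage(direct, synthetic, sampling_var, model_var):
    """Shrink each direct estimate toward its synthetic (regression) value.

    The weight on the direct estimate is model_var / (model_var + sampling_var_i),
    so noisier direct estimates are pulled more strongly toward the synthetic value.
    Both variances are assumed known here (single shrinkage).
    """
    shrunk = []
    for y, s, d in zip(direct, synthetic, sampling_var):
        weight = model_var / (model_var + d)
        shrunk.append(weight * y + (1 - weight) * s)
    return shrunk


def ratio_benchmark(estimates, target_total):
    """Rescale estimates so that they sum exactly to the target total."""
    scale = target_total / sum(estimates)
    return [e * scale for e in estimates]


# Hypothetical county-level values (not NASS data).
direct = [100.0, 200.0, 300.0]       # direct survey estimates
synthetic = [120.0, 180.0, 310.0]    # regression-based values
sampling_var = [10.0, 30.0, 10.0]    # known sampling variances
shrunk = fay_herriot_shrinkage(direct, synthetic, sampling_var, model_var=10.0)
benchmarked = ratio_benchmark(shrunk, target_total=630.0)
```

Here the second county, which has the largest sampling variance, keeps only a quarter of the weight on its direct estimate. After rescaling, the county estimates sum to the assumed state total of 630.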
- Articles and reports: 12-001-X202200100004
Description:
When the sample size of an area is small, borrowing information from neighbors is a small area estimation technique to provide more reliable estimates. One well-known model in small area estimation is a multinomial-Dirichlet hierarchical model for multinomial counts. Due to natural characteristics of the data, imposing a unimodal order restriction on the parameter space is relevant. In our application, body mass index is more likely to be at an overweight level, which means the unimodal order restriction may be reasonable. However, the same unimodal order restriction for all areas may be too strong to hold in some cases. To increase flexibility, we add uncertainty to the unimodal order restriction. Each area will have similar unimodal patterns, but not the same. Since the order restriction with uncertainty increases the difficulty of inference, we compare the models using posterior summaries and the approximated log-pseudo marginal likelihood.
Release date: 2022-06-21
- Articles and reports: 12-001-X202100100001
Description:
In a previous paper, we developed a model to make inference about small area proportions under selection bias in which the binary responses and the selection probabilities are correlated. This is the homogeneous nonignorable selection model; nonignorable selection means that the selection probabilities and the binary responses are correlated. The homogeneous nonignorable selection model was shown to perform better than a baseline ignorable selection model. However, one limitation of the homogeneous nonignorable selection model is that the distributions of the selection probabilities are assumed to be identical across areas. Therefore, we introduce a more general model, the heterogeneous nonignorable selection model, in which the selection probabilities are not identically distributed over areas. We used Markov chain Monte Carlo methods to fit the three models. We illustrate our methodology and compare our models using an example on severe activity limitation of the U.S. National Health Interview Survey. We also perform a simulation study to demonstrate that our heterogeneous nonignorable selection model is needed when there is moderate to strong selection bias.
Release date: 2021-06-24
- Articles and reports: 12-001-X202100100005
Description:
Bayesian pooling strategies are used to solve precision problems related to statistical analyses of data from small areas. In such cases, the subpopulation samples are usually small, even though the population might not be. As an alternative, similar data can be pooled in order to reduce the number of parameters in the model. Many surveys consist of categorical data on each area, collected into a contingency table. We consider hierarchical Bayesian pooling models with a Dirichlet process prior for analyzing categorical data based on small areas. However, the prior used to pool such data frequently results in an overshrinkage problem. To mitigate this problem, the parameters are separated into global and local effects. This study focuses on data pooling using a Dirichlet process prior. We compare the pooling models using bone mineral density (BMD) data taken from the Third National Health and Nutrition Examination Survey for the period 1988 to 1994 in the United States. Our analyses of the BMD data are performed using a Gibbs sampler and slice sampling to carry out the posterior computations.
Release date: 2021-06-24
- Articles and reports: 12-001-X201900200004
Description:
Benchmarking lower level estimates to upper level estimates is an important activity at the United States Department of Agriculture’s National Agricultural Statistics Service (NASS) (e.g., benchmarking county estimates to state estimates for corn acreage). Assuming that a county is a small area, we use the original Fay-Herriot model to obtain a general Bayesian method to benchmark county estimates to the state estimate (the target). Here the target is assumed known, and the county estimates are obtained subject to the constraint that these estimates must sum to the target. This is an external benchmarking; it is important for official statistics, not just NASS, and it occurs more generally in small area estimation. One can benchmark these estimates by “deleting” one of the counties (typically the last one) to incorporate the benchmarking constraint into the model. However, the estimates may change depending on which county is deleted when the constraint is included in the model. Our current contribution is to give each small area a chance to be deleted, and we call this procedure the random deletion benchmarking method. We show empirically that the estimates differ depending on which county is deleted, and that these estimates also differ from those obtained from random deletion. Although these differences may be considered small, it is most sensible to use random deletion because it does not give preferential treatment to any county and it can provide a small improvement in precision over benchmarking by deleting the last county.
Release date: 2019-06-27
- Articles and reports: 12-001-X201700114822
Description:
We use a Bayesian method to infer about a finite population proportion when binary data are collected using a two-fold sample design from small areas. The two-fold sample design has a two-stage cluster sample design within each area. A former hierarchical Bayesian model assumes that for each area the first stage binary responses are independent Bernoulli distributions, and the probabilities have beta distributions which are parameterized by a mean and a correlation coefficient. The means vary with areas but the correlation is the same over areas. However, to gain some flexibility we have now extended this model to accommodate different correlations. The means and the correlations have independent beta distributions. We call the former model a homogeneous model and the new model a heterogeneous model. All hyperparameters have proper noninformative priors. An additional complexity is that some of the parameters are weakly identified making it difficult to use a standard Gibbs sampler for computation. So we have used unimodal constraints for the beta prior distributions and a blocked Gibbs sampler to perform the computation. We have compared the heterogeneous and homogeneous models using an illustrative example and simulation study. As expected, the two-fold model with heterogeneous correlations is preferred.
Release date: 2017-06-22
- Articles and reports: 12-001-X201200111688
Description:
We study the problem of nonignorable nonresponse in a two dimensional contingency table which can be constructed for each of several small areas when there is both item and unit nonresponse. In general, the provision for both types of nonresponse with small areas introduces significant additional complexity in the estimation of model parameters. For this paper, we conceptualize the full data array for each area to consist of a table for complete data and three supplemental tables for missing row data, missing column data, and missing row and column data. For nonignorable nonresponse, the total cell probabilities are allowed to vary by area, cell and these three types of "missingness". The underlying cell probabilities (i.e., those which would apply if full classification were always possible) for each area are generated from a common distribution and their similarity across the areas is parametrically quantified. Our approach is an extension of the selection approach for nonignorable nonresponse investigated by Nandram and Choi (2002a, b) for binary data; this extension creates additional complexity because of the multivariate nature of the data coupled with the small area structure. As in that earlier work, the extension is an expansion model centered on an ignorable nonresponse model so that the total cell probability is dependent upon which of the categories is the response. Our investigation employs hierarchical Bayesian models and Markov chain Monte Carlo methods for posterior inference. The models and methods are illustrated with data from the third National Health and Nutrition Examination Survey.
Release date: 2012-06-27
- Articles and reports: 12-001-X201100211603
Description:
In many sample surveys there are items requesting binary response (e.g., obese, not obese) from a number of small areas. Inference is required about the probability for a positive response (e.g., obese) in each area, the probability being the same for all individuals in each area and different across areas. Because of the sparseness of the data within areas, direct estimators are not reliable, and there is a need to use data from other areas to improve inference for a specific area. Essentially, a priori the areas are assumed to be similar, and a hierarchical Bayesian model, the standard beta-binomial model, is a natural choice. The innovation is that a practitioner may have much-needed additional prior information about a linear combination of the probabilities. For example, a weighted average of the probabilities is a parameter, and information can be elicited about this parameter, thereby making the Bayesian paradigm appropriate. We have modified the standard beta-binomial model for small areas to incorporate the prior information on the linear combination of the probabilities, which we call a constraint. Thus, there are three cases. The practitioner (a) does not specify a constraint, (b) specifies a constraint and the parameter completely, and (c) specifies a constraint and information which can be used to construct a prior distribution for the parameter. The griddy Gibbs sampler is used to fit the models. To illustrate our method, we use an example on obesity of children in the National Health and Nutrition Examination Survey in which the small areas are formed by crossing school (middle, high), ethnicity (white, black, Mexican) and gender (male, female). We use a simulation study to assess some of the statistical features of our method. We have shown that the gain in precision relative to (a) is larger for (b) than for (c).
Release date: 2011-12-21
- A Bayesian allocation of undecided voters (Archived)
Articles and reports: 12-001-X200800110606
Description:
Data from election polls in the US are typically presented in two-way categorical tables, and there are many polls before the actual election in November. For example, in the Buckeye State Poll in 1998 for governor there are three polls, January, April and October; the first category represents the candidates (e.g., Fisher, Taft and other) and the second category represents the current status of the voters (likely to vote and not likely to vote for governor of Ohio). There is a substantial number of undecided voters for one or both categories in all three polls, and we use a Bayesian method to allocate the undecided voters to the three candidates. This method permits modeling different patterns of missingness under ignorable and nonignorable assumptions, and a multinomial-Dirichlet model is used to estimate the cell probabilities which can help to predict the winner. We propose a time-dependent nonignorable nonresponse model for the three tables. Here, a nonignorable nonresponse model is centered on an ignorable nonresponse model to induce some flexibility and uncertainty about ignorability or nonignorability. As competitors we also consider two other models, an ignorable and a nonignorable nonresponse model. These latter two models assume a common stochastic process to borrow strength over time. Markov chain Monte Carlo methods are used to fit the models. We also construct a parameter that can potentially be used to predict the winner among the candidates in the November election.
Release date: 2008-06-26
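As a rough illustration of the multinomial-Dirichlet building block mentioned in the last abstract, the sketch below allocates undecided voters under the simplest ignorable assumption, where undecided voters are taken to split like decided ones. It is not the time-dependent nonignorable model the paper proposes; the symmetric prior, the function names, and the counts are hypothetical and are not the Buckeye State Poll data.

```python
def dirichlet_posterior_mean(counts, prior=1.0):
    """Posterior mean of multinomial cell probabilities.

    With a symmetric Dirichlet(prior, ..., prior) prior and observed counts n_j,
    the posterior is Dirichlet(prior + n_j) and its mean is
    (prior + n_j) / (sum of counts + prior * number of cells).
    """
    total = sum(counts) + prior * len(counts)
    return [(c + prior) / total for c in counts]


def allocate_undecided(decided_counts, n_undecided, prior=1.0):
    """Expected number of undecided voters assigned to each candidate,
    assuming ignorable nonresponse (undecided voters behave like decided ones)."""
    probs = dirichlet_posterior_mean(decided_counts, prior)
    return [n_undecided * p for p in probs]


# Hypothetical poll: decided voters for candidate A, candidate B, and other.
decided = [197, 297, 3]
allocation = allocate_undecided(decided, n_undecided=100)
```

With these counts the posterior means are 0.396, 0.596 and 0.008, so the 100 undecided voters are split in those proportions. A nonignorable model would instead let the allocation probabilities differ from those of the decided voters.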