Social media as a data source for official statistics; the Dutch Consumer Confidence Index
Section 5. Discussion

For decades, national statistical institutes have relied on probability sampling to produce official statistics. This approach rests on a sound theory for drawing valid statistical inference about large finite target populations from relatively small random samples. Over recent decades, more and more alternative data sources, such as administrative data and big data, have become available, raising the question of how these sources can be used in the production of official statistics. A key issue is how results obtained from these sources can be generalized to the intended finite target population. Since the data-generating process is generally unknown, it is not obvious how to draw valid inference from such sources.

This paper addresses how administrative and big data sources can be used in the production of official statistics. In the most extreme approach, survey data are replaced by related alternative data sources, at the risk of introducing, e.g., selection bias. Since most surveys are conducted repeatedly, a time series modelling approach is proposed to investigate to what extent related alternative data sources reflect an evolution similar to that of the series obtained with a repeated survey. With a multivariate state space model, the correlation between the underlying unobserved components of both series can be modelled. If components of the time series model are cointegrated, there are strong indications that both data sources are driven by the same underlying factor. This could be used as an argument that an alternative source can replace an existing survey, since both reflect the same evolution of a process, although generally at a different level.
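To make the link between correlated trend disturbances and cointegration concrete, the Python sketch below uses a deliberately simplified, hypothetical setting: two series that share one random-walk trend, not the paper's bivariate model (3.9). If the correlation between the two trend disturbances is ±1, their covariance matrix has rank one, so both trends are driven by a single random walk and a suitable linear combination of the two series no longer contains it. The function names and all parameter values are illustrative assumptions.

```python
import numpy as np


def trend_disturbance_cov(sigma1: float, sigma2: float, rho: float) -> np.ndarray:
    """2x2 covariance matrix of the trend disturbances of two series."""
    cov12 = rho * sigma1 * sigma2
    return np.array([[sigma1**2, cov12], [cov12, sigma2**2]])


def simulate_common_trend(n, alpha, beta, sd_trend, sd_noise, seed=0):
    """Hypothetical series y1 = L + e1 and y2 = alpha + beta * L + e2 that
    share one random-walk trend L (perfectly correlated trend disturbances)."""
    rng = np.random.default_rng(seed)
    trend = np.cumsum(rng.normal(0.0, sd_trend, n))  # random walk
    e1 = rng.normal(0.0, sd_noise, n)
    e2 = rng.normal(0.0, sd_noise, n)
    y1 = trend + e1
    y2 = alpha + beta * trend + e2
    return y1, y2, e1, e2


# Perfect correlation gives a rank-one covariance matrix: one common trend.
print(np.linalg.matrix_rank(trend_disturbance_cov(1.0, 2.0, 1.0)))  # 1
print(np.linalg.matrix_rank(trend_disturbance_cov(1.0, 2.0, 0.5)))  # 2

y1, y2, e1, e2 = simulate_common_trend(200, alpha=5.0, beta=2.0, sd_trend=1.0, sd_noise=0.5)
# The cointegrating combination y2 - beta * y1 removes the shared random walk,
# leaving only the level difference alpha plus stationary noise.
combo = y2 - 2.0 * y1
print(np.allclose(combo, 5.0 + e2 - 2.0 * e1))  # True
```

In an application the correlation (or the rank of the disturbance covariance matrix) would be estimated from the data; an estimate close to ±1 would point towards cointegration, subject to the caveats discussed next.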

The theory underlying probability sampling for finite population inference is stronger than reliance on the concept of cointegration. Series obtained from social media or Google Trends are selected by maximizing their correlation with the series from the sample survey and do not necessarily measure the same concept as the survey. There is no guarantee that this correlation is based on true causality or that it will persist in the future. Sampling theory, in contrast, provides a rigorous mathematical framework showing that a correct sampling strategy, i.e., the right combination of a probability sample and an approximately design-unbiased estimator, results in valid statistical inference for the intended target population.

Even in the case of cointegrated series, an extensive model evaluation, e.g., by some form of cross-validation, will be required to ensure that the alternative data source is a valid replacement. See in this context also Eichler (2013) for a discussion of the use of Granger causality for causal inference in multiple time series. Instead of replacing a periodic survey by related data sources, these sources can be used as auxiliary series in a multivariate time series model to improve the precision of the direct estimates, or of the period-to-period changes of the direct estimates, obtained with a periodic survey. Another important benefit of big data sources is that their higher frequency can be used to make more precise early predictions, or nowcasts, when the survey estimate is not yet available in real time but the auxiliary series already is. The time series model applied in this paper, initially proposed by Harvey and Chung (2000), is a generic approach for model-based estimation in periodic surveys. Survey sampling, of course, has its own issues. For example, continuously declining response rates and data collection modes that do not reach the intended target population also result in selection bias. In this case, cointegration with a related series derived from social media might be an indication of similarities between the selection bias in non-probabilistic big data sources and the non-response selection bias and coverage bias in a survey sample, as pointed out by Baker et al. (2013).
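One simple form of the cross-validation mentioned above, suited to time series, is rolling-origin (expanding-window) evaluation: refit on the data up to month t-1, predict month t, and compare the out-of-sample errors with and without the candidate auxiliary series. The Python sketch below is a hypothetical illustration that uses ordinary least squares in NumPy instead of the state space model of this paper; the function name, the regression designs and the simulated data are assumptions.

```python
import numpy as np


def rolling_origin_rmse(y: np.ndarray, X: np.ndarray, min_train: int) -> float:
    """Root mean squared one-step-ahead error from an expanding window.

    For each t >= min_train, OLS coefficients are estimated on rows 0..t-1
    only, and the prediction for row t is compared with y[t].
    """
    errors = []
    for t in range(min_train, len(y)):
        beta, *_ = np.linalg.lstsq(X[:t], y[:t], rcond=None)
        errors.append(y[t] - X[t] @ beta)
    return float(np.sqrt(np.mean(np.square(errors))))


# Hypothetical data: a target series partly driven by an auxiliary series.
rng = np.random.default_rng(1)
n = 120
aux = np.cumsum(rng.normal(size=n))
target = 2.0 + 0.8 * aux + rng.normal(scale=0.5, size=n)

y = target[1:]
lag = target[:-1]  # previous month's value of the target
ones = np.ones(n - 1)
X_without = np.column_stack([ones, lag])
X_with = np.column_stack([ones, lag, aux[1:]])

rmse_without = rolling_origin_rmse(y, X_without, min_train=24)
rmse_with = rolling_origin_rmse(y, X_with, min_train=24)
print(f"without auxiliary: {rmse_without:.3f}, with auxiliary: {rmse_with:.3f}")
```

Because each prediction uses only earlier observations, the comparison mimics how the auxiliary series would perform in production; a persistent gain over many origins is more convincing than a high in-sample correlation.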

In the application to the CCI, the time series modelling approach does not reduce the variance relative to the direct estimator when it is used for level estimates. The reason is that the standard error of the time series model reflects both the sampling error and the white noise of the population parameter, whereas the standard error of the direct estimator reflects only the sampling error. In the case of the CCI, the variance of the white noise of the population parameter is as large as the variance of the sampling error. The state space approach is still useful for producing official figures for the CCI, since it filters a more stable trend of the respondents' opinion about the economic climate from the observed series of direct estimates. The situation is different, however, if the time series model is used to estimate month-to-month change. The stable trend estimates result from a strong positive correlation between the trend estimates of subsequent periods. As a result, the standard errors of month-to-month change obtained with the time series model are clearly smaller than those of the direct estimates: standard errors of smoothed month-to-month changes are about 47% smaller, and standard errors of filtered month-to-month changes about 17% smaller, than those of the direct estimates.
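The effect on month-to-month change follows from the variance of a difference, Var(x_t - x_{t-1}) = Var(x_t) + Var(x_{t-1}) - 2 Cov(x_t, x_{t-1}). The Python sketch below uses hypothetical numbers, not the CCI results, to show how a strong positive correlation between the errors of subsequent trend estimates shrinks the standard error of the change even when the level standard errors are equal. It assumes, for illustration only, that the errors of direct estimates in subsequent months are uncorrelated.

```python
import math


def se_of_change(se_now: float, se_prev: float, corr: float) -> float:
    """Standard error of x_t - x_{t-1}, given the standard errors of both
    estimates and the correlation between their estimation errors."""
    variance = se_now**2 + se_prev**2 - 2.0 * corr * se_now * se_prev
    return math.sqrt(variance)


# Hypothetical values, not taken from the CCI application.
se_level = 1.0  # same level standard error for direct and model-based estimates

# Direct estimates: errors in subsequent months assumed uncorrelated.
se_change_direct = se_of_change(se_level, se_level, corr=0.0)
# Model-based trend: errors assumed strongly positively correlated.
se_change_model = se_of_change(se_level, se_level, corr=0.9)

print(f"direct: {se_change_direct:.3f}, model: {se_change_model:.3f}")  # direct: 1.414, model: 0.447
print(f"reduction: {1 - se_change_model / se_change_direct:.1%}")  # reduction: 68.4%
```

The 68% in this toy calculation is purely a consequence of the chosen numbers; the reductions reported in the text come from the fitted state space model.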

Using the SMI as an auxiliary series in a bivariate state space model slightly reduces the standard error of the model estimates of the CCI. However, since the available SMI series is relatively short, the reduction obtained with this auxiliary series does not outweigh the loss of information from the CCI series observed before the SMI became available. Nevertheless, since both series reflect a similar evolution and social media data are rapidly available, the SMI proved useful as an auxiliary series in the bivariate model to produce more reliable real-time nowcasts of the CCI at the moment the SMI becomes available but the CCI is not yet available. In this application, the SMI reduces the standard errors of the CCI nowcasts by about 17%.
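A stylized way to see why an auxiliary series that is available earlier can sharpen a nowcast: if the prediction errors for the CCI and the SMI are jointly normal with correlation rho, conditioning on the observed SMI multiplies the prediction variance of the CCI by 1 - rho^2. This is a textbook bivariate-normal simplification, not the Kalman filter computation used in the paper, and the correlation values in the sketch below are hypothetical.

```python
import math


def conditional_se(se_target: float, rho: float) -> float:
    """Standard error of the target's prediction error after conditioning on
    a jointly normal auxiliary variable with correlation rho."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    return se_target * math.sqrt(1.0 - rho**2)


def relative_se_reduction(rho: float) -> float:
    """Fractional reduction of the standard error: 1 - sqrt(1 - rho^2)."""
    return 1.0 - conditional_se(1.0, rho)


for rho in (0.3, 0.6, 0.9):
    print(f"rho = {rho:.1f}: SE reduced by {relative_se_reduction(rho):.1%}")
```

The gain depends only on the strength of the correlation between the two prediction errors, which is why a series that tracks the CCI closely and is published earlier is valuable for nowcasting even if it adds little to the level estimates.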

The question can be raised whether the SMI in its current operationalization measures the same concept as the CCI, and how the full potential of social media or other big data sources can be used to measure consumer confidence better than the current CCI and SMI do. Instead of constructing a social media index by taking the difference between positively and negatively classified messages, an SMI could be constructed from the concepts underlying the questions used for the CCI. If, for example, consumer confidence is measured by the amount of purchases of expensive goods during the last 12 months, or by the tendency of households to buy expensive goods, social media indices should be constructed that measure internet searches for such goods (cars, houses, white goods, etc.) as well as actual purchases of such goods during the previous months. The strong advantage of this approach is that the actual behaviour of households is measured directly, while a survey measures it indirectly, which induces more measurement error. This might eventually result in cointegrated series that measure similar concepts and that further improve, or even replace, the CCI.
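For comparison with the concept-based alternative proposed above, an index of the current type can be sketched as a net sentiment score. The Python sketch below assumes, as a hypothetical normalization, that the monthly index is the percentage of positively classified messages minus the percentage of negatively classified messages among all classified messages; the exact definition of the SMI used in this paper may differ, and the message counts are invented.

```python
from dataclasses import dataclass


@dataclass
class MonthlyCounts:
    """Hypothetical numbers of classified messages in one month."""

    positive: int
    negative: int
    neutral: int

    @property
    def total(self) -> int:
        return self.positive + self.negative + self.neutral


def net_sentiment(counts: MonthlyCounts) -> float:
    """Percentage positive minus percentage negative (assumed definition)."""
    if counts.total == 0:
        raise ValueError("no classified messages in this month")
    return 100.0 * (counts.positive - counts.negative) / counts.total


months = [MonthlyCounts(300, 450, 250), MonthlyCounts(380, 400, 220)]
print([net_sentiment(m) for m in months])  # [-15.0, -2.0]
```

A concept-based index as suggested in the text would replace the sentiment classes by, e.g., counts of searches for or purchases of expensive goods; how such counts would be collected and normalized is left open here.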

Acknowledgements

The authors are grateful to the Associate Editor and the reviewers for their careful reading of an earlier draft of this paper and for their constructive comments, which significantly improved its content. The views expressed in this paper are those of the authors and do not necessarily reflect the policies of Statistics Netherlands.

Appendix A

Model diagnostics

Table A.1
Univariate model (3.8) for CCI, 172-24 obs
Diagnostic: value (p-value or 95% conf. int. [L, U], where reported)
Log-likelihood: -464
Mean std. innovations: 0.0152
Variance std. innovations: 1.0851
Skewness std. innovations: 0.0276
Kurtosis std. innovations: 2.8901
Bowman-Shenton test on normality of std. innovations: 0.0926 (p-value 0.955)
Ljung-Box test on serial correlation of std. innovations: 24.108 (p-value 0.287)
Durbin-Watson test on serial correlation of std. innovations (T = 148): 2.082 (95% conf. int. [1.68, 2.32])
F-test on heteroscedasticity of std. innovations (df_num = df_denom = 60): 0.913 (95% conf. int. [0.60, 1.67])
Table A.2
Bivariate model (3.9) for CCI, 57-24 obs
Diagnostic: value (p-value or 95% conf. int. [L, U], where reported)
Log-likelihood: -230
Mean std. innovations: -0.0872
Variance std. innovations: 0.9777
Skewness std. innovations: 0.0982
Kurtosis std. innovations: 2.5450
Bowman-Shenton test on normality of std. innovations: 0.3382 (p-value 0.844)
Ljung-Box test on serial correlation of std. innovations: 18.060 (p-value 0.645)
Durbin-Watson test on serial correlation of std. innovations (T = 33): 2.133 (95% conf. int. [1.32, 2.68])
F-test on heteroscedasticity of std. innovations (df_num = df_denom = 15): 0.783 (95% conf. int. [0.35, 2.86])
Table A.3
Bivariate model (3.9) for SMI, 57-12 obs
Diagnostic: value (p-value or 95% conf. int. [L, U], where reported)
Log-likelihood: -230
Mean std. innovations: 0.0954
Variance std. innovations: 1.0437
Skewness std. innovations: -0.1311
Kurtosis std. innovations: 2.5331
Bowman-Shenton test on normality of std. innovations: 0.5377 (p-value 0.764)
Ljung-Box test on serial correlation of std. innovations: 24.208 (p-value 0.283)
Durbin-Watson test on serial correlation of std. innovations (T = 45): 2.028 (95% conf. int. [1.42, 2.58])
F-test on heteroscedasticity of std. innovations (df_num = df_denom = 20): 0.329 (95% conf. int. [0.41, 2.46])
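The normality and serial-correlation diagnostics in Tables A.1 to A.3 can be checked with a few lines of Python. The sketch below assumes the usual textbook definitions rather than the authors' software: the Bowman-Shenton statistic N(S^2/6 + (K - 3)^2/24), referred to a chi-square distribution with 2 degrees of freedom (whose survival function is exp(-x/2)); the Durbin-Watson statistic; and an approximate 95% interval 2 ± 1.96 · 2/√T for the Durbin-Watson statistic under no serial correlation. With N taken as the T reported in each table, the statistic computed from the rounded skewness and kurtosis lies within about 0.001 of the tabulated value (about 0.5376 versus 0.5377 for Table A.3), while the chi-square p-values of the tabulated statistics and the Durbin-Watson intervals match the tables to the reported precision.

```python
import math

import numpy as np


def bowman_shenton(n: int, skewness: float, kurtosis: float) -> tuple[float, float]:
    """Bowman-Shenton normality statistic and its chi-square(2) p-value.

    kurtosis is the raw (not excess) kurtosis, so a normal distribution has 3.
    """
    stat = n * (skewness**2 / 6.0 + (kurtosis - 3.0) ** 2 / 24.0)
    p_value = math.exp(-stat / 2.0)  # survival function of chi-square with 2 df
    return stat, p_value


def durbin_watson(residuals: np.ndarray) -> float:
    """Sum of squared successive differences divided by the sum of squares."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e**2))


def durbin_watson_interval(t: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate interval 2 +/- z * 2 / sqrt(T) under no serial correlation."""
    half_width = z * 2.0 / math.sqrt(t)
    return 2.0 - half_width, 2.0 + half_width


# Table A.3 (SMI, T = 45): skewness -0.1311, kurtosis 2.5331.
stat, p = bowman_shenton(45, -0.1311, 2.5331)
print(f"{stat:.4f} {p:.3f}")  # 0.5376 0.764
print([round(x, 2) for x in durbin_watson_interval(148)])  # [1.68, 2.32]
```

Applied to the standardized innovations of a fitted model, durbin_watson gives the statistic itself, and values outside the interval returned by durbin_watson_interval would point to residual serial correlation.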

References

Baker, R., Brick, J.M., Bates, N.A., Battaglia, M., Couper, M.P., Dever, J.A., Gile, K.J. and Tourangeau, R. (2013). Summary report of the AAPOR task force on non-probability sampling. Journal of Survey Statistics and Methodology, 1, 90-143, first published online September 26, 2013, doi:10.1093/jssam/smt008.

Bell, W.R. (2005). Some considerations of seasonal adjustment variances. Census Bureau. Paper available at https://www.census.gov/ts/papers/jsm2005wrb.pdf.

Bell, W.R., and Hillmer, S.C. (1990). The time series approach to estimation for repeated surveys. Survey Methodology, 16, 2, 195-215. Paper available at http://www.statcan.gc.ca/pub/12-001-x/1990002/article/14535-eng.pdf.

Binder, D.A., and Dick, J.P. (1989). Modelling and estimation for repeated surveys. Survey Methodology, 15, 1, 29-45. Paper available at http://www.statcan.gc.ca/pub/12-001-x/1989001/article/14579-eng.pdf.

Binder, D.A., and Dick, J.P. (1990). A method for the analysis of seasonal ARIMA models. Survey Methodology, 16, 2, 239-253. Paper available at http://www.statcan.gc.ca/pub/12-001-x/1990002/article/14533-eng.pdf.

Blight, B.J.N., and Scott, A.J. (1973). A stochastic model for repeated surveys. Journal of the Royal Statistical Society, Series B, 35, 61-66.

Blumenstock, J., Cadamuro, G. and On, R. (2015). Predicting poverty and wealth from mobile phone metadata. Science, 350, 1073-1076.

Bollineni-Balabay, O., van den Brakel, J.A. and Palm, F. (2015). Multivariate state-space approach to variance reduction in series with level and variance breaks due to sampling redesigns. Accepted for publication in Journal of the Royal Statistical Society, Series A.

Bollineni-Balabay, O., van den Brakel, J.A. and Palm, F. (2017). State space time series modelling of the Dutch Labour Force Survey: Model selection and mean squared errors estimation. Survey Methodology, 43, 1, 41-67. Paper available at http://www.statcan.gc.ca/pub/12-001-x/2017001/article/14819-eng.pdf.

Bowley, A.L. (1926). Measurement of the precision attained in sampling. Bulletin de l’Institut International de Statistique, 22, Supplement to Book 1, 6-62.

Buelens, B., Burger, J. and van den Brakel, J.A. (2015). Predictive inference for non-probability samples: A simulation study. Discussion paper 2015-13, Statistics Netherlands, Heerlen.

Cochran, W.G. (1977). Sampling Techniques, Third Edition. New York: John Wiley & Sons, Inc.

Daas, P., and Puts, M. (2014a). Big data as a source of statistical information. The Survey Statistician, 69, 22-31.

Daas, P., and Puts, M. (2014b). Social media sentiment and consumer confidence. European Central Bank Statistics Paper Series No. 5, Frankfurt, Germany.

Doornik, J.A. (2009). An Object-oriented Matrix Programming Language Ox 6. London: Timberlake Consultants Press.

Durbin, J., and Koopman, S.J. (2012). Time Series Analysis by State Space Methods, Second Edition. Oxford: Oxford University Press.

Eichler, M. (2013). Causal inference with multiple time series: Principles and problems. Philosophical Transactions of the Royal Society A, 371, issue 1997.

Feder, M. (2001). Time series analysis of repeated surveys: The state-space approach. Statistica Neerlandica, 55, 182-199.

Hansen, M.H., and Hurwitz, W.N. (1943). On the theory of sampling from finite populations. Annals of Mathematical Statistics, 14, 333-362.

Harvey, A.C. (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge: Cambridge University Press.

Harvey, A.C., and Chung, C.H. (2000). Estimating the underlying change in unemployment in the UK. Journal of the Royal Statistical Society, Series A, 163, 303-339.

Koopman, S.J. (1997). Exact initial Kalman filtering and smoothing for non-stationary time series models. Journal of the American Statistical Association, 92, 1630-1638.

Koopman, S.J., Shephard, N. and Doornik, J.A. (2008). SsfPack 3.0: Statistical Algorithms for Models in State Space Form, London: Timberlake Consultants Press.

Koopman, S.J., Harvey, A., Shephard, N. and Doornik, J.A. (2009). STAMP 8.2, London: Timberlake Consultants Press.

Lind, J.T. (2005). Repeated surveys and the Kalman filter. Econometrics Journal, 8, 418-427.

Marchetti, S., Giusti, C., Pratesi, M., Salvati, N., Giannotti, F., Pedreschi, D., Rinzivillo, S., Pappalardo, L. and Gabrielli, L. (2015). Small area model-based estimators using big data sources. Journal of Official Statistics, 31, 263-281.

Neyman, J. (1934). On the two different aspects of the representative method: The method of stratified sampling and the method of purposive selection. Journal of the Royal Statistical Society, 97, 558-625.

Pang, B., and Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2, 1-135.

Pfeffermann, D. (1991). Estimation and seasonal adjustment of population means using data from repeated surveys. Journal of Business & Economic Statistics, 9, 163-175.

Pfeffermann, D., and Burck, L. (1990). Robust small area estimation combining time series and cross-sectional data. Survey Methodology, 16, 2, 217-237. Paper available at http://www.statcan.gc.ca/pub/12-001-x/1990002/article/14534-eng.pdf.

Pfeffermann, D., and Rubin-Bleuer, S. (1993). Robust joint modelling of labour force series of small areas. Survey Methodology, 19, 2, 149-163. Paper available at http://www.statcan.gc.ca/pub/12-001-x/1993002/article/14458-eng.pdf.

Pfeffermann, D., and Sverchkov, M. (2014). Estimation of mean squared error of X-11-ARIMA and other estimators of time series components. Journal of Official Statistics, 30, 811-838.

Pfeffermann, D., and Tiller, R. (2006). Small area estimation with state space models subject to benchmark constraints. Journal of the American Statistical Association, 101, 1387-1397.

Pfeffermann, D., Feder, M. and Signorelli, D. (1998). Estimation of autocorrelations of survey errors with application to trend estimation in small areas. Journal of Business & Economic Statistics, 16, 339-348.

Rao, J.N.K., and Yu, M. (1994). Small area estimation by combining time series and cross-sectional data. Canadian Journal of Statistics, 22, 511-528.

Särndal, C.-E., Swensson, B. and Wretman, J. (1992). Model Assisted Survey Sampling. Springer-Verlag.

Scott, A.J., and Smith, T.M.F. (1974). Analysis of repeated surveys using time series methods. Journal of the American Statistical Association, 69, 674-678.

Scott, A.J., Smith, T.M.F. and Jones, R.G. (1977). The application of time series methods to the analysis of repeated surveys. International Statistical Review/Revue Internationale de Statistique, 45, 13-28.

Tam, S.-M. (1987). Analysis of repeated surveys using a dynamic linear model. International Statistical Review/Revue Internationale de Statistique, 55, 1, 63-73.

Tiller, R.B. (1992). Time series modelling of sample survey data from the U.S. current population survey. Journal of Official Statistics, 8, 149-166.

van den Brakel, J.A., and Krieg, S. (2009). Estimation of the monthly unemployment rate through structural time series modelling in a rotating panel design. Survey Methodology, 35, 2, 177-190. Paper available at http://www.statcan.gc.ca/pub/12-001-x/2009002/article/11040-eng.pdf.

van den Brakel, J.A., and Krieg, S. (2015). Dealing with small sample sizes, rotation group bias and discontinuities in a rotating panel design. Survey Methodology, 41, 2, 267-296. Paper available at http://www.statcan.gc.ca/pub/12-001-x/2015002/article/14231-eng.pdf.

