2. Two estimators for the growth rate of the total turnover

Paul Knottnerus

Consider a population of $N$ enterprises $U = \{1, \ldots, N\}$ and suppose there are no births and deaths in the population. Let $Y_{i}$ denote the value of the turnover for the $i$-th enterprise in a given month (say $t$) and $X_{i}$ the value of the turnover of that enterprise in month $t - 12$. Hence, $y$ and $x$ concern the same variable on two different occasions. Denote their population totals by $Y$ and $X$, and their population means by $\bar{Y}$ and $\bar{X}$, respectively. That is, $Y = \sum_{i \in U} Y_{i}$, $X = \sum_{i \in U} X_{i}$, $\bar{Y} = Y / N$ and $\bar{X} = X / N$.

Let $s_{1}$, $s_{2}$ and $s_{3}$ denote three mutually disjoint simple random samples from $U$ without replacement (SRS). Define $s_{12}$ and $s_{23}$ by $s_{12} = s_{1} \cup s_{2}$ and $s_{23} = s_{2} \cup s_{3}$, respectively. Denote the size of $s_{k}$ by $n_{k}$ $(k = 1, 2, 3, 12, 23)$ and the corresponding sample means by $\bar{y}_{k}$ and $\bar{x}_{k}$. Let the variable $x$ be observed in $s_{12}$ on the first occasion and the variable $y$ in $s_{23}$ on the second occasion. Denote the overlap ratios by $\lambda$ $(= n_{2} / n_{12})$ and $\mu$ $(= n_{2} / n_{23})$. The SRS estimators for the population totals $Y$ and $X$ are defined by $\hat{Y}_{SRS} = N \bar{y}_{23}$ and $\hat{X}_{SRS} = N \bar{x}_{12}$, respectively.
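The sampling design described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `draw_overlapping_samples` and the seed are hypothetical, the sample sizes are borrowed from Example 2.1, and splitting one larger SRS into consecutive parts is just one convenient way to obtain three mutually disjoint SRS samples.

```python
import random

def draw_overlapping_samples(N, n1, n2, n3, seed=0):
    """Draw three mutually disjoint SRS samples s1, s2, s3 from U = {1, ..., N}."""
    rng = random.Random(seed)
    # One SRS of size n1 + n2 + n3 without replacement, split in order of
    # selection, yields three disjoint parts that are each an SRS from U.
    drawn = rng.sample(range(1, N + 1), n1 + n2 + n3)
    s1 = set(drawn[:n1])
    s2 = set(drawn[n1:n1 + n2])
    s3 = set(drawn[n1 + n2:])
    return s1, s2, s3

# Sizes taken from Example 2.1; any other sizes with n1 + n2 + n3 <= N work too.
N, n1, n2, n3 = 386, 15, 57, 17
s1, s2, s3 = draw_overlapping_samples(N, n1, n2, n3)

s12 = s1 | s2          # units observed on the first occasion (variable x)
s23 = s2 | s3          # units observed on the second occasion (variable y)
n12, n23 = len(s12), len(s23)

lam = n2 / n12         # overlap ratio lambda = n2 / n12
mu = n2 / n23          # overlap ratio mu = n2 / n23

# The product lam * mu / n2 simplifies to n2 / (n12 * n23).
cov_coef = lam * mu / n2

print(n12, n23, round(lam, 3), round(mu, 3))
```

Because the three samples are disjoint, $n_{12} = n_{1} + n_{2}$ and $n_{23} = n_{2} + n_{3}$, so both overlap ratios follow directly from the three sample sizes.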

Define the growth rate $g$ of the total turnover between the two occasions by $g = G - 1$ with $G = Y / X$. For estimating $G$ there are two options. The standard (STN) option is based on the estimated totals on both occasions, that is,

$\hat{G}_{STN} = \frac{\hat{Y}_{SRS}}{\hat{X}_{SRS}} = \frac{\bar{y}_{23}}{\bar{x}_{12}}; \qquad (2.1)$

see Nordberg (2000), Qualité and Tillé (2008) and Knottnerus and Van Delden (2012). Note that the estimator $\hat{g}_{STN} = \hat{G}_{STN} - 1$ for $g$ has the same variance as $\hat{G}_{STN}$. For sufficiently large sample sizes this variance can be approximated by using a first-order Taylor series expansion of $\hat{G}_{STN}$. That is,

$\begin{aligned} var(\hat{G}_{STN}) &\approx \frac{1}{\bar{X}^{2}} \, var(\bar{y}_{23} - G \bar{x}_{12}) \\ &= \frac{1}{\bar{X}^{2}} \left\{ var(\bar{y}_{23}) + G^{2} \, var(\bar{x}_{12}) - 2 G \, cov(\bar{y}_{23}, \bar{x}_{12}) \right\} \\ &= \frac{1}{\bar{X}^{2}} \left\{ \left( \frac{1}{n_{23}} - \frac{1}{N} \right) S_{y}^{2} + G^{2} \left( \frac{1}{n_{12}} - \frac{1}{N} \right) S_{x}^{2} - 2 G \left( \frac{\lambda \mu}{n_{2}} - \frac{1}{N} \right) S_{xy} \right\}, \qquad (2.2) \end{aligned}$

where $S_{y}^{2} = \sum_{U} (Y_{i} - \bar{Y})^{2} / (N - 1)$ is the adjusted population variance of the $Y_{i}$ and $S_{x}^{2}$ that of the $X_{i}$, while $S_{xy} = \sum_{U} (X_{i} - \bar{X}) (Y_{i} - \bar{Y}) / (N - 1)$ is their adjusted population covariance. Cochran (1977, page 153) suggests as a working rule to use the large-sample result if the sample size exceeds 30 and the coefficients of variation of the numerator and denominator are both less than 10%. For (different) derivations of the expression for $cov(\bar{y}_{23}, \bar{x}_{12})$ used in (2.2), see Tam (1984) and Knottnerus and Van Delden (2012). The adjusted population (co)variances can be estimated unbiasedly by the sample (co)variances; recall that the sample (co)variances $s_{yk}^{2}$ and $s_{yxk}$ from sample $s_{k}$ $(k = 1, 2, 3, 12, 23)$ are defined by

$\begin{aligned} s_{yk}^{2} &= \frac{1}{n_{k} - 1} \sum_{i \in s_{k}} (Y_{i} - \bar{y}_{k})^{2} \\ s_{yxk} &= \frac{1}{n_{k} - 1} \sum_{i \in s_{k}} (Y_{i} - \bar{y}_{k}) (X_{i} - \bar{x}_{k}) . \end{aligned}$
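As a minimal sketch, the sample (co)variances defined above can be computed as follows. The helper name `sample_cov` and the four pairs of turnover values are hypothetical and serve only to illustrate the $n_{k} - 1$ divisor; they are not data from the paper.

```python
def sample_cov(a, b):
    """Sample covariance with divisor n - 1; sample_cov(a, a) is the sample variance."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cross = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    return cross / (n - 1)

# Hypothetical turnover values for four enterprises observed on both occasions.
x = [10.0, 12.0, 9.0, 15.0]   # first occasion  (X_i)
y = [11.0, 13.0, 9.0, 17.0]   # second occasion (Y_i)

s2_x = sample_cov(x, x)       # sample variance of the X_i
s2_y = sample_cov(y, y)       # sample variance of the Y_i
s_yx = sample_cov(y, x)       # sample covariance of the Y_i and X_i

print(s2_x, s2_y, s_yx)
```

Note that the covariance requires paired observations, so within this design it can only be computed from units observed on both occasions.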

An alternative option for estimating $G$ and $g$ is based on enterprises observed on both occasions in overlap $s_{2}$ (OLP). That is,

$\hat{G}_{OLP} = \frac{\bar{y}_{2}}{\bar{x}_{2}}. \qquad (2.3)$

For sufficiently large $n_{2},$ the well-known approximation for the variance of this estimator is

$\begin{aligned} var(\hat{G}_{OLP}) &\approx \frac{1}{\bar{X}^{2}} \, var(\bar{y}_{2} - G \bar{x}_{2}) \\ &= \frac{1}{\bar{X}^{2}} \left( \frac{1}{n_{2}} - \frac{1}{N} \right) S_{y - Gx}^{2}, \qquad (2.4) \end{aligned}$

where $S_{y - G x}^{2}$ stands for $S_{y}^{2} + G^{2} S_{x}^{2} - 2 G S_{x y};$ see Cochran (1977, page 31). In order to get some more insight into the merits of both ${\hat{g}}_{S T N}$ and ${\hat{g}}_{O L P},$ consider the following examples.

Example 2.1. The data used in this example are panel observations on the turnover of Dutch supermarkets in February 2011 and 2012 from stratum 3 (size class 3). The stratum size is $N = 386$. Furthermore, $n_{1} = 15$, $n_{2} = 57$ and $n_{3} = 17$. For the different samples we have (in thousands of euros)

$\bar{y}_{23} = 97.2, \quad \bar{x}_{12} = 89.8, \quad s_{y23}^{2} = 3{,}781 \quad \text{and} \quad s_{x12}^{2} = 2{,}232.$

The population correlation coefficient $\rho_{xy}$ $(= S_{xy} / (S_{x} S_{y}))$ between the $Y_{i}$ and the $X_{i}$ is estimated from overlap $s_{2}$ by $\hat{\rho}_{xy2} = s_{xy2} / (s_{y2} s_{x2}) = 0.876$. To avoid negative variance estimates, Knottnerus and Van Delden (2012) propose estimating $S_{xy}$ in (2.2) by $\hat{S}_{xy} = \hat{\rho}_{xy2} \, s_{x12} \, s_{y23} = 2{,}545$. Substituting the above outcomes into (2.1) and (2.2), we obtain $\hat{g}_{STN} = 0.082$ $(= 8.2\%)$ and $\widehat{var}(\hat{g}_{STN}) = 0.00324$. Assuming normality and using $u_{0.975} = 1.96$, the 95% confidence interval is approximately $I_{STN}^{95} \approx (-3.0\%, 19.4\%)$. In contrast, from overlap $s_{2}$ we get the estimates

$\bar{y}_{2} = 102.2, \quad \bar{x}_{2} = 97.3 \quad \text{and} \quad \hat{g}_{OLP} = 0.050 \; (= 5.0\%).$

Substituting the same estimates as before for $\bar{X}$ and the (co)variances of the $X_{i}$ and $Y_{i}$ into (2.4) yields $\widehat{var}(\hat{g}_{OLP}) = 0.00166$. Under the normality assumption this yields a narrower 95% confidence interval, $I_{OLP}^{95} \approx (-3.0\%, 13.0\%)$.
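The arithmetic of Example 2.1 can be retraced with the following Python sketch, which plugs the reported (rounded) inputs into (2.1) to (2.4). Two points are assumptions rather than statements from the text: $G$ in (2.4) is replaced here by $\hat{G}_{OLP}$, and the helper name `ci95` is hypothetical. Because the inputs are rounded, the results can differ from the reported figures in the last digit.

```python
import math

# Reported inputs from Example 2.1 (turnover in thousands of euros).
N, n1, n2, n3 = 386, 15, 57, 17
n12, n23 = n1 + n2, n2 + n3
ybar23, xbar12 = 97.2, 89.8
s2_y23, s2_x12 = 3781.0, 2232.0
rho_2 = 0.876                      # correlation estimated from overlap s2
ybar2, xbar2 = 102.2, 97.3

# Covariance estimate proposed by Knottnerus and Van Delden (2012).
S_xy = rho_2 * math.sqrt(s2_x12) * math.sqrt(s2_y23)

def ci95(estimate, variance, u=1.96):
    """Normal-approximation 95% confidence interval."""
    half = u * math.sqrt(variance)
    return estimate - half, estimate + half

# Standard estimator: (2.1) for the point estimate, (2.2) for the variance.
G_stn = ybar23 / xbar12
lam, mu = n2 / n12, n2 / n23
var_stn = ((1 / n23 - 1 / N) * s2_y23
           + G_stn ** 2 * (1 / n12 - 1 / N) * s2_x12
           - 2 * G_stn * (lam * mu / n2 - 1 / N) * S_xy) / xbar12 ** 2

# Overlap estimator: (2.3) for the point estimate, (2.4) for the variance,
# reusing xbar12 and the same (co)variance estimates as above.
# Assumption: G in (2.4) is replaced by G_olp.
G_olp = ybar2 / xbar2
S2_resid = s2_y23 + G_olp ** 2 * s2_x12 - 2 * G_olp * S_xy
var_olp = (1 / n2 - 1 / N) * S2_resid / xbar12 ** 2

ci_stn = ci95(G_stn - 1, var_stn)
ci_olp = ci95(G_olp - 1, var_olp)

print(f"S_xy = {S_xy:.0f}")
print(f"STN: g = {G_stn - 1:.3f}, var = {var_stn:.5f}, CI = {ci_stn}")
print(f"OLP: g = {G_olp - 1:.3f}, var = {var_olp:.5f}, CI = {ci_olp}")
```

The two variance formulas differ only in the sampling fractions attached to each (co)variance term, which makes the source of the narrower OLP interval easy to inspect term by term.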

Example 2.2. Among the data of Example 2.1 there were three enterprises with extreme $g$-values of -50%, 133% and -91%. It is beyond the scope of this paper to further analyse or correct these outliers. But to illustrate the difference between the estimators $\hat{g}_{STN}$ and $\hat{g}_{OLP}$ once more, we simply omit these enterprises so that $n_{2} = 54$ instead of $n_{2} = 57$. A first result is that the estimate $\hat{\rho}_{xy2}$ increases from 0.876 to 0.970. The latter is fairly high even though the coefficient of variation of the growth rates $g_{i} = Y_{i} / X_{i} - 1$ is $cv_{g2} = s_{g2} / \bar{g}_{2} = 4.1$, which still indicates rather high volatility among the growth rates in this example. Furthermore, in analogy with the previous example, we get $\hat{g}_{STN} = 0.074$ $(= 7.4\%)$ with $\widehat{var}(\hat{g}_{STN}) = 0.00251$ and $\hat{g}_{OLP} = 0.039$ $(= 3.9\%)$ with $\widehat{var}(\hat{g}_{OLP}) = 0.00039$. The corresponding 95% confidence intervals in this slightly modified example are approximately $I_{STN}^{95} \approx (-2.4\%, 17.2\%)$ and $I_{OLP}^{95} \approx (0.1\%, 7.7\%)$. Compared to Example 2.1, the interval $I_{OLP}^{95}$ narrowed relatively more than $I_{STN}^{95}$.
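To make the last comparison concrete, the sketch below computes how much each interval's half-width shrinks between Example 2.1 and Example 2.2, using only the variance estimates reported in the two examples. The dictionary layout and the name `half_width` are hypothetical; since the reported variances are rounded, the resulting percentages are approximate.

```python
import math

U_975 = 1.96  # normal quantile used in both examples

# Estimated variances as reported in Examples 2.1 and 2.2.
reported_var = {
    "STN": {"ex_2_1": 0.00324, "ex_2_2": 0.00251},
    "OLP": {"ex_2_1": 0.00166, "ex_2_2": 0.00039},
}

def half_width(variance, u=U_975):
    """Half-width of the normal-approximation confidence interval."""
    return u * math.sqrt(variance)

# Relative reduction of the half-width after omitting the three outliers.
shrink = {}
for name, v in reported_var.items():
    before = half_width(v["ex_2_1"])
    after = half_width(v["ex_2_2"])
    shrink[name] = 1 - after / before

for name, value in shrink.items():
    print(f"{name}: half-width reduced by {100 * value:.0f}%")
```

Under these reported values the STN half-width shrinks by roughly one eighth, whereas the OLP half-width is roughly halved.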

In addition, Example 2.2 may serve as a warning to be cautious when using sample means such as $\bar{y}_{23}$ and $\bar{x}_{12}$ for estimating growth rates, because these estimates may lead to an unnecessarily wide confidence interval around a suboptimal estimate. In the next section we look more closely at the circumstances that may lead to a wide interval $I_{STN}^{95}$.


Date modified: 2017-09-20
