2. Two estimators for the growth rate of the total turnover
Paul Knottnerus
Consider a
population of
enterprises
and suppose there are no births
and deaths in the population. Let
denote the value of the turnover
for the
enterprise in a given month (say
) and
the value of the turnover of that
enterprise in month
Hence, the variables
and
concern the same variable on two
different occasions. Denote their population totals by
and
and their population means by
and
respectively. That is,
and
Let
and
denote three mutually disjoint
simple random samples from
without replacement (SRS). Define
and
by
and
respectively. Denote the size of
by
and the corresponding sample
means by
and
Let the variable
be observed in
on the first occasion and the
variable
in
on the second occasion. Denote
the overlap ratios by
and
The SRS estimators for the
population totals
and
are defined by
and
respectively.
Define the growth
rate
of the total turnover between the
two occasions by
with
For estimating
there are two options. One of the
standard (STN) options is based on the estimated totals on both occasions, that
is
see Nordberg (2000), Qualité and Tillé (2008) and Knottnerus and Van
Delden (2012). Note that the estimator
for
has the same variance as
For sufficiently large
this variance can be approximated
by using a first-order Taylor series expansion of
That is,
where
is the adjusted population variance of the
and
that of the
while
is their adjusted population covariance.
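In notation assumed here for illustration ($N$ for the population size; $n_{12}$, $n_{23}$ and $n_2$ for the sizes of the first-occasion sample, the second-occasion sample and their overlap; $R = \bar{Y}/\bar{X}$), the linearization behind an approximation of this kind can be sketched as follows. The covariance coefficient is the standard result for the means of two overlapping SRS samples. The symbols are assumptions and need not match those used in (2.2).

```latex
\begin{align*}
\frac{\bar{y}_{23}}{\bar{x}_{12}} - R
  &\approx \frac{1}{\bar{X}}\left(\bar{y}_{23} - R\,\bar{x}_{12}\right), \\
\operatorname{var}\!\left(\frac{\bar{y}_{23}}{\bar{x}_{12}}\right)
  &\approx \frac{1}{\bar{X}^{2}}\left[
     \left(\frac{1}{n_{23}} - \frac{1}{N}\right) S_y^{2}
     + R^{2}\left(\frac{1}{n_{12}} - \frac{1}{N}\right) S_x^{2}
     - 2R\left(\frac{n_{2}}{n_{12}\,n_{23}} - \frac{1}{N}\right) S_{xy}
   \right].
\end{align*}
```

With complete overlap ($n_{12} = n_{23} = n_2$) all three coefficients reduce to $1/n_2 - 1/N$, and the expression collapses to the usual variance approximation for a ratio estimator.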
Cochran (1977, page 153) suggests, as a working rule, using the large-sample
result if the sample size exceeds 30 and the coefficients of variation of the
numerator and the denominator are both less than 10%.
For (different) derivations of the expression for
used in (2.2), see Tam (1984) and Knottnerus
and Van Delden (2012). The adjusted population (co)variances can be estimated
without bias by the sample (co)variances; recall that the sample (co)variances
and
from sample
are defined by
An alternative
option for estimating
and
is based on enterprises observed
on both occasions in overlap
(OLP). That is,
For sufficiently large
the well-known approximation for
the variance of this estimator is
where
stands for
see Cochran (1977, page 31). In
order to get some more insight into the merits of both
and
consider the following examples.
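As a rough illustration of the two options, the sketch below computes an STN-type and an OLP-type growth estimate together with their large-sample variance approximations. The function and argument names are hypothetical, the variance formulas are the standard linearization results for ratios of SRS means (assumed here, not copied from (2.2) and (2.4)), and the (co)variances passed in are whatever estimates of the adjusted population (co)variances are available.

```python
from statistics import mean


def growth_stn(x_first, y_second):
    """STN-type estimate: ratio of the two full-sample means, minus 1.

    The population size cancels in the ratio of the estimated totals.
    """
    return mean(y_second) / mean(x_first) - 1.0


def growth_olp(x_overlap, y_overlap):
    """OLP-type estimate: ratio of the means over the overlap only, minus 1."""
    return mean(y_overlap) / mean(x_overlap) - 1.0


def var_stn(N, n12, n23, n2, xbar, R, s2x, s2y, sxy):
    """Linearized variance of the STN-type estimate (assumed standard form).

    N: population size; n12, n23: sample sizes on the two occasions;
    n2: overlap size; xbar: first-occasion mean; R: ratio of the means;
    s2x, s2y, sxy: (estimated) adjusted population (co)variances.
    """
    var_y = (1.0 / n23 - 1.0 / N) * s2y
    var_x = (1.0 / n12 - 1.0 / N) * s2x
    cov_xy = (n2 / (n12 * n23) - 1.0 / N) * sxy
    return (var_y + R * R * var_x - 2.0 * R * cov_xy) / xbar ** 2


def var_olp(N, n2, xbar, R, s2x, s2y, sxy):
    """Linearized variance of the OLP-type estimate (classical ratio estimator)."""
    residual_var = s2y + R * R * s2x - 2.0 * R * sxy
    return (1.0 / n2 - 1.0 / N) * residual_var / xbar ** 2
```

Under complete overlap the two variance functions agree, which gives a quick consistency check. With partial overlap, which estimator has the smaller variance depends on how strongly the two occasions are correlated.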
Example 2.1. The data used in this example are panel
observations on the turnover of Dutch supermarkets in February 2011 and 2012
from stratum 3 (size class 3). The stratum size is
Furthermore,
and
For the different samples we have
(in thousand euros)
The population correlation coefficient
between the
and the
is estimated from overlap
by
To avoid negative variance
estimates, Knottnerus and Van Delden (2012) propose estimating
in (2.2) by
Substituting the above outcomes
into (2.1) and (2.2), we obtain
and
Assuming normality and using
the 95%-confidence interval is
approximately
In contrast, from overlap
we get the estimates
Substituting the same estimates as before for
and the (co)variances of the
and
into (2.4) yields
Under the normality assumption
this yields a smaller 95%-confidence interval
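The confidence intervals in this example follow the usual normal-approximation recipe: estimate plus or minus a normal quantile times the estimated standard error. A minimal sketch, assuming the conventional quantile 1.96 for a 95% interval; the function name is hypothetical.

```python
import math


def normal_ci(estimate, variance, z=1.96):
    """Return (lower, upper) for estimate +/- z * sqrt(variance)."""
    half_width = z * math.sqrt(variance)
    return estimate - half_width, estimate + half_width
```

Because the half-width is proportional to the standard error, a variance that is four times smaller gives an interval half as wide.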
Example 2.2. Among the data of Example 2.1 there were three
enterprises with extreme
values of -50%, 133% and -91%. It is beyond the
scope of this paper to further analyse or correct these outliers. But to
illustrate the difference between the estimators
and
once more, we simply omit these
enterprises so that
instead of
A first result is that the estimate
increases from 0.876 to 0.970.
The latter is fairly high in spite of the fact that the coefficient of
variation of the growth rates
is
which still indicates a rather
high volatility among the growth rates in this example. Furthermore, in analogy
with the previous example, we get
with
and
with
The corresponding 95%-confidence
intervals in this slightly modified example are approximately
and
Compared with Example 2.1, the
interval
narrowed relatively more
than
In addition,
Example 2.2 may serve as a warning to be cautious when using sample means such as
and
for estimating growth rates
because these estimates may lead to an unnecessarily wide confidence interval
around a suboptimal estimate. In the next section we look more closely at the
question of what kind of circumstances may lead to a large interval