Linearization versus bootstrap for variance estimation of the change between Gini indexes
Section 3. Two-sample case

3.1 Notation and composite estimation

Suppose now that two variables $Y_{1}$ and $Y_{2}$ are measured on the population $U,$ and let $y_{d 1}, \dots, y_{d N}$ denote the values taken by $Y_{d}, d =1, 2,$ on the units in the population. The variables $Y_{1}$ and $Y_{2}$ may typically refer to some characteristic of interest collected at two different times $τ_{1}$ and $τ_{2} .$ We consider the estimation of parameters $Δ θ$ that can be written as a functional $Δ θ = T (M_{1}, M_{2}),$ where $M_{d} = \sum_{k \in U} δ_{{y_{d k}}} .$ For instance, the linear case $Δ t = t_{y 2} - t_{y 1}$ corresponds to the difference between the totals $t_{y 2} = \sum_{k \in U} y_{2 k}$ and $t_{y 1} = \sum_{k \in U} y_{1 k} .$

Let $s_{1}$ and $s_{2}$ be two samples of sizes $n_{1}$ and $n_{2},$ respectively, selected from the same population $U$ according to some two-dimensional sampling design $p (\cdot, \cdot)$ (see Goga, 2003). The variable $Y_{1}$ is measured on $s_{1},$ while the variable $Y_{2}$ is measured on $s_{2} .$ Plugging sample-based estimators ${\hat{M}}_{d}$ into $Δ θ$ yields the substitution estimator $\hat{Δ θ} = T ({\hat{M}}_{1}, {\hat{M}}_{2}) .$ Unlike the one-sample case, several estimators ${\hat{M}}_{d}$ are possible. In what follows, we focus on the general class of composite estimators introduced by Goga, Deville and Ruiz-Gazen (2009). We write $s_{1 •} = s_{1} \setminus s_{2},$ $s_{3} = s_{1} \cap s_{2}$ and $s_{2 •} = s_{2} \setminus s_{1} .$ For $⋄ \in \{1 •, 3, 2 •\} ,$ we denote by $π_{⋄ , k}$ the expected number of draws for unit $k$ in $s_{⋄} ,$ and we let ${\hat{M}}_{d , ⋄} = \sum_{k \in s_{⋄}} w_{⋄ , k} δ_{y_{d k}} ,$ where $w_{⋄ , k} = π_{⋄ , k}^{- 1} .$ The composite estimators of $M_{1}$ and $M_{2}$ are

${\hat{M}}_{1}^{co} (a) = a {\hat{M}}_{1,1 •} + (1 - a) {\hat{M}}_{1,3} \quad \text{and} \quad {\hat{M}}_{2}^{co} (b) = b {\hat{M}}_{2,2 •} + (1 - b) {\hat{M}}_{2,3}, \qquad (3.1)$

where $a$ and $b$ are some known constants. The choice $a = b =0$ leads to the intersection estimator with ${\hat{M}}_{1}^{int} = {\hat{M}}_{1,3}$ and ${\hat{M}}_{2}^{int} = {\hat{M}}_{2,3},$ where only the overlapping sample $s_{3}$ is used.

When estimating the parameter $Δ t = t_{y 2} - t_{y 1},$ the composite estimator is

${\hat{Δ t}}^{co} (a, b) = {\hat{t}}_{y_{2}}^{co} - {\hat{t}}_{y_{1}}^{co}, \qquad (3.2)$

where ${\hat{t}}_{y_{1}}^{co} = \int y d {\hat{M}}_{1}^{co} (y)$ and ${\hat{t}}_{y_{2}}^{co} = \int y d {\hat{M}}_{2}^{co} (y) .$ It may be rewritten as

${\hat{Δ t}}^{co} (a, b) = b ({\hat{t}}_{y_{2} , s_{2 •}} - {\hat{t}}_{y_{2} , s_{3}}) - a ({\hat{t}}_{y_{1} , s_{1 •}} - {\hat{t}}_{y_{1} , s_{3}}) + ({\hat{t}}_{y_{2} , s_{3}} - {\hat{t}}_{y_{1} , s_{3}}), \qquad (3.3)$

where ${\hat{t}}_{y_{d} , s_{⋄}} = \sum_{k \in s_{⋄}} w_{⋄, k} y_{d k} .$ The variance of the composite estimator is

$V \{ {\hat{Δ t}}^{co} (a, b) \} = (b, - a, 1) \, V \{ {({\hat{t}}_{y_{2} , s_{2 •}} - {\hat{t}}_{y_{2} , s_{3}} , {\hat{t}}_{y_{1} , s_{1 •}} - {\hat{t}}_{y_{1} , s_{3}} , {\hat{t}}_{y_{2} , s_{3}} - {\hat{t}}_{y_{1} , s_{3}})}^{⊤} \} \, {(b, - a, 1)}^{⊤} . \qquad (3.4)$

Finding the vector ${(a_{opt} , b_{opt})}^{⊤}$ which minimizes the variance in (3.4) leads to the optimal composite estimator (Goga, Deville and Ruiz-Gazen, 2009, Section 3.6). Note that this is not an estimator per se, since it depends on unknown quantities which need to be estimated in practice. However, it is a useful benchmark which we will use for the appraisal of simpler composite estimators.

A variance estimator is obtained by substituting in (3.4) an estimator of the variance-covariance matrix. The derivation of variance estimators is detailed in Sections 3.1.1 and 3.1.2 for two examples of two-dimensional sampling designs.

3.1.1 Two-dimensional SI design

The two-dimensional SI design (SI2) of fixed size $(n_{1 •} , n_{3}, n_{2 •})$ assigns equal probabilities to all $s = (s_{1} , s_{2})$ for which the associated subsamples $s_{1 •} ,$ $s_{3}$ and $s_{2 •}$ have the required sizes $n_{1 •} ,$ $n_{3}$ and $n_{2 •} ;$ see Goga (2003) and Qualité and Tillé (2008). The SI2 design has the attractive property that the marginal samples $s_{1 •} ,$ $s_{3}$ and $s_{2 •}$ are SI samples from the population $U .$ Similarly, $s_{1}$ is an SI sample of size $n_{1} = n_{1 •} + n_{3},$ and $s_{2}$ is an SI sample of size $n_{2} = n_{2 •} + n_{3} .$ For the SI2 sampling design, the composite estimator in (3.3) yields

${\hat{Δ t}}^{co} (a, b) = N b ({\bar{y}}_{2, s_{2 •}} - {\bar{y}}_{2, s_{3}}) - N a ({\bar{y}}_{1, s_{1 •}} - {\bar{y}}_{1, s_{3}}) + N ({\bar{y}}_{2, s_{3}} - {\bar{y}}_{1, s_{3}}), \qquad (3.5)$

and the variance of the composite estimator is

$V \{ {\hat{Δ t}}^{co} (a, b) \} = N^{2} \{ c_{1} (a) S_{y_{1} , U}^{2} - 2 c_{12} (a, b) S_{y_{1} y_{2} , U} + c_{2} (b) S_{y_{2} , U}^{2} \}, \qquad (3.6)$

with

$\begin{array}{rl} c_{1} (a) & = \frac{{(1 - a)}^{2}}{n_{3}} + \frac{a^{2}}{n_{1} - n_{3}} - \frac{1}{N}, \\ c_{2} (b) & = \frac{{(1 - b)}^{2}}{n_{3}} + \frac{b^{2}}{n_{2} - n_{3}} - \frac{1}{N}, \\ c_{12} (a, b) & = \frac{(1 - a) (1 - b)}{n_{3}} - \frac{1}{N}; \end{array}$

see the Appendix for a proof.

We consider two examples. The choice $a = b =0$ leads to the intersection estimator

${\hat{Δ t}}^{int} = {\hat{Δ t}}^{co} (0, 0) = \frac{N}{n_{3}} \sum_{k \in s_{3}} (y_{2 k} - y_{1 k}), \qquad (3.7)$

and the variance simplifies as

$V \{ {\hat{Δ t}}^{int} \} = N^{2} \left( \frac{1}{n_{3}} - \frac{1}{N} \right) S_{y_{2} - y_{1} , U}^{2} . \qquad (3.8)$

The choice $a = n_{1}^{- 1} n_{1 •}$ and $b = n_{2}^{- 1} n_{2 •}$ leads to the union estimator

${\hat{Δ t}}^{uni} = {\hat{Δ t}}^{co} (n_{1}^{- 1} n_{1 •} , n_{2}^{- 1} n_{2 •}) = \frac{N}{n_{2}} \sum_{k \in s_{2}} y_{2 k} - \frac{N}{n_{1}} \sum_{k \in s_{1}} y_{1 k}, \qquad (3.9)$

where the complete samples are used, and the variance may be written as

$V \{ {\hat{Δ t}}^{uni} \} = N^{2} \left\{ \left( \frac{1}{n_{1}} - \frac{1}{N} \right) S_{y_{1} , U}^{2} - 2 \left( \frac{n_{3}}{n_{1} n_{2}} - \frac{1}{N} \right) S_{y_{1} y_{2} , U} + \left( \frac{1}{n_{2}} - \frac{1}{N} \right) S_{y_{2} , U}^{2} \right\} . \qquad (3.10)$

The variances of the union estimator and of the intersection estimator were derived by Qualité and Tillé (2008); see also Tam (1984).
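The two special cases above can be checked numerically. The following is a minimal sketch, not the authors' code: the helper names `composite_delta_t`, `c_own` and `c_cross`, the sample sizes and the simulated values are all assumptions made for illustration. It evaluates the composite estimator (3.5) and the coefficients of (3.6) for the intersection choice $a = b = 0$ and for the union choice $a = n_{1}^{- 1} n_{1 •},$ $b = n_{2}^{- 1} n_{2 •} .$

```python
import numpy as np

def composite_delta_t(y1_s1dot, y1_s3, y2_s3, y2_s2dot, N, a, b):
    """Composite estimator (3.5) of t_y2 - t_y1 under SI2 (hypothetical helper)."""
    m1_dot, m1_3 = np.mean(y1_s1dot), np.mean(y1_s3)
    m2_dot, m2_3 = np.mean(y2_s2dot), np.mean(y2_s3)
    return N * (b * (m2_dot - m2_3) - a * (m1_dot - m1_3) + (m2_3 - m1_3))

def c_own(coef, n_d, n3, N):
    """c_1(a) or c_2(b) in (3.6); n_d stands for n_1 or n_2 (requires n_d > n3)."""
    return (1 - coef) ** 2 / n3 + coef ** 2 / (n_d - n3) - 1 / N

def c_cross(a, b, n3, N):
    """c_12(a, b) in (3.6)."""
    return (1 - a) * (1 - b) / n3 - 1 / N

# Made-up sizes and values, for illustration only.
rng = np.random.default_rng(1)
N, n1_dot, n3, n2_dot = 1000, 30, 20, 40
n1, n2 = n1_dot + n3, n2_dot + n3
y1_s1dot = rng.lognormal(size=n1_dot)            # Y1 on s_1 \ s_2
y1_s3 = rng.lognormal(size=n3)                   # Y1 on the overlap s_3
y2_s3 = y1_s3 * rng.uniform(0.9, 1.2, size=n3)   # Y2 on the overlap s_3
y2_s2dot = rng.lognormal(size=n2_dot)            # Y2 on s_2 \ s_1

# Intersection estimator (3.7): a = b = 0.
dt_int = composite_delta_t(y1_s1dot, y1_s3, y2_s3, y2_s2dot, N, 0.0, 0.0)

# Union estimator (3.9): a = n_1dot / n_1 and b = n_2dot / n_2.
a_uni, b_uni = n1_dot / n1, n2_dot / n2
dt_uni = composite_delta_t(y1_s1dot, y1_s3, y2_s3, y2_s2dot, N, a_uni, b_uni)
```

With $a = b = 0,$ the three coefficients all reduce to $n_{3}^{- 1} - N^{- 1},$ which gives (3.8); with the union choice they reduce to the three coefficients appearing in (3.10).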

The choice of $a$ and $b$ is of practical importance to obtain an efficient composite estimator. After some algebra, the vector ${(a_{opt} , b_{opt})}^{⊤}$ which minimizes the variance of ${\hat{Δ t}}^{co} (a, b)$ is given by

${(a_{opt} , b_{opt})}^{⊤} = A^{- 1} X \qquad (3.11)$

with

$A = \left( \begin{array}{cc} \frac{n_{1}}{n_{1} - n_{3}} & - \frac{S_{y_{1} y_{2} , U}}{S_{y_{1} , U}^{2}} \\ - \frac{S_{y_{1} y_{2} , U}}{S_{y_{2} , U}^{2}} & \frac{n_{2}}{n_{2} - n_{3}} \end{array} \right) \quad \text{and} \quad X = {\left( 1 - \frac{S_{y_{1} y_{2} , U}}{S_{y_{1} , U}^{2}}, 1 - \frac{S_{y_{1} y_{2} , U}}{S_{y_{2} , U}^{2}} \right)}^{⊤} . \qquad (3.12)$

For two variables $Y_{1}$ and $Y_{2}$ related to the same characteristic collected at two different times, $S_{y_{1} y_{2} , U}$ is expected to be close to $S_{y_{1} , U}^{2}$ and $S_{y_{2} , U}^{2} .$ The vector $X$ in (3.12) is in turn close to the null vector, and if the size of the overlapping sample $s_{3}$ is comparable to that of $s_{1 •}$ and $s_{2 •} ,$ we obtain $a_{opt} ≃ 0$ and $b_{opt} ≃ 0.$ Therefore, using the intersection estimator where $a = b =0$ seems reasonable in practice. By contrast, the union estimator can be very inefficient; see Section 4.2 for an illustration. These conclusions are consistent with those of Qualité and Tillé (2008), Section 2.2.2.
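As an illustration of this argument, the sketch below solves the linear system (3.11)-(3.12) and compares the variance (3.6) at the optimum, at the intersection choice and at the union choice. The function names `si2_variance` and `optimal_ab` and all numerical values (dispersions, sample sizes, population size) are assumptions chosen to mimic two strongly correlated waves; they are not taken from the paper.

```python
import numpy as np

def si2_variance(a, b, S11, S22, S12, n1, n2, n3, N):
    """Variance (3.6) of the composite estimator under SI2."""
    c1 = (1 - a) ** 2 / n3 + a ** 2 / (n1 - n3) - 1 / N
    c2 = (1 - b) ** 2 / n3 + b ** 2 / (n2 - n3) - 1 / N
    c12 = (1 - a) * (1 - b) / n3 - 1 / N
    return N ** 2 * (c1 * S11 - 2 * c12 * S12 + c2 * S22)

def optimal_ab(S11, S22, S12, n1, n2, n3):
    """Solve A (a, b)^T = X with A and X as in (3.12)."""
    A = np.array([[n1 / (n1 - n3), -S12 / S11],
                  [-S12 / S22, n2 / (n2 - n3)]])
    X = np.array([1 - S12 / S11, 1 - S12 / S22])
    return np.linalg.solve(A, X)

# Hypothetical population dispersions: S12 close to both S11 and S22.
pop = dict(S11=4.0, S22=4.4, S12=3.8, n1=200, n2=200, n3=100)
N = 10_000

a_opt, b_opt = optimal_ab(**pop)
v_opt = si2_variance(a_opt, b_opt, N=N, **pop)
v_int = si2_variance(0.0, 0.0, N=N, **pop)      # intersection: a = b = 0
v_uni = si2_variance(0.5, 0.5, N=N, **pop)      # union: a = b = (n1 - n3) / n1
```

Under these assumed values both optimal coefficients are small and positive, and the intersection variance lies much closer to the optimum than the union variance does.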

Several variance estimators may be used for the composite estimator. Estimating the dispersions on the overlapping sample only yields the unbiased variance estimator

$v_{int}^{HT} \{ {\hat{Δ t}}^{co} (a, b) \} = N^{2} \{ c_{1} (a) S_{y_{1} , s_{3}}^{2} - 2 c_{12} (a, b) S_{y_{1} y_{2} , s_{3}} + c_{2} (b) S_{y_{2} , s_{3}}^{2} \}, \qquad (3.13)$

while an estimation on the whole samples yields

$v_{uni}^{HT} \{ {\hat{Δ t}}^{co} (a, b) \} = N^{2} \{ c_{1} (a) S_{y_{1} , s_{1}}^{2} - 2 c_{12} (a, b) S_{y_{1} y_{2} , s_{3}} + c_{2} (b) S_{y_{2} , s_{2}}^{2} \} . \qquad (3.14)$

Berger (2004) considered variance estimation for the union estimator under a maximum entropy rotating sampling scheme, by estimating separately the three components in (3.6).
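The two variance estimators (3.13) and (3.14) differ only in which sample supplies the dispersions of $Y_{1}$ and $Y_{2} .$ A minimal sketch follows; the function names `si2_coefs`, `v_int_ht` and `v_uni_ht` are hypothetical, and the dispersions are assumed to be the usual sample variances and covariances with divisor $n - 1 .$

```python
import numpy as np

def si2_coefs(a, b, n1, n2, n3, N):
    """Coefficients c_1(a), c_2(b), c_12(a, b) of (3.6); requires n1, n2 > n3."""
    c1 = (1 - a) ** 2 / n3 + a ** 2 / (n1 - n3) - 1 / N
    c2 = (1 - b) ** 2 / n3 + b ** 2 / (n2 - n3) - 1 / N
    c12 = (1 - a) * (1 - b) / n3 - 1 / N
    return c1, c2, c12

def v_int_ht(y1_s3, y2_s3, n1, n2, N, a, b):
    """Estimator (3.13): all dispersions computed on the overlap s_3."""
    c1, c2, c12 = si2_coefs(a, b, n1, n2, len(y1_s3), N)
    S = np.cov(y1_s3, y2_s3, ddof=1)              # 2 x 2 matrix on s_3
    return N ** 2 * (c1 * S[0, 0] - 2 * c12 * S[0, 1] + c2 * S[1, 1])

def v_uni_ht(y1_s1dot, y1_s3, y2_s3, y2_s2dot, N, a, b):
    """Estimator (3.14): variances on s_1 and s_2, covariance on s_3."""
    n3 = len(y1_s3)
    n1, n2 = n3 + len(y1_s1dot), n3 + len(y2_s2dot)
    c1, c2, c12 = si2_coefs(a, b, n1, n2, n3, N)
    S11 = np.var(np.concatenate([y1_s1dot, y1_s3]), ddof=1)
    S22 = np.var(np.concatenate([y2_s2dot, y2_s3]), ddof=1)
    S12 = np.cov(y1_s3, y2_s3, ddof=1)[0, 1]      # only observable on s_3
    return N ** 2 * (c1 * S11 - 2 * c12 * S12 + c2 * S22)

# Made-up data, for illustration only.
rng = np.random.default_rng(2)
y1_s3 = rng.lognormal(size=25)
y2_s3 = y1_s3 + rng.normal(scale=0.2, size=25)
y1_s1dot = rng.lognormal(size=35)
y2_s2dot = rng.lognormal(size=35)
v_a = v_int_ht(y1_s3, y2_s3, n1=60, n2=60, N=2000, a=0.0, b=0.0)
v_b = v_uni_ht(y1_s1dot, y1_s3, y2_s3, y2_s2dot, N=2000, a=0.0, b=0.0)
```

For $a = b = 0,$ `v_int_ht` reduces to $N^{2} (n_{3}^{- 1} - N^{- 1})$ times the sample variance of $y_{2 k} - y_{1 k}$ on $s_{3} ,$ the plug-in version of (3.8). Because `v_uni_ht` mixes dispersions from different samples, it is not guaranteed to be non-negative.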

3.1.2 Two-dimensional multistage design

We now consider a two-dimensional two-stage sampling design (MULT2). We assume that a with-replacement first-stage sample $s_{I}$ of size $m$ is first selected among the PSUs $U_{1} , \dots, U_{N_{I}} .$ Inside each PSU $i \in s_{I} ,$ an SI2 sample of size $(n_{1 •}^{i} , n_{3}^{i} , n_{2 •}^{i})$ is then selected. This type of sampling design arises in particular for a self-weighted two-stage design in two waves, with a partial replacement at the second wave of the SSUs selected at the first wave. The composite estimator in (3.3) yields

${\hat{Δ t}}^{co} (a, b) = \sum_{i \in s_{I}} π_{I i}^{- 1} {\hat{Δ t}}^{i , co} (a, b), \qquad (3.15)$

where

${\hat{Δ t}}^{i , co} (a, b) = N_{i} b ({\bar{y}}_{2, s_{2 •}^{i}} - {\bar{y}}_{2, s_{3}^{i}}) - N_{i} a ({\bar{y}}_{1, s_{1 •}^{i}} - {\bar{y}}_{1, s_{3}^{i}}) + N_{i} ({\bar{y}}_{2, s_{3}^{i}} - {\bar{y}}_{1, s_{3}^{i}}), \qquad (3.16)$

with ${\bar{y}}_{d , s_{⋄}^{i}} = {(n_{⋄}^{i})}^{- 1} \sum_{k \in s_{⋄}^{i}} y_{d k} ,$ $s_{⋄}^{i} = s_{⋄} \cap U_{i} ,$ and $N_{i}$ the number of SSUs inside the PSU $U_{i} .$

For example, using the overlapping samples only inside the PSUs yields the intersection estimator

${\hat{Δ t}}^{int} = \sum_{i \in s_{I}} π_{I i}^{- 1} {\hat{Δ t}}^{i , int} \quad \text{with} \quad {\hat{Δ t}}^{i , int} = N_{i} ({\bar{y}}_{2, s_{3}^{i}} - {\bar{y}}_{1, s_{3}^{i}}) . \qquad (3.17)$

Using the complete samples inside the PSUs yields the union estimator

${\hat{Δ t}}^{uni} = \sum_{i \in s_{I}} π_{I i}^{- 1} {\hat{Δ t}}^{i , uni} \quad \text{with} \quad {\hat{Δ t}}^{i , uni} = N_{i} ({\bar{y}}_{2, s_{2}^{i}} - {\bar{y}}_{1, s_{1}^{i}}) . \qquad (3.18)$

We note that for any vector of values ${(a, b)}^{⊤} ,$ the variance due to the first stage of sampling for ${\hat{Δ t}}^{co} (a, b)$ is the same. The possible composite estimators thus differ with respect to the second-stage variance only. In view of the discussion in Section 3.1.1, we therefore expect the intersection estimator to be close to the optimal composite estimator; see Section 4.2 for an illustration. An unbiased variance estimator for ${\hat{Δ t}}^{co} (a, b)$ is given by

$v^{HH} \{ {\hat{Δ t}}^{co} (a, b) \} = \frac{m}{m - 1} \sum_{i \in s_{I}} {\left( \frac{{\hat{Δ t}}^{i , co} (a, b)}{π_{I i}} - \frac{{\hat{Δ t}}^{co} (a, b)}{m} \right)}^{2} . \qquad (3.19)$
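The estimator (3.19) only requires the PSU-level estimates ${\hat{Δ t}}^{i , co} (a, b)$ and the expected numbers of draws $π_{I i} .$ The sketch below assumes these are supplied as two arrays of length $m$ (one entry per draw of the first-stage sample); the function name `hh_variance` and the numerical inputs are hypothetical.

```python
import numpy as np

def hh_variance(delta_t_psu, pi_I):
    """Return the composite estimator (3.15) and its variance estimator (3.19).

    delta_t_psu[i] is the within-PSU estimate (3.16) for the i-th drawn PSU,
    pi_I[i] the expected number of draws of that PSU (both assumed given).
    """
    z = np.asarray(delta_t_psu, dtype=float) / np.asarray(pi_I, dtype=float)
    m = z.size
    total = z.sum()                                    # estimator (3.15)
    v = m / (m - 1) * np.sum((z - total / m) ** 2)     # estimator (3.19)
    return total, v

# Made-up PSU-level inputs, for illustration only.
delta_t_psu = np.array([12.0, -3.5, 8.2, 4.1, 0.7, 6.3])
pi_I = np.array([0.10, 0.05, 0.08, 0.12, 0.06, 0.09])
dt_co, v_hh = hh_variance(delta_t_psu, pi_I)
```

Writing $z_{i} = π_{I i}^{- 1} {\hat{Δ t}}^{i , co} (a, b),$ (3.19) is simply $m$ times the sample variance of the $z_{i}$ with divisor $m - 1 .$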

3.2 Estimation of the change between Gini indexes

The change between Gini indexes $Δ G = G_{2} - G_{1}$ may be written as

$Δ G = \frac{\int \{2 F_{2 N} (y) - 1\} y d M_{2} (y)}{\int y d M_{2} (y)} - \frac{\int \{2 F_{1 N} (y) - 1\} y d M_{1} (y)}{\int y d M_{1} (y)}, \qquad (3.20)$

where $F_{d N} (y) = N^{- 1} \sum_{k \in U} 1_{\{y_{d k} \leq y\}} , \; d =1, 2.$ Using composite estimation leads to

${\hat{Δ G}}^{co} (a, b) = \frac{\int \{2 {\hat{F}}_{2 N}^{co} (y) - 1\} y d {\hat{M}}_{2}^{co} (y)}{\int y d {\hat{M}}_{2}^{co} (y)} - \frac{\int \{2 {\hat{F}}_{1 N}^{co} (y) - 1\} y d {\hat{M}}_{1}^{co} (y)}{\int y d {\hat{M}}_{1}^{co} (y)}, \qquad (3.21)$

where ${\hat{F}}_{d N}^{co} (y) = {\left\{ \int d {\hat{M}}_{d}^{co} (y) \right\}}^{- 1} \int 1_{\{ξ \leq y\}} d {\hat{M}}_{d}^{co} (ξ) .$
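Since each composite measure is a weighted sum of point masses, each term of (3.21) is a weighted Gini index. The sketch below is one possible implementation under SI2 weights $N / n_{⋄} ;$ the function names `gini_from_measure`, `composite_gini` and `delta_gini_composite` are hypothetical, and coefficients $a, b$ in $[0, 1]$ are assumed so that all weights are non-negative.

```python
import numpy as np

def gini_from_measure(y, w):
    """Gini index of the discrete measure sum_k w_k * delta_{y_k}, as in (3.20)."""
    y, w = np.asarray(y, dtype=float), np.asarray(w, dtype=float)
    order = np.argsort(y, kind="stable")
    ys, ws = y[order], w[order]
    cum_w = np.cumsum(ws)
    # Index of the last sorted unit with value <= ys[k], so that ties share F.
    last = np.searchsorted(ys, ys, side="right") - 1
    F = cum_w[last] / cum_w[-1]
    return np.sum(ws * (2 * F - 1) * ys) / np.sum(ws * ys)

def composite_gini(y_dot, y_3, N, coef):
    """Gini of coef * M_{d,dot} + (1 - coef) * M_{d,3} with SI2 weights N / n."""
    y = np.concatenate([y_dot, y_3])
    w = np.concatenate([np.full(len(y_dot), coef * N / len(y_dot)),
                        np.full(len(y_3), (1 - coef) * N / len(y_3))])
    return gini_from_measure(y, w)

def delta_gini_composite(y1_s1dot, y1_s3, y2_s3, y2_s2dot, N, a, b):
    """Plug-in estimator (3.21) of the change between Gini indexes."""
    return (composite_gini(y2_s2dot, y2_s3, N, b)
            - composite_gini(y1_s1dot, y1_s3, N, a))

# Made-up data, for illustration only.
rng = np.random.default_rng(5)
y1_s3 = rng.lognormal(size=40)
y2_s3 = y1_s3 * rng.uniform(0.8, 1.3, size=40)
y1_s1dot = rng.lognormal(size=60)
y2_s2dot = rng.lognormal(size=60)
dg_int = delta_gini_composite(y1_s1dot, y1_s3, y2_s3, y2_s2dot, 5000, 0.0, 0.0)
dg_uni = delta_gini_composite(y1_s1dot, y1_s3, y2_s3, y2_s2dot, 5000, 0.6, 0.6)
```

With $a = b = 0$ the non-overlapping units receive zero weight, so the estimate depends on $s_{3}$ only. Note that, with the $\leq$ convention of (3.20), this plug-in index depends on how ties are handled; the sketch follows the formula as written.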

Usually, in a temporal sampling framework, the samples $s_{1}$ and $s_{2}$ are not independent. Consequently, our set-up differs from the usual estimation of functionals depending on distribution functions estimated with independent samples; see for example Pires and Branco (2002) and Reid (1981), who give the first-order expansion of a two-sample functional using the partial influence functions. Davison and Hinkley (1997, page 71) give bootstrap methods under a similar framework. Using a general two-dimensional sampling design $p (\cdot, \cdot),$ Goga, Deville and Ruiz-Gazen (2009) give a two-sample linearization technique of bivariate functionals that will be used in what follows.

3.3 Linearization variance estimation

To obtain the asymptotic variance of ${\hat{Δ θ}}^{co} (a, b),$ we adopt the asymptotic framework introduced by Goga, Deville and Ruiz-Gazen (2009), which is an extension to the two-sample case of the asymptotic framework of Isaki and Fuller (1982). Define, when they exist, the partial influence functions of a functional $T (M_{1}, M_{2})$ at point $y$ as

$\begin{array}{rl} I_{1} T (M_{1} , M_{2} ; y) & = \lim_{h \to 0} \frac{T (M_{1} + h δ_{y} , M_{2}) - T (M_{1} , M_{2})}{h}, \\ I_{2} T (M_{1} , M_{2} ; y) & = \lim_{h \to 0} \frac{T (M_{1} , M_{2} + h δ_{y}) - T (M_{1} , M_{2})}{h} . \end{array}$

We define the linearized variables $u_{d k} = I_{d} T (M_{1} , M_{2} ; y_{d k})$ for $d =1, 2$ as the partial influence functions of $T$ at $(M_{1} , M_{2})$ and $y = y_{d k} .$ For the change between Gini indexes $Δ G,$ the linearized variables $u_{d k}$ may be computed using (2.10), namely

$u_{d k} = 2 F_{d N} (y_{d k}) \frac{y_{d k} - {\bar{y}}_{d k , U <}}{t_{y_{d}}} - y_{d k} \frac{G_{d} + 1}{t_{y_{d}}} + \frac{1 - G_{d}}{N}, \qquad (3.22)$

where ${\bar{y}}_{d k , U <} = {\left( \sum_{l \in U} 1_{\{y_{d l} < y_{d k}\}} \right)}^{- 1} \sum_{j \in U} y_{d j} 1_{\{y_{d j} < y_{d k}\}} .$ The estimated linearized variable is

${\hat{u}}_{d k} = 2 {\hat{F}}_{d N}^{co} (y_{d k}) \frac{y_{d k} - {\bar{y}}_{d k , s <}^{co}}{{\hat{t}}_{y_{d}}^{co}} - y_{d k} \frac{{\hat{G}}_{d}^{co} + 1}{{\hat{t}}_{y_{d}}^{co}} + \frac{1 - {\hat{G}}_{d}^{co}}{\hat{N}} . \qquad (3.23)$
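A sketch of the estimated linearized variable for one wave is given below, for the equal-weight case (for instance the overlapping sample $s_{3}$ under SI2, with weights $N / n$ and $\hat{N} = N$). The function name `gini_linearized` is hypothetical. Two assumptions are made that the text does not settle: the total of the wave being linearized is used in both denominators, and the mean of the values strictly below the smallest observation, which is an empty mean, is replaced by the observation itself so that the first term vanishes for that unit.

```python
import numpy as np

def gini_linearized(y, N=None):
    """Estimated linearized variable of the Gini index, equal weights N / n.

    y holds the values of one wave on the sample used for estimation;
    N is the population size (defaults to n, i.e. the census case).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    N = n if N is None else N
    order = np.argsort(y, kind="stable")
    ys = y[order]
    n_le = np.searchsorted(ys, ys, side="right")   # units with value <= y_k
    n_lt = np.searchsorted(ys, ys, side="left")    # units with value <  y_k
    F = n_le / n                                   # estimated distribution function
    cum = np.concatenate([[0.0], np.cumsum(ys)])
    # Mean of the values strictly below y_k; ASSUMPTION: when no such value
    # exists the mean is set to y_k, which makes the first term equal to zero.
    mean_lt = np.where(n_lt > 0, cum[n_lt] / np.maximum(n_lt, 1), ys)
    t_hat = N * ys.mean()                          # estimated total
    G = np.sum((2 * F - 1) * ys) / np.sum(ys)      # plug-in Gini index
    u_sorted = (2 * F * (ys - mean_lt) / t_hat
                - ys * (G + 1) / t_hat
                + (1 - G) / N)
    u = np.empty(n)
    u[order] = u_sorted                            # back to the input order
    return u

# Made-up data, for illustration only.
rng = np.random.default_rng(3)
y = rng.lognormal(size=500)
u = gini_linearized(y, N=20_000)
weighted_sum = (20_000 / 500) * u.sum()
```

Because the Gini index is unchanged when the measure is multiplied by a constant, the weighted sum of the linearized variable should be close to zero; for positive, distinct values and the convention above it lies between $- 2 / n$ and $0 ,$ which gives a quick sanity check.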

3.3.1 Two-dimensional SI design

For the SI2 design presented in Section 3.1.1, plugging the variables $u_{d k}$ derived in (3.22) into the variance formula in (3.6) yields the variance approximation

$V \{ {\hat{Δ G}}^{co} (a, b) \} ≃ N^{2} \{ c_{1} (a) S_{u_{1} , U}^{2} - 2 c_{12} (a, b) S_{u_{1} u_{2} , U} + c_{2} (b) S_{u_{2} , U}^{2} \};$

see Theorem 1 in Goga, Deville and Ruiz-Gazen (2009). To obtain a variance estimator, the linearized variables may be estimated in several ways. If only the overlapping sample $s_{3}$ is used, the estimated linearized variables ${\hat{u}}_{d}$ are obtained from (3.23) by taking ${\hat{M}}_{1}^{co} = {\hat{M}}_{1, 3}$ and ${\hat{M}}_{2}^{co} = {\hat{M}}_{2, 3} .$ A variance estimator is then obtained by plugging these linearized variables into (3.13). This leads to

$v_{int}^{HT} \{ {\hat{Δ G}}^{co} (a, b) \} = N^{2} \{ c_{1} (a) S_{{\hat{u}}_{1} , s_{3}}^{2} - 2 c_{12} (a, b) S_{{\hat{u}}_{1} {\hat{u}}_{2} , s_{3}} + c_{2} (b) S_{{\hat{u}}_{2} , s_{3}}^{2} \} . \qquad (3.24)$

If the whole samples $s_{1}$ and $s_{2}$ are used, the estimated linearized variables ${\hat{u}}_{d}$ are obtained from (3.23) by taking ${\hat{M}}_{1}^{co} = {\hat{M}}_{1,1}$ and ${\hat{M}}_{2}^{co} = {\hat{M}}_{2,2} .$ A variance estimator is then obtained by plugging these linearized variables into (3.14). This leads to

$v_{uni}^{HT} \{ {\hat{Δ G}}^{co} (a, b) \} = N^{2} \{ c_{1} (a) S_{{\hat{u}}_{1} , s_{1}}^{2} - 2 c_{12} (a, b) S_{{\hat{u}}_{1} {\hat{u}}_{2} , s_{3}} + c_{2} (b) S_{{\hat{u}}_{2} , s_{2}}^{2} \} . \qquad (3.25)$

3.3.2 Two-dimensional multistage design

For the MULT2 design presented in Section 3.1.2, the linearized variables may also be estimated in several ways. For the sake of simplicity, we consider using only the overlapping sample $s_{3} ,$ so that the estimated linearized variables ${\hat{u}}_{d}$ are obtained from (3.23) by taking ${\hat{M}}_{1}^{co} = {\hat{M}}_{1,3}$ and ${\hat{M}}_{2}^{co} = {\hat{M}}_{2,3} .$ A variance estimator is then obtained by plugging these linearized variables into (3.19). This leads to

$v^{HH} \{ {\hat{Δ G}}^{co} (a, b) \} = \frac{m}{m - 1} \sum_{i \in s_{I}} {\left( \frac{{\hat{Δ u}}^{i , co} (a, b)}{π_{I i}} - \frac{{\hat{Δ u}}^{co} (a, b)}{m} \right)}^{2}, \qquad (3.26)$

where ${\hat{Δ u}}^{co} (a, b)$ and ${\hat{Δ u}}^{i , co} (a, b)$ are obtained from (3.15) and (3.16), respectively, by replacing $y_{d k}$ with ${\hat{u}}_{d k} .$

3.4 Bootstrap variance estimation

Bootstrap methods have not yet been studied for the change between Gini indexes. The principles of the weighted bootstrap technique can be extended to the two-sample context: each measure ${\hat{M}}_{d , ⋄}$ with $d =1, 2$ and $⋄ \in \{1 •, 3, 2 •\}$ is estimated, conditionally on the samples originally selected, by some weighted bootstrap measure ${\hat{M}}_{d , ⋄}^{*}$ which makes it possible to match, at least approximately, the first two moments of an unbiased estimator in the linear case. In Section 3.4.1, we consider a generalization of the BWO to the SI2 design. In Section 3.4.2, we propose a generalization of the BWR to the MULT2 design.

3.4.1 A generalization of the BWO to the SI2 design

We first consider the SI2 design. Building a pseudo-population $U^{*}$ is more intricate in the two-sample case, since the variables of interest measured at waves $τ_{1}$ and $τ_{2}$ need to be available for each unit in $U^{*} .$ We therefore describe a bootstrap algorithm where only the overlapping sample $s_{3}$ is used to build the pseudo-population $U^{*} ,$ in the spirit of the intersection variance estimator in (3.24).

Suppose that $N / n_{3}$ is an integer. The vectors $D_{⋄}$ are obtained as follows. First, a pseudo-population $U^{*}$ of size $N$ is created by duplicating $N / n_{3}$ times each unit $k$ of the original sample $s_{3} .$ An SI2 resample $s^{*} = (s_{1 •}^{*} , s_{3}^{*} , s_{2 •}^{*})$ of size $(n_{1 •} , n_{3} , n_{2 •})$ is then selected in $U^{*} .$ The bootstrap measures are then

${\hat{M}}_{d , ⋄}^{*} = \sum_{k \in s_{3}} w_{⋄ , k} D_{⋄ , k} δ_{y_{d k}} , \qquad (3.27)$

with $D_{⋄ , k}$ the number of times that unit $k$ is selected in the resample $s_{⋄}^{*} .$ In the linear case, the bootstrap estimator of the parameter $Δ t$ is then

${\hat{Δ t}}^{co*} (a, b) = b ({\hat{t}}_{y_{2} , s_{2 •}^{*}} - {\hat{t}}_{y_{2} , s_{3}^{*}}) - a ({\hat{t}}_{y_{1} , s_{1 •}^{*}} - {\hat{t}}_{y_{1} , s_{3}^{*}}) + ({\hat{t}}_{y_{2} , s_{3}^{*}} - {\hat{t}}_{y_{1} , s_{3}^{*}}), \qquad (3.28)$

where ${\hat{t}}_{y_{d} , s_{⋄}^{*}} = \sum_{k \in s_{3}} w_{⋄ , k} D_{⋄ , k} y_{d k} .$ After some algebra, we obtain

$E_{*} \{ {\hat{Δ t}}^{co*} (a, b) \} = {\hat{Δ t}}^{int} \quad \text{and} \quad V_{*} \{ {\hat{Δ t}}^{co*} (a, b) \} = \frac{1 - n_{3}^{- 1}}{1 - N^{- 1}} v_{int}^{HT} \{ {\hat{Δ t}}^{co} (a, b) \}, \qquad (3.29)$

where ${\hat{Δ t}}^{int}$ is given in (3.7), and $v_{int}^{HT} \{ {\hat{Δ t}}^{co} (a, b) \}$ is given in (3.13). The proposed generalization of the BWO therefore matches the intersection estimator of the first moment exactly, and approximately matches the intersection estimator of the second moment when $n_{3}$ is large.

The building of $U^{*}$ may be avoided by noting that under the BWO procedure, each vector $D_{⋄}$ follows a multivariate hypergeometric distribution. Therefore, the resampling weights may be directly generated. The algorithm may be adapted to the general case when $N / n_{3}$ is not an integer by means of any of the techniques mentioned in Section 2.4.
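The procedure of this section can be sketched as follows for the case where $N / n_{3}$ is an integer. For transparency the sketch builds $U^{*}$ explicitly rather than generating the counts directly; the function names `bwo_si2_counts` and `bootstrap_delta_t`, and the way the SI2 resample is drawn (one without-replacement draw of $n_{1 •} + n_{3} + n_{2 •}$ pseudo-units in random order, split into three consecutive parts), are assumptions of this illustration rather than the authors' implementation.

```python
import numpy as np

def bwo_si2_counts(N, n1_dot, n3, n2_dot, rng):
    """Draw one SI2 resample in U* and return the counts (D_1dot, D_3, D_2dot).

    Each returned vector has length n3: entry k is the number of copies of
    unit k of s_3 selected in the corresponding part of the resample.
    """
    if N % n3 != 0:
        raise ValueError("this sketch assumes that N / n3 is an integer")
    labels = np.repeat(np.arange(n3), N // n3)        # pseudo-population U*
    drawn = rng.choice(labels, size=n1_dot + n3 + n2_dot, replace=False)
    parts = np.split(drawn, [n1_dot, n1_dot + n3])    # s*_1dot, s*_3, s*_2dot
    return tuple(np.bincount(p, minlength=n3) for p in parts)

def bootstrap_delta_t(y1_s3, y2_s3, N, n1_dot, n2_dot, a, b, rng):
    """One bootstrap replicate (3.28) of the composite estimator of Delta t."""
    y1 = np.asarray(y1_s3, dtype=float)
    y2 = np.asarray(y2_s3, dtype=float)
    n3 = y1.size
    D1, D3, D2 = bwo_si2_counts(N, n1_dot, n3, n2_dot, rng)

    def total(D, n, y):
        # Bootstrap total with weights w = N / n, as in the text after (3.28).
        return N / n * np.sum(D * y)

    return (b * (total(D2, n2_dot, y2) - total(D3, n3, y2))
            - a * (total(D1, n1_dot, y1) - total(D3, n3, y1))
            + total(D3, n3, y2) - total(D3, n3, y1))

# Made-up data, for illustration only.
rng = np.random.default_rng(0)
y1_s3 = np.array([3.0, 1.0, 4.0, 1.5, 5.0, 9.0, 2.0, 6.0, 5.5, 3.5])
y2_s3 = y1_s3 + np.array([1.0, 3.0, 2.0, 5.0, 4.0, 1.0, 0.0, 2.0, 6.0, 3.0])
reps = np.array([bootstrap_delta_t(y1_s3, y2_s3, 50, 5, 5, 0.0, 0.0, rng)
                 for _ in range(4000)])
```

The Monte Carlo mean and variance of `reps` can then be compared with the two moments in (3.29). For a nonlinear parameter such as $Δ G,$ the same counts would multiply the weights in (3.27) before the plug-in estimate is recomputed.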

3.4.2 A generalization of the BWR for the two-dimensional multistage design

We now consider the two-dimensional two-stage sampling design with a common first-stage sample $s_{I}$ presented in Section 3.1.2. The proposed bootstrap procedure is similar to that described in Rao and Wu (1988). A with-replacement resample $s_{I}^{*}$ of size $m - 1$ is selected by means of simple random sampling with replacement (SIR) in the original first-stage sample $s_{I} .$ The bootstrap measures are then

${\hat{M}}_{d , ⋄}^{*} = \frac{m}{m - 1} \sum_{i \in s_{I}^{*}} \sum_{k \in s_{⋄}^{i}} π_{I i}^{- 1} π_{⋄ k | i}^{- 1} δ_{y_{d k}} \quad \text{where} \quad π_{⋄ k | i} = \frac{n_{⋄}^{i}}{N_{i}} . \qquad (3.30)$

It may be rewritten as

${\hat{M}}_{d , ⋄}^{*} = \sum_{k \in s_{⋄}} w_{⋄ , k} D_{⋄ , k} δ_{y_{d k}} , \qquad (3.31)$

with $s_{⋄}$ the union of the samples $s_{⋄}^{i}$ for $i \in s_{I} ,$ and where the resampling weight $D_{⋄ , k}$ equals $m {(m - 1)}^{- 1}$ multiplied by the number of times the PSU containing $k$ is selected in $s_{I}^{*} .$

In the linear case, the bootstrap estimator of the parameter $Δ t$ is then

${\hat{Δ t}}^{co*} (a, b) = \frac{m}{m - 1} \sum_{i \in s_{I}^{*}} π_{I i}^{- 1} {\hat{Δ t}}^{i , co} (a, b), \qquad (3.32)$

where ${\hat{Δ t}}^{i , co} (a, b)$ is defined in (3.16). After some algebra, we obtain

$E_{*} \{ {\hat{Δ t}}^{co*} (a, b) \} = {\hat{Δ t}}^{co} (a, b) \quad \text{and} \quad V_{*} \{ {\hat{Δ t}}^{co*} (a, b) \} = v^{HH} \{ {\hat{Δ t}}^{co} (a, b) \}, \qquad (3.33)$

where ${\hat{Δ t}}^{co} (a, b)$ is given in (3.15), and $v^{HH} \{ {\hat{Δ t}}^{co} (a, b) \}$ is given in (3.19). The proposed generalization of the BWR therefore matches exactly both the composite estimator of the first moment and the associated estimator of the second moment.
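The resampling step of this section can be sketched as follows, working directly with the PSU-level quantities $z_{i} = π_{I i}^{- 1} {\hat{Δ t}}^{i , co} (a, b) .$ The function names `bwr_weights` and `bwr_replicates` and the numerical inputs are hypothetical; the multiplicities of the $m$ PSUs in a with-replacement resample of size $m - 1$ are generated as a multinomial vector with equal probabilities.

```python
import numpy as np

def bwr_weights(m, rng):
    """PSU-level resampling weights: m / (m - 1) times the number of times
    each PSU of s_I is drawn in a with-replacement resample of size m - 1."""
    counts = rng.multinomial(m - 1, np.full(m, 1.0 / m))
    return m / (m - 1) * counts

def bwr_replicates(z, B, rng):
    """B bootstrap replicates (3.32), with z[i] the PSU-level contribution
    of the i-th drawn PSU to the composite estimator (3.15)."""
    z = np.asarray(z, dtype=float)
    m = z.size
    return np.array([np.sum(bwr_weights(m, rng) * z) for _ in range(B)])

# Made-up PSU-level contributions, for illustration only.
rng = np.random.default_rng(11)
z = np.array([120.0, -70.0, 102.5, 34.2, 11.7, 70.0, 55.1, -8.4])
reps = bwr_replicates(z, B=5000, rng=rng)

m = z.size
dt_co = z.sum()                                          # estimator (3.15)
v_hh = m / (m - 1) * np.sum((z - dt_co / m) ** 2)        # estimator (3.19)
```

The mean and variance of `reps` approximate the two bootstrap moments in (3.33), up to Monte Carlo error. Every SSU inherits the weight of its PSU, as in (3.31), so for $Δ G$ the same PSU-level weights would be applied to the measures before recomputing the plug-in estimate.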

ISSN : 1492-0921

Editorial policy

Survey Methodology publishes articles dealing with various aspects of statistical development relevant to a statistical agency, such as design issues in the context of practical constraints, use of different data sources and collection techniques, total survey error, survey evaluation, research in survey methodology, time series analysis, seasonal adjustment, demographic studies, data integration, estimation and data analysis methods, and general survey systems development. The emphasis is placed on the development and evaluation of specific methodologies as applied to data collection or the data themselves. All papers will be refereed. However, the authors retain full responsibility for the contents of their papers and opinions expressed are not necessarily those of the Editorial Board or of Statistics Canada.

Submission of Manuscripts

Survey Methodology is published twice a year in electronic format. Authors are invited to submit their articles in English or French in electronic form, preferably in Word to the Editor, (statcan.smj-rte.statcan@canada.ca, Statistics Canada, 150 Tunney’s Pasture Driveway, Ottawa, Ontario, Canada, K1A 0T6). For formatting instructions, please see the guidelines provided in the journal and on the web site (www.statcan.gc.ca/SurveyMethodology).

Note of appreciation

Canada owes the success of its statistical system to a long-standing partnership between Statistics Canada, the citizens of Canada, its businesses, governments and other institutions. Accurate and timely statistical information could not be produced without their continued co-operation and goodwill.

Standards of service to the public

Statistics Canada is committed to serving its clients in a prompt, reliable and courteous manner. To this end, the Agency has developed standards of service which its employees observe in serving its clients.

Copyright

Published by authority of the Minister responsible for Statistics Canada.

Use of this publication is governed by the Statistics Canada Open Licence Agreement.

Catalogue No. 12-001-X

Frequency: Semi-annual

Ottawa

Date modified: 2018-06-21
