Comparison of the conditional bias and Kokic and Bell methods for Poisson and stratified sampling
Section 4. Comparison of winsorization and conditional bias

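In the previous section, we presented two types of methods for processing influential units in survey data: winsorization according to the Kokic and Bell method, together with its extension to Poisson sampling, and the conditional bias method of Beaumont, Haziza and Ruiz-Gazen.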

To compare the efficiency of these two methods, we performed two exercises:

  1. simulations under Poisson sampling;
  2. a comparison on real data from the French labour cost and structure of earnings survey (ECMOSS).

4.1  Simulations under Poisson sampling

We carried out a simulation study to examine the properties of the two robust estimators proposed under Poisson sampling. We considered four scenarios. Their purpose was to compare the efficiency of the two estimators and, for the Kokic and Bell estimator, to study its robustness to model misspecification, i.e., to a discrepancy between the learning model (used to calculate the winsorization threshold) and the model that generated the sample data.

The simulation proceeds as follows:

The inclusion probabilities and the values of the variable $X$ were generated according to the following model:

$$U_i \sim \text{Log-}\mathcal{N}(1;\ 1.1),$$

$$\pi_i = n \times \frac{U_i}{\sum_{j=1}^{N} U_j},$$

$$X_i = 2{,}000 \times \pi_i + \pi_i \epsilon_i + \delta_i V_i,$$

$$\epsilon_i \sim \mathcal{N}(0;\ 100), \quad V_i \sim \text{Log-}\mathcal{N}(\log(500);\ 1.2), \quad \delta_i \sim \mathcal{B}(\omega),$$

where $N$ is the population size, $n = \sum_{i=1}^{N} \pi_i$ is the expected sample size, and $\omega$ is the Bernoulli parameter. The parameter $\omega$ governs the proportion of influential units, and its values are given in Table 4.1. The notation $\text{Log-}\mathcal{N}$ denotes a log-normal distribution.
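A minimal Python sketch of one draw from this population model is given below. The text does not specify the parameterization of $\text{Log-}\mathcal{N}$ or of $\mathcal{N}(0;\ 100)$. The sketch therefore assumes that the two parameters of $\text{Log-}\mathcal{N}$ are the mean and standard deviation of the underlying normal distribution, and that 100 is a variance. The text also does not say how inclusion probabilities above 1 are handled, so none are truncated here. The function name `generate_population` is hypothetical.

```python
import numpy as np


def generate_population(N, n, omega, rng):
    """Draw one population (pi_i, X_i, delta_i) from the simulation model.

    Assumptions not stated in the text: Log-N(a; b) has underlying normal
    mean a and standard deviation b, and N(0; 100) has variance 100.
    """
    u = rng.lognormal(mean=1.0, sigma=1.1, size=N)              # U_i
    pi = n * u / u.sum()                                        # pi_i, sums to n
    eps = rng.normal(loc=0.0, scale=np.sqrt(100.0), size=N)     # epsilon_i
    v = rng.lognormal(mean=np.log(500.0), sigma=1.2, size=N)    # V_i
    delta = rng.binomial(1, omega, size=N)                      # delta_i ~ B(omega)
    x = 2000.0 * pi + pi * eps + delta * v                      # X_i
    # pi_i may exceed 1 for very large U_i; no truncation is applied here.
    return pi, x, delta


rng = np.random.default_rng(2024)
pi, x, delta = generate_population(N=10_000, n=500, omega=0.01, rng=rng)
```

Setting `omega` to the values of Table 4.1 separately for the learning population and for the population from which samples are drawn reproduces the structure of the four scenarios.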

Table 4.1
Values of parameter $\omega$ used to generate the populations
Scenario   Learning model   Test model
1          0                0
2          0.01             0.01
3          0.01             0.1
4          0.1              0.01

Scenario 1 corresponds to the population model for which the extension of the Kokic and Bell method to the Poisson case was developed, with $H = 1$, but in which no or very few units are influential (the parameter $\omega$ being set to 0). Scenario 2 corresponds to a situation in which the same model applies but a small proportion (1%) of units are influential. In scenarios 1 and 2, the model is identical in the population used to calculate the threshold and in the sample to which the threshold is applied.

In scenarios 3 and 4, the underlying model is the same in the learning population and in the sample, but the proportion of influential units differs between the two. In scenario 3, the learning population contains ten times fewer influential units than the sample. Scenario 4 is the reverse case.

To measure the bias of an estimator $\hat\theta$ of a total $T$, we calculated its relative Monte Carlo bias, in percent:

$$\mathrm{BR}_{MC}(\hat\theta) = \frac{\frac{1}{M}\sum_{m=1}^{M}\left(\hat\theta_{(m)} - T\right)}{T} \times 100,$$

where $\hat\theta_{(m)}$ is the value of the estimator $\hat\theta$ in sample $m$, $m = 1, \ldots, M$.

We also calculated the relative efficiency (RE) of the robust estimators with respect to the expansion (Horvitz-Thompson) estimator $\hat t$:

$$\mathrm{RE}_{MC}(\hat\theta) = \frac{\frac{1}{M}\sum_{m=1}^{M}\left(\hat\theta_{(m)} - T\right)^2}{\frac{1}{M}\sum_{m=1}^{M}\left(\hat t_{(m)} - T\right)^2} \times 100.$$
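The two Monte Carlo criteria can be computed as in the following Python sketch. It also includes a Horvitz-Thompson estimator under Poisson sampling, which produces the replicates $\hat t_{(m)}$. The function names are hypothetical, and the robust estimators themselves are not implemented here.

```python
import numpy as np


def poisson_ht_total(x, pi, rng):
    """Draw a Poisson sample (independent Bernoulli(pi_i) selections) and return
    the expansion (Horvitz-Thompson) estimator of the total of x.
    Assumes 0 < pi_i <= 1 for every unit."""
    selected = rng.random(x.size) < pi
    return np.sum(x[selected] / pi[selected])


def relative_bias(estimates, total):
    """BR_MC: relative Monte Carlo bias over the M replicates, in percent."""
    estimates = np.asarray(estimates, dtype=float)
    return (estimates.mean() - total) / total * 100.0


def relative_efficiency(robust, expansion, total):
    """RE_MC: Monte Carlo MSE of a robust estimator divided by that of the
    expansion estimator, in percent (below 100 means a gain in precision)."""
    robust = np.asarray(robust, dtype=float)
    expansion = np.asarray(expansion, dtype=float)
    mse_robust = np.mean((robust - total) ** 2)
    mse_expansion = np.mean((expansion - total) ** 2)
    return mse_robust / mse_expansion * 100.0
```

For example, take $T = 100$, with replicates (95, 105) for a robust estimator and (90, 110) for the expansion estimator. These give a BR of 0 and an RE of 25%.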

Tables 4.2 and 4.3 present descriptive statistics of the $L = 1{,}000$ Monte Carlo values of BR and RE obtained in each scenario. Each value depends on the learning population considered.

Table 4.2
Descriptive statistics for scenarios 1 and 2 of the 1,000 simulations for (equation)
               Scenario 1                       Scenario 2
               K&B            BHR               K&B            BHR
Statistic      BR      RE     BR      RE        BR      RE     BR      RE
Min.           -0.2    100    -0.43   100       -9.0    1      -4.3    26
Q1             -0.1    100    -0.32   100       -2.9    35     -1.9    51
Median         0.0     100    -0.27   100       -1.8    50     -1.5    62
Mean           0.0     100    -0.27   100       -2.0    50     -1.6    62
Q3             0.0     100    -0.23   100       -1.0    64     -1.3    73
Max.           0.0     100    -0.14   100       -0.1    109    -0.6    91
K&B: Kokic and Bell; BHR: conditional bias method of Beaumont, Haziza and Ruiz-Gazen; BR: relative bias (%); RE: relative efficiency (%).

Scenario 1 corresponds to a situation in which no or very few influential units are present in the population. The robust estimators therefore perform essentially like the usual Horvitz-Thompson estimator, with a relative efficiency of 100% and a relative bias very close to 0. Scenario 2 corresponds to the situation for which the extension of the Kokic and Bell method to the Poisson case was developed, with influential units introduced. Both robust estimators are more efficient than the usual estimator, but the Kokic and Bell estimator achieves the larger reduction in mean square error. Its median relative efficiency over the 1,000 simulations is 50%, compared with 62% for the conditional bias method. This result is expected, since the threshold of the Kokic and Bell method is explicitly chosen to obtain the estimator with the smallest mean square error.

Table 4.3
Descriptive statistics for scenarios 3 and 4 of the 1,000 simulations for (equation)
               Scenario 3                       Scenario 4
               K&B            BHR               K&B            BHR
Statistic      BR      RE     BR      RE        BR      RE     BR      RE
Min.           -32.2   2      -7.8    27        -4.5    1      -4.3    26
Q1             -18.9   50     -5.1    59        -1.8    48     -1.9    51
Median         -13.9   82     -4.6    66        -1.5    70     -1.5    62
Mean           -14.2   89     -4.7    65        -1.5    68     -1.6    62
Q3             -9.3    138    -4.2    72        -1.2    91     -1.3    73
Max.           -0.01   537    -2.7    89        -0.6    100    -0.6    91
K&B: Kokic and Bell; BHR: conditional bias method of Beaumont, Haziza and Ruiz-Gazen; BR: relative bias (%); RE: relative efficiency (%).

The results of the two methods in scenario 3 differ more sharply. The conditional bias method reduces the mean square error in every simulation, with a relative efficiency ranging from 27% to 89%. The Kokic and Bell method, by contrast, deteriorates precision in more than a quarter of cases (the third quartile of its relative efficiency is 138%). It also has a much larger relative bias (median of -13.9%, compared with -4.6%). In this scenario, the population on which the threshold was calculated contains too few influential units relative to the sample for the resulting threshold to be effective.

In scenario 4, where the learning population contains more influential units than the sample, the two methods perform similarly. Their median relative efficiencies are 70% for Kokic and Bell and 62% for the conditional bias method.

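These simulations therefore show two things. When the learning model matches the model that generated the sample, the Kokic and Bell method yields the larger gains in precision. However, it is sensitive to a misspecified learning model, especially when the learning population contains fewer influential units than the sample. The conditional bias method, in contrast, did not deteriorate precision in any of the four scenarios.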

4.2  Application to the Survey on labour costs and wage structure

4.2.1  Presentation of the survey

The Survey on labour cost and structure of earnings (ECMOSS) is conducted every year by INSEE and is harmonized at the European level. It serves to meet European regulations on the production of statistics on labour costs and the structure of earnings. These statistics support comparisons of working time and labour costs across European countries.

ECMOSS is a survey of local business units (or establishments). It covers businesses with 10 or more employees in all sectors, both market and non-market. The exceptions are agriculture, state administrations and certain activities (extraterritorial activities, embassies, consulates, activities of individuals acting as employers). It covers establishments located in metropolitan France and in the overseas departments. Each sampled establishment answers two questionnaires. In the first, it provides aggregated information on its workforce, on its payroll and its breakdown into main components (basic wages, bonuses, social contributions paid by the employer and by employees, etc.), and on the number of hours worked by its employees. In the second, it provides these details for a randomly selected sample of its employees.

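Given this survey method, the ECMOSS sample design has two stages: a sample of establishments is selected first, and a sample of employees is then selected within each sampled establishment.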

Each year, some establishments do not respond to the survey, and responding establishments do not always provide information for all of their employees. There is therefore total non-response at each stage, which is handled by reweighting using the response homogeneity group method. The final sample of responding employees, on which most operations are performed, is then calibrated to the population of employees in the files of the social security organizations.
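The reweighting step can be sketched as follows for one stage. The sketch assumes that, within each response homogeneity group, the response probability is estimated by the weighted response rate. An unweighted rate is another common choice, and the text does not say which is used. The function and argument names are hypothetical.

```python
import numpy as np


def rhg_adjust(design_weights, responded, groups):
    """Non-response adjustment by response homogeneity groups (RHG).

    Within each group, the response probability is estimated by the weighted
    response rate, and respondents' weights are divided by it.
    Nonrespondents receive a weight of 0.
    """
    d = np.asarray(design_weights, dtype=float)
    r = np.asarray(responded, dtype=bool)
    g = np.asarray(groups)
    adjusted = np.zeros_like(d)
    for label in np.unique(g):
        in_group = g == label
        resp = in_group & r
        if not resp.any():
            continue  # no respondent: in practice the group would be collapsed
        rate = d[resp].sum() / d[in_group].sum()  # estimated response probability
        adjusted[resp] = d[resp] / rate
    return adjusted
```

By construction, the adjusted weights of the respondents in each group add up to the sum of the design weights of all sampled units in that group.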

In the end, the sample of employees is obtained through a complex design with two selection stages (establishments and employees) and two phases at each stage.

Establishments and their wage policies vary greatly, both in average wage levels across establishments and in the dispersion of wages within establishments. As a result, the sampling weights of the sampled employees are widely dispersed, and the accuracy of the estimators is sensitive to influential units in the sample. Examples include a very highly paid executive in a large business or athletes employed by a top-level sports club.

4.2.2  Parameter of interest

The main parameter of interest in the survey is the average hourly wage, estimated in various dissemination domains: sectors, sectors crossed with the employment size class of the business, and sectors crossed with the region in which the establishment is located. The estimators used below are ratios of the expansion estimator of total remuneration to the expansion estimator of the total number of hours:

$$\hat R(D) = \frac{\sum_{i \in S \cap D} w_i e_i}{\sum_{i \in S \cap D} w_i h_i} \qquad (4.1)$$

where:

- $S$ is the sample of employees and $D$ the domain of interest;
- $e_i$ is the annual remuneration of employee $i$ and $h_i$ their annual number of hours worked;
- $w_i$ is the employee's estimation weight, equal to the inverse of the product of the selection probabilities and the response probabilities associated with each stage and phase of the sample design.

Estimator (4.1) differs from the estimator used in practice. It uses the initial weights corrected for non-response, whereas the estimator used in practice uses the calibrated weights. Calibration was not taken into account in this article. It could have been handled with the classical residual technique, but at the cost of an additional degree of complexity that we deemed unnecessary for comparing the two robust estimation methods.
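As a sketch, the estimation weight and estimator (4.1) can be computed as below. Each row of `probs` is assumed to hold the selection and (estimated) response probabilities of one employee for every stage and phase of the design. This array layout and the function names are assumptions made for illustration.

```python
import numpy as np


def estimation_weights(probs):
    """w_i = 1 / (product of the selection and response probabilities of unit i
    over all stages and phases); probs has shape (n_units, n_factors)."""
    return 1.0 / np.prod(np.asarray(probs, dtype=float), axis=1)


def hourly_wage_ratio(w, e, h, in_domain):
    """Estimator (4.1): weighted total of annual remuneration e_i divided by the
    weighted total of annual hours h_i, over the sampled employees in D."""
    w, e, h = (np.asarray(a, dtype=float) for a in (w, e, h))
    d = np.asarray(in_domain, dtype=bool)
    return np.sum(w[d] * e[d]) / np.sum(w[d] * h[d])
```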

4.2.3  How to adapt the processing methods for influential units to the ECMOSS sampling design

Estimator (4.1) is not an expansion estimator of a total, which is the setting for which the previously described methods were designed. The problem can nevertheless be recast in the framework of these two methods.

Indeed, consider the estimated linearized variable of $\hat R(D)$:

$$\hat L_i[\hat R(D)] = \frac{e_i - \hat R(D)\, h_i}{\sum_{j \in S \cap D} w_j h_j}\, \mathbb{1}(i \in D).$$

An unbiased estimator of the variance of $\sum_{i \in S} w_i \hat L_i[\hat R(D)]$ is also an asymptotically unbiased estimator of $V(\hat R(D))$. Consequently, an estimator of the total of the linearized variable $\hat L_i[\hat R(D)]$ that is robust to influential units also yields a version of $\hat R(D)$ that is robust to influential units. Each method, applied to the estimated linearized variable, produces a winsorized value of this variable, denoted $\hat L_i^w[\hat R(D)]$. The effect of the processing of influential units is then carried over to all other variables of interest of the survey through the estimation weight, by calculating a winsorized estimation weight:

$$w_i^w = w_i \frac{\hat L_i^w[\hat R(D)]}{\hat L_i[\hat R(D)]}.$$
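The linearized variable and the winsorized weight can be sketched in Python as follows. The winsorized values $\hat L_i^w[\hat R(D)]$ are taken as an input, since they come from whichever robust method is applied. Where $\hat L_i[\hat R(D)] = 0$ (for instance, for employees outside $D$), the ratio is undefined. The sketch assumes the weight is then left unchanged. Function names are hypothetical.

```python
import numpy as np


def linearized_ratio(w, e, h, in_domain):
    """Estimated linearized variable of R_hat(D):
    L_i = (e_i - R_hat(D) * h_i) / sum_{j in S and D} w_j h_j if i is in D, else 0."""
    w, e, h = (np.asarray(a, dtype=float) for a in (w, e, h))
    d = np.asarray(in_domain, dtype=bool)
    denom = np.sum(w[d] * h[d])
    r_hat = np.sum(w[d] * e[d]) / denom
    return np.where(d, (e - r_hat * h) / denom, 0.0)


def winsorized_weights(w, lin, lin_w):
    """w_i^w = w_i * lin_w_i / lin_i; the weight is kept where lin_i == 0."""
    w, lin, lin_w = (np.asarray(a, dtype=float) for a in (w, lin, lin_w))
    ratio = np.divide(lin_w, lin, out=np.ones_like(lin), where=lin != 0)
    return w * ratio
```

A useful check is that $\sum_{i \in S} w_i \hat L_i[\hat R(D)] = 0$ by construction, since $\hat R(D)$ is the ratio of the two weighted totals.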

We thus test the Kokic and Bell method and the method of Beaumont, Haziza and Ruiz-Gazen for estimating the total of $\hat L_i[\hat R(D)]$. Each of the two methods, however, requires adaptations before it can be applied to the sampling design and variables of interest of ECMOSS.

4.2.4  Adaptation of winsorization according to the Kokic and Bell method and its extension

Even after linearization, the survey and its parameter of interest do not fit the framework of the Kokic and Bell method, in either its original form or the extension presented previously. First, the ECMOSS sample is not selected by stratified simple random sampling or by Poisson sampling. Second, the variable to be winsorized, the estimated linearized variable $\hat L_i[\hat R(D)]$, can take negative values. To apply the Kokic and Bell method to ECMOSS, we made the following adaptations.

  1. We process the influential units as though the employees had been selected directly by stratified simple random sampling (Poisson sampling for the extension). The pseudo-strata are defined by the sector, the number of employees of the business, the location of the employing establishment and the social category of the employee (managers versus non-managers). To avoid pseudo-strata containing too few observations, the location is grouped into three categories: Île-de-France, the overseas departments and the rest of the country. The classical method treats the sample in each pseudo-stratum as a simple random sample, so all employees in the same pseudo-stratum receive the same sampling weight. It therefore ignores the dispersion of the actual estimation weights within pseudo-strata and may miss some influential units. The extension of the method, in contrast, properly accounts for this dispersion of the weights.
  2. In each of these pseudo-strata, winsorization is not applied directly to the estimated linearized variable, but to a translated version of it.
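Building the pseudo-strata of adaptation 1 amounts to crossing four classification variables after collapsing the location into three groups. The sketch below uses hypothetical field names, sector codes, size-class labels and location labels. The actual ECMOSS codes are not given in the text.

```python
from collections import Counter

# Hypothetical location labels used for illustration only; real data would
# carry the establishment's region code instead.
OVERSEAS_DEPARTMENTS = {"Guadeloupe", "Martinique", "Guyane", "La Réunion", "Mayotte"}


def location_group(region):
    """Collapse the establishment's location into the three groups used in the text."""
    if region == "Île-de-France":
        return "Île-de-France"
    if region in OVERSEAS_DEPARTMENTS:
        return "overseas departments"
    return "rest of the country"


def pseudo_stratum(employee):
    """Pseudo-stratum key: sector x size class x location group x social category."""
    return (
        employee["sector"],
        employee["size_class"],
        location_group(employee["region"]),
        "manager" if employee["is_manager"] else "non-manager",
    )


# Usage: count sampled employees per pseudo-stratum to spot cells that are too small.
sample = [
    {"sector": "C", "size_class": "10-49", "region": "Martinique", "is_manager": False},
    {"sector": "C", "size_class": "10-49", "region": "Guadeloupe", "is_manager": False},
    {"sector": "K", "size_class": "500+", "region": "Île-de-France", "is_manager": True},
]
cell_sizes = Counter(pseudo_stratum(e) for e in sample)
```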

More precisely, we define for each sampled employee:

$$\hat T_i[\hat R(D)] = \hat L_i[\hat R(D)] - \min_{j \in S} \hat L_j[\hat R(D)],$$

for which we compute winsorization thresholds in each pseudo-stratum, using both the method initially proposed by Kokic and Bell and its extension. From these, we derive two sets of estimation weights, used to estimate the average hourly wage in each domain of interest, of the form:

$$w_i^w = w_i\,\frac{\hat{T}_i^w\left[\hat{R}(D)\right]}{\hat{T}_i\left[\hat{R}(D)\right]}.$$
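The weights above can be sketched in Python for a single pseudo-stratum. This is a minimal illustration, not the production code: it assumes the threshold $K$ has already been computed (its derivation is not part of this section), and it assumes the standard Kokic and Bell winsorized value $\hat{T}_i^w = \hat{T}_i/w_i + (1 - 1/w_i)K$ for values above $K$. The function name `winsorized_weights` is hypothetical.

```python
from typing import List, Sequence


def winsorized_weights(t: Sequence[float], w: Sequence[float], K: float) -> List[float]:
    """Winsorized estimation weights in one pseudo-stratum (hypothetical helper).

    t : translated linearized values T_i of the sampled employees
    w : their estimation weights w_i
    K : winsorization threshold, assumed already computed and positive
    """
    if K <= 0:
        raise ValueError("the threshold K is assumed to be positive")
    out: List[float] = []
    for t_i, w_i in zip(t, w):
        if t_i <= K:
            # Not influential: value and weight are left unchanged
            out.append(float(w_i))
        else:
            # Assumed Kokic and Bell winsorized value for a unit above the threshold
            t_w = t_i / w_i + (1.0 - 1.0 / w_i) * K
            # Weight chosen so that w_i^w * T_i = w_i * T_i^w (T_i > K > 0, no division by zero)
            out.append(w_i * t_w / t_i)
    return out
```

For example, with $\hat{T} = (10, 100, 30)$, equal weights $w_i = 4$ and $K = 40$, only the second employee is winsorized: its winsorized value is $25 + 0.75 \times 40 = 55$, and its weight drops from 4 to $4 \times 55/100 = 2.2$.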

We can thus only identify and process influential units with high values of the estimated linearized variable, i.e., employees whose hourly wage is higher than the average hourly wage in the domain of interest $D$. This method cannot identify units with low hourly wages, but such units pose fewer problems for the accuracy of the estimates, since the distribution of hourly wages is highly skewed, with a very long right tail.

One final adaptation is needed before the method can be applied to ECMOSS. The method can only be used if observations of the variable of interest are available in each pseudo-stratum, and previous editions of the survey could provide them. However, the tests performed to evaluate the efficiency of the Kokic and Bell method applied to the Annual Sectoral Surveys (see Deroyon, 2015) have shown that using responses to previous editions of the survey to calculate winsorization thresholds does not lead to the largest gains in accuracy. Because only a few observations are available per stratum, the thresholds are determined with too little precision, so that too many units may be winsorized or, conversely, influential units may escape winsorization. We have therefore chosen to use the auxiliary information available in the social security files on the total remuneration paid annually to employees and their number of hours worked. These data differ from those measured in the survey (in particular, the wages declared in the social security files form the base on which social contributions and taxes on wages are calculated, rather than the labour income paid to employees), but they are strongly correlated with them.

4.2.5  Adaptation of the Beaumont, Haziza and Ruiz-Gazen estimator

Because of its generality, the conditional bias method requires fewer adaptations before it can be applied to ECMOSS. It can be applied directly to the survey's variables of interest, without mobilizing external data. However, calculating conditional biases while taking the whole sampling design into account is complex. For our simulations, we therefore chose to apply the conditional bias method as though the employees had been selected directly by Poisson sampling, with selection probabilities $1/w_i$, where $w_i$ designates the estimation weight of employee $i$ after correction for non-response. The conditional bias used to identify influential units is therefore equal to:

$$B_{1i}\left\{\hat{L}_i\left[\hat{R}(D)\right]\right\} = \left(w_i - 1\right)\hat{L}_i\left[\hat{R}(D)\right].$$
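A short derivation may help show where this expression comes from. It is a sketch under stated assumptions: the conditional bias of unit $i$ is taken as $E_P(\hat{T} \mid I_i = 1) - T$, the estimated linearized values $\hat{L}_j = \hat{L}_j[\hat{R}(D)]$ are treated as fixed, and $I_j$ denotes the independent Poisson sample membership indicators with $\pi_j = 1/w_j$, so that $\hat{T} = \sum_{j \in U} w_j I_j \hat{L}_j$ and $T = \sum_{j \in U}\hat{L}_j$.

```latex
\begin{aligned}
E_P\left[\hat{T} \mid I_i = 1\right]
  &= w_i\hat{L}_i + \sum_{j \neq i} w_j\,\pi_j\,\hat{L}_j
   = w_i\hat{L}_i + \sum_{j \neq i}\hat{L}_j
   \quad\text{(independence of the } I_j\text{)},\\
B_{1i}
  &= E_P\left[\hat{T} \mid I_i = 1\right] - \sum_{j \in U}\hat{L}_j
   = \left(w_i - 1\right)\hat{L}_i .
\end{aligned}
```

Independence of the indicators is what makes the other units' contributions cancel, which is why the Poisson approximation yields such a simple conditional bias.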

With formula (3.8), the Beaumont, Haziza and Ruiz-Gazen estimator processes only a limited number of units, namely the observations with the lowest and highest conditional biases, whose indices define the sets $A_{\min}$ and $A_{\max}$:

$$A_{\min} = \operatorname*{argmin}_{j \in S} B_{1j}\left\{\hat{L}_j\left[\hat{R}(D)\right]\right\}$$

$$A_{\max} = \operatorname*{argmax}_{j \in S} B_{1j}\left\{\hat{L}_j\left[\hat{R}(D)\right]\right\}.$$

Thus, the processed estimation weight of the influential units is equal to:

$$w_i^{\mathrm{BHR}} = \begin{cases} \dfrac{\left(2\left|A_{\min}\right| - 1\right)w_i + 1}{2\left|A_{\min}\right|} & \text{if } i \in A_{\min}\\[2ex] \dfrac{\left(2\left|A_{\max}\right| - 1\right)w_i + 1}{2\left|A_{\max}\right|} & \text{if } i \in A_{\max}\\[2ex] w_i & \text{otherwise,} \end{cases}$$

where $|A_{\min}|$ and $|A_{\max}|$ respectively designate the cardinalities of $A_{\min}$ and $A_{\max}$.
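The weight adjustment above can be sketched in Python for one set of sampled employees. This is a minimal illustration under the Poisson approximation described above ($\pi_i = 1/w_i$), not the production code. `bhr_weights` is a hypothetical name, and ties are detected by exact equality of the conditional biases. Each unit of $A_{\min}$ (or $A_{\max}$) ends up with $w_i^{\mathrm{BHR}}\hat{L}_i = w_i\hat{L}_i - B_{1i}/(2|A|)$. So if formula (3.8) has the usual form $\hat{T} - \tfrac{1}{2}(\hat{B}_{\min} + \hat{B}_{\max})$ (an assumption here, since (3.8) is not reproduced in this section), the processed weights reproduce it.

```python
from typing import List, Sequence


def bhr_weights(L: Sequence[float], w: Sequence[float]) -> List[float]:
    """Processed weights of the Beaumont, Haziza and Ruiz-Gazen estimator (hypothetical helper).

    L : estimated linearized values L_i of the sampled employees
    w : their estimation weights w_i (selection probabilities taken as 1 / w_i)
    """
    # Conditional bias of each sampled employee under the Poisson approximation
    B = [(w_i - 1.0) * L_i for L_i, w_i in zip(L, w)]
    b_min, b_max = min(B), max(B)
    # Index sets A_min and A_max; exact ties share the correction
    A_min = [i for i, b in enumerate(B) if b == b_min]
    A_max = [i for i, b in enumerate(B) if b == b_max]
    new_w = [float(w_i) for w_i in w]
    for A in (A_min, A_max):
        n = len(A)
        for i in A:
            # ((2|A| - 1) w_i + 1) / (2|A|)
            new_w[i] = ((2 * n - 1) * w[i] + 1.0) / (2 * n)
    return new_w
```

For example, with $\hat{L} = (-5, 2, 12, 12)$ and $w = (3, 5, 2, 2)$, the conditional biases are $(-10, 8, 12, 12)$ and the function returns the weights $(2, 5, 1.75, 1.75)$. The two employees tied for the maximum share the correction equally.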

Unlike the Kokic and Bell method, the robust estimator based on conditional biases does not focus only on influential units in the right-hand part of the distribution of the estimated linearized variable. Instead, it identifies influential units with very low as well as very high values of this variable. By construction, it also acts on only a small number of units, since only the observations with the minimum and maximum conditional biases have their weights modified.

4.2.6  Robust estimation on several domains of interest

As previously described, the domains of interest for disseminating ECMOSS results are numerous. To keep dissemination simple and to comply with European regulations, each employee in the individual sample must have a single estimation weight, so adaptations are necessary:

To evaluate the precision gains or losses of the methods defined above, we carried out a set of simulations based on the ECMOSS sampling design and on data on wages and hours worked from the social security files. These data are available for all employees, so we can compare the average hourly wages observed in the population with their various estimators. In these simulations, we compared two configurations: the methods applied directly in each dissemination domain, which yields the optimal results, and the methods applied in the pseudo-dissemination domains defined above.

4.2.7  Simulations

The simulations are conducted on the social security files, which cover all French employees and from which the sample of employees is selected. They are implemented as follows:

For each robust estimator and each domain, we calculate the relative bias (RB) and the relative mean square error (RMSE) over all simulations as follows:

$$
\begin{aligned}
\mathrm{AB}\left[\hat{R}^{\mathrm{KB}}(D)\right] &= \frac{\sum_{m=1}^{5{,}000}\left[\hat{R}_m^{\mathrm{KB}}(D) - R(D)\right]}{5{,}000}\\
\mathrm{AMSE}\left[\hat{R}^{\mathrm{KB}}(D)\right] &= \frac{\sum_{m=1}^{5{,}000}\left[\hat{R}_m^{\mathrm{KB}}(D) - R(D)\right]^2}{5{,}000}\\
\mathrm{RB}\left[\hat{R}^{\mathrm{KB}}(D)\right] &= 100\,\frac{\mathrm{AB}\left[\hat{R}^{\mathrm{KB}}(D)\right]}{R(D)}\\
\mathrm{RMSE}\left[\hat{R}^{\mathrm{KB}}(D)\right] &= 100\,\frac{\mathrm{AMSE}\left[\hat{R}^{\mathrm{KB}}(D)\right]}{\mathrm{AMSE}\left[\hat{R}(D)\right]}
\end{aligned}
$$

where, for example for the classical Kokic and Bell method, $R(D)$ designates the average hourly wage observed in the social security files in domain $D$, $\hat{R}_m^{\mathrm{KB}}(D)$ the robust estimator obtained in simulation $m$, and $\hat{R}(D)$ the usual expansion estimator of this parameter. The relative bias compares the bias of the robust estimator with the true value of the parameter. The relative mean square error measures the gain or loss of precision of the robust estimator relative to the usual estimator: a value below 100% indicates a gain.
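These four quantities can be sketched in Python for one domain. The sketch assumes the simulated values of the robust estimator and of the usual expansion estimator are available as lists (5,000 values each in the study, though any number works here). `simulation_metrics` is a hypothetical name, and `statistics.fmean` requires Python 3.8 or later.

```python
from statistics import fmean
from typing import Sequence, Tuple


def simulation_metrics(robust: Sequence[float], usual: Sequence[float],
                       true_value: float) -> Tuple[float, float]:
    """RB and RMSE (both in %) of a robust estimator over the simulations."""
    ab = fmean(r - true_value for r in robust)                 # AB
    amse = fmean((r - true_value) ** 2 for r in robust)        # AMSE of the robust estimator
    amse_usual = fmean((u - true_value) ** 2 for u in usual)   # AMSE of the expansion estimator
    rb = 100.0 * ab / true_value
    rmse = 100.0 * amse / amse_usual
    return rb, rmse
```

With a true value of 10, robust estimates $(9, 11, 12)$ and usual estimates $(8, 12, 13)$, the AMSE values are 2 and $17/3$, so RMSE is about 35.3%, a precision gain.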

4.2.8  Simulation results

Among the estimators tested in our simulations, the one obtained by adapting the Kokic and Bell method to Poisson sampling stands out for its extremely poor performance, summarized in Table 4.4. Applying the Kokic and Bell method extended to Poisson sampling to ECMOSS leads to a significant, even dramatic, deterioration in the precision of the estimates.

Table 4.4
Statistics on the MSE ratio of the robust Kokic and Bell estimators applied to the Poisson sampling in the different domains of interest
RMSE of $\hat{R}_m^{\mathrm{KB}_{\mathrm{poiss}}}(D)$ (%), by domain

Statistic   NACE*Workforce      NACE   NACE*NUTS
Min.                    18       128          33
Mean                   490     1,858         324
Max.                 4,437     8,606       2,466

Figures 4.1, 4.2 and 4.3 present the results of the conditional bias and classical Kokic and Bell methods, applied under the hypothesis of stratified simple random sampling.

Figure 4.1  Relative mean square errors of the estimators of average hourly wage by sector

Bar charts presenting the relative mean square errors of the estimators of average hourly wage by sector. Relative mean square errors are on the y-axis, from 0 to 150%. NACE sections are on the x-axis, from B to S. Each section has three bars, one for each of the following estimators: conditional bias in the pseudo-domains, conditional bias in the NACE sections, and Kokic and Bell in the pseudo-domains. No estimator has a systematically higher or lower relative mean square error; this depends on the section. Sections C, N and O have the highest relative MSE.

Figure 4.2  Distribution of relative mean square errors in each domain

Box-plots presenting the distribution of relative mean square errors in each domain. Relative mean square errors are on the y-axis, from 40 to 160%. Three domains are shown: NACE, NACE*No. and NACE*NUTS. Each domain has three box-plots: one for the conditional bias, one for the optimal conditional bias and one for Kokic and Bell. The box-plots for the optimal conditional bias show a smaller dispersion than those of the other estimators. The medians of the conditional bias box-plots are lower than those of the other estimators.

Figure 4.1 shows the relative mean square errors of the robust average hourly wage estimators in each section of the Statistical classification of economic activities in the European Community (NACE, a grouping of business sectors into 21 categories, of which 18 are in the ECMOSS field). Figure 4.2 shows the distribution of relative mean square errors within each type of domain: NACE sections, crossings of section and number of employees of the business, and crossings of section and location of the establishment.

For almost all domains of interest, the robust estimators considered provide gains in precision over the usual expansion estimator. The domains in which the robust estimators have a higher error than the usual estimator are also those where the variance of the usual estimator is lowest to begin with. The treatments of influential units considered in these figures (conditional bias and classical Kokic and Bell method) are thus able to reduce estimation errors when necessary, without causing much loss of precision when the estimators are not affected by influential units.

The biases of the average hourly wage estimators in the sectors are low (see Figure 4.3), except in some domains where the sample size is small (A: Agriculture, forestry and fishing; K: Financial and insurance activities; R: Arts, entertainment and recreation). Similar results are observed for the other domains.

Figure 4.3  Relative biases of the estimators of average hourly wage by sector

Bar charts presenting the relative biases of the estimators of average hourly wage by sector. Relative bias is on the y-axis, from -3 to 1%. NACE sections are on the x-axis, from A to S. Each section has three bars, one for each of the following estimators: conditional bias in the pseudo-domains, conditional bias in the NACE sections, and Kokic and Bell in the pseudo-domains. The biases of the average hourly wage estimators in the sectors are low, except in some domains where the sample size is small (A: Agriculture, forestry and fishing; K: Financial and insurance activities; R: Arts, entertainment and recreation).

Applying the conditional bias method separately in each domain gives the best results for estimation in the NACE sections, but not necessarily in the other dissemination domains. The NACE sections are much more aggregated than the pseudo-domains used to identify influential units. As a result, the treatment of influential units introduces more bias when it is applied in the pseudo-domains than with the optimal version applied directly in the NACE sections. In the other domains, identifying influential units at a finer level than the actual dissemination domain makes it possible to identify more influential units and thus substantially reduce the estimation variance. This does not introduce much additional bias, as long as the domain used to identify the influential units is close to the actual dissemination domain. Differences between the sampling design assumed when applying each of the two methods and the actual sampling design may explain why using the Beaumont, Haziza and Ruiz-Gazen robust estimator in each dissemination domain does not necessarily translate into greater precision gains.

The differences between the results obtained with the conditional bias and Kokic and Bell methods under the hypothesis of stratified simple random sampling are, however, small. Note that, in these simulations, the winsorization thresholds of the Kokic and Bell method are calculated using the population data as the additional, out-of-sample observations of the variables of interest. Since the performance of the different estimators is also evaluated on these data, the Kokic and Bell method is favoured a priori.

The extension of the Kokic and Bell method to Poisson sampling results in a significant deterioration in the precision of the estimators.

The gap between the performances of the two implementations of the Kokic and Bell method is thus very large. However, both implementations rest on two hypotheses:

In both applications of the Kokic and Bell method, the first hypothesis is not respected. Its violation is, however, a priori more serious when we apply the Kokic and Bell method as though the sample had been selected by stratified simple random sampling in pseudo-strata constructed ad hoc. Doing so assumes that the selection probabilities are identical within these pseudo-strata, which is far from the case. The Kokic and Bell method applied as though the employees had been selected by Poisson sampling, for its part, uses the actual first-order inclusion probabilities, but neglects the dependence between the sample membership indicators of different employees.

However, the population model postulated for the Kokic and Bell method extended to the Poisson case is not valid, since the first-order inclusion probabilities are not proportional to the variable of interest considered. The validity of the population model used for the classical Kokic and Bell method is harder to assess. To some extent, one can still consider that the values of the variable of interest in a pseudo-stratum are drawn from the same distribution, whose expectation and variance can be estimated by the empirical mean and variance of the variable of interest in the stratum.

The performance differences between the two implementations of the Kokic and Bell method are therefore complex to analyze. A first possible explanation is that the method's performance is more sensitive to violations of the hypothesis on the distribution of the observations than to violations of the hypothesis on the form of the sampling design. Fizzala (2017) reached the same conclusion for an application of winsorization in the context of corporate profiling. In our ECMOSS simulations, the classical Kokic and Bell method, based on the hypothesis of stratified simple random sampling, gives good results even though this hypothesis is only partially respected. Future work could test this explanation through simulations. Another explanation may lie in how the two hypotheses relate to each other in the extension of Kokic and Bell to Poisson sampling. In the classical Kokic and Bell method, the hypotheses on the sampling design and on the distribution of the variable of interest in each stratum are unrelated. Under Poisson sampling, by contrast, the population model involves the selection probabilities and therefore imposes additional constraints on the sampling design. Since the selection probabilities are not proportional to the variable of interest, the extension of Kokic and Bell to Poisson sampling violates the hypotheses on the sampling design and on the population simultaneously. This could explain why the estimator's errors explode.

Nevertheless, whatever the configuration, the conditional bias and classical Kokic and Bell methods seem able to identify the units that are influential for the estimation of the parameters concerned. They thus deliver significant gains in precision even when applied in a setting far from their original hypotheses and from the actual sampling design of the survey.

Appendix

A Proofs of the formulas for the extension of the Kokic and Bell method to Poisson sampling

A.1 Calculation of the mean square error of the winsorized estimator

First, we calculate

$$
\begin{aligned}
E_P\left\{\left[\hat{T}(\tilde{X}) - T(X)\right]^2\right\} &= E_P\Big\{\left[\hat{T}(\tilde{X}) - T(\tilde{X})\right]^2 + \left[T(\tilde{X}) - T(X)\right]^2\\
&\qquad + 2\left[\hat{T}(\tilde{X}) - T(\tilde{X})\right]\left[T(\tilde{X}) - T(X)\right]\Big\}\\
&= E_P\left\{\left[\hat{T}(\tilde{X}) - T(\tilde{X})\right]^2\right\} + \left[T(\tilde{X}) - T(X)\right]^2
\end{aligned}
$$

with $T(\tilde{X}) = E_P\left[\hat{T}(\tilde{X})\right] = \sum_{h=1}^{H}\sum_{i \in U_h}\tilde{X}_{hi}$, so the cross term has zero expectation.

Furthermore,

$$E_P\left\{\left[\hat{T}(\tilde{X}) - T(\tilde{X})\right]^2\right\} = \sum_{h=1}^{H}\sum_{i \in U_h} d_{hi}\left(1 - \frac{1}{d_{hi}}\right)\tilde{X}_{hi}^2$$

and finally:

$$E_P\left\{\left[\hat{T}(\tilde{X}) - T(X)\right]^2\right\} = \sum_{h=1}^{H}\sum_{i \in U_h} d_{hi}\left(1 - \frac{1}{d_{hi}}\right)\tilde{X}_{hi}^2 + \left[\sum_{h=1}^{H}\sum_{i \in U_h}\left(\tilde{X}_{hi} - X_{hi}\right)\right]^2. \tag{A.1}$$
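Formula (A.1) can be checked numerically on a toy population by enumerating every possible Poisson sample. The sketch assumes $\hat{T}(\tilde{X}) = \sum_{h}\sum_{i \in s_h} d_{hi}\tilde{X}_{hi}$, with the values $\tilde{X}_{hi}$ held fixed and $\pi_{hi} = 1/d_{hi}$. Under Poisson sampling the double sum over strata equals a single sum over all units, so the strata are pooled. The function names and toy numbers are hypothetical.

```python
from itertools import product
from typing import Sequence


def exact_mse(X: Sequence[float], X_tilde: Sequence[float], d: Sequence[float]) -> float:
    """E_P{[T_hat(X~) - T(X)]^2}, computed by enumerating all 2^N Poisson samples."""
    T_X = sum(X)
    mse = 0.0
    for I in product((0, 1), repeat=len(X)):
        # Probability of this sample: independent draws with pi_i = 1 / d_i
        prob = 1.0
        for I_i, d_i in zip(I, d):
            prob *= (1.0 / d_i) if I_i else (1.0 - 1.0 / d_i)
        # Expansion estimator of the total of the winsorized values
        T_hat = sum(d_i * xt for I_i, d_i, xt in zip(I, d, X_tilde) if I_i)
        mse += prob * (T_hat - T_X) ** 2
    return mse


def formula_a1(X: Sequence[float], X_tilde: Sequence[float], d: Sequence[float]) -> float:
    """Right-hand side of (A.1), with the strata pooled."""
    variance = sum(d_i * (1.0 - 1.0 / d_i) * xt ** 2 for d_i, xt in zip(d, X_tilde))
    bias = sum(xt - x for xt, x in zip(X_tilde, X))
    return variance + bias ** 2
```

With $X = (1, 5, 20, 3)$, $\tilde{X} = (1, 5, 12, 3)$ and $d = (2, 4, 5, 1.5)$, both functions agree up to floating-point rounding. Exhaustive enumeration is only practical for a handful of units.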

Assuming in each stratum that:

and noting that:

$$
\begin{aligned}
\tilde{X}_{hi} &= X_{hi}\left(1 - J_{hi}\right) + J_{hi}\left[\frac{X_{hi}}{d_{hi}} + \left(1 - \frac{1}{d_{hi}}\right)\frac{K_h}{d_{hi}}\right]\\
&= \frac{1}{d_{hi}}\left[d_{hi}X_{hi} + J_{hi}\left(1 - \frac{1}{d_{hi}}\right)\left(K_h - d_{hi}X_{hi}\right)\right]
\end{aligned}
$$

and hence that:

$$\tilde{X}_{hi} - X_{hi} = \frac{1}{d_{hi}}\left(1-\frac{1}{d_{hi}}\right)J_{hi}\left(K_h - d_{hi}X_{hi}\right), \tag{A.2}$$

we obtain:

$$\tilde{X}_{hi}^2 = \frac{1}{d_{hi}^2}\left[d_{hi}^2X_{hi}^2 + J_{hi}\left(1-\frac{1}{d_{hi}}\right)^2\left(K_h^2 + d_{hi}^2X_{hi}^2 - 2d_{hi}X_{hi}K_h\right) + 2\left(1-\frac{1}{d_{hi}}\right)\left(d_{hi}X_{hi}J_{hi}K_h - J_{hi}d_{hi}^2X_{hi}^2\right)\right], \tag{A.3}$$

and that:

$$\begin{aligned}
E_m\left\{\left[\sum_{h=1}^{H}\sum_{i\in U_h}\left(\tilde{X}_{hi}-X_{hi}\right)\right]^2\right\} &= \sum_{h=1}^{H}\sum_{i\in U_h} V_m\left(\tilde{X}_{hi}-X_{hi}\right) + \left[\sum_{h=1}^{H}\sum_{i\in U_h} E_m\left(\tilde{X}_{hi}-X_{hi}\right)\right]^2\\
&= \sum_{h=1}^{H}\sum_{i\in U_h}\left\{E_m\left[\left(\tilde{X}_{hi}-X_{hi}\right)^2\right] - \left[E_m\left(\tilde{X}_{hi}-X_{hi}\right)\right]^2\right\} + \left[\sum_{h=1}^{H}\sum_{i\in U_h} E_m\left(\tilde{X}_{hi}-X_{hi}\right)\right]^2.
\end{aligned} \tag{A.4}$$
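Identities (A.2) and (A.3) are pure algebra once $J_{hi} \in \{0, 1\}$, so that $J_{hi}^2 = J_{hi}$, and a quick numerical check can catch transcription slips. The Python sketch below is illustrative only. The random ranges for $d_{hi}$, $X_{hi}$ and $K_h$ are arbitrary choices, and the helper names are hypothetical. For both values of $J_{hi}$, it checks that the two forms of $\tilde X_{hi}$ agree and that (A.2) and (A.3) hold.

```python
import math
import random

def winsorized_value(x, d, k, j):
    """First form of X~_hi: X_hi if J_hi = 0, else X_hi/d_hi + (1 - 1/d_hi) K_h/d_hi."""
    return x * (1 - j) + j * (x / d + (1 - 1 / d) * k / d)

def identities_hold(x, d, k, j, tol=1e-9):
    """Check the factored form of X~_hi, (A.2) and (A.3) for one unit."""
    c = 1 - 1 / d                                    # (1 - 1/d_hi)
    xt = winsorized_value(x, d, k, j)
    factored = (d * x + j * c * (k - d * x)) / d     # second form of X~_hi
    a2 = (1 / d) * c * j * (k - d * x)               # right-hand side of (A.2)
    a3 = ((d * x) ** 2
          + j * c ** 2 * (k ** 2 + (d * x) ** 2 - 2 * d * x * k)
          + 2 * c * (d * x * j * k - j * (d * x) ** 2)) / d ** 2   # right-hand side of (A.3)
    close = lambda u, v: math.isclose(u, v, rel_tol=tol, abs_tol=tol)
    return close(xt, factored) and close(xt - x, a2) and close(xt ** 2, a3)

rng = random.Random(2024)
results = [
    identities_hold(rng.uniform(0, 100), rng.uniform(1.01, 50), rng.uniform(0, 500), j)
    for _ in range(1000)
    for j in (0, 1)                                  # winsorization indicator J_hi
]
print(all(results))  # True
```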

Finally, taking the model expectation of (A.1) and using (A.2), (A.3) and (A.4), we obtain, after some further simplification:

$$\begin{aligned}
E_mE_P\left\{\left[\hat{\tilde{T}}(\tilde{X}) - T(X)\right]^2\right\}
&= \sum_{h=1}^{H}\sum_{i\in U_h}\left(\frac{1}{d_{hi}}\right)\left(1-\frac{1}{d_{hi}}\right)\Bigg\{\mu_h^2+\sigma_h^2 + \left(1-\frac{1}{d_{hi}}\right)^2\left[K_h^2E_m(J_{hi}) + E_m\left(J_{hi}d_{hi}^2X_{hi}^2\right) - 2K_hE_m\left(J_{hi}d_{hi}X_{hi}\right)\right]\\
&\qquad + 2\left(1-\frac{1}{d_{hi}}\right)\left[K_hE_m\left(J_{hi}d_{hi}X_{hi}\right) - E_m\left(J_{hi}d_{hi}^2X_{hi}^2\right)\right]\Bigg\}\\
&\quad + \sum_{h=1}^{H}\sum_{i\in U_h}\left(\frac{1}{d_{hi}}\right)^2\left(1-\frac{1}{d_{hi}}\right)^2\Big\{K_h^2E_m(J_{hi}) + E_m\left(J_{hi}d_{hi}^2X_{hi}^2\right) - 2K_hE_m\left(J_{hi}d_{hi}X_{hi}\right) - \left[K_hE_m(J_{hi}) - E_m\left(J_{hi}d_{hi}X_{hi}\right)\right]^2\Big\}\\
&\quad + \left\{\sum_{h=1}^{H}\sum_{i\in U_h}\left(\frac{1}{d_{hi}}\right)\left(1-\frac{1}{d_{hi}}\right)\left[K_hE_m(J_{hi}) - E_m\left(J_{hi}d_{hi}X_{hi}\right)\right]\right\}^2.
\end{aligned}$$

Since the $d_{hi}X_{hi}$ are assumed to be independent and identically distributed within each stratum, it suffices to consider a random variable $Z_h$ with the same distribution as any of the $d_{hi}X_{hi}$, i.e., satisfying:

Likewise, we can use the random variable $J_h = \mathbb{1}_{\{Z_h > K_h\}}$ to compute the model expectation of the winsorization indicator. The previous expression can then be rewritten as:

$$\begin{aligned}
E_mE_P\left[\left(\hat{\tilde{T}}(\tilde{X}) - T(X)\right)^2\right]
&= \sum_{h=1}^{H}\sum_{i\in U_h}\left(\frac{1}{d_{hi}}\right)\left(1-\frac{1}{d_{hi}}\right)\Bigg\{\mu_h^2+\sigma_h^2 + \left(1-\frac{1}{d_{hi}}\right)^2\left[K_h^2E_m(J_h) + E_m\left(J_hZ_h^2\right) - 2K_hE_m\left(J_hZ_h\right)\right]\\
&\qquad + 2\left(1-\frac{1}{d_{hi}}\right)\left[K_hE_m\left(J_hZ_h\right) - E_m\left(J_hZ_h^2\right)\right]\Bigg\}\\
&\quad + \sum_{h=1}^{H}\sum_{i\in U_h}\left(\frac{1}{d_{hi}}\right)^2\left(1-\frac{1}{d_{hi}}\right)^2\Big\{K_h^2E_m(J_h) + E_m\left(J_hZ_h^2\right) - 2K_hE_m\left(J_hZ_h\right) - \left[K_hE_m(J_h) - E_m\left(J_hZ_h\right)\right]^2\Big\}\\
&\quad + \left\{\sum_{h=1}^{H}\sum_{i\in U_h}\left(\frac{1}{d_{hi}}\right)\left(1-\frac{1}{d_{hi}}\right)\left[K_hE_m(J_h) - E_m\left(J_hZ_h\right)\right]\right\}^2.
\end{aligned}$$
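The closed-form MSE above can be checked against an exact brute-force computation on a toy population. The sketch below is illustrative only and rests on several assumptions of ours:

- It uses two tiny strata in which each $Z_h$ takes a few discrete values. The numbers are hypothetical.
- The estimator is taken to be $\hat{\tilde T}(\tilde X) = \sum_{i \in s} d_{hi}\tilde X_{hi}$ under Poisson sampling with inclusion probabilities $1/d_{hi}$.
- $\mu_h^2 + \sigma_h^2$ is read as $E_m(Z_h^2)$, i.e., $\mu_h$ and $\sigma_h^2$ are taken to be the model mean and variance of $Z_h$.

The code enumerates every model realization and every Poisson sample, so both numbers should coincide up to rounding.

```python
import itertools
import math

# Hypothetical toy population: per stratum, the weights d_hi of its units,
# a discrete law for Z_h (values z, probabilities p) and a threshold K.
STRATA = [
    {"d": [2.0, 4.0], "z": [1.0, 3.0, 10.0], "p": [0.5, 0.3, 0.2], "K": 4.0},
    {"d": [5.0, 3.0], "z": [2.0, 8.0], "p": [0.7, 0.3], "K": 6.0},
]

def mse_formula(strata):
    """Closed-form E_m E_P[(T~hat - T)^2] following the expression above."""
    var_part, bias = 0.0, 0.0
    for s in strata:
        K, law = s["K"], list(zip(s["z"], s["p"]))
        ez2 = sum(p * z * z for z, p in law)            # mu_h^2 + sigma_h^2 = E_m(Z_h^2)
        ej = sum(p for z, p in law if z > K)             # E_m(J_h)
        ejz = sum(p * z for z, p in law if z > K)        # E_m(J_h Z_h)
        ejz2 = sum(p * z * z for z, p in law if z > K)   # E_m(J_h Z_h^2)
        q = K * K * ej + ejz2 - 2 * K * ejz
        for d in s["d"]:
            a, c = 1 / d, 1 - 1 / d
            var_part += a * c * (ez2 + c * c * q + 2 * c * (K * ejz - ejz2))
            var_part += a * a * c * c * (q - (K * ej - ejz) ** 2)
            bias += a * c * (K * ej - ejz)
    return var_part + bias ** 2

def mse_brute_force(strata):
    """Exact E_m E_P by enumerating every Z configuration and every Poisson sample."""
    units = [(d, s) for s in strata for d in s["d"]]
    total = 0.0
    for draw in itertools.product(*[range(len(s["z"])) for _, s in units]):
        p_model = math.prod(s["p"][j] for (_, s), j in zip(units, draw))
        z = [s["z"][j] for (_, s), j in zip(units, draw)]
        x = [zi / d for zi, (d, _) in zip(z, units)]                     # X_hi = Z / d_hi
        xt = [xi if zi <= s["K"] else (xi + (1 - 1 / d) * s["K"]) / d     # winsorized X~_hi
              for xi, zi, (d, s) in zip(x, z, units)]
        t = sum(x)                                                        # T(X)
        for sample in itertools.product((0, 1), repeat=len(units)):
            p_sample = math.prod(1 / d if inc else 1 - 1 / d for inc, (d, _) in zip(sample, units))
            est = sum(d * v for inc, (d, _), v in zip(sample, units, xt) if inc)
            total += p_model * p_sample * (est - t) ** 2
    return total

print(mse_formula(STRATA), mse_brute_force(STRATA))  # equal up to rounding error
```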

A.2  Search for thresholds to minimize the MSE

To find the thresholds $K_h$ that minimize $E_mE_P\left\{\left[\hat{T}(\tilde{X}) - T(X)\right]^2\right\}$, we use the same property as Kokic and Bell in their proof, namely:

$$E_m\left(Z_h^pJ_h\right) = \int_{K_h}^{+\infty} t^p\, g_h(t)\,dt,$$

where $g_h$ denotes the density of $Z_h$, so that

$$\frac{\partial}{\partial K_h}E_m\left(Z_h^pJ_h\right) = -K_h^p\, g_h(K_h).$$
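The derivative property is an application of the fundamental theorem of calculus to the lower integration limit. The sketch below checks it numerically and is illustrative only. It assumes a hypothetical exponential law for $Z_h$ with rate $\lambda = 0.5$. It computes the tail moment with composite Simpson's rule on $[K_h, 200]$, where the upper limit stands in for $+\infty$, and then compares a central finite difference in $K_h$ with $-K_h^p g_h(K_h)$.

```python
import math

LAM = 0.5                                      # hypothetical exponential rate for Z_h

def g(t):
    """Density g_h of Z_h under the hypothetical exponential model."""
    return LAM * math.exp(-LAM * t)

def tail_moment(p, k, upper=200.0, n=20000):
    """E_m(Z_h^p J_h): integral of t^p g(t) over [k, upper] by composite Simpson's rule.
    upper stands in for +infinity (the exponential tail beyond it is negligible)."""
    h = (upper - k) / n
    f = lambda t: t ** p * g(t)
    odd = sum(f(k + (2 * i - 1) * h) for i in range(1, n // 2 + 1))
    even = sum(f(k + 2 * i * h) for i in range(1, n // 2))
    return (f(k) + 4 * odd + 2 * even + f(upper)) * h / 3

def derivative_check(p, k=3.0, eps=1e-4):
    """Central finite difference of E_m(Z_h^p J_h) in K_h, next to -K_h^p g_h(K_h)."""
    numeric = (tail_moment(p, k + eps) - tail_moment(p, k - eps)) / (2 * eps)
    return numeric, -k ** p * g(k)

for p in (0, 1, 2):
    print(p, *derivative_check(p))   # the two numbers should agree to several digits
```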

Differentiating with respect to $K_h$ and simplifying, we obtain:

$$\begin{aligned}
\frac{\partial}{\partial K_h}E_mE_P\left\{\left[\hat{\tilde{T}}(\tilde{X}) - T(X)\right]^2\right\} &= 2B\times A_hE_m(J_h)\\
&\quad + 2C_h\left\{\left[K_hE_m(J_h) - E_m\left(J_hZ_h\right)\right]\left[1 - E_m(J_h)\right]\right\}\\
&\quad + 2D_h\left[K_hE_m(J_h) - E_m\left(J_hZ_h\right)\right] + 2F_hE_m\left(J_hZ_h\right)
\end{aligned} \tag{A.5}$$

where

Setting (A.5) equal to zero and rearranging, we obtain:

$$\begin{aligned}
\frac{\partial}{\partial K_h}E_mE_P\left[\left(\hat{\tilde{T}}(\tilde{X}) - T(X)\right)^2\right] = 0
\iff{}& A_h\times B\times E_m(J_h) + \left(C_h + D_h\right)K_hE_m(J_h)\\
&- C_hE_m(J_h)\left[K_hE_m(J_h) - E_m\left(J_hZ_h\right)\right] + \left(F_h - C_h - D_h\right)E_m\left(J_hZ_h\right) = 0.
\end{aligned}$$

Finally, noting that $F_h - C_h - D_h = 0$, assuming that $E_m(J_h) > 0$ and dividing by $E_m(J_h)$, we find that the threshold $K_h$ minimizing the MSE satisfies the equation:

$$A_h\times B + \left(C_h + D_h\right)K_h - C_h\left[K_hE_m(J_h) - E_m\left(J_hZ_h\right)\right] = 0,$$

which further reduces to

$$B + \frac{C_h + D_h}{A_h}K_h = \frac{C_h}{A_h}\left[K_hE_m(J_h) - E_m\left(J_hZ_h\right)\right].$$
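To see the first-order condition at work on a concrete case, the sketch below evaluates the MSE expression derived in Section A.1 for a single stratum ($H = 1$). Instead of solving the equations above, it minimizes that expression over $K_h$ with a plain grid search, which is a brute-force substitute for the analytical route. All modelling choices are hypothetical:

- $N = 100$ units share the weight $d = 10$.
- $Z_h$ is exponential with rate $\lambda = 1$, which gives $E_m(J_hZ_h)$ and $E_m(J_hZ_h^2)$ closed forms.
- $\mu_h^2 + \sigma_h^2$ is read as $E_m(Z_h^2) = 2/\lambda^2$.

```python
import math

def mse_single_stratum(K, d=10.0, N=100, lam=1.0):
    """MSE expression of Section A.1 for H = 1: N units of weight d, Z_h ~ Exponential(lam) (hypothetical)."""
    ej = math.exp(-lam * K)                            # E_m(J_h) = P(Z_h > K)
    ejz = (K + 1 / lam) * ej                           # E_m(J_h Z_h)
    ejz2 = (K * K + 2 * K / lam + 2 / lam ** 2) * ej   # E_m(J_h Z_h^2)
    ez2 = 2 / lam ** 2                                 # mu_h^2 + sigma_h^2 = E_m(Z_h^2)
    a, c = 1 / d, 1 - 1 / d
    q = K * K * ej + ejz2 - 2 * K * ejz
    # first two sums of the MSE expression (everything except the squared bias)
    var_part = N * a * c * (ez2 + c * c * q + 2 * c * (K * ejz - ejz2))
    var_part += N * a * a * c * c * (q - (K * ej - ejz) ** 2)
    bias = N * a * c * (K * ej - ejz)                  # model expectation of the bias
    return var_part + bias ** 2

# Brute-force grid search over K_h in [0, 20] with step 0.001.
grid = [i * 0.001 for i in range(20001)]
k_star = min(grid, key=mse_single_stratum)
print(k_star, mse_single_stratum(k_star), mse_single_stratum(20.0))
```

With these settings the minimizer lies strictly inside the grid. Its MSE is below the value at $K_h = 20$, a threshold so large that winsorization essentially never occurs.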

It remains to show that $\dfrac{C_h\left[K_hE_m(J_h) - E_m\left(J_hZ_h\right)\right]}{A_hB}$ tends to zero as $n \to \infty$. Now,

$$\frac{C_h\left|K_hE_m(J_h) - E_m\left(J_hZ_h\right)\right|}{\left|A_hB\right|} = \frac{C_h\left|K_hE_m(J_h) - E_m\left(J_hZ_h\right)\right|}{\left|A_h\right|\sum_{l=1}^{H}\left|A_l\right|\left|K_lE_m(J_l) - E_m\left(J_lZ_l\right)\right|}$$

and, by hypothesis (2.8) on the inclusion probabilities, $d_{hi} > 1$ for all $h = 1, \ldots, H$ and all $i \in U_h$. This implies $A_h > 0$, and thus:

$$
\begin{aligned}
\frac{|C_h|\,\left|K_h E_m(J_h) - E_m(J_h Z_h)\right|}{|A_h B|}
&\le \frac{C_h}{A_h^2} \\
&\le \frac{\sum_{i \in U_h} \left(\frac{1}{d_{hi}}\right)^2 \left(1 - \frac{1}{d_{hi}}\right)^2}{\left[\sum_{i \in U_h} \frac{1}{d_{hi}}\left(1 - \frac{1}{d_{hi}}\right)\right]^2} \\
&\le \frac{1}{\sum_{i \in U_h} \frac{1}{d_{hi}}\left(1 - \frac{1}{d_{hi}}\right)}.
\end{aligned}
$$
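The last two inequalities in the chain above depend only on the design weights: writing $q_i = \frac{1}{d_{hi}}\left(1 - \frac{1}{d_{hi}}\right)$, each $q_i$ lies in $(0, 1/4]$ when $d_{hi} > 1$, so $q_i^2 \le q_i$ and $\sum_i q_i^2 / (\sum_i q_i)^2 \le 1 / \sum_i q_i$. The Python sketch below checks this numerically on randomly drawn weights. The function name `chain_terms` and the weight range $[1.01, 50]$ are illustrative assumptions; the first inequality ($\le C_h/A_h^2$) relies on definitions of $C_h$ and $A_h$ given elsewhere in the paper and is not checked here.

```python
import random


def chain_terms(d):
    """Middle and right-hand terms of the inequality chain for weights d_hi > 1.

    middle = sum_i q_i^2 / (sum_i q_i)^2 and right = 1 / sum_i q_i,
    where q_i = (1/d_i) * (1 - 1/d_i).
    """
    q = [(1.0 / di) * (1.0 - 1.0 / di) for di in d]  # each q_i lies in (0, 1/4]
    s = sum(q)
    middle = sum(qi * qi for qi in q) / s ** 2
    right = 1.0 / s
    return middle, right


# Randomised check with hypothetical design weights drawn in [1.01, 50].
random.seed(1)
for _ in range(1000):
    n_h = random.randint(1, 200)
    d = [random.uniform(1.01, 50.0) for _ in range(n_h)]
    middle, right = chain_terms(d)
    # q_i < 1 implies q_i^2 <= q_i, hence middle <= s / s^2 = right.
    assert middle <= right * (1 + 1e-12)
```

For example, with two units of weight 2, each $q_i = 1/4$, so the middle term equals 0.5 and the right-hand term equals 2.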

However, it can be shown from hypothesis (2.8) that $\left[\sum_{i \in U_h} \frac{1}{d_{hi}}\left(1 - \frac{1}{d_{hi}}\right)\right]^{-1} = O\!\left(\frac{1}{N_h}\right)$. Thus, $\dfrac{C_h\left[K_h E_m(J_h) - E_m(J_h Z_h)\right]}{A_h B}$ tends to zero as $n \to \infty$.
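The exact statement of hypothesis (2.8) is not repeated in this section. As a hypothetical stand-in, the sketch below assumes that every inclusion probability $\pi_{hi} = 1/d_{hi}$ lies in a fixed interval $[a, b]$ with $0 < a \le b < 1$. Since $p(1-p)$ is concave, it is then bounded below on $[a, b]$ by $c = \min\{a(1-a), b(1-b)\} > 0$, so $\sum_i \pi_{hi}(1-\pi_{hi}) \ge c N_h$ and the inverse is at most $1/(c N_h)$. The code only illustrates this rate: $N_h$ times the inverse sum stays below $1/c$ as $N_h$ grows. The names `inverse_sum` and `rate_check` and the values $a = 0.05$, $b = 0.6$ are illustrative assumptions.

```python
import random


def inverse_sum(pi):
    """[sum_i pi_i (1 - pi_i)]^{-1}, with pi_i = 1/d_hi."""
    return 1.0 / sum(p * (1.0 - p) for p in pi)


def rate_check(a, b, sizes, seed=0):
    """Return N_h * inverse_sum(pi) for each population size N_h.

    Hypothetical assumption: every pi_hi is drawn in [a, b], 0 < a <= b < 1.
    """
    rng = random.Random(seed)
    out = []
    for n_pop in sizes:
        pi = [rng.uniform(a, b) for _ in range(n_pop)]
        out.append(n_pop * inverse_sum(pi))
    return out


a, b = 0.05, 0.6
c = min(a * (1 - a), b * (1 - b))  # lower bound of p(1 - p) on [a, b]
scaled = rate_check(a, b, [10, 100, 1_000, 10_000])
print(scaled)
# Bounded scaled values are what O(1/N_h) means here.
assert all(x <= 1.0 / c + 1e-9 for x in scaled)
```

Under this assumption the bound holds deterministically. The random draws only show that it is not tight for typical probabilities.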

$K_h$ is therefore asymptotically equivalent, in each stratum, to $-\dfrac{A_h}{C_h + D_h}\,B$ as the population and sample sizes tend to infinity.

References

Beaumont, J.-F., Haziza, D. and Ruiz-Gazen, A. (2013). A unified approach to robust estimation in finite population sampling. Biometrika, 100, 555-569.

Chambers, R. (1986). Outlier robust finite population estimation. Journal of the American Statistical Association, 81, 1063-1069.

Clark, R.G. (1995). Winsorization methods in sample surveys. Master’s thesis, Department of Statistics, Australian National University.

Dalén, J. (1987). Practical estimators of a population total which reduce the impact of large observations. R & D Report, Statistics Sweden.

Demoly, E., Fizzala, A. and Gros, E. (2014). Méthodes et pratiques des enquêtes entreprises à l’Insee. Journal de la Société Française de Statistique, 155, 4.

Deroyon, T. (2015). Traitement des observations atypiques d’une enquête par winsorisation : application aux Enquêtes Sectorielles Annuelles. Actes des Journées de Méthodologie Statistique.

Favre-Martinoz, C., Haziza, D. and Beaumont, J.-F. (2015). A method of determining the winsorization threshold, with an application to domain estimation. Survey Methodology, 41, 1, 57-77. Paper available at https://www150.statcan.gc.ca/n1/fr/pub/12-001-x/2015001/article/14199-eng.pdf.

Favre-Martinoz, C., Haziza, D. and Beaumont, J.-F. (2016). Robust inference in two-phase sampling designs with application to unit nonresponse. Scandinavian Journal of Statistics, 43, 1019-1034.

Fizzala, A. (2017). Adaptations of Winsorization Caused by Profiling - An Example Based on the French SBS Survey. European Establishment Survey Workshop, Southampton.

Kokic, P.N. and Bell, P.A. (1994). Optimal winsorizing cutoffs for a stratified finite population estimator. Journal of Official Statistics, 10, 4, 419-435.

Moreno-Rebollo, J.-L., Muñoz-Reyes, A.M. and Muñoz-Pichardo, J.M. (1999). Influence diagnostics in survey sampling: Conditional bias. Biometrika, 86, 923-928.

Moreno-Rebollo, J.-L., Muñoz-Reyes, A.M., Jimenez-Gamero, J.-L. and Muñoz-Pichardo, J.M. (2002). Influence diagnostics in survey sampling: Estimating the conditional bias. Metrika, 55, 209-214.

Rivest, L.-P. and Hurtubise, D. (1995). On Searls’ winsorized mean for skewed populations. Survey Methodology, 21, 2, 107-116. Paper available at https://www150.statcan.gc.ca/n1/fr/pub/12-001-x/1995002/article/14399-eng.pdf.

Tambay, J.-L. (1988). An integrated approach for the treatment of outliers in sub-annual surveys. Proceedings of the Survey Research Methods Section, American Statistical Association, 229-234.

