A short note on quantile and expectile estimation in unequal probability samples
5. Simulations
We run a small simulation study to show the performance of the expectile-based estimates. In the following, we use the Midzuno sampling method (see Midzuno 1952) and define the inclusion probabilities $\pi_j$ proportional to a measure of size $x$; see the R package “sampling” by Tillé and Matei (2015). We examine two data sets also used in Kuk (1988). The first data set (Dwellings) contains two variables, the number of dwelling units $(X)$ and the number of rented units $(Y)$, which are highly correlated (with a correlation of 0.97); see also Kish (1965). The second data set (Villages) includes information on the population $(X)$ and on the number of workers in household industry $(Y)$ for 128 villages in India; see Murthy (1967). In the second data set the correlation between $Y$ and $X$ is 0.54. To compare our simulation results with those of Kuk (1988), we choose the same sample size of $n = 30$ (from a total population of $N = 270$ for the Dwellings data and $N = 128$ for the Villages data).
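As a rough illustration of inclusion probabilities proportional to size, the following Python sketch computes $\pi_j = n x_j / \sum_k x_k$ and, where a value would exceed one, fixes it at one and redistributes the remaining sample size over the other units. The function name `inclusion_probabilities` and the toy size vector are assumptions made for illustration only; this is neither the code of the R package “sampling” nor the Midzuno selection step itself.

```python
import numpy as np

def inclusion_probabilities(x, n):
    """First-order inclusion probabilities proportional to size x, summing to n."""
    x = np.asarray(x, dtype=float)
    pik = n * x / x.sum()
    # Units whose proportional share exceeds 1 are included with certainty;
    # the remaining sample size is redistributed over the other units.
    while np.any(pik > 1.0):
        certain = pik >= 1.0
        pik[certain] = 1.0
        rest = ~certain
        pik[rest] = (n - certain.sum()) * x[rest] / x[rest].sum()
    return pik

# Hypothetical sizes (not the Dwellings or Villages data).
x_toy = np.array([1.0, 2.0, 3.0, 4.0, 50.0])
pik_toy = inclusion_probabilities(x_toy, n=2)
print(pik_toy, pik_toy.sum())
```

In this toy case the largest unit is selected with certainty and the four smaller units share the one remaining draw in proportion to their sizes.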
We compare quantiles defined by inversion of $\widehat{F}_R$ with quantiles defined by inversion of $\widehat{F}_R^M$. In Table 5.1 we give the root mean squared error (RMSE) and the relative efficiency for specified quantiles; values of the relative efficiency below 1 favour the quantiles derived from expectiles. We note that quantiles derived from expectiles yield increased efficiency for the median in the Villages data and, in the Dwellings data, for the median and also for the upper quantiles. The efficiency gain does not hold uniformly, however, as we observe a loss of efficiency for lower quantiles.
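Defining a quantile by inversion of an estimated distribution function can be sketched as follows. The estimators $\widehat{F}_R$ and $\widehat{F}_R^M$ are defined elsewhere in the paper and are not reproduced here; as a stand-in, the sketch assumes a simple weighted distribution function with weights $1/\pi_j$ normalised to sum to one. The function name `weighted_quantile` and the toy values are hypothetical.

```python
import numpy as np

def weighted_quantile(y, pik, alpha):
    """Smallest sampled y whose weighted distribution function reaches alpha."""
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(pik, dtype=float)   # design weights (assumed estimator)
    order = np.argsort(y)
    y_sorted = y[order]
    cdf = np.cumsum(w[order]) / w.sum()      # step function evaluated at sorted y
    # First index with cdf >= alpha; clip in case rounding leaves cdf[-1] < 1.
    idx = np.searchsorted(cdf, alpha, side="left")
    return y_sorted[min(idx, len(y_sorted) - 1)]

# Hypothetical sample: units with small inclusion probability carry more weight.
y_s = [10.0, 20.0, 30.0, 40.0]
pik_s = [0.5, 0.5, 0.25, 0.25]
q_median = weighted_quantile(y_s, pik_s, 0.5)
print(q_median)
```

Because the two largest values carry twice the weight of the two smallest, the weighted median in this toy sample lies above the unweighted one.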
| Data set | $\alpha$ | Quantiles: $\sqrt{\text{MSE}\left(\widehat{Q}_R(\alpha)\right)}$ | Quantiles from expectiles: $\sqrt{\text{MSE}\left(\widehat{Q}_R^M(\alpha)\right)}$ | Relative efficiency: $\frac{\sqrt{\text{MSE}\left(\widehat{Q}_R^M(\alpha)\right)}}{\sqrt{\text{MSE}\left(\widehat{Q}_R(\alpha)\right)}}$ |
|---|---|---|---|---|
| Dwellings | 0.1 | 2.57 | 2.76 | 1.07 |
| Dwellings | 0.25 | 1.77 | 1.97 | 1.11 |
| Dwellings | 0.5 | 2.45 | 2.35 | 0.96 |
| Dwellings | 0.75 | 3.15 | 2.91 | 0.92 |
| Dwellings | 0.9 | 4.20 | 3.43 | 0.82 |
| Villages | 0.1 | 5.52 | 6.65 | 1.21 |
| Villages | 0.25 | 11.41 | 10.31 | 0.90 |
| Villages | 0.5 | 12.29 | 11.69 | 0.95 |
| Villages | 0.75 | 16.24 | 15.41 | 0.95 |
| Villages | 0.9 | 13.31 | 18.34 | 1.38 |
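The relative efficiency column is the ratio of the two RMSE columns, which can be checked with a few lines of Python. The numbers below are copied from the table; the variable names are illustrative. Since the published RMSE values are rounded to two decimals, the recomputed ratios can differ from the reported ones in the last digit (for example, $6.65 / 5.52 \approx 1.205$ against a reported 1.21 for the Villages data at $\alpha = 0.1$), presumably because the reported ratios were computed from unrounded values.

```python
# (alpha, RMSE of quantiles, RMSE of quantiles from expectiles, reported ratio)
table = {
    "Dwellings": [
        (0.10, 2.57, 2.76, 1.07),
        (0.25, 1.77, 1.97, 1.11),
        (0.50, 2.45, 2.35, 0.96),
        (0.75, 3.15, 2.91, 0.92),
        (0.90, 4.20, 3.43, 0.82),
    ],
    "Villages": [
        (0.10, 5.52, 6.65, 1.21),
        (0.25, 11.41, 10.31, 0.90),
        (0.50, 12.29, 11.69, 0.95),
        (0.75, 16.24, 15.41, 0.95),
        (0.90, 13.31, 18.34, 1.38),
    ],
}

# Ratio below 1 means the expectile-based quantile has the smaller RMSE.
recomputed = {
    name: [(alpha, rmse_m / rmse_q, reported)
           for alpha, rmse_q, rmse_m, reported in rows]
    for name, rows in table.items()
}

for name, rows in recomputed.items():
    for alpha, ratio, reported in rows:
        print(f"{name:9s} alpha={alpha:.2f} ratio={ratio:.3f} reported={reported:.2f}")
```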
To obtain more insight we run a simulation scenario which involves a larger sample size of $n = 100$ selected from populations of sizes $N = 1{,}000$ and $N = 10{,}000$. We draw $Y$ and $X$ from a bivariate standard log-normal distribution (parameters $\mu = 0$ and $\sigma = 1$), such that the correlation between the two variables is equal to 0.9. We again calculate the root mean squared error for a range of $\alpha$ values and show the relative efficiency of the expectile-based approach in Figure 5.1. For better visual presentation we show a smoothed version of the relative efficiency.

We notice a reduction in the root mean squared error in both cases, $N = 1{,}000$ and $N = 10{,}000$. We may conclude that expectiles can easily be fitted in unequal probability sampling, and that the relation between expectiles and the distribution function can be used numerically to calculate quantiles with increased efficiency. This efficiency gain holds for upper quantiles only, that is, for $\alpha$ bounded away from zero. Note, however, that the sampling scheme is such that large values of $Y$ are sampled with higher probability, reflecting that the scheme aims to obtain more reliable estimates for the right-hand side of the distribution function, i.e., for large quantiles. If we are interested in small quantiles, we should use a different sampling scheme that gives individuals with small values of $Y$ an increased inclusion probability. In this case the behavior shown in Figure 5.1 would be mirrored with respect to $\alpha$.
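A population of the kind described above can be generated along the following lines. The sketch assumes that the stated correlation of 0.9 refers to $Y$ and $X$ on the original scale; for standard log-normal margins the correlation $\rho$ of the underlying normal variables then satisfies $(e^{\rho} - 1)/(e - 1) = 0.9$, giving $\rho = \ln(1 + 0.9(e - 1)) \approx 0.935$. If the 0.9 is instead meant on the log scale, `rho` would simply be set to 0.9. The function name, the seed and this reading of the setup are assumptions, not details taken from the paper.

```python
import numpy as np

def lognormal_population(N, target_corr=0.9, rng=None):
    """Draw (x, y) with standard log-normal margins and a target correlation."""
    rng = np.random.default_rng() if rng is None else rng
    # Correlation of the underlying normals that yields target_corr
    # between exp(Z1) and exp(Z2) when both Z have mean 0 and variance 1.
    rho = np.log(1.0 + target_corr * (np.e - 1.0))
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=N)
    return np.exp(z[:, 0]), np.exp(z[:, 1]), rho

rng = np.random.default_rng(12345)          # arbitrary seed for reproducibility
x_pop, y_pop, rho = lognormal_population(10_000, rng=rng)
log_corr = np.corrcoef(np.log(x_pop), np.log(y_pop))[0, 1]
print(f"rho on log scale: {rho:.4f}, empirical log-scale correlation: {log_corr:.4f}")
```

The check is made on the log scale because the sample correlation of heavy-tailed log-normal variables is itself quite variable, even for a population of this size.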
Description of Figure 5.1
Figure made of two graphs presenting the relative root mean squared error of quantiles and quantiles from expectiles for the Probability Proportional to Size (PPS) design, calculated from 500 repetitions, for $N = 1{,}000$ and $N = 10{,}000$. For both graphs, the y-axis is the ratio of the RMSE of quantiles from $F_R^M$ to the RMSE of quantiles from $F_R$, ranging from 0.90 to 1.15, and $\alpha$ is on the x-axis, ranging from 0.01 to 0.99. For $N = 1{,}000$, the ratio is close to 1.15 for small $\alpha$ values before decreasing to between 0.90 and 0.95 at an $\alpha$ value of about 0.25. Thereafter, the ratio generally increases slowly toward 1.00 as $\alpha$ increases. For $N = 10{,}000$, the ratio is close to 1.10 for small $\alpha$ values before decreasing to about 0.95 for $\alpha$ between 0.20 and 0.25. Thereafter, the ratio generally increases more quickly toward 1.00 as $\alpha$ increases.