# Sample-based estimation of mean electricity consumption curves for small domains

# Section 4. Model-based estimation methods

In this section, we use the model from Valliant, Dorfman and Royall (2000), in which the curves ${Y}_{i}$ are considered to be random, and we propose four new approaches to our problem of estimating mean demand curves for small domains. We assume that ${Y}_{i}\left(t\right)$ and the auxiliary information vector ${X}_{i}$ are available for each individual $i$ in the domain $d$ and that the mean ${\overline{X}}_{d}={\sum }_{i\in {U}_{d}}\text{\hspace{0.17em}}{X}_{i}/{N}_{d}$ is also known.

We assume that the auxiliary variables are related to the demand curves through a functional superpopulation model on the whole population, generally expressed as:

$\xi \text{\hspace{0.17em}}:\text{ }{Y}_{i}\left(t\right)={f}_{d}\left({X}_{i},\text{\hspace{0.17em}}t\right)+{ϵ}_{i}\left(t\right),\text{ }i\in {U}_{d},\text{ }t\in \left[0,\text{\hspace{0.17em}}T\right],\text{ }\text{ }\text{ }\text{ }\left(4.1\right)$

with ${f}_{d}$ an unknown regression function to be estimated, which can vary from one domain to another, and ${ϵ}_{i}$ a noise process with zero expectation, zero covariance between different individuals and non-null covariance over time.

If the size of domain ${N}_{d}$ is large, then the mean ${\mu }_{d}$ will be estimated by

${\stackrel{^}{\mu }}_{d}\left(t\right)=\frac{1}{{N}_{d}}\sum _{i\in {U}_{d}}\text{\hspace{0.17em}}{\stackrel{^}{Y}}_{i}\left(t\right),\text{ }t\in \left[0,\text{\hspace{0.17em}}T\right],$

where ${\stackrel{^}{Y}}_{i}\left(t\right)={\stackrel{^}{f}}_{d}\left({X}_{i},\text{\hspace{0.17em}}t\right)$ is the prediction of ${Y}_{i}\left(t\right).$ Otherwise, the mean ${\mu }_{d}$ is estimated by (see Valliant et al., 2000):

${\stackrel{^}{\mu }}_{d}\left(t\right)=\frac{1}{{N}_{d}}\left(\sum _{i\in {s}_{d}}\text{\hspace{0.17em}}{Y}_{i}\left(t\right)+\sum _{i\text{\hspace{0.17em}}\in \text{\hspace{0.17em}}{U}_{d}\text{\hspace{0.17em}}-\text{\hspace{0.17em}}{s}_{d}}\text{\hspace{0.17em}}{\stackrel{^}{Y}}_{i}\left(t\right)\right),\text{ }t\in \left[0,\text{\hspace{0.17em}}T\right].\text{ }\text{ }\text{ }\text{ }\left(4.2\right)$
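The combination of observed and predicted curves in (4.2) is straightforward to implement; below is a minimal sketch assuming the curves are discretized on a common grid of $L$ instants (all array and function names are illustrative, not from the paper):

```python
import numpy as np

def mu_hat_d(y_sampled, y_predicted_unsampled, N_d):
    """Estimator (4.2): observed curves Y_i(t) for the sampled units of the
    domain, plus model predictions Yhat_i(t) for the unsampled units,
    divided by the domain size N_d.

    y_sampled: (n_d, L) observed curves on a grid of L instants
    y_predicted_unsampled: (N_d - n_d, L) predicted curves
    """
    return (y_sampled.sum(axis=0) + y_predicted_unsampled.sum(axis=0)) / N_d
```

When the sampling fraction ${n}_{d}/{N}_{d}$ is negligible, this estimator is close to the purely synthetic mean of the predictions, as in the first display above.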

The quality of our estimates thus depends on the quality of the model: if the model is misspecified, the estimates may be biased.

## 4.1  Functional linear model

The simplest model of form (4.1) is the functional linear regression model from Faraway (1997):

${Y}_{i}\left(t\right)={X}_{i}^{\prime }\beta \left(t\right)+{\epsilon }_{i}\left(t\right),\text{ }t\in \left[0,\text{\hspace{0.17em}}T\right],\text{ }i\in {U}_{d}.\text{ }\text{ }\text{ }\text{ }\left(4.3\right)$

where the residuals ${\epsilon }_{i}\left(t\right)$ and ${\epsilon }_{j}\left(t\right)$ are independent for $i\ne j,$ with mean 0 and variance ${\sigma }_{i}^{2}\left(t\right).$ If the size ${N}_{d}$ of domain $d$ is large, then the mean of $Y$ in domain $d$ is estimated by

${\stackrel{^}{\mu }}_{d}^{\text{blu}}\left(t\right)={\overline{X}}_{d}^{\prime }{\stackrel{^}{\beta }}_{\text{BLU}}\left(t\right),\text{ }t\in \left[0,\text{\hspace{0.17em}}T\right],$

where ${\stackrel{^}{\beta }}_{\text{BLU}}\left(t\right)={\left({\sum }_{i\in s}{X}_{i}{X}_{i}^{\prime }/{\sigma }_{i}^{2}\left(t\right)\right)}^{-1}{\sum }_{i\in s}\text{\hspace{0.17em}}{X}_{i}{Y}_{i}\left(t\right)/{\sigma }_{i}^{2}\left(t\right)$ is the best linear unbiased (BLU) estimator of $\beta$ that does not depend on domain $d.$ Estimator ${\stackrel{^}{\mu }}_{d}^{\text{blu}}\left(t\right)$ can be expressed as a weighted sum of ${Y}_{i}\left(t\right):$

${\stackrel{^}{\mu }}_{d}^{\text{blu}}\left(t\right)=\frac{1}{{N}_{d}}\sum _{i\in s}\text{\hspace{0.17em}}{w}_{id}^{\text{blu}}\left(t\right){Y}_{i}\left(t\right),\text{ }t\in \left[0,\text{\hspace{0.17em}}T\right],$

where the weight ${w}_{id}^{\text{blu}}\left(t\right)=\left({\sum }_{j\in {U}_{d}}\text{\hspace{0.17em}}{X}_{j}^{\prime }\right){\left({\sum }_{j\in s}\text{\hspace{0.17em}}{X}_{j}{X}_{j}^{\prime }/{\sigma }_{j}^{2}\left(t\right)\right)}^{-1}{X}_{i}/{\sigma }_{i}^{2}\left(t\right)$ now depends on time $t.$ If ${n}_{d}/{N}_{d}$ is not negligible, then the mean ${\mu }_{d}$ is estimated using (4.2) by:

${\stackrel{^}{\mu }}_{d}^{\text{blu}}\left(t\right)=\frac{1}{{N}_{d}}\sum _{i\in {s}_{d}}\left({Y}_{i}\left(t\right)-{X}_{i}^{\prime }{\stackrel{^}{\beta }}_{\text{BLU}}\left(t\right)\right)+{\overline{X}}_{d}^{\prime }{\stackrel{^}{\beta }}_{\text{BLU}}\left(t\right),\text{ }t\in \left[0,\text{\hspace{0.17em}}T\right].$

This estimator can again be expressed as a weighted sum of the ${Y}_{i}\left(t\right)$ with weights that still depend on time $t.$ The model-based variance function of ${\stackrel{^}{\mu }}_{d}^{\text{blu}}\left(t\right)$ can be derived following Rao and Molina (2015), Chapter 7. The variance function ${\sigma }_{i}^{2}\left(t\right)$ is unknown and can be estimated following Rao and Molina (2015). Replacing ${\sigma }_{i}^{2}\left(t\right)$ with ${\stackrel{^}{\sigma }}_{i}^{2}\left(t\right),$ we obtain the empirical best linear unbiased predictor (EBLUP) of ${\mu }_{d},$ whose variance can be obtained using the method set out by Rao and Molina (2015). This EBLUP estimator does not use the sampling weights ${d}_{i}$ and is therefore not design-consistent (unless the sampling weights are constant for units in the same domain $d$). A modified estimator, also referred to as a pseudo-EBLUP, can be constructed following the approach described by Rao and Molina (2015, Chapter 7); it is equal in this case to the estimator given in (3.4).
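Since ${\stackrel{^}{\beta }}_{\text{BLU}}\left(t\right)$ is simply a weighted least-squares fit carried out instant by instant, it can be sketched as follows (a minimal illustration assuming the variances ${\sigma }_{i}^{2}\left(t\right)$ are known on a discretization grid; all names are illustrative):

```python
import numpy as np

def beta_blu(X, Y, sigma2):
    """Pointwise BLU estimator: for each instant t_l,
    beta(t_l) = (sum_i x_i x_i'/sigma_i^2(t_l))^{-1} sum_i x_i Y_i(t_l)/sigma_i^2(t_l).

    X: (n, p) matrix of auxiliary vectors
    Y: (n, L) matrix of curve values at L instants
    sigma2: (n, L) matrix of residual variances sigma_i^2(t_l)
    Returns a (p, L) matrix of coefficient curves.
    """
    n, L = Y.shape
    p = X.shape[1]
    beta = np.empty((p, L))
    for l in range(L):
        w = 1.0 / sigma2[:, l]                 # weights 1/sigma_i^2(t_l)
        A = (X * w[:, None]).T @ X             # sum_i x_i x_i' / sigma_i^2
        b = X.T @ (w * Y[:, l])                # sum_i x_i Y_i(t_l) / sigma_i^2
        beta[:, l] = np.linalg.solve(A, b)
    return beta
```

The domain mean estimate is then ${\overline{X}}_{d}^{\prime }{\stackrel{^}{\beta }}_{\text{BLU}}\left(t\right),$ i.e. `Xbar_d @ beta` in this notation.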

If ${Y}_{i}\left(t\right)$ is unknown for the units in domain $d,$ the following indirect estimator can be used:

${\stackrel{^}{\mu }}_{d}^{\text{ind}}\left(t\right)={\overline{X}}_{d}\stackrel{^}{\beta }\left(t\right)=\frac{1}{{N}_{d}}\sum _{i\in s}\text{\hspace{0.17em}}{\stackrel{˜}{w}}_{id}^{\text{ind}}{Y}_{i}\left(t\right),\text{ }t\in \left[0,\text{\hspace{0.17em}}T\right],\text{ }\text{ }\text{ }\text{ }\left(4.4\right)$

with $\stackrel{^}{\beta }\left(t\right)$ given in (3.5) and the weights ${\stackrel{˜}{w}}_{id}^{\text{ind}}=\left({\sum }_{j\in {U}_{d}}\text{\hspace{0.17em}}{X}_{j}^{\prime }\right){\left({\sum }_{i\in s}\text{\hspace{0.17em}}{d}_{i}{X}_{i}{X}_{i}^{\prime }\right)}^{-1}{\sum }_{i\in s}\text{\hspace{0.17em}}{d}_{i}{X}_{i},$ which do not depend on time $t,$ unlike ${w}_{id}^{\text{blu}}\text{​}.$ The estimators proposed in this section thus have the benefit of also being usable for unsampled domains.

## 4.2  Unit-level linear mixed models for functional data

The unit-level linear mixed models proposed by Battese, Harter and Fuller (1988) are very useful for estimating totals of real-valued variables for domains. As we will see later in more detail, they can capture both the effect of auxiliary information on the variable of interest (through fixed effects) and the specifics of the domains (through random effects).

In this section, we thus adapt those models to the context of functional data. To that end, we project the curves onto a finite-dimensional space, thereby transforming our functional problem into several problems of estimating totals or means of uncorrelated real-valued variables for small domains. The use of projection bases thus makes it possible to preserve the temporal correlation structure of our data while reducing the problem to several unrelated real-valued estimation subproblems, which we treat independently using standard methods.

### 4.2.1  Estimation of curves using unit-level linear mixed models applied to PCA scores

Like PCA in finite dimensions, functional PCA is a dimension-reduction method that makes it possible to summarize the information contained in a data set. It was proposed by Deville (1974), its theoretical properties were studied in Dauxois, Pousse and Romain (1982) and Hall, Müller and Wang (2006) and, finally, it was adapted to surveys by Cardot et al. (2010).

More formally, the curves ${Y}_{i}={\left({Y}_{i}\left(t\right)\right)}_{t\text{\hspace{0.17em}}\in \text{\hspace{0.17em}}\left[0,\text{\hspace{0.17em}}T\right]}$ are functions of time $t$ and we assume that they belong to ${L}^{2}\left[0,\text{\hspace{0.17em}}T\right],$ the space of square-integrable functions on the interval $\left[0,\text{\hspace{0.17em}}T\right].$ That space is equipped with the usual scalar product $<f,\text{\hspace{0.17em}}g>={\int }_{0}^{T}f\left(t\right)g\left(t\right)dt$ and the norm $‖\text{\hspace{0.17em}}f\text{\hspace{0.17em}}‖={\left({\int }_{0}^{T}{f}^{2}\left(t\right)dt\right)}^{1/2}\text{​}.$ The variance-covariance function $v\left(s,\text{\hspace{0.17em}}t\right)$ is defined by:

$v\left(s,\text{\hspace{0.17em}}t\right)=\frac{1}{N}\sum _{i=1}^{N}\text{\hspace{0.17em}}\left({Y}_{i}\left(s\right)-\mu \left(s\right)\right)\left({Y}_{i}\left(t\right)-\mu \left(t\right)\right),\text{ }s,\text{\hspace{0.17em}}t\in \left[0,\text{\hspace{0.17em}}T\right],\text{ }\text{ }\text{ }\text{ }\left(4.5\right)$

with $\mu ={\sum }_{i=1}^{N}\text{\hspace{0.17em}}{Y}_{i}/N$ the mean curve of $Y$ on the population $U.$

Let ${\left({\lambda }_{k}\right)}_{k\text{\hspace{0.17em}}=\text{\hspace{0.17em}}1}^{N}$ be the eigenvalues of $v,$ with ${\lambda }_{1}\ge {\lambda }_{2}\ge \dots \ge {\lambda }_{N}\ge 0,$ and ${\left({\xi }_{k}\right)}_{k\text{\hspace{0.17em}}=\text{\hspace{0.17em}}1}^{N}$ the associated orthonormal eigenvectors; then $v\left(s,\text{\hspace{0.17em}}t\right)={\sum }_{k=1}^{N}\text{\hspace{0.17em}}{\lambda }_{k}{\xi }_{k}\left(s\right){\xi }_{k}\left(t\right).$

The best approximation of $Y$ in a space of dimension $K$ smaller than $N$ is given by the projection of $Y$ onto the space spanned by the first $K$ eigenvectors ${\xi }_{k},\text{\hspace{0.17em}}k=1,\text{\hspace{0.17em}}\dots ,\text{\hspace{0.17em}}K$ (Ramsay and Silverman, 2005):

${Y}_{i}\left(t\right)=\mu \left(t\right)+\sum _{k=1}^{K}\text{\hspace{0.17em}}{f}_{ik}{\xi }_{k}\left(t\right)+{R}_{i}\left(t\right),\text{ }i\in U,\text{ }t\in \left[0,\text{\hspace{0.17em}}T\right],\text{ }\text{ }\text{ }\text{ }\left(4.6\right)$

where ${f}_{ik}$ is the projection (or score) of ${Y}_{i}$ on the component ${\xi }_{k}$ and ${R}_{i}\left(t\right)$ the remainder, representing the difference between curve $i$ and its projection. The score ${f}_{ik}$ does not depend on the domain and can be calculated as the scalar product between ${Y}_{i}-\mu$ and ${\xi }_{k}:$ ${f}_{ik}=<{Y}_{i}-\mu ,\text{\hspace{0.17em}}{\xi }_{k}>={\int }_{0}^{T}\left({Y}_{i}-\mu \right)\left(t\right){\xi }_{k}\left(t\right)dt.$ The decomposition given in (4.6) is also known as the Karhunen-Loève decomposition.

Using (4.6), the mean ${\mu }_{d}$ on the domain $d$ can be approximated by

${\mu }_{d}\left(t\right)\simeq \mu \left(t\right)+\sum _{k=1}^{K}\left(\frac{1}{{N}_{d}}\sum _{i\in {U}_{d}}\text{\hspace{0.17em}}{f}_{ik}\right){\xi }_{k}\left(t\right),\text{ }d=1,\text{\hspace{0.17em}}\dots ,\text{\hspace{0.17em}}D,\text{ }t\in \left[0,\text{\hspace{0.17em}}T\right].\text{ }\text{ }\text{ }\text{ }\left(4.7\right)$

The unknown mean $\mu$ is estimated using the Horvitz-Thompson estimator

$\stackrel{^}{\mu }\left(t\right)=\frac{1}{N}\sum _{i\in s}\text{\hspace{0.17em}}{d}_{i}{Y}_{i}\left(t\right),\text{ }t\in \left[0,\text{\hspace{0.17em}}T\right]\text{ }\text{ }\text{ }\text{ }\text{ }\left(4.8\right)$

and the ${\xi }_{k},\text{\hspace{0.17em}}k=1,\text{\hspace{0.17em}}\dots ,\text{\hspace{0.17em}}K$ are estimated by ${\stackrel{^}{\xi }}_{k},$ the eigenvectors of the estimated variance-covariance function $\stackrel{^}{v}\left(s,\text{\hspace{0.17em}}t\right)={\sum }_{i\in s}\text{\hspace{0.17em}}{d}_{i}\left({Y}_{i}\left(s\right)-\stackrel{^}{\mu }\left(s\right)\right)\left({Y}_{i}\left(t\right)-\stackrel{^}{\mu }\left(t\right)\right)/N$ (Cardot et al., 2010).
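The estimation of $\stackrel{^}{\mu }$ and of the ${\stackrel{^}{\xi }}_{k}$ can be sketched as a discretized, design-weighted PCA (a simplified illustration that ignores the quadrature weights of the time grid; all names are illustrative):

```python
import numpy as np

def functional_pca(Y, d, N, K):
    """Estimate the mean curve (4.8) and the first K principal components
    from the weighted covariance vhat, on a grid of L instants.

    Y: (n, L) sampled curves; d: (n,) design weights; N: population size;
    K: number of components kept.
    Returns (mu_hat, eigenvalues, xi_hat) with xi_hat of shape (K, L).
    """
    mu_hat = (d[:, None] * Y).sum(axis=0) / N           # Horvitz-Thompson mean
    C = Y - mu_hat                                      # centred curves
    v_hat = (d[:, None] * C).T @ C / N                  # (L, L) weighted covariance
    eigvals, eigvecs = np.linalg.eigh(v_hat)            # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1][:K]               # largest eigenvalues first
    return mu_hat, eigvals[order], eigvecs[:, order].T
```

The scores are then obtained as the inner products of the centred curves with the rows of `xi_hat`.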

Thus, to estimate ${\mu }_{d},$ we must estimate the mean scores on the principal components for the domain $d,$ i.e., ${\overline{f}}_{dk}={\sum }_{i\in {U}_{d}}\text{\hspace{0.17em}}{f}_{ik}/{N}_{d}\text{​}.$ To that end, for each component ${f}_{ik}\text{​},$ $k=1,\text{\hspace{0.17em}}\dots ,\text{\hspace{0.17em}}K,$ we consider a unit-level linear mixed model, also known as a nested error regression model (Battese et al., 1988):

${f}_{ik}={\beta }_{k}^{\prime }{X}_{i}+{\nu }_{dk}+{ϵ}_{ik},\text{ }i\in {U}_{d},\text{ }k=1,\text{\hspace{0.17em}}\dots ,\text{\hspace{0.17em}}K,\text{ }\text{ }\text{ }\text{ }\text{ }\left(4.9\right)$

with ${\beta }_{k}^{\prime }{X}_{i}$ the fixed effect of the auxiliary information, ${\nu }_{dk}$ the random effect of domain $d$ and ${ϵ}_{ik}$ the residual of unit $i.$ We assume that the random effects of the domains are independent and follow a common distribution with mean 0 and variance ${\sigma }_{\nu k}^{2}.$ The residuals are also independent, with mean 0 and variance ${\sigma }_{ϵk}^{2}.$ In addition, the random effects and the residuals are assumed to be mutually independent. The parameter ${\beta }_{k}$ in the model can be estimated by ${\stackrel{˜}{\beta }}_{k}\text{​},$ the best linear unbiased estimator (Rao and Molina, 2015, Chapter 4.7), and the best linear unbiased predictor (BLUP) of ${\overline{f}}_{dk}$ is thus expressed as a composite estimator (see Rao and Molina, 2015):

${\stackrel{˜}{\overline{f}}}_{dk}={\gamma }_{k}\left({\overline{f}}_{dk\text{​},\text{\hspace{0.17em}}s}-{\left({\overline{X}}_{d\text{​},\text{\hspace{0.17em}}s}-{\overline{X}}_{d}\right)}^{\prime }{\stackrel{˜}{\beta }}_{k}\right)+\left(1-{\gamma }_{k}\right){\overline{X}}_{d}^{\prime }{\stackrel{˜}{\beta }}_{k},\text{ }k=1,\text{\hspace{0.17em}}\dots ,\text{\hspace{0.17em}}K\text{ }\text{ }\text{ }\text{ }\left(4.10\right)$

with ${\gamma }_{k}={\sigma }_{\nu k}^{2}/\left({\sigma }_{\nu k}^{2}+{\sigma }_{ϵk}^{2}/{n}_{d}\right)$ and ${\overline{X}}_{d\text{​},\text{\hspace{0.17em}}s}={\sum }_{i\in {s}_{d}}\text{\hspace{0.17em}}{X}_{i}/{n}_{d}\text{​},$ ${\overline{f}}_{dk\text{​},\text{\hspace{0.17em}}s}={\sum }_{i\in {s}_{d}}\text{\hspace{0.17em}}{f}_{ik}/{n}_{d}$ the respective means of the vectors ${X}_{i}$ and of the scores ${f}_{ik}$ on ${s}_{d}.$ Finally, the mean ${\mu }_{d}$ is estimated by

${\stackrel{^}{\mu }}_{d}^{\text{BHF}}\left(t\right)=\stackrel{^}{\mu }\left(t\right)+\sum _{k=1}^{K}\text{\hspace{0.17em}}{\stackrel{^}{\overline{f}}}_{dk}{\stackrel{^}{\xi }}_{k}\left(t\right),\text{ }t\in \left[0,\text{\hspace{0.17em}}T\right],\text{ }\text{ }\text{ }\text{ }\text{ }\left(4.11\right)$

with $\stackrel{^}{\mu }$ and ${\stackrel{^}{\xi }}_{k}$ the estimates of $\mu$ and of the ${k}^{\text{th}}$ principal component ${\xi }_{k}$ given previously.

The variances ${\sigma }_{\nu k}^{2}$ and ${\sigma }_{ϵk}^{2}$ for $k=1,\text{\hspace{0.17em}}\dots ,\text{\hspace{0.17em}}K$ are unknown and are estimated by ${\stackrel{^}{\sigma }}_{\nu k}^{2}$ and ${\stackrel{^}{\sigma }}_{ϵk}^{2},$ obtained, for example, by restricted maximum likelihood (Rao and Molina, 2015). The estimator of ${\overline{f}}_{dk}$ is obtained by replacing ${\gamma }_{k}$ in (4.10) with ${\stackrel{^}{\gamma }}_{k}={\stackrel{^}{\sigma }}_{\nu k}^{2}/\left({\stackrel{^}{\sigma }}_{\nu k}^{2}+{\stackrel{^}{\sigma }}_{ϵk}^{2}/{n}_{d}\right)$ and is known as the empirical best linear unbiased predictor (EBLUP). Nonetheless, the calculation of the model-based variance function of ${\stackrel{^}{\mu }}_{d}^{\text{BHF}}\left(t\right)$ is more complicated in this case, because the principal components ${\xi }_{k}$ are themselves estimated by ${\stackrel{^}{\xi }}_{k},$ and will be examined elsewhere.
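The plug-in version of the composite estimator (4.10) is a one-line computation once the variance components have been estimated; a minimal sketch, assuming the estimated inputs are given as arrays (all names are illustrative):

```python
import numpy as np

def composite_score_estimate(fbar_dk_s, Xbar_d_s, Xbar_d, beta_k,
                             sigma2_nu, sigma2_eps, n_d):
    """Composite (EBLUP-type) estimate (4.10) of the mean score of domain d
    on component k, with gamma_k estimated by plugging in the variance
    estimates sigma2_nu and sigma2_eps."""
    gamma = sigma2_nu / (sigma2_nu + sigma2_eps / n_d)
    survey_part = fbar_dk_s - (Xbar_d_s - Xbar_d) @ beta_k   # survey-regression term
    synthetic_part = Xbar_d @ beta_k                          # synthetic term
    return gamma * survey_part + (1.0 - gamma) * synthetic_part
```

As ${n}_{d}$ grows, ${\stackrel{^}{\gamma }}_{k}$ tends to 1 and the estimator leans on the survey-regression part; for tiny domains it falls back on the synthetic part ${\overline{X}}_{d}^{\prime }{\stackrel{˜}{\beta }}_{k}.$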

We note that a simpler model, without the random effects, could have been considered for the scores  ${f}_{ik}\text{\hspace{0.17em}}:$

${f}_{ik}={\beta }_{k}^{\prime }{X}_{i}+{ϵ}_{ik},\text{ }i\in U,\text{ }k=1,\text{\hspace{0.17em}}\dots ,\text{\hspace{0.17em}}K,\text{ }\text{ }\text{ }\text{ }\text{ }\left(4.12\right)$

with ${ϵ}_{ik}$ a residual with mean zero and variance ${\sigma }_{k}^{2}.$ In this case, the parameter ${\beta }_{k}$ is estimated by ${\stackrel{^}{\beta }}_{k},$ the best linear unbiased estimator, and the mean score on the domain $d$ by ${\stackrel{^}{\overline{f}}}_{dk}={\stackrel{^}{\beta }}_{k}^{\prime }{\overline{X}}_{d},\text{\hspace{0.17em}}k=1,\text{\hspace{0.17em}}\dots ,\text{\hspace{0.17em}}K.$

If the sampling fraction ${n}_{d}/{N}_{d}$ is not negligible, then ${\stackrel{^}{\mu }}_{d}$ is obtained using the procedure described in Section 4.1.

Note 1: Here, the PCA is not used as a dimension-reduction method, but to decompose our problem into several unrelated subproblems of estimating totals of real-valued variables, which we know how to solve. We thus keep a number $K$ of principal components as high as possible, i.e., equal to the minimum of the number of discretization instants and the number of individuals in the sample.

Note 2: When certain explanatory variables in the vector ${X}_{i}$ are categorical, our method, defined in the case of real-valued variables, must be adapted: to that end, we propose transforming each categorical variable into a series of $0-1$ one-hot indicators. Moreover, when the number of explanatory variables $p$ is large, it may also be relevant to introduce penalties, ridge-type for example, into the regression problem.
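The transformations suggested in Note 2 can be sketched without any dependency; below, `one_hot` builds the $0-1$ indicators and `ridge_fit` solves the ridge-penalized least-squares problem (both function names are illustrative, not from the paper):

```python
import numpy as np

def one_hot(categories):
    """Transform a categorical vector into 0-1 indicator columns (one-hot).
    Returns the indicator matrix and the sorted list of levels."""
    levels = sorted(set(categories))
    Z = np.array([[1.0 if c == lev else 0.0 for lev in levels] for c in categories])
    return Z, levels

def ridge_fit(X, y, penalty):
    """Ridge-penalized least squares: beta = (X'X + penalty * I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + penalty * np.eye(p), X.T @ y)
```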

Note 3: Other projection bases can be considered, such as wavelets (see Mallat, 1999), as they are particularly suited to irregular curves. Finally, another solution would be to apply the functional linear mixed models for curve values to the instants of discretization; however, that method would not allow for consideration of temporal correlations in the problem, unlike the previous ones.

### 4.2.2  Estimating variance by parametric bootstrap

To estimate the accuracy (model-based variance) of the mean curve estimators, we propose adapting the parametric bootstrap method proposed by González-Manteiga, Lombardía, Molina, Morales and Santamaría (2008) and later used by Molina and Rao (2010). This is a replication method that consists of generating a large number $B$ of replicates ${s}^{*\left(b\right)}\text{​},\text{\hspace{0.17em}}b=1,\text{\hspace{0.17em}}\dots ,\text{\hspace{0.17em}}B$ of size $n$ by simple random sampling with replacement in $s$ and randomly generating the random and fixed effects in the estimated superpopulation model. Denote by ${\stackrel{^}{R}}_{i}\left(t\right)={Y}_{i}\left(t\right)-\stackrel{^}{\mu }\left(t\right)-{\sum }_{k\text{\hspace{0.17em}}=\text{\hspace{0.17em}}1}^{K}\text{\hspace{0.17em}}{\stackrel{^}{f}}_{k\text{​},\text{\hspace{0.17em}}i}{\stackrel{^}{\xi }}_{k}\left(t\right)$ the estimated projection residual for unit $i\in s$ (see the decomposition in (4.6)). For $b=1,\text{\hspace{0.17em}}\dots ,\text{\hspace{0.17em}}B,$ we proceed as follows for each $t\in \left[0,\text{\hspace{0.17em}}T\right]:$

1. Generate the bootstrap random effects of each domain, for each principal component:

${\nu }_{k\text{​},\text{\hspace{0.17em}}d}^{*\left(b\right)}\sim \mathcal{N}\left(0,\text{\hspace{0.17em}}{\stackrel{^}{\sigma }}_{k\text{​},\text{\hspace{0.17em}}\nu }^{2}\right),\text{ }d=1,\text{\hspace{0.17em}}\dots ,\text{\hspace{0.17em}}D,\text{ }k=1,\text{\hspace{0.17em}}\dots ,\text{\hspace{0.17em}}K$

2. Independently of those random effects, generate the individual bootstrap errors for each unit $i=1,\text{\hspace{0.17em}}\dots ,\text{\hspace{0.17em}}n$ and for each principal component:

${ϵ}_{k\text{​},\text{\hspace{0.17em}}i}^{*\left(b\right)}\sim \mathcal{N}\left(0,\text{\hspace{0.17em}}{\stackrel{^}{\sigma }}_{k\text{​},\text{\hspace{0.17em}}ϵ}^{2}\right),\text{ }i=1,\text{\hspace{0.17em}}\dots ,\text{\hspace{0.17em}}n,\text{ }k=1,\text{\hspace{0.17em}}\dots ,\text{\hspace{0.17em}}K.$

3. Draw the $n$ bootstrap replicates of the projection residuals ${\stackrel{^}{R}}_{i}^{*\left(b\right)}\left(t\right),$ for $i\in {s}^{*\left(b\right)}$ (i.e., select with replacement $n$ projection residuals among the $n$ residuals ${\stackrel{^}{R}}_{i}\left(t\right)$).

4. Calculate the bootstrap replicates ${Y}_{i}^{*\left(b\right)}\left(t\right)$ conditionally on the explanatory variables ${X}_{i}$ using the estimated model:

${Y}_{i}^{*\left(b\right)}\left(t\right)=\stackrel{^}{\mu }\left(t\right)+\sum _{k=1}^{K}\underset{{f}_{k\text{​},\text{\hspace{0.17em}}i}^{*\left(b\right)}}{\underbrace{\left({X}_{i}^{\prime }{\stackrel{^}{\beta }}_{k}+{\nu }_{k\text{​},\text{\hspace{0.17em}}d}^{*\left(b\right)}+{ϵ}_{k\text{​},\text{\hspace{0.17em}}i}^{*\left(b\right)}\right)}}{\stackrel{^}{\xi }}_{k}\left(t\right)+{\stackrel{^}{R}}_{i}^{*\left(b\right)}\left(t\right),\text{ }\forall i\in {s}_{d}^{*\left(b\right)}={s}^{*\left(b\right)}\cap {s}_{d}.$

   Note that ${f}_{k\text{​},\text{\hspace{0.17em}}i}^{*\left(b\right)}\text{​},$ the simulated score for unit $i,$ is obtained from the same mixed model as in (4.9), with the estimated parameters.

5. For each domain $d,$ calculate the bootstrap replicate ${\stackrel{^}{\mu }}_{d}^{*\left(b\right)}$ on the replicate ${s}^{*\left(b\right)}$ by repeating the entire estimation process: PCA, then EBLUP estimation of the linear mixed models on the principal component scores.

6. Estimate the variance of the estimator ${\stackrel{^}{\mu }}_{d}\left(t\right)$ by the empirical variance of the $B$ replicates ${\stackrel{^}{\mu }}_{d}^{*\left(b\right)}\text{ }:$

$\stackrel{^}{V}\left({\stackrel{^}{\mu }}_{d}\left(t\right)\right)=\frac{1}{B-1}\sum _{b=1}^{B}\text{\hspace{0.17em}}{\left({\stackrel{^}{\mu }}_{d}^{*\left(b\right)}\left(t\right)-\frac{1}{B}\sum _{b=1}^{B}\text{\hspace{0.17em}}{\stackrel{^}{\mu }}_{d}^{*\left(b\right)}\left(t\right)\right)}^{2}\text{​}.$

This approach is also the one used to approximate the variance of the functional linear regression estimator (omitting step 1, the generation of the random effects ${\nu }_{k\text{​},\text{\hspace{0.17em}}d}^{*}\text{​}$).
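The generation steps of this parametric bootstrap (domain effects, unit errors, resampled projection residuals, reconstructed curves) can be sketched for one replicate as follows, assuming all model estimates are available as arrays (every name here is illustrative):

```python
import numpy as np

def generate_bootstrap_curves(X, beta, xi_hat, mu_hat, R_hat, domain,
                              sigma2_nu, sigma2_eps, rng):
    """One parametric-bootstrap replicate: draw domain effects and unit
    errors for each component, resample projection residuals with
    replacement, and rebuild the curves from the estimated model.

    X: (n, p) covariates; beta: (K, p) estimated coefficients;
    xi_hat: (K, L) estimated components; mu_hat: (L,) estimated mean curve;
    R_hat: (n, L) estimated projection residuals; domain: (n,) integer labels;
    sigma2_nu, sigma2_eps: (K,) estimated variance components.
    """
    n, L = R_hat.shape
    D = domain.max() + 1
    nu = rng.normal(0.0, np.sqrt(sigma2_nu), size=(D, beta.shape[0]))   # domain effects
    eps = rng.normal(0.0, np.sqrt(sigma2_eps), size=(n, beta.shape[0]))  # unit errors
    R_star = R_hat[rng.integers(0, n, size=n)]                           # resampled residuals
    scores = X @ beta.T + nu[domain] + eps                               # simulated scores
    return mu_hat + scores @ xi_hat + R_star                             # rebuilt curves
```

Repeating this $B$ times and re-running the whole estimation chain on each replicate yields the empirical variance of the final display above.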

## 4.3  Non-parametric estimation using regression trees and random forests for small curve domains

In this section, to obtain the individual predictions ${\stackrel{^}{Y}}_{i}\left(t\right),$ we use non-parametric models: regression trees adapted to functional data, and random forests. These no longer require a linear relation between the auxiliary information and the variable of interest, and allow more flexibility in modelling. In fact, regression trees for functional data are frequently used at EDF and are known to give satisfactory results on electricity consumption curves. Moreover, in the literature, regression trees have been adapted to surveys by Toth and Eltinge (2011), but not for estimating totals in small domains.

In this and the next section, we therefore consider a special case of the general model (4.1) in which the function $f$ does not depend on the domain of unit $i:$

${Y}_{i}\left(t\right)=f\left({X}_{i},\text{\hspace{0.17em}}t\right)+{ϵ}_{i}\left(t\right)\text{ }\forall i\in U,\text{ }t\in \left[0,\text{\hspace{0.17em}}T\right].\text{ }\text{ }\text{ }\text{ }\text{ }\left(4.13\right)$

### 4.3.1  Regression trees for functional data

The classification and regression tree (CART) methodology proposed by Breiman, Friedman, Stone and Olshen (1984) is a very popular non-parametric statistical technique. Its goal is to predict the value of a real or categorical target variable $Y$ from a vector of real or categorical explanatory variables $X=\left({X}_{1},\text{\hspace{0.17em}}\dots ,\text{\hspace{0.17em}}{X}_{j},\text{\hspace{0.17em}}\dots ,\text{\hspace{0.17em}}{X}_{p}\right).$ To that end, we determine a partition of the space of $X$ by repeatedly splitting the data set in two, using a decision rule (split criterion) involving a single explanatory variable. Thus, our sample $s$ is the first node $\lambda$ of a tree (its “root”) that we seek to subdivide into two separate nodes ${\lambda }_{l}$ and ${\lambda }_{r}$ such that the values of the real target variable ${Y}_{i}$ are as homogeneous as possible within each node. The inertia criterion $\kappa \left(\lambda \right)$ used to quantify the homogeneity of a node is frequently the sum of squared deviations between the values of ${Y}_{i}$ for units $i$ in node $\lambda$ and the mean of those values in the node: $\kappa \left(\lambda \right)={\sum }_{i\in \lambda }{\left({Y}_{i}-{\overline{Y}}_{\lambda }\right)}^{2}$ where ${\overline{Y}}_{\lambda }$ is the mean of the ${Y}_{i}$ in node $\lambda .$

For the variables ${X}_{j}$ which are quantitative, the decision rules are expressed as

$\left\{\begin{array}{lll}i\in {\lambda }_{l}\hfill & \text{if}\hfill & {X}_{ji}<c\hfill \\ i\in {\lambda }_{r}\hfill & \text{if}\hfill & {X}_{ji}\ge c\hfill \end{array}\right.\text{ }\text{ }\text{ }\text{ }\text{ }\left(4.14\right)$

with $c$ a cut-off point to be optimized among all possible values of ${X}_{j}.$ For qualitative variables, the decision rules consist of splitting the set of categories into two separate subsets. The search for the optimal split then amounts to solving the optimization problem

$\underset{{\lambda }_{l}\text{​},\text{\hspace{0.17em}}{\lambda }_{r}}{\text{arg}\text{\hspace{0.17em}}\text{max}}\text{\hspace{0.17em}}\left(\kappa \left(\lambda \right)-\kappa \left({\lambda }_{l}\right)-\kappa \left({\lambda }_{r}\right)\right).\text{ }\text{ }\text{ }\text{ }\text{ }\left(4.15\right)$

Each of these nodes is then subdivided in turn into two child nodes, and the partitioning process continues until a minimal node size is reached, until the value of the target variable is the same for all units in the node, or until a given maximum depth is attained. The final partition of the space is then made up of the terminal nodes of the tree, also called leaves. A summary of each of those leaves (often the mean, for a quantitative target variable) then becomes the predicted value for all units assigned to the leaf. The various parameters (minimum node size and maximum depth) can be selected by cross-validation.

When the variable $Y$ to be predicted is not a real variable but a vector of dimension $m>1,$ the regression tree principle extends very naturally: the tree construction algorithm and the choice of parameters by cross-validation remain unchanged, but the inertia criterion is modified. Thus, the minimization problem is still of the form (4.15), but this time the criterion is of the form $\kappa \left(\lambda \right)={\sum }_{i\in \lambda }{‖\text{\hspace{0.17em}}{Y}_{i}-{\overline{Y}}_{\lambda }\text{\hspace{0.17em}}‖}^{2}$ where $‖\text{\hspace{0.17em}}\cdot \text{\hspace{0.17em}}‖$ is, for example, the Euclidean norm or the Mahalanobis norm. Multivariate regression trees were used, for example, by De’Ath (2002) in an ecological application.

Finally, when the variable $Y$ to be predicted is a curve, the algorithm for constructing the tree and choosing the parameters is the same, but this time a functional inertia criterion $\kappa$ must be used. There are many possible choices. We chose to follow the “Courbotree” approach described in Stéphan and Cogordan (2009), frequently used at EDF for building segmentations of data sets of electricity consumption curves based on explanatory variables. In that approach, we apply the method presented in the previous paragraph for multivariate $Y$ to the vectors ${Y}_{i}=\left({Y}_{i}\left({t}_{1}\right),\text{\hspace{0.17em}}\dots ,\text{\hspace{0.17em}}{Y}_{i}\left({t}_{L}\right)\right)$ of curve values at the discretization instants, with the Euclidean distance. The Euclidean distance on the discretization instants can thus be seen as an approximation of the ${L}^{2}\left[0,\text{\hspace{0.17em}}T\right]$ norm. More formally, the functional criterion is thus expressed as

$\kappa \left(\lambda \right)=\sum _{i\in \lambda }\sum _{l=1}^{L}{\left({Y}_{i}\left({t}_{l}\right)-{\overline{Y}}_{\lambda }\left({t}_{l}\right)\right)}^{2},\text{ }\text{ }\text{ }\text{ }\text{ }\left(4.16\right)$

with ${\overline{Y}}_{\lambda }\left({t}_{l}\right)={\sum }_{i\in \lambda }\text{\hspace{0.17em}}{Y}_{i}\left({t}_{l}\right)/{n}_{\lambda }$ where ${n}_{\lambda }$ is the number of units in the sample that belong to the node $\lambda .$

In practice, when working on electricity consumption data, the curves considered can be at very different levels, and the Courbotree algorithm based on the Euclidean distance may not work well when applied to raw data. Often, the Courbotree algorithm is therefore applied only to the curve shapes, i.e., the normalized curves ${\stackrel{˜}{Y}}_{i}\left(t\right)={Y}_{i}\left(t\right)/{\overline{Y}}_{i}$ where ${\overline{Y}}_{i}={\sum }_{\mathcal{l}=1}^{L}\text{\hspace{0.17em}}{Y}_{i}\left({t}_{\mathcal{l}}\right)/L$ is the mean (or level) of ${Y}_{i}\left(t\right)$ over all measurement instants ${t}_{1},\text{\hspace{0.17em}}\dots ,\text{\hspace{0.17em}}{t}_{L}$ (a method also known as normalized Courbotree). We then predict the level ${\overline{Y}}_{i}$ using a linear regression and finally obtain the prediction of ${Y}_{i}\left(t\right)$ as the product of the prediction of ${\stackrel{˜}{Y}}_{i}\left(t\right)$ and that of ${\overline{Y}}_{i}.$
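Courbotree itself is an EDF tool, but since the criterion (4.16) is exactly the squared-error criterion summed over the $L$ outputs, a multi-output regression tree from scikit-learn gives a basic approximation of the approach (simulated data for illustration only; every name here is an assumption, not from the paper):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Simulated data: n curves discretized at L instants, p covariates.
rng = np.random.default_rng(0)
n, L, p = 200, 24, 3
X = rng.normal(size=(n, p))
t = np.linspace(0, 1, L)
Y = np.outer(X[:, 0] > 0, np.sin(2 * np.pi * t)) + 0.1 * rng.normal(size=(n, L))

# scikit-learn's squared-error criterion summed over the L outputs coincides
# with the inertia (4.16), so a multi-output tree is a Courbotree-style fit.
tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=10).fit(X, Y)
Y_hat = tree.predict(X)            # (n, L) predicted curves (leaf mean curves)
```

The prediction for a unit is the mean curve of its leaf, exactly the leaf summary described above.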

### 4.3.2  Variance estimation

To estimate the variance under the model of our estimators for mean curves by domain, we will follow a bootstrap approach very similar to the one proposed for parametric models. Here, our superpopulation model is expressed as

${Y}_{i}\left(t\right)=f\left({X}_{i},\text{\hspace{0.17em}}t\right)+{ϵ}_{i}\left(t\right),\text{ }\forall i\in U.\text{ }\text{ }\text{ }\text{ }\text{ }\left(4.17\right)$

Let $\stackrel{^}{f}\left({X}_{i},\text{\hspace{0.17em}}t\right),$ for $i\in s,$ be the value predicted by the regression tree for unit $i,$ and ${\stackrel{^}{ϵ}}_{i}\left(t\right)={Y}_{i}\left(t\right)-\stackrel{^}{f}\left({X}_{i},\text{\hspace{0.17em}}t\right)$ the estimated residual for that unit. The idea of our accuracy-approximation method is, as with the linear mixed models, to generate a large number $B$ of replicates ${s}^{*\left(b\right)}\text{​},\text{\hspace{0.17em}}b=1,\text{\hspace{0.17em}}\dots ,\text{\hspace{0.17em}}B$ of size $n$ by simple random sampling with replacement in $s,$ to calculate the estimator of the mean curve by domain on each replicate and, finally, to deduce the variance of the estimator from the variability between replicates. The bootstrap method used here is also known as the residual bootstrap in the linear model case.

More specifically, for $b=1,\text{\hspace{0.17em}}\dots ,\text{\hspace{0.17em}}B,$ we proceed as follows for each $t\in \left[0,\text{\hspace{0.17em}}T\right]:$

1. Calculate the bootstrap replications of the estimated residuals ${\stackrel{^}{ϵ}}_{i}^{*\left(b\right)}\left(t\right)$ for $i\in {s}^{*\left(b\right)}\text{​}.$

2. Calculate the bootstrap replications of ${Y}_{i}\left(t\right):$

${Y}_{i}^{*\left(b\right)}\left(t\right)=\stackrel{^}{f}\left({X}_{i},\text{\hspace{0.17em}}t\right)+{\stackrel{^}{ϵ}}_{i}^{*\left(b\right)}\left(t\right),\text{ }\forall i\in {s}^{*\left(b\right)}$

3. Recalculate, for each domain $d,$ the mean estimator ${\stackrel{^}{\mu }}_{d}^{*\left(b\right)}\left(t\right)$ on this replicate.

4. Estimate the variance by the empirical variance of the $B$ bootstrap replicates ${\stackrel{^}{\mu }}_{d}^{*\left(b\right)}\left(t\right),$ $\stackrel{^}{V}\left({\stackrel{^}{\mu }}_{d}\left(t\right)\right)=\frac{1}{B-1}{\sum }_{b=1}^{B}{\left({\stackrel{^}{\mu }}_{d}^{*\left(b\right)}\left(t\right)-\frac{1}{B}{\sum }_{b=1}^{B}\text{\hspace{0.17em}}{\stackrel{^}{\mu }}_{d}^{*\left(b\right)}\left(t\right)\right)}^{2}\text{​}.$

The process is identical if we estimate the function $f$ by random forests rather than regression trees.
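The residual-resampling step of this procedure can be sketched for one replicate as follows, assuming the fitted curves and residual curves are stored as arrays (illustrative names, not from the paper):

```python
import numpy as np

def residual_bootstrap_curves(f_hat, eps_hat, rng):
    """One residual-bootstrap replicate: resample the estimated residual
    curves with replacement and add them back to the fitted values.

    f_hat: (n, L) fitted curves f^(X_i, t); eps_hat: (n, L) residual curves.
    Returns an (n, L) array of bootstrap curves Y*_i(t).
    """
    n = f_hat.shape[0]
    idx = rng.integers(0, n, size=n)   # simple random sampling with replacement
    return f_hat + eps_hat[idx]
```

The mean-curve estimator by domain is then recomputed on each such replicate, and the empirical variance across replicates gives the accuracy estimate.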

## 4.4  Aggregation of predictions by random forests for curves

The literature often highlights the mediocre predictive performance of regression trees compared to other techniques such as SVMs (see, for example, Cristianini and Shawe-Taylor, 2000). Regression trees can be unstable and highly dependent on the sample on which they were built. To remedy that drawback, Breiman (2001) proposed the random forest algorithm. This is an ensemble technique that, as its name suggests, consists of aggregating the predictions of different regression trees. The fact that aggregating unstable predictors leads to a reduction in variance was shown in particular by Breiman (1998). For a quantitative target variable, the aggregation is performed by taking the mean of the predictions of the individual trees.

To reduce the variance of the aggregate prediction, the objective is to build trees that are as different from each other as possible. The Breiman algorithm introduces variability in the construction of the trees, on the one hand through resampling (simple random sampling with replacement) of the units and, on the other hand, through random selection, at each split of a tree, of a subset of candidate explanatory variables. Compared to a regression tree, there are therefore two additional parameters to tune for a random forest: the number of trees and the number of candidate explanatory variables at each split.

When the interest variable is multivariate (or functional), the algorithm proposed by Breiman adapts easily, by aggregating the multivariate (or functional) regression trees presented in the previous paragraph. Multivariate random forests have, for example, been studied by Segal and Xiao (2011).

The algorithm that we propose here, called “Courboforest,” simply consists of aggregating functional regression trees constructed using the “Courbotree” approach, i.e., multivariate regression trees applied to the vectors ${Y}_{i}=\left({Y}_{i}\left({t}_{1}\right),\text{\hspace{0.17em}}\dots ,\text{\hspace{0.17em}}{Y}_{i}\left({t}_{L}\right)\right)$ of curve values at the discretization instants, with the split criterion being the inertia based on the Euclidean distance defined in equation (4.16).
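Under the same approximation as for Courbotree, a multi-output random forest from scikit-learn, which combines bootstrap resampling of the units with random feature subsets at each split, provides a Courboforest-style sketch (simulated data; every name here is an assumption, not from the paper):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Simulated data: (n, p) covariates, (n, L) discretized curves.
rng = np.random.default_rng(0)
n, L, p = 200, 24, 4
X = rng.normal(size=(n, p))
t = np.linspace(0, 1, L)
Y = np.outer(X[:, 0], np.cos(2 * np.pi * t)) + 0.1 * rng.normal(size=(n, L))

# Each tree is grown on a bootstrap sample with a random subset of candidate
# variables per split; predictions are the mean of the trees' leaf curves.
forest = RandomForestRegressor(n_estimators=100, max_features="sqrt",
                               random_state=0).fit(X, Y)
Y_hat = forest.predict(X)          # (n, L) aggregated curve predictions
```

The two forest-specific parameters discussed above correspond to `n_estimators` and `max_features` here.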

