Sample-based estimation of mean electricity consumption curves for small domains
Section 4. Model-based estimation methods
In this section, we use the model-based framework of Valliant, Dorfman and Royall (2000), in which the curves are considered to be random, and we propose four innovative approaches to our problem of estimating mean demand curves for small domains. We assume that the auxiliary information vector $\mathbf{x}_k$ is available for each individual $k$ of the domain $d$, whose set of units is denoted $U_d$ and whose size is $N_d$, and that the domain mean $\bar{\mathbf{x}}_d = N_d^{-1} \sum_{k \in U_d} \mathbf{x}_k$ is also known. We assume that the auxiliary variables are related to the demand curves according to a functional superpopulation model, defined on the whole population, that is generally expressed as
$$Y_k(t) = f_d(\mathbf{x}_k, t) + \varepsilon_k(t), \qquad k \in U_d, \quad (4.1)$$
where $Y_k(t)$ denotes the consumption curve of individual $k$ at the instant $t$, $f_d$ is an unknown regression function to be estimated, which may vary from one domain to another, and $\varepsilon_k$ is a noise process with zero expectation, zero covariance between different individuals and non-null covariance over time.
If the size $N_d$ of domain $d$ is large, then the mean $\mu_d(t)$ will be estimated by
$$\hat{\mu}_d(t) = \frac{1}{N_d} \sum_{k \in U_d} \hat{Y}_k(t),$$
where $\hat{Y}_k(t)$ is the prediction of $Y_k(t)$. Otherwise, the mean $\mu_d(t)$ is estimated by (see Valliant et al., 2000):
$$\hat{\mu}_d(t) = \frac{1}{N_d} \Big( \sum_{k \in s_d} Y_k(t) + \sum_{k \in U_d \setminus s_d} \hat{Y}_k(t) \Big), \quad (4.2)$$
where $s_d$ denotes the set of sampled units of the domain $d$. The quality of our estimates thus depends on the quality of our model: if the model is misspecified, the estimates may be biased.
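To fix ideas, a minimal Python sketch of an estimator of the form (4.2) on discretized curves is given below; the function and array names are illustrative assumptions, not notation from the paper.

import numpy as np

def predicted_mean_curve(y_sampled, y_pred_nonsampled):
    """Prediction estimator of a domain mean curve in the spirit of (4.2):
    observed curves of the sampled units are combined with model predictions
    for the non-sampled units of the domain.

    y_sampled         : array (n_d, H), observed curves of the sampled units
    y_pred_nonsampled : array (N_d - n_d, H), predicted curves of the remaining units
    """
    N_d = y_sampled.shape[0] + y_pred_nonsampled.shape[0]
    return (y_sampled.sum(axis=0) + y_pred_nonsampled.sum(axis=0)) / N_d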
4.1 Functional linear model
The simplest model of form (4.1) is the functional linear regression model of Faraway (1997):
$$Y_k(t) = \mathbf{x}_k^\top \boldsymbol{\beta}(t) + \varepsilon_k(t),$$
where the residuals $\varepsilon_k(t)$ are independent across individuals and distributed according to a law with mean 0 and variance $\sigma^2(t)$. If the size $N_d$ of domain $d$ is large, then the mean of $Y$ in domain $d$ is estimated by
$$\hat{\mu}_d(t) = \bar{\mathbf{x}}_d^\top \hat{\boldsymbol{\beta}}(t),$$
where $\hat{\boldsymbol{\beta}}(t)$ is the best linear unbiased (BLU) estimator of $\boldsymbol{\beta}(t)$, which does not depend on the domain $d$. This estimator can be expressed as a weighted sum of the sampled curves $Y_k(t)$, where the weights now depend on the time $t$. If the sampling fraction in the domain is not negligible, then the mean $\mu_d(t)$ is estimated using (4.2) by
$$\hat{\mu}_d(t) = \frac{1}{N_d} \Big( \sum_{k \in s_d} Y_k(t) + \sum_{k \in U_d \setminus s_d} \mathbf{x}_k^\top \hat{\boldsymbol{\beta}}(t) \Big).$$
This estimator can again be expressed as a weighted sum of the sampled curves, with weights that still depend on the time $t$. The variance function (under the model) of this estimator can be derived following Rao and Molina (2015), Chapter 7. The variance function $\sigma^2(t)$ is unknown and can be estimated by following Rao and Molina (2015). By replacing $\sigma^2(t)$ with its estimate, we obtain the empirical best linear unbiased predictor (EBLUP) of $\mu_d(t)$, and its variance can be obtained using the method set out by Rao and Molina (2015). This EBLUP estimator does not use the sampling weights and is therefore not consistent with respect to the sampling design (unless the sampling weights are constant for units in the same domain). A modified estimator, also referred to as a pseudo-EBLUP, can be constructed using the approach described by Rao and Molina (2015, Chapter 7); in this case it is equal to the estimator given in (3.4). If the individual auxiliary vectors $\mathbf{x}_k$ are unknown for the units of domain $d$, the following indirect estimator, based only on the known domain mean $\bar{\mathbf{x}}_d$, can be used:
$$\hat{\mu}_d(t) = \bar{\mathbf{x}}_d^\top \hat{\boldsymbol{\beta}}(t),$$
with $\hat{\boldsymbol{\beta}}(t)$ here the estimator given in (3.5), whose weights do not depend on the time $t$, unlike those of the previous estimators. Thus, the estimators proposed in this section have the benefit of being usable for unsampled domains.
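As an illustration, here is a minimal Python sketch of this approach on discretized curves; it uses ordinary least squares instant by instant rather than the BLU estimator discussed above, and all function and variable names are illustrative assumptions.

import numpy as np

def fit_functional_linear_model(X, Y):
    """Fit Y_k(t) = x_k' beta(t) + eps_k(t) by ordinary least squares,
    instant by instant, on the discretized curves.

    X : array (n, p), auxiliary vectors of the sampled units
    Y : array (n, H), curves observed at H discretization instants
    Returns beta_hat : array (p, H), one coefficient vector per instant.
    """
    # np.linalg.lstsq solves the H pointwise regressions in a single call
    beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return beta_hat

def synthetic_domain_mean(x_bar_d, beta_hat):
    """Synthetic estimator x_bar_d' beta_hat(t) of a domain mean curve,
    usable even for an unsampled domain when the domain mean of x is known."""
    return x_bar_d @ beta_hat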
4.2 Unit-level linear mixed models for functional data
The unit-level linear mixed models proposed by Battese, Harter and Fuller (1988) are very useful for estimating totals of real-valued variables for domains. As we will see later in more detail, they can capture both the effect of the auxiliary information on the variable of interest (through fixed effects) and the specific characteristics of the domains (through random effects).
In this section, we therefore attempt to adapt those models to the context of functional data. To that end, we project the curves onto a space of finite dimension and thereby transform our functional problem into several problems of estimating totals or means of uncorrelated real-valued variables for small domains, which we then solve using the usual methods. The use of projection bases thus makes it possible to preserve the temporal correlation structure of our data while arriving at several uncorrelated subproblems of estimating real-valued variables, which we treat independently using the usual methods.
4.2.1 Estimation
of curves using unit-level linear mixed models applied to PCA scores
Like PCA in finite dimension, functional PCA is a dimension-reduction method that makes it possible to summarize the information contained in a data set. It was proposed by Deville (1974), its theoretical properties were studied by Dauxois, Pousse and Romain (1982) and Hall, Müller and Wang (2006), and it was adapted to surveys by Cardot et al. (2010).
More formally, the curves $Y_k$ are functions of the time $t$, and we assume that they belong to $L^2[0, T]$, the space of square-integrable functions on the interval $[0, T]$. That space is equipped with the usual scalar product $\langle f, g \rangle = \int_0^T f(t)\, g(t)\, dt$ and the associated norm $\| f \| = \langle f, f \rangle^{1/2}$. The variance-covariance function $\gamma(s, t)$ is defined by
$$\gamma(s, t) = \frac{1}{N} \sum_{k \in U} \big( Y_k(s) - \mu(s) \big) \big( Y_k(t) - \mu(t) \big),$$
with $\mu$ the mean curve of the $Y_k$ over the population $U$. Let $\lambda_1 \geq \lambda_2 \geq \ldots \geq 0$ be the eigenvalues of $\gamma$ and $\phi_1, \phi_2, \ldots$ the associated orthonormal eigenvectors. The best approximation of $Y_k$ in a space of given dimension $q$, smaller than the dimension of the data, is given by the projection of $Y_k$ onto the space spanned by the first $q$ eigenvectors (Ramsay and Silverman, 2005):
$$Y_k(t) = \mu(t) + \sum_{l=1}^{q} c_{kl}\, \phi_l(t) + r_k(t), \quad (4.6)$$
where $c_{kl}$ is the projection (or score) of $Y_k$ on the component $\phi_l$ and $r_k$ is the remainder, representing the difference between the curve $Y_k$ and its projection. The score $c_{kl}$ does not depend on the domain and can be calculated as the scalar product between $Y_k - \mu$ and $\phi_l$. The decomposition given in (4.6) is also known as the Karhunen-Loève decomposition.
Using (4.6), the mean $\mu_d$ on the domain $d$ can be approximated by
$$\mu_d(t) \approx \mu(t) + \sum_{l=1}^{q} \bar{c}_{dl}\, \phi_l(t),$$
where $\bar{c}_{dl}$ denotes the mean of the scores $c_{kl}$ over the domain $d$. The unknown mean $\mu$ is estimated by the Horvitz-Thompson estimator $\hat{\mu}$, and the $\phi_l$ are estimated by $\hat{\phi}_l$, the eigenvectors of the estimated variance-covariance function $\hat{\gamma}$ (Cardot et al., 2010).
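The following Python sketch illustrates how the mean curve, the covariance function and its first eigenvectors can be estimated from the discretized sampled curves weighted by their sampling weights; it is a simplified illustration (the eigenvectors of the discretized covariance matrix stand in for the eigenfunctions, up to the normalization induced by the time grid) and all names are assumptions.

import numpy as np

def weighted_functional_pca(Y, w, q):
    """Estimate the mean curve, the covariance function and the first q
    eigenvectors from the sampled curves, each unit being weighted by its
    sampling weight (Horvitz-Thompson-type estimates).

    Y : array (n, H), curves at H discretization instants
    w : array (n,),   sampling weights
    q : number of principal components kept
    """
    mu_hat = np.average(Y, axis=0, weights=w)          # estimated mean curve
    Yc = Y - mu_hat                                    # centred curves
    gamma_hat = (Yc * w[:, None]).T @ Yc / w.sum()     # estimated covariance matrix
    eigval, eigvec = np.linalg.eigh(gamma_hat)         # eigendecomposition (ascending order)
    order = np.argsort(eigval)[::-1][:q]               # keep the q largest eigenvalues
    phi_hat = eigvec[:, order]                         # estimated eigenvectors, shape (H, q)
    scores = Yc @ phi_hat                              # estimated scores c_kl, shape (n, q)
    return mu_hat, phi_hat, scores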
Thus, to estimate $\mu_d$, we must estimate the mean scores on the principal components for the domain $d$, i.e., $\bar{c}_{dl} = N_d^{-1} \sum_{k \in U_d} c_{kl}$ for $l = 1, \ldots, q$. To that end, for each component $l$, we consider a unit-level linear mixed model, also known as a nested error regression model (Battese et al., 1988):
$$c_{kl} = \mathbf{x}_k^\top \boldsymbol{\beta}_l + u_{dl} + e_{kl}, \qquad k \in U_d,$$
with $\boldsymbol{\beta}_l$ the fixed effect of the auxiliary information, $u_{dl}$ the random effect of the domain $d$ and $e_{kl}$ the residual of unit $k$. We assume that the random effects of the domains are independent and follow a common law with mean 0 and variance $\sigma^2_{u,l}$. The residuals are also independent, distributed according to a law with mean 0 and variance $\sigma^2_{e,l}$. In addition, the random effects and the residuals are assumed to be mutually independent. The parameter $\boldsymbol{\beta}_l$ of the model can be estimated by $\hat{\boldsymbol{\beta}}_l$, its best linear unbiased estimator (Rao and Molina, 2015, Chapter 4.7), and the BLUP estimator of $\bar{c}_{dl}$ is then expressed as a composite estimator (see Rao and Molina, 2015):
$$\hat{\bar{c}}_{dl} = \gamma_{dl} \Big[ \bar{c}_{s_d l} + \big( \bar{\mathbf{x}}_d - \bar{\mathbf{x}}_{s_d} \big)^\top \hat{\boldsymbol{\beta}}_l \Big] + \big( 1 - \gamma_{dl} \big)\, \bar{\mathbf{x}}_d^\top \hat{\boldsymbol{\beta}}_l, \qquad \gamma_{dl} = \frac{\sigma^2_{u,l}}{\sigma^2_{u,l} + \sigma^2_{e,l} / n_d}, \quad (4.10)$$
with $\bar{\mathbf{x}}_{s_d}$ and $\bar{c}_{s_d l}$ the respective means of the vectors $\mathbf{x}_k$ and of the scores $c_{kl}$ over the sampled units $s_d$ of the domain. Finally, the mean $\mu_d$ is estimated by
$$\hat{\mu}_d(t) = \hat{\mu}(t) + \sum_{l=1}^{q} \hat{\bar{c}}_{dl}\, \hat{\phi}_l(t),$$
with $\hat{\mu}$ and $\hat{\phi}_l$ the estimates of the mean curve $\mu$ and of the principal component $\phi_l$ given previously.
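As an illustration, the sketch below fits one nested error model per principal component with statsmodels (variance components estimated by REML) and returns, for each domain, the simplified prediction $\bar{\mathbf{x}}_d^\top \hat{\boldsymbol{\beta}}_l + \hat{u}_{dl}$ rather than the full composite form (4.10); all function and variable names are illustrative assumptions.

import numpy as np
import pandas as pd
import statsmodels.api as sm

def predict_mean_scores(scores, X, domains, x_bar_by_domain):
    """For each principal component l, fit c_kl = x_k' beta_l + u_dl + e_kl
    and predict the mean score of every domain as x_bar_d' beta_l + u_dl.

    scores          : array (n, q), estimated scores of the sampled units
    X               : array (n, p), auxiliary vectors (with an intercept column)
    domains         : array (n,),   domain label of each sampled unit
    x_bar_by_domain : dict domain -> array (p,), known domain means of x
    """
    q = scores.shape[1]
    groups = pd.Series(domains)
    mean_scores = {d: np.zeros(q) for d in x_bar_by_domain}
    for l in range(q):
        result = sm.MixedLM(endog=scores[:, l], exog=X, groups=groups).fit(reml=True)
        beta_hat = np.asarray(result.fe_params)
        u_hat = result.random_effects              # dict: domain -> random intercept
        for d, x_bar in x_bar_by_domain.items():
            u_dl = float(u_hat[d].iloc[0]) if d in u_hat else 0.0  # 0 for unsampled domains
            mean_scores[d][l] = x_bar @ beta_hat + u_dl
    return mean_scores

The estimated mean curve of a domain d can then be rebuilt as mu_hat + phi_hat @ mean_scores[d], using the quantities returned by the previous sketch.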
The variances $\sigma^2_{u,l}$ and $\sigma^2_{e,l}$, for $l = 1, \ldots, q$, are unknown and are estimated by $\hat{\sigma}^2_{u,l}$ and $\hat{\sigma}^2_{e,l}$, obtained, for example, by restricted maximum likelihood (Rao and Molina, 2015). The estimator of $\bar{c}_{dl}$ is obtained by replacing the variances in (4.10) with their estimates and is known as the empirical best linear unbiased predictor (EBLUP). Nonetheless, the calculation of the variance function (under the model) of $\hat{\mu}_d$ is more complicated in this case because of the estimation of the principal components, and will be examined elsewhere.
We note that a simpler model, without random effects, could have been considered for the scores:
$$c_{kl} = \mathbf{x}_k^\top \boldsymbol{\beta}_l + e_{kl},$$
with $e_{kl}$ a zero-mean residual. In this case, the parameter $\boldsymbol{\beta}_l$ is estimated by its best linear unbiased estimator $\hat{\boldsymbol{\beta}}_l$ and the mean score on the domain by $\bar{\mathbf{x}}_d^\top \hat{\boldsymbol{\beta}}_l$. If the sampling rate in the domain is not negligible, then the estimator is obtained using the procedure described in Section 4.1.
Note 1: Here, the PCA is not used as a dimension-reduction method, but rather to decompose our problem into several uncorrelated subproblems of estimating totals of real-valued variables, which we know how to solve. We thus keep a number $q$ of principal components as high as possible, i.e., equal to the minimum of the number of discretization instants and the number of individuals in the sample.
Note 2: When certain explanatory variables of the vector $\mathbf{x}_k$ are categorical, our method, defined for real-valued variables, must be adapted: to that end, we propose transforming each categorical variable into a set of one-hot encoded indicator variables. In addition, when the number of explanatory variables is large, it may also be relevant to introduce penalties, of ridge type for example, into the regression problem.
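A minimal scikit-learn sketch of that adaptation is given below; the assumption that the first two columns of the auxiliary matrix are categorical, the penalty value and all variable names are purely illustrative.

import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
X = np.column_stack([
    rng.integers(0, 3, 50),          # categorical variable coded 0/1/2
    rng.integers(0, 2, 50),          # categorical variable coded 0/1
    rng.normal(size=50),             # numeric variable
])
scores = rng.normal(size=(50, 4))    # e.g., scores on 4 principal components

# One-hot encode the categorical columns and use a ridge penalty in the regression;
# Ridge accepts a multi-output target, so all score columns are handled at once.
encoder = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), [0, 1])],
    remainder="passthrough",
)
model = make_pipeline(encoder, Ridge(alpha=1.0))
model.fit(X, scores)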
Note 3: Other projection bases can be considered, such as wavelets (see Mallat, 1999), which are particularly well suited to irregular curves. Finally, another solution would be to apply the unit-level linear mixed models directly to the curve values at each instant of discretization; however, that method would not allow the temporal correlations of the problem to be taken into account, unlike the previous ones.
4.2.2 Estimating
variance by parametric bootstrap
To estimate the accuracy (variance under the model) of the mean curve estimators, we propose adapting the parametric bootstrap method proposed by González-Manteiga, Lombardía, Molina, Morales and Santamaría (2008) and taken up again by Molina and Rao (2010). This is a replication method that consists of generating a large number $B$ of replicates of size $n$ by simple random sampling with replacement in the sample $s$, and randomly generating the random effects and the individual errors of the estimated superpopulation model. Let $\hat{r}_k(t)$ denote the estimated projection residual of the unit $k$ (see the formula in (4.6)). For $b = 1, \ldots, B$, we proceed as follows for each replicate:
- Generate the bootstrap random effects $u^{*}_{dl}$ of each domain $d$, for each principal component $l$, from a distribution with mean 0 and variance $\hat{\sigma}^2_{u,l}$.
- Independently of those random effects, generate the individual bootstrap errors $e^{*}_{kl}$ for each unit $k$ and each principal component $l$, from a distribution with mean 0 and variance $\hat{\sigma}^2_{e,l}$.
- Calculate the bootstrap replicates $r^{*}_k$ of the projection residuals, for $k \in s$; this amounts to selecting, with replacement, $n$ projection residuals among the $n$ estimated residuals $\hat{r}_k$.
- Calculate the bootstrap replicates $Y^{*}_k$, conditional on the explanatory variables $\mathbf{x}_k$, using the estimated model:
$$Y^{*}_k(t) = \hat{\mu}(t) + \sum_{l=1}^{q} \big( \mathbf{x}_k^\top \hat{\boldsymbol{\beta}}_l + u^{*}_{dl} + e^{*}_{kl} \big)\, \hat{\phi}_l(t) + r^{*}_k(t).$$
- We see that the simulated score of the unit $k$ on the component $l$ is thus $c^{*}_{kl} = \mathbf{x}_k^\top \hat{\boldsymbol{\beta}}_l + u^{*}_{dl} + e^{*}_{kl}$.
- For each domain $d$, calculate the bootstrap replicate $\hat{\mu}^{*(b)}_d$ on this replicate by running the entire process again: PCA, then estimation of the linear mixed models on the principal components by EBLUP.
- Estimate the estimator's variance by the empirical variance of the $B$ replicates $\hat{\mu}^{*(b)}_d$.
This approach is also the one used to approximate the variance of the functional linear regression estimators (omitting step 1, the generation of the random effects).
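The whole procedure can be sketched as follows in Python; the draws are taken here from normal distributions, the re-estimation of the mean curves is delegated to a user-supplied function, and all names and signatures are illustrative assumptions rather than the authors' implementation.

import numpy as np

def parametric_bootstrap_variance(X, domains, mu_hat, phi_hat, beta_hat,
                                  sigma2_u, sigma2_e, resid,
                                  estimate_mean_curves, B=200, rng=None):
    """Parametric bootstrap sketch for the variance of the domain mean curve
    estimators. `estimate_mean_curves(Y, X, domains)` is assumed to re-run the
    whole estimation process (PCA, then EBLUP on the scores) on the bootstrap
    curves and to return a dict domain -> estimated mean curve.

    X : (n, p) auxiliary vectors; domains : (n,) domain labels
    mu_hat : (H,) estimated mean curve; phi_hat : (H, q) estimated eigenvectors
    beta_hat : (p, q) estimated fixed effects, one column per component
    sigma2_u, sigma2_e : (q,) estimated variance components
    resid : (n, H) estimated projection residuals r_hat_k
    """
    rng = rng or np.random.default_rng()
    n = X.shape[0]
    doms = np.unique(domains)
    replicates = {d: [] for d in doms}
    for _ in range(B):
        # 1. bootstrap random effects of each domain, for each component
        u = rng.normal(0.0, np.sqrt(sigma2_u), size=(len(doms), len(sigma2_u)))
        u_star = {d: u[i] for i, d in enumerate(doms)}
        # 2. individual bootstrap errors, independent of the random effects
        e_star = rng.normal(0.0, np.sqrt(sigma2_e), size=(n, len(sigma2_e)))
        # 3. projection residuals resampled with replacement
        r_star = resid[rng.integers(0, n, size=n)]
        # 4. bootstrap curves generated from the estimated model
        scores_star = X @ beta_hat + np.vstack([u_star[d] for d in domains]) + e_star
        Y_star = mu_hat + scores_star @ phi_hat.T + r_star
        # 5. re-run the whole estimation process on the bootstrap sample
        mu_d_star = estimate_mean_curves(Y_star, X, domains)
        for d in doms:
            replicates[d].append(mu_d_star[d])
    # 6. empirical variance of the B replicates, pointwise in t
    return {d: np.var(np.asarray(replicates[d]), axis=0) for d in doms}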
4.3 Non-parametric estimation using regression trees and random forests for curves in small domains
In this section, to obtain the individual predictions $\hat{Y}_k(t)$, we use non-parametric models, namely regression trees adapted to functional data and random forests, which no longer require a linear relation between the auxiliary information and the variable of interest and allow more flexibility in the modelling. In fact, regression trees for functional data are frequently used at EDF and are known to give satisfactory results on electricity consumption curves. Moreover, in the literature, regression trees have been adapted to surveys by Toth and Eltinge (2011), but not for estimating totals in small domains.
In this section and the next, we therefore seek to estimate a special case of the general model (4.1) in which the function $f_d = f$ does not depend on the domain of the unit $k$.
4.3.1 Regression trees for functional data
The classification and regression tree (CART) algorithm proposed by Breiman, Friedman, Stone and Olshen (1984) is a very popular non-parametric statistical technique. Its goal is to predict the value of a real or categorical target variable $Y$ from a vector of real or categorical explanatory variables $\mathbf{x}$. To that end, a partition of the space of explanatory variables is determined by repeatedly splitting the data set in two, using decision rules (split criteria) each involving a single explanatory variable. Thus, our sample $s$ is the first node of a tree (its “root”) that we seek to subdivide into two separate nodes $\mathcal{N}_1$ and $\mathcal{N}_2$ such that the values of the real target variable $Y$ are as homogeneous as possible in each node. The inertia criterion used to quantify the homogeneity of a node $\mathcal{N}$ is frequently the sum of the squared residuals between the values of $Y$ for the units $k$ of the node and the mean of those values in the node:
$$I(\mathcal{N}) = \sum_{k \in \mathcal{N}} \big( Y_k - \bar{Y}_{\mathcal{N}} \big)^2,$$
where $\bar{Y}_{\mathcal{N}}$ is the mean of $Y$ in the node $\mathcal{N}$. For quantitative variables $x_j$, the decision rules are of the form $x_j \leq c$, with $c$ a cut-off point to be optimized among all possible values of $x_j$. For qualitative variables, they consist of splitting the set of modalities into two separate subsets. The search for the optimal split amounts to solving the optimization problem
$$\min_{j,\, c} \; \Big[ I\big( \mathcal{N}_1(j, c) \big) + I\big( \mathcal{N}_2(j, c) \big) \Big]. \quad (4.15)$$
Each of the resulting nodes is then subdivided in turn into two child nodes, and the partitioning process continues until a minimal node size is reached, until the value of the target variable is the same for all units of the node, or until a given maximum depth is attained. The final partition of the space is then made up of the terminal nodes of the tree, also called leaves. A summary of each of those leaves (often the mean, for a quantitative target variable) then becomes the predicted value for all units assigned to the leaf. The various parameters (minimum node size and maximum depth) can be selected by cross-validation.
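The following scikit-learn sketch (with simulated data) illustrates a CART regression tree whose maximum depth and minimum leaf size are selected by cross-validation; the data and parameter grids are purely illustrative.

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                        # explanatory variables
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=200)   # real target variable

search = GridSearchCV(
    DecisionTreeRegressor(criterion="squared_error"),  # node inertia = sum of squared residuals
    {"max_depth": [3, 5, None], "min_samples_leaf": [5, 20, 50]},
    cv=5,
)
search.fit(X, y)
y_pred = search.predict(X)   # prediction = mean of y in the leaf containing each unit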
When the variable to be predicted is not a real variable but a vector of dimension $H$, the regression tree principle extends very naturally: the tree-construction algorithm and the choice of the parameters by cross-validation remain unchanged, but the inertia criterion is modified. Thus, the minimization problem is still of the form (4.15), but this time the criterion takes the form
$$I(\mathcal{N}) = \sum_{k \in \mathcal{N}} \big\| \mathbf{Y}_k - \bar{\mathbf{Y}}_{\mathcal{N}} \big\|^2,$$
where $\| \cdot \|$ is, for example, the Euclidean norm or the norm associated with the Mahalanobis distance. Multivariate regression trees were used, for example, by De’Ath (2002) in an ecological application.
Finally, when the variable to be predicted is a curve, the algorithm for constructing the tree and for choosing the parameters is the same, but this time a functional inertia criterion must be used. There are many possible choices. We chose to follow the “Courbotree” approach described in Stéphan and Cogordan (2009) and frequently used at EDF for building segmentations of data sets of electricity consumption curves based on explanatory variables. In that approach, we apply the multivariate method presented in the previous paragraph to the vectors $\big( Y_k(t_1), \ldots, Y_k(t_H) \big)$ of the curve values at the instants of discretization, with the Euclidean distance. The Euclidean distance on the instants of discretization can thus be seen as an approximation of the $L^2$ norm. More formally, the functional criterion is expressed as
$$I(\mathcal{N}) = \sum_{k \in \mathcal{N}} \sum_{h=1}^{H} \big( Y_k(t_h) - \bar{Y}_{\mathcal{N}}(t_h) \big)^2, \qquad \bar{Y}_{\mathcal{N}}(t_h) = \frac{1}{n_{\mathcal{N}}} \sum_{k \in \mathcal{N}} Y_k(t_h), \quad (4.16)$$
where $n_{\mathcal{N}}$ is the number of units in the sample that belong to the node $\mathcal{N}$.
In practice, when working on electricity consumption data, the curves considered can have very heterogeneous levels, and the Courbotree algorithm based on the Euclidean distance may not work well when applied to the raw data. The Courbotree algorithm is therefore often applied only to the curve shapes, i.e., to the normalized curves $Y_k(t) / \bar{Y}_k$, where $\bar{Y}_k$ is the mean (or level) of $Y_k$ over all measurement instants (a method also known as normalized Courbotree). The level $\bar{Y}_k$ is then predicted using a linear regression, and the prediction of $Y_k(t)$ is finally obtained as the product of the predicted shape and the predicted level.
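A minimal Python sketch of this idea is given below, using scikit-learn's multi-output regression tree (whose squared-error criterion coincides with the Euclidean inertia (4.16)) as a stand-in for the EDF Courbotree implementation; all names are illustrative.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

def fit_normalized_courbotree(X, Y, **tree_params):
    """Normalized Courbotree sketch: a multi-output tree is fitted on the curve
    shapes Y_k(t) / level_k, while the level is predicted by a separate linear
    regression.

    X : (n, p) explanatory variables; Y : (n, H) curves at H instants.
    """
    level = Y.mean(axis=1)                               # level of each curve
    shape = Y / level[:, None]                           # normalized curves (shapes)
    shape_tree = DecisionTreeRegressor(**tree_params).fit(X, shape)
    level_model = LinearRegression().fit(X, level)
    return shape_tree, level_model

def predict_normalized_courbotree(shape_tree, level_model, X_new):
    """Predicted curve = predicted shape * predicted level."""
    return shape_tree.predict(X_new) * level_model.predict(X_new)[:, None]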
4.3.2 Variance estimation
To estimate the variance, under the model, of our estimators of mean curves by domain, we follow a bootstrap approach very similar to the one proposed for the parametric models. Here, our superpopulation model is expressed as
$$Y_k(t) = f(\mathbf{x}_k, t) + \varepsilon_k(t).$$
Let $\hat{Y}_k(t)$, for all $k \in s$, be the value predicted by the regression tree for the unit $k$, and $\hat{\varepsilon}_k(t) = Y_k(t) - \hat{Y}_k(t)$, for all $k \in s$, the estimated residual for that unit. The idea of our accuracy-approximation method is, as with the linear mixed models, to generate a large number $B$ of replicates of size $n$ by simple random sampling with replacement among the estimated residuals, to calculate the estimator of the mean curve by domain on each replicate and, finally, to deduce the variance of the estimator from the variability between replicates. The bootstrap method used here is also known as the residual bootstrap in the case of linear models.
More specifically, for $b = 1, \ldots, B$, we proceed as follows for each replicate:
- Calculate the bootstrap replicates $\varepsilon^{*}_k$ of the estimated residuals, for $k \in s$, by selecting with replacement $n$ residuals among the $n$ estimated residuals $\hat{\varepsilon}_k$.
- Calculate the bootstrap replicates $Y^{*}_k(t) = \hat{Y}_k(t) + \varepsilon^{*}_k(t)$.
- Recalculate, for each domain $d$, the mean estimator $\hat{\mu}^{*(b)}_d$ on this replicate.
- Estimate the variance by the empirical variance of the $B$ bootstrap replicates $\hat{\mu}^{*(b)}_d$.
The process is identical if we estimate the function $f$ by random forests rather than by regression trees.
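The residual bootstrap just described can be sketched as follows; the refitting of the tree (or forest) and the computation of the domain means are delegated to a user-supplied function, and all names and signatures are illustrative assumptions.

import numpy as np

def residual_bootstrap_variance(X, Y, domains, fit_and_estimate, B=200, rng=None):
    """Residual bootstrap sketch for tree-based (or forest-based) estimators of
    domain mean curves. `fit_and_estimate(X, Y, domains)` is assumed to fit the
    model and return the individual predictions (n, H) together with a dict
    domain -> estimated mean curve.
    """
    rng = rng or np.random.default_rng()
    n = Y.shape[0]
    # initial fit: predictions and estimated residuals on the original sample
    Y_hat, _ = fit_and_estimate(X, Y, domains)
    resid = Y - Y_hat
    replicates = {}
    for _ in range(B):
        # 1. resample the estimated residuals with replacement
        eps_star = resid[rng.integers(0, n, size=n)]
        # 2. bootstrap curves = fitted values plus resampled residuals
        Y_star = Y_hat + eps_star
        # 3. re-estimate the domain mean curves on the bootstrap sample
        _, mu_d_star = fit_and_estimate(X, Y_star, domains)
        for d, curve in mu_d_star.items():
            replicates.setdefault(d, []).append(curve)
    # 4. empirical variance of the B bootstrap replicates, pointwise in t
    return {d: np.var(np.asarray(curves), axis=0) for d, curves in replicates.items()}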
4.4 Aggregation
of predictions by random forests for curves
The literature often highlights the mediocre predictive performance of regression trees compared with other techniques such as SVMs (see, for example, Cristianini and Shawe-Taylor, 2000). Regression trees can be unstable and highly dependent on the sample on which they were built. To remedy that shortcoming, Breiman (2001) proposed the random forest algorithm. This is an ensemble technique that, as its name suggests, consists of aggregating the predictions produced by different regression trees. The fact that aggregating unstable predictors leads to a reduction in variance was notably shown by Breiman (1998). For a quantitative target variable, the aggregation of the predictions is performed by taking the mean of the predictions of the individual trees.
To reduce the variance of the aggregated prediction, the objective is to build trees that are as different from each other as possible. Breiman's algorithm introduces variability into the construction of the trees, on the one hand, by resampling the units (simple random sampling with replacement) and, on the other hand, by randomly selecting, for each split in the tree, a subset of candidate explanatory variables. Compared with a single regression tree, a random forest therefore has two additional parameters to adjust: the number of trees and the number of candidate explanatory variables at each split.
When the variable of interest is multivariate (or functional), the algorithm proposed by Breiman adapts easily by aggregating the multivariate (or functional) regression trees presented in the previous paragraph. Multivariate random forests have, for example, been studied by Segal and Xiao (2011).
The algorithm that we propose here, called “Courboforest,” simply consists of aggregating functional regression trees constructed using the “Courbotree” approach, i.e., multivariate regression trees applied to the vectors $\big( Y_k(t_1), \ldots, Y_k(t_H) \big)$ of the curve values at the instants of discretization, with the split criterion being the inertia based on the Euclidean distance defined in equation (4.16).
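A minimal sketch of this idea follows, using scikit-learn's multi-output random forest (bootstrap resampling of the units, random subset of candidate variables at each split, squared-error criterion equivalent to the Euclidean inertia (4.16)) as a stand-in for the Courboforest implementation; the data and parameter values are purely illustrative.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))                        # explanatory variables
Y = np.cumsum(rng.normal(size=(300, 48)), axis=1)    # simulated curves, 48 instants

forest = RandomForestRegressor(
    n_estimators=200,              # number of aggregated trees
    max_features="sqrt",           # candidate variables drawn at each split
    min_samples_leaf=10,
    criterion="squared_error",
)
forest.fit(X, Y)
Y_pred = forest.predict(X)         # aggregated prediction = mean over the trees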