Conditional calibration and the sage statistician
Section 4. Calibration – A simulation perspective
Consider how to evaluate a proposed procedure, generically called $\mathcal{P}$, which is to be applied to a data set, generically called $X$, yet to be collected from a population; $\mathcal{P}$ is a specified function of the data $X$ and will be used to estimate the estimand, here a scalar quantity $Q$, which describes some aspect of the population from which $X$ is drawn. For descriptive simplicity, suppose $\mathcal{P}(X)$ is a purported 95% interval for $Q$; for $\mathcal{P}$ to be exactly calibrated means that $\mathcal{P}(X)$ includes $Q$ in exactly 95% of repeated samples. Further, suppose $X$ is drawn from its population using design $D$, which is known and fixed throughout this discussion; for concreteness, $D$ is simple random sampling. Although $D$ is known, at the design stage the data set $X$ is not yet known. Also suppose, again for simplicity, that all “experts” interested in this problem agree on a set of $N$ possible “Truths” describing the unknown population to which $D$ will be applied to obtain data set $X$; call these possible Truths $T_1, \ldots, T_N$, and the values of their associated local (local to each Truth) estimands $Q_1, \ldots, Q_N$, where $Q_j = Q(T_j)$ for $j = 1, \ldots, N$, for the function $Q(\cdot)$ common to all possible truths. The estimand $Q$ is the value of the function $Q(\cdot)$ evaluated at the actual Truth.
The $Q_j$ are here called local estimands, that is, local to the truths. As far as I can tell, Neyman never formally considered such local estimands, but I see them as important bridges to the Bayesian perspective, as well as to being a sage statistician. Only one of the $N$ possible truths is the actual truth. The inferential objective is the value of $Q_j$ for the Truth that generated the yet-to-be observed data $X$. The collection of $N$ possible Truths can often be compactly described mathematically, so that $N$ can be essentially infinite. One example of such Truths, and their associated local estimands, could be Gaussian univariate populations with unknown local means $\mu_j$, and with the scalar estimand $Q$ equal to the mean of the one true population. Or the Truths could be all possible $M$-dimensional vectors of real numbers; this is the standard finite population set-up for survey sampling with $M$ units and one scalar variable, as in Cochran (1963) and Kish (1965), where the estimand $Q$ is typically the mean of the $M$ values for the true population.
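As a concrete sketch of the finite-population set-up, one Truth is simply a vector of real values, its local estimand is the mean of those values, and the design $D$ draws a simple random sample without replacement; a minimal illustration, with invented population values:

```python
import random

# One hypothetical "Truth": a finite population of M = 8 real values.
population = [12.0, 7.5, 9.1, 15.2, 8.8, 11.0, 6.4, 13.3]

# The local estimand Q for this Truth: the population mean.
Q = sum(population) / len(population)

def srs(pop, n, rng):
    """Design D: simple random sampling of n units without replacement."""
    return rng.sample(pop, n)

rng = random.Random(1)
sample = srs(population, 4, rng)  # one realization of the data set X
```

Any procedure $\mathcal{P}$ evaluated below would be a function of `sample` alone, while `Q` is a fixed property of the Truth.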
We continue by defining simple calibration using a simulation to fix ideas; this simulation will be used to define concepts throughout this manuscript, including the key concept of conditional calibration. Suppose that for each possible truth $T_j$, with local estimand $Q_j$, we have drawn $L$ data sets, labeled $X_{j,1}, \ldots, X_{j,L}$, each drawn using design $D$. To each of these data sets, we apply procedure $\mathcal{P}$ to the data to create an interval estimate for $Q_j$, where for each $j$, $Q_j$ is the same for all $\ell = 1, \ldots, L$ because all such $X_{j,\ell}$ arose from the same truth $T_j$. We then assess whether, when $\mathcal{P}$ is applied to $X_{j,\ell}$, the resulting interval includes the local estimand $Q_j$. The proportion of the $L$ data sets for which the interval $\mathcal{P}(X_{j,\ell})$ includes $Q_j$ is called here the local calibration (or local coverage) of the procedure $\mathcal{P}$ for the $j$th Truth, notationally written $C_j(\mathcal{P})$, for $j = 1, \ldots, N$. For evaluating point estimators, rather than interval estimators, the calibration of $\mathcal{P}$ for $T_j$ could be replaced by the bias or mean squared error of the point estimate of $Q_j$. This simulation is depicted in Table 4.1, where each column represents a possible truth, and the rows represent the $L$ data sets generated under each truth.
Table 4.1
Display of simulation (each column represents a possible truth)

                                        $T_1$               $\cdots$   $T_N$
Local estimands:                        $Q_1$               $\cdots$   $Q_N$
Simulated data sets ($L$ per truth):    $X_{1,1}$           $\cdots$   $X_{N,1}$
                                        $\vdots$                       $\vdots$
                                        $X_{1,L}$           $\cdots$   $X_{N,L}$
Calibration of $\mathcal{P}$ for $T_j$: $C_1(\mathcal{P})$  $\cdots$   $C_N(\mathcal{P})$
Now we define local calibration, using 95% to represent any level of coverage. A 95% interval estimate of $Q_j$ is called “locally (for truth $T_j$) conservatively calibrated” if $C_j(\mathcal{P}) \geq$ 95%; we could say that $\mathcal{P}$ is “approximately locally calibrated” (for Truth $T_j$) if $C_j(\mathcal{P})$ is close to 95%, but this idea was never formally defined by Neyman, although in Fisher’s (1934) discussion of Neyman (1934), we can see Fisher had something like this in mind with his criticism of Neyman’s formulation.
Next, following Neyman, the interval estimate $\mathcal{P}(X)$ is called “confidence calibrated” across the ensemble of possible truths if all $C_j(\mathcal{P}) \geq$ 95%, or, returning to Neyman’s original definition, $\mathcal{P}(X)$ is then simply called a 95% confidence interval for $Q$. The critical point here for calibration is that all that matters to a die-hard Neymanian frequentist, when evaluating a procedure $\mathcal{P}$ for its validity, is whether the $N$ values of $C_j(\mathcal{P})$ for procedure $\mathcal{P}$ are all greater than the nominal level for $\mathcal{P}$. The word “confidence” arises because, when confronted with the results of Table 4.1 for procedure $\mathcal{P}$, and with a critic who selected one Truth $T_j$ from the collection of possible truths, you should be “confident” that the result of applying $\mathcal{P}$ to $X_{j,\ell}$ will be an interval that includes $Q_j$.
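In this notation, confidence calibration is simply a condition on the full set of local coverages: the smallest $C_j(\mathcal{P})$ must meet the nominal level. A one-line check (function name and inputs are hypothetical):

```python
def is_confidence_calibrated(local_coverages, nominal=0.95):
    """Neyman's criterion: every local coverage C_j meets the nominal level."""
    return min(local_coverages) >= nominal

# A procedure that is conservative for every truth passes;
# one truth falling short of the nominal level fails the whole ensemble.
passes = is_confidence_calibrated([0.97, 0.951, 0.96])   # True
fails = is_confidence_calibrated([0.97, 0.93, 0.96])     # False
```

Note the asymmetry: a single poorly covered truth disqualifies the procedure, no matter how conservative it is elsewhere.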
These assessments of 95% confidence calibration are well-defined no matter what the etiology of the procedure $\mathcal{P}$. BUT, are they statistically apposite for evaluating $\mathcal{P}(X)$ as a 95% interval estimate of the unknown $Q$ after seeing a specific data set, call it $X_{\text{obs}}$? That is, after seeing a specific instance of $X$, now known to be $X_{\text{obs}}$, does the 95% attached to $\mathcal{P}(X_{\text{obs}})$ necessarily reflect the judgment of a sage statistician? Maybe we should seek only procedures that are approximately calibrated for truths that plausibly could have generated the observed $X_{\text{obs}}$. We now consider the formal Bayesian perspective because it sheds light on this concept of being sage after seeing $X_{\text{obs}}$.
Survey Methodology, Statistics Canada, Catalogue No. 12-001-X, ISSN 1492-0921. © Her Majesty the Queen in Right of Canada, 2019.