Conditional calibration and the sage statistician
Section 4. Calibration – A simulation perspective
Consider how to evaluate a proposed procedure, generically called $\mathcal{P}$, which is to be applied to a data set, generically called $X$, yet to be collected from a population; $\mathcal{P}$ is a specified function of the data $X$ and will be used to estimate the estimand, here a scalar quantity $Q$, which describes some aspect of the population from which $X$ is drawn. For descriptive simplicity, suppose $\mathcal{P}(X)$ is a purported 95% interval for $Q$; for $\mathcal{P}$ to be exactly calibrated means that $\mathcal{P}(X)$ includes $Q$ in exactly 95% of repeated samples. Further, suppose $X$ is drawn from its population using design $D$, which is known and fixed throughout this discussion; for concreteness, $D$ is simple random sampling. Although $D$ is known, at the design stage the data set $X$ is not yet known. Also suppose, again for simplicity, that all “experts” interested in this problem agree on a set of $N$ possible “Truths” describing the unknown population to which $D$ will be applied to obtain data set $X$; call these possible Truths $T_1, \ldots, T_N$, and the values of their associated local (local to each Truth) estimands $Q_1, \ldots, Q_N$, where $Q_j = Q(T_j)$ for $j = 1, \ldots, N$, for the function $Q(\cdot)$ common to all possible truths. The estimand $Q$ is the value of the function $Q(\cdot)$ evaluated at the actual Truth.
The $Q_j$ are here called local estimands, that is, local to the truths. As far as I can tell, Neyman never formally considered such local estimands, but I see them as important bridges to the Bayesian perspective, as well as to being a sage statistician. Only one of the $N$ possible truths is the actual truth. The inferential objective is the value of $Q_j$ for the Truth that generated the yet-to-be observed data $X$. The collection of $N$ possible Truths can often be compactly described mathematically, so that $N$ can be essentially infinite. One example of such Truths, and their associated local estimands, could be Gaussian univariate populations with unknown local means $\mu_j$, and with the scalar estimand $Q$ equal to the mean of the one true population. Or the Truths could be all possible $M$-dimensional vectors of real numbers; this is the standard finite population set-up for survey sampling with $M$ units and one scalar variable, as in Cochran (1963) and Kish (1965), where the estimand $Q$ is typically the mean of the $M$ values for the true population.
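As a concrete sketch of the finite-population set-up, one Truth is simply a vector of real values, its local estimand is the mean of those values, and the design $D$ draws a simple random sample without replacement; a minimal illustration, with invented population values:

```python
import random

# One hypothetical "Truth": a finite population of M = 8 real values.
population = [12.0, 7.5, 9.1, 15.2, 8.8, 11.0, 6.4, 13.3]

# The local estimand Q for this Truth: the population mean.
Q = sum(population) / len(population)

def srs(pop, n, rng):
    """Design D: simple random sampling of n units without replacement."""
    return rng.sample(pop, n)

rng = random.Random(1)
sample = srs(population, 4, rng)  # one realization of the data set X
```

Any procedure $\mathcal{P}$ evaluated below would be a function of `sample` alone, while `Q` is a fixed property of the Truth.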
We continue by defining simple calibration using a simulation to fix ideas; this simulation will be used to define concepts throughout this manuscript, including the key concept of conditional calibration. Suppose that for each possible truth $T_j$, with local estimand $Q_j$, we have drawn $L$ data sets, labeled $X_{j,1}, \ldots, X_{j,L}$, each drawn using design $D$. To each of these data sets, we apply procedure $\mathcal{P}$ to the data to create an interval estimate for $Q_j$, where for each $j$, $Q_j$ is the same for all $\ell = 1, \ldots, L$ because all such $X_{j,\ell}$ arose from the same truth $T_j$. We then assess whether, when $\mathcal{P}$ is applied to $X_{j,\ell}$, the resulting interval includes the local estimand $Q_j$. The proportion of the $L$ data sets for which the interval $\mathcal{P}(X_{j,\ell})$ includes $Q_j$ is called here the local calibration (or local coverage) of the procedure $\mathcal{P}$ for the $j$th Truth, notationally written $C_j(\mathcal{P})$, for $j = 1, \ldots, N$. For evaluating point estimators, rather than interval estimators, the calibration of $\mathcal{P}$ for $T_j$ could be replaced by the bias or mean squared error of the point estimate of $Q_j$. This simulation is depicted in Table 4.1, where each column represents a possible truth, and the rows represent the $L$ data sets generated under each truth.
Table 4.1
Display of simulation (each column represents a possible truth)

                                        $T_1$               $\cdots$   $T_N$
Local estimands:                        $Q_1$               $\cdots$   $Q_N$
Simulated data sets ($L$ per truth):    $X_{1,1}$           $\cdots$   $X_{N,1}$
                                        $\vdots$                       $\vdots$
                                        $X_{1,L}$           $\cdots$   $X_{N,L}$
Calibration of $\mathcal{P}$ for $T_j$: $C_1(\mathcal{P})$  $\cdots$   $C_N(\mathcal{P})$
Now we define local calibration, using 95% to represent any level of coverage. A 95% interval estimate of $Q_j$ is called “locally (for truth $T_j$) conservatively calibrated” if $C_j(\mathcal{P}) \geq$ 95%; we could say that $\mathcal{P}$ is “approximately locally calibrated” (for Truth $T_j$) if $C_j(\mathcal{P})$ is close to 95%, but this idea was never formally defined by Neyman, although in Fisher’s (1934) discussion of Neyman (1934), we can see Fisher had something like this in mind with his criticism of Neyman’s formulation.
Next, following Neyman, the interval estimate $\mathcal{P}(X)$ is called “confidence calibrated” across the ensemble of possible truths if all $C_j(\mathcal{P}) \geq$ 95%, or, returning to Neyman’s original definition, $\mathcal{P}(X)$ is then simply called a 95% confidence interval for $Q$. The critical point here for calibration is that all that matters to a die-hard Neymanian frequentist, when evaluating a procedure $\mathcal{P}$ for its validity, is whether the $N$ values of $C_j(\mathcal{P})$ for procedure $\mathcal{P}$ are all greater than the nominal level for $\mathcal{P}$. The word “confidence” arises because, when confronted with the results of Table 4.1 for procedure $\mathcal{P}$, and with a critic who selected one Truth $T_j$ from the collection of possible truths, you should be “confident” that the result of applying $\mathcal{P}$ to $X_{j,\ell}$ will be an interval that includes $Q_j$.
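In this notation, confidence calibration is simply a condition on the full set of local coverages: the smallest $C_j(\mathcal{P})$ must meet the nominal level. A one-line check (function name and inputs are hypothetical):

```python
def is_confidence_calibrated(local_coverages, nominal=0.95):
    """Neyman's criterion: every local coverage C_j meets the nominal level."""
    return min(local_coverages) >= nominal

# A procedure that is conservative for every truth passes;
# one truth falling short of the nominal level fails the whole ensemble.
passes = is_confidence_calibrated([0.97, 0.951, 0.96])   # True
fails = is_confidence_calibrated([0.97, 0.93, 0.96])     # False
```

Note the asymmetry: a single poorly covered truth disqualifies the procedure, no matter how conservative it is elsewhere.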
These assessments of 95% confidence calibration are well-defined no matter what the etiology of the procedure $\mathcal{P}$. BUT, are they statistically apposite for evaluating $\mathcal{P}(X)$ as a 95% interval estimate of the unknown $Q$ after seeing a specific data set, call it $X_{\text{obs}}$? That is, after seeing a specific instance of $X$, now known to be $X_{\text{obs}}$, does the 95% attached to $\mathcal{P}(X_{\text{obs}})$ necessarily reflect the judgment of a sage statistician? Maybe we should seek only procedures that are approximately calibrated for truths that plausibly could have generated the observed $X_{\text{obs}}$. We now consider the formal Bayesian perspective because it sheds light on this concept of being sage after seeing $X_{\text{obs}}$.
Survey Methodology, Statistics Canada, Catalogue No. 12-001-X, ISSN 1492-0921. © Her Majesty the Queen in Right of Canada, 2019.