3 The Second Controversy: Neyman advocates the exclusive use of randomization

Ken Brewer


By the 1920s the situation was clear, though hardly ideal. Sampling was no longer regarded as off the agenda, but there was little or no guidance as to whether the sample should be chosen randomly or purposively. Over the next two decades the randomization approach slowly but steadily became mandatory. There was a good reason for that tendency: no other attractive models were available to tempt sampling statisticians into using them.

A particularly influential paper advocating the exclusive use of randomization was Jerzy Neyman's (1934) 68-page attack on a survey conducted by Gini and Galvani (1929). Those two authors had selected a "purposive" sample of 29 out of 214 districts (circondari) from the 1921 Italian Population Census. Their sample was chosen in such a way as to reflect almost exactly the whole-of-Italy average values for seven variables chosen for their importance; but Neyman showed that it exhibited substantial differences for other important variables. He then went on to attack this study with a three-pronged argument.

  1. Because randomization had not been used, the investigators had not been able to invoke the Central Limit Theorem. Consequently they had been unable to use the normality of the estimates to construct the "confidence intervals" that Neyman himself had recently invented. That idea appeared in English for the first time in this paper.
  2. On Gini's and Galvani's own admissions, the difficulty of their achieving their "purposive" requirement (that the sample match the population closely on seven variables) had caused them to limit their attention to the 214 districts rather than to the 8,354 communes into which Italy had also been divided. In consequence, their 15% sample consisted of only 29 districts (instead of perhaps 1,200 or 1,300 communes). Neyman further showed that a considerably more accurate set of estimates could have been expected had the sample consisted of a much larger number of those (order of magnitude smaller) communes.
  3. Crucially, the population model used by the investigators was unrealistic and inappropriate. (Neyman was convinced that models by their very nature were always liable to represent the actual situation inadequately.) Furthermore, randomization obviated the need for such population modelling. Using randomization-based inference, the statistical properties of an estimator could be established by using the distribution of its estimates from all the samples that could possibly be drawn. Moreover, when using randomisation, that same estimator under different designs could have different statistical properties. (A good example of this, though not one of Neyman's, is that an estimator that is biased under an equal probability design might well be unbiased under an unequal probability design.)
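
The parenthetical remark in the third argument can be illustrated by enumerating every possible sample from a tiny population. The sketch below is only one possible instance and is not taken from Neyman or from the source: it assumes a ratio estimator of the population total, a hypothetical four-unit population, and two designs (simple random sampling, and a design that selects each sample with probability proportional to its total size measure). All values and names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy population (illustrative values only, not from the source).
x = [1, 2, 3, 4]     # auxiliary size measure, known for every unit
y = [3, 5, 4, 10]    # survey variable, observed only for sampled units
N, n = len(x), 2
X_total, Y_total = sum(x), sum(y)

def ratio_estimate(sample):
    """Ratio estimator of the total of y: X_total * (sample sum of y / sample sum of x)."""
    xs = sum(x[i] for i in sample)
    ys = sum(y[i] for i in sample)
    return Fraction(ys, xs) * X_total

# Randomization-based inference looks at ALL samples the design could produce.
samples = list(combinations(range(N), n))

# Design 1 (equal probability): every sample of size n is equally likely.
srs_design = {s: Fraction(1, len(samples)) for s in samples}

# Design 2 (unequal probability): a sample's selection probability is
# proportional to the sum of the size measures of its units.
size_sum = sum(sum(x[i] for i in s) for s in samples)
pps_design = {s: Fraction(sum(x[i] for i in s), size_sum) for s in samples}

def expectation(design):
    """Design expectation of the ratio estimator: sum of p(s) * estimate(s)."""
    return sum(p * ratio_estimate(s) for s, p in design.items())

print("true total:", Y_total)
print("expectation under equal probabilities:", expectation(srs_design))
print("expectation under unequal probabilities:", expectation(pps_design))
```

Under these assumptions the same estimator has a design expectation that differs from the true total under equal probabilities, but equals it exactly under the unequal probability design. No population model is needed to establish either property.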

These three arguments were not all equally valid or convincing, but even Gini and Galvani were ready to admit that something was seriously wrong with their approach. Moreover, the second argument (that the sample size of 29 was too small) was an easy one for Neyman to argue. It was incontrovertible. The third argument, that the population modelling was inadequate, was also one that the survey designers were ready to acknowledge. The first argument (about confidence intervals) seems to have been accepted for no better reason than that Neyman was saying it, and that since he was certainly right on the other two points, he was probably right on that one as well.

3.1  Bowley's opposition to Neyman's first argument and the outcomes

One statistician who was not prepared to accept Neyman's way of thinking was Bowley, who moved the vote of thanks to him for his 1934 presentation. We are, in consequence, able to quote the actual words used by both the disputants. Bowley actually started the argument by wondering aloud whether confidence intervals were just "a confidence trick"!

He asked, "Does [a confidence interval] really lead us to what we need: the chance that within the universe which we are sampling the proportion is within these certain limits? I think it does not. I think we are in the position of knowing that either an improbable event had occurred or that the proportion in the population is within these limits… The statement of the theory is not convincing, and until I am convinced I am doubtful of its validity."

In his reply, Neyman asserted that Bowley's question (about the confidence interval being a confidence trick) "contain[ed] the statement of the problem in the form of Bayes" and that in consequence its solution "must depend upon the probability law a priori." He added, "In so far as we keep to the old form of the problem, any further progress is impossible." He thus concluded that there was a need to stop asking Bowley's "Bayesian" question and instead adopt the stance that Neyman's own "either…or" statement [that either an improbable event had occurred or the proportion of the population was within the stated limits] "form[ed] a basis for the practical work of a statistician concerned with problems of estimation…"

However, the fact remains that confidence intervals are not easy to understand. A confidence interval is in fact a sample-specific range of potentially true values of the parameter being estimated, which has been constructed so as to have a particular property. This property is that, over a large number of sample observations, the proportion of times that the true parameter falls inside that range (constructed for each sample separately) is equal to a predetermined value known as the confidence level. This confidence level is conventionally written as $p = 1 - \alpha$, where $\alpha$ is small compared with unity. Conventional values for $\alpha$ are 0.05, 0.01, and sometimes 0.001. Thus, if many samples of size $n$ are drawn independently from a normal distribution, the proportion of times that the true parameter value will lie within any given sample's own confidence interval will, before that sample is selected, be $[1 - \alpha]$.

"It is not the case, however, that the probability of this true parameter value lying within the confidence interval as calculated for any individual sample of size $n$ will be $[1 - \alpha]$. The confidence interval calculated for any individual sample of size $n$ will, in general, be wider or narrower than average and might be centred well away from the true parameter value, especially if $n$ is small. It is also sometimes possible to recognise when a sample is atypical and, hence, make the informed guess that in this particular case, the probability of the true value lying in a particular 95% confidence interval differs substantially from 0.95."
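
The long-run property described above can be checked by simulation. The following is a minimal sketch, not part of the original argument: it assumes samples of size 5 from a normal distribution with an arbitrarily chosen mean and standard deviation, and it uses the tabulated two-sided 95% Student's t critical value for 4 degrees of freedom (2.776). The function name and all parameter values are illustrative assumptions.

```python
import random
import statistics

def simulate_coverage(n=5, reps=20000, mu=10.0, sigma=2.0, t_crit=2.776, seed=1):
    """Draw `reps` normal samples of size n and build a 95% t-interval from each.

    t_crit is the tabulated two-sided 95% critical value of Student's t
    for n - 1 = 4 degrees of freedom; change it if n is changed.
    Returns (proportion of intervals covering mu, narrowest width, widest width).
    """
    rng = random.Random(seed)
    hits = 0
    widths = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = statistics.fmean(sample)
        half_width = t_crit * statistics.stdev(sample) / n ** 0.5
        widths.append(2 * half_width)
        # Each interval either contains the true mean or it does not.
        if mean - half_width <= mu <= mean + half_width:
            hits += 1
    return hits / reps, min(widths), max(widths)

coverage, narrowest, widest = simulate_coverage()
print(f"proportion of intervals covering the true mean: {coverage:.3f}")
print(f"narrowest interval width: {narrowest:.2f}, widest: {widest:.2f}")
```

The proportion of intervals that cover the true mean should settle close to 0.95 over many repetitions, while the individual intervals vary widely in width. That is the distinction drawn in the quoted passage: the 95% figure describes the procedure before the sample is selected, not any one realised interval.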

Let us then consider, in particular, the most commonly used of all 95% confidence intervals, namely that between $p = 0.05$ and $p = 1.00$. (Fisher (1925) had actually suggested using the interval between $p = 1/22$ and $p = 1$.) Editors of publications in a great variety of fields (most of them not themselves statisticians) feel this definition of "significance" to be the one that very conveniently gives them leave to publish p-values that fall outside that range and reject those that do not. I believe the time is long overdue for looking at that suggestion of Fisher's very carefully.

What Fisher claimed (using $p = 1/22$ rather than $p = 0.05$) was that "Using this criterion we should be led to follow up a false indication only once in 22 trials". But what did he (and what do we now) mean by "following up a false indication"? What we should mean is this: that if the null hypothesis (H0) is true, a "false indication", that is to say, "a misleadingly significant observation," will be observed, on average, once in 22 (or 20) times. But this is not what many non-statistical users of the p-statistic imagine that it means. Such users seem to think it means that only one in 20 of their "significant observations" (i.e., that only one in 20 of all their observations with p-values less than 0.05) will be misleadingly significant.

That is the notorious p-statistic fallacy! (See Berger and Sellke (1987) for details.) To say "If H0 is true, observations will be misleadingly described as 'significant' only once in 20 (or 22) times", is correct but unhelpful, for if H0 is true, it follows that every observation described as "significant", for whatever reason, must also have been described that way misleadingly. But simply to say "Whether H0 is true or not, $p < 0.05$", is also misleading. A meaningful false discovery rate (FDR) in these circumstances is (in fact) something that approximates to $p < 0.0025$ or $p < 0.05^2$.
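
The gap between the two readings of "once in 20 times" can be made concrete with a simple two-group calculation. This sketch is not the derivation of Berger and Sellke (1987) or of the four-part article discussed below; it merely assumes that a known fraction of tested hypotheses are true nulls and that tests of false nulls reach significance with a fixed power. The function name and the numerical inputs are illustrative assumptions.

```python
def false_discovery_rate(prior_null, alpha, power):
    """Share of 'significant' results that come from true null hypotheses.

    prior_null: assumed proportion of tested hypotheses for which H0 is true
    alpha:      significance threshold (chance of significance when H0 is true)
    power:      assumed chance of significance when H0 is false
    """
    false_positives = prior_null * alpha
    true_positives = (1 - prior_null) * power
    return false_positives / (false_positives + true_positives)

# Hypothetical scenarios: (proportion of true nulls, power of the test).
for prior_null, power in [(0.5, 0.5), (0.9, 0.5), (0.9, 0.8)]:
    fdr = false_discovery_rate(prior_null, alpha=0.05, power=power)
    print(f"true nulls {prior_null:.0%}, power {power:.0%}: "
          f"misleading share of significant results = {fdr:.3f}")
```

In each of these hypothetical scenarios the misleading share of significant results exceeds 0.05, sometimes by a wide margin. The threshold $p < 0.05$ fixes the rate of false indications among true nulls, not the rate of false indications among the results declared significant.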

This is a subject on which I have expended some thought of late. In particular, I co-authored a four-part article on it. 

Part 1 (Brewer and Hayes 2011a) discusses how the notoriously parsimonious Bayesian Information Criterion (BIC) can be remedied by adding certain obviously needed penalty terms. The resulting Augmented Bayesian Information Criterion (ABIC) is nearly always intermediate between the original BIC and the (equally notoriously lacking in parsimony) Akaike Information Criterion (AIC). Another useful feature of the ABIC is that in its univariate case it is a simple function of $T$ (the large sample limiting case of Student's $t$).

In Part 2 (Brewer and Hayes 2011b), a reference Bayesian hypothesis test is derived that is fully compatible with the ABIC of Part 1. An important role is played here by an obvious generalisation of Benford's (purely empirical) Law of Numbers, in providing an objective (though not flat) Bayesian prior distribution over the entire range from zero (or minus infinity) to plus infinity for the relevant hypothesis test. (The problem that characteristically arises with zero prior probabilities is avoided here by the use of Lebesgue-type measures instead.) Importantly, when $T = 1$, the relevant Bayesian hypothesis test yields a posterior measure that is indifferent between the null and alternative hypotheses. Furthermore, when the ABIC is generalised to small samples, as a function of the t-statistic, Fisher's $p$ sets an upper bound to the false discovery rate (FDR), regardless of the number of degrees of freedom involved.

In Part 3 (Brewer, Hayes, and Gillison 2012), a set of some 1,300 regression slopes from a biodiversity sample survey of tropical landscape mosaics is used to provide empirical support for the ABIC, and the earlier theoretical findings are thereby confirmed.

In Part 4 (Hayes and Brewer 2012), the approximate results derived in Parts 1 to 3 are supplemented by exact results that can be obtained using a somewhat similar approach, but one that requires no explicit null hypothesis. Finally we suggest some likely consequences of the recognition that, when the implied null hypothesis is precise, much smaller values of $|p|$ (typically of the order of 0.0025 rather than 0.05) are needed to provide any useful FDR.

3.2  The acceptance of Neyman's second and third arguments

The second and third ideas that Neyman had advocated in his paper (namely the inefficiency of Gini and Galvani's (1929) selection procedure and the need to use only randomized sampling) though both relevant for their time and well presented, caught on only gradually over the course of the next decade. W. Edwards Deming heard Neyman in London in 1936. He was impressed and arranged for Neyman to lecture, and for his approach to be taught to U.S. government statisticians. A crucial event in its acceptance was the use in the 1940 U.S. Population and Housing Census of a one-in-twenty sample, designed by Deming along with Morris Hansen and others, to obtain answers to additional questions. Once fully accepted, however, Neyman's second and third arguments swept all other considerations aside for at least two decades.

Those twenty-odd years were a time of great progress. In the terms introduced by Kuhn (1962), finite population sampling had found a universally accepted "paradigm" in randomization-based inference, and an unusually long period of "normal science" based on "probability sampling" had ensued. ("Probability sampling" requires that all the elements in the population have known and positive probabilities of inclusion in the sample.)

3.3  The appearance of relevant textbooks

This consensus made it possible for several influential sampling textbooks to be published. Kish's (1995) historical article mentions five that appeared in quick succession: Yates (1949), Deming (1950), Cochran (1953), Hansen, Hurwitz and Madow ("HH&M") (1953) and Sukhatme (1954).

In my estimation the two most important of these were those by Cochran and by HH&M, but for quite opposite reasons. HH&M seem not to have wanted any truck at all with population modelling. (I doubt whether the word "model" is even mentioned in either of their two volumes. It does not appear in either index.) Cochran (1953), on the other hand, found several uses for such models, even as early as 1953.

Re-reading Cochran (1953) recently, I had the distinct impression that the more he wrote, the more he was at ease in using population models. So I started to count them. This first edition had 316 pages of text. The words "model" and "models" were used on 23 occasions. In the first half of the book, the word "model" appeared only once (on page 123) and "models" not at all. But Cochran used those words again three times in the third quarter and 19 times in the last quarter. (Numbers sometimes speak louder than words!)

Another strange thing was that although HH&M's two-volume book on Sample Survey Methods and Theory appears not to have used the word "model" at all, each of its two volumes included a chapter on "regression estimation". I don't see how one can have a regression estimator without a regression model, at least in the back of one's mind.

HH&M also defined four "estimates" in Chapter 11 of their Volume 1: the difference estimate, the regression estimate, the ratio estimate and the simple unbiased estimate. In Chapter 11 of Volume 2 only the difference estimate and the regression estimate are defined, but of course the other two would have been well known to anyone who was already familiar with Volume 1.

The question still remains as to whether HH&M would have regarded the regression estimate as implying a model. My guess is that they would have been reluctant to do so!

3.4  My fifteen months in the USA

In 1966-67, I was privileged to spend over a year in the USA, visiting (in order) the U.S. Bureau of the Census in Washington DC, and then Harvard and Princeton Universities. At the Bureau of the Census I had hoped to be able to spend some time with Morris Hansen, and was looking forward to suggesting to him that there were actually some useful things that could be done with population models, but when the first opportunity occurred, he cut me off short, saying "We don't need models," and immediately changed the subject!

Conversely, when I went to Harvard, where I spent a considerable time with Cochran, we were able to look at the topic rationally together and agree that models had a useful if limited role to play. At Princeton, I attempted to interest several well-known statisticians at the university in the topic, but without any serious success.

Quite a different challenge to Hansen's model-free orthodoxy had been voiced by Godambe (1955), with his proof of the non-existence of any uniformly best randomization-based estimator of the population mean. A new notation and class of estimators were required for the argument, and this framework in its earliest form met with some resistance. In Section 5 of that paper, citing Yates' (1949) textbook and Cochran's (1939) paper as antecedents, Godambe suggested an alternative optimality criterion, the minimization of the expected sampling variance under what was later called a superpopulation model.

At that time few others working in this excitingly innovative field of survey sampling seemed to be concerned by this result. I must confess that I wasn't myself concerned at the time, but I now think that perhaps I should have been!
