% LaTeX source for Fisher 272 The Nature of Probability
    
    
\documentclass{article} 
    
\usepackage{amsmath}
    
\usepackage{times} 
    
\begin{document} 
    
\noindent
\textit{Centennial Review} \textbf{2} (1958), 261--274.
\begin{center}
\Huge{272}
\end{center}
\begin{center}  
\Large{THE NATURE OF PROBABILITY} 
\end{center} 
\begin{center}
{\Large\textit{Sir Ronald Fisher}}\footnote{
This paper represents the substance of an address given in November 1957
at Michigan State University.}
\end{center}
    
\textsc{It is no secret}---it is a fact that I have stressed
particularly in a recent book of mine on scientific inference
\footnote{\textit{Statistical Methods and Scientific Inference}
(Edinburgh: Oliver and Boyd, 1956).}---that grave differences of opinion
touching upon the nature of probability are at present current among
mathematicians. I should emphasize that mathematicians are expert and
exceedingly skilled people at the particular jobs that they have had
experience of---in particular, exact, precise deductive reasoning. In that
field of deductive logic, at least when carried out with mathematical
symbols, they are of course experts. But it would be a mistake to think
that mathematicians as such are particularly good at the inductive
logical processes which are needed in improving our knowledge of the
natural world, in reasoning from observational facts to the inferences
which those facts warrant. Now when we are presented, as we are at the
present time in the 20th century and perhaps especially in this country,
with grave differences of opinion of this sort among entirely competent
mathematicians, we may reasonably suspect that the difficulty does not
lie in the mathematics---or at least only incidentally or accidentally in
the mathematics---but has a much deeper root in the semantics, or an
understanding of the meanings of the terms which are used.

It's not the first time that grave differences of opinion among
mathematicians have occurred on this very question of probability.
Looking over the history of the subject, I think we can say that a
crucial set of circumstances occurred at an early period, in the 17th
and 18th centuries, at the time when the interest of mathematicians in
the area of probability hung upon the high social prestige of the
recreation of gambling, and mathematicians were constantly being
approached by persons of the highest social standing, worthy of every
respect and service, in order to solve the knotty problems that arose in
this recreation; and this activity was manifestly the mainspring of the
interest of the galaxy of distinguished mathematicians who, at that
period, gave their attention to the subject. 

May I just mention a few names illustrative of that period: Pascal,
Fermat, Leibnitz, Montmort (all of whom functioned principally in
France), De Moivre and Bayes (in England), and Bernoulli (who didn't
live quite in France because he was a member of a distinguished family
of the town of Basel). And I am inclined to say that all of those
founders of the mathematical theory of probability understood the
meaning of the word in one way, and they had the great advantage of
coming to an understanding of the word which they used in their work, in
that they were brought frequently into contact with its practical
applications in the real world. 

Now one of the difficulties in the teaching of mathematics in the
present century is the difficulty of representing in mathematical
departments those arts, crafts, skills, and technologies to which
statistics is now being actively applied. It would seem an almost
impossible task to staff a mathematical department, to get even a
representation of the immense variety of practical affairs in which
mathematics or statistics is applicable and is now being used. That is a
problem for the organizers of education. 

My own problem is a much narrower one. I want to make clear what I mean
by probability; I want to make clear, so far as I can, why it is that
quite a number of mathematicians fall into what I consider to be
manifest fallacies in this field. My business, you see, is one in
semantics, the meaning of the word; and the meaning of the word only
comes into existence by usage, and so I define the usage that I am
concerned with as that of these 17th and 18th century mathematicians. If
we wish to speak about something else from that which they call
probability, then I think we should find a different word; but I doubt
if there is anything else of so great importance that we should
consider. We can trace, I think, some of the difficulties of such a word
to the mathematical mind. Clearly, the purpose of the notion of
probability is to express---and express accurately, with mathematical
precision---a state of uncertainty; and states of uncertainty are not
familiar in the processes of exact deductive reasoning.

Probability is, I suggest, the first example of a well specified state
of logical uncertainty. Let me put down a short list of three
requirements, as I think them to be, for a correct statement of
probability, which I shall then hope to illustrate with particular
examples. I shall use quite abstract terms in listing them.
\renewcommand{\theenumi}{\alph{enumi}}
\renewcommand{\labelenumi}{(\theenumi)}
\begin{enumerate}
\item There is a measurable reference set (a well-defined set, perhaps
of propositions, perhaps of events).
\item The subject (that is, the subject of a statement of probability)
belongs to the set.
\item No relevant sub-set can be recognized.
\end{enumerate}
I expect that these words will acquire a meaning from the examples I
have to give.

Let us consider any uncertain event. A child is going to be born. I
don't know enough about the present state of medical science to know
whether experts exist who are really capable of saying in advance of
what sex the child will be. But let us imagine ourselves in the
technology of the 19th century, when certainly no such statement could
be made with any confidence. This is my first example of a matter in
which we are in the state of uncertainty; that is to say, we lack
precise knowledge, but we do not lack all knowledge. On inquiry at the
registrar, we may find that in his experience, or in the experience of
much larger numbers recorded by registrars in different parts of the
world, a fixed proportion of the births has been of boys and the
remainder of girls. Let us suppose he tells us that 51 per cent of the
births are those of boys (a little more than 51 per cent in most
populations). To the registrar, the birth which is about to take place,
though intensely important to ourselves, is just another birth. To him
it belongs to this set of his experience of sex at birth, and he very
properly informs us that the probability of a boy is 51 per cent, having
made reference to this measurable reference set as the basis of his
statement.

Secondly, we satisfy ourselves as to the existence of relevant sub-sets.
I need not use the word ``random'' because all I need say can be said
under ``(c)'' which is the most novel in its formulation if not in its
idea, the most novel of the requirements I have listed. This is a
formulation which I submit to your judgment as a competent formulation
of what is needed if we are to speak without equivocation of a
probability of something in the real world.

The registrar might raise such a question as this: Is it a white birth
or a colored birth? In his experience, the sex ratio might be different.
Very well, then, it's a white birth. We have recognized a sub-set of
white births, and he must turn to his tables and find out what the
proportion is in respect to white births, ignoring those which do not
belong to the particular sub-set to which our event belongs. Or again,
his experience might have shown that first births have a higher sex
ratio than births in general. He will then inquire whether our birth is
a first birth or not. If it is a first birth, it belongs to a relevant
sub-set. It is now recognized and takes the place of the reference set
with which we started.
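The registrar's rule can be sketched in a few lines of Python. The counts below are invented for illustration (Fisher gives only the 51 per cent figure); the point is the procedure itself: once a relevant sub-set is recognized, it replaces the original reference set as the basis of the probability statement.

```python
# Illustrative only: these counts are hypothetical, not the registrar's data.
births = {
    # reference set: (boys, girls)
    "all":          (51_000, 49_000),
    "first_births": (10_600,  9_400),   # hypothetical: a higher sex ratio
}

def p_boy(reference_set):
    """Probability of a boy, computed as the frequency in the given set."""
    boys, girls = births[reference_set]
    return boys / (boys + girls)

# Before any sub-set is recognized, the whole experience is the reference set.
p_general = p_boy("all")            # 0.51
# Once the birth is known to be a first birth, that sub-set takes its place.
p_first = p_boy("first_births")     # 0.53
```

The function is the same in both calls; only the reference set changes, which is exactly the substitution the registrar performs.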

Exactly the same considerations may be applied to any other case of
uncertainty. Let us take the case of deliberately arranged uncertainty,
which occurs in games of chance.

I mentioned the importance of the recreation of gambling as calling
attention of mathematics to this new concept of probability in the 17th
century. The concept was unknown to the Greek mathematicians; it was
also unknown to the Islamic mathematicians, perhaps because gambling was
forbidden by the Prophet. But it was not only the taste for gambling, I
think, which made the difference; it was the fact that by the 17th
century the technology of the manufacture of the apparatus of games of
chance had reached a point at which the calculations of mathematicians
had some relevance. They were not playing with knuckle-bones; they were
playing with very well made dice.

Consider the gambler who has laid a stake on the assertion that an ace
will be thrown. It's worth a lot of money to him. He doesn't want to
mistake your meaning if you say, as perhaps De Moivre might have said,
the probability of an ace is one-sixth. In saying that, he is saying
that this is just one throw out of all the possible throws that might be
made, and he will regard these possible throws as a reference set,
measurable, of which the fraction exactly one-sixth are aces. His
reasons for doing that don't immediately concern us. It is a common
sense reason, perhaps, that the die has been supplied by a reputable
maker, that it has six faces, that the aim of the maker has been to make
it approximately a perfect cube, and to make sure that the center of
gravity is equally distant from each of those faces.

Contrast that, however, with a much more sophisticated and typically
useless definition of probability, which is sometimes fed to
mathematical students. It goes something like this:
\[
\Pr\left\{\left|\frac{a}{n}-\frac{1}{6}\right| > \varepsilon\right\}
\to 0 \quad \text{as } n \to \infty.
\]
If $a$ aces occur in $n$ trials, then the difference in absolute value
between the fraction $\frac{a}{n}$ and $\frac{1}{6}$ will have a
probability of exceeding any positive number $\varepsilon$, however
small, a probability which will tend to zero as $n$ tends to infinity.

You see, that is some way away from the real world already. The gambler
deserves something better than that. He may ask you, ``What do you mean,
`tends to infinity'?'' ``Well, you go on rolling, and you don't
stop---you go on rolling; you go on rolling until the die is worn to a
sphere; you go on rolling until the sun goes out; but still you haven't
reached infinity and are still a long way off.'' And then, it's not only that;
as a practical man he doesn't like that, of course. ``But,'' he says,
``I asked you what you meant by probability, and here you are, you've
brought in the same notion of probability in your definition. How do I
know what that probability means?'' We have a perpetual regression
defining probabilities in terms of probabilities in terms of
probabilities; that is a purely logical objection to the definition.
But the real objection, if I may say so, for the practical gambler who
wants to know about his stake, is that it says nothing about the
particular throw in which he is interested. It says something about what
we should ultimately regard as the reference set, certainly; but it says
nothing whatever about his particular throw. And of course it might
occur to him that though this was true of throws in general, yet in
particular groups of throws within that general set, in particular
sub-sets, the fraction might be different, perfectly consistently with
this general statement.
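As a minimal sketch of the limiting-frequency statement itself (not an endorsement of it as a definition), one can watch the fraction of aces drift toward one-sixth in a simulated run. The trial counts and seed below are arbitrary choices of mine, and no finite run, of course, ever reaches the infinity the definition appeals to.

```python
import random

random.seed(1)

def ace_fraction(n):
    """Fraction of aces in n simulated throws of a fair six-sided die."""
    aces = sum(1 for _ in range(n) if random.randint(1, 6) == 1)
    return aces / n

# The fraction a/n wanders at small n, but settles near 1/6 as n grows.
fractions = {n: ace_fraction(n) for n in (100, 10_000, 200_000)}
```

The simulation says nothing about any particular throw, which is precisely the gambler's second objection in the text.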

Consider a few possible sub-sets. Here's a recognizable sub-set: throws
made on Friday. He can recognize that sub-set of possible future throws,
and he knows his throw is one of them. But so far as we know, shall we
say, according to the axioms on which the mathematicians were advising
the gambler, throws made on Friday do not give a different frequency of
aces from throws made on other days. So it is recognizable, but not
relevant. It doesn't alter the estimate. And then, perhaps you say, odd
numbers: 1, 3, or 5. A very relevant sub-set, if it could be recognized.
But the makers of dice and other apparatus of gambling have taken
care---they have taken a great deal of trouble to make sure, in
fact---that such a sub-set cannot be recognized before the dice are
thrown. And, thirdly, let us suppose that our gambler has heard of
Professor Rhine of Duke University, and that in the opinion of Professor
Rhine, some of his students have the remarkable gift of precognition.
The gambler perhaps makes an agreement with such a student to sit by his
side while he is rolling the dice and give him a nudge when an ace is
coming. Here you have, let us say, two possible cases. Perhaps the
prophet is some good---and what that means is that the sub-set of throws
in which he gives the signal to his patron has a proportion of aces
which is greater than one-sixth---it is possible it might be a third if
he is a pretty good prophet. And in that case I submit that the gambler
has a recognizable and a relevant sub-set, and that to him, on his
knowledge, on his information, on his data as we sometimes say, the
probability is not one-sixth, but a third. On the other hand, if, after
some experience he comes to the conclusion that his prophet is no good
at all, he will not lose his knowledge of the probability---it will merely
revert to its value of one-sixth. He will now be in the position of
saying that there is a measurable set with a frequency of one-sixth, and
there is no relevant and recognizable sub-set which I should prefer to
it.
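The prophet's effect on the gambler's probability can also be sketched numerically. The two signal rates below are hypothetical, chosen so that the sub-set of nudged throws contains aces one time in three---the ``pretty good prophet'' of the text.

```python
import random

random.seed(2)

# Hypothetical signal rates: the prophet nudges on half the aces, but also
# (wrongly) on a fifth of the non-aces.  These values are invented; they are
# chosen so that P(ace | nudge) = (1/6)(0.5) / ((1/6)(0.5) + (5/6)(0.2)) = 1/3.
P_NUDGE_IF_ACE, P_NUDGE_IF_NOT = 0.5, 0.2

nudged = nudged_aces = 0
for _ in range(100_000):
    ace = random.randint(1, 6) == 1
    if random.random() < (P_NUDGE_IF_ACE if ace else P_NUDGE_IF_NOT):
        nudged += 1
        nudged_aces += ace

# Within the recognized, relevant sub-set the frequency is 1/3, not 1/6.
freq_in_subset = nudged_aces / nudged
```

If the signal rates were equal, the sub-set would be recognizable but not relevant, like the throws made on Friday, and the frequency would stay at one-sixth.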

Now that, I hope, sounds easy, and I want to get a little closer to the
psychological difficulties which cause difference in understanding as to
the meanings of these words.

The first difficulty is that we are making a statement of uncertainty,
and that statements of uncertainty are not familiar in the ordinary
course of deductive mathematical argument. They introduce special
logical requirements. You notice, my third condition was that no sub-set
should be recognizable. It is a postulate of ignorance. How are we to
take account of postulates of ignorance, as we have to do in inductive
reasoning? In the ordinary course of deductive reasoning, the reasoner
is supplied with what I shall call, for the moment, ``axioms''---the term
doesn't matter very much---and if he can prove what he wants to prove by
using axiom $A$, axiom $C$, and axiom $E$ to give the proposition, he is
perfectly entitled to do so because he is arguing with certainty, and
the truth of axioms $A$, $C$, and $E$ is not at all precluded or interfered
with by his axioms $B$ and $D$ that have not entered into his argument.
\begin{center}
\textit{Axioms}
\end{center}

\begin{picture}(120,50)
\put(160,10){\line(2,1){42}}
\put(140,10){\line(-2,1){42}}
\put(140,30){\line(-5,-4){15.5}}
\put(145,0){$P$}
\put(90,35){$A$}
\put(117.5,35){$B$}
\put(145,35){$C$}
\put(172.5,35){$D$}
\put(200,35){$E$}
\end{picture}

\noindent
But suppose he were making a statement of uncertainty. Then $B$ and $D$
do matter. In inductive reasoning the whole of the data, or the
available axioms, or the available observations, has to be taken into
account, and it is only because of that particularity of inductive
reasoning that axioms of ignorance matter. There the postulate of
ignorance asserts that certain things are not known and that the
validity of the argument requires that they should not be known; and of
course this is fundamental to any correct statement of uncertainty. If
all sorts of other additional information could be sprung on you at any
stage in the argument, you might discover there was no uncertainty at
all, or, more easily, that the degree and nature of uncertainty which
you have arrived at is totally different from what should have been
arrived at if everything had been taken into account.

Now, at the end of the last century, a group of rather distinguished
mathematicians, Hilbert, for example, and Peano, set out on a project
which was to show that the whole of mathematics could be deduced with
strict irrefragable logic from certain chosen axioms. Peano had a shot
at setting up such axioms that would suffice for the deduction of the
whole of mathematics. That project was influential---it still is
influential, I think, in spite of the setbacks that it has received. It
was influential, for example, in producing Whitehead and Russell's
\textit{Principia Mathematica}. It was quite fundamental to Keynes' book
on probability.

But difficulties have arisen. It was fairly easily demonstrated, and it
came as a surprise to a good many people, that if a system of axioms
allowed of the deduction of any contradiction (any fallacy, if you
like)---if it allowed the proposition $P$ and also the proposition
\textit{not}-$P$ to be deduced by the ordinary rigorous processes from
the same system of axioms---then that system of axioms contained latent
contradictions, in the simple sense that any proposition whatever
could be deduced from them.

There is a story that emanates from the high table at Trinity that is
instructive in this regard. G.\ H.\ Hardy, the pure mathematician---to
whom I owe all that I know of pure mathematics---remarked on this
remarkable fact, and someone took him up from across the table and
said, ``Do you mean, Hardy, if I said that two and two make five that
you could prove any other proposition you like?'' Hardy said, ``Yes, I
think so.'' ``Well, then, prove that McTaggart is the Pope.'' ``Well,''
said Hardy, ``if two and two make five, then five is equal to four. If
you subtract three, you will find that two is equal to one. McTaggart
and the Pope are two; therefore, McTaggart and the Pope are one.'' I
gather it came rather quickly.

That wasn't, however, the worst that befell the theory of the axiomatic
basis for mathematics. It pinpointed the need for some means of
demonstrating that a system of axioms was free from all contradictions,
because if it wasn't it could lead to anything. And then the blow fell,
which was due, I believe, to G\"odel, who put forward a very long, very
elaborate, and extraordinarily ingenious proof to the effect that you
could not, basing your reasoning upon a given system of axioms, disprove
the possibility that that system could lead to a contradiction. Now that
was a surprise to people, but I don't think it ought to have been. After
all, suppose a Ph.D.\ student came, breathless with excitement, and
said, ``I have proved that this system of axioms is free from all
contradictions.'' You'd say, ``Did you prove it using only those
axioms?'' He might say, ``Yes, I have written out a chain of
propositions which demonstrate that these axioms are free from
contradiction.'' Well, I suppose you'd look at him with mild surprise,
and you might say, ``I suppose you know that if this system of axioms
did contain a contradiction, you could prove exactly those same
propositions.'' And so you have the situation that certain propositions
which purport to prove the truth of the theorem could be
equally well demonstrated by the ordinary rigorous processes of
deductive reasoning if they were false. And I don't know how much we
would give, then, for the chain of theorems which purported to prove
that the system of axioms was free from contradictions. It would seem to
be a little absurd to imagine that such a thing was possible.

Now, if I were to illustrate the mathematics, it would not appeal to a
large proportion of the audience. But I want to give a few comparatively
slight illustrations of how the controversies that I have alluded to
affect our practical mathematical reasoning. Some of us think that if
one had a sample which was known to be drawn from a normal
population---a sample of $N$ observations, $X_1,\dots,X_N$---that by
taking the mean of that sample (that is, by adding up the individual
observations and dividing by their number), and by taking the mean
square deviation, using the sum $S$ of $(X-\bar{X})^2$, treating it
appropriately, as Gauss
suggested, and getting what is called the sample variance of the mean,
$s^2=S/\{N(N-1)\}$---some of us believe that one can then make probability
statements of the kind that the true mean ($\mu$) of the population is
less than a calculable limit with an exactly known probability. In fact,
the statement can be made that the probability that the unknown mean of
the population is less than a particular limit is exactly $P$. Namely,
$\Pr(\mu < \bar{x}+ts)=P$ for all values of $P$, where $t$ is known (and
has been tabulated as a function of $P$ and $N$).
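The statement $\Pr(\mu<\bar{x}+ts)=P$ can be checked by repeated sampling. In the sketch below, which is my own illustration and not part of the text, $N=10$, the one-sided probability is $P=0.95$, and $t=1.833$ is the tabulated Student's deviate for $N-1=9$ degrees of freedom; $s$ is taken, as in the text, from $s^2=S/\{N(N-1)\}$, so that it is the standard error of the mean. The seed and trial count are arbitrary.

```python
import random
import statistics

random.seed(3)

MU, N, T = 0.0, 10, 1.833   # true mean; sample size; tabulated t for P = 0.95
trials, covered = 20_000, 0
for _ in range(trials):
    x = [random.gauss(MU, 1.0) for _ in range(N)]
    xbar = statistics.fmean(x)
    S = sum((xi - xbar) ** 2 for xi in x)   # sum of squared deviations
    s = (S / (N * (N - 1))) ** 0.5          # Fisher's s: the standard error
    covered += MU < xbar + T * s            # does the limit exceed mu?

coverage = covered / trials                 # close to P = 0.95
```

The observed coverage agrees with $P$ whatever value of $\mu$ is used, which is the sense in which the statement holds in the reference set defined by any value of $\mu$.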
  
This is exactly the sort of specification of our uncertain knowledge of
the constants of nature that scientists have for a hundred years thought
they possessed about them. The conditions required are more stringent
than has been generally realized, but these conditions can be met in a
number of useful cases, and in these cases the quantity under
discussion, although of course not known with exactitude, is accurately
specified as a random variable about which exact probability statements
can be made for all possible values of the probability.

This is a single example of a large number of such inductive inferences
that are made by the same process of reasoning. They have been disputed,
I think principally on this ground, that it is not clear to all
mathematicians that a probability statement is based on data, and that
it is no defect in such a probability statement that it would be
different if the data were different.

Let me examine this simple example. We have a limit which we can
calculate, and it is undoubtedly true that this limit exceeds $\mu$ with
given probability in the reference set defined by any value of $\mu$. If
a population with a mean $\mu$ were sampled repeatedly, we would certainly
get this quantity exceeding it with a given probability. That, I
believe, is not disputed. It is also true that if we take the statement
in general we have proved it for all $\mu$ and therefore for the
reference set for all samples from all populations. Each sample has
peculiar values $(\mu, \bar{x}, s)$, and for this enlarged reference set
it is true that $\Pr(\mu < \bar{x}+ts) = P$, where $t$ is ``Student's''
deviate corresponding with the (one-sided) probability $P$.

That, however, does not settle the matter. There are two conditions
which should be satisfied in addition. I would like to emphasize these
because you will find examples in the literature where this sort of
inference is drawn without any reference to the conditions, and usually
drawn with reference to what is really irrelevant, namely, certain
beliefs about tests of significance---``the theory of testing
hypotheses,'' or perhaps the theory of decision functions. The two
requirements that are necessary flow from the third condition which I
laid down for a correct statement of probability, namely, that no
relevant sub-set should be recognizable.

Now suppose there were knowledge a priori of the distribution of $\mu$.
Then the method of Bayes would give a probability statement, probably a
different one. This would supersede the fiducial value, for a very
simple reason. If there were knowledge a priori, the fiducial method of
reasoning would be clearly erroneous because it would have ignored some
of the data. I need give no stronger reason than that. Therefore, the
first condition is that there shall be no knowledge a priori. And the
second condition is that in calculating the limit, the second term of
the inequality concerned, we should have used exhaustive estimates. The
two estimates that we are concerned with are the mean and variance
(estimate of the mean, estimate of the variance), and those happen to be
exhaustive in a mathematical sense when calculated from the normal
distribution, but not from other distributions. If they are exhaustive,
then it is known that given these two quantities, $\bar{X}$ and $s^2$,
the distribution of any other statistic whatsoever (that is to say, any
function whatever of the observations) would, subject to the restriction
of fixing the values of $\bar{X}$ and $s^2$, have a distribution indeed
and take many values, but its distribution would be independent of the
unknowns $\mu$ and $\sigma$. And, therefore, no such value could provide
information about $\mu$. But if the statistics used in this argument had
not been exhaustive, then it would be possible to find other functions
of the observations which, even under the restriction that $\bar{X}$ and
$s$ are fixed, would have information to give about the unknown $\mu$. Such a
value, calculated from the sample, would define a sub-set of cases which
might well give a different probability from that which we have arrived
at. So the rigorous application of that third specification of what is
needed for a true statement of probability brings in the two
requirements for a valid argument of this kind.

Now of course I haven't listed all or anything like all of the fallacies
that have been introduced, largely springing from the same roots, but as
I suppose is familiar, whether you think of error or whether you think
of sin, one leads to another. Once a person has harbored an error in his
undergraduate days, carefully implanted there by some distinguished but
muddle-headed professor, he may go on for a long while without being
enabled to work it out by his own powers of thought. At least it's
scarcely conceivable that the mathematicians of the 19th century should
have harbored the notion of inverse probability from about 1812, when
Laplace published his \textit{Th\'eorie Analytique}, to what I suppose
would be the best terminus, 1886, when, speaking of my own country,
Chrystal published his great \textit{Algebra}, in which he took the
unprecedented step of throwing out the whole business of probability
altogether as being too hopelessly unsound to be included in a good book
on algebra. That was good for the teaching of algebra, and I am inclined
to think, though it is a matter of judgment, that it was also good for
statistical studies in England. The same movement of thought was going
on, to some extent, in other countries, but not quite so abruptly and
dramatically as it did in England, and the result in England was that
the study of probability, when it re-emerged from its temporary eclipse,
re-emerged well embedded in a much larger discipline which is commonly
known as statistics at the present time.

Of course, there is quite a lot of continental influence in favor of
regarding probability theory as a self-supporting branch of mathematics,
and treating it in the traditionally abstract and, I think, fruitless
way. Perhaps that's why statistical science has been comparatively
backward in many European countries. Perhaps we were lucky in England in
having the whole mass of fallacious rubbish put out of sight until we
had time to think about probability in concrete terms and in relation,
above all, to the purposes for which we wanted the idea in the natural
sciences. I am quite sure it is only personal contact with the business
of the improvement of natural knowledge in the natural sciences that is
capable of keeping straight the thought of mathematically-minded people who
have to grope their way through the complex entanglements of error, with
which at present they are very much surrounded. I think it's worse in
this country than in most, though I may be wrong. Certainly there is
grave confusion of thought. We are quite in danger of sending highly
trained and highly intelligent young men out into the world with tables
of erroneous numbers under their arms, and with a dense fog in the place
where their brains ought to be. In this century, of course, they will be
working on guided missiles and advising the medical profession on the
control of disease, and there is no limit to the extent to which they
could impede every sort of national effort.

\end{document}
