% LaTeX source for Galton on Regression

\documentclass{article}

\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{epsfig}
\usepackage{times}

\begin{document}

\setcounter{page}{1}

\begin{center}
  {\Large ANTHROPOLOGICAL MISCELLANEA.}
  
  \bigskip
  \bigskip
  
  \textsc{Regression} \textit{towards} \textsc{Mediocrity} \textit{in}
  \textsc{Hereditary Stature.}
  
  \smallskip
  
  By \textsc{Francis Galton, F.R.S.\ \&c.}
  
  \medskip
  
  \textsc{[With Plates IX and X.]}
\end{center}

\bigskip 

\noindent 
\textsc{This} memoir contains the data upon which the remarks on the Law
of Regression were founded, that I made in my Presidential Address to
Section H, at Aberdeen.  That address, which will appear in due course
in the Journal of the British Association, has already been published in
``Nature,'' September 24th.  I reproduce here the portion of it which
bears upon regression, together with some amplification where brevity
has rendered it obscure, and I have added copies of the diagrams
suspended at the meeting, without which the  letterpress is necessarily
difficult to follow.  My object is to place beyond doubt the existence
of a simple and far-reaching law that governs the hereditary
transmission of, I believe, every one of those simple qualities which
all possess, though in unequal degrees.  I once before ventured to draw
attention to this law on far more slender evidence than I now possess.

It is some years since I made an extensive series of experiments on the
produce of seeds of different size but of the same species.  They
yielded results that seemed very noteworthy, and I used them as the
basis of a lecture before the Royal Institution on February 9th, 1877.
It appeared from these experiments that the offspring did \textit{not}
tend to resemble their parent seeds in size, but always to be more
mediocre than they---to be smaller than the parents, if the parents were
large; to be larger than the parents, if the parents were very small.
The point of convergence was considerably below the average size of the
seeds contained in the large bagful I bought at a nursery garden, out of
which I selected those that were sown, and I had some reason to believe
that the size of the seed towards which the produce converged was
similar to that of an average seed taken out of beds of self-planted
specimens. 

The experiments showed further that the mean filial regression towards
mediocrity was directly proportional to the parental deviation from it.
This curious result was based on so many plantings, conducted for me
by friends living in various parts of the country, from Nairn in the
north to Cornwall in the south, during one, two, or even three
generations of the plants, that I could entertain no doubt of the truth
of my conclusions.  The exact ratio of regression remained a little
doubtful, owing to variable influences; therefore I did not attempt to
define it.  But as it seems a pity that no record could exist in print
of the general averages, I give them, together with a brief statement of
the details of the experiment, in Appendix I to the present memoir.

After the lecture had been published, it occurred to me that the 
grounds of my misgivings might be urged as objections to the general 
conclusions. I did not think them of moment, but as the inquiry had been
surrounded with many small difficulties and matters of detail, it would
be scarcely possible to give a brief and yet a full and adequate answer
to such objections.  Also, I was then blind to what I now perceive to be
the simple explanation of the phenomenon, so I thought it better to say
no more upon the subject until I should obtain independent evidence.
It was anthropological evidence that I desired, caring only
for the seeds as means of throwing light on heredity in man.  I tried
in vain for a long and weary time to obtain it in sufficient abundance,
and my failure was a cogent motive, together with others, in inducing me
to make an offer of prizes for Family Records, which was largely
responded to, and furnished me last year with what I wanted.  I
especially guarded myself against making any allusion to this particular
inquiry in my prospectus, lest a bias should be given to the returns.  I
now can scarcely contemplate the possibility of the records of height
having been frequently drawn up in a careless fashion, because no
amount of unbiassed inaccuracy can account for the results, contrasted
in their values but concurrent in their significance, that are derived
from comparisons between different groups of the returns.  

An analysis of the Records fully confirms and goes far beyond the
conclusions I obtained from the seeds.  It gives the numerical value of
the regression towards mediocrity in human stature, as from 1 to
$\frac{2}{3}$ with unexpected coherence and precision [\textit{see}
Plate IX, fig.\ (\textit{a})], and it supplies me with the class of facts
I wanted to investigate---the degrees of family likeness in different
degrees of kinship, and the steps through which special family
peculiarities become merged into the typical characteristics of the race
at large.

My data consisted of the heights of 930 adult children and of their
respective parentages, 205 in number.  In every case I transmuted the
female statures to their corresponding male equivalents and used them in
their transmuted form, so that no objection grounded on the sexual
differences of stature need be raised when I speak of averages.  The
factor I used was 1.08, which is equivalent to adding a little less than
one-twelfth to each female height.  It differs a very little from the
factors employed by other anthropologists, who, moreover, differ a
trifle between themselves; anyhow, it suits my data better than 1.07 or
1.09.  The final result is not of a kind to be affected by these minute
details, for it happened that, owing to a mistaken direction, the
computer to whom I first entrusted the figures used a somewhat different
factor, yet the results came out closely the same.

\begin{figure}
  \begin{center}
    \epsfig{file=galton_reg_table_I.eps,width=11cm,height=18cm,clip=}
  \end{center}
\end{figure}

\begin{figure}
  \begin{center}
    \epsfig{file=galton_reg_plate_IX.eps,width=11cm,height=18cm,clip=}
  \end{center}
\end{figure}

\begin{figure}
  \begin{center}
    \epsfig{file=galton_reg_plate_X.eps,width=11cm,height=18cm,clip=}
  \end{center}
\end{figure}

I shall now explain with fulness why I chose stature for the  subject of
inquiry, because the peculiarities and points to be attended to in the
investigation will manifest themselves best by doing so.  Many of its
advantages are obvious enough, such as the ease and frequency with which
its measurement is made, its practical constancy during thirty-five
years of middle life, its small dependence on differences of bringing
up, and its inconsiderable influence on the rate of mortality.  Other
advantages which are not equally obvious are no less great.  One of
these lies in the fact that stature is not a simple element, but a sum
of the accumulated lengths or thicknesses of more than a hundred bodily
parts, each so distinct from the rest as to have earned a name by which
it can be specified. The list of them includes about fifty separate
bones, situated in the skull, the spine, the pelvis, the two legs, and
the two ankles and feet.  The bones in both the lower limbs are counted,
because it is the average length of these two  limbs that contributes to
the general stature. The cartilages interposed between the bones, two at
each joint, are rather more numerous than the bones themselves.  The
fleshy parts of the scalp of the head and of the soles of the feet
conclude the list. Account should also be taken of the shape and set of
many of the bones which conduce to a more or less arched instep,
straight back, or high head.  I noticed in the skeleton of O'Brien, the
Irish giant, at the College of Surgeons, which is, I believe, the
tallest skeleton in any museum, that his extraordinary stature of about
7 feet 7 inches would have been a trifle increased if the faces of his
dorsal vertebrae had been more parallel and his back consequently
straighter.

The beautiful regularity in the statures of a population, whenever they
are statistically marshalled in the order of their heights, is due to
the number of variable elements of which the stature is the
sum.  The best illustrations I have seen of this regularity were the
curves of male and female statures that I obtained from the careful
measurements made at my Anthropometric Laboratory in the International
Health Exhibition last year.  They were almost perfect.

The multiplicity of elements, some derived from one progenitor, some
from another, must be the cause of a fact that has proved very
convenient in the course of my inquiry.  It is that the stature of the
children depends very closely on the average stature of the two
parents, and may be considered in practice as having nothing to do with
their individual heights.  The fact was proved as follows:---After
transmuting the female measurements in the way already explained, I
sorted the adult children of those parents who severally differed 1, 2,
3, 4, and 5 or more inches, into separate lines (see Table II).  Each
line was then divided into similar classes, showing the number of cases
in which the children differed 1, 2, 3, \&c., inches from the common
average of the children in their respective families.  I confined my
inquiry to large families of six children and upwards, that the common
average of each might be a trustworthy point of reference.  The  entries
in each of the different lines were then seen to run in the same way
except that in the last of them the children showed a faint tendency to
fall into two sets, one taking after the tall parent, the other after
the short one; this, however, is not visible in the summary Table II
that I annex.  Therefore, when dealing with the transmission of stature
from parents to children, the average height of the two parents, or, as
I prefer to call it, the ``mid-parental'' height is all we need care to
know about them.
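
{\footnotesize \textsc{Illustration.}  As a minimal worked sketch of the
  mid-parental height just defined, let $F$ be the father's height and $W$
  the mother's, both in inches, and let $M$ be the mid-parental height.
  These symbols and the sample heights are hypothetical, chosen only to
  show the arithmetic.  With the factor 1.08 applied to the female stature,
  \[ M = \frac{F + 1.08\,W}{2}, \qquad\text{so that } F = 70,\ W = 64
     \text{ gives } M = \frac{70 + 69.12}{2} = 69.56 . \]
}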

\begin{center}
  {\large TABLE II.}
  
  \medskip
  
  \textsc{Effect upon Adult Children of Differences in Height of\\
    their Parents.}
  
  \begin{tabular}{l|c|c|c|c|c|c|c}
    \hline
    & \multicolumn{6}{|c|}{Proportion per 50 of cases in which the} & \\
    & \multicolumn{6}{|c|}{Heights of the Children deviated to various} & \\
    \multicolumn{1}{c|}{Difference} 
    & \multicolumn{6}{|c|}{amounts from the Mid-filial Stature of their} & \\
    \multicolumn{1}{c|}{between the} 
    & \multicolumn{6}{|c|}{respective families} &
    \multicolumn{1}{|c}{Number of} \\
    \multicolumn{1}{c|}{Heights$^1$ of the} 
    & Less & Less & Less & Less & Less & Less &
    \multicolumn{1}{|c}{Children whose} \\
    \cline{2-7}
    \multicolumn{1}{c|}{Parents in}
    & than & than & than & than & than & than &
    \multicolumn{1}{|c}{Heights were} \\
    \multicolumn{1}{c|}{inches.}
    & 1 & 2 & 3 & 4 & 5 & 6 &
    \multicolumn{1}{|c}{observed.} \\
    & inch. & inches. & inches. & inches. & inches. & inches. & \\
    \hline
    Under 1       & 21 & 35 & 43 & 46 & 48 & 50 & 105 \\
    1 and under 2 & 23 & 37 & 46 & 49 & 50 & .. & 122 \\
    2 and under 3 & 16 & 34 & 41 & 45 & 49 & 50 & 112 \\
    3 and under 4 & 24 & 35 & 41 & 47 & 49 & 50 & 108 \\
    5 and above   & 18 & 30 & 40 & 47 & 49 & 50 & \phantom{0}78 \\
    \hline
  \end{tabular}
\end{center}
  
{\footnotesize $^1$ Every female height has been transmuted to its male
  equivalent by multiplying it by 1.08, and only those families have been
  included in which the number of adult children amounted to six, at least.}

{\footnotesize \textsc{Note.}---When these figures are protracted into
  curves it will be seen---(1) that they run much alike; (2) that their
  peculiarities are not in sequence; and (3) that the curve
  corresponding to the first line occupies a medium position.  It is
  therefore certain that differences in the heights of the parents have
  on the whole an inconsiderable effect on the heights of their offspring.}

It must be noted that I use the word parent without specifying the sex.
The methods of statistics permit us to employ this abstract term,
because the cases of a tall father being married to a short mother are
balanced by those of a short father being married to a tall mother.  I
use the word parent to save any complication due to a fact apparently brought
out by these inquiries, that the height of the children of both sexes,
but especially that of the daughters, takes after the height of the
father more than it does after that of the mother.  My personal data are
insufficient to enable me to speak with any confidence on this point,
much less to determine the ratio satisfactorily.

Another great merit of stature as a subject of inquiries into heredity
is that marriage selection takes little or no account of shortness or
tallness.  There are undoubtedly sexual preferences for moderate
contrast in height, but the marriage choice is guided by so many and
more important considerations that questions of stature appear to exert
no perceptible influence upon it.  This is by no means my only inquiry
into this subject, but, as regards the present data, my test lay in
dividing the 205 male parents and the 205 female parents into three
groups---T, M, and S---that is, tall, medium, and short (medium male
measurement being taken as 67 inches and upwards to 70 inches), and in
counting the number of marriages in each possible combination between
them (see Table III).  The result was that men and women of contrasted
heights, short and tall or tall and short, married just about as
frequently as men and women of similar heights, both tall or both short;
there were 26 cases of the one to 27 of the other.

\begin{center}
  {\large TABLE III.}
  
  \medskip
  
  \begin{tabular}{c|c|c}
    \hline
    S.,t. & M.,t. & T.,t. \\
    12 cases. & 20 cases. & 18 cases. \\
    \hline
    S.,m. & M.,m. & T.,m. \\
    25 cases. & 51 cases. & 28 cases. \\
    \hline
    S.,s. & M.,s. & T.,s. \\
    9 cases. & 28 cases. & 14 cases. \\
    \hline
  \end{tabular}
\end{center}
\begin{align*}
  &\text{Short and tall, $12 + 14 = 26$ cases.} \\
  &\left.\begin{array}{l}
    \text{Short and short, 9} \\
    \text{Tall and tall, 18}
  \end{array}\right\} = \text{27 cases.}
\end{align*}

In applying the law of probabilities to investigations into heredity of
stature, we may therefore regard the married folk as couples picked out
of the general population at haphazard.

The advantage of stature as a subject in which the simple laws of
heredity may be studied will now be understood.  It is a nearly constant
value that is frequently measured and recorded, and its discussion is
little entangled with considerations of nurture, of the survival of the
fittest, or of marriage selection.  We have only to consider the
mid-parentage and not to trouble ourselves about the parents separately.
The statistical variations of stature are extremely regular, so much
so that their general conformity with the results of calculations
based on the abstract law of frequency of error is an accepted fact
by anthropologists.  I have made much use of the properties of that law
in cross-testing my various conclusions, and always with success.  For
example, the measure of variability (say the ``probable error'') of the
system of mid-parental heights ought, on the suppositions justified in
the preceding paragraphs, to be equal to that of the system of adult
male heights, divided by the square root of two; this inference is
shown to be correct by direct observation.

The only drawback to the use of stature is its small variability.
One-half of the population with whom I dealt, varied less than 1.7 inch
from the average of all of them, and one-half of the offspring of
similar mid-parentages varied less than 1.5 inch from the average of
their own heights.  On the other hand, the precision of my data is so
small, partly due to the uncertainty in many cases whether the height
was measured with the shoes on or off, that I find by means of an
independent inquiry that each observation, taking one with another, is
liable to an error that as often as not exceeds $\frac{2}{3}$ of
an  inch.  

The law that I wish to establish refers primarily to the inheritance of
different degrees of tallness and shortness, and only secondarily to
that of absolute stature.  That is to say, it refers to measurements
made from the crown of the head to the level of mediocrity, upwards or
downwards as the case may be, and not from the crown of the head to the
ground.  In the population with which I deal the level of mediocrity is
$68\frac{1}{4}$ inches (without shoes).  The same law applying with
sufficient closeness both to tallness and shortness, we may include both
under the single head of deviations, and I shall call any particular
deviation a ``deviate.''  By the use of this word and that of 
``mid-parentage'' we can define the law of regression very briefly.  It
is that the height-deviate of the offspring is, on the average,
two-thirds of the height-deviate of its mid-parentage.  
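
{\footnotesize \textsc{Illustration.}  The law can be sketched as a single
  equation.  Write $P$ for the mid-parental height and $C$ for the average
  height of the adult offspring, both in inches and in male equivalents.
  These symbols and the sample values are hypothetical, introduced only
  for the arithmetic.  Taking the level of mediocrity as $68\frac{1}{4}$
  inches,
  \[ C - 68\tfrac{1}{4} = \tfrac{2}{3}\,\bigl(P - 68\tfrac{1}{4}\bigr), \]
  so that $P = 71\frac{1}{4}$ (a deviate of $+3$) gives
  $C = 70\frac{1}{4}$ (a deviate of $+2$), and $P = 65\frac{1}{4}$
  (a deviate of $-3$) gives $C = 66\frac{1}{4}$ (a deviate of $-2$).
}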

Plate IX, fig.\ \textit{a}, gives a graphic expression of the data upon
which this law is founded.  It will there be seen that the relations
between the statures of the children and their mid-parents, which are
perfectly simple when referred to the scale of deviates at the right
hand of the plate, do not admit of being briefly phrased when they are
referred to the scale of statures at its left.  

If this remarkable law had been based only on experiments on the
diameters of the seeds, it might well be distrusted until confirmed by
other inquiries.  If it were corroborated merely by a comparatively small
number of observations on human stature, some hesitation might be
expected before its truth could be recognised in opposition to the
current belief that the child tends to resemble its parents.  But more
can be urged than this.  It is easily to be shown that we ought to
expect filial regression, and that it should amount to some constant
fractional part of the value of mid-parental deviation.  It is because
this explanation confirms the previous observations made both on seeds
and on men that I feel justified on the present occasion in drawing
attention to this elementary law.

The explanation of it is as follows.  The child inherits partly from his
parents, partly from his ancestry.  Speaking generally, the further his
genealogy goes back, the more numerous and varied will his ancestry
become, until they cease to differ from any equally numerous sample
taken at haphazard from the race at large.  Their mean stature will then
be the same as that of the race, in other words, it will be mediocre.
Or, to put the same fact into  another form, the most probable value of
the mid-ancestral deviates in any remote generation is zero.  

For the moment let us confine our attention to the remote ancestry and
to the mid-parentages, and ignore the intermediate generations.  The
combination of the zero of the ancestry with the deviate of the
mid-parentage is the combination of nothing with something, and the
result resembles that of pouring a uniform proportion of pure water into
a vessel of wine.  It dilutes the wine to a constant fraction of its
original alcoholic strength, whatever that strength might have been. 

The intermediate generations will each in their degree do the same.  The
mid-deviate in any one of them will have a value intermediate between
that of the mid-parental deviate and the zero value of the ancestry.
Its combination with the mid-parental deviate will be as if, not pure
water, but a mixture of wine and water in some definite proportion, had
been poured into the wine.  The process throughout is one of
proportionate dilutions, and therefore the joint effect of all of them
is to weaken the original wine in a constant ratio.  We have no word to
express the form of that ideal and composite progenitor, whom the
offspring of similar mid-parentages most nearly resemble, and from
whose stature their own respective heights diverge evenly, above and
below. If he, she, or it, is styled the ``generant'' of the group, then
the law of regression makes it clear that parents are not identical with
the generants of their own offspring.  

The average regression of the offspring to a constant fraction of their
respective mid-parental deviations, which was first observed in the
diameters of seeds, and then confirmed by observations on human stature,
is now shown to be a perfectly reasonable law which might have been
deductively foreseen.  It is of so simple a character that I have made
an arrangement with pulleys and weights by which the probable average
height of the children of known parents can be mechanically reckoned
(see Plate IX, fig.\ \textit{b}).  This law tells heavily against the
full hereditary transmission of any gift, as only a few of many children
would resemble their mid-parentage.  The more exceptional the amount of
the gift, the more exceptional will be the good fortune of a parent who
has a son  who equals, and still more if he has a son who overpasses him
in  that respect.  The law is even-handed; it levies the same heavy 
succession-tax on the transmission of badness as of goodness.  If it
discourages the extravagant expectations of gifted parents that their
children will inherit their powers, it no less discountenances
extravagant fears that they will inherit all their weaknesses and
diseases.  

The converse of this law is very far from being its numerical opposite.
Because the most probable deviate of the son is only two-thirds that of
his mid-parentage, it does not in the least follow that the most
probable deviate of the mid-parentage is $\frac{3}{2}$, or
$1\frac{1}{2}$ that of the son.  The number of individuals in a
population who differ little from mediocrity is so preponderant that it is more
frequently the case that an exceptional man is the somewhat exceptional
son of rather mediocre parents than the average son of very exceptional parents.
It appears from the very same table of observations by which the value
of the filial regression was determined when it is read in a different
way, namely, in vertical columns instead of in horizontal lines, that
the most probable mid-parentage of a man is one that deviates only
one-third as much as the man does.  There is a great difference between
this value of $\frac{1}{3}$ and the  numerical converse mentioned above
of $\frac{3}{2}$; it is four and a half times smaller, since
$4\frac{1}{2}$, or $\frac{9}{2}$ being multiplied into $\frac{1}{3}$, is
equal to $\frac{3}{2}$.
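
{\footnotesize \textsc{Illustration.}  The two ratios can be reconciled
  in a later notation that is not used in this memoir, and the following
  is only a sketch under stated assumptions.  Suppose the variability of
  the offspring equals that of the general population, $\sigma$, that the
  variability of the mid-parentages is $\sigma/\sqrt{2}$, and that $r$
  denotes the coefficient of correlation between mid-parent and child
  (the symbols $\sigma$ and $r$ are assumptions of this note).  The two
  slopes are then
  \[ \text{child on mid-parent} = r\,\frac{\sigma}{\sigma/\sqrt{2}}
       = r\sqrt{2} = \tfrac{2}{3},
     \qquad
     \text{mid-parent on child} = r\,\frac{\sigma/\sqrt{2}}{\sigma}
       = \frac{r}{\sqrt{2}} = \tfrac{1}{3}, \]
  both of which follow from the single value
  $r = \sqrt{2}/3 \approx 0.47$.  Their product is
  $r^2 = \frac{2}{9}$, not unity, which is why the one ratio is not the
  reciprocal of the other.
}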

It will be gathered from what has been said, that a mid-parental deviate
of one unit implies a mid-grandparental deviate of $\frac{1}{3}$, a
mid-ancestral unit in the next generation of $\frac{1}{9}$, and so on. I
reckon from these and other data, by methods I cannot stop now to
explain, but will do so in the Appendix, that the heritance derived on
an average from the mid-parental deviate, independently of what it may
imply, or of what may be known concerning the previous ancestry is only
$\frac{1}{2}$.  Consequently, that similarly derived from a single
parent is only $\frac{1}{4}$, and that from a single grandparent is 
$\frac{1}{16}$.
 
Let it not be supposed for a moment that any of these statements
invalidate the general doctrine that the children of a gifted pair are
much more likely to be gifted than the children of a mediocre pair. What
they assert is that the ablest child of one gifted pair is not likely to
be as gifted as the ablest of all the children of very many mediocre
pairs.  However, as, notwithstanding this explanation, some suspicion
may remain of a paradox lurking in my strongly contrasted results, I
will call attention to the form in which the table of data (Table I) was
drawn up, and give an anecdote connected with it.

It is deduced from a large sheet on which I entered every child's
height, opposite to its mid-parental height, and in every case each was
entered to the nearest tenth of an inch.  Then I counted the number
of entries in each square inch, and copied them out as they appear in
the table. The meaning of the table is best understood by examples.
Thus, out of a total of 928 children who were born to the 205
mid-parents on my list, there were 18 of the height of 69.2 inches
(counting to the nearest inch), who were born to mid-parents of the
height of 70.5 inches (also counting to the nearest inch).  So again
there were 25 children of 70.2 inches born to mid-parents of 69.5 
inches.  I found it hard at first to catch the full significance of the
entries in the table, which had curious relations that were very
interesting to investigate.  They came out distinctly when I
``smoothed''  the entries by writing at each intersection of a
horizontal column with a vertical one, the sum of the entries in  four
adjacent squares, and using these to work upon.  I then noticed (see
Plate X) that lines drawn through entries of the same value formed a
series of concentric and similar ellipses.  Their common centre lay at
the intersection of the vertical and horizontal lines, that corresponded
to $68\frac{1}{4}$ inches.  Their axes were similarly inclined.  The
points where each ellipse in succession was touched by a horizontal
tangent, lay in a straight line inclined to the vertical in the ratio of
$\frac{2}{3}$; those where they were touched by a vertical tangent lay
in a straight line inclined to the horizontal in the ratio of
$\frac{1}{3}$.  The same is true in respect of the vertical lines. 
These and other relations were evidently a subject for mathematical
analysis and verification.  They were all clearly dependent on three
elementary data, supposing the law of frequency of error to be
applicable throughout; these data being (1) the measure of racial
variability, whence that of mid-parentages may be inferred as has
already been explained, (2) that of co-family variability (counting the
offspring of like mid-parentages as members of the same co-family), and
(3) the average ratio of regression.  I noted these values, and phrased
the problem in abstract terms such as a competent mathematician could
deal with, disentangled from all reference to heredity, and in that
shape submitted it to Mr.~J.~Hamilton Dickson, of St.~Peter's College,
Cambridge.  I asked him kindly to investigate for me the surface of
frequency of error that would result from these three data, and the
various particulars of its sections, one of which would form the
ellipses to which I have alluded.  

I may be permitted to say that I never felt such a glow of loyalty and
respect towards the sovereignty and magnificent sway of mathematical
analysis as when his answer reached me, confirming, by purely      
mathematical reasoning, my various and laborious statistical conclusions
with far more minuteness than I had dared to hope, for the original data
ran somewhat roughly, and I had to smooth them with tender caution.  His
calculation corrected my observed value of mid-parental regression from
$\frac{1}{3}$ to     $\frac{6}{17.6}$, the relation between the major
and minor axis of the ellipses was changed 3 per cent.\ (it should be as
$\sqrt{7}:\sqrt{2}$), their inclination was changed less than
$2^{\circ}$ (it should be to an angle whose tangent is $\frac{1}{2}$).
It is obvious, then, that the law of error holds throughout the
investigation with sufficient precision to be of real service, and that
the various results of my statistics are not casual and disconnected
determinations, but strictly interdependent.  In the lecture at the
Royal Institution to which I have referred, I pointed out the remarkable
way in which one generation was succeeded by another that proved to be
its statistical counterpart.  I there had to discuss the various
agencies of the survival of the fittest, of relative fertility, and so
forth; but the selection of human stature as the subject of
investigation now enables me to get rid of all these complications and to
discuss this very curious question under its  simplest form.  How is it,
I ask, that in each successive generation there proves to be the same
number of men per thousand, who range between any limits of stature we
please to specify, although the tall men are rarely descended from
equally tall parents, or the short men from equally short? How is the
balance from other sources so nicely made up?  The answer is that the
process comprises two opposite sets of actions, one concentrative and
the other dispersive, and of such a character that they necessarily
neutralise one another, and fall into a state of stable equilibrium (see
Table IV).  By the first set, a system of scattered elements is replaced
by another system which is less scattered; by the second set, each of
these new elements becomes a centre whence a third system of elements
is dispersed.

The details are as follows:---In the first of these two stages we start
from the population generally, in the first generation; then the units
of the population group themselves, as it were by chance, into married
couples, whence the more compact system of mid-parentages is derived,
and then by a regression of the values of the mid-parentages the still
more compact system of the generants is derived.  In the second stage
each generant is a centre whence the offspring diverge upwards and
downwards to form the second  generation.  The stability of the balance
between the opposed tendencies is due to the regression being
proportionate to the deviation.  It acts like a spring against a weight;
the spring stretches until its resilient force balances the weight, then
the two forces of spring and weight are in      stable equilibrium; for
if the weight be lifted by the hand it will obviously fall down again
when the hand is withdrawn, and, if it be depressed by the hand, the
resilience of the spring will be thereby increased, so that the weight
will rise when the hand is withdrawn.  

A simple equation connects the three data of race variability, of the
ratio of regression, and of co-family variability, whence, if any  two
are given, the third may be found.  My observations give separate
measures of all three and their values fit well into the equation,
which is of the simple form---
\[ v^2\frac{p^2}{2}+f^2=p^2, \]
where $v=\frac{2}{3}$, $p=1.7$, $f=1.5$.
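
{\footnotesize \textsc{Illustration.}  As a check on the arithmetic,
  reading $p^2/2$ as the squared variability of the mid-parentages (an
  interpretation assumed here from the earlier remark on the square root
  of two), the stated values give
  \[ \Bigl(\tfrac{2}{3}\Bigr)^{2}\,\frac{(1.7)^2}{2} + (1.5)^2
       = \tfrac{4}{9}\times 1.445 + 2.25
       \approx 0.642 + 2.25 = 2.892,
     \qquad (1.7)^2 = 2.89 . \]
  Solved instead for the co-family variability, the same equation gives
  \[ f = p\,\sqrt{1 - \frac{v^2}{2}} = 1.7\,\sqrt{\tfrac{7}{9}}
       \approx 1.50 , \]
  in close agreement with the observed value of 1.5.
}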

It will therefore be understood that the complete table of mid-parental
and filial heights may be calculated from two simple numbers, and that
the most elementary data upon which it admits of being constructed
are---(1) the ratio between the mid-parental and the rest of the
ancestral influences, and (2) the measure of the co-family variability. 


\begin{figure}
  \begin{center}
    \epsfig{file=galton_reg_table_IV.eps,width=11cm,height=18cm,clip=}
  \end{center}
\end{figure}

The mean regression in stature of a population is easily ascertained; I
do not see much use in knowing it, but will give the work merely as a
simple example.  It has already been stated that half the population
vary less than 1.7 inch from mediocrity, this being what is technically
known as the ``probable'' deviation.  The mean deviation is, by a
well-known theory, 1.18 times that of the probable, therefore in this
case it is 1.9 inch.  The mean loss through regression is $\frac{1}{3}$
of that amount, or a little more than 0.6 inch. That is to say,    
taking one child with another, the mean amount by which they fall short
of their mid-parental peculiarity of stature is rather more than
six-tenths of an inch.  

The stability of a Type, which I should define as ``an ideal form
towards which the children of those who deviate from it tend to
regress,'' would I presume, be measured by the strength of its tendency
to regress; thus a mean regression from 1 in the mid-parents to
$\frac{2}{3}$ in the offspring would indicate only half as much  
stability as if it had been to $\frac{1}{3}$.

The limits of deviation beyond which there is no regression, but a new
condition of equilibrium is entered into, and a new type comes into
existence, have still to be explored.  

With respect to numerical estimates I wish emphatically to say that I
offer them only as being serviceably approximate, though they are
mutually consistent, and with the desire that they may be reinvestigated
by the help of more abundant and much more accurate measurements than
those I have had at command.  There are many simple and interesting
relations to which I am still unable to assign numerical values for
lack of adequate material, such as that to which I referred some
time back, of the relative influence of the father and the mother on
the stature of their sons and daughters.  

I do not now pursue the numerous branches that spring from the data I
have given, as from a root.  I do not speak of the continued domination
of one type over others, nor of the persistency of unimportant
characteristics, nor of the inheritance of disease,  which is
complicated in many cases by the requisite concurrence of two separate
heritages, the one of a susceptible constitution, the other of the germs
of the disease.  Still less do I enter upon the subject of fraternal
deviation and collateral descent, which I have also worked out. 

\begin{center}
  \textsc{Appendix}
  
  \bigskip
  
  I.---\textit{Experiments on Seeds bearing on the Law of Regression}
\end{center}

I sent a set of carefully selected sweet pea seeds to each of several
country friends, who kindly undertook to help me.  The advantage of
sweet peas over other seeds is that they do not cross-fertilise, that
they are spherical, and that all the seeds in the same pod are of much
the same size.  They are also hardy and prolific.  I selected them as
the subject of experiments after consulting eminent botanists.  Each
packet contained ten seeds of exactly the same weight; those in K being
the heaviest, L the next heaviest, and so on down to Q, which was the
lightest.  The precise weights are given in Table V, together
with the corresponding diameter, which I ascertained by laying 100 peas
of the same sort in a row.  The weights run in an arithmetic series,
having a common average difference of 0.172 grain. I do not of course
profess to work to thousandths of a grain, though I did to less than
tenths of a grain; therefore the third decimal place represents no more
than an arithmetical working value, which has to be regarded in
multiplications, lest an error of sensible importance should be
introduced by its neglect.  Curiously enough, the diameters were found
to run approximately in an arithmetic series also, owing, I suppose, to
the misshape and corrugations of the smaller seeds, which gave them a
larger diameter than if they had been plumped out into spheres.  The
results are given in Table V, which show that I was justified in sorting
the seeds by the convenient method of the balance and weights, and of
accepting the weights as directly proportional to the mean diameters,
which can hardly be measured satisfactorily except in spherical seeds.  

In each experiment seven beds were prepared in parallel rows; each was
$1\frac{1}{2}$ feet wide and 5 feet long. Ten holes of 1 inch deep were
dibbled at equal distances apart along each bed, and one seed was put
into each hole.  They were then bushed over to keep off the birds.
Minute instructions were given and followed to ensure uniformity, which
I need not repeat here.  The end of all was that the seeds as they
became ripe were collected from time to time in bags that I sent,
lettered from K to Q, the same letters being stuck at the ends of the
beds, and when the crop was coming to an end the whole foliage of each
bed was torn up, tied together, labelled, and sent to me.  I measured
the foliage and the pods, both of which gave results confirmatory of
those of the peas, which will be found in Table VI, the first and last
columns of which are those that especially interest us; the remaining
columns showing clearly enough how these two were obtained. It will be
seen that for each increase of one unit on the part of the parent seed,
there is a mean increase of only one-third part of a unit in the filial
seed; and again that the mean filial seed resembles the parental when
the latter is about 15.5 hundredths of an inch in diameter.  Taking then
15.5 as the point towards which filial regression points, whatever may
be the parental deviation (within the tabular limits) from that point,
the mean filial deviation will be in the same direction, but only
one-third as much.
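
The rule just stated may be written as a formula.  The symbols $x$ (the
parental diameter) and $\bar{y}$ (the mean filial diameter), both in
hundredths of an inch, are not in the original and are introduced here
only for illustration:
\[ \bar{y} = 15.5 + \tfrac{1}{3}\,(x - 15.5). \]
For $x = 21$, $18$, and $15$ this gives $\bar{y} \approx 17.33$, $16.33$,
and $15.33$ respectively, each of which lies within 0.1 of the
corresponding smoothed value in Table VI.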

This point of regression is so low that I possessed less evidence than I
desired to prove the bettering of the produce of very small seeds.  The
seeds smaller than Q were such a miserable set that I could hardly deal
with them.  Moreover, they were very infertile.  It did, however, happen
that in a few of the sets some of the seeds turned out very well.  

If I desired to lay much stress on these experiments, I could  make my
case considerably stronger by going minutely into the details of the
several experiments, foliage and length of pod included, but I do not
care to do so.  

\begin{center}
  {\large TABLE V.}
  
  \bigskip
  
  WEIGHTS AND DIAMETERS OF SEEDS (SWEET PEA).
  
  \begin{tabular}{c|c|c|c}
    \hline
  Letter of & Weight of one seed &   Length of row of   &  Diameter of one   \\
    seed.   &      in grains.     & 100 seeds in inches. & seed in hundredths \\
            &                     &                      & of an inch.        \\
    \hline
       K    &        1.750        &          21.0        &        21    \\
       L    &        1.578        &          20.2        &        20    \\
       M    &        1.406        &          19.2        &        19    \\
       N    &        1.234        &          17.9        &        18    \\
       O    &        1.062        &          17.0        &        17    \\
       P    & \phantom{1}.890     &          16.1        &        16    \\
       Q    & \phantom{1}.718     &          15.2        &        15    \\
    \hline
  \end{tabular}
  
  \bigskip
  
  {\large TABLE VI.}
    
  \bigskip
  
  \textsc{Parent seeds and their Produce.}
\end{center}
  
Table showing the proportionate number of seeds (sweet peas) of
different sizes produced by parent seeds also of different sizes.  The
measurements are those of mean diameter, in hundredths of an inch.
    
\begin{center}
{\footnotesize
  \begin{tabular}{c|c|c|c|c|c|c|c|c|c|c|c}
    \hline
    & \multicolumn{8}{|c|}{Diameter of filial seeds.} & &
      \multicolumn{2}{c}{Mean diameter of Filial} \\
    Diameter of  & & & & & & & & & &
      \multicolumn{2}{c}{Seeds.} \\
    Parent Seed. & & & & & & & & & Total. & \multicolumn{2}{c}{\ } \\
    \cline{2-9} \cline{11-12}
    & Under & & & & & & & Above & & & \\
    & 15 & $15-$ & $16-$ & $17-$ & $18-$ & $19-$ & $20-$ & $21-$ & &
      Observed. & Smoothed. \\
    \hline
    21 & 22 & \phantom{0}8 & 10 & 18 & 21 & 13 & 6 & 2 & 100 & 17.5 & 17.3 \\
    20 & 23 & 10 & 12 & 17 & 20 & 13 & 3 & 2 & 100 & 17.3 & 17.0 \\
    19 & 35 & 16 & 12 & 13 & 11 & 10 & 2 & 1 & 100 & 16.0 & 16.6 \\
    18 & 34 & 12 & 13 & 17 & 16 & \phantom{0}6 & 2 & 0 & 100 & 16.3 & 16.3 \\
    17 & 37 & 16 & 13 & 16 & 13 & \phantom{0}4 & 1 & 0 & 100 & 15.6 & 16.0 \\
    16 & 34 & 15 & 18 & 16 & 13 & \phantom{0}3 & 1 & 0 & 100 & 16.0 & 15.7 \\
    15 & 46 & 14 & \phantom{0}9 & 11 & 14 & \phantom{0}4 & 2 & 0 & 100 &
    15.3 & 15.4 \\
    \hline
  \end{tabular}
}
  
  \bigskip
  
  II.---\textit{Separate Contribution of each Ancestor to the Heritage of the}
  \\
  \textit{Offspring.}
\end{center}

When we say that the mid-parent contributes two-thirds of his
peculiarity of height to the offspring, it is supposed that nothing is
known about the previous ancestor.  We now see that though nothing is
known, something is implied, and that something must be eliminated if we
desire to know what the parental bequest, pure and simple, may amount
to.  Let the deviate of the mid-parent be $a$, then the implied deviate
of the mid-grandparent will be $\frac{1}{3}a$, of the mid-ancestor in
the next generation $\frac{1}{9}a$, and so on. Hence the sum of the
deviates of all the mid-generations that contribute to the heritage of
the offspring is $a(1+\frac{1}{3}+\frac{1}{9}+\text{\&c.})=a\frac{3}{2}$.

Do they contribute on equal terms, or otherwise?  I am not prepared as
yet with sufficient data to yield a direct reply, therefore we must try
the effects of limiting suppositions.  First, suppose they contribute
equally; then as an accumulation of ancestral deviates whose sum amounts
to $a\frac{3}{2}$, yields an effective heritage of only $a\frac{2}{3}$,
it follows that each piece of property, as it were, must be reduced by a
succession tax to $\frac{4}{9}$ of its original amount, because
$\frac{3}{2}\times\frac{4}{9}=\frac{2}{3}$. 

Another supposition is that of successive diminution, the 
property being taxed afresh in each transmission, so that the
effective heritage would be---
\[ a\left(\frac{1}{r}+\frac{1}{3r^2}+\frac{1}{3^2r^3}+\text{\&c.}\right)
   = a\left(\frac{3}{3r-1}\right) \]
and this must, as before, be equal to $a\frac{2}{3}$, whence  
$\frac{1}{r}=\frac{6}{11}$.
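
The summation and the value of $r$ may be verified as follows.  This is
only a sketch of the intermediate steps, treating the bracket as a
geometric series with first term $\frac{1}{r}$ and common ratio
$\frac{1}{3r}$, which requires $3r>1$:
\[ \frac{1}{r}\sum_{n=0}^{\infty}\left(\frac{1}{3r}\right)^{n}
   = \frac{1}{r}\cdot\frac{1}{1-\frac{1}{3r}}
   = \frac{3}{3r-1},
   \qquad
   \frac{3}{3r-1}=\frac{2}{3}
   \;\Longrightarrow\; 6r-2=9
   \;\Longrightarrow\; r=\frac{11}{6}. \]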

The third limiting supposition of a mid-ancestral deviate in any one
remote generation contributing more than a mid-parental deviate, is
notoriously incorrect.  Thus the descendants of ``pedigree-wheat'' in
the (say) twentieth generation show no sign of their mid-ancestral
magnitude, but those in the first generation do so most unmistakably.
The results of our two valid limiting suppositions are therefore (1)
that the mid-parental deviate, pure and simple, influences the offspring
to $\frac{4}{9}$ of its amount; (2) that it influences it to the
$\frac{6}{11}$ of its amount.  These values differ but slightly from
$\frac{1}{2}$, and their mean is closely $\frac{1}{2}$, so we may fairly
accept that result.  Hence the influence, pure and simple, of the
mid-parent may be taken as $\frac{1}{2}$, of the mid-grandparent
$\frac{1}{4}$, of the mid-great-grandparent $\frac{1}{8}$ and so on. 
That of the individual parent would therefore be $\frac{1}{4}$, of the
individual grandparent $\frac{1}{16}$, of an individual in the next
generation $\frac{1}{64}$ and so on.  
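
The fractions just stated can be checked by simple arithmetic.  The
working below assumes, as the text implies, that two parents, four
grandparents, and eight great-grandparents share each mid-ancestral
portion equally:
\[ \tfrac{1}{2}\left(\tfrac{4}{9}+\tfrac{6}{11}\right)
   = \tfrac{49}{99}\approx 0.495,
   \qquad
   \tfrac{1}{2}\div 2=\tfrac{1}{4},\quad
   \tfrac{1}{4}\div 4=\tfrac{1}{16},\quad
   \tfrac{1}{8}\div 8=\tfrac{1}{64}. \]
On this reckoning the mid-ancestral portions taken together amount to
$\tfrac{1}{2}+\tfrac{1}{4}+\tfrac{1}{8}+\text{\&c.}=1$.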

\begin{center}
  \textit{Explanation of Plates IX and X.}
\end{center}
 
Plate IX, fig.\ \textit{a}.  Rate of Regression in Hereditary Stature.

The short horizontal lines refer to the stature of the mid-parents as
given on the scale to the left.  These are the same values as those in
the left hand column of Table I.

The small circles, one below each of the above, show the mean stature of
the children of each of those mid-parents.  These are the values in the
right hand column of Table I, headed ``Medians.'' [The Median is the
value that half the cases exceed, and the other half fall short of it.
It is practically the same as the mean, but is a more convenient value
to find, in the way of working adopted throughout in the present
instance.] 

The sloping line $AB$ passes through all possible mid-parental heights.

The sloping line $CD$ passes through all the corresponding mean heights
of their children.  It gives the ``smoothed'' results of the actual
observations.  

The ratio of $CM$ to $AM$ is as 2 to 3, and this same ratio connects the
deviate of every mid-parental value with the mean deviate of its
offspring.  

The point of convergence is at the level of mediocrity, which is
$68\frac{1}{4}$ inches.  

The above data are derived from the 928 adult children of 205 
mid-parents, female statures having in every case been converted to
their male equivalents by multiplying each of them by 1.08.  

Fig.\ \textit{b}.  Forecasts of stature. This is a diagram of the
mechanism by which the most probable heights of the sons and daughters
can be foretold, from the data of the heights of each of their parents.

The weights $M$ and $F$ have to be set opposite to the heights of the
mother and father on their respective scales; then the weight $sd$ will
show the most probable heights of a son and a daughter on the
corresponding scales.  In every one of these cases it is the fiducial
mark in the middle of each weight by which the reading is to be made. 
But, in addition to this, the length of the weight $sd$ is so arranged
that it is an equal chance (an even bet) that the height of each son or
each daughter will lie within the range defined by the upper and lower
edge of the weight, on their respective scales.  The length of $sd$ is
$3\text{ inches} = 2f$; that is, $2\times1.50$ inch.

$A$, $B$, and $C$ are three thin wheels with grooves round their edges.
They are screwed together so as to form a single piece that turns easily
on its axis.  The weights $M$ and $F$ are attached to either end of a
thread that passes over the movable pulley $D$. The pulley itself hangs
from a thread which is wrapped two or three times round the groove of
$B$ and is then secured to the wheel.  The weight $sd$ hangs from a
thread that is wrapped in the same direction two or three times round
the groove of $A$, and is then secured to the wheel.  The diameter of $A$
is to that of $B$ as 2 to 3. Lastly, a thread wrapped in the opposite
direction round the wheel $C$, which may have any convenient diameter,
is attached to a counterpoise.

It is obvious that raising $M$ will cause $F$ to fall, and \textit{vice
vers\^a}, without affecting the wheels $AB$, and therefore
without affecting $sd$; that is to say, the parental differences may be
varied indefinitely without affecting the stature of the children, so
long as their mid-parental height is unchanged.  But if the mid-parental
height is changed, then that of $sd$ will be changed to $\frac{2}{3}$ of
the amount.

The scale of female heights differs from that of the males, each female
height being laid down in the position which would be occupied by its
male equivalent.  Thus 56 is written in the position of 60.48 inches,
which is equal to $56\times1.08$.  Similarly, 60 is written in the
position of 64.80, which is equal to $60\times1.08$.
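
The action of the machine may be summarised in a formula, offered here
only as a sketch.  The symbols $h_F$, $h_M$, $h_s$, and $h_d$ (the
heights in inches of father, mother, son, and daughter) and the example
heights are assumptions introduced for illustration; the constants
$68\frac{1}{4}$, $\frac{2}{3}$, and 1.08 are those given above.
\[ h_s = 68.25 + \tfrac{2}{3}\left(\frac{h_F + 1.08\,h_M}{2} - 68.25\right),
   \qquad h_d = \frac{h_s}{1.08}. \]
Thus for a father of 72 inches and a mother of 65 inches the mid-parental
height is $(72+70.2)/2 = 71.1$ inches, and the most probable heights are
about 70.15 inches for a son and 64.95 inches for a daughter.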

In the actual machine the weights run in grooves.  It is also taller and
has a longer scale than is shown in the figure, which is somewhat
shortened for want of space.  

Plate X.  This is a diagram based on Table I.  The figures in it were
first ``smoothed'' as described in the memoir, then lines were drawn
through points corresponding to the same values, just as isobars or
isotherms are drawn.  These lines, as already stated, formed ellipses. 
I have also explained how calculation showed that they were true
ellipses, and verified the values I had obtained of the relation of
their major to their minor axes, of the inclination of these to the
coordinates passing through their common centre, and so forth.  The
ellipse in the figure is one of these.  The numerals are not directly
derived from the smoothed results just spoken of, but are rough
interpolations so as to suit their present positions.  It will be
noticed that each horizontal line grows to a maximum and then
symmetrically diminishes, and that the same is true of each vertical
line.  It will also be seen that the loci of maxima in these follow the
lines $ON$ and $OM$, which are respectively inclined to their adjacent
coordinates at the gradients of 2 to 3, and of 1 to 3.  If there had
been no regression, but if like bred like, then $OM$ and $ON$ would both
have coincided with the diagonal $OL$, in fig.\ \textit{a}, as shown by
the dotted lines.  

I annex a comparison between calculated and observed results. The latter
are inclosed in brackets.  

Given---

\qquad $\text{``Probable error'' of each system of mid-parentages} = 1.22$.  

\qquad $\text{Ratio of mean filial regression}=\frac{2}{3}$.

\qquad $\text{``Probable error'' of each system of regressed values} = 1.50$.  

\qquad Sections of surface of frequency parallel to $XY$ are true ellipses.  

\qquad\qquad [Obs.---Apparently true ellipses.] 

\qquad $MX : YO = 6 : 17.5$, or nearly $1 : 3$. 

\qquad\qquad [Obs.---$1 : 3$.]  

\qquad $\text{Major axes to minor axes} = \sqrt{7} : \sqrt{2} = 10 : 5.35$. 

\qquad\qquad [Obs.---$10 : 5.1$.]  

\qquad Inclination of major axes to $OX = 26^{\circ}\, 36'$.  

\qquad\qquad [Obs.---$25^{\circ}$.]

\qquad Section of surface parallel to $XY$ is a true curve of frequency.

\qquad\qquad [Obs.---Apparently so.]  

\qquad $\text{``Probable error'' of that curve} = 1.07$. 

\qquad\qquad [Obs.---1.0 or a little more.] 
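
Two of the calculated figures above can be checked by direct arithmetic.
The working is only a verification sketch, and it assumes that the
equation connecting race variability, regression, and co-family
variability applies with the mid-parental probable error taken as 1.22:
\[ \sqrt{7}:\sqrt{2} = 10 : 10\sqrt{\tfrac{2}{7}} \approx 10 : 5.35,
   \qquad
   \sqrt{\left(\tfrac{2}{3}\right)^{2}(1.22)^{2} + (1.50)^{2}}
   \approx \sqrt{0.662+2.250} \approx 1.71. \]
The second result is close to the value 1.7 stated for the probable
deviation of the general population.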

\bigskip\bigskip

\noindent
[\textit{Journal of the Anthropological Institute} \textbf{15} (1886), 
 246--263.]

\end{document}

%