Discussion
Since the true spread is unknown, Gosset measures the average's error $x$ in units of the *sample's own* standard deviation $s$, forming the ratio $z = x/s$. Working out how this $z$ is distributed, and tabulating it, is what lets you judge a small experiment honestly. Rescaled by $\sqrt{n-1}$, it is the quantity that became **"Student's $t$."**
This is the standard rule that Student is going to challenge. It treats the sample's own measured spread $s$ as if it were the true spread of the whole population, then reads probabilities off the normal "bell curve" tables. The "probability integral" is simply the area under the bell curve - what those tables list.
The capital $S$ here means "add up over all the measurements in the sample", which we'd now write with a $\Sigma$. So the equation is just the ordinary formula for a sample's variance $s^2$ (the average of the squares minus the square of the average), written in 1908 notation.
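As a minimal sketch of that formula in modern terms (the function names and sample values are illustrative, not taken from the paper), the 1908 form and the definition as a mean squared deviation give the same number. The divisor is $n$, as in Gosset's paper, not the $n - 1$ that is common today.

```python
def variance_1908(sample):
    """Average of the squares minus the square of the average (divisor n)."""
    n = len(sample)
    mean_of_squares = sum(x * x for x in sample) / n
    square_of_mean = (sum(sample) / n) ** 2
    return mean_of_squares - square_of_mean


def variance_from_deviations(sample):
    """Mean squared deviation from the sample average (divisor n)."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / n


measurements = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values for illustration
print(variance_1908(measurements), variance_from_deviations(measurements))
```

The two routes agree algebraically. The "average of squares" form was convenient for hand calculation, though in floating-point arithmetic the deviation form is usually the safer choice.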
In this section Gosset establishes that, for normal data, how far a sample's average strays from the true average is unrelated to how big that sample's spread happens to be. He needs this independence so that he can later combine the separate behaviors of the average and the spread without one contaminating the other.
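A quick simulation can illustrate the claim. This is a sketch under stated assumptions, not Gosset's own argument: it draws synthetic samples of four, and it only measures correlation, which is a weaker property than full independence. The exponential population is added here purely as a contrast, to show that the result depends on the data being normal.

```python
import math
import random


def mean_and_sd(sample):
    """Sample average and standard deviation (divisor n, as in the paper)."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
    return mean, sd


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mean_sd_correlation(draw, num_samples=20000, n=4):
    """Correlation between sample mean and sample sd over many samples."""
    means, sds = [], []
    for _ in range(num_samples):
        m, s = mean_and_sd([draw() for _ in range(n)])
        means.append(m)
        sds.append(s)
    return correlation(means, sds)


rng = random.Random(0)  # fixed seed so the run is repeatable
normal_corr = mean_sd_correlation(lambda: rng.gauss(0.0, 1.0))
skewed_corr = mean_sd_correlation(lambda: rng.expovariate(1.0))
print(f"normal data: {normal_corr:+.3f}")
print(f"skewed data: {skewed_corr:+.3f}")
```

For the normal population the correlation should hover near zero, while for the skewed population it should be clearly positive: there, samples with a large average also tend to have a large spread.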
These are the working tables, the practical payoff of all the theory. Reading down the column for your sample size $n$, each entry gives the probability that the true (population) mean falls below a point lying $z$ sample-standard-deviations above your sample's average. The extra right-hand column for $n = 10$ gives the normal-curve answer alongside, so you can see directly how much the ordinary bell curve would mislead you.
$\beta_1$ and $\beta_2$ (just above) are Karl Pearson's shape numbers:
- $\beta_1$ gauges lopsidedness
- $\beta_2$ gauges how peaked or heavy-tailed a curve is (a normal bell curve has $\beta_2 = 3$).
The particular values Gosset gets are the fingerprint of Pearson's "Type III" curve, which today we'd call a gamma (or chi-squared) distribution. The resulting curve for the sample variance is $y = Cx^{\frac{n-3}{2}}\,e^{-\frac{nx}{2\mu_2}}$, where $x$ stands for the sample variance $s^2$. That is the shape taken by the scatter of sample variances.
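One way to see where those values come from, as a sketch that reads the curve above as a gamma density with shape parameter $k$ (a symbol introduced here, not in the paper): matching exponents gives $k - 1 = \frac{n-3}{2}$, and the standard shape numbers of a gamma distribution are $4/k$ and $3 + 6/k$.

$$
k = \frac{n-1}{2}, \qquad \beta_1 = \frac{4}{k} = \frac{8}{n-1}, \qquad \beta_2 = 3 + \frac{6}{k} = \frac{3(n+3)}{n-1}
$$

These satisfy $2\beta_2 - 3\beta_1 - 6 = 0$, the relation that marks a Type III curve in Pearson's system, and both approach the normal values $\beta_1 = 0$, $\beta_2 = 3$ as $n$ grows.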
Someone with real expertise in the subject knows things the two data points don't capture, and should temper that confidence accordingly. The statistic is a tool to inform judgement, not a replacement for it.
**Using the ordinary bell curve on a small sample, you might believe the odds are about 7000 to 1 that the true mean sits in a certain range, when the honest odds are closer to 550 to 1.** The normal curve makes you far more confident than the data warrant; Gosset's curve corrects this.
This picture shows how widely the standard deviation measured from a sample of just 10 can bounce around the true value - the curve is broad and noticeably lopsided. That variability is exactly why trusting a small-sample spread (and the normal-curve odds built on it) is dangerous, which is the problem the rest of the paper solves.
This is the new method applied to a real experiment, and the data became a classic that still turns up in statistics teaching today.
Before solving the problem with pure mathematics, Gosset checked it by hand simulation. He wrote 3000 real body measurements on cards, shuffled them thoroughly, dealt them into samples of four, and computed the statistics for each. This card-shuffling experiment is an early example of testing a theory by brute-force random sampling, ***what we now call Monte Carlo methods.***
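The card experiment is easy to imitate in code. The sketch below is an illustration under assumptions, not a reconstruction of Gosset's data: the 3000 "cards" are synthetic normal values rather than real body measurements, and the function names and seed are arbitrary.

```python
import math
import random


def z_statistic(sample, true_mean):
    """Error of the sample average in units of the sample's own sd (divisor n)."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
    return (mean - true_mean) / s


def card_experiment(population_size=3000, n=4, seed=1908):
    """Write values on 'cards', shuffle, deal into samples of n, compute z."""
    rng = random.Random(seed)
    # Assumption: synthetic normal values stand in for the real measurements.
    cards = [rng.gauss(0.0, 1.0) for _ in range(population_size)]
    true_mean = sum(cards) / len(cards)
    rng.shuffle(cards)
    return [z_statistic(cards[i:i + n], true_mean)
            for i in range(0, len(cards), n)]


zs = card_experiment()
share_below_one = sum(z <= 1.0 for z in zs) / len(zs)
print(len(zs), "samples; share with z <= 1:", round(share_below_one, 3))
```

Dealing 3000 cards into fours yields 750 values of $z$. Their empirical distribution can then be compared with the theoretical curve, which for $n = 4$ puts a probability of about 0.909 on $z \le 1$.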
This is the distribution of $z$ that Gosset was after, **today known as Student's $t$-distribution.**
The decisive feature is what's *absent*: the unknown population spread $\sigma$ has dropped out, so the curve depends only on the sample size $n$. That's why it can actually be used in practice, and why it has fatter tails (more allowance for being wrong) than the normal curve when samples are small.
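To make the contrast with the normal curve concrete, here is a minimal sketch. It assumes the density of $z$ has the form $C(1+z^2)^{-n/2}$, which corresponds to the modern $t$ density with $n - 1$ degrees of freedom under $t = z\sqrt{n-1}$, and integrates it numerically with Simpson's rule. The function names are illustrative, and `naive_normal_cdf` stands for the older rule that treats the sample's $s$ as if it were the true spread.

```python
import math


def student_z_density(z, n):
    """Density of z for sample size n; note that sigma does not appear."""
    c = math.gamma(n / 2) / (math.sqrt(math.pi) * math.gamma((n - 1) / 2))
    return c * (1 + z * z) ** (-n / 2)


def student_z_cdf(z, n, steps=2000):
    """P(Z <= z): Simpson's rule on [0, |z|], then symmetry (steps must be even)."""
    a = abs(z)
    h = a / steps
    total = student_z_density(0.0, n) + student_z_density(a, n)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * student_z_density(i * h, n)
    half = total * h / 3
    return 0.5 + half if z >= 0 else 0.5 - half


def naive_normal_cdf(z, n):
    """Older rule: treat s as the true spread, so z * sqrt(n) is standard normal."""
    return 0.5 * (1 + math.erf(z * math.sqrt(n) / math.sqrt(2)))


n, z = 4, 1.0
p_student = student_z_cdf(z, n)
p_normal = naive_normal_cdf(z, n)
print(f"Student: {p_student:.4f}, odds {p_student / (1 - p_student):.0f} to 1")
print(f"Normal:  {p_normal:.4f}, odds {p_normal / (1 - p_normal):.0f} to 1")
```

For a sample of four with $z = 1$, this gives a probability of about 0.909 under Gosset's curve against about 0.977 under the naive normal rule, or odds of roughly 10 to 1 against 43 to 1.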
#### TL;DR
This 1908 paper was published under the pen name "Student" by William Sealy Gosset. Gosset was a chemist at the Guinness brewery who wrote anonymously because his employer treated statistical methods as trade secrets.
In this paper he tackles a problem earlier statistics had avoided: how to judge the reliability of an average when you have only a few measurements. The accepted method assumed you already knew the true spread of the data, but with a small sample you have to estimate that spread from the sample itself, and that estimate is itself unreliable. As a result, the usual normal-curve odds overstate how sure you can be.
Gosset works out the exact probability curve for a small sample's average when its error is measured in units of the sample's own standard deviation (a quantity he calls $z$). He shows that this curve depends only on the sample size, not on the unknown true spread, then tabulates it, checks it against shuffled real data, and tests it on experiments in medicine and agriculture. **The result is the distribution we now call Student's $t$, and the method is the ancestor of the modern $t$-test, one of the most widely used tools in statistics.**
This is a change-of-variables step. He has the distribution of the *variance* $s^2$ and wants the distribution of the *standard deviation* $s$; when you relabel the horizontal axis of a probability curve this way, you have to rescale its height so the total area (total probability) still adds to 1, and that rescaling is where the factor $2s$ comes from. He reuses this same move later to switch to the variable $z$.
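Written out, as a sketch that starts from the variance curve $y = Cx^{\frac{n-3}{2}}\,e^{-\frac{nx}{2\mu_2}}$ with $x = s^2$ (the labels $f$, $g$, and $C'$ are introduced here for convenience):

$$
g(s) = f(s^2)\,\frac{d(s^2)}{ds} = C\,(s^2)^{\frac{n-3}{2}}\,e^{-\frac{n s^2}{2\mu_2}} \cdot 2s = C'\,s^{\,n-2}\,e^{-\frac{n s^2}{2\mu_2}}, \qquad C' = 2C
$$

The extra factor of $s$ raises the power from $n - 3$ to $n - 2$, and the constant 2 is absorbed into the normalizing constant.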
A "moment coefficient" is simply an average of the deviations from the mean raised to some power. The second one is the variance, $\mu_2 = \sigma^2$ (the standard deviation squared); higher ones capture lopsidedness and tail-heaviness. Gosset builds his entire argument by tracking these averages, a technique known as the method of moments.
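As an illustration, the moment coefficients and Pearson's shape numbers built from them, $\beta_1 = \mu_3^2/\mu_2^3$ and $\beta_2 = \mu_4/\mu_2^2$, can be computed directly from a list of values. This is a minimal sketch; the function names and the data are made up for the example.

```python
def central_moment(data, k):
    """k-th moment coefficient: average of (x - mean)**k over the data."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n


def beta_1(data):
    """Pearson's lopsidedness measure: mu_3 squared over mu_2 cubed."""
    return central_moment(data, 3) ** 2 / central_moment(data, 2) ** 3


def beta_2(data):
    """Pearson's peakedness / tail-heaviness measure: mu_4 over mu_2 squared."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2


values = [1, 2, 3, 4, 5]  # symmetric toy data, so beta_1 comes out as zero
print(central_moment(values, 2), beta_1(values), beta_2(values))
```

For these evenly spaced values the variance is 2, $\beta_1$ is 0 because the data are symmetric, and $\beta_2$ is 1.7, well below the normal curve's 3, which reflects how flat and short-tailed such data are.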