If you have taken a regression or design of experiments class (or both), you probably have come across the following problem (or a similar one):
“Show that the sum-of-squares decomposition and F-statistic reduces to the usual equal-variance (pooled) two sample t-test in the case of $a = 2$ treatments - with the realization that an $F$ statistic with $1$ (numerator) and $\nu$ (denominator) degrees of freedom is equivalent to a $t^2$ statistic with $\nu$ degrees of freedom, viz, $F_{1,\nu} = t_{\nu}^2$”
The interesting thing about this proof is that it is really hard to find (I spent a reasonable amount of time googling and looking in books with no success). More interesting than that, though, is that when this proof is mentioned, it is usually followed by one of the most annoying phrases in a math textbook:
- is easy to prove…
- is not difficult to show…
- this easy/straightforward/simple proof is left to the reader…
Despite all of these adjectives, the proof itself is rarely written out. The humble purpose of this blog post is to get rid of the vanity, work the proof, and let you judge if it is easy/straightforward/simple (or not).
Finally, let me point out that this blog post assumes you are somewhat familiar with the F-test, the t-test, and notation frequently used in design of experiments like $y_{ij}$, $\bar{y}_{i.}$, or $\bar{y}_{..}$.
Bye-bye words, hello formulas
Let’s start by putting all the wording into formulas:
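We have to prove that

$$F = \frac{MST}{MSE} = \frac{SST/(a-1)}{SSE/(N-a)} \tag{1}$$

reduces to

$$t^2 = \frac{(\bar{y}_{1.} - \bar{y}_{2.})^2}{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \tag{2}$$

when $a = 2$ (this is key).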
Notation
Symbol | Description |
---|---|
SSE | Sum of Squares due to Error |
SST | Sum of Squares of Treatment |
MSE | Mean Sum of squares Error |
MST | Mean Sum of squares Treatment |
$a$ | Number of treatments |
$n_1$ | Number of observations in treatment 1 |
$n_2$ | Number of observations in treatment 2 |
$N$ | Total number of observations |
$\bar{y}_{i.}$ | Mean of treatment $i$ |
$\bar{y}_{..}$ | Global mean |
$\nu = N - a$ | Degrees of freedom of the denominator of F |
Now that we have the formulas, we will work the following:
1. Denominator of equation (1)
2. Numerator of equation (1)
   - 2.a. Part a
   - 2.b. Part b
   - 2.c. Part c
3. Put all together
1. Denominator of equation (1)
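When $a = 2$ the denominator of expression (1) is:

$$MSE = \frac{SSE}{N-2} = \frac{\sum_{j=1}^{n_1}(y_{1j} - \bar{y}_{1.})^2 + \sum_{j=1}^{n_2}(y_{2j} - \bar{y}_{2.})^2}{N-2} \tag{3}$$

Recalling that the formula for the sample variance estimator is $S_i^2 = \frac{\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{i.})^2}{n_i - 1}$, we can multiply and divide the terms in the numerator in (3) by $(n_i - 1)$ and get (4). Don’t forget that in this case $N = n_1 + n_2$.

$$MSE = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = S_p^2 \tag{4}$$

$S_p^2$ is called the pooled variance estimator.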
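If you want to convince yourself numerically, here is a minimal Python sketch of this step. The two samples are made-up numbers of my own (they are not from any dataset), and the check only uses the standard library. It computes the mean square error as $SSE/(N-2)$ and compares it with the pooled variance built from the two sample variances.

```python
from statistics import mean, variance

# Hypothetical observations for two treatments (made-up numbers)
y1 = [4.1, 5.0, 6.2, 5.5]
y2 = [7.3, 6.8, 8.1, 7.9, 7.0]
n1, n2 = len(y1), len(y2)

# MSE as SSE / (N - a) with a = 2 and N = n1 + n2
sse = sum((y - mean(y1)) ** 2 for y in y1) + sum((y - mean(y2)) ** 2 for y in y2)
mse = sse / (n1 + n2 - 2)

# Pooled variance from the sample variances
# (statistics.variance divides by n - 1, as the formula above requires)
sp2 = ((n1 - 1) * variance(y1) + (n2 - 1) * variance(y2)) / (n1 + n2 - 2)

print(mse, sp2)  # the two values should agree up to rounding
```

Any other pair of samples should work just as well, as long as each one has at least two observations.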
2. Numerator of equation (1)
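When $a = 2$ the numerator of expression (1) is:

$$MST = \frac{SST}{2 - 1} = SST$$

and the general expression for SST, $\sum_{i=1}^{a} n_i(\bar{y}_{i.} - \bar{y}_{..})^2$, reduces to $SST = \sum_{i=1}^{2} n_i(\bar{y}_{i.} - \bar{y}_{..})^2$. The next step is to expand the sum as follows:

$$SST = n_1(\bar{y}_{1.} - \bar{y}_{..})^2 + n_2(\bar{y}_{2.} - \bar{y}_{..})^2 \tag{5}$$

$\bar{y}_{..}$ is called the global mean and we are going to write it in a different way. The new way is:

$$\bar{y}_{..} = \frac{n_1\bar{y}_{1.} + n_2\bar{y}_{2.}}{N} \tag{6}$$

Next, replace (6) in formula (5) and re-write SST as:

$$SST = \underbrace{n_1\left(\bar{y}_{1.} - \frac{n_1\bar{y}_{1.} + n_2\bar{y}_{2.}}{N}\right)^2}_{\text{Part a}} + \underbrace{n_2\left(\bar{y}_{2.} - \frac{n_1\bar{y}_{1.} + n_2\bar{y}_{2.}}{N}\right)^2}_{\text{Part b}} \tag{7}$$

The next step is to find alternative ways to write the expressions Part a and Part b.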
2.a. Part a
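We start from $\text{Part a} = n_1\left(\bar{y}_{1.} - \frac{n_1\bar{y}_{1.} + n_2\bar{y}_{2.}}{N}\right)^2$.

Multiply and divide the term with $\bar{y}_{1.}$ by $N$

$$n_1\left(\frac{N\bar{y}_{1.}}{N} - \frac{n_1\bar{y}_{1.} + n_2\bar{y}_{2.}}{N}\right)^2$$

$N$ is common denominator

$$n_1\left(\frac{N\bar{y}_{1.} - n_1\bar{y}_{1.} - n_2\bar{y}_{2.}}{N}\right)^2$$

$\bar{y}_{1.}$ is common factor of $N\bar{y}_{1.}$ and $n_1\bar{y}_{1.}$

$$n_1\left(\frac{(N - n_1)\bar{y}_{1.} - n_2\bar{y}_{2.}}{N}\right)^2$$

Replace $N - n_1 = n_2$

$$n_1\left(\frac{n_2\bar{y}_{1.} - n_2\bar{y}_{2.}}{N}\right)^2$$

Now $n_2$ is common factor of $n_2\bar{y}_{1.}$ and $n_2\bar{y}_{2.}$

$$n_1\left(\frac{n_2(\bar{y}_{1.} - \bar{y}_{2.})}{N}\right)^2$$

Take $n_2$ and $N$ out of the square

$$\text{Part a} = \frac{n_1 n_2^2}{N^2}(\bar{y}_{1.} - \bar{y}_{2.})^2$$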
2.b. Part b
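We start from $\text{Part b} = n_2\left(\bar{y}_{2.} - \frac{n_1\bar{y}_{1.} + n_2\bar{y}_{2.}}{N}\right)^2$.

Multiply and divide the term with $\bar{y}_{2.}$ by $N$

$$n_2\left(\frac{N\bar{y}_{2.}}{N} - \frac{n_1\bar{y}_{1.} + n_2\bar{y}_{2.}}{N}\right)^2$$

$N$ is common denominator

$$n_2\left(\frac{N\bar{y}_{2.} - n_1\bar{y}_{1.} - n_2\bar{y}_{2.}}{N}\right)^2$$

$\bar{y}_{2.}$ is common factor of $N\bar{y}_{2.}$ and $n_2\bar{y}_{2.}$

$$n_2\left(\frac{(N - n_2)\bar{y}_{2.} - n_1\bar{y}_{1.}}{N}\right)^2$$

Replace $N - n_2 = n_1$

$$n_2\left(\frac{n_1\bar{y}_{2.} - n_1\bar{y}_{1.}}{N}\right)^2$$

Now $n_1$ is common factor of $n_1\bar{y}_{2.}$ and $n_1\bar{y}_{1.}$

$$n_2\left(\frac{n_1(\bar{y}_{2.} - \bar{y}_{1.})}{N}\right)^2$$

Take $n_1$ and $N$ out of the square

$$\text{Part b} = \frac{n_2 n_1^2}{N^2}(\bar{y}_{2.} - \bar{y}_{1.})^2$$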
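A quick numerical check of Part a and Part b can catch algebra slips. The sketch below is only an illustration with made-up samples (my own numbers, not data from anywhere). It compares each part computed directly from the global mean with the form obtained by taking the factors out of the square.

```python
from statistics import mean

# Hypothetical observations for two treatments (made-up numbers)
y1 = [4.1, 5.0, 6.2, 5.5]
y2 = [7.3, 6.8, 8.1, 7.9, 7.0]
n1, n2 = len(y1), len(y2)
N = n1 + n2
m1, m2 = mean(y1), mean(y2)

# Global mean written as the weighted mean of the two treatment means
grand = (n1 * m1 + n2 * m2) / N

# Part a: direct form versus the form after factoring
part_a_direct = n1 * (m1 - grand) ** 2
part_a_closed = n1 * n2 ** 2 / N ** 2 * (m1 - m2) ** 2

# Part b: direct form versus the form after factoring
part_b_direct = n2 * (m2 - grand) ** 2
part_b_closed = n2 * n1 ** 2 / N ** 2 * (m2 - m1) ** 2

print(part_a_direct, part_a_closed)
print(part_b_direct, part_b_closed)
```

Each printed pair should match up to floating-point rounding.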
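Now that we have Part a and Part b we are going to go back to equation (7), $SST = \text{Part a} + \text{Part b}$, and replace them:

$$SST = \frac{n_1 n_2^2}{N^2}(\bar{y}_{1.} - \bar{y}_{2.})^2 + \frac{n_2 n_1^2}{N^2}(\bar{y}_{2.} - \bar{y}_{1.})^2 \tag{8}$$

Taking into account that $(\bar{y}_{1.} - \bar{y}_{2.})^2 = (\bar{y}_{2.} - \bar{y}_{1.})^2$, we can re-write equation (8) as:

$$SST = (\bar{y}_{1.} - \bar{y}_{2.})^2\underbrace{\left[\frac{n_1 n_2^2}{N^2} + \frac{n_2 n_1^2}{N^2}\right]}_{\text{Part c}} \tag{9}$$

This leaves us with Part c, which we are going to work next.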
2.c. Part c
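We start from $\text{Part c} = \frac{n_1 n_2^2}{N^2} + \frac{n_2 n_1^2}{N^2}$.

$N^2$ is common denominator and each of the summands has a factor $n_1 n_2$ that we can factor out. Then we have:

$$\frac{n_1 n_2(n_2 + n_1)}{N^2}$$

Replace $n_1 + n_2 = N$

$$\frac{n_1 n_2 N}{N^2}$$

Simplify

$$\frac{n_1 n_2}{N}$$

Re-write the fraction

$$\frac{1}{\frac{N}{n_1 n_2}}$$

Replace $N = n_1 + n_2$

$$\frac{1}{\frac{n_1 + n_2}{n_1 n_2}}$$

And we have

$$\text{Part c} = \frac{1}{\frac{1}{n_1} + \frac{1}{n_2}}$$

Finally, we have to replace this expression for Part c in (9), $SST = (\bar{y}_{1.} - \bar{y}_{2.})^2 \cdot \text{Part c}$, and re-write SST as:

$$SST = \frac{(\bar{y}_{1.} - \bar{y}_{2.})^2}{\frac{1}{n_1} + \frac{1}{n_2}} \tag{10}$$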
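To double-check the final form of SST, here is a small sketch (again with made-up samples of my own) that compares the definition of SST as a sum over treatments with the closed form in terms of the difference of the two treatment means. It also checks that $n_1 n_2 / N$ equals $1/(1/n_1 + 1/n_2)$.

```python
from statistics import mean

# Hypothetical observations for two treatments (made-up numbers)
y1 = [4.1, 5.0, 6.2, 5.5]
y2 = [7.3, 6.8, 8.1, 7.9, 7.0]
n1, n2 = len(y1), len(y2)
N = n1 + n2
m1, m2 = mean(y1), mean(y2)
grand = (n1 * m1 + n2 * m2) / N

# SST from its definition: sum of n_i * (treatment mean - global mean)^2
sst_direct = n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2

# Part c in its two equivalent forms
part_c_product = n1 * n2 / N
part_c_reciprocal = 1 / (1 / n1 + 1 / n2)

# SST in closed form: squared mean difference times Part c
sst_closed = (m1 - m2) ** 2 / (1 / n1 + 1 / n2)

print(part_c_product, part_c_reciprocal)
print(sst_direct, sst_closed)
```

Both pairs should agree up to rounding, whatever the sample sizes are.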
3. Put all together
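With the previous steps we have shown that, when $a = 2$, we have:

$$MSE = S_p^2$$

and

$$MST = SST = \frac{(\bar{y}_{1.} - \bar{y}_{2.})^2}{\frac{1}{n_1} + \frac{1}{n_2}}$$

The ratio of these two expressions, namely the F-statistic, is then:

$$F = \frac{MST}{MSE} = \frac{(\bar{y}_{1.} - \bar{y}_{2.})^2}{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = t^2$$

And this concludes our proof.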
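As a last sanity check, the whole result can be verified numerically. The sketch below is my own illustration, not part of the proof: the helper names `f_statistic` and `pooled_t` are hypothetical, the samples are made-up numbers, and only the Python standard library is used. It computes the one-way ANOVA F-statistic from the sums of squares and the pooled two-sample t-statistic, then compares $F$ with $t^2$.

```python
from math import sqrt
from statistics import mean, variance


def f_statistic(groups):
    """One-way ANOVA F = MST / MSE for a list of samples (hypothetical helper)."""
    a = len(groups)
    N = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / N
    # Treatment sum of squares and error sum of squares
    sst = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    sse = sum((y - mean(g)) ** 2 for g in groups for y in g)
    return (sst / (a - 1)) / (sse / (N - a))


def pooled_t(y1, y2):
    """Equal-variance (pooled) two-sample t-statistic (hypothetical helper)."""
    n1, n2 = len(y1), len(y2)
    sp2 = ((n1 - 1) * variance(y1) + (n2 - 1) * variance(y2)) / (n1 + n2 - 2)
    return (mean(y1) - mean(y2)) / sqrt(sp2 * (1 / n1 + 1 / n2))


# Hypothetical observations for two treatments (made-up numbers)
y1 = [4.1, 5.0, 6.2, 5.5]
y2 = [7.3, 6.8, 8.1, 7.9, 7.0]

F = f_statistic([y1, y2])
t = pooled_t(y1, y2)
print(F, t ** 2)  # the two values should agree up to rounding
```

Note that `f_statistic` accepts any number of treatments, but the equality with $t^2$ only makes sense when there are exactly two.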