
When does the F-test reduce to a t-test?

If you have taken a regression or design of experiments class (or both), you probably have come across the following problem (or a similar one):

“Show that the sum-of-squares decomposition and F-statistic reduces to the usual equal-variance (pooled) two sample t-test in the case of $a = 2$ treatments - with the realization that an F statistic with 1 (numerator) and $k$ (denominator) degrees of freedom is equivalent to a t statistic with $k$ degrees of freedom, viz, $F_{1,k} = t_k^2$”

The interesting thing about this proof is that it is really hard to find (I spent a fair amount of time googling and looking in books with no success). More interesting still, when this proof is mentioned, it is usually followed by one of the most annoying phrases in a math textbook:

  • is easy to prove…
  • is not difficult to show…
  • this easy/straightforward/simple proof is left to the reader…

Despite all of these adjectives, it is hard to find the actual proof. The humble purpose of this blog post is to get rid of the vanity, work through the proof, and let you judge whether it is easy/straightforward/simple (or not).

Finally, let me point out that this blog post assumes you are somewhat familiar with the F-test, the t-test, and notation frequently used in design of experiments like $\bar{y}_{..}$, $\bar{y}_{i.}$, or $\bar{y}_{.j}$.


Bye-bye words, hello formulas

Let’s start by putting all the wording into formulas:

We have to prove that

$$F_{a-1,\,N-a} = \frac{MST}{MSE} = \frac{SST/(a-1)}{SSE/(N-a)} \tag{1}$$

reduces to

$$t_k^2 = \frac{(\bar{y}_{1.} - \bar{y}_{2.})^2}{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \tag{2}$$

when $a = 2$ (this is key).


Notation

| Symbol | Description |
|---|---|
| $SSE$ | Sum of squares due to error |
| $SST$ | Sum of squares of treatment |
| $MSE$ | Mean sum of squares of error |
| $MST$ | Mean sum of squares of treatment |
| $a$ | Number of treatments |
| $n_1$ | Number of observations in treatment 1 |
| $n_2$ | Number of observations in treatment 2 |
| $N$ | Total number of observations |
| $\bar{y}_{i.}$ | Mean of treatment $i$ |
| $\bar{y}_{..}$ | Global mean |
| $k = N - a$ | Degrees of freedom of the denominator of F |

Now that we have the formulas, we will work the following:

  1. Denominator of equation (1)
  2. Numerator of equation (1)
    2.a. Part a
    2.b. Part b
    2.c. Part c
  3. Put all together

1. Denominator of equation (1)

When a=2 the denominator of expression (1) is:

$$MSE = \frac{SSE}{N-2} = \frac{\sum_{j=1}^{n_1}(y_{1j} - \bar{y}_{1.})^2 + \sum_{j=1}^{n_2}(y_{2j} - \bar{y}_{2.})^2}{N-2} \tag{3}$$

Recalling that the formula for the sample variance estimator is $S_i^2 = \frac{\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{i.})^2}{n_i - 1}$, we can multiply and divide each sum in the numerator of (3) by $(n_i - 1)$ and get (4). Don’t forget that in this case $N = n_1 + n_2$.

$$\frac{SSE}{N-2} = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = S_p^2 \tag{4}$$

$S_p^2$ is called the pooled variance estimator.
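
As a quick numerical sanity check of this step, the sketch below computes both sides of the pooled-variance identity for two small samples. The data values are made up purely for illustration, and `within_ss` is a hypothetical helper name; only the Python standard library is assumed.

```python
from statistics import mean, variance

# Hypothetical observations for the two treatments (illustrative values only).
y1 = [4.1, 5.0, 6.2, 5.5]
y2 = [6.8, 7.4, 8.1, 7.0, 6.5]
n1, n2 = len(y1), len(y2)


def within_ss(y):
    """Sum of squared deviations of a sample from its own mean."""
    m = mean(y)
    return sum((v - m) ** 2 for v in y)


# SSE / (N - 2), computed directly from the definition of SSE.
mse = (within_ss(y1) + within_ss(y2)) / (n1 + n2 - 2)

# Pooled variance built from the two sample variances
# (statistics.variance divides by n - 1).
sp2 = ((n1 - 1) * variance(y1) + (n2 - 1) * variance(y2)) / (n1 + n2 - 2)

print(mse, sp2)  # the two values should agree up to floating-point rounding
```

Any other pair of samples with at least two observations each should give the same agreement, since the identity does not depend on the data.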


2. Numerator of equation (1)

When a=2 the numerator of expression (1) is:

$$\frac{SST}{2-1} = SST$$

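and the general expression for SST reduces to $SST = \sum_{i=1}^{2} n_i(\bar{y}_{i.} - \bar{y}_{..})^2$. The next step is to expand the sum as follows:

$$SST = \sum_{i=1}^{2} n_i(\bar{y}_{i.} - \bar{y}_{..})^2 = n_1(\bar{y}_{1.} - \bar{y}_{..})^2 + n_2(\bar{y}_{2.} - \bar{y}_{..})^2 \tag{5}$$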

$\bar{y}_{..}$ is called the global mean and we are going to write it in a different way. The new way is:

$$\bar{y}_{..} = \frac{n_1\bar{y}_{1.} + n_2\bar{y}_{2.}}{N} \tag{6}$$

Next, replace (6) in formula (5) and re-write SST as:

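$$SST = \underbrace{n_1\left[\bar{y}_{1.} - \left(\frac{n_1\bar{y}_{1.} + n_2\bar{y}_{2.}}{N}\right)\right]^2}_{\text{Part a}} + \underbrace{n_2\left[\bar{y}_{2.} - \left(\frac{n_1\bar{y}_{1.} + n_2\bar{y}_{2.}}{N}\right)\right]^2}_{\text{Part b}} \tag{7}$$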

The next step is to simplify the expressions Part a and Part b.


2.a. Part a

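$$\text{Part a} = n_1\left[\bar{y}_{1.} - \left(\frac{n_1\bar{y}_{1.} + n_2\bar{y}_{2.}}{N}\right)\right]^2$$

Multiply and divide the term with $\bar{y}_{1.}$ by $N$:

$$n_1\left[\frac{N\bar{y}_{1.}}{N} - \left(\frac{n_1\bar{y}_{1.} + n_2\bar{y}_{2.}}{N}\right)\right]^2$$

Put everything over the common denominator $N$:

$$n_1\left[\frac{N\bar{y}_{1.} - n_1\bar{y}_{1.} - n_2\bar{y}_{2.}}{N}\right]^2$$

Factor $\bar{y}_{1.}$ out of the terms with $N$ and $n_1$:

$$n_1\left[\frac{(N - n_1)\bar{y}_{1.} - n_2\bar{y}_{2.}}{N}\right]^2$$

Replace $(N - n_1) = n_2$:

$$n_1\left[\frac{n_2\bar{y}_{1.} - n_2\bar{y}_{2.}}{N}\right]^2$$

Now $n_2$ is a common factor of $\bar{y}_{1.}$ and $\bar{y}_{2.}$:

$$n_1\left[\frac{n_2(\bar{y}_{1.} - \bar{y}_{2.})}{N}\right]^2$$

Take $n_2$ and $N$ out of the square:

$$\text{Part a} = \frac{n_1 n_2^2}{N^2}(\bar{y}_{1.} - \bar{y}_{2.})^2$$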


2.b. Part b

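$$\text{Part b} = n_2\left[\bar{y}_{2.} - \left(\frac{n_1\bar{y}_{1.} + n_2\bar{y}_{2.}}{N}\right)\right]^2$$

Multiply and divide the term with $\bar{y}_{2.}$ by $N$:

$$n_2\left[\frac{N\bar{y}_{2.}}{N} - \left(\frac{n_1\bar{y}_{1.} + n_2\bar{y}_{2.}}{N}\right)\right]^2$$

Put everything over the common denominator $N$:

$$n_2\left[\frac{N\bar{y}_{2.} - n_1\bar{y}_{1.} - n_2\bar{y}_{2.}}{N}\right]^2$$

Factor $\bar{y}_{2.}$ out of the terms with $N$ and $n_2$:

$$n_2\left[\frac{(N - n_2)\bar{y}_{2.} - n_1\bar{y}_{1.}}{N}\right]^2$$

Replace $(N - n_2) = n_1$:

$$n_2\left[\frac{n_1\bar{y}_{2.} - n_1\bar{y}_{1.}}{N}\right]^2$$

Now $n_1$ is a common factor of $\bar{y}_{1.}$ and $\bar{y}_{2.}$:

$$n_2\left[\frac{n_1(\bar{y}_{2.} - \bar{y}_{1.})}{N}\right]^2$$

Take $n_1$ and $N$ out of the square:

$$\text{Part b} = \frac{n_2 n_1^2}{N^2}(\bar{y}_{2.} - \bar{y}_{1.})^2$$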


Now that we have Part a and Part b we are going to go back to equation (7) and replace them:

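$$SST = \frac{n_1 n_2^2}{N^2}(\bar{y}_{1.} - \bar{y}_{2.})^2 + \frac{n_2 n_1^2}{N^2}(\bar{y}_{2.} - \bar{y}_{1.})^2 \tag{8}$$

Taking into account that $(\bar{y}_{1.} - \bar{y}_{2.})^2 = (\bar{y}_{2.} - \bar{y}_{1.})^2$, we can re-write equation (8) as (9):

$$SST = \underbrace{\left[\frac{n_1 n_2^2}{N^2} + \frac{n_2 n_1^2}{N^2}\right]}_{\text{Part c}}(\bar{y}_{1.} - \bar{y}_{2.})^2 \tag{9}$$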

This leaves us with Part c, which we are going to work on next.


2.c. Part c

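$$\text{Part c} = \frac{n_1 n_2^2}{N^2} + \frac{n_2 n_1^2}{N^2}$$

$N^2$ is the common denominator, and each of the summands has an $n_1 n_2$ factor that we can factor out. Then we have:

$$\frac{n_1 n_2 (n_1 + n_2)}{N^2}$$

Replace $n_1 + n_2 = N$:

$$\frac{n_1 n_2 N}{N^2}$$

Cancel one factor of $N$:

$$\frac{n_1 n_2}{N}$$

Re-write the fraction:

$$\frac{1}{\frac{N}{n_1 n_2}}$$

Replace $N = n_1 + n_2$:

$$\frac{1}{\frac{n_1 + n_2}{n_1 n_2}} = \frac{1}{\frac{1}{n_1} + \frac{1}{n_2}}$$

And we have

$$\text{Part c} = \frac{1}{\frac{1}{n_1} + \frac{1}{n_2}}$$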


Finally, we have to replace this expression for Part c in (9) and re-write SST as:

$$SST = \frac{1}{\frac{1}{n_1} + \frac{1}{n_2}}(\bar{y}_{1.} - \bar{y}_{2.})^2$$
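
To make this result concrete, here is a minimal sketch that evaluates SST twice: once from its definition using the global mean, and once from the closed form in terms of the difference of the two treatment means. The sample values are invented for illustration and the variable names are my own; only the Python standard library is assumed.

```python
from statistics import mean

# Hypothetical observations for the two treatments (illustrative values only).
y1 = [4.1, 5.0, 6.2, 5.5]
y2 = [6.8, 7.4, 8.1, 7.0, 6.5]
n1, n2 = len(y1), len(y2)
N = n1 + n2

m1, m2 = mean(y1), mean(y2)

# Global mean written as the weighted average of the treatment means, as in (6).
grand = (n1 * m1 + n2 * m2) / N

# SST from its definition, as in (5).
sst_direct = n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2

# SST from the closed form derived above.
sst_closed = (m1 - m2) ** 2 / (1 / n1 + 1 / n2)

print(sst_direct, sst_closed)  # should agree up to floating-point rounding
```

Note that the weighted average in (6) is the same number you get by averaging all $N$ observations at once, which is why it can stand in for the global mean.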


3. Put all together

With the previous steps we have shown that, when a = 2, we have:

$$\frac{SST}{2-1} = \frac{(\bar{y}_{1.} - \bar{y}_{2.})^2}{\frac{1}{n_1} + \frac{1}{n_2}}$$

and

$$\frac{SSE}{N-2} = S_p^2$$

The ratio of these two expressions, namely the F-statistic, is then:

$$F_{1,k} = \frac{SST/(2-1)}{SSE/(N-2)} = \frac{(\bar{y}_{1.} - \bar{y}_{2.})^2}{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = t_k^2$$

And this concludes our proof.
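
If you would like to see the identity hold on actual numbers, the following sketch computes the one-way ANOVA F statistic from the sum-of-squares definitions and the pooled two-sample t statistic separately, then compares F with the square of t. The data and the function names `f_statistic` and `pooled_t` are hypothetical choices for illustration, and only the Python standard library is assumed.

```python
from math import isclose, sqrt
from statistics import mean, variance


def f_statistic(groups):
    """One-way ANOVA F = MST / MSE, computed from the sum-of-squares definitions."""
    a = len(groups)
    N = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / N
    sst = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    sse = sum(sum((v - mean(g)) ** 2 for v in g) for g in groups)
    return (sst / (a - 1)) / (sse / (N - a))


def pooled_t(y1, y2):
    """Equal-variance (pooled) two-sample t statistic."""
    n1, n2 = len(y1), len(y2)
    sp2 = ((n1 - 1) * variance(y1) + (n2 - 1) * variance(y2)) / (n1 + n2 - 2)
    return (mean(y1) - mean(y2)) / sqrt(sp2 * (1 / n1 + 1 / n2))


# Hypothetical observations for the two treatments (illustrative values only).
y1 = [4.1, 5.0, 6.2, 5.5]
y2 = [6.8, 7.4, 8.1, 7.0, 6.5]

F = f_statistic([y1, y2])
t = pooled_t(y1, y2)

print(F, t ** 2)                      # the two values should match
print(isclose(F, t ** 2, rel_tol=1e-9))
```

The function `f_statistic` accepts any number of groups, but the match with the squared t statistic is only expected when exactly two groups are passed, which is the $a = 2$ case treated in this post.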