Calculating the G-test of Association
The G-test allows biologists to compare observed values with those predicted
from a specific null hypothesis. The G-test estimates the probability that
differences as large as those between the observed and predicted (or expected)
values could have occurred due to chance alone. The G-test is generally used
with variables that are counts, not continuous measurements. The Chi-square
test can be used in similar instances.
For example, the following table compares the observed distribution of genotypes
in a population with the distribution predicted by the Hardy-Weinberg Principle.
The aim of the G-test is to determine whether the Aa genotype is really more
common, and the aa genotype less common, than they should be under the
Hardy-Weinberg assumptions.
Genotype                  AA    Aa    aa
Expected from H-W         32    64    32
Observed in population    32    74    22
1) First Step is to calculate the "test statistic"; this is the G value.
G = 2 * the sum of [O * ln(O/E)]
where
O = observed number for each category
E = expected number for each category
ln = natural log
"the sum of" = sum the values in the brackets over all n categories (in this
case n is 3)
A G value of zero means that the observed numbers are exactly equal to the
expected numbers. The larger the differences between observed and expected,
the greater the value of G. The greater the value of G, the smaller the P
value, and the stronger the evidence that the results differ from those
predicted by the null hypothesis.
In the example given above, G is:
G = 2[32*ln(32/32) + 74*ln(74/64) + 22*ln(22/32)] = 2[0 + 10.74 - 8.24]
= 5.0
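The same arithmetic can be checked with a few lines of Python. This is only a sketch: the function name g_statistic is made up for illustration, and it assumes the usual convention that a category with an observed count of zero contributes nothing to the sum.

```python
import math

def g_statistic(observed, expected):
    """Return G = 2 * sum(O * ln(O / E)) over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for o, e in zip(observed, expected):
        if o > 0:  # assumed convention: a category with O = 0 contributes 0
            total += o * math.log(o / e)
    return 2.0 * total

# Genotype counts from the table above, in the order AA, Aa, aa.
observed = [32, 74, 22]
expected = [32, 64, 32]

g = g_statistic(observed, expected)
print(f"G = {g:.2f}")  # prints: G = 5.00
```

The AA term is zero because observed and expected are equal, so only the Aa and aa categories contribute to G.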
2) Second Step is to calculate the degrees of freedom
G increases as the observed values become more and more different from the
expected values, but G also tends to increase as we add more categories. To
correct for this, we need to determine the degrees of freedom in our sample.
The degrees of freedom (df) is the minimum number of categories whose values
we need to know before we can calculate the rest. In our genotype example,
if we know any two genotype categories, it is possible to calculate the third
(by subtracting from the total). Thus there are two degrees of freedom.
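The subtraction argument can be written out as a short Python sketch. The helper degrees_of_freedom is a hypothetical name, and the "categories minus one" rule assumes, as in the genotype example, that only the total constrains the counts.

```python
def degrees_of_freedom(n_categories):
    """Categories free to vary once the total is fixed (assumed rule: n - 1)."""
    return n_categories - 1

total = 128                        # total individuals in the genotype example
known = {"AA": 32, "Aa": 74}       # any two of the observed counts
aa = total - sum(known.values())   # the third count is forced by the total

print(aa)                          # 22
print(degrees_of_freedom(3))       # 2
```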
3) Third Step is to compare your G-value to the Critical G-value
To determine whether the difference between the observed and expected values
is greater than that expected by chance alone, the G value is compared to
the critical values listed in a table.
In the table, the degrees of freedom are listed down the left side, and the
Critical G-values are given in the adjacent column.
A p-value of less than or equal to 0.05 is usually accepted as indicating
a significant difference. If you look at the table, you will see that for
2 degrees of freedom, a G-value of about 6.0 (more precisely, 5.99) is
necessary to yield a p-value of 0.05 or less. The G-value calculated above
(5.0) is NOT greater than 6.0, hence we cannot reject the null hypothesis.
We therefore conclude that the data do not differ significantly from the
expected Hardy-Weinberg distribution.
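If a table is not at hand, the comparison can be sketched in Python. This assumes the standard practice of comparing G against the chi-square distribution, which the text above does not state explicitly. For exactly 2 degrees of freedom the upper-tail probability has the simple closed form exp(-G/2), so only the standard library is needed. The function names are hypothetical, and the shortcut is not valid for other degrees of freedom.

```python
import math

def p_value_df2(g):
    """Upper-tail probability of a chi-square variable with 2 d.f."""
    return math.exp(-g / 2.0)

def critical_value_df2(alpha):
    """G value whose upper-tail probability equals alpha, for 2 d.f."""
    return -2.0 * math.log(alpha)

g = 5.0        # G value from the genotype example
alpha = 0.05   # conventional significance threshold

critical = critical_value_df2(alpha)
p = p_value_df2(g)
print(f"critical G = {critical:.2f}, p = {p:.3f}")  # critical G = 5.99, p = 0.082
print("reject null" if g > critical else "cannot reject null")
```

A p-value of about 0.08 is above 0.05, which agrees with the table lookup: the null hypothesis is not rejected.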
Sometimes the expected values for the G-test are generated from the data themselves.
For example, suppose you have data on the number of times that a large male
cricket mates (n1=12), and the number of times that a small male cricket mates
(n2=4). You would expect, if mating were not related to size, that the two
types of males would have the same opportunity to mate. Thus, since there were
16 total mating events, each type of male would be expected to mate 8 times.
You can then perform the G-test using 8 as the expected value for each male,
and 12 and 4 as the observed values.
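A minimal Python sketch of the cricket calculation follows. The function name g_equal_expectation is made up for illustration. It assumes every category is expected to receive an equal share of the total, and that the degrees of freedom follow the subtraction rule from step 2.

```python
import math

def g_equal_expectation(observed):
    """G statistic when each category is expected to share the total equally."""
    expected = sum(observed) / len(observed)
    # Assumed convention: a category with an observed count of 0 contributes 0.
    return 2.0 * sum(o * math.log(o / expected) for o in observed if o > 0)

matings = [12, 4]   # large male, small male
g = g_equal_expectation(matings)
df = len(matings) - 1

print(f"expected per male = {sum(matings) / len(matings):.0f}, "
      f"G = {g:.2f}, d.f. = {df}")
# prints: expected per male = 8, G = 4.19, d.f. = 1
```

With two categories, knowing one count fixes the other, so this G value would be compared with the critical value for 1 degree of freedom.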
4) Fourth Step is to report your results
"The genotype frequencies in the first simulation did
not differ from those predicted by Hardy-Weinberg equilibrium (G = 1.23, p
> 0.05, d.f. = 2)."
OR if you did find an effect:
"The genotype frequencies in the small populations differed significantly from those predicted by Hardy-Weinberg equilibrium (G = 8.45, p < 0.05, d.f. = 2)."
© Cody Arenz, Garry Duncan, & Nebraska Wesleyan University