Calculating the G-test of Association


The G-test allows biologists to compare observed values with those predicted by a specific null hypothesis. It estimates the probability that differences as large as those between the observed and predicted (or expected) values could have arisen by chance alone. The G-test is used with variables that are counts of observations in categories, not continuous measurements. The Chi-square test can be used in similar situations.


For example, the following table compares the observed distribution of genotypes in a population with the distribution predicted by the Hardy-Weinberg Principle. The aim of the G-test is to determine whether the Aa genotype is really more common, and the aa genotype less common, than expected under the Hardy-Weinberg assumptions.


                          Genotype
                          AA    Aa    aa
Expected from H-W         32    64    32
Observed in population    32    74    22


1) The first step is to calculate the "test statistic", which is the G value.


G = 2 * the sum of [O * ln(O/E)]

where:
    O = observed number for each category
    E = expected number for each category
    ln = natural log
    "the sum of" = add up the bracketed values for all n categories (in this case n is 3)


A G value of zero means that the observed numbers are exactly equal to the expected numbers. The larger the differences between observed and expected, the greater the value of G. The higher the G value, the smaller the p-value, and so the stronger the evidence that the results differ from those predicted by the null hypothesis.


In the example given above, G is:
G = 2[32*ln(32/32) + 74*ln(74/64) + 22*ln(22/32)] = 2[0 + 10.74 - 8.24] = 2[2.50] = 5.0
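
The same arithmetic can be checked with a few lines of Python. This is only a minimal sketch: the function name g_statistic is made up for illustration, and skipping categories with zero observations is an assumed (though common) convention, since O * ln(O/E) approaches zero as O approaches zero.

```python
import math

def g_statistic(observed, expected):
    """Return G = 2 * sum of [O * ln(O/E)] over all categories."""
    total = 0.0
    for o, e in zip(observed, expected):
        if o > 0:  # assumed convention: an empty category adds nothing to the sum
            total += o * math.log(o / e)
    return 2.0 * total

# Genotype counts from the table above, in the order AA, Aa, aa
expected = [32, 64, 32]
observed = [32, 74, 22]

g = g_statistic(observed, expected)
print(f"G = {g:.2f}")  # prints: G = 5.00
```

Note that the AA category contributes nothing, because ln(32/32) = ln(1) = 0.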

2) The second step is to calculate the degrees of freedom


G increases as the observed values become more and more different from the expected values, but G also increases as we add more categories. To correct for this, we need to work out the degrees of freedom in our sample. To calculate the degrees of freedom (df), we need to determine the minimum number of categories whose values we need to know before we could calculate the rest. In our genotype example, if we know the total and any two genotype categories, it is possible to calculate the third (by subtracting from the total). Thus there are two degrees of freedom (the number of categories minus one).
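
The subtraction argument can be written out as a short sketch. It assumes, as in this example, that the only constraint linking the categories is the fixed total; the variable names are illustrative.

```python
# Observed genotype counts from the example
observed = {"AA": 32, "Aa": 74, "aa": 22}
total = sum(observed.values())  # 128 individuals in all

# Knowing the total and any two counts fixes the third one
aa_from_others = total - observed["AA"] - observed["Aa"]

# So only (number of categories - 1) counts are free to vary
degrees_of_freedom = len(observed) - 1

print(aa_from_others, degrees_of_freedom)  # prints: 22 2
```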

3) The third step is to compare your G-value to the critical G-value


To determine whether the difference between the observed and expected values is greater than that expected by chance alone, the G-value is compared to the critical values in a table. In the table, the degrees of freedom are listed down the left side, and the critical G-values are given in the adjacent column.


A p-value of less than or equal to 0.05 is usually accepted as indicating a significant difference. If you look at the table, you will see that for 2 degrees of freedom, a G-value of about 6.0 (5.99 to two decimal places) is necessary to yield a p-value of 0.05 or less. The G-value calculated above (5.0) is not greater than this critical value, hence we cannot reject the null hypothesis. Therefore, we conclude that the data do not differ significantly from the expected Hardy-Weinberg distribution.
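
The table works because, under the null hypothesis, G approximately follows a chi-square distribution with the same degrees of freedom. For exactly 2 degrees of freedom the chi-square upper tail has a simple closed form, p = exp(-G/2), so the table lookup can be sketched without any statistics library. The function names below are invented for illustration, and the shortcut is valid for 2 degrees of freedom only; other cases need the table or a statistics package (for instance SciPy's chi2.sf).

```python
import math

def p_value_df2(g):
    """Upper-tail chi-square probability for 2 degrees of freedom only."""
    return math.exp(-g / 2.0)

def critical_value_df2(alpha):
    """Smallest G that gives p <= alpha, for 2 degrees of freedom only."""
    return -2.0 * math.log(alpha)

g = 5.0  # G-value from the genotype example
crit = critical_value_df2(0.05)
p = p_value_df2(g)

print(f"critical G = {crit:.2f}, p = {p:.3f}")  # prints: critical G = 5.99, p = 0.082
print("reject null" if g >= crit else "cannot reject null")  # prints: cannot reject null
```

Since p is about 0.08, which is greater than 0.05, the conclusion matches the table lookup.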


Sometimes the expected values for the G-test are generated from the data themselves. For example, suppose you have data on the number of times that a large male cricket mates (n1=12), and the number of times that a small male cricket mates (n2=4). If mating were not related to size, you would expect the two types of males to have the same opportunity to mate. Thus, since there were 16 total mating events, each type of male would be expected to mate 8 times. You can then perform the G-test using 8 as the expected value for each type of male, and 12 and 4 as the observed values.
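
As a rough sketch, the cricket example works out as follows. With two categories there is one degree of freedom, and for that case the chi-square upper tail can be written with the complementary error function; that shortcut is an addition here and applies to 1 degree of freedom only.

```python
import math

observed = [12, 4]  # matings by the large male and by the small male
total = sum(observed)  # 16 mating events
expected = [total / len(observed)] * len(observed)  # 8 each if size does not matter

g = 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected))
df = len(observed) - 1

# Chi-square upper tail for 1 degree of freedom only
p = math.erfc(math.sqrt(g / 2.0))

print(f"G = {g:.2f}, d.f. = {df}")  # prints: G = 4.19, d.f. = 1
print(f"p = {p:.3f}")
```

Here G is about 4.19, which exceeds the critical value of 3.84 for 1 degree of freedom at p = 0.05, so in this made-up data set mating success would differ significantly between the two sizes of male.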

4) The fourth step is to report your results

"The genotype frequencies in the first simulation did not differ from those predicted by Hardy-Weinberg equilibrium (G = 1.23, p > 0.05, d.f. = 2)."

OR if you did find an effect:

"The genotype frequencies in the small populations differed significantly from those predicted by Hardy-Weinberg equilibrium (G = 8.45, p < 0.05, d.f. = 2)."


© Cody Arenz, Garry Duncan, & Nebraska Wesleyan University