Analysis of Variance (ANOVA)



When confronted with a need to compare the means of more than two groups, one might ask why we would not use multiple t tests. For instance, with three groups, why would I not compare groups one and two with a t test, then compare groups one and three, and then compare groups two and three? The answer can be found in our basic probability review. We are concerned with the probability of a TYPE I error (rejecting a true null hypothesis). We generally set an alpha level of .05, which is the probability of making a TYPE I error on a single test. Now consider what happens when we do three t tests. There is a .05 probability of making a TYPE I error on the first test, a .05 probability of the same error on the second test, and a .05 probability on the third test. These chances accumulate roughly additively, so the probability of at least one TYPE I error among the three tests is much greater than .05 (about .14 if the tests were independent). It is like the increased probability of drawing an ace from a deck of cards when we are allowed multiple draws. One way to remedy this situation is to use a lower alpha level for each test, so that the overall alpha stays at or below .05. That is generally accomplished by dividing the desired alpha by the number of comparisons and evaluating each null hypothesis at the adjusted alpha level. For instance, with three comparisons, each could be tested at alpha = .05/3, or about .0167. This keeps the overall ("experimentwise" or "familywise") alpha at or below .05. In fact, this is a frequently used method, and it is called a Bonferroni correction.
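The arithmetic behind the paragraph above can be sketched in a few lines of Python. This is a minimal illustration that assumes the tests are independent (pairwise t tests on shared groups are not strictly independent, so treat the uncorrected figure as an approximation); the Bonferroni bound itself does not need that assumption. The function names are hypothetical helpers, not from any library.

```python
def familywise_error(alpha: float, k: int) -> float:
    """Probability of at least one Type I error across k independent tests,
    each run at the same per-test alpha (assumes independence)."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_alpha(alpha: float, k: int) -> float:
    """Per-comparison alpha under a Bonferroni correction."""
    return alpha / k


if __name__ == "__main__":
    k = 3  # three pairwise t tests among three groups
    print(f"Uncorrected familywise alpha: {familywise_error(0.05, k):.4f}")  # 0.1426
    adj = bonferroni_alpha(0.05, k)
    print(f"Bonferroni per-test alpha:    {adj:.4f}")                        # 0.0167
    print(f"Corrected familywise alpha:   {familywise_error(adj, k):.4f}")   # 0.0492
```

Running three tests at .05 each pushes the chance of at least one false rejection to about .14, while testing each at .05/3 brings it back just under .05.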

The Bonferroni correction is not the most powerful method for dealing with this situation. ANOVA allows us to do an "overall" test of multiple groups to determine whether there are any differences among the groups within the set. This test results in a single statistic named F. The F statistic follows a known sampling distribution just as z and t do, but it differs in that F can never be negative: the distribution starts at zero and is positively skewed, with a single long tail to the right. The reason is that the F value itself is a ratio of two variances, which are never negative, so we never have a negative F the way we have negative z and t values. The region of rejection for the F test is therefore all in the upper tail.

Unlike the t distribution, the F distribution requires two df values. One is called the numerator df and the other is called the denominator df. In learning to compute the F value, you will see that the numerator and denominator refer to a ratio of two variance estimates: the between-groups variance and the within-groups variance, respectively. The within-groups variance is also called the error variance. If there is no difference between groups, it should be intuitive that the group means should vary no more than we would expect from the variation of scores within groups. That is, if the groups are equal, the variance estimate based on the group means should be about the same size as the variance estimate based on the scores within the groups, so their ratio should be close to 1. If the means vary much more than that, it is tempting to conclude that there must be a difference between groups.
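To make the ratio concrete, here is a minimal pure-Python sketch using the standard one-way ANOVA formulas: each variance estimate is a sum of squares divided by its df, with numerator df = k - 1 (k groups) and denominator df = N - k (N total scores). The function name `one_way_f` and the scores are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
from statistics import mean


def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on lists of scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]

    # Between-groups: spread of each group mean around the grand mean, weighted by n
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-groups (error): spread of each score around its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1          # numerator df
    df_within = n_total - k     # denominator df
    ms_between = ss_between / df_between  # between-groups variance estimate
    ms_within = ss_within / df_within     # within-groups (error) variance estimate
    return ms_between / ms_within, df_between, df_within


if __name__ == "__main__":
    groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]  # hypothetical scores
    f, df_b, df_w = one_way_f(groups)
    print(f"F({df_b}, {df_w}) = {f:.2f}")  # prints F(2, 6) = 12.00
```

Here every group has a within-group variance of 1, but the group means (5, 7, 9) spread out far more than that, so the between-groups estimate (12) is twelve times the error estimate.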

In order to make the decision of significance, the F value (obtained by dividing the between-groups variance estimate by the within-groups variance estimate) is evaluated against a critical value of F at some alpha level. F tables are widely available. The two most accessible to you at this time are the NIST table and the table on page A-3 in your text.
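If you have Python with SciPy installed (an assumption; the course does not require it), the table lookup can also be done in code. This sketch uses `scipy.stats.f.ppf` for the critical value and `scipy.stats.f.sf` for the upper-tail probability of an observed F. The case of four groups with 31 scores each matches the example discussed next; the observed F of 3.10 is purely hypothetical.

```python
from scipy import stats

alpha = 0.05
k, n_per_group = 4, 31             # four groups of 31 scores each
df_between = k - 1                 # numerator df = 3
df_within = k * n_per_group - k    # denominator df = 124 - 4 = 120

# Critical value: the F that cuts off the upper `alpha` of the distribution
f_crit = stats.f.ppf(1 - alpha, df_between, df_within)
print(f"critical F({df_between}, {df_within}) at alpha = {alpha}: {f_crit:.2f}")  # 2.68

# Upper-tail probability for an observed F (hypothetical value)
f_observed = 3.10
p_value = stats.f.sf(f_observed, df_between, df_within)
print(f"p for F = {f_observed}: {p_value:.4f}")
```

Because the rejection region is entirely in the upper tail, only `sf` (the survival function, 1 - CDF) is needed for the p-value; there is no doubling as in a two-tailed t test.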

Let's look at what we would do after obtaining an F statistic in order to see why we want to compute it. Consider a case where we are comparing four groups with 31 people in each group. For this analysis, the numerator df value would be 3, and the denominator df value would be 120 (do not worry about why just now). In the F table in your text, find the column for the number 3 and the row for the number 120 (that is on page A-9). Notice that the critical value for F at the .05 alpha level (third small row) is 2.68. Its meaning is as follows: if the null hypothesis that all groups are equal is true, the probability of getting an F value of 2.68 or greater is .05, so an obtained F larger than 2.68 has a probability of less than .05 and leads us to reject the null hypothesis. This is the same reasoning we used with z and t. The only difference is the type of sampling distribution. See the figure below to understand how the F distribution is similar:








Notice that a significant F does not tell us which of the groups differ from each other. That is another matter, addressed with what are called post hoc tests. The primary test in ANOVA only determines whether there is a significant difference somewhere among the groups. The null hypothesis that all group means are equal is called an omnibus null hypothesis.
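As a sketch of the omnibus test in software, SciPy's `scipy.stats.f_oneway` (assuming SciPy is available) takes each group's scores and returns one F and one p-value for the null hypothesis that all group means are equal. The scores below are hypothetical. Note that the output says nothing about which pair of groups differs; that still requires post hoc tests.

```python
from scipy import stats

# Hypothetical scores for three groups
group_a = [4, 5, 6]
group_b = [6, 7, 8]
group_c = [8, 9, 10]

# One omnibus test of H0: all group means are equal
result = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {result.statistic:.2f}, p = {result.pvalue:.4f}")  # F = 12.00, p = 0.0080
```

With p well below .05 we would reject the omnibus null hypothesis, but we still would not know whether group A differs from B, B from C, or A from C.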

Please take the time now to study my slides at Lesson VI. Remember the method for studying the slides: do not pass a slide that you do not understand without asking questions, as subsequent slides require knowledge of previous slides. At this time, do not attempt to interpret the model comparison perspective in your text unless you are an advanced student. Try to follow my reasoning first. I will point out the relevant parts of your text as we continue.