Which test do you use to identify the difference between two or more group means after any other variance in the outcome variable is accounted for?

Developed by Ronald Fisher, ANOVA stands for Analysis of Variance. One-Way Analysis of Variance tells you if there are any statistical differences between the means of three or more independent groups.

When might you use ANOVA?

As a marketer, you might use Analysis of Variance (ANOVA) when you want to test a particular hypothesis. You would use ANOVA to help you understand how your different groups respond, with the null hypothesis for the test being that the means of the different groups are equal. A statistically significant result means that the population means are not all equal, i.e. at least one group differs from another.

How can ANOVA help?

The one-way ANOVA can help you know whether or not there are significant differences between the group means of your independent variable (such as age, sex, or income brackets in the first example below). When you understand how the groups within the independent variable differ from one another, you can begin to understand which of them has a connection to your dependent variable (landing page clicks) and begin to learn what is driving that behavior.
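
A minimal sketch of a one-way ANOVA in Python, using scipy.stats.f_oneway; the click counts and age-group splits below are hypothetical illustrative data:

    from scipy import stats

    # Hypothetical landing-page click counts for three age groups
    clicks_18_29 = [12, 15, 14, 10, 13, 16, 11]
    clicks_30_44 = [18, 20, 17, 22, 19, 21, 18]
    clicks_45_60 = [11, 9, 12, 10, 13, 8, 12]

    # One-way ANOVA: the null hypothesis is that all group means are equal
    f_stat, p_value = stats.f_oneway(clicks_18_29, clicks_30_44, clicks_45_60)
    print(f"F = {f_stat:.2f}, p = {p_value:.4f}")  # a small p suggests the means are not all equal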

Examples of using ANOVA

You may want to use ANOVA to help you answer questions like these:

Do age, sex, or income have an effect on whether someone clicks on a landing page?

Do location, employment status, or education have an effect on NPS score?

One-way ANOVA can help you know whether or not there are significant differences between the groups of an independent variable (such as USA vs. Canada vs. Mexico when testing a Location variable). You may want to test several independent variables (such as location, employment status, or education). When you understand how the groups within each independent variable differ (USA vs. Canada vs. Mexico, rather than location vs. employment status vs. education), you can begin to understand which of them has a connection to your dependent variable (NPS score).

“Do all your locations have the same average NPS score?”

Note, however, that ANOVA only tells you whether the average NPS scores across all locations are the same or not; it does not tell you which location has a significantly higher or lower average NPS score.

What is the difference between one-way and two-way ANOVA tests?

This is defined by how many independent variables are included in the ANOVA test. One-way means the analysis of variance has one independent variable; two-way means the test has two independent variables. For example, testing drink ratings by brand alone is a one-way design, while testing by brand and by variant (original vs. diet) is a two-way design, as sketched below.
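
As a rough illustration of the difference, here is a sketch using statsmodels' formula API; the data frame and its column names (rating, brand, variant) are hypothetical:

    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    # Hypothetical taste ratings for two drink brands, original vs. diet
    df = pd.DataFrame({
        "rating":  [7, 6, 8, 5, 4, 6, 9, 8, 7, 3, 5, 4],
        "brand":   ["A", "A", "A", "B", "B", "B"] * 2,
        "variant": ["original"] * 6 + ["diet"] * 6,
    })

    # One-way: a single independent variable (brand)
    one_way = smf.ols("rating ~ C(brand)", data=df).fit()
    print(anova_lm(one_way))

    # Two-way: two independent variables (brand and variant)
    two_way = smf.ols("rating ~ C(brand) + C(variant)", data=df).fit()
    print(anova_lm(two_way))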

How does ANOVA work?

Like other types of statistical tests, ANOVA compares the means of different groups and shows you if there are any statistical differences between the means. ANOVA is classified as an omnibus test statistic. This means that it can’t tell you which specific groups were statistically significantly different from each other, only that at least two of the groups were.

It’s important to remember that the main ANOVA research question is whether the sample means are from different populations. There are two assumptions upon which ANOVA rests:

First: whatever the technique of data collection, the observations within each sampled population are normally distributed.

Second: the sampled populations share a common variance, σ².
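
These assumptions can be checked before running the test. A minimal sketch using scipy's Shapiro-Wilk test (normality) and Levene's test (equal variances) on hypothetical data:

    from scipy import stats

    group_a = [23, 25, 27, 22, 26, 24, 25]
    group_b = [30, 28, 33, 31, 29, 32, 30]
    group_c = [20, 18, 22, 21, 19, 23, 20]

    # Assumption 1: each group is roughly normally distributed (Shapiro-Wilk)
    for name, g in [("A", group_a), ("B", group_b), ("C", group_c)]:
        stat, p = stats.shapiro(g)
        print(f"Group {name}: Shapiro-Wilk p = {p:.3f}")  # p > 0.05: no evidence of non-normality

    # Assumption 2: a common variance across groups (Levene's test)
    stat, p = stats.levene(group_a, group_b, group_c)
    print(f"Levene p = {p:.3f}")  # p > 0.05: no evidence of unequal variances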

How to conduct an ANOVA test

Stats iQ and ANOVA

Stats iQ from Qualtrics can help you run an ANOVA test. When users select one categorical variable with three or more groups and one continuous or discrete variable, Stats iQ runs a one-way ANOVA (Welch’s F test) and a series of pairwise “post hoc” tests (Games-Howell tests).

The one-way ANOVA tests for an overall relationship between the two variables, and the pairwise tests test each possible pair of groups to see if one group tends to have higher values than the other.
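
Stats iQ is a point-and-click tool, but as a rough open-source analogue (not Stats iQ's actual implementation), the third-party pingouin library exposes both Welch's ANOVA and Games-Howell pairwise tests. The NPS data below are hypothetical:

    import pandas as pd
    import pingouin as pg  # third-party library: pip install pingouin

    # Hypothetical NPS responses by location
    df = pd.DataFrame({
        "nps": [60, 55, 70, 40, 45, 38, 80, 75, 78, 62, 42, 77],
        "location": ["USA", "USA", "USA", "Canada", "Canada", "Canada",
                     "Mexico", "Mexico", "Mexico", "USA", "Canada", "Mexico"],
    })

    # Welch's F test: one-way ANOVA without the equal-variances assumption
    print(pg.welch_anova(data=df, dv="nps", between="location"))

    # Games-Howell pairwise post hoc tests
    print(pg.pairwise_gameshowell(data=df, dv="nps", between="location"))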

Users can run an ANOVA test through Stats iQ.

The Overall Stat Test of Averages acts as an Analysis of Variance (ANOVA). An ANOVA tests the relationship between a categorical and a numeric variable by testing the differences between two or more means. This test produces a p-value to determine whether the relationship is significant or not.

In Stats iQ, take the following steps:

  • Click a categorical variable with 3+ groups and a numeric variable
  • Click “Relate”
  • You’ll get an ANOVA, a related “effect size”, and a simple, easy-to-understand summary

Qualtrics Crosstabs and ANOVA

You can run an ANOVA test through the Qualtrics Crosstabs feature too.

  • Ensure your “banner” (column) variable has 3+ groups and your “stub” (rows) variable has numbers (like Age) or numeric recodes (like “Very Satisfied” = 7)
  • Click “Overall stat test of averages”
  • You’ll see a basic ANOVA p-value

What does an ANOVA test reveal?

A one-way ANOVA will tell you that at least two groups were different from each other. Once you begin to understand how the groups within each independent variable differ, you will be able to see how each behaves with your dependent variable (see the landing page example above).

What are the limitations of ANOVA?

Whilst ANOVA will help you to analyse differences in means across three or more groups, it won’t tell you which specific groups were different from each other. If your test returns a significant F-statistic (the value you get when you run an ANOVA test), you may need to run a post hoc test (like the Least Significant Difference test) to tell you exactly which groups had a difference in means.
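
For illustration, here is a post hoc sketch in Python. It uses Tukey's HSD, a widely available alternative to the Least Significant Difference test mentioned above, via statsmodels; the scores and group labels are hypothetical:

    import numpy as np
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    scores = np.array([12, 15, 14, 18, 20, 17, 11, 9, 12])
    groups = np.array(["A", "A", "A", "B", "B", "B", "C", "C", "C"])

    # Pairwise comparisons that tell you *which* group means differ
    result = pairwise_tukeyhsd(endog=scores, groups=groups, alpha=0.05)
    print(result)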

Welch’s F Test ANOVA

Stats iQ recommends an unranked Welch’s F test if several assumptions about the data hold:

  • The sample size is greater than 10 times the number of groups in the calculation (groups with only one value are excluded), and therefore the Central Limit Theorem satisfies the requirement for normally distributed data.
  • There are few or no outliers in the continuous/discrete data.

Unlike the slightly more common F test for equal variances, Welch’s F test does not assume that the variances of the groups being compared are equal. Assuming equal variances leads to less accurate results when variances are not in fact equal, and its results are very similar when variances are actually equal.

Ranked ANOVA

When assumptions are violated, the unranked ANOVA may no longer be valid. In that case, Stats iQ recommends the ranked ANOVA (also called “ANOVA on ranks”); Stats iQ rank-transforms the data (replaces values with their rank ordering) and then runs the same ANOVA on that transformed data.

The ranked ANOVA is robust to outliers and non-normally distributed data. Rank transformation is a well-established method for protecting against assumption violation (a “nonparametric” method), and is most commonly seen in the difference between the Pearson and Spearman correlation. Rank transformation followed by Welch’s F test is similar in effect to the Kruskal-Wallis Test.
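
A minimal sketch of the ranked approach on hypothetical data: rank-transform the pooled values, run the ANOVA on the ranks, and compare with scipy's Kruskal-Wallis test (this sketch uses the standard F test on ranks rather than Stats iQ's Welch variant):

    import numpy as np
    from scipy import stats

    a = [105, 98, 110, 250, 102]   # note the outlier (250)
    b = [120, 118, 125, 119, 122]
    c = [90, 95, 88, 92, 94]

    # Rank-transform all values jointly, then split back into groups
    ranks = stats.rankdata(np.concatenate([a, b, c]))
    ra, rb, rc = ranks[:5], ranks[5:10], ranks[10:]
    f_stat, p = stats.f_oneway(ra, rb, rc)
    print(f"ANOVA on ranks: F = {f_stat:.2f}, p = {p:.4f}")

    # Kruskal-Wallis gives a similar rank-based omnibus test
    h_stat, p_kw = stats.kruskal(a, b, c)
    print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p_kw:.4f}")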

Note that Stats iQ’s ranked and unranked ANOVA effect sizes (Cohen’s f) are calculated using the F value from the F test for equal variances.

Games-Howell Pairwise Test

Stats iQ runs Games-Howell tests regardless of the outcome of the ANOVA test (as per Zimmerman, 2010). Stats iQ shows unranked or ranked Games-Howell pairwise tests based on the same criteria as those used for ranked vs. unranked ANOVA, so if you see “Ranked ANOVA” in the advanced output, the pairwise tests will also be ranked.

The Games-Howell is essentially a t-test for unequal variances that accounts for the heightened likelihood of finding statistically significant results by chance when running many pairwise tests. Unlike the slightly more common Tukey’s b test, the Games-Howell test does not assume that the variances of the groups being compared are equal. Assuming equal variances leads to less accurate results when variances are not in fact equal, and its results are very similar when variances are actually equal (Howell, 2012).

Note that while the unranked pairwise test tests for the equality of the means of the two groups, the ranked pairwise test does not explicitly test for differences between the groups’ means or medians. Rather, it tests for a general tendency of one group to have larger values than the other.

Additionally, while Stats iQ does not show results of pairwise tests for any group with less than four values, those groups are included in calculating the degrees of freedom for the other pairwise tests.

Additional ANOVA Considerations

With smaller sample sizes, data can still be visually inspected to determine if it is in fact normally distributed; if it is, unranked test results are still valid even for small samples. In practice, this assessment can be difficult to make, so Stats iQ recommends ranked tests by default for small samples.

With larger sample sizes, outliers are less likely to negatively affect results. Stats iQ uses Tukey’s “outside fence” to define outliers as points more than three times the interquartile range above the 75th or below the 25th percentile point.
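
A sketch of that outlier rule on hypothetical data:

    import numpy as np

    data = np.array([12, 14, 13, 15, 14, 13, 60, 12, 15, 14])
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1

    # Tukey's "outside fence": more than 3 * IQR beyond the quartiles
    lower, upper = q1 - 3 * iqr, q3 + 3 * iqr
    outliers = data[(data < lower) | (data > upper)]
    print(outliers)  # the value 60 falls outside the fence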

Data like “Highest level of education completed” or “Finishing order in marathon” are unambiguously ordinal. Though Likert scales (like a 1 to 7 scale where 1 is Very dissatisfied and 7 is Very satisfied) are technically ordinal, it is common practice in the social sciences to treat them as though they are continuous (i.e., with an unranked test).

Read more about additional statistical analysis types:

Statistical tests are used in hypothesis testing. They can be used to:

  • determine whether a predictor variable has a statistically significant relationship with an outcome variable.
  • estimate the difference between two or more groups.

Statistical tests assume a null hypothesis of no relationship or no difference between groups. Then they determine whether the observed data fall outside of the range of values predicted by the null hypothesis.

If you already know what types of variables you’re dealing with, you can use the flowchart to choose the right statistical test for your data.

Statistical tests flowchart

What does a statistical test do?

Statistical tests work by calculating a test statistic – a number that describes how much the relationship between variables in your test differs from the null hypothesis of no relationship.

It then calculates a p-value (probability value). The p-value estimates how likely it is that you would see the difference described by the test statistic if the null hypothesis of no relationship were true.

If the value of the test statistic is more extreme than the statistic calculated from the null hypothesis, then you can infer a statistically significant relationship between the predictor and outcome variables.

If the value of the test statistic is less extreme than the one calculated from the null hypothesis, then you can infer no statistically significant relationship between the predictor and outcome variables.
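
As a concrete illustration of the statistic-plus-p-value pattern, here is a minimal two-sample t-test sketch on hypothetical data:

    from scipy import stats

    # Hypothetical measurements for two groups
    group_1 = [5.1, 4.9, 5.3, 5.0, 5.2]
    group_2 = [5.8, 6.0, 5.9, 6.1, 5.7]

    # The test statistic (t) measures how far the data fall from the null
    # hypothesis of no difference; the p-value is how likely a statistic at
    # least this extreme would be if the null hypothesis were true.
    t_stat, p_value = stats.ttest_ind(group_1, group_2)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")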

When to perform a statistical test

You can perform statistical tests on data that have been collected in a statistically valid manner – either through an experiment, or through observations made using probability sampling methods.

For a statistical test to be valid, your sample size needs to be large enough to approximate the true distribution of the population being studied.

To determine which statistical test to use, you need to know:

  • whether your data meets certain assumptions.
  • the types of variables that you’re dealing with.

Statistical assumptions

Statistical tests make some common assumptions about the data they are testing:

  1. Independence of observations (a.k.a. no autocorrelation): The observations/variables you include in your test are not related (for example, multiple measurements of a single test subject are not independent, while measurements of multiple different test subjects are independent).
  2. Homogeneity of variance: the variance within each group being compared is similar among all groups. If one group has much more variation than others, it will limit the test’s effectiveness.
  3. Normality of data: the data follows a normal distribution (a.k.a. a bell curve). This assumption applies only to quantitative data.

If your data do not meet the assumptions of normality or homogeneity of variance, you may be able to perform a nonparametric statistical test, which allows you to make comparisons without any assumptions about the data distribution.
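
For example, here is a sketch of swapping in a nonparametric test when the data are small and skewed (Mann-Whitney U, the rank-based alternative to the independent t-test, on hypothetical data):

    from scipy import stats

    skewed_a = [1, 2, 2, 3, 40]   # heavily skewed, small sample
    skewed_b = [5, 6, 7, 8, 90]

    # Compares the tendency of one group to have larger values than the other
    u_stat, p = stats.mannwhitneyu(skewed_a, skewed_b)
    print(f"U = {u_stat:.1f}, p = {p:.4f}")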

If your data do not meet the assumption of independence of observations, you may be able to use a test that accounts for structure in your data (repeated-measures tests or tests that include blocking variables).

Types of variables

The types of variables you have usually determine what type of statistical test you can use.

Quantitative variables represent amounts of things (e.g. the number of trees in a forest). Types of quantitative variables include:

  • Continuous (a.k.a ratio variables): represent measures and can usually be divided into units smaller than one (e.g. 0.75 grams).
  • Discrete (a.k.a integer variables): represent counts and usually can’t be divided into units smaller than one (e.g. 1 tree).

Categorical variables represent groupings of things (e.g. the different tree species in a forest). Types of categorical variables include:

  • Ordinal: represent data with an order (e.g. rankings).
  • Nominal: represent group names (e.g. brands or species names).
  • Binary: represent data with a yes/no or 1/0 outcome (e.g. win or lose).

Choose the test that fits the types of predictor and outcome variables you have collected (if you are doing an experiment, these are the independent and dependent variables). Consult the tables below to see which test best matches your variables.


Choosing a parametric test: regression, comparison, or correlation

Parametric tests usually have stricter requirements than nonparametric tests, and are able to make stronger inferences from the data. They can only be conducted with data that adheres to the common assumptions of statistical tests.

The most common types of parametric test include regression tests, comparison tests, and correlation tests.

Regression tests

Regression tests look for cause-and-effect relationships. They can be used to estimate the effect of one or more continuous variables on another variable.

Simple linear regression
  • Predictor: 1 continuous variable
  • Outcome: 1 continuous variable
  • Example: What is the effect of income on longevity?

Multiple linear regression
  • Predictor: 2 or more continuous variables
  • Outcome: 1 continuous variable
  • Example: What is the effect of income and minutes of exercise per day on longevity?

Logistic regression
  • Predictor: 1 continuous variable
  • Outcome: binary
  • Example: What is the effect of drug dosage on the survival of a test subject?

Comparison tests

Comparison tests look for differences among group means. They can be used to test the effect of a categorical variable on the mean value of some other characteristic.

T-tests are used when comparing the means of precisely two groups (e.g. the average heights of men and women). ANOVA and MANOVA tests are used when comparing the means of more than two groups (e.g. the average heights of children, teenagers, and adults).

Paired t-test
  • Predictor: categorical, 1 predictor
  • Outcome: quantitative, groups come from the same population
  • Example: What is the effect of two different test prep programs on the average exam scores for students from the same class?

Independent t-test
  • Predictor: categorical, 1 predictor
  • Outcome: quantitative, groups come from different populations
  • Example: What is the difference in average exam scores for students from two different schools?

ANOVA
  • Predictor: categorical, 1 or more predictors
  • Outcome: quantitative, 1 outcome
  • Example: What is the difference in average pain levels among post-surgical patients given three different painkillers?

MANOVA
  • Predictor: categorical, 1 or more predictors
  • Outcome: quantitative, 2 or more outcomes
  • Example: What is the effect of flower species on petal length, petal width, and stem length?
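
As an illustration of the MANOVA row, here is a sketch using statsmodels' MANOVA on hypothetical flower measurements (two of the three outcomes from the example question, for brevity):

    import pandas as pd
    from statsmodels.multivariate.manova import MANOVA

    # Hypothetical flower measurements for three species
    df = pd.DataFrame({
        "petal_length": [1.4, 1.3, 1.5, 4.7, 4.5, 4.9, 6.0, 5.9, 6.1],
        "petal_width":  [0.2, 0.2, 0.3, 1.4, 1.5, 1.5, 2.5, 2.3, 2.4],
        "species":      ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
    })

    # Two quantitative outcomes, one categorical predictor
    manova = MANOVA.from_formula("petal_length + petal_width ~ species", data=df)
    print(manova.mv_test())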

Correlation tests

Correlation tests check whether variables are related without hypothesizing a cause-and-effect relationship.

These can be used to test whether two variables you want to use in (for example) a multiple regression test are autocorrelated.

Pearson’s r
  • Variables: 2 continuous variables
  • Example: How are latitude and temperature related?

Choosing a nonparametric test

Non-parametric tests don’t make as many assumptions about the data, and are useful when one or more of the common statistical assumptions are violated. However, the inferences they make aren’t as strong as with parametric tests.

Spearman’s r
  • Predictor: quantitative
  • Outcome: quantitative
  • Use in place of: Pearson’s r

Chi square test of independence
  • Predictor: categorical
  • Outcome: categorical
  • Use in place of: Pearson’s r

Sign test
  • Predictor: categorical
  • Outcome: quantitative
  • Use in place of: one-sample t-test

Kruskal–Wallis H
  • Predictor: categorical, 3 or more groups
  • Outcome: quantitative
  • Use in place of: ANOVA

ANOSIM
  • Predictor: categorical, 3 or more groups
  • Outcome: quantitative, 2 or more outcome variables
  • Use in place of: MANOVA

Wilcoxon Rank-Sum test
  • Predictor: categorical, 2 groups
  • Outcome: quantitative, groups come from different populations
  • Use in place of: independent t-test

Wilcoxon Signed-rank test
  • Predictor: categorical, 2 groups
  • Outcome: quantitative, groups come from the same population
  • Use in place of: paired t-test
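
Most of the tests in this table (ANOSIM excepted) have implementations in scipy.stats; a short sketch with hypothetical data:

    from scipy import stats

    # Spearman's r in place of Pearson's r
    rho, p = stats.spearmanr([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
    print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")

    # Kruskal-Wallis H in place of ANOVA
    h, p = stats.kruskal([1, 2, 3], [4, 5, 6], [7, 8, 9])
    print(f"Kruskal-Wallis H = {h:.2f}, p = {p:.3f}")

    # Wilcoxon signed-rank in place of a paired t-test
    w, p = stats.wilcoxon([10, 12, 9, 11, 13], [12, 15, 13, 12, 19])
    print(f"Wilcoxon W = {w:.1f}, p = {p:.3f}")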

Flowchart: choosing a statistical test

This flowchart helps you choose among parametric tests. For nonparametric alternatives, check the table above.


Frequently asked questions about statistical tests

What is statistical significance?

Statistical significance is a term used by researchers to state that it is unlikely their observations could have occurred under the null hypothesis of a statistical test. Significance is usually denoted by a p-value, or probability value.

Statistical significance is arbitrary – it depends on the threshold, or alpha value, chosen by the researcher. The most common threshold is p < 0.05, which means that results at least as extreme as the data would occur less than 5% of the time if the null hypothesis were true.

When the p-value falls below the chosen alpha value, then we say the result of the test is statistically significant.

What is the difference between quantitative and categorical variables?

Quantitative variables are any variables where the data represent amounts (e.g. height, weight, or age).

Categorical variables are any variables where the data represent groups. This includes rankings (e.g. finishing places in a race), classifications (e.g. brands of cereal), and binary outcomes (e.g. coin flips).

You need to know what type of variables you are working with to choose the right statistical test for your data and interpret your results.
