21 ANOVA

library(tidyverse)
library(janitor)
library(here)
library(skimr)
library(ggridges)
library(lmtest)
library(performance)
library(patchwork)
library(ggridges)
library(qqplotr)

Warning

Set this up as a new R project, with organised folders.

Make sure to include some data checking in your script.

21.1 Analysis of Variance (ANOVA)

So far we have used linear models for analyses between two ‘categorical’ explanatory variables e.g. t-tests, and for regression with two ‘continuous’ variables. However, as designs become more complicated, the number of comparisons can quickly become overwhelming, working with estimates and intervals alone can become harder.

21.1.2 ANOVA is Useful When We Have More Than Two Groups

When we have more than two groups (i.e., more than two levels of a categorical factor), a t-test would require multiple pairwise comparisons, each introducing an additional chance for Type I errors. If we conducted multiple t-tests (e.g., three comparisons for three groups), the probability of finding at least one significant result by chance (when there is no true effect) increases. ANOVA mitigates this problem by testing an overall hypothesis: it asks if there is a statistically significant difference among the group means without specifying which groups are different. By conducting a single F-test, ANOVA controls the Type I error rate across all group comparisons, maintaining a consistent significance threshold.

21.2 Maize data

Our simple linear model for the maize data was:

lsmodel1 <- lm(height ~ type, data = darwin)

A one-way ANOVA is a way of fitting a linear model with a single explanatory variable. Its main goal is to measure the total variability in the data and split it into two parts: the variability between groups (signal) and the variability within groups (noise). This “signal-to-noise” ratio helps us understand how much of the total variation is explained by our model (the signal) versus what remains unexplained (the noise).

The method for fitting this model is called ordinary least squares (OLS). Here’s how it works:

Total Variation (SST): First, the model calculates the total variation (or total sum of squares, SST) by measuring how far each data point is from the overall mean.

Explained Variation (SSR): Then, the model fits a line (slope) that minimizes the squared distances between the data points and this line, capturing the variability explained by the groups (sum of squares for regression, SSR).

Unexplained Variation (SSE): Finally, the leftover variation, or what the model can’t explain, is measured as the sum of squares for error (SSE).

In this way, the total variability (SST) is divided into explained (SSR) and unexplained (SSE) parts, allowing us to see if the differences between group means are likely due to a real effect.

\[ SST = SSR + SSE \]

Remember:

SST = the sum of squared differences between the data points and the grand mean
SSR = the sum of the squared differences between the grand mean and the predicted position on the linear model
SSE = the sum of the squared differences between the predicted position and the observed position

Least squares quantifies A) total least squares, B) treatment least squares, C) error sum of squares (SST, SSR, SSE). The vertical lines measure the distances, which are then squared and summed. SST is calculated by measuring to the intercept, SSR is calculated by measuring the distance of the estimates to the intercept, SSE is calculated by measuring the distance to the estimates

21.2.1 The ANOVA table

If we want to get the ANOVA table for a linear model we can use the anova() function:

anova(lsmodel1)

	Df	Sum Sq	Mean Sq	F value	Pr(>F)
type	1	51.35208	51.352083	5.939518	0.0214145
Residuals	28	242.08333	8.645833	NA	NA

When you compare this with the output from the summary() function, you’ll see that the F-value, P-value, and degrees of freedom (df) match exactly. However, the ANOVA table itself is fairly minimal—it doesn’t give us biological context, such as differences in means or uncertainties (like standard errors). Despite this, ANOVA tables are commonly reported, though they provide only a basic part of the analysis.

ANOVA tables help simplify effects when there are more than two levels, and understanding their structure is key:

Source (Column 1): Lists the variance source—signal (regression) in the first row and noise (error) in the second.
Degrees of Freedom (Column 2): Shows df for treatments (k - 1) in the first row and residual df (N - k) in the second.
Sum of Squares (Column 3): The total explained variance (SSR) in the first row, and unexplained variance (SSE) in the second.
Mean Sum of Squares (Column 4): The average variance per group, MSR = SSR/(k - 1) for treatments, and MSE = SSE/(N - k) for residuals. This is pooled from all treatments, enhancing power with smaller samples but requiring equal variances across groups (homogeneity).
F-Statistic (Column 5): Our signal-to-noise ratio, calculated as the treatment variance divided by the residual error variance.

\[ F =\frac{SSR/(k-1)}{SSE/(N-k)} = \frac{MSR}{MSE} \]

In our example, with an F value of 5.9, the “signal” (or effect of the treatment) is almost six times greater than the “noise” (within-group variation). To determine if this is significant, we use the F value, along with sample size and treatment levels, to calculate the probability of seeing this ratio if there were actually no treatment effect (null hypothesis).

This probability (the P-value) is found using an F-distribution (see below), which takes into account the degrees of freedom for both the signal (mean squared regression value) and noise (mean squared error value). You can verify this in R with the pf() function, using the F value, degrees of freedom for signal and noise, and setting the last argument to indicate a two-sided test. This should match the P-value in your ANOVA output exactly.

pf(5.9395, df1 = 1, df2 = 28, lower.tail=FALSE)

[1] 0.02141466

Lastly, it’s helpful to remember that F and t are related: for two groups, the F value is simply the square of the t value.

It’s common to see P-values reported without context, but these “naked P-values” are hard to interpret without knowing:

Which test was used
The observed test value
The degrees of freedom.

So the good (and conventional) way to report this result as an ANOVA would be:

The height of the cross-pollinated plants was significantly taller than the height of the self-pollinated plants (F_1,28 = 5.9, P = 0.02).

However, this approach focuses on the statistical test rather than the biological meaning. It doesn’t tell us the actual heights, the estimated difference, or the uncertainty around this estimate.

The self pollinated maize plants measured an average of 17.6 [16-19.1] (mean[95% CI]) inches high, while the cross-pollinated plants had a mean height of 20.2 [18.6-21.7] inches - a difference of 2.6 [-0.4-4.8] inches (one-way ANOVA: F_1,28 = 5.9, P = 0.02).

Lastly, it’s helpful to remember that F and t are related: for two groups, the F value is simply the square of the t value.

The F value is the square of the t value when comparing two groups because both tests measure the same thing: the ratio of between-group to within-group variance. In a two-group case, squaring the t statistic (which compares means) matches the variance ratio approach of F, making F = t² and thus equivalent for two groups.

\[ F= \frac{within~group~variance}{between~group~variance} \]

and in a two-sample t-test:

\[ t^2= \frac{difference~in~means^2}{SE^2} \]

21.3 Two-way ANOVA

Two way ANOVA (as you might guess) includes two explanatory variables. Usually these are treatments of interest, but for this example we will stick with including the pair variable.

lsmodel2 <- lm(height ~ type + as.factor(pair), data = darwin)

And we can look at this table using the anova() function again

anova(lsmodel2)

	Df	Sum Sq	Mean Sq	F value	Pr(>F)
type	1	51.35208	51.352083	4.6138501	0.0497029
as.factor(pair)	14	86.26354	6.161682	0.5536109	0.8597119
Residuals	14	155.81979	11.129985	NA	NA

Note how some of the degrees of freedom which were initially in the residuals, have now been ‘used’ by the new term in the more complex model. And that some of the SSE has now been used by SSR explained by the pair term.

Comparing F-values between different predictors is often a good way of comparing how important they are (what share of variance explained do they have).

When we see an F value less than 1, it suggests that pairing in the experiment isn’t adding useful information—essentially, it’s doing “nothing.” If pairing had no effect, we’d expect an F value around 1, but seeing an F much lower than 1 implies a “negative variance component.” In other words, pairing has actually increased the noise (SSE) relative to the signal (SSR).

When the mean squares for the treatment (MSR) are smaller than for the residuals (MSE), it usually points to a design issue. This could mean there wasn’t enough sampling (more pairs may be needed) or that the pairs weren’t randomly selected.

21.4 Summary

ANOVA tables can be built for any linear model. The tables partition the variance into signal(s) and noise, which can be compared using an F-test. For complex analyses where many pairwise comparisons could be performed, an initial F-test can provide the initial evidence for whether there are any differences at all, reducing the risk of overtesting and the false positives that can be generated.

21.5 ANOVA for frogspawning

Q. How does frogspawn hatching time vary with temperature?

Imagine we ran a manipulative experiment.

A manipulative study is one in which the experimenter changes something about the experimental study system and studies the effect of this change.

We collected newly-layed frogspawn from a pond in the Italian Alps and we brought them back to the lab, where we divided them into 60 water containers. 20 of the containers’ water temperature was kept to 13°C, 20 containers were kept to 18°C and the remaining 20 containers were kept to 25°C. Having a high number of replicates increases our confidence that the expected difference between groups is due to the factor we are interested in. Here, temperature.

We monitored each water container and we recorded hatching times (days until hatching of eggs) in a spreadsheet (here called frogs_messy_data.csv).

Our response variable is Hatching_time.
Our explanatory variable is Temperature, with 3 levels: 13°C, 18°C and 25°C.

We want to compare the means of 3 independent groups (13°C, 18°C and 25°C temperature groups) and we have one continuous response variable (Hatching time in days) and one categorical explanatory variable (Temperature). One-way ANOVA is the appropriate analysis!

21.5.1 Activity 1. Make a Hypothesis

Always make a hypothesis and prediction, before you delve into the data analysis.

A hypothesis is a tentative answer to a well-framed question, referring to a mechanistic explanation of the expected pattern. It can be verified via predictions, which can be tested by making additional observations and performing experiments.

This should be backed up by some level of knowledge about your study system.

In our case, knowing that frogspawn takes around 2-3 weeks to hatch under optimal temperatures (15-20°C), we can hypothesize that the lower the temperature, the longer it will take for frogspawn to hatch. Our hypothesis can therefore be:

Mean frogspawn hatching time will vary with temperature level. Higher temperatures will lead to shorter hatching times. Specifically, at the highest temperature in our study (25°C), we predict that hatching time will be reduced compared to lower temperatures

21.5.2 Activity 2. Import and clean data

After setting up your R project, import and tidy your dataset.

Tip

Remember to organise your analysis script - use the document outline feature to help you.

Is your data in a tidy format according to {Appendix C}?

Start by using functions such as skimr::skim() or glimpse() or simple inspecting the tibble to look at data formats

#___________________________----

# SET UP ----
## An analysis of the development time of frogspawn in response to water temperature ----
#___________________________----

# PACKAGES ----
library(tidyverse)
library(here)
library(skimr)
library(ggridges)
library(lmtest)
library(patchwork)
#___________________________----

# IMPORT DATA ----
frogs <- read_csv(here("data", "frogs_messy_data.csv"))
#___________________________----
# CHECK DATA ----

# A data skim indicates that the column Temperature18 is not in numerical format
# Temperature15 has at least one impossible value of 999 days for egg hatching
# There ARE lots of NAs but this is because data is in a wide format

skimr::skim(frogs)

# TIDY DATA ----

frogs <- frogs |> 
  #make snake_case names and remove redundant Temperature label
  rename("13" = Temperature13,
         "18" = Temperature18,
         "25" = Temperature25,
         frogspawn_id = `Frogspawn sample id`) |> 
  pivot_longer(`13`:`25`, 
               names_to="temperature", 
               values_to="days",
               values_drop_na = TRUE)

`r unhide()`

Your turn

Check the distribution of the data
Remove any impossible values
Optional - check normality by subgroups

#___________________________----

# CHECK FOR OUTLIERS ----
# histograms, dotplots or density plots works here
frogs |> 
ggplot(aes(x = days, 
           y = temperature)) + 
  ggridges::geom_density_ridges()

# Remove impossible days values
frogs <- frogs |> 
  filter(days < 900)
#___________________________----
# CHECK NORMALITY----

# recheck distributions and assess normality  
frogs |> 
ggplot(aes(x = days, 
                     y = temperature)) + 
  ggridges::geom_density_ridges()

library(qqplotr)
ggplot(frogs, aes(sample = days))+
  # stat_qq_band requires qqplotr and the latest version of ggplot2
    stat_qq_band() +
    stat_qq_line() +
    stat_qq_point() +
  facet_wrap(~temperature)

21.5.3 Activity 3. Design a linear model

With the data tidied can you produce a linear model, check its assumptions and interpret its findings?

# ANALYSIS ----
lsmodel_frogs1 <- lm(days ~ temperature, data = frogs)

 summary(lsmodel_frogs1)

# anova(lsmodel_frogs1)

# broom::tidy(lsmodel_frogs1, conf.int = T)
#___________________________----


Call:
lm(formula = days ~ temperature, data = frogs)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.20000 -0.61905 -0.04762  0.80000  1.95238 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)    26.6190     0.2292  116.12   <2e-16 ***
temperature18  -5.5714     0.3242  -17.19   <2e-16 ***
temperature25 -10.4190     0.3282  -31.75   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.05 on 59 degrees of freedom
Multiple R-squared:  0.9448,    Adjusted R-squared:  0.943 
F-statistic: 505.4 on 2 and 59 DF,  p-value: < 2.2e-16

The summary of the model tells us lots of things:

The mean days to development is 26.62 ±0.23 ( mean SE) days at 13 degrees celsius.
The time to development is 5.57 (± 0.32) days less than the intercept at 18 degrees
The time to development is 10.42 (± 0.32) days less than the intercept at 25 degrees

The ANOVA shows that temperature does explain a significant amount of the variance in time to hatching F_2,59 = 505.4, P < 0.001

The R squared shows that 94% of the variance in time to hatching is explained by temperature

21.6 Checking the fit of a model

After conducting a linear model analysis, we often find ourselves with results and inferences. However, before we can fully trust our analysis, it’s essential to check whether the model fits our data well. This is often described as whether our model meets certain “Assumptions” about the data. If our model deviates from these assumptions, the results are not to be trusted! More on this later

We can use base R commands like plot() or the performance package to inspect our models.

In linear models these are as follows:

21.6.1 Assumption 1: Normality of Residuals

Residuals are the differences between observed values and the fitted values produced by the model. In our case, this involves analyzing plant heights against treatment means. The normality of residuals is crucial because the linear model relies on this assumption to compute standard errors, which in turn affects confidence intervals and p-values.

21.6.1.1 Graphical Checks for Normality

To visually assess the normality of residuals, we can utilize various plotting functions in R. Two effective methods are:

Quantile-Quantile (QQ) Plot: This plot compares the distribution of residuals to a theoretical normal distribution. If the residuals follow a normal distribution, they should approximately align along a diagonal line.

plot(lsmodel_frogs1, which=c(2,2))

performance::check_model(lsmodel_frogs1, check=c("normality","qq"),
                         detrend = F)

21.6.1.2 Formal test of Normality

Warning

Formal tests (like Shapiro-Wilk) are sensitive to sample size:

With small samples, these tests often lack power, failing to detect real departures from normality.

With large samples, even minor deviations from normality can yield a significant p-value, suggesting non-normality even when it’s not practically concerning. Graphs, on the other hand, allow you to visually assess how much any deviation might matter in practical terms

shapiro.test(residuals(lsmodel_frogs1))


    Shapiro-Wilk normality test

data:  residuals(lsmodel_frogs1)
W = 0.97044, p-value = 0.1398

performance::check_normality(lsmodel_frogs1)

OK: residuals appear as normally distributed (p = 0.140).

21.6.2 Assumption 2: Homogeneity of Variance

The assumption of homogeneity of variance (or homoscedasticity) states that the residual variance should be constant across all levels of the independent variable(s). This is important because linear models often pool variance across groups to make inferences.

21.6.2.1 Graphical Checks for Homogeneity

To assess this assumption, we can plot the residuals against the fitted values:

plot(lsmodel_frogs1, which=c(1,3))

performance::check_model(lsmodel_frogs1, check=c("homogeneity"))

21.6.2.2 Formal Tests for Homogeneity of Variance

To formally test for homogeneity of variance, we can use:

Breusch-Pagan Test: This test evaluates whether the variance of residuals is constant.

library(lmtest)

bptest(lsmodel1, varformula = ~fitted.values(lsmodel_frogs1), studentize = F )


    Breusch-Pagan test

data:  lsmodel1
BP = 0.3423, df = 1, p-value = 0.5585

performance::check_heteroscedasticity(lsmodel_frogs1)

OK: Error variance appears to be homoscedastic (p = 0.161).

21.6.3 Assumption 3: Linearity

Linear regression assumes a linear relationship between the predictor(s) and the response variable. If this assumption is violated, the model may produce biased or inaccurate predictions.

21.6.3.1 Graphical checks for linearity

To assess this assumption, we can plot the residuals against the fitted values:

plot(lsmodel_frogs1, which=c(1,3))

performance::check_model(lsmodel_frogs1, check=c("linearity"))

Checking for linearity is less relevant when we only have categorical predictors:

No Intermediate Values: Fixed level predictors means there are no intermediate values to examine. This means we should always be able to fit a straight line

21.6.4 Outliers and Their Impact

Outliers can significantly affect the estimates of a model, influencing both residuals and overall model fit. Therefore, it’s essential to assess the presence and impact of outliers on our analysis.

21.6.4.1 Identifying Outliers

To identify outliers, we can employ Cook’s Distance, which measures the influence of each data point on the model fit. Large values of Cook’s Distance suggest that a point may be having an outsized effect.

plot(lsmodel_frogs1, which=c(4,4))

performance::check_model(lsmodel_frogs1, check=c("outliers"))

21.6.4.2 Formal Tests for Outliers

performance::check_outliers(lsmodel_frogs1)

OK: No outliers detected.
- Based on the following method and threshold: cook (0.5).
- For variable: (Whole model)

21.6.5 Quick model checks

Hit Enter in the console to cycle through plots

plot(lsmodel_frogs1)

performance::check_model(lsmodel_frogs1,
                         detrend = F)

21.7 Making Figures

21.7.1 Activity 4. Design a figure

Can you produce a suitable Figure for this dataset?

There are multiple ways to present the same data, equally valid, but emphasising different concepts. For example this first figure uses a boxplot and data points to illustrate the differences. Remember the median and IQR are essentially descriptive statistics, not inferential.

Frogspawn hatching times at 13, 18 and 25 degrees Celsius. Boxplot displays median, hinges are first and third quartiles, whiskers extend from the hinge to 1.5X the interquartile range. Points represent individual frogspawns.

Whereas the next figure, uses the emmeans() package to produce estimate means and confidence intervals from the lm() and therefore is the produce of inferential statistics, this figure illustrates the estimates of our model rather than the parameters of our sample.

Neither method is ‘best’.

Time to hatching is inversely related to temperature in frogspawn. Circles represent estimated mean hatching times with 95% confidence intervals from a one-way ANOVA (F_1,28 = 385.9, P < 0.001). Dashed lines indicate the slope of the mean difference between 13-18 degrees and 13-25 degrees Celsius. Faded points represent individual data points.

21.7.1.1 Task

Can you write a summary of the Results?

Increasing temperatures had a clear effect on reducing the time taken for frogspawn to hatch (one-way ANOVA: F_2,59 = 505.4, P < 0.001). At 13\(^\circ\) C the mean time to hatching was 26.62 days [26.16-27 95% CI], this reduced by an average of 5.57 days [4.92 - 6.22] at 18\(^\circ\) C and by 10.42 days [9.76 - 11.08] at 25\(^\circ\) C.

21.1 Analysis of Variance (ANOVA)

21.1.1 ANOVA and the t-test Share a Common Foundation

21.1.2 ANOVA is Useful When We Have More Than Two Groups

21.2 Maize data

21.2.1 The ANOVA table

21.3 Two-way ANOVA

21.4 Summary

21.5 ANOVA for frogspawning

21.5.1 Activity 1. Make a Hypothesis

21.5.2 Activity 2. Import and clean data

21.5.3 Activity 3. Design a linear model

21.6 Checking the fit of a model

21.6.1 Assumption 1: Normality of Residuals

21.6.1.1 Graphical Checks for Normality

21.6.1.2 Formal test of Normality

21.6.2 Assumption 2: Homogeneity of Variance

21.6.2.1 Graphical Checks for Homogeneity

21.6.2.2 Formal Tests for Homogeneity of Variance

21.6.3 Assumption 3: Linearity

21.6.3.1 Graphical checks for linearity

21.6.4 Outliers and Their Impact

21.6.4.1 Identifying Outliers

21.6.4.2 Formal Tests for Outliers

21.6.5 Quick model checks

21.7 Making Figures

21.7.1 Activity 4. Design a figure

21.7.1.1 Task