6 Worked Example 3 - Complex Designs

This is definitely the most complicated experiment we will analyse in this workbook, as a result of the complex data hierarchy.

6.1 The data

The BIODEPTH (Biodiversity and Ecosystem Process in Terrestrial Herbaceous Ecosystems) project manipulated grassland plant diversity in field plots to simulate the impact of the loss of plant species on ecosystem processes here above ground yield.

There is a gradient of managed species diversity with five levels ranging from monoculture to an attempted maximum for each site. This was repeated at eight different grassland sites across seven countries.

Each level of diversity was repeated with different mixtures of plant species chosen at random, and this is designed to separate the features of diversity from the species composition.

Experiments were repeated in two blocks per site.

biodepth <- read_tsv("files/Biodepth.txt")
biodepth <- biodepth %>% 
  mutate(Mix = factor(Mix),
         Diversity2 = log(Diversity, 2)) %>% 
  drop_na()

# set Mix as a factor
# Set Diversity to log, base 2
head(biodepth)
Plot Site Block Mix Mix_Nested Diversity Shoot2 Shoot3 Diversity2
B1P001 Germany A 5 5 1 463.75 554.75 0
B1P003 Germany A 15 15 2 674.55 539.95 1
B1P004 Germany A 7 7 1 563.40 476.00 0
B1P005 Germany A 16 16 2 567.10 308.45 1
B1P006 Germany A 8 8 1 378.20 447.10 0
B1P008 Germany A 13 13 2 889.15 825.10 1

Important variables are:

  • Shoot2 as one of the variables representing yield, our response or dependent variable.

  • Diversity2 Species richness indicator (on a log base 2 scale)

  • Block, Site, Mix are all random effects

What follows is a complex design, but will hopefully outline the careful thought that must go into designing mixed-models

Q. Discussion - what is wrong with this design?

bio.lmer1 <- lmer(Shoot2 ~ Diversity2 + (1|Site) + (1|Block) + (1|Mix), data = biodepth)

summary(bio.lmer1)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Shoot2 ~ Diversity2 + (1 | Site) + (1 | Block) + (1 | Mix)
##    Data: biodepth
## 
## REML criterion at convergence: 6082.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.2187 -0.5179 -0.1031  0.3941  3.1735 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Mix      (Intercept) 33665.3  183.48  
##  Block    (Intercept)   383.2   19.57  
##  Site     (Intercept) 30163.9  173.68  
##  Residual             22039.8  148.46  
## Number of obs: 451, groups:  Mix, 192; Block, 15; Site, 8
## 
## Fixed effects:
##             Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)  349.488     66.244   8.576   5.276 0.000598 ***
## Diversity2    78.901     12.059 175.159   6.543 6.42e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##            (Intr)
## Diversity2 -0.286

Overall, the model assumes that the response variable "Shoot2" is influenced by the fixed effect "Diversity2" while accounting for the random effects due to the varying levels of "Site," "Block," and "Mix" within the data.

It also suffers from the fact that the design may be nested or crossed, it is not explicit. Basically we know that each block exists only within each site, they should be nested. If each block has a unique code then this is ok, but if for example they are coded as Germany: A/B, Switzerland A/B etc... then we would need to make our design explicity e.g. Site/Block.

This model also does not allow for any random slopes, which we may wish to consider.

Q. Discuss these two designs

bio.lmer1a <- lmer(Shoot2 ~ Diversity2 + (Diversity2 | Site) + (1 | Site/Mix) + (1 | Block), data = biodepth)

#Site/Mix denotes that Mix is nested fully within Site
bio.lmer2 <- lmer(Shoot2 ~ Diversity2 + (Diversity2|Site) + (1 | Site:Mix) + (1|Mix) + (1 | Block), data = biodepth)

# Site:Mix denotes the random effect varies independently for each combination of levels in the Site and Mix factors

In the above designs we have produced a random slope and intercept for the effect of Site. This makes sense if we think that the way diversity affects yield may vary in a random site-specific manner.

What about the nesting or interaction of Site and Mix?

In the 1 | Site/Mix specification represents a nested random effect structure. It suggests that the random effect of Mix is nested within the random effect of Site. This specification assumes that the levels of Mix are nested within each level of Site, meaning that each level of Site has its own set of random effects for the levels of Mix. The / operator denotes the nesting relationship.

On the other hand 1 | Site:Mix specification, the random effect is specified as a two-way crossed random effect. It means that the random effect varies independently for each combination of levels in the Site and Mix factors. The : operator represents the interaction or crossed effect between the two factors. This specification allows for correlations between random effects within each combination of Site and Mix.

To summarize:

1 | Site:Mix: Two-way crossed random effect, allowing for independent variation of random effects for each combination of levels in Site and Mix.

1 | Site/Mix: Nested random effect, with the random effect of Mix nested within the random effect of Site, assuming that the levels of Mix are completely nested within each level of Site.

bio.lmer3 <- lmer(Shoot2 ~ Diversity2 + (Diversity2|Site) + (1 | Site:Mix) + (1|Mix), data = biodepth)
anova(bio.lmer2, bio.lmer3)
npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
bio.lmer3 8 6080.019 6112.911 -3032.010 6064.019 NA NA NA
bio.lmer2 9 6080.168 6117.171 -3031.084 6062.168 1.851387 1 0.1736222

Q. What does the LRT indicate about the effect of block on our model?

Should we remove it from our model?

The LRT indicates that removing block from the design of our model does not produce a simpler model that explains significantly less variance. So in effect we could remove it. However, I would argue it also does not harm the model and it is part of our explicit design, so should be left in.

6.1.1 Model predictions

nested_data <- biodepth %>% 
    group_by(Site) %>% 
    nest()

models <- map(nested_data$data, ~ lm(Shoot2 ~ Diversity2, data = .)) %>% 
    map(predict)
biodepth.2 <- biodepth %>% 
    mutate(fit.m = predict(bio.lmer2, re.form = NA),
           fit.c = predict(bio.lmer2, re.form = ~(1+Diversity2|Site)),
           fit.l = unlist(models))

biodepth.2 %>% 
    ggplot(aes(x = Diversity2, y = Shoot2, colour = Site)) +
    geom_point(pch = 16) + 
    geom_line(aes(y = fit.l), color = "black")+
    geom_line(aes(y = fit.c))+ 
    geom_line(aes(y = fit.m), colour = "grey",
              linetype = "dashed")+
    facet_wrap( ~ Site)+
   coord_cartesian(ylim = c(0, 1200))+
   theme(legend.position = "none")

Q. Can you observe the effect of partial pooling in this analysis?