DI Models

Starter Pack

Welcome to the Diversity-Interactions (DI) models starter pack. Here you will find some simple examples of how to fit DI models to data, including details of the models, code to fit the models, and how to interpret the results.

Installing and Loading the DImodels R Package

Before you get started, we recommend you install and load the DImodels R package.

As the DImodels package is available on CRAN, installing it is simple. We use the install.packages() function, specifying DImodels (with the quotes), to install the latest version of the package.

install.packages("DImodels")

Next, in order to have the functions in the package available to our current R session, we load the installed package using the library() function, specifying DImodels (this time without the quotes).

library(DImodels)

And now the package is ready for use!

You should only have to install the package once (and again if an updated version is available on CRAN) but will have to load the package each time you start a new R session. It is good coding practice to load all required libraries/packages at the start of any R file that you make.

What will I learn from looking at this example?

In this example, we describe how to implement DI models using R (version ≥ 4.1.2) for a two species system using a simulated dataset consisting of two unique species and a treatment factor with two levels. The models described could be used to answer research questions like whether species diversity has an impact on the ecosystem function and if yes, is it affected by only the species identities or also their interaction.

Note that in all models fitted to this dataset, it is assumed that \(\theta\) (the non-linear interaction parameter) is equal to 1. Find out more about this non-linear parameter in the Deep Dive section.
Summary of Data
image description image description
training-starterPack-2species-summary.knit

In this example, we simulate a dataset consisting of 11 communities comprised of different combinations of two plant species (\(Sp1\) and \(Sp2\)), with each community replicated three times under two fertiliser treatments (“low” and “high”). The response or ecosystem function is \(y\) and is simulated under the model:

\[\large y = \beta_1 p_1 + \beta_2 p_2 + \delta_{12} p_1 p_2 + \alpha_k + \epsilon\]

where \(\beta_1\) and \(\beta_2\) are the identity effects of species \(Sp1\) and \(Sp2\) respectively, with \(p_1\) and \(p_2\) being their respective proportions, \(\delta_{12}\) is the interaction effect between the two species, and \(\alpha_k\) is the effect of treatment (which has \(k = 1, 2\) levels here). \(\epsilon\) is the error term and is assumed to be normally distributed with constant variance.

The response is simulated using true values of 5 and 3 for the identity effects \(\beta_{1}\) and \(\beta_{2}\) for species \(Sp1\) and \(Sp2\), respectively, 12 for the interaction effect \(\delta_{12}\) and 6 for the high treatment effect \(\alpha_2\) (and \(\alpha_1\) = 0 for the low treatment effect). A random normal error term \(\epsilon\) is added to the response with mean 0 and variance 1.

# Seed for reproducibility
set.seed(123)

# Simulate communities
two_data <- data.frame("Sp1" = rep(seq(0, 1, .1), 6),
                       "Sp2" = rep(seq(1,0, by = -.1), 6))
two_data$Sp1Sp2 <- two_data$Sp1 * two_data$Sp2

# Add Treatments (assume low fertiliser treatment is 0 and high is 1)
two_data$Treatment <- rep(c(0,1), each = 33)

# Set model coefficients
Sp1_id <- 5
Sp2_id <- 3
Sp1Sp2_int <- 12
Treat <- 6

coeffs <- matrix(c(Sp1_id, Sp2_id, Sp1Sp2_int, Treat), ncol = 1)

# Simulate response
two_data$Response <- as.matrix(two_data) %*% coeffs

# Add random normal error term
two_data$Response <- two_data$Response + rnorm(nrow(two_data), 0, 1)

# Recode treatment as high and low
two_data$Treatment <- factor(ifelse(two_data$Treatment == 0, "low", "high"), levels = c("high", "low"))

# Add unique ID for each community
two_data$Community <- rep(seq(1, 11), times = 6)

# Reorder columns 
two_data <- two_data[, c("Community", "Sp1", "Sp2", "Sp1Sp2",
                         "Treatment", "Response")]

head(two_data, 6)
##   Community Sp1 Sp2 Sp1Sp2 Treatment Response
## 1         1 0.0 1.0   0.00       low 2.439524
## 2         2 0.1 0.9   0.09       low 4.049823
## 3         3 0.2 0.8   0.16       low 6.878708
## 4         4 0.3 0.7   0.21       low 6.190508
## 5         5 0.4 0.6   0.24       low 6.809288
## 6         6 0.5 0.5   0.25       low 8.715065

The data columns are as follows:

Community: A unique identifier for each community in the design.
Sp1: A numeric vector indicating the initial proportion of species 1.
Sp2: A numeric vector indicating the initial proportion of species 2.
Sp1Sp2: A numeric vector indicating the product of the proportions of species 1 and 2.
Treatment: A factor with levels “low” and “high” identifying the two fertilisation treatments.
Response: A numeric vector indicating the value of the measured ecosystem function.

Model Fitting using DImodels Package
image description image description
Here we fit a range of DI models to the two species dataset.

training-starterPack-2species-fitting_str.knit

In this model structure, we assume that changing species diversity does not have any effect on the ecosystem function. This implies that species identities and interactions coefficients are both 0, leaving only the experimental structures (treatments, blocking structure, etc.) as fixed effects in the model. An example of this model for this data is as follows:

\[\large y= \mu + \alpha_k + \epsilon\]

where \(y\) is a single response variable, \(\mu\) is a constant, \(\alpha_k\) is the effect of the treatment which is a factor with two levels (\(k = 1, 2\)). The error term, \(\epsilon\), is assumed normally distributed with mean 0 and constant variance.

We supply the DI() function with our response column (by name), our initial species (Sp1 and Sp2) proportions (again, either by name or column number), the treatment column (by name or column number), the type of DI model that we want to fit (“STR”), and our dataset (two_data).

STRModel <- DI(y = "Response", prop = c("Sp1", "Sp2"), treat = "Treatment", DImodel = "STR", data = two_data)
## Fitted model: Structural 'STR' DImodel
STRModel <- DI(y = "Response", prop = 2:3, treat = 5, DImodel = "STR", data = two_data)
## Fitted model: Structural 'STR' DImodel
summary(STRModel)
## 
## Call:
## glm(formula = fmla, family = family, data = data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.8143  -0.9987   0.1496   1.0318   2.9268  
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     5.7883     0.2540   22.79   <2e-16 ***
## Treatmenthigh   6.0748     0.3592   16.91   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 2.128805)
## 
##     Null deviance: 745.14  on 65  degrees of freedom
## Residual deviance: 136.24  on 64  degrees of freedom
## AIC: 241.14
## 
## Number of Fisher Scoring iterations: 2
training-starterPack-2species-fitting_id.knit

In this model structure, we assume that species do not interact with one another. This implies that species interactions don’t affect the ecosystem function leaving only the species proportions as fixed effects and experimental structures. An example of this model is as follows:

\[\large y=\sum_{i=1}^{S=2} \beta_{i} p_{i} + \alpha_k + \epsilon\]

where \(y\) is a single response variable or the ecosystem function, \(S\) is the number of species, and \(p_{i}\) represents the initial proportion of the \(i^{th}\) species. \(\beta_{i}\) scaled by \(p_{i}\) is the expected contribution of the \(i^{th}\) species to the response and is referred to as the \(i^{th}\) species’ identity effect, \(\alpha_k\) is the effect of the treatment which is a factor with two levels (\(k = 1, 2\)). The error term, \(\epsilon\), is assumed normally distributed with mean 0 and constant variance.

For this dataset, the ID model can also be written as:

\[\large y=\beta_{1} p_{1} +\beta_{2} p_{2} + \alpha_k + \epsilon\]

We supply the DI() function with the same parameters as the previous (STR) model, but change the DIModel parameter to ‘ID’ now.

IDModel <- DI(y = "Response", prop = c("Sp1", "Sp2"), treat = "Treatment", DImodel = "ID", data = two_data)
## Fitted model: Species identity 'ID' DImodel
summary(IDModel)
## 
## Call:
## glm(formula = new_fmla, family = family, data = new_data)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.87594  -0.94669   0.02013   0.91477   2.92678  
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## Sp1_ID          6.7266     0.3497   19.23   <2e-16 ***
## Sp2_ID          4.8499     0.3497   13.87   <2e-16 ***
## Treatmenthigh   6.0748     0.3297   18.43   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 1.793625)
## 
##     Null deviance: 5886  on 66  degrees of freedom
## Residual deviance:  113  on 63  degrees of freedom
## AIC: 230.79
## 
## Number of Fisher Scoring iterations: 2
training-starterPack-2species-fitting_av.knit

The average pairwise interaction structure assumes that all species interact in the same way. However, in the case of a two-species example, as in this dataset, there are not multiple pairwise interactions to average over and this model cannot be fit for this dataset.

See the three-species and nine-species examples in the Starter Pack for examples of fitting the AV model structure.

If we try to fit the pairwise interaction model to this dataset, we call the DI() function with the DImodel parameter set to ‘AV’, but it won’t fit.

AVModel <- DI(y = "Response", prop = c("Sp1", "Sp2"), treat = "Treatment", DImodel = "AV", data = two_data)
## Error in DI(y = "Response", prop = c("Sp1", "Sp2"), treat = "Treatment", : you must have > 2 species to fit model AV (Average interactions 'AV' DImodel)
training-starterPack-2species-fitting_fg.knit

The functional group model groups the pairwise interactions according to functional group membership. However, when there are only two species, there is only one possible pairwise interaction and so it is not possible to group interaction terms. Thus for this dataset, the functional group model cannot be fit.

See the nine-species example in the Starter Pack for an example of fitting the functional group model.

training-starterPack-2species-fitting_add.knit

This additive species model assumes that the contribution of a species in its interaction is the same regardless of the other species it interacts with. In this case, there are only two species and so the additive species model does not make sense. More than three species are needed to fit the additive species model, see the nine-species example in the Starter Pack for an example of fitting the additive species model.

We can try to fit this model using the DI function with the DImodel parameter set to ‘ADD’, however, it will return an error:

ADDModel <- DI(y = "Response", prop = c("Sp1", "Sp2"), treat = "Treatment", DImodel = "ADD", data = two_data)
## Error in DI(y = "Response", prop = c("Sp1", "Sp2"), treat = "Treatment", : you must have > 3 species to fit model ADD (Additive species contributions to interactions 'ADD' DImodel)
training-starterPack-2species-fitting_full.knit

This interaction structure assumes that each pair of species interacts uniquely.

\[\large y=\sum_{i=1}^{S=2} \beta_{i} p_{i} + \sum_{\substack{i,j=1 \\ i<j}}^{S=2} \delta_{ij} (p_{i} p_{j}) + \alpha_k + \epsilon\]

where \(y\) is a single response variable, \(S\) is the number of species, and \(p_{i}\) represents the initial proportion of the \(i^{th}\) species. The parameter \(\beta_{i}\) scaled by \(p_{i}\) is the expected contribution of the \(i^{th}\) species to the response and is referred to as the \(i^{th}\) species’ identity effect; if no species interactions or treatments are required in the model, the response \(y\) is the weighted average of the species identity effects. The parameter \(\delta_{ij}\) is the interaction between species \(i\) and \(j\) and is scaled by the product of \(p_{i}\) and \(p_{j}\) to determine the contribution of the interaction to the expected response. The parameter \(\alpha_k\) is the effect of the treatment which is a factor with two levels (\(k = 1, 2\)). The error term, \(\epsilon\), is assumed normally distributed with mean 0 and constant variance.


For this dataset, the full pairwise model can also be written as:

\[\large y=\beta_{1} p_{1}+ \beta_{2} p_{2} + \delta_{12} (p_{1} p_{2}) + \alpha_k + \epsilon\]

There is only one interaction term, because there are only two species in this example.


We supply the DI() function with the same parameters as the previous model, but change the DImodel parameter to ‘FULL’ now.

FULLModel <- DI(y = "Response", prop = c("Sp1", "Sp2"), treat = "Treatment", DImodel = "FULL", data = two_data)
## Fitted model: Separate pairwise interactions 'FULL' DImodel
summary(FULLModel)
## 
## Call:
## glm(formula = new_fmla, family = family, data = new_data)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.84905  -0.50296  -0.02355   0.59278   2.01167  
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## Sp1_ID          5.0825     0.3038  16.729  < 2e-16 ***
## Sp2_ID          3.2058     0.3038  10.552 1.80e-15 ***
## `Sp1:Sp2`      10.9608     1.2659   8.658 2.87e-12 ***
## Treatmenthigh   6.0748     0.2236  27.167  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.8250075)
## 
##     Null deviance: 5886.04  on 66  degrees of freedom
## Residual deviance:   51.15  on 62  degrees of freedom
## AIC: 180.48
## 
## Number of Fisher Scoring iterations: 2
Model Selection
image description image description
training-starterPack-2species-selection.knit

From these models, we can formally test for our best model in many ways. One option is by comparing each of their AIC values (given in the summary() outputs above, or separately as below).

# Get the Akaike information criterion (AIC) for fitted models
AIC(STRModel); AIC(IDModel); AIC(FULLModel);
## [1] 241.136
## [1] 230.7893
## [1] 180.4776

As the full pairwise model (FULLModel) has the lowest AIC, we can infer that it best describes the BEF relationship out of the chosen models.

Another option is the use of the anova() function in R to perform an F-test. This will test the hypothesis that the added interaction terms are not significantly different from 0, and therefore whether the extra terms in the model explain enough variance to be worth the extra complexity. These tests are possible because the presented interaction structures have a nested structure (as described in Kirwan et al 2009).

# Anova test
anova(STRModel, IDModel, FULLModel, test = "F")
## Analysis of Deviance Table
## 
## Model 1: Response ~ Treatment
## Model 2: Response ~ 0 + Sp1_ID + Sp2_ID + Treatmenthigh
## Model 3: Response ~ 0 + Sp1_ID + Sp2_ID + `Sp1:Sp2` + Treatmenthigh
##   Resid. Df Resid. Dev Df Deviance      F    Pr(>F)    
## 1        64     136.24                                 
## 2        63     113.00  1   23.245 28.176 1.582e-06 ***
## 3        62      51.15  1   61.848 74.966 2.874e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In the cases where the F-test is significant (i.e. below the chosen \(\alpha\) threshold, usually 0.05), we can reject the null hypothesis that the added terms are equal to 0 and so conclude that more complex model is a better fit for the data. The first test here (F = 28.176, p < 0.001) confirms that the species identity effects are explaining significant variation. The second test (F = 74.966, p < 0.001) confirms that the interaction between the two species is needed.

Interpretation of the Selected Model
image description image description
training-starterPack-2species-interpret.knit

Interpreting model coefficients

With the full pairwise model was selected as best, we can now interpret the parameters!

summary(FULLModel)$coefficients
##                Estimate Std. Error   t value     Pr(>|t|)
## Sp1_ID         5.082510  0.3038232 16.728512 1.152043e-24
## Sp2_ID         3.205813  0.3038232 10.551574 1.798994e-15
## `Sp1:Sp2`     10.960824  1.2659300  8.658318 2.874219e-12
## Treatmenthigh  6.074773  0.2236078 27.167086 3.876526e-36

The ID effects (\(\hat{\beta_1} = 5.08\) and \(\hat{\beta_2} = 3.21\)) are the predicted monoculture performances of species \(Sp_1\) and \(Sp_2\), with \(Sp_1\) being the best monoculture as 5.08 > 3.21. In mixture, the combination of weighted identity effects estimates and the interaction estimate scaled by the product of the proportions determine the predicted ecosystem function. Here, the species interaction parameter is significant and positive (\(\hat{\delta}_{12} =10.96\), p \(<0.001\)) implying that the species have a synergistic relationship.

The parameter estimate, \(\hat{\alpha}_1\) represents the low level of the treatment and is set to 0, while \(\hat{\alpha}_2\) represents the high level of the treatment. The treatment effect (\(\hat{\alpha}_2 = 6.07\)) being significant (p < 0.001) implies a difference between the low and high levels of the treatment.

Prediction and inference

To determine which combinations of species proportions give the best predicted ecosystem function, we can predict from the model for a range of communities. For example, predicting the performance of the monoculture of species \(Sp_1\) under the ‘low’ treatment gives:

\[\large \hat{y} = 5.08*1 + 3.21*0 + 10.96*1*0 = 5.08\]

While predicting for a \(50:50\) mixture under the ‘low’ treatment gives:

\[\large \hat{y} = 5.08*0.5 + 3.21*0.5 + 10.96*0.5*0.5 = 6.885\]

We can use the predict() function to automatically generate these predictions in R and also use the contrasts_DI() function to formally test between different communities.

# Store communites of interest in data frame
my_communities <- as.data.frame(matrix(c(1, 0,
                                         0.5, 0.5),
                                nrow = 2, byrow = T))
my_communities$Treatment <- c("low", "low")
colnames(my_communities) <- c("Sp1", "Sp2", "Treatment")
# Make prediction 
# (Don't need to add interaction terms, the predict function will calculate those)
predict(FULLModel, newdata = my_communities)
## [1] 5.082510 6.884367
# Communities for comparison
mono_1 <- c(1, 0, 0, 0) # Species 1 monoculture at treatment 1
two_mix <- c(0.5, 0.5, 0.25, 0) # 50:50 mix of two species at treatment 1

# Contrast between the two species
contr <- contrasts_DI(FULLModel, contrast = two_mix - mono_1,
                      alternative = "greater")
## Generated contrast matrix:
##        Sp1_ID Sp2_ID `Sp1:Sp2` Treatmenthigh
## Test 1   -0.5    0.5      0.25             0
summary(contr)
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Fit: glm(formula = new_fmla, family = family, data = new_data)
## 
## Linear Hypotheses:
##             Estimate Std. Error z value   Pr(>z)    
## Test 1 <= 0   1.8019     0.3625   4.971 3.34e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

These results show that the 50:50 mixture of \(Sp_1\) and \(Sp_2\) performs significantly better (at the \(\alpha = 0.05\) level) than the \(Sp_1\) monoculture.

Visualising results

We can visualise the predictions for various communities in the experiment and split the ecosystem function into the contribution of species identities and their interactions.


A major advantage of DI models is that besides predicting for communities within the experiment, we can predict across the entire gradient of species proportions within the simplex. This can help to identify the best performing community for a particular response.

Here we can see the response for all possible two species communities split into the contribution of the identity effects of the two species and their interaction. This is useful to see the variation in the response as the proportion of 1 species increases or decreases. We can also identify the ‘best’ performing community (the highest point of the curve, highlighted in the figure: a 60:40 mix of species 1 and 2).


To see further details for how best to show off your model results in a plot/graph, see Visualising DI Models.

What will I learn from looking at this example?

In this example, we describe how to implement simple DI models using R (version ≥ 4.1.2) code. The dataset used contains three species, with no functional groupings and no treatment or blocking effects. The models described could be used to answer a variety of possible research questions, such as whether or not the identity of the species present in a community (rather than simply the number of species present) has any effect on the ecosystem function output, or if any of the species present interact in a significant way to improve the ecosystem function response, and if so, what is the best form of the interactions?

Note that in all models fitted to this dataset, it is assumed that \(\theta\) (the non-linear interaction parameter) is equal to 1. Find out more about this non-linear parameter in the Deep Dive section.
Summary of data
image description image description
training-starterPack-3species-summary.knit

In this example, we use the “sim0” dataset available in the DImodels R package. It contains 16 unique communities comprised of differing combinations of three species proportions (p1, p2, p3), with each community replicated four times and an ecosystem function measurement (response).

We can load and view the first few rows of the dataset:

data("sim0")
head(sim0)
##   community richness  p1  p2  p3 response
## 1         1        1 1.0 0.0 0.0   24.855
## 2         2        1 0.0 1.0 0.0   19.049
## 3         3        1 0.0 0.0 1.0   16.292
## 4         4        2 0.8 0.2 0.0   31.529
## 5         5        2 0.2 0.8 0.0   25.102
## 6         6        2 0.8 0.0 0.2   24.615

This dataset matches the recommended wide format for using DI models, with a separate column for each species’ proportions and the response.

Model Fitting using DImodels Package
image description image description
Here we fit a range of DI models to the three species dataset.

training-starterPack-3species-fitting_id.knit

In this model structure, we assume that species do not interact with one another. This implies that species interactions don’t affect the ecosystem function leaving only the species proportions as fixed effects. An example of this model is as follows:

\[\large y=\sum_{i=1}^{S=3} \beta_{i} p_{i} + \epsilon\]

where \(y\) is a single response variable, \(S\) is the number of species, and \(p_{i}\) represents the initial proportion of the \(i^{th}\) species. \(\beta_{i}\) scaled by \(p_{i}\) is the expected contribution of the \(i^{th}\) species to the response and is referred to as the \(i^{th}\) species’ identity effect. The error term, \(\epsilon\), is assumed to be normally distributed with constant variance.

For this dataset, the ID model can also be written as:

\[\large y=\beta_{1} p_{1} + \beta_{2} p_{2} + \beta_{3} p_{3} + \epsilon\]

To fit the model, we supply the DI() function with our response column (either by name or column number), our initial species proportions (p1, p2, p3) (again, either by name or column number), the type of DI model that we want to fit (“ID”), and our dataset (sim0).

IDModel <- DI(y = "response", prop = c("p1", "p2", "p3"), DImodel = "ID", data = sim0)
## Fitted model: Species identity 'ID' DImodel
IDModel <- DI(y = 6, prop = 3:5, DImodel = "ID", data = sim0)
## Fitted model: Species identity 'ID' DImodel
summary(IDModel)
## 
## Call:
## glm(formula = fmla, family = family, data = data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -9.7126  -2.5352   0.3654   3.2908   6.9095  
## 
## Coefficients:
##       Estimate Std. Error t value Pr(>|t|)    
## p1_ID   29.211      1.264   23.10   <2e-16 ***
## p2_ID   26.471      1.264   20.94   <2e-16 ***
## p3_ID   21.537      1.264   17.03   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 19.11035)
## 
##     Null deviance: 43863.7  on 64  degrees of freedom
## Residual deviance:  1165.7  on 61  degrees of freedom
## AIC: 375.37
## 
## Number of Fisher Scoring iterations: 2
training-starterPack-3species-fitting_av.knit

This interaction structure assumes that all pairs of species interact in the same way.

\[\large y = \sum_{i=1}^{S=3} \beta_{i} p_{i} + \delta_{AV} \sum_{\substack{i,j=1 \\ i<j}}^{S=3} p_{i} p_{j} + \epsilon\]

where \(y\) is a single response variable, \(S = 3\) is the number of species, and \(p_{i}\) represents the initial proportion of the \(i^{th}\) species. The parameter \(\beta_{i}\) scaled by \(p_{i}\) is the expected contribution of the \(i^{th}\) species to the response and is referred to as the \(i^{th}\) species’ identity effect; \(\delta_{AV}\) is the coefficient of the average interaction term and represents the interactions between any pair of species. The error term, \(\epsilon\), is assumed to be normally distributed with constant variance.

For this dataset, the AV model can also be written as:

\[\large y = \beta_{1} p_{1} + \beta_{2} p_{2} + \beta_{3} p_{3} + \delta_{AV} (p_1p_2 + p_1p_3 + p_2p_3) + \epsilon\]



To fit the model, we supply the DI() function with our response column (either by name or column number), our initial species proportions (p1, p2, p3) (again, either by name or column number), the type of DI model that we want to fit (“AV”), and our dataset (sim0).

AVModel <- DI(y = "response", prop = c("p1", "p2", "p3"), DImodel = "AV", data = sim0)
## Fitted model: Average interactions 'AV' DImodel
summary(AVModel)
## 
## Call:
## glm(formula = fmla, family = family, data = data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.6841  -1.6560  -0.2981   1.3512   4.7265  
## 
## Coefficients:
##       Estimate Std. Error t value Pr(>|t|)    
## p1_ID   22.671      0.787   28.81   <2e-16 ***
## p2_ID   19.930      0.787   25.32   <2e-16 ***
## p3_ID   14.996      0.787   19.05   <2e-16 ***
## AV      36.295      2.644   13.73   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 4.691179)
## 
##     Null deviance: 43863.70  on 64  degrees of freedom
## Residual deviance:   281.47  on 60  degrees of freedom
## AIC: 286.42
## 
## Number of Fisher Scoring iterations: 2
training-starterPack-3species-fitting_fg.knit

The functional group model groups the pairwise interactions according to functional group membership. However, in this three species example, there are no defined functional groups, i.e., the species have not been classified according to any functional traits or groupings, and so it is not appropriate to fit the functional group model to this data.

See the nine-species example in the Starter Pack for an example of fitting the functional group model.

training-starterPack-3species-fitting_add.knit

This interaction structure assumes that the contribution of a species in its interaction is the same regardless of the other species it interacts with.

\[\large y = \sum_{i=1}^{S = 3} \beta_{i} p_{i} + \sum_{\substack{i,j=1 \\ i<j}}^{S = 3} (\lambda_{i} + \lambda{j}) (p_{i} p_{j}) + \epsilon\]

where \(y\) is a single response variable, \(S\) is the number of species, and \(p_{i}\) represents the initial proportion of the \(i^{th}\) species. \(\beta_{i}\) scaled by \(p_{i}\) is the expected contribution of the \(i^{th}\) species to the response and is referred to as the \(i^{th}\) species’ identity effect; \(\lambda_i\) is the contribution of species \(i\) to the interaction with species \(j\). The error term, \(\epsilon\), is assumed to be normally distributed with constant variance.

In order to fit this structure, enter DImodel = “ADD”. However, for a dataset with less than four species (such as this example), this model has the same number of terms as the full pairwise model and is therefore redundant and an error message is returned.

ADDModel <- DI(y = "response", prop = c("p1", "p2", "p3"), DImodel = "ADD", data = sim0)
## Error in DI(y = "response", prop = c("p1", "p2", "p3"), DImodel = "ADD", : you must have > 3 species to fit model ADD (Additive species contributions to interactions 'ADD' DImodel)



An example of fitting the additive species model can be seen in the nine species example in the Starter Pack.

training-starterPack-3species-fitting_full.knit

This interaction structure assumes that each pair of species interacts uniquely.

\[\large y=\sum_{i=1}^{S=3} \beta_{i} p_{i} + \sum_{\substack{i,j=1 \\ i<j}}^{S=3} \delta_{ij} (p_{i} p_{j}) + \epsilon\]

where \(y\) is an ecosystem function response variable, \(S\) is the number of species and \(p_{i}\) represents the initial proportion of the \(i^{th}\) species, \(\beta_{i}\) scaled by \(p_{i}\) is the expected contribution of the \(i^{th}\) species to the response and is referred to as the \(i^{th}\) species’ identity effect. Finally, \(\delta_{ij}\) is the coefficient of the interaction between species \(i\) and \(j\) and is scaled by the product of \(p_{i}\) and \(p_{j}\) to determine the expected contribution to the response. The error term, \(\epsilon\), is assumed to be normally distributed with constant variance.

For this dataset, the full pairwise model can also be written as:

\[\large y=\beta_{1} p_{1} + \beta_{2} p_{2} + \beta_{3} p_{3} + \delta_{12} (p_{1} p_{2}) + \delta_{13} (p_{1} p_{3}) + \delta_{23} (p_{2} p_{3}) + \epsilon\]


To fit the model, we supply the DI() function with our response column (either by name or column number), our initial species proportions (p1, p2, p3) (again, either by name or column number), the type of DI model that we want to fit (“FULL”), and our dataset (sim0).

FULLModel <- DI(y = "response", prop = c("p1", "p2", "p3"), DImodel = "FULL", data = sim0)
## Fitted model: Separate pairwise interactions 'FULL' DImodel
summary(FULLModel)
## 
## Call:
## glm(formula = fmla, family = family, data = data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.9829  -1.1752  -0.1008   1.1086   5.0970  
## 
## Coefficients:
##         Estimate Std. Error t value Pr(>|t|)    
## p1_ID    24.1740     0.7698  31.403  < 2e-16 ***
## p2_ID    18.6155     0.7698  24.182  < 2e-16 ***
## p3_ID    14.8069     0.7698  19.235  < 2e-16 ***
## `p1:p2`  34.8345     3.5694   9.759 7.60e-14 ***
## `p1:p3`  26.1355     3.5694   7.322 8.43e-10 ***
## `p2:p3`  47.9159     3.5694  13.424  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 3.541817)
## 
##     Null deviance: 43863.70  on 64  degrees of freedom
## Residual deviance:   205.43  on 58  degrees of freedom
## AIC: 270.26
## 
## Number of Fisher Scoring iterations: 2
Model Selection
image description image description
training-starterPack-3species-selection.knit

We can formally compare the fitted models to identify which one fits our data best. One way to do this is by comparing the AIC values across the models (given in the summary() outputs above, or separately as below).

# Get the Akaike information criterion (AIC) for fitted models
AIC(IDModel); AIC(AVModel);  AIC(FULLModel);
## [1] 375.3663
## [1] 286.4174
## [1] 270.2609

As the full pairwise model (FULLModel) has the lowest AIC, we can infer that it best describes the BEF relationship out of the fitted models.

Another option is the use of the anova() function in R to perform an F-test to compare pairs of models.

anova(IDModel, AVModel, test = "F")
## Analysis of Deviance Table
## 
## Model 1: response ~ 0 + p1_ID + p2_ID + p3_ID + 0
## Model 2: response ~ 0 + p1_ID + p2_ID + p3_ID + AV + 0
##   Resid. Df Resid. Dev Df Deviance      F    Pr(>F)    
## 1        61    1165.73                                 
## 2        60     281.47  1   884.26 188.49 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Comparing the ID model and the average pairwise (AV) model, the F-test is significant (p<0.001) meaning that there is evidence that the average interaction term is needed, compared to a model that only has the identity effects.

anova(AVModel, FULLModel, test = "F")
## Analysis of Deviance Table
## 
## Model 1: response ~ 0 + p1_ID + p2_ID + p3_ID + AV + 0
## Model 2: response ~ 0 + p1_ID + p2_ID + p3_ID + `p1:p2` + `p1:p3` + `p2:p3` + 
##     0
##   Resid. Df Resid. Dev Df Deviance      F   Pr(>F)    
## 1        60     281.47                                
## 2        58     205.43  2   76.045 10.735 0.000108 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Comparing the average pairwise model and the full pairwise model, the F-test is significant (p<0.001) meaning that there is evidence that three unique interaction terms are needed instead of a single average interaction term.

These two tests are possible because the pair of models in each comparison are nested.

For this dataset, using either AIC or F-tests to compare models give the same outcome: the full pairwise model bests fits the data.

Interpretation of the Selected Model
image description image description
training-starterPack-3species-interpret.knit

Interpreting model coefficients

With the full pairwise model was selected as best, we can now interpret the parameters!

summary(FULLModel)$coefficients
##         Estimate Std. Error   t value     Pr(>|t|)
## p1_ID   24.17403  0.7698027 31.402890 4.224059e-38
## p2_ID   18.61551  0.7698027 24.182188 5.543870e-32
## p3_ID   14.80689  0.7698027 19.234656 7.537304e-27
## `p1:p2` 34.83451  3.5694198  9.759151 7.600814e-14
## `p1:p3` 26.13553  3.5694198  7.322068 8.429085e-10
## `p2:p3` 47.91594  3.5694198 13.424014 1.925427e-19

Examining the ID effects (\(\hat{\beta_1} = 24.17\), \(\hat{\beta_2} = 18.62\), and \(\hat{\beta_3} = 14.81\)), species 1 is the highest, followed by species 2 and species 3 (24.17 > 18.62 > 14.81; differences can be detected formally). So without considering the interaction terms, we would assume that maximising the proportion of species 1 present would maximise the response. However, as our interaction terms are also significant, we need to take a close look at them as well.

We can see that each of our interaction terms are positive and significant (p<0.001), implying that each pair of species have synergistic relationships (rather than antagonistic) and so benefit from each other’s presence. When we consider the magnitude of our interaction terms, we can see that the interaction between species 2 and species 3 is the strongest, at a value of 47.92, and so the inclusion of these two species in the ecosystem would be extremely beneficial.

Prediction

To predict form the model for a monoculture of species 1 would result in the following:

\[\hat{y} = 24.17*1 + 18.62*0 + 14.81*0 + 34.83*1*0 + 26.14*1*0 + 47.92*0*0 = 24.17\]

To predict for an equi-proportional three species mixture, where each of our three species are equally represented at 1/3:

\[\hat{y} = 24.17*\frac{1}{3} + 18.62*\frac{1}{3} + 14.81*\frac{1}{3} + 34.83*(\frac{1}{3}*\frac{1}{3}) + 26.14*(\frac{1}{3}*\frac{1}{3}) + 47.92*(\frac{1}{3}*\frac{1}{3}) = 31.2988889 \]

We can use the predict() function in R to do these predictions.

sim0[c(1, 16), 3:5]
##           p1        p2        p3
## 1  1.0000000 0.0000000 0.0000000
## 16 0.3333333 0.3333333 0.3333333
predict(FULLModel, newdata=sim0[c(1, 16),])
## [1] 24.17403 31.29725


In order to find the optimal community for the given response without manually selecting and predicting for each possible community, we could plot the gradient formed by the DI model over the simplex space formed by the proportions of our three species. As we have three species, a ternary diagram is a great way to visualise this.

3 Species Example Ternary

In this ternary diagram, the predicted ecosystem function for each species in monoculture is at the relevant vertex, predictions for all possible two-species mixtures are along the edges, and predictions for all possible three-species mixtures are in the body of the triangle. The small black triangle represents the community that gives the highest point estimate prediction. We can also see that a considerable range of communities encircling the triangle is in the same response colour band and so also give high predictions for the ecosystem function, showing the robustness of the ecosystem function across changing species proportions to a certain degree.

To see further details for how best to show off your model results in a plot/graph, see Visualising DI Models.

What will I learn from looking at this example?

In this example, we describe how to implement DI models using R code. The dataset used contains data from a study of nine species classified according to three functional groups that were manipulated across varying species proportions and a treatment with two levels. The models described could be used to answer a variety of possible research questions, such as whether or not the identity of the species present in a system (rather than simply the number of species present) has any effect on the ecosystem function output, or if the species interact in a significant way, and if so, what is the nature of those interactions?

Note that in all models fitted to this dataset, it is assumed that \(\theta\) (the non-linear interaction parameter) is equal to 1. Find out more about this non-linear parameter in the Deep Dive section.
Summary of data
image description image description
training-starterPack-9species-summary.knit

In this example, we use the “sim3” dataset available in the DImodels R package. It contains data from different combinations of nine species proportions (p1 - p9), across two levels of a treatment factor (levels A and B). The species are categorised according to three functional groups: species 1 to 5 come from functional group 1, species 6 and 7 from functional group 2 and species 8 and 9 from functional group 3. The ecosystem function is stored in ‘response’. There are a total of 412 observations in the dataset.

data("sim3")
head(sim3)
##   community richness treatment p1 p2 p3 p4 p5 p6 p7 p8 p9 response
## 1         1        1         A  0  0  0  0  0  0  0  0  1   10.265
## 2         1        1         B  0  0  0  0  0  0  0  0  1    7.740
## 3         1        1         A  0  0  0  0  0  0  0  0  1   12.173
## 4         1        1         B  0  0  0  0  0  0  0  0  1    8.497
## 5         2        1         A  0  0  0  0  0  0  0  1  0   10.763
## 6         2        1         B  0  0  0  0  0  0  0  1  0    8.989

The dataset contains the following variables:

Community: A unique identifier for each combination of species proportions in the design.
p1: A numeric vector indicating the initial proportion of species 1.
p2: A numeric vector indicating the initial proportion of species 2.
p3: A numeric vector indicating the initial proportion of species 3.
p4: A numeric vector indicating the initial proportion of species 4.
p5: A numeric vector indicating the initial proportion of species 5.
p6: A numeric vector indicating the initial proportion of species 6.
p7: A numeric vector indicating the initial proportion of species 7.
p8: A numeric vector indicating the initial proportion of species 8.
p9: A numeric vector indicating the initial proportion of species 9.
Treatment: A factor with levels ‘A’ and ‘B’ identifying two levels of a treatment.
Response: A numeric vector indicating the value of the ecosystem function.

Model Fitting using DImodels Package
image description image description
Here we fit a range of DI models to the nine species dataset.

training-starterPack-9species-fitting_id.knit

In this model structure, we assume that species do not interact with one another, leaving only the species proportions and experimental structures as fixed effects. The model equation is:

\[\large y=\sum_{i=1}^{S=9} \beta_{i} p_{i} + \alpha_k + \epsilon\]

where \(y\) is the ecosystem function response variable, \(S = 9\) is the number of species, and \(p_{i}\) represents the initial proportion of the \(i^{th}\) species. The parameter \(\beta_{i}\) scaled by \(p_{i}\) is the expected contribution of the \(i^{th}\) species to the response and is referred to as the \(i^{th}\) species’ identity effect. The parameter \(\alpha_k\) is the effect of treatment, which has \(k\) = 2 levels in this dataset. The error term, \(\epsilon\), is assumed normally distributed with mean 0 and constant variance.

We can supply the DI() function with our response column (either by name or column number), our initial species proportions (p1 - p9; by name or column number) and the treatment (treatment). In this case, because of having nine species, it is easier to use column number in the prop argument. The type of DI model that we want to fit is the ID model (“ID”) and our dataset is sim3.

IDModel <- DI(y = "response", prop = 4:12, DImodel = "ID", treat = "treatment", data = sim3)
## Fitted model: Species identity 'ID' DImodel
summary(IDModel)
## 
## Call:
## glm(formula = new_fmla, family = family, data = new_data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -4.1019  -0.8444   0.1160   0.8650   3.8917  
## 
## Coefficients:
##            Estimate Std. Error t value Pr(>|t|)    
## p1_ID       11.3266     0.3635   31.16   <2e-16 ***
## p2_ID        9.8534     0.3635   27.11   <2e-16 ***
## p3_ID        9.7985     0.3635   26.95   <2e-16 ***
## p4_ID        7.8162     0.3635   21.50   <2e-16 ***
## p5_ID       12.4509     0.3635   34.25   <2e-16 ***
## p6_ID        6.3545     0.3635   17.48   <2e-16 ***
## p7_ID        5.7442     0.3635   15.80   <2e-16 ***
## p8_ID       10.3742     0.3635   28.54   <2e-16 ***
## p9_ID       11.2651     0.3635   30.99   <2e-16 ***
## treatmentA   3.1018     0.1425   21.76   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 2.092721)
## 
##     Null deviance: 52280.33  on 412  degrees of freedom
## Residual deviance:   841.27  on 402  degrees of freedom
## AIC: 1485.3
## 
## Number of Fisher Scoring iterations: 2
training-starterPack-9species-fitting_av.knit

This interaction structure assumes that all pairs of species interact in the same way. The model equation is:

\[\large y = \sum_{i=1}^{S=9} \beta_{i} p_{i} + \delta_{AV} \sum_{\substack{i,j=1 \\ i<j}}^{S=9} p_{i} p_{j} + \alpha_k + \epsilon\]

where \(y\) is the ecosystem function response variable, \(S = 9\) is the number of species, and \(p_{i}\) represents the initial proportion of the \(i^{th}\) species. The parameter \(\beta_{i}\) scaled by \(p_{i}\) is the expected contribution of the \(i^{th}\) species to the response and is referred to as the \(i^{th}\) species’ identity effect; \(\delta_{AV}\) is the coefficient of the average interaction term and represents the interaction between any pair of species. The parameter \(\alpha_k\) is the effect of treatment, which has \(k\) = 2 levels in this dataset. The error term, \(\epsilon\), is assumed normally distributed with mean 0 and constant variance.

We supply the DI() function with our response column (either by name or column number), our initial species proportions (p1-p9) (again, either by name or column number), the type of DI model that we want to fit (“AV”), the treatment (“treatment”) and the dataset (sim3).

AVModel <- DI(y = "response", prop = 4:12, treat = "treatment", DImodel = "AV", data = sim3)
## Fitted model: Average interactions 'AV' DImodel
summary(AVModel)
## 
## Call:
## glm(formula = new_fmla, family = family, data = new_data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.6183  -0.8218   0.0626   0.8747   3.6932  
## 
## Coefficients:
##            Estimate Std. Error t value Pr(>|t|)    
## p1_ID        9.7345     0.3731  26.092   <2e-16 ***
## p2_ID        8.2613     0.3731  22.144   <2e-16 ***
## p3_ID        8.2064     0.3731  21.996   <2e-16 ***
## p4_ID        6.2242     0.3731  16.683   <2e-16 ***
## p5_ID       10.8589     0.3731  29.106   <2e-16 ***
## p6_ID        4.7624     0.3731  12.765   <2e-16 ***
## p7_ID        4.1522     0.3731  11.129   <2e-16 ***
## p8_ID        8.7821     0.3731  23.540   <2e-16 ***
## p9_ID        9.6731     0.3731  25.928   <2e-16 ***
## AV           5.3716     0.5830   9.213   <2e-16 ***
## treatmentA   3.1018     0.1297  23.924   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 1.731445)
## 
##     Null deviance: 52280.33  on 412  degrees of freedom
## Residual deviance:   694.31  on 401  degrees of freedom
## AIC: 1408.2
## 
## Number of Fisher Scoring iterations: 2
training-starterPack-9species-fitting_fg.knit

This interaction structure assumes that species interactions can be grouped based on their functional group membership. In this dataset, species 1 to 5 come from functional group 1, species 6 and 7 from functional group 2 and species 8 and 9 from functional group 3. Thus, the interaction terms can then be structured as ‘within’ or ‘between’ functional group interactions according to this classification. Here is an example of the equation for the functional group model:

\[\large y = \sum_{i=1}^{S = 9} \beta_{i} p_{i} + \sum_{q=1}^{T = 3} \omega_{qq} \sum_{\substack{i,j \in FG_q \\ i \lt j}} p_i p_j + \sum_{\substack{q,r = 1 \\ q \lt r}}^{T = 3} \omega_{qr} \sum_{i \in FG_q} \sum_{j \in FG_r} p_i p_j + \alpha_k + \epsilon\]

where \(y\) is the ecosystem function response variable, \(S = 9\) is the number of species, and \(p_{i}\) represents the initial proportion of the \(i^{th}\) species. The parameter \(\beta_{i}\) scaled by \(p_{i}\) is the expected contribution of the \(i^{th}\) species to the response and is referred to as the \(i^{th}\) species’ identity effect. Since we have \(T = 3\) functional groups (\(FG_1\) - \(FG_3\)), with five, two and two species in each group respectively, the functional group memberships are represented mathematically as:
\(FG_1 = \{1,2,3,4,5\}\), \(FG_2 = \{6,7\}\) and \(FG_3 = \{8,9\}\).
The parameter \(\omega_{qq}\) refers to the within functional group interaction between species from the functional group \(q\) and \(\omega_{qr}\) refers to the interaction between species from functional groups \(q\) and \(r\), for \(q, r = 1, 2, 3\). The parameter \(\alpha_k\) is the effect of treatment, which has \(k\) = 2 levels in this dataset. The error term, \(\epsilon\), is assumed normally distributed with mean 0 and constant variance.

In this example, \(\omega_{11}\) is the interaction term for any pair of species in FG\(_1\), that is any pair of species 1 to 5. Similarly, \(\omega_{12}\) is the interaction term for any pair of species with one in FG\(_1\) and one in FG\(_2\), so any of species 1 to 5 with either of species 6 and 7. An alternative way to write the functional group model equation for this dataset is:

\[\large y = \sum_{i=1}^{S = 9} \beta_{i} p_{i} + \omega_{11} \sum_{\substack{i,j \in \{1,...,5\} \\ i \lt j}} p_i p_j + \omega_{22} (p_6 p_7) + \omega_{33} (p_8 p_9) \\ \large + \omega_{12} \sum_{i \in \{1,...,5\} \\ j \in \{6,7\}} p_i p_j + \omega_{13}\sum_{i \in \{1,...,5\}\\ j \in \{8,9\}} p_i p_j + \omega_{23} \sum_{i \in \{6,7\} \\ j \in \{8,9\}} p_i p_j + \alpha_k + \epsilon\]

To fit the functional group model, we supply a list of the functional group memberships for the nine species through the FG parameter and enter DImodel = “FG”.

FGModel <- DI(y = "response", prop = 4:12, FG = c("FG1","FG1","FG1","FG1","FG1","FG2","FG2","FG3","FG3"),
              DImodel = "FG", treat = "treatment", data = sim3)
## Fitted model: Functional group effects 'FG' DImodel
summary(FGModel)
## 
## Call:
## glm(formula = new_fmla, family = family, data = new_data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.8425  -0.8141   0.0509   0.8048   3.5657  
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## p1_ID            9.7497     0.3666  26.595  < 2e-16 ***
## p2_ID            8.5380     0.3672  23.253  < 2e-16 ***
## p3_ID            8.2329     0.3666  22.459  < 2e-16 ***
## p4_ID            6.3644     0.3665  17.368  < 2e-16 ***
## p5_ID           10.8468     0.3669  29.561  < 2e-16 ***
## p6_ID            5.9621     0.4515  13.205  < 2e-16 ***
## p7_ID            5.4252     0.4516  12.015  < 2e-16 ***
## p8_ID            7.3204     0.4515  16.213  < 2e-16 ***
## p9_ID            8.2154     0.4515  18.196  < 2e-16 ***
## FG_bfg_FG1_FG2   3.4395     0.8635   3.983 8.09e-05 ***
## FG_bfg_FG1_FG3  11.5915     0.8654  13.395  < 2e-16 ***
## FG_bfg_FG2_FG3   2.8711     1.2627   2.274  0.02351 *  
## FG_wfg_FG1       2.8486     0.9131   3.120  0.00194 ** 
## FG_wfg_FG2       0.6793     2.3553   0.288  0.77319    
## FG_wfg_FG3       2.4168     2.3286   1.038  0.29997    
## treatmentA       3.1018     0.1171  26.479  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 1.413412)
## 
##     Null deviance: 52280.33  on 412  degrees of freedom
## Residual deviance:   559.71  on 396  degrees of freedom
## AIC: 1329.4
## 
## Number of Fisher Scoring iterations: 2


Using the functional group model (when there are species that can be classified according to functional groups) can considerably reduce the number of interaction terms needed in the model.

training-starterPack-9species-fitting_add.knit

The additive species model assumes that the contribution of a species to any pairwise interaction is the same regardless of the other species it is interacting with. Here is the equation of the model:

\[\large y = \sum_{i=1}^{S = 9} \beta_{i} p_{i} + \sum_{\substack{i,j=1 \\ i<j}}^{S = 9} (\lambda_{i} + \lambda{j}) (p_{i} p_{j}) + \alpha_k + \epsilon\]

where \(y\) is the ecosystem function response variable, \(S = 9\) is the number of species, and \(p_{i}\) represents the initial proportion of the \(i^{th}\) species. The parameter \(\beta_{i}\) scaled by \(p_{i}\) is the expected contribution of the \(i^{th}\) species to the response and is referred to as the \(i^{th}\) species’ identity effect; \(\lambda_i\) is the contribution of species \(i\) to its pairwise interaction with any other species. The parameter \(\alpha_k\) is the effect of treatment, which has \(k\) = 2 levels in this dataset. The error term, \(\epsilon\), is assumed normally distributed with mean 0 and constant variance.

In order to fit this structure, enter DImodel = “ADD”.

ADDModel <- DI(y = "response", prop = 4:12, 
           treat = "treatment", DImodel = "ADD", data = sim3)
## Fitted model: Additive species contributions to interactions 'ADD' DImodel
summary(ADDModel)
## 
## Call:
## glm(formula = new_fmla, family = family, data = new_data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.8784  -0.8266   0.0775   0.8138   3.4747  
## 
## Coefficients:
##            Estimate Std. Error t value Pr(>|t|)    
## p1_ID       10.0117     0.6064  16.509  < 2e-16 ***
## p2_ID        8.7290     0.6064  14.394  < 2e-16 ***
## p3_ID        8.6875     0.6064  14.325  < 2e-16 ***
## p4_ID        5.3745     0.6064   8.862  < 2e-16 ***
## p5_ID       10.9691     0.6064  18.087  < 2e-16 ***
## p6_ID        5.7841     0.6064   9.538  < 2e-16 ***
## p7_ID        5.6277     0.6064   9.280  < 2e-16 ***
## p8_ID        7.5514     0.6064  12.452  < 2e-16 ***
## p9_ID        7.9200     0.6064  13.060  < 2e-16 ***
## p1_add       2.0347     1.1779   1.727   0.0849 .  
## p2_add       1.5872     1.1779   1.347   0.1786    
## p3_add       1.5559     1.1779   1.321   0.1873    
## p4_add       4.6813     1.1779   3.974 8.40e-05 ***
## p5_add       2.4270     1.1779   2.060   0.0400 *  
## p6_add       0.2863     1.1779   0.243   0.8081    
## p7_add      -0.7797     1.1779  -0.662   0.5084    
## p8_add       5.5762     1.1779   4.734 3.08e-06 ***
## p9_add       6.8032     1.1779   5.776 1.56e-08 ***
## treatmentA   3.1018     0.1256  24.687  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 1.625989)
## 
##     Null deviance: 52280.33  on 412  degrees of freedom
## Residual deviance:   639.01  on 393  degrees of freedom
## AIC: 1390
## 
## Number of Fisher Scoring iterations: 2
training-starterPack-9species-fitting_full.knit

The full pairwise model assumes that each pair of species interacts uniquely. The equation of the model for this dataset is:

\[\large y=\sum_{i=1}^{S=9} \beta_{i} p_{i} + \sum_{\substack{i,j=1 \\ i<j}}^{S=9} \delta_{ij} (p_{i} p_{j}) + \alpha_k + \epsilon\]

where \(y\) is the ecosystem function response variable, \(S = 9\) is the number of species and \(p_{i}\) represents the initial proportion of the \(i^{th}\) species, \(\beta_{i}\) scaled by \(p_{i}\) is the expected contribution of the \(i^{th}\) species to the response and is referred to as the \(i^{th}\) species’ identity effect. The parameter \(\delta_{ij}\) is the coefficient of the interaction between species \(i\) and \(j\) and is scaled by the product of \(p_{i}\) and \(p_{j}\). The parameter \(\alpha_k\) is the effect of treatment, which has \(k\) = 2 levels in this dataset. The error term, \(\epsilon\), is assumed normally distributed with mean 0 and constant variance.

There will sometimes not be sufficient data to estimate all pairwise interactions in the ‘FULL’ model, or when they can all be estimated, it may be difficult to find valuable biological information if there are too many parameters. For this dataset, there are \(9 \choose 2\) = 36 pairwise interactions and they can each be estimated, however, it is quite a high number of species interaction terms to interpret.

We supply the DI() function with our response column (either by name or column number), our initial species proportions (p1-p9) (again, either by name or column number), the type of DI model that we want to fit (“FULL”), the treatment (“treatment”) and our dataset (sim3).

FULLModel <- DI(y = "response", prop = 4:12,
              DImodel = "FULL", treat = "treatment", data = sim3)
## Fitted model: Separate pairwise interactions 'FULL' DImodel
summary(FULLModel)
## 
## Call:
## glm(formula = new_fmla, family = family, data = new_data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.5069  -0.7797   0.0286   0.7946   3.4967  
## 
## Coefficients:
##            Estimate Std. Error t value Pr(>|t|)    
## p1_ID      10.05168    0.56912  17.662  < 2e-16 ***
## p2_ID       8.62791    0.56910  15.161  < 2e-16 ***
## p3_ID       8.74352    0.56903  15.366  < 2e-16 ***
## p4_ID       5.36054    0.56903   9.420  < 2e-16 ***
## p5_ID      11.00558    0.56911  19.338  < 2e-16 ***
## p6_ID       5.78022    0.56934  10.152  < 2e-16 ***
## p7_ID       5.55972    0.56913   9.769  < 2e-16 ***
## p8_ID       7.57246    0.56928  13.302  < 2e-16 ***
## p9_ID       7.95342    0.56900  13.978  < 2e-16 ***
## `p1:p2`     0.68202    2.37136   0.288 0.773810    
## `p1:p3`    -0.47386    2.36532  -0.200 0.841329    
## `p1:p4`     2.62135    2.37502   1.104 0.270443    
## `p1:p5`     1.88040    2.36697   0.794 0.427458    
## `p1:p6`     2.75715    2.36473   1.166 0.244394    
## `p1:p7`     3.30887    2.36449   1.399 0.162540    
## `p1:p8`    13.02960    2.36715   5.504 6.98e-08 ***
## `p1:p9`    13.21818    2.36440   5.590 4.43e-08 ***
## `p2:p3`     5.42129    2.34578   2.311 0.021384 *  
## `p2:p4`     3.38768    2.36121   1.435 0.152220    
## `p2:p5`     4.30217    2.36255   1.821 0.069426 .  
## `p2:p6`     0.29687    2.37631   0.125 0.900650    
## `p2:p7`     3.02178    2.35990   1.280 0.201192    
## `p2:p8`    10.33087    2.40094   4.303 2.17e-05 ***
## `p2:p9`    12.36450    2.37896   5.197 3.37e-07 ***
## `p3:p4`     2.96470    2.36385   1.254 0.210575    
## `p3:p5`     0.51441    2.36183   0.218 0.827703    
## `p3:p6`     2.84968    2.37422   1.200 0.230815    
## `p3:p7`     1.57343    2.36345   0.666 0.505999    
## `p3:p8`     8.72365    2.36366   3.691 0.000258 ***
## `p3:p9`    10.06688    2.35671   4.272 2.48e-05 ***
## `p4:p5`     6.44354    2.37450   2.714 0.006970 ** 
## `p4:p6`     9.04877    2.38253   3.798 0.000171 ***
## `p4:p7`     7.24852    2.36340   3.067 0.002323 ** 
## `p4:p8`    13.08220    2.36361   5.535 5.95e-08 ***
## `p4:p9`    13.87147    2.34705   5.910 7.83e-09 ***
## `p5:p6`     4.05861    2.37897   1.706 0.088849 .  
## `p5:p7`    -0.09837    2.36553  -0.042 0.966854    
## `p5:p8`     8.34980    2.37336   3.518 0.000489 ***
## `p5:p9`    12.84620    2.34711   5.473 8.22e-08 ***
## `p6:p7`     1.09203    2.38344   0.458 0.647100    
## `p6:p8`     3.76147    2.38500   1.577 0.115628    
## `p6:p9`     3.15963    2.36522   1.336 0.182421    
## `p7:p8`     2.91356    2.37617   1.226 0.220929    
## `p7:p9`     2.44820    2.35727   1.039 0.299690    
## `p8:p9`     2.30884    2.37963   0.970 0.332561    
## treatmentA  3.10179    0.11778  26.335  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 1.428887)
## 
##     Null deviance: 52280.33  on 412  degrees of freedom
## Residual deviance:   522.97  on 366  degrees of freedom
## AIC: 1361.5
## 
## Number of Fisher Scoring iterations: 2
Model Selection
image description image description
training-starterPack-9species-selection.knit

We can compare the models that have been fitted to identify the model that best fits our data. One option is to compare the AIC values across the models. This can be done by comparing the summary() outputs for each model above, or separately as below.

# Get the Akaike information criterion (AIC) for fitted models
AIC(IDModel); AIC(AVModel); AIC(FGModel); AIC(ADDModel); AIC(FULLModel);
## [1] 1485.33
## [1] 1408.226
## [1] 1329.441
## [1] 1390.033
## [1] 1361.47

As the functional group model (FGModel) has the lowest AIC, we can infer that it best describes the BEF relationship out of the tested models.


Another option is the use of the anova() function in R to perform an F-test to compare pairs of models.

anova(IDModel, AVModel, test = "F")
## Analysis of Deviance Table
## 
## Model 1: response ~ 0 + p1_ID + p2_ID + p3_ID + p4_ID + p5_ID + p6_ID + 
##     p7_ID + p8_ID + p9_ID + treatmentA
## Model 2: response ~ 0 + p1_ID + p2_ID + p3_ID + p4_ID + p5_ID + p6_ID + 
##     p7_ID + p8_ID + p9_ID + AV + treatmentA
##   Resid. Df Resid. Dev Df Deviance     F    Pr(>F)    
## 1       402     841.27                                
## 2       401     694.31  1   146.96 84.88 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here the ID and AV model are compared and the test is significant, meaning that the AV model is an improvement over the ID model.



anova(AVModel, FGModel, test = "F")
## Analysis of Deviance Table
## 
## Model 1: response ~ 0 + p1_ID + p2_ID + p3_ID + p4_ID + p5_ID + p6_ID + 
##     p7_ID + p8_ID + p9_ID + AV + treatmentA
## Model 2: response ~ 0 + p1_ID + p2_ID + p3_ID + p4_ID + p5_ID + p6_ID + 
##     p7_ID + p8_ID + p9_ID + FG_bfg_FG1_FG2 + FG_bfg_FG1_FG3 + 
##     FG_bfg_FG2_FG3 + FG_wfg_FG1 + FG_wfg_FG2 + FG_wfg_FG3 + treatmentA
##   Resid. Df Resid. Dev Df Deviance      F    Pr(>F)    
## 1       401     694.31                                 
## 2       396     559.71  5    134.6 19.046 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here the AV and FG models are compared and the test is also significant, meaning that the FG model is an improvement over the AV model, i.e., variation in the response is better explained by six interaction terms (specified according to functional group membership) than a single interaction term (that assumes all pairs of species interact in the same way).



anova(FGModel, FULLModel, test = "F")
## Analysis of Deviance Table
## 
## Model 1: response ~ 0 + p1_ID + p2_ID + p3_ID + p4_ID + p5_ID + p6_ID + 
##     p7_ID + p8_ID + p9_ID + FG_bfg_FG1_FG2 + FG_bfg_FG1_FG3 + 
##     FG_bfg_FG2_FG3 + FG_wfg_FG1 + FG_wfg_FG2 + FG_wfg_FG3 + treatmentA
## Model 2: response ~ 0 + p1_ID + p2_ID + p3_ID + p4_ID + p5_ID + p6_ID + 
##     p7_ID + p8_ID + p9_ID + `p1:p2` + `p1:p3` + `p1:p4` + `p1:p5` + 
##     `p1:p6` + `p1:p7` + `p1:p8` + `p1:p9` + `p2:p3` + `p2:p4` + 
##     `p2:p5` + `p2:p6` + `p2:p7` + `p2:p8` + `p2:p9` + `p3:p4` + 
##     `p3:p5` + `p3:p6` + `p3:p7` + `p3:p8` + `p3:p9` + `p4:p5` + 
##     `p4:p6` + `p4:p7` + `p4:p8` + `p4:p9` + `p5:p6` + `p5:p7` + 
##     `p5:p8` + `p5:p9` + `p6:p7` + `p6:p8` + `p6:p9` + `p7:p8` + 
##     `p7:p9` + `p8:p9` + treatmentA
##   Resid. Df Resid. Dev Df Deviance     F Pr(>F)
## 1       396     559.71                         
## 2       366     522.97 30   36.738 0.857  0.686

Here the FG and FULL models are compared and the test is not significant and so there is no evidence that the FULL model with 36 pairwise interaction terms is needed over the FG model that has six interaction terms.

The F-tests shown here are possible because each pair of models being compared are nested.

Interpretation of the Selected Model
image description image description
training-starterPack-9species-interpret.knit

Interpreting model coefficients

With the functional group model selected as best, we can now interpret the parameter estimates.

summary(FGModel)$coefficients
##                  Estimate Std. Error    t value      Pr(>|t|)
## p1_ID           9.7496983  0.3665922 26.5954900  3.868093e-90
## p2_ID           8.5379725  0.3671714 23.2533690  4.856522e-76
## p3_ID           8.2329179  0.3665722 22.4591999  1.235146e-72
## p4_ID           6.3644271  0.3664539 17.3676077  1.227074e-50
## p5_ID          10.8468108  0.3669358 29.5605145 3.061340e-102
## p6_ID           5.9621064  0.4514972 13.2051898  3.028830e-33
## p7_ID           5.4252439  0.4515519 12.0146619  1.445882e-28
## p8_ID           7.3204224  0.4515218 16.2127786  1.057036e-45
## p9_ID           8.2154480  0.4514867 18.1964331  3.304754e-54
## FG_bfg_FG1_FG2  3.4395080  0.8634949  3.9832408  8.090458e-05
## FG_bfg_FG1_FG3 11.5914584  0.8653527 13.3950678  5.235367e-34
## FG_bfg_FG2_FG3  2.8710626  1.2626866  2.2737729  2.351423e-02
## FG_wfg_FG1      2.8486123  0.9131213  3.1196427  1.943235e-03
## FG_wfg_FG2      0.6792847  2.3553291  0.2884033  7.731889e-01
## FG_wfg_FG3      2.4167739  2.3286095  1.0378614  2.999679e-01
## treatmentA      3.1017864  0.1171428 26.4786700  1.182406e-89

There are nine ID effect estimates, for example, \(\hat{\beta_1} = 9.75\) and \(\hat{\beta_2} = 8.54\). The species in FG1 (species 1-5) are higher on average than the species in FG2 (species 6 and 7).

There are three ‘within’ functional group interaction terms (FG_wfg_FG1, FG_wfg_FG2 and FG_wfg_FG3) for pairs of species from ‘within’ each functional group, and three ‘between’ functional group interactions for pairs of species from different functional groups (FG_bfg_FG1_FG2, FG_bfg_FG1_FG3 and FG_bfg_FG2_FG3).

The strongest interaction estimate is the interaction between FG’s 1 and 3: \(\hat{\omega_{13}} = 11.59\). All of the between FG interactions are positive and significant (each p<0.05), while for the ‘within interactions’ only the interaction for FG1 is significant (p<0.01).

The treatment effect has been included as an additive factor only. This means that all ID and interaction estimates are for treatment level B, and to adjust for treatment A the estimate of \(3.1\) should be added. It would be possible to test for interactions between treatment and the ID effects and the species interaction effects.

According to this model, to maximise the response, treatment A should be used, and combining species from functional groups 1 and 3 is important. The identity effect estimates can also be examined to help identify which species are best to use to optimise the response.

Prediction

To predict for a monoculture of species 1 under treatment B we do the following (all other terms involve a multiplication by 0 and are omitted):

\[\hat{y} = 9.75*1 = 9.75\]

And to predict for a monoculture of species 1 under treatment A:

\[\hat{y} = 9.75*1 + 3.1 = 12.85\]


While to predict for a 50:50 mixture of species 1 and 6 under treatment B:

\[\hat{y} = 9.75*0.5 + 5.96*0.5 + 3.44*0.5*0.5 = 8.715 \]

And to predict for a 50:50 mixture of species 1 and 6 under treatment A:

\[\hat{y} = 9.75*0.5 + 5.96*0.5 + 3.44*0.5*0.5 + 3.1 = 11.815 \]


We can use the predict() function in R to do these predictions faster.

These are the communities we have predicted for above:

sim3[c(34, 33, 138, 137), 1:12]
##     community richness treatment  p1 p2 p3 p4 p5  p6 p7 p8 p9
## 34          9        1         B 1.0  0  0  0  0 0.0  0  0  0
## 33          9        1         A 1.0  0  0  0  0 0.0  0  0  0
## 138        35        2         B 0.5  0  0  0  0 0.5  0  0  0
## 137        35        2         A 0.5  0  0  0  0 0.5  0  0  0

And here are the four corresponding predictions:

predict(FGModel, newdata=sim3[c(34, 33, 138, 137),])
## [1]  9.749698 12.851485  8.715779 11.817566

Visualising the results

From examining the coefficient estimates, we know that including species 1, 5 and 9 will positively impact on the predicted ecosystem function. This graph highlights that while the response increases on average across richness (and saturates), communities that include species 1, 5 and 9 are among the highest (as shown by the pie-glyphs identifying the composition of specific communities):

Predictions for nine-species example

Code to create graphs such as this one will be available soon on our Visualising DI Models page.

Reference

R version: 4.1.2
DImodels R package: Moral RA, R Vishwakarma, J Connolly, and C Brophy (2023) DImodels: Diversity-Interactions (DI) Models. R package version 1.3.