DI Models - Training

Visualising Results from Models Fit using the DImodels R Package

What will I learn from looking at this example?

In this example, we show visualisations that can be created using DImodelsVis for model objects fitted with the DImodels R (version ≥ 4.1.2) package. The visualisations shown here could help assess model assumptions, support model interpretation, and provide context on how species richness, composition, relative abundances, and their interactions affect ecosystem functioning.

Three species example

For this example, we use the best model selected in the three species example from the starter pack. Visualisations (along with associated code) depicting the BEF relationship across multiple dimensions of biodiversity, contributions of species identities and interactions to the response, response surface across the simplex space and effects of species addition or loss will be shown in this section.

training-visualising-DImodels-3species-model-fitting.knit

We first load the necessary libraries along with the dataset and refit the best model from the three species example in the starter pack.

# Load libraries
library(DImodels)
library(DImodelsVis)
library(PieGlyph)
library(dplyr)
library(ggplot2)
library(forcats)

# Load dataset
data("sim0")

head(sim0)

##   community richness  p1  p2  p3 response
## 1         1        1 1.0 0.0 0.0   24.855
## 2         2        1 0.0 1.0 0.0   19.049
## 3         3        1 0.0 0.0 1.0   16.292
## 4         4        2 0.8 0.2 0.0   31.529
## 5         5        2 0.2 0.8 0.0   25.102
## 6         6        2 0.8 0.0 0.2   24.615

Fit best (full pairwise) model selected in the three species example.

# Fit best model for three species exmaple
Three_sp_model <- DI(y = "response", prop = c("p1", "p2", "p3"),
                     DImodel = "FULL", data = sim0)
summary(Three_sp_model)

## 
## Call:
## glm(formula = fmla, family = family, data = data)
## 
## Coefficients:
##         Estimate Std. Error t value Pr(>|t|)    
## p1_ID    24.1740     0.7698  31.403  < 2e-16 ***
## p2_ID    18.6155     0.7698  24.182  < 2e-16 ***
## p3_ID    14.8069     0.7698  19.235  < 2e-16 ***
## `p1:p2`  34.8345     3.5694   9.759 7.60e-14 ***
## `p1:p3`  26.1355     3.5694   7.322 8.43e-10 ***
## `p2:p3`  47.9159     3.5694  13.424  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 3.541817)
## 
##     Null deviance: 43863.70  on 64  degrees of freedom
## Residual deviance:   205.43  on 58  degrees of freedom
## AIC: 270.26
## 
## Number of Fisher Scoring iterations: 2

training-visualising-DImodels-3species-model-diagnostics.knit

Model diagnostics plots are an essential tool for detecting any violations in the underlying model assumptions and identifying any issues with the model fit. For linear regression models, the main diagnostics plots are the “Residual vs Fitted” plot, used to assess the zero-mean, constant variance and independence assumption of the residuals and the “QQ plot” which evaluates the normality of the residual distribution (i.e., \(\epsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2)\)).

The model_diagnostics() function in DImodelsVis offers a modified version of standard diagnostics plots where points can be replaced with <a class = “R”, href = “https://rishvish.github.io/PieGlyph/”>pie-glyphs to highlight the species composition in any observation(s). The function requires a model object fitted using the DImodels (or DImodelsMulti) package. By default, the function produces the “Residuals vs Fitted”, “Normal Q-Q”, “Scale-Location”, and “Residuals vs Leverage” plots but we can also create a subset of these plots using the which argument as shown below.

# Model diagnostics
model_diagnostics(Three_sp_model, which = 1:2)

3 Species Example Diagnostics

The “Normal QQ” plot does not indicate any strong deviations from a normal distribution in the residuals as all points are near the \(x=y\) line. The residuals appear to be unbiased (i.e., centered around the Y-axis), have a constant variance across the range of the X-axis and do not exhibit any patterns in the “Residual vs Fitted” plot. However, there are some observations such as a 20-80 mixture of p1-p3 and an 80-20 mixture of p2-p3 which are predicted poorly and have high residuals (>3). While these cases warrant further investigation, they represent only two observations out of 64 and there aren’t any other concerning patterns in the diagnostics plots. Therefore, we proceed further with visualising the predictions from the model.

training-visualising-DImodels-3species-response-vs-richness.knit

Traditionally, species richness was assumed to be the primary driver of BEF relationships. However, other indicators of diversity such as species compositions and their relative abundances could also affect ecosystem functioning. DI models capture the joint effects of these diversity measures on ecosystem functioning, offering a more nuanced understanding of the BEF relationship.

# Gradient change
gradient_change(Three_sp_model)

3 Species Example Gradient change

The predicted response increases with species richness (dashed-black line). The individual pie-glyphs depict the species composition within each community and are jittered horizontally based on the evenness measure of each community. At a given level of species richness, the predicted response varies considerably due to differences in species composition and relative abundances. For example, at Richness = 2, the equi-proportional binary mixtures of p1-p2 and p2-p3 have higher predicted responses, while, mixtures dominated by p3 have a lower predicted response.

training-visualising-DImodels-3species-ternary-diagrams.knit

The conditional_ternary() function in DImodelsVis accepts a model object fitted using the DImodels (or DImodelsMulti) package and generates a ternary diagram with the response surface depicted as a contour map. Since ternary diagrams can display only three variable at a time, users must specify names of the three variables to be visualised in the tern_vars parameter. All additional parameters shown below are optional arguments specified to enhance aesthetics of the final plot. Finally, geom_point() is used to highlight the best performing mixture in the simplex space.

# Ternary
conditional_ternary(Three_sp_model, tern_vars = c("p1", "p2", "p3"),
                    upper_lim = 32, lower_lim = 14,
                    nlevels = 9, contour_text = FALSE) + 
  geom_point(data = function(x) {x %>% filter(.Pred == max(.Pred))},
             shape = 17, size = 3)

3 Species Example Ternary

The predicted response is higher within the centre of the ternary and declines as we move towards either of the vertices (i.e., monocultures of respective species). The best performing mixture (black triangle) contains about 40% p1, 38% p2 and 22% p3. However, the response is rather consistent across a broad region of the ternary around this point, suggesting flexibility in selecting an optimal combination of the three species.

training-visualising-DImodels-3species-prediction-contributions.knit

Since, DI models are linear regression models, the predicted response can be broken down into contributions from individual terms in the model. This enables visualisation of the expected contributions of species identities, their interactions and any additional treatments such as nitrogen fertiliser or weather effects to the predicted response.

The prediction_contributions() function in DImodelsVis visualises the contributions of individual species, their interactions, and any additional treatments to the predicted response as a stacked bar-chart. The function requires a model object fitted using the DImodels (or DImodelsMulti) package and, optionally, a data-frame containing species communities for which to predict the response. The code shown below selects specific communities from the experimental design and utilises the prediction_contributions() function to illustrate how the different components contribute to the predicted response. Additionally, the facet_grid() and theme() functions from ggplot2 are used to enhance the plot aesthetics.

# Prediction contributions
pred_data <- sim0 %>%
  slice(1:16) %>% 
  mutate(Group = apply(.[, 3:5], 1, function(x) paste0(names(x)[which(x!=0)], collapse = "-"))) %>% 
  arrange(richness, Group) %>% 
  mutate(Group = fct_inorder(Group))

# Note the spaces for `80-20 `, ` 20-80`  and `50-50` are intentional to trick 
# ggplot into treating these bars as separate. If using the same label, these
# bars would be grouped into a single object.
prediction_contributions(Three_sp_model,
                         data = pred_data,
                         bar_labs = c("p1_mono", "p2_mono", "p3_mono",
                                      "80-20", "20-80", "50-50",
                                      " 80-20", " 20-80", " 50-50",
                                      "  80-20", "  20-80", "  50-50",
                                      "60-20-20", "20-60-20",
                                      "20-20-60", "1/3-1/3-1/3")) +
  facet_grid(~Group, scale = "free_x", space = "free_x") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

3 Species Example Contributions

The figure shows how the species identities (green, orange, or purple) and individual pairwise interactions (shades of grey) contribute towards the predicted response. Visualising the predicted response as contributions from individual model terms help provide insights into why why certain communities perform better than other. Mixtures, for instance, have higher predicted responses than the monocultures due to synergistic species interactions, which are absent in monocultures. Additionally, the bars are grouped based on the species composition, enabling us to see how changing the relative abundances of the component species affects the predicted response. For example, among the two-species mixtures for a specific species composition (panels labelled p1-p2, p1-p3, p2-p3) the binary equi-proportional mixture has the highest (or near to the highest) predicted response because the interaction effect is strongest when the component species are present in equal proportion.

training-visualising-DImodels-3species-species-addition-loss.knit

The ability of DI models to interpolate the response across any species composition in the simplex space allows for predicting how the addition or removal of a species species would influence the expected performance of a given community. Mathematically, this can be explored by tracing a line from a specific point within the simplex space (representing a given species composition) to any vertex of the simplex. The change in predicted response along this line reflects the effect of incrementally changing the proportion of a particular species from the community. Panel (a) in the figure below shows few example lines within the 3d simplex space illustrating the change in predicted response as species 1 (p1) is added to the 80-20 p2-p3 mixture, 20-60-20 p1-p2-p3 mixture and the 1/3-1/3-1/3 p1-p2-p3 mixture. Panel (b) then shows the corresponding changes in predicted response along each of these paths in the X-Y plane with pie-glyphs identifying each species composition. Note that the code for producing this figure is not provided.

3 Species Example effects plot interpretation

The visualise_effects() function in DImodelsVis can be used to generate the visualisation shown in panel (b) of the above figure which illustrates the effect of adding or removing a particular species from any individual species community as well as on average across multiple communities. The function requires a model object fitted using the DImodels (or DImodelsMulti) package and, optionally, a data-frame containing species communities for which to generate the response curves.

visualise_effects(Three_sp_model,
                  data = sim0)

3 Species Example effects plot

The figure shows the effect of adding p1, p2, and p3 (in separate panels) to each community in the experimental design. The effect of adding any species is generally consistent across the various species communities shown in the figure. However, the predicted response for the intermediate communities along each curve varies depending on the composition of the initial community. On average (solid black line), adding a species to a community where it is initially present at low proportions increases the predicted response. This effect typically peaks around the 50% mark, after which further increases in that species’ proportion lead to a decline in the predicted response.

training-visualising-DImodels-3species-simplex-path.knit

The concept of tracking changes in the predicted response along a straight line in the simplex space can be extended to consider the change across any two points in that space. This is an extension of the approach used in the visualise_effects() function, allowing us to explore how the predicted response changes between any two species compositions. It enables visualisation of the effects of more complex shifts in community structure, such as simultaneous addition or removal of multiple species from a particular community on the predicted response. Furthermore, since these visualisations are shown in the X-Y plane, illustrating the uncertainty around the predicted response is also straightforward.

The figure below highlights five example paths of interest on the ternary plot. Along each path, the value of p2 is held constant, specifically \(p2 = 0, 0.2, 0.4, 0.6\), and \(0.8\) for the pink, blue, yellow, brown and purple paths, respectively, while the values of p1 and p3 are allowed to vary between \(0.05\) and \(1 - p2 - 0.05\) respectively. Note that this figure is included as an explanatory illustration and the code for adding the coloured segments along the ternary is not provided.

3 Species Example simplex path plot interpretation

The simplex_path() function in DImodelsVis accepts a model object fitted using the DImodels (or DImodelsMulti) package along with two data-frames containing the starting and target compositions in the starts and ends parameters, respectively. Each row in starts is paired with the corresponding row in ends, and the function interpolates the predicted response along a straight line connecting each matched pair. This interpolation is defined by the equation \(a + t*(b - a)\) where \(a\) and \(b\) are respective species compositions from starts and ends, while \(t\) is the interpolation factor ranging from 0 to 1 and represents the relative position along the straight-line path from the starting to the ending composition. The resulting plot displays the predicted response (Y-axis) as a function of \(t\) (X-axis), showing how the response changes along each interpolated path. The code snippet below shows how the simplex_path() function can be used to generate the response curves along the five paths shown in the ternary above. Only the creation of the input data frames and the call to simplex_path() function are needed to create the figure. The additional code from geom_line() code is included solely to visually link each response curve to its corresponding path in the ternary diagram.

path_starts <- data.frame(p1 = c(0.05, 0.05, 0.05, 0.05, 0.05),
                          p2 = c(0, 0.2, 0.4, 0.6, 0.8),
                          p3 = c(0.95, 0.75, 0.55, 0.35, 0.15))

path_ends <- data.frame(p1 = c(0.95, 0.75, 0.55, 0.35, 0.15),
                        p2 = c(0, 0.2, 0.4, 0.6, 0.8),
                        p3 = c(0.05, 0.05, 0.05, 0.05, 0.05))

simplex_path(model = Three_sp_model,
             starts = path_starts,
             ends = path_ends) +
  # This additional code is not mandatory, it is only provided to help connect the curves in this plot to that from the previous ternary
  geom_line(aes(colour = as.factor(.Group)), linewidth = 1, alpha = 0.75) +
  geom_pie_glyph(data = function(x) x %>% filter(.InterpConst %in% c(0, 0.5, 1)),
                 slices = c("p1", "p2", "p3"),
                 colour = "black",
                 radius = 0.3) + 
  scale_colour_manual(values = c("#CC6677", "#0072B2", "#F0E442", "#661100", "#332288")) + 
  guides(colour = "none") +
  labs(fill = "Variables")

3 Species Example simplex path plot

Since, the value of p2 is constant for a specific path, moving from left to right along the path shows the effect of adding p1 whilst removing p3 from a mixture. When \(p2 = 0.2, 0.4, or \ 0.6\) (i.e., pink, blue and yellow curves), increasing the proportion of p1 increases the predicted response initially, but this effect tapers off after moving 75% of the way along each path. At \(p2 = 0.6\) or \(p2 = 0.8\) (brown and purple curves), the response is flat along the entire path, indicating that at high values of p2, the predicted response is invariant to changes in the proportions of p1 or p3.

Four species example

For this example, we use the average pairwise model fit to a real world dataset coming from a biodiversity and ecosystem function (BEF) experiment conducted in Switzerland, where the sown proportions of four species were manipulated to assess the impact of diversity on dry matter yield. Visualisations (along with associated code) depicting the BEF relationship across multiple dimensions of biodiversity, contributions of species identities and interactions to the response, response surface across the simplex space and effects of species addition or loss will be shown in this section.

training-visualising-DImodels-4species-model-fitting.knit

In this example, we use the Switzerland dataset from the DImodels R package. This dataset comes from a grassland biodiversity experiment that was conducted in Switzerland as part of the “Agrodiversity Experiment” (Kirwan et al., 2014). A total of 68 grassland plots were established across a gradient of species diversity, and two additional treatments (nitrogen fertiliser and total seed density) were also manipulated. The proportions of four species (two grasses and two legumes) were varied across the plots: there were plots with 100% of a single species, and two- and four-species mixtures with varying proportions (e.g., (0.5, 0.5, 0, 0) and (0.7, 0.1, 0.1, 0.1)). Nitrogen fertiliser was either 50 or 100 kg N per annum and total seed density was either low or high. Total annual yield per plot was recorded for the first year after establishment.

The following code can be used to load the data in R

# Load libraries
library(DImodels)
library(DImodelsVis)
library(PieGlyph)
library(dplyr)
library(ggplot2)
library(forcats)

# Load dataset
data("Switzerland")

# View dataset
head(Switzerland, 6)

##   plot nitrogen density   p1   p2   p3   p4    yield
## 1    1      150    high 0.70 0.10 0.10 0.10 13.51823
## 2    2      150    high 0.10 0.70 0.10 0.10 13.16549
## 3    3      150    high 0.10 0.10 0.70 0.10 19.95682
## 4    4      150    high 0.10 0.10 0.10 0.70 17.93976
## 5    5      150    high 0.25 0.25 0.25 0.25 13.74719
## 6    6      150    high 0.40 0.40 0.10 0.10 15.11899

The data columns are as follows:

plot: A unique identifier for plot in the experiment.
nitrogen: A factor variable describing whether a specific plot received 150 or 50 kg nitrogen per annum.
density: A factor variable describing the seeding density for each plot.
p1: A numeric vector indicating the sown proportion of species 1 in each plot. Species 1 was the grass species Lolium perenne.
p2: A numeric vector indicating the sown proportion of species 2 in each plot. Species 2 was the grass species Dactylis glomerata.
p3: A numeric vector indicating the sown proportion of species 3 in each plot. Species 3 was the legume species Trifolium pratense.
p4: A numeric vector indicating the sown proportion of species 4 in each plot. Species 4 was the legume species Trifolium repens.
yield: A numeric vector giving the total dry matter yield in the first year from each plot (in tonnes per hectare).

We will fit the average interaction model with an additive effect of nitrogen treatment to this data.

# Fit AV model
species <- c("p1", "p2", "p3", "p4")

AVModel <- DI(y = "yield", prop = species,
              DImodel = "AV", 
              treat = "nitrogen", data = Switzerland)

## Fitted model: Average interactions 'AV' DImodel

summary(AVModel)

## 
## Call:
## glm(formula = new_fmla, family = family, data = new_data)
## 
## Coefficients:
##            Estimate Std. Error t value Pr(>|t|)    
## p1_ID        8.5691     0.6520  13.143  < 2e-16 ***
## p2_ID        8.6257     0.6520  13.230  < 2e-16 ***
## p3_ID       15.5192     0.6520  23.803  < 2e-16 ***
## p4_ID       12.0347     0.6520  18.459  < 2e-16 ***
## AV          13.7479     1.6292   8.438 6.89e-12 ***
## nitrogen50  -0.7407     0.4713  -1.572    0.121    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 2.816392)
## 
##     Null deviance: 13310.90  on 68  degrees of freedom
## Residual deviance:   174.62  on 62  degrees of freedom
## AIC: 271.11
## 
## Number of Fisher Scoring iterations: 2

training-visualising-DImodels-4species-model-diagnostics.knit

Model diagnostics plots are an essential tool for detecting any violations in the underlying model assumptions and identifying any issues with the model fit. For linear regression models, the main diagnostic plots are the “Residual vs Fitted” plot, used to assess the zero-mean, constant variance and independence assumptions of the residuals and the “QQ plot” which evaluates the normality of the residual distribution (i.e., \(\epsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2)\)).

The model_diagnostics() function in DImodelsVis offers a modified version of standard diagnostics plots where points can be replaced with pie-glyphs to highlight the species composition in any observation(s). The function requires a model object fitted using the DImodels (or DImodelsMulti) package. By default, the function produces the “Residuals vs Fitted”, “Normal Q-Q”, “Scale-Location”, and “Residuals vs Leverage” plots but we can also create a subset of these plots using the which argument as shown below. Additionally, we use the facet_wrap() function from ggplot2 to display the observations for each nitrogen level in a separate panel.

# Diagnostics plot
model_diagnostics(AVModel, which = 1:2, nrow = 2) + 
  facet_wrap(~nitrogen, labeller = label_both)

4 Species Example Diagnostics

The “Normal QQ” plot does not indicate any strong deviations from a normal distribution in the residuals as all points are near the \(x=y\) line at both nitrogen levels. The residuals appear to be unbiased (i.e., centred around the Y-axis), have a constant variance across the range of the X-axis and do not exhibit any patterns in the “Residual vs Fitted” plot for both nitrogen levels. An additional advantage of using pie-glyphs in diagnostics plots is that they help reveal patterns between the predicted response and predictors. For example, in the “Residual vs Fitted” plot all mixtures containing high proportions of Species 3 have higher predictions than other communities.

training-visualising-DImodels-4species-response-vs-richness.knit

# Gradient change
gradient_change(AVModel,
                data = Switzerland %>% filter(nitrogen == "150"))

4 Species Example Gradient change

The mean predicted response (dashed black line) increases when moving from monocultures to two species mixtures, but remains unchanged when moving from two-species to four-species mixtures. The individual pie-glyphs depict the sown species composition within each community and are jittered horizontally based on the evenness measure of each community. At a given level of species richness, the predicted response varies considerably due to differences in species composition and relative abundances. For example, at Richness = 4, mixtures dominated by p3 have higher predicted responses, while mixtures dominated by p1 or p2 have a lower predicted response. The performances of p1 and p2 dominated mixtures are very similar, they overlap in the plot, and only the p2-dominated communities are visible.

training-visualising-DImodels-4species-prediction-contributions.knit

The prediction_contributions() function in DImodelsVis visualises the contributions of individual species, their interactions, and any additional treatments to the predicted response as a stacked bar-chart. The function requires a model object fitted using the DImodels (or DImodelsMulti) package and, optionally, a data-frame containing species communities for which to predict the response. The code shown below selects the four monocultures, six binary mixtures, four species mixtures dominated by 70% of each species and the centroid mixture from the data at the nitrogen = 150 level and utilises the prediction_contributions() function to illustrate how the different components contribute to the predicted response. Additionally, the facet_grid() and theme() functions from ggplot2 are used to enhance the plot aesthetics.

# Predictions contributions
pred_data <- Switzerland %>% 
  mutate(Richness = rowSums(.[, species] != 0)) %>% 
  filter(nitrogen=="150") %>% 
  filter(plot %in% c(12:15, 1:5, 31:36)) %>% 
  arrange(Richness)

prediction_contributions(model = AVModel, 
                         data = pred_data,
                         # labels for bars
                         bar_labs = c("p1", "p2", "p3", "p4", 
                                      "p1-p2", "p1-p3", "p1-p4",
                                      "p2-p3", "p2-p4", "p3-p4",
                                      "p1_dom", "p2_dom", 
                                      "p3_dom", "p4_dom",
                                      "Cent")) + 
  facet_grid(~Richness, scale = "free_x", space = "free_x",
             labeller = label_both)

4 Species Example Contributions

The figure shows how the species identities (green, orange, purple or blue) and average pairwise interactions (grey) contribute towards the predicted response. Visualising the predicted response as contributions from individual model terms help provide insights into why why certain communities perform better than other. For example, the monocultures with the exception of p3, perform poorly compared to the multi-species mixtures due to the absence of interaction effects. Amongst the mixtures, the contributions due to the average interaction term doesn’t appear to change a lot and any differences in the mixture performances are driven primarily due to contributions by the species identity effects. Additionally, the bars are grouped based on the species richness, enabling us to easily compare communities at different richness levels. For example, the best performing mixtures at both richness = 2 and 4 have comparable performances. Finally, since the model includes an additive effect of nitrogen treatment with nitrogen = 150 as the reference level, all species identity and interaction effects in the model are reported relative to this treatment. As all communities shown in the figure were observed under the reference level, there is no nitrogen treatment effect to display in the plots.

training-visualising-DImodels-4species-ternary-diagrams.knit

A major strength of DI models is the ability to predict the response for any combination of the species proportions, including those not present in the experimental design. This allows for the prediction and visualisation of the response surface across the entire simplex space. However, visualising the response surface for systems with more than four species is challenging as higher dimensional (over four dimensions) simplexes cannot be visualised directly. To address these, DImodelsVis offers two strategies for visualising the response surface in such cases using ternary plots: (1) Conditional ternary diagrams, where selected species are conditioned at fixed proportions and the response surface is visualised as 2d slices of the higher dimensional simplex and (2) Grouped ternary diagrams, where species are grouped (or aggregated) into functional groups to reduce dimensionality, enabling the response surface to be visualised in a simplified, lower-dimensional space.

Conditional ternary diagrams

Conditional ternary diagrams are 2d slices of higher dimensional simplexes. To create conditional ternary diagrams for a system with \(s\) species, \(s-3\) species are conditioned to have fixed values while the remaining three species are allowed to have any value whilst respecting the sum-to-one constraint on the species proportions. This allows the response surface to be visualised within a ternary plot for those three species. By viewing multiple such slices each showing a different set of species shown in the ternary and with different fixed values for the remaining species we can gain an approximate understanding of how the predicted response behaves across the full, higher-dimensional compositional space.

The conditional_ternary() function in DImodelsVis accepts a model object fitted using the DImodels (or DImodelsMulti) package and generates conditional ternary diagrams with the response surface depicted as a contour map. Users must specify names of the three species to be visualised in the tern_vars parameter. The names and values for the species to be held constant should be specified in the conditional parameter as a data-frame and a separate ternary would be created for each row within this data-frame. In this example, species p1, p2, and p3 are allowed to vary while p4 is held constant at values of 0, 0.25 and 0.5. Furthermore, values for any additional variables in the model (i.e., nitrogen in this case) can be specified within the add_var parameter. Here, we use it to create ternaries for the \(nitrogen = 150\) level. Finally, any additional parameters shown below are optional arguments specified to enhance aesthetics of the final plot.

# Conditional ternary
conditional_ternary(AVModel, tern_vars = c("p1", "p2", "p3"),
                    conditional = data.frame("p4" = c(0, 0.25, 0.5)),
                    add_var = list("nitrogen" = factor("150", levels = c("50", "150"))),
                    lower_lim = 8, upper_lim = 18, nlevels  = 10)

4 Species Example conditional ternary

Across all three ternaries, the predicted response is maximised by having high proportions of p3. In the absence of p4, (i.e., \(p4 = 0\); left ternary), the optimal mixture contains some proportion of p1 and p2 along with p3. However, as the proportion of p4 increases (middle and right ternaries), optimal performance is achieved through mixtures of p3 and p4 alone, with little to no contribution from p1 or p2. Notably, the contour lines are symmetric about the p1–p2 axis across all ternaries, suggesting that p1 and p2 contribute similarly to the predicted response and can be substituted for one another without substantially altering performance.

Grouped ternary diagrams

Grouped ternary diagrams are created by aggregating multiple species into functional groups (FG) and visualising the response surface as a contour map in a ternary diagram where some, or all the axes represent the total functional group proportion, instead of the individual species proportion. The total functional group proportion at each point across the ternary can be split between the species contained within the group, either equally or in a specific ratio (for example, if an FG contains two species, the proportion along each point in the ternary could either be split 1:1 between the two or 4:1 in favour of the first species). Graphically, this process generates 2d slices of the higher dimensional simplex. By adjusting the ratios in which the total FG proportion is divided among the component species, slices across different regions of the simplex can be generated.

The grouped_ternary() function in DImodelsVis accepts a model object fitted using the DImodels (or DImodelsMulti) package and generates grouped ternary diagrams with the response surface depicted as a contour map. Users must specify the functional group each species belongs to as a character vector via the FG parameter. In this example, the two grasses (p1 and p2) are grouped together and labelled as the functional group “G”, while the two legumes (p3 and p4) are shown individually along the ternary axes in the figure. By default, the total FG proportion would be split equally between the component species within each FG, i.e., the total grass proportion is split equally between the two grasses. However, this split can be set manually by using the values parameter. Furthermore, values for any additional variables in the model (nitrogen in this case) can be specified within the add_var parameter and separate sets of ternaries would be created for each value specified here. Finally, any additional parameters shown below are optional arguments specified to enhance aesthetics of the final plot.

# Grouped ternary
grouped_ternary(AVModel, FG = c("G", "G", "p3", "p4"),
                add_var = list("nitrogen" = factor("150", levels = c("50", "150"))),
                lower_lim = 8, upper_lim = 18, nlevels  = 10)

4 Species Example grouped ternary

The predicted response is higher within the interior of the ternary and declines as we move towards either of the vertices (i.e., monocultures of p3 and p4 or binary mixture containing only grasses). The best performing mixture contains about 12% grasses (i.e., 6% each of p1 and p2), 57% p3 and 31% p4. However, the response is rather consistent across a broad region of the ternary around this point, suggesting flexibility in selecting an optimal combination.

training-visualising-DImodels-4species-species-addition-loss.knit

The ability of DI models to interpolate the response across any species composition in the simplex space allows for predicting how the addition or removal of a species would influence the expected performance of a given community. Mathematically, this can be explored by tracing a line from a specific point within the simplex space (representing a given species composition) to any vertex of the simplex. The change in predicted response along this line reflects the effect of incrementally increasing or removing that species from the community.

The visualise_effects() function in DImodelsVis generates visualisations that illustrate the effect of adding or removing a particular species from any individual community as well as on average across multiple communities. The function requires a model object fitted using the DImodels (or DImodelsMulti) package and, optionally, a data-frame containing species communities for which to generate the response curves. We generate the visualisation using all communities in the experimental design at the nitrogen = 150 level in this example.

# Effects plot
visualise_effects(AVModel, 
                  data = Switzerland %>% filter(nitrogen == "50"))

4 Species Example effects plot

The figure shows the effect of adding each species in separate panels identified by the species name. The effect of adding any species to a community differs based on the respective species involved. For example, adding p3 (shown in purple) to any community increases the predicted response, although the effect plateaus as the proportion of p3 in the mixture reaches around 70-75%. In contrast, introducing p1 or p2 in any community except their monocultures, leads to reduction in the predicted repsonse. The average response curves (solid black line) for p1 and p2 are nearly identical, indicating that these species have similar functional roles in the system and effects on the predicted response.

training-visualising-DImodels-4species-simplex-path.knit

The concept of tracking changes in the predicted response along a straight line in the simplex space can be extended to consider the change between any two points in that space. This is an extension of the approach used in the visualise_effects() function, allowing us to explore how the predicted response changes between any two species compositions. It enables visualisation of the change in predicted response for more complex shifts in community structure, such as simultaneous addition or removal of multiple species from a particular community. Furthermore, since these visualisations are shown in the X-Y plane, illustrating the uncertainty around the predicted response is also straightforward.

The simplex_path() function in DImodelsVis accepts a model object fitted using the DImodels (or DImodelsMulti) package along with two data-frames containing the starting and target compositions in the starts and ends parameters, respectively. Each row in starts is paired with the corresponding row in ends, and the function interpolates the predicted response along a straight line connecting each matched pair. This interpolation is defined by the equation \(a + t*(b - a)\) where \(a\) and \(b\) are respective species compositions from starts and ends, while \(t\) is the interpolation factor ranging from 0 to 1 and represents the relative position along the straight-line path from the starting to the ending composition. The resulting plot displays the predicted response (Y-axis) as a function of \(t\) (X-axis), showing how the response changes along each interpolated path. The code snippet below shows how the simplex_path() function can be used to generate the response curves along the straight-line paths between all monocultures and four-species dominant mixtures to the centroid mixture.

# Simplex path
starts <- Switzerland %>% 
  filter(nitrogen == "150") %>% 
  filter(plot %in% c(12:15, 31:36))

ends <- Switzerland %>% filter(plot == 5)

simplex_path(AVModel,
             starts = starts,
             ends = ends)

3 Species Example simplex path plot

Moving toward the centroid mixture either increases or maintains the predicted response, regardless of the starting composition. Among all paths investigated in this figure, the one connecting the p3-p4 binary mixture to the centroid appears to offer the highest potential for optimal performance.

Nine species example

For this example, we use the best model selected in the nine species example from the starter pack. Visualisations (along with associated code) depicting the BEF relationship across multiple dimensions of biodiversity, contributions of species identities and interactions to the response, response surface across the simplex space and effects of species addition or loss will be shown in this section.

training-visualising-DImodels-9species-model-fitting.knit

We first load the necessary libraries along with the dataset and refit the best model from the nine species example in the starter pack.

# Load libraries
library(DImodels)
library(DImodelsVis)
library(PieGlyph)
library(dplyr)
library(ggplot2)
library(forcats)

# Load dataset
data("sim3")

head(sim3)

##   community richness treatment p1 p2 p3 p4 p5 p6 p7 p8 p9 response
## 1         1        1         A  0  0  0  0  0  0  0  0  1   10.265
## 2         1        1         B  0  0  0  0  0  0  0  0  1    7.740
## 3         1        1         A  0  0  0  0  0  0  0  0  1   12.173
## 4         1        1         B  0  0  0  0  0  0  0  0  1    8.497
## 5         2        1         A  0  0  0  0  0  0  0  1  0   10.763
## 6         2        1         B  0  0  0  0  0  0  0  1  0    8.989

Fit best (functional group) model selected in the nine species example.

# Fit best model for three species example
FGModel <- DI(y = "response", prop = 4:12, 
              FG = c("FG1","FG1","FG1","FG1","FG1","FG2","FG2","FG3","FG3"),
              DImodel = "FG", treat = "treatment", data = sim3)
summary(FGModel)

## 
## Call:
## glm(formula = new_fmla, family = family, data = new_data)
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## p1_ID            9.7497     0.3666  26.595  < 2e-16 ***
## p2_ID            8.5380     0.3672  23.253  < 2e-16 ***
## p3_ID            8.2329     0.3666  22.459  < 2e-16 ***
## p4_ID            6.3644     0.3665  17.368  < 2e-16 ***
## p5_ID           10.8468     0.3669  29.561  < 2e-16 ***
## p6_ID            5.9621     0.4515  13.205  < 2e-16 ***
## p7_ID            5.4252     0.4516  12.015  < 2e-16 ***
## p8_ID            7.3204     0.4515  16.213  < 2e-16 ***
## p9_ID            8.2154     0.4515  18.196  < 2e-16 ***
## FG_bfg_FG1_FG2   3.4395     0.8635   3.983 8.09e-05 ***
## FG_bfg_FG1_FG3  11.5915     0.8654  13.395  < 2e-16 ***
## FG_bfg_FG2_FG3   2.8711     1.2627   2.274  0.02351 *  
## FG_wfg_FG1       2.8486     0.9131   3.120  0.00194 ** 
## FG_wfg_FG2       0.6793     2.3553   0.288  0.77319    
## FG_wfg_FG3       2.4168     2.3286   1.038  0.29997    
## treatmentA       3.1018     0.1171  26.479  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 1.413412)
## 
##     Null deviance: 52280.33  on 412  degrees of freedom
## Residual deviance:   559.71  on 396  degrees of freedom
## AIC: 1329.4
## 
## Number of Fisher Scoring iterations: 2

training-visualising-DImodels-9species-model-diagnostics.knit

The model_diagnostics() function in DImodelsVis offers a modified version of standard diagnostics plot where points can be replaced with <a class = “R”, href = “https://rishvish.github.io/PieGlyph/”>pie-glyphs to highlight the species composition in any observation(s). The function requires a model object fitted using the DImodels (or DImodelsMulti) package. By default, the function produces the “Residuals vs Fitted”, “Normal Q-Q”, “Scale-Location”, and “Residuals vs Leverage” plots but we can also create a subset of these plots using the which argument as shown below. Rendering pie-glyphs for each observation can be resource intensive for large datasets. Therefore, the model_diagnostics() function includes an only_extremes argument, which, when set to TRUE displays only the most problematic observations (e.g., those with the highest residuals or influence values) as pie-glyphs. Additionally, the npoints argument controls how many such observations are shown as pie-glyphs.

# Model diagnostics
model_diagnostics(FGModel, which = 1:2, only_extremes = TRUE, npoints = 3)

3 Species Example Diagnostics

The “Normal QQ” plot does not indicate any strong deviations from a normal distribution in the residuals as all points are near the \(x=y\) line. The residuals appear to be unbiased (i.e., centered around the Y-axis), have a constant variance across the range of the X-axis and do not exhibit any patterns in the “Residual vs Fitted” plot. However, there are some observations (highlighted with pie-glyphs) which are predicted poorly and have high residuals (>3). While these cases warrant further investigation, they represent a small proportion of the total number of observations and there aren’t any other concerning patterns in the diagnostics plots. Therefore, we proceed further with visualising the predictions from the model.

training-visualising-DImodels-9species-response-vs-richness.knit

The gradient_change() function in DImodelsVis can be used to create visualisations that illustrate how the predicted response varies with species richness, composition, and relative abundances. The function requires a model object fitted using the DImodels (or DImodelsMulti) package and, optionally, a data-frame containing species communities for which to predict the response in data argument. The function uses pie-glyphs to depict the species composition for the various communities. However, there is a risk of over-plotting for large datasets, and therefore, the function offers the pie_data argument which accepts a subset of data for which to show pie-glyphs. In this example, pie-glyphs all monocultures along with the best and worst performing communities from the experimental design at each level of richness are displayed as pie-glyphs to reproduce the figure shown in the Interpretation section of nine species example in the starter pack.

# Gradient change
treatmentA_data <- sim3 %>% filter(treatment == "A")
# Select the specific observations to show as pie-glyphs
pie_data <- treatmentA_data %>% 
  # Add the predicted response from the model
  # Prediction will be added in a column labelled `.Pred`
  add_prediction(FGModel) %>% 
  # Group by richness and select the best and worst performing mixture at each level
  group_by(richness) %>% 
  filter(richness == 1 | richness == 9 | .Pred %in% c(max(.Pred), min(.Pred))) %>% 
  distinct(p1, p2, p3, p4, p5, p6, p7, p8, p9) %>% 
  ungroup()

gradient_change(model = FGModel,
                data = treatmentA_data,
                pie_data = pie_data,
                # Custom colours for the pie-glyph slices
                pie_colours = c("#192F42", "#284A66", "#36648B", "#447EB0", "#6497C3", 
                                "#A24700", "#FF7609",
                                "#863578", "#C061B0"))

3 Species Example Gradient change

The predicted response increases with species richness but saturates beyond four species (dashed-black line). The individual pie-glyphs depict the species composition for select species communities. At a given level of species richness, the predicted response varies considerably due to differences in species composition. Generally, the highest performing mixtures at each level of richness include species from FG1 (p1, p2, p3, p4, or p5) and FG3 (p8 and p9) while the ones performing poorly contain species from FG1 (p1, p2, p3, p4, or p5) and FG2 (p6 and p7).

training-visualising-DImodels-9species-prediction-contributions.knit

The prediction_contributions() function in DImodelsVis visualises the contributions of individual species, their interactions, and any additional treatments to the predicted response as a stacked bar-chart. The function requires a model object fitted using the DImodels (or DImodelsMulti) package and, optionally, a data-frame containing species communities for which to predict the response. The code shown below utilises the prediction_contributions() function to illustrate how the different components contribute to the predicted response for all the observations highlighted with pie-glyphs in the plot from the richness-vs-response section (i.e., the nine monocultures along with the best and worst performing communities from the experimental design at each level of richness). Additionally, the colours for the individual contributions are manually specified so they are easily distinguishable and the theme() function from ggplot2 is used to enhance the plot aesthetics.

# Predictions contributions
pred_data <- pie_data %>% 
  arrange(richness, 
          desc(p1), desc(p2), desc(p3), 
          desc(p4), desc(p5), desc(p6), 
          desc(p7), desc(p8), desc(p9)) %>% 
  mutate(labels = c("p1_mono", "p2_mono", "p3_mono", 
                    "p4_mono", "p5_mono", "p6_mono",
                    "p7_mono", "p8_mono", "p9_mono",
                    "p5-p9", "p6-p7", "p6-p7-p8",
                    "p1-p5-p8", "p1-p5-p8-p9", "p1-p3-p6-p9",
                    "p1-p2-p3-p4-p5-p9", "p1-p3-p4-p5-p6-p7",
                    "Centroid"))

prediction_contributions(FGModel, data = pred_data,
                         bar_labs = "labels",
                         colours = c("#192F42", "#284A66", "#36648B", "#447EB0", "#6497C3", 
                                     "#A24700", "#FF7609",
                                     "#863578", "#C061B0",
                                     get_shades("#E69F00", 3)[[1]],
                                     get_shades("#332288", 3)[[1]],
                                     "#CC6677")) + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

3 Species Example Contributions

The figure shows how the species identities (shades of blue, orange, or purple), between FG interactions (shades of yellow), within FG interactions (shades of indigo), and the treatment effect (pink) contribute towards the predicted response. Visualising the predicted response as contributions from individual model terms help provide insights into why why certain communities perform better than others. For example, the best performing communities at each level of richness have higher predicted responses than the monocultures due to synergistic between FG interactions interactions, which are absent in monocultures. Across all mixtures, the between FG interactions contribute more towards the response than within FG interactions. Treatment has a rather sizeable effect (approx 2.5 units) on the predicted response for all communities.

training-visualising-DImodels-9species-species-addition-loss.knit

The visualise_effects() function in DImodelsVis generates visualisations that illustrate the effect of adding or removing a particular species from any individual community as well as on average across multiple communities. The function requires a model object fitted using the DImodels (or DImodelsMulti) package and, optionally, a data-frame containing species communities for which to generate the response curves. We generate the visualisation for the nine monocultures and nine-species centroid mixture receiving treatment A in this example.

# Effects plot
effects_data <- sim3 %>%
  filter(treatment == "A", richness %in% c(1, 9))

visualise_effects(FGModel, 
                  data = effects_data,
                  pie_colours = c("#192F42", "#284A66", "#36648B", "#447EB0", "#6497C3", 
                                  "#A24700", "#FF7609",
                                  "#863578", "#C061B0"))

9 Species Example effects plot

The figure shows the effect of adding each species in separate panels identified by the species name. The effect of adding any species to a community appears to be driven by the functional group identity of the respective species involved. For example, adding any species from FG1 (p1 to p5; shown in blue) to monocultures of species from FG3 (p8 and p9; coloured purple) increases the predicted response, although the effect plateaus as the proportion of FG1 species in the mixture reaches around 50-60%. On the other hand, introducing species from FG2 to monocultures of either of the other functional groups results in a decline in the predicted response. On average (solid black line), adding p1 or p5 to a community has the strongest positive effect while adding p6 and p7 has a negative effect.

training-visualising-DImodels-9species-ternary-diagrams.knit

Conditional ternary diagrams

The conditional_ternary() function in DImodelsVis accepts a model object fitted using the DImodels (or DImodelsMulti) package and generates conditional ternary diagrams with the response surface depicted as a contour map. Users must specify names of the three species to be visualised in the tern_vars parameter. The names and values for the species to be held constant should be specified in the conditional parameter as a data-frame and a separate ternary would be created for each row within this data.frame. In this example, species p1, p3, p5 are allowed to vary while p7, p8, and p9 are held constant at different values. Any species not specified within the tern_vars or conditional arguments would be assumed to be 0. Furthermore, values for any additional variables in the model (treatment in this case) can be specified within the add_var parameter and separate sets of ternaries would be created for each value specified here. Finally, any additional parameters shown below are optional arguments specified to enhance aesthetics of the final plot.

# Conditional ternary
conditional_ternary(FGModel, tern_vars = c("p1", "p3", "p5"),
                    conditional = data.frame("p7" = c(0, 0.1, 0.1),
                                             "p8" = c(0, 0.1, 0.1),
                                             "p9" = c(0, 0.0, 0.3)),
                    add_var = list("treatment" = c("A", "B")),
                    lower_lim = 6, upper_lim = 16, nlevels  = 10, 
                    nrow = 2)

9 Species Example conditional ternary

The predicted response is higher for all mixtures receiving treatment A. Within a specific treatment level, for the three species shown within the ternaries, the predicted response is maximised by increasing the proportion of p5 and decreases as p3 increases (notice that the proportion of species in the ternaries only go as high as 1 - (p7 + p8 + p9)). Amongst the conditioned species combinations, the ternary with p7 = 0.1, p8 = 0.1 and p9 = 0.3 (middle ternaries) appears to be the slice with highest predicted response at a specific treatment level. The response surface is also relatively flat across the entire ternary. This indicates that the optimal composition of species maximising the predicted response is near this region. However, to confirm this, we would have to look at more slices including the three species (p2, p4, and p6) fixed at non-zero values.

Grouped ternary diagrams

The grouped_ternary() function in DImodelsVis accepts a model object fitted using the DImodels (or DImodelsMulti) package and generates grouped ternary diagrams with the response surface depicted as a contour map. Users must specify the functional group each species belongs to as a character vector via the FG parameter. By default, the total FG proportion would be split equally between the component species within each FG, however, these can be set manually by using the values parameter. Furthermore, values for any additional variables in the model (treatment in this case) can be specified within the add_var parameter and separate sets of ternaries would be created for each value specified here. Finally, any additional parameters shown below are optional arguments specified to enhance aesthetics of the final plot.

# Grouped ternary
group1 <- grouped_ternary(FGModel, 
                FG = c("FG1","FG1","FG1","FG1","FG1","FG2","FG2","FG3","FG3"),
                add_var = list("treatment" = c("A", "B")),
                lower_lim = 6, upper_lim = 16, nlevels  = 10)

ggpubr::ggarrange(group1[[1]],
                  group1[[2]], 
                  common.legend = TRUE,
                  legend = "bottom")

9 Species Example grouped ternary v1

The predicted response is higher for mixtures receiving treatment A. Within a specific treatment level, the predicted response is maximised by combining species from FG1 and FG3 with the approximate optimal composition being 55% of FG1 and 45% of FG3. Increasing the proportion of FG2 causes the predicted response to decline sharply.

While the previous figure is great for selecting an optimal combination of the functional groups, one still needs to select the relative proportions of each species within each functional group. By default, the total FG proportion is distributed equally between the component species. For example, the total proportion of FG1 is split into 20% each of p1, p2, p3, p4, and p5. Similarly, FG2 and FG3 are each split evenly between their two component species, 50% each for p6 and p7 (FG2), and p8 and p9 (FG3). These default proportions can be modified to explore the effects of different species combinations within an FG on the predicted response. In the example below, FG1 is composed of 30% p1 and 70% p5, with p2, p3, and p4 excluded (0%). FG2 remains evenly split between p6 and p7 (i.e., 50% each), while FG3 is composed of 20% p8 and 80% p9.

group2 <- grouped_ternary(FGModel, 
                FG = c("FG1","FG1","FG1","FG1","FG1","FG2","FG2","FG3","FG3"),
                values = c("p1" = 0.3, "p2" = 0, "p3" = 0, "p4" = 0, "p5" = 0.7,
                           "p6" = 0.5, "p7" = 0.5, # Species within an FG should sum to 1
                           "p8" = 0.2, "p9" = 0.8), 
                add_var = list("treatment" = c("A", "B")),
                lower_lim = 6, upper_lim = 16, nlevels  = 10)

ggpubr::ggarrange(group2[[1]],
                  group2[[2]], 
                  common.legend = TRUE,
                  legend = "bottom")

9 Species Example grouped ternary v2

The response surface contours are similar to the ternary where the FG proportion is split equally between the component species, in that the optimal composition is one where FG1 and FG3 are combined in a 55%-45% mix. However, the predicted response for the best mix is higher than the case when FG1 and FG3 were split equally between their component species. This suggests that, beyond functional group effects, species identities also play a role in shaping the response, and that favouring specific species (p1, p5 and p9 for this example) can enhance performance.

Visualising DI Models

Visualising DI Models

Installing and Loading the DImodelsVis R Package

Before you look at visualising DI models, we recommend you install and load the DImodelsVis R package.

Visualising Results from Models Fit using the DImodels R Package

What will I learn from looking at this example?

Conditional ternary diagrams

Grouped ternary diagrams

Conditional ternary diagrams

Grouped ternary diagrams