Code
library(tidyverse)
library(nlraa)
library(knitr)
library(kableExtra)
Olivia Hemond
March 2, 2024
Description:
The data used in this analysis is from a study of sorghum and maize growth in Greece. Sweet sorghum, fiber sorghum, and maize were grown in experimental plots. Half of the plots for each plant were exposed to optimum inputs of water and nitrogen, while the other half were exposed to limited water and nitrogen inputs. The scientists measured how the biomass (yield) for each crop type changed over time under these different experimental conditions.
Variables:
DOY = day of year (141 - 303)
Block = block in experimental design (1 - 4)
Input = level of water and nitrogen input. Either 1 (low) or 2 (high)
Crop = crop type (M = maize, S = sweet sorghum, F = fiber sorghum)
Yield = biomass yield in Mg/ha
Data source:
Danalatos, N.G., S.V. Archontoulis, and K. Tsiboukas. 2009. Comparative analysis of sorghum versus corn growing under optimum and under water/nitrogen limited conditions in central Greece. In: From research to industry and markets: Proceedings of the 17th European Biomass Conference, Hamburg, Germany. 29 June–3 July 2009. ETA–Renewable Energies, Florence, Italy. p. 538–544.
The purpose of this analysis is to fit models to the crop dataset using non-linear least squares (NLS). I can optimize the parameters of the model to find the best fitted model for each species of crop. These models can then be used to predict crop yields at various time points for each of these crops, and to assess the impact of fertilizer use on overall crop growth.
The Beta function
The model for this analysis is the Beta function from Table 1, Equation 2.5 of Archontoulis & Miguez, 2015 (see here). This is a sigmoid function and I will use it to describe plant biomass (yield) as a function of time.
Beta function variables and parameters
y = response variable
ymax = maximum value of response variable
t = explanatory variable
te = value of explanatory variable when y equals its asymptotic value
tm = inflection point at which growth rate is at its maximum
The steps of this analysis are as follows:
Write Beta function
Read in and tidy data
Create a function in R based upon the Beta function’s mathematical formula
Run one NLS model
Generate initial guesses for parameter values
Run model for just one crop and input level
Plot fitted model
Run multiple NLS models
Run models on each combination of plot, crop species, and input level
Find the best fitted model for each crop
Create final visual
This function was constructed to match Table 1, Equation 2.5 of Archontoulis & Miguez, 2015.
To fit the function correctly, it must be given initial parameter values that are reasonably close to their actual values. Using a plot of the data, I visually identified potential starting values based upon the definitions of ymax, te, and tm. I graphed a simulation of the function using those input values and modified them slightly until the model appeared to fit the data reasonably well.
# use beta model to simulate outputs using parameter guesses
beta_sim <- df %>%
mutate(sim = beta(ymax = 52, te = 280, tm = 220, t = doy))
# plot to figure out best starting parameter estimates
sim_plot <- ggplot(beta_sim, aes(x = doy, y = yield, color = "data")) +
geom_point() +
geom_line(aes(x = doy, y = sim, color = "model")) +
scale_color_manual(values = c("data" = "black", "model" = "green4")) +
theme_minimal()
For my first NLS model, I set the initial parameter values based upon the guesses I made in the section above.
This first model will be fit to the yield data from just sweet sorghum with high water and nitrogen inputs. I filtered the dataset to just this subset for the model fitting.
I ran NLS using the Beta function, the subset of sweet sorghum data, and the initial parameter guesses defined above. Below is a table of the estimated model parameters.
Estimate | Std. Error | t value | Pr(>|t|) | |
---|---|---|---|---|
ymax | 39.82042 | 2.248862 | 17.70692 | 0 |
te | 281.58626 | 2.080747 | 135.32939 | 0 |
tm | 244.77728 | 3.511954 | 69.69831 | 0 |
I used the fitted model to predict values for each day of data collection, and then plotted these predicted outputs on top of the observed crop data.
# create df with addition column of predicted values based on fitted model
one_nls_predict <- df_filter %>%
mutate(predict = predict(one_nls, newdata = doy))
# plot
ggplot() +
geom_point(data = df,
aes(x = doy, y = yield, color = crop, shape = crop), alpha = 0.8) +
geom_path(data = one_nls_predict, aes(x = doy, y = predict), color = "black", size = 0.7) +
scale_color_manual(values = c("darkorchid3", "goldenrod2", "orchid2")) +
labs(x = "Day of the Year", y = "Biomass (Mg/ha)",
color = "Crop", shape = "Crop") +
theme_minimal()
Results:
While the model seems to fit the sweet and fiber sorghum data reasonably well, it does not appear to fit the maize data very well. This indicates that I may need a separate model fitted for each type of crop.
Given the results above, I decided to fit my models separately for each crop species, input level, and experimental block (plot). This way, they can be more specifically tailored to the fact that biomass yield will differ based upon these experimental variables.
To iterate, I needed to turn my NLS process into a function that could take any subset of my dataset and fit a model to it using the Beta function.
I then nested my data, so that I had 24 different subsets based upon different combinations of crop, input level, and experimental block. Using the NLS function, I then fit my model, generated predictions, and calculated RMSE values for each model fit to each subset of data. I also calculated smoothed model predictions, which give an output for every day within the study period, not just the actual days that the scientists took measurements on.
beta_all <- df %>%
group_by(block, input, crop) %>%
nest() %>%
mutate(nls_model = map(data, ~all_nls_fxn(.x)),
predictions = map2(nls_model, data, ~predict(.x, newdata = .y)),
smooth = map(nls_model, ~predict(.x, newdata = list(doy=seq(141,303)))),
rmse = map2_dbl(predictions, data, ~Metrics::rmse(.x, .y$yield)))
I identified the best models for each species based upon RMSE values. Lower RMSE values indicate better model fit. The estimated parameter values are shown in the tables below.
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
ymax | 29.0564 | 1.657637 | 17.52881 | 5e-07 |
te | 280.4296 | 1.838948 | 152.49459 | 0e+00 |
tm | 244.9988 | 3.538536 | 69.23735 | 0e+00 |
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
ymax | 18.93111 | 0.8396982 | 22.54514 | 5e-07 |
te | 252.57492 | 1.8691684 | 135.12690 | 0e+00 |
tm | 225.22840 | 1.9262820 | 116.92390 | 0e+00 |
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
ymax | 31.34742 | 2.047998 | 15.30637 | 1.2e-06 |
te | 278.34654 | 1.817122 | 153.17988 | 0.0e+00 |
tm | 245.77426 | 3.815014 | 64.42290 | 0.0e+00 |
As a final visualization, I graphed the observed crop biomass data with the smoothed outputs of my fitted NLS models for each crop species and input level. For simplicity, the data and models are limited to the plants grown in experimental block #1.
# filter observed data to plot 1
filter_1_low <- df %>%
filter(block == 1)
# filter smoothed predictions to plot 1
smooth_1_low <- beta_all %>%
filter(block == 1) %>%
unnest(cols = c(smooth)) %>%
select(crop, smooth) %>%
mutate(doy_smooth = seq(141, 303)) %>% # add doy for smoothed outputs
filter(!(crop == "M" & doy_smooth > 263)) # maize data not collected after day 263
# plot by input
ggplot() +
geom_point(data = filter_1_low,
aes(x = doy, y = yield, shape = crop, color = crop)) +
geom_path(data = smooth_1_low,
aes(x = doy_smooth, y = smooth, linetype = crop, color = crop),
linewidth = 0.7) +
facet_wrap(~input,
labeller = labeller(input = c("1" = "Low Inputs", "2" = "High Inputs"))) +
scale_color_manual(values = c("darkorchid3", "goldenrod2", "orchid2")) +
labs(x = "Day of the Year", y = "Biomass (Mg/ha)",
color = "Crop", shape = "Crop", linetype = "Crop") +
theme_light() +
theme(legend.position = c(0.08, 0.8),
legend.box.background = element_rect(color = "darkorchid4", size = 0.5),
strip.text.x = element_text(size = 11),
strip.background = element_rect(fill="darkorchid4"))
Results:
As shown in the figure, late-year biomass yield for every crop species is greatest in the high-input scenario. This indicates that increased fertilizer and water inputs lead to increased final crop yields. Interestingly, there is not as large of a difference between input levels for projected yields during the first half of the growing season.