Predicting spatial and temporal variability in crop yields: an inter-comparison of machine learning, regression and process-based models

Guoyong Leng1 and James W Hall2

1 Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing, 100101, China

2 Environmental Change Institute, University of Oxford, UK


Previous assessments of crop yield response to climate change have relied mainly on either process-based models or statistical models and have focused on predicting changes in average yields, although there is growing interest in yield variability and extremes. In this study, we simulate US maize yield using process-based models, a traditional regression model and a machine-learning algorithm and, importantly, identify the strengths and weaknesses of each method in simulating the average, variability and extremes of maize yield across the country.

We show that both the regression and machine learning models reproduce the observed pattern of yield averages well, while large biases are found for the process-based crop models even when they are fed with harmonized parameters. As for the probability distribution of yields, machine learning shows the best skill, followed by the regression model and the process-based models. For the country as a whole, machine learning can explain 93% of observed yield variability, followed by the regression model (51%) and the process-based models (42%). Based on the improved capability of the machine learning algorithm, we estimate that US maize yield will decrease by 13.5% under the 2°C global warming scenario (by ~2050s). Yields less than or equal to the 10th percentile of the baseline-period yield distribution are predicted to occur in 19% and 25% of years in the 1.5°C (by ~2040s) and 2°C global warming scenarios, respectively, with potentially significant implications for food supply, prices and trade.
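The two headline metrics above, the share of yield variability explained and the frequency of years at or below the baseline 10th percentile, can be illustrated with a minimal sketch. It is not the authors' code: the function names, the linear-interpolation percentile, and the use of the coefficient of determination as "variability explained" are all assumptions made here for illustration.

```python
def percentile(values, q):
    """Percentile q (0-100) of a non-empty sequence, by linear interpolation.

    Assumption: the paper does not state which percentile definition it uses.
    """
    xs = sorted(values)
    if not xs:
        raise ValueError("values must be non-empty")
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def low_yield_frequency(baseline_yields, scenario_yields, q=10.0):
    """Fraction of scenario years with yield <= the baseline q-th percentile."""
    threshold = percentile(baseline_yields, q)
    low_years = sum(1 for y in scenario_yields if y <= threshold)
    return low_years / len(scenario_yields)


def variance_explained(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot.

    Assumption: used here as a stand-in for "explained yield variability".
    """
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

By construction, applying `low_yield_frequency` to the baseline series itself gives a frequency of roughly 10%, which is the reference point against which the 19% and 25% figures above can be read.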

The machine learning and regression methods are computationally much more efficient than process-based models, which makes probabilistic risk analysis of climate impacts on crop production feasible for a wide range of future scenarios.

Publication details

Leng, G., Hall, J.W. Predicting spatial and temporal variability in crop yields: an inter-comparison of machine learning, regression and process-based models. Environmental Research Letters https://doi.org/10.1088/1748-9326/ab7b24