Developing machine learning models with multi-source environmental data to predict wheat yield in China

Publisher: ELSEVIER SCI LTD
Publication Type: Journal Article
Citation: Computers and Electronics in Agriculture, 2022, 194
Issue Date: 2022-03-01
Abstract:
Crop yield is controlled by different environmental factors. Multi-source data for site-specific soils, climates, and remotely sensed vegetation indices are essential for yield prediction. Data-model fusion algorithms for crop growth monitoring and yield prediction are complex and need to be optimized to handle model uncertainty. This study integrated multi-source environmental variables (e.g., satellite-based vegetation indices, climate data, and soil properties) into random forest (RF) and support vector machine (SVM) models for wheat yield prediction in China. The performance of both RF and SVM models was investigated using different types of vegetation indices combined with other predictors. Relative importance and partial dependence analyses were used to identify the main predictors and their relationships with wheat yield. We found that using remotely sensed vegetation indices improved model precision, and that near-infrared reflectance of terrestrial vegetation (NIRv) was slightly better than the normalized difference vegetation index (NDVI) and the enhanced vegetation index (EVI) in predicting yield. NIRv was better at detecting climate stress on crops and could capture more information regarding crop growth and yield formation. Compared with the SVM model, the RF model with NIRv and other covariates performed better in wheat yield prediction, with an R² of 0.74 and an RMSE of 758 kg/ha. We also found that NIRv from jointing to heading was the most important predictor in determining yield, followed by solar radiation (especially during tillering–heading), relative humidity (during planting–tillering), soil organic carbon, and wind speed (throughout the growing season). In addition, wheat yield exhibited threshold-like responses to most factors based on our RF model. These threshold values can help to better understand how different environmental factors limit wheat yield, which will provide useful information for climate-adaptive crop management. Our findings demonstrated the potential of using NIRv for yield prediction. This approach is broadly applicable to other regions globally using publicly available data.
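
For readers unfamiliar with the three vegetation indices compared in the abstract, the following is a minimal sketch of how they are commonly computed from surface reflectance. The abstract does not state the formulas used in the study, so everything here is an assumption: the function names are illustrative, the EVI coefficients are the standard MODIS ones, and NIRv is taken as NDVI multiplied by near-infrared reflectance. Some formulations subtract a small constant from NDVI first, exposed here as the hypothetical `ndvi_offset` parameter.

```python
# Illustrative sketch only: not the authors' code. Inputs are surface
# reflectances in the range 0..1 for the near-infrared, red, and blue bands.


def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index."""
    # Raises ZeroDivisionError if both reflectances are zero.
    return (nir - red) / (nir + red)


def evi(nir: float, red: float, blue: float) -> float:
    """Enhanced vegetation index with the standard MODIS coefficients
    (gain 2.5, aerosol terms 6 and 7.5, canopy background 1)."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def nirv(nir: float, red: float, ndvi_offset: float = 0.0) -> float:
    """Near-infrared reflectance of vegetation: NDVI scaled by NIR.

    ndvi_offset is a hypothetical parameter for formulations that subtract
    a small constant from NDVI; the default of 0.0 gives NDVI * NIR.
    """
    return (ndvi(nir, red) - ndvi_offset) * nir


if __name__ == "__main__":
    # Made-up reflectances for a dense green canopy, not data from the study.
    nir, red, blue = 0.45, 0.05, 0.03
    print(f"NDVI = {ndvi(nir, red):.3f}")
    print(f"EVI  = {evi(nir, red, blue):.3f}")
    print(f"NIRv = {nirv(nir, red):.3f}")
```

Because NIRv multiplies NDVI by the near-infrared reflectance itself, it keeps varying over dense canopies where NDVI approaches its upper limit, which is one commonly cited reason it can carry more information about crop growth.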