Self-Learning Random Forests Model for Mapping Groundwater Yield in Data-Scarce Areas

Publication Type: Journal Article
Natural Resources Research, 2019, 28 (3), pp. 757-775
File: Sameen2018_Article_Self-LearningRandomForestsMode.pdf (Accepted Manuscript Version, Adobe PDF, 14.46 MB)
© 2018, International Association for Mathematical Geosciences.

Globally, groundwater plays a major role in supplying drinking water to urban and rural populations, and it is also used for crop irrigation and in many industrial processes. A novel self-learning random forest (SLRF) model is developed and validated for groundwater yield zonation in Yeondong Province, South Korea. The study used a groundwater inventory dataset, randomly divided into 70% for training and 30% for testing, together with 13 groundwater-conditioning factors. SLRF was optimized using Bayesian optimization. We also compared our method with other machine learning methods, including support vector machine (SVM), artificial neural network (ANN), decision tree (DT), and voting ensemble models. Model validation used several methods, including a confusion matrix, receiver operating characteristic (ROC) analysis, cross-validation, and McNemar's test. The proposed self-learning method improves random forest (RF) generalization performance by about 23%, with an SLRF success rate of 0.76 and a prediction rate of 0.83. In addition, the optimized SLRF achieved a threefold cross-validated AUC (area under the curve) of 0.75, compared with 0.57 for SLRF with randomly initialized parameters. On the testing dataset, SLRF outperformed all other models (RF, SVM, ANN, DT, and the voted ANN-RF ensemble) in overall accuracy, prediction rate, and cross-validated AUC. SLRF also estimated the contribution of individual groundwater-conditioning factors, showing that the three most influential were geology (1.00), profile curvature (0.97), and topographic wetness index (TWI, 0.95). Overall, SLRF effectively modeled groundwater potential, even in data-scarce regions.
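McNemar's test is one of the validation methods listed above for comparing two classifiers evaluated on the same test samples. The following is a minimal sketch, assuming the common continuity-corrected chi-squared form of the test; the variant used in the paper is not stated here. The function name `mcnemar_test` and the example labels (1 = high groundwater yield, 0 = low) are hypothetical, for illustration only, and are not taken from the study data.

```python
import math
from typing import Sequence, Tuple


def mcnemar_test(
    y_true: Sequence[int],
    pred_a: Sequence[int],
    pred_b: Sequence[int],
) -> Tuple[int, int, float, float]:
    """Hypothetical helper: compare two classifiers on the same test samples.

    Returns (b, c, chi2, p_value), where
      b = samples model A classifies correctly and model B misclassifies,
      c = samples model A misclassifies and model B classifies correctly.
    Assumes the continuity-corrected statistic (|b - c| - 1)^2 / (b + c),
    which is approximately chi-squared with 1 degree of freedom.
    """
    if not (len(y_true) == len(pred_a) == len(pred_b)):
        raise ValueError("all inputs must have the same length")

    b = c = 0
    for truth, p_a, p_b in zip(y_true, pred_a, pred_b):
        a_ok = p_a == truth
        b_ok = p_b == truth
        if a_ok and not b_ok:
            b += 1
        elif b_ok and not a_ok:
            c += 1

    if b + c == 0:
        # The models are never split on correctness, so there is no evidence of a difference.
        return b, c, 0.0, 1.0

    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-squared with 1 dof: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return b, c, chi2, p_value


# Made-up test labels and predictions from two hypothetical models.
y_true = [1] * 20
pred_a = [1] * 15 + [0] * 5            # model A correct on samples 0-14
pred_b = [1] * 8 + [0] * 10 + [1] * 2  # model B correct on samples 0-7 and 18-19

b, c, chi2, p = mcnemar_test(y_true, pred_a, pred_b)
print(f"b={b}, c={c}, chi2={chi2:.3f}, p={p:.3f}")
```

With these made-up predictions, b = 7 and c = 2, giving chi2 of about 1.78 and p of about 0.18, so the difference between the two hypothetical models would not be significant at the 0.05 level (critical value 3.84 for 1 degree of freedom). Only the samples where the models disagree on correctness enter the statistic, which is why the test suits paired comparisons on a shared test set.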