Factors affecting landslide susceptibility mapping: Assessing the influence of different machine learning approaches, sampling strategies and data splitting

Abraham, MT; Satyam, N; Lokesh, R; Pradhan, B; Alamri, A

Publisher: MDPI AG
Publication Type: Journal Article
Citation: Land, 2021, 10, (9), pp. 1-24
Issue Date: 2021-09-01

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Abraham, MT
dc.contributor.author	Satyam, N
dc.contributor.author	Lokesh, R
dc.contributor.author	Pradhan, B https://orcid.org/0000-0001-9863-2054
dc.contributor.author	Alamri, A
dc.date.accessioned	2022-02-18T03:14:51Z
dc.date.available	2022-02-18T03:14:51Z
dc.date.issued	2021-09-01
dc.identifier.citation	Land, 2021, 10, (9), pp. 1-24
dc.identifier.issn	2073-445X
dc.identifier.uri	http://hdl.handle.net/10453/154683
dc.description.abstract	Data-driven methods are widely used in Landslide Susceptibility Mapping (LSM). The results of these methods are sensitive to several factors, such as the quality of input data, choice of algorithm, sampling strategies, and data splitting ratios. In this study, five different Machine Learning (ML) algorithms are used for LSM for the Wayanad district in Kerala, India, using two different sampling strategies and nine different train-to-test ratios in cross validation. The results show that Random Forest (RF), K Nearest Neighbors (KNN), and Support Vector Machine (SVM) algorithms provide better results than Naïve Bayes (NB) and Logistic Regression (LR) for the study area. NB and LR algorithms are less sensitive to the sampling strategy and data splitting, while the performance of the other three algorithms is considerably influenced by the sampling strategy. The results indicate that both the choice of algorithm and the sampling strategy are critical in obtaining the best-suited landslide susceptibility map for a region. The accuracies of the KNN, RF, and SVM algorithms increased by 10.51%, 10.02%, and 4.98%, respectively, with the use of polygon landslide inventory data, while the performance of the NB and LR algorithms was slightly reduced with the use of polygon data. Thus, the sampling strategy and data splitting ratio are less consequential with NB and LR algorithms, while more data points provide better results for KNN, RF, and SVM algorithms.
dc.language	en
dc.publisher	MDPI AG
dc.relation.ispartof	Land
dc.relation.isbasedon	10.3390/land10090989
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	0502 Environmental Science and Management
dc.title	Factors affecting landslide susceptibility mapping: Assessing the influence of different machine learning approaches, sampling strategies and data splitting
dc.type	Journal Article
utslib.citation.volume	10
utslib.for	0502 Environmental Science and Management
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Civil and Environmental Engineering
pubs.organisational-group	/University of Technology Sydney/Strength - CAMGIS - Centre for Advanced Modelling and Geospatial Information Systems
utslib.copyright.status	open_access	*
pubs.consider-herdc	false
dc.date.updated	2022-02-18T03:14:46Z
pubs.issue	9
pubs.publication-status	Published
pubs.volume	10
utslib.citation.issue	9

Abstract:

Data-driven methods are widely used in Landslide Susceptibility Mapping (LSM). The results of these methods are sensitive to several factors, such as the quality of input data, choice of algorithm, sampling strategies, and data splitting ratios. In this study, five different Machine Learning (ML) algorithms are used for LSM for the Wayanad district in Kerala, India, using two different sampling strategies and nine different train-to-test ratios in cross validation. The results show that Random Forest (RF), K Nearest Neighbors (KNN), and Support Vector Machine (SVM) algorithms provide better results than Naïve Bayes (NB) and Logistic Regression (LR) for the study area. NB and LR algorithms are less sensitive to the sampling strategy and data splitting, while the performance of the other three algorithms is considerably influenced by the sampling strategy. The results indicate that both the choice of algorithm and the sampling strategy are critical in obtaining the best-suited landslide susceptibility map for a region. The accuracies of the KNN, RF, and SVM algorithms increased by 10.51%, 10.02%, and 4.98%, respectively, with the use of polygon landslide inventory data, while the performance of the NB and LR algorithms was slightly reduced with the use of polygon data. Thus, the sampling strategy and data splitting ratio are less consequential with NB and LR algorithms, while more data points provide better results for KNN, RF, and SVM algorithms.
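
The comparison described in the abstract (five classifiers, each evaluated at nine train-to-test ratios in cross validation) can be illustrated with a minimal sketch. This is not the authors' code. It assumes scikit-learn, uses synthetic data in place of the Wayanad landslide inventory, and assumes that the nine ratios come from k-fold cross validation with k = 2 to 10 (train-to-test ratios 1:1 to 9:1); the abstract does not state the ratios or any hyperparameters, so all of those choices are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for landslide / non-landslide samples (NOT the study's data).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, random_state=0
)

# Assumption: k-fold CV with k = 2..10 yields nine train-to-test ratios (1:1 .. 9:1).
K_VALUES = list(range(2, 11))


def make_models():
    """Return fresh instances of the five algorithm families named in the abstract.

    Hyperparameters are illustrative defaults, not the values used in the paper.
    """
    return {
        "RF": RandomForestClassifier(n_estimators=25, random_state=0),
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "SVM": make_pipeline(StandardScaler(), SVC()),
        "NB": GaussianNB(),
        "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    }


def evaluate(X, y):
    """Mean cross-validated accuracy for every (algorithm, k) combination."""
    results = {}
    for k in K_VALUES:
        cv = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
        for name, model in make_models().items():
            scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
            results[(name, k)] = scores.mean()
    return results


results = evaluate(X, y)
for name in ["RF", "KNN", "SVM", "NB", "LR"]:
    row = " ".join(f"{results[(name, k)]:.2f}" for k in K_VALUES)
    print(f"{name:>3}: {row}")
```

To mimic the two sampling strategies, the same `evaluate` function could be run once on samples drawn from point inventory data and once on samples drawn from polygon inventory data, and the resulting accuracies compared per algorithm.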

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/154683