Exploring potential machine learning application based on big data for prediction of wastewater quality from different full-scale wastewater treatment plants.

Publisher:
Elsevier BV
Publication Type:
Journal Article
Citation:
Sci Total Environ, 2022, 832, pp. 154930
Issue Date:
2022-08-01
Filename:
Exploring potential machine learning application.pdf
Description:
Accepted version
Size:
2.08 MB (Adobe PDF)
Water pollution from intensive anthropogenic activities has emerged as a critical threat to ecosystem balance and livelihoods worldwide. Although optimizing wastewater treatment efficiency is widely regarded as the foremost step toward minimizing the pollutants released into the environment, it faces two major obstacles: first, the significant variation in influent wastewater constituents; second, the complexity of the treatment processes within wastewater treatment plants (WWTPs). Using data collected hourly by real-time sensors in three full-scale WWTPs (24 h × 365 days × 3 WWTPs × 10 wastewater parameters), this work explored the potential of Machine Learning (ML) to predict wastewater quality. Six ML algorithms were examined and compared, ranging from shallow to deep learning architectures: Seasonal Autoregressive Integrated Moving Average with eXogenous regressors (SARIMAX), Random Forest (RF), Support Vector Machine (SVM), Gradient Tree Boosting (GTB), Adaptive Neuro-Fuzzy Inference System (ANFIS), and Long Short-Term Memory (LSTM). These models were developed to predict total phosphorus in the outlet (Outlet-TP), chosen as the output variable because of rising concerns about eutrophication. Irrespective of the WWTP, SARIMAX consistently delivered the best regression performance, as evidenced by the lowest Mean Square Error (MSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE) and the highest coefficient of determination (R²). SARIMAX also required an acceptable computation time, supporting its successful application to Outlet-TP modeling. In contrast, the complex structure of LSTM made it time-consuming and unstable in the presence of noise, while the shallower architectures (RF, SVM, GTB, and ANFIS) were unable to handle large datasets with nonlinear and nonstationary behavior.
Consequently, this study provides a reliable and accurate approach to forecast wastewater effluent quality, which is pivotal in terms of the socio-economic aspects of wastewater management.
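
The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions of MSE, MAE, MAPE, and R², which are assumed here rather than taken from the paper. The function name `regression_metrics` and the sample Outlet-TP values are hypothetical, chosen only for illustration, and are not results from the study.

```python
from typing import Dict, Sequence

# Nominal dataset size implied by the abstract:
# 24 h x 365 days x 3 WWTPs x 10 wastewater parameters.
NOMINAL_READINGS = 24 * 365 * 3 * 10


def regression_metrics(y_true: Sequence[float],
                       y_pred: Sequence[float]) -> Dict[str, float]:
    """Return MSE, MAE, MAPE (percent), and R^2 for paired values.

    Assumes no observed value is zero (MAPE is undefined otherwise)
    and that the observed values are not all identical (R^2 is
    undefined when the total sum of squares is zero).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Sum of squared residuals, shared by MSE and R^2.
    ss_res = sum(e * e for e in errors)
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    # Total sum of squares around the mean of the observations.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    return {"mse": mse, "mae": mae, "mape": mape, "r2": r2}


# Hypothetical Outlet-TP values (mg/L), for illustration only.
observed = [0.20, 0.25, 0.30, 0.35]
predicted = [0.22, 0.24, 0.33, 0.33]
metrics = regression_metrics(observed, predicted)
```

Lower MSE, MAE, and MAPE and a higher R² indicate a closer fit, which is the sense in which the abstract ranks the six models.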