Flood Early Warning Systems Using Machine Learning Techniques: The Case of the Tomebamba Catchment at the Southern Andes of Ecuador

Hydrology ◽  
2021 ◽  
Vol 8 (4) ◽  
pp. 183
Author(s):  
Paul Muñoz ◽  
Johanna Orellana-Alvear ◽  
Jörg Bendix ◽  
Jan Feyen ◽  
Rolando Célleri

Worldwide, machine learning (ML) is increasingly being used for developing flood early warning systems (FEWSs). However, previous studies have not focused on establishing a methodology for determining the most efficient ML technique. We assessed FEWSs with three river states, No-alert, Pre-alert, and Alert for flooding, for lead times between 1 and 12 h using the most common ML techniques: multi-layer perceptron (MLP), logistic regression (LR), K-nearest neighbors (KNN), naive Bayes (NB), and random forest (RF). The Tomebamba catchment in the tropical Andes of Ecuador was selected as a case study. For all lead times, MLP models achieve the highest performance, followed by LR, with f1-macro (log-loss) scores of 0.82 (0.09) and 0.46 (0.20) for the 1 h and 12 h cases, respectively. The ranking was highly variable for the remaining ML techniques. According to the g-mean, LR models forecast correctly and show more stability across all states, while the MLP models perform better in the Pre-alert and Alert states. The proposed methodology for selecting the optimal ML technique for a FEWS can be extrapolated to other case studies. Future efforts are recommended to enhance the input data representation and to develop communication applications that boost society's awareness of floods.
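The comparison the abstract describes, fitting the five classifiers and ranking them by f1-macro and log-loss, can be sketched with scikit-learn. The data below are synthetic stand-ins, not the Tomebamba records; the class weights only mimic the imbalance among No-alert, Pre-alert, and Alert, and the hyperparameters are illustrative defaults, not the authors' configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, log_loss

# Synthetic 3-class stand-in for No-alert / Pre-alert / Alert (imbalanced)
X, y = make_classification(n_samples=600, n_features=8, n_informative=5,
                           n_classes=3, weights=[0.7, 0.2, 0.1], random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "MLP": MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
    "LR": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "NB": GaussianNB(),
    "RF": RandomForestClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(Xtr, ytr)
    f1 = f1_score(yte, model.predict(Xte), average="macro")  # higher is better
    ll = log_loss(yte, model.predict_proba(Xte))             # lower is better
    print(f"{name}: f1-macro={f1:.2f}, log-loss={ll:.2f}")
```

The f1-macro average weights every class equally, which is why it suits the imbalanced warning classes better than plain accuracy.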



2021 ◽  
Author(s):  
Paul Muñoz ◽  
Johanna Orellana-Alvear ◽  
Jörg Bendix ◽  
Rolando Célleri

Abstract: Short-rain floods, especially flash floods, produce devastating impacts on society, the economy, and ecosystems. A key countermeasure is to develop Flood Early Warning Systems (FEWSs) aimed at forecasting flood warnings with sufficient lead time for decision making. Although Machine Learning (ML) techniques have gained popularity among hydrologists, a research question that remains poorly answered is which ML technique is best for flood forecasting. To answer this, we compare the efficiencies of FEWSs developed with the five most common ML techniques for flood forecasting, for lead times between 1 and 12 hours. We use the Tomebamba catchment in the Ecuadorean Andes as a case study, with three warning classes to forecast: No-alert, Pre-alert, and Alert of floods. For all lead times, the Multi-Layer Perceptron (MLP) technique achieves the highest model performances (f1-macro score), followed by Logistic Regression (LR), from 0.82 (1-hour) to 0.46 (12-hour). This ranking was confirmed by the log-loss scores, ranging from 0.09 (1-hour) to 0.20 (12-hour) for the above-mentioned methods. Model performances decreased for the remaining ML techniques (K-Nearest Neighbors, Naive Bayes, and Random Forest), but their ranking was highly variable and not conclusive. Moreover, according to the g-mean, LR models show greater stability in correctly classifying all flood classes, whereas MLP models specialize in the minority (Pre-alert and Alert) classes. To improve the performance and applicability of FEWSs, we recommend that future efforts enhance input data representation and develop communication applications between FEWSs and the public as tools to boost society's preparedness against floods.
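The g-mean used above to compare class-wise stability is the geometric mean of per-class recalls, so a single poorly recalled class drags the whole score down. A small self-contained illustration (the labels are invented, with the same kind of imbalance as the three warning classes):

```python
import numpy as np

def g_mean(y_true, y_pred, classes):
    """Geometric mean of per-class recalls (high only if every class is recalled well)."""
    recalls = [(y_pred[y_true == c] == c).mean() for c in classes]
    return float(np.prod(recalls) ** (1.0 / len(recalls)))

# Imbalanced toy labels: 8 No-alert (0), 3 Pre-alert (1), 2 Alert (2)
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2])
print(g_mean(y_true, y_pred, [0, 1, 2]))  # per-class recalls: 7/8, 2/3, 2/2
```

A classifier that ignores the minority Alert class entirely would score a g-mean of 0, no matter how good its overall accuracy is.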


2020 ◽  
Author(s):  
Paul Munoz ◽  
Johanna Orellana-Alvear ◽  
Jörg Bendix ◽  
Rolando Célleri

Flood Early Warning Systems have globally become an effective tool to mitigate the adverse effects of this natural hazard on society, the economy, and the environment. A novel approach for such systems is to actually forecast flood events rather than merely monitor the catchment hydrograph evolution on its way to an inundation site. A wide variety of modelling approaches, from fully physical to data-driven, have been developed depending on the availability of information describing intrinsic catchment characteristics. During the last decades, however, the use of Machine Learning techniques has remarkably gained popularity thanks to its power to forecast floods at minimal data and computational cost. Here, we selected the algorithms most commonly employed for flood prediction (K-Nearest Neighbors, Logistic Regression, Random Forest, Naïve Bayes, and Neural Networks) and used them in a precipitation-runoff classification problem aimed at forecasting the inundation state of a river at a decisive control station. These states are “No-alert”, “Pre-alert”, and “Alert” of inundation, with lead times of 1, 4, 8, and 12 hours. The study site is a 300 km² catchment in the tropical Andes draining to Cuenca, the third most populated city of Ecuador. Cuenca is susceptible to annual floods, and thus the generated alerts will be used by local authorities to inform the population of upcoming flood risks. For an integral comparison between forecasting models, we propose a scheme relying on the F1-score, the Geometric mean, and the Log-loss score to account for the resulting data imbalance and the multiclass classification problem. Furthermore, we used the Chi-Squared test to ensure that differences in model results were due to the algorithm applied and not to statistical chance.
We reveal that the most effective model according to the F1-score uses the Neural Networks technique (0.78, 0.62, 0.51, and 0.46 for the test subsets of the 1-, 4-, 8-, and 12-hour forecasting scenarios, respectively), followed by the Logistic Regression algorithm. For the remaining algorithms, we found F1-score differences between the best and the worst model inversely proportional to the lead time (i.e., differences between models were more pronounced for shorter lead times). Moreover, the Geometric mean and the Log-loss score showed similar patterns of degradation of the forecast ability with lead time for all algorithms. The overall higher scores found for the Neural Networks technique suggest this algorithm as the engine for the best forecasting Early Warning Systems of the city. For future research, we recommend further analyses of the effect of input data composition and of the architecture of the algorithm for full exploitation of its capacity, which would improve model performance and extend the lead time. The usability and effectiveness of the developed systems will depend, however, on the speed of communication to the public after an inundation signal is indicated. We suggest complementing our systems with a website and/or mobile application as tools to boost preparedness against floods for both decision makers and the public.

Keywords: Flood; forecasting; Early Warning; Machine Learning; Tropical Andes; Ecuador.
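The Chi-Squared check described above can be sketched with SciPy on a 2×2 table of correct vs. incorrect forecasts for two models. The counts below are hypothetical, not from the study, and the exact test setup the authors used may differ.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical correct/incorrect counts for two models on the same 400-sample test set
table = np.array([[310, 90],    # model A: correct, incorrect
                  [260, 140]])  # model B: correct, incorrect
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p:.4f}")  # small p: the gap is unlikely to be chance
```

A large p-value would instead mean the observed performance gap between the two algorithms could plausibly arise from sampling noise alone.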


2020 ◽  
Vol 122 (14) ◽  
pp. 1-30
Author(s):  
James Soland ◽  
Benjamin Domingue ◽  
David Lang

Background/Context: Early warning indicators (EWI) are often used by states and districts to identify students who are not on track to finish high school and to provide supports/interventions that increase the odds the student will graduate. While EWI are diverse in terms of the academic behaviors they capture, research suggests that indicators like course failures, chronic absenteeism, and suspensions can help identify students in need of additional supports. In parallel with the expansion of administrative data that have made early versions of EWI possible, new machine learning methods have been developed. These methods are data-driven and often designed to sift through thousands of variables with the purpose of identifying the best predictors of a given outcome. While applications of machine learning techniques to identify students at risk of high school dropout have obvious appeal, few studies consider the benefits and limitations of applying those models in an EWI context, especially as they relate to questions of fairness and equity.

Focus of Study: In this study, we provide applied examples of how machine learning can be used to support EWI selection. The purpose is to articulate the broad risks and benefits of using machine learning methods to identify students who may be at risk of dropping out. We focus on dropping out given its salience in the EWI literature, but also anticipate generating insights germane to EWI used for a variety of outcomes.

Research Design: We explore these issues using several hypothetical examples of how ML techniques might be used to identify EWI. For example, we show results from decision tree algorithms used to identify predictors of dropout with simulated data.

Conclusions/Recommendations: Generally, we argue that machine learning techniques have several potential benefits in the EWI context. For example, some related methods can help create clear decision rules for which students are a dropout risk, and their predictive accuracy can be higher than for more traditional, regression-based models. At the same time, these methods often require additional statistical and data management expertise to be used appropriately. Further, the black-box nature of machine learning algorithms could invite their users to interpret results through the lens of preexisting biases about students and educational settings.
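A decision tree on simulated data, of the kind the study describes, might look like the following sketch. The indicator names, effect sizes, and data-generating process are all invented for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 2000
# Simulated EWI-style indicators (names and effects are illustrative)
failures = rng.poisson(1.0, n)      # course failures
absences = rng.poisson(5.0, n)      # days absent
suspensions = rng.poisson(0.3, n)
noise = rng.normal(size=n)          # irrelevant variable

# Simulated dropout risk driven mainly by failures and absenteeism
logit = -3.0 + 0.9 * failures + 0.15 * absences + 0.4 * suspensions
dropout = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

X = np.column_stack([failures, absences, suspensions, noise])
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, dropout)
importances = dict(zip(["failures", "absences", "suspensions", "noise"],
                       tree.feature_importances_))
print(importances)
```

Capping the depth keeps the tree readable as a set of decision rules, which is exactly the transparency argument made for EWI; deeper trees predict better but drift toward the black-box behavior the conclusions warn about.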


Landslides ◽  
2020 ◽  
Vol 17 (9) ◽  
pp. 2231-2246
Author(s):  
Hemalatha Thirugnanam ◽  
Maneesha Vinodini Ramesh ◽  
Venkat P. Rangan

2018 ◽  
Vol 3 ◽  
Author(s):  
Andreas Baumann

Machine learning is a powerful method when working with large data sets such as diachronic corpora. However, as opposed to standard techniques from inferential statistics like regression modeling, machine learning is less commonly used among phonological corpus linguists. This paper discusses three different machine learning techniques (K nearest neighbors classifiers; Naïve Bayes classifiers; artificial neural networks) and how they can be applied to diachronic corpus data to address specific phonological questions. To illustrate the methodology, I investigate Middle English schwa deletion and when and how it potentially triggered reduction of final /mb/ clusters in English.
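Of the three techniques the paper discusses, a Naïve Bayes classifier over categorical phonological features is the simplest to sketch. The feature coding and labels below are invented for illustration, not drawn from the Middle English corpus.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Toy integer-coded features: [preceding-segment class, syllable count, stress position]
X = np.array([[0, 1, 0], [1, 2, 1], [0, 2, 0], [2, 1, 0],
              [1, 1, 1], [2, 2, 1], [0, 1, 1], [1, 2, 0]])
y = np.array([1, 0, 1, 1, 0, 0, 1, 0])  # 1 = schwa deleted (hypothetical)

clf = CategoricalNB().fit(X, y)
probs = clf.predict_proba(X)  # per-token deletion probabilities
print(probs.round(2))
```

The per-token probabilities are what make such classifiers attractive for diachronic data: gradual change shows up as shifting probabilities rather than a hard category switch.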


Advancement in medical science has always been one of the most vital aspects of the human race. With the progress in technology, modern techniques and equipment are increasingly applied for treatment purposes. Nowadays, machine learning techniques are widely used in medical science to improve accuracy. In this work, we constructed computational models for accurate liver disease prediction. We used several efficient classification algorithms: Random Forest, Perceptron, Decision Tree, K-Nearest Neighbors (KNN), and Support Vector Machine (SVM) for predicting liver diseases. Our work implements hybrid model construction and provides a comparative analysis for improving prediction performance. First, the classification algorithms are applied to the original liver patient datasets collected from the UCI repository. Then we analyzed and tuned the features to improve the performance of our predictor and made a comparative analysis among the classifiers. We found that the KNN algorithm, with feature selection, outperformed all other techniques.
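The "KNN with feature selection" comparison can be sketched as follows. The data are a synthetic stand-in for the UCI liver records, and the selector (`SelectKBest` with an ANOVA F-test) is one plausible choice, not necessarily the one the authors used.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Synthetic binary "disease / no disease" data with a few informative features
X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           random_state=1)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=1)

acc_all = accuracy_score(yte, KNeighborsClassifier().fit(Xtr, ytr).predict(Xte))

sel = SelectKBest(f_classif, k=4).fit(Xtr, ytr)   # keep the 4 strongest features
knn = KNeighborsClassifier().fit(sel.transform(Xtr), ytr)
acc_sel = accuracy_score(yte, knn.predict(sel.transform(Xte)))
print(f"all features: {acc_all:.2f}, selected: {acc_sel:.2f}")
```

Distance-based methods like KNN are particularly sensitive to uninformative features, which is one reason feature selection tends to help them more than it helps tree ensembles.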


2021 ◽  
pp. 1-29
Author(s):  
Ahmed Alsaihati ◽  
Mahmoud Abughaban ◽  
Salaheldin Elkatatny ◽  
Abdulazeez Abdulraheem

Abstract: Fluid loss into formations is a common operational issue that is frequently encountered when drilling across naturally or induced fractured formations. It can pose significant operational risks, such as well-control incidents, stuck pipe, and wellbore instability, which in turn increase well time and cost. This research aims to use and evaluate different machine learning techniques, namely support vector machines, random forests, and K-nearest neighbors, in detecting loss circulation occurrences while drilling using solely surface drilling parameters. Actual field data of seven wells, which had suffered partial or severe loss circulation, were used to build predictive models, while Well-8 was used to compare the performance of the developed models. Different performance metrics were used to evaluate the developed models: recall, precision, and F1-score measures assessed their ability to detect loss circulation occurrences. The results showed that the K-nearest neighbors classifier achieved a high F1-score of 0.912 in detecting loss circulation occurrence in the testing set, while random forest was the second-best classifier with almost the same F1-score of 0.910. The support vector machines achieved an F1-score of 0.83 in predicting loss circulation occurrence in the testing set. The K-nearest neighbors model outperformed the other models in detecting the loss circulation occurrences in Well-8, with an F1-score of 0.80. The main contribution of this research, as compared to previous studies, is that it identifies loss events based on real-time measurements of the active pit volume.
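The held-out-well evaluation described above can be sketched as below: train on wells 1–7, score on Well-8, and report precision, recall, and F1. The features, the link to pit volume, and the event rates are all synthetic assumptions, not the field data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score, f1_score

rng = np.random.default_rng(2)
n = 1600
well = rng.integers(1, 9, n)            # which of 8 wells each sample belongs to
X = rng.normal(size=(n, 5))             # stand-in surface drilling parameters
# Loss events loosely tied to a drop in feature 0 (a stand-in for pit volume)
y = (X[:, 0] + rng.normal(scale=0.5, size=n) < -1.0).astype(int)

train, test = well != 8, well == 8      # leave Well-8 out, as in the study design
clf = RandomForestClassifier(random_state=0).fit(X[train], y[train])
pred = clf.predict(X[test])
p = precision_score(y[test], pred, zero_division=0)
r = recall_score(y[test], pred, zero_division=0)
f1 = f1_score(y[test], pred, zero_division=0)
print(f"precision={p:.2f}, recall={r:.2f}, F1={f1:.2f}")
```

Holding out an entire well, rather than a random sample, is the stricter test: it measures whether the model generalizes to a well it has never seen, not just to unseen time steps of familiar wells.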


2017 ◽  
Vol 17 (3) ◽  
pp. 423-437 ◽  
Author(s):  
Paul J. Smith ◽  
Sarah Brown ◽  
Sumit Dugar

Abstract. This paper focuses on the use of community-based early warning systems for flood resilience in Nepal. The first part of the work outlines the evolution and current status of these community-based systems, highlighting the limited lead times currently available for early warning. The second part of the paper focuses on the development of a robust operational flood forecasting methodology for use by the Nepal Department of Hydrology and Meteorology (DHM) to enhance early warning lead times. The methodology uses data-based physically interpretable time series models and data assimilation to generate probabilistic forecasts, which are presented in a simple visual tool. The approach is designed to work in situations of limited data availability, with an emphasis on sustainability and appropriate technology. The successful application of the forecast methodology to the flood-prone Karnali River basin in western Nepal is outlined, increasing lead times from 2–3 to 7–8 h. The challenges faced in communicating probabilistic forecasts to the last mile of the existing community-based early warning systems across Nepal are discussed. The paper concludes with an assessment of the applicability of this approach in basins and countries beyond Karnali and Nepal and an overview of key lessons learnt from this initiative.
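The data-based probabilistic forecasting idea can be illustrated with a bare-bones AR(1) model whose uncertainty band widens with lead time. The series below is synthetic, and the operational DHM system uses richer, physically interpretable models with data assimilation; this is only a sketch of the forecast-plus-interval concept.

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic hourly water-level anomaly series (not Karnali data)
n = 500
level = np.zeros(n)
for t in range(1, n):
    level[t] = 0.9 * level[t - 1] + rng.normal(scale=0.2)

# Fit AR(1) by least squares: level[t] ~ a * level[t-1]
x, y = level[:-1], level[1:]
a = float(x @ y / (x @ x))
sigma = float((y - a * x).std())

# 7-step-ahead mean and ~95% interval (noise variance accumulates with horizon)
h = 7
mean = level[-1] * a ** h
var = sigma ** 2 * sum(a ** (2 * k) for k in range(h))
lo, hi = mean - 1.96 * var ** 0.5, mean + 1.96 * var ** 0.5
print(f"{h}-h forecast: {mean:.2f} [{lo:.2f}, {hi:.2f}]")
```

The widening interval is the communication challenge the paper raises: a 7–8 h forecast is necessarily less certain than a 2–3 h one, and that uncertainty has to reach the last mile intact.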


2021 ◽  
Author(s):  
Moohanad Jawthari ◽  
Veronika Stoffová

Abstract: The target (dependent) variable is often influenced not only by ratio-scale variables but also by qualitative (nominal-scale) variables in classification analysis. The majority of machine learning techniques accept only numerical inputs; hence, it is necessary to encode categorical variables into numerical values using encoding techniques. If a variable has no relation or order between its values, assigning numbers will mislead the machine learning techniques. This paper presents a modified k-nearest-neighbors algorithm that calculates the distance values of categorical (nominal) variables without encoding them. A student academic performance dataset is used for testing the enhanced algorithm, which outperforms the standard algorithm that requires nominal variables to be encoded before distances can be calculated. The results show the proposed algorithm performs 14% better than the standard one in accuracy, and it is not sensitive to outliers.
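The core idea, measuring distance on nominal features directly by mismatch instead of encoding them, can be sketched in a few lines. The feature names and data are invented for illustration, not the student-performance dataset used in the paper, and the paper's exact distance function may differ.

```python
import numpy as np
from collections import Counter

def mixed_distance(a, b, nominal_idx):
    # Nominal features add a 0/1 mismatch; numeric features add |difference|
    return sum((x != y) if i in nominal_idx else abs(float(x) - float(y))
               for i, (x, y) in enumerate(zip(a, b)))

def knn_predict(X, y, query, k, nominal_idx):
    dists = [mixed_distance(row, query, nominal_idx) for row in X]
    nearest = np.argsort(dists)[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]

# Hypothetical rows: (home area [nominal], normalized prior grade [numeric])
X = [("urban", 0.9), ("urban", 0.8), ("rural", 0.2), ("rural", 0.1)]
y = ["pass", "pass", "fail", "fail"]
print(knn_predict(X, y, ("urban", 0.85), k=3, nominal_idx={0}))  # -> pass
```

Because a nominal mismatch always contributes exactly 1, no arbitrary ordering is imposed on the category values, which is the failure mode of naive integer encoding that the abstract warns about.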

