Water Hazard Prediction using Machine Learning

Water is essential to all forms of life, yet it can also create hazards in the form of floods and droughts. Catastrophic events such as floods are thought to be caused by extreme weather conditions as well as by changes in global and regional climate. If precautions are not taken beforehand, such events become increasingly difficult to control once they occur. This study aimed to forecast both flood and drought using Machine Learning (ML). A clear and precise forecast of flood and drought hazard requires a specific, multivariate analysis across the various kinds of data sets. Multivariate analysis refers to statistical methods that analyze multiple variables simultaneously. Within multivariate analysis, ML can offer increasing levels of accuracy, precision, and efficiency by finding patterns in large and varied data sets. In essence, ML methods automatically acquire knowledge from a data set. This is done through the process of learning, by which the algorithm can generalize beyond the examples given in the training data. ML is attractive for forecasting because it adapts its solution strategy to the features of the data set. This property can be used to predict extremes from highly variable data, as in the case of hazards. This paper proposes methods and a case study on the application of ML algorithms to forecasting the occurrence of water hazards. In particular, the study focuses on the use of Support Vector Machines and Artificial Neural Networks on a multivariate set of data related to the water level of lakes in and around Chennai and to rainfall measurements at those lakes.

A Review of Machine Learning Techniques for Anomaly Detection in Static Graphs

Implementing Computational Intelligence Techniques for Security Systems Design - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-7998-2418-3.ch007 ◽

2020 ◽

pp. 146-162

Author(s):

Hesham M. Al-Ammal

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Anomaly Detection ◽

Real Life ◽

Machine Learning Techniques ◽

Support Vector ◽

Learning Methods ◽

Data Set ◽

Learning Techniques ◽

Vector Machines

Detecting anomalies in a given data set is a vital step in several cybersecurity applications, including intrusion detection, fraud detection, and social network analysis. Many of these techniques detect anomalies by examining graph-based data. Analyzing graphs makes it possible to capture relationships and communities as well as anomalies. The advantage of using graphs is that many real-life situations can be easily modeled by a graph that captures their structure and interdependencies. Although anomaly detection in graphs dates back to the 1990s, recent research has applied machine learning methods to anomaly detection over graphs. This chapter concentrates on static graphs (both labeled and unlabeled) and summarizes some of these recent studies in machine learning for anomaly detection in graphs. The methods covered include support vector machines, neural networks, generative neural networks, and deep learning methods. The chapter reflects on the successes and challenges of using these methods in the context of graph-based anomaly detection.

Learning to Identify At-Risk Students in Distance Education Using Interaction Counts

Revista de Informática Teórica e Aplicada ◽

10.22456/2175-2745.62211 ◽

2016 ◽

Vol 23 (2) ◽

pp. 124 ◽

Cited By ~ 2

Author(s):

Douglas Detoni ◽

Cristian Cechinel ◽

Ricardo Araujo Matsumura ◽

Daniela Francisco Brauner

Keyword(s):

Machine Learning ◽

At Risk ◽

At Risk Students ◽

Drop Out ◽

Support Vector ◽

Learning Models ◽

Data Set ◽

Student Dropout ◽

Vector Machines ◽

Machine Learning Models

Student dropout is one of the main problems faced by distance learning courses. One of the major challenges for researchers is to develop methods to predict the behavior of students so that teachers and tutors are able to identify at-risk students as early as possible and provide assistance before they drop out or fail their courses. Machine Learning models have been used to predict or classify students in these settings. However, while these models have shown promising results in several settings, they usually attain these results using attributes that are not immediately transferable to other courses or platforms. In this paper, we provide a methodology to classify students using only interaction counts from each student. We evaluate this methodology on a data set from two majors based on the Moodle platform. We run experiments consisting of training and evaluating three machine learning models (Support Vector Machines, Naive Bayes and AdaBoost decision trees) under different scenarios. We provide evidence that patterns from interaction counts can provide useful information for classifying at-risk students. This classification allows the customization of the activities presented to at-risk students (automatically or through tutors) in an attempt to prevent student dropout.

The Application of Machine Learning to a General Risk–Need Assessment Instrument in the Prediction of Criminal Recidivism

Criminal Justice and Behavior ◽

10.1177/0093854820969753 ◽

2020 ◽

pp. 009385482096975

Author(s):

Mehdi Ghasemi ◽

Daniel Anvari ◽

Mahshid Atapour ◽

J. Stephen Wormith ◽

Keira C. Stockdale ◽

...

Keyword(s):

Machine Learning ◽

Predictive Accuracy ◽

Characteristic Curve ◽

Assessment Instrument ◽

Support Vector ◽

Data Set ◽

Applied Machine Learning ◽

Vector Machines ◽

Individual Scores

The Level of Service/Case Management Inventory (LS/CMI) is one of the most frequently used tools to assess criminogenic risk–need in justice-involved individuals. Meta-analytic research demonstrates strong predictive accuracy for various recidivism outcomes. In this exploratory study, we applied machine learning (ML) algorithms (decision trees, random forests, and support vector machines) to a data set with nearly 100,000 LS/CMI administrations to provincial corrections clientele in Ontario, Canada, and approximately 3 years follow-up. The overall accuracies and areas under the receiver operating characteristic curve (AUCs) were comparable, although ML outperformed LS/CMI in terms of predictive accuracy for the middle scores where it is hardest to predict the recidivism outcome. Moreover, ML improved the AUCs for individual scores to near 0.60, from 0.50 for the LS/CMI, indicating that ML also improves the ability to rank individuals according to their probability of recidivating. Potential considerations, applications, and future directions are discussed.
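
The AUC figures quoted above can be read as the probability that a randomly chosen individual who recidivated received a higher score than a randomly chosen individual who did not. The following is a minimal Python sketch of that pairwise interpretation. The function name `auc_from_scores` and the toy scores are illustrative assumptions and are not taken from the study.

```python
def auc_from_scores(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied pairs count as half a correct pair. The double loop is
    O(n_pos * n_neg), which is fine for a small illustration.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores and outcomes (1 = recidivated), for illustration only.
toy_scores = [0.9, 0.8, 0.4, 0.35, 0.1]
toy_labels = [1, 0, 1, 0, 0]
toy_auc = auc_from_scores(toy_scores, toy_labels)
print(toy_auc)
```

Under this definition, individuals who all share the same score form only tied pairs, which yields an AUC of exactly 0.5. That is consistent with the 0.50 baseline the abstract reports for the LS/CMI within individual scores.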

Prediction of Heart Disease using Machine Learning

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1081.0982s1019 ◽

2019 ◽

Vol 8 (2S10) ◽

pp. 474-477

Keyword(s):

Machine Learning ◽

Heart Disease ◽

Support Vector Machines ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Data Set ◽

Vector Machines ◽

Naive Bayes Classification ◽

Naïve Bayes Classification

Machine learning is one of the fastest-growing fields today. Machine learning (ML) and Artificial Neural Networks (ANN) are helpful in the detection and diagnosis of various heart diseases. Naïve Bayes classification is an important classification approach in machine learning. Heart disease comprises a range of disorders affecting the heart. It includes blood vessel problems, irregular heartbeat, weak heart muscles, congenital heart defects, cardiovascular disease, and coronary artery disease. Coronary heart disease is a common type of heart disease. It reduces the blood flow to the heart, which can lead to a heart attack. In this paper, the UCI Machine Learning Repository data set of patients suffering from heart disease is analyzed using Naïve Bayes classification and support vector machines, and the accuracy with which each method classifies these patients is evaluated. The implementation uses the R language.

Social Reminiscence in Older Adults’ Everyday Conversations: Automated Detection Using Natural Language Processing and Machine Learning (Preprint)

10.2196/preprints.19133 ◽

2020 ◽

Author(s):

Andrea Ferrario ◽

Burcu Demiray ◽

Kristina Yordanova ◽

Minxia Luo ◽

Mike Martin

Keyword(s):

Machine Learning ◽

Older Adults ◽

Support Vector Machines ◽

Learning Strategies ◽

Support Vector ◽

Bag Of Words ◽

Word Embeddings ◽

Data Set ◽

Extreme Gradient Boosting ◽

Vector Machines

BACKGROUND: Reminiscence is the act of thinking or talking about personal experiences that occurred in the past. It is a central task of old age that is essential for healthy aging, and it serves multiple functions, such as decision-making and introspection, transmitting life lessons, and bonding with others. The study of social reminiscence behavior in everyday life can be used to generate data and detect reminiscence from general conversations.
OBJECTIVE: The aims of this original paper are to (1) preprocess coded transcripts of conversations in German of older adults with natural language processing (NLP), and (2) implement and evaluate learning strategies using different NLP features and machine learning algorithms to detect reminiscence in a corpus of transcripts.
METHODS: The methods in this study comprise (1) collecting and coding of transcripts of older adults’ conversations in German, (2) preprocessing transcripts to generate NLP features (bag-of-words models, part-of-speech tags, pretrained German word embeddings), and (3) training machine learning models to detect reminiscence using random forests, support vector machines, and adaptive and extreme gradient boosting algorithms. The data set comprises 2214 transcripts, including 109 transcripts with reminiscence. Due to class imbalance in the data, we introduced three learning strategies: (1) class-weighted learning, (2) a meta-classifier consisting of a voting ensemble, and (3) data augmentation with the Synthetic Minority Oversampling Technique (SMOTE) algorithm. For each learning strategy, we performed cross-validation on a random sample of the training data set of transcripts. We computed the area under the curve (AUC), the average precision (AP), precision, recall, as well as F1 score and specificity measures on the test data, for all combinations of NLP features, algorithms, and learning strategies.
RESULTS: Class-weighted support vector machines on bag-of-words features outperformed all other classifiers (AUC=0.91, AP=0.56, precision=0.5, recall=0.45, F1=0.48, specificity=0.98), followed by support vector machines on SMOTE-augmented data and word embeddings features (AUC=0.89, AP=0.54, precision=0.35, recall=0.59, F1=0.44, specificity=0.94). For the meta-classifier strategy, adaptive and extreme gradient boosting algorithms trained on word embeddings and bag-of-words outperformed all other classifiers and NLP features; however, the performance of the meta-classifier learning strategy was lower compared to other strategies, with highly imbalanced precision-recall trade-offs.
CONCLUSIONS: This study provides evidence of the applicability of NLP and machine learning pipelines for the automated detection of reminiscence in older adults’ everyday conversations in German. The methods and findings of this study could be relevant for designing unobtrusive computer systems for the real-time detection of social reminiscence in the everyday life of older adults and classifying their functions. With further improvements, these systems could be deployed in health interventions aimed at improving older adults’ well-being by promoting self-reflection and suggesting coping strategies to be used in the case of dysfunctional reminiscence cases, which can undermine physical and mental health.
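
Precision, recall, F1 score, and specificity, as reported above, all derive from the four cells of a binary confusion matrix. A minimal Python sketch of those definitions follows. The function name `binary_metrics` and the counts in the example are hypothetical and are not the study's confusion matrix.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return precision, recall, specificity, and F1 from confusion-matrix counts.

    Each ratio falls back to 0.0 when its denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts for an imbalanced test set, for illustration only.
example = binary_metrics(tp=40, fp=10, tn=930, fn=20)
print(example)
```

With a rare positive class, specificity can stay high even when precision and recall are modest, which is why the abstract reports all of these measures rather than accuracy alone.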

Integrating Label Uncertainty in Ultrasound Image Classification using Weighted Support Vector Machines

Current Directions in Biomedical Engineering ◽

10.1515/cdbme-2019-0072 ◽

2019 ◽

Vol 5 (1) ◽

pp. 285-287

Author(s):

Jannis Hagenah ◽

Sascha Leymann ◽

Floris Ernst

Keyword(s):

Machine Learning ◽

Support Vector Machines ◽

Image Classification ◽

Training Data ◽

Support Vector ◽

Ultrasound Images ◽

Data Set ◽

Vector Machines ◽

Data Point ◽

Multiple Experts

Inference from medical image data using machine learning still suffers from the disregard of label uncertainty. Usually, medical images are labeled by multiple experts. However, the uncertainty of this training data, assessable as the unity of opinions of observers, is neglected because training is commonly performed on binary decision labels. In this work, we present a novel method to incorporate this label uncertainty into the learning problem using weighted Support Vector Machines (wSVM). The idea is to assign an uncertainty score to each data point. The score u lies between 0 and 1 and is calculated from the unity of opinions of all observers, where u = 1 if all observers have the same opinion and u = 0 if the observers' opinions are split exactly 50/50, with linear interpolation in between. This score is integrated into the Support Vector Machine (SVM) optimization as a weighting of the errors made for the corresponding data point. For evaluation, we asked 15 observers to label 48 2D ultrasound images of aortic roots, indicating whether each image shows a healthy or a pathologically dilated anatomy, where the ground truth was known. As the observers were not trained experts, a high diversity of opinions was present in the data set. We performed image classification using both approaches, i.e. classical SVM and wSVM with integrated uncertainty weighting, each evaluated with 10-fold cross-validation (linear kernel, C = 7). By incorporating the observer uncertainty, the classification accuracy could be improved by 3.1 percentage points (SVM: 83.5%, wSVM: 86.6%). This indicates that integrating information on the observers’ unity of opinions increases the generalization performance of the classifier and that uncertainty-weighted wSVM could be a promising method for machine learning in the medical domain.
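
The per-sample score described above can be sketched in a few lines of Python. This sketch assumes that "linear interpolation" means linear in the fraction of observers voting for one class, which gives u = |2p - 1| for a vote fraction p. The function name `unity_score` and the example votes are hypothetical, and the abstract does not give the authors' exact formula.

```python
def unity_score(votes):
    """Map binary observer votes (0 or 1) to a weight u in [0, 1].

    u = 1 when all observers agree and u = 0 on an exact 50/50 split.
    Linear interpolation in the vote fraction is an assumption here.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    n_votes = len(votes)
    n_positive = sum(1 for v in votes if v == 1)
    return abs(2 * n_positive - n_votes) / n_votes


# Hypothetical example: 12 of 15 observers label an image as dilated (1).
example_votes = [1] * 12 + [0] * 3
u_example = unity_score(example_votes)
print(u_example)
```

A score like this could then be supplied as a per-sample weight to an SVM trainer that supports weighted errors, for example through the `sample_weight` argument of scikit-learn's `SVC.fit`. Whether the authors used that library is not stated in the abstract.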

Text and metadata extraction from scanned Arabic documents using support vector machines

Journal of Information Science ◽

10.1177/0165551520961256 ◽

2020 ◽

pp. 016555152096125

Author(s):

Wenda Qin ◽

Randa Elanwar ◽

Margrit Betke

Keyword(s):

Support Vector Machines ◽

State Of The Art ◽

Support Vector ◽

Data Sets ◽

Layout Analysis ◽

Data Set ◽

Metadata Extraction ◽

Vector Machines ◽

Text Information ◽

Multiple Support Vector Machines

Text information in scanned documents becomes accessible only when extracted and interpreted by a text recognizer. For a recognizer to work successfully, it must have detailed location information about the regions of the document images that it is asked to analyse. It needs to focus on page regions with text, skipping non-text regions that include illustrations or photographs. However, text recognizers do not work as logical analyzers. Logical layout analysis automatically determines the function of a document text region, that is, it labels each region as a title, paragraph, caption, and so on, and is thus an essential part of a document understanding system. In the past, rule-based algorithms have been used to conduct logical layout analysis, using data sets of limited size. Here we focus instead on supervised learning methods for logical layout analysis. We describe LABA, a system based on multiple support vector machines to perform logical Layout Analysis of scanned Books pages in Arabic. The system detects the function of a text region based on the analysis of various image features and a voting mechanism. For a baseline comparison, we implemented an older but state-of-the-art neural network method. We evaluated LABA using a data set of scanned pages from illustrated Arabic books and obtained high recall and precision values. We also found that the F-measure of LABA is higher than that of the state-of-the-art method for five of the six tested classes.

Generalized Support Vector Machines (GSVMs) model for real-world time series forecasting

10.21203/rs.3.rs-180407/v1 ◽

2021 ◽

Author(s):

Mehrnaz Ahmadi ◽

Mehdi Khashei

Keyword(s):

Support Vector Machine ◽

Support Vector Machines ◽

Real World ◽

Support Vector ◽

Data Sets ◽

Fuzzy Support Vector Machine ◽

Data Set ◽

Proposed Model ◽

Vector Machines ◽

Forecasting Performance

Support vector machines (SVMs) are one of the most popular and widely used approaches in modeling. Various kinds of SVM models have been developed in the prediction and classification literature to serve different purposes. Crisp and fuzzy support vector machines are well-known branches of these modeling approaches that are frequently applied for certain and uncertain modeling, respectively. However, each of these models can only be used efficiently in its specified domain and cannot yield appropriate and accurate results in the opposite situation. Yet real-world systems and data sets often contain both certain and uncertain patterns that are mixed together in complicated ways and need to be modeled simultaneously. In this paper, a generalized support vector machine (GSVM) is proposed that can simultaneously benefit from the unique advantages of the certain and uncertain versions of traditional support vector machines in their own specialized categories. In the proposed model, the underlying data set is first categorized into two classes of certain and uncertain patterns. Then, certain patterns are modeled by a support vector machine, and uncertain patterns are modeled by a fuzzy support vector machine. After that, the function of the relationship, as well as the relative importance of each component, is estimated by another support vector machine, and subsequently, the final forecasts of the proposed model are calculated. Empirical results of wind speed forecasting indicate that the proposed method not only can achieve more accurate results than support vector machines (SVMs) and fuzzy support vector machines (FSVMs) but also can yield better forecasting performance than traditional fuzzy and nonfuzzy single models and traditional preprocessing-based hybrid models of SVMs.

Various Classifiers Performance Based Machine Learning Methods

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.c1069.1183s319 ◽

2019 ◽

Vol 8 (3S3) ◽

pp. 305-310 ◽

Cited By ~ 1

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Research Work ◽

Support Vector ◽

K Nearest Neighbor ◽

Data Set ◽

Vector Machines ◽

Machine Learning Approach ◽

Future Work ◽

Neighbor Classifier

Classification is a data mining (machine learning) approach used to predict group membership for data instances: a computer program learns from the input data and then uses what it has learned to classify new observations. The data set may be bi-class or multi-class. Examples of classification problems include speech recognition, handwriting recognition, biometric identification, and document classification. Many methods can be used for classification. This research work reviews the fundamental classification approaches and a few important kinds of classifiers, including decision tree induction, Bayesian networks, the k-nearest neighbor classifier, Support Vector Machines (SVM), and fuzzy learning classifiers, together with their merits, drawbacks, probable applications, and the challenges faced along with the available solutions. Various problems affect classification and prediction. The objective of this research work is to provide an extensive review of the various classification approaches in machine learning. Finally, future work on the best classification techniques for the input data is discussed.
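
Of the classifiers listed above, the k-nearest neighbor classifier is the simplest to illustrate: a new observation takes the majority label among its k closest training points. The Python sketch below assumes Euclidean distance and a plain majority vote. The function name `knn_predict` and the toy data are illustrative and do not come from the reviewed work.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    if k < 1 or k > len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    # Pair each training label with its Euclidean distance to the query.
    by_distance = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Toy two-class (bi-class) data set, for illustration only.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 6.0), (6.0, 7.0)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, (1.5, 1.5)))
print(knn_predict(points, labels, (6.5, 6.5)))
```

The same function handles multi-class data unchanged, since the vote is taken over whatever labels appear among the neighbors.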

Social Reminiscence in Older Adults’ Everyday Conversations: Automated Detection Using Natural Language Processing and Machine Learning

Journal of Medical Internet Research ◽

10.2196/19133 ◽

2020 ◽

Vol 22 (9) ◽

pp. e19133 ◽

Cited By ~ 1

Author(s):

Andrea Ferrario ◽

Burcu Demiray ◽

Kristina Yordanova ◽

Minxia Luo ◽

Mike Martin

Keyword(s):

Machine Learning ◽

Older Adults ◽

Support Vector Machines ◽

Learning Strategies ◽

Support Vector ◽

Bag Of Words ◽

Word Embeddings ◽

Data Set ◽

Extreme Gradient Boosting ◽

Vector Machines

Background: Reminiscence is the act of thinking or talking about personal experiences that occurred in the past. It is a central task of old age that is essential for healthy aging, and it serves multiple functions, such as decision-making and introspection, transmitting life lessons, and bonding with others. The study of social reminiscence behavior in everyday life can be used to generate data and detect reminiscence from general conversations.
Objective: The aims of this original paper are to (1) preprocess coded transcripts of conversations in German of older adults with natural language processing (NLP), and (2) implement and evaluate learning strategies using different NLP features and machine learning algorithms to detect reminiscence in a corpus of transcripts.
Methods: The methods in this study comprise (1) collecting and coding of transcripts of older adults’ conversations in German, (2) preprocessing transcripts to generate NLP features (bag-of-words models, part-of-speech tags, pretrained German word embeddings), and (3) training machine learning models to detect reminiscence using random forests, support vector machines, and adaptive and extreme gradient boosting algorithms. The data set comprises 2214 transcripts, including 109 transcripts with reminiscence. Due to class imbalance in the data, we introduced three learning strategies: (1) class-weighted learning, (2) a meta-classifier consisting of a voting ensemble, and (3) data augmentation with the Synthetic Minority Oversampling Technique (SMOTE) algorithm. For each learning strategy, we performed cross-validation on a random sample of the training data set of transcripts. We computed the area under the curve (AUC), the average precision (AP), precision, recall, as well as F1 score and specificity measures on the test data, for all combinations of NLP features, algorithms, and learning strategies.
Results: Class-weighted support vector machines on bag-of-words features outperformed all other classifiers (AUC=0.91, AP=0.56, precision=0.5, recall=0.45, F1=0.48, specificity=0.98), followed by support vector machines on SMOTE-augmented data and word embeddings features (AUC=0.89, AP=0.54, precision=0.35, recall=0.59, F1=0.44, specificity=0.94). For the meta-classifier strategy, adaptive and extreme gradient boosting algorithms trained on word embeddings and bag-of-words outperformed all other classifiers and NLP features; however, the performance of the meta-classifier learning strategy was lower compared to other strategies, with highly imbalanced precision-recall trade-offs.
Conclusions: This study provides evidence of the applicability of NLP and machine learning pipelines for the automated detection of reminiscence in older adults’ everyday conversations in German. The methods and findings of this study could be relevant for designing unobtrusive computer systems for the real-time detection of social reminiscence in the everyday life of older adults and classifying their functions. With further improvements, these systems could be deployed in health interventions aimed at improving older adults’ well-being by promoting self-reflection and suggesting coping strategies to be used in the case of dysfunctional reminiscence cases, which can undermine physical and mental health.
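
Class-weighted learning, the first strategy named above, compensates for imbalance by penalizing errors on the rare class more heavily. The Python sketch below uses the common "balanced" heuristic, in which each class weight is inversely proportional to its frequency. That heuristic and the function name `balanced_class_weights` are assumptions, since the abstract does not state the weights the authors used. The counts are those of the full data set (2214 transcripts, 109 with reminiscence), not of the training split.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Label vector with the class sizes reported in the abstract (1 = reminiscence).
all_labels = [1] * 109 + [0] * (2214 - 109)
weights = balanced_class_weights(all_labels)
print(weights)
```

Under this heuristic the total weight assigned to each class is equal, so the 109 reminiscence transcripts collectively count as much as the 2105 others during training.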
