Anomaly Detection in Dam Behaviour with Machine Learning Classification Models

Dam safety assessment is typically made by comparison between the outcome of some predictive model and measured monitoring data. This is done separately for each response variable, and the results are later interpreted before decision making. In this work, three approaches based on machine learning classifiers are evaluated for the joint analysis of a set of monitoring variables: multi-class, two-class and one-class classification. Support vector machines are applied to all prediction tasks, and random forest is also used for multi-class and two-class. The results show high accuracy for multi-class classification, although the approach has limitations for practical use. The performance in two-class classification is strongly dependent on the features of the anomalies to detect and their similarity to those used for model fitting. The one-class classification model based on support vector machines showed high prediction accuracy, while avoiding the need for correctly selecting and modelling the potential anomalies. A criterion for anomaly detection based on model predictions is defined, which results in a decrease in the misclassification rate. The possibilities and limitations of all three approaches for practical use are discussed.
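
The abstract does not state the detection criterion itself. As a minimal sketch, assume one plausible form: an alarm is raised only when the classifier flags several consecutive records as anomalous, so that isolated misclassifications are ignored. The function name `raise_alarm`, the `window` parameter and the sample predictions are hypothetical and are not taken from the paper.

```python
def raise_alarm(predictions, window=3):
    """Return the index at which `window` consecutive records have been
    flagged as anomalous, or None if that never happens.

    Assumed persistence rule for illustration only; not the paper's criterion.
    """
    run = 0  # length of the current streak of anomalous predictions
    for i, flagged in enumerate(predictions):
        run = run + 1 if flagged else 0
        if run >= window:
            return i
    return None


# Hypothetical per-record classifier outputs (True = classified as anomalous).
preds = [False, True, False, False, True, True, True, False]
alarm_index = raise_alarm(preds)
print(alarm_index)
```

Under this assumed rule the isolated flag at index 1 does not trigger an alarm, while the streak ending at index 6 does. A longer window trades slower detection for fewer false alarms.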

Combination with Machine Learning Algorithms for the Classification in E-Bussiness

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.230-232.625 ◽

2011 ◽

Vol 230-232 ◽

pp. 625-628

Author(s):

Lei Shi ◽

Xin Ming Ma ◽

Xiao Hong Hu

Keyword(s):

Machine Learning ◽

Support Vector Machines ◽

Set Theory ◽

Rough Set ◽

Rough Set Theory ◽

Machine Learning Algorithms ◽

Classification Model ◽

Support Vector ◽

Mathematical Tool ◽

Vector Machines

E-business has grown rapidly in the last decade, and a massive amount of data on customer purchases, browsing patterns and preferences has been generated. Classification of electronic data plays a pivotal role in mining valuable information and has thus become one of the most important applications in E-business. Support vector machines are popular and powerful machine learning techniques that offer state-of-the-art performance. Rough set theory is a formal mathematical tool for dealing with incomplete or imprecise information, and one of its important applications is feature selection. In this paper, rough set theory and support vector machines are combined to construct a classification model that classifies E-business data effectively.
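
To illustrate how rough set theory can drive feature selection, here is a minimal sketch of the standard dependency degree: the fraction of records whose class is fully determined by a chosen subset of attributes. The records, the attribute names and the function `dependency_degree` are invented for illustration. The abstract does not say which rough set reduction procedure the authors used.

```python
from collections import defaultdict


def dependency_degree(rows, condition_attrs, decision_attr):
    """Fraction of rows in the rough-set positive region of the decision
    attribute with respect to `condition_attrs`."""
    # Group rows into equivalence classes by their values on the chosen attributes.
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in condition_attrs)
        classes[key].append(row[decision_attr])
    # A class belongs to the positive region when all its rows share one decision.
    positive = sum(len(d) for d in classes.values() if len(set(d)) == 1)
    return positive / len(rows)


# Hypothetical e-business records (not data from the paper).
rows = [
    {"visits": "high", "cart": "yes", "region": "N", "buys": "yes"},
    {"visits": "high", "cart": "no", "region": "S", "buys": "no"},
    {"visits": "low", "cart": "yes", "region": "N", "buys": "yes"},
    {"visits": "low", "cart": "no", "region": "N", "buys": "no"},
    {"visits": "high", "cart": "yes", "region": "S", "buys": "yes"},
    {"visits": "low", "cart": "no", "region": "S", "buys": "no"},
]

gamma_all = dependency_degree(rows, ["visits", "cart", "region"], "buys")
gamma_cart = dependency_degree(rows, ["cart"], "buys")
gamma_visits = dependency_degree(rows, ["visits"], "buys")
print(gamma_all, gamma_cart, gamma_visits)
```

In this toy table the single attribute `cart` preserves the full dependency degree, so it would be a candidate reduct. Only the retained attributes would then be passed on to the support vector machine.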

Twitter sentiment analysis for the estimation of voting intention in the 2017 Chilean elections

Intelligent Data Analysis ◽

10.3233/ida-194768 ◽

2020 ◽

Vol 24 (5) ◽

pp. 1141-1160

Author(s):

Tomás Alegre Sepúlveda ◽

Brian Keith Norambuena

Keyword(s):

Machine Learning ◽

Support Vector Machines ◽

Sentiment Analysis ◽

Classification Model ◽

Machine Learning Techniques ◽

Support Vector ◽

Traditional Methods ◽

Actual Result ◽

Learning Techniques ◽

Vector Machines

In this paper, we apply sentiment analysis methods in the context of the first round of the 2017 Chilean elections. The purpose of this work is to estimate the voting intention associated with each candidate in order to contrast it with the results of classical methods (e.g., polls and surveys). The data were collected from Twitter because of its high usage in Chile and in the sentiment analysis literature. We obtained tweets associated with the three main candidates: Sebastián Piñera (SP), Alejandro Guillier (AG) and Beatriz Sánchez (BS). For each candidate, we estimated the voting intention and compared it to the traditional methods. To do this, we first acquired the data and labeled the tweets as positive or negative. Afterward, we built a model using machine learning techniques. The classification model had an accuracy of 76.45% using support vector machines, which yielded the best model for our case. Finally, we used a formula to estimate the voting intention from the number of positive and negative tweets for each candidate. For the last period, we obtained a voting intention of 35.84% for SP, compared to a range of 34–44% according to traditional polls and 36% in the actual elections. For AG we obtained an estimate of 37%, compared with a range of 15.40% to 30.00% for traditional polls and 20.27% in the elections. For BS we obtained an estimate of 27.77%, compared with the range of 8.50% to 11.00% given by traditional polls and an actual result of 22.70% in the elections. These results are promising, in some cases providing an estimate closer to reality than traditional polls. Some differences can be explained by the fact that some candidates were omitted, even though they received a significant number of votes.
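
The abstract does not give the formula that maps tweet counts to voting intention. The sketch below assumes one simple possibility: each candidate's proportion of positive tweets, normalised so that the shares sum to 100%. The function `voting_intention` and the tweet counts are hypothetical and do not reproduce the paper's figures.

```python
def voting_intention(counts):
    """Map {candidate: (positive, negative)} to percentage shares.

    Assumed formula for illustration: positive ratio per candidate,
    normalised across candidates.
    """
    ratios = {}
    for name, (pos, neg) in counts.items():
        total = pos + neg
        if total == 0:
            raise ValueError(f"no labelled tweets for {name}")
        ratios[name] = pos / total
    norm = sum(ratios.values())
    if norm == 0:
        raise ValueError("no positive tweets for any candidate")
    return {name: 100.0 * r / norm for name, r in ratios.items()}


# Invented tweet counts (positive, negative); not the paper's data.
counts = {"SP": (400, 400), "AG": (300, 100), "BS": (150, 50)}
shares = voting_intention(counts)
print(shares)
```

A formula of this kind ignores tweet volume and any candidate left out of the data, which is consistent with the abstract's remark that omitted candidates explain part of the gap.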

An Exploratory Study on the Use of Machine Learning to Predict Student Academic Performance

International Journal of Knowledge-Based Organizations ◽

10.4018/ijkbo.2018100104 ◽

2018 ◽

Vol 8 (4) ◽

pp. 67-79 ◽

Cited By ~ 1

Author(s):

Patrick Kenekayoro

Keyword(s):

Higher Education ◽

Machine Learning ◽

Academic Performance ◽

Support Vector Machines ◽

Student Performance ◽

Higher Education Institutions ◽

Classification Model ◽

Support Vector ◽

Student Academic Performance ◽

Vector Machines

Optimal student performance is integral to successful higher education institutions. The consensus is that big data analytics can be used to identify ways of achieving better student academic performance. This article used support vector machines to predict future student performance in computing and mathematics disciplines based on past scores in computing, mathematics and statistics subjects. Past subjects passed by students were ranked with state-of-the-art feature selection techniques in an attempt to identify any connection between good performance in a particular discipline and past subject knowledge. Up to 80% classification accuracy was achieved with support vector machines, demonstrating that this method can be developed to produce recommender or guidance systems for students; however, the classification model would still benefit from more training examples. The results of this research reemphasize the possibility and benefits of using machine learning techniques to improve teaching and learning in higher education institutions.

A comparison study: Support vector machines for binary classification in machine learning

2011 4th International Conference on Biomedical Engineering and Informatics (BMEI) ◽

10.1109/bmei.2011.6098517 ◽

2011 ◽

Cited By ~ 4

Author(s):

Wencai Zeng ◽

Jiong Jia ◽

Zhonglong Zheng ◽

Chenmao Xie ◽

Li Guo

Keyword(s):

Machine Learning ◽

Support Vector Machines ◽

Binary Classification ◽

Support Vector ◽

Comparison Study ◽

Vector Machines ◽

Study Support

A machine learning based method for classification of fractal features of forearm sEMG using Twin Support vector machines

2010 Annual International Conference of the IEEE Engineering in Medicine and Biology ◽

10.1109/iembs.2010.5627902 ◽

2010 ◽

Cited By ~ 12

Author(s):

S P Arjunan ◽

D K Kumar ◽

G R Naik

Keyword(s):

Machine Learning ◽

Support Vector Machines ◽

Support Vector ◽

Twin Support Vector Machines ◽

Vector Machines

Model selection for support vector machines: Advantages and disadvantages of the Machine Learning Theory

The 2010 International Joint Conference on Neural Networks (IJCNN) ◽

10.1109/ijcnn.2010.5596450 ◽

2010 ◽

Cited By ~ 19

Author(s):

Davide Anguita ◽

Alessandro Ghio ◽

Noemi Greco ◽

Luca Oneto ◽

Sandro Ridella

Keyword(s):

Machine Learning ◽

Support Vector Machines ◽

Model Selection ◽

Learning Theory ◽

Support Vector ◽

Advantages And Disadvantages ◽

Vector Machines ◽

Selection For

Research on Parallel Support Vector Machine Based on Spark Big Data Platform

Scientific Programming ◽

10.1155/2021/7998417 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Yao Huimin

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Big Data ◽

Support Vector Machines ◽

Cross Validation ◽

Machine Learning Algorithms ◽

Support Vector ◽

Lambda Architecture ◽

Vector Machines ◽

Data Platform

With the development of cloud computing and distributed cluster technology, the concept of big data has been expanded and extended in terms of capacity and value, and machine learning technology has also received unprecedented attention in recent years. Traditional machine learning algorithms cannot be parallelized effectively, so a parallelized support vector machine based on the Spark big data platform is proposed. Firstly, the big data platform is designed with the Lambda architecture, which is divided into three layers: Batch Layer, Serving Layer, and Speed Layer. Secondly, to improve the training efficiency of support vector machines on large-scale data, the merging of two support vector machines also considers the "special points" other than support vectors, that is, the non-support vectors in one subset that violate the training results of the other subset, and a cross-validation merging algorithm is proposed. Then, a parallelized support vector machine based on cross-validation is proposed, and the parallelization process of the support vector machine is realized on the Spark platform. Finally, experiments on different datasets verify the effectiveness and stability of the proposed method. Experimental results show that the proposed parallelized support vector machine has outstanding performance in speed-up ratio, training time, and prediction accuracy.
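
The "special points" idea can be sketched as follows, assuming linear models and the usual margin condition y(w·x + b) < 1 as the meaning of "violates the training results". The function names, the model `model_a` and the subset data are hypothetical. This is not the paper's Spark implementation.

```python
def decision_value(model, x):
    """Linear SVM decision function w.x + b for model = (w, b)."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def violating_points(model, points, labels, support_idx):
    """Indices of non-support vectors in one subset whose margin under the
    other subset's model is below 1 (assumed violation test)."""
    return [
        i
        for i, (x, y) in enumerate(zip(points, labels))
        if i not in support_idx and y * decision_value(model, x) < 1.0
    ]


# Hypothetical linear model trained on subset A: w = (1, 0), b = 0.
model_a = ((1.0, 0.0), 0.0)

# Hypothetical subset B; index 4 is assumed to be B's own support vector.
points_b = [(2.0, 1.0), (0.5, -1.0), (-3.0, 0.0), (-0.2, 2.0), (1.0, 1.0)]
labels_b = [1, 1, -1, -1, 1]
support_b = {4}

special_b = violating_points(model_a, points_b, labels_b, support_b)
print(special_b)
```

Under these assumptions, a merge step would retrain on the support vectors of both subsets plus the special points found in each direction, rather than on the support vectors alone.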

Impacts of multicollinearity on CAPT modalities: An heterogeneous machine learning framework for computer-assisted French phoneme pronunciation training

PLoS ONE ◽

10.1371/journal.pone.0257901 ◽

2021 ◽

Vol 16 (10) ◽

pp. e0257901

Author(s):

Yanjing Bi ◽

Chao Li ◽

Yannick Benezeth ◽

Fan Yang

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Support Vector Machines ◽

Partial Least Square ◽

Least Square ◽

Support Vector ◽

Computer Assisted ◽

Long Distance ◽

Relationship Analysis ◽

Vector Machines

Phoneme pronunciation is usually considered a basic skill for learning a foreign language. Practicing pronunciation in a computer-assisted way is helpful in a self-directed or long-distance learning environment. Recent research indicates that machine learning is a promising method for building high-performance computer-assisted pronunciation training modalities. Many data-driven classification models, such as support vector machines, back-propagation networks, deep neural networks and convolutional neural networks, are increasingly widely used for this purpose. Yet the acoustic waveforms of phonemes are essentially modulated from the base vibrations of the vocal cords, which makes the predictors collinear and distorts the classification models. A commonly used solution is to suppress the collinearity of the predictors via the partial least squares regression algorithm, which yields high-quality predictor weightings through predictor relationship analysis. However, as linear regressors, classifiers of this type possess very simple topological structures, which constrains their universality. To address this issue, this paper presents a heterogeneous phoneme recognition framework that can further benefit phoneme pronunciation diagnostic tasks by combining partial least squares with support vector machines. A French phoneme data set containing 4830 samples is established for the evaluation experiments. The experiments demonstrate that the new method improves the accuracy of the phoneme classifiers by 0.21–8.47% compared to state-of-the-art methods at different training data densities.

Classification of the Priority of Auditing XBRL Instance Documents with Fuzzy Support Vector Machines Algorithm

Journal of Autonomous Intelligence ◽

10.32629/jai.v2i2.40 ◽

2019 ◽

Vol 2 (2) ◽

Author(s):

Guang-Yih Sheu

Keyword(s):

Support Vector Machines ◽

Financial Distress ◽

Fuzzy Variable ◽

Financial Ratios ◽

Misclassification Rate ◽

Support Vector ◽

Vector Machines ◽

Benford's Law ◽

Fuzzy Support Vector Machines

Assessing the conformity of XBRL (eXtensible Business Reporting Language) instance documents to Benford's law yields apparently different results before and after a company's financial distress. These results suggest the idea of finding fraudulent documents by inspecting financial ratios, since unacceptable conformity implies a high likelihood of a fraudulent document. Fuzzy support vector machine models are developed to implement this idea. The dependent variable is a fuzzy variable quantifying the conformity of an XBRL instance document to Benford's law, whereas the independent variables are financial ratios. Insufficient data are available to define any membership function describing the fuzziness in the independent and dependent variables, but the interval factor method is introduced to express that fuzziness. The resulting fuzzy support vector machine model suggests that the price-to-book ratio versus the equity ratio may be used to classify the priority of auditing XBRL instance documents. The misclassification rate is less than 30%. In conclusion, a new and promising application of the fuzzy support vector machines algorithm has been found in this study.
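
Conformity to Benford's law can be quantified in several ways, and the abstract does not say which measure the author used. As a minimal sketch, the code below compares observed first-digit frequencies with the Benford probabilities log10(1 + 1/d) using the mean absolute deviation. That measure is an assumption here, and the function names and sample values are hypothetical.

```python
import math
from collections import Counter


def benford_expected(d):
    """Benford probability of leading digit d (1..9)."""
    return math.log10(1 + 1 / d)


def first_digit(value):
    """Leading non-zero digit of a non-zero number."""
    # Scientific notation puts the leading digit first, e.g. '1.23e-03'.
    return int(f"{abs(value):.15e}"[0])


def benford_mad(values):
    """Mean absolute deviation between observed first-digit frequencies
    and Benford's law (smaller means closer conformity)."""
    digits = [first_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    n = len(digits)
    return sum(
        abs(counts.get(d, 0) / n - benford_expected(d)) for d in range(1, 10)
    ) / 9


# Illustrative inputs: powers of two track Benford's law closely,
# whereas the integers 100..999 have uniformly distributed first digits.
powers = [2 ** k for k in range(100)]
uniform = list(range(100, 1000))
print(benford_mad(powers), benford_mad(uniform))
```

A score of this kind, computed per instance document, is the sort of quantity that could serve as the dependent variable before any fuzzification is applied.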