Reservoir Drought Prediction Using Two-Stage SVM

The support vector machine (SVM) has been applied to drought prediction and typically yields good overall accuracy. However, the prediction accuracy for the drought category is much lower than for the non-drought and severe drought categories. In this study, a two-stage approach was used to improve the SVM and increase drought prediction accuracy. Four features, (1) reservoir storage, (2) inflows, (3) critical limit of operation rule curves, and (4) the Nth ten-day in a year, were used as input data to predict reservoir drought. We used these features because they are the most commonly kept records in reservoir offices. Empirical results show that the two-stage SVM outperforms the original SVM and three other approaches (artificial neural networks, maximum likelihood classifier, Bayes classifier) for drought prediction. Not surprisingly, the longer the prediction period, the lower the prediction accuracy. However, with the two-stage SVM, the accuracy of predicting conditions within the next 50 days was approximately 85% for both the training and testing data sets. Drought prediction provides information for reservoir operation and for decision making on water allocation and water quality issues. The results show the benefit of a two-stage SVM approach for drought prediction, as the accuracy of drought prediction increased substantially.
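
The gap between overall accuracy and the accuracy of a single category, which motivates the two-stage approach, can be made concrete with a small sketch. The function names and the toy labels below are assumptions for illustration only; they are not the authors' code or data.

```python
from collections import Counter


def overall_accuracy(actual, predicted):
    """Fraction of all samples whose predicted category matches the actual one."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def per_class_accuracy(actual, predicted):
    """Map each actual category to the fraction of its samples predicted correctly."""
    totals = Counter(actual)
    correct = Counter(a for a, p in zip(actual, predicted) if a == p)
    return {label: correct[label] / totals[label] for label in totals}


# Illustrative labels only (not data from the study).
actual = ["non-drought", "non-drought", "drought", "drought", "severe"]
predicted = ["non-drought", "non-drought", "non-drought", "drought", "severe"]

print(overall_accuracy(actual, predicted))
print(per_class_accuracy(actual, predicted))
```

A classifier can score well overall while one category lags, which is why per-category figures are worth reporting alongside the overall figure.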

Reservoir Drought Prediction Using Support Vector Machines

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.145.455 ◽

2011 ◽

Vol 145 ◽

pp. 455-459 ◽

Cited By ~ 6

Author(s):

Jie Lun Chiang ◽

Yu Shiue Tsai

Keyword(s):

Prediction Accuracy ◽

Input Data ◽

Water Shortage ◽

Training Data ◽

Annual Rainfall ◽

Support Vector ◽

Time Interval ◽

Data Set ◽

Drought Prediction ◽

Testing Data

In Taiwan, even though the average annual rainfall is up to 2500 mm, water shortages sometimes occur during the dry season. In recent years especially, water shortages have seriously affected agriculture, industry, commerce, and even essential daily water use. Under the threat of future climate change, efficient use of water resources becomes even more challenging. For a comparative study, a support vector machine (SVM) and three other models (artificial neural networks, maximum likelihood classifier, Bayesian classifier) were established to predict the reservoir drought status of Tsengwen Reservoir over the next 10 to 90 days. (The ten-day time interval was applied in this study because it is the conventional time unit for reservoir operation.) Four features that are easily obtainable in most reservoir offices, namely reservoir storage capacity, inflows, critical limit of operation rule curves, and the number of the ten-day period in a year, were used as input data to predict drought. The records from 1975 to 1999 were selected as training data, and those from 2000 to 2010 as testing data. The empirical results showed that the SVM outperforms the other three approaches for drought prediction. Unsurprisingly, the longer the prediction period, the lower the prediction accuracy. However, the accuracy of the SVM in predicting the next 50 days is about 85% for both the training and testing data sets. As a result, we believe that the SVM model has high potential for predicting reservoir drought because of its high prediction accuracy and simple input data.
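
As a rough sketch of two steps mentioned in the abstract, the code below derives the ten-day period of the year from a date and splits records chronologically into 1975 to 1999 (training) and 2000 to 2010 (testing). The convention of three periods per month (days 1 to 10, 11 to 20, and 21 to month end, giving 36 periods per year) is an assumption, as are the function names and the record layout; the abstract does not specify them.

```python
from datetime import date


def ten_day_index(d: date) -> int:
    """Return the 1-based ten-day period of the year (1..36).

    Assumed convention: each month has three periods; days 21 to month end
    all fall in the third one.
    """
    period_in_month = min((d.day - 1) // 10, 2)
    return (d.month - 1) * 3 + period_in_month + 1


def split_by_year(records, last_training_year=1999):
    """Split records chronologically; each record is a dict with a 'date' key."""
    train = [r for r in records if r["date"].year <= last_training_year]
    test = [r for r in records if r["date"].year > last_training_year]
    return train, test


# Hypothetical records, for illustration only.
records = [{"date": date(1998, 5, 15)}, {"date": date(2003, 11, 30)}]
train, test = split_by_year(records)
print(ten_day_index(date(2003, 11, 30)), len(train), len(test))
```

Splitting by year rather than at random keeps the test period strictly after the training period, which matches how a forecast would be used in practice.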

A Data-Driven Two-Stage Prediction Model for Train Primary-Delay Recovery Time

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194020400124 ◽

2020 ◽

Vol 30 (07) ◽

pp. 921-940

Author(s):

Bowen Gao ◽

Dongxiu Ou ◽

Decun Dong ◽

Yusen Wu

Keyword(s):

Prediction Model ◽

Knowledge Base ◽

Recovery Time ◽

Prediction Accuracy ◽

Gradient Boosting ◽

Support Vector ◽

Model Framework ◽

Two Stage ◽

Delay Propagation ◽

Proposed Model

Accurate prediction of train delay recovery is critical for railway incident management and for providing passengers with accurate journey times. In this paper, a two-stage prediction model is proposed to predict the recovery time of train primary delays based on real records from a High-Speed Railway (HSR). In Stage 1, two models are built to study the influence of feature space and model framework on the prediction accuracy of buffer time in each section or station. It is found that explicitly inputting the attribute features of stations and sections to the model, instead of implicit simulation, effectively improves the prediction accuracy. For validation purposes, the proposed model has been compared with several alternative models, namely Logistic Regression (LR), Artificial Neural Network (ANN), Support Vector Machine (SVM) and Gradient Boosting Tree (GBT). The results show that it outperforms the other schemes. Specifically, when the allowed error is extended to 3 min, the proposed model achieves an accuracy of up to 94.63%. This indicates that our method has high value in practical engineering applications. Considering that the delay propagation of trains is a complex process, our future study will focus on building a delay propagation knowledge base and a dispatcher experience knowledge base.
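
The accuracy figure quoted for an allowed error of 3 min can be read as the share of predictions that fall within that tolerance of the actual recovery time. The sketch below shows one plausible way to compute such a figure; the function name and the sample values are assumptions, and the paper may define its metric differently.

```python
def accuracy_within_tolerance(actual, predicted, tolerance):
    """Fraction of predictions whose absolute error is at most `tolerance`."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= tolerance)
    return hits / len(actual)


# Illustrative recovery times in minutes (not data from the paper).
actual_minutes = [5, 10, 12, 20]
predicted_minutes = [6, 14, 12, 17]

print(accuracy_within_tolerance(actual_minutes, predicted_minutes, tolerance=3))
```

Widening the tolerance can only keep the score the same or raise it, so figures reported at different tolerances are not directly comparable.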

Automatic Task Classification via Support Vector Machine and Crowdsourcing

Mobile Information Systems ◽

10.1155/2018/6920679 ◽

2018 ◽

Vol 2018 ◽

pp. 1-9 ◽

Cited By ~ 3

Author(s):

Hyungsik Shin ◽

Jeongyeup Paek

Keyword(s):

Support Vector Machine ◽

Mobile Devices ◽

Prediction Accuracy ◽

Training Data ◽

Amazon Mechanical Turk ◽

Support Vector ◽

Data Set ◽

English Sentence ◽

Task Classification ◽

Personal Assistant

Automatic task classification is a core part of personal assistant systems that are widely used in mobile devices such as smartphones and tablets. Even though many industry leaders provide their own personal assistant services, their proprietary internals and implementations are not well known to the public. In this work, we show through real implementation and evaluation that automatic task classification can be implemented for mobile devices by using the support vector machine algorithm and crowdsourcing. To train our task classifier, we collected our training data set via crowdsourcing using the Amazon Mechanical Turk platform. Our classifier can classify a short English sentence into one of thirty-two predefined tasks that are frequently requested while using personal mobile devices. Evaluation results show that the prediction accuracy of our classifier ranges from 82% to 99%. By using a large amount of crowdsourced data, we also illustrate the relationship between training data size and the prediction accuracy of our task classifier.

Learning Dynamic Factors to Improve the Accuracy of Bus Arrival Time Prediction via a Recurrent Neural Network

Future Internet ◽

10.3390/fi11120247 ◽

2019 ◽

Vol 11 (12) ◽

pp. 247

Author(s):

Xin Zhou ◽

Peixin Dong ◽

Jianping Xing ◽

Peijia Sun

Keyword(s):

Neural Network ◽

Recurrent Neural Network ◽

Public Transportation ◽

Prediction Accuracy ◽

Arrival Time ◽

Attention Mechanism ◽

Experimental Results ◽

Support Vector ◽

Data Set ◽

Dynamic Factors

Accurate prediction of bus arrival times is a challenging problem in the public transportation field. Previous studies have shown that to improve prediction accuracy, more heterogeneous measurements provide better results. So what other factors should be added into the prediction model? Traditional prediction methods mainly use the arrival time and the distance between stations, but do not make full use of dynamic factors such as passenger number, dwell time, bus driving efficiency, etc. We propose a novel approach that takes full advantage of dynamic factors. Our approach is based on a Recurrent Neural Network (RNN). The experimental results indicate that a variety of prediction algorithms (such as Support Vector Machine, Kalman filter, Multilayer Perceptron, and RNN) have significantly improved performance after using dynamic factors. Further, we introduce RNN with an attention mechanism to adaptively select the most relevant input factors. Experiments demonstrate that the prediction accuracy of RNN with an attention mechanism is better than RNN with no attention mechanism when there are heterogeneous input factors. The experimental results show the superior performances of our approach on the data set provided by Jinan Public Transportation Corporation.

The use of machine learning in rare diseases: a scoping review

Orphanet Journal of Rare Diseases ◽

10.1186/s13023-020-01424-6 ◽

2020 ◽

Vol 15 (1) ◽

Cited By ~ 1

Author(s):

Julia Schaefer ◽

Moritz Lehne ◽

Josef Schepers ◽

Fabian Prasser ◽

Sylvia Thun

Keyword(s):

Machine Learning ◽

Rare Disease ◽

Rare Diseases ◽

Scoping Review ◽

Input Data ◽

Skin Diseases ◽

Demographic Data ◽

Support Vector ◽

Research Activity ◽

Data Set

Background: Emerging machine learning technologies are beginning to transform medicine and healthcare and could also improve the diagnosis and treatment of rare diseases. Currently, there are no systematic reviews that investigate, from a general perspective, how machine learning is used in a rare disease context. This scoping review aims to address this gap and explores the use of machine learning in rare diseases, investigating, for example, in which rare diseases machine learning is applied, which types of algorithms and input data are used or which medical applications (e.g., diagnosis, prognosis or treatment) are studied. Methods: Using a complex search string including generic search terms and 381 individual disease names, studies from the past 10 years (2010–2019) that applied machine learning in a rare disease context were identified on PubMed. To systematically map the research activity, eligible studies were categorized along different dimensions (e.g., rare disease group, type of algorithm, input data), and the number of studies within these categories was analyzed. Results: Two hundred eleven studies from 32 countries investigating 74 different rare diseases were identified. Diseases with a higher prevalence appeared more often in the studies than diseases with a lower prevalence. Moreover, some rare disease groups were investigated more frequently than to be expected (e.g., rare neurologic diseases and rare systemic or rheumatologic diseases), others less frequently (e.g., rare inborn errors of metabolism and rare skin diseases). Ensemble methods (36.0%), support vector machines (32.2%) and artificial neural networks (31.8%) were the algorithms most commonly applied in the studies. Only a small proportion of studies evaluated their algorithms on an external data set (11.8%) or against a human expert (2.4%). As input data, images (32.2%), demographic data (27.0%) and “omics” data (26.5%) were used most frequently. Most studies used machine learning for diagnosis (40.8%) or prognosis (38.4%) whereas studies aiming to improve treatment were relatively scarce (4.7%). Patient numbers in the studies were small, typically ranging from 20 to 99 (35.5%). Conclusion: Our review provides an overview of the use of machine learning in rare diseases. Mapping the current research activity, it can guide future work and help to facilitate the successful application of machine learning in rare diseases.

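KOMPARASI MODEL SUPPORT VECTOR MACHINES (SVM) DAN NEURAL NETWORK UNTUK MENGETAHUI TINGKAT AKURASI PREDIKSI TERTINGGI HARGA SAHAM [Comparison of Support Vector Machine (SVM) and Neural Network Models to Determine the Highest Stock Price Prediction Accuracy]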

Jurnal Informatika Upgris ◽

10.26877/jiu.v3i1.1536 ◽

2017 ◽

Vol 3 (1) ◽

Author(s):

R. Hadapiningradja Kusumodestoni ◽

Sarwido Sarwido

Keyword(s):

Neural Network ◽

Support Vector Machine ◽

Stock Prices ◽

Prediction Accuracy ◽

High Accuracy ◽

Stock Index ◽

Support Vector ◽

Accuracy Rate ◽

Trend Prediction ◽

Data Set

There are many ways to invest, one of which is buying shares. Shares are securities traded in the global capital markets. The stock exchange, also called the stock market, is the buying and selling of investments in private companies. To avoid losses when investing, a predictive model with high accuracy, supported by large amounts of accurate data, is needed. Sound analysis techniques can reduce the risk for investors. Many models are used to predict stock price movements; this study uses a neural network (NN) model and a support vector machine (SVM) model. The problem can be formulated as follows: an algorithm is needed that predicts stock prices with a high accuracy rate on the given data set. The two algorithms are therefore investigated to determine which has the higher prediction accuracy. The purpose of this study is to compare the neural network and support vector machine algorithms and identify which yields the more accurate stock price forecasts, judged by the RMSE error value. The data are share values of the Hong Kong stock index from 20 July 2016 at 16:26 to 15 September 2016 at 17:40, a total of 729 records at 5-minute intervals. After training, learning, and testing, the neural network model achieved a prediction accuracy of 0.503 ± 0.009 (micro: 0.503), while the SVM model achieved 0.477 ± 0.008 (micro: 0.477). It can therefore be concluded that the neural network model has higher trend prediction accuracy than the SVM model.

Application of a Hybrid ARIMA-LSTM Model Based on The SPEI For Drought Forecasting

10.21203/rs.3.rs-301080/v1 ◽

2021 ◽

Author(s):

Dehe Xu ◽

Qi Zhang ◽

Yan Ding ◽

De Zhang

Keyword(s):

Lead Time ◽

Prediction Accuracy ◽

Prediction Models ◽

Arima Model ◽

Least Square ◽

Support Vector ◽

Drought Forecasting ◽

Short Term ◽

Model Based ◽

Drought Prediction

Drought forecasting can effectively reduce the risk of drought. We propose a hybrid model based on deep learning methods that integrates an autoregressive integrated moving average (ARIMA) model and a long short-term memory (LSTM) model to improve the accuracy of short-term drought prediction. Taking China as an example, this paper compares and analyzes the prediction accuracy of six drought prediction models for the SPEI: ARIMA, support vector regression (SVR), LSTM, ARIMA-SVR, least square-SVR (LS-SVR) and ARIMA-LSTM. The performance of all the models was compared using measures of persistence, such as the Nash-Sutcliffe efficiency (NSE). The results show that all three hybrid models (ARIMA-SVR, LS-SVR and ARIMA-LSTM) had higher prediction accuracy than the single models (ARIMA, SVR and LSTM) for a given lead time at different scales. The NSEs of the hybrid ARIMA-SVR, LS-SVR and ARIMA-LSTM models for the predicted SPEI1 are 0.043, 0.168 and 0.368, respectively, and the NSEs for SPEI24 are 0.781, 0.543 and 0.93, respectively. This finding indicates that when the lead time remains unchanged, the prediction accuracy of the hybrid ARIMA-SVR, LS-SVR and ARIMA-LSTM models for the SPEI gradually improves with increasing time scale, and the prediction accuracy of the model with a one-month lead time is higher than that of the model with a two-month lead time. In addition, the ARIMA-LSTM model has the highest prediction accuracy at the 6-, 12-, and 24-month scales, indicating that the model is more suitable for forecasting long-term drought in China.
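
For readers unfamiliar with the Nash-Sutcliffe efficiency, the sketch below uses its standard definition, NSE = 1 - Σ(obs - pred)² / Σ(obs - mean(obs))². The function name and sample values are assumptions for illustration; this is not the authors' code.

```python
def nash_sutcliffe_efficiency(observed, predicted):
    """NSE: 1 is a perfect fit; 0 means no better than predicting the observed mean."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0:
        raise ValueError("NSE is undefined when all observations are equal")
    return 1.0 - residual / variance


# Illustrative index values (not data from the paper).
observed = [1.0, 2.0, 3.0]
print(nash_sutcliffe_efficiency(observed, [1.0, 2.0, 3.0]))  # perfect prediction
print(nash_sutcliffe_efficiency(observed, [2.0, 2.0, 2.0]))  # predicting the mean
```

On this scale, the values reported for SPEI1 (0.043 to 0.368) are only slightly better than predicting the mean, while the values for SPEI24 are much closer to a perfect fit.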

Multienergy Load Forecasting for Regional Integrated Energy Systems Considering Multienergy Coupling of Variation Characteristic Curves

Frontiers in Energy Research ◽

10.3389/fenrg.2021.635234 ◽

2021 ◽

Vol 9 ◽

Author(s):

Shouxiang Wang ◽

Kaixin Wu ◽

Qianyu Zhao ◽

Shaomin Wang ◽

Liang Feng ◽

...

Keyword(s):

Prediction Accuracy ◽

Load Forecasting ◽

Energy Systems ◽

Least Square ◽

Support Vector ◽

Data Set ◽

Characteristic Curves ◽

Integrated Energy Systems ◽

Variation Characteristics ◽

Trend Curve

Multienergy load forecasting (MELF) is the premise of production planning and energy dispatch in regional integrated energy systems (RIES). The key to MELF is the consideration of multienergy coupling and the improvement of prediction accuracy. Therefore, a MELF method considering the multienergy coupling of variation characteristic curves (MELF_MECVCC) for RIES is proposed. The novelty of MELF_MECVCC lies in the following three aspects. 1) For the trend stripping and volatility extraction of multienergy load time series, the extreme-point symmetric mode decomposition-sample entropy (ESMD-SE) method is introduced to decompose and reconstruct the variation characteristic curves of multienergy, including the trend curve and the fluctuation curve. 2) The multienergy coupling of the variation characteristic curves is considered to reflect the variation characteristics of the multienergy loads. 3) Different methods are applied according to different variation characteristics; i.e., the combined method based on multitask learning and a long short-term memory network (MTL-LSTM) is applied to predict the trend curve with strong correlation, and the least square support vector regression (LSSVR) method is applied to predict the fluctuation curve with strong volatility and high complexity. The MELF_MECVCC model is simulated and verified on an actual data set from the University of Texas at Austin. Compared with current advanced algorithms, the model reduces the mean absolute percentage error (MAPE) and the root mean square error (RMSE), fits the original load better, and has higher prediction accuracy.
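
The two error measures named in the abstract follow standard definitions, sketched below. The function names and sample loads are assumptions for illustration, not taken from the paper.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error, in the same unit as the inputs."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Illustrative load values (not data from the paper).
actual_load = [100.0, 200.0]
predicted_load = [110.0, 180.0]
print(mape(actual_load, predicted_load), rmse(actual_load, predicted_load))
```

MAPE is scale-free, which makes it convenient for comparing loads of different magnitude, whereas RMSE penalizes large individual errors more heavily.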

Two-Stage Automobile Insurance Fraud Detection by Using Optimized Fuzzy C-Means Clustering and Supervised Learning

International Journal of Information Security and Privacy ◽

10.4018/ijisp.2020070102 ◽

2020 ◽

Vol 14 (3) ◽

pp. 18-37 ◽

Cited By ~ 1

Author(s):

Sharmila Subudhi ◽

Suvasini Panigrahi

Keyword(s):

Fraud Detection ◽

Support Vector ◽

Group Method ◽

Final Decision ◽

Automobile Insurance ◽

Insurance Fraud ◽

Two Stage ◽

Data Set ◽

Fuzzy C Means ◽

Fuzzy C Means Clustering

A novel two-stage automobile insurance fraud detection system is proposed that initially extracts a test set from the original imbalanced insurance dataset. A genetic-algorithm-based optimized fuzzy c-means clustering is then applied to the remaining data set to undersample the majority samples by eliminating the outliers among them. Thereafter, the detection of fraudulent claims occurs in two stages. In the first stage, each insurance record is passed to the clustering module, which identifies the claim as genuine, malicious, or suspicious. The genuine and malicious samples are removed, and only the suspicious instances are further scrutinized in the second stage by four trained supervised classifiers (Decision Tree, Support Vector Machine, Group Method for Data Handling and Multi-Layer Perceptron), each applied individually for final decision making. Extensive experiments and comparative analysis with another recent approach using a real-world automobile insurance dataset justify the effectiveness of the proposed system.
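
The routing logic described above, where only suspicious records reach the second stage, can be sketched as follows. The two stage functions are hypothetical stand-ins with made-up thresholds and field names; they replace the paper's fuzzy c-means module and trained classifiers purely for illustration.

```python
def two_stage_decision(record, stage_one, stage_two):
    """Return the stage-one label unless it is 'suspicious'; then defer to stage two."""
    first = stage_one(record)
    if first in ("genuine", "malicious"):
        return first
    return stage_two(record)


def toy_cluster_label(record):
    """Hypothetical stand-in for the clustering module (thresholds are assumptions)."""
    score = record["anomaly_score"]
    if score < 0.2:
        return "genuine"
    if score > 0.8:
        return "malicious"
    return "suspicious"


def toy_classifier(record):
    """Hypothetical stand-in for a trained supervised classifier."""
    return "malicious" if record["claim_amount"] > 10000 else "genuine"


claim = {"anomaly_score": 0.5, "claim_amount": 50000}
print(two_stage_decision(claim, toy_cluster_label, toy_classifier))
```

Passing the stages in as functions keeps the routing independent of the models, so any of the four classifiers named in the abstract could be used as the second stage.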

Qualitative Analysis for Improving Prediction Accuracy in Parkinson's Disease Detection Using Hybrid Technique

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2019.7738 ◽

2019 ◽

Vol 16 (2) ◽

pp. 393-399 ◽

Cited By ~ 1

Author(s):

S. Geeitha ◽

M. Thangamani

Keyword(s):

Parkinson's Disease ◽

Expectation Maximization ◽

Prediction Accuracy ◽

Gene Expression Pattern ◽

Support Vector ◽

Hybrid Technique ◽

Computational Tool ◽

Bayesian Network Model ◽

Data Set

A PSO-based SVM method has been implemented for diagnosing Parkinson's disease. This hybrid method performs parameter optimization and helps to predict the gene expression pattern of patients affected by Parkinson's disease. Implementing a computational tool on the PD data set alleviates the symptoms to predict accurately the occurrence of the disease. In data classification, some incomplete or missing data may arise during pre-processing in the probabilistic model. To overcome this, an Expectation Maximization (EM) algorithm is implemented. The proposed Particle Swarm Optimization (PSO) based Support Vector Machine (SVM) technique is also compared with the Bayesian network model that outperforms in prediction accuracy.
