Reservoir Drought Prediction Using Support Vector Machines

In Taiwan, even though the average annual rainfall is up to 2500 mm, water shortage during the dry season happens sometimes. Especially in recent years, water shortage has seriously affected the agriculture, industry, commerce, and even the essential daily water use. Under the threat of climate change in the future, efficient use of water resources becomes even more challenging. For a comparative study, support vector machine (SVM) and other three models (artificial neural networks, maximum likelihood classifier, Bayesian classifier) were established to predict reservoir drought status in next 10-90 days in Tsengwen Reservoir. (The ten-days time interval was applied in this study as it is the conventional time unit for reservoir operation.) Four features (which are easily obtainable in most reservoir offices), including reservoir storage capacity, inflows, critical limit of operation rule curves, and the number of ten-days in a year, were used as input data to predict drought. The records of years from 1975 to 1999 were selected as training data, and those of years from 2000 to 2010 were selected as testing data. The empirical results showed that SVM outperforms the other three approaches for drought prediction. Unsurprisingly the longer the prediction time period is, the lower the prediction accuracy is. However, the accuracy of predicting next 50 days is about 85% both in training and testing data set by SVM. As a result, we believe that the SVM model has high potential for predicting reservoir drought due to its high prediction accuracy and simple input data.

Download Full-text

Reservoir Drought Prediction Using Two-Stage SVM

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.284-287.1473 ◽

2013 ◽

Vol 284-287 ◽

pp. 1473-1477 ◽

Cited By ~ 4

Author(s):

Jie Lun Chiang ◽

Yu Shiue Tsai

Keyword(s):

Prediction Accuracy ◽

Input Data ◽

Support Vector ◽

Severe Drought ◽

Two Stage ◽

Critical Limit ◽

Data Set ◽

Drought Prediction ◽

Time Period ◽

Operation Rule

The support vector machine (SVM) has been applied to drought prediction and it typically yields good performance on overall accuracy. However, the prediction accuracy of the drought category is much lower than that of the non-drought and severe drought categories. In this study, a two-stage approach was used to improve the SVM to increase the drought prediction accuracy. Four features, (1) reservoir storage, (2) inflows, (3) critical limit of operation rule curves, and (4) the Nth ten-day in a year, were used as input data to predict reservoir drought. We used these features as input data because they are the most commonly kept records in all reservoir offices. Empirical results show that the two-stage SVM outperforms the original SVM and the three other approaches (artificial neural networks, maximum likelihood classifier, Bayes classifier) for drought prediction. Not surprisingly, the longer the prediction time period, the lower the prediction accuracy is. However, the accuracy of predicting conditions within the next 50 days was approximately 85% both in training and testing data set by the two-stage SVM. Drought prediction provides information for reservoir operation and decision making in terms of water allocation and water quality issues. The result shows the benefit of a two-stage approach of SVM for drought prediction, as the accuracy of drought prediction increased quite substantially.

Download Full-text

Automatic Task Classification via Support Vector Machine and Crowdsourcing

Mobile Information Systems ◽

10.1155/2018/6920679 ◽

2018 ◽

Vol 2018 ◽

pp. 1-9 ◽

Cited By ~ 3

Author(s):

Hyungsik Shin ◽

Jeongyeup Paek

Keyword(s):

Support Vector Machine ◽

Mobile Devices ◽

Prediction Accuracy ◽

Training Data ◽

Amazon Mechanical Turk ◽

Support Vector ◽

Data Set ◽

English Sentence ◽

Task Classification ◽

Personal Assistant

Automatic task classification is a core part of personal assistant systems that are widely used in mobile devices such as smartphones and tablets. Even though many industry leaders are providing their own personal assistant services, their proprietary internals and implementations are not well known to the public. In this work, we show through real implementation and evaluation that automatic task classification can be implemented for mobile devices by using the support vector machine algorithm and crowdsourcing. To train our task classifier, we collected our training data set via crowdsourcing using the Amazon Mechanical Turk platform. Our classifier can classify a short English sentence into one of the thirty-two predefined tasks that are frequently requested while using personal mobile devices. Evaluation results show high prediction accuracy of our classifier ranging from 82% to 99%. By using large amount of crowdsourced data, we also illustrate the relationship between training data size and the prediction accuracy of our task classifier.

Download Full-text

Predicting Humphrey 10-2 visual field from 24-2 visual field in eyes with advanced glaucoma

British Journal of Ophthalmology ◽

10.1136/bjophthalmol-2019-314170 ◽

2019 ◽

Vol 104 (5) ◽

pp. 642-647

Author(s):

Kenji Sugisaki ◽

Ryo Asaoka ◽

Toshihiro Inoue ◽

Keiji Yoshikawa ◽

Akiyasu Kanamori ◽

...

Keyword(s):

Visual Field ◽

Standard Test ◽

Training Data ◽

Support Vector ◽

Mean Deviation ◽

Data Set ◽

Error Range ◽

Testing Data ◽

Advanced Glaucoma ◽

Test Points

AimsTo predict Humphrey Field Analyzer Central 10-2 Swedish Interactive Threshold Algorithm-Standard test (HFA 10-2) results (Carl Zeiss Meditec, San Leandro, CA) from HFA 24-2 results of the same eyes with advanced glaucoma.MethodsTraining and testing HFA 24-2 and 10-2 data sets, respectively, consisted of 175 eyes (175 patients) and 44 eyes (44 patients) with open advanced glaucoma (mean deviation of HFA 24-2 ≤−20 dB). Using the training data set, the 68 total deviation (TD) values of the HFA 10-2 test points were predicted from those of the innermost 16 HFA 24-2 test points in the same eye, using image processing or various machine learning methods including bilinear interpolation (IP) as a standard for comparison. The absolute prediction error (PredError) was calculated by applying each method to the testing data set.ResultsThe mean (SD) test–retest variability of the HFA 10-2 results in the testing data set was 2.1±1.0 dB, while the IP method yielded a PredError of 5.0±1.7 dB. Among the methods tested, support vector regression (SVR) provided a smallest PredError (4.0±1.5 dB). SVR predicted retinal sensitivity at HFA 10-2 test points in the preserved ‘central isle’ of advanced glaucoma from HFA 24-2 results of the same eye within an error range of about 25%, while error range was approximately twice of the test–retest variability.ConclusionApplying SVR to HFA 24-2 results allowed us to predict TD values at HFA 10-2 test points of the same eye with advanced glaucoma with an error range of about 25%.

Download Full-text

An Analog Circuit Fault Diagnosis Approach Based on Wavelet-based fractal analysis and Multiple Kernel SVM

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813666201207154641 ◽

2020 ◽

Vol 13 ◽

Author(s):

Jianfeng Jiang

Keyword(s):

Fault Diagnosis ◽

Fractal Analysis ◽

Analog Circuit ◽

Training Data ◽

Support Vector ◽

Pass Filter ◽

Multiple Kernel ◽

Testing Data ◽

Circuit Fault Diagnosis ◽

Diagnosis Approach

Objective: In order to diagnose the analog circuit fault correctly, an analog circuit fault diagnosis approach on basis of wavelet-based fractal analysis and multiple kernel support vector machine (MKSVM) is presented in the paper. Methods: Time responses of the circuit under different faults are measured, and then wavelet-based fractal analysis is used to process the collected time responses for the purpose of generating features for the signals. Kernel principal component analysis (KPCA) is applied to reduce the features’ dimensionality. Afterwards, features are divided into training data and testing data. MKSVM with its multiple parameters optimized by chaos particle swarm optimization (CPSO) algorithm is utilized to construct an analog circuit fault diagnosis model based on the testing data. Results: The proposed analog diagnosis approach is revealed by a four opamp biquad high-pass filter fault diagnosis simulation. Conclusion: The approach outperforms other commonly used methods in the comparisons.

Download Full-text

Optimal breeding-value prediction using a Sparse Selection Index

Genetics ◽

10.1093/genetics/iyab030 ◽

2021 ◽

Author(s):

Marco Lopez-Cruz ◽

Gustavo de los Campos

Keyword(s):

Sample Size ◽

Dna Sequences ◽

Genomic Prediction ◽

Prediction Accuracy ◽

Regularization Parameter ◽

Selection Index ◽

Prediction Method ◽

Training Data ◽

Breeding Value ◽

Data Set

Abstract Genomic prediction uses DNA sequences and phenotypes to predict genetic values. In homogeneous populations, theory indicates that the accuracy of genomic prediction increases with sample size. However, differences in allele frequencies and in linkage disequilibrium patterns can lead to heterogeneity in SNP effects. In this context, calibrating genomic predictions using a large, potentially heterogeneous, training data set may not lead to optimal prediction accuracy. Some studies tried to address this sample size/homogeneity trade-off using training set optimization algorithms; however, this approach assumes that a single training data set is optimum for all individuals in the prediction set. Here, we propose an approach that identifies, for each individual in the prediction set, a subset from the training data (i.e., a set of support points) from which predictions are derived. The methodology that we propose is a Sparse Selection Index (SSI) that integrates Selection Index methodology with sparsity-inducing techniques commonly used for high-dimensional regression. The sparsity of the resulting index is controlled by a regularization parameter (λ); the G-BLUP (the prediction method most commonly used in plant and animal breeding) appears as a special case which happens when λ = 0. In this study, we present the methodology and demonstrate (using two wheat data sets with phenotypes collected in ten different environments) that the SSI can achieve significant (anywhere between 5-10%) gains in prediction accuracy relative to the G-BLUP.

Download Full-text

A sentiment analysis system for social media using machine learning techniques: Social enablement

Digital Scholarship in the Humanities ◽

10.1093/llc/fqy037 ◽

2018 ◽

Vol 34 (3) ◽

pp. 569-581 ◽

Cited By ~ 1

Author(s):

Sujata Rani ◽

Parteek Kumar

Keyword(s):

Machine Learning ◽

Social Media ◽

Sentiment Analysis ◽

Media Analysis ◽

Training Data ◽

Machine Learning Techniques ◽

Support Vector ◽

Analysis Tool ◽

Data Set ◽

Learning Techniques

Abstract In this article, an innovative approach to perform the sentiment analysis (SA) has been presented. The proposed system handles the issues of Romanized or abbreviated text and spelling variations in the text to perform the sentiment analysis. The training data set of 3,000 movie reviews and tweets has been manually labeled by native speakers of Hindi in three classes, i.e. positive, negative, and neutral. The system uses WEKA (Waikato Environment for Knowledge Analysis) tool to convert these string data into numerical matrices and applies three machine learning techniques, i.e. Naive Bayes (NB), J48, and support vector machine (SVM). The proposed system has been tested on 100 movie reviews and tweets, and it has been observed that SVM has performed best in comparison to other classifiers, and it has an accuracy of 68% for movie reviews and 82% in case of tweets. The results of the proposed system are very promising and can be used in emerging applications like SA of product reviews and social media analysis. Additionally, the proposed system can be used in other cultural/social benefits like predicting/fighting human riots.

Download Full-text

A max-margin training of RNA secondary structure prediction integrated with the thermodynamic model

10.1101/205047 ◽

2017 ◽

Cited By ~ 1

Author(s):

Manato Akiyama ◽

Kengo Sato ◽

Yasubumi Sakakibara

Keyword(s):

Machine Learning ◽

Secondary Structure ◽

Structure Prediction ◽

Rna Secondary Structure ◽

Prediction Accuracy ◽

Secondary Structure Prediction ◽

Training Data ◽

Support Vector ◽

Rna Secondary Structure Prediction ◽

Fine Grained

AbstractMotivation: A popular approach for predicting RNA secondary structure is the thermodynamic nearest neighbor model that finds a thermodynamically most stable secondary structure with the minimum free energy (MFE). For further improvement, an alternative approach that is based on machine learning techniques has been developed. The machine learning based approach can employ a fine-grained model that includes much richer feature representations with the ability to fit the training data. Although a machine learning based fine-grained model achieved extremely high performance in prediction accuracy, a possibility of the risk of overfitting for such model has been reported.Results: In this paper, we propose a novel algorithm for RNA secondary structure prediction that integrates the thermodynamic approach and the machine learning based weighted approach. Ourfine-grained model combines the experimentally determined thermodynamic parameters with a large number of scoring parameters for detailed contexts of features that are trained by the structured support vector machine (SSVM) with the ℓ1 regularization to avoid overfitting. Our benchmark shows that our algorithm achieves the best prediction accuracy compared with existing methods, and heavy overfitting cannot be observed.Availability: The implementation of our algorithm is available at https://github.com/keio-bioinformatics/mxfold.Contact:[email protected]

Download Full-text

An Incremental Isomap Method for Hyperspectral Dimensionality Reduction and Classification

Photogrammetric Engineering & Remote Sensing ◽

10.14358/pers.87.7.445 ◽

2021 ◽

Vol 87 (6) ◽

pp. 445-455

Author(s):

Yi Ma ◽

Zezhong Zheng ◽

Yutang Ma ◽

Mingcang Zhu ◽

Ran Huang ◽

...

Keyword(s):

Manifold Learning ◽

Nearest Neighbor ◽

Hyperspectral Image ◽

Hyperspectral Data ◽

Training Data ◽

Support Vector ◽

Data Sets ◽

K Nearest Neighbor ◽

Data Set ◽

Data Points

Many manifold learning algorithms conduct an eigen vector analysis on a data-similarity matrix with a size of N×N, where N is the number of data points. Thus, the memory complexity of the analysis is no less than O(N2). We pres- ent in this article an incremental manifold learning approach to handle large hyperspectral data sets for land use identification. In our method, the number of dimensions for the high-dimensional hyperspectral-image data set is obtained with the training data set. A local curvature varia- tion algorithm is utilized to sample a subset of data points as landmarks. Then a manifold skeleton is identified based on the landmarks. Our method is validated on three AVIRIS hyperspectral data sets, outperforming the comparison algorithms with a k–nearest-neighbor classifier and achieving the second best performance with support vector machine.

Download Full-text

Exploiting Rules to Enhance Machine Learning in Extracting Information From Multi-Institutional Prostate Pathology Reports

JCO Clinical Cancer Informatics ◽

10.1200/cci.20.00028 ◽

2020 ◽

pp. 865-874

Author(s):

Enrico Santus ◽

Tal Schuster ◽

Amir M. Tahmasebi ◽

Clara Li ◽

Adam Yala ◽

...

Keyword(s):

Machine Learning ◽

Hybrid Systems ◽

High Performance ◽

Feature Model ◽

Training Data ◽

Gradient Boosting ◽

Support Vector ◽

Data Set ◽

Extreme Gradient Boosting ◽

Pathology Reports

PURPOSE Literature on clinical note mining has highlighted the superiority of machine learning (ML) over hand-crafted rules. Nevertheless, most studies assume the availability of large training sets, which is rarely the case. For this reason, in the clinical setting, rules are still common. We suggest 2 methods to leverage the knowledge encoded in pre-existing rules to inform ML decisions and obtain high performance, even with scarce annotations. METHODS We collected 501 prostate pathology reports from 6 American hospitals. Reports were split into 2,711 core segments, annotated with 20 attributes describing the histology, grade, extension, and location of tumors. The data set was split by institutions to generate a cross-institutional evaluation setting. We assessed 4 systems, namely a rule-based approach, an ML model, and 2 hybrid systems integrating the previous methods: a Rule as Feature model and a Classifier Confidence model. Several ML algorithms were tested, including logistic regression (LR), support vector machine (SVM), and eXtreme gradient boosting (XGB). RESULTS When training on data from a single institution, LR lags behind the rules by 3.5% (F1 score: 92.2% v 95.7%). Hybrid models, instead, obtain competitive results, with Classifier Confidence outperforming the rules by +0.5% (96.2%). When a larger amount of data from multiple institutions is used, LR improves by +1.5% over the rules (97.2%), whereas hybrid systems obtain +2.2% for Rule as Feature (97.7%) and +2.6% for Classifier Confidence (98.3%). Replacing LR with SVM or XGB yielded similar performance gains. CONCLUSION We developed methods to use pre-existing handcrafted rules to inform ML algorithms. These hybrid systems obtain better performance than either rules or ML models alone, even when training data are limited.

Download Full-text

On Realistically Attacking Tor with Website Fingerprinting

Proceedings on Privacy Enhancing Technologies ◽

10.1515/popets-2016-0027 ◽

2016 ◽

Vol 2016 (4) ◽

pp. 21-36 ◽

Cited By ~ 25

Author(s):

Tao Wang ◽

Ian Goldberg

Keyword(s):

Background Noise ◽

Laboratory Tests ◽

Training Data ◽

Web Traffic ◽

Training Set ◽

Data Set ◽

Laboratory Conditions ◽

Testing Data ◽

In The Wild ◽

New Algorithms

Abstract Website fingerprinting allows a local, passive observer monitoring a web-browsing client’s encrypted channel to determine her web activity. Previous attacks have shown that website fingerprinting could be a threat to anonymity networks such as Tor under laboratory conditions. However, there are significant differences between laboratory conditions and realistic conditions. First, in laboratory tests we collect the training data set together with the testing data set, so the training data set is fresh, but an attacker may not be able to maintain a fresh data set. Second, laboratory packet sequences correspond to a single page each, but for realistic packet sequences the split between pages is not obvious. Third, packet sequences may include background noise from other types of web traffic. These differences adversely affect website fingerprinting under realistic conditions. In this paper, we tackle these three problems to bridge the gap between laboratory and realistic conditions for website fingerprinting. We show that we can maintain a fresh training set with minimal resources. We demonstrate several classification-based techniques that allow us to split full packet sequences effectively into sequences corresponding to a single page each. We describe several new algorithms for tackling background noise. With our techniques, we are able to build the first website fingerprinting system that can operate directly on packet sequences collected in the wild.

Download Full-text