FEATURE SELECTION IN VALIDATING MASS SPECTROMETRY DATABASE SEARCH RESULTS

Tandem mass spectrometry (MS/MS) combined with protein database searching has been widely used for protein identification. A validation procedure is generally required to reduce the number of false positives. Advanced tools using statistical and machine learning approaches may provide faster and more accurate validation than manual inspection and empirical filtering criteria. In this study, we use two feature selection algorithms, based on random forest and support vector machine, to identify peptide properties that can be used to improve validation models. We demonstrate that an improved model based on an optimized set of features reduces the number of false positives by 58% relative to the model that used only search engine scores, at the same sensitivity of 0.8. In addition, we develop classification models based on the physicochemical properties and protein sequence environment of these peptides without using search engine scores. The best model, based on the support vector machine algorithm, achieves an AUC of 0.8, an accuracy of 0.78, and a specificity of 0.7, suggesting reasonably accurate classification. The identified properties important to fragmentation and ionization can be either used in independent validation tools or incorporated into peptide sequencing and database search algorithms to improve existing software programs.
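The comparison above fixes sensitivity at 0.8 and counts the false positives each model accepts at that operating point, while AUC summarizes ranking quality across all cutoffs. The stdlib-Python sketch below shows one way to compute these quantities from per-match classifier scores. The function names, the rule "accept a match when score >= cutoff", and the toy data are illustrative assumptions, not the study's code.

```python
import math
from typing import List


def cutoff_for_sensitivity(scores: List[float], labels: List[int], target: float) -> float:
    """Highest score cutoff whose sensitivity (true-positive rate) is >= target.

    A peptide-spectrum match is accepted when score >= cutoff.
    labels: 1 = correct identification, 0 = incorrect (false) match.
    """
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    # How many correct matches must be accepted; the small epsilon guards
    # against floating-point error in target * len(positives).
    needed = max(1, math.ceil(target * len(positives) - 1e-9))
    return positives[needed - 1]


def false_positives_at(scores: List[float], labels: List[int], cutoff: float) -> int:
    """Count incorrect matches that would be accepted at this cutoff."""
    return sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)


def auc(scores: List[float], labels: List[int]) -> float:
    """ROC AUC as the probability that a random correct match outscores a
    random incorrect one (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def relative_reduction(fp_baseline: int, fp_improved: int) -> float:
    """Fractional drop in false positives, e.g. 0.58 for a 58% reduction."""
    return (fp_baseline - fp_improved) / fp_baseline


if __name__ == "__main__":
    # Hypothetical scores for six matches from one model.
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
    labels = [1, 1, 0, 1, 0, 0]
    cut = cutoff_for_sensitivity(scores, labels, 0.8)
    print(cut, false_positives_at(scores, labels, cut), auc(scores, labels))
```

To compare two models as the abstract does, one would compute `false_positives_at` for each model at its own sensitivity-0.8 cutoff and pass both counts to `relative_reduction`.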

A hybrid load forecasting model based on support vector machine with intelligent methods for feature selection and parameter optimization

Applied Energy, Vol 279, pp. 115332 (2020). DOI: 10.1016/j.apenergy.2020.115332

Author(s): Yeming Dai, Pei Zhao

Keyword(s): Support Vector Machine; Feature Selection; Parameter Optimization; Load Forecasting; Support Vector; Forecasting Model; Model Based; Intelligent Methods

Proteomic Database Search Engine for Two-Dimensional Partial Covariance Mass Spectrometry

2021. DOI: 10.26434/chemrxiv.13623173

Author(s): Taran Driver, Ruediger Pipkorn, Leszek Frasinski, Jon P. Marangos, Marina Edelson-Averbukh, ...

Keyword(s): Mass Spectrometry; Search Engine; Database Search; Peptide Sequence; Two Dimensional; Protein Database; Intact Proteins; Individual Fragment; Decomposition Processes; Complementary Fragment

We present a protein database search engine for the automatic identification of peptide and protein sequences using the recently introduced method of two-dimensional partial covariance mass spectrometry (2D-PC-MS). Since 2D-PC-MS measurement reveals correlations between fragments stemming from the same or consecutive decomposition processes, the first-of-its-kind 2D-PC-MS search engine is based entirely on the direct matching of pairs of theoretical and experimentally detected correlating fragments, rather than of individual fragment signals or their series. We demonstrate that the high structural specificity afforded by 2D-PC-MS fragment correlations enables our search engine to reliably identify the correct peptide sequence, even from a spectrum with a large proportion of contaminant signals. While for peptides the 2D-PC-MS correlation matching procedure is based on complementary and internal ion correlations, the identification of intact proteins is based entirely on the ability of 2D-PC-MS to spatially separate and resolve the experimental correlations between complementary fragment ions.
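The engine described above matches pairs of correlated fragments rather than single peaks. As a toy illustration only, the stdlib-Python sketch below scores candidate sequences by how many observed correlation pairs coincide with theoretical complementary b/y ion pairs. It assumes singly charged b and y ions, standard monoisotopic residue masses, and a hypothetical 0.02 Da tolerance, and it ignores internal ions, higher charge states, and intensities; the function names are invented, and this is not the authors' search engine.

```python
from typing import Dict, List, Sequence, Tuple

# Standard monoisotopic residue masses (Da); cysteine unmodified.
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276

Pair = Tuple[float, float]


def complementary_pairs(peptide: str) -> List[Pair]:
    """Theoretical singly charged (b_i, y_{n-i}) pairs for i = 1..n-1.

    Each pair sums to M + 2 * PROTON, where M is the neutral peptide mass.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    residue_total = sum(masses)
    pairs: List[Pair] = []
    prefix = 0.0
    for m in masses[:-1]:
        prefix += m
        b_ion = prefix + PROTON
        y_ion = residue_total - prefix + WATER + PROTON
        pairs.append((b_ion, y_ion))
    return pairs


def pair_score(peptide: str, observed: Sequence[Pair], tol: float = 0.02) -> int:
    """Count observed correlation pairs that match a theoretical pair.

    Both masses must agree within `tol`; pair order is ignored because a
    covariance map is symmetric.
    """
    theory = [tuple(sorted(p)) for p in complementary_pairs(peptide)]
    hits = 0
    for pair in observed:
        lo, hi = sorted(pair)
        if any(abs(lo - t_lo) <= tol and abs(hi - t_hi) <= tol for t_lo, t_hi in theory):
            hits += 1
    return hits


def best_candidate(candidates: Sequence[str], observed: Sequence[Pair], tol: float = 0.02) -> str:
    """Return the candidate sequence with the most matched pairs."""
    return max(candidates, key=lambda pep: pair_score(pep, observed, tol))
```

With observed pairs taken from `PEPTIDE`, the scrambled candidate `PEPTIED` still matches the five pairs whose prefixes it shares, so in this simplified scheme only the last cleavage site separates them. The abstract notes that peptide matching also uses internal-ion correlations, which this sketch omits.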

Credit Scoring Model based on Kernel Density Estimation and Support Vector Machine for Group Feature Selection

2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI), 2018. DOI: 10.1109/icacci.2018.8554524

Author(s): Xingzhi Zhang, Zhurong Zhou

Keyword(s): Support Vector Machine; Feature Selection; Density Estimation; Kernel Density Estimation; Credit Scoring; Kernel Density; Support Vector; Scoring Model; Model Based; Credit Scoring Model

A method for handling metabonomics data from liquid chromatography/mass spectrometry: combinational use of support vector machine recursive feature elimination, genetic algorithm and random forest for feature selection

Metabolomics, Vol 7 (4), pp. 549-558 (2011). DOI: 10.1007/s11306-011-0274-7. Cited by ~40

Author(s): Xiaohui Lin, Quancai Wang, Peiyuan Yin, Liang Tang, Yexiong Tan, ...

Keyword(s): Mass Spectrometry; Genetic Algorithm; Support Vector Machine; Feature Selection; Liquid Chromatography; Random Forest; Recursive Feature Elimination; Support Vector; Liquid Chromatography Mass Spectrometry; Chromatography Mass Spectrometry


Novel forecasting model based on improved wavelet transform, informative feature selection, and hybrid support vector machine on wind power forecasting

Journal of Ambient Intelligence and Humanized Computing, Vol 9 (6), pp. 1919-1931 (2018). DOI: 10.1007/s12652-018-0886-0. Cited by ~13

Author(s): Zhenling Liu, Mahdi Hajiali, Amirhosein Torabi, Bahman Ahmadi, Rolando Simoes

Keyword(s): Support Vector Machine; Feature Selection; Wavelet Transform; Wind Power; Support Vector; Forecasting Model; Informative Feature; Wind Power Forecasting; Model Based; Power Forecasting

A combined model based on feature selection and support vector machine for PM2.5 prediction

Journal of Intelligent & Fuzzy Systems, pp. 1-15 (2021). DOI: 10.3233/jifs-202812

Author(s): Xiaocong Lai, Hua Li, Ying Pan

Keyword(s): Air Pollution; Support Vector Machine; Feature Selection; Air Quality; Prediction Model; Meteorological Factors; Feature Selection Method; Support Vector; Combined Model; Model Based

As concern for the environment and air quality grows, PM2.5 has received increasing attention. Meteorological data are expected to contain useful information for predicting air pollution; however, air quality is strongly affected by meteorological factors, and building an effective air quality prediction model remains an urgent problem. This paper proposes a combined model based on feature selection and a Support Vector Machine (SVM) for PM2.5 prediction. First, to account for the influence of meteorological factors on PM2.5, a feature selection method based on linear causality is proposed to identify causal relationships between features and select those with strong causality, thereby removing redundant features from the air pollution data and reducing the workload of data analysis. Then, an SVM-based method is used to handle the nonlinear relationships in the data, and particle swarm optimization is applied to tune the SVM parameters and reduce prediction error. Finally, these methods are combined into a prediction model suited to current air pollution control. Twelve representative data sets from the UCI (University of California, Irvine) website are used to validate the combined model, and the experimental results show that the model is feasible and effective.
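The abstract tunes the SVM parameters with particle swarm optimization. Below is a minimal global-best PSO in stdlib Python. Assumptions: the two parameters are searched in log2 space over LIBSVM-style ranges (C from 2^-5 to 2^15, gamma from 2^-15 to 2^3), and `stand_in_cv_error` is a hypothetical smooth surrogate used so the example runs without an SVM library; in practice the objective would train an SVM with C = 2**x[0] and gamma = 2**x[1] and return its validation error. The inertia and acceleration coefficients are common textbook choices, not values from the paper, and the linear-causality feature selection step is not shown.

```python
import random
from typing import Callable, List, Sequence, Tuple

Bounds = Sequence[Tuple[float, float]]


def pso_minimize(
    objective: Callable[[List[float]], float],
    bounds: Bounds,
    n_particles: int = 20,
    n_iters: int = 100,
    inertia: float = 0.7,
    c_personal: float = 1.5,
    c_social: float = 1.5,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Global-best particle swarm optimization of `objective` within box bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                   # each particle's best position
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    swarm_pos, swarm_val = best_pos[g][:], best_val[g]  # swarm-wide best

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_social * r2 * (swarm_pos[d] - pos[i][d]))
                # Move, then clamp to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < swarm_val:
                    swarm_pos, swarm_val = pos[i][:], val
    return swarm_pos, swarm_val


def stand_in_cv_error(x: List[float]) -> float:
    """Hypothetical smooth surrogate for SVM validation error over
    (log2 C, log2 gamma), minimized at C = 2**3, gamma = 2**-5."""
    log_c, log_gamma = x
    return 0.1 + 0.01 * ((log_c - 3) ** 2 + (log_gamma + 5) ** 2)


if __name__ == "__main__":
    best, err = pso_minimize(stand_in_cv_error, [(-5.0, 15.0), (-15.0, 3.0)])
    print("C =", 2 ** best[0], "gamma =", 2 ** best[1], "error =", err)
```

Searching in log2 space lets the swarm move across several orders of magnitude of C and gamma with uniform step sizes, which is why SVM grid searches are usually laid out on powers of two.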

Analysis of Sentiment of Moving a National Capital with Feature Selection Naive Bayes Algorithm and Support Vector Machine

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi), Vol 4 (3), pp. 504-512 (2020). DOI: 10.29207/resti.v4i3.1942

Author(s): Faried Zamachsari, Gabriel Vangeran Saragih, Susafa'ati, Windu Gata

Keyword(s): Social Media; Support Vector Machine; Feature Selection; Public Opinion; Naive Bayes; Naïve Bayes; Capital City; Support Vector; National Capital; Bayes Algorithm

The decision to move Indonesia's capital city to East Kalimantan received mixed responses on social media. The still-high poverty rate and the country's strained finances are factors in disapproval of the relocation. Twitter, a popular social media platform, is used by the public to express these opinions. The goal of this study is to determine the tendency of public responses to the move of the national capital and to perform sentiment analysis of these opinions using feature selection with the Naive Bayes and Support Vector Machine algorithms, in order to obtain the highest accuracy. The sentiment analysis data were crawled from Indonesian-language tweets on Twitter using the search terms #IbuKotaBaru and #PindahIbuKota. The research stages consisted of collecting data from Twitter, labeling polarity, and preprocessing, which comprised transform case, cleansing, tokenizing, filtering, and stemming. Feature selection was applied to increase accuracy, and the data were then split into training and testing sets at predetermined ratios. Finally, the Support Vector Machine and Naive Bayes methods were compared to determine which is more accurate. Over the data collection period, 24.26% of the sentiment toward the new capital city was positive and 75.74% negative. Using RapidMiner, the best accuracy for Naive Bayes with feature selection was 88.24% at a 9:1 ratio, while the best accuracy for the Support Vector Machine with feature selection was 78.77% at a 5:5 ratio.
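The preprocessing stages and the ratio-based train/test split described above can be sketched in stdlib Python, together with a multinomial Naive Bayes classifier with Laplace smoothing. This is an illustrative sketch, not the study's RapidMiner workflow: the stopword set is a tiny hypothetical sample, stemming is omitted because it needs a language-specific stemmer, the feature-selection step and the SVM are not shown, and the toy tweets used to exercise it are invented.

```python
import math
import random
import re
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Tiny illustrative Indonesian stopword sample (hypothetical, not a real list).
STOPWORDS = {"yang", "dan", "di", "ke", "ini", "itu"}


def preprocess(tweet: str) -> List[str]:
    text = tweet.lower()                                # transform case
    text = re.sub(r"https?://\S+|[@#]\w+", " ", text)   # cleansing: URLs, mentions, hashtags
    text = re.sub(r"[^a-z\s]", " ", text)               # cleansing: digits, punctuation
    tokens = text.split()                               # tokenizing
    return [t for t in tokens if t not in STOPWORDS]    # filtering (stemming omitted)


def split_by_ratio(items: Sequence, train_fraction: float, seed: int = 0) -> Tuple[list, list]:
    """Shuffle and split, e.g. train_fraction=0.9 for a 9:1 ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: List[List[str]], labels: List[str]) -> "MultinomialNaiveBayes":
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts: Dict[str, Counter] = {c: Counter() for c in self.classes}
        for tokens, c in zip(docs, labels):
            self.counts[c].update(tokens)
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, tokens: List[str]) -> str:
        v = len(self.vocab)

        def log_posterior(c: str) -> float:
            # Words never seen in training are ignored.
            return self.log_prior[c] + sum(
                math.log((self.counts[c][t] + 1) / (self.totals[c] + v))
                for t in tokens if t in self.vocab)

        return max(self.classes, key=log_posterior)
```

Hashtags are stripped during cleansing here because every crawled tweet contains the search hashtags, so they carry no class information.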

Product Sales Forecasting Model Based on Robust Wavelet v-Support Vector Machine

ACTA AUTOMATICA SINICA, Vol 35 (7), pp. 1027-1032 (2009). DOI: 10.3724/sp.j.1004.2009.01027. Cited by ~5

Author(s): Qi WU, Hong-Sen YAN, Bin WANG

Keyword(s): Support Vector Machine; Support Vector; Forecasting Model; Model Based; Product Sales

An Improved Intelligent Approach to Enhance the Sentiment Classifier for Knowledge Discovery Using Machine Learning

International Journal of Sensors Wireless Communications and Control, Vol 10 (4), pp. 582-593 (2020). DOI: 10.2174/2210327910999200528114552

Author(s): Midde Venkateswarlu Naik, D. Vasumathi, A.P. Siva Kumar

Keyword(s): Support Vector Machine; Feature Selection; Global Warming; Particle Swarm Optimization; Sentiment Analysis; Optimization Technique; Particle Swarm; Sentiment Classification; Support Vector; Swarm Optimization

Aims: This research proposes an evolutionary enhanced method for sentiment or emotion classification of unstructured review text in the big data field. Sentiment analysis plays a vital role in helping people today extract valid decision points about any topic, such as movie ratings, educational institutions, or political ratings. The proposed hybrid approach combines optimal feature selection using Particle Swarm Optimization (PSO) with sentiment classification using a Support Vector Machine (SVM). Its performance is evaluated with statistical measures such as precision, recall, sensitivity, and specificity, and compared with existing approaches. Earlier authors have reported sentiment-classifier accuracies of up to 94% on English text. In the proposed scheme, the sentiment classifier reaches an average accuracy of 99% across the datasets by tuning SVM parameters, such as the regularization constant C and the kernel gamma, in combination with PSO. The method uses three publicly available datasets: airline sentiment, weather, and global warming. Results were obtained by training and testing with 10-fold cross-validation (FCV) and a confusion matrix to estimate classifier accuracy.

Background: Sentiment analysis plays a vital role in helping people today extract valid decisions about any topic, such as movie ratings, educational institutions, or even political ratings. Sentiment Analysis (SA), or opinion mining, has attracted scientific interest as a research domain. A key area, and a major research topic, is sentiment classification of semi-structured or unstructured data in different languages. User-Generated Content (UGC) from diverse sources has increased significantly with the rapid growth of the web. The huge volume of user-generated data on social media offers substantial value for discovering hidden knowledge, correlations, patterns, and trends, or for extracting sentiment about a specific entity. SA is a computational analysis that determines the actual opinion about an entity expressed in text; it is also described as computing the emotional polarity of natural text posted on social media in various languages. An effective automatic sentiment classifier usually depends on its feature selection and classification algorithms.

Methods: The proposed work uses an SVM as the classification technique and PSO for feature selection. In this methodology, various combinations of parameters are tuned, with and without a kernel, to obtain the desired sentiment classification results on three datasets (airline, global warming, and weather sentiment) that are freely available for research.

Results: The proposed method outperformed other machine learning techniques, achieving an average accuracy of 99.2% in classifying sentiment across the datasets. This high accuracy in classifying sentiment or opinion in review text demonstrates its superior effectiveness over existing sentiment classifiers. Results were obtained by training and testing with 10-fold cross-validation (FCV) and a confusion matrix to estimate classifier accuracy.

Conclusion: Sentiment-classifier accuracy was increased using a kernel-based SVM with parameter optimization. The optimal feature selection for classifying sentiment or opinion in review documents was determined using a particle swarm optimization approach. The proposed method used three freely available datasets to simulate the results: airline sentiment, weather sentiment, and global warming data.
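The evaluation described above (10-fold cross-validation summarized by a confusion matrix, reporting precision, recall or sensitivity, and specificity) can be sketched in stdlib Python. Pooling one confusion matrix over all held-out folds, the function names, and the nearest-centroid classifier are assumptions made for illustration; the nearest-centroid function is a hypothetical stand-in for the PSO-tuned SVM, not the authors' model.

```python
import random
from typing import Callable, Dict, List

Vector = List[float]
# train_and_predict(train_X, train_y, test_X) -> predicted 0/1 labels
Classifier = Callable[[List[Vector], List[int], List[Vector]], List[int]]


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 once and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def _ratio(num: int, den: int) -> float:
    return num / den if den else 0.0


def cross_validate(X: List[Vector], y: List[int], train_and_predict: Classifier,
                   k: int = 10, seed: int = 0) -> Dict[str, float]:
    """k-fold CV: each fold is held out once; one confusion matrix is pooled
    over all held-out predictions and summarized."""
    tp = fp = tn = fn = 0
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        preds = train_and_predict([X[i] for i in train_idx],
                                  [y[i] for i in train_idx],
                                  [X[i] for i in test_idx])
        for i, p in zip(test_idx, preds):
            if p == 1 and y[i] == 1:
                tp += 1
            elif p == 1:
                fp += 1
            elif y[i] == 1:
                fn += 1
            else:
                tn += 1
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": _ratio(tp, tp + fp),
        "recall": _ratio(tp, tp + fn),          # also called sensitivity
        "specificity": _ratio(tn, tn + fp),
    }


def nearest_centroid(train_X: List[Vector], train_y: List[int],
                     test_X: List[Vector]) -> List[int]:
    """Hypothetical stand-in classifier; NOT the PSO-tuned SVM."""
    centroids: Dict[int, Vector] = {}
    for label in set(train_y):
        rows = [x for x, t in zip(train_X, train_y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(a: Vector, b: Vector) -> float:
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return [min(centroids, key=lambda c: dist2(x, centroids[c])) for x in test_X]
```

To evaluate an SVM in this loop, `nearest_centroid` would be replaced by a function that trains the tuned SVM on each training split and returns its predictions for the held-out fold.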
