PredGly: predicting lysine glycation sites for Homo sapiens based on XGboost feature optimization

Abstract. Motivation: Protein glycation is a common post-translational modification (PTM) that proceeds through a two-step non-enzymatic reaction. Glycation impairs protein function and alters protein characteristics, and it is therefore implicated in many human diseases. Systematic detection of glycation sites remains difficult because glycated residues lack distinctive sequence patterns. Computational approaches, which can filter putative sites before experimental verification, can greatly increase the efficiency of experimental work. However, the previous lysine glycation prediction method was trained on a small dataset, so its model does not generalize well. Results: By searching a new database, we collected a large dataset for Homo sapiens. PredGly is a novel tool that predicts lysine glycation sites for H. sapiens by combining multiple features. In addition, XGBoost was adopted to optimize the feature vectors and improve model performance. In a comparison of various classifiers, the support vector machine achieved the best performance. On a new independent test set, PredGly outperformed other glycation tools. This suggests that PredGly can provide more instructive guidance for further experimental research on lysine glycation. Availability and implementation: https://github.com/yujialinncu/PredGly Supplementary information: Supplementary data are available at Bioinformatics online.
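
The abstract above mentions using a boosted-tree model to optimize the feature vector before training the final classifier. One common way to do this is to rank features by an importance score and keep only the top-ranked columns. The sketch below illustrates that idea only; it is not the PredGly implementation. The function name `select_top_features`, the cutoff `n_keep`, and the importance values are hypothetical, and in practice the scores would come from a trained gradient-boosted model rather than being hard-coded.

```python
def select_top_features(rows, importances, n_keep):
    """Keep the n_keep feature columns with the highest importance scores.

    rows        -- list of feature vectors (lists of equal length)
    importances -- one score per feature column (hypothetical values here)
    n_keep      -- number of columns to retain (an assumed cutoff)
    """
    for row in rows:
        if len(row) != len(importances):
            raise ValueError("every row needs one value per importance score")
    # Rank column indices from most to least important.
    ranked = sorted(range(len(importances)),
                    key=lambda j: importances[j], reverse=True)
    # Restore the original column order among the retained features.
    kept = sorted(ranked[:n_keep])
    reduced = [[row[j] for j in kept] for row in rows]
    return kept, reduced


# Hypothetical importance scores for five features.
importances = [0.05, 0.40, 0.10, 0.30, 0.15]
rows = [[1, 2, 3, 4, 5],
        [6, 7, 8, 9, 10]]
kept, reduced = select_top_features(rows, importances, n_keep=3)
print(kept)
print(reduced)
```

The reduced feature vectors would then be passed to the downstream classifier, which the abstract reports to be a support vector machine.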

Glypre: In Silico Prediction of Protein Glycation Sites by Fusing Multiple Features and Support Vector Machine

Molecules ◽ 10.3390/molecules22111891 ◽ 2017 ◽ Vol 22 (11) ◽ pp. 1891 ◽ Cited By ~ 9

Author(s): Xiaowei Zhao ◽ Xiaosa Zhao ◽ Lingling Bao ◽ Yonggang Zhang ◽ Jiangyan Dai ◽ ...

Keyword(s): Support Vector Machine ◽ In Silico ◽ Protein Glycation ◽ Support Vector ◽ In Silico Prediction ◽ Multiple Features

A Deep Learning Method for Yogurt Preferences Prediction Using Sensory Attributes

Processes ◽ 10.3390/pr8050518 ◽ 2020 ◽ Vol 8 (5) ◽ pp. 518

Author(s): Kexin Bi ◽ Tong Qiu ◽ Yizhen Huang

Keyword(s): Deep Learning ◽ High Performance ◽ Consumer Preferences ◽ Model Performance ◽ Prediction Method ◽ Sensory Attributes ◽ Support Vector ◽ Learning Method ◽ Analysis Model ◽ C Storage

During the development of innovative products, consumer preferences are essential factors for yogurt producers seeking to improve their market share. A high-performance prediction method helps to reveal the intrinsic relationship between preferences and sensory attributes. In this study, a novel deep learning method is proposed that uses an autoencoder to extract product features from the sensory attributes scored by experts; the extracted sensory features are then regressed on consumer preferences with support vector machine analysis. Model performance analysis, hedonic contour mapping, and feature clustering were implemented to validate the overall learning process. The results showed that the deep learning model achieves an acceptable level of accuracy, and that the hedonic contour maps could help producers with product design or modification. Finally, hierarchical clustering analysis revealed that, for all three brands of yogurt, low-temperature (4 °C) storage for no more than 4 weeks yields the highest consumer preference.

Predicting Future Occurrence of Acute Hypotensive Episodes Using Noninvasive and Invasive Features

Military Medicine ◽ 10.1093/milmed/usaa418 ◽ 2021 ◽ Vol 186 (Supplement_1) ◽ pp. 445-451

Author(s): Yifei Sun ◽ Navid Rashedi ◽ Vikrant Vaze ◽ Parikshit Shah ◽ Ryan Halter ◽ ...

Keyword(s): Machine Learning ◽ Support Vector Machine ◽ Real World ◽ Short Term Memory ◽ Model Performance ◽ Learning Technologies ◽ Machine Learning Algorithms ◽ Support Vector ◽ K Nearest Neighbor ◽ Continuous Map

Abstract. Introduction: Early prediction of the acute hypotensive episode (AHE) in critically ill patients has the potential to improve outcomes. In this study, we apply different machine learning algorithms to the MIMIC III Physionet dataset, containing more than 60,000 real-world intensive care unit records, to test commonly used machine learning technologies and compare their performances. Materials and Methods: Five classification methods including K-nearest neighbor, logistic regression, support vector machine, random forest, and a deep learning method called long short-term memory are applied to predict an AHE 30 minutes in advance. An analysis comparing model performance when including versus excluding invasive features was conducted. To further study the pattern of the underlying mean arterial pressure (MAP), we apply a regression method to predict the continuous MAP values using linear regression over the next 60 minutes. Results: Support vector machine yields the best performance in terms of recall (84%). Including the invasive features in the classification improves the performance significantly with both recall and precision increasing by more than 20 percentage points. We were able to predict the MAP with a root mean square error (a frequently used measure of the differences between the predicted values and the observed values) of 10 mmHg 60 minutes in the future. After converting continuous MAP predictions into AHE binary predictions, we achieve a 91% recall and 68% precision. In addition to predicting AHE, the MAP predictions provide clinically useful information regarding the timing and severity of the AHE occurrence. Conclusion: We were able to predict AHE with precision and recall above 80% 30 minutes in advance with the large real-world dataset. The predictions of the regression model can provide a more fine-grained, interpretable signal to practitioners. Model performance is improved by the inclusion of invasive features in predicting AHE, when compared to predicting the AHE based on only the available, restricted set of noninvasive technologies. This demonstrates the importance of exploring more noninvasive technologies for AHE prediction.
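
The evaluation described above (root mean square error on continuous MAP predictions, then conversion to binary AHE predictions scored by precision and recall) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the 60 mmHg cutoff, the per-sample thresholding rule, and all sample values are hypothetical, and clinical AHE definitions usually consider a time window rather than a single value.

```python
import math

# Assumed cutoff for flagging hypotension; the abstract does not state one.
AHE_MAP_THRESHOLD_MMHG = 60.0


def rmse(predicted, observed):
    """Root mean square error between predicted and observed values."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def to_binary(map_values, threshold=AHE_MAP_THRESHOLD_MMHG):
    """Flag each MAP value below the threshold as hypotensive."""
    return [value < threshold for value in map_values]


def precision_recall(predicted_flags, true_flags):
    """Precision and recall of the positive (hypotensive) class."""
    pairs = list(zip(predicted_flags, true_flags))
    tp = sum(1 for p, t in pairs if p and t)
    fp = sum(1 for p, t in pairs if p and not t)
    fn = sum(1 for p, t in pairs if not p and t)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical MAP values in mmHg, for illustration only.
predicted_map = [72.0, 58.0, 61.0, 55.0, 80.0, 59.0]
observed_map = [70.0, 56.0, 58.0, 57.0, 78.0, 63.0]

error = rmse(predicted_map, observed_map)
precision, recall = precision_recall(to_binary(predicted_map),
                                     to_binary(observed_map))
print(f"RMSE={error:.2f} mmHg, precision={precision:.2f}, recall={recall:.2f}")
```

Lowering or raising the threshold trades recall against precision, which is one way a continuous prediction offers a more fine-grained signal than a fixed binary classifier.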

Intelligent breakout prediction method based on support vector machine

Journal of Physics Conference Series ◽ 10.1088/1742-6596/1653/1/012052 ◽ 2020 ◽ Vol 1653 ◽ pp. 012052

Author(s): Yuanpeng Tian ◽ Yu Liu

Keyword(s): Support Vector Machine ◽ Prediction Method ◽ Support Vector

FastSK: fast sequence analysis with gapped string kernels

Bioinformatics ◽ 10.1093/bioinformatics/btaa817 ◽ 2020 ◽ Vol 36 (Supplement_2) ◽ pp. i857-i865

Author(s): Derrick Blakely ◽ Eamon Collins ◽ Ritambhara Singh ◽ Andrew Norton ◽ Jack Lanchantin ◽ ...

Keyword(s): Sequence Analysis ◽ Dna Sequences ◽ English Language ◽ Computation Time ◽ Entity Recognition ◽ Supplementary Information ◽ Support Vector ◽ Homology Detection ◽ Scalable Algorithm ◽ String Kernels

Abstract. Motivation: Gapped k-mer kernels with support vector machines (gkm-SVMs) have achieved strong predictive performance on regulatory DNA sequences on modestly sized training sets. However, existing gkm-SVM algorithms suffer from slow kernel computation time, as they depend exponentially on the sub-sequence feature length, number of mismatch positions, and the task’s alphabet size. Results: In this work, we introduce a fast and scalable algorithm for calculating gapped k-mer string kernels. Our method, named FastSK, uses a simplified kernel formulation that decomposes the kernel calculation into a set of independent counting operations over the possible mismatch positions. This simplified decomposition allows us to devise a fast Monte Carlo approximation that rapidly converges. FastSK can scale to much greater feature lengths, allows us to consider more mismatches, and is performant on a variety of sequence analysis tasks. On multiple DNA transcription factor binding site prediction datasets, FastSK consistently matches or outperforms the state-of-the-art gkmSVM-2.0 algorithms in area under the ROC curve, while achieving average speedups in kernel computation of ∼100× and speedups of ∼800× for large feature lengths. We further show that FastSK outperforms character-level recurrent and convolutional neural networks while achieving low variance. We then extend FastSK to 7 English-language medical named entity recognition datasets and 10 protein remote homology detection datasets. FastSK consistently matches or outperforms these baselines. Availability and implementation: Our algorithm is available as a Python package and as C++ source code at https://github.com/QData/FastSK Supplementary information: Supplementary data are available at Bioinformatics online.
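
The decomposition into independent counting operations can be illustrated with a small, unoptimized sketch. It assumes one plausible formulation: for each way of choosing `k` match positions inside a window of length `g`, count the windows of the two sequences that agree on those positions, then sum over all choices. The names `g`, `k`, and `gapped_kmer_kernel` are assumptions made for this example, and the code is not the FastSK package or its API.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_kernel(x, y, g, k):
    """Unnormalized gapped k-mer kernel value between sequences x and y.

    For every set of k positions within a length-g window, count the pairs
    of windows (one from x, one from y) that agree on those positions.
    Each position set is handled by an independent counting pass.
    """
    total = 0
    for positions in combinations(range(g), k):
        # Count the k-character keys seen at these positions in each sequence.
        counts_x = Counter(tuple(x[i + p] for p in positions)
                           for i in range(len(x) - g + 1))
        counts_y = Counter(tuple(y[i + p] for p in positions)
                           for i in range(len(y) - g + 1))
        # Matching window pairs for this position set.
        total += sum(n * counts_y[key] for key, n in counts_x.items())
    return total


print(gapped_kmer_kernel("ACGT", "ACGT", g=3, k=2))
print(gapped_kmer_kernel("ACGT", "ACGA", g=3, k=2))
```

Because each position set contributes an independent term, the sum could also be estimated by sampling position sets instead of enumerating all of them, which is the general idea behind a Monte Carlo approximation of such a kernel.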

Prediction method for surface finishing of spiral bevel gear tooth based on least square support vector machine

Journal of Central South University ◽ 10.1007/s11771-011-0748-9 ◽ 2011 ◽ Vol 18 (3) ◽ pp. 685-689 ◽ Cited By ~ 4

Author(s): Ning Ma ◽ Wen-ji Xu ◽ Xu-yue Wang ◽ Ze-fei Wei ◽ Gui-bing Pang

Keyword(s): Support Vector Machine ◽ Gear Tooth ◽ Prediction Method ◽ Bevel Gear ◽ Least Square ◽ Surface Finishing ◽ Support Vector ◽ Spiral Bevel Gear ◽ Spiral Bevel

Motor Over-temperature Fault Estimation and Prediction Method Based on Linear Support Vector Machine Algorithm

10.1109/sdpc52933.2021.9563574 ◽ 2021

Author(s): Bo Chen ◽ Zhen Fu ◽ Kai Peng ◽ Xiangchao Liu

Keyword(s): Support Vector Machine ◽ Prediction Method ◽ Fault Estimation ◽ Support Vector ◽ Support Vector Machine Algorithm ◽ Linear Support Vector Machine ◽ Estimation And Prediction

Forecasting the risk at infractions: an ensemble comparison of machine learning approach

Industrial Management & Data Systems ◽ 10.1108/imds-10-2020-0603 ◽ 2021 ◽ Vol ahead-of-print (ahead-of-print)

Author(s): Lei Li ◽ Desheng Wu

Keyword(s): Machine Learning ◽ Prediction Models ◽ Short Term Memory ◽ Model Performance ◽ Large Data ◽ Support Vector ◽ Learning Approaches ◽ Content Type ◽ Day To Day Operations ◽ Prediction Approach

Purpose: The infraction of securities regulations (ISRs) by listed firms in their day-to-day operations and management has become a common problem. This paper proposes several machine learning approaches to forecast the risk of infractions by listed companies, addressing financial supervision that is neither effective nor precise. Design/methodology/approach: The proposed research framework for forecasting ISRs includes data collection and cleaning, feature engineering, data splitting, application of the prediction approaches and model performance evaluation. We select Logistic Regression, Naïve Bayes, Random Forest, Support Vector Machines, Artificial Neural Networks and Long Short-Term Memory networks (LSTMs) as ISR prediction models. Findings: The results show that models that include prior infractions predict ISRs significantly better than models without them, especially for large sample sets. The results also indicate that, when judging whether a company has committed infractions, we should pay attention to novel artificial intelligence methods, the company's previous infractions and large data sets. Originality/value: The findings could be used, to a certain degree, to address the problem of identifying listed companies' ISRs. Overall, the results elucidate the value of prior infractions of securities regulations. This shows the importance of including more data sources when constructing distress models, rather than only building increasingly complex models on the same data. This is also beneficial to the regulatory authorities.

mAHTPred: a sequence-based meta-predictor for improving the prediction of anti-hypertensive peptides using effective feature representation

Bioinformatics ◽ 10.1093/bioinformatics/bty1047 ◽ 2018 ◽ Vol 35 (16) ◽ pp. 2757-2765 ◽ Cited By ~ 63

Author(s): Balachandran Manavalan ◽ Shaherin Basith ◽ Tae Hwan Shin ◽ Leyi Wei ◽ Gwang Lee

Keyword(s): Nearest Neighbor ◽ Feature Representation ◽ Superior Performance ◽ Supplementary Information ◽ Gradient Boosting ◽ Support Vector ◽ Pharmaceutical Drugs ◽ K Nearest Neighbor ◽ Feature Descriptors ◽ Predicted Probability

Abstract. Motivation: Cardiovascular disease is the primary cause of death globally accounting for approximately 17.7 million deaths per year. One of the stakes linked with cardiovascular diseases and other complications is hypertension. Naturally derived bioactive peptides with antihypertensive activities serve as promising alternatives to pharmaceutical drugs. So far, there is no comprehensive analysis, assessment of diverse features and implementation of various machine-learning (ML) algorithms applied for antihypertensive peptide (AHTP) model construction. Results: In this study, we utilized six different ML algorithms, namely, Adaboost, extremely randomized tree (ERT), gradient boosting (GB), k-nearest neighbor, random forest (RF) and support vector machine (SVM) using 51 feature descriptors derived from eight different feature encodings for the prediction of AHTPs. While ERT-based trained models performed consistently better than other algorithms regardless of various feature descriptors, we treated them as baseline predictors, whose predicted probability of AHTPs was further used as input features separately for four different ML-algorithms (ERT, GB, RF and SVM) and developed their corresponding meta-predictors using a two-step feature selection protocol. Subsequently, the integration of four meta-predictors through an ensemble learning approach improved the balanced prediction performance and model robustness on the independent dataset. Upon comparison with existing methods, mAHTPred showed superior performance with an overall improvement of approximately 6–7% in both benchmarking and independent datasets. Availability and implementation: The user-friendly online prediction tool, mAHTPred is freely accessible at http://thegleelab.org/mAHTPred. Supplementary information: Supplementary data are available at Bioinformatics online.
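
The final integration step, combining the outputs of four meta-predictors, can be sketched with the simplest possible ensemble rule. The abstract does not say how the integration is done, so plain probability averaging, the 0.5 decision threshold, and every probability value below are assumptions made purely for illustration; `ensemble_average` is a hypothetical helper, not part of mAHTPred.

```python
def ensemble_average(probabilities_by_model):
    """Average per-peptide probabilities across models.

    probabilities_by_model maps a model name to a list of predicted
    probabilities, one per peptide, in the same peptide order.
    """
    columns = list(probabilities_by_model.values())
    n_items = len(columns[0])
    if any(len(column) != n_items for column in columns):
        raise ValueError("all models must score the same number of peptides")
    return [sum(column[i] for column in columns) / len(columns)
            for i in range(n_items)]


# Hypothetical meta-predictor outputs for three peptides.
meta_probabilities = {
    "ERT": [0.9, 0.2, 0.6],
    "GB": [0.8, 0.4, 0.4],
    "RF": [0.7, 0.3, 0.5],
    "SVM": [0.6, 0.1, 0.3],
}

averaged = ensemble_average(meta_probabilities)
# Assumed decision threshold of 0.5 for calling a peptide antihypertensive.
labels = [p >= 0.5 for p in averaged]
print(averaged)
print(labels)
```

Averaging tends to dampen the errors of any single model, which is one reason ensembles are often reported to be more robust on independent data.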

Classification of journal bearing friction states based on acoustic emission signals

tm - Technisches Messen ◽ 10.1515/teme-2018-0004 ◽ 2018 ◽ Vol 85 (6) ◽ pp. 434-442 ◽ Cited By ~ 6

Author(s): Noushin Mokhtari ◽ Clemens Gühmann

Keyword(s): Acoustic Emission ◽ Time Domain ◽ Journal Bearing ◽ Prediction Method ◽ Support Vector ◽ Mechatronic Systems ◽ Prediction Time ◽ Ae Signals ◽ Bearing Friction

Abstract. For diagnosis and predictive maintenance of mechatronic systems, monitoring of bearings is essential. An important building block for this is the determination of the bearing friction condition. This paper deals with the possibility of monitoring different journal bearing friction states, such as mixed and fluid friction, and examines a new approach to distinguish between different friction intensities under several speed and load combinations, based on feature extraction and feature selection methods applied to acoustic emission (AE) signals. The aim of this work is to identify features of AE signals that separate the states effectively, in order to subsequently classify the journal bearing friction states. Furthermore, the acquired features give information about the mixed friction intensity, which is significant for remaining useful lifetime (RUL) prediction. Time-domain features as well as frequency-domain features have been investigated in this work. To increase the sensitivity of the extracted features, the AE signals were transformed to the time-frequency domain using the continuous wavelet transform (CWT). Significant frequency bands are determined to separate different friction states more effectively. A support vector machine (SVM) is used to classify the signals into three different friction classes. Finally, the idea for an RUL prediction method that uses the already determined information is given and explained.
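
The abstract does not list which time-domain features were extracted. As an illustration only, the sketch below computes three features that are commonly used for vibration and acoustic emission signals: root mean square, peak amplitude, and crest factor. The choice of features, the function name `time_domain_features`, and the sample values are all assumptions.

```python
import math


def time_domain_features(signal):
    """Compute simple time-domain features of a sampled signal.

    Returns a dict with the root mean square (rms), the largest absolute
    amplitude (peak) and their ratio (crest_factor).
    """
    if not signal:
        raise ValueError("signal must contain at least one sample")
    rms = math.sqrt(sum(s * s for s in signal) / len(signal))
    peak = max(abs(s) for s in signal)
    # An all-zero signal has no meaningful crest factor; report 0.0.
    crest_factor = peak / rms if rms > 0 else 0.0
    return {"rms": rms, "peak": peak, "crest_factor": crest_factor}


# Hypothetical AE samples in arbitrary units.
features = time_domain_features([0.0, 3.0, 0.0, -4.0])
print(features)
```

Feature vectors of this kind, computed per signal segment, are the sort of input a classifier such as an SVM would receive after feature selection.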
