EpitopeVec: Linear Epitope Prediction Using Deep Protein Sequence Embeddings

AbstractMotivationB-cell epitopes (BCEs) play a pivotal role in the development of peptide vaccines, immunodiagnostic reagents, and antibody production, and thus generally in infectious disease prevention and diagnosis. Experimental methods used to determine BCEs are costly and time-consuming. It thus becomes essential to develop computational methods for the rapid identification of BCEs. Though several computational methods have been developed for this task, cross-testing of classifiers trained and tested on different datasets revealed their limitations, with accuracies of 51 to 53%.ResultsWe describe a new method called EpitopeVec, which utilizes residue properties, modified antigenicity scales, and a Protvec representation of peptides for linear BCE prediction with machine learning techniques. Evaluating on several large and small data sets, as well as cross-testing demonstrated an improvement of the state-of-the-art performances in terms of accuracy and AUC. Predictive performance depended on the type of antigen (viral, bacterial, eukaryote, etc.). In view of that, we also trained our method on a large viral dataset to create a linear viral BCE predictor.AvailablityThe software is available at https://github.com/hzi-bifo/epitope-prediction under the GPL3.0 [email protected] informationSupplementary data are available at Bioinformatics online.

Download Full-text

Porter 5: state-of-the-art ab initio prediction of protein secondary structure in 3 and 8 classes

10.1101/289033 ◽

2018 ◽

Cited By ~ 17

Author(s):

Mirko Torrisi ◽

Manaz Kaleel ◽

Gianluca Pollastri

Keyword(s):

Secondary Structure ◽

Ab Initio ◽

State Of The Art ◽

Protein Secondary Structure ◽

Independent Set ◽

Machine Learning Techniques ◽

Supplementary Information ◽

Data Sets ◽

Learning Techniques ◽

Ab Initio Prediction

AbstractMotivationAlthough Secondary Structure Predictors have been developed for more than 60 years, current ab initio methods have still some way to go to reach their theoretical limits. Moreover, the continuous effort towards harnessing ever increasing data sets and more sophisticated, deeper Machine Learning techniques, has not come to an end.ResultsHere we present Porter 5, the last release of one of the best performing ab initio secondary structure predictor. Version 5 achieves 84% accuracy (84% SOV) when tested on 3 classes, and 73% accuracy (82% SOV) on 8 classes, on a large independent set, significantly outperforming all the most recent ab initio predictors we have tested.AvailabilityThe web and standalone versions of Porter5 are available at http://distilldeep.ucd.ie/[email protected] informationSupplementary data are available at Bioinformatics online.

Download Full-text

Application of Machine Learning Techniques to Predict Binding Affinity for Drug Targets: A Study of Cyclin-Dependent Kinase 2

Current Medicinal Chemistry ◽

10.2174/2213275912666191102162959 ◽

2020 ◽

Vol 28 (2) ◽

pp. 253-265 ◽

Cited By ~ 3

Author(s):

Gabriela Bitencourt-Ferreira ◽

Amauri Duarte da Silva ◽

Walter Filgueira de Azevedo

Keyword(s):

Machine Learning ◽

Binding Affinity ◽

Predictive Performance ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Scoring Functions ◽

Cyclin Dependent Kinase ◽

Learning Models ◽

Learning Techniques ◽

Machine Learning Models

Background: The elucidation of the structure of cyclin-dependent kinase 2 (CDK2) made it possible to develop targeted scoring functions for virtual screening aimed to identify new inhibitors for this enzyme. CDK2 is a protein target for the development of drugs intended to modulate cellcycle progression and control. Such drugs have potential anticancer activities. Objective: Our goal here is to review recent applications of machine learning methods to predict ligand- binding affinity for protein targets. To assess the predictive performance of classical scoring functions and targeted scoring functions, we focused our analysis on CDK2 structures. Methods: We have experimental structural data for hundreds of binary complexes of CDK2 with different ligands, many of them with inhibition constant information. We investigate here computational methods to calculate the binding affinity of CDK2 through classical scoring functions and machine- learning models. Results: Analysis of the predictive performance of classical scoring functions available in docking programs such as Molegro Virtual Docker, AutoDock4, and Autodock Vina indicated that these methods failed to predict binding affinity with significant correlation with experimental data. Targeted scoring functions developed through supervised machine learning techniques showed a significant correlation with experimental data. Conclusion: Here, we described the application of supervised machine learning techniques to generate a scoring function to predict binding affinity. Machine learning models showed superior predictive performance when compared with classical scoring functions. Analysis of the computational models obtained through machine learning could capture essential structural features responsible for binding affinity against CDK2.

Download Full-text

Classification of jujube defects in small data sets based on transfer learning

Neural Computing and Applications ◽

10.1007/s00521-021-05715-2 ◽

2021 ◽

Author(s):

Jianping Ju ◽

Hong Zheng ◽

Xiaohang Xu ◽

Zhongyuan Guo ◽

Zhaohui Zheng ◽

...

Keyword(s):

Transfer Learning ◽

Loss Function ◽

Training Model ◽

Parameter Distribution ◽

Test Accuracy ◽

Small Data ◽

Data Sets ◽

Data Set ◽

Small Data Sets

AbstractAlthough convolutional neural networks have achieved success in the field of image classification, there are still challenges in the field of agricultural product quality sorting such as machine vision-based jujube defects detection. The performance of jujube defect detection mainly depends on the feature extraction and the classifier used. Due to the diversity of the jujube materials and the variability of the testing environment, the traditional method of manually extracting the features often fails to meet the requirements of practical application. In this paper, a jujube sorting model in small data sets based on convolutional neural network and transfer learning is proposed to meet the actual demand of jujube defects detection. Firstly, the original images collected from the actual jujube sorting production line were pre-processed, and the data were augmented to establish a data set of five categories of jujube defects. The original CNN model is then improved by embedding the SE module and using the triplet loss function and the center loss function to replace the softmax loss function. Finally, the depth pre-training model on the ImageNet image data set was used to conduct training on the jujube defects data set, so that the parameters of the pre-training model could fit the parameter distribution of the jujube defects image, and the parameter distribution was transferred to the jujube defects data set to complete the transfer of the model and realize the detection and classification of the jujube defects. The classification results are visualized by heatmap through the analysis of classification accuracy and confusion matrix compared with the comparison models. The experimental results show that the SE-ResNet50-CL model optimizes the fine-grained classification problem of jujube defect recognition, and the test accuracy reaches 94.15%. The model has good stability and high recognition accuracy in complex environments.

Download Full-text

45 A permutation test for validation of genomic estimated breeding values

Journal of Animal Science ◽

10.1093/jas/skaa278.016 ◽

2020 ◽

Vol 98 (Supplement_4) ◽

pp. 8-9

Author(s):

Zahra Karimi ◽

Brian Sullivan ◽

Mohsen Jafarikia

Keyword(s):

Permutation Test ◽

Breeding Value ◽

Small Data ◽

Data Sets ◽

Type I ◽

Future Performance ◽

Small Data Sets ◽

Estimated Breeding Value ◽

Estimated Breeding Values ◽

Top 40

Abstract Previous studies have shown that the accuracy of Genomic Estimated Breeding Value (GEBV) as a predictor of future performance is higher than the traditional Estimated Breeding Value (EBV). The purpose of this study was to estimate the potential advantage of selection on GEBV for litter size (LS) compared to selection on EBV in the Canadian swine dam line breeds. The study included 236 Landrace and 210 Yorkshire gilts born in 2017 which had their first farrowing after 2017. GEBV and EBV for LS were calculated with data that was available at the end of 2017 (GEBV2017 and EBV2017, respectively). De-regressed EBV for LS in July 2019 (dEBV2019) was used as an adjusted phenotype. The average dEBV2019 for the top 40% of sows based on GEBV2017 was compared to the average dEBV2019 for the top 40% of sows based on EBV2017. The standard error of the estimated difference for each breed was estimated by comparing the average dEBV2019 for repeated random samples of two sets of 40% of the gilts. In comparison to the top 40% ranked based on EBV2017, ranking based on GEBV2017 resulted in an extra 0.45 (±0.29) and 0.37 (±0.25) piglets born per litter in Landrace and Yorkshire replacement gilts, respectively. The estimated Type I errors of the GEBV2017 gain over EBV2017 were 6% and 7% in Landrace and Yorkshire, respectively. Considering selection of both replacement boars and replacement gilts using GEBV instead of EBV can translate into increased annual genetic gain of 0.3 extra piglets per litter, which would more than double the rate of gain observed from typical EBV based selection. The permutation test for validation used in this study appears effective with relatively small data sets and could be applied to other traits, other species and other prediction methods.

Download Full-text

A Comparison Study of Mahalanobis-Taguchi System and Neural Network for Multivariate Pattern Recognition

Design Engineering, Parts A and B ◽

10.1115/imece2005-80029 ◽

2005 ◽

Cited By ~ 10

Author(s):

Jungeui Hong ◽

Elizabeth A. Cudney ◽

Genichi Taguchi ◽

Rajesh Jugulum ◽

Kioumars Paryani ◽

...

Keyword(s):

Neural Network ◽

Small Data ◽

Data Sets ◽

Comparison Study ◽

Data Set ◽

Set Size ◽

Breast Cancer Study ◽

Discriminant Ability ◽

Small Data Sets ◽

Multivariate Pattern

The Mahalanobis-Taguchi System is a diagnosis and predictive method for analyzing patterns in multivariate cases. The goal of this study is to compare the ability of the Mahalanobis-Taguchi System and a neural network to discriminate using small data sets. We examine the discriminant ability as a function of data set size using an application area where reliable data is publicly available. The study uses the Wisconsin Breast Cancer study with nine attributes and one class.

Download Full-text

Ensemble CNN in Transform Domains for Image Super-resolution from Small Data Sets

2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA) ◽

10.1109/icmla51294.2020.00068 ◽

2020 ◽

Author(s):

Yingnan Liu ◽

Randy Clinton Paffenroth

Keyword(s):

Super Resolution ◽

Small Data ◽

Data Sets ◽

Small Data Sets ◽

Image Super Resolution ◽

Transform Domains

Download Full-text

A research of intelligent parameters searching in small data sets

2010 IEEE 17Th International Conference on Industrial Engineering and Engineering Management ◽

10.1109/icieem.2010.5646589 ◽

2010 ◽

Author(s):

Wei-Hua Andrew Wang ◽

Ya-Chun Chang ◽

Wen-Hsin Chen

Keyword(s):

Small Data ◽

Data Sets ◽

Small Data Sets

Download Full-text

Improving Reliability Estimation for Individual Numeric Predictions: A Machine Learning Approach

INFORMS Journal on Computing ◽

10.1287/ijoc.2020.1019 ◽

2021 ◽

Author(s):

Gediminas Adomavicius ◽

Yaqiong Wang

Keyword(s):

Machine Learning ◽

General Purpose ◽

Reliability Estimation ◽

Machine Learning Techniques ◽

Data Sets ◽

Real World Data ◽

Learning Techniques ◽

Reliability Indicator ◽

Machine Learning Approach ◽

Prediction Reliability

Numerical predictive modeling is widely used in different application domains. Although many modeling techniques have been proposed, and a number of different aggregate accuracy metrics exist for evaluating the overall performance of predictive models, other important aspects, such as the reliability (or confidence and uncertainty) of individual predictions, have been underexplored. We propose to use estimated absolute prediction error as the indicator of individual prediction reliability, which has the benefits of being intuitive and providing highly interpretable information to decision makers, as well as allowing for more precise evaluation of reliability estimation quality. As importantly, the proposed reliability indicator allows the reframing of reliability estimation itself as a canonical numeric prediction problem, which makes the proposed approach general-purpose (i.e., it can work in conjunction with any outcome prediction model), alleviates the need for distributional assumptions, and enables the use of advanced, state-of-the-art machine learning techniques to learn individual prediction reliability patterns directly from data. Extensive experimental results on multiple real-world data sets show that the proposed machine learning-based approach can significantly improve individual prediction reliability estimation as compared with a number of baselines from prior work, especially in more complex predictive scenarios.

Download Full-text