peptide classification
Recently Published Documents


TOTAL DOCUMENTS: 17 (FIVE YEARS 4)

H-INDEX: 6 (FIVE YEARS 2)

2020 ◽  
Vol 35 (5) ◽  
pp. 263-271
Author(s):  
A. I. Mikhalskii ◽  
I. V. Petrov ◽  
V. V. Tsurko ◽  
A. A. Anashkina ◽  
A. N. Nekrasov

Abstract A novel non-parametric method for mutual information estimation is presented. The method is suited to informative feature selection in classification and regression problems. Its performance is demonstrated on the problem of stable short peptide classification.
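
To illustrate how mutual information can drive feature selection, here is a minimal sketch. It is not the non-parametric estimator from the paper, whose details the abstract does not give; it uses the simple plug-in (count-based) estimate for discrete features instead. The function names `mutual_information` and `rank_features` are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats from paired discrete samples.

    Assumption: this is the basic count-based estimator, swapped in for
    illustration; it is NOT the method proposed in the paper.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def rank_features(columns, labels):
    """Return feature names sorted by estimated MI with the labels, highest first."""
    scores = {name: mutual_information(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

A feature that duplicates a balanced binary label scores log 2 nats, while a feature independent of the label scores 0, so the former is ranked first. The plug-in estimate is biased upward on small samples, which is one reason more careful estimators are of interest.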


2020 ◽  
Author(s):  
Meisam Ahmadi ◽  
Mohammad Reza Jahed-Motlagh ◽  
Ehsaneddin Asgari ◽  
Adel Torkaman Rahmani ◽  
Alice C. McHardy

Abstract Venom is a mixture of substances that a venomous organism produces for predation, defense, or intraspecific competition, and that causes unwanted conditions in the target organism. Venom sequences are a highly divergent class of proteins, which makes both machine learning-based and homology-based identification challenging. Prominent applications in drug discovery and healthcare, combined with the scarcity of annotations in protein databases, have made automatic venom identification an important protein informatics task. Most existing machine learning approaches rely on engineered features, with the predictive model trained on top of those manually designed features. Recently, transfer learning and representation learning have brought significant advances in many machine learning problem settings by automatically learning the essential features. This paper proposes an approach, called ToxVec, for automatic representation learning of protein sequences for the task of venom identification. We show that a pre-trained language model-based representation outperforms the existing approaches in terms of the F1 score of both the positive and negative classes, achieving a macro-F1 of 0.89. We also show that an ensemble classifier trained over multiple training sets, constructed from multiple down-samplings of the negative class instances, can substantially improve the macro-F1 score to 0.93, which is 7 percent higher than the state-of-the-art performance. Availability: The ToxVec application is available at https://github.com/meahmadi/ToxVec
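
The two evaluation ideas in this abstract, macro-F1 and an ensemble built from repeated down-samplings of the negative class, can be sketched in a few lines. This is an illustrative sketch only and not ToxVec's implementation: the helper names (`macro_f1`, `balanced_subsets`, `majority_vote`) are hypothetical, and each ensemble member is assumed to be a plain callable that maps an input to a class label.

```python
import random
from collections import Counter


def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def balanced_subsets(positives, negatives, n_subsets, seed=0):
    """Yield (positives, sampled_negatives) pairs, one per ensemble member.

    Each sample of negatives is drawn without replacement and is no larger
    than the positive set.
    """
    rng = random.Random(seed)
    k = min(len(positives), len(negatives))
    for _ in range(n_subsets):
        yield positives, rng.sample(negatives, k)


def majority_vote(models, x):
    """Return the label that the most ensemble members assign to x."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]
```

In use, one model would be trained on each pair yielded by `balanced_subsets`, and `majority_vote` would combine their predictions. An odd number of members avoids ties in the binary case.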


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Emmanuel L. C. de los Santos

Abstract Significant progress has been made in the past few years on the computational identification of biosynthetic gene clusters (BGCs) that encode ribosomally synthesized and post-translationally modified peptides (RiPPs). This is done by identifying both RiPP tailoring enzymes (RTEs) and RiPP precursor peptides (PPs). However, identification of PPs, particularly for novel RiPP classes, remains challenging. To address this, machine learning has been used to accurately identify PP sequences. Current machine learning tools have limitations, since they are specific to the RiPP class they are trained for and are context-dependent, requiring information about the surrounding genetic environment of the putative PP sequences. NeuRiPP overcomes these limitations by leveraging the rich data set of high-confidence putative PP sequences from existing programs, along with experimentally verified PPs from RiPP databases. NeuRiPP uses neural network architectures that are suitable for peptide classification, with weights trained on PP datasets. It is able to identify known PP sequences as well as sequences that are likely PPs. When tested on existing RiPP BGC datasets, NeuRiPP was able to identify PP sequences in significantly more putative RiPP clusters than current tools while maintaining the same HMM hit accuracy. Finally, NeuRiPP successfully identified PP sequences from novel RiPP classes that were recently characterized experimentally, highlighting its utility in complementing existing bioinformatics tools.


2016 ◽  
Vol 49 (1) ◽  
Author(s):  
Stefan Simm ◽  
Jens Einloft ◽  
Oliver Mirus ◽  
Enrico Schleiff

2011 ◽  
Vol 38 (4) ◽  
pp. 3185-3191 ◽  
Author(s):  
Loris Nanni ◽  
Alessandra Lumini

2010 ◽  
Vol 43 (11) ◽  
pp. 3891-3899 ◽  
Author(s):  
E. Aygün ◽  
B.J. Oommen ◽  
Z. Cataltepe
