protein classification
Recently Published Documents


Total documents: 218 (five years: 35)

H-index: 28 (five years: 3)

2022 ◽  
Vol 23 (1) ◽  
pp. 24-33
Author(s):  
Ahmed Abd El-Hakim ◽  
Emad Mady ◽  
Ayman M. Abou Tahoun ◽  
Mohammed S.A. Ghaly ◽  
Mohamed A. Eissa

2021 ◽  
Author(s):  
Edoardo Farnesi ◽  
Andrea Barucci ◽  
Cristiano D'Andrea ◽  
Martina Banchelli ◽  
Chiara Amicucci ◽  
...  

2021 ◽  
Author(s):  
Deepro Banerjee ◽  
Michael A. Jindra ◽  
Alec J. Linot ◽  
Brian F. Pfleger ◽  
Costas D. Maranas

Classification of proteins into their respective functional categories remains a long-standing key challenge in computational biology. Machine learning (ML) based discriminative algorithms have been used extensively to address this challenge; however, small, noisy, unbalanced protein classification datasets, in which high sequence similarity does not always imply identical functional properties, have prevented robust prediction performance. Herein we present an ML method, Ensemble method for enZyme Classification (EnZymClass), that is specifically designed to address these issues. EnZymClass makes use of 47 alignment-free feature extraction techniques as numerically encoded descriptors of protein sequences to construct a stacked ensemble classification scheme capable of categorizing proteins based on their functional attributes. We used EnZymClass to classify plant acyl-ACP thioesterases (TEs) into short, long and mixed free fatty acid substrate specificity categories. While general guidelines for inferring substrate specificity have been proposed before, prediction of chain-length preference from primary sequence has remained elusive. EnZymClass achieved high classification metric scores on the TE substrate specificity prediction task (average accuracy score of 0.8, average precision and recall scores of 0.87 and 0.89 respectively on medium-chain TE prediction), producing accuracy scores that are about twice as effective at avoiding misclassifications as existing similarity-based methods of substrate specificity prediction. By applying EnZymClass to a subset of TEs in the ThYme database, we identified two acyl-ACP TEs, ClFatB3 and CwFatB2, with previously uncharacterized activity in E. coli fatty acid production hosts. We incorporated into ClFatB3 modifications established in prior TE engineering studies, resulting in a 4.2-fold overall improvement in observed C10 titers over the wildtype enzyme.
EnZymClass can be readily applied to other protein classification challenges and is available at: https://github.com/deeprob/ThioesteraseEnzymeSpecificity
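To make "alignment-free feature extraction" concrete, the following minimal sketch encodes a protein sequence as a fixed-length numeric vector using amino acid composition and dipeptide composition. These are two widely used alignment-free descriptors chosen here only for illustration; the abstract does not list the 47 techniques EnZymClass uses, and the function names `kmer_composition` and `encode` are hypothetical, not taken from the EnZymClass repository.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_composition(sequence: str, k: int = 1) -> list[float]:
    """Return the relative frequency of every possible k-mer, in a fixed order.

    The vector always has 20**k entries, whatever the sequence length, so
    sequences of different lengths can be compared without aligning them.
    """
    sequence = sequence.upper()
    vocabulary = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    windows = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    # Skip windows that contain non-standard symbols (e.g. X or gap characters).
    valid = [w for w in windows if all(c in AMINO_ACIDS for c in w)]
    counts = Counter(valid)
    total = len(valid)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[kmer] / total for kmer in vocabulary]


def encode(sequence: str) -> list[float]:
    """Concatenate amino acid (k=1) and dipeptide (k=2) composition."""
    return kmer_composition(sequence, 1) + kmer_composition(sequence, 2)


# Illustrative sequence fragment (made up for this example).
features = encode("MKTAYIAKQR")
print(len(features))
```

Each descriptor of this kind yields one feature matrix over the training sequences; a stacked ensemble would then train base classifiers on the individual matrices and combine their predictions with a second-level model.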


2021 ◽  
Vol 22 (S3) ◽  
Author(s):  
Enze Zhang ◽  
Boheng Zhang ◽  
Shaohan Hu ◽  
Fa Zhang ◽  
Zhiyong Liu ◽  
...  

Abstract
Background: Proteins are vital to the human body: no movement or activity can be performed without them. Rapidly developing microscopy imaging technologies are now used to observe proteins in various cells and tissues. Because cellular environments are complex and crowded and proteins vary in type and size, a considerable number of protein images are generated every day, too many to classify manually. An automatic and accurate method is therefore needed to analyse protein images with mixed patterns.
Results: We first propose a novel customized architecture with adaptive concatenate pooling and “buffering” layers in the classifier part, which makes the networks more adaptive to the training and testing datasets, and we develop a novel hard sampler at the end of the network to effectively mine samples from small classes. We also present a new loss that handles label imbalance based on the effectiveness of samples. In addition, we adopt several novel optimization strategies to solve the difficult training-time optimization problem and to further increase accuracy through post-processing.
Conclusion: Our methods outperformed GapNet-PL, the state-of-the-art method for multi-labelled protein classification on the HPA dataset, by more than 2% in F1 score. Experimental results on a test set split from the Human Protein Atlas (HPA) dataset show that our methods perform well in automatically classifying multi-class, multi-labelled high-throughput microscopy protein images.
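The comparison above is reported as an F1 score on a multi-label task. As a rough illustration of how such a score can be computed, the sketch below calculates a macro-averaged F1 over per-label counts. The abstract does not say which averaging mode was used, so macro averaging is an assumption here, and the function name `macro_f1` and the label names are hypothetical.

```python
def macro_f1(y_true: list[set[str]], y_pred: list[set[str]], labels: list[str]) -> float:
    """Macro-averaged F1 for multi-label predictions.

    Each sample is a set of labels. F1 is computed per label from
    true positives (tp), false positives (fp) and false negatives (fn),
    then averaged with equal weight per label, so rare labels count as
    much as common ones.
    """
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if label in t and label in p)
        fp = sum(1 for t, p in zip(y_true, y_pred) if label not in t and label in p)
        fn = sum(1 for t, p in zip(y_true, y_pred) if label in t and label not in p)
        denom = 2 * tp + fp + fn
        # A label that never occurs and is never predicted scores 0 by convention.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical localization labels for three images.
labels = ["nucleus", "cytosol", "mitochondria"]
y_true = [{"nucleus"}, {"nucleus", "cytosol"}, {"mitochondria"}]
y_pred = [{"nucleus"}, {"cytosol"}, {"cytosol"}]
print(round(macro_f1(y_true, y_pred, labels), 4))  # 0.4444
```

In this toy case the missed "mitochondria" label pulls the average down sharply, which is why macro averaging is often preferred when small classes matter.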


Author(s):  
Hsin-Hung Chou ◽  
Ching-Tien Hsu ◽  
Chin-Wei Hsu ◽  
Kai-Hsun Yao ◽  
Hao-Ching Wang ◽  
...  

2020 ◽  
Vol 99 (5) ◽  
pp. 473-479
Author(s):  
Omar Arafat Kdudsi Khalil ◽  
Sara da Silva Khalil

Introduction: SARS-CoV-2 is a novel coronavirus responsible for the current COVID-19 pandemic, which has already infected and killed thousands of people worldwide.
Objective: To describe basic and fundamental aspects of SARS-CoV-2, such as its name, composition, possible origins and classification.
Method: An exploratory and descriptive literature review based on searches of the PubMed, Scopus, Google Scholar and SciELO platforms. The terms used to select material were “SARS-CoV-2”, “COVID-19”, “spike-protein”, “classification”, “coronavirus” and their combinations.
Results: Coronaviruses belong to the family Coronaviridae, which comprises 2 subfamilies, 5 genera, 26 subgenera and 46 virus species. SARS-CoV-2 belongs to the genus Betacoronavirus, subgenus Sarbecovirus, species Severe acute respiratory syndrome-related coronavirus, and is associated with acute respiratory syndrome. It was classified by the International Committee on Taxonomy of Viruses (ICTV) mainly on the basis of molecular and phylogenetic characteristics rather than the disease it causes. The virus emerged in China, a country where there is a custom of consuming recently slaughtered domestic or wild animals. It is speculated that the horseshoe bat (Rhinolophus sinicus) is its primary host and the Malayan pangolin (Manis javanica) its intermediate host. SARS-CoV-2 is an enveloped, roughly spherical virus whose virions have average diameters of 80 to 120 nm. It has a non-segmented, single-stranded RNA genome encoding four main proteins: spike glycoprotein (S), envelope protein (E), membrane glycoprotein (M) and nucleocapsid protein (N). The S protein is the main target of neutralizing antibodies, and coronaviruses use it to bind the angiotensin-converting enzyme 2 receptor.
Conclusion: In-depth knowledge of the basic characteristics of SARS-CoV-2 is essential for a better understanding of the epidemiological, clinical and pathophysiological aspects of COVID-19 and for its treatment.

