SPOT-Contact-Single: Improving Single-Sequence-Based Prediction of Protein Contact Map using a Transformer Language Model, Large Training Set and Ensembled Deep Learning

2021 ◽  
Author(s):  
Jaspreet Singh ◽  
Thomas Litfin ◽  
Jaswinder Singh ◽  
Kuldip Paliwal ◽  
Yaoqi Zhou

Motivation: Accurate prediction of protein contact maps is essential for accurate protein structure and function prediction. As a result, many methods have been developed for protein contact map prediction. However, most contact map prediction methods rely on evolutionary information derived from protein sequences, which may not exist for many proteins due to a lack of sequence homology. Moreover, generating evolutionary profiles is computationally intensive and time-consuming. Therefore, we developed a contact map predictor that uses the output of the pre-trained language model ESM-1B as input, along with a large training set and an ensemble of residual neural networks. Results: We showed that the proposed method makes a significant improvement over the single-sequence-based predictor SSCpred, with a 15% improvement in F1-score on the independent CASP14-FM test set. It also outperforms the evolutionary-profile-based methods TrRosetta and SPOT-Contact, with 48.7% and 48.5% respective improvements in F1-score on the proteins in the SPOT-2018 set without homologs (Neff=1). The new method provides a much faster and reasonably accurate alternative to profile-based methods, useful in particular for large-scale prediction.
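As a rough illustration of the evaluation metric mentioned above, the following minimal sketch computes a binary contact map from residue coordinates and an F1-score for a predicted map. The 8 Å distance cutoff, the `min_sep` parameter, the function names, and the toy coordinates are all assumptions made for illustration; the abstract does not state the contact definition or evaluation protocol actually used.

```python
from math import dist

def contact_map(coords, threshold=8.0):
    """Binary contact map: 1 where two distinct residues lie closer than
    `threshold` angstroms (the 8.0 cutoff is an assumed, commonly used value)."""
    n = len(coords)
    return [[1 if i != j and dist(coords[i], coords[j]) < threshold else 0
             for j in range(n)] for i in range(n)]

def contact_f1(true_map, pred_map, min_sep=1):
    """F1-score over residue pairs (i < j) at least `min_sep` positions apart."""
    tp = fp = fn = 0
    n = len(true_map)
    for i in range(n):
        for j in range(i + min_sep, n):
            t, p = true_map[i][j], pred_map[i][j]
            if t and p:
                tp += 1          # contact correctly predicted
            elif p:
                fp += 1          # predicted contact that is not real
            elif t:
                fn += 1          # real contact that was missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical toy chain: four residues spaced 5 angstroms apart on a line.
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (15.0, 0.0, 0.0)]
true_map = contact_map(coords)

# Hypothetical prediction: two correct contacts, one false, one missed.
pred_map = [[0, 1, 1, 0],
            [1, 0, 1, 0],
            [1, 1, 0, 0],
            [0, 0, 0, 0]]
score = contact_f1(true_map, pred_map)
```

In practice a predictor outputs probabilities, so a threshold or top-L selection step would precede this comparison; that step is omitted here.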

2014 ◽  
Vol 155 (26) ◽  
pp. 1011-1018 ◽  
Author(s):  
György Végvári ◽  
Edina Vidéki

Plants seem rather defenceless: unlike animals, they cannot move and have no nervous or immune system. Plants do, however, have hormones, although these substances are not produced in glands. Although plants are less complex than animals, plant organisms show large-scale integration in their structure and function. In higher plants, as in animals, intercellular communication is carried out by chemical messengers. In plants these specific compounds are called phytohormones or, in a wider sense, bioregulators. Even small quantities of these endogenous organic compounds can regulate the operation, growth and development of higher plants and maintain the connection between cells and tissues and the synergy between organs. Because plants have no nervous or immune system, phytohormones play an essential role in their life. Orv. Hetil., 2014, 155(26), 1011–1018.


2004 ◽  
Vol 18 (2) ◽  
pp. 167-183 ◽  
Author(s):  
Jianhua Zhang ◽  
Amy Moseley ◽  
Anil G. Jegga ◽  
Ashima Gupta ◽  
David P. Witte ◽  
...  

To understand the commitment of the genome to nervous system differentiation and function, we sought to compare nervous system gene expression to that of a wide variety of other tissues by gene expression database construction and mining. Gene expression profiles of 10 different adult nervous tissues were compared with those of 72 other tissues. Using ANOVA, we identified 1,361 genes whose expression was higher in the nervous system than in other organs and, separately, 600 genes whose expression was at least threefold higher in one or more regions of the nervous system compared with their median expression across all organs. Of the 600 genes, 381 overlapped with the 1,361-gene list. Limited in situ gene expression analysis confirmed that identified genes did represent nervous system-enriched gene expression, and we therefore sought to evaluate the validity and significance of these top-ranked nervous system genes using known gene literature and gene ontology categorization criteria. Diverse functional categories were present in the 381 genes, including genes involved in intracellular signaling, cytoskeleton structure and function, enzymes, RNA metabolism and transcription, membrane proteins, as well as cell differentiation, death, proliferation, and division. We searched existing public sites and identified 110 known genes related to mental retardation, neurological disease, and neurodegeneration. Twenty-one of the 381 genes were within the 110-gene list, compared with a random expectation of 5. This suggests that the 381 genes provide a candidate set for further analyses in neurological and psychiatric disease studies and that, as a field, we are as yet far from a large-scale understanding of the genes that are critical for nervous system structure and function. Together, our data indicate the power of profiling an individual biologic system in a multisystem context to gain insight into the genomic basis of its structure and function.
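The threefold-over-median criterion described above can be sketched as follows. This is a minimal illustration only: the function name, the data layout (gene mapped to per-tissue expression values), and the toy numbers are hypothetical, and the authors' actual normalization and ANOVA steps are not reproduced.

```python
from statistics import median

def enriched_genes(expression, nervous_tissues, fold=3.0):
    """Return genes whose expression in at least one nervous tissue is at
    least `fold` times their median expression across all tissues."""
    selected = []
    for gene, by_tissue in expression.items():
        baseline = median(by_tissue.values())
        if baseline <= 0:
            continue  # skip genes with no usable baseline
        if any(by_tissue[t] >= fold * baseline
               for t in nervous_tissues if t in by_tissue):
            selected.append(gene)
    return selected

# Hypothetical toy data: expression values per tissue for three made-up genes.
expression = {
    "geneA": {"cortex": 90, "cerebellum": 20, "liver": 10, "heart": 12, "kidney": 8},
    "geneB": {"cortex": 15, "cerebellum": 14, "liver": 16, "heart": 13, "kidney": 15},
    "geneC": {"cortex": 5, "cerebellum": 4, "liver": 60, "heart": 6, "kidney": 5},
}
nervous = {"cortex", "cerebellum"}
hits = enriched_genes(expression, nervous)
```

Here only `geneA` passes: its median across tissues is 12, and its cortex value of 90 exceeds three times that. `geneC` is strongly expressed in liver but not in any nervous tissue, so it is excluded.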


2017 ◽  
Vol 127 (5) ◽  
pp. 1798-1812 ◽  
Author(s):  
Philipp S. Wild ◽  
Janine F. Felix ◽  
Arne Schillert ◽  
Alexander Teumer ◽  
Ming-Huei Chen ◽  
...  

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Yang Li ◽  
Zheng Wang ◽  
Li-Ping Li ◽  
Zhu-Hong You ◽  
Wen-Zhun Huang ◽  
...  

Abstract: Various biochemical functions of organisms are performed by protein–protein interactions (PPIs). Therefore, recognition of protein–protein interactions is very important for understanding most life activities, such as DNA replication and transcription, protein synthesis and secretion, signal transduction and metabolism. Although high-throughput technology makes it possible to generate large-scale PPI data, it is costly in both time and labor and carries a risk of a high false-positive rate. To address this, the biology community is looking for computational methods that discover protein interaction data quickly and efficiently at scale. In this paper, we propose a computational method for predicting PPIs that combines orthogonal locality preserving projections (OLPP) and rotation forest (RoF) models, using protein sequence information. Specifically, the protein sequence is first converted into position-specific scoring matrices (PSSMs) containing protein evolutionary information by using the Position-Specific Iterated Basic Local Alignment Search Tool (PSI-BLAST). Then we characterize a protein as a fixed-length feature vector by applying OLPP to PSSMs. Finally, we train an RoF classifier to distinguish interacting from non-interacting protein pairs. The proposed method yielded significantly better results than existing methods, with 90.07% and 96.09% prediction accuracy on the Yeast and Human datasets, respectively. Our experiments show that the proposed method can serve as a useful tool to accelerate the process of solving key problems in proteomics.
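The step of turning a variable-length PSSM into a fixed-length vector can be illustrated with a deliberately simplified sketch. Note that this is not OLPP: plain column averaging is swapped in as a stand-in so the example stays short and dependency-free, and the function names and toy matrices are hypothetical. The resulting pair vector is the kind of input a classifier such as a rotation forest could be trained on.

```python
def pssm_to_fixed_vector(pssm):
    """Collapse an L x 20 PSSM (one row per residue) into a 20-dimensional
    vector by averaging each column. Stand-in for OLPP, NOT the paper's method."""
    length = len(pssm)
    width = len(pssm[0])
    return [sum(row[k] for row in pssm) / length for k in range(width)]

def pair_features(pssm_a, pssm_b):
    """Concatenate the fixed-length vectors of two proteins into one
    feature vector describing the candidate interacting pair."""
    return pssm_to_fixed_vector(pssm_a) + pssm_to_fixed_vector(pssm_b)

# Hypothetical toy PSSMs: a 2-residue and a 3-residue protein, 20 columns each.
pssm_a = [[1] * 20, [3] * 20]
pssm_b = [[0] * 20, [0] * 20, [0] * 20]
features = pair_features(pssm_a, pssm_b)
```

Whatever the length of the two proteins, `features` always has 40 entries, which is the property the fixed-length representation is meant to provide.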


2004 ◽  
Vol 132 (2) ◽  
pp. 105-115 ◽  
Author(s):  
Dan Goldowitz ◽  
Wayne N. Frankel ◽  
Joseph S. Takahashi ◽  
Martha Holtz-Vitaterna ◽  
Carol Bult ◽  
...  

2014 ◽  
Vol 149 (1-2) ◽  
pp. 3-10 ◽  
Author(s):  
Alberto M. Luciano ◽  
Federica Franciosi ◽  
Cecilia Dieci ◽  
Valentina Lodde

2020 ◽  
Author(s):  
R Christian McDonald ◽  
Matthew J Schott ◽  
Temitope A Idowu ◽  
Peter J Lyons

Abstract Background. Like most major enzyme families, the M14 family of metallocarboxypeptidases (MCPs) contains a number of pseudoenzymes predicted to lack enzyme activity and with poorly characterized molecular function. The genome of the yeast Saccharomyces cerevisiae encodes one member of the M14 MCP family, a pseudoenzyme named Ecm14 proposed to function in the extracellular matrix. In order to better understand the function of such pseudoenzymes, we studied the structure and function of Ecm14 in S. cerevisiae. Results. A phylogenetic analysis of Ecm14 in fungi found it to be conserved throughout the ascomycete phylum, with a group of related pseudoenzymes found in basidiomycetes. To investigate the structure and function of this conserved protein, His6-tagged Ecm14 was overexpressed in Sf9 cells and purified. The prodomain of Ecm14 was cleaved in vivo and in vitro by endopeptidases, suggesting an activation mechanism; however, no activity was detectable using standard carboxypeptidase substrates. In order to determine the function of Ecm14 using an unbiased screen, we undertook a synthetic lethal assay. Upon screening approximately 27,000 yeast colonies, twenty-two putative synthetic lethal clones were identified. Further analysis showed many to be synthetic lethal with auxotrophic marker genes and requiring multiple mutations, suggesting that there are few, if any, single S. cerevisiae genes that present synthetic lethal interactions with ecm14Δ. Conclusions. We show in this study that Ecm14, although lacking detectable enzyme activity, is a conserved carboxypeptidase-like protein that is secreted from cells and is processed to a mature form by the action of an endopeptidase. Our study and datasets from other recent large-scale screens suggest a role for Ecm14 in processes such as vesicle-mediated transport and aggregate invasion, a fungal process that has been selected against in modern laboratory strains of S. cerevisiae.

