Fueling ab initio folding with marine metagenomics enables structure and function predictions of new protein families

2019
Vol 20 (1)
Author(s):
Yan Wang
Qiang Shi
Pengshuo Yang
Chengxin Zhang
S. M. Mortuza
...  

Abstract

Introduction: The ocean microbiome is one of the largest microbiomes and produces nearly half of the primary energy on the planet through photosynthesis or chemosynthesis. Using recent advances in marine genomics, we explore new applications of oceanic metagenomes for protein structure and function prediction.

Results: By processing 1.3 TB of high-quality reads from the Tara Oceans data, we obtain 97 million non-redundant genes. Of the 5721 Pfam families that lack experimental structures, 2801 have at least one member associated with the oceanic metagenomics dataset. We apply C-QUARK, a deep-learning contact-guided ab initio structure prediction pipeline, to model 27 families, of which 20 are predicted to have a reliable fold, with an estimated template modeling score (TM-score) of at least 0.5. Detailed analyses reveal that the abundance of microbial genera in the ocean is highly correlated with their frequency of occurrence in the modeled Pfam families, suggesting a significant role for the Tara Oceans genomes in contact-map prediction and the subsequent ab initio folding simulations. Notably, PF15461, most of whose members come from ocean-related bacteria, is identified as an important photosynthetic protein by structure-based function annotation. The pipeline is extended to a set of 417 Pfam families, built on the combination of Tara with other metagenomics datasets, which yields 235 families with an estimated TM-score over 0.5.

Conclusions: These results demonstrate a new avenue to improve the capacity of protein structure and function modeling through marine metagenomics, especially for difficult proteins with few homologous sequences.
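For readers unfamiliar with the 0.5 cutoff used above, the following is a minimal sketch of the standard TM-score formula for a single, fixed superposition. It is not the C-QUARK pipeline, which reports an estimated TM-score without a native structure. The function names, the 0.5 floor on the distance scale, and the input format (one distance in ångströms per aligned residue pair) are assumptions made for illustration.

```python
from typing import Sequence


def tm_d0(target_len: int) -> float:
    """Length-dependent distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8.

    The 0.5 floor for short chains is an assumption of this sketch.
    """
    if target_len <= 15:
        return 0.5
    return max(1.24 * (target_len - 15) ** (1.0 / 3.0) - 1.8, 0.5)


def tm_score(distances: Sequence[float], target_len: int) -> float:
    """TM-score for one fixed superposition.

    `distances` holds the distance (in angstroms) between each aligned
    residue pair. The full TM-score takes the maximum of this quantity
    over all superpositions; that search is omitted here.
    """
    d0 = tm_d0(target_len)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_len


def is_reliable_fold(score: float, cutoff: float = 0.5) -> bool:
    """Apply the 'at least 0.5' threshold quoted in the abstract."""
    return score >= cutoff


if __name__ == "__main__":
    # Hypothetical distances for a 100-residue target, all residues aligned.
    example = [1.0] * 60 + [4.0] * 40
    score = tm_score(example, 100)
    print(round(score, 3), is_reliable_fold(score))
```

Because the sum is normalized by the target length rather than by the number of aligned pairs, unaligned residues lower the score, which is why partial models can fall below the cutoff even when the aligned part fits well.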

2021
Author(s):
Pengshuo Yang
Wei Zheng
Kang Ning
Yang Zhang

Information extracted from microbiome sequences through deep-learning techniques can significantly improve protein structure and function modeling. However, model training and metagenome searches have been largely blind and inefficient. Building on 4.25 billion microbiome sequences from four major biomes (Gut, Lake, Soil and Fermentor), we proposed the MetaSource model to decode the inherent link between microbial niches and protein homologous families. Large-scale protein family folding experiments showed that a targeted approach using predicted biomes significantly outperforms combined metagenome datasets in both the speed of MSA collection and the accuracy of deep-learning structure assembly. These results revealed the important link between biomes and protein families and provided a useful bluebook to guide future development of microbiome sequence databases and modeling for protein structure and function prediction.


2021
Author(s):
Liang Hong
Siqi Sun
Liangzhen Zheng
Qingxiong Tan
Yu Li

Evolutionarily related sequences provide information about protein structure and function. Multiple sequence alignment, which comprises homolog searching in large databases followed by sequence alignment, is an effective way to extract this information and assist protein structure and function prediction, as AlphaFold has demonstrated. Despite the existing tools for multiple sequence alignment, searching for homologs across the entire UniProt is still time-consuming. Given the success of AlphaFold, large-scale multiple sequence alignment against massive databases is likely to become a trend in the field, so accelerating this step is highly desirable. Here, we propose a novel method, fastMSA, that improves the speed significantly. Our idea is orthogonal to all previous acceleration methods. Taking advantage of a protein language model based on BERT, we propose a novel dual encoder architecture that embeds protein sequences into a low-dimensional space and efficiently filters out unrelated sequences before running BLAST. Extensive experimental results suggest that we can recall most of the homologs with a 34-fold speed-up. Moreover, our method is compatible with downstream tasks, such as structure prediction using AlphaFold. Using multiple sequence alignments generated by our method, we see little loss in protein structure prediction performance with much less running time. fastMSA will effectively assist protein sequence, structure, and function analysis based on homologs and multiple sequence alignment.
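The filter-then-align idea described above can be illustrated with a minimal sketch. It assumes that embeddings for the query and the database sequences already exist (in fastMSA they would come from the dual encoder, which is not reproduced here), and it uses cosine similarity with a top-k cut as a stand-in for whatever retrieval rule the authors actually use. All names and the toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def prefilter(
    query_vec: Sequence[float],
    db_vecs: Dict[str, Sequence[float]],
    top_k: int,
) -> List[str]:
    """Return the IDs of the top_k database entries closest to the query.

    Only these candidates would then be passed to a slower alignment
    tool such as BLAST; the rest of the database is skipped.
    """
    ranked = sorted(
        db_vecs.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [seq_id for seq_id, _ in ranked[:top_k]]


if __name__ == "__main__":
    # Toy two-dimensional embeddings; real ones would be much larger.
    database = {
        "seq_a": [1.0, 0.0],
        "seq_b": [0.0, 1.0],
        "seq_c": [0.9, 0.1],
    }
    print(prefilter([1.0, 0.0], database, top_k=2))
```

The speed-up in such a scheme comes from replacing most pairwise alignments with cheap vector comparisons; the trade-off is that any homolog ranked below the cut is never aligned, so recall depends on the quality of the embeddings.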


2020
Author(s):
Khondker Rufaka Hossain
Daniel Clayton
Sophia C Goodchild
Alison Rodger
Richard James Payne
...  

Membrane protein structure and function are modulated via interactions with their lipid environment. This is particularly true for the integral membrane pumps, the P-type ATPases. These ATPases play vital roles...


2017
Vol 6 (1)
pp. 75-92
Author(s):  
Elka R. Georgieva

Abstract: Cellular membranes and associated proteins play critical physiological roles in organisms from all kingdoms of life. In many cases, malfunction of biological membranes, triggered by changes in the lipid bilayer properties or by functional abnormalities of membrane proteins, leads to severe diseases. To understand in detail the processes that govern the life of cells and to control diseases, one of the major tasks in the biological sciences is to learn how membrane proteins function. To this end, a variety of biochemical and biophysical approaches have been used in molecular studies of membrane protein structure and function on the nanoscale. This review focuses on electron paramagnetic resonance with site-directed nitroxide spin-labeling (SDSL EPR), a rapidly expanding and powerful technique that reports on local protein/spin-label dynamics and on large, functionally important structural rearrangements. In parallel, membrane mimetics suited to nanoscale studies have been developed and used in conjunction with SDSL EPR. These mimetics primarily include various liposomes, bicelles, and nanodiscs. This review provides a basic description of the EPR methods, continuous-wave and pulse, applied to spin-labeled proteins, and highlights several representative applications of EPR to liposome-, bicelle-, or nanodisc-reconstituted membrane proteins.

