The structural coverage of the human proteome before and after AlphaFold

2021 ◽  
Author(s):  
Eduard Porta-Pardo ◽  
Victoria Ruiz-Serra ◽  
Alfonso Valencia

The protein structure field is experiencing a revolution, from the increased throughput of techniques to determine experimental structures, to developments such as cryo-EM that allow us to find the structures of large protein complexes, to, more recently, artificial intelligence tools such as AlphaFold that can predict with high accuracy the folding of proteins for which few homology templates are available. Here we quantify the effect of the recently released AlphaFold database of protein structural models on our knowledge of human proteins. Our results indicate that our current baseline for structural coverage of 47%, considering experimentally-derived or template-based homology models, rises to 75% when including AlphaFold predictions, reducing the fraction of dark proteome from 22% to just 7% and the number of proteins without structural information from 4,832 to just 29. Furthermore, although the coverage of disease-associated genes and mutations was near complete before the AlphaFold release (70% of ClinVar pathogenic mutations and 74% of oncogenic mutations), AlphaFold models still provide an additional coverage of 2% to 14% of these critically important sets of biomedical genes and mutations. We also provide several examples of disease-associated proteins where AlphaFold provides critical new insights. Overall, our results show that the sequence-structure gap of human proteins has almost disappeared, an outstanding success with direct consequences for our knowledge of the human genome and the derived medical applications.
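As an illustration of how a "before and after" coverage figure of this kind could be computed, the following is a minimal sketch. It assumes that coverage is measured as the fraction of residues covered by at least one structure or model; the abstract does not state the exact definition. The function names, protein identifiers and residue ranges are hypothetical toy values, not data from the paper.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive residue range


def covered_residues(intervals: List[Interval]) -> int:
    """Count residues covered by at least one interval, merging overlaps."""
    total = 0
    current_start, current_end = None, None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end + 1:
            # Disjoint from the running interval: flush it and start a new one.
            if current_end is not None:
                total += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent: extend the running interval.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start + 1
    return total


def proteome_coverage(lengths: Dict[str, int],
                      models: Dict[str, List[Interval]]) -> float:
    """Fraction of all residues in the proteome covered by some model.

    Assumes every interval lies within its protein's length.
    """
    total_residues = sum(lengths.values())
    covered = sum(covered_residues(models.get(pid, [])) for pid in lengths)
    return covered / total_residues


# Toy data (hypothetical identifiers and ranges).
lengths = {"P1": 100, "P2": 200}
experimental = {"P1": [(1, 40), (30, 60)]}
predicted = {"P1": [(50, 100)], "P2": [(1, 50)]}

before = proteome_coverage(lengths, experimental)
combined = {pid: experimental.get(pid, []) + predicted.get(pid, [])
            for pid in lengths}
after = proteome_coverage(lengths, combined)
print(f"coverage before: {before:.0%}, after: {after:.0%}")
```

Merging overlapping ranges before counting avoids double-counting residues that are covered by both an experimental structure and a predicted model.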

2020 ◽  
Author(s):  
Krishna Praneeth Kilambi ◽  
Qifang Xu ◽  
Guruharsha Kuthethur Gururaj ◽  
Kejie Li ◽  
Spyros Artavanis-Tsakonas ◽  
...  

Abstract: A high-quality map of the human protein–protein interaction (PPI) network can help us better understand complex genotype–phenotype relationships. Each edge between two interacting proteins supported through an interface in a three-dimensional (3D) structure of a protein complex adds credibility to the biological relevance of the interaction. Such structure-supported interactions would augment an interaction map primarily built using high-throughput cell-based biophysical methods. Here, we integrate structural information with the human PPI network to build the structure-supported human interactome, a subnetwork of PPI between proteins that contain domains or regions known to form interfaces in the 3D structures of protein complexes. We expand the coverage of our structure-supported human interactome by using Pfam-based domain definitions, whereby we include homologous interactions if a human complex structure is unavailable. The structure-supported interactome predicts one-eighth of the total network PPI to interact through domain–domain interfaces. It identifies with higher resolution the interacting subunits in multi-protein complexes and enables us to characterize functional and disease-relevant neighborhoods in the network map with higher accuracy, allowing for structural insights into disease-associated genes and pathways. We expand the structural coverage beyond domain–domain interfaces by identifying the most common non-enzymatic peptide-binding domains with structural support. Adding these interactions between protein domains on one side and peptide regions on the other approximately doubles the number of structure-supported PPI. The human structure-supported interactome is a resource to prioritize investigations of smaller-scale context-specific experimental PPI neighborhoods of biological or clinical significance.

Short abstract: A high-quality map of the human protein–protein interaction (PPI) network can help us better understand genotype–phenotype relationships. Each edge between two interacting proteins supported through an interface in a three-dimensional structure of a protein complex adds credibility to the biological relevance of the interaction, aiding experimental prioritization. Here, we integrate structural information with the human interactome to build the structure-supported human interactome, a subnetwork of PPI between proteins that contain domains or regions known to form interfaces in the structures of protein complexes. The structure-supported interactome predicts one-eighth of the total PPI to interact through domain–domain interfaces. It identifies with higher resolution the interacting subunits in multi-protein complexes and enables us to structurally characterize functional, disease-relevant network neighborhoods. We also expand the structural coverage by identifying PPI between non-enzymatic peptide-binding domains on one side and peptide regions on the other, thereby doubling the number of structure-supported PPI.
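The idea of a structure-supported subnetwork can be sketched as a simple edge filter: keep a PPI edge only if the two proteins carry a pair of domains known to form an interface in some complex structure. The sketch below is an assumption-laden illustration and not the authors' pipeline. The function name, protein names, domain names and interface pairs are hypothetical placeholders.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Edge = Tuple[str, str]


def structure_supported(ppi: List[Edge],
                        domains: Dict[str, Set[str]],
                        interface_pairs: Set[FrozenSet[str]]) -> List[Edge]:
    """Return the PPI edges backed by at least one known domain-domain interface."""
    supported = []
    for a, b in ppi:
        doms_a = domains.get(a, set())
        doms_b = domains.get(b, set())
        # An edge is supported if any domain of A pairs with any domain of B
        # in the set of interfaces observed in complex structures.
        if any(frozenset((da, db)) in interface_pairs
               for da in doms_a for db in doms_b):
            supported.append((a, b))
    return supported


# Toy inputs (hypothetical proteins and domain families).
ppi = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
domains = {"A": {"dom1"}, "B": {"dom2"}, "C": {"dom3"}, "D": {"dom2", "dom4"}}
interface_pairs = {frozenset(("dom1", "dom2")), frozenset(("dom3", "dom4"))}

subnetwork = structure_supported(ppi, domains, interface_pairs)
print(subnetwork, len(subnetwork) / len(ppi))
```

Storing each interface as a `frozenset` makes the lookup independent of domain order, and a same-family interface is represented by a one-element set.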


2019 ◽  
Author(s):  
Zachary VanAernum ◽  
Florian Busch ◽  
Benjamin J. Jones ◽  
Mengxuan Jia ◽  
Zibo Chen ◽  
...  

It is important to assess the identity and purity of proteins and protein complexes during and after protein purification to ensure that samples are of sufficient quality for further biochemical and structural characterization, as well as for use in consumer products, chemical processes, and therapeutics. Native mass spectrometry (nMS) has become an important tool in protein analysis due to its ability to retain non-covalent interactions during measurements, making it possible to obtain protein structural information with high sensitivity and at high speed. Interferences from the presence of non-volatiles are typically alleviated by offline buffer exchange, which is time-consuming and difficult to automate. We provide a protocol for rapid online buffer exchange (OBE) nMS to directly screen structural features of pre-purified proteins, protein complexes, or clarified cell lysates. Information obtained by OBE nMS can be used for fast (<5 min) quality control and can further guide protein expression and purification optimization.


2020 ◽  
Vol 27 (37) ◽  
pp. 6306-6355 ◽  
Author(s):  
Marian Vincenzi ◽  
Flavia Anna Mercurio ◽  
Marilisa Leone

Background: Many pathways in healthy cells and/or linked to disease onset and progression depend on large assemblies, including multi-protein complexes. Protein-protein interactions may occur through a vast array of modules known as protein interaction domains (PIDs). Objective: This review concerns PIDs that recognize post-translationally modified peptide sequences and intends to provide the scientific community with state-of-the-art knowledge on their 3D structures, binding topologies and potential applications in the drug discovery field. Method: Several databases, such as Pfam (Protein family), SMART (Simple Modular Architecture Research Tool) and the PDB (Protein Data Bank), were searched for different domain families and for structural information on protein complexes in which particular PIDs are involved. Recent literature on PIDs and related drug discovery campaigns was retrieved through PubMed and analyzed. Results and Conclusion: PIDs are rather versatile in their binding preferences. Many of them specifically recognize only particular amino acid stretches with post-translational modifications; a few others are able to interact with several post-translationally modified sequences or with unmodified ones. Many PIDs can be linked to different diseases, including cancer. The tremendous amount of available structural data has led to the structure-based design of several molecules targeting protein-protein interactions mediated by PIDs, including peptides, peptidomimetics and small compounds. More studies are needed to fully single out, among different families, the PIDs that can be considered reliable therapeutic targets; however, attacking PIDs rather than the catalytic domains of a particular protein may represent a route to obtain selective inhibitors.


Biomolecules ◽  
2021 ◽  
Vol 11 (5) ◽  
pp. 726
Author(s):  
Ronald Biemann ◽  
Enrico Buß ◽  
Dirk Benndorf ◽  
Theresa Lehmann ◽  
Kay Schallert ◽  
...  

Gut microbiota-mediated inflammation promotes obesity-associated low-grade inflammation, which represents a hallmark of metabolic syndrome. To investigate if lifestyle-induced weight loss (WL) may modulate the gut microbiome composition and its interaction with the host on a functional level, we analyzed the fecal metaproteome of 33 individuals with metabolic syndrome in a longitudinal study before and after lifestyle-induced WL in a well-defined cohort. The 6-month WL intervention resulted in reduced BMI (−13.7%), improved insulin sensitivity (HOMA-IR, −46.1%), and reduced levels of circulating hsCRP (−39.9%), indicating metabolic syndrome reversal. The metaprotein spectra revealed a decrease of human proteins associated with gut inflammation. Taxonomic analysis revealed only minor changes in the bacterial composition with an increase of the families Desulfovibrionaceae, Leptospiraceae, Syntrophomonadaceae, Thermotogaceae and Verrucomicrobiaceae. Yet we detected an increased abundance of microbial metaprotein spectra that suggest an enhanced hydrolysis of complex carbohydrates. Hence, lifestyle-induced WL was associated with reduced gut inflammation and functional changes of human and microbial enzymes for carbohydrate hydrolysis while the taxonomic composition of the gut microbiome remained almost stable. The metaproteomics workflow has proven to be a suitable method for monitoring inflammatory changes in the fecal metaproteome.


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Pablo Mier ◽  
Miguel A. Andrade-Navarro

Abstract: According to the amino acid composition of natural proteins, it could be expected that all possible sequences of three or four amino acids will occur at least once in large protein datasets purely by chance. However, in some species or cellular contexts, specific short amino acid motifs are missing for unknown reasons. We describe these as Avoided Motifs, short amino acid combinations missing from biological sequences. Here we identify 209 human and 154 bacterial Avoided Motifs of length four amino acids, and discuss their possible functionality according to their presence in other species. Furthermore, we determine two Avoided Motifs of length three amino acids in human proteins specifically located in the cytoplasm, and two more in secreted proteins. Our results support the hypothesis that the characterization of Avoided Motifs in particular contexts can provide us with information about functional motifs, pointing to a new approach in the use of molecular sequences for the discovery of protein function.
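The basic search described above can be sketched as follows: enumerate every possible motif of length k over the 20 standard amino acids (8,000 for k = 3; 160,000 for k = 4) and subtract those observed in the dataset. This is a minimal sketch under stated assumptions, not the authors' method. The function name `avoided_motifs` and the `alphabet` parameter are hypothetical, and any filtering or statistical treatment the authors may apply is not shown.

```python
from itertools import product
from typing import Iterable, Set

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def avoided_motifs(sequences: Iterable[str], k: int,
                   alphabet: str = AMINO_ACIDS) -> Set[str]:
    """Return all length-k motifs over `alphabet` absent from every sequence."""
    observed = set()
    for seq in sequences:
        # Slide a window of width k along the sequence.
        for i in range(len(seq) - k + 1):
            observed.add(seq[i:i + k])
    all_motifs = {"".join(p) for p in product(alphabet, repeat=k)}
    # Windows containing non-standard residues are simply never in all_motifs,
    # so they do not affect the result.
    return all_motifs - observed


# Toy example with a two-letter alphabet so the result is easy to check.
missing = avoided_motifs(["ACAC"], k=2, alphabet="AC")
print(sorted(missing))
```

In the toy example the windows are "AC", "CA" and "AC", so the motifs "AA" and "CC" are reported as missing.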


Molecules ◽  
2021 ◽  
Vol 26 (5) ◽  
pp. 1225
Author(s):  
Jiawen Cao ◽  
Tiantian Fan ◽  
Yanlian Li ◽  
Zhiyan Du ◽  
Lin Chen ◽  
...  

WD40 is a ubiquitous domain present in at least 361 human proteins that acts as a scaffold to form protein complexes. Among them, the WDR5 protein is an important mediator in several protein complexes, through which it exerts its functions in histone modification and chromatin remodeling. It has therefore been considered a promising epigenetic target for anti-cancer drug development. In view of the protein–protein interaction nature of WDR5, we initiated a campaign to discover new peptide-mimic inhibitors of WDR5. In the current study, we used the phage display technique and screened a disulfide-based cyclic peptide phage library. Five rounds of biopanning were performed and isolated clones were sequenced. Based on analysis of the sequences, a total of five peptides were synthesized for binding assays. Four of the peptides were shown to have moderate binding affinity. Finally, the detailed binding interactions were revealed by solving a WDR5-peptide cocrystal structure.


Mitochondrion ◽  
2015 ◽  
Vol 21 ◽  
pp. 27-32 ◽  
Author(s):  
Yang Xu ◽  
Ashim Malhotra ◽  
Steven M. Claypool ◽  
Mindong Ren ◽  
Michael Schlame
