protein universe Latest Research Papers

fLPS 2.0: rapid annotation of compositionally-biased regions in biological sequences

PeerJ ◽

10.7717/peerj.12363 ◽

2021 ◽

Vol 9 ◽

pp. e12363

Author(s):

Paul M. Harrison

Keyword(s):

Dark Matter ◽

Dna Sequences ◽

Low Complexity ◽

Biological Sequences ◽

Link Type ◽

Physico Chemical ◽

Protein Universe ◽

Supplemental File ◽

Chemical Character

Compositionally-biased (CB) regions in biological sequences are enriched for a subset of sequence residue types. These can be shorter regions with a concentrated bias (i.e., those termed ‘low-complexity’), or longer regions that have a compositional skew. These regions comprise a prominent class of the uncharacterized ‘dark matter’ of the protein universe. Here, I report the latest version of the fLPS package for the annotation of CB regions, which includes added consideration of DNA sequences, to label the eight possible biased regions of DNA. In this version, the user is now able to restrict analysis to a specified subset of residue types, and also to filter for previously annotated domains to enable detection of discontinuous CB regions. A ‘thorough’ option has been added which enables the labelling of subtler biases, typically made from a skew for several residue types. In the output, protein CB regions are now labelled with bias classes reflecting the physico-chemical character of the biasing residues. The fLPS 2.0 package is available from: https://github.com/pmharrison/flps2 or in a Supplemental File of this paper.
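To make the notion of a compositionally-biased region concrete, the following minimal Python sketch scores fixed-length windows by the binomial probability of seeing at least the observed count of one residue type. It is an illustration only and is not the fLPS 2.0 algorithm: the function names, the toy sequence, the window length, and the background frequency of 0.05 are all assumptions chosen for the example.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def most_biased_window(seq, residue, window, background):
    """Return (start, count, p_value) for the window most enriched in `residue`.

    `background` is the assumed frequency of `residue` in unbiased sequence.
    Ties are resolved in favour of the earliest window.
    """
    best = None
    for start in range(len(seq) - window + 1):
        count = seq[start:start + window].count(residue)
        p_value = binomial_tail(count, window, background)
        if best is None or p_value < best[2]:
            best = (start, count, p_value)
    return best


# Hypothetical toy protein with a ten-residue glutamine (Q) tract.
seq = "MKTLLVAG" + "Q" * 10 + "APLSDE"
start, count, p_value = most_biased_window(seq, "Q", window=10, background=0.05)
print(start, count, p_value)
```

A lower p-value marks a stronger bias. A short, concentrated tract like the one above corresponds to the 'low-complexity' case, while a longer window with a milder excess would correspond to a compositional skew.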

Evidence for the Emergence of β-Trefoils by ‘Peptide Budding’ from an IgG-like β-Sandwich

10.1101/2021.10.04.462989 ◽

2021 ◽

Author(s):

Liam M. Longo ◽

Rachel Kolodny ◽

Shawn E. McGlynn

Keyword(s):

De Novo ◽

General Trend ◽

Sequence Structure ◽

Structure Comparison ◽

Related Sequence ◽

Structure Space ◽

Protein Universe ◽

Remote Islands ◽

Comparison Algorithms ◽

Hallmark Feature

As sequence and structure comparison algorithms gain sensitivity, the intrinsic interconnectedness of the protein universe has become increasingly apparent. Despite this general trend, β-trefoils have emerged as an uncommon counterexample: They are an isolated protein lineage for which few, if any, sequence or structure associations to other lineages have been identified. If β-trefoils are, in fact, remote islands in sequence-structure space, it implies that the oligomerizing peptide that founded the β-trefoil lineage itself arose de novo. To better understand β-trefoil evolution, and to probe the limits of fragment sharing across the protein universe, we identified both ‘β-trefoil bridging themes’ (evolutionarily-related sequence segments) and ‘β-trefoil-like motifs’ (structure motifs with a hallmark feature of the β-trefoil architecture) in multiple, ostensibly unrelated, protein lineages. The success of the present approach stems, in part, from considering β-trefoil sequence segments or structure motifs rather than the β-trefoil architecture as a whole, as has been done previously. The newly uncovered inter-lineage connections presented here suggest a novel hypothesis about the origins of the β-trefoil fold itself – namely, that it is a derived fold formed by ‘budding’ from an Immunoglobulin-like β-sandwich protein. These results demonstrate how the emergence of a folded domain from a peptide need not be a signature of antiquity and underpin an emerging truth: few protein lineages escape nature’s sewing table.

NMR in structural genomics to increase structural coverage of the protein universe

NMR with Biological Macromolecules in Solution ◽

10.1142/9789811235795_0019 ◽

2021 ◽

pp. 143-154

Author(s):

Pedro Serrano ◽

Samit K. Dutta ◽

Andrew Proudfoot ◽

Biswaranjan Mohanty ◽

Lukas Susac ◽

...

Keyword(s):

Structural Genomics ◽

Protein Universe ◽

Structural Coverage

Fine tuned exploration of evolutionary relationships within the protein universe

Statistical Applications in Genetics and Molecular Biology ◽

10.1515/sagmb-2019-0039 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

Danilo Gullotto

Keyword(s):

Sequence Similarity ◽

Hierarchical Classification ◽

Domain Size ◽

Protein Motifs ◽

Physical Constraints ◽

Protein Universe ◽

Sequence Profiles ◽

Polypeptide Chains ◽

Hierarchical Features ◽

Protein Space

In the regime of domain classifications, the protein universe unveils a discrete set of folds connected by hierarchical relationships. Instead, at sub-domain-size resolution and because of physical constraints not necessarily requiring evolution to shape polypeptide chains, networks of protein motifs depict a continuous view that lies beyond the extent of hierarchical classification schemes. A number of studies, however, suggest that universal sub-sequences could be the descendants of peptides that emerged in an ancient pre-biotic world. Should this be the case, evolutionary signals retained by structurally conserved motifs, along with hierarchical features of ancient domains, could sew relationships among folds that diverged beyond the point where homology is discernible. In view of the aforementioned, this paper provides a rationale where a network with hierarchical and continuous levels of the protein space, together with sequence profiles that probe the extent of sequence similarity and contacting residues that capture the transition from pre-biotic to domain world, has been used to explore relationships between ancient folds. Statistics of detected signals have been reported. As a result, an example of an emergent sub-network that makes sense from an evolutionary perspective, where conserved signals retrieved from the assessed protein space have been co-opted, has been discussed.

Unsupervised multi-instance learning for protein structure determination

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720021400023 ◽

2021 ◽

Vol 19 (01) ◽

pp. 2140002

Author(s):

Fardina Fathmiul Alam ◽

Amarda Shehu

Keyword(s):

Structure Determination ◽

State Of The Art ◽

Protein Structure Determination ◽

Protein Molecule ◽

Significant Challenge ◽

Protein Molecules ◽

Protein Universe ◽

Three Stages ◽

Determination Methods ◽

Parametric Algorithms

Many regions of the protein universe remain inaccessible by wet-laboratory or computational structure determination methods. A significant challenge in elucidating these dark regions in silico relates to the ability to discriminate relevant structure(s) among many structures/decoys computed for a protein of interest, a problem known as decoy selection. Clustering decoys based on geometric similarity remains popular. However, it is unclear how exactly to exploit the groups of decoys revealed via clustering to select individual structures for prediction. In this paper, we provide an intuitive formulation of the decoy selection problem as an instance of unsupervised multi-instance learning. We address the problem in three stages, first organizing given decoys of a protein molecule into bags, then identifying relevant bags, and finally drawing individual instances from these bags to offer as prediction. We propose both non-parametric and parametric algorithms for drawing individual instances. Our evaluation utilizes two datasets, one benchmark dataset of ensembles of decoys for a varied list of protein molecules, and a dataset of decoy ensembles for targets drawn from recent CASP competitions. A comparative analysis with state-of-the-art methods reveals that the proposed approach outperforms existing methods, thus warranting further investigation of multi-instance learning to advance our treatment of decoy selection.
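The three stages described in this abstract (organizing decoys into bags, identifying relevant bags, and drawing individual instances) can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the authors' method: decoys are represented as hypothetical 2-D feature vectors, bags are formed by a simple greedy distance threshold, the most populated bag is assumed to be the relevant one, and the instance drawn is the bag medoid.

```python
from math import dist


def make_bags(decoys, radius):
    """Stage 1: greedy leader clustering; each bag is a list of decoy indices."""
    leaders, bags = [], []
    for i, decoy in enumerate(decoys):
        for leader, bag in zip(leaders, bags):
            if dist(decoy, decoys[leader]) <= radius:
                bag.append(i)
                break
        else:
            # No existing leader is close enough, so this decoy starts a new bag.
            leaders.append(i)
            bags.append([i])
    return bags


def pick_bag(bags):
    """Stage 2: treat the most populated bag as the relevant one (an assumption)."""
    return max(bags, key=len)


def draw_instance(decoys, bag):
    """Stage 3: return the medoid, the member closest in total to the others."""
    return min(bag, key=lambda i: sum(dist(decoys[i], decoys[j]) for j in bag))


# Hypothetical decoys embedded as 2-D points purely for illustration.
decoys = [(0.0, 0.0), (0.2, 0.0), (0.1, 0.1), (5.0, 5.0), (5.1, 5.0)]
bags = make_bags(decoys, radius=1.0)
chosen = draw_instance(decoys, pick_bag(bags))
print(bags, chosen)
```

In practice the distance would be a structural measure between decoys rather than a Euclidean distance on toy points, and the bag-selection and instance-drawing rules are where the non-parametric and parametric variants mentioned in the abstract would differ.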

Compositionally Biased Dark Matter in the Protein Universe

PROTEOMICS ◽

10.1002/pmic.201970134 ◽

2019 ◽

Vol 19 (15) ◽

pp. 1970134

Author(s):

Paul M. Harrison

Keyword(s):

Dark Matter ◽

Protein Universe

Dark Proteome Database: Studies on Dark Proteins

High-Throughput ◽

10.3390/ht8020008 ◽

2019 ◽

Vol 8 (2) ◽

pp. 8 ◽

Cited By ~ 2

Author(s):

Nelson Perdigão ◽

Agostinho Rosa

Keyword(s):

Homology Modeling ◽

3D Structure ◽

Model Organisms ◽

Human Organism ◽

Special Importance ◽

Experimental Characterization ◽

Proteome Database ◽

Protein Universe ◽

Cervical Mucosa ◽

Higher Eukaryotes

The dark proteome, as we define it, is the part of the proteome where 3D structure has not been observed either by homology modeling or by experimental characterization in the protein universe. From the 550,116 proteins available in Swiss-Prot (as of July 2016), 43.2% of the eukarya universe and 49.2% of the virus universe are part of the dark proteome. In bacteria and archaea, the percentage of the dark proteome presence is significantly less, at 12.6% and 13.3% respectively. In this work, we present a necessary step to complete the dark proteome picture by introducing the map of the dark proteome in the human and in other model organisms of special importance to mankind. The most significant result is that around 40% to 50% of the proteome of these organisms is still in the dark, where the higher percentages belong to higher eukaryotes (mouse and human organisms). Due to the amount of darkness present in the human organism being more than 50%, deeper studies were made, including the identification of ‘dark’ genes that are responsible for the production of so-called dark proteins, as well as the identification of the ‘dark’ tissues where dark proteins are overrepresented, namely, the heart, cervical mucosa, and natural killer cells. This is a step forward in the direction of gaining a deeper knowledge of the human dark proteome.

Dark Proteome Database: Studies on Dark Proteins

10.20944/preprints201901.0198.v1 ◽

2019 ◽

Author(s):

Nelson Perdigão

Keyword(s):

Natural Killer Cells ◽

Homology Modeling ◽

3D Structure ◽

Model Organisms ◽

Human Organism ◽

Experimental Characterization ◽

Proteome Database ◽

Protein Universe ◽

Cervical Mucosa ◽

Higher Eukaryotes

The dark proteome, as we define it, is the part of the proteome where 3D structure has not been observed either by homology modeling or by experimental characterization in the protein universe. From the 550,116 proteins available in Swiss-Prot (as of July 2016), 43.2% of the Eukarya universe and 49.2% of the Virus universe are part of the dark proteome. In Bacteria and Archaea, the percentage of the dark proteome presence is significantly less, at 12.6% and 13.3% respectively. In this work, we present the map of the dark proteome in Human and in other model organisms. The most significant result is that around 40% to 50% of the proteome of these organisms is still in the dark, where the higher percentages belong to higher eukaryotes (mouse and human organisms). Due to the amount of darkness present in the human organism being more than 50%, deeper studies were made, including the identification of ‘dark’ genes that are responsible for the production of the so-called dark proteins, as well as the identification of the ‘dark’ organs where dark proteins are overrepresented, namely heart, cervical mucosa and natural killer cells. This is a step forward in the direction of the human dark proteome.

Compositionally Biased Dark Matter in the Protein Universe

PROTEOMICS ◽

10.1002/pmic.201800069 ◽

2018 ◽

Vol 18 (21-22) ◽

pp. 1800069 ◽

Cited By ~ 4

Author(s):

Paul M. Harrison

Keyword(s):

Dark Matter ◽

Protein Universe

fLPS: Fast discovery of compositional biases for the protein universe

BMC Bioinformatics ◽

10.1186/s12859-017-1906-3 ◽

2017 ◽

Vol 18 (1) ◽

Cited By ~ 21

Author(s):

Paul M. Harrison

Keyword(s):

Protein Universe
