About the Protein Space Vastness

2020 ◽  
Vol 39 (5) ◽  
pp. 472-475
Author(s):  
Jorge A. Vila
2021 ◽  
Vol 22 (15) ◽  
pp. 7773
Author(s):  
Neann Mathai ◽  
Conrad Stork ◽  
Johannes Kirchmair

Experimental screening of large sets of compounds against macromolecular targets is a key strategy to identify novel bioactivities. However, large-scale screening requires substantial experimental resources and is time-consuming and challenging. Therefore, small to medium-sized compound libraries with a high chance of producing genuine hits on an arbitrary protein of interest would be of great value to fields related to early drug discovery, in particular biochemical and cell research. Here, we present a computational approach that incorporates drug-likeness, predicted bioactivities, biological space coverage, and target novelty, to generate optimized compound libraries with maximized chances of producing genuine hits for a wide range of proteins. The computational approach evaluates drug-likeness with a set of established rules, predicts bioactivities with a validated, similarity-based approach, and optimizes the composition of small sets of compounds towards maximum target coverage and novelty. We found that, in comparison to the random selection of compounds for a library, our approach generates substantially improved compound sets. Quantified as the “fitness” of compound libraries, the calculated improvements ranged from +60% (for a library of 15,000 compounds) to +184% (for a library of 1000 compounds). The best of the optimized compound libraries prepared in this work are available for download as a dataset bundle (“BonMOLière”).
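The idea of optimizing a small compound set towards maximum target coverage can be illustrated with a minimal sketch. This is not the authors' method: it swaps in a plain greedy set-cover heuristic, uses coverage alone as a stand-in for the paper's "fitness", and every compound name, target name, and function below is a hypothetical placeholder.

```python
import random

def coverage(library, predicted_targets):
    """Count distinct targets predicted to be hit by at least one compound."""
    covered = set()
    for compound in library:
        covered |= predicted_targets[compound]
    return len(covered)

def greedy_library(predicted_targets, size):
    """Greedy heuristic (an assumption, not the published algorithm):
    repeatedly add the compound that covers the most not-yet-covered targets."""
    chosen, covered = [], set()
    candidates = set(predicted_targets)
    while candidates and len(chosen) < size:
        # sorted() makes tie-breaking deterministic
        best = max(sorted(candidates),
                   key=lambda c: len(predicted_targets[c] - covered))
        chosen.append(best)
        covered |= predicted_targets[best]
        candidates.remove(best)
    return chosen

# Hypothetical toy data: compound -> set of predicted targets
predicted_targets = {
    "cpd_A": {"T1", "T2"},
    "cpd_B": {"T2"},
    "cpd_C": {"T3", "T4", "T5"},
    "cpd_D": {"T1"},
    "cpd_E": {"T5", "T6"},
}

optimized = greedy_library(predicted_targets, 2)
baseline = random.Random(0).sample(sorted(predicted_targets), 2)

opt_cov = coverage(optimized, predicted_targets)
base_cov = coverage(baseline, predicted_targets)
# Relative improvement over the random baseline, as a fraction
improvement = (opt_cov - base_cov) / base_cov
print(optimized, opt_cov, baseline, base_cov, improvement)
```

A fuller version would also weight drug-likeness and target novelty, which the abstract lists as further ingredients of the optimization.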


2013 ◽  
Vol 318 ◽  
pp. 197-204 ◽  
Author(s):  
Chenglong Yu ◽  
Mo Deng ◽  
Shiu-Yuen Cheng ◽  
Shek-Chung Yau ◽  
Rong L. He ◽  
...  

2013 ◽  
Vol 320 ◽  
pp. 152-158 ◽  
Author(s):  
Danilo Gullotto ◽  
Mario Salvatore Nolassi ◽  
Andrea Bernini ◽  
Ottavia Spiga ◽  
Neri Niccolai

2007 ◽  
Vol 2 (1) ◽  
Author(s):  
Mihaly Mezei ◽  
Ming-Ming Zhou

2020 ◽  
Author(s):  
Kentaro Tomii ◽  
Shravan Kumar ◽  
Degui Zhi ◽  
Steven E. Brenner

Background: Insertion and deletion sequencing errors are relatively common in next-generation sequencing data and produce long stretches of mistranslated sequence. These frameshifting errors can seriously damage downstream analysis of reads. However, it is possible to obtain more precise alignment of DNA sequences by taking into account both the coding frame and sequencing errors estimated from quality scores. Results: Here we designed a novel hidden Markov model (HMM)-based pairwise alignment algorithm, Meta-Align, that aligns DNA sequences in the protein space, incorporating quality scores from the DNA sequences and allowing frameshifts caused by insertions and deletions. Our model is based on both an HMM transducer of a pair HMM and profile HMMs for all possible amino acid pairs. A Viterbi algorithm over our model produces the optimal alignment of a pair of metagenomic reads, taking into account all possible translating frames and gap penalties in both the protein space and the DNA space. To reduce the sheer number of states of this model, we also derived and implemented a computationally feasible model, leveraging the degeneracy of the genetic code. In a benchmark test on a diverse set of simulated reads based on BAliBASE, we show that Meta-Align outperforms TBLASTX, which compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database using the BLAST algorithm. We also demonstrate the effects of incorporating quality scores on Meta-Align. Conclusions: Meta-Align will be particularly effective when applied to error-prone DNA sequences. The software package can be downloaded at https://github.com/shravan-repos/Metaalign.
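To make the notion of six-frame translation and frameshifts concrete, here is a minimal sketch in plain Python. It is not Meta-Align or TBLASTX code: it only translates a DNA string in all six reading frames using the standard genetic code, and the function names are illustrative assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    """Reverse-complement a DNA string; unknown bases become N."""
    return "".join(COMPLEMENT.get(b, "N") for b in reversed(seq))

def translate(seq):
    """Translate complete codons only; unknown codons become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def six_frames(seq):
    """Return the three forward and three reverse-strand translations."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames

# A single inserted base moves the downstream protein into another frame
original = "ATGGCCAAA"
with_insertion = "ATGTGCCAAA"  # one extra T after the start codon
print(translate(original))
print(six_frames(with_insertion))
```

In the example, the residues downstream of the insertion no longer appear in frame +1 but reappear in frame +2, which is the situation a frameshift-aware aligner has to bridge within a single alignment.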


2012 ◽  
Vol 9 (1) ◽  
pp. 286-293 ◽  
Author(s):  
Noah Daniels ◽  
Anoop Kumar ◽  
Lenore Cowen ◽  
Matt Menke

2003 ◽  
Vol 4 (5) ◽  
pp. 542-548
Author(s):  
Michal Linial

Structural genomics strives to represent the entire protein space. The first step towards this goal is to rationally select proteins whose structures have not been determined, but that represent an as yet unknown structural superfamily or fold. Once such a structure is solved, it can be used as a template for modelling homologous proteins. This will aid in unveiling the structural diversity of the protein space. Currently, no reliable method for accurate 3D structural prediction is available when no sequence or structural homologue is available. Here we present a systematic methodology for selecting target proteins whose structure is likely to adopt a new, as yet unknown superfamily or fold. Our method takes advantage of a global classification of the sequence space as presented by ProtoNet-3D, which is a hierarchical agglomerative clustering of the proteins of interest (the proteins in Swiss-Prot) along with all solved structures (taken from the PDB). By navigating in the scaffold of ProtoNet-3D, we obtain a prioritized list of proteins that are not yet structurally solved, along with the probability of each protein belonging to a new superfamily or fold. The sorted list has been self-validated against real structural data that was not available when the predictions were made. The practical application of our computational–statistical method to determine novel superfamilies for structural genomics projects is also discussed.
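The general idea of clustering proteins together with solved structures, then flagging clusters that contain no solved member, can be sketched as follows. This is an illustrative toy, not ProtoNet-3D: it assumes average linkage with a fixed distance cutoff, omits the probability estimates described above, and all protein names and distances are hypothetical.

```python
def agglomerate(distances, threshold):
    """Average-linkage agglomerative clustering (an assumed linkage choice).
    `distances` maps frozenset({a, b}) -> distance; merging stops once the
    closest pair of clusters is farther apart than `threshold`."""
    items = sorted({x for pair in distances for x in pair})
    clusters = [frozenset([item]) for item in items]

    def dist(a, b):
        pairs = [(x, y) for x in a for y in b]
        return sum(distances[frozenset((x, y))] for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        d, a, b = min(
            ((dist(a, b), a, b)
             for i, a in enumerate(clusters)
             for b in clusters[i + 1:]),
            key=lambda t: t[0],
        )
        if d > threshold:
            break
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters

# Hypothetical toy data: P1 has a solved structure, the others do not
distances = {
    frozenset(("P1", "P2")): 0.1,
    frozenset(("P3", "P4")): 0.2,
    frozenset(("P1", "P3")): 0.9,
    frozenset(("P1", "P4")): 0.9,
    frozenset(("P2", "P3")): 0.9,
    frozenset(("P2", "P4")): 0.9,
}
solved = {"P1"}

clusters = agglomerate(distances, threshold=0.5)
# Clusters with no solved member are candidate targets for structure determination
candidates = [c for c in clusters if not (c & solved)]
print(sorted(sorted(c) for c in candidates))
```

Here P2 clusters with the solved P1 and can presumably be modelled from it, while the P3/P4 cluster has no structural representative and would be prioritized.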


2006 ◽  
Vol 39 (2) ◽  
pp. 185-193 ◽  
Author(s):  
Rocco Caliandro ◽  
Benedetta Carrozzini ◽  
Giovanni Luca Cascarano ◽  
Liberato De Caro ◽  
Carmelo Giacovazzo ◽  
...  

A new program for molecular replacement, REMO, has been written. In the rotation step, the orientation of the model molecule is found by rotating the weighted reciprocal lattice of the protein with respect to the calculated transform of the model structure: the fitting is searched in the reciprocal space. The space group of the model structure is assumed to be the symmorphic variant of the protein space group. The algebra necessary to optimize the correlation factor between protein and model structure-factor moduli is described. The oriented model molecule is located by using the correlation function coupled with a translation function calculated by fast Fourier transforms. REMO has been successfully applied to a variety of test problems and extensively compared with other currently available molecular replacement programs.

