evolutionary information
Recently Published Documents



2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Danyu Jin ◽  
Ping Zhu

Predicting protein subcellular localization is important for the study of protein structure and function and can also facilitate the design and development of new drugs. In recent years, feature extraction methods based on protein evolutionary information have attracted much attention and made good progress. Based on the protein position-specific scoring matrix (PSSM) obtained by PSI-BLAST, the PSSM-GSD method is proposed according to the characteristics of the data distribution. To reflect as much of the protein sequence information as possible, the AAO, PSSM-AAO, and PSSM-GSD methods are fused together. A conditional entropy-based classifier chain algorithm and a support vector machine are then used to locate multilabel proteins. Finally, we test on the Gpos-mPLoc and Gneg-mPLoc datasets; given the severe imbalance of the data, the SMOTE algorithm is used to oversample the minority classes. The experiments show that the AAO + PSSM ∗ method in the paper achieves overall accuracies of 83.1% and 86.8%, respectively. In an experimental comparison with other methods, AAO + PSSM ∗ performs well and can effectively predict protein subcellular location.
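The SMOTE oversampling step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' code: the function name `smote_sketch`, the parameter `k`, and the toy data are assumptions made for illustration, and a real pipeline would more likely use an established implementation. The idea shown is the core of SMOTE: a synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    minority: list of feature vectors (lists of floats), at least two of them.
    k:        number of nearest minority neighbours to choose from (assumed value).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x, excluding x itself
        others = [p for p in minority if p is not x]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        neighbour = rng.choice(neighbours)
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + lam * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Toy minority class with three 2-D feature vectors (hypothetical data)
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_sketch(minority, n_new=5)
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region already occupied by the minority class.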


2021 ◽  
Author(s):  
Angelina Cordone ◽  
Alessandro Coppola ◽  
Angelica Severino ◽  
Monica Correggia ◽  
Matteo Selci ◽  
...  

Comparative genomics is a research field that compares the genomes of different life forms, providing information on the organization of the compared genomes in terms of both structure and encoded functions. Moreover, this approach provides a powerful tool to study and understand the evolutionary changes and adaptation among organisms. Comparative genomics can be used to compare phylogenetically close marine organisms showing different vital strategies and lifestyles and to obtain information regarding specific adaptations and/or their evolutionary history. Here we report a basic comparative genomics protocol to extrapolate evolutionary information about a protein of interest conserved across diverse marine microbes. The outlined approach can be used in a number of different settings and might help to gain new insight into the evolution and adaptation of marine microorganisms.


Molecules ◽  
2021 ◽  
Vol 26 (23) ◽  
pp. 7314
Author(s):  
Subash C. Pakhrin ◽  
Kiyoko F. Aoki-Kinoshita ◽  
Doina Caragea ◽  
Dukka B. KC

Protein N-linked glycosylation is a post-translational modification that plays an important role in a myriad of biological processes. Computational prediction approaches serve as complementary methods for the characterization of glycosylation sites. Most of the existing predictors for N-linked glycosylation rely on the fact that the glycosylation site occurs at the N-X-[S/T] sequon, where X is any amino acid except proline. Not all N-X-[S/T] sequons are glycosylated; thus the N-X-[S/T] sequon is a necessary but not sufficient determinant for protein glycosylation. In that regard, computational prediction of N-linked glycosylation sites confined to N-X-[S/T] sequons is an important problem. Here, we report DeepNGlyPred, a deep learning-based approach that encodes the positive and negative sequences in the human proteome dataset (extracted from N-GlycositeAtlas) using sequence-based features (gapped-dipeptide), predicted structural features, and evolutionary information. DeepNGlyPred produces SN, SP, MCC, and ACC of 88.62%, 73.92%, 0.60, and 79.41%, respectively, on the N-GlyDE independent test set, which is better than the compared approaches. These results demonstrate that DeepNGlyPred is a robust computational technique to predict N-linked glycosylation sites confined to N-X-[S/T] sequons. DeepNGlyPred will be a useful resource for the glycobiology community.
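The N-X-[S/T] sequon rule described in the abstract can be sketched as a short scan over a protein sequence. This only locates candidate sites; as the abstract notes, a sequon is necessary but not sufficient for glycosylation, so it is not a predictor. The function name `find_sequons` and the example sequence are illustrative assumptions, and the pattern does not check that every character is a valid amino acid code.

```python
import re

# N, then any residue except proline, then S or T.
# The lookahead keeps the match one character wide so overlapping sequons are found.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence):
    """Return 0-based positions of the asparagine in each N-X-[S/T] sequon."""
    return [m.start() for m in SEQUON.finditer(sequence.upper())]


# Hypothetical toy sequence: "NAS" and "NNT" are sequons, "NPT" is excluded by the proline.
print(find_sequons("NASNPTNNT"))  # [0, 6]
```

A predictor such as the one described would then classify each candidate position returned here as glycosylated or not.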


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Seema Mishra ◽  
Santosh Kumar ◽  
Kesaban Sankar Roy Choudhuri ◽  
Imliyangla Longkumer ◽  
Praveena Koyyada ◽  
...  

Abstract STAT3, an important transcription factor constitutively activated in cancers, is bound specifically by GRIM-19, and this interaction inhibits STAT3-dependent gene expression. GRIM-19 is therefore considered an inhibitor of STAT3 and may be an effective anti-cancer therapeutic target. While STAT3 exists in a dimeric form in the cytoplasm and nucleus, it is mostly present in a monomeric form in the mitochondria. Although GRIM-19-binding domains of STAT3 have been identified in independent experiments, the identified domains are not the same, and hence discrepancies exist. The human STAT3-GRIM-19 complex has not been crystallised yet. Dictated by fundamental biophysical principles, the binding region, interactions and effects of hotspot mutations can provide a clue to the negative regulatory mechanisms of GRIM-19. Prompted by the challenging nature of STAT3, and to understand the structural basis of binding and interactions in the STAT3α-GRIM-19 complex, we performed homology modelling and ab-initio modelling with evolutionary information using I-TASSER and avant-garde AlphaFold2, respectively, to generate monomeric, and subsequently dimeric, STAT3α structures. The dimeric form of STAT3α was observed to potentially exist in an anti-parallel orientation of monomers. We demonstrate that during the interactions with both unphosphorylated and phosphorylated STAT3α, the NTD of GRIM-19 binds most strongly to the NTD of STAT3α, in direct contrast to earlier works. Key arginine residues at positions 57, 58 and 68 of GRIM-19 are mainly involved in the hydrogen-bonded interactions. An intriguing feature of these arginine residues is that they display a consistent interaction pattern across unphosphorylated and phosphorylated monomers as well as unphosphorylated dimers in STAT3α-GRIM-19 complexes. MD studies verified the stability of these complexes. Analysing the binding affinity and stability through free energy changes upon mutation, we found GRIM-19 mutations Y33P and Q61L and, among GRIM-19 arginines, R68P and R57M to be among the top major and minor disruptors of binding, respectively. The proportionate increase in average change in binding affinity upon mutation was inclined more towards GRIM-19 mutants, leading to the surmise that GRIM-19 may play a greater role in complex formation. These studies propound a novel structural perspective of STAT3α-GRIM-19 binding and inhibitory mechanisms in both the monomeric and dimeric forms of STAT3α as compared to that observed in earlier experiments, whose observations are inconsistent with one another.


2021 ◽  
Vol 12 ◽  
Author(s):  
Lauren E. Eldred ◽  
R. Greg Thorn ◽  
David Roy Smith

Simple nucleotide matching identification methods are not as accurate as once thought at identifying environmental fungal sequences. This is largely because of incorrect naming and the underrepresentation of various fungal groups in reference datasets. Here, we explore these issues by examining an environmental metabarcoding dataset of partial large subunit rRNA sequences of Basidiomycota and basal fungi. We employed the simple matching method using the QIIME 2 classifier and the RDP Classifier in conjunction with the latest releases of the SILVA (138.1, 2020) and RDP (11, 2014) reference datasets and then compared the results with a manual phylogenetic binning approach. Of the 71 query sequences tested, 21 and 42% were misidentified using QIIME 2 and the RDP Classifier, respectively. Of these simple matching misidentifications, more than half resulted from the underrepresentation of various groups of fungi in the SILVA and RDP reference datasets. More comprehensive reference datasets with fewer misidentified sequences will increase the accuracy of simple matching identifications. However, we argue that the phylogenetic binning approach is a better alternative to simple matching since, in addition to better accuracy, it provides evolutionary information about query sequences.


2021 ◽  
Author(s):  
Gibran Renoy Pérez-Toledo ◽  
Fabricio Villalobos ◽  
Rogerio R. Silva ◽  
Claudia E. Moreno ◽  
Marcio Pie ◽  
...  

Abstract Despite the long-standing interest in the organization of ant communities across elevational gradients, few studies have incorporated evolutionary information to understand the historical processes that underlie such patterns. Through the evaluation of phylogenetic α- and β-diversity, we analyzed the structure of leaf-litter ant communities along the Cofre de Perote mountain in Mexico and inferred its putative driving forces. Lowland and some highland sites showed phylogenetic clustering, whereas intermediate elevations and the highest site presented phylogenetic overdispersion. We infer that strong environmental constraints found at the bottom and the top elevations favor closely related species at those elevations. Conversely, more benign conditions at intermediate elevations suggest that interspecific interactions are more important in these environments. Total phylogenetic dissimilarity was driven by the turnover component, indicating that the turnover of ant species along the mountain actually reflects shifts of lineages adapted to particular locations resembling their ancestral niche. The greater phylogenetic dissimilarity between communities was related to greater temperature distances, probably due to narrow thermal tolerances inherent to several ant lineages that evolved in more stable conditions. Our results suggest that the interplay between environmental filtering, interspecific competition and habitat specialization plays an important role in the assembly of leaf-litter ant communities along elevational gradients.


2021 ◽  
Author(s):  
Emidio Capriotti ◽  
Piero Fariselli

Abstract Evolutionary information is the primary tool for detecting functional conservation in nucleic acids and proteins. This information has been extensively used to predict structure, interactions and functions in macromolecules. Pathogenicity prediction models rely on multiple sequence alignment information at different levels. However, the most accurate genome-wide variant deleteriousness ranking algorithms consider different features to assess the impact of variants. Here, we analyze three different ways of extracting evolutionary information from sequence alignments in the context of pathogenicity predictions at the DNA and protein levels. We show that protein sequence-based information is slightly more informative in the annotation of ClinVar missense variants than that obtained at the DNA level. Furthermore, to achieve the performance of state-of-the-art methods, such as CADD, the conservation of reference and variant, encoded as frequencies of reference/alternate alleles or wild-type/mutant residues, should be included. Our results on a large set of missense variants show that a basic method based on three input features derived from the protein sequence profile performs similarly to the CADD algorithm, which uses hundreds of genomic features. This observation indicates that for missense variants, evolutionary information, when properly encoded, plays the primary role in ranking pathogenicity.
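As a rough illustration of encoding conservation as frequencies of wild-type and mutant residues, the sketch below counts residues in one column of a multiple sequence alignment. It is a toy under stated assumptions, not the method of the paper: the function name `residue_frequencies`, the unweighted counting, the gap handling, and the example alignment are all hypothetical, and the abstract does not specify the three input features used.

```python
from collections import Counter


def residue_frequencies(alignment, position, wild_type, mutant, gap="-"):
    """Return (wild-type frequency, mutant frequency) at one alignment column.

    alignment: list of equal-length aligned sequences (strings).
    position:  0-based column index.
    Gapped sequences are ignored at that column (an assumed convention).
    """
    column = [seq[position] for seq in alignment if seq[position] != gap]
    if not column:
        return 0.0, 0.0
    counts = Counter(column)
    total = len(column)
    return counts[wild_type] / total, counts[mutant] / total


# Hypothetical four-sequence alignment; the variant of interest is K -> R at column 1.
alignment = ["MKV", "MRV", "MKV", "M-V"]
wt_freq, mut_freq = residue_frequencies(alignment, 1, "K", "R")
```

A mutant residue that is rarely or never observed at a well-conserved column is the kind of signal a pathogenicity ranking would draw on; real profiles typically also weight sequences and add pseudocounts.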


Molecules ◽  
2021 ◽  
Vol 26 (17) ◽  
pp. 5359
Author(s):  
Jie Pan ◽  
Li-Ping Li ◽  
Zhu-Hong You ◽  
Chang-Qing Yu ◽  
Zhong-Hao Ren ◽  
...  

Identification of drug–target interactions (DTIs) is vital for drug discovery. However, traditional biological approaches have some unavoidable shortcomings, such as being time consuming and expensive. Therefore, there is an urgent need to develop novel and effective computational methods to predict DTIs in order to shorten the development cycles of new drugs. In this study, we present a novel computational approach to identify DTIs, which uses protein sequence information and the dual-tree complex wavelet transform (DTCWT). More specifically, a position-specific scoring matrix (PSSM) was computed for the target protein sequence to obtain its evolutionary information. Then, DTCWT was used to extract representative features from the PSSM, which were combined with the drug fingerprint features to form the feature descriptors. Finally, these descriptors were sent to the Rotation Forest (RoF) model for classification. A 5-fold cross validation (CV) was adopted on four datasets (Enzyme, Ion Channel, GPCRs (G-protein-coupled receptors), and NRs (Nuclear Receptors)) to validate the proposed model; our method yielded high average accuracies of 89.21%, 85.49%, 81.02%, and 74.44%, respectively. To further verify the performance of our model, we compared the RoF classifier with two state-of-the-art algorithms: the support vector machine (SVM) and the k-nearest neighbor (KNN) classifier. We also compared it with some other published methods. Moreover, the prediction results for the independent dataset further indicated that our method is effective for predicting potential DTIs. Thus, we believe that our method is suitable for facilitating drug discovery and development.


2021 ◽  
Author(s):  
Mateo Gray ◽  
Sean Chester ◽  
Hosna Jabbari

Abstract Background: Improving the prediction of structures, especially those containing pseudoknots (structures with crossing base pairs), is an ongoing challenge. Homology-based methods utilize structural similarities within a family to predict the structure. However, their prediction is limited to the consensus structure and by the quality of the alignment. Minimum free energy (MFE) based methods, on the other hand, do not rely on familial information and can predict structures of novel RNA molecules. Their prediction normally suffers from inaccuracies due to their underlying energy parameters. Results: We present a new method for prediction of RNA pseudoknotted secondary structures that combines the strengths of MFE prediction and alignment-based methods. KnotAli takes a multiple RNA sequence alignment and uses covariation and thermodynamic energy minimization to predict secondary structures for each individual sequence in the alignment. We compared KnotAli’s performance to that of three other alignment-based programs on a large data set of 10 families with pseudoknotted and pseudoknot-free reference structures. We produced sequence alignments for each family using two well-known sequence aligners (MUSCLE and MAFFT). We found KnotAli to be superior in 6 of the 10 families for MUSCLE and 7 of the 10 for MAFFT. Conclusions: We find KnotAli’s predictions to be less dependent on alignment quality. In particular, KnotAli is shown to have more accurate predictions compared to other leading methods as alignment quality deteriorates. KnotAli can be found online on GitHub at https://github.com/mateog4712/KnotAli
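The definition of a pseudoknot used in the abstract (a structure with crossing base pairs) can be made concrete with a small check. This sketch is unrelated to KnotAli's own code; the function name `has_pseudoknot` and the representation of a structure as a list of index pairs are assumptions for illustration. Two base pairs (i, j) and (k, l) cross when i < k < j < l.

```python
from itertools import combinations


def has_pseudoknot(base_pairs):
    """Return True if any two base pairs cross, i.e. i < k < j < l."""
    # Normalise each pair so the smaller index comes first
    pairs = [tuple(sorted(pair)) for pair in base_pairs]
    for (i, j), (k, l) in combinations(pairs, 2):
        if i < k < j < l or k < i < l < j:
            return True
    return False


# Hypothetical structures given as 0-based paired positions
nested = [(0, 8), (2, 5)]    # second pair lies inside the first: no pseudoknot
crossing = [(0, 5), (2, 8)]  # pairs interleave: pseudoknot
print(has_pseudoknot(nested), has_pseudoknot(crossing))  # False True
```

The pairwise check is quadratic in the number of base pairs, which is adequate for an illustration on short structures.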

