PyMod: sequence similarity searches, multiple sequence-structure alignments, and homology modeling within PyMOL

2012 ◽  
Vol 13 (Suppl 4) ◽  
pp. S2 ◽  
Author(s):  
Emanuele Bramucci ◽  
Alessandro Paiardini ◽  
Francesco Bossa ◽  
Stefano Pascarella

The ever-increasing number of sequences in protein databases means that sequence similarity searches usually return large numbers of homologs. While information from homology can be very useful for functional prediction based on amino acid conservation, many of these homologs usually have high levels of identity among themselves, which hinders multiple sequence alignment (MSA) computation and, especially, visualization. More generally, high redundancy reduces the usability of a protein set in machine learning applications and biases statistical analyses. We developed an algorithm to identify redundant sequence homologs that can be culled to produce a streamlined FASTA file. Unlike other automatic approaches that only aggregate sequences with high identity, our method clusters near-full-length homologs, allowing for lower sequence identity thresholds. Our method was fully tested and implemented in a web application called FASTA Herder, publicly available at http://www.ogic.ca/projects/fh/orain.html .


Author(s):  
Miguel Andrade ◽  
Caroline Louis-Jeune ◽  
Carol Perez-Iratxeta

Abstract The ever-increasing number of sequences in protein databases means that sequence similarity searches usually return large numbers of homologs. While information from homology can be very useful for functional prediction based on amino acid conservation, many of these homologs usually have high levels of identity among themselves, which hinders multiple sequence alignment computation and, especially, visualization. More generally, high redundancy reduces the usability of a protein set in machine learning applications and biases statistical analyses. We developed an algorithm to identify redundant sequence homologs that can be culled to produce a streamlined FASTA file. Unlike other automatic approaches that only aggregate sequences with high identity, our method clusters near-full-length homologs, allowing for lower sequence identity thresholds. Our method was fully tested and implemented in a web application called FASTA Herder, publicly available at http://fh.ogic.ca/.
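
The abstract does not describe the FASTA Herder algorithm itself, so the following is only a minimal sketch of the simpler baseline it contrasts with: greedy culling of sequences by pairwise identity. It assumes the sequences are already aligned to equal length; the function names, the toy records, and the 80% threshold are hypothetical and chosen purely for illustration.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over columns where neither pre-aligned sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def cull_redundant(records, threshold=90.0):
    """Greedy culling (an assumed baseline, not the FASTA Herder method).

    A sequence is kept only if its identity to every previously kept
    representative is below the threshold.
    """
    kept = []
    for name, seq in records:
        if all(percent_identity(seq, rep) < threshold for _, rep in kept):
            kept.append((name, seq))
    return kept


# Toy, hypothetical records: s2 differs from s1 at a single position.
records = [("s1", "MKTAYIAK"), ("s2", "MKTAYIAR"), ("s3", "MQSPLVGH")]
kept = cull_redundant(records, threshold=80.0)
print([name for name, _ in kept])
```

Because the procedure is greedy, the result depends on input order; a real tool would typically sort by length or quality first.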


2019 ◽  
Vol 15 (4) ◽  
pp. 353-362
Author(s):  
Sambhaji B. Thakar ◽  
Maruti J. Dhanavade ◽  
Kailas D. Sonawane

Background: Legume plants are known for their rich medicinal and nutritional values. A large amount of medicinal information on various legume plants is dispersed in the form of text. Objective: It is essential to design and construct a legume medicinal plants database that integrates the respective classes of legumes and includes knowledge regarding medicinal applications along with their protein/enzyme sequences. Methods: The Legume Medicinal Plants Database (LegumeDB) was designed and developed using Microsoft Structured Query Language (SQL) Server 2017 as the back-end DBMS, ASP.Net for the front-end operations, and VB.Net as the programming language. Multiple sequence alignment, phylogenetic analysis and homology modeling techniques were also used. Results: This database includes information on 50 legume medicinal species, which might help researchers explore this information. Further, maturase K (matK) protein sequences of legumes and mangroves were retrieved from NCBI for multiple sequence alignment and phylogenetic analysis to understand the evolutionary lineage between legumes and mangroves. Homology modeling was used to determine the three-dimensional structure of matK from a legume species, Vigna unguiculata, using matK of a mangrove species, Thespesia populnea, as a template. The matK sequence analysis results indicate the conserved residues among legume and mangrove species. Conclusion: Phylogenetic analysis revealed closeness between the legume species Vigna unguiculata and the mangrove species Thespesia populnea, indicating their similarity and origin from a common ancestor. Thus, these studies might be helpful to understand the evolutionary relationship between legumes and mangroves. LegumeDB availability: http://legumedatabase.co.in


Author(s):  
Sona. S Dev ◽  
P. Poornima ◽  
Akhil Venu

Eggplant or brinjal (Solanum melongena L.) is highly susceptible to various soil-borne diseases. The extensive use of chemical fungicides to combat these diseases can be minimized by identifying resistance gene analogs (RGAs) in wild species of cultivated plants. In the present study, degenerate PCR primers for the conserved regions of nucleotide binding site-leucine rich repeat (NBS-LRR) genes were used to amplify RGAs from wild relatives of eggplant (black nightshade (Solanum nigrum), Indian nightshade (Solanum violaceum) and Solanum incanum) which showed resistance to the bacterial wilt pathogen, Ralstonia solanacearum, in the preliminary investigation. The amino acid sequences of the amplicons, when compared to each other and to the amino acid sequences of known RGAs deposited in GenBank, revealed significant sequence similarity. The phylogenetic analysis indicated that they belonged to the toll interleukin-1 receptor (TIR)-NBS-LRR type R genes. Multiple sequence alignment with other known R genes showed significant homology with the P-loop, Kinase 2 and GLPL domains of NBS-LRR class genes. There has been no report on R genes from these wild eggplants, and hence the diversity analysis of these novel RGAs can lead to the identification of other novel R genes within the germplasm of different brinjal plants as well as other species of Solanum.


2020 ◽  
Vol 6 (10) ◽  
Author(s):  
Ao Li ◽  
Elisabeth Laville ◽  
Laurence Tarquis ◽  
Vincent Lombard ◽  
David Ropartz ◽  
...  

Mannoside phosphorylases are involved in the intracellular metabolization of mannooligosaccharides, and are also useful enzymes for the in vitro synthesis of oligosaccharides. They are found in glycoside hydrolase family GH130. Here we report on an analysis of 6308 GH130 sequences, including 4714 from the human, bovine, porcine and murine microbiomes. Using sequence similarity networks, we divided the diversity of sequences into 15 mostly isofunctional meta-nodes; of these, 9 contained no experimentally characterized member. By examining the multiple sequence alignments in each meta-node, we predicted the determinants of the phosphorolytic mechanism and linkage specificity. We thus hypothesized that eight uncharacterized meta-nodes would be phosphorylases. These sequences are characterized by the absence of signal peptides and of the catalytic base. Those sequences with the conserved E/K, E/R and Y/R pairs of residues involved in substrate binding would target β-1,2-, β-1,3- and β-1,4-linked mannosyl residues, respectively. These predictions were tested by characterizing members of three of the uncharacterized meta-nodes from gut bacteria. We discovered the first known β-1,4-mannosyl-glucuronic acid phosphorylase, which targets a motif of the Shigella lipopolysaccharide O-antigen. This work uncovers a reliable strategy for the discovery of novel mannoside-phosphorylases, reveals possible interactions between gut bacteria, and identifies a biotechnological tool for the synthesis of antigenic oligosaccharides.


2020 ◽  
Vol 49 (D1) ◽  
pp. D192-D200 ◽  
Author(s):  
Ioanna Kalvari ◽  
Eric P Nawrocki ◽  
Nancy Ontiveros-Palacios ◽  
Joanna Argasinska ◽  
Kevin Lamkiewicz ◽  
...  

Abstract Rfam is a database of RNA families where each of the 3444 families is represented by a multiple sequence alignment of known RNA sequences and a covariance model that can be used to search for additional members of the family. Recent developments have involved expert collaborations to improve the quality and coverage of Rfam data, focusing on microRNAs, viral and bacterial RNAs. We have completed the first phase of synchronising microRNA families in Rfam and miRBase, creating 356 new Rfam families and updating 40. We established a procedure for comprehensive annotation of viral RNA families starting with Flavivirus and Coronaviridae RNAs. We have also increased the coverage of bacterial and metagenome-based RNA families from the ZWD database. These developments have enabled a significant growth of the database, with the addition of 759 new families in Rfam 14. To facilitate further community contribution to Rfam, expert users are now able to build and submit new families using the newly developed Rfam Cloud family curation system. New Rfam website features include a new sequence similarity search powered by RNAcentral, as well as search and visualisation of families with pseudoknots. Rfam is freely available at https://rfam.org.


2020 ◽  
Vol 40 (11) ◽  
Author(s):  
Kevin J. McNaught ◽  
Elizabeth T. Wiles ◽  
Eric U. Selker

ABSTRACT Polycomb repressive complex 2 (PRC2) catalyzes methylation of histone H3 at lysine 27 (H3K27) in genomic regions of most eukaryotes and is critical for maintenance of the associated transcriptional repression. However, the mechanisms that shape the distribution of H3K27 methylation, such as recruitment of PRC2 to chromatin and/or stimulation of PRC2 activity, are unclear. Here, using a forward genetic approach in the model organism Neurospora crassa, we identified two alleles of a gene, NCU04278, encoding an unknown PRC2 accessory subunit (PAS). Loss of PAS resulted in losses of H3K27 methylation concentrated near the chromosome ends and derepression of a subset of associated subtelomeric genes. Immunoprecipitation followed by mass spectrometry confirmed reciprocal interactions between PAS and known PRC2 subunits, and sequence similarity searches demonstrated that PAS is not unique to N. crassa. PAS homologs likely influence the distribution of H3K27 methylation and underlying gene repression in a variety of fungal lineages.


Entropy ◽  
2019 ◽  
Vol 21 (8) ◽  
pp. 764 ◽  
Author(s):  
Eshel Faraggi ◽  
A. Keith Dunker ◽  
Robert L. Jernigan ◽  
Andrzej Kloczkowski

Entropy should directly reflect the extent of disorder in proteins. By clustering structurally related proteins and studying the multiple sequence alignment (MSA) of the sequences of these clusters, we were able to link sequence, structure, and disorder information. We introduced several parameters as measures of fluctuations at a given MSA site and used these as representative of the sequence and structure entropy at that site. In general, we found a tendency for negative correlations between disorder and structure, and significant positive correlations between disorder and the fluctuations in the system. We also found evidence for residue-type conservation for those residues proximate to potentially disordered sites. Mutations at the disordered site itself appear to be allowed. In addition, we found a positive correlation between disorder and accessible surface area, validating that disordered residues occur in exposed regions of proteins. Finally, we also found that fluctuations in the dihedral angles at the original mutated residue and disorder are positively correlated, while dihedral angle fluctuations in spatially proximal residues are negatively correlated with disorder. Our results seem to indicate permissible variability in the disordered site, but greater rigidity in the parts of the protein with which the disordered site interacts. This is another indication that disordered residues are involved in protein function.
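
The abstract does not specify which parameters were used to measure fluctuations at an MSA site. One common choice, assumed here purely for illustration, is the Shannon entropy of the residue distribution in each alignment column. The function names and the toy alignment below are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (in bits) of the residues in one MSA column, ignoring gaps."""
    counts = Counter(c for c in column if c != "-")
    total = sum(counts.values())
    if total == 0:
        return 0.0  # a column made only of gaps carries no residue information
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def msa_entropies(alignment):
    """Per-column entropy for a list of equal-length aligned sequences."""
    if len({len(seq) for seq in alignment}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy("".join(col)) for col in zip(*alignment)]


# Toy alignment: the first column is fully conserved, the second is split evenly.
print(msa_entropies(["AC", "AG"]))
```

A fully conserved column scores 0 bits, and higher values indicate more variable sites.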


Author(s):  
RA Begum ◽  
MT Alam ◽  
H Jahan ◽  
MS Alam

Labeo calbasu (Family Cyprinidae) was studied at the DNA level to assess genetic diversity within and between species. The mitochondrial cytochrome b (cyt-b) gene of L. calbasu was sequenced and compared to the corresponding sequences of other Labeo species. DNA was isolated from a tissue sample of L. calbasu using the phenol:chloroform extraction method. Forward and reverse primers were designed to amplify the target region of the cytochrome b gene. A standard PCR protocol was used for the amplification of the desired region. Then, the forward and reverse sequences obtained were aligned and edited to finalize a length of 510 nucleotides, which was submitted to the NCBI GenBank database. Nucleotide BLAST of this sequence at NCBI resulted in 100% sequence similarity with the L. calbasu sequence of the same region of the cyt-b gene. Multiple sequence alignment of the sequence with those of seven more Labeo species revealed 120 polymorphic sites, which mark the diversity among the species and might be used in molecular identification of the Labeo species. A constructed phylogenetic tree showed the relationships among the Labeo species. This research demonstrated the usefulness of a mitochondrial DNA-based approach in species identification. Further, the data will provide an appropriate background for studying within-species genetic diversity of the Labeo species in general and of L. calbasu in particular. J. Biodivers. Conserv. Bioresour. Manag. 2019, 5(1): 25-30
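
As an illustration of how polymorphic sites can be counted in a multiple sequence alignment, here is a minimal sketch. It assumes a site is polymorphic when at least two different bases appear in the column after gaps and ambiguous `N` calls are ignored; the function name and toy alignment are hypothetical and are not taken from the study.

```python
def polymorphic_sites(alignment, ignore="-N"):
    """Return 0-based column indices where more than one distinct base occurs.

    Characters listed in `ignore` (gaps and ambiguous calls by default)
    are not counted as variants.
    """
    if len({len(seq) for seq in alignment}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    ignored = set(ignore)
    sites = []
    for index, column in enumerate(zip(*alignment)):
        bases = {base.upper() for base in column} - ignored
        if len(bases) > 1:
            sites.append(index)
    return sites


# Toy alignment: only the last column varies once the gap is ignored.
toy_alignment = ["ACGT", "ACGA", "AC-T"]
print(polymorphic_sites(toy_alignment))
```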


2020 ◽  
Vol 21 (S6) ◽  
Author(s):  
Sriram P. Chockalingam ◽  
Jodh Pannu ◽  
Sahar Hooshmand ◽  
Sharma V. Thankachan ◽  
Srinivas Aluru

Abstract Background: Alignment-free methods for sequence comparisons have become popular in many bioinformatics applications, specifically in the estimation of sequence similarity measures to construct phylogenetic trees. Recently, the average common substring measure, ACS, and its k-mismatch counterpart, ACS_k, have been shown to produce results as effective as multiple sequence alignment based methods for the reconstruction of phylogenetic trees. Since computing ACS_k takes O(n log^k n) time and is hence impractical for large datasets, multiple heuristics that can approximate ACS_k have been introduced. Results: In this paper, we present a novel linear-time heuristic to approximate ACS_k, which is faster than computing the exact ACS_k while being closer to the exact ACS_k values compared to previously published linear-time greedy heuristics. Using four real datasets, containing both DNA and protein sequences, we evaluate our algorithm in terms of accuracy and runtime, and demonstrate its applicability for phylogeny reconstruction. Our algorithm provides better accuracy than previously published heuristic methods, while being comparable in its applications to phylogeny reconstruction. Conclusions: Our method produces a better approximation for ACS_k and is applicable for the alignment-free comparison of biological sequences at highly competitive speed. The algorithm is implemented in the Rust programming language and the source code is available at https://github.com/srirampc/adyar-rs.
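
To make the average common substring idea concrete, here is a deliberately naive sketch of the exact-match ACS value only: for each start position in one sequence, take the length of the longest substring beginning there that also occurs somewhere in the other sequence, then average over all positions. This is not the paper's linear-time heuristic and does not handle the k-mismatch variant; the function names are hypothetical, and the brute-force search is only suitable for short toy inputs.

```python
def longest_match_from(a: str, i: int, b: str) -> int:
    """Length of the longest prefix of a[i:] that occurs as a substring of b."""
    length = 0
    while i + length < len(a) and a[i:i + length + 1] in b:
        length += 1
    return length


def acs(a: str, b: str) -> float:
    """Average, over all start positions in a, of the longest exact match found in b.

    Naive brute force for illustration; practical implementations rely on
    suffix trees or suffix arrays instead.
    """
    if not a:
        raise ValueError("first sequence must be non-empty")
    return sum(longest_match_from(a, i, b) for i in range(len(a))) / len(a)


print(acs("ACGT", "CGTA"))  # match lengths per position are 1, 3, 2, 1
```

Note that the value is not symmetric in its arguments, so distance measures built on it usually combine both directions.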

