Flexible information visualization of multivariate data from biological sequence similarity searches

Author(s):  
E.H.-H. Chi ◽  
J. Riedl ◽  
E. Shoop ◽  
J.V. Carlis ◽  
E. Retzel ◽  
...  


2012 ◽  
Vol 13 (Suppl 4) ◽  
pp. S2 ◽  
Author(s):  
Emanuele Bramucci ◽  
Alessandro Paiardini ◽  
Francesco Bossa ◽  
Stefano Pascarella

Author(s):  
Dan Wei ◽  
Qingshan Jiang ◽  
Sheng Li

Similarity analysis of DNA sequences is a fundamental research area in bioinformatics. The characteristic distribution of L-tuples (tuples of length L) reflects valuable information contained in a biological sequence and thus may be used in DNA sequence similarity analysis. However, similarity analysis based on the characteristic distribution of L-tuples is not effective for comparing highly conserved sequences. In this paper, a new similarity measurement approach based on Triplets of Nucleic Acid Bases (TNAB) is introduced for DNA sequence similarity analysis. The new approach characterizes both the content feature and the position feature of a DNA sequence using the frequency and position of occurrence of each TNAB in the sequence. The experimental results show that the TNAB-based approach is effective for analysing DNA sequence similarity.
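As a rough illustration of the idea in this abstract, the sketch below builds a feature vector from the frequency and the mean position of each of the 64 triplets, then compares two sequences by Euclidean distance. The abstract does not give the paper's formulas, so the function names (`tnab_features`, `tnab_distance`), the use of the normalised mean position as the position feature, and the Euclidean distance are all assumptions made for illustration.

```python
from itertools import product
from math import sqrt

# All 64 triplets over the DNA alphabet, in a fixed order.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]


def tnab_features(seq):
    """Return [freq, mean_pos] for each triplet (128 values in total).

    Hypothetical feature definition, not taken from the paper:
    freq is the share of overlapping windows occupied by the triplet,
    mean_pos is its average 1-based start position divided by the
    number of windows.
    """
    seq = seq.upper()
    n = len(seq) - 2  # number of overlapping triplet windows
    counts = {t: 0 for t in TRIPLETS}
    pos_sums = {t: 0 for t in TRIPLETS}
    for i in range(n):
        t = seq[i:i + 3]
        if t in counts:  # skip windows with ambiguous bases such as N
            counts[t] += 1
            pos_sums[t] += i + 1
    features = []
    for t in TRIPLETS:
        freq = counts[t] / n if n > 0 else 0.0
        mean_pos = pos_sums[t] / counts[t] / n if counts[t] else 0.0
        features.extend([freq, mean_pos])
    return features


def tnab_distance(a, b):
    """Euclidean distance between the two feature vectors (an assumed choice)."""
    fa, fb = tnab_features(a), tnab_features(b)
    return sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))
```

Two sequences with the same triplet content but a different arrangement get a non-zero distance here because the position component differs, which is the property the abstract highlights for closely related sequences.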


2020 ◽  
Vol 40 (11) ◽  
Author(s):  
Kevin J. McNaught ◽  
Elizabeth T. Wiles ◽  
Eric U. Selker

ABSTRACT Polycomb repressive complex 2 (PRC2) catalyzes methylation of histone H3 at lysine 27 (H3K27) in genomic regions of most eukaryotes and is critical for maintenance of the associated transcriptional repression. However, the mechanisms that shape the distribution of H3K27 methylation, such as recruitment of PRC2 to chromatin and/or stimulation of PRC2 activity, are unclear. Here, using a forward genetic approach in the model organism Neurospora crassa, we identified two alleles of a gene, NCU04278, encoding an unknown PRC2 accessory subunit (PAS). Loss of PAS resulted in losses of H3K27 methylation concentrated near the chromosome ends and derepression of a subset of associated subtelomeric genes. Immunoprecipitation followed by mass spectrometry confirmed reciprocal interactions between PAS and known PRC2 subunits, and sequence similarity searches demonstrated that PAS is not unique to N. crassa. PAS homologs likely influence the distribution of H3K27 methylation and underlying gene repression in a variety of fungal lineages.


2005 ◽  
Vol 71 (10) ◽  
pp. 6104-6114 ◽  
Author(s):  
D. J. Koch ◽  
C. Rückert ◽  
D. A. Rey ◽  
A. Mix ◽  
A. Pühler ◽  
...  

ABSTRACT Corynebacterium glutamicum ATCC 13032 was found to be able to utilize a broad range of sulfonates and sulfonate esters as sulfur sources. The two gene clusters potentially involved in sulfonate utilization, ssuD1CBA and ssuI-seuABC-ssuD2, were identified in the genome of C. glutamicum ATCC 13032 by similarity searches. While the ssu genes encode proteins resembling Ssu proteins from Escherichia coli or Bacillus subtilis, the seu gene products exhibited similarity to the dibenzothiophene-degrading Dsz monooxygenases of Rhodococcus strain IGTS8. Growth tests with the C. glutamicum wild-type and appropriate mutant strains showed that the clustered genes ssuC, ssuB, and ssuA, putatively encoding the components of an ABC-type transporter system, are required for the utilization of aliphatic sulfonates. In C. glutamicum sulfonates are apparently degraded by sulfonatases encoded by ssuD1 and ssuD2. It was also found that the seu genes seuA, seuB, and seuC can effectively replace ssuD1 and ssuD2 for the degradation of sulfonate esters. The utilization of all sulfonates and sulfonate esters tested is dependent on a novel putative reductase encoded by ssuI. Obviously, all monooxygenases encoded by the ssu and seu genes, including SsuD1, SsuD2, SeuA, SeuB, and SeuC, which are reduced flavin mononucleotide dependent according to sequence similarity, have SsuI as an essential component. Using real-time reverse transcription-PCR, the ssu and seu gene cluster was found to be expressed considerably more strongly during growth on sulfonates and sulfonate esters than during growth on sulfate.


2021 ◽  
Author(s):  
Yoonjin Kim ◽  
Zhen Guo ◽  
Jeffrey A. Robertson ◽  
Benjamin Reidys ◽  
Ziyan Zhang ◽  
...  

Biological sequence alignment using computational power has received increasing attention as technology develops. It is important to predict whether a novel DNA sequence is potentially dangerous by determining its taxonomic identity and functional characteristics through sequence identification. This task can be facilitated by the rapidly increasing amounts of biological data in DNA and protein databases, made possible by the corresponding decrease in computational and storage costs. Unfortunately, the growth in biological databases has made this information difficult to exploit. EnTrance presents an approach that can expedite the analysis of such large databases by employing entropy scaling, which allows search time to scale with the amount of entropy in the database instead of with its absolute size. Since DNA and protein sequences are biologically meaningful, the space of biological sequences demonstrates the structure exploited by entropy scaling. As biological sequence databases grow, taking advantage of this structure can be extremely beneficial for reducing query times. EnTrance, the entropy scaling search algorithm introduced here, accelerates the biological sequence search exemplified by tools such as BLAST. It does this with a two-step search approach: EnTrance first quickly reduces the number of potential matches and then searches the remaining sequences more exhaustively. Tests of EnTrance show that this approach can lead to improved query times. However, constructing the required entropy scaling indices beforehand can be challenging. To improve performance, EnTrance investigates several ideas for accelerating the build time of the indices that support entropy scaling searches. In particular, EnTrance makes full use of the concurrency features of the Go language, greatly reducing the index build time. Our results identify key tradeoffs and demonstrate that there is potential in using these techniques for sequence similarity searches. Finally, EnTrance returns more matches and higher percentage identity matches when compared with existing tools.
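The two-step search described above can be sketched in a few lines. This is a minimal illustration of a coarse-then-fine search over clustered sequences, not EnTrance's implementation (which the abstract says is written in Go). The Hamming distance on equal-length strings, the greedy clustering, and the names `build_index` and `search` are assumptions chosen to keep the example small.

```python
def hamming(a, b):
    """Number of mismatching positions; assumes equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def build_index(seqs, radius):
    """Greedily group sequences into clusters of the given radius.

    Each cluster is a (center, members) pair. A sequence joins the first
    cluster whose center is within `radius`, otherwise it starts a new one.
    """
    clusters = []
    for s in seqs:
        for center, members in clusters:
            if hamming(center, s) <= radius:
                members.append(s)
                break
        else:
            clusters.append((s, [s]))
    return clusters


def search(clusters, query, radius, max_dist):
    """Return all indexed sequences within `max_dist` of `query`.

    Coarse step: compare the query with cluster centers only. By the
    triangle inequality, a cluster can contain a hit only if its center
    is within radius + max_dist of the query.
    Fine step: scan the members of the surviving clusters.
    """
    hits = []
    for center, members in clusters:
        if hamming(center, query) <= radius + max_dist:
            hits.extend(m for m in members if hamming(m, query) <= max_dist)
    return hits
```

Because Hamming distance is a metric, the coarse filter never discards a true match, so the result equals a brute-force scan while skipping clusters that are far from the query. The fewer clusters the data needs, the more work the coarse step saves.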


Genes ◽  
2019 ◽  
Vol 10 (11) ◽  
pp. 885 ◽  
Author(s):  
Tomii ◽  
Santos ◽  
Nozaki

Tetraspanins are membrane proteins involved in intra- and/or intercellular signaling and in membrane protein complex formation. In some organisms, their role is associated with virulence and pathogenesis. Here, we investigate known and potential tetraspanins in the human intestinal protozoan parasite Entamoeba histolytica. We conducted sequence similarity searches against the proteome data of E. histolytica and identified nine uncharacterized proteins as new potential tetraspanins. We found three subgroups within known and potential tetraspanins, as well as subgroup-associated features in both their amino acid and nucleotide sequences. We also examined the subcellular localization of a few representative tetraspanins that might be related to pathogenicity. The results of this study could be useful resources for further understanding and downstream analyses of tetraspanins in Entamoeba.

