Pro-Frame: similarity-based gene recognition in eukaryotic DNA sequences with errors

2001 ◽  
Vol 17 (1) ◽  
pp. 13-15 ◽  
Author(s):  
A. A. Mironov ◽  
P. S. Novichkov ◽  
M. S. Gelfand
2001 ◽  
Vol 17 (11) ◽  
pp. 1011-1018 ◽  
Author(s):  
P. S. Novichkov ◽  
M. S. Gelfand ◽  
A. A. Mironov

2014 ◽  
Vol 2014 ◽  
pp. 1-14 ◽  
Author(s):  
Guangchen Liu ◽  
Yihui Luan

The identification of protein-coding regions (exons) plays a critical role in eukaryotic gene structure prediction. Many techniques have been introduced for discriminating between exons and introns in eukaryotic DNA sequences, such as those based on the discrete Fourier transform (DFT), but DFT-based methods rapidly lose their effectiveness on short DNA sequences. In this paper, a novel integrated algorithm based on autoregressive spectrum analysis and the wavelet packet transform is presented to improve the efficiency and accuracy of coding region identification. Experimental results on the GENSCAN65, HMR195, and BG570 benchmark datasets show that the new algorithm distinctly outperforms conventional DFT-based approaches in the prediction accuracy of protein-coding regions.
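For orientation, the conventional DFT-based baseline mentioned in this abstract can be sketched as follows. This is a minimal illustration of the widely used period-3 spectral measure, not the autoregressive and wavelet packet algorithm the paper proposes. The function names, the window length of 351, and the step of 3 are assumptions chosen for the example.

```python
import cmath


def period3_power(seq):
    """Spectral power at frequency 1/3, summed over the four base indicator sequences."""
    seq = seq.upper()
    total = 0.0
    for base in "ACGT":
        # DFT coefficient at frequency 1/3 of the 0/1 indicator sequence for this base.
        coeff = sum(cmath.exp(-2j * cmath.pi * n / 3)
                    for n, ch in enumerate(seq) if ch == base)
        total += abs(coeff) ** 2
    return total


def sliding_period3(seq, window=351, step=3):
    """Return (start, power) pairs for each full window along the sequence."""
    return [(start, period3_power(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]
```

Windows with a strong codon-like periodicity give a high value, while sequences without period-3 structure give a value near zero. Because each window must be long enough to resolve the peak, a measure of this kind degrades on short sequences, which is the limitation the abstract points to.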


2021 ◽  
Vol 12 ◽  
Author(s):  
Kai Song

Metagenomes can be considered as mixtures of viral, bacterial, and eukaryotic DNA sequences. Mining viral sequences from metagenomes could provide insight into virus–host relationships and expand viral databases. Current alignment-based methods are unsuitable for identifying viral sequences in metagenomic data because most assembled metagenomic contigs are short and contain few or no predicted genes, and most metagenomic viral genes are dissimilar to known viral genes. In this study, I developed a Markov model-based method, VirMC, to identify viral sequences from metagenomic data. VirMC uses Markov chains to model sequence signatures and constructs a scoring model using a likelihood test to distinguish viral from bacterial sequences. Compared with two other state-of-the-art viral sequence-prediction methods, VirFinder and PPR-Meta, my proposed method outperformed VirFinder and performed similarly to PPR-Meta for short contigs of less than 400 bp. VirMC outperformed both VirFinder and PPR-Meta in identifying viral sequences in metagenomic samples contaminated with eukaryotic sequences. VirMC also showed better performance in assembling viral-genome sequences from metagenomic data (based on filtering potential bacterial reads). Applying VirMC to human gut metagenomes from healthy subjects and patients with type-2 diabetes (T2D) revealed that viral contigs could help classify healthy and diseased statuses. This alignment-free method complements gene-based alignment approaches and will significantly improve the precision of viral sequence identification.
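The general idea of scoring a contig with two Markov chains and a log-likelihood ratio can be sketched as below. This is an illustrative sketch only and not VirMC's implementation: the chain order, the pseudocount, and all function names are assumptions made for the example.

```python
import math
from collections import defaultdict
from itertools import product


def train_markov(seqs, order=2, pseudocount=1.0):
    """Estimate log transition probabilities P(next base | previous `order` bases)."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        s = s.upper()
        for i in range(order, len(s)):
            ctx, nxt = s[i - order:i], s[i]
            if set(ctx + nxt) <= set("ACGT"):  # skip ambiguous bases such as N
                counts[ctx][nxt] += 1
    model = {}
    for ctx in ("".join(p) for p in product("ACGT", repeat=order)):
        total = sum(counts[ctx].values()) + 4 * pseudocount
        model[ctx] = {b: math.log((counts[ctx][b] + pseudocount) / total)
                      for b in "ACGT"}
    return model


def log_likelihood(seq, model, order=2):
    """Sum of log transition probabilities along the sequence."""
    seq = seq.upper()
    ll = 0.0
    for i in range(order, len(seq)):
        ctx, nxt = seq[i - order:i], seq[i]
        if ctx in model and nxt in model[ctx]:
            ll += model[ctx][nxt]
    return ll


def llr_score(seq, model_a, model_b, order=2):
    """Positive when the sequence is better explained by model_a than by model_b."""
    return log_likelihood(seq, model_a, order) - log_likelihood(seq, model_b, order)
```

In this sketch, `model_a` would be trained on known viral sequences and `model_b` on bacterial sequences, and a contig would be called viral when the score exceeds a chosen threshold. No alignment or gene prediction is involved, which is why such a score can still be computed for short contigs.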


Fractals ◽  
1995 ◽  
Vol 03 (02) ◽  
pp. 269-284 ◽  
Author(s):  
S. HAVLIN ◽  
S.V. BULDYREV ◽  
A.L. GOLDBERGER ◽  
R.N. MANTEGNA ◽  
C.-K. PENG ◽  
...  

We present evidence supporting the idea that the DNA sequence in genes containing noncoding regions is correlated, and that the correlation is remarkably long range—indeed, base pairs thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the “non-stationarity” feature of the sequence of base pairs by applying a new algorithm called Detrended Fluctuation Analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and noncoding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to all eukaryotic DNA sequences (33 301 coding and 29 453 noncoding) in the entire GenBank database. We describe a simple model to account for the presence of long-range power-law correlations which is based upon a generalization of the classic Lévy walk. Finally, we describe briefly some recent work showing that the noncoding sequences have certain statistical features in common with natural languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts, and the Shannon approach to quantifying the “redundancy” of a linguistic text in terms of a measurable entropy function. We suggest that noncoding regions in plants and invertebrates may display a smaller entropy and larger redundancy than coding regions, further supporting the possibility that noncoding regions of DNA may carry biological information.
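A minimal sketch of Detrended Fluctuation Analysis applied to a DNA walk is given below, assuming the common purine/pyrimidine mapping (+1 for C or T, -1 for A or G) and linear detrending in non-overlapping boxes. The box sizes and function names are assumptions for illustration, and the code is not taken from the paper.

```python
import math


def dna_walk(seq):
    """Running sum of steps: +1 for pyrimidines (C, T), -1 for purines (A, G)."""
    y, total = [], 0
    for ch in seq.upper():
        if ch in "CT":
            total += 1
        elif ch in "AG":
            total -= 1
        else:
            continue  # ignore ambiguous symbols
        y.append(total)
    return y


def _linfit(xs, ys):
    """Least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (v - my) for x, v in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def fluctuation(y, box):
    """Root-mean-square residual after removing a linear trend in each box."""
    xs = list(range(box))
    sq, count = 0.0, 0
    for start in range(0, len(y) - box + 1, box):
        seg = y[start:start + box]
        slope, intercept = _linfit(xs, seg)
        sq += sum((v - (slope * x + intercept)) ** 2 for x, v in zip(xs, seg))
        count += box
    return math.sqrt(sq / count)


def dfa_exponent(seq, boxes=(8, 16, 32, 64, 128)):
    """Slope of log F(n) against log n over the given box sizes."""
    y = dna_walk(seq)
    log_n = [math.log(b) for b in boxes]
    log_f = [math.log(fluctuation(y, b)) for b in boxes]
    slope, _ = _linfit(log_n, log_f)
    return slope
```

An exponent near 0.5 indicates an uncorrelated sequence, while values clearly above 0.5 indicate long-range power-law correlations of the kind the abstract reports for noncoding regions. Detrending within each box is what lets the method cope with the non-stationarity mentioned above.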


2016 ◽  
Author(s):  
Rachel E. Diner ◽  
Chari M. Noddings ◽  
Nathan C. Lian ◽  
Anthony K. Kang ◽  
Jeffrey B. McQuaid ◽  
...  

Centromeres are essential for cell division and growth in all eukaryotes, and knowledge of their sequence and structure guides the development of artificial chromosomes for functional cellular biology studies. Centromeric proteins are conserved among eukaryotes; however, centromeric DNA sequences are highly variable. We combined forward and reverse genetic approaches with chromatin immunoprecipitation to identify centromeres of the model diatom Phaeodactylum tricornutum. Diatom centromere sequences contain low GC content regions and an abundance of long contiguous AT windows, but lack repeats or other conserved sequence features. Native and foreign sequences of similar GC content can maintain episomes and recruit the diatom centromeric histone protein CENP-A, suggesting non-native sequences can also function as diatom centromeres. Thus, simple sequence requirements enable DNA from foreign sources to incorporate into the nuclear genome repertoire as stable extra-chromosomal episomes, revealing a potential mechanism for bacterial and foreign eukaryotic DNA acquisition.
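The two sequence features named in this abstract, GC content and contiguous AT stretches, are simple to compute. The sketch below is only an illustration: the window size, the step, the function names, and the reading of "contiguous AT window" as an uninterrupted run of A and T bases are all assumptions, not the authors' definitions.

```python
import re


def gc_content(seq):
    """Fraction of G and C among unambiguous bases; 0.0 for an empty sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def longest_at_run(seq):
    """Length of the longest uninterrupted stretch consisting only of A and T."""
    runs = re.findall(r"[AT]+", seq.upper())
    return max((len(r) for r in runs), default=0)


def windowed_gc(seq, window=100, step=50):
    """Return (start, GC fraction) pairs for windows along the sequence."""
    last_start = max(len(seq) - window, 0)
    return [(s, gc_content(seq[s:s + window]))
            for s in range(0, last_start + 1, step)]
```

Scanning a candidate region with `windowed_gc` and `longest_at_run` gives a quick view of whether it has the low-GC, AT-rich character described above.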


Author(s):  
R.W. DAVIS ◽  
M. THOMAS ◽  
D. BENTON ◽  
J. CAMERON ◽  
P. PHILIPPSEN ◽  
...  

1980 ◽  
Vol 77 (8) ◽  
pp. 4852-4856 ◽  
Author(s):  
N. Hsiung ◽  
H. Warrick ◽  
J. K. deRiel ◽  
D. Tuan ◽  
B. G. Forget ◽  
...  
