Representation of DNA sequences in genetic codon context with applications in exon and intron prediction

2015 ◽  
Vol 13 (02) ◽  
pp. 1550004 ◽  
Author(s):  
Changchuan Yin

To apply digital signal processing (DSP) methods to DNA sequences, the sequences must first be mapped into numerical sequences. Effective numerical mappings therefore play a key role in the performance of DSP-based methods such as exon prediction. Despite the numerous mappings of symbolic DNA sequences to numerical series, existing mapping methods do not include the genetic coding features of DNA sequences. We present a novel numerical representation of DNA sequences using genetic codon context (GCC), in which the numerical values are optimized by simulated annealing to maximize the 3-periodicity signal-to-noise ratio (SNR). The optimized GCC representation is then applied to exon and intron prediction using a Short-Time Fourier Transform (STFT) approach. The results show that the GCC method enhances the SNR values of exon sequences and thus increases the accuracy of predicting protein coding regions in genomes compared with the commonly used 4D binary representation. In addition, this study offers a novel way to reveal specific features of DNA sequences by optimizing numerical mappings of symbolic DNA sequences.
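
The 3-periodicity SNR mentioned in this abstract can be illustrated with the baseline 4D binary representation it is compared against. The sketch below is not the GCC method: it assumes one common definition, in which the SNR is the total power at frequency bin N/3 divided by the average power over all bins, and the function name `period3_snr` is hypothetical.

```python
import cmath


def period3_snr(seq):
    """Period-3 SNR of a DNA string under the 4D binary (indicator) mapping.

    Assumed definition (not taken from the abstract): power at bin N/3,
    summed over the four indicator sequences, divided by the mean power
    over all N bins.
    """
    seq = seq.upper()
    n = len(seq) - len(seq) % 3      # truncate so that N/3 is an integer bin
    seq = seq[:n]
    counted = sum(seq.count(b) for b in "ACGT")
    if counted == 0:
        raise ValueError("sequence has no A/C/G/T symbols to analyze")

    # At bin k = N/3 the DFT twiddle factor exp(-2*pi*i*k*t/N) depends only on t mod 3.
    twiddle = [cmath.exp(-2j * cmath.pi * t / 3) for t in range(3)]

    peak_power = 0.0
    for base in "ACGT":
        # DFT coefficient of the 0/1 indicator sequence of this base at bin N/3.
        coeff = sum(twiddle[i % 3] for i, ch in enumerate(seq) if ch == base)
        peak_power += abs(coeff) ** 2

    # Parseval: the mean over all N bins of the summed indicator power spectra
    # equals the number of A/C/G/T symbols, so no full DFT is needed.
    return peak_power / counted
```

A perfectly codon-periodic toy string such as `"ATG" * 20` scores high under this definition, while a homopolymer such as `"A" * 60` scores close to zero.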

2019 ◽  
Vol 31 (01) ◽  
pp. 1950002
Author(s):  
Subhajit Kar ◽  
Madhabi Ganguly ◽  
Saptarshi Das

Digital signal processing (DSP) is playing a vital role in biomedical engineering research on predicting protein coding regions (exons) from genomic sequences with high accuracy. The protein coding regions of a DNA sequence can be located with the help of the period-3 property. The DFT algorithm is most often used to detect the period-3 property; in this paper, we test the FFT algorithm instead. DSP is concerned with processing numerical sequences, so applying it to DNA sequence analysis requires converting the sequence of base characters to a numerical version. The choice of numerical representation strongly affects how well the biological properties are reflected in the numerical sequence. In this work, the proposed technique, based on the DIT-FFT algorithm, is used to identify exonic regions, with an integer value representation used to transform the DNA sequences. Digital filters are used to extract the period-3 components from the output spectrum and to eliminate unwanted high-frequency noise from the DNA sequences. Overcoming background noise here means suppressing the non-coding regions, i.e., introns. The proposed algorithm is tested on four nucleotide sequences having single or multiple exons.
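
The combination of an integer mapping and a decimation-in-time (DIT) FFT described in this abstract can be sketched as follows. The specific integer values (T=0, C=1, A=2, G=3) are an assumption, since the abstract does not state them, and `fft_dit` and `period3_power` are hypothetical names. Because a radix-2 FFT needs a power-of-two length, the bin nearest to N/3 is used as an approximation of the period-3 component.

```python
import cmath

# Assumed integer mapping; the abstract does not give the actual values.
INT_MAP = {"T": 0, "C": 1, "A": 2, "G": 3}


def fft_dit(x):
    """Recursive radix-2 decimation-in-time FFT (length must be a power of two)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_dit(x[0::2])   # FFT of even-indexed samples
    odd = fft_dit(x[1::2])    # FFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd part
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def period3_power(window):
    """Spectral power near the period-3 frequency for one window of bases."""
    x = [float(INT_MAP[b]) for b in window.upper()]
    mean = sum(x) / len(x)
    spectrum = fft_dit([v - mean for v in x])  # remove DC before transforming
    k = round(len(x) / 3)                      # nearest bin to N/3
    return abs(spectrum[k]) ** 2
```

Sliding `period3_power` along a sequence in fixed-size windows and plotting the result gives the kind of period-3 profile in which peaks suggest candidate exons.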


2018 ◽  
Vol 7 (2.17) ◽  
pp. 116 ◽  
Author(s):  
Srinivasareddy Putluri ◽  
Md Zia Ur Rahman

In bioinformatics, locating the exon fragments in a deoxyribonucleic acid (DNA) sequence is an important task. The study of protein coding regions is widely used in the identification of diseases and the design of drugs. The regions of DNA that carry the protein coding information are termed exons; hence, identifying the exon segments in a genomic sequence is a crucial job in bioinformatics. The three base periodicity (TBP) observed in these regions of DNA sequences can be readily determined by applying signal processing methods. Adaptive signal processing techniques have been found to be more useful than other available methods because of their unique capability to alter weight coefficients based on the genomic sequence. Based on these considerations, we propose efficient adaptive exon predictors (AEPs) using the Proportionate Normalized LMS (PNLMS) algorithm and the Maximum Proportionate Normalized LMS (MPNLMS) algorithm to improve exon locating ability and convergence. To ease the complexity of the computations in the denominator during the filtering process, the proposed AEPs using PNLMS and its maximum variants are combined with signature algorithms. Hybrid variants of the proposed AEPs include the PNLMS, DCPNLMS, ECPNLMS, SSPNLMS, MPNLMS, MDCPNLMS, MECPNLMS and MSSPNLMS algorithms. The AEP based on MDCPNLMS was shown to be superior for exon identification in terms of the performance measures, with sensitivity 0.7346, specificity 0.7483 and precision 0.7325 for a genomic sequence with accession AF009962 at a threshold of 0.8. Finally, the capability of several AEPs in predicting exon locations is verified using different DNA sequences found in the National Center for Biotechnology Information (NCBI) gene database.
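
To make the idea of proportionate weight updates concrete, here is a minimal sketch of one textbook-style PNLMS update applied to a generic system identification problem. It is not the authors' adaptive exon predictor: the function name `pnlms_identify`, the parameter defaults, and the exact normalization are assumptions, and the hybrid variants named in the abstract are not shown.

```python
def pnlms_identify(x, d, taps, mu=0.5, rho=0.01, delta_p=0.01, eps=1e-8):
    """Estimate FIR weights so that the filter output tracks d, given input x.

    Each weight gets a step size roughly proportional to its own magnitude,
    which speeds up convergence when only a few weights are large (sparse
    systems). All parameter defaults are illustrative assumptions.
    """
    w = [0.0] * taps
    buf = [0.0] * taps                      # most recent inputs, newest first
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))
        e = dn - y                          # a priori error

        # Proportionate gains: floor small weights so they can still adapt.
        l_max = max(delta_p, max(abs(wi) for wi in w))
        g = [max(rho * l_max, abs(wi)) for wi in w]
        g_mean = sum(g) / taps
        g = [gi / g_mean for gi in g]       # gains average to 1

        # Normalize by the gain-weighted input energy (one common variant).
        norm = sum(gi * bi * bi for gi, bi in zip(g, buf)) + eps
        w = [wi + mu * gi * bi * e / norm for wi, gi, bi in zip(w, g, buf)]
    return w
```

In an exon-prediction setting the input would be a numerically mapped DNA sequence and the desired signal a reference that exhibits three base periodicity, but those details depend on the paper and are not reproduced here.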


2019 ◽  
Vol 116 (16) ◽  
pp. 8070-8079 ◽  
Author(s):  
Jonathan E. Venetz ◽  
Luca Del Medico ◽  
Alexander Wölfle ◽  
Philipp Schächle ◽  
Yves Bucher ◽  
...  

Understanding how to program biological functions into artificial DNA sequences remains a key challenge in synthetic genomics. Here, we report the chemical synthesis and testing of Caulobacter ethensis-2.0 (C. eth-2.0), a rewritten bacterial genome composed of the most fundamental functions of a bacterial cell. We rebuilt the essential genome of Caulobacter crescentus through the process of chemical synthesis rewriting and studied the genetic information content at the level of its essential genes. Within the 785,701-bp genome, we used sequence rewriting to reduce the number of encoded genetic features from 6,290 to 799. Overall, we introduced 133,313 base substitutions, resulting in the rewriting of 123,562 codons. We tested the biological functionality of the genome design in C. crescentus by transposon mutagenesis. Our analysis revealed that 432 essential genes of C. eth-2.0, corresponding to 81.5% of the design, are equal in functionality to natural genes. These findings suggest that neither changing mRNA structure nor changing the codon context have significant influence on biological functionality of synthetic genomes. Discovery of 98 genes that lost their function identified essential genes with incorrect annotation, including a limited set of 27 genes where we uncovered noncoding control features embedded within protein-coding sequences. In sum, our results highlight the promise of chemical synthesis rewriting to decode fundamental genome functions and its utility toward the design of improved organisms for industrial purposes and health benefits.


Author(s):  
J. S. Ashwin ◽  
N. Manoharan

This paper presents a novel audio de-noising scheme for a given speech signal. Recovering the original signal from a communication channel without any noise is a difficult task. Many de-noising techniques have been proposed for removing noise from a digital signal. In this paper, an audio de-noising technique based on the Short-Time Fourier Transform (STFT) is implemented. The proposed architecture uses a novel approach to estimate environmental noise from speech adaptively. Original speech signals are given as the input, and noise is added to them using additive white Gaussian noise (AWGN). The noisy signals are then de-noised using STFT techniques. Finally, the Signal to Noise Ratio (SNR) and Peak Signal to Noise Ratio (PSNR) values of the noisy and de-noised signals are obtained.


Author(s):  
Muneer Ahmad

Biologically inspired computational solutions for identifying protein coding regions are regarded as optimized solutions that can enhance regions of interest in noisy DNA signals, in contrast to contemporary identification methods. Exponentially growing genomic data needs better protein translation. The solutions proposed so far rely on statistical, digital signal processing and Fourier transform approaches, and lack optimal biologically inspired identification of coding regions. This paper presents a biologically inspired solution for coding region identification based on wavelet transforms together with the notion of a peculiar indicator sequence. DNA signal noise is reduced considerably, and exon peaks can be discriminated from introns significantly. A comparative analysis performed over datasets commonly used for protein coding identification showed that the proposed solution outperforms others in power spectral density estimation graphs and in numerical discrimination measure calculations. The results show a 75% reduction in computational complexity compared with the binary indicator sequence method and a 32% to 266% improvement over other methods in the literature (as a comparison with the standard NCBI range). These results were achieved by efficiently denoising the target DNA signal using wavelets and the peculiar indicator sequence.


2015 ◽  
Vol 2015 ◽  
pp. 1-9 ◽  
Author(s):  
T. M. Inbamalar ◽  
R. Sivakumar

Bioinformatics and genomic signal processing use computational techniques to solve various biological problems. They aim to study the information associated with genetic materials such as deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and proteins. Fast and precise identification of the protein coding regions in a DNA sequence is one of the most important tasks in this analysis. Existing digital signal processing (DSP) methods provide less accurate and computationally complex solutions with greater background noise. Hence, improvements in accuracy and computational complexity, together with a reduction in background noise, are essential for identifying the protein coding regions in DNA sequences. In this paper, a new DSP-based method is introduced to detect the protein coding regions in DNA sequences. Here, the DNA sequences are converted into numeric sequences using the electron ion interaction potential (EIIP) representation. The discrete wavelet transform is then applied, and the absolute value of the energy is computed, followed by a suitable threshold. The test is conducted using the databases available on the National Centre for Biotechnology Information (NCBI) site. A comparative analysis confirms the efficiency of the proposed system.
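
The pipeline outlined in this abstract (EIIP mapping, wavelet transform, energy, threshold) can be sketched in a few lines. The EIIP values below are the commonly cited ones for the four nucleotides; the choice of a single-level Haar wavelet, the helper names, and the thresholding rule are assumptions, since the abstract does not specify them.

```python
import math

# Commonly cited EIIP values for the four nucleotides.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def eiip_encode(seq):
    """Map a DNA string to its EIIP numeric sequence (raises KeyError on other symbols)."""
    return [EIIP[b] for b in seq.upper()]


def haar_dwt(x):
    """Single-level Haar DWT; the Haar wavelet is an assumed choice for illustration."""
    if len(x) % 2:
        raise ValueError("length must be even")
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def threshold_energy(coeffs, thresh):
    """Flag coefficients whose energy |c|^2 reaches an (assumed) fixed threshold."""
    return [abs(c) ** 2 >= thresh for c in coeffs]
```

For example, `haar_dwt(eiip_encode("ACGTACGT"))` returns four approximation and four detail coefficients, which can then be passed to `threshold_energy` with a threshold chosen for the data at hand.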


Cell ◽  
1984 ◽  
Vol 38 (3) ◽  
pp. 667-673 ◽  
Author(s):  
Michael Levine ◽  
Gerald M. Rubin ◽  
Robert Tjian

2011 ◽  
Vol 383-390 ◽  
pp. 471-475
Author(s):  
Yong Bin Hong ◽  
Cheng Fa Xu ◽  
Mei Guo Gao ◽  
Li Zhi Zhao

A radar signal processing system featuring high instantaneous dynamic range and low system latency is designed on a specifically developed signal processing platform. Instantaneous dynamic range loss is a critical problem when digital signal processing is performed on fixed-point FPGAs. In this paper, the problem is resolved by increasing the wordlength along the data path according to the signal-to-noise ratio (SNR) gain of the algorithms. The distinctive software structure, featuring parallel pipelined processing and a “data flow drive”, reduces the system latency to one coherent processing interval (CPI), which significantly improves the maximum tracking angular velocity of the monopulse tracking radar. Additionally, some important electronic counter-countermeasures (ECCM) are incorporated into this signal processing system.


1991 ◽  
Vol 11 (1) ◽  
pp. 533-543
Author(s):  
R M Mulligan ◽  
P Leon ◽  
V Walbot

Lysed maize mitochondria synthesize RNA in the presence of radioactive nucleoside triphosphates, and this assay was utilized to compare the rates of transcription of seven genes. The rates of incorporation varied over a 14-fold range, with the following rank order: 18S rRNA > 26S rRNA > atp1 > atp6 > atp9 > cob > cox3. The products of run-on transcription hybridized specifically to known transcribed regions and selectively to the antisense DNA strand; thus, the isolated run-on transcription system appears to be an accurate representation of endogenous transcription. Although there were small differences in gene copy abundance, these differences cannot account for the differences in apparent transcription rates; we conclude that promoter strength is the main determinant. Among the protein coding genes, incorporation was greatest for atp1. The most active transcription initiation site of this gene was characterized by hybridization with in vitro-capped RNA and by primer extension analyses. The DNA sequences at this and other transcription initiation sites that we have previously mapped were analyzed with respect to the apparent promoter strengths. We propose that two short sequence elements just upstream of initiation sites form at least a portion of the sequence requirements for a maize mitochondrial promoter. In addition to modulation at the level of transcription, steady-state abundance of protein-coding mRNAs varied over a 20-fold range and did not correlate with transcriptional activity. These observations suggest that posttranscriptional processes are important in the modulation of mRNA abundance.

