USING DIT-FFT ALGORITHM FOR IDENTIFICATION OF PROTEIN CODING REGION IN EUKARYOTIC GENE

2019 ◽  
Vol 31 (01) ◽  
pp. 1950002
Author(s):  
Subhajit Kar ◽  
Madhabi Ganguly ◽  
Saptarshi Das

Digital signal processing (DSP) is emerging as a biomedical engineering research platform that plays a vital role in predicting protein coding regions (exons) from genomic sequences with great accuracy. Protein coding regions in DNA sequences can be located with the help of the period-3 property. The DFT algorithm is the one most often used to detect the period-3 property, but in this paper we have tested the FFT algorithm instead. DSP is basically concerned with processing numerical sequences, so applying it to DNA sequence analysis requires converting the sequence of base characters into a numerical version. The choice of numerical representation strongly affects how well the biological properties are reflected in the numerical domain. In this work, the proposed technique, based on the DIT-FFT algorithm, is used to identify exonic regions, with an integer value representation used to transform the DNA sequences. Digital filters are used to extract the period-3 components from the output spectrum and to eliminate unwanted high-frequency noise from the DNA sequences. Overcoming background noise here means suppressing the non-coding regions, i.e., introns. The proposed algorithm is tested on four nucleotide sequences having single or multiple exons.
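
As an illustration of the idea in this abstract (not the authors' implementation), the following minimal sketch maps a DNA string to integers, runs a recursive radix-2 decimation-in-time FFT, and reads the power at the bin closest to N/3. The integer mapping (T=0, C=1, A=2, G=3), the function names, and the window length of 256 are assumptions made for the example; the abstract does not specify them, and the digital filtering step is omitted.

```python
import cmath

# Assumed integer mapping; the abstract only says "integer value representation".
INTEGER_MAP = {"T": 0, "C": 1, "A": 2, "G": 3}


def fft_dit(x):
    """Recursive radix-2 decimation-in-time FFT (length must be a power of 2)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_dit(x[0::2])  # transform of even-indexed samples
    odd = fft_dit(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle            # butterfly, upper half
        out[k + n // 2] = even[k] - twiddle   # butterfly, lower half
    return out


def period3_power(seq):
    """Power at the FFT bin nearest N/3 for a mean-removed integer-mapped window."""
    values = [INTEGER_MAP[b] for b in seq.upper()]
    mean = sum(values) / len(values)
    spectrum = fft_dit([v - mean for v in values])
    k = round(len(values) / 3)  # N/3 is not an integer when N is a power of 2
    return abs(spectrum[k]) ** 2


if __name__ == "__main__":
    periodic = ("ATG" * 86)[:256]  # strong period-3 structure
    period4 = "ACGT" * 64          # period 4, no period-3 component
    print(period3_power(periodic), period3_power(period4))
```

Because a power-of-two window has no bin exactly at N/3, the nearest bin is used and some spectral leakage is expected; a window length that is a multiple of 3 would need a different FFT variant.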

2015 ◽  
Vol 2015 ◽  
pp. 1-9 ◽  
Author(s):  
T. M. Inbamalar ◽  
R. Sivakumar

Bioinformatics and genomic signal processing use computational techniques to solve various biological problems. They aim to study the information associated with genetic materials such as deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and proteins. Fast and precise identification of the protein coding regions in a DNA sequence is one of the most important tasks in this analysis. Existing digital signal processing (DSP) methods provide less accurate and computationally complex solutions with greater background noise. Hence, improvements in accuracy and computational complexity, together with a reduction in background noise, are essential for identifying the protein coding regions in DNA sequences. In this paper, a new DSP-based method is introduced to detect the protein coding regions in DNA sequences. The DNA sequences are converted into numeric sequences using the electron ion interaction potential (EIIP) representation. The discrete wavelet transform is then taken, the absolute value of the energy is computed, and a suitable threshold is applied. The test is conducted using the databases available on the National Center for Biotechnology Information (NCBI) site. A comparative analysis confirms the efficiency of the proposed system.
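
The pipeline described here (EIIP mapping, wavelet transform, energy, threshold) can be sketched roughly as follows. The EIIP values are the commonly cited ones for the four nucleotides; the choice of a single-level Haar wavelet, the use of detail coefficients, and the threshold fraction are assumptions for illustration only, since the abstract does not state them.

```python
import math

# Commonly cited EIIP values for the four nucleotides.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def eiip_signal(seq):
    """Map a DNA string to its EIIP values, skipping unknown symbols such as N."""
    return [EIIP[b] for b in seq.upper() if b in EIIP]


def haar_dwt(x):
    """Single-level Haar DWT (assumed wavelet); returns (approximation, detail)."""
    x = list(x)
    if len(x) % 2:
        x.append(x[-1])  # pad odd-length input by repeating the last sample
    s = math.sqrt(2)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def thresholded_energy(coeffs, fraction=0.5):
    """Squared magnitude of each coefficient, zeroed below fraction * peak energy."""
    energy = [abs(c) ** 2 for c in coeffs]
    cutoff = fraction * max(energy, default=0.0)
    return [e if e >= cutoff else 0.0 for e in energy]


if __name__ == "__main__":
    signal = eiip_signal("ATGGCGTACGATTAGC")
    _, detail = haar_dwt(signal)
    print(thresholded_energy(detail))
```

In practice the threshold would be tuned on annotated sequences; the fixed fraction here only shows where that step fits.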


2017 ◽  
Author(s):  
Anuj Kumar ◽  
Aditi Chauhan ◽  
Mansi Sharma ◽  
Sai Kumar Kompelli ◽  
Vijay Gahlaut ◽  
...  

Simple Sequence Repeats (SSRs), also known as microsatellites, are short tandem repeats of DNA sequences that are 1-6 bp long. In plants, SSRs serve as an important class of molecular markers because of their hypervariable and co-dominant nature, making them useful both for genetic studies and for marker-assisted breeding. SSRs are widespread throughout the genome of an organism, so a large number of SSR datasets are available, most of them from either protein-coding regions or untranslated regions. Only recently has their occurrence within microRNA (miRNA) genes received attention. As is widely known, miRNAs themselves are a class of non-coding RNAs (ncRNAs) with lengths varying from 19 to 22 nucleotides (nts), which play an important role in regulating gene expression in plants under different biotic and abiotic stresses. In this communication, we describe the results of a study in which miRNA-SSRs in full-length pre-miRNA sequences of Arabidopsis thaliana were mined. The sequences were retrieved using annotations available at EnsemblPlants and analyzed with the BatchPrimer3 server; the miRNA-SSR flanking primers were found to be well distributed. Our analysis shows that miRNA-SSRs are relatively rare in protein-coding regions but abundant in non-coding regions. All 147 observed di-, tri-, tetra-, penta- and hexanucleotide SSRs were located in non-coding regions across all 5 chromosomes of A. thaliana. While we confirm that miRNA-SSRs were commonly spread across the full-length pre-miRNAs, we envisage that such studies will allow us to identify newly discovered markers for breeding studies.
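
To make the notion of mining di- to hexanucleotide SSRs concrete, here is a small regular-expression sketch. It is not the BatchPrimer3 procedure used in the study; the function names and the minimum repeat counts per motif length are hypothetical defaults chosen for the example.

```python
import re

# Hypothetical minimum repeat counts per motif length (di- to hexanucleotide).
DEFAULT_MIN_REPEATS = {2: 6, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. ATAT)."""
    k = len(motif)
    return not any(
        k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)
    )


def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    min_repeats = min_repeats or DEFAULT_MIN_REPEATS
    seq = seq.upper()
    hits = []
    for k, m in min_repeats.items():
        # A k-letter motif followed by at least m-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, m - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    example = "GGC" + "AT" * 7 + "CCGT" + "GAA" * 5 + "TTC"
    print(find_ssrs(example))
```

The primitive-motif check keeps a run such as ATATAT from being reported a second time as a tetranucleotide repeat of ATAT.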


2015 ◽  
Vol 13 (02) ◽  
pp. 1550004 ◽  
Author(s):  
Changchuan Yin

To apply digital signal processing (DSP) methods to analyze DNA sequences, the sequences must first be mapped into numerical sequences. Thus, effective numerical mappings of DNA sequences play key roles in the effectiveness of DSP-based methods such as exon prediction. Despite numerous mappings of symbolic DNA sequences to numerical series, the existing mapping methods do not include the genetic coding features of DNA sequences. We present a novel numerical representation of DNA sequences using genetic codon context (GCC), in which the numerical values are optimized by simulated annealing to maximize the 3-periodicity signal-to-noise ratio (SNR). The optimized GCC representation is then applied to exon and intron prediction using the Short-Time Fourier Transform (STFT) approach. The results show that the GCC method enhances the SNR values of exon sequences and thus increases the accuracy of predicting protein coding regions in genomes compared with the commonly used 4D binary representation. In addition, this study offers a novel way to reveal specific features of DNA sequences by optimizing numerical mappings of symbolic DNA sequences.
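
For orientation, the baseline that this abstract compares against can be sketched as a sliding-window (STFT-style) 3-periodicity SNR over the 4D binary representation. This does not implement the GCC mapping or the simulated annealing step. The SNR definition used here (power at frequency N/3 divided by the mean power over the non-DC bins), the window length, and the step size are assumptions for the example and may differ from the paper's definitions.

```python
import cmath


def period3_snr(window):
    """3-periodicity SNR of one window using four binary indicator sequences.

    Assumed definition: summed power at k = N/3 divided by the mean summed
    power over the N-1 non-DC bins (obtained via Parseval's theorem).
    """
    n = len(window)
    if n < 3 or n % 3:
        raise ValueError("window length must be a positive multiple of 3")
    w = cmath.exp(-2j * cmath.pi / 3)  # e^{-2*pi*i*k/N} at k = N/3
    peak = 0.0
    non_dc_total = 0.0
    for base in "ACGT":
        positions = [t for t, ch in enumerate(window.upper()) if ch == base]
        x_third = sum(w ** (t % 3) for t in positions)
        peak += abs(x_third) ** 2
        count = len(positions)
        # Parseval: sum_k |X[k]|^2 = N * count; subtract the DC term count^2.
        non_dc_total += n * count - count * count
    mean_power = non_dc_total / (n - 1)
    return peak / mean_power if mean_power > 0 else 0.0


def snr_track(seq, window=90, step=3):
    """Slide a window along the sequence and return (start, SNR) pairs."""
    return [
        (start, period3_snr(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    print(period3_snr("ATG" * 30))           # perfectly periodic window
    print(len(snr_track("ATG" * 60, 90, 3)))
```

Windows with a high SNR would be candidate coding regions; an optimized mapping such as GCC aims to raise this ratio for true exons.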


2021 ◽  
Vol 2021 (1) ◽  
Author(s):  
Pritish Kumar Varadwaj ◽  
Neetesh Purohit ◽  
Tapobrata Lahiri ◽  
V.E. Antisiperov ◽  
...  

More than ninety percent of genes in Homo sapiens are reported to exist as discontinuous segments of coding regions, called "exons", separated by intervening non-coding regions, called "introns". During splicing, the non-coding regions are removed and the coding regions are joined together to produce the mature messenger RNA. The site of exon-intron splicing is called the splice site. Anomalies caused by genetic mutations at splice sites during the processing of precursor mRNA into mature mRNA cause several genetic diseases, such as cancer, dementia, epilepsy, hematological disorders, and parathyroid deficiency. It is estimated that as many as 50% of disease-causing mutations affect splicing. The present invention describes the design of a digital signal processing-based approach to detect these splice sites. Successful identification of splice sites will help in finding the mutations and can therefore be used as an inference tool for predicting genetic disease.


Author(s):  
Muneer Ahmad

Biologically inspired computational solutions for identifying protein coding regions are described as optimized solutions that can enhance regions of interest in noisy DNA signals, in contrast to contemporary identification methods. Exponentially growing genomic data needs better protein translation. The solutions proposed so far rely on statistical, digital signal processing, and Fourier transform approaches, which lack optimal biologically inspired identification of coding regions. This paper presents a biologically inspired solution for coding region identification based on wavelet transforms together with the notion of a peculiar indicator sequence. DNA signal noise is reduced considerably, and exon peaks can be discriminated from introns significantly. A comparative analysis performed on datasets commonly used for protein coding identification shows that the proposed solution outperforms others in power spectral density estimation graphs and in calculations of numerical discrimination measures. The results show a 75% reduction in computational complexity compared with the binary indicator sequence method and a 32% to 266% improvement over other methods in the literature (in comparison with the standard NCBI range). These results were achieved by efficiently denoising the target DNA signal using wavelets and the peculiar indicator sequence.

