genomic signal processing
Recently Published Documents


Total documents: 77 (last five years: 13)

H-index: 10 (last five years: 3)

2022, Vol 9 (1)
Author(s): Emmanuel Adetiba, Joshua A. Abolarinwa, Anthony A. Adegoke, Tunmike B. Taiwo, Oluwaseun T. Ajayi, ...

Author(s): P Kamala Kumari, Joseph Beatrice Seventline

Mutated genes are among the most prominent factors in the origin and spread of cancer. Here, genomic signal processing methods were used to identify patterns that differentiate cancerous from non-cancerous genes, and deep learning algorithms were used to build a system that automatically predicts cancer genes. Unlike existing methods, this approach deploys two feature extraction modules to extract six attributes. A power spectral density (PSD) module extracts five statistical parameters: mean, median, standard deviation, mean deviation, and median deviation. An adaptive functional link network (AFLN) filter module extracts the sixth, the normalized mean square error (NMSE). The novelty of this paper lies in identifying six input features that differentiate cancer genes. An artificial neural network (ANN) is developed to predict cancer genes, and three versions of the dataset are compared: with six attributes, five attributes, and one attribute. All training and testing were performed in TensorFlow with the Keras library in Python on Google Colab. With six attributes, the approach reached an accuracy of 98% after 150 epochs. Compared with existing work, the ANN model attained a 10-fold cross-validation accuracy of 96.26%, an improvement of 1.2%.
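A minimal Python sketch of the two feature extraction modules follows. It is not the authors' implementation. The abstract does not specify the nucleotide-to-number mapping, the PSD estimator, or the AFLN configuration. The EIIP mapping, the periodogram, the trigonometric functional-link expansion, and the values of `taps`, `order`, and `mu` are therefore assumptions, and all function names are hypothetical. Mean deviation and median deviation are taken here as the mean and median absolute deviations.

```python
import numpy as np

# Assumed numeric mapping: electron-ion interaction pseudopotential (EIIP) values.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def to_signal(seq):
    """Map a DNA string to a numeric signal (assumption: EIIP mapping)."""
    return np.array([EIIP[base] for base in seq.upper()], dtype=float)


def psd_features(seq):
    """Five statistics of a periodogram PSD estimate (hypothetical helper)."""
    x = to_signal(seq)
    x = x - x.mean()  # remove DC so the zero-frequency bin does not dominate
    psd = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    mean, median = psd.mean(), np.median(psd)
    return {
        "mean": mean,
        "median": median,
        "std": psd.std(),
        "mean_deviation": np.mean(np.abs(psd - mean)),
        "median_deviation": np.median(np.abs(psd - median)),
    }


def afln_nmse(seq, taps=4, order=1, mu=0.05):
    """NMSE of a one-step-ahead predictor with a trigonometric functional-link
    expansion, adapted online with LMS (assumed configuration)."""
    x = to_signal(seq)

    def expand(window):
        # Functional-link expansion: raw inputs plus sin/cos harmonics.
        feats = [window]
        for k in range(1, order + 1):
            feats.append(np.sin(k * np.pi * window))
            feats.append(np.cos(k * np.pi * window))
        return np.concatenate(feats)

    w = np.zeros(taps * (2 * order + 1))
    errors, targets = [], []
    for n in range(taps, len(x)):
        phi = expand(x[n - taps:n])  # previous `taps` samples, expanded
        e = x[n] - w @ phi           # prediction error for sample n
        w += mu * e * phi            # LMS weight update
        errors.append(e)
        targets.append(x[n])
    errors, targets = np.array(errors), np.array(targets)
    return np.sum(errors ** 2) / np.sum(targets ** 2)
```

Combining `psd_features(seq)` with `afln_nmse(seq)` gives one six-element vector per gene, which could then be fed to an ANN classifier.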


2021, Vol 22 (1)
Author(s): Safaa M. Naeem, Mai S. Mabrouk, Mohamed A. Eldosoky, Ahmed Y. Sayed

Background: Mutations in deoxyribonucleic acid (DNA) are a common cause of colon cancer, and detecting these mutations is the first step in diagnosing it. Differentiating between normal and cancerous colon gene sequences is one method of identifying mutations. Early detection of this disease can prevent complications that may lead to death. In this study, 55 healthy and 55 cancerous colon-cell genes obtained from the National Center for Biotechnology Information (NCBI) GenBank are used. After the sequences are converted with the electron–ion interaction pseudopotential (EIIP) numerical representation, a single-level discrete wavelet transform (DWT) with the Haar wavelet is applied. Seven statistical features are then computed in the wavelet domain: mean, variance, standard deviation, autocorrelation, entropy, skewness, and kurtosis. These values are fed to the k-nearest neighbor (KNN) and support vector machine (SVM) algorithms, which yield satisfactory classification results.
Results: Four parameters were calculated to evaluate classifier performance. Among them, accuracy (ACC), F1 score, and the Matthews correlation coefficient (MCC) were 95%, 94.74%, and 0.9045, respectively, for SVM, and 97.5%, 97.44%, and 0.9512, respectively, for KNN.
Conclusion: This study presents a novel system for colorectal cancer classification and detection with satisfactory results. KNN gives the best results, with low error, although the SVM results are also acceptable.
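The EIIP, Haar DWT, and feature steps can be sketched in plain NumPy. The EIIP values below are the standard ones for A, C, G, and T. Several details are assumptions because the abstract does not define them: the features are computed on the approximation coefficients, autocorrelation is taken at lag 1, entropy is the Shannon entropy of the normalized coefficient energies, and kurtosis is the non-excess (Pearson) form. The helper names are hypothetical.

```python
import numpy as np

# Standard EIIP values for the four nucleotides.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def haar_dwt(x):
    """Single-level orthonormal Haar DWT.
    An odd-length signal is padded by repeating its last sample."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:
        x = np.append(x, x[-1])
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)  # low-pass (approximation) coefficients
    detail = (even - odd) / np.sqrt(2)  # high-pass (detail) coefficients
    return approx, detail


def wavelet_features(coeffs):
    """Seven statistics; the autocorrelation, entropy, and kurtosis
    definitions are assumptions (see lead-in)."""
    c = np.asarray(coeffs, dtype=float)
    mean, var, std = c.mean(), c.var(), c.std()
    centered = c - mean
    autocorr = np.sum(centered[:-1] * centered[1:]) / np.sum(centered ** 2)  # lag 1
    p = c ** 2 / np.sum(c ** 2)  # normalized energy distribution
    p = p[p > 0]
    entropy = -np.sum(p * np.log2(p))
    skewness = np.mean(centered ** 3) / std ** 3
    kurtosis = np.mean(centered ** 4) / std ** 4  # non-excess kurtosis
    return np.array([mean, var, std, autocorr, entropy, skewness, kurtosis])


def sequence_features(seq):
    """EIIP mapping, then Haar DWT, then features of the approximation coefficients."""
    x = np.array([EIIP[b] for b in seq.upper()])
    approx, _detail = haar_dwt(x)
    return wavelet_features(approx)
```

The resulting seven-element vectors could be passed to a classifier such as scikit-learn's `KNeighborsClassifier` or `SVC`. A constant input signal would make the standard deviation zero, leaving the skewness and kurtosis undefined.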


2020
Author(s): Robson P. Bonidia, Danilo S. Sanches, André C.P.L.F. de Carvalho

Machine learning algorithms have been applied very successfully to extract new and relevant knowledge from biological sequences. However, their predictive performance depends largely on how the sequences are represented. A key challenge, therefore, is how to represent a biological sequence as a numeric vector using an efficient mathematical expression. Several feature extraction techniques have been proposed for biological sequences, and most of them are available in feature extraction packages. However, some relevant approaches, such as techniques based on mathematical descriptors (e.g., Fourier, entropy, and graphs), are not available in existing packages. This paper therefore presents a new package, named MathFeature, which implements mathematical descriptors able to extract relevant information from biological sequences. MathFeature provides 20 approaches based on several studies in the literature, e.g., multiple numeric mappings, genomic signal processing, chaos game theory, entropy, and complex networks. MathFeature also allows the extraction of alternative features, complementing the existing packages.
Availability and implementation: MathFeature is freely available at https://bonidia.github.io/MathFeature/ and on GitHub at https://github.com/Bonidia/
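As an illustration of one descriptor family named above, the sketch below computes the Shannon entropy of overlapping k-mer frequencies for k = 1 to `max_k`. It is a stand-alone example, not MathFeature's API, and the abstract does not confirm that MathFeature computes its entropy descriptors this way. The function names are hypothetical.

```python
from collections import Counter
import math


def kmer_shannon_entropy(seq, k):
    """Shannon entropy (in bits) of the overlapping k-mer frequency distribution."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:  # sequence shorter than k
        return 0.0
    total = len(kmers)
    counts = Counter(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_descriptor(seq, max_k=3):
    """Fixed-length feature vector: one entropy value per k = 1..max_k."""
    return [kmer_shannon_entropy(seq, k) for k in range(1, max_k + 1)]
```

Because the vector length depends only on `max_k`, sequences of different lengths map to comparable numeric vectors, which is the representation problem the abstract describes.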


Author(s): Safaa M Naeem, Mai S Mabrouk, Samir Y Marzouk, Mohamed A Eldosoky

Coronavirus disease 2019 (COVID-19) is a viral infection that emerged suddenly at the end of 2019 in the city of Wuhan, Hubei Province, China. Its rapid spread has become a serious threat to global health. Several other viral epidemics have occurred over the past two decades, including severe acute respiratory syndrome coronavirus (SARS-CoV) in 2002/2003, influenza H1N1 in 2009, and, more recently, Middle East respiratory syndrome coronavirus (MERS-CoV), which appeared in Saudi Arabia in 2012. In this research, an automated system is created to differentiate between COVID-19, SARS-CoV, and MERS-CoV using their genomic sequences recorded in NCBI GenBank, in order to speed up diagnosis and improve the accuracy of disease detection. The selected dataset contains 76 genes for each epidemic. Features based on the discrete Fourier transform (DFT), the discrete cosine transform (DCT), and the seven moment invariants are extracted and passed to two classifiers: the k-nearest neighbor (KNN) algorithm and a trainable cascade-forward backpropagation neural network, both of which give satisfactory results. Classifier performance was evaluated with accuracy (ACC), F1 score, error rate, and the Matthews correlation coefficient (MCC), which were 100%, 100%, 0, and 1, respectively, for KNN, and 98.89%, 98.34%, 0.0111, and 0.9754, respectively, for the cascade-forward network.
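A rough sketch of the DFT/DCT feature path with a KNN classifier, using NumPy only. Several choices are assumptions not stated in the abstract: the EIIP numeric mapping, a fixed signal length `N_COEFFS` reached by truncation or zero-padding, and an unnormalized DCT-II. The seven moment invariants and the cascade-forward network are omitted. All names are hypothetical.

```python
from collections import Counter
import numpy as np

EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}
N_COEFFS = 32  # hypothetical fixed signal length (coefficients per transform)


def dct_ii(x):
    """Unnormalized DCT-II: X[k] = sum_m x[m] * cos(pi / N * (m + 0.5) * k)."""
    n = len(x)
    idx = np.arange(n)
    basis = np.cos(np.pi / n * np.outer(idx, idx + 0.5))  # basis[k, m]
    return basis @ x


def dft_dct_features(seq, n=N_COEFFS):
    """Concatenate |DFT| and DCT-II coefficients of a fixed-length EIIP signal."""
    # Ambiguous bases (e.g. 'N' in GenBank records) are mapped to 0 here.
    x = np.array([EIIP.get(b, 0.0) for b in seq.upper()], dtype=float)
    x = x[:n] if len(x) >= n else np.pad(x, (0, n - len(x)))  # truncate or zero-pad
    return np.concatenate([np.abs(np.fft.fft(x)), dct_ii(x)])


def knn_predict(train_X, train_y, query, k=1):
    """Majority vote among the k training vectors closest in Euclidean distance."""
    dists = np.linalg.norm(train_X - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]
```

Truncating each genome to a short prefix discards most of its signal. It is used here only to obtain fixed-length vectors for the example; a real pipeline would need a deliberate choice of segment or resampling.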


2020, Vol 2020, pp. 1-9
Author(s): J. Alejandro Morales, Román Saldaña, Manuel H. Santana-Castolo, Carlos E. Torres-Cerna, Ernesto Borrayo, ...

Genomic signal processing (GSP) applies digital signal processing methods to the analysis of genomic data. Convolutional neural networks (CNNs) are state-of-the-art machine learning classifiers that have been applied successfully to a wide range of complex problems. In this paper, we present a deep learning architecture and a method for classifying three functional genome types in genomic data: coding regions (CDS), long noncoding regions (LNC), and pseudogenes (PSD). The method uses GSP techniques to convert each nucleotide sequence into a graphical representation of the information it contains. Accuracy scores of 83% for CDS vs. LNC and 84% for CDS vs. PSD indicate that this methodology is feasible for classifying these types of sequences; however, the model was not able to differentiate between PSD and LNC. Overall, the results support the feasibility of combining CNNs with GSP to classify these types of DNA data.
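The abstract does not say which GSP method produced the graphical representation. One common option, assumed here, is to map the sequence to the four Voss binary indicator sequences and sum their short-time Fourier power spectra into a 2-D image that a CNN could take as a single-channel input. The frame and hop sizes and the function names are hypothetical.

```python
import numpy as np


def voss_indicators(seq):
    """4 x L binary matrix: row i is 1 where the sequence has base 'ACGT'[i]."""
    seq = seq.upper()
    return np.array([[1.0 if b == base else 0.0 for b in seq] for base in "ACGT"])


def spectrogram_image(seq, frame=64, hop=32):
    """Log-scaled sum of the four indicator spectrograms,
    shape (n_frames, frame // 2 + 1)."""
    u = voss_indicators(seq)
    if u.shape[1] < frame:
        raise ValueError("sequence is shorter than one analysis frame")
    window = np.hanning(frame)
    n_frames = 1 + (u.shape[1] - frame) // hop
    image = np.zeros((n_frames, frame // 2 + 1))
    for row in u:  # one indicator sequence per nucleotide
        for i in range(n_frames):
            segment = row[i * hop:i * hop + frame] * window
            image[i] += np.abs(np.fft.rfft(segment)) ** 2
    return np.log1p(image)  # compress the dynamic range before feeding a CNN
```

In protein-coding regions, such spectra often show the well-known period-3 component, which is one cue a CNN could learn in order to separate CDS from noncoding sequences.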


2019, Vol 98, pp. 233-237
Author(s): Dong-Wei Liu, Run-Ping Jia, Cai-Feng Wang, N. Arunkumar, K. Narasimhan, ...
