Feature Extraction In Gene Expression Dataset Using Multilayer Perceptron

Author(s):  
Nageswara Rao Eluri et al.

A large number of publicly available gene expression datasets has accumulated over the past decades. It is therefore essential to identify and extract informative instances from them, both quantitatively and qualitatively. In this study, Keras is used to build a multilayer perceptron (MLP) that extracts features from an input gene expression dataset. After initial training on the top features extracted by the training classifiers, the MLP extracts features from the test datasets. Finally, using these top features, the MLP is fine-tuned to extract optimal features from the Gene Expression database of Normal and Tumor tissues 2 (GENT2). The experimental results show that the proposed model achieves better feature selection than other methods in terms of accuracy, F-measure, precision and recall.
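To make the idea of using an MLP as a feature extractor concrete, the following minimal sketch passes one sample through two small dense layers and reads the last hidden activations as the extracted features. It is written in plain Python rather than Keras, and the layer sizes, weights, and the names `dense_relu` and `extract_features` are illustrative assumptions, not the authors' model.

```python
def dense_relu(x, weights, biases):
    """One fully connected layer followed by ReLU: max(0, W x + b)."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def extract_features(sample, layers):
    """Pass a sample through each layer in turn and return the last activations."""
    h = sample
    for weights, biases in layers:
        h = dense_relu(h, weights, biases)
    return h


# Hypothetical toy network: 4 expression values -> 3 hidden units -> 2 features.
# The weights are fixed by hand for illustration; in practice they are learned.
layers = [
    ([[0.5, -0.2, 0.1, 0.0],
      [0.0, 0.3, -0.4, 0.2],
      [-0.1, 0.0, 0.2, 0.6]], [0.0, 0.1, -0.1]),
    ([[1.0, 0.0, -1.0],
      [0.5, 0.5, 0.5]], [0.0, 0.0]),
]

sample = [2.0, 1.0, 0.0, 1.0]  # made-up expression values for one sample
features = extract_features(sample, layers)
print(features)
```

In a trained network the same read-out would be taken from a hidden layer after fitting, so that the compressed activations reflect structure learned from the training data.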

2021 ◽  
pp. 100572
Author(s):  
Malek Alzaqebah ◽  
Khaoula Briki ◽  
Nashat Alrefai ◽  
Sami Brini ◽  
Sana Jawarneh ◽  
...  

2011 ◽  
Vol 10 ◽  
pp. CIN.S7226 ◽  
Author(s):  
Gwangsik Shin ◽  
Tae-Wook Kang ◽  
Sungjin Yang ◽  
Su-Jin Baek ◽  
Yong-Su Jeong ◽  
...  

Background: Some oncogenes such as ERBB2 and EGFR are over-expressed in only a subset of patients. Cancer outlier profile analysis is one of the computational approaches used to identify outliers in gene expression data. A database with a large sample size would be a great advantage when searching for genes over-expressed in only a subset of patients. Description: GENT (Gene Expression database of Normal and Tumor tissues) is a web-accessible database that provides gene expression patterns across diverse human cancer and normal tissues. More than 40000 samples, profiled on Affymetrix U133A or U133plus2 platforms in many different laboratories across the world, were collected from public resources and combined into two large data sets, helping to identify cancer outliers that are over-expressed in only a subset of patients. Gene expression patterns in nearly 1000 human cancer cell lines are also provided. In each tissue, users can retrieve gene expression patterns classified by more detailed clinical information. Conclusions: The large sample size (>24300 for U133plus2 and >16400 for U133A) of GENT provides an advantage in identifying cancer outliers. A cancer cell line gene expression database is useful for target validation by in vitro experiments. We hope GENT will be a useful resource for cancer researchers at many stages, from target discovery to target validation. GENT is available at http://medicalgenome.kribb.re.kr/GENT/ or http://genome.kobic.re.kr/GENT/ .
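Cancer outlier profile analysis, as it is commonly described, median-centers each gene, scales it by the median absolute deviation (MAD), and then ranks genes by a high percentile of the transformed values. The sketch below illustrates that transformation for a single gene. The expression values, the cutoff of 10, and the function names are made up for illustration and do not come from GENT.

```python
import math
from statistics import median


def copa_transform(values):
    """Median-center and MAD-scale one gene's expression across samples."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; this gene has no spread to scale by")
    return [(v - med) / mad for v in values]


def nearest_rank_percentile(values, q):
    """Nearest-rank percentile of values, with q given in the range 0..100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


# Made-up expression values: most samples near 1, two strongly over-expressed.
expression = [1.0, 1.2, 0.9, 1.1, 1.0, 8.0, 9.5, 1.05]
scores = copa_transform(expression)

gene_score = nearest_rank_percentile(scores, 90)  # used to rank genes
outliers = [i for i, s in enumerate(scores) if s > 10]  # hypothetical cutoff
print(gene_score, outliers)
```

Because the median and MAD are insensitive to a few extreme samples, the over-expressed subset stands out clearly, and the larger the sample size, the more reliably a small subset can be detected.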


2019 ◽  
Vol 12 (S5) ◽  
Author(s):  
Seung-Jin Park ◽  
Byoung-Ha Yoon ◽  
Seon-Kyu Kim ◽  
Seon-Young Kim

Microarray gene expression data make it possible to analyze the expression of thousands of genes and their relation to disease. More accurate analysis can be obtained by comparing gene expression in diseased tissue with that in normal tissue, which helps to recognize the type of cancer. Processing microarray datasets (feature selection, sampling and classification) is highly challenging because of their high dimensionality. Many recent studies have used feature selection techniques for dimensionality reduction. The Dragonfly optimization Algorithm (DA) is one such technique, used to reduce the dimensionality of a lung cancer gene expression dataset. The dragonflies in DA fly randomly according to a model based on the Levy Flight Mechanism (LFM). Because of its very large search steps, LFM has drawbacks such as interruption of random flights and overflowing of the search area. In addition, DA lacks an internal memory that records past potential solutions, which can lead to premature convergence to local optima. This paper therefore introduces an Improved Dragonfly optimization Algorithm (IDA) that effectively reduces the dimensionality of the lung cancer gene expression dataset. In IDA, Brownian motion is used to address the issues of LFM, and the pbest and gbest concepts of Particle Swarm Optimization (PSO) are used to direct the search toward potential candidate solutions, further refining the search space and avoiding premature convergence. IDA follows the wrapper feature selection approach to select an optimal subset of features. Random Subspace (RS), Artificial Neural Network (ANN) and Sequential Minimal Optimization (SMO) classifiers are used to evaluate the features selected by IDA and to recognize lung cancer subtypes. In each iteration, the classifier's accuracy on the training instances, using the features selected by a dragonfly, serves as that dragonfly's fitness value. Finally, the experimental results demonstrate the effectiveness of IDA in terms of accuracy, precision, recall and F-measure.
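The contrast between Levy-flight and Brownian steps can be illustrated with a short sketch. It draws Levy-like steps with Mantegna's algorithm, a commonly used approximation, and Brownian steps from a Gaussian. Whether the paper uses this exact formulation is an assumption, and `beta = 1.5`, the seed, and the function names are illustrative choices.

```python
import math
import random


def mantegna_sigma(beta):
    """Standard deviation of the numerator Gaussian in Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(rng, beta=1.5):
    """Heavy-tailed step: occasionally very large, which can overshoot bounds."""
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def brownian_step(rng, scale=1.0):
    """Light-tailed Gaussian step: large jumps are extremely rare."""
    return rng.gauss(0.0, scale)


rng = random.Random(0)  # fixed seed so the comparison is repeatable
levy = [levy_step(rng) for _ in range(1000)]
brown = [brownian_step(rng) for _ in range(1000)]

print("largest Levy step:    ", max(abs(s) for s in levy))
print("largest Brownian step:", max(abs(s) for s in brown))
```

The heavy tail of the Levy steps is what produces the very large search steps mentioned in the abstract, while Gaussian steps keep the search local.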


2019 ◽  
Vol 21 (9) ◽  
pp. 631-645 ◽  
Author(s):  
Saeed Ahmed ◽  
Muhammad Kabir ◽  
Zakir Ali ◽  
Muhammad Arif ◽  
Farman Ali ◽  
...  

Aim and Objective: Cancer is a dangerous disease worldwide, caused by somatic mutations in the genome. Early-stage diagnosis of this deadly disease is a notably new clinical application of microarray data. In DNA microarray technology, gene expression data have a high dimension with a small sample size. Therefore, it is indispensable to develop efficient and robust feature selection methods that identify a small set of genes and achieve better classification performance. Materials and Methods: In this study, we developed a hybrid feature selection method that integrates correlation-based feature selection (CFS) and Multi-Objective Evolutionary Algorithm (MOEA) approaches to select highly informative genes. The hybrid model with a radial basis function neural network (RBFNN) classifier was evaluated on 11 benchmark gene expression datasets using 10-fold cross-validation. Results: The experimental results were compared with seven conventional feature selection methods and other methods in the literature, showing that our approach has clear merits in classification accuracy and in the genes selected, as shown by extensive comparison with other methods. Conclusion: Our proposed CFS-MOEA algorithm attained up to 100% classification accuracy for six out of eleven datasets with a minimal-sized predictive gene subset.
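Correlation-based feature selection scores a candidate gene subset with a merit heuristic that rewards correlation with the class label and penalizes redundancy among the selected genes. The sketch below implements the standard CFS merit formula, k * r_cf / sqrt(k + k(k-1) * r_ff). The correlation values are made-up inputs, and how the paper combines this merit with the MOEA is not shown here.

```python
import math


def cfs_merit(k, mean_feature_class_corr, mean_feature_feature_corr):
    """CFS merit of a subset of k features.

    mean_feature_class_corr:   average |correlation| between features and class
    mean_feature_feature_corr: average |correlation| between pairs of features
    """
    if k < 1:
        raise ValueError("subset must contain at least one feature")
    numerator = k * mean_feature_class_corr
    denominator = math.sqrt(k + k * (k - 1) * mean_feature_feature_corr)
    return numerator / denominator


# Two hypothetical subsets of 4 genes with equal relevance (0.6)
# but different redundancy among the genes.
low_redundancy = cfs_merit(4, 0.6, 0.2)
high_redundancy = cfs_merit(4, 0.6, 0.8)
print(low_redundancy, high_redundancy)
```

With relevance held fixed, the less redundant subset gets the higher merit, which is why CFS tends to favor small gene sets.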


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Joe W. Chen ◽  
Joseph Dhahbi

Lung cancer is one of the deadliest cancers in the world. Two of the most common subtypes, lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC), have drastically different biological signatures, yet they are often treated similarly and classified together as non-small cell lung cancer (NSCLC). LUAD and LUSC biomarkers are scarce, and their distinct biological mechanisms have yet to be elucidated. To detect biologically relevant markers, many studies have attempted to improve traditional machine learning algorithms or develop novel algorithms for biomarker discovery. However, few have used overlapping machine learning or feature selection methods for cancer classification, biomarker identification, or gene expression analysis. This study proposes to use overlapping traditional feature selection or feature reduction techniques for cancer classification and biomarker discovery. The genes selected by the overlapping method were then verified using random forest. The classification statistics of the overlapping method were compared to those of the traditional feature selection methods. The identified biomarkers were validated in an external dataset using AUC and ROC analysis. Gene expression analysis was then performed to further investigate biological differences between LUAD and LUSC. Overall, our method achieved classification results comparable to, if not better than, the traditional algorithms. It also identified multiple known biomarkers, and five potentially novel biomarkers with high discriminating values between LUAD and LUSC. Many of the biomarkers also exhibit significant prognostic potential, particularly in LUAD. Our study also unraveled distinct biological pathways between LUAD and LUSC.
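One simple way to read "overlapping" feature selection is to keep only the genes that several independent selectors agree on. The sketch below shows that reading with placeholder method and gene names. The exact overlap rule used in the study is not stated in the abstract, so the `min_methods` threshold is an assumption.

```python
from collections import Counter


def overlapping_genes(selections, min_methods=None):
    """Return genes picked by at least `min_methods` selectors (default: all)."""
    if min_methods is None:
        min_methods = len(selections)
    # Count each gene once per method, even if a method lists it twice.
    counts = Counter(g for genes in selections.values() for g in set(genes))
    return sorted(g for g, c in counts.items() if c >= min_methods)


# Placeholder selector outputs; method and gene names are hypothetical.
selections = {
    "method_1": ["GENE_A", "GENE_B", "GENE_C"],
    "method_2": ["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
    "method_3": ["GENE_C", "GENE_B", "GENE_E"],
}

consensus = overlapping_genes(selections)                # chosen by all three
majority = overlapping_genes(selections, min_methods=2)  # chosen by two or more
print(consensus, majority)
```

A strict intersection gives a smaller, higher-confidence gene list, while a majority rule trades some confidence for coverage. The resulting list could then be passed to a classifier such as a random forest for verification.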

