A Construction Method of Gene Expression Data Based on Information Gain and Extreme Learning Machine Classifier on Cloud Platform

2014 ◽  
Vol 7 (2) ◽  
pp. 99-108 ◽  
Author(s):  
Sha-Sha Wei ◽  
Hui-Juan Lu ◽  
Wei Jin ◽  
Chao Li
2014 ◽  
Vol 36 (2) ◽  
pp. 341-348 ◽  
Author(s):  
Hui-Juan LU ◽  
Chun-Lin AN ◽  
Xiao-Ping MA ◽  
En-Hui ZHENG ◽  
Xiao-Bing YANG

2019 ◽  
Vol 20 (S25) ◽  
Author(s):  
Huijuan Lu ◽  
Yige Xu ◽  
Minchao Ye ◽  
Ke Yan ◽  
Zhigang Gao ◽  
...  

Abstract Background Cost-sensitive algorithms are an effective strategy for solving imbalanced classification problems. However, the misclassification costs are usually determined empirically based on user expertise, which leads to unstable performance of cost-sensitive classification. Therefore, an efficient and accurate method is needed to calculate the optimal cost weights. Results In this paper, two approaches are proposed to search for the optimal cost weights, targeting the highest weighted classification accuracy (WCA). One is a grid search over the cost weights and the other is function fitting. The two algorithms are compared. In experiments, we classify imbalanced gene expression data using an extreme learning machine to test the cost weights obtained by the two approaches. Conclusions Comprehensive experimental results show that the function fitting method is generally more efficient and finds the optimal cost weights with acceptable WCA.
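
The grid search idea can be illustrated with a minimal sketch. The abstract does not define WCA or the classifier's decision rule, so everything here is an assumption: `weighted_accuracy` is taken to be a weighted sum of per-class accuracies, the classifier is replaced by precomputed positive-class scores, and the names `predict_with_cost` and `grid_search_cost_weight` are hypothetical.

```python
def per_class_accuracy(y_true, y_pred, label):
    """Fraction of samples with the given true label that were predicted correctly."""
    idx = [i for i, y in enumerate(y_true) if y == label]
    return sum(1 for i in idx if y_pred[i] == label) / len(idx)


def weighted_accuracy(y_true, y_pred, class_weights):
    """Assumed WCA: weighted sum of per-class accuracies (not the paper's definition)."""
    return sum(w * per_class_accuracy(y_true, y_pred, c)
               for c, w in class_weights.items())


def predict_with_cost(p_pos, cost_weight):
    """Predict 1 when the cost-weighted positive score is at least the negative score."""
    return [1 if cost_weight * p >= (1.0 - p) else 0 for p in p_pos]


def grid_search_cost_weight(p_pos, y_true, grid, class_weights):
    """Try every candidate cost weight and keep the one with the highest score."""
    best_w, best_score = None, -1.0
    for w in grid:
        score = weighted_accuracy(y_true, predict_with_cost(p_pos, w), class_weights)
        if score > best_score:  # ties keep the earlier (smaller) weight
            best_w, best_score = w, score
    return best_w, best_score


# Toy imbalanced data: six majority samples (0) and two minority samples (1).
p_pos = [0.05, 0.10, 0.20, 0.30, 0.35, 0.15, 0.40, 0.45]
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
grid = [1.0, 2.0, 3.0, 5.0]
best_w, best_score = grid_search_cost_weight(p_pos, y_true, grid, {0: 0.5, 1: 0.5})
print(best_w, round(best_score, 4))
```

With a cost weight of 1 the toy classifier never predicts the minority class; raising the weight recovers the minority samples at the price of some majority errors, and the search selects the weight that balances the two under the assumed score.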


Author(s):  
Mekour Norreddine

One of the problems that arises with gene expression data is feature selection, the process of choosing which features are important for prediction. There are two general approaches to feature selection: the filter approach and the wrapper approach. In this chapter, the authors combine a filter approach, ranking by information gain, with a wrapper approach that uses a genetic algorithm as the search method. The authors evaluate their approach on two gene expression data sets: Leukemia and the Central Nervous System. The C4.5 decision tree classifier is used for improving the classification performance.
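
The filter step, ranking features by information gain, can be sketched as follows. This is an illustration only: it assumes expression levels have already been discretized into categories, the gene names and labels are made up, and the genetic algorithm wrapper is not shown.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """H(labels) minus the expected entropy of labels after splitting on the feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return feature names sorted from highest to lowest information gain."""
    scores = {name: information_gain(vals, labels) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical discretized expression levels for four samples.
labels = ["ALL", "ALL", "AML", "AML"]
columns = {
    "gene_a": ["high", "high", "low", "low"],   # separates the classes perfectly
    "gene_b": ["high", "low", "high", "low"],   # carries no class information
    "gene_c": ["high", "high", "high", "low"],  # partially informative
}
ranking = rank_features(columns, labels)
print(ranking)
```

A filter of this kind would typically keep only the top-ranked genes before handing the reduced set to the wrapper search.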


2016 ◽  
Vol 2016 ◽  
pp. 1-9 ◽  
Author(s):  
Yanqiu Liu ◽  
Huijuan Lu ◽  
Ke Yan ◽  
Haixia Xia ◽  
Chunlin An

Embedding cost-sensitive factors into classifiers increases classification stability and reduces classification costs for high-scale, redundant, and imbalanced datasets, such as gene expression data. In this study, we extend our previous work, Dissimilar ELM (D-ELM), by introducing misclassification costs into the classifier. We name the proposed algorithm cost-sensitive D-ELM (CS-D-ELM). Furthermore, we embed a rejection cost into CS-D-ELM to increase the classification stability of the proposed algorithm. Experimental results show that the CS-D-ELM algorithm with embedded rejection cost effectively reduces the average and overall cost of the classification process, while the classification accuracy remains competitive. The proposed method can be extended to classification problems involving other redundant and imbalanced data.
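
How misclassification costs and a rejection cost can interact in a decision rule is sketched below. This is a generic minimum-expected-cost rule with a reject option, not the CS-D-ELM algorithm itself; the function name `decide_with_reject`, the cost matrix, and the posterior values are all hypothetical.

```python
def decide_with_reject(posteriors, cost_matrix, reject_cost):
    """Pick the class with the lowest expected cost, or reject if rejecting is cheaper.

    cost_matrix[i][j] is the cost of predicting class j when the true class is i.
    Returns (predicted_class or None, incurred expected cost).
    """
    n_classes = len(posteriors)
    expected = [
        sum(posteriors[i] * cost_matrix[i][j] for i in range(n_classes))
        for j in range(n_classes)
    ]
    best = min(range(n_classes), key=expected.__getitem__)
    if expected[best] > reject_cost:
        return None, reject_cost  # defer the sample instead of risking a costly error
    return best, expected[best]


# Assumed costs: missing class 1 (e.g. a tumor sample) is five times worse
# than a false alarm, and rejecting a sample costs 0.4.
cost_matrix = [[0.0, 1.0],
               [5.0, 0.0]]
reject_cost = 0.4

confident_negative = decide_with_reject([0.98, 0.02], cost_matrix, reject_cost)
uncertain = decide_with_reject([0.90, 0.10], cost_matrix, reject_cost)
confident_positive = decide_with_reject([0.30, 0.70], cost_matrix, reject_cost)
print(confident_negative[0], uncertain[0], confident_positive[0])
```

In this toy setting a sample with a 10% chance of belonging to the expensive class is rejected even though class 0 is far more likely, which is the kind of trade-off a rejection cost is meant to expose.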


2019 ◽  
Vol 20 (S9) ◽  
Author(s):  
Damiano Verda ◽  
Stefano Parodi ◽  
Enrico Ferrari ◽  
Marco Muselli

Abstract Background Logic Learning Machine (LLM) is an innovative method of supervised analysis capable of constructing models based on simple and intelligible rules. In this investigation the performance of LLM in classifying patients with cancer was evaluated using a set of eight publicly available gene expression databases for cancer diagnosis. LLM accuracy was assessed by summary ROC curve (sROC) analysis and estimated by the area under an sROC curve (sAUC). Its performance was compared in cross validation with that of standard supervised methods, namely: decision tree, artificial neural network, support vector machine (SVM) and k-nearest neighbor classifier. Results LLM showed an excellent accuracy (sAUC = 0.99, 95%CI: 0.98–1.0) and outperformed any other method except SVM. Conclusions LLM is a new powerful tool for the analysis of gene expression data for cancer diagnosis. Simple rules generated by LLM could contribute to a better understanding of cancer biology, potentially addressing therapeutic approaches.


Author(s):  
Phayung Meesad ◽  
Sageemas Na Wichian ◽  
Unger Herwig ◽  
Patharawut Saengsiri

Gene expression data show the levels at which the genes encoded in DNA are expressed as protein in cells such as muscle or brain cells. However, some abnormal cells may evolve from unnatural expression levels. Therefore, finding a subset of informative genes would be beneficial to biologists because it can identify discriminative genes. Unfortunately, the number of genes grows rapidly into the tens of thousands, which makes classification difficult because of problems such as the curse of dimensionality and misclassification. This paper proposes a classification model based on an incremental learning algorithm and feature selection for gene expression data. Three feature selection methods, Correlation-based Feature Selection (Cfs), Gain Ratio (GR), and Information Gain (Info), are combined with an Incremental Learning Algorithm based on Mahalanobis Distance (ILM). The experimental results show that the proposed models CfsILM, GRILM, and InfoILM not only reduce the number of dimensions from 2001, 7130, and 4026 to 26, 135, and 135, which saves time and resources, but also improve the accuracy rate from 64.52%, 34.29%, and 8.33% to 90%, 97.14%, and 83.33%, respectively. In particular, CfsILM outperforms the other models on three public gene expression datasets.
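
The Mahalanobis distance underlying ILM can be sketched in a few lines. The abstract does not describe the incremental learning procedure, so this example shows only the distance and a nearest-class assignment, restricted to two features so that the covariance matrix can be inverted in closed form; the class names and statistics are hypothetical.

```python
from math import sqrt


def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance sqrt((x - mean)^T cov^-1 (x - mean)) for two features."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of a 2x2 matrix.
    inv = [[d / det, -b / det],
           [-c / det, a / det]]
    dx = [x[0] - mean[0], x[1] - mean[1]]
    q = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
         + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return sqrt(q)


def nearest_class(x, class_stats):
    """Assign x to the class whose (mean, covariance) gives the smallest distance."""
    return min(class_stats, key=lambda name: mahalanobis_2d(x, *class_stats[name]))


# Hypothetical per-class statistics over two selected genes.
class_stats = {
    "normal": ((0.0, 0.0), [[1.0, 0.0], [0.0, 1.0]]),
    "tumor": ((4.0, 0.0), [[4.0, 0.0], [0.0, 1.0]]),
}
sample = (2.0, 0.0)
print(nearest_class(sample, class_stats))
```

The sample is equally far from both class means in Euclidean terms, but the wider spread of the second class along the first gene makes it the closer class under the Mahalanobis distance.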

