WEIGHTED NEIGHBORHOOD CLASSIFIER FOR THE CLASSIFICATION OF IMBALANCED TUMOR DATASET

2010 ◽  
Vol 19 (01) ◽  
pp. 259-273 ◽  
Author(s):  
Shu-Lin Wang ◽  
Xueling Li ◽  
Jun-Feng Xia ◽  
Xiao-Ping Zhang

Machine learning is widely applied to molecular tumor classification based on gene expression profiles, but the sample imbalance problem is often overlooked. This paper proposes a subclass-weighted neighborhood classifier to address the imbalanced sample set problem and a novel neighborhood rough set model to select informative genes and improve classification performance. Experiments on three publicly available tumor datasets demonstrate that the proposed method is effective both on imbalanced datasets with an obscure boundary between two subtypes and in informative gene selection, and that it can achieve higher cross-validation accuracy with far fewer tumor-related genes.

2021 ◽  
Vol 16 ◽  
Author(s):  
Yueling Xiong ◽  
Qingqing Li ◽  
Peipei Wang ◽  
Mingquan Ye

Background: Informative gene selection is an essential step in tumor classification. However, it is difficult to select tumor-related informative genes from large-scale gene expression profiles because of their characteristics: high dimensionality, relatively small sample sizes, class imbalance, and the presence of superfluous and irrelevant genes. Objective: Many researchers analyze and process gene expression data with machine learning methods to obtain gene subsets for classification. However, the gene expression profiles of tumors often pose massive computational challenges. In addition, traditional feature selection algorithms often ignore cost estimation while improving feature importance and classification accuracy, which makes tumor classification more difficult. Method: In this study, a novel informative gene selection method based on cost-sensitive fast correlation-based feature selection (CS-FCBF) is proposed. Results: First, the symmetric uncertainty index is used to evaluate the correlation between informative genes and class labels, and a large number of irrelevant and redundant genes are then quickly filtered out according to importance, generating a candidate gene subset. Second, cost-sensitive learning, which introduces the misclassification cost matrix and support vector machine attribute evaluation, is used to obtain the top-ranked gene subset with minimum misclassification loss. Finally, the candidate gene subset is optimized. Conclusion: The method was verified on eight independent tumor datasets. By comparing CS-FCBF with three other hybrids of typical gene selection algorithms combined with cost-sensitive learning, we found that the proposed method exhibited better classification performance with fewer selected genes, which might provide guidance in tumor diagnosis and research.
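The symmetric uncertainty index mentioned in this abstract is commonly defined as SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)). The following is a minimal sketch of that standard definition for discretized values, not the CS-FCBF implementation itself; the function names `entropy` and `symmetric_uncertainty` are hypothetical, and the expression values are assumed to be already discretized.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """Standard symmetric uncertainty between two discrete sequences.

    Assumption: x (a discretized gene) and y (class labels) have equal length.
    Returns a value in [0, 1]; 0 means independent, 1 means fully dependent.
    """
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        # Both sequences are constant, so there is no information to share.
        return 0.0
    hxy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    mutual_info = hx + hy - hxy     # I(X; Y) = H(X) + H(Y) - H(X, Y)
    return 2.0 * mutual_info / (hx + hy)
```

In a filter of this kind, genes would typically be ranked by their symmetric uncertainty with the class labels, and those below a chosen threshold discarded.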


Author(s):  
Edward C. Emery ◽  
Patrik Ernfors

Primary sensory neurons of the dorsal root ganglion (DRG) respond to and relay sensations that are felt, such as touch, pain, temperature, itch, and more. The ability to discriminate between the various types of stimuli is reflected by the existence of specialized DRG neurons tuned to respond to specific stimuli. Because of this, a comprehensive classification of DRG neurons is critical for determining exactly how somatosensation works and for providing insights into the cell types involved in chronic pain. This article reviews the recent advances in unbiased classification of molecular types of DRG neurons from the perspective of known functions as well as predicted functions based on gene expression profiles. The data show that sensory neurons are organized in a basal structure of three cold-sensitive neuron types, five mechano-heat-sensitive nociceptor types, four A-low-threshold mechanoreceptor types, five itch-mechano-heat-sensitive nociceptor types and a single C-low-threshold mechanoreceptor type, with a strong relation between molecular neuron types and functional types. As a general feature, each neuron type displays a unique and predictable response profile; at the same time, most neuron types convey multiple modalities and intensities. Therefore, sensation is likely determined by the summation of ensembles of active primary afferent types. The new classification scheme will be instructive in determining the exact cellular and molecular mechanisms underlying somatosensation, facilitating the development of rational strategies to identify causes of chronic pain.


2010 ◽  
Vol 9 ◽  
pp. CIN.S3794 ◽  
Author(s):  
Xiaosheng Wang ◽  
Osamu Gotoh

Gene selection is of vital importance in the molecular classification of cancer using high-dimensional gene expression data. Because of the distinct characteristics inherent to specific cancerous gene expression profiles, developing flexible and robust feature selection methods is crucial. We investigated the properties of one feature selection approach proposed in our previous work, which generalizes the feature selection method based on the depended degree of attribute in rough sets. We compared this feature selection method with established methods (the depended degree, chi-square, information gain, Relief-F and symmetric uncertainty) and analyzed its properties through a series of classification experiments. The results revealed that our method was superior to the canonical depended degree of attribute based method in robustness and applicability. Moreover, the method was comparable to the other four commonly used methods. More importantly, the method can reveal the inherent classification difficulty of different gene expression datasets, reflecting the inherent biology of specific cancers.


2018 ◽  
Vol 8 (9) ◽  
pp. 1569 ◽  
Author(s):  
Shengbing Wu ◽  
Hongkun Jiang ◽  
Haiwei Shen ◽  
Ziyi Yang

In recent years, gene selection for cancer classification based on the expression of a small number of gene biomarkers has been the subject of much research in genetics and molecular biology. The successful identification of gene biomarkers will help in the classification of different types of cancer and improve prediction accuracy. Recently, regularized logistic regression using L1 regularization has been successfully applied in high-dimensional cancer classification to estimate gene coefficients and perform gene selection simultaneously. However, L1 regularization yields biased gene selection and does not have the oracle property. To address these problems, we investigate L1/2 regularized logistic regression for gene selection in cancer classification. Experimental results on three DNA microarray datasets demonstrate that our proposed method outperforms other commonly used sparse methods (L1 and L_EN) in terms of classification performance.
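As a sketch of what L1/2 regularized logistic regression usually means, the penalized objective can be written as below. The notation is an assumption rather than the paper's own: n samples, m genes, binary labels y_i, expression vectors x_i, tuning parameter λ > 0, and the intercept omitted for brevity.

```latex
\hat{\beta} = \arg\min_{\beta}
\left\{
  -\sum_{i=1}^{n} \left[ y_i \log \pi_i + (1 - y_i) \log (1 - \pi_i) \right]
  + \lambda \sum_{j=1}^{m} \lvert \beta_j \rvert^{1/2}
\right\},
\qquad
\pi_i = \frac{1}{1 + e^{-x_i^{\top} \beta}}
```

Replacing the exponent 1/2 with 1 gives the L1 (lasso) penalty that the abstract compares against; the smaller exponent penalizes large coefficients less heavily, which is the usual motivation for reduced bias.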

