Research on Complex Classification Algorithm of Breast Cancer Chip Based on SVM-RFE Gene Feature Screening

Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Guobin Chen ◽  
Xianzhong Xie ◽  
Shijin Li

Screening and classifying characteristic genes is a complex classification problem, because gene expression sequences are high-dimensional. Selecting an effective gene screening algorithm is therefore the main problem to be solved in gene chip analysis. A combination of KNN, SVM, and SVM-RFE is used to screen genes, providing a new method for solving complex classification problems. During gene chip preprocessing, values such as LogFC and P value in the gene expression matrix are used to screen different gene features, and the SVM-RFE algorithm is then used to rank and select genes. First, the characteristics of the gene chips are analyzed and the numbers of probes and genes are counted; cluster analysis across samples and PCA analysis of the different samples are carried out. Second, the basic SVM and KNN algorithms are tested, and key indicators such as error rate and accuracy are measured to obtain the optimal parameters. Finally, the accuracy, precision, recall, and F1 of SVM, KNN, KNN-PCA, SVM-PCA, SVM-RFE-SVM, and SVM-RFE-KNN are compared at P = 0.01, 0.05, and 0.001. SVM-RFE-SVM gives the best classification performance and can be used as a gene chip classification algorithm for analyzing gene characteristics.
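The four performance indexes named in this abstract can be computed from a confusion count. The following is a minimal sketch for a binary task, not the authors' evaluation code; the function name `binary_metrics` and the toy label vectors are assumptions made for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for a binary task."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted or present
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = tumor sample, 0 = normal sample
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Comparing classifiers such as SVM-RFE-SVM and SVM-RFE-KNN then amounts to calling the same function on each model's predictions for the same held-out samples.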

2009 ◽  
Vol 2009 ◽  
pp. 1-6 ◽  
Author(s):  
Xiyi Hang ◽  
Fang-Xiang Wu

Personalized drug design requires classifying cancer patients as accurately as possible. With advances in genome sequencing and microarray technology, a large amount of gene expression data has been, and will continue to be, produced from patients with various cancers. Such cancer-related gene expression data allows us to classify tumors at the genome-wide level. However, cancer-related gene expression datasets typically have far more genes (features) than samples (patients), which poses a challenge for tumor classification. In this paper, a new method is proposed for cancer diagnosis using gene expression data by casting the classification problem as finding sparse representations of test samples with respect to training samples. The sparse representation is computed by the l1-regularized least squares method. To investigate its performance, the proposed method is applied to six tumor gene expression datasets and compared with various support vector machine (SVM) methods. The experimental results show that the performance of the proposed method is comparable with or better than that of SVMs. In addition, the proposed method is more efficient than SVMs because it needs no model selection.
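The idea of classifying a test sample through its sparse representation over the training samples can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the l1-regularized least squares problem is solved here with plain iterative soft-thresholding (ISTA) rather than whatever solver the authors used, the decision rule (pick the class whose training samples reconstruct the test sample with the smallest residual) is the usual one for sparse-representation classifiers and is assumed rather than taken from the abstract, and the regularization weight `lam`, the function names, and the toy data are all hypothetical.

```python
import math


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal step for the l1 term)."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)


def l1_least_squares(samples, target, lam=0.01, iters=300):
    """Minimize 0.5*||A x - target||^2 + lam*||x||_1 with ISTA.

    samples[j] is the j-th training sample, i.e. column j of A.
    """
    n, d = len(samples), len(target)
    # 1/||A||_F^2 is a safe step size: ||A||_F^2 bounds the top eigenvalue of A^T A
    step = 1.0 / sum(v * v for col in samples for v in col)
    x = [0.0] * n
    for _ in range(iters):
        residual = [sum(samples[j][i] * x[j] for j in range(n)) - target[i]
                    for i in range(d)]
        grad = [sum(samples[j][i] * residual[i] for i in range(d))
                for j in range(n)]
        x = [soft_threshold(x[j] - step * grad[j], step * lam) for j in range(n)]
    return x


def classify(samples, labels, target, lam=0.01):
    """Assign target to the class with the smallest reconstruction residual."""
    x = l1_least_squares(samples, target, lam)
    best_label, best_residual = None, math.inf
    for label in sorted(set(labels)):
        # Reconstruct the target using only this class's coefficients
        recon = [sum(samples[j][i] * x[j]
                     for j in range(len(samples)) if labels[j] == label)
                 for i in range(len(target))]
        residual = math.dist(target, recon)
        if residual < best_residual:
            best_label, best_residual = label, residual
    return best_label


# Hypothetical expression profiles over three genes
train = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [0.0, 0.1, 1.0], [0.1, 0.2, 0.9]]
train_labels = ["tumor", "tumor", "normal", "normal"]
predicted = classify(train, train_labels, [0.95, 0.15, 0.05])
print(predicted)
```

No classifier is trained in this sketch: each test sample is handled by solving one small optimization problem against the stored training samples.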


2001 ◽  
Vol 7 (3) ◽  
pp. 361-375 ◽  
Author(s):  
John D. Clemens ◽  
Su Gao ◽  
Alexander S. Kechris

§ 1. Introduction. In this communication we present some recent results on the classification of Polish metric spaces up to isometry and on the isometry groups of Polish metric spaces. A Polish metric space is a complete separable metric space (X, d). Our first goal is to determine the exact complexity of the classification problem of general Polish metric spaces up to isometry. This work was motivated by a paper of Vershik [1998], where he remarks (in the beginning of Section 2): “The classification of Polish spaces up to isometry is an enormous task. More precisely, this classification is not ‘smooth’ in the modern terminology.” Our Theorem 2.1 below quantifies precisely the enormity of this task. After doing this, we turn to special classes of Polish metric spaces and investigate the classification problems associated with them. Note that these classification problems are in principle no more complicated than the general one above. However, the determination of their exact complexity is not necessarily easier. The investigation of the classification problems naturally leads to some interesting results on the groups of isometries of Polish metric spaces. We shall also present these results below. The rest of this section is devoted to an introduction of some basic ideas of a theory of complexity for classification problems, which will help to put our results in perspective. Detailed expositions of this general theory can be found, e.g., in Hjorth [2000], Kechris [1999], [2001].


Author(s):  
Cheng-San Yang ◽  
Li-Yeh Chuang ◽  
Chao-Hsuan Ke ◽  
Cheng-Hong Yang ◽  
...  

Microarray data on gene expression profiles provides valuable answers to a variety of problems and contributes to advances in clinical medicine. The application of microarray data to the classification of cancer types has recently assumed increasing importance. Classifying microarray data samples involves feature selection, whose goal is to identify subsets of differentially expressed genes potentially relevant for distinguishing sample classes, and classifier design. We propose an efficient evolutionary approach for selecting gene subsets from gene expression data that achieves higher accuracy on classification problems. Our proposal combines a shuffled frog-leaping algorithm (SFLA) and a genetic algorithm (GA) to choose genes (features) relevant to classification. The K-nearest neighbor (KNN) method with leave-one-out cross-validation (LOOCV) is used to evaluate classification accuracy. We apply this novel hybrid approach, based on SFLA-GA and KNN classification, to 11 classification problems from the literature. Experimental results show that the classification accuracy obtained using the selected features is higher than the accuracy obtained on the datasets without feature selection.
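The evaluation step described above, scoring a candidate gene subset by KNN accuracy under LOOCV, can be sketched as below. Only that scoring step is shown; the SFLA-GA search that proposes subsets is not reproduced. The function names, the choice of k = 1, and the toy data are assumptions for illustration.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=1):
    """Predict the majority label among the k nearest training samples."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loocv_accuracy(x, y, feature_idx, k=1):
    """Leave-one-out accuracy of KNN restricted to the given feature indices."""
    def project(row):
        return [row[j] for j in feature_idx]

    correct = 0
    for held_out in range(len(x)):
        # Train on every sample except the held-out one
        train_x = [project(x[j]) for j in range(len(x)) if j != held_out]
        train_y = [y[j] for j in range(len(x)) if j != held_out]
        if knn_predict(train_x, train_y, project(x[held_out]), k) == y[held_out]:
            correct += 1
    return correct / len(x)


# Hypothetical data: gene 0 separates the classes, genes 1 and 2 do not
x = [
    [0.1, 5.0, 1.0],
    [0.2, 1.0, 9.0],
    [0.3, 8.0, 4.0],
    [0.9, 5.2, 1.1],
    [1.0, 1.1, 9.2],
    [1.1, 8.1, 4.1],
]
y = ["A", "A", "A", "B", "B", "B"]

acc_all = loocv_accuracy(x, y, [0, 1, 2])
acc_selected = loocv_accuracy(x, y, [0])
print(acc_all, acc_selected)
```

A search procedure would call `loocv_accuracy` as its fitness function, keeping the subsets that score highest; on this toy data, dropping the two uninformative genes raises the score.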


Author(s):  
GEORGE RIGOPOULOS ◽  
DIMITRIOS TH. ASKOUNIS ◽  
KONSTANTINOS METAXIOTIS

This paper presents NexClass, a Decision Support System (DSS) that supports the classification of alternatives into predefined non-ordered categories according to their performance on evaluation criteria, and that implements a novel classification algorithm based on multicriteria analysis and outranking relations. In more detail, assignment to classes is based on the concept of non-exclusivity, which defines to what degree an alternative can be included in a specific category. For each category, the decision maker defines a threshold that indicates its limit with respect to the evaluation criteria. Alternatives are then evaluated according to the criteria, and non-excluding degrees are calculated for each category. Finally, an alternative is assigned to the category for which the non-excluding degree takes the lowest value. The NexClass DSS implements this classification algorithm and provides a user-friendly interface that helps decision makers formulate and solve classification problems. In addition to the methodology and the DSS, we present a real-world application to a classification problem in a banking environment. Our findings from evaluation experiments in the banking environment provide evidence that the proposed methodology and the DSS effectively support decision makers in classification decisions.


2014 ◽  
Vol 2014 ◽  
pp. 1-5 ◽  
Author(s):  
James A. Koziol ◽  
Eng M. Tan ◽  
Liping Dai ◽  
Pengfei Ren ◽  
Jian-Ying Zhang

Multiple antigen miniarrays can provide accurate tools for cancer detection and diagnosis. These miniarrays can be validated by examining their operating characteristics in classifying individuals as either cancer patients or normal (non-cancer) subjects. We describe the use of restricted Boltzmann machines for this classification problem, applied to the diagnosis of hepatocellular carcinoma. In this setting, we find that their operating characteristics are similar to those of a logistic regression standard, and we suggest that restricted Boltzmann machines merit further consideration for classification problems.


2020 ◽  
Author(s):  
Aixiang Jiang ◽  
Laura K. Hilton ◽  
Jeffrey Tang ◽  
Christopher K. Rushton ◽  
Bruno M. Grande ◽  
...  

Binary classification using gene expression data is commonly used to stratify cancers into molecular subgroups that may have distinct prognoses and therapeutic options. A limitation of many such methods is the requirement for comparable training and testing data sets. Here, we describe and demonstrate a self-training implementation of probability ratio-based classification prediction score (PRPS-ST) that facilitates the porting of existing classification models to other gene expression data sets. We demonstrate its robustness through application to two binary classification problems in diffuse large B-cell lymphoma using a diverse variety of gene expression data types and normalization methods.


Author(s):  
Kun Wu ◽  
Jianshe Kang ◽  
Kuo Chi

In view of the problems with traditional fault diagnosis methods, such as small samples and nonlinear relations, this paper proposes a fault diagnosis method based on an improved multi-class classification algorithm and the relevance vector machine (RVM). By improving the majority-vote strategy of the traditional One-Against-One (OAO) algorithm and combining the features of the OAO and One-Against-Rest (OAR) algorithms, the k-class classification problem is transformed into k(k-1)/2 three-class classification problems using the proposed double-layer majority-vote strategy, yielding an improved multi-class classification algorithm, One-Against-One-Against-Rest (OAOAR). For each three-class classification problem, OAO with RVM as the binary classifier is adopted to achieve multi-class classification with RVM. Numerical simulations on UCI datasets and fault diagnosis experiments on power transformers both demonstrate that the proposed method performs significantly better than traditional methods in increasing diagnostic accuracy, optimizing the voting results, strengthening diagnostic confidence, and identifying hidden classes, and that it has more practical engineering value.
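The k(k-1)/2 count comes from taking every unordered pair of classes. The sketch below enumerates those pairs and applies the plain single-layer OAO majority vote that the paper sets out to improve; it does not implement the proposed double-layer OAOAR strategy or the "rest" outcome of each three-class problem, and it does not train any RVM. The class names and the per-pair winners are hypothetical stand-ins for real classifier outputs.

```python
from collections import Counter
from itertools import combinations


def pairwise_problems(classes):
    """Return every unordered class pair; there are k(k-1)/2 of them."""
    return list(combinations(classes, 2))


def oao_vote(pair_winners):
    """Plain OAO decision: the class that wins the most pairwise contests."""
    votes = Counter(pair_winners.values())
    return votes.most_common(1)[0][0]


# Hypothetical fault classes for a power transformer (k = 4)
classes = ["normal", "overheating", "discharge", "arcing"]
pairs = pairwise_problems(classes)

# Hypothetical outputs of the pairwise classifiers for one test sample
pair_winners = {
    ("normal", "overheating"): "overheating",
    ("normal", "discharge"): "normal",
    ("normal", "arcing"): "normal",
    ("overheating", "discharge"): "overheating",
    ("overheating", "arcing"): "overheating",
    ("discharge", "arcing"): "discharge",
}
winner = oao_vote(pair_winners)
print(len(pairs), winner)
```

With k = 4 the decomposition yields 4 * 3 / 2 = 6 pairwise problems; in the OAOAR scheme described above, each of these would instead be a three-class problem.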

