Design and Procedures for the Investigation Conducted

This chapter presents the design of each case study model proposed in Chapter 3, together with its experimental procedures. It covers data preparation, the selection of suitable parameters, data pre-processing, and the detailed design of two case studies. Case 1 examines the accuracy and efficiency (time complexity) of high-performance gene selection and cancer classification algorithms. Case 2 presents a two-stage hybrid multi-filter feature selection method for high colon-cancer classification. The chapter also describes the experimental setup and environment, including the hardware and software components used.

2010, Vol. 9, pp. CIN.S3794
Author(s): Xiaosheng Wang, Osamu Gotoh

Gene selection is vitally important in the molecular classification of cancer using high-dimensional gene expression data. Because specific cancerous gene expression profiles have distinct inherent characteristics, developing flexible and robust feature selection methods is crucial. We investigated the properties of a feature selection approach proposed in our previous work, which generalizes the feature selection method based on the depended degree of attribute in rough sets. We compared this method with established methods (the depended degree, chi-square, information gain, Relief-F, and symmetric uncertainty) and analyzed its properties through a series of classification experiments. The results revealed that our method was superior to the canonical depended-degree-based method in robustness and applicability, and comparable to the other four commonly used methods. More importantly, the method can reveal the inherent classification difficulty of different gene expression datasets, indicating the inherent biology of specific cancers.
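
The canonical rough-set dependency degree that this approach generalizes scores an attribute a by the fraction of objects whose equivalence class (objects sharing a's value) carries a single decision label: gamma_a(D) = |POS_a(D)| / |U|. The sketch below computes only this canonical score for discretized gene values and ranks genes by it; it does not reproduce the authors' generalization, and the function names and data layout (one list of discretized values per sample) are hypothetical.

```python
from collections import defaultdict


def dependency_degree(attribute_values, labels):
    """Canonical rough-set dependency degree gamma_a(D) = |POS_a(D)| / |U| for one attribute."""
    if len(attribute_values) != len(labels):
        raise ValueError("attribute_values and labels must have the same length")
    # Partition the universe U into equivalence classes of objects sharing an attribute value.
    blocks = defaultdict(list)
    for value, label in zip(attribute_values, labels):
        blocks[value].append(label)
    # The positive region holds the objects whose whole equivalence class has one decision label.
    positive = sum(len(block) for block in blocks.values() if len(set(block)) == 1)
    return positive / len(labels)


def rank_genes(discretized_samples, labels):
    """Score every gene column by its dependency degree and rank genes from best to worst."""
    n_genes = len(discretized_samples[0])
    scores = [dependency_degree([row[g] for row in discretized_samples], labels)
              for g in range(n_genes)]
    ranking = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return ranking, scores


# Hypothetical discretized expression levels: gene 0 separates the classes, gene 1 does not.
samples = [["low", "x"], ["low", "y"], ["high", "x"], ["high", "y"]]
labels = ["normal", "normal", "tumor", "tumor"]
print(rank_genes(samples, labels))  # ([0, 1], [1.0, 0.0])
```

Because the canonical score only counts perfectly pure equivalence classes, a single mislabelled sample can zero out an otherwise strong gene, which is one way to see why a more robust generalization is attractive.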


Classification accuracy is central to gene processing, gene selection, and cancer classification, because it underpins better cancer treatment and more appropriate drug assignment. Time complexity analysis, however, adds to the practical significance of an application. To answer the research questions posed in Chapter 1, several case studies were implemented (see Chapters 4 and 5), each essential to supporting the methodologies discussed in Chapter 3. The study used a colon-cancer dataset comprising 2000 genes. The best-performing search algorithm, GA, achieved high performance with efficient time complexity. Among the classifiers, DTs and SVMs made the best contribution in terms of both accuracy and time efficiency. Nevertheless, a completely fair comparative study is difficult, because existing algorithms and methods were tested by different authors to demonstrate the effectiveness and power of their own methods.
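
To illustrate how a GA can act as a gene selection search, the sketch below evolves bit-string masks (1 = gene kept) whose fitness is classification accuracy minus a small penalty per selected gene. The chapter does not state its GA settings, fitness function, or classifier, so everything here is an assumption: the nearest-centroid classifier is a simple stand-in (not the SVM or DT used in the study), and the population size, mutation rate, penalty, tournament selection, one-point crossover, and the toy data generator are all hypothetical.

```python
import random


def centroid_accuracy(X, y, mask):
    """Resubstitution accuracy of a nearest-centroid classifier using only the genes set in mask."""
    genes = [g for g, bit in enumerate(mask) if bit]
    if not genes:
        return 0.0
    classes = sorted(set(y))
    centroids = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [sum(r[g] for r in rows) / len(rows) for g in genes]
    correct = 0
    for x, label in zip(X, y):
        # Squared Euclidean distance to each class centroid over the selected genes only.
        dist = {c: sum((x[g] - m) ** 2 for g, m in zip(genes, centroids[c])) for c in classes}
        if min(dist, key=dist.get) == label:
            correct += 1
    return correct / len(y)


def ga_select(X, y, pop_size=20, generations=30, mutation_rate=0.05, penalty=0.01, seed=0):
    """Evolve gene masks; fitness = accuracy minus a small penalty per selected gene."""
    rng = random.Random(seed)
    n_genes = len(X[0])

    def fitness(mask):
        return centroid_accuracy(X, y, mask) - penalty * sum(mask)

    population = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = [list(m) for m in ranked[:2]]  # elitism: carry the two best masks forward
        while len(next_gen) < pop_size:
            # Tournament selection: the fittest of three random masks becomes a parent.
            p1 = max(rng.sample(ranked, 3), key=fitness)
            p2 = max(rng.sample(ranked, 3), key=fitness)
            cut = rng.randrange(1, n_genes)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    return best, fitness(best)


def make_toy_data(n_per_class=10, n_noise=5, seed=1):
    """Hypothetical data: gene 0 separates the classes, the remaining genes are uniform noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label, offset in ((0, 0.0), (1, 5.0)):
        for _ in range(n_per_class):
            X.append([offset + rng.uniform(0, 0.5)] + [rng.uniform(0, 1) for _ in range(n_noise)])
            y.append(label)
    return X, y


X, y = make_toy_data()
best_mask, best_fitness = ga_select(X, y)
print(best_mask, round(best_fitness, 3))
```

The size penalty is what pushes the search toward small gene subsets; without it, any mask containing the informative gene would score equally well regardless of how many noise genes it carried.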


2017, Vol. 2017, pp. 1-10
Author(s): Khairan D. Rajab

Phishing is a serious web threat that involves mimicking authenticated websites to deceive users into revealing their financial information. Phishing has caused financial damage to a range of online stakeholders, with losses in the hundreds of millions; hence it is essential to minimize this risk. Classifying websites into “phishy” and legitimate types is a primary data mining task that security experts and decision makers hope to improve, particularly with respect to the detection rate and the reliability of the results. One way to ensure reliable results and enhance performance is to identify a set of relevant features early on, so that data dimensionality is reduced and irrelevant features are discarded. To increase the reliability of preprocessing, this article proposes a new feature selection method that combines the scores of multiple known methods to minimize discrepancies in feature selection results. The proposed method has been applied to website phishing classification to show its pros and cons in identifying relevant features. Results on a security dataset reveal that the proposed preprocessing method derived new feature datasets which, when mined, generate highly competitive classifiers in terms of detection rate compared with other feature selection methods.
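
The abstract does not specify how the scores of the multiple methods are combined. One simple way to merge them, assumed here purely for illustration, is to min-max normalise each method's scores to [0, 1] (so that, for example, chi-square statistics and information gain values become comparable) and then average them per feature. The function names, feature names, and score values below are hypothetical.

```python
def min_max(scores):
    """Rescale one method's feature scores to [0, 1] so different methods become comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {f: 0.0 for f in scores}
    return {f: (s - lo) / (hi - lo) for f, s in scores.items()}


def combine_scores(score_tables, top_k=None):
    """Average min-max normalised scores across methods and rank features by the combined score."""
    features = set(score_tables[0])
    if any(set(t) != features for t in score_tables):
        raise ValueError("every method must score the same features")
    normalised = [min_max(t) for t in score_tables]
    combined = {f: sum(t[f] for t in normalised) / len(normalised) for f in features}
    # Highest combined score first; ties broken alphabetically for a deterministic order.
    ranked = sorted(combined, key=lambda f: (-combined[f], f))
    selected = ranked if top_k is None else ranked[:top_k]
    return selected, combined


# Hypothetical scores from two filter methods on three hypothetical phishing features.
info_gain = {"url_length": 0.10, "has_ip_address": 0.30, "ssl_state": 0.50}
chi_square = {"url_length": 40.0, "has_ip_address": 10.0, "ssl_state": 90.0}
selected, combined = combine_scores([info_gain, chi_square], top_k=2)
print(selected)  # ['ssl_state', 'has_ip_address']
```

Here the two methods disagree on whether `url_length` or `has_ip_address` ranks second; averaging the normalised scores settles the discrepancy rather than trusting either method alone.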


Symmetry, 2020, Vol. 12 (2), pp. 271
Author(s): Md Akizur Rahman, Ravie Chandren Muniyandi

An artificial neural network (ANN) is a tool that can be used to recognize cancer effectively. The risk of cancer is increasing dramatically worldwide, yet detecting cancer is very difficult due to a lack of data, and proper data are essential for accurate detection. Cancer classification has been carried out by many researchers, but classification accuracy still needs to improve. For this purpose, this research proposes a two-step feature selection (FS) technique combined with a 15-neuron neural network (NN) that classifies cancer with high accuracy. The FS method reduces the feature attributes, and the 15-neuron network classifies the cancer. The benchmark Wisconsin Diagnostic Breast Cancer (WDBC) dataset was used to compare the proposed method with other existing techniques, showing a significant improvement, with classification accuracy of up to 99.4%. The results of this research are more promising and significant than those reported in existing papers.
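
As a minimal sketch of what a "15-neuron" classifier can look like, assuming the 15 neurons form a single hidden layer feeding one output unit, the code below trains such a network with plain stochastic gradient descent. The abstract does not give the activation functions, loss, optimiser, or learning rate, so tanh hidden units, a sigmoid output, binary cross-entropy, and the hyperparameters are all assumptions. It also omits the two-step FS stage and does not load WDBC; the hypothetical two-feature data stand in for an already-reduced feature set.

```python
import math
import random


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OneHiddenLayerNet:
    """Binary classifier: inputs -> n_hidden tanh units (15 by default) -> one sigmoid output."""

    def __init__(self, n_inputs, n_hidden=15, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = _sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def predict(self, x):
        return 1 if self._forward(x)[1] >= 0.5 else 0

    def fit(self, X, y, epochs=200, lr=0.1, seed=0):
        """Per-sample stochastic gradient descent on binary cross-entropy."""
        rng = random.Random(seed)
        order = list(range(len(X)))
        for _ in range(epochs):
            rng.shuffle(order)
            for i in order:
                x, target = X[i], y[i]
                hidden, out = self._forward(x)
                delta_out = out - target  # dLoss/dz for sigmoid output with cross-entropy
                for j, h in enumerate(hidden):
                    # Back-propagate through tanh (derivative 1 - h^2) before updating w2[j].
                    delta_h = delta_out * self.w2[j] * (1.0 - h * h)
                    self.w2[j] -= lr * delta_out * h
                    for k, xk in enumerate(x):
                        self.w1[j][k] -= lr * delta_h * xk
                    self.b1[j] -= lr * delta_h
                self.b2 -= lr * delta_out
        return self


def make_separable_data(n=100, margin=0.2, seed=42):
    """Hypothetical 2-feature data standing in for an already-reduced feature set."""
    rng = random.Random(seed)
    X, y = [], []
    while len(X) < n:
        a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if abs(a + b) > margin:  # keep a gap around the decision boundary a + b = 0
            X.append([a, b])
            y.append(1 if a + b > 0 else 0)
    return X, y


X, y = make_separable_data()
net = OneHiddenLayerNet(n_inputs=2).fit(X, y)
accuracy = sum(net.predict(x) == t for x, t in zip(X, y)) / len(X)
print(f"training accuracy: {accuracy:.2f}")
```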


This chapter focuses on the results of each case study experiment. For Case 1, the experiments were conducted in three phases. Phase one implemented GA, PSO, and IG as the gene/feature selection algorithms over the entire dataset. Phase two used the original dataset to run only the cancer classification algorithms, without any gene/feature selection; four well-known classification algorithms were employed: SVM, NB, GP, and DT. Phase three implemented the combined approach of gene selection followed by cancer classification. The results of these phases are presented in the following subsections.

For Case 2, the experiments were implemented in two phases. Phase one applied the classification algorithms to the features selected by the hybridised selection algorithm (GA+IG), whereas Phase two classified the features selected by the proposed two-stage multi-filter selection system. The results are presented in this section as follows.
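
IG is one of the Phase one selection algorithms and also part of the GA+IG hybrid. The sketch below shows an IG filter in its simplest form, assuming discretized gene values: IG(Y; X) = H(Y) - H(Y | X) is computed per gene, the top-k genes are kept, and the data are projected onto them, which is the reduced input a Phase three classifier (SVM, NB, GP, or DT, none of which is shown) would receive. The discretization, the choice of k, and all function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for one discretized feature X."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_top_k(X, y, k):
    """Rank features by IG and return the indices of the k best plus all scores."""
    n_features = len(X[0])
    scores = [information_gain([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: -scores[j])
    return ranked[:k], scores


def project(X, selected):
    """Keep only the selected feature columns, ready for a downstream classifier."""
    return [[row[j] for j in selected] for row in X]


# Hypothetical discretized expression levels for two genes across four samples.
X = [["high", "a"], ["high", "b"], ["low", "a"], ["low", "b"]]
y = ["tumor", "tumor", "normal", "normal"]
selected, scores = select_top_k(X, y, k=1)
X_reduced = project(X, selected)
print(selected, X_reduced)  # [0] [['high'], ['high'], ['low'], ['low']]
```

Running the classifiers on `X` corresponds to Phase two (no selection), while running them on `X_reduced` corresponds to the combined approach of Phase three.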


2014, Vol. 2014, pp. 1-8
Author(s): Jianzhong Wang, Shuang Zhou, Yugen Yi, Jun Kong

Feature selection is a key issue in machine learning and related fields. Its results directly affect a classifier's accuracy and generalization performance. Recently, a statistical feature selection method named effective range based gene selection (ERGS) was proposed. However, ERGS considers only the overlapping area (OA) among the effective ranges of each class for every feature, so it fails to handle the inclusion relation of effective ranges. To overcome this limitation, this paper proposes a novel, efficient statistical feature selection approach called improved feature selection based on effective range (IFSER). In IFSER, an including area (IA) is introduced to characterize the inclusion relation of effective ranges. Moreover, for each feature, the proportion of samples of every class that fall in OA and IA is also taken into consideration. As a result, IFSER outperforms the original ERGS and some other state-of-the-art algorithms, and experiments on several well-known databases demonstrate the effectiveness of the proposed method.
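
To make the OA versus IA distinction concrete, the sketch below computes a per-class effective range for one feature and then reports the overlap length of two ranges along with whether one range includes the other. The range formula, mu ± (1 - p) * gamma * sigma with class prior p and gamma = 1.732, is our assumed reading of the ERGS formulation, not something stated in the abstract; IFSER's actual weighting of OA, IA, and sample proportions is not reproduced, and all function names are hypothetical.

```python
import statistics

GAMMA = 1.732  # assumed Chebyshev-derived constant from the ERGS formulation


def effective_range(values, prior, gamma=GAMMA):
    """Assumed effective range [mu - (1 - p) * gamma * sigma, mu + (1 - p) * gamma * sigma]."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    half_width = (1.0 - prior) * gamma * sigma
    return mu - half_width, mu + half_width


def class_ranges(feature_values, labels):
    """Effective range of one feature for every class, using class frequencies as priors."""
    n = len(labels)
    ranges = {}
    for c in sorted(set(labels)):
        vals = [v for v, label in zip(feature_values, labels) if label == c]
        ranges[c] = effective_range(vals, prior=len(vals) / n)
    return ranges


def overlap_and_inclusion(range_a, range_b):
    """Length of the overlap (OA-style) and whether one range includes the other (the IA case)."""
    overlap = max(0.0, min(range_a[1], range_b[1]) - max(range_a[0], range_b[0]))
    a_in_b = range_b[0] <= range_a[0] and range_a[1] <= range_b[1]
    b_in_a = range_a[0] <= range_b[0] and range_b[1] <= range_a[1]
    return overlap, a_in_b or b_in_a


# Hypothetical feature: class "b" has the same mean as class "a" but a much narrower spread.
ranges = class_ranges([0, 10, 4, 6], ["a", "a", "b", "b"])
overlap, included = overlap_and_inclusion(ranges["a"], ranges["b"])
print(round(overlap, 3), included)  # 1.732 True
```

In this example the narrow class-"b" range lies entirely inside the class-"a" range; an overlap length alone does not record that nesting, which is the situation the abstract says ERGS fails to handle.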

