feature selection approach
Recently Published Documents


Total documents: 498 (five years: 230)
H-index: 26 (five years: 8)

Algorithms ◽  
2022 ◽  
Vol 15 (1) ◽  
pp. 21
Author(s):  
Consolata Gakii ◽  
Paul O. Mireji ◽  
Richard Rimiru

Analysis of high-dimensional data, with more features (p) than observations (N) (p > N), places significant demands on computational cost and memory. Feature selection can be used to reduce the dimensionality of the data. We used a graph-based approach, principal component analysis (PCA), and recursive feature elimination to select features for classification from two lung cancer RNA-seq datasets. The selected features were discretized for association rule mining, where support and lift were used to generate informative rules. Our results show that graph-based feature selection improved the performance of sequential minimal optimization (SMO) and multilayer perceptron (MLP) classifiers on both datasets. In association rule mining, features selected with the graph-based approach outperformed those from the other two feature selection techniques at a support of 0.5 and a lift of 2. The non-redundant rules reflect inherent relationships between features. Biological features are usually related to functions in living systems, a relationship that cannot be deduced by feature selection and classification alone. Therefore, graph-based feature selection combined with rule mining is a suitable way to select features and find associations between them in high-dimensional RNA-seq data.
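The support and lift thresholds mentioned above (support 0.5, lift 2) can be made concrete with a small sketch. It assumes a simple median split for discretization and mines only single-item rules of the form A → B; the gene names and expression values are hypothetical, and the paper's actual discretization and rule-mining setup are not reproduced here.

```python
from itertools import permutations
from statistics import median


def discretize(rows, names):
    """Turn each sample (row) into a set of items such as 'g1=high'.

    Assumption: a median split per gene; the paper's scheme may differ.
    """
    cutoffs = [median(col) for col in zip(*rows)]
    return [
        {f"{n}={'high' if v > c else 'low'}" for n, v, c in zip(names, row, cutoffs)}
        for row in rows
    ]


def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def lift(transactions, a, b):
    """lift(a -> b) = support(a and b) / (support(a) * support(b))."""
    denom = support(transactions, [a]) * support(transactions, [b])
    return support(transactions, [a, b]) / denom if denom else 0.0


def pairwise_rules(transactions, min_support=0.5, min_lift=2.0):
    """Return single-item rules (a, b, support, lift) passing both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        s = support(transactions, [a, b])
        if s >= min_support:
            lf = lift(transactions, a, b)
            if lf >= min_lift:
                rules.append((a, b, s, lf))
    return rules


# Hypothetical expression values: rows are samples, columns are genes g1, g2.
rows = [[5.0, 7.0], [6.0, 8.0], [1.0, 0.5], [2.0, 1.5]]
tx = discretize(rows, ["g1", "g2"])
for rule in pairwise_rules(tx):
    print(rule)
```

A lift of 2 means the two items co-occur twice as often as they would if they were independent, which is why a high lift is a reasonable filter for informative rules.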


2022 ◽  
Vol 10 (1) ◽  

Parkinson’s disease is the second most common neurodegenerative disorder after Alzheimer’s disease, and it adversely affects the patient's nervous system. In the early stage, the symptoms of Parkinson’s disease are mild and sometimes go unnoticed, becoming severe only as the disease progresses, so early diagnosis is difficult. Recent research has shown that changes in speech or distortions in the voice can be used effectively for early detection of Parkinson’s disease. In this work, the authors propose a Parkinson’s disease detection system based on speech signals. Because feature selection plays an important role in classification, the authors propose a hybrid feature selection approach, MIRFE. Using an XGBoost classifier, the proposed approach is compared with five standard feature selection methods. MIRFE selects 40 of the 754 features, a feature reduction ratio of 94.69%. The proposed system achieves an accuracy of 93.88% and an area under the curve (AUC) of 0.978.
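The reported feature reduction ratio is the share of the original features that selection discards. A minimal sketch of that arithmetic (the function name is illustrative, not from the paper):

```python
def feature_reduction_ratio(n_selected: int, n_total: int) -> float:
    """Percentage of the original features removed by feature selection."""
    if n_total <= 0 or not 0 <= n_selected <= n_total:
        raise ValueError("need n_total > 0 and 0 <= n_selected <= n_total")
    return 100.0 * (n_total - n_selected) / n_total


# 40 of 754 features kept, as in the abstract above.
print(f"{feature_reduction_ratio(40, 754):.2f}%")  # 94.69%
```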


2021 ◽  
Vol 5 (4) ◽  
pp. 395
Author(s):  
Muhammad Aqil Haqeemi Azmi ◽  
Cik Feresa Mohd Foozy ◽  
Khairul Amin Mohamad Sukri ◽  
Nurul Azma Abdullah ◽  
Isredza Rahmi A. Hamid ◽  
...  

Distributed Denial of Service (DDoS) attacks are dangerous attacks that can disrupt servers, systems, or the application layer. They flood the target server with more Internet traffic than it can handle at one time, so an affected server may stop working, and the possibility of such attacks makes the network security environment insecure. In recent years, the number of DDoS-related cases has increased. Although there has been much previous research on DDoS attacks, such attacks still occur. This research therefore investigates a feature selection approach for detecting DDoS attacks with machine learning techniques. In this paper, features were selected from the UNSW-NB15 dataset using Information Gain and a data reduction method. The ANN, Naïve Bayes, and Decision Table algorithms were then used to classify the data into attack and normal classes. The experimental results were evaluated using accuracy, precision, true positive, and false positive measures, and the experiments yielded a good feature subset. To check whether the selected features are good, the classification results were compared with past research that used the same UNSW-NB15 dataset. In conclusion, this feature selection approach increased the accuracy of the ANN, Naïve Bayes, and Decision Table classifiers compared with the past research.
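Information Gain, the ranking criterion named above, scores a discrete feature by how much knowing its value reduces the entropy of the class label. The following is a minimal sketch assuming already-discretized features; the feature names and toy records are hypothetical (not taken from UNSW-NB15), and the paper's additional data reduction step is not reproduced.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - sum_v P(X=v) * H(Y | X=v) for a discrete feature X."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels, k):
    """Return the names of the k features with the highest information gain."""
    scores = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical discretized traffic records.
labels = ["attack", "attack", "normal", "normal"]
columns = {
    "rate_bin": ["high", "high", "low", "low"],  # perfectly separates the classes
    "proto": ["udp", "tcp", "tcp", "tcp"],       # partially informative
    "ttl_bin": ["a", "b", "a", "b"],             # carries no class information
}
print(rank_features(columns, labels, k=2))  # ['rate_bin', 'proto']
```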


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Mariam Elhussein ◽  
Samiha Brahimi

Purpose: This paper proposes a novel way of using textual clustering as a feature selection method. The method is applied to identify the most important keywords in profile classification and is demonstrated on the problem of sick-leave promoters on Twitter.
Design/methodology/approach: Four machine learning classifiers were applied to a total of 35,578 tweets posted on Twitter. The data were manually labeled into two categories: promoter and nonpromoter. Classification performance was compared between the proposed clustering-based feature selection approach and standard feature selection.
Findings: Random forest achieved the highest accuracy, 95.91%, which is higher than that of comparable work. Furthermore, using clustering as a feature selection method improved the sensitivity of the model from 73.83% to 98.79%. Sensitivity (recall) is the most important measure of classifier performance when detecting promoter accounts with spam-like behavior.
Research limitations/implications: Because the method is novel, more testing on other datasets is needed before its results can be generalized.
Practical implications: Saudi authorities can use the model to report accounts that sell sick leaves online.
Originality/value: The research proposes a new way of using textual clustering for feature selection.
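Why sensitivity matters more than accuracy here: on imbalanced data, a classifier can reach high accuracy while missing every promoter account. A small sketch with hypothetical labels illustrates the difference; the label strings are assumptions, not the paper's data.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity(y_true, y_pred, positive="promoter"):
    """Sensitivity (recall) = TP / (TP + FN) for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if tp + fn else 0.0


# Hypothetical imbalanced sample: 1 promoter, 9 nonpromoters, and a model
# that labels every account "nonpromoter".
y_true = ["promoter"] + ["nonpromoter"] * 9
y_pred = ["nonpromoter"] * 10
print(accuracy(y_true, y_pred))     # 0.9
print(sensitivity(y_true, y_pred))  # 0.0
```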


2021 ◽  
Author(s):  
A B Pawar ◽  
M A Jawale ◽  
Ravi Kumar Tirandasu ◽  
Saiprasad Potharaju

High dimensionality is a serious issue in data mining preprocessing. A large number of features in a dataset complicates the classification of unknown instances. The initial data space may contain redundant and irrelevant features, which increase memory consumption and can confuse a learning model built on them. It is therefore advisable to select the best features before building a classification model, to achieve better accuracy. In this research, we propose a novel feature selection approach based on Symmetrical Uncertainty and the Correlation Coefficient (SU-CCE) to reduce a high-dimensional feature space and increase classification accuracy. The experiment is performed on a colon cancer microarray dataset with 2000 features, from which the proposed method selects the 38 best features. To measure the strength of the proposed method, it is compared, using various classifiers, with the top 38 features selected by four traditional filter-based methods. The results show that the proposed approach is competitive with most of the traditional methods.
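Symmetrical uncertainty (SU) normalizes mutual information to the range [0, 1], and a correlation coefficient can flag redundant features. The sketch below ranks discrete features by SU against the class and then skips any feature whose Pearson correlation with an already-selected feature reaches a threshold. This particular combination, the 0.9 threshold, and the toy data are assumptions for illustration; the abstract does not specify how SU-CCE combines the two measures.

```python
from collections import Counter
from math import log2, sqrt


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), where I = H(X) + H(Y) - H(X, Y)."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx and vy else 0.0


def select_features(columns, labels, k, max_corr=0.9):
    """Hypothetical SU-then-correlation filter (not necessarily SU-CCE itself)."""
    su = {name: symmetrical_uncertainty(col, labels) for name, col in columns.items()}
    selected = []
    for name in sorted(su, key=su.get, reverse=True):
        if su[name] <= 0 or len(selected) == k:
            break  # remaining features are irrelevant, or we have k already
        if all(abs(pearson(columns[name], columns[s])) < max_corr for s in selected):
            selected.append(name)
    return selected


# Toy data: f2 duplicates f1 (redundant), f4 is constant (irrelevant).
y = [1, 1, 1, 0, 0, 0]
cols = {"f1": [1, 1, 1, 0, 0, 0], "f2": [1, 1, 1, 0, 0, 0],
        "f3": [1, 1, 0, 0, 0, 1], "f4": [0] * 6}
print(select_features(cols, y, k=38))  # ['f1', 'f3']
```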


2021 ◽  
pp. 101509
Author(s):  
Marcio Carneiro Brito Pache ◽  
Diego André Sant'Ana ◽  
Fábio Prestes Cesar Rezende ◽  
João Vitor de Andrade Porto ◽  
João Victor Araújo Rozales ◽  
...  

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Liqian Zhou ◽  
Qi Duan ◽  
Xiongfei Tian ◽  
He Xu ◽  
Jianxin Tang ◽  
...  

Background: Long noncoding RNAs (lncRNAs) are densely linked with a plethora of important cellular activities, and they exert their functions by binding to corresponding RNA-binding proteins. Because experimental techniques for detecting lncRNA-protein interactions (LPIs) are laborious and time-consuming, a few computational methods have been reported for LPI prediction. However, these computational methods have the following limitations: (1) most methods were evaluated on a single dataset, so researchers may fail to measure their generalization ability; (2) most methods were validated only under cross validation on lncRNA-protein pairs and did not investigate performance under other cross validation settings, especially cross validation on independent lncRNAs and independent proteins; and (3) lncRNAs and proteins carry abundant biological information, and how to select informative features needs further investigation.
Results: This work focuses on finding new LPIs with a hybrid framework, LPI-HyADBS, which integrates AdaBoost-based feature selection with three classification models: a deep neural network (DNN), extreme gradient boosting (XGBoost), and an SVM with a penalty coefficient for misclassification (C-SVM). First, five datasets are assembled, each containing lncRNA sequences, protein sequences, and an LPI network. Second, biological features of lncRNAs and proteins are extracted with Pyfeat. Third, the lncRNA and protein features are selected with AdaBoost and concatenated to represent each LPI sample. Fourth, DNN, XGBoost, and C-SVM classify lncRNA-protein pairs based on the concatenated features. Finally, a hybrid framework integrates the classification results of the three classifiers. LPI-HyADBS is compared with six classical LPI prediction approaches (LPI-SKF, LPI-NRLMF, Capsule-LPI, LPI-CNNCP, LPLNP, and LPBNI) on the five datasets under 5-fold cross validation on lncRNAs, proteins, lncRNA-protein pairs, and independent lncRNAs and independent proteins. The results show that LPI-HyADBS achieves the best LPI prediction performance under all four cross validation settings. In particular, LPI-HyADBS obtains better classification performance than the other six approaches on the constructed independent dataset. Case analyses suggest an association between ZNF667-AS1 and Q15717.
Conclusions: By integrating an AdaBoost-based feature selection approach with three classification techniques (DNN, XGBoost, and C-SVM), this work develops a hybrid framework to identify new links between lncRNAs and proteins.
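The four cross validation settings differ in what is held out. The sketch below assumes the interpretation common in the LPI prediction literature: CV on pairs hides individual pairs; CV on lncRNAs (or proteins) hides every pair involving a held-out lncRNA (or protein); and CV on independent lncRNAs and proteins keeps only test pairs whose lncRNA and protein are both unseen, discarding mixed pairs. The function `cv_folds` and the identifiers are hypothetical, not taken from LPI-HyADBS.

```python
import random
from typing import Iterator, List, Set, Tuple

Pair = Tuple[str, str]  # (lncRNA id, protein id)


def _split(items, k: int, rng: random.Random) -> List[Set]:
    """Shuffle the distinct items and deal them into k roughly equal folds."""
    items = sorted(set(items))
    rng.shuffle(items)
    return [set(items[i::k]) for i in range(k)]


def cv_folds(pairs: List[Pair], scheme: str, k: int = 5,
             seed: int = 0) -> Iterator[Tuple[List[Pair], List[Pair]]]:
    """Yield (train, test) lists of known pairs for one CV scheme."""
    rng = random.Random(seed)
    lnc_folds = _split([l for l, _ in pairs], k, rng)
    prot_folds = _split([p for _, p in pairs], k, rng)
    pair_folds = _split(pairs, k, rng)
    for i in range(k):
        test_l, test_p, test_pairs = lnc_folds[i], prot_folds[i], pair_folds[i]
        if scheme == "pairs":          # hide individual pairs
            test = [x for x in pairs if x in test_pairs]
            train = [x for x in pairs if x not in test_pairs]
        elif scheme == "lncrna":       # hide every pair of a held-out lncRNA
            test = [x for x in pairs if x[0] in test_l]
            train = [x for x in pairs if x[0] not in test_l]
        elif scheme == "protein":      # hide every pair of a held-out protein
            test = [x for x in pairs if x[1] in test_p]
            train = [x for x in pairs if x[1] not in test_p]
        elif scheme == "independent":  # both partners unseen; mixed pairs dropped
            test = [x for x in pairs if x[0] in test_l and x[1] in test_p]
            train = [x for x in pairs if x[0] not in test_l and x[1] not in test_p]
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        yield train, test


# Hypothetical interaction list.
pairs = [(f"lnc{i}", f"prot{j}") for i in range(10) for j in range(10) if (i * j) % 3 != 1]
for train, test in cv_folds(pairs, "independent"):
    print(len(train), len(test))
```

The independent setting is the strictest, since the model never sees either partner of a test pair during training, which is why it is a natural check on generalization.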

