feature selection techniques
Recently Published Documents


Total documents: 394 (five years: 196)
H-index: 23 (five years: 4)

Algorithms ◽  
2022 ◽  
Vol 15 (1) ◽  
pp. 21
Author(s):  
Consolata Gakii ◽  
Paul O. Mireji ◽  
Richard Rimiru

Analysis of high-dimensional data, with more features than observations, places significant demands on computational cost and memory usage. Feature selection can be used to reduce the dimensionality of the data. We used a graph-based approach, principal component analysis (PCA), and recursive feature elimination to select features for classification from two lung cancer RNAseq datasets. The selected features were discretized for association rule mining, where support and lift were used to generate informative rules. Our results show that graph-based feature selection improved the performance of sequential minimal optimization (SMO) and multilayer perceptron (MLP) classifiers in both datasets. In association rule mining, features selected using the graph-based approach outperformed those from the other two feature-selection techniques at a support of 0.5 and a lift of 2. The non-redundant rules reflect the inherent relationships between features. Biological features are usually related to functions in living systems, a relationship that cannot be deduced by feature selection and classification alone. Therefore, the graph-based feature-selection approach combined with rule mining is a suitable way of selecting features and finding associations between them in high-dimensional RNAseq data.
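
The support and lift measures mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the item labels such as `geneA=high`, and the four toy samples are all hypothetical, and the thresholds simply reuse the support of 0.5 and lift of 2 quoted above.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def lift(transactions, antecedent, consequent):
    """support(A and C) / (support(A) * support(C)); values above 1
    indicate that A and C co-occur more often than under independence."""
    joint = support(transactions, set(antecedent) | set(consequent))
    denom = support(transactions, antecedent) * support(transactions, consequent)
    return joint / denom if denom else float("nan")


# Hypothetical discretized samples (each set lists the items present).
samples = [
    {"geneA=high", "geneB=high"},
    {"geneA=high", "geneB=high"},
    {"geneC=high"},
    {"geneC=high"},
]

rule_support = support(samples, {"geneA=high", "geneB=high"})
rule_lift = lift(samples, {"geneA=high"}, {"geneB=high"})

# Keep the rule only if it meets both thresholds (values taken from the abstract).
keep_rule = rule_support >= 0.5 and rule_lift >= 2
```

On this toy data the rule `geneA=high -> geneB=high` sits exactly on both thresholds, so it would be retained.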


2022 ◽  
Vol 9 (1) ◽  
Author(s):  
Joffrey L. Leevy ◽  
John Hancock ◽  
Taghi M. Khoshgoftaar ◽  
Jared M. Peterson

Recent years have seen a proliferation of Internet of Things (IoT) devices and an associated security risk from an increasing volume of malicious traffic worldwide. For this reason, datasets such as Bot-IoT were created to train machine learning classifiers to identify attack traffic in IoT networks. In this study, we build predictive models with Bot-IoT to detect attacks represented by dataset instances from the Information Theft category, as well as dataset instances from the data exfiltration and keylogging subcategories. Our contribution is centered on the evaluation of ensemble feature selection techniques (FSTs) on classification performance for these specific attack instances. A group or ensemble of FSTs will often perform better than the best individual technique. The classifiers that we use are a diverse set of four ensemble learners (LightGBM, CatBoost, XGBoost, and random forest (RF)) and four non-ensemble learners (logistic regression (LR), decision tree (DT), Naive Bayes (NB), and a multi-layer perceptron (MLP)). The metrics used for evaluating classification performance are area under the receiver operating characteristic curve (AUC) and area under the precision-recall curve (AUPRC). For the most part, we determined that our ensemble FSTs do not affect classification performance but are beneficial because feature reduction eases computational burden and provides insight through improved data visualization.
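
The abstract does not say how the individual feature rankings are combined, so the following Python sketch shows only one common option, mean-rank aggregation, as an assumption. The function name `ensemble_select`, the feature names `a` to `d`, and the three rankings are hypothetical.

```python
def ensemble_select(rankings, k):
    """Combine several best-first feature rankings by mean rank position.

    `rankings` is a list of lists that all contain the same feature names.
    Returns the k features with the lowest mean position (ties broken by name).
    """
    features = set(rankings[0])
    mean_rank = {
        f: sum(r.index(f) for r in rankings) / len(rankings) for f in features
    }
    return sorted(features, key=lambda f: (mean_rank[f], f))[:k]


# Hypothetical rankings produced by three different selection techniques.
rankings = [
    ["a", "b", "c", "d"],
    ["b", "a", "d", "c"],
    ["a", "c", "b", "d"],
]

selected = ensemble_select(rankings, k=2)
```

Other aggregation rules, such as keeping the features that appear in the top-k list of a minimum number of rankers, fit the same interface.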


2022 ◽  
Vol 2161 (1) ◽  
pp. 012003
Author(s):  
Rajat Jain ◽  
Pranam R Betrabet ◽  
B Ashwath Rao ◽  
N V Subba Reddy

Arrhythmia is one of the life-threatening heart diseases. It is diagnosed and analyzed using electrocardiogram (ECG) recordings together with symptoms such as rapid heartbeat or chest pounding, shortness of breath, near-fainting spells, insufficient pumping of blood from the heart, and sudden cardiac arrest. An arrhythmia produces a rapid and aberrant ECG. In this implementation, the arrhythmia dataset is collected from the UCI Machine Learning Repository and its records are classified into sixteen stated classes using multiclass classification. The large feature set of the dataset is reduced using improved feature selection techniques such as t-Distributed Stochastic Neighbor Embedding (t-SNE), Principal Component Analysis (PCA), and Uniform Manifold Approximation and Projection (UMAP). An ensemble classifier is then built to analyse classification accuracy on the arrhythmia dataset and to conclude when and which approach gives optimal results.


2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

This paper proposes a novel hybrid framework with a BWO-based feature reduction technique, which combines the merits of both machine learning and lexicon-based approaches to attain better scalability and accuracy. The scalability problem arises from noisy, irrelevant, and unique features among those extracted by the proposed approach, which can be eliminated by adopting an effective feature reduction technique. In our proposed BWO approach, the feature-set size is reduced by up to 43% without changing the accuracy (90%). The proposed feature selection technique outperforms other commonly used PSO- and GA-based feature selection techniques, with a reduced computation time of 21 sec. Moreover, our sentiment analysis approach is analysed using performance metrics such as precision, recall, F-measure, and computation time. Many organizations can use these online reviews to make well-informed decisions about users' interests and preferences, to enhance customer satisfaction and product quality, and to find aspects of the products to improve, thereby generating more profit.


2022 ◽  
Vol 192 ◽  
pp. 106578
Author(s):  
David Camilo Corrales ◽  
Céline Schoving ◽  
Hélène Raynal ◽  
Philippe Debaeke ◽  
Etienne-Pascal Journet ◽  
...  

2021 ◽  
Vol 16 (24) ◽  
pp. 255-272
Author(s):  
Edmund Evangelista

Virtual Learning Environments (VLE), such as Moodle and Blackboard, store vast amounts of data that help identify students' performance and engagement. As a result, researchers have been focusing their efforts on assisting educational institutions by providing machine learning models to predict at-risk students and improve their performance. However, constructing a model that ultimately provides accurate predictions requires an efficient approach. Consequently, this study proposes a hybrid machine learning framework to predict students' performance using eight classification algorithms and three ensemble methods (Bagging, Boosting, Voting) to determine the best-performing predictive model. In addition, this study used filter-based and wrapper-based feature selection techniques to select the best features of the dataset related to students' performance. The obtained results reveal that the ensemble methods recorded higher predictive accuracy when compared to single classifiers. Furthermore, the accuracy of the models improved due to the feature selection techniques utilized in this study.
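
As a rough illustration of the filter-based idea mentioned above, the Python sketch below scores each feature independently of any classifier and keeps the top k. The abstract does not name the filter criterion, so absolute Pearson correlation with the label is an assumption here, and the column names (`logins`, `forum_posts`, `quiz_avg`) and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def filter_select(columns, target, k):
    """Rank features by |correlation with target| and return the top k names."""
    scores = {name: abs(pearson(col, target)) for name, col in columns.items()}
    return sorted(scores, key=lambda name: (-scores[name], name))[:k]


# Hypothetical VLE activity features for five students; 1 = passed, 0 = at risk.
columns = {
    "logins": [1, 2, 3, 4, 5],
    "forum_posts": [2, 1, 2, 1, 2],
    "quiz_avg": [50, 60, 65, 80, 90],
}
target = [0, 0, 1, 1, 1]

best_two = filter_select(columns, target, k=2)
```

A wrapper-based method would instead train a classifier on candidate feature subsets and keep the subset with the best validation score, which is usually costlier but takes feature interactions into account.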


2021 ◽  
pp. 231971452110626
Author(s):  
Jishnu Bhattacharyya ◽  
Manoj Kumar Dash

The literature on telecommunications customer churn behaviour has grown in importance and volume since the early 2000s. This study performed a quantitative bibliometric retrospection of selected journals that qualified for the ABDC journal quality list to examine relevant studies published by them on customer churn research in telecommunication. Using bibliometric data from 175 research articles available in the Scopus database, this review sheds light on the publication trends, articles, stakeholders, prevalent research techniques, and topics of interest over three decades (1985–2019). According to the findings of this review, the current level of contributions is manifested through ten overarching groups of scholarship: churn prediction and modelling; feature selection techniques and comparison; customer retention strategy and relationship management; service recovery; pricing and switching cost; legislation, legal, and policy; word-of-mouth and post-switching behaviour; new service adoption; brand credibility; and loyalty. The existing literature has predominantly utilized quantitative methods to their full potential. For far too long, scholars, according to the study's central thesis, have ignored the metatheoretical consequences of relying solely on a logical positivism paradigm. In addition, we highlight research directions and the need for customer churn research to go beyond feature selection and modelling.

