feature grouping
Recently Published Documents


TOTAL DOCUMENTS: 86 (FIVE YEARS: 20)
H-INDEX: 13 (FIVE YEARS: 1)

2022, pp. 103-116
Author(s): Ravishanker, Monica Sood, Prikshat Angra, Sahil Verma, Kavita, ...

Author(s): Miguel García-Torres, Francisco Gómez-Vela, Federico Divina, Diego P. Pinto-Roa, José Luis Vázquez Noguera, ...

2021, Vol. 11 (11), pp. 4742
Author(s): Tianpei Xu, Ying Ma, Kangchul Kim

In recent years, the telecom market has become highly competitive, and retaining existing customers costs less than attracting new ones. A telecom company therefore needs to understand customer churn through customer relationship management (CRM), and CRM analysts are expected to predict which customers will churn. This study proposes a customer-churn prediction system that uses an ensemble-learning technique combining stacking models with soft voting. XGBoost, logistic regression, decision tree, and naïve Bayes machine-learning algorithms are selected to build a two-level stacking model, and the three outputs of the second level are combined by soft voting. Feature construction on the churn dataset includes equidistant grouping of customer behavior features, which expands the feature space and uncovers latent information in the data. Both the original and the new churn datasets are analyzed with the stacking ensemble model using four evaluation metrics. The experimental results show that the proposed system achieves accuracies of 96.12% and 98.09% on the original and new churn datasets, respectively, outperforming state-of-the-art churn recognition systems.
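A minimal sketch of the stacking-plus-soft-voting idea described above, assuming scikit-learn-style estimators on a synthetic dataset; GradientBoostingClassifier stands in for XGBoost, and the exact level structure, parameters, and equidistant feature grouping used in the paper are not reproduced here.

```python
# Hedged sketch: a two-level stacking ensemble whose level-two outputs feed a
# soft-voting step. GradientBoosting stands in for XGBoost; the dataset,
# features, and hyperparameters are placeholders, not the paper's setup.
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, StackingClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base_learners = [
    ("gb", GradientBoostingClassifier(random_state=0)),  # stand-in for XGBoost
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("nb", GaussianNB()),
]

# Level two: three stacking models, each with a different meta-learner.
level_two = [
    (name, StackingClassifier(estimators=base_learners, final_estimator=meta))
    for name, meta in [
        ("stack_lr", LogisticRegression(max_iter=1000)),
        ("stack_dt", DecisionTreeClassifier(random_state=0)),
        ("stack_nb", GaussianNB()),
    ]
]

# Soft voting over the three level-two outputs.
churn_model = VotingClassifier(estimators=level_two, voting="soft")
churn_model.fit(X_train, y_train)
print("test accuracy:", churn_model.score(X_test, y_test))
```

The equidistant grouping of customer-behavior features mentioned above could be approximated with a fixed-width binning step (e.g., pandas.cut) applied before fitting, though the paper's exact feature construction is not specified here.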


Sensors, 2021, Vol. 21 (9), pp. 3005
Author(s): A. N. M. Bazlur Rashid, Mohiuddin Ahmed, Al-Sakib Khan Pathan

Anomaly detection is important in many domains, such as cybersecurity, yet cybersecurity datasets contain many rare anomalies, or infrequent patterns, whose detection is computationally expensive. These datasets also contain many features, most of them irrelevant, which lowers the classification performance of machine learning algorithms. A feature selection (FS) approach, i.e., selecting only the relevant features, is therefore an essential preprocessing step in cybersecurity data analysis. Although many FS approaches have been proposed in the literature, cooperative co-evolution (CC)-based FS approaches are particularly suitable for cybersecurity data preprocessing in a Big Data scenario. Accordingly, in this paper, we apply our previously proposed CC-based FS with random feature grouping (CCFSRFG) to a benchmark cybersecurity dataset as the preprocessing step. Infrequent pattern detection was performed on both the dataset with the original features and the dataset with a reduced number of features, and was evaluated using 10 unsupervised anomaly detection techniques; the proposed approach is therefore termed Unsupervised Infrequent Pattern Detection (UIPD). We then compared the results with and without FS in terms of true positive rate (TPR). The analysis indicates that the largest TPR improvement, 385.91%, was achieved by the cluster-based local outlier factor (CBLOF) on backdoor infrequent pattern detection when using FS. Furthermore, the highest overall infrequent pattern detection TPR, across all infrequent patterns, improved by 61.47% when using the clustering-based multivariate Gaussian outlier score (CMGOS) with FS.
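A small illustrative sketch of the evaluation idea above: compare TPR on the rare (anomalous) class with and without a reduced feature subset. It assumes a synthetic dataset and uses scikit-learn's IsolationForest as a stand-in for the unsupervised detectors (CBLOF, CMGOS, and the others) evaluated in the paper; the selected feature indices are placeholders, not the CCFSRFG output.

```python
# Hedged sketch: TPR of an unsupervised anomaly detector with and without
# feature selection. IsolationForest is a stand-in detector; the "selected"
# indices below are illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest

# y == 1 marks the rare (infrequent) class, roughly 5% of the samples.
X, y = make_classification(n_samples=3000, n_features=30,
                           weights=[0.95, 0.05], random_state=0)

def tpr_of_detector(X_view, y_true):
    """Fit an unsupervised detector and report TPR on the rare class."""
    det = IsolationForest(contamination=0.05, random_state=0).fit(X_view)
    pred_anomaly = det.predict(X_view) == -1   # -1 = flagged as anomaly
    return (pred_anomaly & (y_true == 1)).sum() / (y_true == 1).sum()

selected = np.arange(10)                       # placeholder FS result
print("TPR, all features:     ", tpr_of_detector(X, y))
print("TPR, selected features:", tpr_of_detector(X[:, selected], y))
```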


2021, Vol. 546, pp. 1256-1272
Author(s): Ling Zheng, Fei Chao, Neil Mac Parthaláin, Defu Zhang, Qiang Shen

2020, Vol. 7 (1)
Author(s): A. N. M. Bazlur Rashid, Mohiuddin Ahmed, Leslie F. Sikos, Paul Haskell-Dowland

A massive amount of data is generated with the evolution of modern technologies. This high-throughput data generation results in Big Data, which consist of many features (attributes). However, irrelevant features may degrade the classification performance of machine learning (ML) algorithms. Feature selection (FS) is a technique used to select a subset of relevant features that represents the dataset. Evolutionary algorithms (EAs) are widely used search strategies in this domain, and a variant of EAs, called cooperative co-evolution (CC), which uses a divide-and-conquer approach, is a good choice for such optimization problems. Existing solutions perform poorly because of limitations such as not considering feature interactions, handling only an even number of features, and decomposing the dataset statically. In this paper, a novel random feature grouping (RFG) is introduced, with three variants, to decompose Big Data datasets dynamically and to increase the probability of grouping interacting features into the same subcomponent. RFG can be used in CC-based FS processes, hence the name Cooperative Co-Evolutionary-Based Feature Selection with Random Feature Grouping (CCFSRFG). Experimental analysis was performed using six widely used ML classifiers on seven datasets from the UCI ML repository and the Princeton University Genomics repository, with and without FS. The results indicate that in most cases [i.e., with naïve Bayes (NB), support vector machine (SVM), k-Nearest Neighbor (k-NN), J48, and random forest (RF)] the proposed CCFSRFG-1 outperforms an existing CC-based FS solution (CCEAFS), CCFSRFG-2, and the use of all features, in terms of accuracy, sensitivity, and specificity.
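A minimal sketch of the random feature grouping idea described above: shuffle the feature indices and split them into subcomponents so that interacting features have a chance of landing in the same group and an odd number of features poses no problem. The group count, sizing, and RNG handling here are illustrative and do not reproduce the paper's three RFG variants.

```python
# Hedged sketch of random feature grouping (RFG): randomly partition feature
# indices into subcomponents for a cooperative co-evolutionary FS process.
import numpy as np

def random_feature_grouping(n_features, n_groups, rng=None):
    """Randomly partition feature indices into n_groups subcomponents."""
    rng = np.random.default_rng(rng)
    indices = rng.permutation(n_features)
    # np.array_split tolerates uneven splits, so any feature count works.
    return [grp.tolist() for grp in np.array_split(indices, n_groups)]

# Example: decompose a 17-feature dataset into 4 subcomponents.
groups = random_feature_grouping(17, 4, rng=42)
for i, g in enumerate(groups):
    print(f"subcomponent {i}: {sorted(g)}")
```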


2020, Vol. 7 (1)
Author(s): A. N. M. Bazlur Rashid, Mohiuddin Ahmed, Leslie F. Sikos, Paul Haskell-Dowland

An amendment to this paper has been published and can be accessed via the original article.

