A Systematic Algorithm for Data Cluster Using Map-Reduce Approach

We study the problem of clustering data objects and have implemented a new algorithm for clustering data using a map-reduce approach. In clustering, the main part is feature selection, which involves identifying a subset of features and is considered an important process. The selected subset is also expected to produce results that approximate and agree with those of the original set of features. The main concept behind this paper is to present the outcome of the clustering features. The paper also describes clustering and its process. When processing large datasets, several further concepts are helpful and important to the clustering process. The feature selection algorithm, which affects the entire clustering process, is built on the map-reduce concept, since feature selection or extraction is also used in the map-reduce approach. The most desirable criterion is time complexity, which concerns efficiency: the time required to find effective features, where effectiveness refers to the quality of the feature subset. A map-reduce feature selection approach based on these criteria is proposed and evaluated in this paper.
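The abstract does not state which scoring criterion the map-reduce feature selection uses, so the following is only a minimal sketch under an assumed criterion: features are ranked by variance, with the per-feature statistics computed in a map step, grouped by feature index, and merged in a reduce step. The function names (`map_record`, `combine`, `select_top_k`) and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from functools import reduce


def map_record(record):
    """Map step: emit (feature_index, (count, sum, sum_of_squares)) per feature."""
    return [(j, (1, x, x * x)) for j, x in enumerate(record)]


def combine(a, b):
    """Reduce step: merge two partial statistics for the same feature."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def feature_variances(records):
    """Run map, shuffle, and reduce in-process and return variance per feature."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            grouped[key].append(value)  # shuffle: group partial results by feature
    variances = {}
    for j, parts in grouped.items():
        n, s, ss = reduce(combine, parts)
        mean = s / n
        variances[j] = ss / n - mean * mean  # population variance
    return variances


def select_top_k(records, k):
    """Assumed criterion: keep the k features with the largest variance."""
    variances = feature_variances(records)
    ranked = sorted(variances, key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


# Toy data: feature 0 is constant, feature 1 varies strongly, feature 2 slightly.
data = [
    [1.0, 10.0, 5.0],
    [1.0, 20.0, 5.5],
    [1.0, 30.0, 4.5],
    [1.0, 40.0, 5.0],
]
selected = select_top_k(data, 2)
```

Because each partial statistic is a tuple that can be merged in any order, the same `combine` function could serve as a combiner on each worker in a real map-reduce framework; here everything runs in a single process for illustration.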


A Systematic Data Mining Method for Clustering of Data using Map-Reduce Model

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.b7026.019320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 716-720

Keyword(s):

Data Mining ◽

Feature Selection ◽

Map Reduce ◽

Mining Method ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Selection Approach ◽

Research Concept ◽

Future Data ◽

Feature Selection Approach

Data mining is an important research concept with vast future scope. It is used to find hidden information in data. In clustering, the main part is feature selection, which involves identifying a subset of features and is considered a necessary process. The selected subset is also expected to produce results that approximate and agree with those of the initial set of features. The main concept behind this paper is to present the outcome of the clustering features. The paper describes clustering and the clustering process. When processing large datasets, several further concepts are helpful and important to the clustering process. The feature selection algorithm, which affects the entire clustering process, is built on the map-reduce concept. The time needed to find effective features is considered, where effectiveness refers to the quality of the feature subset. The paper discusses the map-reduce feature selection approach, its algorithm, and its implementation framework.


Effective Feature Selection for 5G IM Applications Traffic Classification

Mobile Information Systems ◽

10.1155/2017/6805056 ◽

2017 ◽

Vol 2017 ◽

pp. 1-12 ◽

Cited By ~ 3

Author(s):

Muhammad Shafiq ◽

Xiangzhan Yu ◽

Asif Ali Laghari ◽

Dawei Wang

Keyword(s):

Feature Selection ◽

Classification Accuracy ◽

Statistical Test ◽

Traffic Classification ◽

Features Selection ◽

Traffic Flows ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Wrapper Method ◽

Selection For

Recently, machine learning (ML) algorithms have been widely applied in Internet traffic classification. However, due to inappropriate feature selection, ML-based classifiers are prone to misclassifying Internet flows as the traffic that occupies the majority of traffic flows. To address this problem, a novel feature selection metric named weighted mutual information (WMI) is proposed. We develop a hybrid feature selection algorithm named WMI_ACC, which first filters out most of the features using the WMI metric and then uses a wrapper method to select features for ML classifiers using the accuracy (ACC) metric. We evaluate our approach using five ML classifiers on traces captured in two different network environments. Furthermore, we apply the Wilcoxon pairwise statistical test to the results of our proposed algorithm to find the robust features within the selected set. Experimental results show that our algorithm gives promising results in terms of classification accuracy, recall, and precision. Our proposed algorithm can achieve 99% flow accuracy.
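The filter-then-wrapper structure described in this abstract can be sketched as follows. The abstract does not define the WMI weighting, so this sketch substitutes plain (unweighted) mutual information in the filter stage, and the wrapper stage uses greedy forward selection with the leave-one-out accuracy of a 1-nearest-neighbour classifier. Both substitutions, the function names, and the toy data are assumptions made for illustration and are not the authors' WMI_ACC algorithm.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c * n) / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of 1-NN with Hamming distance on the given features."""
    correct = 0
    for i, row in enumerate(rows):
        best_dist, best_label = None, None
        for j, other in enumerate(rows):
            if i == j:
                continue
            dist = sum(row[f] != other[f] for f in features)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, labels[j]
        correct += best_label == labels[i]
    return correct / len(rows)


def hybrid_select(rows, labels, keep):
    """Filter by mutual information, then greedy forward selection by accuracy."""
    n_features = len(rows[0])
    scores = {
        f: mutual_information([r[f] for r in rows], labels)
        for f in range(n_features)
    }
    # Filter stage: keep only the `keep` highest-scoring features.
    candidates = sorted(scores, key=scores.get, reverse=True)[:keep]
    # Wrapper stage: add the feature that most improves accuracy, until none does.
    selected, best_acc = [], 0.0
    improved = True
    while improved:
        improved = False
        best_f = None
        for f in candidates:
            if f in selected:
                continue
            acc = loo_accuracy(rows, labels, selected + [f])
            if acc > best_acc:
                best_acc, best_f, improved = acc, f, True
        if improved:
            selected.append(best_f)
    return selected, best_acc


# Toy data: feature 0 equals the label, feature 1 is noise, feature 2 is constant.
labels = [0, 0, 0, 1, 1, 1]
rows = [(0, 0, 0), (0, 1, 0), (0, 0, 0), (1, 1, 0), (1, 0, 0), (1, 1, 0)]
selected, accuracy = hybrid_select(rows, labels, keep=2)
```

The filter stage is cheap and discards the constant feature before any classifier is trained; the wrapper stage is more expensive but evaluates the surviving candidates against the metric that actually matters for the classifier.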


A Feature Selection Approach in the Study of Azorean Proverbs

Exploring Innovative and Successful Applications of Soft Computing - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-4666-4785-5.ch003 ◽

2014 ◽

pp. 38-58 ◽

Cited By ~ 1

Author(s):

Luís Cavique ◽

Armando B. Mendes ◽

Matthias Funk ◽

Jorge M. A. Santos

Keyword(s):

Feature Selection ◽

Real World ◽

Rough Sets ◽

Noisy Data ◽

Logical Analysis ◽

Logical Analysis Of Data ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Selection Approach ◽

Feature Selection Approach

A paremiologic (study of proverbs) case is presented as part of a wider project based on data collected among the Azorean population. Given the considerable distance between the Azores islands, the authors present the hypothesis that there are significant differences in the proverbs from each island, thus permitting the identification of the native island of the interviewee based on his or her knowledge of proverbs. In this chapter, a feature selection algorithm that combines Rough Sets and the Logical Analysis of Data (LAD) is presented. The algorithm named LAID (Logical Analysis of Inconsistent Data) deals with noisy data, and the authors believe that an important link was established between the two different schools with similar approaches. The algorithm was applied to a real world dataset based on data collected using thousands of interviews of Azoreans, involving an initial set of twenty-two thousand Portuguese proverbs.


Determining Threshold Value on Information Gain Feature Selection to Increase Speed and Prediction Accuracy of Random Forest

10.21203/rs.3.rs-132775/v1 ◽

2020 ◽

Author(s):

Maria Irmina Prasetiyowati ◽

Nur Ulfa Maulidevi ◽

Kridanto Surendro

Keyword(s):

Feature Selection ◽

Random Forest ◽

Information Gain ◽

Threshold Value ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Random Forest Classification ◽

Average Value ◽

Time Required

Feature selection is a preprocessing technique that aims to remove unnecessary features and speed up the algorithm's work process. One feature selection technique calculates the information gain value of each feature in a dataset. A threshold is then applied to the information gain values to select features. Generally, the threshold value is chosen freely, or a value of 0.05 is used. This study proposes determining the threshold value using the standard deviation of the information gain values generated by the features in the dataset. The determination of this threshold value was tested on ten original datasets and on datasets that had been transformed by FFT and IFFT, then classified using Random Forest. The average accuracy and the average time required from the Random Forest classification using the proposed threshold value are better than the results of feature selection with a threshold value of 0.05 and the Correlation-Based Feature Selection algorithm. Likewise, the average accuracy of the proposed threshold on the transformed datasets is better than that of the threshold value of 0.05 and the Correlation-Based Feature Selection algorithm. However, the average time required is higher (slower).
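A minimal sketch of thresholding information gain by its standard deviation is given here. The abstract does not say whether the population or sample standard deviation is used, nor whether the comparison is strict, so this sketch assumes the population standard deviation and keeps features whose gain is strictly greater than the threshold. Features are assumed to be discrete; the function names and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict
from statistics import pstdev


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Information gain of a discrete feature: H(Y) - H(Y | X)."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_by_std_threshold(columns, labels):
    """Keep features whose gain exceeds the standard deviation of all gains."""
    gains = {name: information_gain(col, labels) for name, col in columns.items()}
    threshold = pstdev(list(gains.values()))  # assumption: population std. dev.
    kept = [name for name, g in gains.items() if g > threshold]
    return kept, gains, threshold


# Toy data: "a" determines the label, "b" is uninformative, "c" is constant.
labels = [1, 1, 0, 0]
columns = {
    "a": [1, 1, 0, 0],
    "b": [0, 1, 0, 1],
    "c": [0, 0, 0, 0],
}
kept, gains, threshold = select_by_std_threshold(columns, labels)
```

Unlike a fixed cut-off such as 0.05, the threshold here adapts to how widely the gain values are spread in the particular dataset.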


Research and implementation of Chinese text feature selection algorithm based on χ² statistics

Computational Intelligence and Industrial Engineering ◽

10.2495/ciie140191 ◽

2014 ◽

Author(s):

Weijiang Wu ◽

Shengkai Wen ◽

Dongmei Xia ◽

Guohe Li

Keyword(s):

Feature Selection ◽

Chinese Text ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Text Feature


BagMeLiF: stable boosting-based hybrid-ensemble feature selection algorithm for high-dimensional data

2020 International Conference on Control, Robotics and Intelligent System ◽

10.1145/3437802.3437835 ◽

2020 ◽

Author(s):

Nikita Pilnenskiy ◽

Ivan Smetannikov

Keyword(s):

Feature Selection ◽

High Dimensional Data ◽

High Dimensional ◽

Selection Algorithm ◽

Feature Selection Algorithm


Hybrid Feature Selection Algorithm Based on Discrete Artificial Bee Colony for Parkinson Diagnosis

ACM Transactions on Internet Technology ◽

10.1145/3397161 ◽

2020 ◽

Cited By ~ 1

Author(s):

Haolun Li ◽

Chi-Man Pun ◽

Feng Xu ◽

Longsheng Pan ◽

Rui Zong ◽

...

Keyword(s):

Feature Selection ◽

Artificial Bee Colony ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Bee Colony


High-Accuracy Power Quality Disturbance Classification Using the Adaptive ABC-PSO as Optimal Feature Selection Algorithm

Energies ◽

10.3390/en14051238 ◽

2021 ◽

Vol 14 (5) ◽

pp. 1238

Author(s):

Supanat Chamchuen ◽

Apirat Siritaratiwat ◽

Pradit Fuangfoo ◽

Puripong Suthisopapan ◽

Pirat Khunkitti

Keyword(s):

Feature Selection ◽

Power Quality ◽

Distribution System ◽

Classification Accuracy ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Electrical Distribution ◽

Power Quality Disturbance ◽

Optimal Feature Selection ◽

Optimal Feature

Power quality disturbance (PQD) is an important issue in electrical distribution systems that needs to be detected promptly and identified to prevent the degradation of system reliability. This work proposes a PQD classification using a novel algorithm, comprised of the artificial bee colony (ABC) and the particle swarm optimization (PSO) algorithms, called “adaptive ABC-PSO” as the feature selection algorithm. The proposed adaptive technique is applied to a combination of ABC and PSO algorithms, and then used as the feature selection algorithm. A discrete wavelet transform is used as the feature extraction method, and a probabilistic neural network is used as the classifier. We found that the highest classification accuracy (99.31%) could be achieved through nine optimally selected features out of all 72 extracted features. Moreover, the proposed PQD classification system demonstrated high performance in a noisy environment, as well as the real distribution system. When comparing the presented PQD classification system’s performance to previous studies, PQD classification accuracy using adaptive ABC-PSO as the optimal feature selection algorithm is considered to be at a high-range scale; therefore, the adaptive ABC-PSO algorithm can be used to classify the PQD in a practical electrical distribution system.
