scholarly journals A Systematic Algorithm for Data Cluster Using Map-Reduce Approach

Author(s):  
Kechika. S ◽  
Sapthika. B ◽  
Keerthana. B ◽  
Abinaya. S ◽  
Abdulfaiz. A

We have been studying the problem clustering data objects as we have implemented a new algorithm called algorithm of clustering data using map reduce approach. In cluster, main part is feature selection which involves in recognition of set of features of a subset, since feature selection is considered as a important process. They also produces the approximate and according requests with the original set of features used in this type of approach. The main concept beyond this paper is to give the outcome of the clustering features. This paper which also gives the knowledge about cluster and it's own process. To processing of large datasets the nature of clustering where some more concepts are more helpful and important in a clustering process. In a clustering methodology where more concepts are very useful. The feature selection algorithm which affects, the entire process of clustering is the map-reduce concept. since, feature selection or extraction which is also used in map-reduce approach. The most desirable component is time complexity where efficiency concerns in this criterion. Here time required to find the effective features, where features of quality subsets is equal to effectiveness. The complexity to find based on this criteria based map-reduce features selection approach, which is proposed and evaluated in this paper.

Data mining is an important research concept that has a vast scope in future. Data mining is used to find the unseen information from the data. In cluster, main half is feature choice. It involves recognition of a set of options of a set, because feature choice is taken into account as a necessary method. They additionally produce the approximate and according requests with the initial set of options employed in this kind of approach. The most construct on the far side this paper is to relinquish the end result of the bunch options. This paper conveys the cluster and the clustering process. The processing of large datasets the nature of clustering where some more concepts are more helpful and important in a clustering process. In clustering methodology many concepts are very useful. The feature selection algorithm which affects the entire process of clustering is the map-reduce concept. Here time needed to seek out the effective options, options of quality subsets is capable of providing effectiveness. The paper discussed map-reduce feature selection approach, its algorithm and framework of implementation.


2017 ◽  
Vol 2017 ◽  
pp. 1-12 ◽  
Author(s):  
Muhammad Shafiq ◽  
Xiangzhan Yu ◽  
Asif Ali Laghari ◽  
Dawei Wang

Recently, machine learning (ML) algorithms have widely been applied in Internet traffic classification. However, due to the inappropriate features selection, ML-based classifiers are prone to misclassify Internet flows as that traffic occupies majority of traffic flows. To address this problem, a novel feature selection metric named weighted mutual information (WMI) is proposed. We develop a hybrid feature selection algorithm named WMI_ACC, which filters most of the features with WMI metric. It further uses a wrapper method to select features for ML classifiers with accuracy (ACC) metric. We evaluate our approach using five ML classifiers on the two different network environment traces captured. Furthermore, we also apply Wilcoxon pairwise statistical test on the results of our proposed algorithm to find out the robust features from the selected set of features. Experimental results show that our algorithm gives promising results in terms of classification accuracy, recall, and precision. Our proposed algorithm can achieve 99% flow accuracy results, which is very promising.


Author(s):  
Luís Cavique ◽  
Armando B. Mendes ◽  
Matthias Funk ◽  
Jorge M. A. Santos

A paremiologic (study of proverbs) case is presented as part of a wider project based on data collected among the Azorean population. Given the considerable distance between the Azores islands, the authors present the hypothesis that there are significant differences in the proverbs from each island, thus permitting the identification of the native island of the interviewee based on his or her knowledge of proverbs. In this chapter, a feature selection algorithm that combines Rough Sets and the Logical Analysis of Data (LAD) is presented. The algorithm named LAID (Logical Analysis of Inconsistent Data) deals with noisy data, and the authors believe that an important link was established between the two different schools with similar approaches. The algorithm was applied to a real world dataset based on data collected using thousands of interviews of Azoreans, involving an initial set of twenty-two thousand Portuguese proverbs.


2020 ◽  
Author(s):  
Maria Irmina Prasetiyowati ◽  
Nur Ulfa Maulidevi ◽  
Kridanto Surendro

Abstract Feature selection is a preprocessing technique aims to remove the unnecessary features and speed up the algorithm's work process. One of the feature selection techniques is by calculating the information gain value of each feature in a dataset. From the information gain value obtained, then the determined threshold value will be used to make feature selection. Generally, the threshold value is used freely, or using a value of 0.05. This study proposed the determination of the threshold value using the standard deviation of the information gain value generated by each feature in the dataset. The determination of this threshold value was tested on ten original datasets and datasets that had been transformed by FFT and IFFT, then classified using Random Forest. The results of the average value of accuracy and the average time required from the Random Forest classification using the proposed threshold value are better compared to the results of feature selection with a threshold value of 0.05 and the Correlation-Base Feature Selection algorithm. Likewise, the result of the average accuracy value of the proposed threshold using a transformed dataset in terms are better than the threshold value of 0.05 and the Correlation-Base Feature Selection algorithm. However, the calculation results for the average time required are higher (slower).


Energies ◽  
2021 ◽  
Vol 14 (5) ◽  
pp. 1238
Author(s):  
Supanat Chamchuen ◽  
Apirat Siritaratiwat ◽  
Pradit Fuangfoo ◽  
Puripong Suthisopapan ◽  
Pirat Khunkitti

Power quality disturbance (PQD) is an important issue in electrical distribution systems that needs to be detected promptly and identified to prevent the degradation of system reliability. This work proposes a PQD classification using a novel algorithm, comprised of the artificial bee colony (ABC) and the particle swarm optimization (PSO) algorithms, called “adaptive ABC-PSO” as the feature selection algorithm. The proposed adaptive technique is applied to a combination of ABC and PSO algorithms, and then used as the feature selection algorithm. A discrete wavelet transform is used as the feature extraction method, and a probabilistic neural network is used as the classifier. We found that the highest classification accuracy (99.31%) could be achieved through nine optimally selected features out of all 72 extracted features. Moreover, the proposed PQD classification system demonstrated high performance in a noisy environment, as well as the real distribution system. When comparing the presented PQD classification system’s performance to previous studies, PQD classification accuracy using adaptive ABC-PSO as the optimal feature selection algorithm is considered to be at a high-range scale; therefore, the adaptive ABC-PSO algorithm can be used to classify the PQD in a practical electrical distribution system.


Sign in / Sign up

Export Citation Format

Share Document