A Method of Feature Automatic Selection Based on Mutual Information Grouping and Clustering

A large number of irrelevant and redundant features can reduce classification performance on massive data sets. To address this problem, a method of automatic feature selection based on mutual information and a fuzzy clustering algorithm is proposed. The method has two steps. First, feature correlation is computed from mutual information, and the data are grouped according to the feature with the maximum correlation. Second, within each data group, the fuzzy c-means clustering algorithm automatically determines the optimal number of features and compresses the feature dimension. Theoretical analysis and experiments indicate that the method achieves higher efficiency in data classification.
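As a rough illustration of the first step, the sketch below estimates mutual information between discrete features and pairs each feature with the one it is most correlated with. The abstract does not give the paper's actual grouping rule, so the function names `mutual_information` and `group_by_max_mi`, and the "pair with the maximum-MI partner" rule, are assumptions made for this example only.

```python
import numpy as np

def mutual_information(x, y):
    """Mutual information (in nats) between two discrete 1-D arrays."""
    x = np.asarray(x).ravel()
    y = np.asarray(y).ravel()
    # Map each distinct value to an integer code so a contingency table can be built.
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1.0)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # marginal of x, shape (kx, 1)
    py = joint.sum(axis=0, keepdims=True)   # marginal of y, shape (1, ky)
    nz = joint > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))

def group_by_max_mi(X):
    """Hypothetical grouping rule: pair each feature with its maximum-MI partner."""
    n_features = X.shape[1]
    mi = np.zeros((n_features, n_features))
    for i in range(n_features):
        for j in range(i + 1, n_features):
            mi[i, j] = mi[j, i] = mutual_information(X[:, i], X[:, j])
    # The diagonal stays at zero, so a feature is never chosen as its own partner
    # unless it shares no information with any other feature.
    return mi, mi.argmax(axis=1)
```

Continuous features would need to be discretized (for example by binning) before this estimator applies.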

Delineation of irrigation management zones in a Quartzipsamment of the Brazilian semiarid region

Pesquisa Agropecuária Brasileira ◽

10.1590/s0100-204x2016000900028 ◽

2016 ◽

Vol 51 (9) ◽

pp. 1283-1294 ◽

Cited By ~ 5

Author(s):

Henrique Oldoni ◽

Luís Henrique Bassoi

Keyword(s):

Soil Properties ◽

Clustering Algorithm ◽

Irrigation Management ◽

Optimal Number ◽

Semiarid Region ◽

Management Zones ◽

Distribution Maps ◽

Management Zone ◽

Brazilian Semiarid ◽

Fuzzy C Means Clustering

Abstract: The objective of this work was to delineate irrigation management zones using geostatistics and multivariate analysis in different combinations of physical and hydraulic soil properties, as well as to determine the optimal number of management zones in order to avoid overlapping. A field experiment was carried out in a Quartzipsamment, for two years, in an irrigated orchard of table grape, in the Senador Nilo Coelho Irrigation Scheme, in the municipality of Petrolina, in the state of Pernambuco, Brazil. Soil samples were collected for the determination of soil physico-hydraulic properties. A portable meter was used to measure soil apparent electrical conductivity. Spatial distribution maps were generated using ordinary kriging. Management zones for five different combinations of soil properties were defined using the fuzzy c-means clustering algorithm, and two indexes were applied to determine the optimal number of management zones. Two combinations of soil properties can be used in the management zone planning in order to monitor soil moisture.
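The clustering step can be sketched as a minimal fuzzy c-means loop followed by two cluster-validity indexes. The abstract does not name the two indexes used in the study, so the partition coefficient and partition entropy below are stand-ins chosen for illustration, and all function names and default parameters are assumptions.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Minimal fuzzy c-means. Returns (centers, U) with U[i, k] the membership
    of sample i in cluster k; each row of U sums to 1."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        Um = U ** m
        # Centers are membership-weighted means of the samples.
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)               # guard against division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

def partition_coefficient(U):
    """Ranges from 1/c (fully fuzzy) to 1 (crisp); higher means less overlap."""
    return float(np.mean(np.sum(U ** 2, axis=1)))

def partition_entropy(U):
    """Ranges from 0 (crisp) to log(c) (fully fuzzy); lower means less overlap."""
    return float(-np.mean(np.sum(U * np.log(np.fmax(U, 1e-12)), axis=1)))
```

One common way to use such indexes is to run the clustering for several candidate values of `c` and keep the value at which the indexes indicate the least overlap between zones.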

A new approach to the fuzzy c-means clustering algorithm by automatic weights and local clustering

10.24271/psr.18 ◽

2021 ◽

Vol 3 (1) ◽

pp. 1-7

Author(s):

Yadgar Sirwan Abdulrahman

Keyword(s):

Clustering Algorithm ◽

Similarity Criterion ◽

Real Data ◽

Well Being ◽

Classical Solutions ◽

Data Sets ◽

Data Set ◽

New Approach ◽

Fuzzy C Means Clustering ◽

Global And Local

Clustering is one of the essential strategies in data analysis. In classical solutions, all features are assumed to contribute equally to the data clustering. In real data sets, however, some features are more important than others. As a result, essential features will have a more significant impact on identifying optimal clusters than other features. In this article, a fuzzy clustering algorithm with local automatic weighting is presented. The proposed algorithm has several advantages: 1) the weights are applied to features locally, meaning that each cluster's weights differ from those of the other clusters; 2) the distance between samples is calculated using a non-Euclidean similarity criterion to reduce the effect of noise; 3) the feature weights are obtained adaptively during the learning process. In this study, mathematical analyses were carried out to derive the cluster centers and the feature weights. Experiments on a range of data sets demonstrate the efficiency of the proposed algorithm compared with other algorithms with global and local feature weights.
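To make the first two advantages concrete, the sketch below computes a per-cluster weighted, bounded distance. The exact distance and weighting formulas are not given in the abstract, so the exponential form, the parameters `gamma` and `q`, and the function name `local_weighted_distance` are assumptions for illustration rather than the article's definitions.

```python
import numpy as np

def local_weighted_distance(X, centers, W, gamma=1.0, q=2.0):
    """Assumed form of a locally weighted, non-Euclidean distance:

        d[i, k] = sum_j W[k, j]**q * (1 - exp(-gamma * (X[i, j] - centers[k, j])**2))

    X       : (n, p) samples
    centers : (c, p) cluster centers
    W       : (c, p) feature weights, one row per cluster (local weighting)
    """
    diff2 = (X[:, None, :] - centers[None, :, :]) ** 2   # shape (n, c, p)
    bounded = 1.0 - np.exp(-gamma * diff2)               # each term lies in [0, 1]
    return np.sum((W[None, :, :] ** q) * bounded, axis=2)
```

Because every per-feature term is bounded by 1, a single extreme value cannot dominate the distance the way it does with the squared Euclidean distance, which is one way a non-Euclidean criterion can reduce the effect of noise.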

A scalable parallel subspace clustering algorithm for massive data sets

Proceedings 2000 International Conference on Parallel Processing ◽

10.1109/icpp.2000.876164 ◽

2002 ◽

Cited By ~ 17

Author(s):

H.S. Nagesh ◽

S. Goil ◽

A. Choudhary

Keyword(s):

Clustering Algorithm ◽

Subspace Clustering ◽

Massive Data ◽

Data Sets ◽

Massive Data Sets

Chicken swarm foraging algorithm for big data classification using the deep belief network classifier

Data Technologies and Applications ◽

10.1108/dta-08-2019-0146 ◽

2020 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Sathyaraj R ◽

Ramanathan L ◽

Lavanya K ◽

Balasubramanian V ◽

Saira Banu J

Keyword(s):

Big Data ◽

Data Classification ◽

Massive Data ◽

Data Sets ◽

Jaccard Coefficient ◽

Training Phase ◽

Mapreduce Framework ◽

Massive Data Sets ◽

Content Type ◽

Big Data Classification

Purpose: Innovation in big data is increasing day by day, to the point that conventional software tools face several problems in managing big data. Moreover, the occurrence of imbalanced data in massive data sets is a major constraint to the research industry.

Design/methodology/approach: The purpose of the paper is to introduce a big data classification technique using the MapReduce framework based on an optimization algorithm. The big data classification is enabled using the MapReduce framework, which utilizes the proposed optimization algorithm, named chicken-based bacterial foraging (CBF) algorithm. The proposed algorithm is generated by integrating the bacterial foraging optimization (BFO) algorithm with the cat swarm optimization (CSO) algorithm. The proposed model executes the process in two stages, namely, training and testing phases. In the training phase, the big data that is produced from different distributed sources is subjected to parallel processing using the mappers in the mapper phase, which perform the preprocessing and feature selection based on the proposed CBF algorithm. The preprocessing step eliminates the redundant and inconsistent data, whereas the feature selection step is done on the preprocessed data for extracting the significant features from the data, to provide improved classification accuracy. The selected features are fed into the reducer for data classification using the deep belief network (DBN) classifier, which is trained using the proposed CBF algorithm such that the data are classified into various classes, and finally, at the end of the training process, the individual reducers present the trained models. Thus, the incremental data are handled effectively based on the training model in the training phase. In the testing phase, the incremental data are taken and split into different subsets and fed into the different mappers for the classification. Each mapper contains a trained model which is obtained from the training phase. The trained model is utilized for classifying the incremental data. After classification, the output obtained from each mapper is fused and fed into the reducer for the classification.

Findings: The maximum accuracy and Jaccard coefficient are obtained using the epileptic seizure recognition database. The proposed CBF-DBN produces a maximal accuracy value of 91.129%, whereas the accuracy values of the existing neural network (NN), DBN and naive Bayes classifier-term frequency–inverse document frequency (NBC-TFIDF) are 82.894%, 86.184% and 86.512%, respectively. The proposed CBF-DBN produces a maximal Jaccard coefficient value of 88.928%, whereas the Jaccard coefficient values of the existing NN, DBN and NBC-TFIDF are 75.891%, 79.850% and 81.103%, respectively.

Originality/value: In this paper, a big data classification method is proposed for categorizing massive data sets for meeting the constraints of huge data. The big data classification is performed on the MapReduce framework based on training and testing phases in such a way that the data are handled in parallel at the same time. In the training phase, the big data is obtained and partitioned into different subsets of data and fed into the mapper. In the mapper, the feature extraction step is performed for extracting the significant features. The obtained features are passed to the reducers for classifying the data. The DBN classifier is utilized for the classification wherein the DBN is trained using the proposed CBF algorithm. The trained model is obtained as an output after the classification. In the testing phase, the incremental data are considered for the classification. New data are first split into subsets and fed into the mapper for classification. The trained models obtained from the training phase are used for the classification. The classified results from each mapper are fused and fed into the reducer for the classification of big data.
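The two metrics reported in the findings can be computed as in the sketch below. The abstract does not state how the Jaccard coefficient was defined for this task, so the binary form TP / (TP + FP + FN) and the `positive` label argument are assumptions; the function names are illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    pairs = list(zip(y_true, y_pred))
    return sum(t == p for t, p in pairs) / len(pairs)

def jaccard_coefficient(y_true, y_pred, positive=1):
    """Binary Jaccard coefficient for the `positive` class: TP / (TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = tp + fp + fn
    # With no positive samples and no positive predictions, treat the match as perfect.
    return tp / denom if denom else 1.0
```

Unlike accuracy, this form of the Jaccard coefficient ignores true negatives, so it is usually the lower of the two numbers when negatives are plentiful.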

The Research of Key Data Classification Optimal Mining Methods for Massive Data

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.989-994.2001 ◽

2014 ◽

Vol 989-994 ◽

pp. 2001-2003

Author(s):

Yin Ying Li ◽

Yun Ze Wang

Keyword(s):

Data Classification ◽

Classification Performance ◽

Classification Algorithm ◽

Massive Data ◽

Data Sets ◽

Simulation Experiments ◽

Cell Classification ◽

Characteristic Points ◽

Mining Methods ◽

Optimal Characteristic

Data classification is an important issue in the mining of massive data. This paper proposes an inter-cell classification algorithm based on phase recombination neighbor points convergence, which analyzes the convergence value weights of inter-cell characteristic points and filters out the interference of the minority of locally optimal characteristic points. The proposed algorithm can promote the convergence of the neighbor points of the inter-cell classification data. Simulation experiments on three types of collected real-world data sets show that the models achieve better classification performance.

Kernel fuzzy C-means clustering with teaching learning based optimization algorithm (TLBO-KFCM)

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189771 ◽

2021 ◽

pp. 1-9

Author(s):

Saumya Singh ◽

Smriti Srivastava

Keyword(s):

Genetic Algorithm ◽

Data Analysis ◽

Optimization Algorithm ◽

Clustering Algorithm ◽

Data Sets ◽

New Approach ◽

Fuzzy C Means ◽

Fuzzy C Means Clustering ◽

Teaching Learning Based Optimization ◽

Teaching Learning

In the field of data analysis, clustering is considered to be a major tool. The application of clustering in various fields of science has led to advances in clustering algorithms. Traditional clustering algorithms have many defects; although these defects have been addressed, no clustering algorithm can be considered superior. A new approach based on Kernel Fuzzy C-means clustering using the teaching learning-based optimization algorithm (TLBO-KFCM) is proposed in this paper. The kernel function used in this algorithm improves separation and makes the clustering more comprehensible. The teaching learning-based optimization algorithm discussed in the paper helps to improve clustering compactness. Simulations using five data sets are performed and the results are compared with two other optimization algorithms (genetic algorithm, GA, and particle swarm optimization, PSO). Results show that the proposed clustering algorithm has better performance. Another simulation on the same set of data is also performed, and the clustering results of TLBO-KFCM are compared with the teaching learning-based optimization algorithm with Fuzzy C-Means Clustering (TLBO-FCM).
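The role of the kernel function can be illustrated with the kernel-induced distance commonly used in kernel fuzzy c-means. The abstract does not say which kernel the paper uses, so the Gaussian kernel and the bandwidth parameter `sigma` below are assumptions, as are the function names.

```python
import numpy as np

def gaussian_kernel(X, V, sigma=1.0):
    """K[i, k] = exp(-||X[i] - V[k]||^2 / (2 * sigma^2))."""
    sq = np.sum((X[:, None, :] - V[None, :, :]) ** 2, axis=2)
    return np.exp(-sq / (2.0 * sigma ** 2))

def kernel_distance_sq(X, V, sigma=1.0):
    """Squared distance in the kernel feature space.

    ||phi(x) - phi(v)||^2 = K(x, x) - 2 K(x, v) + K(v, v), and K(x, x) = 1
    for the Gaussian kernel, so the expression reduces to 2 * (1 - K(x, v)).
    """
    return 2.0 * (1.0 - gaussian_kernel(X, V, sigma))
```

Replacing the Euclidean distance in the fuzzy c-means membership update with `kernel_distance_sq` gives one standard kernelized variant; the distance is bounded above by 2, which limits the influence of far-away points.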

AN EFFICIENT FUZZY C-MEANS CLUSTERING ALGORITHM FOR MULTI-VALUED DATA SETS

INFORMATION TECHNOLOGY IN INDUSTRY ◽

10.17762/itii.v9i1.265 ◽

2021 ◽

Vol 9 (1) ◽

pp. 1250-1264

Author(s):

P Gopala Krishna, D Lalitha Bhaskari

Keyword(s):

Clustering Algorithm ◽

Clustering Algorithms ◽

Data Sets ◽

Similar Data ◽

Job Descriptions ◽

Fuzzy C Means ◽

Fuzzy C Means Clustering ◽

Ship Function ◽

Partition Clustering ◽

Diverse Data

In data analysis, items are mostly described by a set of characteristics called features, in which each feature contains only a single value for each object. In practice, however, some features may include more than one value, such as a person with different job descriptions, activities, phone numbers, skills and mailing addresses. Such features may be called multi-valued features, and are mostly treated as null features when the data are analyzed using machine learning and data mining techniques. In this paper, a proximity function between two objects with multi-valued features is proposed and applied to clustering. The suggested distance approach allows iterative measurement of the similarities between objects as well as between their characteristics. To select the most suitable multi-valued factors, we put forward a model aimed at determining each factor's relative prominence for diverse data extraction problems. The proposed algorithm is a partition clustering strategy that uses fuzzy c-means clustering with a novel membership function based on the proposed similarity measure. The proposed clustering algorithm is called fuzzy c-means based Clustering of Multivalued Attribute Data (FCM-MVA). It therefore becomes feasible to use any cluster analysis mechanism to group similar data. The findings demonstrate that our measure not only improves on the performance of the traditional similarity measure but also outperforms other clustering algorithms on the multi-valued clustering framework.

Neural Network Based Fuzzy C-MEANS Clustering Algorithm

International Journal of Electronics Signals and Systems ◽

10.47893/ijess.2011.1020 ◽

2011 ◽

pp. 100-104

Author(s):

Suneetha Chittinen ◽

Dr. Raveendra Babu Bhogapathi

Keyword(s):

Neural Network ◽

Artificial Neural Network ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Data Sets ◽

Fuzzy C Means ◽

Fast Learning ◽

Presentation Sequence ◽

Fuzzy C Means Clustering ◽

Artificial Neural

In this paper, a fuzzy c-means algorithm that uses a neural network algorithm is presented. In pattern recognition, fuzzy clustering algorithms have demonstrated an advantage over crisp clustering algorithms in grouping high-dimensional data into clusters. The proposed work involves two steps. First, a recently developed Enhanced K-means Fast Learning Artificial Neural Network (KFLANN) framework is used to determine cluster centers. Second, fuzzy c-means uses these cluster centers to generate fuzzy membership functions. Enhanced KFLANN is an algorithm that produces consistent classification of the vectors into the same clusters regardless of the data presentation sequence. Experiments are conducted on two artificial data sets, Iris and New Thyroid. The results show that Enhanced KFLANN is faster at generating consistent cluster centers and uses them to elicit efficient fuzzy memberships.

Automatic Location of the Talairach Cortical Landmarks from T2-Weighted MR Images

Key Engineering Materials ◽

10.4028/www.scientific.net/kem.467-469.629 ◽

2011 ◽

Vol 467-469 ◽

pp. 629-634

Author(s):

Yi Li Fu ◽

Guang Cai Zhang ◽

Qiu Yue Chang ◽

Shu Guo Wang ◽

Xian Wei Han

Keyword(s):

Clustering Algorithm ◽

Brain Atlas ◽

Watershed Algorithm ◽

Data Sets ◽

Gray Level ◽

Region Merging ◽

Mr Images ◽

Automatic Location ◽

Fuzzy C Means Clustering ◽

The Mean

Labeling T2-weighted (T2W) MR images with a human brain atlas requires establishing the Talairach space for those images, and the basic condition for establishing the Talairach space is locating the Talairach cortical landmarks in the T2W MR images. A method to locate the Talairach cortical landmarks from T2W MR images is proposed. It consists of three steps: first, determine the planes containing the six cortical landmarks; second, segment the planes based on the fuzzy C-means clustering algorithm, gray level projection, watershed algorithm, region merging, thresholding, and morphologic operations; third, locate the cortical landmarks from the segmented planes. The algorithm has been validated quantitatively with 20 T2W MR image data sets. The mean errors of the Talairach cortical landmarks were below 1.00 mm. Identifying them took about 8 seconds on a P4 3.0 GHz machine. This fast, robust algorithm is potentially useful in clinical practice and in research.

Quantum-Behaved Particle Swarm Optimization Dynamic Clustering Algorithm

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.694-697.2757 ◽

2013 ◽

Vol 694-697 ◽

pp. 2757-2760 ◽

Cited By ~ 1

Author(s):

Chun Yan Zhang ◽

Wei Chen

Keyword(s):

Particle Swarm Optimization ◽

Clustering Algorithm ◽

Learning Strategy ◽

Clustering Algorithms ◽

Particle Swarm ◽

Optimal Number ◽

Data Sets ◽

Dynamic Clustering ◽

Swarm Optimization ◽

Coding Method

This paper proposes a revised quantum-behaved particle swarm optimization algorithm that uses a comprehensive learning strategy to prevent the common tendency toward premature convergence, and introduces a novel data clustering algorithm based on it. The optimal number of clusters is obtained automatically by this clustering algorithm because a new special coding method for particles is used. Compared with two other dynamic clustering algorithms on five test data sets, the proposed dynamic clustering algorithm based on the comprehensive learning strategy has the best performance and the most promising application prospects.
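For orientation, the sketch below shows one position update of the basic quantum-behaved particle swarm optimization (QPSO) scheme. It does not include the paper's comprehensive learning strategy or its particle coding method, neither of which is specified in the abstract; the function name `qpso_step` and the default contraction-expansion coefficient `beta` are assumptions.

```python
import numpy as np

def qpso_step(positions, pbest, gbest, beta=0.75, rng=None):
    """One basic QPSO position update.

    positions : (n, dim) current particle positions
    pbest     : (n, dim) personal best positions
    gbest     : (dim,)   global best position
    beta      : contraction-expansion coefficient
    """
    rng = np.random.default_rng() if rng is None else rng
    n, dim = positions.shape
    mbest = pbest.mean(axis=0)                      # mean of all personal bests
    phi = rng.random((n, dim))
    attractor = phi * pbest + (1.0 - phi) * gbest   # local attractor per particle
    u = 1.0 - rng.random((n, dim))                  # in (0, 1], keeps the log finite
    sign = np.where(rng.random((n, dim)) < 0.5, -1.0, 1.0)
    return attractor + sign * beta * np.abs(mbest - positions) * np.log(1.0 / u)
```

When every particle already sits at the global best, the step leaves the swarm unchanged, which illustrates the premature-convergence tendency that the revised algorithm is meant to counter.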
