Clustering Based on NMTF Algorithm

2013 ◽  
Vol 718-720 ◽  
pp. 2365-2369
Author(s):  
Lei Huang ◽  
Chan Le Wu

NMTF (Normalizing Mapping Training Framework) operates by mapping initial cluster centers, then iteratively assigning points to clusters based on the nearest cluster center and updating the cluster centers. We regard finding good cluster centers as a normalizing parameter estimation problem; constructing the parameters of other normalizing models then yields a space of novel clustering methods. In this paper we propose assigning members to a data partition using the abstract of a text in place of estimated cluster centers. The method can accurately reconstruct the meaning-representation groups used to generate a given data set.
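
The assign-and-update loop described above follows the familiar iterative pattern of center-based clustering. A minimal sketch of that generic loop is shown below; it is not the authors' NMTF implementation, and the function and parameter names are illustrative.

```python
import numpy as np

def iterative_center_clustering(X, init_centers, n_iter=100, tol=1e-6):
    """Assign each point to its nearest center, then update centers; repeat."""
    centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(n_iter):
        # Assignment step: nearest center for every point.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center becomes the mean of its assigned points.
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(len(centers))
        ])
        if np.linalg.norm(new_centers - centers) < tol:
            centers = new_centers
            break
        centers = new_centers
    return labels, centers

# Example: two well-separated blobs with hand-picked initial centers.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])
labels, centers = iterative_center_clustering(X, init_centers=X[[0, -1]])
```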

2013 ◽  
Vol 333-335 ◽  
pp. 1269-1272
Author(s):  
Guang Hui Chen

This paper proposes a hierarchical division method that divides a data set into two subsets along each dimension and merges these splits into a division of the data set. The initial cluster centers are then located in dense and well-separated subsets of the division, and the means of the data points in these subsets are selected as the initial cluster centers. A new cluster center initialization method is thus developed. Experiments on real data sets show that the proposed cluster center initialization method is effective.
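
A simplified sketch of this kind of division-based initialization follows: each dimension is split at its median, the splits are intersected into cells, and the means of the densest cells serve as initial centers. Density is taken as a simple point count and the explicit separation check is omitted, so this is an assumption-laden illustration rather than the paper's exact procedure.

```python
import numpy as np

def division_based_initial_centers(X, k):
    """Split each dimension at its median, intersect the splits into cells,
    and use the means of the k densest cells as initial cluster centers."""
    medians = np.median(X, axis=0)
    # Cell code: one bit per dimension (below vs. at-or-above the median).
    codes = (X >= medians).astype(int)
    cells = {}
    for key, x in zip(map(tuple, codes), X):
        cells.setdefault(key, []).append(x)
    # Rank cells by density (here simply by point count) and take the k densest.
    ranked = sorted(cells.values(), key=len, reverse=True)[:k]
    return np.array([np.mean(cell, axis=0) for cell in ranked])

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (60, 2)), rng.normal(5, 1, (60, 2))])
centers = division_based_initial_centers(X, k=2)
```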


2014 ◽  
Vol 998-999 ◽  
pp. 873-877
Author(s):  
Zhen Bo Wang ◽  
Bao Zhi Qiu

To reduce the impact of irrelevant attributes on clustering results and to increase the importance of relevant attributes, this paper proposes a fuzzy C-means clustering algorithm based on the coefficient of variation (CV-FCM). In the algorithm, the coefficient of variation is used to weight the attributes, assigning a different weight to each attribute in the data set; the magnitude of a weight expresses the importance of the corresponding attribute to the clusters. In addition, because the fuzzy C-means algorithm is sensitive to the initial cluster center values, a method for selecting initial cluster centers based on maximum distance is introduced on top of the coefficient-of-variation weighting. Experiments on real data sets show that the algorithm selects cluster centers effectively and that its clustering results are superior to those of general fuzzy C-means clustering algorithms.
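
The two ingredients named above, coefficient-of-variation attribute weights and maximum-distance selection of initial centers, can be sketched as below. The normalization of the weights and the choice of the first center are assumptions for illustration; the subsequent weighted fuzzy C-means iterations are not shown.

```python
import numpy as np

def cv_weights(X):
    """Per-attribute weights from the coefficient of variation (std / |mean|)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    cv = np.where(mean != 0, std / np.abs(mean), 0.0)
    return cv / cv.sum()

def max_distance_initial_centers(X, k, w):
    """Greedy farthest-point selection under the weighted Euclidean distance."""
    d = lambda a, b: np.sqrt(np.sum(w * (a - b) ** 2, axis=-1))
    centers = [X[0]]  # assumption: start from the first data point
    for _ in range(k - 1):
        # Pick the point farthest from its nearest already-chosen center.
        dists = np.min([d(X, c) for c in centers], axis=0)
        centers.append(X[dists.argmax()])
    return np.array(centers)

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (40, 3)), rng.normal(4, 1, (40, 3))])
w = cv_weights(X)
init = max_distance_initial_centers(X, k=2, w=w)
# `init` and `w` would then seed a weighted fuzzy C-means run.
```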


Processes ◽  
2019 ◽  
Vol 7 (2) ◽  
pp. 75 ◽  
Author(s):  
Kris Villez ◽  
Julien Billeter ◽  
Dominique Bonvin

The computation and modeling of extents has been proposed to handle the complexity of large-scale model identification tasks. Unfortunately, the existing extent-based framework only applies under certain conditions; most typically, a unique value for each extent must be computable, which severely limits the applicability of the approach. In this work, we propose a novel procedure for parameter estimation inspired by the existing extent-based framework. A key difference with prior work is that the proposed procedure combines structural observability labeling, matrix factorization, and graph-based system partitioning to split the original model parameter estimation problem into subproblems with the smallest possible number of parameters. The value of the proposed method is demonstrated with an extensive simulation study and with a study based on a historical data set collected to characterize the isomerization of α-pinene. Most importantly, the obtained results indicate that an important barrier to the application of extent-based frameworks for process modeling and monitoring tasks has been lifted.
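
The graph-based partitioning idea can be illustrated in isolation: if two parameters never influence the same measured quantity, they can be estimated in separate subproblems. The sketch below only covers this component (not the observability labeling or matrix factorization steps), and the incidence structure and function names are invented for illustration.

```python
import numpy as np

def partition_estimation_problem(incidence, n_params):
    """Group parameters into independent estimation subproblems.

    incidence[i, j] = 1 means measured variable i depends on parameter j.
    Parameters that never share a measured variable land in different
    subproblems, so each subproblem has as few parameters as possible."""
    # Union-find over parameters; parameters sharing a measurement are merged.
    parent = list(range(n_params))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    def union(a, b):
        parent[find(a)] = find(b)

    for row in incidence:
        involved = np.flatnonzero(row)
        for j in involved[1:]:
            union(involved[0], j)

    groups = {}
    for j in range(n_params):
        groups.setdefault(find(j), []).append(j)
    return list(groups.values())

# Toy structure: measurements 0-1 involve parameters {0, 1}, measurement 2 only {2}.
incidence = np.array([[1, 1, 0],
                      [0, 1, 0],
                      [0, 0, 1]])
print(partition_estimation_problem(incidence, n_params=3))  # [[0, 1], [2]]
```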


Energies ◽  
2021 ◽  
Vol 14 (21) ◽  
pp. 6889
Author(s):  
Yuxin Huang ◽  
Jingdao Fan ◽  
Zhenguo Yan ◽  
Shugang Li ◽  
Yanping Wang

In gas prediction and early warning, outliers in the data series are often discarded, so key information may be missed during analysis. To this end, this paper proposes an early warning model based on analyzing multifactor coupling relationships for coal face gas. The model contains a k-means algorithm with optimized initial cluster centers and an Apriori algorithm with optimized weights. The initial cluster centers for the full data set are obtained from the cluster centers of the preceding data subset, which optimizes the k-means algorithm. The optimized algorithm is used to filter the outliers out of the collected data and obtain an outlier data set. The Apriori algorithm is then optimized so that it can identify important information that appears less frequently in the events; it is used to mine and analyze association rules among the abnormal values and to obtain interesting association rule events among the gas outliers in different dimensions. Finally, four warning levels of gas risk are set according to different confidence intervals, and true and reliable warning results are obtained. By mining association rules between abnormal data in different dimensions, the validity and effectiveness of the proposed gas early warning model are verified. Classifying the early warning of gas risks has important practical significance for improving the safety of coal mines.
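
The first stage, seeding k-means with the preceding subset's cluster centers and collecting the points that fall far from their centers as outliers, could look roughly like the sketch below. The outlier threshold (mean plus z standard deviations of the center distances) and the function names are assumptions, and the weighted Apriori stage is not shown.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_chunk(chunk, k, prev_centers=None, z=2.0):
    """Cluster one data chunk, seeding k-means with the previous chunk's
    centers, and flag points far from their assigned center as outliers."""
    if prev_centers is not None:
        km = KMeans(n_clusters=k, init=prev_centers, n_init=1, random_state=0)
    else:
        km = KMeans(n_clusters=k, n_init=10, random_state=0)
    labels = km.fit_predict(chunk)
    dists = np.linalg.norm(chunk - km.cluster_centers_[labels], axis=1)
    outliers = chunk[dists > dists.mean() + z * dists.std()]
    return km.cluster_centers_, outliers

rng = np.random.default_rng(3)
chunks = [rng.normal(0, 1, (200, 2)) for _ in range(3)]
centers, all_outliers = None, []
for chunk in chunks:
    centers, outliers = cluster_chunk(chunk, k=3, prev_centers=centers)
    all_outliers.append(outliers)
# `all_outliers` would then feed the weighted Apriori association-rule mining.
```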


Kybernetes ◽  
2016 ◽  
Vol 45 (8) ◽  
pp. 1273-1291 ◽  
Author(s):  
Runhai Jiao ◽  
Shaolong Liu ◽  
Wu Wen ◽  
Biying Lin

Purpose
The large volume of big data makes it impractical to apply traditional clustering algorithms, which are usually designed for an entire data set. The purpose of this paper is to focus on incremental clustering, which divides data into a series of data chunks so that only a small amount of data needs to be clustered at a time. Few studies on incremental clustering address the problems of optimizing the cluster center initialization for each data chunk and of selecting multiple passing points for each cluster.

Design/methodology/approach
By optimizing the initial cluster centers, the quality of the clustering results for each data chunk is improved, and hence the quality of the final clustering results is enhanced. Moreover, by selecting multiple passing points, more accurate information is passed down to improve the final clustering results. A method addressing these two problems is proposed and applied in an algorithm based on the streaming kernel fuzzy c-means (stKFCM) algorithm.

Findings
Experimental results show that the proposed algorithm is more accurate and performs better than the stKFCM algorithm.

Originality/value
This paper addresses the problem of improving the performance of incremental clustering by optimizing cluster center initialization and selecting multiple passing points. The paper analyzes the performance of the proposed scheme and proves its effectiveness.
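
One way to read "selecting multiple passing points" is that each cluster hands several representative points, rather than only its center, to the next chunk. The sketch below illustrates that idea under assumptions of my own (the m nearest members of each cluster are kept; names are illustrative); the kernel fuzzy c-means machinery of stKFCM is not reproduced.

```python
import numpy as np

def select_passing_points(chunk, centers, labels, m=3):
    """For each cluster, keep the m points nearest to its center as the
    information passed down to the next data chunk."""
    passing = []
    for k, c in enumerate(centers):
        members = chunk[labels == k]
        if len(members) == 0:
            continue
        order = np.argsort(np.linalg.norm(members - c, axis=1))
        passing.append(members[order[:m]])
    return np.vstack(passing)

# Toy usage: labels/centers would come from clustering the current chunk.
rng = np.random.default_rng(4)
chunk = rng.normal(0, 1, (100, 2))
centers = np.array([[-1.0, 0.0], [1.0, 0.0]])
labels = (np.linalg.norm(chunk - centers[0], axis=1)
          > np.linalg.norm(chunk - centers[1], axis=1)).astype(int)
passing = select_passing_points(chunk, centers, labels, m=3)
# `passing` is prepended to the next chunk so its clustering sees a summary
# of the data already processed.
```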


2014 ◽  
Vol 644-650 ◽  
pp. 2047-2050
Author(s):  
Li Ying Cao ◽  
He Long Yu ◽  
Gui Fen Chen ◽  
Ting Ting Yang

In precision agriculture, soil fertility evaluation is the foundation of variable-rate fertilization. In traditional evaluation methods, the initial cluster centers of the K-means algorithm used to grade soil fertility levels are generated randomly from the data set, so the clustering result is not stable. This paper proposes an improved K-means algorithm with a density-based method for optimizing the selection of the initial cluster centers: the points in high-density regions that are farthest from each other are chosen as the initial cluster centers. Experiments show that the improved K-means algorithm eliminates the dependence on the initial cluster centers and that the clustering result is greatly improved.
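
A sketch of "farthest-apart points in high-density regions" as initial centers is shown below. The neighborhood radius, the density quantile, and the greedy selection order are assumptions made for illustration, not the paper's exact formulation.

```python
import numpy as np

def density_farthest_init(X, k, radius=None, density_quantile=0.5):
    """Pick initial centers from high-density points that are far apart.

    Density of a point = number of neighbors within `radius`; only points
    above the chosen density quantile are eligible as centers."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    if radius is None:
        radius = np.median(dists)  # crude default neighborhood scale
    density = (dists < radius).sum(axis=1)
    eligible = np.flatnonzero(density >= np.quantile(density, density_quantile))
    # Start from the densest eligible point, then repeatedly add the eligible
    # point farthest from all centers chosen so far.
    centers = [eligible[density[eligible].argmax()]]
    while len(centers) < k:
        d_to_centers = dists[np.ix_(eligible, centers)].min(axis=1)
        centers.append(eligible[d_to_centers.argmax()])
    return X[centers]

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(4, 0.5, (50, 2))])
init_centers = density_farthest_init(X, k=2)
```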


Entropy ◽  
2018 ◽  
Vol 20 (10) ◽  
pp. 723 ◽  
Author(s):  
Cenker Bicer

The geometric process (GP) is a simple and direct approach to modeling successive inter-arrival time data with a monotonic trend. It is also an important alternative to the non-homogeneous Poisson process. In the present paper, the parameter estimation problem for the GP is considered when the distribution of the first occurrence time is Power Lindley with parameters α and λ. To solve this estimation problem, the maximum likelihood, modified moments, modified L-moments and modified least-squares estimators are obtained for the parameters a, α and λ. The mean, bias and mean squared error (MSE) values associated with these estimators are evaluated for small, moderate and large sample sizes using Monte Carlo simulations. Furthermore, two illustrative examples using real data sets are presented.
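
For the maximum likelihood case only, a sketch of the GP likelihood is given below: if X_k is the k-th inter-arrival time and a is the GP ratio, then a^(k-1) X_k are i.i.d. with the first-occurrence distribution, here Power Lindley. The starting values and the toy data are made up, and the modified moments, L-moments and least-squares estimators are not shown.

```python
import numpy as np
from scipy.optimize import minimize

def power_lindley_logpdf(y, alpha, lam):
    """log f(y) for the Power Lindley distribution:
    f(y) = alpha*lam^2/(lam+1) * (1 + y^alpha) * y^(alpha-1) * exp(-lam*y^alpha)."""
    return (np.log(alpha) + 2 * np.log(lam) - np.log(lam + 1)
            + np.log1p(y ** alpha) + (alpha - 1) * np.log(y) - lam * y ** alpha)

def gp_negative_loglik(theta, x):
    """Negative log-likelihood of a GP with Power Lindley first occurrence time."""
    a, alpha, lam = theta
    if a <= 0 or alpha <= 0 or lam <= 0:
        return np.inf
    k = np.arange(1, len(x) + 1)
    y = a ** (k - 1) * x
    # The (k-1)*log(a) term is the Jacobian of the change of variables X_k -> Y_k.
    return -np.sum((k - 1) * np.log(a) + power_lindley_logpdf(y, alpha, lam))

# Toy usage with made-up inter-arrival times showing a decreasing trend.
x = np.array([5.1, 4.4, 3.9, 3.2, 2.8, 2.5, 2.1, 1.9, 1.6, 1.4])
res = minimize(gp_negative_loglik, x0=np.array([1.1, 1.0, 0.5]), args=(x,),
               method="Nelder-Mead")
a_hat, alpha_hat, lam_hat = res.x
```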


2008 ◽  
Vol 06 (02) ◽  
pp. 261-282 ◽  
Author(s):  
AO YUAN ◽  
WENQING HE

Clustering is a major tool for microarray gene expression data analysis. Existing clustering methods fall mainly into two categories: parametric and nonparametric. Parametric methods generally assume a mixture of parametric subdistributions; when the mixture distribution approximately fits the true data-generating mechanism, these methods perform well, but not when there is non-negligible deviation between them. On the other hand, nonparametric methods, which usually make no distributional assumptions, are robust but pay the price of a loss in efficiency. In an attempt to use the known mixture form to increase efficiency, and to avoid assumptions about the unknown subdistributions to enhance robustness, we propose a semiparametric method for clustering. The proposed approach has the form of a parametric mixture with no assumptions on the subdistributions; the subdistributions are estimated nonparametrically, with constraints imposed only on the modes. An expectation-maximization (EM) algorithm along with a classification step is invoked to cluster the data, and a modified Bayesian information criterion (BIC) is employed to guide the determination of the optimal number of clusters. Simulation studies are conducted to assess the performance and robustness of the proposed method. The results show that the proposed method yields reasonable partitions of the data. As an illustration, the proposed method is applied to a real microarray data set to cluster genes.
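
The overall workflow of EM fitting, a classification step, and BIC-guided selection of the number of clusters can be illustrated with a deliberately simplified stand-in: a Gaussian mixture and the ordinary BIC replace the authors' nonparametric subdistribution estimates and modified BIC, so this sketch shows the selection loop, not the semiparametric estimator itself.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def cluster_with_bic_selection(X, k_range=range(1, 7)):
    """Fit mixture models (via EM) for several cluster counts, keep the model
    with the lowest BIC, and return hard cluster labels."""
    best_model, best_bic = None, np.inf
    for k in k_range:
        gm = GaussianMixture(n_components=k, random_state=0).fit(X)
        bic = gm.bic(X)
        if bic < best_bic:
            best_model, best_bic = gm, bic
    # Classification step: assign each point to its most probable component.
    return best_model.predict(X), best_model

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(-2, 0.7, (80, 2)), rng.normal(2, 0.7, (80, 2))])
labels, model = cluster_with_bic_selection(X)
print(model.n_components)  # expected: 2
```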

