Clustering Based on NMTF Algorithm

2013 ◽  
Vol 718-720 ◽  
pp. 2365-2369
Author(s):  
Lei Huang ◽  
Chan Le Wu

NMTF (Normalizing Mapping Training Framework) operates by mapping initial cluster centers, then iteratively assigning points to clusters based on the nearest cluster center and updating the cluster centers. We regard finding good cluster centers as a normalizing parameter estimation problem; constructing the parameters of other normalizing models then yields a space of novel clustering methods. In this paper we propose assigning members to a data partition using the abstract of a text in place of estimated cluster centers. The method can accurately reconstruct the meaning-representation groups used to generate a given data set.
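
The assign-and-update loop described above follows the familiar iterative pattern of center-based clustering. A minimal sketch of that generic loop is shown below; it is not the authors' NMTF implementation, and the function and parameter names are illustrative.

```python
import numpy as np

def iterative_center_clustering(X, init_centers, n_iter=100, tol=1e-6):
    """Assign each point to its nearest center, then update centers; repeat."""
    centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(n_iter):
        # Assignment step: nearest center for every point.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center becomes the mean of its assigned points.
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(len(centers))
        ])
        if np.linalg.norm(new_centers - centers) < tol:
            centers = new_centers
            break
        centers = new_centers
    return labels, centers

# Example: two well-separated blobs with hand-picked initial centers.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])
labels, centers = iterative_center_clustering(X, init_centers=X[[0, -1]])
```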

2013 ◽  
Vol 333-335 ◽  
pp. 1269-1272
Author(s):  
Guang Hui Chen

This paper proposes a hierarchical division method that divides a data set into two subsets along each dimension and merges these splits into a division of the data set. The initial cluster centers are then located in dense and well-separated subsets of the division, and the means of the data points in these subsets are selected as the initial cluster centers. A new cluster center initialization method is thus developed. Experiments on real data sets show that the proposed cluster center initialization method is effective.
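
A simplified sketch of this kind of division-based initialization follows: each dimension is split at its median, the splits are intersected into cells, and the means of the densest cells serve as initial centers. Density is taken as a simple point count and the explicit separation check is omitted, so this is an assumption-laden illustration rather than the paper's exact procedure.

```python
import numpy as np

def division_based_initial_centers(X, k):
    """Split each dimension at its median, intersect the splits into cells,
    and use the means of the k densest cells as initial cluster centers."""
    medians = np.median(X, axis=0)
    # Cell code: one bit per dimension (below vs. at-or-above the median).
    codes = (X >= medians).astype(int)
    cells = {}
    for key, x in zip(map(tuple, codes), X):
        cells.setdefault(key, []).append(x)
    # Rank cells by density (here simply by point count) and take the k densest.
    ranked = sorted(cells.values(), key=len, reverse=True)[:k]
    return np.array([np.mean(cell, axis=0) for cell in ranked])

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (60, 2)), rng.normal(5, 1, (60, 2))])
centers = division_based_initial_centers(X, k=2)
```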


2014 ◽  
Vol 998-999 ◽  
pp. 873-877
Author(s):  
Zhen Bo Wang ◽  
Bao Zhi Qiu

To reduce the impact of irrelevant attributes on clustering results and to increase the importance of relevant attributes, this paper proposes a fuzzy C-means clustering algorithm based on the coefficient of variation (CV-FCM). In the algorithm, the coefficient of variation is used to weight the attributes, assigning a different weight to each attribute in the data set; the magnitude of a weight expresses the importance of the corresponding attribute to the clusters. In addition, because the fuzzy C-means algorithm is sensitive to the initial cluster center values, a method for selecting initial cluster centers based on maximum distance is introduced on top of the coefficient-of-variation weighting. Experiments on real data sets show that the algorithm selects cluster centers effectively and that its clustering results are superior to those of general fuzzy C-means clustering algorithms.
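
The two ingredients named above, coefficient-of-variation attribute weights and maximum-distance selection of initial centers, can be sketched as below. The normalization of the weights and the choice of the first center are assumptions for illustration; the subsequent weighted fuzzy C-means iterations are not shown.

```python
import numpy as np

def cv_weights(X):
    """Per-attribute weights from the coefficient of variation (std / |mean|)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    cv = np.where(mean != 0, std / np.abs(mean), 0.0)
    return cv / cv.sum()

def max_distance_initial_centers(X, k, w):
    """Greedy farthest-point selection under the weighted Euclidean distance."""
    d = lambda a, b: np.sqrt(np.sum(w * (a - b) ** 2, axis=-1))
    centers = [X[0]]  # assumption: start from the first data point
    for _ in range(k - 1):
        # Pick the point farthest from its nearest already-chosen center.
        dists = np.min([d(X, c) for c in centers], axis=0)
        centers.append(X[dists.argmax()])
    return np.array(centers)

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (40, 3)), rng.normal(4, 1, (40, 3))])
w = cv_weights(X)
init = max_distance_initial_centers(X, k=2, w=w)
# `init` and `w` would then seed a weighted fuzzy C-means run.
```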


Processes ◽  
2019 ◽  
Vol 7 (2) ◽  
pp. 75 ◽  
Author(s):  
Kris Villez ◽  
Julien Billeter ◽  
Dominique Bonvin

The computation and modeling of extents has been proposed to handle the complexity of large-scale model identification tasks. Unfortunately, the existing extent-based framework only applies under certain conditions; most typically, a unique value for each extent must be computable, which severely limits the applicability of the approach. In this work, we propose a novel procedure for parameter estimation inspired by the existing extent-based framework. A key difference with prior work is that the proposed procedure combines structural observability labeling, matrix factorization, and graph-based system partitioning to split the original model parameter estimation problem into subproblems with the smallest possible number of parameters. The value of the proposed method is demonstrated with an extensive simulation study and with a study based on a historical data set collected to characterize the isomerization of α-pinene. Most importantly, the obtained results indicate that an important barrier to the application of extent-based frameworks for process modeling and monitoring tasks has been lifted.
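
The graph-based partitioning idea can be illustrated in isolation: if two parameters never influence the same measured quantity, they can be estimated in separate subproblems. The sketch below only covers this component (not the observability labeling or matrix factorization steps), and the incidence structure and function names are invented for illustration.

```python
import numpy as np

def partition_estimation_problem(incidence, n_params):
    """Group parameters into independent estimation subproblems.

    incidence[i, j] = 1 means measured variable i depends on parameter j.
    Parameters that never share a measured variable land in different
    subproblems, so each subproblem has as few parameters as possible."""
    # Union-find over parameters; parameters sharing a measurement are merged.
    parent = list(range(n_params))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    def union(a, b):
        parent[find(a)] = find(b)

    for row in incidence:
        involved = np.flatnonzero(row)
        for j in involved[1:]:
            union(involved[0], j)

    groups = {}
    for j in range(n_params):
        groups.setdefault(find(j), []).append(j)
    return list(groups.values())

# Toy structure: measurements 0-1 involve parameters {0, 1}, measurement 2 only {2}.
incidence = np.array([[1, 1, 0],
                      [0, 1, 0],
                      [0, 0, 1]])
print(partition_estimation_problem(incidence, n_params=3))  # [[0, 1], [2]]
```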


Energies ◽  
2021 ◽  
Vol 14 (21) ◽  
pp. 6889
Author(s):  
Yuxin Huang ◽  
Jingdao Fan ◽  
Zhenguo Yan ◽  
Shugang Li ◽  
Yanping Wang

In gas prediction and early warning, outliers in the data series are often discarded, so key information may be missed during analysis. To this end, this paper proposes an early warning model based on analyzing multifactor coupling relationships for coal face gas. The model contains a k-means algorithm with optimized initial cluster centers and an Apriori algorithm with optimized weights. The initial cluster centers for the full data set are obtained from the cluster centers of the preceding data subset, which optimizes the k-means algorithm. The optimized algorithm is used to filter the outliers out of the collected data and obtain an outlier data set. The Apriori algorithm is then optimized so that it can identify important information that appears less frequently in the events; it is used to mine and analyze association rules among the abnormal values and to obtain interesting association rule events among the gas outliers in different dimensions. Finally, four warning levels of gas risk are set according to different confidence intervals, and true and reliable warning results are obtained. By mining association rules between abnormal data in different dimensions, the validity and effectiveness of the proposed gas early warning model are verified. Classifying the early warning of gas risks has important practical significance for improving the safety of coal mines.
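
The first stage, seeding k-means with the preceding subset's cluster centers and collecting the points that fall far from their centers as outliers, could look roughly like the sketch below. The outlier threshold (mean plus z standard deviations of the center distances) and the function names are assumptions, and the weighted Apriori stage is not shown.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_chunk(chunk, k, prev_centers=None, z=2.0):
    """Cluster one data chunk, seeding k-means with the previous chunk's
    centers, and flag points far from their assigned center as outliers."""
    if prev_centers is not None:
        km = KMeans(n_clusters=k, init=prev_centers, n_init=1, random_state=0)
    else:
        km = KMeans(n_clusters=k, n_init=10, random_state=0)
    labels = km.fit_predict(chunk)
    dists = np.linalg.norm(chunk - km.cluster_centers_[labels], axis=1)
    outliers = chunk[dists > dists.mean() + z * dists.std()]
    return km.cluster_centers_, outliers

rng = np.random.default_rng(3)
chunks = [rng.normal(0, 1, (200, 2)) for _ in range(3)]
centers, all_outliers = None, []
for chunk in chunks:
    centers, outliers = cluster_chunk(chunk, k=3, prev_centers=centers)
    all_outliers.append(outliers)
# `all_outliers` would then feed the weighted Apriori association-rule mining.
```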


Kybernetes ◽  
2016 ◽  
Vol 45 (8) ◽  
pp. 1273-1291 ◽  
Author(s):  
Runhai Jiao ◽  
Shaolong Liu ◽  
Wu Wen ◽  
Biying Lin

Purpose
The large volume of big data makes it impractical to apply traditional clustering algorithms, which are usually designed for an entire data set. The purpose of this paper is to focus on incremental clustering, which divides data into a series of data chunks so that only a small amount of data needs to be clustered at a time. Few studies on incremental clustering address the problems of optimizing the cluster center initialization for each data chunk and of selecting multiple passing points for each cluster.

Design/methodology/approach
By optimizing the initial cluster centers, the quality of the clustering results for each data chunk is improved, and hence the quality of the final clustering results is enhanced. Moreover, by selecting multiple passing points, more accurate information is passed down to improve the final clustering results. A method addressing these two problems is proposed and applied in an algorithm based on the streaming kernel fuzzy c-means (stKFCM) algorithm.

Findings
Experimental results show that the proposed algorithm is more accurate and performs better than the stKFCM algorithm.

Originality/value
This paper addresses the problem of improving the performance of incremental clustering by optimizing cluster center initialization and selecting multiple passing points. The paper analyzes the performance of the proposed scheme and proves its effectiveness.
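
One way to read "selecting multiple passing points" is that each cluster hands several representative points, rather than only its center, to the next chunk. The sketch below illustrates that idea under assumptions of my own (the m nearest members of each cluster are kept; names are illustrative); the kernel fuzzy c-means machinery of stKFCM is not reproduced.

```python
import numpy as np

def select_passing_points(chunk, centers, labels, m=3):
    """For each cluster, keep the m points nearest to its center as the
    information passed down to the next data chunk."""
    passing = []
    for k, c in enumerate(centers):
        members = chunk[labels == k]
        if len(members) == 0:
            continue
        order = np.argsort(np.linalg.norm(members - c, axis=1))
        passing.append(members[order[:m]])
    return np.vstack(passing)

# Toy usage: labels/centers would come from clustering the current chunk.
rng = np.random.default_rng(4)
chunk = rng.normal(0, 1, (100, 2))
centers = np.array([[-1.0, 0.0], [1.0, 0.0]])
labels = (np.linalg.norm(chunk - centers[0], axis=1)
          > np.linalg.norm(chunk - centers[1], axis=1)).astype(int)
passing = select_passing_points(chunk, centers, labels, m=3)
# `passing` is prepended to the next chunk so its clustering sees a summary
# of the data already processed.
```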


2014 ◽  
Vol 644-650 ◽  
pp. 2047-2050
Author(s):  
Li Ying Cao ◽  
He Long Yu ◽  
Gui Fen Chen ◽  
Ting Ting Yang

In precision agriculture, soil fertility evaluation is the foundation of variable-rate fertilization. In traditional evaluation methods, the initial cluster centers of the K-means algorithm used to grade soil fertility levels are generated randomly from the data set, so the clustering result is not stable. This paper proposes an improved K-means algorithm with a density-based method for optimizing the selection of the initial cluster centers: the points in high-density regions that are farthest from each other are chosen as the initial cluster centers. Experiments show that the improved K-means algorithm eliminates the dependence on the initial cluster centers and that the clustering result is greatly improved.
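
A sketch of "farthest-apart points in high-density regions" as initial centers is shown below. The neighborhood radius, the density quantile, and the greedy selection order are assumptions made for illustration, not the paper's exact formulation.

```python
import numpy as np

def density_farthest_init(X, k, radius=None, density_quantile=0.5):
    """Pick initial centers from high-density points that are far apart.

    Density of a point = number of neighbors within `radius`; only points
    above the chosen density quantile are eligible as centers."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    if radius is None:
        radius = np.median(dists)  # crude default neighborhood scale
    density = (dists < radius).sum(axis=1)
    eligible = np.flatnonzero(density >= np.quantile(density, density_quantile))
    # Start from the densest eligible point, then repeatedly add the eligible
    # point farthest from all centers chosen so far.
    centers = [eligible[density[eligible].argmax()]]
    while len(centers) < k:
        d_to_centers = dists[np.ix_(eligible, centers)].min(axis=1)
        centers.append(eligible[d_to_centers.argmax()])
    return X[centers]

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(4, 0.5, (50, 2))])
init_centers = density_farthest_init(X, k=2)
```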


Entropy ◽  
2018 ◽  
Vol 20 (10) ◽  
pp. 723 ◽  
Author(s):  
Cenker Bicer

The geometric process (GP) is a simple and direct approach to modeling successive inter-arrival time data with a monotonic trend. It is also an important alternative to the non-homogeneous Poisson process. In the present paper, the parameter estimation problem for the GP is considered when the distribution of the first occurrence time is Power Lindley with parameters α and λ. To solve this estimation problem, the maximum likelihood, modified moments, modified L-moments and modified least-squares estimators are obtained for the parameters a, α and λ. The mean, bias and mean squared error (MSE) values associated with these estimators are evaluated for small, moderate and large sample sizes using Monte Carlo simulations. Furthermore, two illustrative examples using real data sets are presented.
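
For the maximum likelihood case only, a sketch of the GP likelihood is given below: if X_k is the k-th inter-arrival time and a is the GP ratio, then a^(k-1) X_k are i.i.d. with the first-occurrence distribution, here Power Lindley. The starting values and the toy data are made up, and the modified moments, L-moments and least-squares estimators are not shown.

```python
import numpy as np
from scipy.optimize import minimize

def power_lindley_logpdf(y, alpha, lam):
    """log f(y) for the Power Lindley distribution:
    f(y) = alpha*lam^2/(lam+1) * (1 + y^alpha) * y^(alpha-1) * exp(-lam*y^alpha)."""
    return (np.log(alpha) + 2 * np.log(lam) - np.log(lam + 1)
            + np.log1p(y ** alpha) + (alpha - 1) * np.log(y) - lam * y ** alpha)

def gp_negative_loglik(theta, x):
    """Negative log-likelihood of a GP with Power Lindley first occurrence time."""
    a, alpha, lam = theta
    if a <= 0 or alpha <= 0 or lam <= 0:
        return np.inf
    k = np.arange(1, len(x) + 1)
    y = a ** (k - 1) * x
    # The (k-1)*log(a) term is the Jacobian of the change of variables X_k -> Y_k.
    return -np.sum((k - 1) * np.log(a) + power_lindley_logpdf(y, alpha, lam))

# Toy usage with made-up inter-arrival times showing a decreasing trend.
x = np.array([5.1, 4.4, 3.9, 3.2, 2.8, 2.5, 2.1, 1.9, 1.6, 1.4])
res = minimize(gp_negative_loglik, x0=np.array([1.1, 1.0, 0.5]), args=(x,),
               method="Nelder-Mead")
a_hat, alpha_hat, lam_hat = res.x
```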


2008 ◽  
Vol 06 (02) ◽  
pp. 261-282 ◽  
Author(s):  
AO YUAN ◽  
WENQING HE

Clustering is a major tool for microarray gene expression data analysis. Existing clustering methods fall mainly into two categories: parametric and nonparametric. Parametric methods generally assume a mixture of parametric subdistributions; when the mixture distribution approximately fits the true data-generating mechanism, these methods perform well, but not when there is non-negligible deviation between them. On the other hand, nonparametric methods, which usually make no distributional assumptions, are robust but pay the price of a loss in efficiency. In an attempt to use the known mixture form to increase efficiency, and to avoid assumptions about the unknown subdistributions to enhance robustness, we propose a semiparametric method for clustering. The proposed approach has the form of a parametric mixture with no assumptions on the subdistributions; the subdistributions are estimated nonparametrically, with constraints imposed only on the modes. An expectation-maximization (EM) algorithm along with a classification step is invoked to cluster the data, and a modified Bayesian information criterion (BIC) is employed to guide the determination of the optimal number of clusters. Simulation studies are conducted to assess the performance and robustness of the proposed method. The results show that the proposed method yields reasonable partitions of the data. As an illustration, the proposed method is applied to a real microarray data set to cluster genes.
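
The overall workflow of EM fitting, a classification step, and BIC-guided selection of the number of clusters can be illustrated with a deliberately simplified stand-in: a Gaussian mixture and the ordinary BIC replace the authors' nonparametric subdistribution estimates and modified BIC, so this sketch shows the selection loop, not the semiparametric estimator itself.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def cluster_with_bic_selection(X, k_range=range(1, 7)):
    """Fit mixture models (via EM) for several cluster counts, keep the model
    with the lowest BIC, and return hard cluster labels."""
    best_model, best_bic = None, np.inf
    for k in k_range:
        gm = GaussianMixture(n_components=k, random_state=0).fit(X)
        bic = gm.bic(X)
        if bic < best_bic:
            best_model, best_bic = gm, bic
    # Classification step: assign each point to its most probable component.
    return best_model.predict(X), best_model

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(-2, 0.7, (80, 2)), rng.normal(2, 0.7, (80, 2))])
labels, model = cluster_with_bic_selection(X)
print(model.n_components)  # expected: 2
```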

