Slice_OP: Selecting Initial Cluster Centers Using Observation Points

Author(s):  
Md Abdul Masud ◽  
Joshua Zhexue Huang ◽  
Ming Zhong ◽  
Xianghua Fu ◽  
Mohammad Sultan Mahmud
2012 ◽  
Vol 40 (3) ◽  
pp. 539-566 ◽  
Author(s):  
MITSUHIKO OTA ◽  
SAM J. GREEN

Although it has often been hypothesized that children learn to produce new sound patterns first in frequently heard words, the available evidence for this claim is inconclusive. To re-examine this question, we conducted a survival analysis of word-initial consonant clusters produced by three children in the Providence Corpus (0;11–4;0). The analysis took account of several lexical factors in addition to lexical input frequency, including the age of first production, production frequency, neighborhood density and number of phonemes. The results showed that lexical input frequency was a significant predictor of the age at which the accuracy level of cluster production in each word first reached 80%. The magnitude of the frequency effect differed across cluster types. Our findings indicate that some of the between-word variance found in the development of sound production can indeed be attributed to the frequency of words in the child's ambient language.


2021 ◽  
pp. 1-14
Author(s):  
Zhenggang Wang ◽  
Jin Jin

Remote sensing image segmentation provides technical support for decision making in many areas of environmental resource management. However, the quality of remote sensing images obtained from different channels can vary considerably, and manually labeling large amounts of image data is expensive and inefficient. In this paper, we propose a point density force field clustering (PDFC) process. According to the spectral information from different ground objects, remote sensing superpixel points are divided into core and edge data points. The differences in the densities of core data points are used to form the local peak. The center of the initial cluster can then be determined from the weighted density and position of the local peak. An iterative nebular clustering process is used to obtain the result, and a newly proposed objective function is used to optimize the model parameters automatically in order to obtain the globally optimal clustering solution. The proposed algorithm can automatically cluster the areas of different ground objects in remote sensing images, and the resulting categories can then be easily labeled by humans.


2017 ◽  
Vol 8 (4) ◽  
pp. 99-112 ◽  
Author(s):  
Rojalina Priyadarshini ◽  
Rabindra Kumar Barik ◽  
Nilamadhab Dash ◽  
Brojo Kishore Mishra ◽  
Rachita Misra

Much research has been carried out worldwide to design machine classifiers that can predict diabetes from physical and biomedical parameters. This work proposes a hybrid machine learning classifier that distinguishes diabetic from non-diabetic people. The classifier combines the widely used K-means algorithm with the gravitational search algorithm (GSA). GSA serves as an optimization tool that computes the best centroids for the two classes of training data: the positive class (diabetic) and the negative class (non-diabetic). Instead of using random samples as the initial cluster heads, K-means uses the optimized centroids from GSA as its cluster centers. An inherent problem of the K-means algorithm is the initial placement of cluster centers, which can delay convergence and thereby degrade overall performance. The combination of GSA and K-means is an attempt to overcome this problem.
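The idea of seeding K-means with precomputed centroids instead of random samples can be sketched as follows. This is a minimal illustration, not the authors' classifier: GSA itself is not implemented, and the per-class mean is used as a simple stand-in for the optimized centroids. The function names (`kmeans`, `class_mean`), the two features, and the toy values are all hypothetical.

```python
import math


def kmeans(points, init_centers, max_iter=100):
    """Lloyd's k-means that starts from caller-supplied initial centers."""
    centers = [tuple(c) for c in init_centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center (Euclidean).
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its assigned points.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(old)  # keep an empty cluster's center in place
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers, labels


def class_mean(rows):
    """Stand-in for an optimized centroid: the plain mean of one class."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))


# Hypothetical two-feature training data for the two classes.
positive = [(8.0, 30.0), (9.0, 32.0), (8.5, 31.0)]
negative = [(5.0, 22.0), (5.5, 23.0), (4.5, 21.0)]

init = [class_mean(positive), class_mean(negative)]
centers, labels = kmeans(positive + negative, init)
```

Because the starting centers are already placed inside the two groups, the loop stops after the first pass on this toy data; with random starting samples the number of passes, and sometimes the final partition, can differ from run to run.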


The proposed research work performs cluster analysis in the field of precision agriculture. The k-means technique is used to cluster the agricultural data. Selecting the value of k plays a major role in the k-means algorithm, and different techniques are used to identify the number of clusters (the k value). Identifying suitable initial centroids is also important; in general they are selected randomly. In the proposed work, Hybrid K-Means clustering is used to identify the initial centroids in order to obtain stable results. Because the initial cluster centers are well defined, Hybrid K-Means acts as a stable clustering technique.
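To illustrate why a deterministic choice of initial centroids gives repeatable results, here is a small sketch of one such rule, farthest-first traversal. The abstract does not say how Hybrid K-Means picks its centroids, so this technique is swapped in purely as an example; the function name and the sample points are hypothetical.

```python
import math


def farthest_first_centers(points, k):
    """Pick k initial centers deterministically (farthest-first traversal)."""
    # Start from the data point closest to the overall mean.
    mean = [sum(col) / len(points) for col in zip(*points)]
    centers = [min(points, key=lambda p: math.dist(p, mean))]
    while len(centers) < k:
        # Next center: the point farthest from its nearest chosen center.
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(nxt)
    return centers


# Hypothetical 2-D measurements.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (5.0, 5.0)]
initial = farthest_first_centers(points, 3)
```

Since no random number is involved, calling the function again on the same data returns the same centers, so a k-means run seeded this way produces the same clusters every time.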


Author(s):  
Yunming Ye ◽  
Joshua Zhexue Huang ◽  
Xiaojun Chen ◽  
Shuigeng Zhou ◽  
Graham Williams ◽  
...  

Author(s):  
Rashmi Nadubeediramesh ◽  
Aryya Gangopadhyay

Incremental document clustering is important in many applications, but particularly so in healthcare contexts where text data is found in abundance, ranging from published research in journals to day-to-day healthcare data such as discharge summaries and nursing notes. In such dynamic environments new documents are constantly added to the set of documents that was used in the initial cluster formation. Hence it is important to be able to update the clusters incrementally at a low computational cost as new documents are added. In this paper the authors describe a novel, low-cost approach for incremental document clustering. Their method is based on conducting singular value decomposition (SVD) incrementally. They dynamically fold new documents into the existing term-document space and assign them to pre-defined clusters based on intra-cluster similarity. This saves the cost of re-computing the SVD on the entire document set every time updates occur. The authors also provide a way to retrieve documents based on different window sizes with high scalability and good clustering accuracy. They have tested their proposed method experimentally with 960 medical abstracts retrieved from the PubMed medical library. The authors’ incremental method is compared with the default situation where the SVD is completely re-computed when new documents are added to the initial set of documents. The results show minor decreases in the quality of the cluster formation but much larger gains in computational throughput.
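Folding a new document into an existing SVD space is usually written as below in the latent semantic indexing literature. This is the standard textbook formulation, given here only as a reading aid; the abstract does not state the exact update the authors use, and the symbols $A$, $U_k$, $\Sigma_k$, $V_k$, $d$ and the rank $k$ are notation assumed for this sketch.

```latex
% Rank-k truncated SVD of the m x n term-document matrix A
A \approx U_k \Sigma_k V_k^{\top}

% Folding in a new document with term vector d (m x 1),
% without recomputing the decomposition
\hat{d} = \Sigma_k^{-1} U_k^{\top} d
```

The projected vector $\hat{d}$ lives in the same $k$-dimensional space as the rows of $V_k$, so it can be compared with the existing documents or cluster representatives while $U_k$ and $\Sigma_k$ stay fixed.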


2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Ziqi Jia ◽  
Ling Song

The k-prototypes algorithm is a hybrid clustering algorithm that can process both categorical and numerical data. In this study, the method of initial cluster center selection was improved and a new hybrid dissimilarity coefficient was proposed. Based on this coefficient, a weighted k-prototypes clustering algorithm (WKPCA) was proposed. WKPCA not only improves the selection of initial cluster centers but also puts forward a new method for calculating the dissimilarity between data objects and cluster centers. Real UCI data were used to test the WKPCA algorithm. Experimental results show that WKPCA is more efficient and robust than other k-prototypes algorithms.
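For orientation, the dissimilarity used by the classic k-prototypes algorithm can be sketched as below: squared Euclidean distance on the numerical attributes plus a weighted count of mismatches on the categorical ones. This is the baseline measure, not the hybrid dissimilarity coefficient proposed for WKPCA, which the abstract does not specify. The function name, the weight `gamma`, and the sample values are assumptions made for illustration.

```python
def mixed_dissimilarity(x_num, x_cat, proto_num, proto_cat, gamma=1.0):
    """Classic k-prototypes dissimilarity between an object and a prototype."""
    # Numerical part: squared Euclidean distance.
    numeric_part = sum((a - b) ** 2 for a, b in zip(x_num, proto_num))
    # Categorical part: number of attributes whose values differ.
    categorical_part = sum(a != b for a, b in zip(x_cat, proto_cat))
    # gamma balances the influence of the two attribute types.
    return numeric_part + gamma * categorical_part


# Hypothetical object and prototype with two numerical and two
# categorical attributes each.
d = mixed_dissimilarity(
    x_num=(1.0, 2.0), x_cat=("red", "small"),
    proto_num=(0.0, 0.0), proto_cat=("red", "large"),
    gamma=0.5,
)
```

Here the numerical part is 1 + 4 = 5 and one categorical attribute differs, so with `gamma=0.5` the result is 5.5. An object would be assigned to the prototype with the smallest such value.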

