initial cluster Latest Research Papers

Research on Density-Based K-means Clustering Algorithm

Journal of Physics Conference Series ◽

10.1088/1742-6596/2137/1/012071 ◽

2021 ◽

Vol 2137 (1) ◽

pp. 012071

Author(s):

Shuxin Liu ◽

Xiangdong Liu

Keyword(s):

Clustering Algorithm ◽

Cluster Center ◽

Basic Principles ◽

K Value ◽

Advantages And Disadvantages ◽

Dbscan Algorithm ◽

Initial Cluster ◽

Development Direction ◽

Density Clustering ◽

Improved Algorithm

Abstract Cluster analysis is an unsupervised learning process, and its most classic algorithm, K-means, has the advantages of a simple principle and easy implementation. The shortcomings of K-means lie in its arbitrary handling of the number of clusters k, the initial cluster centers, and outlier points. This paper discusses improvements to the traditional K-means algorithm and puts forward an improved algorithm that incorporates density clustering. First, it describes the basic principles and process of the K-means algorithm and the DBSCAN algorithm. Then it summarizes improvement methods addressing these three aspects, together with their advantages and disadvantages, and proposes a new density-based improved K-means algorithm. Finally, it discusses the development direction and trends of the density-based K-means clustering algorithm.
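The abstract does not spell out the proposed algorithm, so the following is only a minimal sketch of one way density information could address two of the three shortcomings (initial centers and outliers). It assumes a DBSCAN-style neighborhood count; the function names, `eps`, and `min_pts` are illustrative choices, and treating every sparse point as an outlier is a simplification (DBSCAN itself distinguishes border points from noise).

```python
import math

def neighbor_counts(points, eps):
    """Number of points within distance eps of each point (itself included)."""
    return [sum(1 for q in points if math.dist(p, q) <= eps) for p in points]

def density_seeds(points, k, eps, min_pts):
    """Pick up to k initial centers from dense points; flag sparse points.

    Assumption: a point with fewer than min_pts neighbors is treated as an
    outlier, which is a simplification of DBSCAN's core/border/noise split.
    """
    counts = neighbor_counts(points, eps)
    core = [i for i, c in enumerate(counts) if c >= min_pts]
    outliers = [i for i, c in enumerate(counts) if c < min_pts]
    seeds = []
    # Visit dense points from densest to sparsest and keep one only if it is
    # farther than eps from every seed chosen so far.
    for i in sorted(core, key=lambda i: -counts[i]):
        if all(math.dist(points[i], points[s]) > eps for s in seeds):
            seeds.append(i)
        if len(seeds) == k:
            break
    return [points[s] for s in seeds], outliers
```

The returned seeds could then be handed to an ordinary K-means loop as its starting centers, with the flagged indices excluded from the update step.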

Combining Cluster Sampling and Link-Tracing Sampling to Estimate Totals and Means of Hidden Populations in Presence of Heterogeneous Probabilities of Links

Journal of Official Statistics ◽

10.2478/jos-2021-0038 ◽

2021 ◽

Vol 37 (4) ◽

pp. 865-905

Author(s):

Martín Humberto Félix-Medina

Keyword(s):

Confidence Intervals ◽

Sex Workers ◽

Drug Users ◽

Real Data ◽

Cluster Sampling ◽

Hidden Populations ◽

Numerical Studies ◽

Inclusion Probabilities ◽

Initial Cluster ◽

Variance Estimates

Abstract We propose Horvitz-Thompson-like and Hájek-like estimators of the total and mean of a response variable associated with the elements of a hard-to-reach population, such as drug users and sex workers. A portion of the population is assumed to be covered by a frame of venues where the members of the population tend to gather. An initial cluster sample of elements is selected from the frame, where the clusters are the venues, and the elements in the sample are asked to name their contacts who belong to the population. The sample size is increased by including in the sample the named elements who are not in the initial sample. The proposed estimators do not use design-based inclusion probabilities, but model-based inclusion probabilities which are derived from a Rasch model and are estimated by maximum likelihood estimators. The inclusion probabilities are assumed to be heterogeneous, that is, they depend on the sampled people. Variance estimates are obtained by bootstrap and are used to construct confidence intervals. The performance of the proposed estimators and confidence intervals is evaluated by two numerical studies, one of them based on real data, and the results show that their performance is acceptable.
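For orientation, the two classical estimator forms that the abstract builds on can be sketched as follows. This is a minimal illustration that takes the inclusion probabilities `pi` as given inputs; in the paper they are model-based estimates derived from a Rasch model, which is not reproduced here. The function names are illustrative.

```python
def horvitz_thompson_total(y, pi):
    """Estimate a population total as the sum of y_i / pi_i over the sample."""
    if len(y) != len(pi):
        raise ValueError("y and pi must have the same length")
    return sum(yi / p for yi, p in zip(y, pi))

def hajek_mean(y, pi):
    """Estimate a population mean as a weighted mean with weights 1 / pi_i."""
    if len(y) != len(pi):
        raise ValueError("y and pi must have the same length")
    weights = [1.0 / p for p in pi]
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)
```

With responses `[1, 0, 1]` and inclusion probabilities `[0.5, 0.25, 0.5]`, the weights are 2, 4, and 2, so the estimated total is 4 and the Hájek mean is 4 / 8 = 0.5.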

Adaptive Initialization Method for K-Means Algorithm

Frontiers in Artificial Intelligence ◽

10.3389/frai.2021.740817 ◽

2021 ◽

Vol 4 ◽

Author(s):

Jie Yang ◽

Yu-Kai Wang ◽

Xin Yao ◽

Chin-Teng Lin

Keyword(s):

Time Complexity ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Real Life ◽

Superior Performance ◽

Local Optima ◽

Initial Cluster ◽

Higher Dimensional ◽

Real World Datasets ◽

Random Method

The K-means algorithm is a widely used clustering algorithm that offers simplicity and efficiency. However, the traditional K-means algorithm uses a random method to determine the initial cluster centers, which makes clustering results prone to local optima and thus degrades clustering performance. In this research, we propose an adaptive initialization method for the K-means algorithm (AIMK) which can adapt to the various characteristics of different datasets and obtain better clustering performance with stable results. For larger or higher-dimensional datasets, we also leverage random sampling in AIMK (named AIMK-RS) to reduce the time complexity. We used 22 real-world datasets for performance comparisons. The experimental results show that AIMK and AIMK-RS outperform the current initialization methods and several well-known clustering algorithms. Specifically, AIMK-RS can significantly reduce the time complexity to O(n). Moreover, we use AIMK to initialize K-medoids and spectral clustering, and better performance is also observed. The above results demonstrate the superior performance and good scalability of AIMK and AIMK-RS. In the future, we would like to apply AIMK to more partition-based clustering algorithms to solve real-life practical problems.
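AIMK itself is not described in enough detail here to reproduce. As a point of comparison, the sketch below shows k-means++ seeding, a well-known alternative to purely random initialization; it is a different technique from AIMK and is included only to illustrate what "choosing initial centers more carefully" can look like. It assumes the data contain at least k distinct points.

```python
import math
import random

def kmeans_pp_seeds(points, k, rng):
    """k-means++ seeding (not AIMK).

    The first center is drawn uniformly at random. Each later center is drawn
    with probability proportional to the squared distance from the point to
    the nearest center already chosen.
    Assumption: points contains at least k distinct locations.
    """
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers
```

Because points that coincide with an existing center have weight zero, later centers tend to land in regions not yet covered, which reduces the chance of starting two centers inside the same cluster.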

Syllabic organization and phonetic indices in German stop-lateral clusters

Laboratory Phonology Journal of the Association for Laboratory Phonology ◽

10.16995/labphon.6440 ◽

2021 ◽

Author(s):

Stavroula Sotiropoulou ◽

Adamantios Gafos

Keyword(s):

Target Word ◽

Phrase Boundary ◽

Word Context ◽

Initial Cluster ◽

Prosodic Boundary ◽

Coordination Patterns ◽

Target Words

Using articulatory data from five German speakers, we study how segmental sequences under different syllabic organizations respond to perturbations of phonetic parameters in the segments that compose them. Target words contained stop-lateral clusters /bl, gl, kl, pl/ in a word-initial and a cross-word context and were embedded in carrier phrases with different prosodic boundary strengths, i.e., no phrase boundary versus an utterance phrase boundary preceded the target word in the case of word-initial clusters or separated the consonants in the case of cross-word clusters. For word-initial cluster onsets, we find that increasing the lag between two consonants and C1 stop duration leads to earlier vowel initiation and reduced local timing stability across CV and CCV. Furthermore, as the inter-consonantal lag increases, C2 lateral duration decreases. In contrast, for cross-word clusters, increasing the lag between two consonants does not lead to earlier vowel initiation across CV and C#CV and robust local timing stability is maintained across CV and C#CV. Overall, the findings indicate that the effect of phonetic perturbations on the coordination patterns depends on the syllabic organization superimposed on these clusters.

Klasifikasi Kebutuhan Jumlah Produk Makanan Customer Menggunakan K-Means Clustering dengan Optimasi Pusat Awal Cluster Algoritma Genetika

Jurnal Teknologi Informasi dan Ilmu Komputer ◽

10.25126/jtiik.2021842990 ◽

2021 ◽

Vol 8 (5) ◽

pp. 861

Author(s):

Yudi Istianto ◽

Shofwatul 'Uyun

Keyword(s):

Data Mining ◽

Genetic Algorithm ◽

Food Products ◽

Cluster Center ◽

Mining Method ◽

Data Mining Method ◽

Initial Cluster ◽

Average Accuracy ◽

Production And Distribution ◽

Product Sales

PT. Harum Bakery is a company in Yogyakarta engaged in the production and distribution of bakery products. Each consumer's demand for bread is irregular, while bread lasts only two days. Bread that is more than two days old is replaced with fresh bread by the distributor, which causes losses for the company. This study applies data mining to classify the quantity of food products each customer needs, using k-means clustering with initial cluster centers optimized by a genetic algorithm. The study uses 210 records from three weeks of product sales. The data are processed with a data mining method consisting of a preprocessing stage followed by a classification stage. Preprocessing includes data transformation and k-means clustering. Clustering, which requires certain rules, is more effective with optimization, because 200 of the 210 records qualify for the classification stage. Testing yields a best accuracy of 58.50%, and five-fold cross-validation yields an average accuracy of 50.58%, which is 2.51% higher than KNN without preprocessing.
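The abstract does not give the genetic algorithm's encoding or operators, so the following is only a rough sketch of how a genetic algorithm might search for initial cluster centers. It assumes one-dimensional data, a fitness equal to the sum of squared errors (SSE), truncation selection, one-point crossover, and Gaussian mutation; all of these, and every parameter name, are illustrative choices rather than the authors' method.

```python
import random

def sse(data, centers):
    """Sum of squared distances from each value to its nearest center."""
    return sum(min((x - c) ** 2 for c in centers) for x in data)

def ga_initial_centers(data, k, rng, pop_size=20, generations=30, sigma=1.0):
    """Evolve candidate sets of k initial centers; lower SSE is fitter."""
    # Each individual is a list of k centers sampled from the data.
    population = [rng.sample(data, k) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda c: sse(data, c))
        parents = population[: pop_size // 2]  # truncation selection (elitist)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, k) if k > 1 else 0  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [c + rng.gauss(0.0, sigma) for c in child]  # mutation
            children.append(child)
        population = parents + children
    return min(population, key=lambda c: sse(data, c))
```

The best individual would then serve as the starting centers for k-means. Because the fitter half of each generation is carried over unchanged, the best SSE never gets worse from one generation to the next.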

Research on Early Warning for Gas Risks at a Working Face Based on Association Rule Mining

Energies ◽

10.3390/en14216889 ◽

2021 ◽

Vol 14 (21) ◽

pp. 6889

Author(s):

Yuxin Huang ◽

Jingdao Fan ◽

Zhenguo Yan ◽

Shugang Li ◽

Yanping Wang

Keyword(s):

Association Rules ◽

Early Warning ◽

Association Rule ◽

Cluster Center ◽

Apriori Algorithm ◽

Data Set ◽

Early Warning Model ◽

Initial Cluster ◽

Different Dimensions ◽

Warning Model

In the process of gas prediction and early warning, outliers in the data series are often discarded, so key information is likely to be missed during analysis. To this end, this paper proposes an early warning model for coal face gas based on multifactor coupling relationship analysis. The model contains a k-means algorithm based on initial cluster center optimization and an Apriori algorithm based on weight optimization. The initial cluster centers for the full data set are taken from the cluster centers of the preceding data subset, which optimizes the k-means algorithm. The optimized algorithm is used to filter the collected data set and obtain the set of outliers. Then, the Apriori algorithm is optimized so that it can identify more important information that appears less frequently in the events. It is used to mine and analyze the association rules of abnormal values and obtain interesting association rule events among the gas outliers in different dimensions. Finally, four warning levels of gas risk are set according to different confidence intervals, and true and reliable warning results are obtained. By mining association rules between abnormal data in different dimensions, the validity and effectiveness of the gas early warning model proposed in this paper are verified. Classifying early warnings of gas risks has important practical significance for improving the safety of coal mines.
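One reading of the center-optimization step is a warm start: cluster an earlier subset of readings, then reuse those centers as the starting point on the full data and flag readings that sit far from every center. The sketch below follows that reading on one-dimensional data. The distance `threshold`, the way the earlier subset is seeded, and the function names are assumptions, not details from the paper.

```python
def kmeans_1d(data, centers, iters=50):
    """Plain Lloyd iterations on 1-D data starting from the given centers."""
    centers = list(centers)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for x in data:
            j = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
            groups[j].append(x)
        # An empty group keeps its previous center.
        new = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new == centers:
            break
        centers = new
    return centers

def warm_start_outliers(earlier, full, k, threshold):
    """Cluster the earlier subset, reuse its centers on the full data, then
    flag values farther than `threshold` from their nearest center."""
    # Seed the earlier subset with evenly spaced order statistics (assumption).
    step = max(1, len(earlier) // k)
    start = sorted(earlier)[::step][:k]
    seed = kmeans_1d(earlier, start)
    centers = kmeans_1d(full, seed)
    outliers = [x for x in full if min(abs(x - c) for c in centers) > threshold]
    return centers, outliers
```

The flagged values, rather than being discarded, would be the input to the association rule mining stage described in the abstract.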

Unsupervised labelling of remote sensing images based on force field clustering

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-210802 ◽

2021 ◽

pp. 1-14

Author(s):

Zhenggang Wang ◽

Jin Jin

Keyword(s):

Remote Sensing ◽

Force Field ◽

Image Data ◽

Model Parameters ◽

Remote Sensing Images ◽

Initial Cluster ◽

Data Points ◽

Global Optimal ◽

Density Force

Remote sensing image segmentation provides technical support for decision making in many areas of environmental resource management. However, the quality of remote sensing images obtained from different channels can vary considerably, and manually labeling a large amount of image data is too expensive and inefficient. In this paper, we propose a point density force field clustering (PDFC) process. According to the spectral information from different ground objects, remote sensing superpixel points are divided into core and edge data points. The differences in the densities of core data points are used to form the local peak. The center of the initial cluster can be determined by the weighted density and position of the local peak. An iterative nebular clustering process is used to obtain the result, and a newly proposed objective function is used to optimize the model parameters automatically to obtain the globally optimal clustering solution. The proposed algorithm can cluster the areas of different ground objects in remote sensing images automatically, and these categories can then be labeled easily by humans.

An enhanced method of initial cluster center selection for K-means algorithm

10.1109/asyu52992.2021.9599017 ◽

2021 ◽

Author(s):

Zillur Rahman ◽

Md. Sabir Hossain ◽

Mohammad Hasan ◽

Ahmed Imteaj

Keyword(s):

Cluster Center ◽

Initial Cluster ◽

Selection For

Expression of concern: A novel optimized initial cluster center and enhanced objective function: Medical diagnosis through classification the journal editor and SAGE publishing hereby issue an expression of concern for the following article

Health Informatics Journal ◽

10.1177/14604582211050361 ◽

2021 ◽

Vol 27 (4) ◽

pp. 146045822110503

Author(s):

Alsadoon Abeer

Keyword(s):

Objective Function ◽

Medical Diagnosis ◽

Journal Editor ◽

Cluster Center ◽

Initial Cluster

A Clustering-based Method for Business Hall Efficiency Analysis

Scientific Programming ◽

10.1155/2021/7622576 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Tianlin Huang ◽

Ning Wang

Keyword(s):

Empirical Study ◽

Prediction Method ◽

Final Analysis ◽

Locality Sensitive Hashing ◽

Characteristic Analysis ◽

Decision Optimization ◽

Economic Operation ◽

Initial Cluster ◽

Load Intensity ◽

Stability And Accuracy

Excessive or insufficient business hall resources may result in unreasonable resource allocation, adversely affecting the value of a physical business hall. Therefore, proper characteristic parameters are the key factors for analyzing the business hall, and they strongly affect the final analysis results. In this study, a characteristic analysis method for the economic operation of a business hall is developed and the feature engineering is established. Because of its simplicity and versatility, the k-means algorithm has been widely used since it was first proposed around 50 years ago. However, the classical k-means algorithm has poor stability and accuracy. In particular, it is difficult to achieve a suitable balance between the centroid initialization and the clustering number k. We propose a new initialization algorithm (LSH-k-means) for k-means clustering. This algorithm is mainly based on locality-sensitive hashing (LSH) as an index for computing the initial cluster centroids, and it reduces the range of the clustering number. Furthermore, an empirical study is conducted. According to the load intensity and time change of the business hall, an index system reflecting the optimization analysis of the business hall is established, and the LSH-k-means algorithm is used to analyze the economic operation of the business hall. The results of the empirical study show that the LSH-k-means clustering method outperforms the direct prediction method, provides the expected analysis results as well as decision optimization recommendations for the business hall, and serves as a basis for the optimal layout of the business hall.
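The abstract does not say which LSH family or bucket-selection rule LSH-k-means uses. The sketch below illustrates the general idea with random-projection hashes of the form floor((a . x + b) / width): nearby points tend to share a bucket, and the means of the most populated buckets are taken as candidate initial centroids. The hash family, `n_hashes`, `width`, and the "largest buckets" rule are all assumptions.

```python
import math
import random
from collections import defaultdict

def lsh_initial_centroids(points, k, rng, n_hashes=4, width=4.0):
    """Bucket points with random-projection hashes and return the means of
    the k most populated buckets (fewer if fewer buckets are occupied)."""
    dim = len(points[0])
    # Each hash uses a Gaussian direction a and a uniform offset b.
    hashes = [
        ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
        for _ in range(n_hashes)
    ]
    buckets = defaultdict(list)
    for p in points:
        key = tuple(
            math.floor((sum(ai * xi for ai, xi in zip(a, p)) + b) / width)
            for a, b in hashes
        )
        buckets[key].append(p)
    largest = sorted(buckets.values(), key=len, reverse=True)[:k]
    # The centroid of a bucket is the coordinate-wise mean of its points.
    return [tuple(sum(col) / len(g) for col in zip(*g)) for g in largest]
```

The number of occupied buckets also gives a rough upper bound on how many distinct dense regions the hash can separate, which is one way such an index could narrow the range of candidate k values.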
