initial cluster
Recently Published Documents


TOTAL DOCUMENTS

180
(FIVE YEARS 60)

H-INDEX

16
(FIVE YEARS 5)

2021 ◽  
Vol 2137 (1) ◽  
pp. 012071
Author(s):  
Shuxin Liu ◽  
Xiangdong Liu

Abstract Cluster analysis is an unsupervised learning process, and its most classic algorithm, K-means, has the advantages of a simple principle and easy implementation. The K-means algorithm nevertheless has shortcomings: the number of clusters k, the initial cluster centers, and outlier points are all handled arbitrarily. This paper discusses improvements to the traditional K-means algorithm and puts forward an improved algorithm that incorporates density clustering. First, it describes the basic principles and process of the K-means algorithm and the DBSCAN algorithm. It then summarizes improvement methods for these three aspects, together with their advantages and disadvantages, and proposes a new density-based improved K-means algorithm. Finally, it discusses the development direction and trends of density-based K-means clustering.
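The abstract does not spell out the proposed algorithm, so the following is only a minimal sketch of the general idea of density-based initialization, not the authors' method. It assumes a DBSCAN-style neighborhood count: points with fewer than `min_pts` neighbors within `eps` are treated as outliers, and the k initial centers are chosen among the remaining dense points so that they lie far apart. The function name and parameters are hypothetical.

```python
import math

def density_based_centers(points, k, eps, min_pts):
    """Pick k initial centers from dense points; sparse points count as outliers."""
    # Density of a point = number of points within distance eps (itself included),
    # the same neighborhood count DBSCAN uses to identify core points.
    density = [sum(1 for q in points if math.dist(p, q) <= eps) for p in points]
    # Points below min_pts are treated as outliers and never become centers.
    candidates = [i for i, d in enumerate(density) if d >= min_pts]
    if len(candidates) < k:
        raise ValueError("fewer than k dense points; relax eps or min_pts")
    # First center: the densest candidate (first one wins on ties).
    chosen = [max(candidates, key=lambda i: density[i])]
    # Each further center: the candidate farthest from its nearest chosen center.
    while len(chosen) < k:
        best = max(
            (i for i in candidates if i not in chosen),
            key=lambda i: min(math.dist(points[i], points[c]) for c in chosen),
        )
        chosen.append(best)
    return [points[i] for i in chosen]

# Two tight blobs plus one isolated point at (50, 50).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
print(density_based_centers(points, k=2, eps=1.5, min_pts=3))
# -> [(0, 0), (11, 11)]
```

The isolated point is never selected because its neighborhood count falls below `min_pts`. A plain farthest-point rule without the density filter would have picked it as the second center.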


2021 ◽  
Vol 37 (4) ◽  
pp. 865-905
Author(s):  
Martín Humberto Félix-Medina

Abstract We propose Horvitz-Thompson-like and Hájek-like estimators of the total and mean of a response variable associated with the elements of a hard-to-reach population, such as drug users and sex workers. A portion of the population is assumed to be covered by a frame of venues where the members of the population tend to gather. An initial cluster sample of elements is selected from the frame, where the clusters are the venues, and the elements in the sample are asked to name their contacts who belong to the population. The sample size is increased by including in the sample the named elements who are not in the initial sample. The proposed estimators do not use design-based inclusion probabilities, but model-based inclusion probabilities which are derived from a Rasch model and are estimated by maximum likelihood estimators. The inclusion probabilities are assumed to be heterogeneous, that is, they depend on the sampled people. Variance estimates are obtained by bootstrap and are used to construct confidence intervals. The performance of the proposed estimators and confidence intervals is evaluated by two numerical studies, one of them based on real data, and the results show that their performance is acceptable.
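For orientation, the classical forms that these estimators resemble can be sketched in a few lines. This is a sketch under stated assumptions: the inclusion probabilities are simply passed in as given numbers, whereas the paper derives them from a Rasch model and estimates them by maximum likelihood. The function names and sample values are hypothetical.

```python
def horvitz_thompson_total(y, pi):
    """Estimate a population total by weighting each sampled value by 1 / pi_i."""
    return sum(y_i / p_i for y_i, p_i in zip(y, pi))

def hajek_mean(y, pi):
    """Estimate a population mean as the weighted total divided by the sum of weights."""
    weights = [1.0 / p_i for p_i in pi]
    return sum(w * y_i for w, y_i in zip(weights, y)) / sum(weights)

# Hypothetical sample: responses y and (assumed known) inclusion probabilities pi.
y = [1, 0, 1, 1]
pi = [0.5, 0.25, 0.5, 0.2]
print(horvitz_thompson_total(y, pi))  # 2 + 0 + 2 + 5 = 9.0
print(hajek_mean(y, pi))              # 9 / 13
```

The Hájek form divides by the estimated population size (the sum of the weights) rather than a known size, which is why it remains usable when the population size itself is unknown.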


2021 ◽  
Vol 4 ◽  
Author(s):  
Jie Yang ◽  
Yu-Kai Wang ◽  
Xin Yao ◽  
Chin-Teng Lin

The K-means algorithm is a widely used clustering algorithm that offers simplicity and efficiency. However, the traditional K-means algorithm determines the initial cluster centers randomly, which makes clustering results prone to local optima and therefore degrades clustering performance. In this research, we propose an adaptive initialization method for the K-means algorithm (AIMK), which can adapt to the various characteristics of different datasets and obtain better clustering performance with stable results. For larger or higher-dimensional datasets, we additionally leverage random sampling in AIMK (named AIMK-RS) to reduce the time complexity. Twenty-two real-world datasets were used for performance comparisons. The experimental results show that AIMK and AIMK-RS outperform the current initialization methods and several well-known clustering algorithms. Specifically, AIMK-RS can significantly reduce the time complexity to O(n). Moreover, we use AIMK to initialize K-medoids and spectral clustering, and better performance is also observed. The above results demonstrate the superior performance and good scalability of AIMK and AIMK-RS. In the future, we would like to apply AIMK to more partition-based clustering algorithms to solve real-life practical problems.
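The abstract does not describe AIMK's steps, so the sketch below does not implement it. It only illustrates the problem the method targets: standard Lloyd iterations converge to different solutions depending on the initial centers. The function name and the toy 1-D data are assumptions made for illustration.

```python
def lloyd_1d(data, centers, max_iter=100):
    """Run Lloyd's k-means on 1-D data from the given initial centers."""
    centers = list(centers)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for x in data:
            nearest = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        # Update step: move each center to its cluster mean (unchanged if empty).
        new_centers = [sum(c) / len(c) if c else centers[j]
                       for j, c in enumerate(clusters)]
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    # Sum of squared errors to the nearest center.
    sse = sum(min((x - c) ** 2 for c in centers) for x in data)
    return centers, sse

data = [0, 1, 2, 10, 11, 12, 20, 21, 22]  # three obvious groups
print(lloyd_1d(data, [0, 1, 2]))    # -> ([0.0, 1.5, 16.0], 154.5)
print(lloyd_1d(data, [1, 11, 21]))  # -> ([1.0, 11.0, 21.0], 6.0)
```

Starting with all three centers inside the first group, the iterations stall in a local optimum with an error of 154.5, while a start with one center per group reaches an error of 6.0. Initialization methods aim to make the second kind of start the norm.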


Author(s):  
Stavroula Sotiropoulou ◽  
Adamantios Gafos

Using articulatory data from five German speakers, we study how segmental sequences under different syllabic organizations respond to perturbations of phonetic parameters in the segments that compose them. Target words contained stop-lateral clusters /bl, gl, kl, pl/ in a word-initial and a cross-word context and were embedded in carrier phrases with different prosodic boundary strengths, i.e., no phrase boundary versus an utterance phrase boundary preceded the target word in the case of word-initial clusters or separated the consonants in the case of cross-word clusters. For word-initial cluster onsets, we find that increasing the lag between two consonants and C1 stop duration leads to earlier vowel initiation and reduced local timing stability across CV and CCV. Furthermore, as the inter-consonantal lag increases, C2 lateral duration decreases. In contrast, for cross-word clusters, increasing the lag between two consonants does not lead to earlier vowel initiation across CV and C#CV and robust local timing stability is maintained across CV and C#CV. Overall, the findings indicate that the effect of phonetic perturbations on the coordination patterns depends on the syllabic organization superimposed on these clusters.


2021 ◽  
Vol 8 (5) ◽  
pp. 861
Author(s):  
Yudi Istianto ◽  
Shofwatul 'Uyun

Abstract PT. Harum Bakery is a company in Yogyakarta engaged in the production and distribution of bakery products. Each customer's demand for bread is irregular, while bread lasts only two days. Bread that is more than two days old is replaced with fresh bread by the distributor, which causes losses for the company. This study applies data mining to classify the quantity of food products needed by customers, using k-means clustering in which the initial cluster centers are optimized by a genetic algorithm. The study uses 210 records of product sales over three weeks. The data are processed by applying data mining methods through a preprocessing stage followed by a classification stage. Preprocessing includes data transformation and k-means clustering. Clustering results, which must satisfy certain rules, are more effective with optimization, because 200 of the 210 records are suitable for the classification stage. Testing yields a best accuracy of 58.50%, and five-fold cross-validation yields an average accuracy of 50.58%, which is 2.51% higher than KNN without preprocessing.


Energies ◽  
2021 ◽  
Vol 14 (21) ◽  
pp. 6889
Author(s):  
Yuxin Huang ◽  
Jingdao Fan ◽  
Zhenguo Yan ◽  
Shugang Li ◽  
Yanping Wang

In the process of gas prediction and early warning, outliers in the data series are often discarded, so key information may be missed in the analysis. To this end, this paper proposes an early warning model based on multifactor coupling relationship analysis of coal face gas. The model contains a k-means algorithm based on initial cluster center optimization and an Apriori algorithm based on weight optimization. The initial cluster centers for the full data set are optimized using the cluster centers of the preorder data subset, which in turn optimizes the k-means algorithm. The optimized algorithm is used to filter the outliers out of the collected data set and obtain a data set of outliers. Then, the Apriori algorithm is optimized so that it can identify important information that appears less frequently in the events. It is also used to mine and analyze the association rules of abnormal values and obtain interesting association rule events among the gas outliers in different dimensions. Finally, four warning levels of gas risk are set according to different confidence intervals, and accurate and reliable warning results are obtained. By mining association rules between abnormal data in different dimensions, the validity and effectiveness of the gas early warning model proposed in this paper are verified. Realizing the classification of early warning of gas risks has important practical significance for improving the safety of coal mines.


2021 ◽  
pp. 1-14
Author(s):  
Zhenggang Wang ◽  
Jin Jin

Remote sensing image segmentation provides technical support for decision making in many areas of environmental resource management. However, the quality of remote sensing images obtained from different channels can vary considerably, and manually labeling a large amount of image data is expensive and inefficient. In this paper, we propose a point density force field clustering (PDFC) process. According to the spectral information from different ground objects, remote sensing superpixel points are divided into core and edge data points. The differences in the densities of core data points are used to form the local peak. The center of the initial cluster can be determined by the weighted density and position of the local peak. An iterative nebular clustering process is used to obtain the result, and a newly proposed objective function is used to optimize the model parameters automatically to obtain the globally optimal clustering solution. The proposed algorithm can automatically cluster the areas of different ground objects in remote sensing images, and these categories can then be labeled easily by humans.


2021 ◽  
Author(s):  
Zillur Rahman ◽  
Md. Sabir Hossain ◽  
Mohammad Hasan ◽  
Ahmed Imteaj

2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Tianlin Huang ◽  
Ning Wang

Excessive or insufficient business hall resources may result in unreasonable resource allocation, adversely affecting the value of an entity business hall. Therefore, proper characteristic parameters are the key factors for analyzing the business hall, and they strongly affect the final analysis results. In this study, a characteristic analysis method for the economic operation of a business hall is developed and the feature engineering is established. Because of its simplicity and versatility, the k-means algorithm has been widely used since it was first proposed around 50 years ago. However, the classical k-means algorithm has poor stability and accuracy. In particular, it is difficult to achieve a suitable balance between the centroid initialization and the clustering number k. We propose a new initialization algorithm (LSH-k-means) for k-means clustering. This algorithm is mainly based on locality-sensitive hashing (LSH) as an index for computing the initial cluster centroids, and it reduces the range of the clustering number. Furthermore, an empirical study is conducted. According to the load intensity and time change of the business hall, an index system reflecting the optimization analysis of the business hall is established, and the LSH-k-means algorithm is used to analyze the economic operation of the business hall. The results of the empirical study show that the LSH-k-means clustering method outperforms the direct prediction method, provides expected analysis results as well as decision optimization recommendations for the business hall, and serves as a basis for the optimal layout of the business hall.
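The abstract states only that LSH serves as an index for computing the initial centroids. One plausible reading, sketched below as an assumption rather than the authors' algorithm, hashes points with bucketed random projections (a common LSH family for Euclidean distance) and uses the means of the most populated buckets as initial centroids. All names and parameters are hypothetical, and the function may return fewer than k centroids if fewer buckets are occupied.

```python
import math
import random
from collections import defaultdict

def lsh_initial_centroids(points, k, width, n_hashes=2, seed=0, projections=None):
    """Return up to k initial centroids: the means of the k fullest LSH buckets."""
    dim = len(points[0])
    if projections is None:
        # Random-projection LSH: Gaussian direction a, uniform offset b in [0, width).
        rng = random.Random(seed)
        projections = [([rng.gauss(0, 1) for _ in range(dim)], rng.uniform(0, width))
                       for _ in range(n_hashes)]
    buckets = defaultdict(list)
    for p in points:
        # Hash value per projection: floor((a . p + b) / width).
        key = tuple(
            math.floor((sum(a_i * x_i for a_i, x_i in zip(a, p)) + b) / width)
            for a, b in projections
        )
        buckets[key].append(p)
    # Nearby points tend to share a bucket, so full buckets mark dense regions.
    largest = sorted(buckets.values(), key=len, reverse=True)[:k]
    return [tuple(sum(col) / len(bucket) for col in zip(*bucket)) for bucket in largest]

points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (20, 0)]
# Fixed projection along the x-axis so the result is easy to follow by hand.
print(lsh_initial_centroids(points, k=2, width=5.0,
                            projections=[((1.0, 0.0), 0.0)]))
# -> [(1.0, 0.0), (10.5, 0.0)]
```

The number of occupied buckets also gives a rough upper bound on a sensible k, which is one way hashing could narrow the range of the clustering number.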

