clustering problem
Recently Published Documents



2022 ◽  
Vol 2022 ◽  
pp. 1-8
Author(s):  
Qingsong Tang

A proper cluster is usually defined as a maximally coherent group drawn from a set of objects, using pairwise or more complicated similarities. In general hypergraphs, the clustering problem refers to the extraction of subhypergraphs with a higher internal density, for instance, maximal cliques in hypergraphs. Determining the clustering structure within hypergraphs is a significant problem in data mining. Various works on detecting clusters in graphs and uniform hypergraphs have been published in the past decades. Recently, it has been shown that the maximum {1,2}-clique size in {1,2}-hypergraphs is related to the global maxima of a certain quadratic program based on the structure of the given nonuniform hypergraphs. In this paper, we first extend this result to relate strict local maxima of this program to certain maximal cliques, including 2-cliques or {1,2}-cliques. We also explore the connection between edge-weighted clusters and strict local optimum solutions of a class of polynomials resulting from nonuniform {1,2}-hypergraphs.
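The quadratic program for nonuniform hypergraphs is not spelled out in the abstract. As a point of reference only, the sketch below checks the classical graph analogue (the Motzkin-Straus theorem): on an ordinary graph, the maximum of the sum of x_i * x_j over edges, taken over the probability simplex, equals (1/2)(1 - 1/omega), where omega is the maximum clique size, and it is attained by uniform weights on a maximum clique. The toy graph and helper names are assumptions made for illustration.

```python
from fractions import Fraction

def lagrangian(edges, x):
    """Graph Lagrangian: sum of x[i] * x[j] over all edges (i, j)."""
    return sum(x[i] * x[j] for i, j in edges)

def uniform_on(vertices, n):
    """Weight vector of length n: uniform on `vertices`, zero elsewhere."""
    w = Fraction(1, len(vertices))
    return [w if v in vertices else Fraction(0) for v in range(n)]

# Hypothetical toy graph: a triangle {0, 1, 2} plus a pendant edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

clique = {0, 1, 2}                      # the maximum clique, omega = 3
value = lagrangian(edges, uniform_on(clique, n))
bound = Fraction(1, 2) * (1 - Fraction(1, len(clique)))

# Spreading the weight over all four vertices gives a smaller value.
spread = lagrangian(edges, uniform_on({0, 1, 2, 3}, n))

print(value, bound, spread)
```

Exact fractions are used so that the equality between the clique value and the bound can be checked without floating-point tolerance.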


Author(s):  
Sai Ji ◽  
Jun Li ◽  
Zijun Wu ◽  
Yicheng Xu

In this paper, we propose a so-called capacitated min–max correlation clustering model, a natural variant of the min–max correlation clustering problem. As our main contribution, we present an integer programming formulation of the proposed model and an analysis of its integrality gap. Furthermore, we provide two approximation algorithms for the model, one of which is a bi-criteria approximation algorithm and the other of which is based on an LP-rounding technique.
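To make the objective concrete, here is a minimal sketch that evaluates a given clustering. It assumes the usual reading of min–max correlation clustering (every pair of vertices is labeled positive or negative, and the cost is the largest number of disagreeing pairs at any single vertex) and assumes that "capacitated" means an upper bound on cluster size. The abstract does not state either definition, and the function names and toy instance are hypothetical.

```python
from collections import Counter

def max_disagreements(n, positive_edges, labels):
    """Largest number of disagreeing pairs incident to any single vertex.

    Pairs listed in `positive_edges` are positive; all other pairs are negative.
    A pair disagrees if it is positive but split, or negative but joined.
    """
    positive = {frozenset(e) for e in positive_edges}
    bad = [0] * n
    for u in range(n):
        for v in range(u + 1, n):
            same_cluster = labels[u] == labels[v]
            is_positive = frozenset((u, v)) in positive
            if same_cluster != is_positive:
                bad[u] += 1
                bad[v] += 1
    return max(bad)

def respects_capacity(labels, capacity):
    """True if no cluster has more than `capacity` members (assumed constraint)."""
    return max(Counter(labels).values()) <= capacity

# Hypothetical instance: positive triangle {0, 1, 2} plus positive pair (2, 3).
positive_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
labels_a = [0, 0, 0, 1]   # keeps the triangle together
labels_b = [0, 0, 1, 1]   # two clusters of size 2

cost_a = max_disagreements(4, positive_edges, labels_a)
cost_b = max_disagreements(4, positive_edges, labels_b)
print(cost_a, respects_capacity(labels_a, 2))
print(cost_b, respects_capacity(labels_b, 2))
```

On this instance the clustering with the lower cost violates a capacity of 2, which is the kind of tension a bi-criteria algorithm trades off.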


Author(s):  
Awatif Karim ◽  
Chakir Loqman ◽  
Youssef Hami ◽  
Jaouad Boumhidi

In this paper, we propose a new approach to document clustering using the K-Means algorithm. K-Means is sensitive to the random selection of the k cluster centroids in the initialization phase. To evaluate the quality of K-Means clustering, we propose to model the text document clustering problem as the max stable set problem (MSSP) and to use a continuous Hopfield network to solve the MSSP and obtain the initial centroids. The idea is inspired by the fact that MSSP and clustering share the same principle: MSSP consists of finding the largest set of completely disconnected nodes in a graph, and in clustering, all objects are divided into disjoint clusters. Simulation results demonstrate that the proposed K-Means improved by MSSP (KM_MSSP) is efficient on large data sets, is well optimized in terms of running time, and provides better clustering quality than other methods.
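The sensitivity to initial centroids can be seen on a toy example. The sketch below is plain Lloyd-style K-Means on one-dimensional data, not the KM_MSSP method. The data, the two initializations, and the function name are assumptions chosen for illustration.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Lloyd's algorithm on 1-D data from the given initial centroids.

    Returns the final centroids and the sum of squared errors (SSE).
    """
    centroids = [float(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: (p - centroids[j]) ** 2)
            clusters[nearest].append(p)
        # Update step: an empty cluster keeps its previous centroid.
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    sse = sum(min((p - c) ** 2 for c in centroids) for p in points)
    return centroids, sse

# Three well-separated groups of two points each.
points = [0.0, 1.0, 10.0, 11.0, 20.0, 21.0]

good_centroids, good_sse = kmeans_1d(points, [0.0, 10.0, 20.0])
bad_centroids, bad_sse = kmeans_1d(points, [0.0, 1.0, 10.0])

print(good_centroids, good_sse)
print(bad_centroids, bad_sse)
```

With two of the three initial centroids placed inside the same group, the algorithm converges to a local optimum that merges the other two groups, and the SSE is far larger than with one centroid per group.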


Author(s):  
Н.Л. Резова ◽  
И.П. Рожнов ◽  
А.А. Истомина

The article discusses the application of the k-standards algorithm to the clustering problem, using production batches of electronic and radio components as an example. It draws a conclusion about the quality of the k-standards algorithm and the expediency of using it to solve automatic product grouping problems.


2021 ◽  
Vol 12 (5-2021) ◽  
pp. 75-90
Author(s):  
Alexander A. Zuenko ◽  
Olga V. Fridman ◽  
Olga N. Zuenko ◽  
...  

An approach to solving the constrained clustering problem has been developed. It is based on aggregating data obtained when several independent experts evaluate the characteristics of the clustered objects, and on analyzing alternative clusterings by constraint programming methods using original heuristics. The clustered objects are represented as multisets, which makes it possible to use appropriate methods for aggregating expert opinions. It is proposed to solve the constrained clustering problem as a constraint satisfaction problem. The main attention is paid to reducing the number of constraints and simplifying them at the stage where the constraint satisfaction problem is formalized. Within the framework of the approach, we have created: a) a method for estimating the optimal value of the objective function by hierarchical clustering of multisets, taking into account a priori constraints of the subject domain, and b) a method for generating additional constraints on the desired solution in the form of “smart tables”, based on the obtained estimate. The approach allows us to find the best partition in problems of the class under consideration, which are characterized by a high dimension.


2021 ◽  
pp. 1-14
Author(s):  
Feng Xue ◽  
Yongbo Liu ◽  
Xiaochen Ma ◽  
Bharat Pathak ◽  
Peng Liang

To solve the problem that the K-means algorithm is sensitive to the initial clustering centers and easily falls into local optima, we propose a new hybrid clustering algorithm called the IGWOKHM algorithm. In this paper, we first propose an improved strategy based on a nonlinear convergence factor, an inertial step size, and a dynamic weight to improve the search ability of the traditional grey wolf optimization (GWO) algorithm. Then, the improved GWO (IGWO) algorithm and the K-harmonic means (KHM) algorithm are fused to solve the clustering problem. This fusion clustering algorithm is called IGWOKHM, and it combines the global search ability of IGWO with the local fast optimization ability of KHM to both solve the problem of the K-means algorithm’s sensitivity to the initial clustering centers and address the shortcomings of KHM. The experimental results on 8 test functions and 4 University of California Irvine (UCI) datasets show that the IGWO algorithm greatly improves the efficiency of the model while ensuring the stability of the algorithm. The fusion clustering algorithm can effectively overcome the inadequacies of the K-means algorithm and has a good global optimization ability.
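For reference, the K-harmonic means objective is commonly written as the sum, over data points, of k divided by the sum of inverse p-th-power distances to the k centers. The sketch below evaluates that standard form on one-dimensional data. It is not the IGWOKHM algorithm, and the function name, the `tiny` guard against division by zero, and the toy data are assumptions.

```python
def khm_objective(points, centers, p=2.0, tiny=1e-12):
    """K-harmonic means objective on 1-D data.

    For each point, take the harmonic average of its p-th-power distances
    to all centers, then sum over points. `tiny` avoids division by zero
    when a point coincides with a center.
    """
    k = len(centers)
    total = 0.0
    for x in points:
        inverse_sum = sum(1.0 / max(abs(x - c) ** p, tiny) for c in centers)
        total += k / inverse_sum
    return total

# Two groups of two points each.
points = [0.0, 1.0, 10.0, 11.0]

score_spread = khm_objective(points, [0.5, 10.5])   # one center per group
score_crowded = khm_objective(points, [0.5, 1.5])   # both centers near one group

print(score_spread, score_crowded)
```

Because every center contributes to every point's term, a point far from all centers produces a large penalty, which is why the crowded placement scores much worse here.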


Author(s):  
Yudong Wang ◽  
Xiwei Bai ◽  
Chengbao Liu ◽  
Jie Tan

The consistency of lithium-ion power batteries significantly affects the life and safety of battery modules and packs. To improve consistency, battery grouping is employed: batteries with similar electrochemical characteristics are assembled into modules and packs. The grouping process therefore boils down to an unsupervised clustering problem. Currently used grouping approaches fall into two categories, those based on static characteristics and those based on dynamic characteristics. However, there are three problems. First, both categories underutilize multisource data. Second, grouping based on static characteristics fails over time. Third, grouping based on dynamic characteristics has high computational complexity. To solve these problems, we propose a distributed battery grouping approach based on multisource data fusion. The proposed approach designs an effective network structure for multisource data fusion and a self-supervised scheme for feature extraction from both static and dynamic multisource data. We apply our approach to real battery modules and test the state of health (SOH) after charging-discharging cycles. Experimental results indicate that the proposed scheme can increase the SOH of modules by 3.89% and reduce the inconsistency by 68.4%. Meanwhile, with distributed deployment, the time cost is reduced by 87.9% compared with the centralized scheme.


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Wei Zhao ◽  
Qinglan Li ◽  
Kuifeng Jin

Velocity dealiasing is an essential task for correcting the radial velocity data collected by Doppler radar. To improve the accuracy of velocity dealiasing, traditional dealiasing algorithms usually set a series of empirical thresholds, combine three- or four-dimensional data, or introduce other observation data as a reference. In this study, we transform the velocity dealiasing problem into a clustering problem and solve this problem using the density-based spatial clustering of applications with noise (DBSCAN) method. This algorithm is verified with a case study involving radar data on the tropical cyclone Mangkhut in 2018. The results show that the accuracy of the proposed algorithm is close to that of the four-dimensional dealiasing (4DD) method proposed by James and Houze; yet, it only requires two-dimensional velocity data and eliminates the need for other reference data. The results of the case study also show that the 4DD algorithm filters out many observation gates close to the missing data or radar center, whereas the proposed algorithm tends to retain and correct these gates.
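As a rough illustration of the clustering step only, the sketch below is a minimal generic DBSCAN in pure Python applied to toy (gate index, radial velocity) pairs. How the study builds its feature space and corrects the velocities after clustering is not described in the abstract, so the data, the `eps` and `min_pts` values, and the unscaled Euclidean distance are all assumptions.

```python
def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels

# Toy data: three gates near +21 m/s, three near -26 m/s, one isolated gate.
gates = [(0, 20.0), (1, 21.0), (2, 22.0),
         (3, -27.0), (4, -26.0), (5, -25.0),
         (10, 0.0)]

labels = dbscan(gates, eps=2.0, min_pts=2)
print(labels)
```

The two velocity groups come out as separate clusters and the isolated gate is marked as noise. A density-based method needs no preset number of clusters, which suits data where the number of folded regions is not known in advance.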


2021 ◽  
pp. 1-28
Author(s):  
Hector Menendez

Machine learning is changing the world and fuelling Industry 4.0. These statistical methods focus on identifying patterns in data to provide an intelligent response to specific requests. Although understanding data tends to require expert knowledge to supervise the decision-making process, some techniques need no supervision. These unsupervised techniques can work blindly, but they rely on data similarity. One of the most popular areas in this field is clustering. Clustering groups data so that the elements of each cluster are strongly similar while the clusters remain distinct from one another. This field started with the K-means algorithm, one of the most popular algorithms in machine learning, with extensive applications. Currently, there are multiple strategies for dealing with the clustering problem. This review introduces some of the classical algorithms, focusing significantly on algorithms based on evolutionary computation, and explains some current applications of clustering to large datasets.


2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Li Guo ◽  
Kunlin Zhu ◽  
Ruijun Duan

In order to explore the economic development trend in the postepidemic era, this paper improves the traditional clustering algorithm and constructs a postepidemic economic development trend analysis model based on intelligent algorithms. In order to solve the clustering problem of large-scale nonuniform density data sets, this paper proposes an adaptive nonuniform density clustering algorithm based on balanced iterative reduction and uses the algorithm to further cluster the compressed data sets. For large-scale data sets, the clustering results can accurately reflect the class characteristics of the data set as a whole. Moreover, the algorithm greatly improves the time efficiency of clustering. From the research results, we can see that the improved clustering algorithm has a certain effect on the analysis of economic development trends in the postepidemic era and can continue to play a role in subsequent economic analysis.

