mixture model clustering
Recently Published Documents



2022, Vol 355, pp. 02024
Author(s): Haojing Wang, Yingjie Tian, An Li, Jihai Wu, Gaiping Sun

Traditional clustering methods are limited by the “hard assignment” of samples to clusters, and on massive data sets they struggle to meet efficiency and accuracy requirements at the same time. To address this, a load classification method is proposed that combines principal component analysis with Gaussian mixture model clustering. The load data are first reduced in dimensionality by principal component analysis and then fed into a Gaussian mixture model clustering algorithm to classify large-scale load datasets. The method is applied to loads in the Canadian AMPds2 public dataset and compared with K-Means, Gaussian mixture model clustering, and other methods. The results show that the proposed method classifies loads more effectively and at a finer granularity while also reducing computational cost and improving computational efficiency.
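The pipeline described in this abstract (dimensionality reduction, then soft clustering) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes synthetic three-dimensional data instead of AMPds2, keeps only the first principal component (found by power iteration), and fits a one-dimensional two-component Gaussian mixture with plain EM. All function names here (`first_component`, `project`, `fit_gmm_1d`, `make_demo_data`) are hypothetical.

```python
import math
import random

def first_component(rows, iters=100):
    """Return (mean, unit direction) of the first principal component via power iteration."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        # w = X^T (X v), which is proportional to the covariance matrix times v
        proj = [sum(c[j] * v[j] for j in range(d)) for c in centered]
        w = [sum(p * c[j] for p, c in zip(proj, centered)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def project(rows, mean, v):
    """Project each row onto the principal direction (1-D scores)."""
    return [sum((r[j] - mean[j]) * v[j] for j in range(len(v))) for r in rows]

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(xs, k=2, iters=50):
    """Fit a 1-D Gaussian mixture with EM; returns weights, means, variances, responsibilities."""
    ordered = sorted(xs)
    # Assumed initialisation: means at evenly spaced quantiles, shared overall variance.
    mus = [ordered[int((i + 0.5) * len(xs) / k)] for i in range(k)]
    centre = sum(xs) / len(xs)
    variances = [sum((x - centre) ** 2 for x in xs) / len(xs)] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: soft assignment of every point to every component
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, variances)]
            total = sum(p)
            resp.append([pi / total for pi in p])
        # M-step: re-estimate parameters from the soft assignments
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            spread = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(spread, 1e-6)
    return weights, mus, variances, resp

def make_demo_data(seed=0, n_per_group=100):
    """Synthetic stand-in for load profiles: two noisy groups in three dimensions."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0, 0.0), (5.0, 5.0, 0.0)]
    return [[c + rng.gauss(0.0, 0.5) for c in center]
            for center in centers for _ in range(n_per_group)]

rows = make_demo_data()
mean, direction = first_component(rows)
scores = project(rows, mean, direction)
weights, mus, variances, resp = fit_gmm_1d(scores, k=2)
print("component means:", mus)
print("soft assignment of first point:", resp[0])
```

Each entry of `resp` is a probability vector over components, which is the “soft assignment” that the abstract contrasts with the hard labels of K-Means. Running EM on the reduced scores rather than on the raw features is where the computational saving would come from in this sketch.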


2021
Author(s): Kingshuk Mukherjee, Massimiliano Rossi, Daniel Dole-Muinos, Ayomide Ajayi, Mattia Prosperi, ...

Optical mapping is a method for creating high-resolution restriction maps of an entire genome. Optical mapping has been largely automated. It first produces single-molecule restriction maps, called Rmaps, which are then assembled to generate genome-wide optical maps. Since the location and orientation of each Rmap are unknown, the first problem in analyzing these data is finding related Rmaps, i.e., pairs of Rmaps that share the same orientation and overlap significantly in their genomic location. Although heuristics for identifying related Rmaps exist, they all require quantization of the data, which leads to a loss of precision. In this paper, we propose a method based on Gaussian mixture model clustering, which we refer to as OMclust, that finds overlapping Rmaps without quantization. Using both simulated and real datasets, we show that OMclust substantially improves precision (from 48.3% to 73.3%) over state-of-the-art methods while also reducing CPU time and memory consumption. Further, we integrated OMclust into the error correction methods Elmeri and cOMet to demonstrate the resulting increase in their performance. When OMclust was combined with cOMet to error correct Rmap data generated from human DNA, it was able to error correct close to 3x more Rmaps and reduced the CPU time by more than 35x. Our software is written in C++ and is publicly available under the GNU General Public License at https://github.com/kingufl/OMclust


Author(s): Yi Zhang, Miaomiao Li, Siwei Wang, Sisi Dai, Lei Luo, ...

Gaussian mixture model (GMM) clustering has been extensively studied due to its effectiveness and efficiency. Although it performs well in various applications, it cannot effectively handle data with absent features, which are common in practice. In this article, unlike existing approaches that first impute the absent features and then perform GMM clustering on the imputed data, we propose to integrate imputation and GMM clustering into a unified learning procedure. Specifically, the missing data are filled in using the result of GMM clustering, and the imputed data are then used for GMM clustering. These two steps alternate, each refining the other, until an optimum is reached. In this way, the imputed data can best serve GMM clustering. A two-step alternating algorithm with proven convergence is designed to solve the resulting optimization problem. Extensive experiments have been conducted on eight UCI benchmark datasets, and the results validate the effectiveness of the proposed algorithm.
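The alternation between imputation and clustering can be sketched roughly as follows. This is a simplified illustration, not the algorithm from the article: it assumes a diagonal-covariance mixture, computes responsibilities from the observed features only, and refills each gap with the responsibility-weighted component means before every M-step. The data are synthetic and the names `impute_and_cluster` and `make_incomplete_data` are hypothetical.

```python
import math
import random

def log_normal(x, mu, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def responsibilities(row, weights, mus, variances):
    """Posterior over components, using only the observed (non-None) features."""
    logs = []
    for w, mu, var in zip(weights, mus, variances):
        lp = math.log(max(w, 1e-12))
        for j, x in enumerate(row):
            if x is not None:
                lp += log_normal(x, mu[j], var[j])
        logs.append(lp)
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]

def impute_and_cluster(rows, k=2, rounds=30):
    n, d = len(rows), len(rows[0])
    # Start by filling each gap with the column mean of the observed values.
    col_mean = []
    for j in range(d):
        seen = [r[j] for r in rows if r[j] is not None]
        col_mean.append(sum(seen) / len(seen))
    filled = [[col_mean[j] if r[j] is None else r[j] for j in range(d)] for r in rows]
    # Assumed initialisation: component means taken from rows spread along feature 0.
    order = sorted(range(n), key=lambda i: filled[i][0])
    mus = [list(filled[order[int((c + 0.5) * n / k)]]) for c in range(k)]
    variances = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k
    resp = []
    for _ in range(rounds):
        resp = [responsibilities(r, weights, mus, variances) for r in rows]
        # Imputation step: refill gaps from the current clustering result.
        for i, r in enumerate(rows):
            for j in range(d):
                if r[j] is None:
                    filled[i][j] = sum(resp[i][c] * mus[c][j] for c in range(k))
        # Clustering step: one M-step on the imputed data.
        for c in range(k):
            nc = sum(rp[c] for rp in resp)
            weights[c] = nc / n
            for j in range(d):
                mus[c][j] = sum(rp[c] * f[j] for rp, f in zip(resp, filled)) / nc
                spread = sum(rp[c] * (f[j] - mus[c][j]) ** 2
                             for rp, f in zip(resp, filled)) / nc
                variances[c][j] = max(spread, 1e-3)
    return filled, resp, mus

def make_incomplete_data(seed=1, n_per_group=60, drop_prob=0.2):
    """Two synthetic 2-D groups; some rows lose one randomly chosen feature."""
    rng = random.Random(seed)
    rows, centers = [], []
    for center in (0.0, 6.0):
        for _ in range(n_per_group):
            row = [center + rng.gauss(0.0, 0.5), center + rng.gauss(0.0, 0.5)]
            if rng.random() < drop_prob:
                row[rng.randrange(2)] = None
            rows.append(row)
            centers.append(center)
    return rows, centers

rows, centers = make_incomplete_data()
filled, resp, mus = impute_and_cluster(rows)
print("estimated component means:", mus)
```

In this sketch a gap is filled with a value near the mean of the component the row most likely belongs to, so the imputation is shaped by the clustering rather than fixed in advance, which is the coupling the abstract describes.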


Author(s): Delshad Fakoor, Vafa Maihami, Reza Maihami

The shift toward online shopping has made it necessary to tailor offerings to customers’ needs and give them more selective options. Buyers research product features before deciding what to purchase. Recommender systems make this search easier by narrowing the search space to the products that align with the customer’s needs. Clustering, a typical machine learning approach, is applied in recommender systems. As an information filtering method, a recommender system clusters user data and calculates the similarity between members of a cluster to identify the factors needed for more accurate predictions. In this study, a new method for predicting scores in machine learning recommender systems is introduced that uses Gaussian mixture model clustering and takes both the distance between scores and the value of the scores into account in the Pearson correlation coefficient. To study the proposed method’s performance, it is evaluated on a MovieLens data set, and the results are compared with those of other recommender systems, including ones based on the Pearson correlation coefficient similarity criterion, K-means, and fuzzy C-means. The simulation results indicate that the proposed method achieves lower error than the others as the number of neighbors increases. The results also show that its accuracy increases as the number of users increases. The reason is that Gaussian mixture clustering groups similar users, and the method considers the distance between scores when choosing neighbors similar to the active user.
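As a rough illustration of neighbor-based score prediction within a cluster, the sketch below uses the standard Pearson correlation coefficient and a similarity-weighted average of mean-centered ratings. It does not reproduce the modified similarity proposed in the study. The cluster labels are assumed to come from a Gaussian mixture model fitted beforehand, and the ratings, user names, and the functions `pearson` and `predict` are made up for the example.

```python
import math

def pearson(a, b):
    """Pearson correlation over the items both users rated; 0.0 if undefined."""
    common = [item for item in a if item in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = math.sqrt(sum((a[i] - mean_a) ** 2 for i in common)
                    * sum((b[i] - mean_b) ** 2 for i in common))
    return num / den if den else 0.0

def predict(ratings, clusters, user, item):
    """Predict a score from positively correlated neighbors in the user's cluster."""
    mean_user = sum(ratings[user].values()) / len(ratings[user])
    num = den = 0.0
    for other, other_ratings in ratings.items():
        # Only users in the same cluster who rated the item can act as neighbors.
        if other == user or clusters[other] != clusters[user]:
            continue
        if item not in other_ratings:
            continue
        sim = pearson(ratings[user], other_ratings)
        if sim <= 0:
            continue
        mean_other = sum(other_ratings.values()) / len(other_ratings)
        num += sim * (other_ratings[item] - mean_other)
        den += sim
    # Fall back to the user's own mean when no usable neighbor exists.
    return mean_user + num / den if den else mean_user

# Made-up ratings on a 1-5 scale; cluster labels are assumed to be precomputed.
ratings = {
    "ann": {"m1": 5, "m2": 4, "m3": 1},
    "bob": {"m1": 4, "m2": 3, "m3": 1, "m4": 5},
    "cat": {"m1": 1, "m2": 2, "m3": 5, "m4": 1},
}
clusters = {"ann": 0, "bob": 0, "cat": 1}

print(predict(ratings, clusters, "ann", "m4"))
```

Restricting the neighbor search to one cluster keeps the candidate set small. A production system would also clip the prediction to the valid rating range, which this sketch omits.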

