An Efficient Graph-Based Flickr Photo Clustering Algorithm

2010 ◽  
Vol 29-32 ◽  
pp. 2649-2655
Author(s):  
Zheng Liu ◽  
Hua Yan ◽  
Zhen Li

Traditional image clustering methods depend mainly on visual features. Due to the well-known “semantic gap”, visual features alone can hardly describe the semantics of images. Web images, however, come with rich metadata that could enhance the performance of image clustering, such as time information, GPS coordinates and initial annotations. This paper proposes an efficient Flickr photo clustering algorithm that simultaneously integrates multiple types of information related to Flickr photos using k-partite graph partitioning. For a personal Flickr collection, we first determine the value of k, which is the number of data types used. Second, the heterogeneous metadata are mapped to vertices of a k-partite graph, and the relationships between the heterogeneous metadata are represented as edge weights. Finally, the Flickr photos are clustered by partitioning the k-partite graph. Experiments conducted on Flickr photos demonstrate the effectiveness of the proposed algorithm.
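The graph construction step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how edge weights are computed, so the sketch uses simple co-occurrence counts, and the function name `build_k_partite_graph` and the metadata schema are hypothetical. The partitioning step itself is not shown.

```python
from collections import defaultdict


def build_k_partite_graph(photos, metadata_types):
    """Build a weighted k-partite graph from photo metadata (illustrative).

    photos: dict mapping photo id -> dict of metadata (hypothetical schema).
    metadata_types: the k metadata fields to use; each forms one vertex part.
    Vertices are (type, value) pairs. An edge links two values of different
    types and is weighted by the number of photos in which they co-occur
    (an assumed weighting, not necessarily the one used in the paper).
    """
    weights = defaultdict(int)
    for meta in photos.values():
        vertices = []
        for kind in metadata_types:
            value = meta.get(kind)
            values = value if isinstance(value, (list, tuple, set)) else [value]
            vertices.extend((kind, v) for v in values if v is not None)
        for i, u in enumerate(vertices):
            for v in vertices[i + 1:]:
                if u[0] != v[0]:  # k-partite: no edges inside one part
                    weights[tuple(sorted((u, v)))] += 1
    return dict(weights)
```

With two metadata types (k = 2) the result is an ordinary bipartite graph; adding a third field such as a GPS cell would make it tripartite.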

2019 ◽  
Vol 9 (2) ◽  
pp. 3892-3895
Author(s):  
B. K. Alsaidi ◽  
B. J. Al-Khafaji ◽  
S. A. A. Wahab

Text-based image clustering (TBIC) is an insufficient approach for clustering related web images. It is a challenging task to abstract the visual features of images with the support of textual information in a database. In content-based image clustering (CBIC), image data are clustered on the basis of specific features such as texture, colors, boundaries, and shapes. In this paper, an effective CBIC technique is presented, which uses texture and statistical features of the images. The statistical features, or color moments (mean, skewness, standard deviation, kurtosis, and variance), are extracted from the images. These features are collected in a one-dimensional array, and then a genetic algorithm (GA) is applied for image clustering. The extracted features provided high distinguishability and helped the GA reach the solution more accurately and faster.
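The color-moment features named in this abstract can be sketched as below. This is a minimal illustration, assuming each color channel is available as a flat list of pixel intensities and that population (not sample) moments are used; the function names are hypothetical and the paper's exact definitions may differ.

```python
import math


def color_moments(values):
    """Return (mean, variance, std, skewness, kurtosis) for one channel."""
    n = len(values)
    if n == 0:
        raise ValueError("channel must contain at least one pixel")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n  # population variance
    std = math.sqrt(variance)
    if std == 0:
        # A constant channel has no spread; higher moments are set to 0.
        return (mean, 0.0, 0.0, 0.0, 0.0)
    skewness = sum((v - mean) ** 3 for v in values) / (n * std ** 3)
    kurtosis = sum((v - mean) ** 4 for v in values) / (n * std ** 4)
    return (mean, variance, std, skewness, kurtosis)


def feature_vector(channels):
    """Concatenate the moments of every channel into one flat list."""
    features = []
    for channel in channels:
        features.extend(color_moments(channel))
    return features
```

For an RGB image this gives a one-dimensional array of 15 values, which could then serve as the input representation for a clustering method such as the genetic algorithm the abstract mentions.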


Author(s):  
Chengcui Zhang ◽  
Xin Chen ◽  
Wei-Bang Chen ◽  
Lin Yang ◽  
Gary Warner

In this article, we propose a spam image clustering approach that uses data mining techniques to study the image attachments of spam emails, with the goal of helping the investigation of spam clusters or phishing groups. Spam images are first modeled based on their visual features. In particular, the foreground text layout, foreground picture illustrations and background textures are analyzed. After the visual features are extracted from spam images, we use an unsupervised clustering algorithm to group visually similar spam images into clusters. The clustering results are evaluated by visual validation, since there is no prior knowledge of the actual sources of the spam images. Our initial results show that the proposed approach is effective in identifying the visual similarity between spam images and thus can provide important indications of a common source of spam images.


2021 ◽  
Vol 15 (6) ◽  
pp. 1-18
Author(s):  
Kai Liu ◽  
Xiangyu Li ◽  
Zhihui Zhu ◽  
Lodewijk Brand ◽  
Hua Wang

Nonnegative Matrix Factorization (NMF) is broadly used to determine class membership in a variety of clustering applications. From movie recommendations and image clustering to visual feature extraction, NMF has been applied to a large number of knowledge discovery and data mining problems. Traditional optimization methods, such as the Multiplicative Updating Algorithm (MUA), solve the NMF problem by utilizing an auxiliary function to ensure that the objective monotonically decreases. Although the objective in MUA converges, there exists no proof that the learned matrix factors converge as well. Without this rigorous analysis, the clustering performance and stability of NMF algorithms cannot be guaranteed. To address this knowledge gap, in this article, we study the factor-bounded NMF problem and provide a solution algorithm with proven convergence by rigorous mathematical analysis, which ensures that both the objective and the matrix factors converge. In addition, we show the relationship between MUA and our solution, followed by an analysis of the convergence of MUA. Experiments on both toy data and real-world datasets validate the correctness of our proposed method and its utility as an effective clustering algorithm.
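For context, the multiplicative updating scheme that the abstract contrasts with can be sketched as below. This assumes the standard Lee and Seung updates for the squared Frobenius objective; it is not the factor-bounded algorithm proposed in the article, and the function names, the small `eps` guard, and the random initialization are illustrative choices.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf_multiplicative(v, rank, iterations=50, seed=0, eps=1e-12):
    """Factor a nonnegative matrix V as WH with multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [frobenius_error(v, w, h)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)]
             for i in range(n)]
        errors.append(frobenius_error(v, w, h))
    return w, h, errors
```

The returned `errors` list makes the point of the abstract concrete: the objective values form a non-increasing sequence, but nothing in this loop by itself shows that `w` and `h` settle to fixed matrices.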


2007 ◽  
Vol 16 (06) ◽  
pp. 919-934
Author(s):  
YONGGUO LIU ◽  
XIAORONG PU ◽  
YIDONG SHEN ◽  
ZHANG YI ◽  
XIAOFENG LIAO

In this article, a new genetic clustering algorithm called the Improved Hybrid Genetic Clustering Algorithm (IHGCA) is proposed to deal with the clustering problem under the criterion of minimum sum of squares clustering. In IHGCA, the improvement operation including five local iteration methods is developed to tune the individual and accelerate the convergence speed of the clustering algorithm, and the partition-absorption mutation operation is designed to reassign objects among different clusters. By experimental simulations, its superiority over some known genetic clustering methods is demonstrated.
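The minimum sum of squares criterion mentioned in this abstract can be written as a small fitness function. This is a sketch assuming points are numeric tuples and an individual is encoded as one cluster label per object; the name `sum_of_squares` is hypothetical, and the IHGCA operators themselves are not reproduced here.

```python
def sum_of_squares(points, labels):
    """Within-cluster sum of squared distances to each cluster centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # The centroid is the coordinate-wise mean of the cluster members.
        centroid = [sum(p[d] for p in members) / len(members)
                    for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2
                     for p in members for d in range(dim))
    return total
```

A genetic clustering method would treat lower values as fitter: for the points (0, 0), (2, 0), (10, 0), (12, 0), the labeling [0, 0, 1, 1] scores 4.0 while [0, 1, 0, 1] scores 100.0.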


Genetics ◽  
2001 ◽  
Vol 159 (2) ◽  
pp. 699-713
Author(s):  
Noah A Rosenberg ◽  
Terry Burke ◽  
Kari Elo ◽  
Marcus W Feldman ◽  
Paul J Freidlin ◽  
...  

We tested the utility of genetic cluster analysis in ascertaining population structure of a large data set for which population structure was previously known. Each of 600 individuals representing 20 distinct chicken breeds was genotyped for 27 microsatellite loci, and individual multilocus genotypes were used to infer genetic clusters. Individuals from each breed were inferred to belong mostly to the same cluster. The clustering success rate, measuring the fraction of individuals that were properly inferred to belong to their correct breeds, was consistently ~98%. When markers of highest expected heterozygosity were used, genotypes that included at least 8–10 highly variable markers from among the 27 markers genotyped also achieved >95% clustering success. When 12–15 highly variable markers and only 15–20 of the 30 individuals per breed were used, clustering success was at least 90%. We suggest that in species for which population structure is of interest, databases of multilocus genotypes at highly variable markers should be compiled. These genotypes could then be used as training samples for genetic cluster analysis and to facilitate assignments of individuals of unknown origin to populations. The clustering algorithm has potential applications in defining the within-species genetic units that are useful in problems of conservation.


2014 ◽  
Vol 989-994 ◽  
pp. 3851-3855
Author(s):  
Guang Jin Lai

Digital X-ray photography, under computer control, uses a one-dimensional or 2D X-ray detector to convert the captured image directly into digital signals, to which image processing technology can then be applied to realize image analysis. We introduce X-ray photography technology into terminal identification in track and field, and use a clustering algorithm to improve computer image clustering. By capturing the digital signals of the human head, arms and legs, the approach enhances the terminal recognition method in track and field. Finally, we use MATLAB to calculate the captured image values of the X-ray photography. The calculation shows that motion capture and recognition of X-ray images are clearly enhanced. This provides a theoretical basis for research on motion capture technology in track and field.


2021 ◽  
Author(s):  
Feiyang Ren ◽  
Yi Han ◽  
Shaohan Wang ◽  
He Jiang

A novel marine transportation network based on high-dimensional AIS data with a multi-level clustering algorithm is proposed to discover important waypoints in trajectories based on selected navigation features. This network contains two parts: the calculation of major nodes with the CLIQUE and BIRCH clustering methods, and navigation network construction with edge construction theory. Unlike state-of-the-art work on navigation clustering that uses only ship coordinates, the proposed method incorporates more high-dimensional features such as drafting, weather, and fuel consumption. In experiments on historical AIS data, more than 220,133 lines of data covering 30 days were used to extract 440 major nodal points in less than 4 minutes on an ordinary PC (i5 processor). The proposed method can be applied to higher-dimensional data for better ship path planning or even national economic analysis. The current work has shown good performance in distinguishing complex ship trajectories and great potential for future analytical predictions of the shipping transportation market.


2019 ◽  
Author(s):  
Suhas Srinivasan ◽  
Nathan T. Johnson ◽  
Dmitry Korkin

Single-cell RNA sequencing (scRNA-seq) is a recent technology that enables fine-grained discovery of cellular subtypes and specific cell states. It routinely uses machine learning methods, such as feature learning, clustering, and classification, to assist in uncovering novel information from scRNA-seq data. However, current methods are not well suited to deal with the substantial amount of noise that is created by the experiments or the variation that occurs due to differences in cells of the same type. Here, we develop a new hybrid approach, Deep Unsupervised Single-cell Clustering (DUSC), that integrates feature generation based on a deep learning architecture with a model-based clustering algorithm, to find a compact and informative representation of the single-cell transcriptomic data that generates robust clusters. We also include a technique to estimate an efficient number of latent features in the deep learning model. Our method outperforms both classical and state-of-the-art feature learning and clustering methods, approaching the accuracy of supervised learning. The method is freely available to the community and will hopefully facilitate our understanding of the cellular atlas of living organisms as well as provide the means to improve patient diagnostics and treatment.


2021 ◽  
Vol 10 (4) ◽  
pp. 2170-2180
Author(s):  
Untari N. Wisesty ◽  
Tati Rajab Mengko

This paper analyzes SARS-CoV-2 genome variation by comparing the results of genome clustering using several clustering algorithms and the distribution of sequences in each cluster. The clustering algorithms used are K-means, Gaussian mixture models, agglomerative hierarchical clustering, mean-shift clustering, and DBSCAN. However, clustering algorithms have a weakness in grouping very high-dimensional data such as genome data, so a dimensionality reduction process is needed. In this research, dimensionality reduction was carried out using principal component analysis (PCA) and an autoencoder method, with three models that produce 2, 10, and 50 features. The main contributions are the dimensionality reduction and clustering scheme for SARS-CoV-2 sequence data, and the performance analysis of each experiment for each scheme and the hyperparameters of each method. Based on the experimental results, the combination of PCA and DBSCAN achieves the highest silhouette score of 0.8770, with three clusters, when using two features. However, dimensionality reduction using the autoencoder needs more iterations to converge. In the testing process with Indonesian sequence data, more than half of the sequences fall into one cluster and the rest are distributed across the other two clusters.
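The silhouette score used to compare the clustering schemes in this abstract can be sketched as below. This is a minimal illustration assuming Euclidean distance and the common convention that a point in a singleton cluster scores 0; the function name is hypothetical, and DBSCAN noise points (often labeled -1) are not treated specially here.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, point in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(point, points[j]) for j in own if j != i)
        a /= len(own) - 1
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(point, points[j]) for j in members) / len(members)
            for label, members in clusters.items() if label != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)
```

Scores close to 1 indicate compact, well-separated clusters, which is how a value such as the reported 0.8770 would be read; scores near 0 or below suggest overlapping or misassigned points.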


Author(s):  
Bilih Priyogi ◽  
Nungki Selviandro ◽  
Zainal A. Hasibuan ◽  
Mubarik Ahmad
