Clustering Based on Eigenvectors of the Adjacency Matrix

2018 ◽  
Vol 28 (4) ◽  
pp. 771-786 ◽  
Author(s):  
Małgorzata Lucińska ◽  
Sławomir T. Wierzchoń

Abstract The paper presents a novel spectral algorithm, EVSA (eigenvector structure analysis), which uses eigenvalues and eigenvectors of the adjacency matrix in order to discover clusters. Based on matrix perturbation theory and properties of graph spectra, we show that the adjacency matrix can be more suitable for partitioning than Laplacian matrices. The main problem concerning the use of the adjacency matrix is the selection of the appropriate eigenvectors. We thus propose an approach based on analysis of the adjacency matrix spectrum and pairwise correlations between eigenvectors. The formulated rules and heuristics allow choosing the right eigenvectors to represent clusters, i.e., automatically establishing the number of groups. The algorithm requires only one parameter: the number of nearest neighbors. Unlike many other spectral methods, our solution does not need an additional clustering algorithm for the final partitioning. We evaluate the proposed approach on real-world datasets of different sizes. Its performance is competitive with both standard and newer solutions, which require the number of clusters as an input parameter.
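
As a rough illustration of adjacency-based spectral partitioning, the sketch below is a simplification and not the authors' EVSA: the eigenvector-selection rules and heuristics are replaced by taking the k dominant eigenvectors of a k-nearest-neighbor adjacency matrix and grouping them with k-means, so the number of clusters is assumed known here.

```python
# A minimal sketch (not the authors' EVSA): spectral partitioning on the
# adjacency matrix of a k-nearest-neighbour graph. The eigenvector-selection
# heuristic is replaced by taking the top-k eigenvectors and running k-means.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Symmetric adjacency matrix of the kNN graph (the number of nearest
# neighbours is the only graph parameter, as in the paper).
A = kneighbors_graph(X, n_neighbors=10, mode="connectivity").toarray()
A = np.maximum(A, A.T)

# Eigenvectors of the adjacency matrix itself (no Laplacian normalisation).
eigvals, eigvecs = np.linalg.eigh(A)
top = eigvecs[:, np.argsort(eigvals)[-3:]]     # dominant eigenvectors

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(top)
print(np.bincount(labels))
```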

2017 ◽  
Vol 2017 ◽  
pp. 1-4
Author(s):  
Seyed Morteza Mirafzal ◽  
Ali Zafari

Suppose that Π = Cay(Z_n, Ω) and Λ = Cay(Z_n, Ψ_m) are two Cayley graphs on the cyclic additive group Z_n, where n is an even integer, m = n/2 + 1, Ω = {t ∈ Z_n | t is odd}, and Ψ_m = Ω ∪ {n/2} are inverse-closed subsets of Z_n \ {0}. In this paper, it is shown that Π is a distance-transitive graph, and, by this fact, we determine the adjacency matrix spectrum of Π. Finally, we show that if n ≥ 8 and n/2 is an even integer, then the adjacency matrix spectrum of Λ is (n/2 + 1)^1, (1 − n/2)^1, 1^((n−4)/2), (−1)^(n/2) (we write multiplicities as exponents).
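
The claimed spectrum can be checked numerically; the following sketch (my own, not part of the paper) builds the circulant adjacency matrix of Λ = Cay(Z_n, Ψ_m) for n = 12 and tallies its eigenvalues with multiplicities.

```python
# Numerical check of the claimed adjacency spectrum of Λ = Cay(Z_n, Ψ_m)
# for an n with n >= 8 and n/2 even.
import numpy as np

n = 12                                                  # n even, n/2 = 6 even
psi = {t for t in range(n) if t % 2 == 1} | {n // 2}    # Ψ_m = Ω ∪ {n/2}

# Adjacency matrix of the Cayley graph: i ~ j iff (i - j) mod n ∈ Ψ_m.
A = np.array([[1 if (i - j) % n in psi else 0 for j in range(n)]
              for i in range(n)])

eigvals = np.round(np.linalg.eigvalsh(A), 6)
values, counts = np.unique(eigvals, return_counts=True)
print(dict(zip(values, counts)))
# Expected multiplicities: n/2+1 once, 1-n/2 once, 1 with (n-4)/2, -1 with n/2,
# i.e. {7.0: 1, -5.0: 1, 1.0: 4, -1.0: 6} for n = 12.
```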


Author(s):  
Mohana Priya K ◽  
Pooja Ragavi S ◽  
Krishna Priya G

Clustering is the process of grouping objects into subsets that have meaning in the context of a particular problem. It does not rely on predefined classes and is referred to as an unsupervised learning method because no information is provided about the "right answer" for any of the objects. Many clustering algorithms have been proposed for different applications. Sentence clustering is one of the most useful clustering techniques. A hierarchical clustering algorithm is applied at multiple levels for accuracy. For tagging, a POS tagger and the Porter stemmer are used. The WordNet dictionary is utilized to determine similarity by invoking the Jiang-Conrath and cosine similarity measures. Grouping is performed with respect to the highest similarity value against a mean threshold. This paper incorporates many parameters for finding similarity between words. In order to identify disambiguated words, sense identification is performed for the adjectives and a comparison is made. The SemCor and machine learning datasets are employed. Compared with previous results for WSD, our work shows a considerable improvement, achieving 91.2%.
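
For concreteness, a hedged sketch of the two similarity signals mentioned above, using NLTK's WordNet interface; Jiang-Conrath similarity is restricted to noun senses here (it requires an information-content corpus), and the combination rule and mean threshold used for grouping are not reproduced.

```python
# Sketch of the two word-similarity signals: WordNet Jiang-Conrath similarity
# (nouns only, using the Brown information-content file) and a simple cosine
# similarity over the synset definitions. Not the paper's full pipeline.
import nltk
from nltk.corpus import wordnet as wn, wordnet_ic
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

nltk.download("wordnet", quiet=True)
nltk.download("wordnet_ic", quiet=True)

brown_ic = wordnet_ic.ic("ic-brown.dat")

def similarity(word1, word2):
    s1 = wn.synsets(word1, pos=wn.NOUN)[0]
    s2 = wn.synsets(word2, pos=wn.NOUN)[0]
    jcn = s1.jcn_similarity(s2, brown_ic)
    vecs = CountVectorizer().fit_transform([s1.definition(), s2.definition()])
    cos = cosine_similarity(vecs[0], vecs[1])[0, 0]
    return jcn, cos

print(similarity("car", "truck"))
print(similarity("car", "banana"))
```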


2021 ◽  
Vol 15 (6) ◽  
pp. 1-18
Author(s):  
Kai Liu ◽  
Xiangyu Li ◽  
Zhihui Zhu ◽  
Lodewijk Brand ◽  
Hua Wang

Nonnegative Matrix Factorization (NMF) is broadly used to determine class membership in a variety of clustering applications. From movie recommendations and image clustering to visual feature extraction, NMF has applications in a large number of knowledge discovery and data mining problems. Traditional optimization methods, such as the Multiplicative Updating Algorithm (MUA), solve the NMF problem by utilizing an auxiliary function to ensure that the objective monotonically decreases. Although the objective in MUA converges, there exists no proof that the learned matrix factors converge as well. Without this rigorous analysis, the clustering performance and stability of NMF algorithms cannot be guaranteed. To address this knowledge gap, in this article we study the factor-bounded NMF problem and provide a solution algorithm whose convergence is proven by rigorous mathematical analysis, ensuring that both the objective and the matrix factors converge. In addition, we show the relationship between MUA and our solution, followed by an analysis of the convergence of MUA. Experiments on both toy data and real-world datasets validate the correctness of our proposed method and its utility as an effective clustering algorithm.
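
For reference, a minimal sketch of the classical multiplicative updates (the MUA baseline discussed above); the authors' factor-bounded algorithm with proven factor convergence is not reproduced here.

```python
# Classical multiplicative updates (Lee & Seung) for Frobenius-norm NMF:
# the objective ||X - WH||_F decreases monotonically, which is the property
# the abstract contrasts with convergence of the factors themselves.
import numpy as np

rng = np.random.default_rng(0)
X = np.abs(rng.standard_normal((100, 40)))       # nonnegative data matrix
k = 5
W = np.abs(rng.standard_normal((100, k)))
H = np.abs(rng.standard_normal((k, 40)))

eps = 1e-10                                      # avoids division by zero
for it in range(200):
    H *= (W.T @ X) / (W.T @ W @ H + eps)
    W *= (X @ H.T) / (W @ H @ H.T + eps)
    if it % 50 == 0:
        print(it, np.linalg.norm(X - W @ H))     # objective keeps decreasing
```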


2018 ◽  
Vol 12 (2) ◽  
pp. 116 ◽  
Author(s):  
Amjad Hudaib ◽  
Mohammad Khanafseh ◽  
Ola Surakhi

Clustering is the process of grouping a set of patterns into disjoint clusters, where each cluster contains similar patterns. Many algorithms have been proposed for clustering. K-medoids is a variant of k-means that uses an actual point in the cluster as its representative, instead of the mean used in the k-means algorithm, which makes it more robust to outliers and noise in the cluster. In order to enhance the performance of the k-medoids algorithm and obtain more accurate clusters, a hybrid algorithm is proposed which uses the CRO algorithm along with k-medoids. In this method, CRO is used to widen the search for the optimal medoids and to enhance clustering by producing more precise results. The performance of the new algorithm is evaluated by comparing its results with five clustering algorithms, k-means, k-medoids, DB/rand/1/bin, a CRO-based clustering algorithm, and hybrid CRO-k-means, using four real-world datasets, Lung Cancer, Iris, Breast Cancer Wisconsin, and Haberman's Survival, from the UCI machine learning repository. The results were compared based on different metrics and show that the proposed algorithm improves the clustering by giving more accurate results.
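
A plain k-medoids sketch is shown below for orientation; the CRO search over candidate medoids is not reproduced, and the greedy medoid-swap step stands in for it only illustratively.

```python
# Plain k-medoids on the Iris dataset: assign points to the nearest medoid,
# then move each medoid to the cluster member with the smallest total
# distance to the rest of its cluster, until the medoids stop changing.
import numpy as np
from sklearn.datasets import load_iris
from scipy.spatial.distance import cdist

X = load_iris().data
k = 3
D = cdist(X, X)                           # pairwise distance matrix

rng = np.random.default_rng(0)
medoids = rng.choice(len(X), size=k, replace=False)

for _ in range(20):
    labels = np.argmin(D[:, medoids], axis=1)
    new_medoids = medoids.copy()
    for c in range(k):
        members = np.where(labels == c)[0]
        # the point minimising total distance to its cluster becomes the medoid
        new_medoids[c] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
    if np.array_equal(new_medoids, medoids):
        break
    medoids = new_medoids

print("medoid indices:", medoids, "cluster sizes:", np.bincount(labels))
```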


2020 ◽  
pp. 1-31
Author(s):  
Abdul Rafae Khan ◽  
Asim Karim ◽  
Hassan Sajjad ◽  
Faisal Kamiran ◽  
Jia Xu

Abstract Roman Urdu is an informal form of the Urdu language written in Roman script, which is widely used in South Asia for online textual content. It lacks standard spelling and hence poses several normalization challenges during automatic language processing. In this article, we present a feature-based clustering framework for the lexical normalization of Roman Urdu corpora, which includes a phonetic algorithm UrduPhone, a string matching component, a feature-based similarity function, and a clustering algorithm Lex-Var. UrduPhone encodes Roman Urdu strings to their pronunciation-based representations. The string matching component handles character-level variations that occur when writing Urdu using Roman script. The similarity function incorporates various phonetic-based, string-based, and contextual features of words. The Lex-Var algorithm is a variant of the k-medoids clustering algorithm that groups lexical variations of words. It uses a similarity threshold to balance the number of clusters and their maximum similarity. The framework allows feature learning and optimization in addition to the use of predefined features and weights. We evaluate our framework extensively on four real-world datasets and show an F-measure gain of up to 15% over baseline methods. We also demonstrate the superiority of UrduPhone and Lex-Var in comparison to respective alternate algorithms in our clustering framework for the lexical normalization of Roman Urdu.
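
As a toy illustration of the pipeline's idea (pronunciation-style keys plus a similarity-thresholded merge): the key function below is a crude stand-in, not UrduPhone, the greedy merge is not Lex-Var, and the 0.8 threshold is an arbitrary choice.

```python
# Toy lexical-normalization sketch: map spelling variants to a rough
# phonetic-style key, then merge key groups whose representatives are
# string-similar enough. Both steps are simplifications for illustration.
from difflib import SequenceMatcher
from collections import defaultdict

def rough_key(word):
    """Crude key: keep the first letter, drop later vowels, collapse repeats."""
    w = word.lower()
    rest = "".join(c for c in w[1:] if c not in "aeiou")
    key, prev = w[0], w[0]
    for c in rest:
        if c != prev:
            key += c
        prev = c
    return key

words = ["zindagi", "zindagee", "zndgi", "jindagi",
         "mohabbat", "muhabat", "mohabat"]

clusters = defaultdict(list)
for w in words:
    clusters[rough_key(w)].append(w)

# Merge clusters whose representative strings are similar enough
# (the 0.8 threshold plays the role of Lex-Var's similarity threshold).
keys = list(clusters)
for i, a in enumerate(keys):
    for b in keys[i + 1:]:
        if a in clusters and b in clusters and \
           SequenceMatcher(None, clusters[a][0], clusters[b][0]).ratio() >= 0.8:
            clusters[a].extend(clusters.pop(b))

print(dict(clusters))
```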


2017 ◽  
Vol 5 (1) ◽  
pp. 139-157 ◽  
Author(s):  
Sam Cole ◽  
Shmuel Friedland ◽  
Lev Reyzin

Abstract In this paper, we consider the planted partition model, in which n = ks vertices of a random graph are partitioned into k "clusters," each of size s. Edges between vertices in the same cluster and in different clusters are included with constant probabilities p and q, respectively (where 0 ≤ q < p ≤ 1). We give an efficient algorithm that, with high probability, recovers the clusters as long as the cluster sizes are at least Ω(√n). Informally, our algorithm constructs the projection operator onto the dominant k-dimensional eigenspace of the graph's adjacency matrix and uses it to recover one cluster at a time. To our knowledge, our algorithm is the first purely spectral algorithm which runs in polynomial time and works even when s = Θ(√n), though there have been several non-spectral algorithms which accomplish this. Our algorithm is also among the simplest of these spectral algorithms, and its proof of correctness illustrates the usefulness of the Cauchy integral formula in this domain.
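
A hedged sketch of the spectral idea: sample a planted-partition graph, project onto the dominant k-dimensional eigenspace of its adjacency matrix, and read off the clusters. The paper's one-cluster-at-a-time recovery is replaced here by k-means on the rows of the projector, so this is an illustration of the ingredients, not the authors' algorithm.

```python
# Planted partition: k clusters of size s, within-cluster edge probability p,
# between-cluster probability q. Rows of the projector onto the top-k
# eigenspace of the adjacency matrix are grouped with k-means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
k, s = 4, 50                        # k clusters of size s, n = ks
p, q = 0.7, 0.3
n = k * s
truth = np.repeat(np.arange(k), s)

prob = np.where(truth[:, None] == truth[None, :], p, q)
A = (rng.random((n, n)) < prob).astype(float)
A = np.triu(A, 1)
A = A + A.T                         # symmetric adjacency, no self-loops

eigvals, eigvecs = np.linalg.eigh(A)
V = eigvecs[:, np.argsort(eigvals)[-k:]]    # dominant k-dimensional eigenspace
P = V @ V.T                                 # projection operator

labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(P)
# Sanity check: each recovered cluster should be (nearly) pure.
for c in range(k):
    print(c, np.bincount(truth[labels == c], minlength=k))
```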

