Clustering Based on Eigenvectors of the Adjacency Matrix

2018 ◽  
Vol 28 (4) ◽  
pp. 771-786 ◽  
Author(s):  
Małgorzata Lucińska ◽  
Sławomir T. Wierzchoń

Abstract The paper presents a novel spectral algorithm, EVSA (eigenvector structure analysis), which uses eigenvalues and eigenvectors of the adjacency matrix in order to discover clusters. Based on matrix perturbation theory and properties of graph spectra, we show that the adjacency matrix can be more suitable for partitioning than Laplacian matrices. The main problem concerning the use of the adjacency matrix is the selection of the appropriate eigenvectors. We thus propose an approach based on analysis of the adjacency matrix spectrum and pairwise correlations between eigenvectors. The formulated rules and heuristics allow choosing the right eigenvectors to represent clusters, i.e., automatically establishing the number of groups. The algorithm requires only one parameter: the number of nearest neighbors. Unlike many other spectral methods, our solution does not need an additional clustering algorithm for the final partitioning. We evaluate the proposed approach on real-world datasets of different sizes. Its performance is competitive with both standard and newer solutions, which require the number of clusters as an input parameter.
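
As a rough illustration of adjacency-based spectral partitioning, the sketch below is a simplification and not the authors' EVSA: the eigenvector-selection rules and heuristics are replaced by taking the k dominant eigenvectors of a k-nearest-neighbor adjacency matrix and grouping them with k-means, so the number of clusters is assumed known here.

```python
# A minimal sketch (not the authors' EVSA): spectral partitioning on the
# adjacency matrix of a k-nearest-neighbour graph. The eigenvector-selection
# heuristic is replaced by taking the top-k eigenvectors and running k-means.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Symmetric adjacency matrix of the kNN graph (the number of nearest
# neighbours is the only graph parameter, as in the paper).
A = kneighbors_graph(X, n_neighbors=10, mode="connectivity").toarray()
A = np.maximum(A, A.T)

# Eigenvectors of the adjacency matrix itself (no Laplacian normalisation).
eigvals, eigvecs = np.linalg.eigh(A)
top = eigvecs[:, np.argsort(eigvals)[-3:]]     # dominant eigenvectors

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(top)
print(np.bincount(labels))
```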

2017 ◽  
Vol 2017 ◽  
pp. 1-4
Author(s):  
Seyed Morteza Mirafzal ◽  
Ali Zafari

Suppose that Π = Cay(Z_n, Ω) and Λ = Cay(Z_n, Ψ_m) are two Cayley graphs on the cyclic additive group Z_n, where n is an even integer, m = n/2 + 1, Ω = {t ∈ Z_n | t is odd}, and Ψ_m = Ω ∪ {n/2} are inverse-closed subsets of Z_n \ {0}. In this paper, it is shown that Π is a distance-transitive graph, and, by this fact, we determine the adjacency matrix spectrum of Π. Finally, we show that if n ≥ 8 and n/2 is an even integer, then the adjacency matrix spectrum of Λ is (n/2 + 1)^1, (1 − n/2)^1, 1^((n−4)/2), (−1)^(n/2) (we write multiplicities as exponents).
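
The claimed spectrum can be checked numerically; the following sketch (my own, not part of the paper) builds the circulant adjacency matrix of Λ = Cay(Z_n, Ψ_m) for n = 12 and tallies its eigenvalues with multiplicities.

```python
# Numerical check of the claimed adjacency spectrum of Λ = Cay(Z_n, Ψ_m)
# for an n with n >= 8 and n/2 even.
import numpy as np

n = 12                                                  # n even, n/2 = 6 even
psi = {t for t in range(n) if t % 2 == 1} | {n // 2}    # Ψ_m = Ω ∪ {n/2}

# Adjacency matrix of the Cayley graph: i ~ j iff (i - j) mod n ∈ Ψ_m.
A = np.array([[1 if (i - j) % n in psi else 0 for j in range(n)]
              for i in range(n)])

eigvals = np.round(np.linalg.eigvalsh(A), 6)
values, counts = np.unique(eigvals, return_counts=True)
print(dict(zip(values, counts)))
# Expected multiplicities: n/2+1 once, 1-n/2 once, 1 with (n-4)/2, -1 with n/2,
# i.e. {7.0: 1, -5.0: 1, 1.0: 4, -1.0: 6} for n = 12.
```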


Author(s):  
Mohana Priya K ◽  
Pooja Ragavi S ◽  
Krishna Priya G

Clustering is the process of grouping objects into subsets that have meaning in the context of a particular problem. It does not rely on predefined classes and is referred to as an unsupervised learning method because no information is provided about the "right answer" for any of the objects. Many clustering algorithms have been proposed for different applications. Sentence clustering is one of the most useful clustering techniques. A hierarchical clustering algorithm is applied at multiple levels for accuracy. For tagging, a POS tagger and the Porter stemmer are used. The WordNet dictionary is utilized to determine similarity by invoking the Jiang-Conrath and cosine similarity measures. Grouping is performed with respect to the highest similarity value against a mean threshold. This paper incorporates many parameters for finding similarity between words. In order to identify disambiguated words, sense identification is performed for the adjectives and a comparison is made. The SemCor and machine learning datasets are employed. Compared with previous results for WSD, our work shows a considerable improvement, achieving 91.2%.
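
For concreteness, a hedged sketch of the two similarity signals mentioned above, using NLTK's WordNet interface; Jiang-Conrath similarity is restricted to noun senses here (it requires an information-content corpus), and the combination rule and mean threshold used for grouping are not reproduced.

```python
# Sketch of the two word-similarity signals: WordNet Jiang-Conrath similarity
# (nouns only, using the Brown information-content file) and a simple cosine
# similarity over the synset definitions. Not the paper's full pipeline.
import nltk
from nltk.corpus import wordnet as wn, wordnet_ic
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

nltk.download("wordnet", quiet=True)
nltk.download("wordnet_ic", quiet=True)

brown_ic = wordnet_ic.ic("ic-brown.dat")

def similarity(word1, word2):
    s1 = wn.synsets(word1, pos=wn.NOUN)[0]
    s2 = wn.synsets(word2, pos=wn.NOUN)[0]
    jcn = s1.jcn_similarity(s2, brown_ic)
    vecs = CountVectorizer().fit_transform([s1.definition(), s2.definition()])
    cos = cosine_similarity(vecs[0], vecs[1])[0, 0]
    return jcn, cos

print(similarity("car", "truck"))
print(similarity("car", "banana"))
```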


2021 ◽  
Vol 15 (6) ◽  
pp. 1-18
Author(s):  
Kai Liu ◽  
Xiangyu Li ◽  
Zhihui Zhu ◽  
Lodewijk Brand ◽  
Hua Wang

Nonnegative Matrix Factorization (NMF) is broadly used to determine class membership in a variety of clustering applications. From movie recommendations and image clustering to visual feature extraction, NMF has applications in a large number of knowledge discovery and data mining problems. Traditional optimization methods, such as the Multiplicative Updating Algorithm (MUA), solve the NMF problem by utilizing an auxiliary function to ensure that the objective monotonically decreases. Although the objective in MUA converges, there exists no proof that the learned matrix factors converge as well. Without this rigorous analysis, the clustering performance and stability of NMF algorithms cannot be guaranteed. To address this knowledge gap, in this article we study the factor-bounded NMF problem and provide a solution algorithm whose convergence is proven by rigorous mathematical analysis, ensuring that both the objective and the matrix factors converge. In addition, we show the relationship between MUA and our solution, followed by an analysis of the convergence of MUA. Experiments on both toy data and real-world datasets validate the correctness of our proposed method and its utility as an effective clustering algorithm.
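
For reference, a minimal sketch of the classical multiplicative updates (the MUA baseline discussed above); the authors' factor-bounded algorithm with proven factor convergence is not reproduced here.

```python
# Classical multiplicative updates (Lee & Seung) for Frobenius-norm NMF:
# the objective ||X - WH||_F decreases monotonically, which is the property
# the abstract contrasts with convergence of the factors themselves.
import numpy as np

rng = np.random.default_rng(0)
X = np.abs(rng.standard_normal((100, 40)))       # nonnegative data matrix
k = 5
W = np.abs(rng.standard_normal((100, k)))
H = np.abs(rng.standard_normal((k, 40)))

eps = 1e-10                                      # avoids division by zero
for it in range(200):
    H *= (W.T @ X) / (W.T @ W @ H + eps)
    W *= (X @ H.T) / (W @ H @ H.T + eps)
    if it % 50 == 0:
        print(it, np.linalg.norm(X - W @ H))     # objective keeps decreasing
```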


2018 ◽  
Vol 12 (2) ◽  
pp. 116 ◽  
Author(s):  
Amjad Hudaib ◽  
Mohammad Khanafseh ◽  
Ola Surakhi

Clustering is the process of grouping a set of patterns into disjoint clusters, where each cluster contains similar patterns. Many algorithms have been proposed for clustering. K-medoids is a variant of k-means that uses an actual point in the cluster as its representative, instead of the mean used in the k-means algorithm, which makes it more robust to outliers and noise in the cluster. In order to enhance the performance of the k-medoids algorithm and obtain more accurate clusters, a hybrid algorithm is proposed which uses the CRO algorithm along with k-medoids. In this method, CRO is used to widen the search for the optimal medoids and to enhance clustering by producing more precise results. The performance of the new algorithm is evaluated by comparing its results with five clustering algorithms, k-means, k-medoids, DB/rand/1/bin, a CRO-based clustering algorithm, and hybrid CRO-k-means, using four real-world datasets, Lung Cancer, Iris, Breast Cancer Wisconsin, and Haberman's Survival, from the UCI machine learning repository. The results were compared based on different metrics and show that the proposed algorithm improves the clustering by giving more accurate results.
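
A plain k-medoids sketch is shown below for orientation; the CRO search over candidate medoids is not reproduced, and the greedy medoid-swap step stands in for it only illustratively.

```python
# Plain k-medoids on the Iris dataset: assign points to the nearest medoid,
# then move each medoid to the cluster member with the smallest total
# distance to the rest of its cluster, until the medoids stop changing.
import numpy as np
from sklearn.datasets import load_iris
from scipy.spatial.distance import cdist

X = load_iris().data
k = 3
D = cdist(X, X)                           # pairwise distance matrix

rng = np.random.default_rng(0)
medoids = rng.choice(len(X), size=k, replace=False)

for _ in range(20):
    labels = np.argmin(D[:, medoids], axis=1)
    new_medoids = medoids.copy()
    for c in range(k):
        members = np.where(labels == c)[0]
        # the point minimising total distance to its cluster becomes the medoid
        new_medoids[c] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
    if np.array_equal(new_medoids, medoids):
        break
    medoids = new_medoids

print("medoid indices:", medoids, "cluster sizes:", np.bincount(labels))
```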


2020 ◽  
pp. 1-31
Author(s):  
Abdul Rafae Khan ◽  
Asim Karim ◽  
Hassan Sajjad ◽  
Faisal Kamiran ◽  
Jia Xu

Abstract Roman Urdu is an informal form of the Urdu language written in Roman script, which is widely used in South Asia for online textual content. It lacks standard spelling and hence poses several normalization challenges during automatic language processing. In this article, we present a feature-based clustering framework for the lexical normalization of Roman Urdu corpora, which includes a phonetic algorithm UrduPhone, a string matching component, a feature-based similarity function, and a clustering algorithm Lex-Var. UrduPhone encodes Roman Urdu strings to their pronunciation-based representations. The string matching component handles character-level variations that occur when writing Urdu using Roman script. The similarity function incorporates various phonetic-based, string-based, and contextual features of words. The Lex-Var algorithm is a variant of the k-medoids clustering algorithm that groups lexical variations of words. It uses a similarity threshold to balance the number of clusters and their maximum similarity. The framework allows feature learning and optimization in addition to the use of predefined features and weights. We evaluate our framework extensively on four real-world datasets and show an F-measure gain of up to 15% over baseline methods. We also demonstrate the superiority of UrduPhone and Lex-Var in comparison to respective alternate algorithms in our clustering framework for the lexical normalization of Roman Urdu.
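
As a toy illustration of the pipeline's idea (pronunciation-style keys plus a similarity-thresholded merge): the key function below is a crude stand-in, not UrduPhone, the greedy merge is not Lex-Var, and the 0.8 threshold is an arbitrary choice.

```python
# Toy lexical-normalization sketch: map spelling variants to a rough
# phonetic-style key, then merge key groups whose representatives are
# string-similar enough. Both steps are simplifications for illustration.
from difflib import SequenceMatcher
from collections import defaultdict

def rough_key(word):
    """Crude key: keep the first letter, drop later vowels, collapse repeats."""
    w = word.lower()
    rest = "".join(c for c in w[1:] if c not in "aeiou")
    key, prev = w[0], w[0]
    for c in rest:
        if c != prev:
            key += c
        prev = c
    return key

words = ["zindagi", "zindagee", "zndgi", "jindagi",
         "mohabbat", "muhabat", "mohabat"]

clusters = defaultdict(list)
for w in words:
    clusters[rough_key(w)].append(w)

# Merge clusters whose representative strings are similar enough
# (the 0.8 threshold plays the role of Lex-Var's similarity threshold).
keys = list(clusters)
for i, a in enumerate(keys):
    for b in keys[i + 1:]:
        if a in clusters and b in clusters and \
           SequenceMatcher(None, clusters[a][0], clusters[b][0]).ratio() >= 0.8:
            clusters[a].extend(clusters.pop(b))

print(dict(clusters))
```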


2017 ◽  
Vol 5 (1) ◽  
pp. 139-157 ◽  
Author(s):  
Sam Cole ◽  
Shmuel Friedland ◽  
Lev Reyzin

Abstract In this paper, we consider the planted partition model, in which n = ks vertices of a random graph are partitioned into k "clusters," each of size s. Edges between vertices in the same cluster and in different clusters are included with constant probabilities p and q, respectively (where 0 ≤ q < p ≤ 1). We give an efficient algorithm that, with high probability, recovers the clusters as long as the cluster sizes are at least Ω(√n). Informally, our algorithm constructs the projection operator onto the dominant k-dimensional eigenspace of the graph's adjacency matrix and uses it to recover one cluster at a time. To our knowledge, our algorithm is the first purely spectral algorithm which runs in polynomial time and works even when s = Θ(√n), though there have been several non-spectral algorithms which accomplish this. Our algorithm is also among the simplest of these spectral algorithms, and its proof of correctness illustrates the usefulness of the Cauchy integral formula in this domain.
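
A hedged sketch of the spectral idea: sample a planted-partition graph, project onto the dominant k-dimensional eigenspace of its adjacency matrix, and read off the clusters. The paper's one-cluster-at-a-time recovery is replaced here by k-means on the rows of the projector, so this is an illustration of the ingredients, not the authors' algorithm.

```python
# Planted partition: k clusters of size s, within-cluster edge probability p,
# between-cluster probability q. Rows of the projector onto the top-k
# eigenspace of the adjacency matrix are grouped with k-means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
k, s = 4, 50                        # k clusters of size s, n = ks
p, q = 0.7, 0.3
n = k * s
truth = np.repeat(np.arange(k), s)

prob = np.where(truth[:, None] == truth[None, :], p, q)
A = (rng.random((n, n)) < prob).astype(float)
A = np.triu(A, 1)
A = A + A.T                         # symmetric adjacency, no self-loops

eigvals, eigvecs = np.linalg.eigh(A)
V = eigvecs[:, np.argsort(eigvals)[-k:]]    # dominant k-dimensional eigenspace
P = V @ V.T                                 # projection operator

labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(P)
# Sanity check: each recovered cluster should be (nearly) pure.
for c in range(k):
    print(c, np.bincount(truth[labels == c], minlength=k))
```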

