Dynamic Document Clustering Using Singular Value Decomposition

Author(s):  
Rashmi Nadubeediramesh ◽  
Aryya Gangopadhyay

Incremental document clustering is important in many applications, but particularly so in healthcare contexts, where text data is found in abundance, ranging from published research in journals to day-to-day healthcare data such as discharge summaries and nursing notes. In such dynamic environments new documents are constantly added to the set of documents that was used in the initial cluster formation. Hence it is important to be able to incrementally update the clusters at a low computational cost as new documents are added. In this paper the authors describe a novel, low-cost approach for incremental document clustering. Their method is based on conducting singular value decomposition (SVD) incrementally. They dynamically fold new documents into the existing term-document space and dynamically assign them to pre-defined clusters based on intra-cluster similarity. This saves the cost of re-computing the SVD on the entire document set every time updates occur. The authors also provide a way to retrieve documents based on different window sizes with high scalability and good clustering accuracy. They tested their proposed method experimentally with 960 medical abstracts retrieved from the PubMed medical library. The authors' incremental method is compared with the default situation, where the SVD is completely re-computed when new documents are added to the initial set of documents. The results show minor decreases in the quality of the cluster formation but much larger gains in computational throughput.
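
The folding-in step the abstract describes can be sketched in a few lines. The following is a minimal illustration on a toy term-document matrix, not the authors' implementation: a new document's term vector d is projected into the k-dimensional concept space as d·U_k·S_k⁻¹ and assigned to the most cosine-similar cluster centroid, with no SVD re-computation.

```python
import numpy as np

def fold_in(doc_term_vec, Uk, Sk):
    """Project a new document into the existing k-dimensional concept
    space without recomputing the SVD (the 'folding-in' step)."""
    return doc_term_vec @ Uk / Sk           # d^T U_k S_k^{-1}

def assign_cluster(doc_concept_vec, centroids):
    """Assign the folded-in document to the most cosine-similar centroid."""
    sims = [doc_concept_vec @ c / (np.linalg.norm(doc_concept_vec) * np.linalg.norm(c) + 1e-12)
            for c in centroids]
    return int(np.argmax(sims))

# toy term-document matrix (terms x docs); docs {0, 2} and {1, 3} share vocabulary
A = np.array([[2., 0., 1., 0.],
              [1., 0., 2., 0.],
              [0., 3., 0., 1.],
              [0., 1., 0., 2.]])
U, S, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Uk, Sk = U[:, :k], S[:k]
docs_k = Vt[:k, :].T                        # each row: one document in concept space
clusters = [[0, 2], [1, 3]]                 # pre-defined clusters from the initial run
centroids = [docs_k[idx].mean(axis=0) for idx in clusters]

d_new = np.array([1., 2., 0., 0.])          # new doc sharing terms with docs 0 and 2
print(assign_cluster(fold_in(d_new, Uk, Sk), centroids))  # 0
```

The folded-in vector is comparable to the rows of V_k, which is why the centroids are built in that same space; the trade-off, as the abstract notes, is that folding-in drifts slowly from the exact SVD as documents accumulate.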

Author(s):  
Josephine M. Namayanja

Computational techniques, such as Simple K-Means, have been used for exploratory analysis in applications ranging from data mining and machine learning to computational biology. The medical domain has benefitted from these applications, and in this regard the authors analyze patterns in individuals of selected age groups linked with the possibility of Metabolic Syndrome (MetS), a disorder affecting approximately 45% of the elderly. The study identifies groups of individuals that fall into two defined categories, namely those diagnosed with MetS (MetS Positive) and those who are not (MetS Negative), and compares the pattern definitions. The paper compares the cluster formation in patterns when using a data reduction technique referred to as Singular Value Decomposition (SVD) versus clustering without it. Data reduction techniques like SVD have proved to be very useful in projecting only what are considered to be the key relations in the data while suppressing the less important ones. In the presence of high dimensionality, SVD can be highly effective. By applying two internal measures to validate the cluster quality, this study reaches interesting findings for both approaches.
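
The comparison the abstract makes, clustering with SVD-reduced data versus raw data, can be sketched as below. This is a toy illustration with synthetic two-group data (standing in for the MetS-positive/negative cohorts), not the study's pipeline; the deterministic seeding of k-means is an assumption made to keep the example reproducible.

```python
import numpy as np

def kmeans(X, k=2, iters=20):
    """Plain k-means (in the spirit of 'Simple K-Means'), deterministically
    seeded with the first and last data points for reproducibility."""
    centers = X[[0, -1]].astype(float)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels

rng = np.random.default_rng(0)
# two synthetic groups in 50 noisy measurements per individual
X = np.vstack([rng.normal(0, 1, (40, 50)), rng.normal(4, 1, (40, 50))])

labels_raw = kmeans(X)
# SVD reduction to 2 dimensions keeps only the dominant relations in the data
U, S, Vt = np.linalg.svd(X - X.mean(0), full_matrices=False)
labels_svd = kmeans(U[:, :2] * S[:2])
# both runs separate the two groups identically (up to a label swap)
agree = max((labels_raw == labels_svd).mean(), (labels_raw != labels_svd).mean())
print(agree)  # 1.0
```

On well-separated data the reduced and raw clusterings coincide while the reduced run works in 2 dimensions instead of 50; the study's internal validity measures probe exactly where this equivalence breaks down.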


2013 ◽  
Vol 2013 ◽  
pp. 1-8 ◽  
Author(s):  
Jengnan Tzeng

The singular value decomposition (SVD) is a fundamental matrix decomposition in linear algebra. It is widely applied in many modern techniques, for example, high-dimensional data visualization, dimension reduction, data mining, latent semantic analysis, and so forth. Although the SVD plays an essential role in these fields, its apparent weakness is its cubic computational cost. This cubic cost makes many modern applications infeasible, especially when the scale of the data is huge and growing. Therefore, it is imperative to develop a fast SVD method for the modern era. If the rank of the matrix is much smaller than the matrix size, some fast SVD approaches already exist. In this paper, we focus on this case, but with the additional condition that the data is too large to be stored in matrix form. We will demonstrate that this fast SVD result is sufficiently accurate and, most importantly, that it can be derived immediately. Using this fast method, many infeasible modern techniques based on the SVD will become viable.
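
For context, one well-known family of fast SVD methods for the low-rank case the abstract mentions is randomized sketching (Halko-Martinsson-Tropp style). The sketch below is illustrative of that general idea, not of this paper's specific method: a random projection captures the range of the matrix, after which only a small matrix needs a full SVD, replacing the cubic cost with roughly O(mnk).

```python
import numpy as np

def randomized_svd(A, k, oversample=10, seed=0):
    """Approximate rank-k SVD via random sketching: avoids the cubic
    cost of a full SVD when rank(A) << min(A.shape)."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    Omega = rng.standard_normal((n, k + oversample))
    Q, _ = np.linalg.qr(A @ Omega)          # orthonormal basis for range(A)
    B = Q.T @ A                             # small (k+p) x n matrix
    Ub, S, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], S[:k], Vt[:k]

# an exactly rank-5 matrix: the approximation should be near machine precision
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 300))
U, S, Vt = randomized_svd(A, k=5)
err = np.linalg.norm(A - (U * S) @ Vt) / np.linalg.norm(A)
print(err < 1e-10)  # True: the rank-5 matrix is recovered essentially exactly
```

When the matrix is too large to store, as in the paper's setting, the products A @ Omega and Q.T @ A can additionally be computed in streaming passes over blocks of A.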


Mathematics ◽  
2020 ◽  
Vol 8 (8) ◽  
pp. 1325
Author(s):  
Fanhua Shang ◽  
Yuanyuan Liu ◽  
Fanjie Shang ◽  
Hongying Liu ◽  
Lin Kong ◽  
...  

The Schatten quasi-norm is an approximation of the rank that is tighter than the nuclear norm. However, most Schatten quasi-norm minimization (SQNM) algorithms suffer from the high computational cost of computing the singular value decomposition (SVD) of large matrices at each iteration. In this paper, we prove that for any p, p1, p2 > 0 satisfying 1/p = 1/p1 + 1/p2, the Schatten p-(quasi-)norm of any matrix is equivalent to the minimization of the product of the Schatten p1-(quasi-)norm and the Schatten p2-(quasi-)norm of its two much smaller factor matrices. Then, we present and prove the equivalence between the product and its weighted-sum formulations for two cases: p1 = p2 and p1 ≠ p2. In particular, when p > 1/2, there is an equivalence between the Schatten p-quasi-norm of any matrix and the Schatten 2p-norms of its two factor matrices. We further extend the theoretical results from two factor matrices to the cases of three and more factor matrices, from which we can see that for any 0 < p < 1, the Schatten p-quasi-norm of any matrix is the minimization of the mean of the Schatten (⌊1/p⌋+1)p-norms of ⌊1/p⌋+1 factor matrices, where ⌊1/p⌋ denotes the largest integer not exceeding 1/p.
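
The p1 = p2 case can be checked numerically. With p = 1/2 and p1 = p2 = 1 (so 1/p = 1/p1 + 1/p2 holds), the minimum of the product of nuclear norms over factorizations X = AB is attained at the balanced SVD factorization A = U·diag(√σ), B = diag(√σ)·Vᵀ, where the product equals the Schatten-1/2 quasi-norm exactly; the snippet below verifies this equality for that attaining factorization on a random matrix.

```python
import numpy as np

def schatten(X, p):
    """Schatten p-(quasi-)norm: the l_p (quasi-)norm of the singular values."""
    s = np.linalg.svd(X, compute_uv=False)
    return (s ** p).sum() ** (1.0 / p)

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 8))
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# balanced factorization X = A @ B that attains the minimum
A = U * np.sqrt(s)                 # U @ diag(sqrt(sigma))
B = np.sqrt(s)[:, None] * Vt       # diag(sqrt(sigma)) @ V^T
lhs = schatten(X, 0.5)             # Schatten-1/2 quasi-norm of X
rhs = schatten(A, 1.0) * schatten(B, 1.0)   # product of the two nuclear norms
print(np.isclose(lhs, rhs))  # True
```

The factors A and B have singular values √σᵢ, so each nuclear norm is Σ√σᵢ and their product is (Σ√σᵢ)², which is exactly the Schatten-1/2 quasi-norm of X.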


Document clustering is a way to segment a given set of texts into related groups. Nowadays all records are kept in electronic form, which creates the problem of retrieving the appropriate document from a big database. The objective is to convert text written in everyday language into a structured database format, so that different documents can be summarized and presented in a uniform manner. Large volume, high dimensionality, and complicated semantics are the difficult issues in document clustering. The aim of this article is primarily to cluster multi-sense word embeddings using three distinct algorithms (K-means, DBSCAN, CURE) combined with singular value decomposition. Performance is then evaluated using different metrics.
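
One of the article's three algorithms can be sketched on toy data: SVD-reduce high-dimensional embedding vectors, then run a density-based scan over the reduced space. The DBSCAN below is a deliberately minimal re-implementation on synthetic "sense" groups, not the article's code or data.

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """Minimal DBSCAN: grow clusters from core points; -1 marks noise."""
    n = len(X)
    dist = np.linalg.norm(X[:, None] - X[None], axis=-1)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbors[i]) < min_pts:
            continue                      # already clustered, or not a core point
        stack = [i]
        while stack:                      # flood-fill the density-connected region
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbors[j]) >= min_pts:
                    stack.extend(neighbors[j])
        cluster += 1
    return labels

rng = np.random.default_rng(0)
# toy 'word embeddings': three senses as tight 30-dimensional groups
E = np.vstack([rng.normal(c, 0.1, (20, 30)) for c in (0.0, 3.0, 6.0)])
U, S, Vt = np.linalg.svd(E - E.mean(0), full_matrices=False)
reduced = U[:, :2] * S[:2]                # SVD reduction to 2 dimensions
labels = dbscan(reduced, eps=1.0, min_pts=3)
print(len(set(labels) - {-1}))  # 3 clusters found
```

The SVD step matters here exactly as the article argues: in 30 dimensions the eps radius is hard to tune, while in the reduced space the three sense groups separate cleanly.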


Digital rights management (DRM) is a systematic approach to protecting exclusive rights in digital mass media. It uses a set of technologies to control the copying and redistribution of copyrighted digital works and software. Digital watermarking is one of the powerful technologies that play a vital role in digital rights management. In this paper, a low-computational zero-watermark (ZW) algorithm is proposed. It is based on the singular value decomposition (SVD) and is evaluated on the standard cameraman, Barbara, Lena, and living room images, both without attacks and under various attacks. The significant feature of this algorithm is that it does not embed any watermark into the given source image, and hence the output of the zero-watermark algorithm looks identical to the source image. This zero-watermark property is obtained with an SVD approach in which the ZW sequence is computed according to the agreement of the leading digits of the largest singular value in each block. The implementation results show a highest similarity measure of 0.8658 for the cameraman image. Further, the computational cost of the algorithm is measured as 4.442 ms of execution time for all the images across the watermark embedder and watermark extractor phases. PSNR values are calculated for the watermarked images to test the robustness of the proposed algorithm, and the observations show promising results against attacks.
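
A zero-watermarking scheme of this general shape can be sketched as follows. This is one plausible reading of the construction, not the paper's exact algorithm: take the largest singular value of each image block and derive one bit from its leading significant digit (here, its parity, an assumption made for the sketch). Nothing is embedded in the image; robustness comes from the stability of the largest singular value under mild attacks.

```python
import numpy as np

def zero_watermark(image, block=8):
    """Derive a zero-watermark bit sequence from an image without modifying it:
    one bit per block from the leading significant digit of the block's
    largest singular value (digit parity is an illustrative choice)."""
    h, w = image.shape
    bits = []
    for i in range(0, h - h % block, block):
        for j in range(0, w - w % block, block):
            s_max = np.linalg.svd(image[i:i+block, j:j+block], compute_uv=False)[0]
            leading = int(str(s_max).lstrip('0.')[0])   # leading significant digit
            bits.append(leading % 2)
    return np.array(bits)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64)).astype(float)   # stand-in test image
wm = zero_watermark(img)
noisy = img + rng.normal(0, 1.0, img.shape)               # mild noise attack
similarity = (wm == zero_watermark(noisy)).mean()
print(similarity)   # close to 1: largest singular values resist mild noise
```

The similarity measure between the watermark of the clean image and that of the attacked image plays the role of the 0.8658 figure reported in the abstract.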


2016 ◽  
Vol 9 (1) ◽  
pp. 17
Author(s):  
Arif Fadllullah ◽  
Dasrit Debora Kamudi ◽  
Muhamad Nasir ◽  
Agus Zainal Arifin ◽  
Diana Purwitasari

Ant-based document clustering is a clustering method that measures text document similarity based on the shortest path between nodes (trial phase) and then determines the optimal clusters from the sequence of document similarities (dividing phase). The trial phase of ant algorithms takes very long to build document vectors because of the high-dimensional Document-Term Matrix (DTM). In this paper, we propose a document clustering method that optimizes dimension reduction using Singular Value Decomposition-Principal Component Analysis (SVDPCA) and ant algorithms. SVDPCA reduces the size of the DTM by converting the term frequencies of the conventional DTM into the principal-component scores of a Document-PC Matrix (DPCM). The ant algorithms then cluster the documents using the vector space model built from the reduced DPCM. The experimental results on 506 news documents in the Indonesian language demonstrate that the proposed method works well, reducing the dimensionality by up to 99.7%. We could efficiently speed up the execution time of the trial phase while maintaining quality; the best F-measure achieved in the experiments was 0.88 (88%).
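
The SVDPCA step, converting term frequencies into principal-component scores, can be sketched as below. The matrix sizes and the variance threshold are illustrative assumptions, not the paper's settings; the point is that the DPCM has far fewer columns than the DTM while retaining almost all of the variance.

```python
import numpy as np

def svd_pca_scores(dtm, var_keep=0.997):
    """Convert a Document-Term Matrix (docs x terms) into a much smaller
    Document-PC Matrix of principal-component scores via SVD."""
    Xc = dtm - dtm.mean(axis=0)                  # center each term column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = S**2 / (S**2).sum()                    # variance explained per PC
    k = int(np.searchsorted(np.cumsum(var), var_keep)) + 1
    return U[:, :k] * S[:k]                      # document scores on k PCs

rng = np.random.default_rng(0)
dtm = rng.poisson(1.0, size=(50, 400)).astype(float)   # 50 docs, 400 terms
dpcm = svd_pca_scores(dtm)
print(dtm.shape, '->', dpcm.shape)
```

Any downstream clustering (here, the ant algorithms' trial phase) then operates on the dpcm rows, whose much lower dimensionality is what yields the reported speed-up.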


Author(s):  
DREW KEPPEL

Singular-value decomposition is a powerful technique that has been used in the analysis of matrices in many fields. In this paper, we summarize how it has been applied to gravitational-wave data analysis. Applications include producing basis waveforms for matched filtering, decreasing the computational cost of searching for many waveforms, improving parameter estimation, and providing a method of waveform interpolation.


Author(s):  
Yue Yang ◽  
Xiaoxiong Liu ◽  
Weiguo Zhang ◽  
Xuhang Liu ◽  
Yicong Guo

Aiming at attitude-solution accuracy and robustness for small UAVs in complex flight conditions, this paper proposes a dynamic adaptive attitude and heading reference system (AHRS) estimator based on a singular value decomposition Cubature Kalman filter (SVDCKF). To address the random bias of low-cost attitude sensors, the sensor random bias is included in the state vector so that its effect can be eliminated. Because the small-UAV AHRS model is nonlinear and the covariance matrix can become non-positive definite, a nonlinear AHRS filter combining the Cubature Kalman filter with singular value decomposition is designed to improve the attitude-solution accuracy. In addition, when the UAV flies in different flight conditions, the three-axis acceleration measured by the attitude sensor affects the attitude solution; therefore, a dynamic adaptive factor based on adaptive filtering continuously adjusts the acceleration noise variance to improve the robustness of the AHRS. The experimental results show that the proposed method and algorithm not only improve the attitude-solution accuracy and satisfy the flight requirements of small UAVs, but also eliminate the influence of the attitude sensor's random bias and three-axis acceleration on the attitude solution, improving the algorithm's robustness and immunity to interference.
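
The role of the SVD inside a Cubature Kalman filter can be sketched in isolation. A standard CKF generates its 2n cubature points from a Cholesky square root of the covariance, which fails when the covariance loses positive definiteness; an SVD square root, as the abstract motivates, still works. The function below is a generic illustration of that step, not the paper's full SVDCKF.

```python
import numpy as np

def svd_cubature_points(x, P):
    """Generate the 2n cubature points of a CKF using an SVD square root of P.
    Works even when P is only positive semi-definite, where a Cholesky
    factorization would fail."""
    n = len(x)
    U, S, _ = np.linalg.svd(P)                    # P symmetric PSD: P = U diag(S) U^T
    sqrtP = U * np.sqrt(np.maximum(S, 0.0))       # SVD-based matrix square root
    xi = np.sqrt(n) * np.hstack([np.eye(n), -np.eye(n)])
    return x[:, None] + sqrtP @ xi                # columns: the 2n cubature points

x = np.zeros(3)
P = np.array([[2., 1., 0.],
              [1., 2., 0.],
              [0., 0., 0.]])                      # singular covariance: no Cholesky
pts = svd_cubature_points(x, P)
print(np.allclose(pts.mean(axis=1), x))           # True: mean is preserved
cov = (pts - x[:, None]) @ (pts - x[:, None]).T / pts.shape[1]
print(np.allclose(cov, P))                        # True: covariance is preserved
```

Since the points come in symmetric ± pairs scaled by √n, their sample mean and covariance reproduce x and P exactly, which is the property the cubature rule relies on.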



Author(s):  
Valentina Adu ◽  
Michael Donkor Adane ◽  
Kwadwo Asante

We examined a similarity measure for text document clustering. Data mining is a challenging field with many research and application areas. Text document clustering, a subset of data mining, helps group and organize a large quantity of unstructured text documents into a small number of meaningful clusters. An algorithm that calculates the degree of closeness of documents from their document matrix was used to query the terms/words in each document. We also determined whether a given set of text documents is similar to or different from the others when these terms are queried. We found that the ability to rank and approximate documents using matrices allows the use of Singular Value Decomposition (SVD) as an enhanced text data mining algorithm. Also, applying SVD to a high-dimensional matrix yields a lower-dimensional matrix that exposes the relationships in the original matrix, ordered from the most variant components to the least.
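
The SVD-based similarity measure described here can be sketched on a toy term-document matrix; this is a generic latent-semantic-analysis illustration under assumed data, not the authors' exact algorithm. Documents are projected into a rank-k latent space and compared by cosine similarity there.

```python
import numpy as np

def lsa_similarity(dtm, k):
    """Project documents into a rank-k latent space via SVD, then return
    the matrix of pairwise cosine similarities between documents."""
    U, S, Vt = np.linalg.svd(dtm, full_matrices=False)
    docs = Vt[:k].T * S[:k]                  # each row: one document in latent space
    norms = np.linalg.norm(docs, axis=1, keepdims=True)
    docs = docs / np.where(norms == 0, 1, norms)
    return docs @ docs.T

# terms x docs: docs 0 and 1 share vocabulary, doc 2 uses different terms
dtm = np.array([[3., 2., 0.],
                [1., 2., 0.],
                [0., 0., 4.],
                [0., 1., 2.]])
sim = lsa_similarity(dtm, k=2)
print(sim[0, 1] > sim[0, 2])  # True: doc 0 is closer to doc 1 than to doc 2
```

Ranking documents by their row of sim is exactly the "degree of closeness" query the abstract describes, with the rank-k truncation ordering the retained structure from most variant to least.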

