Dynamic Document Clustering Using Singular Value Decomposition

Author(s):  
Rashmi Nadubeediramesh ◽  
Aryya Gangopadhyay

Incremental document clustering is important in many applications, but particularly so in healthcare contexts, where text data is found in abundance, ranging from published research in journals to day-to-day healthcare data such as discharge summaries and nursing notes. In such dynamic environments new documents are constantly added to the set of documents that was used in the initial cluster formation. Hence it is important to be able to incrementally update the clusters at a low computational cost as new documents are added. In this paper the authors describe a novel, low-cost approach for incremental document clustering. Their method is based on conducting singular value decomposition (SVD) incrementally. They dynamically fold new documents into the existing term-document space and dynamically assign them to pre-defined clusters based on intra-cluster similarity. This saves the cost of re-computing the SVD on the entire document set every time updates occur. The authors also provide a way to retrieve documents based on different window sizes with high scalability and good clustering accuracy. They tested their proposed method experimentally with 960 medical abstracts retrieved from the PubMed medical library. The authors' incremental method is compared with the default situation, where the SVD is completely re-computed when new documents are added to the initial set of documents. The results show minor decreases in the quality of the cluster formation but much larger gains in computational throughput.
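
The folding-in step the abstract describes can be sketched in a few lines. The following is a minimal illustration on a toy term-document matrix, not the authors' implementation: a new document's term vector d is projected into the k-dimensional concept space as d·U_k·S_k⁻¹ and assigned to the most cosine-similar cluster centroid, with no SVD re-computation.

```python
import numpy as np

def fold_in(doc_term_vec, Uk, Sk):
    """Project a new document into the existing k-dimensional concept
    space without recomputing the SVD (the 'folding-in' step)."""
    return doc_term_vec @ Uk / Sk           # d^T U_k S_k^{-1}

def assign_cluster(doc_concept_vec, centroids):
    """Assign the folded-in document to the most cosine-similar centroid."""
    sims = [doc_concept_vec @ c / (np.linalg.norm(doc_concept_vec) * np.linalg.norm(c) + 1e-12)
            for c in centroids]
    return int(np.argmax(sims))

# toy term-document matrix (terms x docs); docs {0, 2} and {1, 3} share vocabulary
A = np.array([[2., 0., 1., 0.],
              [1., 0., 2., 0.],
              [0., 3., 0., 1.],
              [0., 1., 0., 2.]])
U, S, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Uk, Sk = U[:, :k], S[:k]
docs_k = Vt[:k, :].T                        # each row: one document in concept space
clusters = [[0, 2], [1, 3]]                 # pre-defined clusters from the initial run
centroids = [docs_k[idx].mean(axis=0) for idx in clusters]

d_new = np.array([1., 2., 0., 0.])          # new doc sharing terms with docs 0 and 2
print(assign_cluster(fold_in(d_new, Uk, Sk), centroids))  # 0
```

The folded-in vector is comparable to the rows of V_k, which is why the centroids are built in that same space; the trade-off, as the abstract notes, is that folding-in drifts slowly from the exact SVD as documents accumulate.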

Author(s):  
Josephine M. Namayanja

Computational techniques, such as Simple K-Means, have been used for exploratory analysis in applications ranging from data mining and machine learning to computational biology. The medical domain has benefitted from these applications, and in this regard the authors analyze patterns in individuals of selected age groups linked with the possibility of Metabolic Syndrome (MetS), a disorder affecting approximately 45% of the elderly. The study identifies groups of individuals that fall into two defined categories, namely those diagnosed with MetS (MetS Positive) and those who are not (MetS Negative), and compares the pattern definitions. The paper compares the cluster formation in patterns when using a data reduction technique referred to as Singular Value Decomposition (SVD) versus clustering without it. Data reduction techniques like SVD have proved to be very useful in projecting only what are considered to be the key relations in the data while suppressing the less important ones. In the presence of high dimensionality, SVD can be highly effective. By applying two internal measures to validate the cluster quality, this study reaches interesting findings for both approaches.
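
The comparison the abstract makes, clustering with SVD-reduced data versus raw data, can be sketched as below. This is a toy illustration with synthetic two-group data (standing in for the MetS-positive/negative cohorts), not the study's pipeline; the deterministic seeding of k-means is an assumption made to keep the example reproducible.

```python
import numpy as np

def kmeans(X, k=2, iters=20):
    """Plain k-means (in the spirit of 'Simple K-Means'), deterministically
    seeded with the first and last data points for reproducibility."""
    centers = X[[0, -1]].astype(float)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels

rng = np.random.default_rng(0)
# two synthetic groups in 50 noisy measurements per individual
X = np.vstack([rng.normal(0, 1, (40, 50)), rng.normal(4, 1, (40, 50))])

labels_raw = kmeans(X)
# SVD reduction to 2 dimensions keeps only the dominant relations in the data
U, S, Vt = np.linalg.svd(X - X.mean(0), full_matrices=False)
labels_svd = kmeans(U[:, :2] * S[:2])
# both runs separate the two groups identically (up to a label swap)
agree = max((labels_raw == labels_svd).mean(), (labels_raw != labels_svd).mean())
print(agree)  # 1.0
```

On well-separated data the reduced and raw clusterings coincide while the reduced run works in 2 dimensions instead of 50; the study's internal validity measures probe exactly where this equivalence breaks down.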


2013 ◽  
Vol 2013 ◽  
pp. 1-8 ◽  
Author(s):  
Jengnan Tzeng

The singular value decomposition (SVD) is a fundamental matrix decomposition in linear algebra. It is widely applied in many modern techniques, for example, high-dimensional data visualization, dimension reduction, data mining, latent semantic analysis, and so forth. Although the SVD plays an essential role in these fields, its apparent weakness is its cubic computational cost. This cubic cost makes many modern applications infeasible, especially when the scale of the data is huge and growing. Therefore, it is imperative to develop a fast SVD method for the modern era. If the rank of the matrix is much smaller than the matrix size, some fast SVD approaches already exist. In this paper, we focus on this case, but with the additional condition that the data is too large to be stored in matrix form. We will demonstrate that this fast SVD result is sufficiently accurate and, most importantly, that it can be derived immediately. Using this fast method, many infeasible modern techniques based on the SVD will become viable.
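
For context, one well-known family of fast SVD methods for the low-rank case the abstract mentions is randomized sketching (Halko-Martinsson-Tropp style). The sketch below is illustrative of that general idea, not of this paper's specific method: a random projection captures the range of the matrix, after which only a small matrix needs a full SVD, replacing the cubic cost with roughly O(mnk).

```python
import numpy as np

def randomized_svd(A, k, oversample=10, seed=0):
    """Approximate rank-k SVD via random sketching: avoids the cubic
    cost of a full SVD when rank(A) << min(A.shape)."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    Omega = rng.standard_normal((n, k + oversample))
    Q, _ = np.linalg.qr(A @ Omega)          # orthonormal basis for range(A)
    B = Q.T @ A                             # small (k+p) x n matrix
    Ub, S, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], S[:k], Vt[:k]

# an exactly rank-5 matrix: the approximation should be near machine precision
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 300))
U, S, Vt = randomized_svd(A, k=5)
err = np.linalg.norm(A - (U * S) @ Vt) / np.linalg.norm(A)
print(err < 1e-10)  # True: the rank-5 matrix is recovered essentially exactly
```

When the matrix is too large to store, as in the paper's setting, the products A @ Omega and Q.T @ A can additionally be computed in streaming passes over blocks of A.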


Mathematics ◽  
2020 ◽  
Vol 8 (8) ◽  
pp. 1325
Author(s):  
Fanhua Shang ◽  
Yuanyuan Liu ◽  
Fanjie Shang ◽  
Hongying Liu ◽  
Lin Kong ◽  
...  

The Schatten quasi-norm is an approximation of the rank that is tighter than the nuclear norm. However, most Schatten quasi-norm minimization (SQNM) algorithms suffer from the high computational cost of computing the singular value decomposition (SVD) of large matrices at each iteration. In this paper, we prove that for any p, p1, p2 > 0 satisfying 1/p = 1/p1 + 1/p2, the Schatten p-(quasi-)norm of any matrix is equivalent to the minimization of the product of the Schatten p1-(quasi-)norm and the Schatten p2-(quasi-)norm of its two much smaller factor matrices. Then, we present and prove the equivalence between the product and its weighted-sum formulations for two cases: p1 = p2 and p1 ≠ p2. In particular, when p > 1/2, there is an equivalence between the Schatten p-quasi-norm of any matrix and the Schatten 2p-norms of its two factor matrices. We further extend the theoretical results from two factor matrices to the cases of three and more factor matrices, from which we can see that for any 0 < p < 1, the Schatten p-quasi-norm of any matrix is the minimization of the mean of the Schatten (⌊1/p⌋+1)p-norms of ⌊1/p⌋+1 factor matrices, where ⌊1/p⌋ denotes the largest integer not exceeding 1/p.
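
The p1 = p2 case can be checked numerically. With p = 1/2 and p1 = p2 = 1 (so 1/p = 1/p1 + 1/p2 holds), the minimum of the product of nuclear norms over factorizations X = AB is attained at the balanced SVD factorization A = U·diag(√σ), B = diag(√σ)·Vᵀ, where the product equals the Schatten-1/2 quasi-norm exactly; the snippet below verifies this equality for that attaining factorization on a random matrix.

```python
import numpy as np

def schatten(X, p):
    """Schatten p-(quasi-)norm: the l_p (quasi-)norm of the singular values."""
    s = np.linalg.svd(X, compute_uv=False)
    return (s ** p).sum() ** (1.0 / p)

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 8))
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# balanced factorization X = A @ B that attains the minimum
A = U * np.sqrt(s)                 # U @ diag(sqrt(sigma))
B = np.sqrt(s)[:, None] * Vt       # diag(sqrt(sigma)) @ V^T
lhs = schatten(X, 0.5)             # Schatten-1/2 quasi-norm of X
rhs = schatten(A, 1.0) * schatten(B, 1.0)   # product of the two nuclear norms
print(np.isclose(lhs, rhs))  # True
```

The factors A and B have singular values √σᵢ, so each nuclear norm is Σ√σᵢ and their product is (Σ√σᵢ)², which is exactly the Schatten-1/2 quasi-norm of X.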


Document clustering is a way to segment a given set of texts into related groups. Nowadays all records are kept in electronic form, which creates the problem of retrieving the appropriate document from a big database. The objective is to convert text written in everyday language into a structured database format, so that different documents can be summarized and presented in a uniform manner. Large volume, high dimensionality, and complicated semantics are the difficult issues in document clustering. The aim of this article is primarily to cluster multi-sense word embeddings using three distinct algorithms (K-means, DBSCAN, CURE) combined with singular value decomposition. Performance is then evaluated using different metrics.
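
One of the article's three algorithms can be sketched on toy data: SVD-reduce high-dimensional embedding vectors, then run a density-based scan over the reduced space. The DBSCAN below is a deliberately minimal re-implementation on synthetic "sense" groups, not the article's code or data.

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """Minimal DBSCAN: grow clusters from core points; -1 marks noise."""
    n = len(X)
    dist = np.linalg.norm(X[:, None] - X[None], axis=-1)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbors[i]) < min_pts:
            continue                      # already clustered, or not a core point
        stack = [i]
        while stack:                      # flood-fill the density-connected region
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbors[j]) >= min_pts:
                    stack.extend(neighbors[j])
        cluster += 1
    return labels

rng = np.random.default_rng(0)
# toy 'word embeddings': three senses as tight 30-dimensional groups
E = np.vstack([rng.normal(c, 0.1, (20, 30)) for c in (0.0, 3.0, 6.0)])
U, S, Vt = np.linalg.svd(E - E.mean(0), full_matrices=False)
reduced = U[:, :2] * S[:2]                # SVD reduction to 2 dimensions
labels = dbscan(reduced, eps=1.0, min_pts=3)
print(len(set(labels) - {-1}))  # 3 clusters found
```

The SVD step matters here exactly as the article argues: in 30 dimensions the eps radius is hard to tune, while in the reduced space the three sense groups separate cleanly.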


Digital rights management (DRM) is a systematic approach to protecting exclusive rights in digital mass media. It uses a set of technologies to control the copying and redistribution of copyrighted digital works and software. Digital watermarking is one of the powerful technologies that play a vital role in digital rights management. In this paper, a low-computational zero-watermark (ZW) algorithm is proposed. It is based on the singular value decomposition (SVD) and is evaluated on the standard cameraman, Barbara, Lena, and living room images, both without attacks and under various attacks. The significant feature of this algorithm is that it does not embed any watermark into the given source image, and hence the output of the zero-watermark algorithm looks identical to the source image. This zero-watermark property is obtained with an SVD approach in which the ZW sequence is computed according to the agreement of the leading digits of the largest singular value in each block. The implementation results show a highest similarity measure of 0.8658 for the cameraman image. Further, the computational cost of the algorithm is measured as 4.442 ms of execution time for all the images across the watermark embedder and watermark extractor phases. PSNR values are calculated for the watermarked images to test the robustness of the proposed algorithm, and the observations show promising results against attacks.
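
A zero-watermarking scheme of this general shape can be sketched as follows. This is one plausible reading of the construction, not the paper's exact algorithm: take the largest singular value of each image block and derive one bit from its leading significant digit (here, its parity, an assumption made for the sketch). Nothing is embedded in the image; robustness comes from the stability of the largest singular value under mild attacks.

```python
import numpy as np

def zero_watermark(image, block=8):
    """Derive a zero-watermark bit sequence from an image without modifying it:
    one bit per block from the leading significant digit of the block's
    largest singular value (digit parity is an illustrative choice)."""
    h, w = image.shape
    bits = []
    for i in range(0, h - h % block, block):
        for j in range(0, w - w % block, block):
            s_max = np.linalg.svd(image[i:i+block, j:j+block], compute_uv=False)[0]
            leading = int(str(s_max).lstrip('0.')[0])   # leading significant digit
            bits.append(leading % 2)
    return np.array(bits)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64)).astype(float)   # stand-in test image
wm = zero_watermark(img)
noisy = img + rng.normal(0, 1.0, img.shape)               # mild noise attack
similarity = (wm == zero_watermark(noisy)).mean()
print(similarity)   # close to 1: largest singular values resist mild noise
```

The similarity measure between the watermark of the clean image and that of the attacked image plays the role of the 0.8658 figure reported in the abstract.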


2016 ◽  
Vol 9 (1) ◽  
pp. 17
Author(s):  
Arif Fadllullah ◽  
Dasrit Debora Kamudi ◽  
Muhamad Nasir ◽  
Agus Zainal Arifin ◽  
Diana Purwitasari

Ant-based document clustering is a clustering method that measures text document similarity based on the shortest path between nodes (trial phase) and then determines the optimal clusters from the sequence of document similarities (dividing phase). The trial phase of ant algorithms takes very long to build document vectors because of the high-dimensional Document-Term Matrix (DTM). In this paper, we propose a document clustering method that optimizes dimension reduction using Singular Value Decomposition-Principal Component Analysis (SVDPCA) and ant algorithms. SVDPCA reduces the size of the DTM by converting the term frequencies of the conventional DTM into the principal-component scores of a Document-PC Matrix (DPCM). The ant algorithms then cluster the documents using the vector space model built from the reduced DPCM. The experimental results on 506 news documents in the Indonesian language demonstrate that the proposed method works well, reducing the dimensionality by up to 99.7%. We could efficiently speed up the execution time of the trial phase while maintaining quality; the best F-measure achieved in the experiments was 0.88 (88%).
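
The SVDPCA step, converting term frequencies into principal-component scores, can be sketched as below. The matrix sizes and the variance threshold are illustrative assumptions, not the paper's settings; the point is that the DPCM has far fewer columns than the DTM while retaining almost all of the variance.

```python
import numpy as np

def svd_pca_scores(dtm, var_keep=0.997):
    """Convert a Document-Term Matrix (docs x terms) into a much smaller
    Document-PC Matrix of principal-component scores via SVD."""
    Xc = dtm - dtm.mean(axis=0)                  # center each term column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = S**2 / (S**2).sum()                    # variance explained per PC
    k = int(np.searchsorted(np.cumsum(var), var_keep)) + 1
    return U[:, :k] * S[:k]                      # document scores on k PCs

rng = np.random.default_rng(0)
dtm = rng.poisson(1.0, size=(50, 400)).astype(float)   # 50 docs, 400 terms
dpcm = svd_pca_scores(dtm)
print(dtm.shape, '->', dpcm.shape)
```

Any downstream clustering (here, the ant algorithms' trial phase) then operates on the dpcm rows, whose much lower dimensionality is what yields the reported speed-up.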


Author(s):  
DREW KEPPEL

Singular-value decomposition is a powerful technique that has been used in the analysis of matrices in many fields. In this paper, we summarize how it has been applied to gravitational-wave data analysis. Applications include producing basis waveforms for matched filtering, decreasing the computational cost of searching for many waveforms, improving parameter estimation, and providing a method of waveform interpolation.


Author(s):  
Yue Yang ◽  
Xiaoxiong Liu ◽  
Weiguo Zhang ◽  
Xuhang Liu ◽  
Yicong Guo

Aiming at attitude-solution accuracy and robustness for small UAVs in complex flight conditions, this paper proposes a dynamic adaptive attitude and heading reference system (AHRS) estimator based on a singular value decomposition Cubature Kalman filter (SVDCKF). To address the random bias of low-cost attitude sensors, the sensor random bias is included in the state vector so that its effect can be eliminated. Because the small-UAV AHRS model is nonlinear and the covariance matrix can become non-positive definite, a nonlinear AHRS filter combining the Cubature Kalman filter with singular value decomposition is designed to improve the attitude-solution accuracy. In addition, when the UAV flies in different flight conditions, the three-axis acceleration measured by the attitude sensor affects the attitude solution; therefore, a dynamic adaptive factor based on adaptive filtering continuously adjusts the acceleration noise variance to improve the robustness of the AHRS. The experimental results show that the proposed method and algorithm not only improve the attitude-solution accuracy and satisfy the flight requirements of small UAVs, but also eliminate the influence of the attitude sensor's random bias and three-axis acceleration on the attitude solution, improving the algorithm's robustness and immunity to interference.
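
The role of the SVD inside a Cubature Kalman filter can be sketched in isolation. A standard CKF generates its 2n cubature points from a Cholesky square root of the covariance, which fails when the covariance loses positive definiteness; an SVD square root, as the abstract motivates, still works. The function below is a generic illustration of that step, not the paper's full SVDCKF.

```python
import numpy as np

def svd_cubature_points(x, P):
    """Generate the 2n cubature points of a CKF using an SVD square root of P.
    Works even when P is only positive semi-definite, where a Cholesky
    factorization would fail."""
    n = len(x)
    U, S, _ = np.linalg.svd(P)                    # P symmetric PSD: P = U diag(S) U^T
    sqrtP = U * np.sqrt(np.maximum(S, 0.0))       # SVD-based matrix square root
    xi = np.sqrt(n) * np.hstack([np.eye(n), -np.eye(n)])
    return x[:, None] + sqrtP @ xi                # columns: the 2n cubature points

x = np.zeros(3)
P = np.array([[2., 1., 0.],
              [1., 2., 0.],
              [0., 0., 0.]])                      # singular covariance: no Cholesky
pts = svd_cubature_points(x, P)
print(np.allclose(pts.mean(axis=1), x))           # True: mean is preserved
cov = (pts - x[:, None]) @ (pts - x[:, None]).T / pts.shape[1]
print(np.allclose(cov, P))                        # True: covariance is preserved
```

Since the points come in symmetric ± pairs scaled by √n, their sample mean and covariance reproduce x and P exactly, which is the property the cubature rule relies on.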



Author(s):  
Valentina Adu ◽  
Michael Donkor Adane ◽  
Kwadwo Asante

We examined a similarity measure for text document clustering. Data mining is a challenging field with many research and application areas. Text document clustering, a subset of data mining, helps group and organize a large quantity of unstructured text documents into a small number of meaningful clusters. An algorithm that calculates the degree of closeness of documents from their document matrix was used to query the terms/words in each document. We also determined whether a given set of text documents is similar to or different from the others when these terms are queried. We found that the ability to rank and approximate documents using matrices allows the use of Singular Value Decomposition (SVD) as an enhanced text data mining algorithm. Also, applying SVD to a high-dimensional matrix yields a lower-dimensional matrix that exposes the relationships in the original matrix, ordered from the most variant components to the least.
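
The SVD-based similarity measure described here can be sketched on a toy term-document matrix; this is a generic latent-semantic-analysis illustration under assumed data, not the authors' exact algorithm. Documents are projected into a rank-k latent space and compared by cosine similarity there.

```python
import numpy as np

def lsa_similarity(dtm, k):
    """Project documents into a rank-k latent space via SVD, then return
    the matrix of pairwise cosine similarities between documents."""
    U, S, Vt = np.linalg.svd(dtm, full_matrices=False)
    docs = Vt[:k].T * S[:k]                  # each row: one document in latent space
    norms = np.linalg.norm(docs, axis=1, keepdims=True)
    docs = docs / np.where(norms == 0, 1, norms)
    return docs @ docs.T

# terms x docs: docs 0 and 1 share vocabulary, doc 2 uses different terms
dtm = np.array([[3., 2., 0.],
                [1., 2., 0.],
                [0., 0., 4.],
                [0., 1., 2.]])
sim = lsa_similarity(dtm, k=2)
print(sim[0, 1] > sim[0, 2])  # True: doc 0 is closer to doc 1 than to doc 2
```

Ranking documents by their row of sim is exactly the "degree of closeness" query the abstract describes, with the rank-k truncation ordering the retained structure from most variant to least.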

