Using SVD on Clusters to Improve Precision of Interdocument Similarity Measure

2016 ◽  
Vol 2016 ◽  
pp. 1-11
Author(s):  
Wen Zhang ◽  
Fan Xiao ◽  
Bin Li ◽  
Siguang Zhang

Recently, LSI (Latent Semantic Indexing) based on SVD (Singular Value Decomposition) has been proposed to overcome the problems of polysemy and homonymy in traditional lexical matching. However, it is often criticized for its low discriminative power in representing documents, although its representative quality has been validated. In this paper, SVD on clusters is proposed to improve the discriminative power of LSI. The contribution of this paper is threefold. First, we survey existing linear algebra methods for LSI, including both SVD-based and non-SVD-based methods. Second, we propose SVD on clusters for LSI and explain theoretically that it involves two manipulations: dimension expansion of document vectors and dimension projection using SVD. We also develop updating processes to fold new documents and terms into a matrix decomposed by SVD on clusters. Third, two corpora, one Chinese and one English, are used to evaluate the performance of the proposed methods. Experiments demonstrate that, to some extent, SVD on clusters improves the precision of the interdocument similarity measure compared with other SVD-based LSI methods.
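As a point of reference for the abstract above, the following is a minimal sketch of plain SVD-based LSI with a cosine interdocument similarity measure. It is not the paper's SVD-on-clusters method; the toy matrix, the choice of `k`, and the function names are assumptions made for illustration.

```python
import numpy as np

def lsi_document_vectors(term_doc, k):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Row j of the result is document j expressed in the top-k latent dimensions.
    return (np.diag(s[:k]) @ Vt[:k, :]).T

def cosine_similarity_matrix(doc_vectors):
    """Pairwise cosine similarity between row vectors."""
    norms = np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    unit = doc_vectors / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T

# Toy term-document matrix (rows: terms, columns: documents); values are illustrative.
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
], dtype=float)

docs_k = lsi_document_vectors(A, k=2)
sim = cosine_similarity_matrix(docs_k)
```

In this toy case documents 0 and 1 share vocabulary, as do documents 2 and 3, so `sim[0, 1]` comes out higher than `sim[0, 2]`.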

Author(s):  
Jane E. Tougas

The tremendous size of the Internet and modern databases has made efficient searching and information retrieval (IR) important. Latent semantic indexing (LSI) is an IR method that represents a dataset as a term-document matrix. LSI uses a matrix factorization method known as the partial singular value decomposition (PSVD). Calculating the PSVD of a large term-document matrix is computationally expensive. In a rapidly expanding environment, a term-document matrix is altered often as new documents and terms are added. Recomputing the PSVD of the term-document matrix each time these slight alterations occur can be prohibitively expensive. Folding-in is one method of adding new documents or terms to an LSI database; updating the PSVD of the existing LSI database is another. The folding-in method is computationally inexpensive, but may cause deterioration in the accuracy of the PSVD. The PSVD-updating method is computationally more expensive than the folding-in method, but better maintains the accuracy of the PSVD. Folding-up is a new method that combines folding-in and PSVD-updating. Folding-up is faster than either recomputing the PSVD or PSVD-updating, but avoids the degradation in the PSVD that can occur when the folding-in method is used on its own.
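The folding-in step mentioned in this abstract can be sketched as follows, using the standard projection of a new document column d onto an existing rank-k factorization, d_hat = Sigma_k^{-1} U_k^T d. The toy matrix, the value of `k`, and the helper names are assumptions for illustration; the folding-up method itself is not shown.

```python
import numpy as np

def truncated_svd(A, k):
    """Return the rank-k factors U_k, s_k, Vt_k of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def fold_in_document(U_k, s_k, d):
    """Fold a new document column d into the latent space without recomputing the SVD."""
    # d_hat = Sigma_k^{-1} U_k^T d; the existing factors are left unchanged.
    return (U_k.T @ d) / s_k

# Toy term-document matrix (rows: terms, columns: documents); values are illustrative.
A = np.array([
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
    [0, 0, 1],
], dtype=float)

U_k, s_k, Vt_k = truncated_svd(A, k=2)

# Sanity check: folding in an existing column reproduces its latent coordinates.
refolded = fold_in_document(U_k, s_k, A[:, 0])

# Fold in a hypothetical new document and append it to the document factor.
d_new = np.array([1, 0, 0, 1], dtype=float)
d_hat = fold_in_document(U_k, s_k, d_new)
Vt_extended = np.column_stack([Vt_k, d_hat])
```

Because `U_k` and `s_k` are never updated, any part of `d_new` lying outside the existing latent space is discarded, which is one way to see why repeated folding-in can degrade accuracy.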


2021 ◽  
Vol 4 (2) ◽  
pp. 64-70
Author(s):  
Agung Hasbi Ardiansyah ◽  
Kurnia Paranita Kartika ◽  
Saiful Nur Budiman

When election supervisors receive a finding or report of a suspected election violation, they carry out clarification and gather sufficient evidence before deciding whether the finding or report constitutes a violation. During clarification, supervisors look for the articles of law that may have been violated in the incoming finding or report. The large number of reference articles for each case sometimes slows the supervisors' work, so a tool is needed to speed up the search for articles based on the violation case. In this study, an information retrieval system is used to find the articles of Law Number 10 of 2016 that are relevant to a case, based on the case description. The study uses the Latent Semantic Indexing (LSI) method. LSI uses Singular Value Decomposition (SVD) to reduce dimensionality. The study uses 37 articles, and 4 cases or violation descriptions as queries. The system accepts a query or violation case description as input, then computes and determines the related articles. The method's success in finding relevant search results is shown by a recall of 100%, a precision of 70%, and an F-measure of 82%.
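The reported scores can be cross-checked with a short calculation, assuming the F-measure is the balanced F1 score (the harmonic mean of precision and recall); the abstract does not state which variant was used.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Values taken from the abstract: precision 70%, recall 100%.
f = f_measure(0.70, 1.00)
f_percent = round(f * 100)
```

Under that assumption, 2 x 0.70 x 1.00 / 1.70 is about 0.8235, which rounds to the 82% reported.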


2008 ◽  
Vol 130 (6) ◽  
Author(s):  
Zhixiang Xu ◽  
Tim Green

In most cases, the servo loops of computer numerically controlled (CNC) machine tools consist of position controllers, drivers, power transmissions, and tables. In the diagnosis, adjustment, and calibration of CNC machine tools, it is crucial to make the performance of the servo loops as similar as possible, and ideally identical. This work is motivated by the need for a measure of the similarity between all coordinated axes. Based on the singular value decomposition (SVD) of time series, this contribution presents an approach to setting up a similarity measure for evaluating the performance of CNC machines. A circular interpolation is carried out to sample the displacements of the two axes involved into two independent time series. A special matrix called an attractor is then constructed from each time series, and the SVD algorithm is applied to the attractors, producing a series of singular values. From these values, the singular value ratio spectrum is formed and the similarity ratio, which numerically represents the similarity between the coordinated axes, is proposed. The similarity of the two series is compared according to this ratio. Finally, the approach is validated by experimental measurements. The similarity measure presented in this paper provides an overall index for evaluating the mismatch between coordinated axes of CNC machine tools.
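The first steps of this pipeline (time series, attractor matrix, singular values) can be sketched as below. The abstract does not define the attractor, the singular value ratio spectrum, or the similarity ratio, so this assumes a simple lagged-window (trajectory) matrix and stops at the singular values; the synthetic cosine and sine signals stand in for measured axis displacements.

```python
import numpy as np

def attractor_matrix(series, window):
    """Stack lagged windows of a 1-D series as rows (an assumed trajectory-matrix form)."""
    series = np.asarray(series, dtype=float)
    n_rows = len(series) - window + 1
    return np.array([series[i:i + window] for i in range(n_rows)])

def singular_values(series, window):
    """Singular values of the attractor matrix, in descending order."""
    return np.linalg.svd(attractor_matrix(series, window), compute_uv=False)

# Synthetic stand-ins for the two axis displacements during circular interpolation.
t = np.linspace(0.0, 2.0 * np.pi, 200)
x_axis = np.cos(t)
y_axis = np.sin(t)

sv_x = singular_values(x_axis, window=10)
sv_y = singular_values(y_axis, window=10)
```

For an ideal sinusoid only two singular values are significant, so energy appearing in the remaining ones is one sign of distortion in an axis response.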


2021 ◽  
Vol 36 ◽  
pp. 04008
Author(s):  
Kong Hoong Lem

Singular value decomposition (SVD) is one of the most useful matrix decompositions in linear algebra. Here, a novel application of SVD to recovering ripped photos is explored. Recovery is done by applying truncated SVD iteratively. Performance is evaluated using the Frobenius norm. Results on a few experimental photos were decent.
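One common way to apply truncated SVD iteratively for recovery is sketched below: fill the missing region, take a low-rank approximation, copy the approximation back into the missing pixels only, and repeat. This is an assumed scheme, not necessarily the paper's exact procedure; the synthetic rank-1 "photo", the mask, the rank, and the iteration count are all illustrative.

```python
import numpy as np

def truncated_svd_approx(M, rank):
    """Best rank-`rank` approximation of M in the Frobenius norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

def recover(image, known_mask, rank, n_iter=200):
    """Iteratively fill unknown pixels using a truncated-SVD approximation."""
    # Start by filling the missing pixels with the mean of the known ones.
    filled = np.where(known_mask, image, image[known_mask].mean())
    for _ in range(n_iter):
        low_rank = truncated_svd_approx(filled, rank)
        # Keep known pixels fixed; update only the missing ones.
        filled = np.where(known_mask, image, low_rank)
    return filled

# Synthetic grayscale "photo" with exact rank 1, so recovery is well posed.
original = np.outer(np.linspace(1.0, 2.0, 20), np.linspace(1.0, 3.0, 30))
known_mask = np.ones(original.shape, dtype=bool)
known_mask[5:8, 10:14] = False  # the "ripped" region
damaged = np.where(known_mask, original, 0.0)

recovered = recover(damaged, known_mask, rank=1)

# np.linalg.norm on a 2-D array defaults to the Frobenius norm.
error_before = np.linalg.norm(original - damaged)
error_after = np.linalg.norm(original - recovered)
```

Real photos are not exactly low rank, so the rank acts as a tuning parameter that trades smoothness against detail.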

