Document Similarity Detection Using Indonesian Language Word2vec Model

Author(s):  
Nahda Rosa Ramadhanti ◽  
Siti Mariyah
Author(s):  
Papias Niyigena ◽  
Zhang Zuping ◽  
Mansoor Ahmed Khuhro ◽  
Damien Hanyurwimfura

Author(s):  
Wendi Usino ◽  
Anton Satria ◽  
Khalid Hamed ◽  
Arif Bramantoro ◽  
Hasniaty A ◽  
...  

2021 ◽  
Vol 11 (24) ◽  
pp. 12040
Author(s):  
Mustafa A. Al Sibahee ◽  
Ayad I. Abdulsada ◽  
Zaid Ameen Abduljabbar ◽  
Junchao Ma ◽  
Vincent Omollo Nyangaresi ◽  
...  

Document similarity detection is widely used across diverse communities, including institutions and corporations. However, currently available detection systems do not account for the private nature of documents that have been outsourced to remote servers. Nor is any existing solution lightweight enough to suit a lightweight client implementation, which limits the effectiveness of these systems. For instance, detecting similarity between two conferences or journals must preserve the privacy of the submitted papers, and must do so cheaply enough to meet the security and application requirements of limited-resource devices. This paper considers the problem of lightweight similarity detection between document sets while preserving the privacy of the material. The proposed solution permits documents to be compared without disclosing their content to untrusted servers. A fingerprint set is computed efficiently for each document, and an inverted index is built over the whole set of fingerprints. Before being uploaded to the untrusted server, this index is secured with the Paillier cryptosystem. The result is a secure yet efficient method for scalable encrypted document comparison. Its computational performance is evaluated through several comparative assessments against other major approaches.
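
The pipeline this abstract outlines (fingerprints, an inverted index, Paillier protection) can be illustrated with a minimal Python sketch. It is not the authors' construction: the abstract does not specify the fingerprinting scheme or how the index is encrypted, so the 5-character shingle hashing, the function names, the idea of encrypting 0/1 match bits, and the tiny demonstration primes are all assumptions chosen only to show the moving parts. Real deployments need primes of 1024 bits or more and a vetted cryptographic library.

```python
import hashlib
import math
import secrets
from collections import defaultdict


def fingerprints(text, k=5):
    """Assumed scheme: hash every k-character shingle of the normalized text."""
    norm = " ".join(text.lower().split())
    grams = {norm[i:i + k] for i in range(max(1, len(norm) - k + 1))}
    return {int.from_bytes(hashlib.sha256(g.encode()).digest()[:4], "big")
            for g in grams}


def build_index(docs):
    """Inverted index: fingerprint -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for fp in fingerprints(text):
            index[fp].add(doc_id)
    return index


def shared_counts(index, query_fps):
    """Count, per indexed document, how many query fingerprints it shares."""
    counts = defaultdict(int)
    for fp in query_fps:
        for doc_id in index.get(fp, ()):
            counts[doc_id] += 1
    return dict(counts)


# --- Toy Paillier with generator g = n + 1 (demonstration-sized primes) ---

def paillier_keygen(p=1009, q=1013):
    """Return (public n, private (lambda, mu)). Insecure key size, demo only."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; requires Python 3.8+
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) as (1 + m*n) * r^n mod n^2."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return ((1 + m * n) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Recover m as L(c^lambda mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)
```

Under these assumptions, a client could encrypt one 0/1 match bit per fingerprint, and the untrusted server could combine the ciphertexts with `add_encrypted` to obtain an encrypted count of shared fingerprints without seeing any individual bit. Only the key holder can decrypt that count.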

