document similarity
Recently Published Documents


TOTAL DOCUMENTS

243
(FIVE YEARS 68)

H-INDEX

19
(FIVE YEARS 2)

Author(s):  
Yan Xiang ◽  
Zhengtao Yu ◽  
Junjun Guo ◽  
Yuxin Huang ◽  
Yantuan Xian

Opinion target classification of microblog comments is one of the most important tasks in public opinion analysis of an event. Due to the high cost of manual labeling, opinion target classification is generally treated as a weakly supervised task. This article addresses the opinion target classification of microblog comments with an event graph convolution network (EventGCN) in a weakly supervised manner. Specifically, we take microblog contents and comments as document nodes, and construct an event graph with three typical relationships of event microblogs: the co-occurrence of event keywords extracted from microblogs, the reply relationship of comments, and document similarity. Finally, under the supervision of a small number of labels, both word features and comment features can be represented well enough to complete the classification. Experimental results on two event microblog datasets show that EventGCN significantly improves classification performance over baseline models.
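The abstract's graph construction step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the input names (`keywords`, `replies`, `sim`) and the similarity threshold are assumptions, and the result is the kind of adjacency matrix a GCN layer would consume after normalisation.

```python
from itertools import combinations

def build_event_graph(docs, keywords, replies, sim, sim_threshold=0.5):
    """Combine the three edge types described in the abstract into one
    symmetric adjacency matrix over document nodes (posts and comments):
    keyword co-occurrence, comment replies, and document similarity."""
    n = len(docs)
    adj = [[0.0] * n for _ in range(n)]

    def connect(i, j, w=1.0):
        adj[i][j] = max(adj[i][j], w)
        adj[j][i] = max(adj[j][i], w)

    # 1) co-occurrence: documents sharing an event keyword are linked
    for kw, doc_ids in keywords.items():
        for i, j in combinations(doc_ids, 2):
            connect(i, j)

    # 2) reply relationship: a comment is linked to the node it replies to
    for child, parent in replies:
        connect(child, parent)

    # 3) document similarity above a threshold becomes a weighted edge
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= sim_threshold:
                connect(i, j, sim[i][j])

    # self-loops, as is conventional before GCN normalisation
    for i in range(n):
        adj[i][i] = 1.0
    return adj
```

A GCN then propagates word and comment features over this adjacency, which is how a small number of labels can supervise the whole graph.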


2022 ◽  
Vol 2 (2) ◽  
pp. 90-95
Author(s):  
Muhammad Azmi

Plagiarism is the act of duplicating or imitating someone else's work and presenting it as one's own, without the author's permission and without citing the source. Plagiarism is not difficult to commit: with a copy-paste-modify technique applied to part or all of a document, the resulting document can be called a plagiarised or duplicated work.

The practice of plagiarism occurs because students are accustomed to taking the writings of others without citing the original source, sometimes even copying them in their entirety. Plagiarism is mostly committed by students, especially when completing a final project or thesis.

One way to counter the practice of plagiarism is to prevent and detect it. Plagiarism detection based on the concept of document similarity is one way to detect both copy-and-paste plagiarism and disguised plagiarism. An appropriate method for detecting plagiarism is to analyse the degree of document plagiarism using the cosine similarity method with TF-IDF term weighting. This research produces an application that computes the similarity value of the documents under test. Testing shows agreement between manual calculations and the algorithm as implemented in the application. The stemming library used proved quite effective in the stemming process: calculations with stemming yield a higher similarity value than calculations without stemming.
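The TF-IDF-plus-cosine-similarity pipeline named in this abstract can be written in a few lines. This is a generic sketch of the standard technique, not the application described above; tokenisation and stemming are assumed to have happened already.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute sparse TF-IDF vectors (dicts) for tokenised documents.
    Terms occurring in every document get idf = 0, i.e. no weight."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df[t]) for t in df}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * idf[t] for t in tf})
    return vectors

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Because stemming maps inflected forms onto one term, it increases term overlap between a source and its paraphrase, which is consistent with the higher similarity values reported above.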


2021 ◽  
Author(s):  
Harshit Jain ◽  
Naveen Pundir

India and many other countries, such as the UK, Australia, and Canada, follow the 'common law system', which gives substantial importance to prior related cases in determining the outcome of the current case. Better similarity methods can help in finding earlier similar cases, which can help lawyers searching for precedents. Prior approaches to computing the similarity of legal judgements use a basic representation, either a bag-of-words or a dense embedding learned using only the words present in the document. They, however, either neglect or do not emphasize the vital 'legal' information in the judgements, e.g. citations to prior cases, and act and article numbers or names. In this paper, we propose a novel approach to learning embeddings of legal documents using the citation network of the documents. Experimental results demonstrate that the learned embedding is on par with state-of-the-art methods for document similarity on a standard legal dataset.
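The intuition behind citation-based similarity can be illustrated with a much simpler proxy than the learned embeddings the paper proposes: compare two judgments by the overlap of the prior cases they cite. This sketch is an assumption-laden stand-in (Jaccard overlap of citation sets), not the paper's method.

```python
def citation_similarity(citations_a, citations_b):
    """Jaccard overlap of the prior cases two judgments cite --
    a minimal citation-network proxy for legal document similarity.
    Two judgments that cite many of the same precedents are likely
    to concern related legal questions, even if their wording differs."""
    a, b = set(citations_a), set(citations_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

A learned embedding generalises this idea: instead of requiring directly shared citations, it places documents near each other when they are close in the citation graph.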


2021 ◽  
pp. 146511652110644
Author(s):  
Maximilian Haag

Informal trilogue meetings are the main legislative bargaining forum in the European Union, yet their dynamics remain largely understudied in a quantitative context. This article builds on the assumption that the negotiating delegations of the European Parliament and the Council play a two-level game whereby these actors can use their intra-institutional constraint to extract inter-institutional bargaining success. Negotiators can credibly claim that their hands are tied if the members of their parent institutions hold similar preferences and do not accept alternative proposals or if their institution is divided and negotiators need to defend a fragile compromise. Employing a measure of document similarity (minimum edit distance) between an institution's negotiation mandate and the trilogue outcome to measure bargaining success, the analysis supports the hypothesis for the European Parliament, but not for the Council.
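The bargaining-success measure described here, minimum edit distance between a mandate and the trilogue outcome, can be sketched with the standard Levenshtein dynamic program. The normalisation into a success score is an illustrative assumption, not necessarily the article's exact operationalisation.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists),
    computed row by row to keep memory at O(len(b))."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

def bargaining_success(mandate, outcome):
    """1.0 when the outcome reproduces the mandate exactly, 0.0 when
    every position differs: a normalised edit-distance proxy."""
    dist = edit_distance(mandate, outcome)
    return 1.0 - dist / max(len(mandate), len(outcome), 1)
```

Running this over token sequences rather than raw characters makes the measure less sensitive to trivial rewording.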


2021 ◽  
Vol 11 (24) ◽  
pp. 12040
Author(s):  
Mustafa A. Al Sibahee ◽  
Ayad I. Abdulsada ◽  
Zaid Ameen Abduljabbar ◽  
Junchao Ma ◽  
Vincent Omollo Nyangaresi ◽  
...  

Applications for document similarity detection are widespread in diverse communities, including institutions and corporations. However, currently available detection systems fail to take into account the private nature of material or documents that have been outsourced to remote servers. None of the existing solutions can be described as lightweight techniques compatible with a lightweight client implementation, and this deficiency can limit the effectiveness of these systems. For instance, discovering similarity between the submissions of two conferences or journals must preserve the privacy of the submitted papers in a lightweight manner, so that the security and application requirements of resource-limited devices are fulfilled. This paper considers the problem of lightweight similarity detection between document sets while preserving the privacy of the material. The proposed solution permits documents to be compared without disclosing their content to untrusted servers. The fingerprint set of each document is computed efficiently, and an inverted index is built over the whole set of fingerprints. Before being uploaded to the untrusted server, this index is secured with the Paillier cryptosystem. This study develops a secure yet efficient method for scalable encrypted document comparison. To evaluate the computational performance of this method, the paper carries out several comparative assessments against other major approaches.
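The plaintext half of such a pipeline, turning each document into a fingerprint set before anything is encrypted, can be sketched with winnowing-style k-gram hashing. This is a generic fingerprinting sketch under assumed parameters (k, window size); the paper's Paillier encryption of the inverted index is deliberately omitted here.

```python
import hashlib

def fingerprints(text, k=5, window=4):
    """Winnowing-style fingerprint set: hash every k-gram of the
    normalised text, then keep the minimum hash in each sliding
    window of consecutive hashes."""
    text = "".join(text.lower().split())  # drop case and whitespace
    hashes = [
        int.from_bytes(hashlib.sha1(text[i:i + k].encode()).digest()[:8], "big")
        for i in range(len(text) - k + 1)
    ]
    if not hashes:
        return set()
    return {min(hashes[i:i + window])
            for i in range(max(1, len(hashes) - window + 1))}

def overlap(doc_a, doc_b):
    """Jaccard overlap of fingerprint sets -- the similarity score a
    server could compute without ever seeing the raw text."""
    fa, fb = fingerprints(doc_a), fingerprints(doc_b)
    return len(fa & fb) / len(fa | fb) if fa | fb else 0.0
```

Because only fixed-size hashes leave the client, the server-side index never needs the original wording, which is what makes an encrypted inverted index over these fingerprints feasible.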


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Didi Surian ◽  
Florence T. Bourgeois ◽  
Adam G. Dunn

Abstract Background Clinical trial registries can be used as sources of clinical evidence for systematic review synthesis and updating. Our aim was to evaluate methods for identifying clinical trial registrations that should be screened for inclusion in updates of published systematic reviews. Methods A set of 4644 clinical trial registrations (ClinicalTrials.gov) included in 1089 systematic reviews (PubMed) were used to evaluate two methods (document similarity and hierarchical clustering) and representations (L2-normalised TF-IDF, Latent Dirichlet Allocation, and Doc2Vec) for ranking 163,501 completed clinical trials by relevance. Clinical trial registrations were ranked for each systematic review using seeding clinical trials, simulating how new relevant clinical trials could be automatically identified for an update. Performance was measured by the number of clinical trials that need to be screened to identify all relevant clinical trials. Results Using the document similarity method with TF-IDF feature representation and Euclidean distance metric, all relevant clinical trials for half of the systematic reviews were identified after screening 99 trials (IQR 19 to 491). The best-performing hierarchical clustering was using Ward agglomerative clustering (with TF-IDF representation and Euclidean distance) and needed to screen 501 clinical trials (IQR 43 to 4363) to achieve the same result. Conclusion An evaluation using a large set of mined links between published systematic reviews and clinical trial registrations showed that document similarity outperformed hierarchical clustering for identifying relevant clinical trials to include in systematic review updates.
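The best-performing setup above, ranking candidate registrations by distance in TF-IDF space to already-included "seed" trials, can be sketched in a few lines. This is an illustrative reconstruction with sparse dict vectors and a seed centroid, not the authors' exact ranking code.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two sparse vectors stored as dicts."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u.get(t, 0.0) - v.get(t, 0.0)) ** 2 for t in keys))

def rank_candidates(seed_vecs, candidate_vecs):
    """Rank candidate trial registrations by distance to the centroid
    of the seed (already-included) trials; closest first. Screening
    the list in this order should surface relevant trials early."""
    centroid = {}
    for vec in seed_vecs:
        for t, w in vec.items():
            centroid[t] = centroid.get(t, 0.0) + w / len(seed_vecs)
    return sorted(range(len(candidate_vecs)),
                  key=lambda i: euclidean(centroid, candidate_vecs[i]))
```

The evaluation metric in the abstract then corresponds to the rank position of the last relevant trial in this ordering.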


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Meijing Li ◽  
Tianjie Chen ◽  
Keun Ho Ryu ◽  
Cheng Hao Jin

Semantic mining is always a challenge for big biomedical text data. Ontology has been widely proven useful for extracting semantic information. However, ontology-based semantic similarity calculation is so complex that it cannot scale to measuring similarity over big text data. To solve this problem, we propose a parallelized semantic similarity measurement method based on Hadoop MapReduce for big text data. First, we preprocess the documents and extract their semantic features. Then, we calculate document semantic similarity based on the ontology network structure under the MapReduce framework. Finally, based on the generated semantic document similarity, document clusters are generated via clustering algorithms. To validate the effectiveness, we use two kinds of open datasets. The experimental results show that the traditional methods can hardly handle more than ten thousand biomedical documents, whereas the proposed method remains efficient and accurate on big datasets and offers high parallelism and scalability.
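The MapReduce decomposition of pairwise similarity can be illustrated with plain functions for the map, shuffle, and reduce phases. This sketch runs sequentially and uses a simple shared-feature count in place of the paper's ontology-based weighting; on Hadoop, the framework would distribute the map and reduce calls across nodes.

```python
from collections import defaultdict
from itertools import combinations

def map_phase(doc_id, features):
    """Map: each document emits (feature, doc_id) pairs for its
    extracted semantic features."""
    return [(f, doc_id) for f in set(features)]

def shuffle(mapped):
    """Shuffle: group document ids by shared feature, as the
    MapReduce framework would between the two phases."""
    groups = defaultdict(set)
    for feature, doc_id in mapped:
        groups[feature].add(doc_id)
    return groups

def reduce_phase(groups):
    """Reduce: every pair of documents sharing a feature gains one
    count; the counts form an unnormalised semantic overlap score
    that a clustering step could consume."""
    scores = defaultdict(int)
    for doc_ids in groups.values():
        for a, b in combinations(sorted(doc_ids), 2):
            scores[(a, b)] += 1
    return dict(scores)
```

The key scalability point is that no single node ever needs the full pairwise similarity matrix; each reducer only sees documents that actually share a feature.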


2021 ◽  
Author(s):  
M. Saef Ullah Miah ◽  
Junaida Sulaiman ◽  
Saiful Azad ◽  
Kamal Z. Zamli ◽  
Rajan Jose
