Paraphrase Detection based on Vector Space Model: A Study of Utilization of Semantic Network for Improving Information

Author(s):  
. Nurwati ◽  
Yudi Santoso ◽  
Krisna Adiyarta
2013 ◽  
Vol 04 (04) ◽  
pp. 515-527 ◽  
Author(s):  
R. Ball ◽  
T. Botsis

Summary. Background: Establishing a Case Definition (CDef) is a first step in many epidemiological, clinical, surveillance, and research activities. The application of CDefs still relies on manual steps, and this is a major source of inefficiency in surveillance and research. Objective: Describe the need and propose an approach for automating the useful representation of CDefs for medical conditions. Methods: We translated the existing Brighton Collaboration CDef for anaphylaxis, relying mostly on the identification of synonyms for the criteria of the CDef using the NLM MetaMap tool. We also generated a CDef for the same condition using all the related PubMed abstracts, processing them with a text mining tool, and further treating the synonyms with the above strategy. The co-occurrence of anaphylaxis and any other medical term within the same sentence of the abstracts supported the construction of a large semantic network. The ‘islands’ algorithm reduced the network and revealed its densest region, including the nodes that were used to represent the key criteria of the CDef. We evaluated the ability of the “translated” and the “generated” CDef to classify a set of 6034 H1N1 reports for anaphylaxis using two similarity approaches and comparing them with our previous semi-automated classification approach. Results: Overall classification performance across approaches to producing CDefs was similar, with the generated CDef and vector space model with cosine similarity having the highest accuracy (0.825±0.003) and the semi-automated approach and vector space model with cosine similarity having the highest recall (0.809±0.042). Precision was low for all approaches. Conclusion: The useful representation of CDefs is a complicated task but potentially offers substantial gains in efficiency to support safety and clinical surveillance.
Citation: Botsis T, Ball R. Automating case definitions using literature-based reasoning. Appl Clin Inf 2013; 4: 515–527. http://dx.doi.org/10.4338/ACI-2013-04-RA-0028
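
The sentence-level co-occurrence step described in the Methods can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `cooccurrence_counts`, the naive sentence splitter, the substring matching, and the toy abstracts are all assumptions made for illustration, and the MetaMap and 'islands' steps are not reproduced.

```python
import re
from collections import Counter


def cooccurrence_counts(abstracts, target, vocabulary):
    """Count how often each vocabulary term shares a sentence with `target`.

    Hypothetical helper: terms are matched as lowercase substrings, which is
    far cruder than concept mapping with a tool such as MetaMap.
    """
    counts = Counter()
    for text in abstracts:
        # Naive split: a sentence ends at ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text.lower()):
            if target not in sentence:
                continue
            for term in vocabulary:
                if term != target and term in sentence:
                    counts[term] += 1
    return counts


# Toy abstracts (invented for illustration only).
abstracts = [
    "Anaphylaxis may present with urticaria. Hypotension was rare.",
    "Urticaria and wheezing accompanied anaphylaxis in two patients.",
]
edges = cooccurrence_counts(
    abstracts, "anaphylaxis", ["urticaria", "wheezing", "hypotension"]
)
```

Each count can be read as the weight of an edge between the target term and another term in a co-occurrence network; a pruning step would then operate on those weights.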


2021 ◽  
pp. 347-352
Author(s):  
Joko Samodra ◽  
Primardiana Hermilia Wijayati ◽  
. Rosyidah ◽  
Andika Agung Sutrisno

Finding information in a large collection of documents is a complicated task, which is why information retrieval systems are needed. Models that have been used in information retrieval systems include the Vector Space Model (VSM), DICE Similarity, Latent Semantic Indexing (LSI), the Generalized Vector Space Model (GVSM), and semantic-based information retrieval systems. The purpose of this study was to develop a semantic network-based search system that finds information based on keywords and on the semantic relationships of the keywords provided by users. Most search systems cannot do this because they work only on keyword matching or similarity. The Waterfall development model was used, which divides development into five stages: (1) requirements analysis and definition; (2) system and software design; (3) implementation and unit testing; (4) integration and system testing; and (5) operation and maintenance. The developed application was tested by searching for information with various combinations of user-provided keywords. The results showed that the system can find information that matches the keywords, as well as other relevant information based on the semantic relationships of these keywords. Keywords: information retrieval, search system, semantic network, web-based application
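
As a rough illustration of searching by keyword plus semantic relationships, the sketch below expands a query with the neighbours of each keyword in a small hand-written network before matching. The network contents, the documents, and the function names `expand_query` and `search` are hypothetical; the abstract does not describe the system's actual data model.

```python
def expand_query(keywords, network):
    """Return the keywords plus every term directly related to them."""
    expanded = set(keywords)
    for keyword in keywords:
        expanded.update(network.get(keyword, ()))
    return expanded


def search(documents, keywords, network):
    """Return ids of documents containing any original or related term."""
    terms = expand_query(keywords, network)
    return [
        doc_id
        for doc_id, text in documents.items()
        if terms & set(text.lower().split())
    ]


# Hypothetical one-hop semantic network: term -> related terms.
network = {"car": {"vehicle", "automobile"}}
documents = {
    "d1": "used car prices",
    "d2": "automobile safety ratings",
    "d3": "garden tools",
}
hits = search(documents, ["car"], network)
```

With an empty network the same call reduces to plain keyword matching, which makes the contribution of the semantic relationships easy to see.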


Author(s):  
Anthony Anggrawan ◽  
Azhari

Information retrieval is the task of searching for information based on a user's query, with the aim of finding the documents that match the user's need. This research uses the Vector Space Model to determine the similarity percentage of each student's assignment. The application is built with PHP and a MySQL database. The results are presented as a ranking of documents by their similarity to the query, with a mean average precision of 0.874. This value indicates how closely the application agrees with the assessment made by experts, and it was obtained from an evaluation with 5 queries compared against 25 sample documents. The time needed to compute the similarities increases with the number of assignments submitted.
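
For readers unfamiliar with the metric, mean average precision can be computed as in the sketch below. The rankings and relevance judgments are invented toy data, not the study's 5 queries and 25 documents, and the function names are assumptions.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of precision@k taken at each rank k that holds a relevant item."""
    relevant = set(relevant_ids)
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Mean of the average precision over (ranking, relevant set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy data: one (system ranking, expert-judged relevant set) pair per query.
runs = [
    (["a", "b", "c"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2
    (["x", "y"], {"y"}),            # AP = (1/2) / 1
]
map_score = mean_average_precision(runs)
```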


2018 ◽  
Vol 9 (2) ◽  
pp. 97-105
Author(s):  
Richard Firdaus Oeyliawan ◽  
Dennis Gunawan

A library is a facility that provides information and knowledge resources and supports readers academically. Because a library usually holds a very large number of books, readers often have difficulty finding the ones they need. Universitas Multimedia Nusantara uses the Senayan Library Management System (SLiMS) as its library catalogue. SLiMS has many features that help readers, but it has no recommendation feature to help readers find books relevant to a specific book they have chosen. The application was developed using the Vector Space Model to represent each document as a vector. Recommendations are based on the similarity of the book descriptions. In testing with a single-language sample of relevant books, the application reached an F-Measure of 55% with a cosine similarity threshold of 0.1. The book descriptions and the variety of languages affect the F-Measure obtained. Index Terms: Book Recommendation, Porter Stemmer, SLiMS Universitas Multimedia Nusantara, TF-IDF, Vector Space Model
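
A minimal sketch of description-based recommendation with TF-IDF vectors, a cosine similarity threshold, and an F-Measure check might look as follows. Only the 0.1 threshold comes from the abstract; the book descriptions, the function names, the raw-count TF with log(N/df) IDF weighting, and the omission of Porter stemming are assumptions for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors: raw term count times log(N / document frequency)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter(t for tokens in tokenized for t in set(tokens))
    return [
        {t: count * math.log(n / df[t]) for t, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(vectors, chosen, threshold=0.1):
    """Indices of other books whose similarity to `chosen` meets the threshold."""
    return [
        i for i, v in enumerate(vectors)
        if i != chosen and cosine(vectors[chosen], v) >= threshold
    ]


def f_measure(recommended, relevant):
    """Harmonic mean of precision and recall (F1)."""
    tp = len(set(recommended) & set(relevant))
    if tp == 0:
        return 0.0
    precision = tp / len(recommended)
    recall = tp / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Invented book descriptions (toy data).
descriptions = [
    "python programming for beginners",
    "advanced python programming",
    "cooking italian pasta",
    "italian cooking recipes",
]
vectors = tfidf_vectors(descriptions)
recs = recommend(vectors, chosen=0)
```

Raising the threshold trades recall for precision, which is one reason a reported F-Measure depends on the chosen cut-off.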


1985 ◽  
Vol 8 (2) ◽  
pp. 253-267
Author(s):  
S.K.M. Wong ◽  
Wojciech Ziarko

In information retrieval, it is common to model index terms and documents as vectors in a suitably defined vector space. The main difficulty with this approach is that the explicit representation of term vectors is not known a priori. For this reason, the vector space model adopted by Salton for the SMART system treats the terms as a set of orthogonal vectors. In such a model it is often necessary to adopt a separate, corrective procedure to take into account the correlations between terms. In this paper, we propose a systematic method (the generalized vector space model) to compute term correlations directly from the automatic indexing scheme. We also demonstrate how such correlations can be included with minimal modification in existing vector-based information retrieval systems.
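
The effect of dropping the orthogonality assumption can be shown with a simplified sketch. It estimates term correlations from term co-occurrence in documents (G = AᵀA for a document-term matrix A), which is a common simplification and not the construction used in the paper; all names and the toy matrix are assumptions.

```python
import math


def term_correlations(doc_term):
    """Term-term matrix G = A^T A from a document-term matrix A (rows = docs)."""
    n_terms = len(doc_term[0])
    return [
        [sum(row[i] * row[j] for row in doc_term) for j in range(n_terms)]
        for i in range(n_terms)
    ]


def bilinear(x, G, y):
    """Compute x^T G y."""
    return sum(
        x[i] * G[i][j] * y[j] for i in range(len(x)) for j in range(len(y))
    )


def correlated_similarity(q, d, G):
    """Cosine-style similarity under the inner product induced by G."""
    denom = math.sqrt(bilinear(q, G, q) * bilinear(d, G, d))
    return bilinear(q, G, d) / denom if denom else 0.0


# Toy collection; columns are the terms [car, automobile, banana].
doc_term = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
G = term_correlations(doc_term)
query = [1, 0, 0]  # contains only "car"
doc = [0, 1, 0]    # contains only "automobile"

plain = sum(a * b for a, b in zip(query, doc))   # orthogonal-term inner product
correlated = correlated_similarity(query, doc, G)
```

Under orthogonal terms the query and document share nothing, while the correlation matrix links them because "car" and "automobile" co-occur in the collection.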

