Automating case definitions using literature-based reasoning

2013 ◽  
Vol 04 (04) ◽  
pp. 515-527 ◽  
Author(s):  
R. Ball ◽  
T. Botsis

Summary

Background: Establishing a Case Definition (CDef) is a first step in many epidemiological, clinical, surveillance, and research activities. The application of CDefs still relies on manual steps, and this is a major source of inefficiency in surveillance and research.

Objective: Describe the need for, and propose an approach to, automating the useful representation of CDefs for medical conditions.

Methods: We translated the existing Brighton Collaboration CDef for anaphylaxis, mostly by identifying synonyms for the criteria of the CDef using the NLM MetaMap tool. We also generated a CDef for the same condition from all related PubMed abstracts, processing them with a text mining tool and treating the synonyms with the above strategy. The co-occurrence of anaphylaxis and any other medical term within the same sentence of the abstracts supported the construction of a large semantic network. The 'islands' algorithm reduced the network and revealed its densest region, including the nodes used to represent the key criteria of the CDef. We evaluated the ability of the "translated" and the "generated" CDef to classify a set of 6,034 H1N1 reports for anaphylaxis using two similarity approaches, comparing them with our previous semi-automated classification approach.

Results: Overall classification performance across approaches to producing CDefs was similar, with the generated CDef and vector space model with cosine similarity having the highest accuracy (0.825±0.003), and the semi-automated approach and vector space model with cosine similarity having the highest recall (0.809±0.042). Precision was low for all approaches.

Conclusion: The useful representation of CDefs is a complicated task but potentially offers substantial gains in efficiency to support safety and clinical surveillance.

Citation: Botsis T, Ball R. Automating case definitions using literature-based reasoning. Appl Clin Inf 2013; 4: 515–527. http://dx.doi.org/10.4338/ACI-2013-04-RA-0028
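The vector space model with cosine similarity used for classification above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the CDef criteria terms and the report narrative are invented examples, raw word counts stand in for the MetaMap-derived synonym weighting, and the 0.2 threshold is an arbitrary placeholder that a real system would tune.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical CDef criteria terms and a hypothetical report narrative
cdef_vector = Counter("urticaria hypotension bronchospasm angioedema".split())
report_vector = Counter(
    "patient developed urticaria and hypotension after vaccination".split()
)

score = cosine_similarity(cdef_vector, report_vector)
# Classify the report as a potential case if similarity exceeds a tuned cutoff
is_case = score > 0.2
```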

JOUTICA ◽  
2021 ◽  
Vol 6 (2) ◽  
pp. 506
Author(s):  
Mustain Mustain

This research was motivated by the difficulty of organizing conventional questionnaire data. A system was therefore built to automatically group questionnaire data, complete with the sentiment it contains. The dataset used in this study is questionnaire data from the Muhammadiyah Lamongan hospital; the study handles only text-form questionnaires. Paper questionnaires were transcribed and entered into a database, complete with work-unit category and sentiment labels. The dataset then underwent pre-processing consisting of negation handling, case folding, tokenizing, filtering, and stemming. As test data, questionnaire comments are pre-processed, and document similarity is then computed using the K-Nearest Neighbor method and the Vector Space Model. The amount of data handled affects system performance, particularly classification accuracy and speed. The system outputs a ranking of the documents most similar to the dataset, ordered by cosine similarity. Classification tests by category class achieved 91% accuracy, tests by sentiment class achieved 94%, and the combination of both achieved 86%.
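The KNN-plus-VSM classification step can be sketched in outline. The labelled comments, the k value, and the single combined label are illustrative assumptions; the paper's actual system applies its stemming and negation-handling pipeline first and labels work-unit category and sentiment separately.

```python
import math
from collections import Counter

def tf_vector(text):
    """Term-frequency vector of a (pre-processed) comment."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_classify(query, dataset, k=3):
    """Rank labelled comments by cosine similarity, then vote among the top k."""
    q = tf_vector(query)
    ranked = sorted(dataset, key=lambda item: cosine(q, tf_vector(item[0])),
                    reverse=True)
    top = [label for _, label in ranked[:k]]
    return max(set(top), key=top.count)

# Hypothetical labelled questionnaire comments (sentiment only, for brevity)
data = [
    ("the nurses were friendly and helpful", "positive"),
    ("waiting time at the pharmacy was too long", "negative"),
    ("doctors explained the treatment clearly", "positive"),
    ("the pharmacy queue was slow and crowded", "negative"),
]

label = knn_classify("the pharmacy waiting line was very long", data, k=3)
```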


2017 ◽  
Vol 15 (2) ◽  
Author(s):  
Stephanie Betha R.H

Multiple membership is the membership a person holds in several communities. For documents, multiple membership means that a document can contain content from several categories. The categories of a document can be determined by measuring the document's similarity to the existing categories. The Vector Space Model measures the similarity between a document and a query by representing each document in a collection as a point in a vector space. The result of this measurement is the cosine similarity between the query vector of the document and each category vector. The problem is that such a measurement can yield cosine similarity values that differ only slightly between one category vector and another, which makes both category vectors mutually dominant for the document. A threshold is therefore needed to decide when category vectors should be declared mutually dominant. This threshold is set using K-Means clustering, based on grouping the distances between the cosine similarity percentages of a document. Multiple membership is determined from the title and keyword attributes of scientific publication documents.
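The thresholding idea can be sketched as follows. This is a minimal reading of the approach under stated assumptions: a tiny one-dimensional 2-means stands in for a full K-Means library, and the similarity values are invented; the actual paper clusters the gaps between similarity percentages of real title/keyword vectors.

```python
def two_means_1d(values, iters=20):
    """Tiny 1-D 2-means: split values into 'small' and 'large' clusters."""
    lo, hi = min(values), max(values)
    for _ in range(iters):
        small = [v for v in values if abs(v - lo) <= abs(v - hi)]
        large = [v for v in values if abs(v - lo) > abs(v - hi)]
        lo = sum(small) / len(small)
        hi = sum(large) / len(large) if large else hi
    return lo, hi

# Hypothetical cosine similarities of one document against four categories
sims = {"informatics": 0.62, "linguistics": 0.58, "biology": 0.21, "art": 0.15}
ordered = sorted(sims.values(), reverse=True)
gaps = [a - b for a, b in zip(ordered, ordered[1:])]

small_c, large_c = two_means_1d(gaps)
threshold = (small_c + large_c) / 2

# Categories before the first "large" gap count as mutually dominant memberships
memberships = [ordered[0]]
for gap, value in zip(gaps, ordered[1:]):
    if gap > threshold:
        break
    memberships.append(value)
```

Here the first two categories are separated only by a "small" gap, so the document receives both memberships.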


2019 ◽  
Vol 11 (5) ◽  
pp. 114 ◽  
Author(s):  
Korawit Orkphol ◽  
Wu Yang

Words have different meanings (i.e., senses) depending on the context. Disambiguating the correct sense is an important and challenging task for natural language processing. An intuitive way is to select the sense whose definition, as provided by WordNet, a large lexical database of English, has the highest similarity to the context. In this database, nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms interlinked through conceptual semantics and lexical relations. Traditional unsupervised approaches compute similarity by counting words that overlap between the context and the sense definitions, and these words must match exactly. Similarity should instead be computed from how words are related rather than from exact overlap, by representing the context and sense definitions in a vector space model and analyzing the distributional semantic relationships among them using latent semantic analysis (LSA). As a corpus of text grows, however, LSA consumes much more memory and does not scale flexibly to huge corpora. A word-embedding approach has an advantage here. Word2vec is a popular word-embedding approach that represents words in a fixed-size vector space through either the skip-gram or continuous bag-of-words (CBOW) model, and it captures semantic and syntactic word similarities from a huge corpus more effectively than LSA. Our method uses Word2vec to construct a context sentence vector and sense definition vectors, then scores each word sense by the cosine similarity between those sentence vectors. The sense definitions are also expanded with sense relations retrieved from WordNet. If a score does not exceed a specific threshold, it is combined with the probability of that sense's distribution learned from SEMCOR, a large sense-tagged corpus. The senses with high scores are taken as the possible answers. Our method achieves a result (50.9%, or 48.7% without the probability of sense distribution) higher than the baselines (i.e., original, simplified, adapted, and LSA Lesk) and outperforms many unsupervised systems that participated in the SENSEVAL-3 English lexical sample task.
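The scoring step can be sketched as follows. The toy two-dimensional vectors and sense keys stand in for real Word2vec embeddings and WordNet glosses; the actual method also expands glosses with sense relations and backs off to SEMCOR sense frequencies, which this sketch omits.

```python
import math

# Hypothetical pre-trained word vectors; a real system loads Word2vec embeddings
vectors = {
    "bank": [0.5, 0.5], "river": [0.9, 0.1], "water": [0.8, 0.2],
    "money": [0.1, 0.9], "deposit": [0.2, 0.8], "shore": [0.85, 0.15],
    "institution": [0.15, 0.85],
}

def sentence_vector(words):
    """Average the vectors of known words into one sentence embedding."""
    known = [vectors[w] for w in words if w in vectors]
    return [sum(dim) / len(known) for dim in zip(*known)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

context = sentence_vector("the bank of the river had shallow water".split())

# Hypothetical sense glosses keyed by WordNet-style sense names
senses = {
    "bank.n.01": sentence_vector("financial institution that accepts money deposit".split()),
    "bank.n.02": sentence_vector("sloping land beside a river shore water".split()),
}
best = max(senses, key=lambda s: cosine(context, senses[s]))
```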


2021 ◽  
pp. 347-352
Author(s):  
Joko Samodra ◽  
Primardiana Hermilia Wijayati ◽  
Rosyidah ◽  
Andika Agung Sutrisno

Finding information from a large collection of documents is a complicated task; therefore, we need a method called an information retrieval system. Several models that have been used in information retrieval systems include the Vector Space Model (VSM), DICE Similarity, Latent Semantic Indexing (LSI), Generalized Vector Space Model (GVSM), and semantic-based information retrieval systems. The purpose of this study was to develop a semantic network-based search system that will find information based on keywords and the semantic relationships of the keywords provided by users. This cannot be done by most search systems, which only work based on keyword matching or similarities. The Waterfall development model was used, which divides the development stages into five steps, namely: (1) requirements analysis and definition; (2) system and software design; (3) implementation and unit testing; (4) integration and system testing; and (5) operation and maintenance. The developed system/application was tested by trying to find information based on various combinations of keywords provided by the user. The results showed that the system can find information that matches the keyword, as well as other relevant information based on the semantic relationships of these keywords.

Keywords: information retrieval, search system, semantic network, web-based application
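The core retrieval idea, matching documents through semantic relationships rather than literal keyword overlap, can be sketched as follows. The relation map, documents, and depth limit are invented examples; the study's actual semantic network and web application are not described at this level of detail.

```python
from collections import deque

# Hypothetical semantic network: each keyword maps to related terms
network = {
    "painting": ["art", "canvas"],
    "art": ["sculpture", "gallery"],
    "canvas": [],
    "sculpture": [],
    "gallery": ["museum"],
    "museum": [],
}

def expand_keywords(seed, depth=2):
    """Breadth-first expansion of a keyword through its semantic relations."""
    seen, queue = {seed}, deque([(seed, 0)])
    while queue:
        term, d = queue.popleft()
        if d == depth:
            continue
        for related in network.get(term, []):
            if related not in seen:
                seen.add(related)
                queue.append((related, d + 1))
    return seen

documents = {
    "doc1": "a gallery exhibition of modern sculpture",
    "doc2": "annual report on hospital budgets",
}

# "painting" matches doc1 even though the word never appears in it
terms = expand_keywords("painting")
hits = [doc for doc, text in documents.items() if terms & set(text.split())]
```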
