vector space model
Recently Published Documents


TOTAL DOCUMENTS

484
(FIVE YEARS 104)

H-INDEX

24
(FIVE YEARS 3)

Author(s):  
Jiffriya Mohamed Abdul Cader ◽  
Roshan G. Ragel ◽  
Hasindu Gamaarachchi ◽  
Akmal Jahan Mohamed Abdul Cader

2021 ◽  
Vol 2 (2) ◽  
pp. 114-121
Author(s):  
Ayuni Asistyasari ◽  
Bibit Sudarsono ◽  
Umi Faddilah

News circulating in print or mainstream media shapes public opinion on an issue, whether the information is positive or negative, and today's information technology lets information spread and stay up to date every day. The more easily information spreads, the more easily it influences life in today's society. In reality, however, not all information circulating in the media is true; some of it is hoax news. This study aims to classify hoax news in an information retrieval system using the vector space model to verify whether a news item is a hoax or not. The study achieved its best classification accuracy at K-6, namely 83%, meaning the system can validate whether a news item is genuine or a hoax with 83% accuracy.
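The K-6 setting above suggests a k-nearest-neighbor vote over vector space model representations. A minimal sketch of that idea, using scikit-learn with made-up toy headlines and labels (not the study's data or pipeline):

```python
# Toy sketch: TF-IDF document vectors plus a k=6 nearest-neighbor vote,
# mirroring the K-6 configuration reported in the abstract.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical labeled headlines: 1 = hoax, 0 = legitimate.
train_docs = [
    "miracle cure discovered doctors hate it",
    "government confirms new vaccination schedule",
    "celebrity secretly replaced by clone",
    "central bank announces interest rate decision",
    "aliens built the pyramids scientists admit",
    "city council approves new transit budget",
    "drinking bleach prevents infection",
    "university publishes annual research report",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(train_docs)

# Cosine distance on TF-IDF vectors is the usual VSM similarity measure;
# distance-weighted voting lets close neighbors dominate the decision.
knn = KNeighborsClassifier(n_neighbors=6, metric="cosine", weights="distance")
knn.fit(X, labels)

query = vectorizer.transform(["miracle cure aliens admit doctors hate it"])
print(knn.predict(query)[0])
```

A real system would of course train on a labeled hoax-news corpus and tune k on held-out data.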


Author(s):  
Roman Shaptala ◽  
Gennadiy Kyselov

In this study, we explore and compare two ways of vector space model creation for Kyiv city petitions. Both models are built on top of word vectors based on the distributional hypothesis, namely Word2Vec and FastText. We train word vectors on the dataset of Kyiv city petitions, preprocess the documents, and apply averaging to create petition vectors. Visualizations of the vector spaces after dimensionality reduction via UMAP are demonstrated in an attempt to show their overall structure. We show that the resulting models can be used to effectively query semantically related petitions as well as search for clusters of related petitions. The advantages and disadvantages of both models are analyzed.
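The averaging step that turns word vectors into petition vectors can be sketched with toy embeddings (the names and 4-dimensional vectors below are hypothetical; real vectors would come from Word2Vec or FastText trained on the petition corpus):

```python
import numpy as np

# Toy 4-dimensional word vectors standing in for trained embeddings.
word_vectors = {
    "repair": np.array([0.9, 0.1, 0.0, 0.2]),
    "road":   np.array([0.8, 0.2, 0.1, 0.1]),
    "school": np.array([0.1, 0.9, 0.3, 0.0]),
    "build":  np.array([0.5, 0.5, 0.2, 0.1]),
}

def petition_vector(tokens, vectors):
    """Average the vectors of known tokens to get one document vector."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:  # no token in the vocabulary: return a zero vector
        return np.zeros(next(iter(vectors.values())).shape)
    return np.mean(known, axis=0)

# Unknown token "pothole" is simply skipped before averaging.
doc = petition_vector(["repair", "road", "pothole"], word_vectors)
print(doc)
```

FastText would handle the out-of-vocabulary token via subword vectors instead of skipping it, which is one of the trade-offs the study compares.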


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Cheng Ye ◽  
Bradley A. Malin ◽  
Daniel Fabbri

Abstract Background Information retrieval (IR) helps clinicians answer questions posed to large collections of electronic medical records (EMRs), such as how best to identify a patient’s cancer stage. One of the more promising approaches to IR for EMRs is to expand a keyword query with similar terms (e.g., augmenting cancer with mets). However, there is a large range of clinical chart review tasks, so a fixed set of similar terms is insufficient. Current language models, such as Bidirectional Encoder Representations from Transformers (BERT) embeddings, do not capture the full non-textual context of a task. In this study, we present new methods that provide similar terms dynamically by adjusting to the context of the chart review task. Methods We introduce a medical-context vector space in which each word is represented by a vector that captures the word’s usage in different medical contexts (e.g., how frequently cancer is used when ordering a prescription versus describing family history) beyond the context learned from the surrounding text. These vectors are transformed into a vector space for customizing the set of similar terms selected for different chart review tasks. We evaluate the vector space model with multiple chart review tasks, in which supervised machine learning models learn to predict the preferred terms of clinically knowledgeable reviewers. To quantify the usefulness of the predicted similar terms relative to a baseline of standard word2vec embeddings, we measure (1) the prediction performance of the medical-context vector space model using the area under the receiver operating characteristic curve (AUROC) and (2) the labeling effort required to train the models. Results The vector space outperformed the baseline word2vec embeddings in all three chart review tasks with an average AUROC of 0.80 versus 0.66, respectively.
Additionally, the medical-context vector space significantly reduced the number of labels required to learn and predict the preferred similar terms of reviewers. Specifically, the labeling effort was reduced to 10% of the entire dataset in all three tasks. Conclusions The set of preferred similar terms that are relevant to a chart review task can be learned by leveraging the medical context of the task.
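The AUROC metric used above can be computed with scikit-learn; a minimal sketch with hypothetical reviewer labels (1 = term preferred for the task) and model scores for four candidate similar terms:

```python
from sklearn.metrics import roc_auc_score

# Hypothetical ground-truth labels and model scores for four candidate
# similar terms; AUROC measures how well the scores rank positives first.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc_score(labels, scores)
print(auc)  # 0.75
```

Here 3 of the 4 positive/negative score pairs are correctly ordered, giving an AUROC of 0.75; a perfect ranker scores 1.0 and a random one about 0.5.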


2021 ◽  
Author(s):  
Sukisno Sukisno

The study in this book aims to help users categorize the documents they need quickly and accurately. With an application for document categorization that applies the Nazief-Adriani stemming algorithm and the K-Nearest Neighbor algorithm, it is expected that categorizing documents becomes easier and that users can more easily find documents based on the degree of similarity between a test document and the learning documents.
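The similarity step in such a pipeline is typically cosine similarity over term-frequency vectors. A minimal pure-Python sketch (the stemming step is omitted; a real pipeline would apply Nazief-Adriani to the tokens first, and the example tokens are made up):

```python
import math
from collections import Counter

def cosine_similarity(doc_a, doc_b):
    """Cosine similarity between two token lists via term-frequency vectors."""
    tf_a, tf_b = Counter(doc_a), Counter(doc_b)
    shared = set(tf_a) & set(tf_b)
    dot = sum(tf_a[t] * tf_b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Two of three stemmed tokens overlap, so similarity is 2/3.
test_doc = ["sistem", "informasi", "dokumen"]
learning_doc = ["sistem", "dokumen", "arsip"]
print(round(cosine_similarity(test_doc, learning_doc), 3))
```

K-Nearest Neighbor categorization then assigns the test document the majority category among the k learning documents with the highest similarity.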


Author(s):  
Yongmin Yoo ◽  
Dongjin Lim ◽  
Kyungsun Kim

Thanks to the rapid development of artificial intelligence in recent years, AI technology now contributes to many parts of society, with a very large impact on education, the environment, medical care, the military, tourism, the economy, politics, and more. In education, for example, artificial intelligence tutoring systems automatically assign tutors based on a student's level. In economics, quantitative investment methods automatically analyze large amounts of data to find investment rules, build investment models, or predict changes in financial markets. Because artificial intelligence technology is used in so many fields, it is very important to know exactly which factors have an important influence in each field and how the fields are related to each other, which requires analyzing artificial intelligence technology field by field. In this paper, we analyze patent documents related to artificial intelligence technology and propose a method for keyword analysis within factors using artificial intelligence patent data sets. The model relies on feature engineering based on the deep learning model KeyBERT and uses a vector space model. A case study of collecting and analyzing artificial intelligence patent data shows how the proposed model can be applied to real-world problems.
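KeyBERT's core idea can be sketched without the library itself: embed the document and each candidate term, then rank candidates by cosine similarity to the document vector. The toy 3-dimensional embeddings below are made up stand-ins for the BERT sentence embeddings KeyBERT actually uses:

```python
import numpy as np

# Toy embeddings standing in for BERT vectors; KeyBERT embeds the document
# and candidate terms with a sentence-transformer model instead.
embeddings = {
    "patent":  np.array([0.9, 0.2, 0.1]),
    "neural":  np.array([0.2, 0.9, 0.3]),
    "network": np.array([0.3, 0.8, 0.4]),
    "banana":  np.array([0.0, 0.1, 0.9]),
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Document vector: average of its token embeddings. Candidates most
# similar to the document vector become the extracted keywords.
doc_vec = np.mean([embeddings[t] for t in ["neural", "network", "patent"]], axis=0)
ranked = sorted(embeddings, key=lambda t: cosine(embeddings[t], doc_vec), reverse=True)
print(ranked[0])
```

The off-topic candidate ("banana") falls to the bottom of the ranking, which is exactly the filtering behavior keyword extraction relies on.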


2021 ◽  
Vol 10 (2) ◽  
Author(s):  
Eka Sabna

The collection of stored student thesis titles is large and keeps growing, and finding information among those titles is becoming difficult. For this reason, search methods known as information retrieval were developed. Information retrieval methods have long been known; one of the most widely used, thanks to its ease of implementation, is the vector space model (VSM). The aim of this study is to describe the process of searching digital documents with the vector space model. In this model, the thesis-title data is tokenized and indexed; a search is then performed with the given keywords, which are compared against the data in the thesis-title document file, so that the best-matching titles are found and correct information is produced.
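The tokenize-and-index step described above can be sketched as a small inverted index (the thesis titles below are hypothetical placeholders for the real collection):

```python
from collections import defaultdict

# Hypothetical thesis titles standing in for the stored collection.
titles = {
    1: "sistem informasi perpustakaan berbasis web",
    2: "penerapan vector space model untuk pencarian dokumen",
    3: "analisis sentimen media sosial dengan naive bayes",
}

# Indexing step: map each token to the set of titles that contain it.
index = defaultdict(set)
for doc_id, title in titles.items():
    for token in title.lower().split():  # tokenization step
        index[token].add(doc_id)

def search(query):
    """Return IDs of titles containing every query keyword."""
    sets = [index.get(tok, set()) for tok in query.lower().split()]
    return set.intersection(*sets) if sets else set()

print(search("pencarian dokumen"))  # {2}
```

A full VSM system would additionally weight the matched terms (e.g., with TF-IDF) and rank the results by similarity to the query rather than returning an unranked set.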


SinkrOn ◽  
2021 ◽  
Vol 6 (1) ◽  
pp. 69-79
Author(s):  
Bita Parga Zen ◽  
Irwan Susanto ◽  
Dian Finaliamartha

Advances in information technology have made the internet part of everyday life for the general public. Online news sites are one of the technologies developed to disseminate the latest information in the world. In terms of numbers, newsreaders are well served in getting the information they want; however, the amount of information collected leads to an information explosion and possible information redundancy. A search system is one solution expected to help find the information relevant to an input query. The methods commonly used in this case are TF-IDF and the VSM (Vector Space Model), which weight terms statistically over a collection of documents. In this study, a search for information about the Covid-19 vaccine in kompas.com news was carried out: the text was tokenized, stopword removal (filtering) discarded unnecessary words, which usually consist of conjunctions and the like, and stemming reduced inflected words to their base form. The TF-IDF and VSM calculations were then carried out, and the final ranking is: news document 3 (DOC 3) with a weight of 5.914226424; news document 2 (DOC 2) with a weight of 1.767692186; news document 5 (DOC 5) with a weight of 1.550165096; news document 4 (DOC 4) with a weight of 1.17141223; and last, news document 1 (DOC 1) with a weight of 0.5244103739.
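The TF-IDF weighting behind such a ranking can be sketched in a few lines (the toy token lists below are made up; the real study uses five preprocessed kompas.com articles):

```python
import math

# Toy corpus of five token lists standing in for the preprocessed articles.
docs = {
    "DOC1": ["vaksin", "covid", "program"],
    "DOC2": ["vaksin", "covid", "vaksin", "distribusi"],
    "DOC3": ["vaksin", "vaksin", "vaksin", "covid", "efektivitas"],
    "DOC4": ["ekonomi", "pandemi", "covid"],
    "DOC5": ["vaksin", "harga", "pasar"],
}
query = ["vaksin", "covid"]

N = len(docs)
# Document frequency of each query term across the corpus.
df = {t: sum(t in toks for toks in docs.values()) for t in query}

def score(tokens):
    # VSM-style document weight: sum over query terms of tf * idf,
    # with idf = log10(N / df) as in the classic TF-IDF scheme.
    return sum(tokens.count(t) * math.log10(N / df[t]) for t in query)

ranking = sorted(docs, key=lambda d: score(docs[d]), reverse=True)
print(ranking[0])
```

Documents that repeat the query terms most often accumulate the highest summed weight, which is why one document can dominate the ranking the way DOC 3 does in the study's results.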

