Time Frame Detection Based on Online News Documents Using Vector Space Model

Extracting information from a large amount of structured data requires expensive computing. The Vector Space Model method works by mapping words in continuous vector space where semantically similar words are mapped in adjacent vector spaces. The Vector Space Model model assumes words that appear in the same context, having the same semantic meaning. In the implementation, there are two different approaches: counting methods (eg: Latent Semantic Analysis) and predictive methods (eg Neural Probabilistic Language Model). This study aims to apply Word2Vec method using the Continuous Bag of Words approach in Indonesian language. Research data was obtained by crawling on several online news portals. The expected result of the research is the Indonesian words vector mapping based on the data used.Keywords: vector space model, word to vector, Indonesian vector space model.Ekstraksi informasi dari sekumpulan data terstruktur dalam jumlah yang besar membutuhkan komputasi yang mahal. Metode Vector Space Model bekerja dengan cara memetakan kata-kata dalam ruang vektor kontinu dimana kata-kata yang serupa secara semantis dipetakan dalam ruang vektor yang berdekatan. Metode Vector Space Model mengasumsikan kata-kata yang muncul pada konteks yang sama, memiliki makna semantik yang sama. Dalam penerapannya ada dua pendekatan yang berbeda yaitu: metode yang berbasis hitungan (misal: Latent Semantic Analysis) dan metode prediktif (misalnya Neural Probabilistic Language Model). Penelitian ini bertujuan untuk menerapkan metode Word2Vec menggunakan pendekatan Continuous Bag Of Words model dalam Bahasa Indonesia. Data penelitian yang digunakan didapatkan dengan cara crawling pada berberapa portal berita online. Hasil penelitian yang diharapkan adalah pemetaan vektor kata Bahasa Indonesia berdasarkan data yang digunakan.Kata Kunci: vector space model, word to vector, vektor kata bahasa Indonesia.

Download Full-text

TF-IDF Method and Vector Space Model Regarding the Covid-19 Vaccine on Online News

SinkrOn ◽

10.33395/sinkron.v6i1.11179 ◽

2021 ◽

Vol 6 (1) ◽

pp. 69-79

Author(s):

Bita Parga Zen ◽

Irwan Susanto ◽

Dian Finaliamartha

Keyword(s):

Vector Space ◽

Vector Space Model ◽

Relevant Information ◽

Online News ◽

Basic Form ◽

Information Redundancy ◽

Search System ◽

Space Model ◽

Use Of The Internet ◽

Online News Sites

Advances in information and technology have caused the use of the internet to be a concern of the general public. Online news sites are one of the technologies that have developed as a means of disseminating the latest information in the world. When viewed in terms of numbers, newsreaders are very sufficient to get the desired information. However, with this, the amount of information collected will result in an explosion of information and the possibility of information redundancy. The search system is one of the solutions which expected to help in finding the desired or relevant information by the input query. The methods commonly used in this case are TF-IDF and VSM (Vector Space Model) which are used in weighting to measure statistics from a collection of documents on the search for some information about the Covid 19 vaccine on kompas.com news then tokenizing it to separate the text, stopword removal or filtering to remove unnecessary words which usually consist of conjunctions and others. The next step is sentence stemming which aims to eliminate word inflection to its basic form. Then the TF-IDF and VSM calculations were carried out and the final result are news documents 3 (DOC 3) with a weight of 5.914226424; news documents 2 (DOC 2) with a weight of 1.767692186; news documents 5 (DOC 5) with weights 1.550165096; news document 4 (DOC 4) with a weight of 1.17141223;, and the last is news document 1 (DOC 1) with a weight of 0.5244103739.

Download Full-text

Extended Vector Space Model with Semantic Relatedness on Java Archive Search Engine

Jurnal Teknik Informatika dan Sistem Informasi ◽

10.28932/jutisi.v1i2.372 ◽

2015 ◽

Vol 1 (2) ◽

Cited By ~ 2

Author(s):

Oscar Karnalim

Keyword(s):

Vector Space ◽

Search Engine ◽

Vector Space Model ◽

Semantic Relatedness ◽

Space Model

Download Full-text

Aplikasi Deteksi Kemiripan Tugas Paper

Matrik Jurnal Manajemen Teknik Informatika dan Rekayasa Komputer ◽

10.30812/matrik.v15i2.39 ◽

2017 ◽

Vol 15 (2) ◽

pp. 5

Author(s):

Anthony Anggrawan ◽

Azhari

Keyword(s):

Information Retrieval ◽

Vector Space ◽

Vector Space Model ◽

Mean Average Precision ◽

Average Precision ◽

Information Searching ◽

Space Model ◽

Model Method

Information searching based on users’ query, which is hopefully able to find the documents based on users’ need, is known as Information Retrieval. This research uses Vector Space Model method in determining the similarity percentage of each student’s assignment. This research uses PHP programming and MySQL database. The finding is represented by ranking the similarity of document with query, with mean average precision value of 0,874. It shows how accurate the application with the examination done by the experts, which is gained from the evaluation with 5 queries that is compared to 25 samples of documents. If the number of counted assignments has higher similarity, thus the process of similarity counting needs more time, it depends on the assignment’s number which is submitted.

Download Full-text

Aplikasi Rekomendasi Buku Pada Katalog Perpustakaan Universitas Multimedia Nusantara Menggunakan Vector Space Model

Jurnal ULTIMATICS ◽

10.31937/ti.v9i2.639 ◽

2018 ◽

Vol 9 (2) ◽

pp. 97-105

Author(s):

Richard Firdaus Oeyliawan ◽

Dennis Gunawan

Keyword(s):

Vector Space ◽

Vector Space Model ◽

Vector Model ◽

Library Management ◽

Space Model ◽

Library Management System ◽

Index Terms ◽

Library Catalogue ◽

Language Sample ◽

F Measure

Library is one of the facilities which provides information, knowledge resource, and acts as an academic helper for readers to get the information. The huge number of books which library has, usually make readers find the books with difficulty. Universitas Multimedia Nusantara uses the Senayan Library Management System (SLiMS) as the library catalogue. SLiMS has many features which help readers, but there is still no recommendation feature to help the readers finding the books which are relevant to the specific book that readers choose. The application has been developed using Vector Space Model to represent the document in vector model. The recommendation in this application is based on the similarity of the books description. Based on the testing phase using one-language sample of the relevant books, the F-Measure value gained is 55% using 0.1 as cosine similarity threshold. The books description and variety of languages affect the F-Measure value gained. Index Terms—Book Recommendation, Porter Stemmer, SLiMS Universitas Multimedia Nusantara, TF-IDF, Vector Space Model

Download Full-text

First Movers and Follow-on Invention: Evidence from a Vector Space Model of Invention

SSRN Electronic Journal ◽

10.2139/ssrn.3354530 ◽

2019 ◽

Cited By ~ 1

Author(s):

Kenneth A. Younge ◽

Jeffrey M. Kuhn

Keyword(s):

Vector Space ◽

Vector Space Model ◽

Space Model

Download Full-text

Topic detections in Arabic Dark websites using improved Vector Space Model

2012 4th Conference on Data Mining and Optimization (DMO) ◽

10.1109/dmo.2012.6329790 ◽

2012 ◽

Cited By ~ 9

Author(s):

Hanan M. Alghamdi ◽

Ali Selamat

Keyword(s):

Vector Space ◽

Vector Space Model ◽

Space Model

Download Full-text

A relational vector-space model of information retrieval adapted to images

ACM SIGIR Forum ◽

10.1145/1067268.1067292 ◽

2005 ◽

Vol 39 (1) ◽

pp. 62-62

Author(s):

Jean Martinet

Keyword(s):

Information Retrieval ◽

Vector Space ◽

Vector Space Model ◽

Space Model

Download Full-text

On Generalized Vector Space Model in Information Retrieval

Fundamenta Informaticae ◽

10.3233/fi-1985-8207 ◽

1985 ◽

Vol 8 (2) ◽

pp. 253-267

Author(s):

S.K.M. Wong ◽

Wojciech Ziarko

Keyword(s):

Information Retrieval ◽

Vector Space ◽

A Priori ◽

Vector Space Model ◽

Smart System ◽

Space Model ◽

Retrieval Systems ◽

Information Retrieval Systems ◽

Index Terms ◽

Minimal Modification

In information retrieval, it is common to model index terms and documents as vectors in a suitably defined vector space. The main difficulty with this approach is that the explicit representation of term vectors is not known a priori. For this reason, the vector space model adopted by Salton for the SMART system treats the terms as a set of orthogonal vectors. In such a model it is often necessary to adopt a separate, corrective procedure to take into account the correlations between terms. In this paper, we propose a systematic method (the generalized vector space model) to compute term correlations directly from automatic indexing scheme. We also demonstrate how such correlations can be included with minimal modification in the existing vector based information retrieval systems.

Download Full-text

Evaluation of Semantic Similarity Using Vector Space Model Based on Textual Corpus

2016 13th International Conference on Computer Graphics, Imaging and Visualization (CGiV) ◽

10.1109/cgiv.2016.64 ◽

2016 ◽

Cited By ~ 1

Author(s):

Badr Hssina ◽

Belaid Bouikhalene ◽

Abdelkrim Merbouha

Keyword(s):

Vector Space ◽

Semantic Similarity ◽

Vector Space Model ◽

Space Model ◽

Model Based

Download Full-text