Clustering of Authors’ Texts of English Fiction in the Vector Space of Semantic Fields

Abstract: This paper analyzes whether an author's idiolect can be differentiated in the space of semantic fields. It also analyzes the clustering of text documents in the vector space of semantic fields and in a semantic space with an orthogonal basis. The analysis showed that a vector space model based on semantic fields is effective for cluster analysis of authors' texts in English fiction. The distribution of authors' texts across the cluster structure revealed areas of the semantic space that represent the idiolects of individual authors; these areas correspond to clusters in which a single author dominates. Clusters in which the texts of several authors dominate can be regarded as areas of semantic similarity between authors' styles. SVD factorization of the semantic fields matrix makes it possible to reduce the dimension of the semantic space significantly in the cluster analysis of authors' texts. Clustering in the vector space of semantic fields can be useful in comparative analysis of authors' styles and idiolects. The clusters of some authors' idiolects are semantically invariant: they do not depend on changes in the basis of the semantic space or on the clustering method.
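
The dimension reduction mentioned in the abstract can be written as a truncated SVD. The notation here is an assumption made for illustration and may differ from the paper's: $A$ is a matrix with one row per semantic field and one column per text, and $k$ is the number of retained dimensions.

```latex
A = U \Sigma V^{\top}, \qquad
A_k = U_k \Sigma_k V_k^{\top}, \qquad k \ll \operatorname{rank}(A)
```

Under this convention each text is represented by the corresponding column of $\Sigma_k V_k^{\top}$, a vector of length $k$, and clustering is run on those shorter vectors.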

Perbandingan Boolean Model Dan Vector Space Model Dalam Pencarian Dokumen Teks

Digital Zone Jurnal Teknologi Informasi dan Komunikasi ◽

10.31849/digitalzone.v11i2.4168 ◽

2020 ◽

Vol 11 (2) ◽

pp. 268-277

Author(s):

Susanti Susanti ◽

Muhammad Azmi ◽

Edwar Ali ◽

Rahmaddeni Rahmaddeni ◽

Yansyah Saputra Wijaya

Keyword(s):

Vector Space ◽

User Satisfaction ◽

Search Time ◽

Vector Space Model ◽

Boolean Model ◽

Text Documents ◽

Space Model ◽

Search Results ◽

Model Search ◽

The Times

Abstract: The development of information technology in the current era of globalization is changing every aspect of our lives, and its influence cannot be avoided. Obtaining the data and information we want is not easy, given how much information is available for various purposes and in various styles of presentation. For searching data on a computer, whether online or offline, many methods have been developed that improve search results, which in turn increases user satisfaction in finding information. The most commonly used search method is the Boolean Model. Another is the Vector Space Model (VSM), a model used to measure the similarity between a document and a keyword query. The authors therefore compare the two methods in terms of search speed (time) and number of results. Speed is measured as the search time for each method. The comparison of search times shows that the Boolean model is faster, by a margin of 30 to 50 seconds. The comparison of results shows that the vector space model returns the same results as the Boolean model with the OR operator, whereas with the AND operator, and with AND and OR combined, the number of results differs from that of the vector space model. Keywords: Comparison, Boolean Model, Vector Space Model, Search, Text Documents
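
The reported match between VSM results and Boolean OR results can be illustrated with a minimal sketch. It assumes raw term-frequency weights, cosine similarity, and retrieval of every document with a non-zero score; the abstract does not state the paper's weighting or cutoff. The corpus and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus for illustration; not data from the paper.
DOCS = {
    "d1": "vector space model for text search",
    "d2": "boolean model search with operators",
    "d3": "cooking recipes for pasta",
}


def tokenize(text):
    """Lowercase and split on whitespace (no stemming or stop-word removal)."""
    return text.lower().split()


def boolean_search(query, docs, operator="or"):
    """Return ids of documents containing any ("or") or all ("and") query terms."""
    terms = set(tokenize(query))
    combine = any if operator == "or" else all
    results = set()
    for doc_id, text in docs.items():
        words = set(tokenize(text))
        if combine(term in words for term in terms):
            results.add(doc_id)
    return results


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(weight * b[term] for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vsm_search(query, docs):
    """Return ids of documents with non-zero similarity, best match first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), doc_id)
              for doc_id, text in docs.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True) if score > 0]
```

With positive term weights, a document scores above zero exactly when it shares at least one term with the query, which is the Boolean OR condition. Boolean AND is stricter, so its result count can differ. A weighting scheme that assigns zero weight to some terms could break this equivalence.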

The Semi-Automatic Restoration of Regular Paper Fragments

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.596.79 ◽

2014 ◽

Vol 596 ◽

pp. 79-82

Author(s):

Zhong Qiu Ding

Keyword(s):

Mathematical Model ◽

Cluster Analysis ◽

Vector Space ◽

Vector Space Model ◽

Space Model ◽

Regular Paper ◽

Manual Intervention ◽

Edge Matching ◽

Edge Contrast

The restoration of regular paper fragments can be solved by edge matching. Edge matching relies primarily on the length and location of the breaks in the writing; however, the edges of some regular paper fragments may be blank, so restoration errors are inevitable. In such cases, manual intervention based on the content of the text is needed to eliminate the errors. Before the mathematical model is established, the regular paper fragments are binarized with Matlab; a vector space model is then established and an edge contrast matrix is constructed, and the order of the fragments is obtained by Q cluster analysis. Finally, the original picture can be obtained by splicing and restoring the fragments in that order.

THE INFLUENCE OF TEXT PREPROCESSING METHODS AND TOOLS ON CALCULATING TEXT SIMILARITY

Facta Universitatis Series Mathematics and Informatics ◽

10.22190/fumi1905973d ◽

2019 ◽

pp. 973

Author(s):

Đorđe Petrović ◽

Milena Stanković

Keyword(s):

Text Mining ◽

Vector Space ◽

Vector Space Model ◽

Text Similarity ◽

Text Documents ◽

Space Model ◽

Text Document ◽

The Subject ◽

Text Preprocessing ◽

Multidimensional Representation

Text mining to a great extent depends on the various text preprocessing techniques. The preprocessing methods and tools which are used to prepare texts for further mining can be divided into those which are and those which are not language-dependent. The subject matter of this research was the analysis of the influence of these methods and tools on further text mining. We first focused on the analysis of the influence on the reduction of the vector space model for the multidimensional representation of text documents. We then analyzed the influence on calculating text similarity, which is the focus of this research. The conclusion we reached is that the implementation of various text preprocessing methods in the Serbian language, which are used for the reduction of the vector space model for the multidimensional representation of text document, achieves the required results. But, the implementation of various text preprocessing methods specific to the Serbian language for the purpose of calculating text similarity can lead to great differences in the results.
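
The two effects described above, a smaller vector space and a shifted similarity score, can be seen in a small sketch. It is only an illustration under stated assumptions: it uses English toy sentences rather than Serbian text, a hypothetical stop-word list, and Jaccard similarity as a stand-in, since the abstract does not name the similarity measure used.

```python
# Hypothetical stop-word list; a real study would use a language-specific one.
STOPWORDS = {"the", "a", "of", "is", "in"}


def preprocess(text, remove_stopwords):
    """Lowercase, strip edge punctuation, and optionally drop stop words."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


def jaccard(a, b):
    """Jaccard similarity of two token lists, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


TEXT_1 = "The model of the vector space."
TEXT_2 = "A vector space is the model."

raw_1, raw_2 = preprocess(TEXT_1, False), preprocess(TEXT_2, False)
clean_1, clean_2 = preprocess(TEXT_1, True), preprocess(TEXT_2, True)

# Vocabulary size is the dimension of the vector space built from both texts.
raw_dimension = len(set(raw_1) | set(raw_2))
clean_dimension = len(set(clean_1) | set(clean_2))

raw_similarity = jaccard(raw_1, raw_2)
clean_similarity = jaccard(clean_1, clean_2)
```

In this toy case, stop-word removal shrinks the vocabulary from 7 to 3 terms and raises the similarity from 4/7 to 1.0. A single preprocessing choice can therefore change both quantities substantially.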

Extended Vector Space Model with Semantic Relatedness on Java Archive Search Engine

Jurnal Teknik Informatika dan Sistem Informasi ◽

10.28932/jutisi.v1i2.372 ◽

2015 ◽

Vol 1 (2) ◽

Cited By ~ 2

Author(s):

Oscar Karnalim

Keyword(s):

Vector Space ◽

Search Engine ◽

Vector Space Model ◽

Semantic Relatedness ◽

Space Model

Aplikasi Deteksi Kemiripan Tugas Paper

Matrik Jurnal Manajemen Teknik Informatika dan Rekayasa Komputer ◽

10.30812/matrik.v15i2.39 ◽

2017 ◽

Vol 15 (2) ◽

pp. 5

Author(s):

Anthony Anggrawan ◽

Azhari

Keyword(s):

Information Retrieval ◽

Vector Space ◽

Vector Space Model ◽

Mean Average Precision ◽

Average Precision ◽

Information Searching ◽

Space Model ◽

Model Method

Information searching based on a user's query, intended to find the documents that match the user's need, is known as Information Retrieval. This research uses the Vector Space Model method to determine the similarity percentage of each student's assignment. The application is built with PHP and a MySQL database. The results are presented as a ranking of documents by similarity to the query, with a mean average precision of 0.874. This value indicates how closely the application agrees with the examination done by experts; it was obtained from an evaluation in which 5 queries were compared against 25 sample documents. The more assignments are submitted, the more time the similarity computation needs.
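
For reference, mean average precision can be computed as in the sketch below. It follows the common definition, in which precision at each relevant hit is averaged over the number of relevant documents; the abstract does not give the paper's exact formula. The function names and sample rankings are hypothetical.

```python
def average_precision(ranked, relevant):
    """Average of precision values taken at the rank of each relevant document."""
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Mean of average precision over (ranked list, relevant set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical judgments: each query has a ranked result list and the set
# of documents an expert marked as relevant.
RUNS = [
    (["a", "b", "c", "d"], {"a", "c"}),
    (["x", "y"], {"y"}),
]
map_score = mean_average_precision(RUNS)
```

The first query has relevant documents at ranks 1 and 3, so its average precision is (1 + 2/3) / 2. The second has its only relevant document at rank 2, giving 0.5.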

Aplikasi Rekomendasi Buku Pada Katalog Perpustakaan Universitas Multimedia Nusantara Menggunakan Vector Space Model

Jurnal ULTIMATICS ◽

10.31937/ti.v9i2.639 ◽

2018 ◽

Vol 9 (2) ◽

pp. 97-105

Author(s):

Richard Firdaus Oeyliawan ◽

Dennis Gunawan

Keyword(s):

Vector Space ◽

Vector Space Model ◽

Vector Model ◽

Library Management ◽

Space Model ◽

Library Management System ◽

Index Terms ◽

Library Catalogue ◽

Language Sample ◽

F Measure

A library is a facility that provides information and knowledge resources and supports readers academically. Because libraries hold a huge number of books, readers often have difficulty finding the books they need. Universitas Multimedia Nusantara uses the Senayan Library Management System (SLiMS) as its library catalogue. SLiMS has many features that help readers, but it has no recommendation feature to help readers find books relevant to a specific book they have chosen. The application was developed using the Vector Space Model to represent documents as vectors. Recommendations are based on the similarity of the books' descriptions. In testing with a one-language sample of relevant books, the F-Measure obtained was 55% with a cosine similarity threshold of 0.1. The book descriptions and the variety of languages affect the F-Measure obtained. Index Terms: Book Recommendation, Porter Stemmer, SLiMS Universitas Multimedia Nusantara, TF-IDF, Vector Space Model
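
The thresholding and evaluation steps can be sketched as follows. This assumes cosine similarities between the chosen book and every other book have already been computed, and that a score equal to the threshold counts as a match; the abstract does not say whether the comparison is inclusive. Titles, scores, and function names are hypothetical.

```python
def recommend(similarities, threshold=0.1):
    """Return titles whose similarity to the chosen book meets the threshold."""
    return {title for title, score in similarities.items() if score >= threshold}


def f_measure(recommended, relevant):
    """Harmonic mean of precision and recall for a set of recommendations."""
    true_positives = len(recommended & relevant)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(recommended)
    recall = true_positives / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Hypothetical cosine similarities to the book the reader selected.
SIMILARITIES = {"Book A": 0.35, "Book B": 0.12, "Book C": 0.05, "Book D": 0.0}
# Hypothetical set of books a human judged relevant.
RELEVANT = {"Book A", "Book C"}

recommended = recommend(SIMILARITIES, threshold=0.1)
score = f_measure(recommended, RELEVANT)
```

A low threshold such as 0.1 admits more books, which tends to raise recall at the cost of precision. The F-Measure balances the two.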

First Movers and Follow-on Invention: Evidence from a Vector Space Model of Invention

SSRN Electronic Journal ◽

10.2139/ssrn.3354530 ◽

2019 ◽

Cited By ~ 1

Author(s):

Kenneth A. Younge ◽

Jeffrey M. Kuhn

Keyword(s):

Vector Space ◽

Vector Space Model ◽

Space Model

Topic detections in Arabic Dark websites using improved Vector Space Model

2012 4th Conference on Data Mining and Optimization (DMO) ◽

10.1109/dmo.2012.6329790 ◽

2012 ◽

Cited By ~ 9

Author(s):

Hanan M. Alghamdi ◽

Ali Selamat

Keyword(s):

Vector Space ◽

Vector Space Model ◽

Space Model

A relational vector-space model of information retrieval adapted to images

ACM SIGIR Forum ◽

10.1145/1067268.1067292 ◽

2005 ◽

Vol 39 (1) ◽

pp. 62-62

Author(s):

Jean Martinet

Keyword(s):

Information Retrieval ◽

Vector Space ◽

Vector Space Model ◽

Space Model

On Generalized Vector Space Model in Information Retrieval

Fundamenta Informaticae ◽

10.3233/fi-1985-8207 ◽

1985 ◽

Vol 8 (2) ◽

pp. 253-267

Author(s):

S.K.M. Wong ◽

Wojciech Ziarko

Keyword(s):

Information Retrieval ◽

Vector Space ◽

A Priori ◽

Vector Space Model ◽

Smart System ◽

Space Model ◽

Retrieval Systems ◽

Information Retrieval Systems ◽

Index Terms ◽

Minimal Modification

In information retrieval, it is common to model index terms and documents as vectors in a suitably defined vector space. The main difficulty with this approach is that the explicit representation of term vectors is not known a priori. For this reason, the vector space model adopted by Salton for the SMART system treats the terms as a set of orthogonal vectors. In such a model it is often necessary to adopt a separate, corrective procedure to take into account the correlations between terms. In this paper, we propose a systematic method (the generalized vector space model) to compute term correlations directly from an automatic indexing scheme. We also demonstrate how such correlations can be included with minimal modification in existing vector-based information retrieval systems.
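
The effect of dropping the orthogonality assumption can be shown with a simplified sketch. It is not the construction proposed in the paper: as a stand-in, term correlations are taken from term co-occurrence across documents, and similarity is computed as a normalized bilinear form through that correlation matrix. The term list, counts, and function names are hypothetical.

```python
import math

# Hypothetical term-document counts: rows are terms, columns are documents.
TERMS = ["car", "automobile", "flower"]
TERM_DOC = [
    [1, 1, 0],  # car
    [0, 1, 0],  # automobile
    [0, 0, 1],  # flower
]


def term_correlations(term_doc):
    """G[i][j] is the dot product of the document profiles of terms i and j."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in term_doc]
            for row_i in term_doc]


def bilinear(x, g, y):
    """Compute x^T G y for plain Python lists."""
    return sum(x[i] * g[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def similarity(query, doc, g):
    """Cosine-style similarity in the space induced by correlation matrix g."""
    denom = math.sqrt(bilinear(query, g, query) * bilinear(doc, g, doc))
    return bilinear(query, g, doc) / denom if denom else 0.0


IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # orthogonal-terms assumption
G = term_correlations(TERM_DOC)

query = [1, 0, 0]        # asks for "car"
new_doc = [0, 1, 0]      # mentions only "automobile"

orthogonal_score = similarity(query, new_doc, IDENTITY)
correlated_score = similarity(query, new_doc, G)
```

With orthogonal terms the query and document share nothing, so the score is zero. Because "car" and "automobile" co-occur in one document, the correlated version gives them a positive score, while an unrelated term such as "flower" still scores zero.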
