Vector space model for patent documents with hierarchical class labels

A vector space model (VSM) built from selected important features is a common way to represent documents, including patent documents. Patent documents have special characteristics that make it difficult to apply traditional feature selection methods directly: (a) it is difficult to find common terms for patent documents in different categories; and (b) the class label of a patent document is hierarchical rather than flat. Hence, in this article we propose a new approach that includes a hierarchical feature selection (HFS) algorithm, which can be used to select more representative features with greater discriminative ability to represent a set of patent documents with hierarchical class labels. The performance of the proposed method is evaluated on two document sets with 2400 and 9600 patent documents, where we extract candidate terms from their titles and abstracts. The experimental results reveal that a VSM whose features are selected by a proportional selection process gives better coverage, while a VSM whose features are selected with a weighted-summed selection process gives higher accuracy.


An Extension of the VSM Documents Representation

International Journal of Computers Communications & Control ◽

10.15837/ijccc.2017.3.2889 ◽

2017 ◽

Vol 12 (3) ◽

pp. 402

Author(s):

Lucian Nicolae Vintan ◽

Daniel Ionel Morariu ◽

Radu George Cretulescu ◽

Maria Vintan

Keyword(s):

Vector Space ◽

Clustering Algorithms ◽

Vector Space Model ◽

Bag Of Words ◽

New Approach ◽

Parts Of Speech ◽

Space Model ◽

Part Of Speech ◽

Different Parts ◽

Hyper Space

In this paper we present a new approach to document representation for use in classification and/or clustering algorithms. Our representation starts from the classical "bag-of-words" representation but augments each word with its corresponding part of speech. We thus introduce a new concept called hyper-vectors, where each document is represented in a hyper-space in which each dimension is a different part-of-speech component. For each dimension, the document is represented using the Vector Space Model (VSM). In this work we use only five parts of speech: noun, verb, adverb, adjective and others. In the hyper-space each dimension has a different weight. To compute the similarity between two documents we have developed a new hyper-cosine formula. Some interesting classification experiments are presented as validation cases.
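The abstract does not state the hyper-cosine formula, so the following is only a minimal sketch of one plausible reading: a separate bag-of-words vector per part of speech, compared with ordinary cosine similarity and combined as a weighted sum. The weights in `POS_WEIGHTS`, the function names, and the weighted-sum combination are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter

# Hypothetical per-part-of-speech weights; the abstract gives no values.
POS_WEIGHTS = {"noun": 0.4, "verb": 0.3, "adjective": 0.15, "adverb": 0.1, "other": 0.05}

def cosine(a, b):
    """Standard cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def hyper_vector(tagged_tokens):
    """Group (word, pos) pairs into one bag-of-words vector per part of speech."""
    vectors = {pos: Counter() for pos in POS_WEIGHTS}
    for word, pos in tagged_tokens:
        key = pos if pos in vectors else "other"
        vectors[key][word] += 1
    return vectors

def weighted_pos_similarity(doc_a, doc_b):
    """Assumed combination: weighted sum of the per-POS cosine similarities."""
    return sum(w * cosine(doc_a[pos], doc_b[pos]) for pos, w in POS_WEIGHTS.items())
```

With these assumed weights, two documents that share all their nouns but none of their verbs would score at most 0.4 plus whatever the remaining dimensions contribute, which shows how the per-dimension weights shape the result.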


A New Approach to Email Classification Using Concept Vector Space Model

2008 Second International Conference on Future Generation Communication and Networking Symposia ◽

10.1109/fgcns.2008.7 ◽

2008 ◽

Cited By ~ 5

Author(s):

Chao Zeng ◽

Zhao Lu ◽

Juzhong Gu

Keyword(s):

Vector Space ◽

Vector Space Model ◽

New Approach ◽

Space Model ◽

Email Classification


Software Vulnerability Classification Based on Deep Neural Network

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a9746.109119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 3146-3150

Keyword(s):

Neural Network ◽

Feature Selection ◽

Vector Space ◽

Deep Neural Network ◽

Research Work ◽

Vector Space Model ◽

Information Leakage ◽

Vulnerability Detection ◽

Software Vulnerability ◽

Space Model

Software vulnerability is one of the most common issues in software engineering; for the past several years many applications have suffered from vulnerabilities, information leakage, and data hijacking. Developers sometimes make mistakes while writing code that introduce vulnerabilities affecting the entire application. In this research work, we present an approach to software vulnerability detection that uses deep learning on the basis of metadata processing. The system performs software vulnerability detection based on a Deep Neural Network (DNN), and a new dynamic vulnerability classification approach is suggested. The model is built on TF-IDF together with a density-based feature selection approach for the DNN. TF-IDF is used to measure the frequency and weight of each word in a vulnerability description; the Vector Space Model (VSM) is used for feature selection to obtain an optimal set of feature terms; and the DNN is used to build a dynamic weakness classifier that makes bug detection more effective. The overall system is organized into four phases: in the first phase we detect code clones to eliminate data redundancy and reduce execution time; in the second we apply the Vector Space Model (VSM) to recommend refactoring possibilities across the code; in the third we build the DNN module for software vulnerability detection; and finally we report the vulnerabilities for the entire code. A partial implementation of the system has been evaluated in a Java environment and gives satisfactory results for heterogeneous code modules.
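The TF-IDF weighting step mentioned in this abstract can be illustrated with a short sketch. The abstract does not say which TF-IDF variant is used, so this assumes one common form (length-normalized term frequency times log(N / df)); the function name `tf_idf` and the tokenized input format are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: (count / document length) * log(N / document frequency).
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

Under this variant, a word that occurs in every description receives a weight of zero, so only terms that distinguish one vulnerability description from another carry weight into feature selection.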


Contextual weighting approach to compute term weight in layered vector space model

Journal of Information Science ◽

10.1177/0165551519860043 ◽

2019 ◽

pp. 016555151986004

Author(s):

Jayant Gadge ◽

Sunil Bhirud

Keyword(s):

Information Retrieval ◽

Vector Space ◽

Vector Space Model ◽

Primary Concern ◽

New Approach ◽

Space Model ◽

Web Document ◽

Web Information ◽

Unique Approach ◽

The Web

The World Wide Web (WWW) is the largest available repository of information. This huge amount of information poses the challenge of retrieving trustworthy information from the WWW. It confronts researchers with new issues of diversity and complexity when retrieving web information. Information retrieval from the web demands approaches that go beyond conventional information retrieval. The heterogeneity, complexity and huge volume of web information require a unique approach to retrieval. Besides, end-users introduce some difficulties into the retrieval process: the queries they submit are sometimes subtle and ambiguous. The primary concern in information retrieval is predicting the relevance of documents. In this article, a new approach is proposed that logically separates a web document into five layers, namely the title, header, hyperlink, meta tag and body layers. The proposed method effectively combines the textual information and structural evidence of a web document for retrieving information from the Web. In the proposed layered vector space model, each layer has an allocated priority, which is used to compute the weight factor for that layer. The proposed method derives an equation that combines the priority and the length of a layer to calculate the layer's weight.
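The abstract does not give the equation that combines layer priority and layer length, so the sketch below uses a placeholder formula purely to show the shape of a layered term weight. The priorities in `LAYER_PRIORITY`, the log-length damping, and both function names are assumptions and should not be read as the paper's equation.

```python
import math

# Hypothetical priorities for the five layers named in the abstract.
LAYER_PRIORITY = {"title": 5, "header": 4, "hyperlink": 3, "meta": 2, "body": 1}

def layer_weights(layers):
    """Placeholder formula (NOT the paper's): priority damped by log of layer length,
    then normalized so the weights of the layers present sum to 1."""
    raw = {
        name: LAYER_PRIORITY[name] / (1.0 + math.log(1 + len(tokens)))
        for name, tokens in layers.items()
    }
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}

def layered_term_weight(term, layers):
    """Sum of the term's count in each layer, scaled by that layer's weight."""
    weights = layer_weights(layers)
    return sum(weights[name] * tokens.count(term) for name, tokens in layers.items())
```

The point of the sketch is only that a term appearing in a short, high-priority layer such as the title contributes more than the same term appearing in a long, low-priority layer such as the body.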


Extended Vector Space Model with Semantic Relatedness on Java Archive Search Engine

Jurnal Teknik Informatika dan Sistem Informasi ◽

10.28932/jutisi.v1i2.372 ◽

2015 ◽

Vol 1 (2) ◽

Cited By ~ 2

Author(s):

Oscar Karnalim

Keyword(s):

Vector Space ◽

Search Engine ◽

Vector Space Model ◽

Semantic Relatedness ◽

Space Model


Aplikasi Deteksi Kemiripan Tugas Paper (Paper Assignment Similarity Detection Application)

Matrik Jurnal Manajemen Teknik Informatika dan Rekayasa Komputer ◽

10.30812/matrik.v15i2.39 ◽

2017 ◽

Vol 15 (2) ◽

pp. 5

Author(s):

Anthony Anggrawan ◽

Azhari

Keyword(s):

Information Retrieval ◽

Vector Space ◽

Vector Space Model ◽

Mean Average Precision ◽

Average Precision ◽

Information Searching ◽

Space Model ◽

Model Method

Information Retrieval is the process of searching for information based on a user’s query, with the aim of finding the documents that meet the user’s need. This research uses the Vector Space Model method to determine the similarity percentage of each student’s assignment. The application is built with PHP and a MySQL database. The results are presented by ranking documents by their similarity to the query, with a mean average precision of 0.874. This value indicates how closely the application agrees with the judgments of experts; it was obtained from an evaluation with 5 queries compared against 25 sample documents. The time needed to compute similarity increases with the number of assignments submitted.
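Mean average precision, the metric reported in this abstract, can be computed as in the sketch below. It follows the standard definition (precision at each rank where a relevant document appears, averaged per query and then across queries); the function names and the input format are hypothetical, and the paper's own application is written in PHP, not Python.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of precision@k taken at each rank k where a relevant document appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)

def mean_average_precision(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one pair per query."""
    if not runs:
        return 0.0
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)
```

In an evaluation like the one described, each of the 5 queries would contribute one ranked list and one expert-judged relevant set, and the reported figure would be the mean over those 5 average-precision values.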


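Aplikasi Rekomendasi Buku Pada Katalog Perpustakaan Universitas Multimedia Nusantara Menggunakan Vector Space Model (Book Recommendation Application for the Universitas Multimedia Nusantara Library Catalogue Using the Vector Space Model)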

Jurnal ULTIMATICS ◽

10.31937/ti.v9i2.639 ◽

2018 ◽

Vol 9 (2) ◽

pp. 97-105

Author(s):

Richard Firdaus Oeyliawan ◽

Dennis Gunawan

Keyword(s):

Vector Space ◽

Vector Space Model ◽

Vector Model ◽

Library Management ◽

Space Model ◽

Library Management System ◽

Index Terms ◽

Library Catalogue ◽

Language Sample ◽

F Measure

A library is a facility that provides information and knowledge resources and serves as an academic aid for readers. Because a library holds a huge number of books, readers often have difficulty finding the books they want. Universitas Multimedia Nusantara uses the Senayan Library Management System (SLiMS) as its library catalogue. SLiMS has many features that help readers, but it still has no recommendation feature to help readers find books relevant to a specific book they have chosen. The application has been developed using the Vector Space Model to represent each document as a vector. Recommendations in this application are based on the similarity of the books' descriptions. In the testing phase, using a one-language sample of relevant books, the F-Measure obtained is 55% with 0.1 as the cosine similarity threshold. The book descriptions and the variety of languages affect the F-Measure obtained. Index Terms: Book Recommendation, Porter Stemmer, SLiMS Universitas Multimedia Nusantara, TF-IDF, Vector Space Model
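The thresholding and evaluation steps described here can be sketched as follows. This assumes the cosine similarities to the chosen book are already computed, that the threshold is inclusive (the abstract does not say), and that "F-Measure" means the balanced F1 score; the function names and sample identifiers are hypothetical.

```python
def recommend(similarities, threshold=0.1):
    """Return book ids whose similarity to the chosen book meets the threshold, best first."""
    kept = [(book, score) for book, score in similarities.items() if score >= threshold]
    return [book for book, _ in sorted(kept, key=lambda pair: pair[1], reverse=True)]

def f_measure(recommended, relevant):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    recommended, relevant = set(recommended), set(relevant)
    true_positives = len(recommended & relevant)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(recommended)
    recall = true_positives / len(relevant)
    return 2 * precision * recall / (precision + recall)
```

A low threshold such as 0.1 admits more candidate books, which tends to raise recall at the cost of precision; the F-measure summarizes that trade-off in a single number.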
