document representation
Recently Published Documents


TOTAL DOCUMENTS: 198 (FIVE YEARS 61)
H-INDEX: 15 (FIVE YEARS 4)

Author(s):  
N. I. Tikhonov

Collections of scientific publications are growing rapidly, and the portals that give scientists access to them hold numbers of documents that are difficult to investigate by hand. Document visualization methods are used to reduce this labor: to search for relevant and similar documents, to evaluate the scientific contribution of particular publications, and to reveal hidden links between documents. Such visualization methods can be built on various models of document representation. In recent years, word embedding methods for natural language processing have become extremely popular, and following them, methods have appeared that produce vector representations of whole documents. Although many document analysis systems already exist, new methods can offer new insights into collections, scale better to large collections, or uncover new relationships between documents. This article discusses two methods, Paper2vec and Cite2vec, that obtain vector representations of documents from citation information. It briefly describes the two methods, reports experiments with them on collections of scientific publications, visualizes the results, and discusses the problems that arise.
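The abstract gives no implementation details, but the general idea of learning document vectors from citation links can be illustrated with a short, hedged sketch. The snippet below is not Paper2vec or Cite2vec itself: it simply treats each paper's reference list as a co-occurrence context and trains a skip-gram model over those contexts with gensim, so that papers cited together end up with similar vectors. All paper IDs and links are invented for illustration.

```python
# A minimal sketch of learning document vectors from citation links
# (illustrative only, not the authors' implementation).
from gensim.models import Word2Vec

# Hypothetical citation data: paper ID -> IDs of the papers it cites.
citations = {
    "P1": ["P2", "P3"],
    "P2": ["P3", "P4"],
    "P3": ["P4"],
    "P4": ["P1"],
}

# Each "sentence" is a citing paper followed by its references,
# so papers that cite or are cited together share contexts.
corpus = [[paper] + refs for paper, refs in citations.items()]

model = Word2Vec(corpus, vector_size=50, window=5, min_count=1, sg=1, epochs=200)
print(model.wv.most_similar("P1"))  # papers most related to P1 by citation context
```

The resulting vectors could then be projected to 2D (e.g. with t-SNE) for the kind of collection visualization the article describes.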


Author(s):  
Jeow Li Huan ◽  
Arif Ahmed Sekh ◽  
Chai Quek ◽  
Dilip K. Prasad

Text classification is one of the most widely used tasks in natural language processing. State-of-the-art text classifiers extract features using the vector space model. Recent progress in deep models, notably recurrent neural networks that preserve the positional relationships among words, achieves higher accuracy. To push text classification accuracy even higher, multi-dimensional document representations, such as vector sequences or matrices combined with document sentiment, should be explored. In this paper, we show that documents can be represented as sequences of vectors carrying semantic meaning and classified using a recurrent neural network that recognizes long-range relationships. We show that in this representation, additional sentiment vectors can easily be attached to the word vectors through a fully connected layer to further improve classification accuracy. On the UCI sentiment-labelled dataset, using the sequence of vectors alone achieved an accuracy of 85.6%, better than the 80.7% of a ridge regression classifier, the best among the classical techniques we tested. Adding sentiment information further increases accuracy to 86.3%. On our suicide notes dataset, the best classical technique, the Naïve Bayes Bernoulli classifier, achieves an accuracy of 71.3%, while our classifier, incorporating semantic and sentiment information, exceeds it at 75% accuracy.
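As a rough illustration of the architecture described above, the Keras sketch below feeds a sequence of pre-computed word vectors to an LSTM and concatenates a separate sentiment feature vector before the final dense layers. It is a minimal sketch under assumed dimensions (sequence length, word-vector size, sentiment-vector size), not the authors' implementation.

```python
# Minimal sketch: word-vector sequence -> LSTM, with a sentiment vector
# concatenated before classification. Sizes below are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, Model

MAX_LEN, EMB_DIM, SENT_DIM = 100, 300, 4  # assumed dimensions

word_seq = layers.Input(shape=(MAX_LEN, EMB_DIM), name="word_vectors")
sentiment = layers.Input(shape=(SENT_DIM,), name="sentiment_features")

h = layers.LSTM(128)(word_seq)                  # captures long-range word order
h = layers.concatenate([h, sentiment])          # attach sentiment information
h = layers.Dense(64, activation="relu")(h)
out = layers.Dense(1, activation="sigmoid")(h)  # binary class label

model = Model(inputs=[word_seq, sentiment], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```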


2021 ◽  
Vol 445 ◽  
pp. 276-286
Author(s):  
Peng Yan ◽  
Linjing Li ◽  
Miaotianzi Jin ◽  
Daniel Zeng

Author(s):  
Soe Soe Lwin ◽  
Khin Thandar Nwet

There is an enormous amount of information available from different sources and genres, and extracting useful information from such massive data requires an automatic mechanism. Text summarization systems assist with content reduction by keeping the important information and filtering out the unimportant parts of the text. A good document representation is essential in text summarization for retrieving relevant information. Bag-of-words cannot capture syntactic or semantic similarity between words, whereas word embeddings yield a document representation that captures and encodes the semantic relations between words. Therefore, this paper employs a centroid method based on word embedding representations and proposes Myanmar news summarization based on different word embeddings. Myanmar local and international news are summarized with a centroid-based word-embedding summarizer that exploits the effectiveness of the word embedding representation. Experiments were conducted on a Myanmar local and international news dataset using different word embedding models, and the results are compared with the performance of bag-of-words summarization. Centroid summarization using word embeddings performs consistently better than centroid summarization using bag-of-words.
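The centroid-based approach lends itself to a brief sketch: average the word embeddings of the whole document to form a centroid, score each sentence by the cosine similarity of its own averaged embedding to that centroid, and keep the top-scoring sentences. The Python outline below uses a plain dictionary of word vectors standing in for a trained embedding model; it is an illustrative sketch, not the paper's Myanmar-specific pipeline.

```python
# Illustrative centroid-based extractive summarization with word embeddings.
import numpy as np

def embed(words, vectors, dim):
    """Average the embeddings of the words we have vectors for."""
    known = [vectors[w] for w in words if w in vectors]
    return np.mean(known, axis=0) if known else np.zeros(dim)

def summarize(sentences, vectors, dim=100, top_n=2):
    """Pick the top_n sentences closest to the document centroid."""
    tokenized = [s.lower().split() for s in sentences]
    centroid = embed([w for sent in tokenized for w in sent], vectors, dim)

    def score(sent_words):
        v = embed(sent_words, vectors, dim)
        denom = np.linalg.norm(v) * np.linalg.norm(centroid)
        return float(v @ centroid) / denom if denom else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(tokenized[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:top_n])]  # keep original order
```

Swapping the `vectors` dictionary for bag-of-words counts gives the baseline the paper compares against.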


Sensors ◽  
2021 ◽  
Vol 21 (11) ◽  
pp. 3747
Author(s):  
Manar Mohamed Hafez ◽  
Rebeca P. Díaz Redondo ◽  
Ana Fernández Vilas ◽  
Héctor Olivera Pazó

With the exponential increase in available information, it has become imperative to design mechanisms that let users access what matters to them as quickly as possible. Recommendation systems (RS), made possible by the development of information technology, are intelligent systems that address this need: various types of data can be collected on items of interest to users and presented as recommendations. RS also play a very important role in e-commerce, where the purpose of recommending a product is to assign the most appropriate designation to that specific product. The major challenge when recommending products is insufficient information about the products and the categories to which they belong. In this paper, we transform the product data using two methods of document representation: bag-of-words (BOW) and the neural-network-based document embedding known as Doc2Vec. For each document representation method we propose three-criteria recommendation systems (product, package, and health) to foster online grocery shopping, which depend on product characteristics such as composition, packaging, nutrition table, allergens, and so forth. For our evaluation, we conducted a user and expert survey. Finally, we compared the performance of the three criteria for each document representation method, discovering that the neural-network-based method (Doc2Vec) performs better and completely alters the results.
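For the Doc2Vec side of the comparison, the sketch below shows how product texts could be turned into document vectors with gensim and queried by similarity. The product names and descriptions are invented for illustration; the paper's actual product features (composition, packaging, nutrition table, allergens) and its three-criteria systems are not reproduced here.

```python
# Minimal Doc2Vec sketch for product representation (illustrative data).
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Hypothetical product texts built from composition/packaging terms.
products = {
    "oat_milk":   "oat water rapeseed oil calcium carton recyclable",
    "soy_milk":   "soybean water calcium sugar carton recyclable",
    "whole_milk": "cow milk fat lactose plastic bottle",
}

corpus = [TaggedDocument(words=text.split(), tags=[pid])
          for pid, text in products.items()]
model = Doc2Vec(corpus, vector_size=50, min_count=1, epochs=100)

# Products whose vectors are closest to a given product's vector.
print(model.dv.most_similar("oat_milk"))
```

A BOW baseline would replace the learned vectors with term-count vectors and use the same similarity query.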


Author(s):  
Qianqian Xie ◽  
Jimin Huang ◽  
Pan Du ◽  
Min Peng ◽  
Jian-Yun Nie
