Myanmar news summarization using different word representations

Author(s):  
Soe Soe Lwin ◽  
Khin Thandar Nwet

There is an enormous amount of information available from sources of different forms and genres, and extracting useful information from such massive data requires an automatic mechanism. Text summarization systems assist with content reduction by keeping the important information and filtering out the unimportant parts of the text. Good document representation is essential in text summarization for retrieving relevant information. Bag-of-words representations cannot capture syntactic and semantic relationships between words, whereas word embeddings encode the semantic relations between words and thus yield better document representations. Therefore, this paper employs a centroid method based on word embedding representations and proposes Myanmar news summarization using different word embeddings. Myanmar local and international news are summarized with a centroid-based word embedding summarizer that exploits the effectiveness of the word embedding representation. Experiments were conducted on a Myanmar local and international news dataset using different word embedding models, and the results are compared with the performance of bag-of-words summarization. Centroid summarization using word embeddings performs substantially better than centroid summarization using bag-of-words.
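As a rough illustration (not the authors' implementation), a centroid-based embedding summarizer averages the word vectors of the document to get a centroid, then ranks sentences by cosine similarity to it. The two-dimensional word vectors below are made-up toy values standing in for a trained embedding model:

```python
import math

# Hypothetical toy embeddings; a real system would load trained word vectors.
EMB = {
    "news": [0.9, 0.1], "report": [0.8, 0.2],
    "weather": [0.1, 0.9], "rain": [0.2, 0.8],
}

def avg_vec(words, dim=2):
    """Average the embeddings of the known words (the centroid)."""
    known = [EMB[w] for w in words if w in EMB]
    if not known:
        return [0.0] * dim
    return [sum(c) / len(known) for c in zip(*known)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def centroid_summarize(sentences, k=1):
    """Keep the k sentences whose average embedding is closest to the document centroid."""
    centroid = avg_vec([w for s in sentences for w in s.split()])
    ranked = sorted(sentences,
                    key=lambda s: cosine(avg_vec(s.split()), centroid),
                    reverse=True)
    return ranked[:k]
```

A bag-of-words centroid would instead count term overlaps, which is exactly the comparison the paper reports.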

2021 ◽  
Vol 11 (4) ◽  
pp. 3769-3783
Author(s):  
Merin Cherian ◽  
Kannan Balakrishnan

This paper presents an evaluation of static word embedding models for Malayalam. In this work, we created a well-documented and pre-processed corpus for Malayalam. Word vectors were trained on this corpus using three different word embedding models and evaluated with intrinsic evaluators. The quality of the word representations is tested using word analogy, word similarity and concept categorization; the testing is independent of downstream language processing tasks. Experimental results for Malayalam word representations from GloVe, FastText and Word2Vec are reported. It is shown that higher-dimensional word representations and larger window sizes gave better results on the intrinsic evaluators.
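A minimal sketch of the word-analogy intrinsic evaluator the paper uses: solve a:b :: c:? by nearest-neighbour search around vec(b) - vec(a) + vec(c). The vectors below are toy values, not trained Malayalam embeddings:

```python
import math

# Toy embeddings; a real evaluation would load trained GloVe/FastText/Word2Vec vectors.
EMB = {
    "man":   [1.0, 0.0, 0.0],
    "king":  [1.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "queen": [0.0, 1.0, 1.0],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def analogy(emb, a, b, c):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding the query words."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, emb[w]))
```

Accuracy on a benchmark of such analogy questions is one of the intrinsic scores the paper compares across models, dimensions and window sizes.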


2020 ◽  
pp. 619-637
Author(s):  
Yogesh Kumar Meena ◽  
Dinesh Gopalani

Automatic Text Summarization (ATS) enables users to save precious time when retrieving the information they need from voluminous big data. Text summaries are sensitive to scoring methods, as most methods require weighting features for sentence scoring. In this chapter, various statistical features proposed by researchers for extractive automatic text summarization are explored. Features that perform well under ROUGE evaluation measures are termed best features and are used to create feature combinations, after which the best-performing combinations are identified. The performance of the best feature combinations on short, medium and large documents is also evaluated using the same ROUGE measures.
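To make the feature-weighting idea concrete, here is an illustrative (not the chapter's actual) trio of statistical sentence features combined with a weighted sum; the feature names and weights are assumptions for the sketch:

```python
def sentence_features(sentence, position, n_sentences, term_freq):
    """Three common statistical features for extractive summarization (illustrative subset)."""
    words = sentence.split()
    return {
        "position": 1.0 - position / n_sentences,                 # earlier sentences rank higher
        "length": min(len(words) / 20.0, 1.0),                    # normalized sentence length
        "tf": sum(term_freq.get(w, 0.0) for w in words) / max(len(words), 1),
    }

def combined_score(features, weights):
    """Weighted sum over a chosen feature combination."""
    return sum(weights[name] * value for name, value in features.items())
```

Sentences are then ranked by `combined_score`, and ROUGE against reference summaries decides which feature combination is kept.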


2017 ◽  
Vol 2 (1) ◽  
pp. 249-257
Author(s):  
Daniel Morariu ◽  
Lucian Vințan ◽  
Radu Crețulescu

Abstract In this paper, we present experiments that integrate the power of word embedding representations into real document classification problems. Word embedding is a recent trend in the natural language processing domain that represents each word of a document as a vector; this representation encodes the semantic contexts in which the word occurs most frequently. We add this new representation to a classical VSM document representation and evaluate it using a learning algorithm based on the Support Vector Machine. The added information makes classification harder in practice, since it increases the learning time and the memory required, and the results obtained are slightly weaker than those of the classical VSM document representation alone. By adding the WE representation to the classical VSM representation, we aim to broaden the current educational paradigm for computer science students, which is generally limited to the VSM representation.
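One plausible reading of the combined representation (a sketch under that assumption, not the authors' exact feature construction) is to concatenate the bag-of-words count vector with the document's averaged word embedding, which enlarges the feature space fed to the SVM:

```python
from collections import Counter

def bow_vector(tokens, vocab):
    """Classical VSM: term counts over a fixed vocabulary."""
    counts = Counter(tokens)
    return [float(counts.get(w, 0)) for w in vocab]

def combined_representation(tokens, vocab, emb, dim):
    """Append the average word embedding to the VSM vector (a larger feature space,
    which is where the extra training time and memory come from)."""
    known = [emb[w] for w in tokens if w in emb]
    avg = [sum(c) / len(known) for c in zip(*known)] if known else [0.0] * dim
    return bow_vector(tokens, vocab) + avg
```

The resulting vectors can be passed to any linear SVM trainer; the toy vocabulary and embeddings in the test below are made up.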


Author(s):  
S Hasanzadeh ◽  
S M Fakhrahmad ◽  
M Taheri

Abstract Recommender systems nowadays play an important role in providing helpful information to users, especially in e-commerce applications. Many of the proposed models use users' rating histories to predict unknown ratings. Recently, users' reviews, as a valuable source of knowledge, have attracted the attention of researchers in this field, and a new category known as review-based recommender systems has emerged. In this study, we use the information contained in user reviews, as well as the available rating scores, to develop a review-based rating prediction system. The proposed scheme handles the uncertainty of rating histories by fuzzifying the given ratings. Another advantage of the proposed system is the use of a word embedding representation model for textual reviews, instead of traditional models such as binary bag-of-words and TF-IDF vector spaces. It also uses helpfulness voting scores to prune the data and achieve better results. The effectiveness of the rating prediction scheme, as well as of the final recommender system, was evaluated on the Amazon dataset. Experimental results revealed that the proposed recommender system outperforms its counterparts and can serve as a suitable tool in e-commerce environments.
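The abstract does not specify the fuzzification, but a common way to fuzzify a 1-5 star rating is with triangular membership functions over overlapping sets; the set boundaries below are purely illustrative assumptions:

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside [a, c], peaking at 1 when x == b."""
    if x <= a or x >= c:
        return 0.0
    if x == b:
        return 1.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzify_rating(r):
    """Map a 1-5 star rating to memberships in three fuzzy sets (assumed design)."""
    return {
        "low":    triangular(r, 0, 1, 3),
        "medium": triangular(r, 1, 3, 5),
        "high":   triangular(r, 3, 5, 6),
    }
```

A 4-star rating, for example, belongs partly to "medium" and partly to "high", which is how the scheme softens hard rating boundaries.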


Author(s):  
Haoyan Liu ◽  
Lei Fang ◽  
Jian-Guang Lou ◽  
Zhoujun Li

Much recent work focuses on leveraging semantic lexicons such as WordNet to enhance word representation learning (WRL), achieving promising performance on many NLP tasks. However, most existing methods are limited because they require high-quality, manually created semantic lexicons or linguistic structures. In this paper, we propose to leverage semantic knowledge automatically mined from web structured data to enhance WRL. We first construct a semantic similarity graph, referred to as semantic knowledge, from a large collection of semantic lists extracted from the web using several pre-defined HTML tag patterns. We then introduce an efficient joint word representation learning model to capture semantics from both the semantic knowledge and text corpora. Compared with recent work on improving WRL with semantic resources, our approach is more general and can be scaled easily with no additional effort. Extensive experimental results show that our approach outperforms state-of-the-art methods on word similarity, word sense disambiguation, text classification and textual similarity tasks.
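The graph-construction step can be sketched simply: items that co-occur in an extracted web list get an edge, weighted by how many lists they share. This is a minimal sketch of the idea, not the paper's exact weighting scheme:

```python
from collections import defaultdict
from itertools import combinations

def build_similarity_graph(semantic_lists):
    """Edge weight = number of extracted lists in which the two items co-occur."""
    graph = defaultdict(int)
    for items in semantic_lists:
        # Each unordered pair within one list contributes one unit of similarity.
        for a, b in combinations(sorted(set(items)), 2):
            graph[(a, b)] += 1
    return dict(graph)
```

The joint WRL objective would then pull embeddings of strongly connected words together while still fitting the text corpus.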


2020 ◽  
Author(s):  
Masashi Sugiyama

Recently, word embeddings have been used successfully in many natural language processing problems, and how to train a robust and accurate word embedding system efficiently is a popular research area. Since many, if not all, words have more than one sense, it is necessary to learn separate vectors for each sense of a word. In this project, we therefore explored two multi-sense word embedding models: the Multi-Sense Skip-gram (MSSG) model and the Non-Parametric Multi-Sense Skip-gram (NP-MSSG) model. Furthermore, we propose an extension of the Multi-Sense Skip-gram model, called the Incremental Multi-Sense Skip-gram (IMSSG) model, which learns the vectors of all senses of a word incrementally. We evaluate all the systems on a word similarity task and show that IMSSG outperforms the other models.
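The core step shared by these models is sense assignment: each sense of a word keeps a context-cluster center, and an occurrence is assigned to the sense whose center is closest to the current context vector. A stripped-down sketch of that step (toy vectors, no training loop):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def context_vector(surrounding_embeddings):
    """Context vector = average of the embeddings of the surrounding words."""
    return [sum(c) / len(surrounding_embeddings) for c in zip(*surrounding_embeddings)]

def assign_sense(context_vec, sense_centers):
    """MSSG-style hard assignment: the closest cluster center picks the sense index."""
    return max(range(len(sense_centers)),
               key=lambda i: cosine(context_vec, sense_centers[i]))
```

In full MSSG the chosen sense's vector and cluster center are then updated by the skip-gram objective; the incremental variant would keep refining them as new data arrives.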


Author(s):  
Anusha Kalbande

Abstract: Data is growing around us at an unimaginable speed, but what part of it is really useful information? Business leaders, financial analysts, stock market enthusiasts, researchers and others often need to go through a plethora of news articles and data every day, and the time spent may not yield any fruitful insights. Given such a huge volume of data, it is difficult to obtain precise, relevant information and to interpret the overall sentiment an article conveys. The proposed method conceptualizes a tool that takes financial news from selected, trusted online sources as input and produces a summary together with a basic positive, negative or neutral sentiment. It is assumed that the user is familiar with the company's profile. Based on the input (company name/symbol) given by the user, the corresponding news articles are fetched using web scraping. These articles are then summarized to give succinct, to-the-point information, and an overall sentiment about the company is reported based on the important features in the articles. Keywords: Financial News; Summarization; Sentiment Analysis.
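The positive/negative/neutral labelling step could be as simple as lexicon counting; the tiny lexicons below are illustrative placeholders, not the tool's actual word lists, and a real system would use a full financial sentiment lexicon:

```python
# Hypothetical mini-lexicons for financial news sentiment.
POSITIVE = {"gain", "profit", "growth", "surge"}
NEGATIVE = {"loss", "decline", "fraud", "drop"}

def article_sentiment(text):
    """Label text positive/negative/neutral by counting lexicon hits."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

In the proposed pipeline this step would run on the summaries produced from the scraped articles for the user's chosen company.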

