Myanmar news summarization using different word representations

Author(s):  
Soe Soe Lwin ◽  
Khin Thandar Nwet

There is an enormous amount of information available from sources of different forms and genres, and extracting useful information from such massive data requires an automatic mechanism. Text summarization systems assist with content reduction by keeping the important information and filtering out the unimportant parts of the text. Good document representation is essential in text summarization for retrieving relevant information. Bag-of-words representations cannot capture syntactic and semantic relationships between words, whereas word embeddings encode the semantic relations between words and thus yield better document representations. Therefore, this paper employs a centroid method based on word embedding representations and proposes Myanmar news summarization using different word embeddings. Myanmar local and international news are summarized with a centroid-based word embedding summarizer that exploits the effectiveness of the word embedding representation. Experiments were conducted on a Myanmar local and international news dataset using different word embedding models, and the results are compared with the performance of bag-of-words summarization. Centroid summarization using word embeddings performs substantially better than centroid summarization using bag-of-words.
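As a rough illustration (not the authors' implementation), a centroid-based embedding summarizer averages the word vectors of the document to get a centroid, then ranks sentences by cosine similarity to it. The two-dimensional word vectors below are made-up toy values standing in for a trained embedding model:

```python
import math

# Hypothetical toy embeddings; a real system would load trained word vectors.
EMB = {
    "news": [0.9, 0.1], "report": [0.8, 0.2],
    "weather": [0.1, 0.9], "rain": [0.2, 0.8],
}

def avg_vec(words, dim=2):
    """Average the embeddings of the known words (the centroid)."""
    known = [EMB[w] for w in words if w in EMB]
    if not known:
        return [0.0] * dim
    return [sum(c) / len(known) for c in zip(*known)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def centroid_summarize(sentences, k=1):
    """Keep the k sentences whose average embedding is closest to the document centroid."""
    centroid = avg_vec([w for s in sentences for w in s.split()])
    ranked = sorted(sentences,
                    key=lambda s: cosine(avg_vec(s.split()), centroid),
                    reverse=True)
    return ranked[:k]
```

A bag-of-words centroid would instead count term overlaps, which is exactly the comparison the paper reports.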

2021 ◽  
Vol 11 (4) ◽  
pp. 3769-3783
Author(s):  
Merin Cherian ◽  
Kannan Balakrishnan

This paper presents an evaluation of static word embedding models for Malayalam. In this work, we created a well-documented and pre-processed corpus for Malayalam. Word vectors were trained on this corpus using three different word embedding models and evaluated with intrinsic evaluators. The quality of the word representations is tested using word analogy, word similarity and concept categorization; the testing is independent of downstream language processing tasks. Experimental results for Malayalam word representations from GloVe, FastText and Word2Vec are reported. It is shown that higher-dimensional word representations and larger window sizes gave better results on the intrinsic evaluators.
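A minimal sketch of the word-analogy intrinsic evaluator the paper uses: solve a:b :: c:? by nearest-neighbour search around vec(b) - vec(a) + vec(c). The vectors below are toy values, not trained Malayalam embeddings:

```python
import math

# Toy embeddings; a real evaluation would load trained GloVe/FastText/Word2Vec vectors.
EMB = {
    "man":   [1.0, 0.0, 0.0],
    "king":  [1.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "queen": [0.0, 1.0, 1.0],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def analogy(emb, a, b, c):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding the query words."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, emb[w]))
```

Accuracy on a benchmark of such analogy questions is one of the intrinsic scores the paper compares across models, dimensions and window sizes.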


2020 ◽  
pp. 619-637
Author(s):  
Yogesh Kumar Meena ◽  
Dinesh Gopalani

Automatic Text Summarization (ATS) enables users to save precious time when retrieving the information they need from voluminous big data. Text summaries are sensitive to scoring methods, as most methods require weighting features for sentence scoring. In this chapter, various statistical features proposed by researchers for extractive automatic text summarization are explored. Features that perform well under ROUGE evaluation measures are termed best features and are used to create feature combinations, after which the best-performing combinations are identified. The performance of the best feature combinations on short, medium and large documents is also evaluated using the same ROUGE measures.
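To make the feature-weighting idea concrete, here is an illustrative (not the chapter's actual) trio of statistical sentence features combined with a weighted sum; the feature names and weights are assumptions for the sketch:

```python
def sentence_features(sentence, position, n_sentences, term_freq):
    """Three common statistical features for extractive summarization (illustrative subset)."""
    words = sentence.split()
    return {
        "position": 1.0 - position / n_sentences,                 # earlier sentences rank higher
        "length": min(len(words) / 20.0, 1.0),                    # normalized sentence length
        "tf": sum(term_freq.get(w, 0.0) for w in words) / max(len(words), 1),
    }

def combined_score(features, weights):
    """Weighted sum over a chosen feature combination."""
    return sum(weights[name] * value for name, value in features.items())
```

Sentences are then ranked by `combined_score`, and ROUGE against reference summaries decides which feature combination is kept.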


2017 ◽  
Vol 2 (1) ◽  
pp. 249-257
Author(s):  
Daniel Morariu ◽  
Lucian Vințan ◽  
Radu Crețulescu

Abstract In this paper, we present experiments that integrate the power of word embedding representations into real document classification problems. Word embedding is a recent trend in the natural language processing domain that represents each word of a document as a vector; this representation encodes the semantic contexts in which the word occurs most frequently. We add this new representation to a classical VSM document representation and evaluate it using a learning algorithm based on the Support Vector Machine. The added information makes classification harder in practice, since it increases the learning time and the memory required, and the results obtained are slightly weaker than those of the classical VSM document representation alone. By adding the WE representation to the classical VSM representation, we aim to broaden the current educational paradigm for computer science students, which is generally limited to the VSM representation.
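One plausible reading of the combined representation (a sketch under that assumption, not the authors' exact feature construction) is to concatenate the bag-of-words count vector with the document's averaged word embedding, which enlarges the feature space fed to the SVM:

```python
from collections import Counter

def bow_vector(tokens, vocab):
    """Classical VSM: term counts over a fixed vocabulary."""
    counts = Counter(tokens)
    return [float(counts.get(w, 0)) for w in vocab]

def combined_representation(tokens, vocab, emb, dim):
    """Append the average word embedding to the VSM vector (a larger feature space,
    which is where the extra training time and memory come from)."""
    known = [emb[w] for w in tokens if w in emb]
    avg = [sum(c) / len(known) for c in zip(*known)] if known else [0.0] * dim
    return bow_vector(tokens, vocab) + avg
```

The resulting vectors can be passed to any linear SVM trainer; the toy vocabulary and embeddings in the test below are made up.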


Author(s):  
S Hasanzadeh ◽  
S M Fakhrahmad ◽  
M Taheri

Abstract Recommender systems nowadays play an important role in providing helpful information to users, especially in e-commerce applications. Many of the proposed models use users' rating histories to predict unknown ratings. Recently, users' reviews, as a valuable source of knowledge, have attracted the attention of researchers in this field, and a new category known as review-based recommender systems has emerged. In this study, we use the information contained in user reviews, as well as the available rating scores, to develop a review-based rating prediction system. The proposed scheme handles the uncertainty of rating histories by fuzzifying the given ratings. Another advantage of the proposed system is the use of a word embedding representation model for textual reviews, instead of traditional models such as binary bag-of-words and TF-IDF vector spaces. It also uses helpfulness voting scores to prune the data and achieve better results. The effectiveness of the rating prediction scheme, as well as of the final recommender system, was evaluated on the Amazon dataset. Experimental results revealed that the proposed recommender system outperforms its counterparts and can serve as a suitable tool in e-commerce environments.
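The abstract does not specify the fuzzification, but a common way to fuzzify a 1-5 star rating is with triangular membership functions over overlapping sets; the set boundaries below are purely illustrative assumptions:

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside [a, c], peaking at 1 when x == b."""
    if x <= a or x >= c:
        return 0.0
    if x == b:
        return 1.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzify_rating(r):
    """Map a 1-5 star rating to memberships in three fuzzy sets (assumed design)."""
    return {
        "low":    triangular(r, 0, 1, 3),
        "medium": triangular(r, 1, 3, 5),
        "high":   triangular(r, 3, 5, 6),
    }
```

A 4-star rating, for example, belongs partly to "medium" and partly to "high", which is how the scheme softens hard rating boundaries.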


Author(s):  
Haoyan Liu ◽  
Lei Fang ◽  
Jian-Guang Lou ◽  
Zhoujun Li

Much recent work focuses on leveraging semantic lexicons such as WordNet to enhance word representation learning (WRL), achieving promising performance on many NLP tasks. However, most existing methods are limited because they require high-quality, manually created semantic lexicons or linguistic structures. In this paper, we propose to leverage semantic knowledge automatically mined from web structured data to enhance WRL. We first construct a semantic similarity graph, referred to as semantic knowledge, from a large collection of semantic lists extracted from the web using several pre-defined HTML tag patterns. We then introduce an efficient joint word representation learning model to capture semantics from both the semantic knowledge and text corpora. Compared with recent work on improving WRL with semantic resources, our approach is more general and can be scaled easily with no additional effort. Extensive experimental results show that our approach outperforms state-of-the-art methods on word similarity, word sense disambiguation, text classification and textual similarity tasks.
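The graph-construction step can be sketched simply: items that co-occur in an extracted web list get an edge, weighted by how many lists they share. This is a minimal sketch of the idea, not the paper's exact weighting scheme:

```python
from collections import defaultdict
from itertools import combinations

def build_similarity_graph(semantic_lists):
    """Edge weight = number of extracted lists in which the two items co-occur."""
    graph = defaultdict(int)
    for items in semantic_lists:
        # Each unordered pair within one list contributes one unit of similarity.
        for a, b in combinations(sorted(set(items)), 2):
            graph[(a, b)] += 1
    return dict(graph)
```

The joint WRL objective would then pull embeddings of strongly connected words together while still fitting the text corpus.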


2020 ◽  
Author(s):  
Masashi Sugiyama

Recently, word embeddings have been used successfully in many natural language processing problems, and how to train a robust and accurate word embedding system efficiently is a popular research area. Since many, if not all, words have more than one sense, it is necessary to learn separate vectors for each sense of a word. In this project, we therefore explored two multi-sense word embedding models: the Multi-Sense Skip-gram (MSSG) model and the Non-Parametric Multi-Sense Skip-gram (NP-MSSG) model. Furthermore, we propose an extension of the Multi-Sense Skip-gram model, called the Incremental Multi-Sense Skip-gram (IMSSG) model, which learns the vectors of all senses of a word incrementally. We evaluate all the systems on a word similarity task and show that IMSSG outperforms the other models.
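The core step shared by these models is sense assignment: each sense of a word keeps a context-cluster center, and an occurrence is assigned to the sense whose center is closest to the current context vector. A stripped-down sketch of that step (toy vectors, no training loop):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def context_vector(surrounding_embeddings):
    """Context vector = average of the embeddings of the surrounding words."""
    return [sum(c) / len(surrounding_embeddings) for c in zip(*surrounding_embeddings)]

def assign_sense(context_vec, sense_centers):
    """MSSG-style hard assignment: the closest cluster center picks the sense index."""
    return max(range(len(sense_centers)),
               key=lambda i: cosine(context_vec, sense_centers[i]))
```

In full MSSG the chosen sense's vector and cluster center are then updated by the skip-gram objective; the incremental variant would keep refining them as new data arrives.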


Author(s):  
Anusha Kalbande

Abstract: Data is growing around us at an unimaginable speed, but what part of it is really useful information? Business leaders, financial analysts, stock market enthusiasts, researchers and others often need to go through a plethora of news articles and data every day, and the time spent may not yield any fruitful insights. Given such a huge volume of data, it is difficult to obtain precise, relevant information and to interpret the overall sentiment an article conveys. The proposed method conceptualizes a tool that takes financial news from selected, trusted online sources as input and produces a summary together with a basic positive, negative or neutral sentiment. It is assumed that the user is familiar with the company's profile. Based on the input (company name/symbol) given by the user, the corresponding news articles are fetched using web scraping. These articles are then summarized to give succinct, to-the-point information, and an overall sentiment about the company is reported based on the important features in the articles. Keywords: Financial News; Summarization; Sentiment Analysis.
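The positive/negative/neutral labelling step could be as simple as lexicon counting; the tiny lexicons below are illustrative placeholders, not the tool's actual word lists, and a real system would use a full financial sentiment lexicon:

```python
# Hypothetical mini-lexicons for financial news sentiment.
POSITIVE = {"gain", "profit", "growth", "surge"}
NEGATIVE = {"loss", "decline", "fraud", "drop"}

def article_sentiment(text):
    """Label text positive/negative/neutral by counting lexicon hits."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

In the proposed pipeline this step would run on the summaries produced from the scraped articles for the user's chosen company.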

