Topic Modeling Based Extractive Text Summarization

Text summarization is an approach for identifying the important information present within text documents. This computational technique aims to generate a shorter version of the source text that includes only its relevant and salient information. In this paper, we propose a novel method to summarize a text document by clustering its contents based on latent topics produced by topic modeling techniques and by generating an extractive summary for each of the identified text clusters. All extractive sub-summaries are then combined to produce a summary for the given source document. We use the less common and more challenging WikiHow dataset in our approach to text summarization. This dataset is unlike the news datasets commonly used for text summarization, which present their most important information in the first few lines of the source text and are therefore less challenging to summarize. In contrast, the documents in the WikiHow dataset are written in a generalized style and have lower abstractedness and a higher compression ratio, posing a greater challenge for summary generation. Many current state-of-the-art text summarization techniques tend to eliminate important information from source documents in favor of brevity. Our proposed technique aims to capture the varied information present in source documents. Although the dataset proved challenging, extensive tests within our experimental setup show that our model produces encouraging ROUGE scores and summaries compared to other published extractive and abstractive text summarization models.
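
The following is a minimal sketch, in Python, of the clustering-by-latent-topics idea described above: sentences are assigned to their dominant LDA topic and the strongest sentence(s) per topic are extracted and combined. The choice of scikit-learn's LDA, the regex sentence splitter, and the per-topic selection rule are illustrative assumptions, not the authors' implementation.

```python
# Sketch: topic-modeling-based extractive summarization (illustrative only).
import re
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def topic_extractive_summary(document: str, n_topics: int = 3, per_topic: int = 1) -> str:
    # 1. Split the source document into candidate sentences.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    # 2. Fit a topic model over the sentences.
    counts = CountVectorizer(stop_words="english").fit_transform(sentences)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    theta = lda.fit_transform(counts)          # sentence-topic distributions
    # 3. Cluster sentences by their dominant latent topic.
    dominant = theta.argmax(axis=1)
    summary_parts = []
    for topic in range(n_topics):
        members = np.where(dominant == topic)[0]
        if members.size == 0:
            continue
        # 4. Extract the sentence(s) most strongly associated with this topic.
        best = members[np.argsort(theta[members, topic])[::-1][:per_topic]]
        summary_parts.extend(sentences[i] for i in sorted(best))
    # 5. Combine the per-topic sub-summaries into one extractive summary.
    return " ".join(summary_parts)
```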

2019 ◽  
Author(s):  
Mehrdad Shoeiby ◽  
Mohammad Ali Armin ◽  
Sadegh Aliakbarian ◽  
Saeed Anwar ◽  
Lars Petersson

Advances in the design of multi-spectral cameras have led to great interest in a wide range of applications, from astronomy to autonomous driving. However, such cameras inherently suffer from a trade-off between spatial and spectral resolution. In this paper, we propose to address this limitation by introducing a novel method to carry out super-resolution on raw mosaic images, multi-spectral or RGB Bayer, captured by modern real-time single-shot mosaic sensors. To this end, we design a deep super-resolution architecture that benefits from a sequential feature pyramid along the depth of the network. This is achieved by utilizing a convolutional LSTM (ConvLSTM) to learn the inter-dependencies between features at different receptive fields. Additionally, by investigating the effect of different attention mechanisms in our framework, we show that a ConvLSTM-inspired module is able to provide superior attention in our context. Our extensive experiments and analyses show that our approach yields significant super-resolution quality, outperforming current state-of-the-art mosaic super-resolution methods on both Bayer and multi-spectral images. To the best of our knowledge, our method is also the first specialized method to super-resolve mosaic images, whether multi-spectral or Bayer.
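
As a rough illustration of the recurrent building block mentioned above, here is a minimal, generic ConvLSTM cell in PyTorch; feeding it feature maps from successive stages of a network as a "sequence" is one way such a cell can accumulate information across receptive fields. This is a textbook formulation, not the paper's architecture or its attention module.

```python
# Generic ConvLSTM cell (illustrative sketch, not the paper's network).
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, in_ch: int, hid_ch: int, kernel: int = 3):
        super().__init__()
        # One convolution jointly produces the input, forget, cell, and output gates.
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, kernel, padding=kernel // 2)
        self.hid_ch = hid_ch

    def forward(self, x, state=None):
        b, _, h, w = x.shape
        if state is None:
            state = (x.new_zeros(b, self.hid_ch, h, w),
                     x.new_zeros(b, self.hid_ch, h, w))
        h_prev, c_prev = state
        i, f, g, o = torch.chunk(self.gates(torch.cat([x, h_prev], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c_prev + torch.sigmoid(i) * torch.tanh(g)
        h_new = torch.sigmoid(o) * torch.tanh(c)
        return h_new, (h_new, c)
```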


2019 ◽  
Vol 3 (3) ◽  
pp. 58 ◽  
Author(s):  
Tim Haarman ◽  
Bastiaan Zijlema ◽  
Marco Wiering

Keyphrase extraction is an important part of natural language processing (NLP) research, although little research has been done in the domain of web pages. The World Wide Web contains billions of pages that are potentially interesting for various NLP tasks, yet it remains largely untouched in scientific research. Current research is often applied only to clean corpora such as abstracts and articles from academic journals, or to sets of scraped texts from a single domain. However, textual data from web pages differs from normal text documents, as it is structured using HTML elements and often consists of many small fragments. These elements are furthermore used in a highly inconsistent manner and are likely to contain noise. We evaluated the keyphrases extracted by several state-of-the-art extraction methods and found that they did not transfer well to web pages. We therefore propose WebEmbedRank, an adaptation of a recently proposed extraction method that can make use of structural information in web pages in a robust manner. We compared this novel method to other baselines and state-of-the-art methods using a manually annotated dataset and found that WebEmbedRank achieved significant improvements over existing extraction methods on web pages.
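
As a toy illustration of why HTML structure matters for this task, the snippet below weights candidate tokens by the element in which they occur, so that words from titles and headings outrank words that appear only in the body. The tag weights and scoring rule are invented for illustration; this is not the WebEmbedRank algorithm itself.

```python
# Toy structural weighting of keyphrase candidates in a web page (illustrative only).
from collections import Counter
from bs4 import BeautifulSoup

# Assumed weights: structurally prominent elements count for more.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.5, "h2": 2.0, "h3": 1.5, "p": 1.0, "li": 1.0}

def score_candidates(html: str, min_len: int = 4) -> Counter:
    soup = BeautifulSoup(html, "html.parser")
    scores = Counter()
    for tag, weight in TAG_WEIGHTS.items():
        for element in soup.find_all(tag):
            for token in element.get_text(" ").lower().split():
                token = token.strip(".,;:!?\"'()")
                if len(token) >= min_len:
                    scores[token] += weight
    return scores

# Words appearing in the title and headings rise above body-only words.
page = "<html><title>Keyphrase extraction</title><p>Extraction from noisy pages.</p></html>"
print(score_candidates(page).most_common(3))
```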


Author(s):  
Amal M. Al-Numai ◽  
Aqil M. Azmi

As the number of electronic text documents increases, so does the need for automatic text summarizers. A summary can be extractive, compressive, or abstractive. In the first, the more important sentences are retained, more or less in their original form; the second involves reducing the length of each sentence; and the third requires fusing multiple sentences and/or paraphrasing. This chapter focuses on the abstractive text summarization (ATS) of a single text document. The study explores what ATS is and surveys the literature of the field. Different datasets and evaluation techniques used in assessing summarizers are discussed. ATS is much more challenging than its extractive counterpart, and as such, there are only a few works in this area across languages.


2020 ◽  
Vol 17 (9) ◽  
pp. 3867-3872
Author(s):  
Aniv Chakravarty ◽  
Jagadish S. Kallimani

Text summarization is an active field of research with the goal of providing short and meaningful gists of large amounts of text. Extractive text summarization methods, in which text is extracted from the documents to build summaries, have been studied extensively. Multi-document collections vary widely in format, domain, and topic. With recent advances in technology and the use of neural networks for text generation, interest in research on abstractive text summarization has increased significantly. Graph-based methods that handle semantic information have shown significant results. Given a set of English text documents, we use an abstractive method with predicate-argument structures to retrieve the necessary textual information and pass it through a neural network for text generation. Recurrent neural networks are a subtype of recursive neural networks that predict the next element of a sequence from the current state while taking information from previous states into account. The use of neural networks also allows summaries to be generated for long sentences. This paper implements a semantics-based filtering approach using a similarity matrix while keeping all stop-words. The similarity is calculated using semantic concepts and Jiang–Conrath similarity, and a recurrent neural network with an attention mechanism is used to generate the summary. ROUGE scores are used to measure accuracy, precision, and recall.
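
Below is a brief sketch of the WordNet-based Jiang–Conrath similarity mentioned above, using NLTK; the sentence-level aggregation (average of best word-to-word matches) is an assumption for illustration, not necessarily the paper's exact formulation.

```python
# Jiang-Conrath word and sentence similarity via NLTK WordNet (illustrative sketch).
# One-time resources: nltk.download('wordnet'); nltk.download('wordnet_ic')
import nltk
from nltk.corpus import wordnet as wn, wordnet_ic

brown_ic = wordnet_ic.ic("ic-brown.dat")  # information content from the Brown corpus

def word_jcn(w1: str, w2: str) -> float:
    """Best Jiang-Conrath similarity over the noun senses of two words."""
    best = 0.0
    for s1 in wn.synsets(w1, pos=wn.NOUN):
        for s2 in wn.synsets(w2, pos=wn.NOUN):
            try:
                best = max(best, s1.jcn_similarity(s2, brown_ic))
            except nltk.corpus.reader.wordnet.WordNetError:
                continue
    return best

def sentence_similarity(sent_a, sent_b) -> float:
    """Average best-match word similarity between two tokenized sentences."""
    scores = [max((word_jcn(a, b) for b in sent_b), default=0.0) for a in sent_a]
    return sum(scores) / len(scores) if scores else 0.0

print(sentence_similarity(["dog", "barks"], ["cat", "meows"]))
```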


2009 ◽  
Vol 28 (3) ◽  
pp. 143 ◽  
Author(s):  
Przemyslaw Skibiński ◽  
Jakub Swacha

In this paper we investigate the possibility of improving the efficiency of data compression, and thus reducing storage requirements, for seven widely used text document formats. We propose an open-source text compression software library, featuring an advanced word-substitution scheme with static and semidynamic word dictionaries. The empirical results show an average storage space reduction as high as 78 percent compared to uncompressed documents, and as high as 30 percent compared to documents compressed with the free compression software gzip.
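
The following sketch illustrates the general word-substitution idea: frequent words are replaced by short dictionary codes before a general-purpose compressor (gzip here) is applied. The dictionary format, escape scheme, and word selection below are simplified placeholders, not the actual transform used by the library described in the paper.

```python
# Semidynamic word-substitution followed by gzip (simplified illustration).
import gzip
import re
from collections import Counter

def semidynamic_transform(text: str, dict_size: int = 64) -> bytes:
    words = re.findall(r"[A-Za-z]{4,}", text)
    # Semidynamic dictionary: the most frequent long words of this document.
    dictionary = [w for w, _ in Counter(words).most_common(dict_size)]
    out = text
    for code, word in enumerate(dictionary):
        # Replace each dictionary word with a short escape sequence.
        out = out.replace(word, f"\x01{code:02x}")
    # Prepend the dictionary so the transform is reversible.
    return (",".join(dictionary) + "\x02" + out).encode("utf-8")

def compressed_sizes(text: str):
    plain = gzip.compress(text.encode("utf-8"))
    transformed = gzip.compress(semidynamic_transform(text))
    return len(plain), len(transformed)

# On repetitive documents the transformed stream usually compresses further.
sample = "Automatic compression of text documents reduces storage requirements. " * 50
print(compressed_sizes(sample))
```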


Text summarization is an area of research with the goal of producing short texts from large text documents. Extractive text summarization methods have been studied extensively by many researchers. Multi-document collections vary widely in format, domain, and topic. With the application of neural networks to text generation, interest in research on abstractive text summarization has increased significantly. In this article, this approach is attempted for the English and Telugu languages. Recurrent neural networks are a subtype of recursive neural networks that predict the next element of a sequence from the current state while taking information from previous states into account. The use of neural networks also allows summaries to be generated for long sentences. The work implements semantics-based filtering using a similarity matrix while keeping all stop-words. The similarity is calculated using semantic concepts and Jiang–Conrath similarity, and a Recurrent Neural Network (RNN) with an attention mechanism is used to generate the summary. ROUGE scores are used to measure the performance of the applied method on Telugu and English.
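
For reference, the ROUGE evaluation mentioned above can be computed with the `rouge-score` package as shown below; the specific ROUGE variants reported by the authors are an assumption here.

```python
# Scoring a generated summary against a reference with ROUGE (illustrative).
from rouge_score import rouge_scorer

reference = "the cat sat on the mat"
generated = "the cat lay on the mat"

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
for name, result in scorer.score(reference, generated).items():
    print(f"{name}: precision={result.precision:.2f} "
          f"recall={result.recall:.2f} f1={result.fmeasure:.2f}")
```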


Author(s):  
Junaid Rashid ◽  
Syed Muhammad Adnan Shah ◽  
Aun Irtaza

Topic modeling is an effective text mining and information retrieval approach for organizing knowledge with various contents under a specific topic. Text documents in the form of news articles are growing rapidly on the web, and the analysis of these documents is very important in the fields of text mining and information retrieval. Extracting meaningful information from these documents is a challenging task. One approach for discovering themes in text documents is topic modeling, but this approach still needs new perspectives to improve its performance. In topic modeling, documents are mixtures of topics, and topics are collections of words. In this paper, we propose a new k-means topic modeling (KTM) approach using the k-means clustering algorithm. KTM discovers better semantic topics from a collection of documents. Experiments on two real-world datasets, Reuters-21578 and BBC News, show that KTM performs better than state-of-the-art topic models such as LDA (Latent Dirichlet Allocation) and LSA (Latent Semantic Analysis). KTM is also applicable to classification and clustering tasks in text mining, where it achieves higher performance than its competitors LDA and LSA.
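
A minimal sketch of the k-means view of topic discovery follows: documents are vectorized, clustered with k-means, and each cluster's top-weighted centroid terms are read off as a "topic". The actual KTM pipeline may differ; this only illustrates the general idea it builds on.

```python
# Discovering topics as k-means clusters over TF-IDF vectors (illustrative sketch).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = [
    "stock markets fell as investors worried about inflation",
    "the central bank raised interest rates again",
    "the team won the championship after a late goal",
    "injury forces the striker to miss the final match",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
terms = vectorizer.get_feature_names_out()

# Each cluster centroid's heaviest terms serve as that topic's word list.
for topic_id, centroid in enumerate(km.cluster_centers_):
    top_terms = [terms[i] for i in centroid.argsort()[::-1][:5]]
    print(f"topic {topic_id}: {', '.join(top_terms)}")
```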


Electronics ◽  
2020 ◽  
Vol 9 (9) ◽  
pp. 1459
Author(s):  
Mirela Kundid Vasić ◽  
Vladan Papić

In this paper, we propose a novel method for person detection in aerial images of non-urban terrain gathered by an Unmanned Aerial Vehicle (UAV), which plays an important role in Search And Rescue (SAR) missions. The UAV contributes significantly to SAR operations due to its ability to survey a large geographical area from an aerial viewpoint. Because of the high recording altitude, the object of interest (a person) covers only a small part of an image (around 0.1%), which makes this task quite challenging. To address this problem, a multimodel deep learning approach is proposed. The solution consists of two different convolutional neural networks, used in the region proposal as well as in the classification stage. Additionally, contextual information is used in the classification stage to improve the detection results. Experiments on the HERIDAL dataset achieved a precision of 68.89% and a recall of 94.65%, which is better than current state-of-the-art methods for person detection in similar scenarios. Consequently, it may be concluded that this approach is suitable for use as an auxiliary method in real SAR operations.
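
As a rough sketch of the two-stage classification idea with contextual information, the PyTorch module below scores a candidate region from both a tight crop and an enlarged context crop and fuses the two feature vectors. The backbones, crop sizes, and fusion rule are assumptions for illustration, not the authors' HERIDAL pipeline.

```python
# Fusing region and context features for person/background classification (sketch).
import torch
import torch.nn as nn

class ContextualClassifier(nn.Module):
    def __init__(self, region_backbone: nn.Module, context_backbone: nn.Module, feat_dim: int):
        super().__init__()
        self.region_backbone = region_backbone    # CNN over the tight region crop
        self.context_backbone = context_backbone  # CNN over the enlarged context crop
        self.head = nn.Linear(2 * feat_dim, 2)    # person vs. background

    def forward(self, region_crop: torch.Tensor, context_crop: torch.Tensor) -> torch.Tensor:
        # Both backbones are assumed to return (batch, feat_dim) feature vectors.
        f_region = self.region_backbone(region_crop)
        f_context = self.context_backbone(context_crop)
        return self.head(torch.cat([f_region, f_context], dim=1))
```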


2016 ◽  
Vol 25 (01) ◽  
pp. 1660002 ◽  
Author(s):  
Guangbing Yang

Oft-decried information overload is a serious problem that negatively impacts the comprehension of information in the digital age. Text summarization is a helpful process that can be used to alleviate this problem. Seeking a novel way to enhance the performance of multi-document summarization, this study proposes an approach based on a mixture model consisting of a contextual topic model, from the Bayesian hierarchical topic modeling family, for selecting candidate summary sentences, and a machine learning regression model for generating the summary. By investigating hierarchical topics and their correlations with respect to the lexical co-occurrences of words, the proposed contextual topic model can determine the relevance of sentences more effectively, recognize latent topics, and arrange them hierarchically. Quantitative evaluation results from a practical application demonstrate that a system implementing this model can significantly improve the performance of summarization and make it comparable to state-of-the-art summarization systems.
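
A simplified sketch of the two-component design described above: topic-based sentence features feed a regression model that scores candidate summary sentences. The flat LDA features, position/length features, and ridge regressor below are placeholders for the paper's hierarchical contextual topic model and its regression component.

```python
# Topic-based sentence features for a regression-scored summarizer (illustrative sketch).
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import Ridge  # used in the training step sketched below

def sentence_features(sentences, n_topics=5):
    counts = CountVectorizer(stop_words="english").fit_transform(sentences)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    theta = lda.fit_transform(counts)                         # sentence-topic mixtures
    position = np.linspace(0.0, 1.0, len(sentences)).reshape(-1, 1)
    length = np.array([len(s.split()) for s in sentences], dtype=float).reshape(-1, 1)
    return np.hstack([theta, position, length])

# Training regresses these features onto a relevance target, e.g. each sentence's
# ROUGE overlap with reference summaries:
#   model = Ridge().fit(sentence_features(train_sentences), rouge_targets)
# Prediction then ranks unseen sentences by model.predict(sentence_features(test_sentences)).
```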


1989 ◽  
Vol 111 (4) ◽  
pp. 285-297 ◽  
Author(s):  
G. A. Georgiou ◽  
M. Blakemore

During the past decade or so there have been significant advances in the mathematical modeling of nondestructive testing (NDT) techniques. The current state of modeling is reviewed, with the aim of demonstrating the important role that it can play in improving and validating NDT techniques and procedures. This paper was presented at the 6th International Offshore Mechanics and Arctic Engineering Symposium, (1987). Since that time, a number of advances have been made in mathematical modeling. A more recent review of mathematical modeling techniques, with additional references, is currently in preparation by the authors.

