A New LSA and Entropy-Based Approach for Automatic Text Document Summarization

Author(s):  
Chandra Yadav ◽  
Aditi Sharan

Automatic text document summarization is an active research area in the field of text mining. In this article, the authors propose two new approaches (three models) for sentence selection, together with a new entropy-based summary evaluation criterion. The first approach is based on the algebraic technique of Singular Value Decomposition (SVD) underlying Latent Semantic Analysis (LSA) and is termed proposed_model-1; the second approach is based on entropy and is further divided into proposed_model-2 and proposed_model-3. The first proposed model uses the right singular matrix, while the second and third proposed models are based on Shannon entropy. The advantage of these models is that they are not length-dominated, give better results, and produce low redundancy. Along with the three models, an entropy-based summary evaluation criterion is proposed and tested. The authors also show that their entropy-based models are statistically closer to the standard/gold summaries of DUC-2002. The dataset used in this article is taken from the Document Understanding Conference 2002 (DUC-2002).
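The abstract does not give implementation details, but a minimal sketch of SVD-based sentence selection in the spirit of LSA summarization (picking, for each latent topic, the sentence with the largest weight in the right singular matrix) and of a simple Shannon-entropy sentence score might look like the following. The tokenization, the scoring rules, and the variable names are illustrative assumptions, not the authors' exact models.

```python
# Illustrative sketch only: Gong & Liu-style LSA sentence selection plus a
# simple Shannon-entropy sentence score. Not the authors' exact models.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

sentences = [
    "Automatic summarization selects the most informative sentences.",
    "Singular value decomposition exposes latent topics in a document.",
    "Entropy measures how evenly a sentence spreads over the vocabulary.",
]

# Term-sentence matrix A (rows = terms, columns = sentences).
A = CountVectorizer().fit_transform(sentences).T.toarray().astype(float)

# SVD: the rows of Vt (right singular matrix) describe sentences per latent topic.
U, S, Vt = np.linalg.svd(A, full_matrices=False)

k = 2  # number of latent topics to keep (assumption)
lsa_picks = {int(np.argmax(np.abs(Vt[i]))) for i in range(k)}  # best sentence per topic

# Shannon entropy of each sentence's term distribution (illustrative score).
def shannon_entropy(counts):
    p = counts / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

entropy_scores = [shannon_entropy(A[:, j]) for j in range(A.shape[1])]

print("LSA-selected sentence indices:", sorted(lsa_picks))
print("Entropy scores:", np.round(entropy_scores, 3))
```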

Automatic text summarization of a resource-poor language is a challenging task. Unsupervised extractive techniques are often preferred for such languages because of the scarcity of resources. Latent Semantic Analysis (LSA) is an unsupervised technique that automatically identifies semantically important sentences in a text document. Two LSA-based methods have been evaluated on two datasets of a resource-poor language, applying Singular Value Decomposition (SVD) to different vector-space models. The performance of the methods is evaluated using ROUGE-L scores obtained by comparing the system-generated summaries with human-generated model summaries. Both methods are found to perform better on shorter documents than on longer ones.
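ROUGE-L scores a candidate summary by the longest common subsequence (LCS) it shares with a reference summary. A minimal sketch of the word-level computation is shown below; the F-measure weight beta and the example texts are chosen purely for illustration.

```python
# Minimal word-level ROUGE-L sketch (LCS-based precision, recall, F-measure).
def lcs_length(a, b):
    # Classic dynamic-programming longest common subsequence.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l(candidate, reference, beta=1.2):
    c, r = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)

print(rouge_l("the cat sat on the mat", "the cat lay on the mat"))
```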


For years, radiologists and clinicians have employed various approaches, including machine learning algorithms, to detect, diagnose, and prevent diseases using medical imaging. Recent advances in deep learning have made medical image analysis and processing an active research area, and various algorithms for segmentation, detection, and classification have been proposed. In this survey, we describe the trends in the use of deep learning algorithms in medical imaging and discuss their architectures as well as the hardware and software used. We conclude with a proposed model for brain lesion segmentation and classification using Magnetic Resonance Images (MRI).


Author(s):  
Gabriel Silva ◽  
Rafael Ferreira ◽  
Rafael Dueire Lins ◽  
Luciano Cabral ◽  
Hilário Oliveira ◽  
...  

2019 ◽  
Vol 9 (7) ◽  
pp. 1291 ◽  
Author(s):  
Zakria ◽  
Jingye Cai ◽  
Jianhua Deng ◽  
Muhammad Aftab ◽  
Muhammad Khokhar ◽  
...  

The intelligent transportation system is currently an active research area, and vehicle re-identification (Re-Id) is a fundamental task in implementing it. It determines whether a vehicle image obtained from one camera has already appeared elsewhere in a camera network. There are many practical applications in which a vehicle Re-Id system can be employed, such as intelligent vehicle parking, suspicious vehicle tracking, vehicle incident detection, vehicle counting, and automatic toll collection. The task is made more challenging by intra-class similarity, viewpoint changes, and inconsistent environmental conditions. In this paper, we propose a novel approach that re-identifies a vehicle in two steps: first we shortlist vehicles from a gallery set on the basis of appearance, and then we verify the shortlisted vehicles' license plates against the query image to identify the target vehicle. In our model, the global channel extracts a feature vector from the whole vehicle image, and the local region channel extracts more discriminative and salient features from different regions. In addition, we jointly incorporate attributes such as model, type, and color. Finally, we use a Siamese neural network to verify license plates and reach the exact vehicle. Extensive experimental results on the benchmark dataset VeRi-776 demonstrate the effectiveness of the proposed model compared to various state-of-the-art methods.
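As an illustration of the two-step idea (appearance-based shortlisting followed by plate verification), a minimal sketch over precomputed feature vectors might look like the following. The feature extractors, the threshold, and all variable names are assumptions for illustration, not the paper's actual networks.

```python
# Illustrative two-step re-identification sketch over precomputed embeddings.
# In the paper the appearance and plate embeddings come from trained networks
# (global/local channels and a Siamese plate verifier); here they are just
# placeholder vectors.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
gallery_appearance = rng.normal(size=(100, 256))   # appearance embeddings of gallery vehicles
gallery_plates = rng.normal(size=(100, 128))       # plate embeddings of the same vehicles
query_appearance = rng.normal(size=256)
query_plate = rng.normal(size=128)

# Step 1: shortlist the top-k gallery vehicles by appearance similarity.
k = 10
scores = np.array([cosine(query_appearance, g) for g in gallery_appearance])
shortlist = np.argsort(-scores)[:k]

# Step 2: verify license plates of the shortlisted vehicles against the query plate.
PLATE_THRESHOLD = 0.8  # assumed verification threshold
matches = [int(i) for i in shortlist if cosine(query_plate, gallery_plates[i]) >= PLATE_THRESHOLD]

print("Shortlisted gallery indices:", shortlist.tolist())
print("Plate-verified matches:", matches)
```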


Automatic text summarization is the technique of generating a short and accurate summary of a longer text document. Text summarization can be classified by the number of input documents (single-document and multi-document summarization) and by the characteristics of the summary generated (extractive and abstractive summarization). Multi-document summarization is the automatic process of creating a relevant, informative, and concise summary from a cluster of related documents. This paper presents a detailed survey of the existing literature on the various approaches to text summarization. A few of the most popular approaches, such as graph-based, cluster-based, and deep learning-based summarization techniques, are discussed here along with the evaluation metrics, which can provide insight for future researchers.


2018 ◽  
Vol 8 (3) ◽  
pp. 14-32 ◽  
Author(s):  
Chandra Shakhar Yadav ◽  
Aditi Sharan

This article proposes the new concept of a Lexical Network for automatic text document summarization. Instead of a number of lexical chains, the authors obtain a network of sentences, called a Lexical Network and termed LexNetwork. The network is built between sentences based on different lexical and semantic relations. In this network, nodes represent sentences and edges represent the strength between two sentences, where strength means the number of relations present between the two sentences. The importance of each sentence is decided by different centrality measures, and the most important sentences are extracted for the summary. Word sense disambiguation (WSD) is done with the Simple Lesk technique, and a cosine-similarity threshold (Ɵ, TH) is used as a post-processing step. The authors suggest that a cosine-similarity threshold of 10% is better than 5%, and that an eigenvalue-based centrality measure is better suited to the summarization process. Finally, for comparison, they use the Semantrica-Lexalytics system.
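A minimal sketch of the general idea (a sentence graph whose edges pass a cosine-similarity threshold, ranked with an eigenvector-style centrality measure) is shown below. The TF-IDF weighting, the interpretation of the 10% threshold, and the use of networkx are illustrative assumptions rather than the article's exact construction from lexical and semantic relations.

```python
# Illustrative sentence-network sketch: cosine-similarity edges above a
# threshold, ranked by eigenvector centrality. Not the article's exact
# lexical/semantic relation counting.
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "The lexical network links sentences that share related words.",
    "Centrality measures rank the sentences in the network by importance.",
    "The most central sentences in the network are extracted for the summary.",
    "A cosine similarity threshold removes weak links between sentences.",
]

tfidf = TfidfVectorizer().fit_transform(sentences)
sim = cosine_similarity(tfidf)

THRESHOLD = 0.10  # assumed interpretation of the 10% cosine-similarity cutoff
G = nx.Graph()
G.add_nodes_from(range(len(sentences)))
for i in range(len(sentences)):
    for j in range(i + 1, len(sentences)):
        if sim[i, j] >= THRESHOLD:
            G.add_edge(i, j, weight=float(sim[i, j]))

centrality = nx.eigenvector_centrality_numpy(G, weight="weight")
ranked = sorted(centrality, key=centrality.get, reverse=True)
summary = [sentences[i] for i in ranked[:2]]
print(summary)
```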


2022 ◽  
Vol 19 (1) ◽  
pp. 1719
Author(s):  
Saravanan Arumugam ◽  
Sathya Bama Subramani

With the increase in the amount of data and documents on the web, text summarization has become one of the significant fields that cannot be avoided in today's digital era. Automatic text summarization provides the user with a quick summary of the information presented in text documents. This paper presents automated single-document summarization by constructing similitude graphs from the extracted text segments. After extracting the text segments, feature values are computed for all segments by comparing them with the title and the entire document and by computing segment significance using the information gain ratio. Based on the computed features, the similarity between segments is evaluated to construct a graph in which the vertices are the segments and the edges specify the similarity between them. The segments are ranked for inclusion in the extractive summary by computing the graph score and the sentence segment score. The experimental analysis has been performed using ROUGE metrics and the results are analyzed for the proposed model. The proposed model has been compared with various existing models on 4 different datasets, in which it acquired the top 2 positions by average rank computed on metrics such as precision, recall, and F-score.
HIGHLIGHTS
- The paper presents automated single-document summarization by constructing similitude graphs from the extracted text segments.
- It utilizes the information gain ratio, graph construction, and graph score and sentence segment score computation.
- Results analysis has been performed using ROUGE metrics with 4 popular datasets in the document summarization domain.
- The model acquired the top 2 positions by average rank computed on metrics such as precision, recall, and F-score.
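A rough sketch of the segment-scoring idea (comparing each segment with the title and with the whole document, then combining the feature values into a ranking score) is given below. The equal-weight combination and the TF-IDF representation are assumptions for illustration, and the paper's information-gain-ratio feature and graph score are omitted.

```python
# Illustrative segment scoring: similarity to the title and to the full
# document, combined with equal weights (an assumption). The paper's
# information-gain-ratio feature and graph score are not reproduced here.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

title = "Graph based single document summarization"
segments = [
    "The document is split into text segments before scoring.",
    "Each segment is compared with the title and the whole document.",
    "Highly scored segments are selected for the extractive summary.",
]
document = " ".join(segments)

vectorizer = TfidfVectorizer().fit(segments + [title, document])
seg_vecs = vectorizer.transform(segments)
title_vec = vectorizer.transform([title])
doc_vec = vectorizer.transform([document])

title_sim = cosine_similarity(seg_vecs, title_vec).ravel()
doc_sim = cosine_similarity(seg_vecs, doc_vec).ravel()
scores = 0.5 * title_sim + 0.5 * doc_sim  # equal weights: an illustrative choice

ranking = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
print("Segment ranking (best first):", ranking)
```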


Author(s):  
Manju Lata Joshi ◽  
Nisheeth Joshi ◽  
Namita Mittal

Creating a coherent summary of a text is a challenging task in the field of Natural Language Processing (NLP). Various automatic text summarization techniques have been developed for abstractive as well as extractive summarization. This study focuses on extractive summarization, a process of selecting representative paragraphs or sentences from the original text and combining them into a form shorter than the source document(s) to generate a summary. The methods that have been used for extractive summarization are based on graph-theoretic approaches, machine learning, Latent Semantic Analysis (LSA), neural networks, clustering, and fuzzy logic. In this paper, a semantic graph-based approach, SGATS (Semantic Graph-based approach for Automatic Text Summarization), is proposed to generate an extractive summary. The proposed approach constructs a semantic graph of the original Hindi text document by establishing semantic relationships between the sentences of the document using the Hindi WordNet ontology as a background knowledge source. Once the semantic graph is constructed, fourteen different graph-theoretic measures are applied to rank the document sentences according to their semantic scores. The proposed approach is applied to two datasets from the different domains of tourism and health. Its performance is compared with the state-of-the-art TextRank algorithm and a human-annotated summary, and is evaluated using the widely accepted ROUGE measures. The outcomes show that the proposed system produces better results than TextRank for the health-domain corpus and comparable results for the tourism corpus. Further, correlation coefficient methods are applied to find the correlation between eight different graphical measures, and it is observed that most of the graphical measures are highly correlated.
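To make the ranking step concrete, a small sketch of scoring sentence nodes with several graph-theoretic measures and checking how strongly the measures correlate is shown below. The choice of four measures, the use of networkx, and the Spearman correlation are illustrative assumptions; the paper applies fourteen measures to a Hindi WordNet-based semantic graph.

```python
# Illustrative sketch: rank nodes of a sentence graph with several
# graph-theoretic measures and correlate the rankings. Not the paper's
# fourteen measures or its Hindi WordNet-based graph construction.
import networkx as nx
from scipy.stats import spearmanr

# A toy sentence graph: nodes are sentence indices, edges are semantic links.
G = nx.Graph([(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 0)])

measures = {
    "degree": nx.degree_centrality(G),
    "closeness": nx.closeness_centrality(G),
    "betweenness": nx.betweenness_centrality(G),
    "pagerank": nx.pagerank(G),
}

nodes = sorted(G.nodes())
for name, scores in measures.items():
    ranked = sorted(nodes, key=scores.get, reverse=True)
    print(f"{name:>11}: {ranked}")

# Correlation between two of the measures across nodes.
rho, _ = spearmanr([measures["degree"][n] for n in nodes],
                   [measures["pagerank"][n] for n in nodes])
print("Spearman correlation (degree vs. pagerank):", round(rho, 3))
```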


2020 ◽  
Vol 2020 (9) ◽  
pp. 323-1-323-8
Author(s):  
Litao Hu ◽  
Zhenhua Hu ◽  
Peter Bauer ◽  
Todd J. Harris ◽  
Jan P. Allebach

Image quality assessment has been a very active research area in the field of image processing, and numerous methods have been proposed. However, most existing methods focus on digital images that only or mainly contain pictures or photos taken by digital cameras. Traditional approaches evaluate an input image as a whole and try to estimate a quality score, in order to give viewers an idea of how "good" the image looks. In this paper, we mainly focus on the quality evaluation of symbolic content such as text, barcodes, QR codes, lines, and handwriting in target images. Estimating a quality score for this kind of information can be based on whether it is readable by a human or recognizable by a decoder. Moreover, we mainly study the viewing quality of a scanned document produced from a printed image. For this purpose, we propose a novel image quality assessment algorithm that is able to determine the readability of a scanned document or of regions within it. Experimental results on several test images demonstrate the effectiveness of our method.
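The paper's algorithm is not described in the abstract, but the idea that readability can be tied to whether a decoder still recognizes the content can be illustrated with a simple proxy: run OCR on the scanned page and use the mean word-level confidence as a crude readability score. The use of pytesseract, the file name, and the confidence threshold below are illustrative assumptions only, not the proposed method.

```python
# Crude readability proxy for a scanned document: mean OCR word confidence.
# Illustrative only; not the paper's proposed algorithm.
import pytesseract
from PIL import Image

def readability_score(image_path):
    image = Image.open(image_path)
    data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
    # Keep only entries with a valid confidence (Tesseract reports -1 for non-words).
    confidences = [float(c) for c in data["conf"] if float(c) >= 0]
    return sum(confidences) / len(confidences) if confidences else 0.0

score = readability_score("scanned_page.png")  # hypothetical file
print("Readable" if score >= 60 else "Hard to read", f"(mean confidence {score:.1f})")
```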


2020 ◽  
Vol 13 (5) ◽  
pp. 977-986
Author(s):  
Srinivasa Rao Kongara ◽  
Dasika Sree Rama Chandra Murthy ◽  
Gangadhara Rao Kancherla

Background: Text summarization is the process of generating a short description of an entire document that would otherwise be difficult to read in full. It provides a convenient way of extracting the most useful information as a short summary of the documents. Existing research has addressed this by introducing the Fuzzy Rule-based Automated Summarization Method (FRASM), which has limitations that restrict its applicability to real-world applications: it is suitable only for single-document summarization, whereas applications such as research industries need to summarize information from multiple documents. Methods: This paper proposes the Multi-document Automated Summarization Method (MDASM), a summarization framework that produces an accurate summarized outcome from multiple documents, whereas the existing system performed only single-document summarization. Initially, document clustering is performed using a modified k-means clustering algorithm to group similar documents that convey the same meaning, identified through frequent-term measurement. After clustering, pre-processing is performed with a hybrid TF-IDF and Singular Value Decomposition technique, which eliminates irrelevant content and retains the required content. Sentence measurement is then done by introducing an additional metric, title measurement, alongside the metrics of the existing work, to retrieve the most similar sentences more accurately. Finally, a fuzzy rule system is applied to perform text summarization. Results: The overall evaluation of this work was conducted in the MATLAB simulation environment, which shows that the proposed method ensures a better outcome than the existing method in terms of summarization accuracy. MDASM yields 89.28% increased accuracy, 89.28% increased precision, 89.36% increased recall, and 70% increased F-measure, performing better than FRASM. Conclusion: The summarization process carried out in this work provides an accurate summarized outcome.
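As a small illustration of the front end of such a pipeline (clustering similar documents with TF-IDF and k-means, then reducing the representation with SVD before sentence scoring), a sketch with scikit-learn follows. The number of clusters, the SVD rank, and the toy documents are assumptions, and the modified k-means, title measurement, and fuzzy rule stages of MDASM are not reproduced.

```python
# Illustrative front end of a multi-document pipeline: TF-IDF, k-means
# clustering of documents, then truncated SVD (LSA) dimensionality reduction.
# The modified k-means, title metric, and fuzzy rules of MDASM are omitted.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD

documents = [
    "Fuzzy rules select salient sentences for the summary.",
    "Fuzzy logic has been used for single document summarization.",
    "Clustering groups related documents before summarization.",
    "Document clusters are summarized one cluster at a time.",
]

tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(documents)

# Group similar documents (2 clusters is an arbitrary choice for the toy data).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster labels:", kmeans.labels_.tolist())

# Reduce the document representation with truncated SVD (LSA).
svd = TruncatedSVD(n_components=2, random_state=0)
reduced = svd.fit_transform(X)
print("Reduced document vectors shape:", reduced.shape)
```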

