Query-Based Extractive Text Summarization Using Sense-Oriented Semantic Relatedness Measure

Author(s):  
Nazreena Rahman ◽  
Bhogeswar Borah

Abstract This paper presents a query-based extractive text summarization method that uses a sense-oriented semantic relatedness measure. We propose a Word Sense Disambiguation (WSD) technique to find the exact sense of a word in a sentence. It helps in extracting query-relevant sentences while calculating the sense-oriented semantic relatedness score between the query and each input sentence. The proposed method uses five unique features to form clusters of query-relevant sentences. A redundancy removal technique is also put forward to eliminate redundant sentences. We evaluated the proposed WSD technique against existing methods on the Senseval and SemEval datasets; the experimental evaluation and discussion show that it outperforms current systems in terms of F-score. We also compare the proposed query-based extractive text summarization method with methods that participated in the Document Understanding Conference (DUC) as well as with current methods. Evaluation and comparison show that it outperforms many existing methods. As an unsupervised learning algorithm, it obtains the highest ROUGE (Recall-Oriented Understudy for Gisting Evaluation) score on all three DUC 2005, 2006, and 2007 datasets, and remains comparable with supervised learning-based algorithms. We also observe that the method can recognize query-relevant sentences that meet the query need.
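The selection loop the abstract describes (score each sentence against the query, cluster/pick the relevant ones, drop redundant ones) can be sketched minimally as follows. The `relatedness` function here is a plain word-overlap stand-in for the paper's sense-oriented semantic relatedness measure, and the threshold value is a hypothetical choice, not taken from the paper.

```python
import re

def tokens(text):
    """Lowercased word set, punctuation stripped."""
    return set(re.findall(r"[a-z]+", text.lower()))

def relatedness(a, b):
    """Stand-in relatedness: fraction of a's words that also occur in b.
    The paper uses a sense-oriented semantic measure instead."""
    wa, wb = tokens(a), tokens(b)
    return len(wa & wb) / len(wa) if wa else 0.0

def summarize(query, sentences, k=2, redundancy_threshold=0.5):
    """Pick the top-k query-relevant sentences, skipping near-duplicates."""
    ranked = sorted(sentences, key=lambda s: relatedness(query, s), reverse=True)
    summary = []
    for sent in ranked:
        # redundancy removal: drop a sentence too similar to one already chosen
        if any(relatedness(chosen, sent) > redundancy_threshold for chosen in summary):
            continue
        summary.append(sent)
        if len(summary) == k:
            break
    return summary
```

Swapping the overlap score for a sense-aware measure changes only `relatedness`; the ranking and redundancy-removal structure stays the same.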

2013 ◽  
Vol 22 (02) ◽  
pp. 1350003 ◽  
Author(s):  
KOSTAS FRAGOS

In this work, we propose a new measure of semantic relatedness between concepts, applied to word sense disambiguation. Using the overlaps between WordNet definitions of concepts (glosses) and a goodness-of-fit statistical test, we establish a formal mechanism for quantifying and estimating the semantic relatedness between concepts. More concretely, we model WordNet gloss overlaps by making a theoretical assumption about their distribution, and then quantify the discrepancy between the theoretical and the actual distribution. This discrepancy is used to measure the relatedness between the input concepts. The experimental results showed very good performance on the SensEval-2 lexical-sample data for word sense disambiguation.
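The gloss-overlap idea underlying this measure can be sketched in a few lines. The sense inventory below is a toy stand-in (not WordNet), and the sketch scores a sense by its raw overlap count, where the paper instead scores the goodness-of-fit discrepancy between the assumed and observed overlap distributions.

```python
import re

# Toy sense inventory: (word, sense label) -> gloss. Hypothetical glosses,
# standing in for WordNet definitions.
GLOSSES = {
    ("bank", "finance"): "an institution that accepts deposits and lends money",
    ("bank", "river"): "sloping land beside a body of water",
}

def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def disambiguate(word, context):
    """Return the sense whose gloss shares the most words with the context."""
    best_sense, best_overlap = None, -1
    for (w, sense), gloss in GLOSSES.items():
        if w != word:
            continue
        overlap = len(tokens(gloss) & tokens(context))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense
```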


2018 ◽  
Vol 25 (7) ◽  
pp. 800-808 ◽  
Author(s):  
Yue Wang ◽  
Kai Zheng ◽  
Hua Xu ◽  
Qiaozhu Mei

Abstract Objective Medical word sense disambiguation (WSD) is challenging and often requires significant training with data labeled by domain experts. This work aims to develop an interactive learning algorithm that makes efficient use of expert’s domain knowledge in building high-quality medical WSD models with minimal human effort. Methods We developed an interactive learning algorithm with expert labeling instances and features. An expert can provide supervision in 3 ways: labeling instances, specifying indicative words of a sense, and highlighting supporting evidence in a labeled instance. The algorithm learns from these labels and iteratively selects the most informative instances to ask for future labels. Our evaluation used 3 WSD corpora: 198 ambiguous terms from Medical Subject Headings (MSH) as MEDLINE indexing terms, 74 ambiguous abbreviations in clinical notes from the University of Minnesota (UMN), and 24 ambiguous abbreviations in clinical notes from Vanderbilt University Hospital (VUH). For each ambiguous term and each learning algorithm, a learning curve that plots the accuracy on the test set against the number of labeled instances was generated. The area under the learning curve was used as the primary evaluation metric. Results Our interactive learning algorithm significantly outperformed active learning, the previous fastest learning algorithm for medical WSD. Compared to active learning, it achieved 90% accuracy for the MSH corpus with 42% less labeling effort, 35% less labeling effort for the UMN corpus, and 16% less labeling effort for the VUH corpus. Conclusions High-quality WSD models can be efficiently trained with minimal supervision by inviting experts to label informative instances and provide domain knowledge through labeling/highlighting contextual features.
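The core of the iterative loop described above is selecting the next instance worth an expert's attention. A common uncertainty-sampling criterion (used here as a generic stand-in; the paper's algorithm additionally accepts labeled features and highlighted evidence, which this sketch omits) picks the instance whose predicted probability is closest to 0.5:

```python
def most_uncertain(instances, predict_proba):
    """Pick the unlabeled instance the current model is least certain about,
    i.e. whose predicted probability of the first sense is closest to 0.5."""
    return min(instances, key=lambda x: abs(predict_proba(x) - 0.5))
```

In each round the expert labels the returned instance (and, in the interactive variant, may also flag indicative words), the model is retrained, and the selection repeats.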


2021 ◽  
pp. 1-41
Author(s):  
Panagiotis Kouris ◽  
Georgios Alexandridis ◽  
Andreas Stafylopatis

Abstract Nowadays, most research conducted in the field of abstractive text summarization focuses on neural-based models alone, without considering their combination with knowledge-based approaches that could further enhance their efficiency. In this direction, this work presents a novel framework that combines sequence-to-sequence neural-based text summarization with structure- and semantics-based methodologies. The proposed framework is capable of dealing with the problem of out-of-vocabulary or rare words, improving the performance of the deep learning models. The overall methodology is based on a well-defined theoretical model of knowledge-based content generalization and deep-learning predictions for generating abstractive summaries. The framework comprises three key elements: (i) a pre-processing task, (ii) a machine learning methodology, and (iii) a post-processing task. The pre-processing task is a knowledge-based approach, built on ontological knowledge resources, word sense disambiguation, and named-entity recognition, along with content generalization, that transforms ordinary text into a generalized form. A deep learning model of attentive encoder-decoder architecture, extended with a copying and coverage mechanism as well as reinforcement learning and transformer-based architectures, is trained on a generalized version of text-summary pairs, learning to predict summaries in generalized form. The post-processing task utilizes knowledge resources, word embeddings, word sense disambiguation, and heuristic algorithms based on text-similarity methods in order to transform the generalized version of a predicted summary into a final, human-readable form. An extensive experimental procedure on three popular datasets evaluates key aspects of the proposed framework, and the obtained results exhibit promising performance, validating the robustness of the proposed approach.
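The generalize → predict → de-generalize flow described above can be sketched minimally. The entity table below is a hypothetical stand-in for the paper's ontology- and NER-driven content generalization, and the "keep the last entity per tag" choice is a simplification for illustration.

```python
# Hypothetical entity -> category table (the framework derives this from
# ontological resources, WSD, and named-entity recognition).
GENERALIZE = {"Athens": "CITY", "Lisbon": "CITY", "Maria": "PERSON"}

def generalize(text):
    """Replace rare/out-of-vocabulary entities with category tags,
    remembering the substitutions for later restoration."""
    mapping, out = {}, []
    for tok in text.split():
        tag = GENERALIZE.get(tok)
        if tag:
            mapping[tag] = tok  # keeps the last entity per tag (simplification)
            out.append(tag)
        else:
            out.append(tok)
    return " ".join(out), mapping

def degeneralize(summary, mapping):
    """Replace category tags in a predicted summary with original entities."""
    return " ".join(mapping.get(tok, tok) for tok in summary.split())
```

The sequence-to-sequence model only ever sees tags like `CITY`, so rare entities no longer fall out of its vocabulary; the post-processing step maps them back.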


Techno Com ◽  
2017 ◽  
Vol 16 (2) ◽  
pp. 195-207
Author(s):  
Dika Muhammad Fazar ◽  
Nelly Indriani Widiastuti

Text summarization is the process of producing a summary of a document without losing the article's main information. Several methods exist for summarization, such as the lexical chain method, which performs well for single-document summarization. Nevertheless, the lexical chain method still has a weakness: it cannot identify ambiguous words when forming lexical chains. In this study, to remedy that shortcoming, the lexical chain method is augmented with word sense disambiguation. The lexical chain method with word sense disambiguation is an extension of the original lexical chain method: every word is checked for ambiguity, and this check is performed so that each word is accurately assigned to the lexical chain that matches the sentence context. The fundamental difference between the two is that the extended method performs a word sense disambiguation check before assigning a word to a chain, whereas the original lexical chain method does not check ambiguity. The method was applied to the summarization of health articles. The results of this study show that the lexical chain method with word sense disambiguation remedies the weakness of the earlier method, since it can identify ambiguity and produces more accurate summaries.
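The chain-building step described above can be sketched as grouping words by a shared sense. The relatedness table below is a hypothetical stand-in for WordNet-style relations; in the method above, each ambiguous word would first be disambiguated so it joins the chain matching its context sense rather than an arbitrary one.

```python
# Toy sense-level relatedness table: word -> topic sense. A stand-in for
# WordNet relations after per-word disambiguation has been applied.
RELATED = {
    "doctor": "medicine", "nurse": "medicine", "hospital": "medicine",
    "teacher": "school", "classroom": "school",
}

def build_chains(words):
    """Group words into lexical chains keyed by their (disambiguated) sense."""
    chains = {}
    for w in words:
        sense = RELATED.get(w)
        if sense is not None:
            chains.setdefault(sense, []).append(w)
    return chains
```

The strongest chains (e.g. the longest, or those spanning the most sentences) then indicate which sentences carry the document's main topics and should enter the summary.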


2006 ◽  
Vol 12 (3) ◽  
pp. 209-228 ◽  
Author(s):  
JUDITA PREISS

We compare the word sense disambiguation systems submitted for the English-all-words task in SENSEVAL-2. We give several performance measures for the systems, and analyze correlations between system performance and word features. A decision tree learning algorithm is employed to discover the situations in which systems perform particularly well, and the resulting decision tree is examined. We investigate using a decision tree based on the SENSEVAL systems to (i) filter out senses unlikely to be correct, and to (ii) combine WSD systems. Some combinations created in this way outperform the best SENSEVAL system.
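The simplest instance of combining several WSD systems' outputs, shown here as a baseline sketch, is a plain majority vote over their per-word sense predictions; the work above goes further and learns a decision tree over system answers and word features to decide when each system should be trusted.

```python
from collections import Counter

def combine(system_answers):
    """Majority vote over per-system sense predictions for one target word."""
    counts = Counter(system_answers)
    sense, _ = counts.most_common(1)[0]
    return sense
```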

