sentence extraction
Recently Published Documents

TOTAL DOCUMENTS: 108 (five years: 23)
H-INDEX: 14 (five years: 1)

2021
Author(s): G. Vijay Kumar, Arvind Yadav, B. Vishnupriya, M. Naga Lahari, J. Smriti, ...

In this digitalized era, a large amount of digital data is available on the internet for different purposes, and it is very hard to summarize this data manually. Automatic Text Summarization (ATS) addresses this problem by producing a short version of the source text that preserves its content and overall meaning. Although the concept of ATS dates back to the 1950s, the field is still struggling to produce the best and most efficient summaries. ATS follows two methods, extractive and abstractive summarization, each with its own process for improving the summarization technique. Text summarization can be implemented with the NLP packages and methods available in Python. Different approaches exist for summarizing text, along with several algorithms with which it can be implemented. TextRank is an unsupervised learning algorithm for extractive text summarization; it operates on undirected, weighted graphs and supports both keyword extraction and sentence extraction. So, in this paper, a model is built to obtain better text summarization results using the Gensim NLP library. This method preserves the overall meaning of the text, so the person reading the summary can understand it better.
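For illustration, a minimal sketch of TextRank-based extractive summarization with the Gensim library, assuming gensim < 4.0 (where the summarization module is still available); the sample text, summary ratio, and keyword count are placeholders rather than the paper's configuration.

# TextRank-style extractive summarization and keyword extraction with Gensim.
# Requires gensim < 4.0, where gensim.summarization is still shipped.
from gensim.summarization import summarize, keywords

document = (
    "Automatic text summarization condenses a source document into a shorter version. "
    "Extractive methods select the most important sentences from the original text. "
    "TextRank builds a weighted, undirected graph of sentences and ranks them, "
    "similar to how PageRank ranks web pages. "
    "The highest-ranked sentences are then returned as the summary."
)

# Return roughly 40% of the original sentences as the extractive summary.
print(summarize(document, ratio=0.4))

# The same graph-based ranking can also be applied to individual words.
print(keywords(document, words=5))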


2021
Author(s): Qianying Wang, Jing Liao, Mirella Lapata, Malcolm Macleod

Objective: We sought to apply natural language processing to the task of automatic risk of bias assessment in preclinical literature, which could speed the process of systematic review, provide information to guide research improvement activity, and support translation from preclinical to clinical research. Materials and Methods: We use 7,840 full-text publications describing animal experiments with yes/no annotations for five risk of bias items. We implement a series of models including baselines (support vector machine, logistic regression, random forest), neural models (convolutional neural network, recurrent neural network with attention, hierarchical neural network) and models using BERT with two strategies (document chunk pooling and sentence extraction). We tune hyperparameters to obtain the highest F1 scores for each risk of bias item on the validation set and compare evaluation results on the test set to our previous regular expression approach. Results: The F1 scores of the best models on the test set are 82.0% for random allocation, 81.6% for blinded assessment of outcome, 82.6% for conflict of interests, 91.4% for compliance with animal welfare regulations and 46.6% for reporting animals excluded from analysis. Our models significantly outperform regular expressions for four risk of bias items. Conclusion: For random allocation, blinded assessment of outcome, conflict of interests and animal exclusions, neural models achieve good performance, and for animal welfare regulations, the BERT model with the sentence extraction strategy works better.
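A rough sketch of the document chunk pooling idea for classifying long full-text papers with BERT, using the Hugging Face transformers library: the document is split into fixed-size token chunks, each chunk is encoded, the [CLS] embeddings are averaged, and a linear head predicts the yes/no label. The model name, chunk size, and untrained classifier head are illustrative assumptions, not the authors' exact configuration.

# Document chunk pooling for long-document yes/no classification (illustrative sketch).
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
classifier = torch.nn.Linear(encoder.config.hidden_size, 2)  # one yes/no risk-of-bias item

def classify_document(text, chunk_tokens=510):
    # Tokenize the whole document once, then slice the token ids into fixed-size chunks.
    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    cls_id, sep_id = tokenizer.cls_token_id, tokenizer.sep_token_id
    cls_vectors = []
    with torch.no_grad():
        for i in range(0, len(ids), chunk_tokens):
            chunk = [cls_id] + ids[i:i + chunk_tokens] + [sep_id]
            input_ids = torch.tensor([chunk])              # shape (1, chunk_len)
            attention_mask = torch.ones_like(input_ids)
            outputs = encoder(input_ids=input_ids, attention_mask=attention_mask)
            cls_vectors.append(outputs.last_hidden_state[:, 0, :])   # [CLS] embedding
    pooled = torch.mean(torch.cat(cls_vectors, dim=0), dim=0)        # average over chunks
    return torch.softmax(classifier(pooled), dim=-1)                 # [P(no), P(yes)]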


2021
pp. 016555152199275
Author(s): Juryong Cheon, Youngjoong Ko

Translation language resources, such as bilingual word lists and parallel corpora, are important factors affecting the effectiveness of cross-language information retrieval (CLIR) systems. In particular, when large domain-appropriate parallel corpora are not available, developing an effective CLIR system is especially difficult. Furthermore, creating a large parallel corpus is costly and requires considerable effort. Therefore, we here demonstrate the construction of parallel corpora from Wikipedia as well as improved query translation, wherein the queries are used for a CLIR system. To do so, we first constructed a bilingual dictionary, termed WikiDic. Then, we evaluated individual language resources and combinations of them in terms of their ability to extract parallel sentences; the combination of our proposed WikiDic with the translation probability from the Web’s bilingual example sentence pairs was found to be best suited to parallel sentence extraction. Finally, to evaluate the parallel corpus generated from this best combination of language resources, we compared its performance in query translation for CLIR to that of a manually created English–Korean parallel corpus. As a result, the corpus generated by our proposed method achieved a better performance than did the manually created corpus, thus demonstrating the effectiveness of the proposed method for automatic parallel corpus extraction. Not only can the method demonstrated herein be used to inform the construction of other parallel corpora from language resources that are readily available, but also the parallel sentence extraction method will naturally improve as Wikipedia continues to be used and its content develops.
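A minimal sketch of how a bilingual dictionary such as WikiDic might be used to score candidate sentence pairs from linked article pairs: each English sentence is matched to the Korean sentence whose words cover the most dictionary translations, and pairs above a threshold are kept. The coverage score, whitespace tokenization, and threshold are illustrative assumptions, not the paper's actual extraction method.

# Dictionary-based parallel sentence extraction from comparable articles (illustrative sketch).
def translation_coverage(en_sentence, ko_sentence, dictionary):
    """Fraction of English tokens with at least one dictionary translation in ko_sentence."""
    en_tokens = en_sentence.lower().split()
    ko_tokens = set(ko_sentence.split())
    covered = sum(
        1 for tok in en_tokens
        if any(trans in ko_tokens for trans in dictionary.get(tok, []))
    )
    return covered / len(en_tokens) if en_tokens else 0.0

def extract_parallel_sentences(en_sentences, ko_sentences, dictionary, threshold=0.5):
    """Greedily pick the best-scoring Korean sentence for each English sentence."""
    pairs = []
    for en in en_sentences:
        best = max(ko_sentences, default=None,
                   key=lambda ko: translation_coverage(en, ko, dictionary))
        if best is not None and translation_coverage(en, best, dictionary) >= threshold:
            pairs.append((en, best))
    return pairs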


Information
2021
Vol 12 (1)
pp. 41
Author(s): K. Manju, S. David Peter, Sumam Idicula

Automatic extractive text summarization retrieves a subset of data that represents the most notable sentences in the entire document. In the era of digital explosion, where data is mostly unstructured text, users need to understand huge amounts of text in a short time; this creates the need for an automatic text summarizer. From a summary, users get an idea of the entire content of a document and can decide whether or not to read it in full. This work mainly focuses on generating a summary from multiple news documents. In this case, the summary helps to reduce the redundant news across different newspapers. A multi-document summary is more challenging than a single-document summary, since it has to solve the problem of overlapping information among sentences from different documents. Extractive text summarization yields the most salient parts of the document by neglecting irrelevant and redundant sentences. In this paper, we propose a framework for extracting a summary from multiple documents in the Malayalam language. Also, since the multi-document summarization data set is sparse, methods based on deep learning are difficult to apply. The proposed work discusses the performance of existing standard algorithms in multi-document summarization of the Malayalam language. We propose a sentence extraction algorithm that selects the top-ranked sentences with maximum diversity. The system is found to perform well in terms of precision, recall, and F-measure on multiple input documents.
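One common way to realize "top-ranked sentences with maximum diversity" is maximal marginal relevance (MMR). The following is a minimal sketch over TF-IDF sentence vectors; the relevance measure, scikit-learn dependency, and lambda weight are illustrative assumptions rather than the paper's exact algorithm.

# MMR-style sentence selection: repeatedly pick the sentence most relevant to the document
# that is least similar to the sentences already chosen (illustrative sketch).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def select_diverse_sentences(sentences, num_sentences=3, lam=0.7):
    vectorizer = TfidfVectorizer()
    vectors = vectorizer.fit_transform(sentences)                # one TF-IDF row per sentence
    doc_vector = np.asarray(vectors.mean(axis=0))                # centroid of the document
    relevance = cosine_similarity(vectors, doc_vector).ravel()   # sentence vs. whole document
    pairwise = cosine_similarity(vectors)                        # sentence vs. sentence

    selected, candidates = [], list(range(len(sentences)))
    while candidates and len(selected) < num_sentences:
        def mmr(i):
            redundancy = max(pairwise[i][j] for j in selected) if selected else 0.0
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in sorted(selected)]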


Author(s): Chantana Chantrapornchai, Aphisit Tunsakul

In this paper, we present two methodologies for extracting particular information from the full text returned by a search engine in order to facilitate the users. The approaches are based on three tasks: named entity recognition (NER), text classification, and text summarization. The first step is building the training data and data cleansing. We consider the tourism domain, with restaurant, hotel, shopping, and tourism data sets crawled from websites. First, the tourism data are gathered and the vocabularies are built. Several minor steps, including sentence extraction and relation and named entity extraction for tagging purposes, are needed to create proper training data. Then, the recognition model for a given entity type can be built. In the experiments, given review texts, we demonstrate how to build models that extract the desired entities, i.e., name, location, and facility, as well as the relation type, and that classify or summarize the reviews. Two tools, SpaCy and BERT, are used to compare the performance of these tasks.
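A minimal sketch of the NER step with spaCy's pretrained English pipeline; the model name and example review text are illustrative, and a tourism-specific model would normally be trained on the tagged restaurant/hotel data described above.

# Named entity recognition over a review sentence with a pretrained spaCy pipeline.
import spacy

nlp = spacy.load("en_core_web_sm")  # small pretrained English pipeline (assumed installed)

review = "We stayed at the Grand Palace Hotel in Bangkok and had dinner at Blue Elephant."
doc = nlp(review)

# Print each recognized entity with its label, e.g. ORG, GPE, FAC.
for ent in doc.ents:
    print(ent.text, ent.label_)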


Author(s): Phong Nguyen-Thuan Do, Nhat Duy Nguyen, Tin Van Huynh, Kiet Van Nguyen, Anh Gia-Tuan Nguyen, ...
