Topic Extraction
Recently Published Documents


TOTAL DOCUMENTS: 146 (last five years: 33)
H-INDEX: 11 (last five years: 2)

2021, Vol. 11 (23), pp. 11251
Author(s): Shuohua Zhou, Yanping Zhang

With the outbreak of COVID-19 prompting an increased focus on self-care, more and more people hope to obtain disease knowledge from the Internet. In response to this demand, medical question answering and question generation have become important tasks in natural language processing (NLP). However, samples of medical questions and answers are limited, and existing question generation systems cannot fully meet the needs of non-professionals for medical questions. In this research, we propose a BERT medical pretraining model that uses GPT-2 for question augmentation and T5-Small for topic extraction, computes the cosine similarity of the extracted topics, and uses XGBoost for prediction. With GPT-2 augmentation, the prediction accuracy of our model exceeds that of the state-of-the-art (SOTA) model. Our experimental results demonstrate the strong performance of our model on medical question answering and question generation tasks, and its potential to address other biomedical question answering challenges.
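To make the "topic similarity plus boosting" step concrete, the sketch below embeds questions and their extracted topics in a shared space, computes their cosine similarity, and feeds the result to an XGBoost classifier. It is a minimal sketch, not the authors' pipeline: tf-idf vectors stand in for the BERT/T5 components, and the questions, topics, and labels are toy data.

```python
# Minimal sketch of the "topic similarity + XGBoost" step described above.
# Assumptions (not from the paper): tf-idf vectors stand in for the BERT/T5
# components, and the questions, topics, and labels are toy data.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from xgboost import XGBClassifier

questions = [
    "What are the early symptoms of COVID-19?",
    "How is high blood pressure treated?",
    "Which vaccines protect against influenza?",
    "What causes seasonal allergies?",
]
extracted_topics = ["covid symptoms", "hypertension treatment",
                    "influenza vaccine", "allergy causes"]
labels = [1, 0, 1, 0]  # hypothetical binary relevance labels

# Embed questions and their extracted topics in a shared vector space.
vec = TfidfVectorizer().fit(questions + extracted_topics)
q_emb = vec.transform(questions)
t_emb = vec.transform(extracted_topics)

# The cosine similarity between each question and its extracted topic
# becomes an extra feature column for the gradient-boosted classifier.
sim = cosine_similarity(q_emb, t_emb).diagonal().reshape(-1, 1)
features = np.hstack([q_emb.toarray(), sim])

clf = XGBClassifier(n_estimators=50, max_depth=3)
clf.fit(features, np.array(labels))
print(clf.predict(features))
```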


2021, Vol. 2010 (1), pp. 012077
Author(s): Cheng Wang, Ya Zhou

2021
Author(s): Samuel Miles, Lixia Yao, Weilin Meng, Christopher M. Black, Zina Ben Miled

2021, Vol. 25 (2), pp. 397-417
Author(s): Xiaoling Huang, Hao Wang, Lei Li, Yi Zhu, Chengxiang Hu

Inferring user interest over large-scale microblogs has attracted much attention in recent years. However, the emergence of massive data, the dynamic change of information, and the persistence of microblogs pose challenges to interest inference. Most existing approaches rarely combine these characteristics of microbloggers within the model, which may incur nontrivial information loss in real-time extraction of user interest and in massive social data processing. To address these problems, in this paper we propose a novel User-Networked Interest Topic Extraction in the form of Subgraph Stream (UNITE_SS) for microbloggers' interest inference. Specifically, we develop several strategies for constructing the subgraph stream and select the better-performing strategy for user interest inference. Moreover, the information of the microblogs in each subgraph is utilized to obtain a real-time and effective interest profile for microbloggers. An experimental evaluation on a large dataset from Sina Weibo, one of the most popular microblogging platforms in China, demonstrates that the proposed approach outperforms state-of-the-art baselines in both effectiveness (precision and mean reciprocal rank, MRR) and efficiency (runtime).
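The idea of a subgraph stream can be pictured as slicing a timestamped user-interaction graph into windows, one subgraph per window. The sketch below shows one possible construction with networkx; it illustrates the general idea under stated assumptions and does not reproduce the UNITE_SS strategies compared in the paper, and the interaction triples are toy data.

```python
# Hedged sketch of one *possible* subgraph-stream construction: timestamped
# user-topic interactions are cut into fixed-length time windows, and each
# window yields one subgraph. Illustrative only; not the paper's strategies.
import networkx as nx

interactions = [            # (user, topic, timestamp) -- toy data
    ("u1", "sports", 1), ("u2", "music", 2), ("u1", "music", 5),
    ("u3", "sports", 6), ("u2", "tech", 9), ("u1", "tech", 10),
]

def subgraph_stream(edges, window=4):
    """Yield (window_start, subgraph) pairs for consecutive time windows."""
    t0, end = min(t for *_, t in edges), max(t for *_, t in edges)
    while t0 <= end:
        g = nx.Graph()
        g.add_edges_from((u, v) for u, v, t in edges if t0 <= t < t0 + window)
        if g.number_of_edges():
            yield t0, g
        t0 += window

for t0, g in subgraph_stream(interactions):
    # A crude per-window interest signal: the topics each user touches.
    print(t0, {u: sorted(g.neighbors(u)) for u in g if u.startswith("u")})
```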


2021, Vol. 3 (1), pp. 123-167
Author(s): Lars Hillebrand, David Biesner, Christian Bauckhage, Rafet Sifa

Unsupervised topic extraction is a vital step in automatically distilling concise content information from large text corpora. Existing topic extraction methods, however, lack the capability of linking relations between topics, which would further aid text understanding. We therefore propose utilizing the Decomposition into Directional Components (DEDICOM) algorithm, which provides a uniquely interpretable matrix factorization for symmetric and asymmetric square matrices and tensors. We constrain DEDICOM to row-stochasticity and non-negativity in order to factorize pointwise mutual information matrices and tensors of text corpora. We identify latent topic clusters and their relations within the vocabulary and simultaneously learn interpretable word embeddings. Furthermore, we introduce multiple methods based on alternating gradient descent to efficiently train constrained DEDICOM algorithms. We evaluate the qualitative topic modeling and word embedding performance of our proposed methods on several datasets, including a novel New York Times news dataset, and demonstrate how the DEDICOM algorithm provides deeper text analysis than competing matrix factorization approaches.
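To make the constrained factorization concrete, the sketch below fits S ≈ A R Aᵀ by alternating gradient steps, projecting A back onto the non-negative, row-stochastic set after each update. It is a minimal sketch under stated assumptions: the projection, step size, and the random stand-in for the pointwise mutual information matrix are illustrative choices, not the training methods introduced in the paper.

```python
# Hedged sketch of a constrained DEDICOM-style factorization S ~ A R A^T,
# keeping A non-negative and row-stochastic by projection after each
# gradient step. Illustrative only: step size, projection, and the random
# stand-in PMI matrix are assumptions, not the paper's training methods.
import numpy as np

rng = np.random.default_rng(0)
n, k = 30, 4                       # vocabulary size, number of latent topics
S = rng.random((n, n))             # stand-in for a pointwise mutual information matrix

def project(A):
    """Clip to non-negative values and renormalize each row to sum to one."""
    A = np.clip(A, 1e-12, None)
    return A / A.sum(axis=1, keepdims=True)

A = project(rng.random((n, k)))
R = rng.random((k, k))
lr = 5e-4

for step in range(3001):
    E = S - A @ R @ A.T                        # reconstruction error
    grad_A = -2.0 * (E @ A @ R.T + E.T @ A @ R)
    grad_R = -2.0 * (A.T @ E @ A)
    A = project(A - lr * grad_A)               # alternating projected updates
    R = R - lr * grad_R
    if step % 500 == 0:
        print(step, round(float(np.linalg.norm(E)), 4))
```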


Author(s): Ammar Kamal Abasi, Ahamad Tajudin Khader, Mohammed Azmi Al-Betar, Syibrah Naim, Sharif Naser Makhadmeh, ...
