Combining Word Embedding and Knowledge-Based Topic Modeling for Entity Summarization

Author(s):  
Seyedamin Pouriyeh ◽  
Mehdi Allahyari ◽  
Krys Kochut ◽  
Gong Cheng ◽  
Hamid Reza Arabnia
2019 ◽  
Vol 2019 ◽  
pp. 1-10 ◽  
Author(s):  
Jun Li ◽  
Guimin Huang ◽  
Jianheng Chen ◽  
Yabing Wang

Relation extraction is a critical underlying task of text understanding. However, existing methods have defects in instance selection and lack background knowledge for entity recognition. In this paper, we propose a knowledge-based attention model, which makes full use of supervised information from a knowledge base, to select an entity. Because the embedding of each word is limited when only a single training tool is used, we also design a dual convolutional neural network (CNN) method. The proposed model combines a CNN with an attention mechanism. The model feeds the word embeddings and the supervised information from the knowledge base into the CNN, performs convolution and pooling, and combines the knowledge base and CNN outputs in the fully connected layer. Through these steps, the model not only obtains better entity representations but also improves relation extraction performance with the help of rich background knowledge. The experimental results demonstrate that the proposed model achieves competitive performance.
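
To make the idea of knowledge-guided attention concrete, here is a minimal sketch, not the authors' model. It assumes that candidate representations (for example, pooled CNN outputs) and a knowledge-base vector share one vector space, and it scores candidates by dot product followed by a softmax. The function names `softmax` and `kb_attention` and the toy vectors are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def kb_attention(candidate_vectors, kb_vector):
    """Weight candidates by their dot-product agreement with a KB vector.

    Assumption: this dot-product scoring is illustrative only; the abstract
    does not specify the scoring function.
    """
    scores = [sum(a * b for a, b in zip(vec, kb_vector))
              for vec in candidate_vectors]
    weights = softmax(scores)
    dim = len(kb_vector)
    # Attention-weighted sum of the candidate vectors.
    pooled = [sum(w * vec[d] for w, vec in zip(weights, candidate_vectors))
              for d in range(dim)]
    return weights, pooled

# Toy vectors, invented for illustration only.
candidates = [[1.0, 0.0], [0.0, 1.0]]
kb_query = [2.0, 0.0]
weights, pooled = kb_attention(candidates, kb_query)
```

In this toy setting the first candidate agrees with the knowledge-base vector and therefore receives the larger attention weight.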


2019 ◽  
Vol 174 ◽  
pp. 27-42 ◽  
Author(s):  
Farman Ali ◽  
Daehan Kwak ◽  
Pervez Khan ◽  
Shaker El-Sappagh ◽  
Amjad Ali ◽  
...  

2020 ◽  
Author(s):  
Kai Zhang ◽  
Yuan Zhou ◽  
Zheng Chen ◽  
Yufei Liu ◽  
Zhuo Tang ◽  
...  

The prevalence of short texts on the Web has made mining their latent topic structures a critical and fundamental task for many applications. However, because the content sparsity of short texts leaves little word co-occurrence information, it is challenging for traditional topic models such as latent Dirichlet allocation (LDA) to extract coherent topic structures from them. Incorporating external semantic knowledge into the topic modeling process is an effective strategy for improving the coherence of inferred topics. In this paper, we develop a novel topic model, called the biterm correlation knowledge-based topic model (BCK-TM), to infer latent topics from short texts. Specifically, the proposed model mines biterm correlation knowledge automatically, building on recent progress in word embedding, which can represent the semantic information of words in a continuous vector space. To incorporate external knowledge, a knowledge incorporation mechanism is designed over the latent topic layer to regularize the topic assignment of each biterm during the topic sampling process. Experimental results on three public benchmark datasets illustrate the superior performance of the proposed approach over several state-of-the-art baseline models.
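
As a rough illustration of what mining biterm correlation from word embeddings could look like, the sketch below extracts biterms (unordered word pairs) from one short text and keeps those whose words are close in embedding space. The cosine-similarity threshold is an assumption; the abstract does not state how BCK-TM scores correlation. The function names and the toy embeddings are hypothetical.

```python
import math
from itertools import combinations

def extract_biterms(tokens):
    """Return all unordered pairs of distinct words from one short text."""
    return [tuple(sorted(pair))
            for pair in combinations(tokens, 2)
            if pair[0] != pair[1]]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def correlated_biterms(tokens, embeddings, threshold=0.5):
    """Keep biterms whose two words are similar in the embedding space.

    Assumption: a fixed cosine threshold stands in for the paper's
    (unspecified here) correlation measure.
    """
    scored = {}
    for w1, w2 in extract_biterms(tokens):
        if w1 in embeddings and w2 in embeddings:
            score = cosine(embeddings[w1], embeddings[w2])
            if score >= threshold:
                scored[(w1, w2)] = score
    return scored

# Toy 2-d embeddings, invented for illustration only.
toy_embeddings = {
    "apple": [1.0, 0.1],
    "fruit": [0.9, 0.2],
    "stock": [0.1, 1.0],
}
result = correlated_biterms(["apple", "fruit", "stock"], toy_embeddings)
```

With these toy vectors only the pair ("apple", "fruit") passes the threshold. A topic model could then use such pairs as prior knowledge when sampling topic assignments.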


Author(s):  
Seyedamin Pouriyeh ◽  
Mehdi Allahyari ◽  
Gong Cheng ◽  
Hamid Reza Arabnia ◽  
Krys Kochut ◽  
...  

2017 ◽  
Vol 14 (4) ◽  
Author(s):  
Rui Antunes ◽  
Sérgio Matos

Word sense disambiguation (WSD) is an important step in biomedical text mining: it assigns an unequivocal concept to an ambiguous term, improving the accuracy of biomedical information extraction systems. In this work we followed supervised and knowledge-based disambiguation approaches, with the best results obtained by supervised means. In the supervised method we used bag-of-words as local features and word embeddings as global features. In the knowledge-based method we combined word embeddings, concept textual definitions extracted from the UMLS database, and concept association values calculated from the MeSH co-occurrence counts from MEDLINE articles. Also, in the knowledge-based method, we tested different word embedding averaging functions to calculate the surrounding context vectors, with the goal of giving more importance to the words closest to the ambiguous term. The MSH WSD dataset, the most common dataset used for evaluating biomedical concept disambiguation, was used to evaluate our methods. We obtained a top accuracy of 95.6% by supervised means, while the best knowledge-based accuracy was 87.4%. Our results show that word embedding models improved the disambiguation accuracy, proving to be a powerful resource in the WSD task.
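
The distance-weighted averaging of context embeddings can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. The exponential decay weighting, the function names, the toy embeddings, and the concept labels are all hypothetical, and the abstract does not say which averaging functions were tested.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def weighted_context_vector(tokens, target_index, embeddings, decay=0.5):
    """Average context embeddings, weighting words nearer the target more.

    Assumption: weight = decay ** (distance - 1), so adjacent words get 1.0.
    """
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for i, tok in enumerate(tokens):
        if i == target_index or tok not in embeddings:
            continue
        weight = decay ** (abs(i - target_index) - 1)
        for d in range(dim):
            total[d] += weight * embeddings[tok][d]
        weight_sum += weight
    if weight_sum == 0.0:
        return total  # no usable context words: zero vector
    return [x / weight_sum for x in total]

def disambiguate(tokens, target_index, embeddings, concept_vectors):
    """Pick the concept whose vector is most similar to the context vector."""
    context = weighted_context_vector(tokens, target_index, embeddings)
    return max(concept_vectors,
               key=lambda c: cosine(context, concept_vectors[c]))

# Toy data, invented for illustration only.
toy_embeddings = {"patient": [0.8, 0.2], "virus": [0.9, 0.1]}
toy_concepts = {"disease_sense": [1.0, 0.0], "temperature_sense": [0.0, 1.0]}
sense = disambiguate(["patient", "cold", "virus"], 1, toy_embeddings, toy_concepts)
```

In a real pipeline the concept vectors would presumably be built from concept definitions, which the abstract says were extracted from UMLS. Here they are hand-written placeholders.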


2017 ◽  
Vol 1 (1) ◽  
pp. 35-48
Author(s):  
Juyoung An ◽  
Sieun Jeon ◽  
Teryn Jones ◽  
Min Song

This research is motivated by the lack of studies focusing on the acknowledgments sections of published papers. A further motivation is the lack of a study examining the countries and organizations mentioned in the acknowledgments section and their influence, something that cannot be analyzed using a citation or co-authorship relationship. Work on the qualitative aspects of acknowledgments has been limited because of the atypical pattern of the acknowledgment section. Our research aims to identify useful information hidden within the acknowledgment sections of the articles stored in the PubMed Central database and to analyze a map of influence via a country-acknowledgment network. To this end, we use topic modeling to analyze the topics of acknowledgments and conduct a basic network analysis to find the differences between the country network and the acknowledgment network. A word-embedding model is used to compare the semantic similarity between the authors and countries extracted from our original dataset. The result of topic modeling suggests that funding has become a critical topic in acknowledgments. The results of network analysis indicate that some large countries act as hubs both implicitly and explicitly, while some countries, such as China, do not frequently work with other countries. The word-embedding model built from acknowledgments suggests that the authors frequently referenced in acknowledgments are also likely to be referred to in a similar context. It also implies that the publishing country of a paper has little effect on whether it receives an acknowledgment from any other specific country. Through these results, we conclude that the content in acknowledgments extracted from the papers can be divided into two categories: funding and appreciation. We also find that there is no clear relationship between the publication country and the countries mentioned in the acknowledgment section.


Author(s):  
Emmanuel Papadakis ◽  
Song Gao ◽  
George Baryannis

The problem of identifying functional regions in an urban setting has been approached in the literature using two general methodologies: top-down, encoding expert knowledge on urban planning and design (e.g. into patterns) and using that knowledge for identification; and bottom-up, relying on crowdsourcing and Volunteered Geographic Information (VGI) to train learning models, using techniques such as Latent Dirichlet Allocation (LDA) topic modeling. Both approaches have their advantages but also face important limitations, with knowledge-based approaches being criticized for scalability and transferability issues and data-driven approaches for lacking interpretability and depending heavily on data quality. To mitigate these disadvantages, we propose a novel framework that fuses data and knowledge in three different ways: functional regions identified by the individual approaches are evaluated against each other, knowledge from patterns is used to adjust learning model results, and topic models are used to adjust pattern-based results. The proposed methodologies are demonstrated through the use case of identifying shopping-related functional regions in the Los Angeles metropolitan area. Results show that combining results from knowledge-based and data-driven techniques can help uncover discrepancies between the two approaches and smooth inaccuracies caused by the limitations of each approach.


2019 ◽  
Vol 8 (9) ◽  
pp. 385 ◽  
Author(s):  
Emmanuel Papadakis ◽  
Song Gao ◽  
George Baryannis

The problem of discovering regions that support particular functionalities in an urban setting has been approached in the literature using two general methodologies: top-down, encoding expert knowledge on urban planning and design and discovering regions that conform to that knowledge; and bottom-up, using data to train machine learning models, which can discover similar regions. Both methodologies face limitations, with knowledge-based approaches being criticized for scalability and transferability issues and data-driven approaches for lacking interpretability and depending heavily on data quality. To mitigate these disadvantages, we propose a novel framework that fuses a knowledge-based approach using design patterns and a data-driven approach using latent Dirichlet allocation (LDA) topic modeling in three different ways: functional regions discovered using either approach are evaluated against each other to identify cases of significant agreement or disagreement; knowledge from patterns is used to adjust topic probabilities in the learning model; and topic probabilities are used to adjust pattern-based results. The proposed methodologies are demonstrated through the use case of identifying shopping-related regions in the Los Angeles metropolitan area. Results show that the combination of pattern-based discovery and topic modeling extraction helps uncover discrepancies between the two approaches and smooth inaccuracies caused by the limitations of each approach.

