Combining Word Embedding and Knowledge-Based Topic Modeling for Entity Summarization

Relation extraction is a critical underlying task in textual understanding. However, existing methods have shortcomings in instance selection and lack the background knowledge needed for entity recognition. In this paper, we propose a knowledge-based attention model that makes full use of supervision information from a knowledge base to select entities. Because the word embedding of each word is limited when it is produced by a single training tool, we also design a dual convolutional neural network (CNN) method. The proposed model combines a CNN with an attention mechanism: it feeds the word embeddings and the supervision information from the knowledge base into the CNN, performs convolution and pooling, and combines the knowledge base and CNN outputs in the fully connected layer. In this way, the model not only obtains better entity representations but also improves relation extraction performance with the help of rich background knowledge. The experimental results demonstrate that the proposed model achieves competitive performance.
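The abstract describes attention guided by knowledge-base supervision and a fusion of CNN outputs in the fully connected layer without giving formulas. As a rough illustration only, the sketch below uses generic dot-product selective attention: each sentence encoding (standing in for a pooled CNN output) is scored against a knowledge-base query vector, the scores are softmax-normalised, and the encodings are averaged with those weights. The names `knowledge_attention`, `kb_query`, and `fuse` are hypothetical, and this is not presented as the authors' model.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def knowledge_attention(
    sentences: List[Vector], kb_query: Vector
) -> Tuple[List[float], List[float]]:
    """Score each sentence encoding against a hypothetical knowledge-base
    query vector, normalise the scores with softmax, and return
    (attention weights, attention-weighted bag vector)."""
    weights = softmax([dot(s, kb_query) for s in sentences])
    dim = len(kb_query)
    bag = [sum(w * s[i] for w, s in zip(weights, sentences)) for i in range(dim)]
    return weights, bag


def fuse(cnn_a: Vector, cnn_b: Vector, kb_vec: Vector) -> List[float]:
    """Concatenate two CNN encodings (the 'dual CNN' idea) with a KB vector,
    i.e. the input a fully connected layer would receive in this sketch."""
    return list(cnn_a) + list(cnn_b) + list(kb_vec)
```

With a query that aligns strongly with one sentence, nearly all the weight goes to that sentence, which is the instance-selection behaviour the abstract motivates.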

Transportation sentiment analysis using word embedding and ontology-based topic modeling

Knowledge-Based Systems ◽

10.1016/j.knosys.2019.02.033 ◽

2019 ◽

Vol 174 ◽

pp. 27-42 ◽

Cited By ~ 30

Author(s):

Farman Ali ◽

Daehan Kwak ◽

Pervez Khan ◽

Shaker El-Sappagh ◽

Amjad Ali ◽

...

Keyword(s):

Sentiment Analysis ◽

Topic Modeling ◽

Word Embedding

Incorporating Biterm Correlation Knowledge into Topic Modeling for Short Texts

The Computer Journal ◽

10.1093/comjnl/bxaa079 ◽

2020 ◽

Author(s):

Kai Zhang ◽

Yuan Zhou ◽

Zheng Chen ◽

Yufei Liu ◽

Zhuo Tang ◽

...

Keyword(s):

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Topic Model ◽

Semantic Knowledge ◽

Superior Performance ◽

Knowledge Based ◽

Modeling Process ◽

Proposed Model ◽

Benchmark Datasets ◽

Latent Topic

The prevalence of short texts on the Web has made mining the latent topic structures of short texts a critical and fundamental task for many applications. However, because the content sparsity of short texts leaves little word co-occurrence information, it is challenging for traditional topic models like latent Dirichlet allocation (LDA) to extract coherent topic structures from short texts. Incorporating external semantic knowledge into the topic modeling process is an effective strategy for improving the coherence of inferred topics. In this paper, we develop a novel topic model, called the biterm correlation knowledge-based topic model (BCK-TM), to infer latent topics from short texts. Specifically, the proposed model mines biterm correlation knowledge automatically based on recent progress in word embedding, which can represent the semantic information of words in a continuous vector space. To incorporate this external knowledge, a knowledge incorporation mechanism is designed over the latent topic layer to regularize the topic assignment of each biterm during the topic sampling process. Experimental results on three public benchmark datasets show that the proposed approach outperforms several state-of-the-art baseline models.
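The abstract says biterm correlation knowledge is mined from word embeddings but does not specify the scoring rule. A minimal sketch, assuming that a biterm is an unordered pair of distinct words from the same short text and that "correlated" means the cosine similarity of the two word vectors reaches a threshold. The `threshold` value of 0.5 and the function names are hypothetical, and the topic-sampling regularization itself is not shown.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

Biterm = Tuple[str, str]


def extract_biterms(tokens: List[str]) -> Set[Biterm]:
    """Unordered pairs of distinct words co-occurring in one short text.
    Simplification: repeated words are collapsed, so each pair appears once."""
    return set(combinations(sorted(set(tokens)), 2))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def correlated_biterms(
    tokens: List[str],
    embeddings: Dict[str, Sequence[float]],
    threshold: float = 0.5,
) -> Dict[Biterm, float]:
    """Keep biterms whose embedding cosine similarity reaches `threshold`
    (a hypothetical cut-off); out-of-vocabulary words are skipped."""
    result: Dict[Biterm, float] = {}
    for w1, w2 in extract_biterms(tokens):
        if w1 in embeddings and w2 in embeddings:
            sim = cosine(embeddings[w1], embeddings[w2])
            if sim >= threshold:
                result[(w1, w2)] = sim
    return result
```

The resulting pair-to-similarity map is the kind of knowledge a sampler could consult when assigning a topic to each biterm; how BCK-TM uses it is described only at a high level in the abstract.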

R-LDA: Profiling RDF Datasets Using Knowledge-Based Topic Modeling

2019 IEEE 13th International Conference on Semantic Computing (ICSC) ◽

10.1109/icosc.2019.8665510 ◽

2019 ◽

Cited By ~ 1

Author(s):

Seyedamin Pouriyeh ◽

Mehdi Allahyari ◽

Gong Cheng ◽

Hamid Reza Arabnia ◽

Krys Kochut ◽

...

Keyword(s):

Topic Modeling ◽

Knowledge Based

Supervised Learning and Knowledge-Based Approaches Applied to Biomedical Word Sense Disambiguation

Journal of Integrative Bioinformatics ◽

10.1515/jib-2017-0051 ◽

2017 ◽

Vol 14 (4) ◽

Cited By ~ 1

Author(s):

Rui Antunes ◽

Sérgio Matos

Keyword(s):

Word Sense Disambiguation ◽

Word Embedding ◽

Biomedical Text Mining ◽

Bag Of Words ◽

Word Sense ◽

Word Embeddings ◽

Global Features ◽

Knowledge Based ◽

Sense Disambiguation ◽

Averaging Functions

Word sense disambiguation (WSD) is an important step in biomedical text mining: it assigns an unequivocal concept to an ambiguous term, improving the accuracy of biomedical information extraction systems. In this work we followed supervised and knowledge-based disambiguation approaches, with the best results obtained by supervised means. In the supervised method we used bag-of-words as local features and word embeddings as global features. In the knowledge-based method we combined word embeddings, concept textual definitions extracted from the UMLS database, and concept association values calculated from MeSH co-occurrence counts in MEDLINE articles. Also, in the knowledge-based method, we tested different word embedding averaging functions to calculate the surrounding context vectors, with the goal of giving more importance to the words closest to the ambiguous term. The MSH WSD dataset, the most common dataset used for evaluating biomedical concept disambiguation, was used to evaluate our methods. We obtained a top accuracy of 95.6% by supervised means, while the best knowledge-based accuracy was 87.4%. Our results show that word embedding models improved the disambiguation accuracy, proving to be a powerful resource in the WSD task.
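The knowledge-based method averages context word embeddings so that words closer to the ambiguous term count more. A minimal sketch, assuming a 1/d weighting (d = token distance), a fixed window, and one precomputed vector per candidate concept (for example, an averaged embedding of its textual definition). The weighting function, window size, and concept identifiers are illustrative assumptions, not the averaging functions evaluated in the paper, and the MeSH association values are omitted.

```python
import math
from typing import Callable, Dict, List, Sequence


def inverse_distance(d: int) -> float:
    """Weight a context word by 1/d, where d is its distance to the target."""
    return 1.0 / d


def context_vector(
    tokens: List[str],
    target_index: int,
    embeddings: Dict[str, Sequence[float]],
    window: int = 3,
    weight: Callable[[int], float] = inverse_distance,
) -> List[float]:
    """Distance-weighted average of context embeddings around the target."""
    dim = len(next(iter(embeddings.values())))  # assumes a non-empty vocabulary
    acc = [0.0] * dim
    total = 0.0
    for i, tok in enumerate(tokens):
        d = abs(i - target_index)
        if d == 0 or d > window or tok not in embeddings:
            continue  # skip the target itself, far words, and unknown words
        w = weight(d)
        for k, x in enumerate(embeddings[tok]):
            acc[k] += w * x
        total += w
    return [x / total for x in acc] if total > 0 else acc


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def disambiguate(
    tokens: List[str],
    target_index: int,
    embeddings: Dict[str, Sequence[float]],
    concept_vectors: Dict[str, Sequence[float]],
    window: int = 3,
) -> str:
    """Pick the candidate concept whose vector is closest to the context."""
    ctx = context_vector(tokens, target_index, embeddings, window)
    return max(concept_vectors, key=lambda c: cosine(ctx, concept_vectors[c]))
```

Swapping `inverse_distance` for a constant function recovers a plain unweighted average, which makes it easy to compare averaging schemes in the spirit the abstract describes.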

Topic Modeling of Short Texts: A Pseudo-Document View with Word Embedding Enhancement

IEEE Transactions on Knowledge and Data Engineering ◽

10.1109/tkde.2021.3073195 ◽

2021 ◽

pp. 1-1

Author(s):

Yuan Zuo ◽

Congrui Li ◽

Hao Lin ◽

Junjie Wu

Keyword(s):

Topic Modeling ◽

Word Embedding

Data-driven Pattern Analysis of Acknowledgments in the Biomedical Domain

Data and Information Management ◽

10.1515/dim-2017-0002 ◽

2017 ◽

Vol 1 (1) ◽

pp. 35-48

Author(s):

Juyoung An ◽

Sieun Jeon ◽

Teryn Jones ◽

Min Song

Keyword(s):

Network Analysis ◽

Topic Modeling ◽

Pattern Analysis ◽

Word Embedding ◽

Pubmed Central ◽

Original Dataset ◽

Central Database ◽

The Difference ◽

Basic Network ◽

Specific Country

Our motivation for conducting this research is the lack of studies focusing on the acknowledgments sections of published papers. Another motivation is the lack of a study examining the countries and organizations mentioned in the acknowledgments section and their influence, something that cannot be analyzed using citation or co-authorship relationships. Research on the qualitative aspects of acknowledgments has been limited because of the atypical format of the acknowledgments section. Our research aims to identify useful information hidden within the acknowledgments sections of the articles stored in the PubMed Central database and to map influence via a country-acknowledgment network. To this end, we use topic modeling to analyze the topics of acknowledgments and conduct a basic network analysis to find the differences between the country co-authorship network and the acknowledgment network. A word-embedding model is used to compare the semantic similarity between the authors and countries extracted from our original dataset. The topic modeling results suggest that funding has become a critical topic in acknowledgments. The network analysis results indicate that some large countries act as hubs both implicitly and explicitly, while revealing that some countries, such as China, do not frequently work with other countries. The word-embedding model built from acknowledgments suggests that the authors frequently referenced in acknowledgments are also likely to be referred to in similar contexts. It also implies that the publishing country of a paper has little effect on whether it receives an acknowledgment from any other specific country. From these results, we conclude that the content of the acknowledgments extracted from the papers can be divided into two categories: funding and appreciation. We also find that there is no clear relationship between the publication country and the countries mentioned in the acknowledgments section.
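The "basic network analysis" that identifies hub countries is not specified further in the abstract. One simple way to surface hubs is to rank nodes by weighted degree. The sketch below assumes edges of the form (publishing country, acknowledged country, count), sums weights over incoming and outgoing links, and ignores within-country acknowledgments; all three choices and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (publishing country, acknowledged country, number of acknowledgments)
Edge = Tuple[str, str, int]


def weighted_degree(edges: Iterable[Edge]) -> Dict[str, int]:
    """Sum edge weights touching each country (incoming plus outgoing)."""
    deg: Dict[str, int] = defaultdict(int)
    for src, dst, w in edges:
        if src == dst:
            continue  # within-country acknowledgments are not cross-country links
        deg[src] += w
        deg[dst] += w
    return dict(deg)


def top_hubs(edges: Iterable[Edge], k: int = 3) -> List[str]:
    """Countries with the largest weighted degree; ties broken alphabetically."""
    deg = weighted_degree(edges)
    return sorted(deg, key=lambda c: (-deg[c], c))[:k]
```

Weighted degree is only the simplest centrality measure; betweenness or PageRank would rank "hubs" differently, and the abstract does not say which measure the authors used.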

Fusing Knowledge-Based and Data-Driven Techniques for the Identification of Urban Functional Regions

10.20944/preprints201907.0267.v1 ◽

2019 ◽

Author(s):

Emmanuel Papadakis ◽

Song Gao ◽

George Baryannis

Keyword(s):

Los Angeles ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Expert Knowledge ◽

Urban Setting ◽

Data Driven ◽

Knowledge Based ◽

Functional Regions ◽

Planning And Design ◽

Urban Planning And Design

The problem of identifying functional regions in an urban setting has been approached in the literature using two general methodologies: top-down, encoding expert knowledge on urban planning and design (e.g., into patterns) and using that knowledge for identification; and bottom-up, relying on crowdsourcing and Volunteered Geographic Information (VGI) to train learning models with techniques such as Latent Dirichlet Allocation (LDA) topic modeling. Both approaches have advantages but also face important limitations: knowledge-based approaches are criticized for scalability and transferability issues, and data-driven approaches for lacking interpretability and depending heavily on data quality. To mitigate these disadvantages, we propose a novel framework that fuses data and knowledge in three different ways: functional regions identified by the individual approaches are evaluated against each other, knowledge from patterns is used to adjust learning model results, and topic models are used to adjust pattern-based results. The proposed methodologies are demonstrated through the use case of identifying shopping-related functional regions in the Los Angeles metropolitan area. Results show that combining the results of knowledge-based and data-driven techniques can help uncover discrepancies between the two approaches and smooth out inaccuracies caused by the limitations of each approach.
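The first fusion step, evaluating the regions found by each approach against each other, can be pictured with a plain set comparison. This sketch assumes each approach outputs a set of region identifiers and uses Jaccard overlap as the agreement measure; the metric, the `RegionComparison` type, and the region IDs are illustrative assumptions rather than the framework's actual evaluation.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class RegionComparison:
    jaccard: float          # |agree| / |union|; 1.0 when both sets are empty
    agree: Set[str]         # regions both approaches identified
    pattern_only: Set[str]  # found only by the knowledge-based (pattern) approach
    topic_only: Set[str]    # found only by the data-driven (topic model) approach


def compare_regions(pattern_regions: Set[str], topic_regions: Set[str]) -> RegionComparison:
    """Summarise agreement and disagreement between two sets of regions."""
    union = pattern_regions | topic_regions
    inter = pattern_regions & topic_regions
    jaccard = len(inter) / len(union) if union else 1.0
    return RegionComparison(
        jaccard,
        inter,
        pattern_regions - topic_regions,
        topic_regions - pattern_regions,
    )
```

The `pattern_only` and `topic_only` sets are where the discrepancies mentioned in the abstract would show up for closer inspection.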

Combining Design Patterns and Topic Modeling to Discover Regions That Support Particular Functionality

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi8090385 ◽

2019 ◽

Vol 8 (9) ◽

pp. 385 ◽

Cited By ~ 1

Author(s):

Emmanuel Papadakis ◽

Song Gao ◽

George Baryannis

Keyword(s):

Los Angeles ◽

Topic Modeling ◽

Design Patterns ◽

Latent Dirichlet Allocation ◽

Expert Knowledge ◽

Urban Setting ◽

Data Driven ◽

Knowledge Based ◽

Functional Regions ◽

Data Driven Approach

The problem of discovering regions that support particular functionalities in an urban setting has been approached in the literature using two general methodologies: top-down, encoding expert knowledge on urban planning and design and discovering regions that conform to that knowledge; and bottom-up, using data to train machine learning models that can discover similar regions. Both methodologies face limitations: knowledge-based approaches are criticized for scalability and transferability issues, and data-driven approaches for lacking interpretability and depending heavily on data quality. To mitigate these disadvantages, we propose a novel framework that fuses a knowledge-based approach using design patterns and a data-driven approach using latent Dirichlet allocation (LDA) topic modeling in three different ways: functional regions discovered using either approach are evaluated against each other to identify cases of significant agreement or disagreement; knowledge from patterns is used to adjust topic probabilities in the learning model; and topic probabilities are used to adjust pattern-based results. The proposed methodologies are demonstrated through the use case of identifying shopping-related regions in the Los Angeles metropolitan area. Results show that combining pattern-based discovery with topic modeling helps uncover discrepancies between the two approaches and smooth out inaccuracies caused by the limitations of each approach.
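The abstract states that pattern knowledge adjusts topic probabilities (and vice versa) but not how. One simple possibility, shown here purely as an assumption, is a convex combination of the two normalised score distributions for a region, controlled by a mixing weight `lam`. The function names, `lam`, and the topic labels are hypothetical, and the paper's actual adjustment rule may differ.

```python
from typing import Dict


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale non-negative scores so they sum to 1."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return {k: v / total for k, v in scores.items()}


def blend(
    topic_probs: Dict[str, float],
    pattern_scores: Dict[str, float],
    lam: float = 0.5,
) -> Dict[str, float]:
    """Mix a region's topic distribution with normalised pattern scores.
    lam = 0 keeps the topic model unchanged; lam = 1 uses patterns only."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    p = normalize(topic_probs)
    q = normalize(pattern_scores)
    labels = set(p) | set(q)
    return {k: (1.0 - lam) * p.get(k, 0.0) + lam * q.get(k, 0.0) for k in labels}
```

For example, a region with topic probabilities of 0.6 (shopping) and 0.4 (residential) and pattern scores of 1.0 and 0.0 becomes 0.8 and 0.2 at `lam = 0.5`. Because the mix of two distributions is itself a distribution, calling `blend` with the arguments swapped gives the reverse direction (topic probabilities adjusting pattern-based results) without extra renormalisation.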