Interpretable Topic Extraction and Word Embedding Learning Using Row-Stochastic DEDICOM

Does deep learning help topic extraction? A kernel k-means clustering method with word embedding

Journal of Informetrics ◽

10.1016/j.joi.2018.09.004 ◽

2018 ◽

Vol 12 (4) ◽

pp. 1099-1117 ◽

Cited By ~ 18

Author(s):

Yi Zhang ◽

Jie Lu ◽

Feng Liu ◽

Qian Liu ◽

Alan Porter ◽

...

Keyword(s):

Deep Learning ◽

Word Embedding ◽

Clustering Method ◽

Topic Extraction

Interpretable Topic Extraction and Word Embedding Learning Using Non-Negative Tensor DEDICOM

Machine Learning and Knowledge Extraction ◽

10.3390/make3010007 ◽

2021 ◽

Vol 3 (1) ◽

pp. 123-167

Author(s):

Lars Hillebrand ◽

David Biesner ◽

Christian Bauckhage ◽

Rafet Sifa

Keyword(s):

New York ◽

Matrix Factorization ◽

New York Times ◽

Extraction Methods ◽

Word Embedding ◽

Topic Extraction ◽

Text Understanding ◽

Text Corpora ◽

Embedding Performance ◽

Information Matrices

Unsupervised topic extraction is a vital step in automatically extracting concise content information from large text corpora. Existing topic extraction methods cannot capture the relations between the extracted topics, which would further aid text understanding. We therefore propose using the Decomposition into Directional Components (DEDICOM) algorithm, which provides a uniquely interpretable matrix factorization for symmetric and asymmetric square matrices and tensors. We constrain DEDICOM to row-stochasticity and non-negativity in order to factorize pointwise mutual information matrices and tensors of text corpora. We identify latent topic clusters and their relations within the vocabulary and simultaneously learn interpretable word embeddings. Further, we introduce multiple methods based on alternating gradient descent to efficiently train constrained DEDICOM algorithms. We evaluate the qualitative topic modeling and word embedding performance of our proposed methods on several datasets, including a novel New York Times news dataset, and demonstrate how the DEDICOM algorithm provides deeper text analysis than competing matrix factorization approaches.
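As a rough illustration of the idea in this abstract, the sketch below factorizes a small positive PMI matrix S as A R Aᵀ, where A is kept row-stochastic and non-negative. It is a minimal sketch under stated assumptions, not the authors' implementation: the row-wise softmax parameterization, the plain gradient steps, the learning rate, and the names `ppmi`, `row_softmax`, and `fit_dedicom` are all illustrative choices, and the toy co-occurrence counts are made up.

```python
import numpy as np

def ppmi(counts):
    """Positive pointwise mutual information of a square co-occurrence count matrix."""
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0          # zero counts give -inf or nan
    return np.maximum(pmi, 0.0)

def row_softmax(z):
    """Map each row of z to a non-negative vector that sums to 1."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def fit_dedicom(S, k, steps=500, lr=0.01, seed=0):
    """Alternating gradient descent on ||S - A R A^T||_F^2 with A = row_softmax(Z).

    Assumption: the softmax reparameterization is one way to enforce
    row-stochasticity; the paper's exact update rules may differ.
    """
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    Z = rng.normal(scale=0.1, size=(n, k))   # unconstrained parameters behind A
    R = rng.normal(scale=0.1, size=(k, k))   # topic-to-topic relation matrix
    losses = []
    for _ in range(steps):
        A = row_softmax(Z)
        E = S - A @ R @ A.T                  # reconstruction residual
        losses.append(float((E ** 2).sum()))
        # Step on R with A held fixed.
        R -= lr * (-2.0 * A.T @ E @ A)
        # Step on Z with R held fixed (chain rule through the row softmax).
        E = S - A @ R @ A.T
        G = -2.0 * (E @ A @ R.T + E.T @ A @ R)
        Z -= lr * (A * (G - (G * A).sum(axis=1, keepdims=True)))
    return row_softmax(Z), R, losses

# Toy vocabulary of 6 words forming two co-occurrence blocks (made-up counts).
counts = np.kron(np.eye(2), np.full((3, 3), 9.0)) + 1.0
A, R, losses = fit_dedicom(ppmi(counts), k=2)
print(losses[0], losses[-1])
```

In this reading, each row of A can be interpreted as a word's distribution over k topics (and doubles as a k-dimensional word embedding), while R describes how the topics relate to one another.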

Text Genre Detection Using Doc2Vec Word-embedding Language Model

Language and Information ◽

10.29403/li.23.2.2 ◽

2019 ◽

Vol 23 (2) ◽

pp. 23-43

Author(s):

Dongsung Kim

Keyword(s):

Language Model ◽

Word Embedding ◽

Text Genre

A Simple Word Embedding Model for Lexical Substitution

10.3115/v1/w15-1501 ◽

2015 ◽

Cited By ~ 13

Author(s):

Oren Melamud ◽

Omer Levy ◽

Ido Dagan

Keyword(s):

Word Embedding ◽

Lexical Substitution

Word Embedding Based Knowledge Representation with Extracting Relationship Between Scientific Terminologies

Intelligent Automation & Soft Computing ◽

10.31209/2019.100000135 ◽

2019

Author(s):

Mucheol Kim ◽

Junho Kim ◽

Mincheol Shin

Keyword(s):

Knowledge Representation ◽

Word Embedding

Key phrase Extraction by Improving TextRank with an Integration of Word Embedding and Syntactic Information

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813999200820155846 ◽

2020 ◽

Vol 13 ◽

Author(s):

Sheng Zhang ◽

Qi Luo ◽

Yukun Feng ◽

Ke Ding ◽

Daniela Gifu ◽

...

Keyword(s):

Semantic Information ◽

Performance Enhancement ◽

Word Embedding ◽

The Other ◽

Test Set ◽

Pagerank Algorithm ◽

Phrase Extraction ◽

Extraction Algorithm ◽

Syntactic Information ◽

Key Phrase Extraction

Background: As a well-known key phrase extraction algorithm, TextRank is an analogue of the PageRank algorithm that relies heavily on term-frequency statistics gathered through co-occurrence analysis. Objective: This frequency-based characteristic has become a bottleneck for performance enhancement, and various improved TextRank algorithms have been proposed in recent years. Most of these improvements incorporate semantic information into the key phrase extraction algorithm. Method: In this research, taking both syntactic and semantic information into consideration, we integrated a syntactic tree algorithm with word embedding and put forward the Word Embedding and Syntactic Information Algorithm (WESIA), which improved the accuracy of the TextRank algorithm. Results: Applied to a self-made test set and a public test set, the proposed unsupervised key phrase extraction algorithm outperformed the other algorithms to some extent.
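For context, the co-occurrence-based baseline that this abstract starts from can be sketched as follows. This is only plain, unweighted TextRank over a word co-occurrence graph, not WESIA: the syntactic-tree and word-embedding components are not described in enough detail here to reproduce. The function name `textrank`, the window size, the damping factor, and the toy token list are assumptions for illustration.

```python
from collections import defaultdict

def textrank(tokens, window=2, damping=0.85, iters=50):
    """Score words by PageRank-style iteration over an undirected co-occurrence graph."""
    # Link every pair of distinct words that appear within `window` positions.
    neighbors = defaultdict(set)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != w:
                neighbors[w].add(tokens[j])
                neighbors[tokens[j]].add(w)
    # Each word receives a damped share of every neighbor's current score.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iters):
        scores = {
            w: (1 - damping)
            + damping * sum(scores[u] / len(neighbors[u]) for u in neighbors[w])
            for w in neighbors
        }
    return scores

# Made-up token sequence in which "graph" co-occurs with every other word.
tokens = "graph ranking graph scoring graph words".split()
scores = textrank(tokens, window=1)
print(sorted(scores, key=scores.get, reverse=True))
```

Because the scores depend only on which words co-occur, frequent hub words dominate the ranking, which is the frequency-based limitation the abstract describes.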
