Paradigmatic and syntagmatic rule extraction for lifelong machine learning topic models

Author(s):  
Muhammad Taimoor Khan ◽  
Shehzad Khalid
2001 ◽  
Vol 136 (1-4) ◽  
pp. 109-133 ◽  
Author(s):  
Hisao Ishibuchi ◽  
Tomoharu Nakashima ◽  
Tadahiko Murata

Author(s):  
Carlo Schwarz

In this article, I introduce the ldagibbs command, which implements latent Dirichlet allocation in Stata. Latent Dirichlet allocation is the most popular machine-learning topic model. Topic models automatically cluster text documents into a user-chosen number of topics. Latent Dirichlet allocation represents each document as a probability distribution over topics and represents each topic as a probability distribution over words. Therefore, latent Dirichlet allocation provides a way to analyze the content of large unclassified text data and an alternative to predefined document classifications.
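The two distributions described above (documents over topics, topics over words) can be illustrated with a toy collapsed Gibbs sampler. This is a minimal sketch in Python and not the ldagibbs implementation: the function name `lda_gibbs`, the hyperparameter defaults `alpha` and `beta`, and the example documents are all assumptions made for illustration.

```python
import random

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, hypothetical helper).

    docs is a list of token lists. Returns (theta, phi, vocab), where
    theta[d] is the topic distribution of document d and phi[k] is the
    word distribution of topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]               # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # word v assigned to topic k
    topic_total = [0] * num_topics                              # all tokens in topic k
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed estimates: each row is a probability distribution.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    phi = [
        [(c + beta) / (topic_total[k] + vocab_size * beta) for c in topic_word[k]]
        for k in range(num_topics)
    ]
    return theta, phi, vocab

# Made-up example documents, already tokenized.
docs = [
    ["tax", "budget", "tax", "vote"],
    ["goal", "match", "team", "goal"],
    ["vote", "budget", "team", "match"],
]
theta, phi, vocab = lda_gibbs(docs, num_topics=2)
```

Each row of `theta` and each row of `phi` sums to one, which is what lets the rows be read as probability distributions over topics and over words respectively.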


2019 ◽  
Author(s):  
André Dalmora ◽  
Tiago Tavares

Music lyrics can convey a great part of the meaning in popular songs. Such meaning is important for humans to understand songs as related to typical narratives, such as romantic interests or life stories. This understanding is part of the affective aspects that can be used to choose songs to play in particular situations. This paper analyzes the effectiveness of using text mining tools to classify lyrics according to their narrative contexts. To this end, we used a vote-based dataset and several machine learning algorithms. We also compared the classification results to those of a typical human. Last, we compared the problems of identifying narrative contexts and of identifying lyric valence. Our results indicate that narrative contexts can be identified more consistently than valence. We also show that human-based classification typically does not reach a high accuracy, which suggests an upper bound for automatic classification. For this purpose, we built a dataset containing Brazilian popular music lyrics, which raters voted on online according to their context and valence. We approached the problem using a machine learning pipeline in which lyrics are projected into a vector space and then classified using general-purpose algorithms. We experimented with document representations based on sparse topic models [11, 12, 13, 14], which aim to find groups of words that typically appear together in the dataset. We also extracted part-of-speech tags for each lyric and used their histogram as features in the classification process.
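The part-of-speech histogram feature mentioned above could be computed roughly as follows. This is a sketch under stated assumptions: the tokens are assumed to be tagged already by some external tagger, the tag names are placeholders, and normalizing the counts to proportions is a choice made here that the abstract does not confirm.

```python
from collections import Counter

def tag_histogram(tagged_tokens, tagset):
    """Return the proportion of each tag in tagset (hypothetical helper).

    tagged_tokens is a list of (token, tag) pairs produced by any tagger;
    tags outside tagset are ignored.
    """
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts[tag] for tag in tagset)
    if total == 0:
        return [0.0] * len(tagset)
    return [counts[tag] / total for tag in tagset]

# Made-up tagged lyric fragment and tag inventory.
tagset = ["NOUN", "VERB", "PRON"]
tagged = [("eu", "PRON"), ("amo", "VERB"), ("você", "PRON")]
features = tag_histogram(tagged, tagset)
```

The resulting fixed-length vector can be concatenated with a topic-based document representation before being passed to a general-purpose classifier.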


2020 ◽  
Author(s):  
Benedict Han ◽  
Jinwook Choi

BACKGROUND: Predicting the complications of diabetes mellitus from an early stage would be beneficial for its management. Topic modeling is a posterior procedure to estimate semantic objects in a dataset through a statistical approach. The topic model can play the role of a feature set for supervised classification. OBJECTIVE: We performed a study to predict diabetic retinopathy (DMR), diabetic nephropathy (DMN), and non-alcoholic fatty liver disease (NAFLD) from clinical notes using semi-supervised classification based on topic modeling. METHODS: We applied four types of machine learning algorithms for classification: random forest (RF), gradient boosting machine (GBM), support vector machine (SVM), and fully connected artificial neural network (ANN). We reviewed the topic models through statistical analysis to determine whether they are clinically plausible. RESULTS: F1 scores were above 0.8 when predicting all of the target diseases with all types of classification methods, and above 0.9 using RF or GBM. Hypertension and dyslipidemia seem to be statistically associated with DMR, DMN, and NAFLD. They may be important clues with which we can predict DMR, DMN, and NAFLD. CONCLUSIONS: This study showed that complications of diabetes mellitus that are likely to occur later in life can be predicted from the clinical notes of outpatient departments. We believe that this kind of predictive model could be utilized by patients and physicians in outpatient departments as a useful tool, similar to clinical decision support systems.
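For readers unfamiliar with the F1 score reported above, it is the harmonic mean of precision and recall. The following is a minimal sketch for the binary case; the function name and the example labels are made up for illustration and are not taken from the study.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall (illustrative helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up labels: 1 means the complication is present.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
score = f1_score(y_true, y_pred)
```

In this example there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and the F1 score is also 2/3.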

