Author(s):  
Ajeet Ram Pathak ◽  
Manjusha Pandey ◽  
Siddharth Rautaray

Background: The large amount of data emanating from social media platforms requires scalable topic modeling in order to capture the current trends and themes of events discussed on such platforms. Topic modeling plays a crucial role in many natural language processing applications such as sentiment analysis, recommendation systems, event tracking, and summarization. Objectives: The aim of the proposed work is to adaptively extract dynamically evolving topics from streaming data, infer current trends, and track how topics trend over time. Because of various world-level events, many uncorrelated streaming channels tend to start discussing similar topics. We aim to find the effect of such uncorrelated streaming channels on topic modeling when they start discussing similar topics. Method: An adaptive framework for dynamic and temporal topic modeling using deep learning is put forth in this paper. The framework approximates online latent semantic indexing constrained by regularization on streaming data using an adaptive learning method. The framework is designed using deep layers of a feedforward neural network. Results: The framework supports dynamic and temporal topic modeling. The proposed approach scales to large collections of data. We performed exploratory data analysis and correspondence analysis on a real-world Twitter dataset. The results indicate that our approach works well for extracting the topics associated with a given hashtag. Given a query, the approach is able to extract both implicit and explicit topics associated with the terms mentioned in the query. Conclusion: The proposed approach is a suitable solution for performing topic modeling over Big Data. We approximate the Latent Semantic Indexing model with regularization using deep learning with differentiable ℓ1 regularization, which makes the model work adaptively on streaming data in real time. The model also supports the extraction of aspects from sentences based on the interrelation of topics and thus supports aspect modeling in aspect-based sentiment analysis.
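
To make the idea of regularized latent semantic indexing on a stream more concrete, the following is a minimal sketch and not the authors' deep feedforward framework. It assumes a shallow matrix factorization X ≈ V U, a smooth surrogate sqrt(u² + eps) for the ℓ1 penalty on the topic-term matrix U, a ridge penalty on the per-batch document-topic matrix V, and Adagrad as the adaptive learning rule. All function names, the synthetic stream, and the hyperparameters are hypothetical.

```python
import numpy as np

def smooth_l1_grad(U, eps=1e-8):
    # Gradient of the differentiable surrogate sqrt(u^2 + eps) for |u|.
    return U / np.sqrt(U * U + eps)

def solve_doc_topics(X, U, lam2):
    # Ridge least squares for the batch's document-topic matrix V given U:
    # V = X U^T (U U^T + lam2 I)^-1.
    A = U @ U.T + lam2 * np.eye(U.shape[0])
    return np.linalg.solve(A, U @ X.T).T

def recon_error(X, U, lam2=0.1):
    # Mean squared reconstruction error of a batch under topic matrix U.
    V = solve_doc_topics(X, U, lam2)
    return float(np.mean((X - V @ U) ** 2))

def online_rlsi(batches, U0, lam1=0.01, lam2=0.1, lr=0.05):
    # One pass over a stream of (n_docs, n_terms) batches; U is shared
    # across batches and updated with Adagrad (an assumed adaptive rule).
    U = U0.copy()
    G = np.zeros_like(U)  # running sum of squared gradients
    for X in batches:
        V = solve_doc_topics(X, U, lam2)
        residual = X - V @ U
        grad = (-2.0 / X.shape[0]) * (V.T @ residual) + lam1 * smooth_l1_grad(U)
        G += grad * grad
        U -= lr * grad / (np.sqrt(G) + 1e-8)
    return U

# Synthetic stream (illustrative only): documents mix 3 sparse topics.
rng = np.random.default_rng(0)
n_terms, n_topics = 30, 3
U_true = rng.random((n_topics, n_terms)) * (rng.random((n_topics, n_terms)) < 0.3)

def make_batch(n_docs=20):
    weights = rng.random((n_docs, n_topics))
    return weights @ U_true + 0.01 * rng.standard_normal((n_docs, n_terms))

U0 = 0.1 * rng.standard_normal((n_topics, n_terms))
held_out = make_batch()
U = online_rlsi((make_batch() for _ in range(300)), U0)
err_before = recon_error(held_out, U0)
err_after = recon_error(held_out, U)
```

Because each batch is seen once and then discarded, memory stays bounded by the batch size and the size of U, which is the property that makes this kind of update usable on streaming data.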


2008 ◽  
Vol 7 (1) ◽  
pp. 182-191 ◽  
Author(s):  
Sebastian Klie ◽  
Lennart Martens ◽  
Juan Antonio Vizcaíno ◽  
Richard Côté ◽  
Phil Jones ◽  
...  

2011 ◽  
Vol 181-182 ◽  
pp. 830-835
Author(s):  
Min Song Li

Latent Semantic Indexing (LSI) is an effective feature extraction method that captures the underlying latent semantic structure between words in documents. However, using it to select a feature subspace is probably not the most appropriate choice for text categorization, since the method orders extracted features according to their variance, not their classification power. We propose a method based on support vector machines to extract features and select a Latent Semantic Indexing subspace that is suited to classification. Experimental results indicate that the method improves classification performance while giving a more compact representation.
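
As an illustration of the ordering issue described above, here is a minimal sketch of plain LSI via truncated SVD. It does not implement the SVM-based selection proposed in the paper. The toy corpus and the function name `lsi` are hypothetical.

```python
import numpy as np

def lsi(term_doc, k):
    # Truncated SVD of a (n_terms, n_docs) count matrix. NumPy returns the
    # singular values in non-increasing order, so the first k components
    # are the k directions of largest variance, regardless of class labels.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

# Toy corpus (illustrative only).
docs = [
    "stock market price rise",
    "market price fall stock",
    "football match goal score",
    "goal score football team",
]
vocab = sorted({w for d in docs for w in d.split()})
A = np.array([[d.split().count(w) for d in docs] for w in vocab], dtype=float)

U_k, s_k, Vt_k = lsi(A, k=2)
doc_vecs = (np.diag(s_k) @ Vt_k).T   # one k-dimensional vector per document
A_k = U_k @ np.diag(s_k) @ Vt_k      # rank-k approximation of A
```

The components kept here are chosen purely by singular value. A selection that targets classification power would instead need a supervised criterion that uses the class labels.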


2021 ◽  
Vol 12 (4) ◽  
pp. 169-185
Author(s):  
Saida Ishak Boushaki ◽  
Omar Bendjeghaba ◽  
Nadjet Kamel

Clustering is an important unsupervised analysis technique for big data mining. It is applied in several domains, including the biomedical documents of the MEDLINE database. Document clustering algorithms based on metaheuristics are an active research area. However, these algorithms tend to get trapped in local optima, require many parameters to be tuned, and need the documents to be indexed by a high-dimensional matrix using the traditional vector space model. To overcome these limitations, this paper proposes a new parameter-free document clustering algorithm (ASOS-LSI). It is based on the recent symbiotic organisms search (SOS) metaheuristic and enhanced by an acceleration technique. Furthermore, the documents are represented by semantic indexing based on the well-known latent semantic indexing (LSI). Experiments conducted on well-known biomedical document datasets show the significant superiority of ASOS-LSI over five well-known algorithms in terms of compactness, F-measure, purity, misclassified documents, entropy, and runtime.

