topic detection
Recently Published Documents


TOTAL DOCUMENTS

494
(FIVE YEARS 110)

H-INDEX

24
(FIVE YEARS 5)

Array ◽  
2022 ◽  
Vol 13 ◽  
pp. 100124
Author(s):  
Hendri Murfi ◽  
Natasha Rosaline ◽  
Nora Hariadi

2022 ◽  
Vol 59 (2) ◽  
pp. 102843
Author(s):  
Jinqing Yang ◽  
Wei Lu ◽  
Jiming Hu ◽  
Shengzhi Huang

AI ◽  
2021 ◽  
Vol 2 (4) ◽  
pp. 578-599
Author(s):  
Fuad Alattar ◽  
Khaled Shaalan

Comparing two sets of documents to identify new topics is useful in many applications, like discovering trending topics from sets of scientific papers, emerging topic detection in microblogs, and interpreting sentiment variations in Twitter. In this paper, the main topic-modeling-based approaches to address this task are examined to identify limitations and necessary enhancements. To overcome these limitations, we introduce two separate frameworks to discover emerging topics through a filtered latent Dirichlet allocation (filtered-LDA) model. The model acts as a filter that identifies old topics from a timestamped set of documents, removes all documents that focus on old topics, and keeps documents that discuss new topics. Filtered-LDA also genuinely reduces the chance of using keywords from old topics to represent emerging topics. The final stage of the filter uses multiple topic visualization formats to improve human interpretability of the filtered topics, and it presents the most-representative document for each topic.
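
The filtering idea in this abstract can be illustrated with a much simpler stand-in. The sketch below is not filtered-LDA: it assumes the old topics have already been reduced to a keyword set and uses plain keyword overlap instead of a topic model. The names `old_topic_share`, `filter_new_documents`, `top_keywords`, and the `max_share` threshold are hypothetical and chosen only for illustration.

```python
from collections import Counter


def old_topic_share(tokens, old_keywords):
    """Fraction of a document's tokens that are old-topic keywords."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in old_keywords)
    return hits / len(tokens)


def filter_new_documents(new_docs, old_keywords, max_share=0.5):
    """Keep documents whose old-topic share is at most max_share (assumed threshold)."""
    kept = []
    for doc in new_docs:
        tokens = doc.lower().split()
        if old_topic_share(tokens, old_keywords) <= max_share:
            kept.append(doc)
    return kept


def top_keywords(docs, old_keywords, k=3):
    """Most frequent tokens in the kept documents, excluding old-topic keywords."""
    counts = Counter(
        token
        for doc in docs
        for token in doc.lower().split()
        if token not in old_keywords
    )
    return [word for word, _ in counts.most_common(k)]


# Toy data, invented for this example.
old_keywords = {"battery", "screen", "price"}
new_docs = [
    "battery price screen battery",
    "camera update camera crash",
    "price screen battery crash",
]
kept = filter_new_documents(new_docs, old_keywords)
```

Excluding old keywords when ranking terms mirrors the abstract's point that keywords from old topics should not end up representing emerging topics. A real pipeline would replace the keyword set with topics inferred by LDA from the older document set.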


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Yiting Zhu

Automatic scoring systems for business English essays are widely used in education, and off-topic detection is an indispensable part of such systems. Most traditional off-topic detection methods convert text into a vector-space representation and then calculate the similarity between the essay and a correct reference text to obtain the off-topic result. However, these methods focus only on the structure of the text and ignore semantic associations. In addition, traditional methods perform poorly on essays with high divergence. To address these problems, this paper proposes an off-topic detection method for business English essays based on a deep learning model. First, the word2vec model is used to represent the words in sentences as word vectors, and LDA is used to extract vectors for the topic and the text, respectively. Then, the word vectors and topic word vectors are concatenated as the input of a convolutional neural network (CNN), which extracts and screens sentence features and performs the similarity calculation. When the similarity is below the threshold, the method also maps the topic and the subject words into the coupling space and calculates their relevance. Finally, unsupervised off-topic detection is realized by a clustering method. The experimental results show that the deep-learning-based method improves detection accuracy to some extent for essays with both low and high divergence, especially the latter.
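
The traditional baseline that this abstract criticizes (vector-space representation plus a similarity threshold) can be sketched as follows. This is not the paper's word2vec, LDA, and CNN model; it is a minimal term-frequency and cosine-similarity illustration, and the function names and the `threshold` value of 0.3 are assumptions made for the example.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector for a whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_off_topic(essay, reference, threshold=0.3):
    """Flag an essay as off-topic when its similarity falls below the assumed threshold."""
    return cosine_similarity(term_vector(essay), term_vector(reference)) < threshold


# Toy texts, invented for this example.
reference = "the company should reduce shipping costs to improve customer satisfaction"
on_topic = "reducing shipping costs will improve customer satisfaction for the company"
off_topic = "my favourite holiday was a trip to the mountains last summer"
```

Because this baseline only counts shared surface tokens, an essay that paraphrases the prompt with different vocabulary would score low, which is the semantic gap the abstract says the deep learning model is meant to close.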


2021 ◽  
Author(s):  
E. Elakiya ◽  
R. Kanagaraj ◽  
N. Rajkumar

At every moment, a huge volume of data and information is communicated through social networks. Analyzing large amounts of text data manually is tedious, time-consuming, and expensive, and manual sorting leads to mistakes and inconsistency. Document processing is still not capable of extracting information as a human reader would. Furthermore, the significance of the content of a text may differ from one reader to another. The proposed Multiple Spider Hunting Algorithm reduces time complexity by using multiple spiders instead of a single spider. Spiders are constructed dynamically, depending on the size of the corpus. In some cases, tokens may relate to more than one topic, so topics need to be detected semantically. The Multiple Semantic Spider Hunting Algorithm is proposed on the basis of the semantics among terms; associations between words are drawn using semantic lexicons. Topics or lists of opinions are generated from the knowledge graph. News articles are gathered from five different topics: sports, business, education, tourism, and media. The effectiveness of the proposed algorithms is evaluated using precision, recall, F-measure, accuracy, true positives, false positives, and topic detection percentage. The Multiple Semantic Spider Hunting Algorithm produced good results. The topic detection percentage of the Spider Hunting Algorithm is compared with that of other algorithms: Naïve Bayes, Neural Network, Decision Tree, and Particle Swarm Optimization. The Spider Hunting Algorithm achieved more than 90% precision in detecting topics and subtopics.
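
The evaluation measures named in this abstract follow standard definitions, which can be sketched as below. The function name `detection_metrics` and the counts in the usage example are hypothetical; they are not results from the paper, and the paper's "topic detection percentage" is not reproduced here because the abstract does not define it.

```python
def detection_metrics(tp, fp, fn, tn):
    """Standard precision, recall, F-measure, and accuracy from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Made-up counts for one topic class, for illustration only.
scores = detection_metrics(tp=8, fp=2, fn=2, tn=8)
```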


2021 ◽  
Vol 151 ◽  
pp. 111274
Author(s):  
Meysam Asgari-Chenaghlu ◽  
Mohammad-Reza Feizi-Derakhshi ◽  
Leili Farzinvash ◽  
Mohammad-Ali Balafar ◽  
Cina Motamed

Author(s):  
Zehao Yu

Topic detection is a hot issue that interests many researchers. Previous research focused on a single data stream and did not consider topic detection across different data streams in a unified way, so it cannot detect closely related topics from different data streams. Recently, Twitter, along with other SNS such as Weibo and Yelp, began supporting location services in their texts. Previous approaches are either too complex to carry out or so oversimplified that they cannot achieve good performance in detecting spatial topics. In our paper, we introduce a probabilistic method that can precisely detect closely related bursty topics and their bursty periods across different data streams in a unified way. We also introduce a probabilistic method called the Latent Spatial Events Model (LSEM), which can find areas, detect spatial events, and predict the positions of texts. We evaluate LSEM on different datasets and show that our approach outperforms other baseline approaches on metrics such as perplexity, topic entropy, KL-divergence, and range error. Evaluation of our first proposed approach on different datasets shows that it can detect closely related topics and meaningful bursty time periods from different datasets.


Information ◽  
2021 ◽  
Vol 12 (10) ◽  
pp. 401
Author(s):  
Girma Neshir ◽  
Andreas Rauber ◽  
Solomon Atnafu

Topic modeling is a statistical process that derives the latent themes from extensive collections of text. Three approaches to topic modeling exist, namely, unsupervised, semi-supervised, and supervised. In this work, we develop a supervised topic model for an Amharic corpus. We also investigate the effect of stemming on topic detection using Term Frequency-Inverse Document Frequency (TF-IDF) features, Latent Dirichlet Allocation (LDA) features, and a combination of these two feature sets, with four supervised machine learning tools: Support Vector Machine (SVM), Naive Bayesian (NB), Logistic Regression (LR), and Neural Nets (NN). We evaluate our approach using an Amharic corpus of 14,751 documents in ten topic categories. Both qualitative and quantitative analyses of the results show that our proposed supervised topic detection performs best, with an accuracy of 88%, when SVM is used on state-of-the-art TF-IDF word features with the Synthetic Minority Over-sampling Technique (SMOTE) applied and no stemming operation. The results show that text features with stemming slightly improve the performance of the topic classifier over features with no stemming.
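
The TF-IDF features mentioned in this abstract can be computed along the following lines. This sketch covers feature extraction only, not the SVM classifier, SMOTE, or stemming; it assumes one common, unsmoothed variant of the weighting (relative term frequency times `log(N / df)`), and the function name `tf_idf` and the toy documents are invented for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one sparse {term: weight} dict per document.

    Weight = (term count / document length) * log(N / document frequency).
    This unsmoothed form is an assumption; libraries often add smoothing.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {
                term: (count / total) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return vectors


# Toy English documents standing in for a labeled corpus.
docs = ["football match football", "market prices rise", "football market"]
vectors = tf_idf(docs)
```

In the first toy document, "match" receives a higher weight than "football" even though "football" occurs twice, because "football" also appears in another document and is therefore less discriminative. Vectors like these would then be passed to a supervised classifier such as the SVM the abstract reports.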

