Network Pseudohealth Information Recognition Model: An Integrated Architecture of Latent Dirichlet Allocation and Data Block Update

Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Jie Zhang ◽  
Pingping Sun ◽  
Feng Zhao ◽  
Qianru Guo ◽  
Yue Zou

The wanton dissemination of network pseudohealth information has brought great harm to people’s health, lives, and property, making its detection and identification important. This paper therefore defines the concepts of pseudohealth information, data block, and data block integration; designs an architecture that combines the latent Dirichlet allocation (LDA) algorithm with data block update integration; and proposes the resulting combination algorithm model. In addition, crawler technology is used to collect the pseudohealth information circulating on the Sina Weibo platform during the epidemic period from February to March 2020, yielding an experimental case dataset for simulation testing. The results show that (1) the LDA model can deeply mine the semantics of network pseudohealth information, obtain document-topic distribution features, and use those topic features as input variables for classification and training; (2) the dataset partitioning method can effectively block data according to the text attributes and class labels of network pseudohealth information, and the data block reintegration method can accurately classify and integrate the blocked data; and (3) given that the combination model has certain limitations in detecting network pseudohealth information, the support vector machine (SVM) model can extract the granular content of data blocks in pseudohealth information in real time, greatly improving the recognition performance of the combination model.
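To make the pipeline concrete, here is a minimal sketch of the LDA-to-SVM stage described above. It assumes scikit-learn's implementations and an invented toy corpus, and it substitutes a plain pipeline for the paper's data block update integration; it illustrates the technique, not the authors' code.

```python
# Minimal sketch: LDA document-topic distributions as SVM features.
# Assumptions (not from the paper): scikit-learn's LDA, a toy corpus,
# and no data-block update step.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline

# Toy labeled posts: 1 = pseudohealth information, 0 = legitimate.
posts = [
    "garlic water cures the virus overnight",
    "vitamin megadoses guarantee immunity",
    "wash hands regularly and wear a mask",
    "vaccines are tested in controlled clinical trials",
]
labels = [1, 1, 0, 0]

pipeline = Pipeline([
    ("counts", CountVectorizer()),
    # The document-topic distributions become the SVM's input features,
    # mirroring finding (1) above.
    ("lda", LatentDirichletAllocation(n_components=2, random_state=0)),
    ("svm", SVC(kernel="rbf")),
])
pipeline.fit(posts, labels)
print(pipeline.predict(["drinking bleach kills all germs safely"]))
```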

2021 ◽  
Vol 13 (3) ◽  
pp. 128-133
Author(s):  
Attala Rafid Abelard ◽  
Yuliant Sibaroni

Among the many film streaming platforms that have sprung up, Netflix has the most subscribers. However, not all reviews written by Netflix users are positive. In this study, reviews written on the Google Play Store are analyzed with the Latent Dirichlet Allocation (LDA) method to determine which aspects users review. A classification step using the Support Vector Machine (SVM) method then assigns each review to the positive or negative class (sentiment analysis). Two scenarios were carried out. The first scenario found that the best number of LDA topics is 40; the second found that applying a filtering step in the preprocessing stage lowers the F1-score. The best performance, an F1-score of 78.15%, was therefore obtained with LDA and SVM using 40 topics and without the filtering step.
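A hedged sketch of the first scenario follows: sweeping the LDA topic count and scoring an SVM sentiment classifier by F1. The review texts, labels, and candidate topic counts are placeholders, not the study's data.

```python
# Sketch: choose the LDA topic count by cross-validated F1 of the SVM.
# Toy reviews and labels; the study reported 40 topics as best.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

reviews = ["great catalog and smooth playback", "app crashes constantly",
           "love the originals", "billing is a mess",
           "works fine on my tv", "too expensive for the content offered"]
sentiment = [1, 0, 1, 0, 1, 0]  # 1 = positive, 0 = negative

for n_topics in (10, 20, 40):
    model = Pipeline([
        ("counts", CountVectorizer()),
        ("lda", LatentDirichletAllocation(n_components=n_topics,
                                          random_state=0)),
        ("svm", LinearSVC()),
    ])
    f1 = cross_val_score(model, reviews, sentiment, cv=2,
                         scoring="f1").mean()
    print(f"{n_topics} topics -> mean F1 = {f1:.3f}")
```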


2021 ◽  
Vol 8 (6) ◽  
pp. 1265
Author(s):  
Muhammad Alkaff ◽  
Andreyan Rizky Baskara ◽  
Irham Maulani

Lapor! is a service system through which the Indonesian public submits aspirations and complaints about government services. The Government has long used the system to address citizens' bureaucratic problems. However, the growing volume of reports, which operators sort by reading every incoming complaint, causes frequent errors in which operators forward reports to the wrong agency. A solution is therefore needed that can determine a report's context automatically using Natural Language Processing techniques. This study aims to build an automatic report classifier that routes reports to the authorized agency by topic, combining Latent Dirichlet Allocation (LDA) and Support Vector Machine (SVM). Topic modeling for each report is carried out with the LDA method, which extracts reports to find specific patterns in documents and outputs topic distribution values. The classification step that determines a report's destination agency is then carried out with an SVM over the topic values extracted by LDA. The LDA-SVM model's performance is measured with a confusion matrix, computing accuracy, precision, recall, and F1 score. Test results using a 70:30 train-test split show that the model performs well, with 79.85% accuracy, 79.98% precision, 72.37% recall, and a 74.67% F1 score.
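The evaluation step translates directly into code. Below is a minimal sketch of the 70:30 split and the confusion-matrix-derived scores; the reports, agency labels, and sizes are illustrative stand-ins for the Lapor! data.

```python
# Sketch: 70:30 train-test split, LDA features, SVM routing, and
# accuracy/precision/recall/F1 as in the study's evaluation.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

reports = ["the road in my district is full of potholes",
           "national exam registration site is down",
           "bridge railing collapsed near the market",
           "school lacks certified teachers",
           "traffic light broken at the main junction",
           "scholarship payments are delayed again"]
agency = ["public_works", "education", "public_works",
          "education", "public_works", "education"]

X_train, X_test, y_train, y_test = train_test_split(
    reports, agency, test_size=0.3, stratify=agency, random_state=0)

model = Pipeline([
    ("counts", CountVectorizer()),
    ("lda", LatentDirichletAllocation(n_components=4, random_state=0)),
    ("svm", LinearSVC()),
]).fit(X_train, y_train)

pred = model.predict(X_test)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_test, pred, average="macro", zero_division=0)
print(f"accuracy={accuracy_score(y_test, pred):.2%} "
      f"precision={prec:.2%} recall={rec:.2%} f1={f1:.2%}")
```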


2018 ◽  
Vol 61 (1) ◽  
pp. 64-76
Author(s):  
Susan (Sixue) Jia

Fitness clubs have never ceased searching for quality-improvement opportunities to better serve their exercisers, and exercisers in turn post online ratings and reviews of fitness clubs. Studied together, the quantitative ratings and qualitative reviews can comprehensively depict exercisers’ perception of fitness clubs, but the typological and dimensional discrepancies between the two data sets have hindered joint study that fully exploits their business value. This study bridges the gap by examining 53,979 pairs of exerciser online ratings and reviews from 100 fitness clubs in Shanghai, China. Using latent Dirichlet allocation (LDA) based text mining, we identified the 17 major topics exercisers wrote about. A support vector machine (SVM) classifier was then employed to establish the rating-review relations, with an accuracy rate of up to 86%. Finally, the relative impact of each topic on exerciser satisfaction was computed and compared by introducing virtual reviews. The significance of this study is that it systematically creates a standardized protocol for mining and correlating the massive structured/quantitative and unstructured/qualitative data available online, readily transferable to other service and product sectors.
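One plausible reading of the virtual-review step, sketched below, is that a virtual review is a synthetic topic vector loading entirely on a single topic, so the SVM's decision value for it approximates that topic's pull toward satisfaction. The corpus, labels, and topic count here are invented for illustration; the study used 17 topics and real club reviews.

```python
# Sketch: probe a fitted topic->satisfaction SVM with one-hot
# "virtual reviews" to rank each topic's impact. Toy data throughout.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.svm import LinearSVC

reviews = ["trainers are friendly and helpful", "locker room smells awful",
           "great spinning classes", "machines are always broken",
           "coaches design a personal plan", "showers are cold and dirty"]
satisfied = np.array([1, 0, 1, 0, 1, 0])

counts = CountVectorizer().fit_transform(reviews)
lda = LatentDirichletAllocation(n_components=3, random_state=0)
topic_dist = lda.fit_transform(counts)   # document-topic features

svm = LinearSVC().fit(topic_dist, satisfied)

# Each virtual review loads on exactly one topic.
virtual_reviews = np.eye(lda.n_components)
for t, score in enumerate(svm.decision_function(virtual_reviews)):
    print(f"topic {t}: impact {score:+.3f}")
```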


PLoS ONE ◽  
2020 ◽  
Vol 15 (11) ◽  
pp. e0241701
Author(s):  
P. Celard ◽  
A. Seara Vieira ◽  
E. L. Iglesias ◽  
L. Borrajo

This work presents an alternative method of representing documents based on LDA (Latent Dirichlet Allocation) and examines how it affects classification algorithms compared with a common text representation. LDA assumes that each document deals with a set of predefined topics, which are distributions over an entire vocabulary. Our main objective is to use the probability of a document belonging to each topic as a new text representation model. The proposed technique is deployed as a new filter extending the Weka software. To demonstrate its performance, the filter is tested with different classifiers, such as a Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), and Naive Bayes, on several document corpora (OHSUMED, Reuters-21578, 20Newsgroup, Yahoo! Answers, YELP Polarity, and TREC Genomics 2015), and compared with the Bag of Words (BoW) representation technique. Results suggest that the proposed filter achieves accuracy similar to BoW while greatly improving classification processing times.
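The paper ships its representation as a Weka filter; a rough scikit-learn analog (an assumption for illustration, not the authors' code) is a pipeline step that replaces raw counts with LDA topic proportions. The sketch below feeds both representations to the same SVM for a side-by-side accuracy check on toy data.

```python
# Sketch: BoW vs. LDA-topic-proportion representations under one SVM.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

docs = ["protein folding pathways in yeast", "stock markets rallied today",
        "gene expression under heat stress", "central bank raises rates",
        "enzyme kinetics of the mutant strain", "quarterly earnings beat forecasts"]
labels = [0, 1, 0, 1, 0, 1]  # 0 = biomedical, 1 = finance

bow = Pipeline([("counts", CountVectorizer()), ("svm", LinearSVC())])
lda = Pipeline([("counts", CountVectorizer()),
                ("lda", LatentDirichletAllocation(n_components=2,
                                                  random_state=0)),
                ("svm", LinearSVC())])

for name, model in (("BoW", bow), ("LDA", lda)):
    acc = cross_val_score(model, docs, labels, cv=2).mean()
    print(f"{name}: mean accuracy = {acc:.3f}")
```

The LDA variant compresses the feature space from vocabulary size down to the topic count, which is where the reported speedup in classification comes from.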


2021 ◽  
Vol 7 ◽  
pp. e412
Author(s):  
Chiranjibi Sitaula ◽  
Anish Basnet ◽  
Sunil Aryal

Document representations that include outlier tokens degrade classification performance because of the uncertain orientation of such tokens. Most existing document representation methods, in Nepali as in other languages, ignore strategies for filtering them out of documents before learning representations. In this article, we propose a novel document representation method based on a supervised codebook for Nepali documents, where the codebook contains only semantic tokens without outliers. The codebook is domain-specific: it is built from tokens in a given corpus that have higher similarities with the corpus's class labels. Our method adopts a simple yet effective representation for each word, called probability-based word embedding. To show the efficacy of our method, we evaluate its performance in the document classification task using a Support Vector Machine and validate it against widely used document representation methods such as Bag of Words, Latent Dirichlet Allocation, Long Short-Term Memory, Word2Vec, and Bidirectional Encoder Representations from Transformers, using four Nepali text datasets (denoted A1, A2, A3, and A4). The experimental results show that our method produces state-of-the-art classification performance (77.46% accuracy on A1, 67.53% on A2, 80.54% on A3, and 89.58% on A4) compared with the widely used existing document representation methods, yielding the best classification accuracy on three datasets (A1, A2, and A3) and comparable accuracy on the fourth (A4). Furthermore, we introduce the largest Nepali document dataset (A4), called the NepaliLinguistic dataset, to the linguistic community.
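A loose sketch of the supervised-codebook idea follows: score each token by the class distribution of the documents it appears in, keep only tokens that lean strongly toward some class (dropping outlier tokens), and represent a document by the average class-probability vectors of its kept tokens. The corpus, threshold, and scoring rule are illustrative assumptions, not the authors' exact probability-based word embedding.

```python
# Sketch: build a class-leaning codebook from token-class co-occurrence,
# then represent documents over it. Toy English corpus stands in for Nepali.
from collections import Counter, defaultdict
import numpy as np

docs = [("economic growth slows amid inflation", "economy"),
        ("team wins the national football title", "sports"),
        ("markets react to the new tax policy", "economy"),
        ("injured striker misses the final match", "sports")]
classes = sorted({label for _, label in docs})

# Estimate P(class | token) from token-class co-occurrence counts.
token_class = defaultdict(Counter)
for text, label in docs:
    for tok in text.split():
        token_class[tok][label] += 1

codebook = {}
for tok, cnt in token_class.items():
    probs = np.array([cnt[c] for c in classes], dtype=float)
    probs /= probs.sum()
    if probs.max() >= 0.75:       # keep only class-leaning tokens
        codebook[tok] = probs     # a probability-based word embedding

def represent(text):
    """Average the class-probability vectors of in-codebook tokens."""
    vecs = [codebook[t] for t in text.split() if t in codebook]
    return np.mean(vecs, axis=0) if vecs else np.zeros(len(classes))

print(represent("inflation hits football ticket prices"))
```

The resulting fixed-length vectors could then be fed to an SVM, as in the paper's classification experiments.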

