Topic Modelling: A Comparison of The Performance of Latent Dirichlet Allocation and LDA2vec Model on Bangla Newspaper

Topic models are very popular methods of text analysis, and the most popular topic-modelling algorithm is LDA (Latent Dirichlet Allocation). Many methods have recently been proposed that enable this model to be used in large-scale processing. One problem is that the data scientist has to choose the number of topics manually, a step that requires some prior analysis. A few methods have been proposed to automate this step, but none of them works very well when LDA is used as a preprocessing step for further classification. In this paper, we propose an ensemble approach that allows more than one model to be used in the prediction phase while reducing the need to find a single best number of topics. We have also analysed a few methods of estimating the number of topics.
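The abstract does not say how the ensemble combines its models, so the following is only a minimal sketch under an assumed scheme: one classifier is trained per candidate number of topics, and their class probabilities are averaged at prediction time (soft voting). The function names and the probability values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def soft_vote(per_model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the class-probability vectors produced by several models."""
    if not per_model_probs:
        raise ValueError("need at least one model's probabilities")
    n_classes = len(per_model_probs[0])
    if any(len(p) != n_classes for p in per_model_probs):
        raise ValueError("all models must score the same classes")
    n_models = len(per_model_probs)
    return [sum(p[c] for p in per_model_probs) / n_models for c in range(n_classes)]


def predict(per_model_probs: Sequence[Sequence[float]]) -> int:
    """Return the index of the class with the highest averaged probability."""
    averaged = soft_vote(per_model_probs)
    return max(range(len(averaged)), key=averaged.__getitem__)


# Hypothetical probabilities for one document from classifiers trained on
# LDA features with 10, 20 and 50 topics (illustrative values only).
probs_by_topic_count = {
    10: [0.60, 0.40],
    20: [0.30, 0.70],
    50: [0.20, 0.80],
}
label = predict(list(probs_by_topic_count.values()))
```

Because every topic count contributes a vote, no single "best" number of topics has to be selected before prediction, which is the motivation the abstract gives for using an ensemble.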

Topic Modelling on Pharmaceutical Incident Data

European Journal of Mathematics and Statistics ◽ 10.24018/ejmath.2021.2.3.33 ◽ 2021 ◽ Vol 2 (3) ◽ pp. 92-96

Author(s): Deepu Dileep ◽ Soumya Rudraraju ◽ V. V. HaraGopal

Keyword(s): Pharmaceutical Industry ◽ Key Words ◽ Latent Dirichlet Allocation ◽ Topic Modelling ◽ Probability Of Occurrence ◽ Proposed Model ◽ Textual Data ◽ Incident Data ◽ Dirichlet Allocation

The focus of the current study is to explore and analyse textual data, in the form of incidents in the pharmaceutical industry, using topic modelling. The topic modelling applied in the current study is based on Latent Dirichlet Allocation. The proposed model is applied to a corpus containing 190 incidents to retrieve the key words with the highest probability of occurrence, which are then used to form informative topics related to the incidents.
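Retrieving the key words with the highest probability of occurrence amounts to ranking a fitted topic's word distribution. The sketch below assumes the topic is available as a word-to-probability mapping; the words and probabilities are invented for illustration and do not come from the study's incident corpus.

```python
import heapq
from typing import Dict, List, Tuple


def top_keywords(topic_word_probs: Dict[str, float], n: int = 5) -> List[Tuple[str, float]]:
    """Return the n words with the highest probability in one topic."""
    return heapq.nlargest(n, topic_word_probs.items(), key=lambda item: item[1])


# Hypothetical word distribution for a single topic (illustrative values only).
toy_topic = {
    "contamination": 0.12,
    "deviation": 0.10,
    "batch": 0.09,
    "label": 0.07,
    "the": 0.01,
}
keywords = [word for word, _ in top_keywords(toy_topic, n=3)]
```

The ranked words are what an analyst would read to give the topic an informative name.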

Analisis Trending Topik untuk Percakapan Media Sosial dengan Menggunakan Topic Modelling Berbasis Algoritme LDA [Trending Topic Analysis for Social Media Conversations Using LDA-Based Topic Modelling]

Journal of Intelligent System and Computation ◽ 10.52985/insyst.v2i1.150 ◽ 2021 ◽ Vol 2 (1) ◽ pp. 12-19

Author(s): Ahmad Syaifuddin ◽ Reddy Alexandro Harianto ◽ Joan Santoso

Keyword(s): Latent Dirichlet Allocation ◽ Topic Modelling ◽ Human In The Loop ◽ Text Preprocessing ◽ F Measure ◽ Bahasa Indonesia ◽ Dirichlet Allocation

WhatsApp is one of the most popular chat applications, especially in Indonesia. WhatsApp data are distinctive because message patterns and topics are varied and change very quickly, so identifying a topic from a collection of messages manually is difficult and time-consuming. One way to extract implicit information from such social media is topic modelling. This study analyses the application of LDA (Latent Dirichlet Allocation) to identify the topics being discussed in WhatsApp groups at Universitas Islam Majapahit, and experiments with topic modelling by adding a time attribute when constructing documents. The study yields a topic model and an F-measure evaluation of that model based on the trials conducted. LDA was chosen for topic modelling, using the LDA library in Python together with standard text preprocessing and an additional slang-word removal step to handle non-standard words and abbreviations in the chat logs. The topic model was tested with a human-in-the-loop evaluation in which a word intrusion task was given to Indonesian-language experts. In the LDA evaluation, the best results were obtained by forming documents from 10-minute windows and merging them with reply chats in the WhatsApp group conversation, which is one way to improve topic modelling results with the LDA algorithm; this gave a precision of 0.9294, a recall of 0.7900 and an F-measure of 0.8541.
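Two steps in this abstract lend themselves to a short illustration: building documents from 10-minute windows of chat messages, and computing the F-measure from precision and recall. The sketch below assumes messages arrive as (timestamp, text) pairs and uses fixed, non-overlapping windows; the paper's exact windowing and its reply-chat merging are not specified here, and the sample messages are made up.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

EPOCH = datetime(1970, 1, 1)


def group_by_window(messages: Iterable[Tuple[datetime, str]], minutes: int = 10) -> List[str]:
    """Join the messages that fall in the same fixed time window into one document."""
    window = timedelta(minutes=minutes)
    buckets = defaultdict(list)
    for sent_at, text in messages:
        # Integer index of the window that contains this message.
        buckets[(sent_at - EPOCH) // window].append(text)
    return [" ".join(buckets[key]) for key in sorted(buckets)]


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up chat log used only to show the windowing.
chat = [
    (datetime(2021, 1, 1, 9, 0), "a"),
    (datetime(2021, 1, 1, 9, 4), "b"),
    (datetime(2021, 1, 1, 9, 12), "c"),
    (datetime(2021, 1, 1, 9, 19), "d"),
    (datetime(2021, 1, 1, 9, 31), "e"),
]
documents = group_by_window(chat)

# Precision and recall as reported in the abstract.
score = f_measure(0.9294, 0.7900)
```

With the rounded precision and recall above, `score` comes out at roughly 0.854, which agrees with the reported F-measure of 0.8541 up to rounding of the inputs.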

Two-stage topic modelling of scientific publications: A case study of University of Nairobi, Kenya

PLoS ONE ◽ 10.1371/journal.pone.0243208 ◽ 2021 ◽ Vol 16 (1) ◽ pp. e0243208

Author(s): Leacky Muchene ◽ Wende Safari

Keyword(s): Hierarchical Clustering ◽ Language Processing ◽ Latent Dirichlet Allocation ◽ Topic Modelling ◽ Two Stage ◽ Scientific Publications ◽ Statistical Tool ◽ Second Stage ◽ The University ◽ Dirichlet Allocation

Unsupervised statistical analysis of unstructured data has gained wide acceptance, especially in the natural language processing and text mining domains. Topic modelling with Latent Dirichlet Allocation is one such statistical tool that has been successfully applied to synthesize collections of legal and biomedical documents and journalistic topics. We applied a novel two-stage topic modelling approach and illustrated the methodology with data from a collection of published abstracts from the University of Nairobi, Kenya. In the first stage, topic modelling with Latent Dirichlet Allocation was applied to derive the per-document topic probabilities. To present the topics more succinctly, in the second stage, hierarchical clustering with Hellinger distance was applied to derive the final clusters of topics. The analysis showed that dominant research themes in the university include HIV and malaria research, research on agricultural and veterinary services, and cross-cutting themes in the humanities and social sciences. Further, the use of hierarchical clustering in the second stage reduces the discovered latent topics to clusters of homogeneous topics.
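The second stage can be illustrated with a small, dependency-free sketch: the Hellinger distance between two discrete distributions, H(p, q) = sqrt(0.5 * sum_i (sqrt(p_i) - sqrt(q_i))^2), followed by agglomerative clustering. Average linkage, the target number of clusters, and the toy distributions are assumptions made for this example; the abstract does not state which linkage or cut-off the authors used.

```python
import math
from itertools import combinations
from typing import List, Sequence


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance between two discrete distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(total / 2.0)


def agglomerate(dists: Sequence[Sequence[float]], n_clusters: int) -> List[List[int]]:
    """Average-linkage agglomerative clustering of distributions by Hellinger distance."""
    clusters = [[i] for i in range(len(dists))]

    def linkage(a: List[int], b: List[int]) -> float:
        # Mean pairwise distance between the members of two clusters.
        return sum(hellinger(dists[i], dists[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Merge the two closest clusters, then repeat.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return [sorted(c) for c in clusters]


# Toy topic distributions over a four-word vocabulary (illustrative values only).
toy_topics = [
    [0.70, 0.20, 0.05, 0.05],
    [0.60, 0.30, 0.05, 0.05],
    [0.05, 0.05, 0.50, 0.40],
    [0.05, 0.05, 0.40, 0.50],
]
topic_clusters = agglomerate(toy_topics, n_clusters=2)
```

Here topics 0 and 1 end up in one cluster and topics 2 and 3 in the other, mirroring how the second stage collapses many latent topics into a few homogeneous groups.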

Topic Modelling and Clustering of Disaster-Related Tweets using Bilingual Latent Dirichlet Allocation and Incremental Clustering Algorithm with Support Vector Machines for Need Assessment

10.1109/icsecs52883.2021.00041 ◽ 2021

Author(s): Lady Angelica Buen Guerzo ◽ Hans Aaron O. Kilkenny ◽ Raphael Noel D. Osorio ◽ Andrei Hart E. Villegas ◽ Charmaine S. Ponay

Keyword(s): Support Vector Machines ◽ Clustering Algorithm ◽ Latent Dirichlet Allocation ◽ Support Vector ◽ Topic Modelling ◽ Incremental Clustering ◽ Need Assessment ◽ Vector Machines ◽ Dirichlet Allocation

Using latent dirichlet allocation for topic modelling in twitter

Proceedings of the 2015 IEEE 9th International Conference on Semantic Computing (IEEE ICSC 2015) ◽ 10.1109/icosc.2015.7050858 ◽ 2015 ◽ Cited By ~ 5

Author(s): David Alfred Ostrowski

Keyword(s): Latent Dirichlet Allocation ◽ Topic Modelling ◽ Dirichlet Allocation

Fesztivállátogatók véleményeinek számítógéppel támogatott tematikus modellezése – egy kísérlet eredményei / Computer-aided topic modelling based on festival-goers’ opinions – results of an experiment

Turizmus Bulletin ◽ 10.14267/turbull.2021v21n1.1 ◽ 2021 ◽ Vol 21 (1) ◽ pp. 4-12

Author(s): Mátyás Hinek

Keyword(s): Qualitative Research ◽ Latent Dirichlet Allocation ◽ Topic Model ◽ Computer Algorithm ◽ Topic Modelling ◽ Computer Tools ◽ Computer Aided ◽ Dirichlet Allocation

In our study, we attempt to determine the typical topics of opinions written on Facebook by Sziget Festival visitors, using a computer algorithm, the structured topic model (stm), which applies latent Dirichlet allocation, and to compare them with the topics outlined in our previous research. Based on the English-language written opinions of Sziget Festival visitors over the last seven years, we modelled nine topics, whose content and scope only partly matched the topics identified in our previous qualitative research. The most important result of our study is that visitor opinions can be examined successfully with computer tools, but the quality of the results is determined by the size of the corpus, i.e. the number and length of the analysed posts.

Indonesians' Song Lyrics Topic Modelling Using Latent Dirichlet Allocation

2018 5th International Conference on Information Science and Control Engineering (ICISCE) ◽ 10.1109/icisce.2018.00064 ◽ 2018 ◽ Cited By ~ 3

Author(s): Enrico Laoh ◽ Isti Surjandari ◽ Limisgy Ramadhina Febirautami

Keyword(s): Latent Dirichlet Allocation ◽ Topic Modelling ◽ Song Lyrics ◽ Dirichlet Allocation

Topic Modelling of Germas Related Content on Instagram Using Latent Dirichlet Allocation (LDA)

Proceedings of the International Conference on Health and Medical Sciences (AHMS 2020) ◽ 10.2991/ahsr.k.210127.060 ◽ 2021

Author(s): Muhammad Habibi ◽ Adri Priadana ◽ Andika Bayu Saputra ◽ Puji Winar Cahyo

Keyword(s): Latent Dirichlet Allocation ◽ Topic Modelling ◽ Dirichlet Allocation
