Topic Modelling on Pharmaceutical Incident Data

Deepu Dileep; Soumya Rudraraju; V. V. HaraGopal

doi:10.24018/ejmath.2021.2.3.33

European Journal of Mathematics and Statistics ◽

10.24018/ejmath.2021.2.3.33 ◽

2021 ◽

Vol 2 (3) ◽

pp. 92-96

Author(s):

Deepu Dileep ◽

Soumya Rudraraju ◽

V. V. HaraGopal

Keyword(s):

Pharmaceutical Industry ◽

Key Words ◽

Latent Dirichlet Allocation ◽

Topic Modelling ◽

Probability Of Occurrence ◽

Proposed Model ◽

Textual Data ◽

Incident Data ◽

Dirichlet Allocation

The focus of the current study is to explore and analyse textual data, in the form of incident reports from the pharmaceutical industry, using topic modelling. The topic modelling applied in the study is based on Latent Dirichlet Allocation. The proposed model is applied to a corpus of 190 incidents to retrieve the key words with the highest probability of occurrence, which are then used to form informative topics related to the incidents.
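
As a minimal sketch of the keyword-retrieval step described above, the snippet below ranks the words of each topic by probability and keeps the top few. The topic names, words and probabilities are hypothetical placeholders, not values from the study, and `top_keywords` is an illustrative helper rather than the authors' code.

```python
# Hypothetical topic-word probabilities, as a fitted LDA model might produce.
# The words and numbers are illustrative only, not taken from the study.
topic_word_probs = {
    "topic_0": {"batch": 0.12, "contamination": 0.09, "label": 0.03, "delay": 0.01},
    "topic_1": {"temperature": 0.15, "storage": 0.11, "batch": 0.04, "label": 0.02},
}


def top_keywords(word_probs, n=2):
    """Return the n words with the highest probability, most probable first."""
    ranked = sorted(word_probs.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:n]]


# Key words with the highest probability of occurrence, per topic.
top_by_topic = {topic: top_keywords(probs) for topic, probs in topic_word_probs.items()}
print(top_by_topic)
```

An analyst would typically read each ranked list and assign the topic a descriptive label by hand.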

Topic Modelling Twitter Data with Latent Dirichlet Allocation Method

2019 International Conference on Electrical Engineering and Computer Science (ICECOS) ◽

10.1109/icecos47637.2019.8984523 ◽

2019 ◽

Cited By ~ 1

Author(s):

Edi Surya Negara ◽

Dendi Triadi ◽

Ria Andryani

Keyword(s):

Latent Dirichlet Allocation ◽

Topic Modelling ◽

Twitter Data ◽

Allocation Method ◽

Dirichlet Allocation

Topic Modelling: A Comparison of The Performance of Latent Dirichlet Allocation and LDA2vec Model on Bangla Newspaper

2019 International Conference on Bangla Speech and Language Processing (ICBSLP) ◽

10.1109/icbslp47725.2019.202047 ◽

2019 ◽

Author(s):

Md. Hasan ◽

Md. Motaher Hossain ◽

Adnan Ahmed ◽

Mohammad Shahidur Rahman

Keyword(s):

Latent Dirichlet Allocation ◽

Topic Modelling ◽

Dirichlet Allocation

Ensemble Methods for Improving Classification of Data Produced by Latent Dirichlet Allocation

Computer Science and Mathematical Modelling ◽

10.5604/01.3001.0013.1458 ◽

2019 ◽

Vol 0 (8/2018) ◽

pp. 17-28

Author(s):

Maciej Jankowski

Keyword(s):

Text Analysis ◽

Large Scale ◽

Latent Dirichlet Allocation ◽

Previous Analysis ◽

Ensemble Methods ◽

Topic Modelling ◽

New Methods ◽

Data Scientist ◽

Dirichlet Allocation

Topic models are very popular methods of text analysis. The most popular algorithm for topic modelling is LDA (Latent Dirichlet Allocation). Recently, many new methods have been proposed that enable the use of this model in large-scale processing. One problem is that a data scientist has to choose the number of topics manually, a step that requires some prior analysis. A few methods have been proposed to automate this step, but none of them works very well when LDA is used as a preprocessing step for further classification. In this paper, we propose an ensemble approach that allows us to use more than one model in the prediction phase while reducing the need to find a single best number of topics. We have also analyzed a few methods of estimating the number of topics.
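
The abstract does not say how the ensemble combines its models, so the following is only a sketch under one common assumption: each classifier is trained on LDA features with a different number of topics, and their predicted class probabilities are averaged. The probability values and the function names are hypothetical.

```python
def average_predictions(per_model_probs):
    """Average class-probability vectors predicted by several models."""
    n_models = len(per_model_probs)
    n_classes = len(per_model_probs[0])
    return [sum(p[c] for p in per_model_probs) / n_models for c in range(n_classes)]


def predict_class(per_model_probs):
    """Return the index of the class with the highest averaged probability."""
    avg = average_predictions(per_model_probs)
    return max(range(len(avg)), key=avg.__getitem__)


# Hypothetical outputs for one document from three classifiers, each assumed
# to be trained on LDA features with a different number of topics.
probs_by_model = [
    [0.6, 0.4],  # e.g. model using 10 topics
    [0.3, 0.7],  # e.g. model using 20 topics
    [0.2, 0.8],  # e.g. model using 50 topics
]

ensemble_probs = average_predictions(probs_by_model)
ensemble_label = predict_class(probs_by_model)
print(ensemble_probs, ensemble_label)
```

Averaging means no single topic count has to be selected as the best one, which is the motivation the abstract gives for the ensemble.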

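Analisis Trending Topik untuk Percakapan Media Sosial dengan Menggunakan Topic Modelling Berbasis Algoritme LDA [Trending Topic Analysis for Social Media Conversations Using Topic Modelling Based on the LDA Algorithm]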

Journal of Intelligent System and Computation ◽

10.52985/insyst.v2i1.150 ◽

2021 ◽

Vol 2 (1) ◽

pp. 12-19

Author(s):

Ahmad Syaifuddin ◽

Reddy Alexandro Harianto ◽

Joan Santoso

Keyword(s):

Latent Dirichlet Allocation ◽

Topic Modelling ◽

Human In The Loop ◽

Text Preprocessing ◽

F Measure ◽

Bahasa Indonesia ◽

Dirichlet Allocation

WhatsApp is one of the most popular chat applications, especially in Indonesia. WhatsApp data are distinctive because the messages follow varied patterns and topics that change very quickly, so identifying a topic from a collection of such messages by hand is very difficult and time-consuming. One way to obtain implicit information from such social media is topic modelling. This study analyses the application of the LDA (Latent Dirichlet Allocation) method to identify which topics are being discussed in WhatsApp groups at Universitas Islam Majapahit, and experiments with topic modelling by adding a time attribute when constructing documents. The study produces a topic model and an F-measure evaluation of that model based on the trials conducted. LDA was chosen for topic modelling, using an LDA library in Python, applying standard text preprocessing and adding slang-word removal to handle non-standard words and abbreviations in the chat logs. The topic model was tested with a human-in-the-loop evaluation, using a word intrusion task given to Indonesian-language experts. In the LDA evaluation, the best result was obtained by forming documents from 10-minute windows and merging them with reply chats in the WhatsApp group conversation, which is one way to improve topic modelling results with the Latent Dirichlet Allocation (LDA) algorithm. This yielded a precision of 0.9294, a recall of 0.7900 and an F-measure of 0.8541.
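
The reported scores can be checked with a short calculation. The sketch below assumes the F-measure is the balanced F1 score, the harmonic mean of precision and recall; the abstract does not state which variant was used. With the rounded precision and recall quoted above it gives roughly 0.8540, which agrees with the reported 0.8541 to within the rounding of the inputs.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
score = f_measure(0.9294, 0.7900)
print(f"{score:.4f}")
```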

Two-stage topic modelling of scientific publications: A case study of University of Nairobi, Kenya

PLoS ONE ◽

10.1371/journal.pone.0243208 ◽

2021 ◽

Vol 16 (1) ◽

pp. e0243208

Author(s):

Leacky Muchene ◽

Wende Safari

Keyword(s):

Hierarchical Clustering ◽

Language Processing ◽

Latent Dirichlet Allocation ◽

Topic Modelling ◽

Two Stage ◽

Scientific Publications ◽

Statistical Tool ◽

Second Stage ◽

The University ◽

Dirichlet Allocation

Unsupervised statistical analysis of unstructured data has gained wide acceptance, especially in the natural language processing and text mining domains. Topic modelling with Latent Dirichlet Allocation is one such statistical tool that has been successfully applied to synthesize collections of legal and biomedical documents and journalistic topics. We applied a novel two-stage topic modelling approach and illustrated the methodology with data from a collection of published abstracts from the University of Nairobi, Kenya. In the first stage, topic modelling with Latent Dirichlet Allocation was applied to derive the per-document topic probabilities. To present the topics more succinctly, in the second stage, hierarchical clustering with Hellinger distance was applied to derive the final clusters of topics. The analysis showed that dominant research themes in the university include HIV and malaria research, research on agricultural and veterinary services, and cross-cutting themes in the humanities and social sciences. Further, the use of hierarchical clustering in the second stage reduces the discovered latent topics to clusters of homogeneous topics.
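
The second stage relies on the Hellinger distance between probability distributions. The sketch below computes it for a few hypothetical topic vectors and picks the closest pair, which is the pair an agglomerative clustering would merge first. The vectors and names are illustrative assumptions, and the full hierarchical clustering procedure used in the paper is not reproduced here.

```python
import math
from itertools import combinations


def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions.

    Ranges from 0 (identical) to 1 (no overlapping support).
    """
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared) / math.sqrt(2)


# Hypothetical probability vectors for three topics (illustrative values only).
topic_vectors = {
    "A": [0.7, 0.2, 0.1],
    "B": [0.6, 0.3, 0.1],
    "C": [0.1, 0.1, 0.8],
}

# Pairwise distances; an agglomerative clustering would merge the closest pair first.
distances = {
    (a, b): hellinger(topic_vectors[a], topic_vectors[b])
    for a, b in combinations(sorted(topic_vectors), 2)
}
closest_pair = min(distances, key=distances.get)
print(closest_pair, round(distances[closest_pair], 4))
```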

Estimating News Coverage Patterns using Latent Dirichlet Allocation (LDA)

Vol 3 No 2 - Sukkur IBA Journal of Emerging Technologies ◽

10.30537/sjet.v1i1.142 ◽

2018 ◽

Vol 1 (1) ◽

pp. 51-56

Author(s):

Naeem Ahmed Mahoto

Keyword(s):

Knowledge Discovery ◽

Probabilistic Model ◽

Data Model ◽

Latent Dirichlet Allocation ◽

News Coverage ◽

Multidimensional Data ◽

Large Collection ◽

Allocation Pattern ◽

Textual Data ◽

Dirichlet Allocation

The growing rate of unstructured textual data has created an open challenge for knowledge discovery, which aims to extract desired information from large collections of data. This study presents a system to derive news coverage patterns with the help of a probabilistic model, Latent Dirichlet Allocation. A pattern is an arrangement of words within the collected data that are more likely to appear together in a certain context. The news coverage patterns have been computed as a function of the number of news articles containing such patterns. A prototype has been developed as a proof of concept to estimate the news coverage patterns for one newspaper, The Dawn. The news coverage patterns have been analyzed from different aspects using a multidimensional data model. Further, the extracted news coverage patterns are illustrated with visual graphs to give an in-depth understanding of the topics covered in the news. The results also assist in identifying schemas related to the newspaper and to journalists’ articles.

A Latent Allocation Model for Brand Awareness and Mindset Metrics

International Journal of Market Research ◽

10.1177/14707853211040052 ◽

2021 ◽

pp. 147078532110400

Author(s):

Pablo Marshall

Keyword(s):

Personality Traits ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Brand Image ◽

Brand Awareness ◽

Bag Of Words ◽

Attribute Importance ◽

Allocation Model ◽

Proposed Model ◽

Dirichlet Allocation

Mindset metrics, the measurement of consumers’ perceptions, attitudes, and intentions, have a long tradition in marketing, particularly in advertising and branding. Some of the most common mindset metrics are brand awareness, brand image, personality traits, and attribute importance. Brand awareness and other mindset measures take the form of texts (bags of words), so a natural methodology for analyzing these variables is topic modeling, in particular the popular Latent Dirichlet allocation (LDA) model. The LDA methodology assumes that brands or concepts are represented by clusters of brands in consumers’ minds. This study proposes an extension of the LDA model for brand awareness and other mindset variables that incorporates Bernoulli observations instead of the Multinomial specification of the usual LDA model. This extension is relevant because, unlike words in texts, brands and mindset concepts are not repeated within a document and have a dichotomous form, present or absent. The proposed model is applied to two brand awareness datasets. The results show significant gains in managerial insights, both in analyzing brand clusters and in profiling consumers.
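
To illustrate the distinction between the two observation models, the sketch below contrasts a generic Multinomial log-likelihood over word counts with a generic Bernoulli log-likelihood over presence/absence indicators. These are textbook likelihoods with made-up inputs; they are not the paper's full model, and the function names are hypothetical.

```python
import math


def multinomial_log_lik(counts, probs):
    """Log-likelihood of item counts under a Multinomial (items may repeat)."""
    n = sum(counts)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_coef + sum(c * math.log(p) for c, p in zip(counts, probs) if c > 0)


def bernoulli_log_lik(present, probs):
    """Log-likelihood of presence/absence indicators, one Bernoulli per item."""
    return sum(math.log(p) if x else math.log(1 - p) for x, p in zip(present, probs))


# Words in a text can repeat, so counts above 1 are meaningful.
word_ll = multinomial_log_lik([2, 1, 0], [0.5, 0.3, 0.2])

# A brand is either mentioned by a respondent or not: dichotomous data.
brand_ll = bernoulli_log_lik([1, 1, 0], [0.5, 0.3, 0.2])
print(word_ll, brand_ll)
```

The Bernoulli form assigns probability to absences as well as presences, which fits data where each brand can appear at most once per respondent.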

Topic Modelling Extraction of “Mann Ki Baat”

European Journal of Mathematics and Statistics ◽

10.24018/ejmath.2021.2.1.11 ◽

2021 ◽

Vol 2 (1) ◽

pp. 1-12

Author(s):

Mounika Kandukuri ◽

HaraGopal V.V.

Keyword(s):

Public Relations ◽

Data Analytics ◽

Latent Dirichlet Allocation ◽

Statistical Modelling ◽

Analytical Framework ◽

Conventional Technique ◽

Topic Modelling ◽

Textual Data ◽

Public Relations Campaign ◽

Radio Programme

The purpose of this study is to give an insight into textual data analytics and its application to the analysis of the unique public relations campaign “Mann Ki Baat”, initiated by the incumbent Prime Minister of India, the honourable Mr. Narendra Modi. The programme was first aired on All India Radio on Vijaya Dashami, October 3rd, 2014, followed by a second episode on November 2nd, 2014, and continued until December 2019. In this paper, an analytical framework is designed using a powerful technique of textual data analytics, topic modelling based on LDA (Latent Dirichlet Allocation), to accomplish the study. The proposed framework is applied to a corpus of 60 episodes (October 2014 to December 2019) of Mann Ki Baat gathered from the PMindia website, which was analyzed in detail. The frequently used terms and the recurrence of the topics spoken about in this popular monthly radio address were determined and analyzed from both statistical and dynamic perspectives. In this context, the present study is a first application of the conventional technique of topic modelling to Mann Ki Baat. Further, it is the principal endeavour to extract the themes discussed in the radio programme using statistical modelling.
