scholarly journals Topic Modelling on Pharmaceutical Incident Data

2021 ◽  
Vol 2 (3) ◽  
pp. 92-96
Author(s):  
Deepu Dileep ◽  
Soumya Rudraraju ◽  
V. V. HaraGopal

Focus of the current study is to explore and analyse textual data in the form of incidents in pharmaceutical industry using topic modelling. Topic modelling applied in the current study is based on Latent Dirichlet Allocation. The proposed model is applied on a corpus containing 190 incidents to retrieve key words with highest probability of occurrence. It is used to form informative topics related to incidents.

2019 ◽  
Vol 0 (8/2018) ◽  
pp. 17-28
Author(s):  
Maciej Jankowski

Topic models are very popular methods of text analysis. The most popular algorithm for topic modelling is LDA (Latent Dirichlet Allocation). Recently, many new methods were proposed, that enable the usage of this model in large scale processing. One of the problem is, that a data scientist has to choose the number of topics manually. This step, requires some previous analysis. A few methods were proposed to automatize this step, but none of them works very well if LDA is used as a preprocessing for further classification. In this paper, we propose an ensemble approach which allows us to use more than one model at prediction phase, at the same time, reducing the need of finding a single best number of topics. We have also analyzed a few methods of estimating topic number.


2021 ◽  
Vol 2 (1) ◽  
pp. 12-19
Author(s):  
Ahmad Syaifuddin ◽  
Reddy Alexandro Harianto ◽  
Joan Santoso

Aplikasi WhatsApp merupakan salah satu aplikasi chatting yang sangat populer terutama di Indonesia. WhatsApp mempunyai data unik karena memiliki pola pesan dan topik yang beragam dan sangat cepat berubah, sehingga untuk mengidentifikasi suatu topik dari kumpulan pesan tersebut sangat sulit dan menghabiskan banyak waktu jika dilakukan secara manual. Salah satu cara untuk mendapatkan informasi tersirat dari media sosial tersebut yaitu dengan melakukan pemodelan topik. Penelitian ini dilakukan untuk menganalisis penerapan metode LDA (Latent Dirichlet Allocation) dalam mengidentifikasi topik apa saja yang sedang dibahas pada grup WhatsApp di Universitas Islam Majapahit serta melakukan eksperimen pemodelan topik dengan menambahkan atribut waktu dalam penyusunan dokumen. Penelitian ini menghasilkan model topic dan nilai evaluasi f-measure dari model topik berdasarkan uji coba yang dilakukan. Metode LDA dipilih untuk melakukan pemodelan topik dengan memanfaatkan library LDA pada python serta menerapkan standar text-preprocessing dan menambahkan slang words removal untuk menangani kata tidak baku dan singkatan pada chat logs. Pengujian model topik dilakukan dengan uji human in the loop menggunakan word instrusion task kepada pakar Bahasa Indonesia. Hasil evaluasi LDA didapatkan hasil percobaan terbaik dengan mengubah dokumen menjadi 10 menit dan menggabungkan dengan reply chat pada percakapan grup WhatsApp merupakan salah satu cara dalam meningkatkan hasil pemodelan topik menggunakan algoritma Latent Dirichlet Allocation (LDA), didapatkan nilai precision sebesar 0.9294, nilai recall sebesar 0.7900 dan nilai f-measure sebesar 0.8541.


PLoS ONE ◽  
2021 ◽  
Vol 16 (1) ◽  
pp. e0243208
Author(s):  
Leacky Muchene ◽  
Wende Safari

Unsupervised statistical analysis of unstructured data has gained wide acceptance especially in natural language processing and text mining domains. Topic modelling with Latent Dirichlet Allocation is one such statistical tool that has been successfully applied to synthesize collections of legal, biomedical documents and journalistic topics. We applied a novel two-stage topic modelling approach and illustrated the methodology with data from a collection of published abstracts from the University of Nairobi, Kenya. In the first stage, topic modelling with Latent Dirichlet Allocation was applied to derive the per-document topic probabilities. To more succinctly present the topics, in the second stage, hierarchical clustering with Hellinger distance was applied to derive the final clusters of topics. The analysis showed that dominant research themes in the university include: HIV and malaria research, research on agricultural and veterinary services as well as cross-cutting themes in humanities and social sciences. Further, the use of hierarchical clustering in the second stage reduces the discovered latent topics to clusters of homogeneous topics.


2018 ◽  
Vol 1 (1) ◽  
pp. 51-56
Author(s):  
Naeem Ahmed Mahoto

The growing rate of unstructured textual data has made an open challenge for the knowledge discovery, which aims extracting desired information from large collection of data. This study presents a system to derive news coverage patterns with the help of probabilistic model – Latent Dirichlet Allocation. Pattern is an arrangement of words within collected data that more likely appear together in certain context. The news coverage patterns have been computed as number function of news articles comprising of such patterns. A prototype, as a proof, has been developed to estimate the news coverage patterns for a newspaper – The Dawn. Analyzing the news coverage patterns from different aspects has been carried out using multidimensional data model. Further, the extracted news coverage patterns are illustrated by visual graphs to yield in-depth understanding of the topics, which have been covered in the news. The results also assist in identification of schema related to newspaper and journalists’ articles.


2021 ◽  
pp. 147078532110400
Author(s):  
Pablo Marshall

Mindset metrics, the measurement of consumers’ perceptions, attitudes, and intentions, have a long tradition in marketing, particularly in advertising and branding. Some of the most usual mindset metrics are brand awareness, brand image, personality traits, and attribute importance. Brand awareness and other mindset measures have the form of texts (bag of words). And, a natural methodology for analyzing these variables is topic modeling and the popular Latent Dirichlet allocation (LDA) model. The LDA methodology assumes that brands or concepts are represented by clusters of brands in consumers’ minds. This study proposes an extension/modification of the LDA model for brand awareness and other mindset variables that incorporate Bernoulli observations instead of the Multinomial specification present in the usual LDA specification. This extension is relevant since, unlike words in texts, brands and mindset concepts are not repeated within a document and have a dichotomous form, present or absent. The proposed model is applied to two brand awareness datasets. The results show significant gains in both managerial insights in analyzing brand clusters and consumers’ profiles.


2021 ◽  
Vol 2 (1) ◽  
pp. 1-12
Author(s):  
Mounika Kandukuri ◽  
HaraGopal V.V.

The purpose of this study is to give an insight about Textual Data analytics and its application in the analysis of unique public relations campaign”Mann Ki Baat” that was initiated by incumbent Prime Minister of India,honourable “Mr.Narendra Modi” which was initially aired on All India Radio Programme on Vijaya Dashami on October 3rd , 2014 followed by second on November 2nd, 2014 of the same year till December 2019. In this paper, an analytical framework is designed using a powerful technique of textual data analytics “Topic Modelling based on LDA (Latent Dirichlet Allocation)” to accomplish the study. The proposed framework is applied to the corpus of 60 episodes(October 2014 to December 2019) of Mann ki Baat gathered from PMindia website and  was analyzed in greater detail. The terms used frequently and recurrence of the topics spoken in his popular monthly radio address  program were determined and analyzed from both in statistical and dynamic perspectives.In this context the present study is a first approach of application under the conventional technique “topic modelling” on Mann Ki Baat.Further, this is the principal endeavour to excerpt the themes discussed in radio programme using statistical modelling.


Sign in / Sign up

Export Citation Format

Share Document