How to Identify Hot Topics in Psychology Using Topic Modeling

Abstract. Latent topics and trends in psychological publications were examined to identify hotspots in psychology. Topic modeling was contrasted with a classification-based scientometric approach in order to demonstrate the benefits of the former. Specifically, the psychological publication output in the German-speaking countries containing German- and English-language publications from 1980 to 2016 documented in the PSYNDEX database was analyzed. Topic modeling based on latent Dirichlet allocation (LDA) was applied to a corpus of 314,573 publications. Input for topic modeling was the controlled terms of the publications, that is, a standardized vocabulary of keywords in psychology. Based on these controlled terms, 500 topics were determined and trending topics were identified. Hot topics, indicated by the highest increasing trends in this data, were facets of neuropsychology, online therapy, cross-cultural aspects, traumatization, and visual attention. In conclusion, the findings indicate that topics can reveal more detailed insights into research trends than standardized classifications. Possible applications of this method, limitations, and implications for research synthesis are discussed.
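
One way to operationalize "highest increasing trends" is to fit a straight line to each topic's yearly share of publications and rank the topics by slope. The sketch below assumes this least-squares approach, which the abstract does not confirm as the authors' method. The function names, topic labels, and yearly shares are hypothetical and are not PSYNDEX data.

```python
def trend_slope(years, shares):
    """Least-squares slope of a topic's yearly share of publications."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(shares) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, shares))
    return sxy / sxx


def rank_hot_topics(years, topic_shares):
    """Return topic names ordered from steepest increase to steepest decline."""
    return sorted(
        topic_shares,
        key=lambda name: trend_slope(years, topic_shares[name]),
        reverse=True,
    )


# Hypothetical yearly shares (fraction of all publications per year).
years = [2012, 2013, 2014, 2015, 2016]
topic_shares = {
    "topic_a": [0.010, 0.014, 0.018, 0.022, 0.026],  # rising quickly
    "topic_b": [0.020, 0.021, 0.022, 0.023, 0.024],  # rising slowly
    "topic_c": [0.030, 0.028, 0.026, 0.024, 0.022],  # declining
}
hot_topics = rank_hot_topics(years, topic_shares)
```

Ranking by slope favors topics with steady growth. A study working with noisy yearly counts might instead smooth the series first or test the slope for significance.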

Exploring Latent Topics and International Research Trends in Competency-Based Education Using Topic Modeling

Education Sciences ◽

10.3390/educsci11060303 ◽

2021 ◽

Vol 11 (6) ◽

pp. 303

Author(s):

Seungsu Paek ◽

Taehun Um ◽

Namhyoung Kim

Keyword(s):

Education Reform ◽

International Research ◽

Topic Modeling ◽

Modern Society ◽

Research Trends ◽

Competency Development ◽

Competency Based Education ◽

Competency Based ◽

Latent Topics ◽

International Research Trends

Recently, there has been growing educational interest in competency. Global organizations that lead the discourse on education reform, such as the United Nations (UN) and the Organization for Economic Co-operation and Development (OECD), are taking the lead in spreading awareness of competency education. Since 2015, the number of published articles on competency education has been increasing rapidly. This paper aims to provide significant implications for creating a sustainable future of competency education. A topic modeling method was used to empirically analyze latent topics and international research trends in 26,532 articles published on competency-based education (CBE). The analysis yielded 15 topics, including “approach to competency development.” In addition, five topics, including “learning skills” and “teacher training,” were found to be hot topics with a growing number of published articles. The rapidly changing modern society is calling for a transformation in education. We hope that the results of this study pave the way for further research exploring new directions for education, such as competency education.

Analysis and Visualization Latent Topic on COVID-19 Vaccine Tweet use two-stage topic modeling (Preprint)

10.2196/preprints.30290 ◽

2021 ◽

Author(s):

Faizah Faizah ◽

Bor-Shen Lin

Keyword(s):

Topic Modeling ◽

Public Perception ◽

Latent Dirichlet Allocation ◽

World Health ◽

Two Stage ◽

The Public ◽

Global Pandemic ◽

Difficult Time ◽

Latent Topic ◽

Latent Topics

BACKGROUND The World Health Organization (WHO) declared COVID-19 as a global pandemic on January 30, 2020. However, the pandemic is not over yet. Furthermore, in the first quarter of 2021, some countries faced a third wave of the pandemic. During this difficult time, the development of vaccines for COVID-19 has accelerated rapidly. Understanding the public perception of COVID-19 vaccines from data collected on social media can widen the perspective on the state of the global pandemic. OBJECTIVE This study explores and analyzes the latent topics in COVID-19 vaccine tweets posted by individuals from various countries by using two-stage topic modeling. METHODS A two-stage analysis in topic modeling was proposed to investigate people’s reactions in five countries. The first stage is Latent Dirichlet Allocation, which produces the latent topics with the corresponding term distributions and helps investigators understand the main issues or opinions. The second stage then performs agglomerative clustering on the latent topics based on Hellinger distance, merging close topics hierarchically into topic clusters so that the topics can be visualized in either tree or graph views. RESULTS In general, the topics discussed regarding the COVID-19 vaccine are similar across the five countries. Topic themes such as "first vaccine" and "vaccine effect" dominate the public discussion. Notably, some topic themes are specific to individual countries, such as "politician opinion" and "stay home" in Canada, "emergency" in India, and "blood clots" in the United Kingdom. The analysis also identifies the COVID-19 vaccine that is attracting the most public interest. CONCLUSIONS With LDA and hierarchical clustering, two-stage topic modeling is powerful for visualizing the latent topics and understanding the public perception of the COVID-19 vaccine.
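
The second stage described in METHODS can be sketched in a few lines. The example below computes the Hellinger distance between topic-word distributions and merges the closest topics with average linkage until no pair is closer than a cutoff. The linkage rule, the cutoff, the function names, and the toy distributions are all assumptions made for illustration; the abstract specifies only agglomerative clustering on Hellinger distance.

```python
import math
from itertools import combinations


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2.0)


def agglomerate(topics, threshold):
    """Average-linkage agglomerative clustering of topics (linkage choice is an assumption).

    topics: dict mapping a topic label to its word-probability list.
    Merging stops once the closest pair of clusters is farther apart than `threshold`.
    """
    clusters = [[name] for name in topics]

    def cluster_distance(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(hellinger(topics[a], topics[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        if cluster_distance(clusters[i], clusters[j]) > threshold:
            break
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(clusters)


# Toy topic-word distributions over a four-word vocabulary (made up, not from the study).
toy_topics = {
    "first vaccine": [0.7, 0.2, 0.1, 0.0],
    "vaccine effect": [0.6, 0.3, 0.1, 0.0],
    "blood clots": [0.0, 0.1, 0.2, 0.7],
}
topic_clusters = agglomerate(toy_topics, threshold=0.3)
```

In practice the distributions would come from the first-stage LDA model, and recording the merge order (rather than stopping at a cutoff) would give the tree view mentioned in the abstract.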

Topic Modelling in Bangla Language: An LDA Approach to Optimize Topics and News Classification

Computer and Information Science ◽

10.5539/cis.v11n4p77 ◽

2018 ◽

Vol 11 (4) ◽

pp. 77 ◽

Cited By ~ 2

Author(s):

Malek Mouhoub ◽

Mustakim Al Helal

Keyword(s):

Topic Modeling ◽

Text Categorization ◽

English Language ◽

Latent Dirichlet Allocation ◽

Similarity Measures ◽

Document Collections ◽

Statistical Language Modeling ◽

Document Models ◽

Wide Range ◽

News Corpus

Topic modeling is a powerful technique for unsupervised analysis of large document collections. Topic models have a wide range of applications, including tag recommendation, text categorization, keyword extraction, and similarity search, in text mining, information retrieval, and statistical language modeling. Research on topic modeling is steadily gaining popularity. Various efficient topic modeling techniques are available for English, one of the most widely spoken languages in the world, but not for other languages. Bangla is the seventh most spoken native language in the world by population, so it needs comparable automation. This paper deals with finding the core topics of a Bangla news corpus and classifying news with similarity measures. The document models are built using LDA (Latent Dirichlet Allocation) with bigrams.
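
As a rough illustration of two ingredients named above, bigram features and a similarity measure, the sketch below turns token lists into bigram counts and compares two documents with cosine similarity. Cosine similarity over raw bigram counts is an assumption chosen for brevity: the abstract does not say which similarity measures the paper uses, and the paper presumably compares LDA-based document models instead. The helper names and placeholder tokens are hypothetical.

```python
import math
from collections import Counter


def bigrams(tokens):
    """Join each pair of adjacent tokens into one bigram feature."""
    return [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


def cosine_similarity(counts_a, counts_b):
    """Cosine similarity between two sparse count vectors (Counter objects)."""
    dot = sum(counts_a[key] * counts_b[key] for key in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Placeholder tokens standing in for tokenized news articles.
doc_a = ["w1", "w2", "w3", "w4"]
doc_b = ["w1", "w2", "w3", "w5"]
doc_c = ["w6", "w7", "w8", "w9"]

vec_a, vec_b, vec_c = (Counter(bigrams(doc)) for doc in (doc_a, doc_b, doc_c))
similar = cosine_similarity(vec_a, vec_b)    # shares two of three bigrams
unrelated = cosine_similarity(vec_a, vec_c)  # shares none
```

The same `cosine_similarity` helper would work unchanged on per-document topic proportions if those were supplied as counters keyed by topic id.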

Authorship verification

10.12681/eadd/45382 ◽

2019 ◽

Author(s):

Νεκταρία Πόθα

Keyword(s):

Cyber Security ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

State Of The Art ◽

Latent Semantic Indexing ◽

Semantic Indexing ◽

Authorship Verification ◽

Latent Topics ◽

Authorship Analysis ◽

Dirichlet Allocation

The field of authorship analysis aims to extract information about the authors of digital texts. It is directly linked to many applications, since it can be used to analyze texts of any genre: literary works, newspaper articles, social media posts, and so on. The application areas of this technology fall into the humanities (e.g., who is the author of a literary work published anonymously, who is the author of works published under a pseudonym, verifying the authorship of literary works by known authors), forensics (e.g., finding stylistic similarities between proclamations of terrorist groups, investigating the authenticity of a suicide note, revealing multiple social media accounts that belong to the same person), and cyber-security (e.g., finding stylistic similarities between users of multiple aliases). A fundamental research task in authorship analysis is author verification: given a set of texts (in electronic form) by the same author (the candidate author), we must decide whether another text (of unknown or disputed authorship) was written by that author or not. Author verification has attracted particular interest in recent years, mainly because of the PAN@CLEF evaluation campaigns. Specifically, from 2013 to 2015 the PAN shared tasks focused on author verification, providing well-organized datasets (PAN corpora) and gathering a large number of methods for this purpose. However, the margin of error remains considerable, since the performance of the methods depends on multiple factors such as text length, the thematic similarity between the texts, and the stylistic similarity between the texts.
The most demanding case arises when the texts of the known author belong to one genre (e.g., blogs or email messages) while the text under investigation belongs to another (e.g., a tweet or a newspaper article). Moreover, if the known author's texts and the text under investigation do not match in topic (e.g., the known texts concern foreign policy and the unknown text concerns cultural affairs), the performance of current author verification methods is particularly low. The goal of this doctoral dissertation is to develop effective and robust author verification methods that can handle even such complex cases. To this end, we present improved author verification methods and systematically examine their effectiveness on several benchmark datasets (PAN datasets and Enron data). First, we propose two improved algorithms. One follows the paradigm in which all available writing samples of the candidate author are treated individually, as separate representations (instance-based paradigm). The other is based on the paradigm in which all writing samples of the candidate author are concatenated into a single text, that is, a single representation (profile-based paradigm). Both achieve higher performance than the state of the art in verification on datasets covering a variety of languages (English, Greek, Spanish, Dutch) and genres (articles, reviews, novels, etc.). It is important to stress that the proposed methods benefit significantly from the availability of multiple text samples by the candidate author and remain particularly robust and competitive when text length is limited. In addition, we investigate the usefulness of applying topic modeling to author verification.
Specifically, we conduct a systematic study to examine whether topic modeling techniques improve the performance of the most basic categories of verification methods, and which topic modeling technique is best suited to each verification paradigm. To this end, we combine well-known topic modeling methods, Latent Semantic Indexing (LSI) and Latent Dirichlet Allocation (LDA), with various author verification methods that cover the main categories in this area: intrinsic, which treats verification as a one-class problem, and extrinsic, which transforms verification into a two-class problem, in combination with the profile-based and instance-based approaches. Using multiple evaluation datasets, we demonstrate that LDA combines better with extrinsic methods, while LSI performs better with the most effective intrinsic method. Moreover, topic modeling techniques appear to be more effective when applied to methods that follow the profile-based paradigm, and their effectiveness increases when the latent topic information is extracted from an enriched corpus (augmented with additional texts that were collected from external sources, e.g., the web, and that show significant thematic similarity to the original dataset under examination). Comparing our results with the state of the art in verification demonstrates the potential of the proposed methods. The proposed extrinsic methods are also highly competitive when external texts of unknown genre are used. Some related studies offer evidence that heterogeneous ensembles of verification methods can provide very reliable solutions, better than any individual verification model.
However, only very simple ensemble models, combining relatively few base methods, have been examined so far. To fill this gap, we consider a large set of base verification models (47 models in total) covering the main paradigms and categories of methods in this area, and we study how they can be combined to build an effective ensemble. In this way, we propose a simple stacking ensemble as well as an approach based on dynamic model selection for each individual author verification case. Experimental results on multiple datasets confirm the suitability of the proposed methods and demonstrate their effectiveness. The best of the reported models improve on the current state of the art by more than 10%.

A Recommendation Mechanism for Under-Emphasized Tourist Spots Using Topic Modeling and Sentiment Analysis

Sustainability ◽

10.3390/su12010320 ◽

2019 ◽

Vol 12 (1) ◽

pp. 320 ◽

Cited By ~ 2

Author(s):

Wafa Shafqat ◽

Yung-Cheol Byun

Keyword(s):

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Recommendation Systems ◽

Support Vector ◽

Latent Factors ◽

Internet Applications ◽

Enormous Amount ◽

Combined Features ◽

Latent Topics ◽

User History

With rapid advancements in internet applications, the growth rate of recommendation systems for tourists has skyrocketed. This has generated an enormous amount of travel-based data in the form of reviews, blogs, and ratings. However, most recommendation systems only recommend the top-rated places. Along with the top-ranked places, we aim to discover places that are often ignored by tourists owing to lack of promotion or effective advertising, referred to as under-emphasized locations. In this study, we use all relevant data, such as travel blogs, ratings, and reviews, in order to obtain optimal recommendations. We also aim to discover the latent factors that need to be addressed, such as food, cleanliness, and opening hours, and recommend a tourist place based on user history data. In this study, we propose a cross mapping table approach based on the location’s popularity, ratings, latent topics, and sentiments. An objective function for recommendation optimization is formulated based on these mappings. The baseline algorithms are latent Dirichlet allocation (LDA) and support vector machine (SVM). Our results show that the combined features of LDA, SVM, ratings, and cross mappings are conducive to enhanced performance. The main motivation of this study was to help tourist industries to direct more attention towards designing effective promotional activities for under-emphasized locations.

Translanguaging in Indian fiction

Translation and Translanguaging in Multilingual Contexts ◽

10.1075/ttmc.00076.gup ◽

2021 ◽

Vol 7 (3) ◽

pp. 253-278

Author(s):

Munmun Gupta

Keyword(s):

Case Studies ◽

English Language ◽

Cross Cultural ◽

Data Sources ◽

Literary Studies ◽

Fiction Writing ◽

Cultural Aspects ◽

Indian Writing In English ◽

And Linguistics ◽

The Way

Abstract Translanguaging refers to the way in which multilingual individuals draw on their full linguistic repertoires, rather than adhering to narrow use of one named language. This concept has important sociolinguistic significance because it enables individuals to move beyond colonial structures of power and liberates the language practices of multilinguals. The purpose of this research is to investigate the phenomenon of translanguaging in Indian writing in English, using two anthologies, She Speaks (Ray et al. 2019) and She Celebrates (Choudhury et al. 2020), as data sources. Focusing on stories contained in these anthologies as case studies, the research describes linguistic, cultural and stylistic effects of translanguaging used in these works, in which Indian writers portray their characters engaging in translanguaging as a way of ‘Indianising’ the English language. In line with accounts of the process of translanguaging as culture-specific, the study reveals that often authors and their characters use translanguaging because forms of usage can be difficult to translate – or at least to translate in a way that conveys the meaning those forms have in the original, vernacular context. The study demonstrates how work at the intersection of literary studies and linguistics can illuminate cross-cultural aspects of fiction writing.

Tourists’ shifting perceptions of UNESCO heritage sites: lessons from Jeju Island-South Korea

Tourism Review ◽

10.1108/tr-09-2017-0140 ◽

2019 ◽

Vol 74 (1) ◽

pp. 20-29 ◽

Cited By ~ 2

Author(s):

Kun Kim ◽

Ounjoung Park ◽

Jacob Barr ◽

Haejung Yun

Keyword(s):

Topic Modeling ◽

English Language ◽

Latent Dirichlet Allocation ◽

Online Reviews ◽

Natural World ◽

Tourism Industry ◽

Jeju Island ◽

Web Crawler ◽

Content Type ◽

Heritage Sites

Purpose The purpose of this research is to analyze the shifting perceptions of international tourists to Jeju Island and provide practical lessons to the tourism industry. Specifically, in regard to three United Nations Educational, Scientific and Cultural Organization (UNESCO) natural World Heritage sites in Jeju, this research measures the most salient topics mentioned by tourists to inform a more accurate perception of the island’s most valuable natural assets as reported by tourism experiences. Design/methodology/approach This study used a Web crawler to gather over 1,500 English language reviews from international tourists from a famous travel information website. The collected data were then preprocessed for stemming and lemmatization. After this, the processed text data were analyzed through a latent Dirichlet allocation (LDA)-based topic modeling approach to identify the most prominent clusters of ideas mentioned and represent them visually through graphs, tables and charts. Findings The findings from this research suggest that there are ten identifiable topics. Topics focusing on “adventure,” “summits” and “winter” showed noticeable increases, whereas topics focusing on “sunrise peak” and “UNESCO” have decreased over time. There is a trend for international tourists to be ever more conscious of the adventurous and rugged aspects of Jeju, and the novelty of mentioning UNESCO status seems to have worn off. Furthermore, there is the proclivity for tourists to mention “worth” and “enjoy” more as time goes on. Originality/value This study applies LDA-based topic modeling and LDAvis using user-generated online reviews with time-series analyses. Consequently, it provides unique insights into the changing perceptions of ecotourism on Jeju today, as well as contribution to smart tourism fields.

Research Trends in College English Education in Korea - A Topic Analysis Using LDA Topic Modeling

The Korean Association of General Education ◽

10.46392/kjge.2021.15.5.169 ◽

2021 ◽

Vol 15 (5) ◽

pp. 169-183

Author(s):

Eunhee Park

Keyword(s):

Learning Strategies ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

English Education ◽

Research Trends ◽

English For Specific Purposes ◽

College English ◽

Affective Factors ◽

Learner Centered ◽

Wide Range

This study investigated the research trends of college English education in Korea from 2001 to 2020. The data were collected using a Biblio data collector, and a total of 313 papers were analyzed. The data were analyzed using frequency analysis, LDA (Latent Dirichlet Allocation), and time series analysis. The findings are summarized as follows. First, the number of research papers on college English education increased significantly over the 20 years. Second, the analysis of the chosen papers yielded a total of 10 topics in college English education. The topics were “curriculum and level-differentiated programs (T1)”, “learners’ affective factors (T2)”, “assessment and learning strategies (T3)”, “teachers’ factors (T4)”, “English vocabulary, grammar and writing (T5)”, “English for specific purposes (T6)”, “teaching and learning methods (T7)”, “web-based learning (T8)”, “learner-centered education (T9)”, and “textbook analysis etc. (T10).” Among these, the three topics identified as increasing in popularity were “learners’ affective factors (T2)”, “English for specific purposes (T6)”, and “learner-centered education (T9).” These topics share one key characteristic: they relate to learners’ factors such as the learners’ motivation, the learners’ goals, and the learners’ activities in class. This study is meaningful in that it collected a wide range of data related to college English education in Korea and produced more reliable results by using big data-based LDA topic modeling techniques.

Comparative Study on Perceived Trust of Topic Modeling Based on Affective Level of Educational Text

Applied Sciences ◽

10.3390/app9214565 ◽

2019 ◽

Vol 9 (21) ◽

pp. 4565 ◽

Cited By ~ 1

Author(s):

Youngjae Im ◽

Jaehyun Park ◽

Minyeong Kim ◽

Kijung Park

Keyword(s):

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Topic Model ◽

Negative Mood ◽

Ability Test ◽

Perceived Trust ◽

Significant Difference ◽

Traditional Algorithm ◽

Independent Variable ◽

Latent Topics

Latent Dirichlet allocation (LDA) is a representative topic model for extracting keywords related to latent topics embedded in a document set. Despite its effectiveness in finding underlying topics in documents, the traditional LDA algorithm has no process for reflecting sentiment in text during topic extraction. Focusing on this issue, this study investigates the usability of both LDA and sentiment analysis (SA) algorithms based on the affective level of text. This study defines the affective level of a given set of paragraphs and analyzes the perceived trust of the methodologies with regard to usability. In our experiments, text from the college scholastic ability test was selected as the set of evaluation paragraphs, and the affective level of the paragraphs was manipulated into three levels (low, medium, and high) as an independent variable. The LDA algorithm was used to extract the keywords of each paragraph, while SA was used to identify the positive or negative mood of the extracted subject word. In addition, the perceived trust score of each algorithm was rated by the subjects, and this study tests whether the score differs according to the affective levels of the paragraphs. The results show that paragraphs with low affect lead to high perceived trust of LDA among the participants. However, the perceived trust of SA does not show a statistically significant difference between the affect levels. The findings from this study indicate that LDA is more effective at finding topics in text that mainly contains objective information.

Public Perception of the COVID-19 Pandemic on Twitter: Sentiment Analysis and Topic Modeling Study (Preprint)

10.2196/preprints.21978 ◽

2020 ◽

Author(s):

Sakun Boon-Itt ◽

Yukolpat Skunkan

Keyword(s):

Sentiment Analysis ◽

Language Processing ◽

Topic Modeling ◽

Public Perception ◽

English Language ◽

Latent Dirichlet Allocation ◽

Public Awareness ◽

Good Communication ◽

Three Stages ◽

Twitter Users

BACKGROUND COVID-19 is a scientifically and medically novel disease that is not fully understood because it has yet to be consistently and deeply studied. Among the gaps in research on the COVID-19 outbreak, there is a lack of sufficient infoveillance data. OBJECTIVE The aim of this study was to increase understanding of public awareness of COVID-19 pandemic trends and uncover meaningful themes of concern posted by Twitter users in the English language during the pandemic. METHODS Data mining was conducted on Twitter to collect a total of 107,990 tweets related to COVID-19 between December 13 and March 9, 2020. The analyses included frequency of keywords, sentiment analysis, and topic modeling to identify and explore discussion topics over time. A natural language processing approach and the latent Dirichlet allocation algorithm were used to identify the most common tweet topics as well as to categorize clusters and identify themes based on the keyword analysis. RESULTS The results indicate three main aspects of public awareness and concern regarding the COVID-19 pandemic. First, the trend of the spread and symptoms of COVID-19 can be divided into three stages. Second, the results of the sentiment analysis showed that people have a negative outlook toward COVID-19. Third, based on topic modeling, the themes relating to COVID-19 and the outbreak were divided into three categories: the COVID-19 pandemic emergency, how to control COVID-19, and reports on COVID-19. CONCLUSIONS Sentiment analysis and topic modeling can produce useful information about the trends in the discussion of the COVID-19 pandemic on social media as well as alternative perspectives to investigate the COVID-19 crisis, which has created considerable public awareness. This study shows that Twitter is a good communication channel for understanding both public concern and public awareness about COVID-19. These findings can help health departments communicate information to alleviate specific public concerns about the disease.
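
The keyword-frequency step listed in METHODS can be illustrated with a small counter that tallies tracked keywords per time period. This is a minimal sketch under stated assumptions: whitespace tokenization, the period labels, the function name, and the sample tweets are all hypothetical and are not taken from the study's data or pipeline.

```python
from collections import Counter, defaultdict


def keyword_counts_by_period(tweets, keywords):
    """Count occurrences of each tracked keyword within each period.

    tweets: iterable of (period_label, text) pairs.
    keywords: iterable of lowercase keywords to track.
    """
    tracked = set(keywords)
    counts = defaultdict(Counter)
    for period, text in tweets:
        for token in text.lower().split():
            token = token.strip(".,!?#\"'")  # drop surrounding punctuation and hashtags
            if token in tracked:
                counts[period][token] += 1
    return {period: dict(counter) for period, counter in counts.items()}


# Made-up tweets grouped into hypothetical weekly periods.
sample_tweets = [
    ("week 1", "New virus outbreak reported"),
    ("week 2", "Lockdown announced, lockdown extended!"),
    ("week 2", "Outbreak spreading"),
]
frequencies = keyword_counts_by_period(sample_tweets, ["outbreak", "lockdown"])
```

Comparing the per-period counts side by side is one simple way to see how attention to a keyword shifts over time, before moving on to sentiment analysis or topic modeling.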
