Retrospective and prospective approaches of coronavirus publications in the last half-century: a Latent Dirichlet allocation analysis

Purpose The primary purpose of the present article is the topic modeling of global coronavirus publications over the last 50 years. Design/methodology/approach The present study is applied research conducted using text mining. The statistical population is the coronavirus publications collected from the Web of Science Core Collection (1970–2020). The main keywords were extracted from the Medical Subject Headings (MeSH) browser to design the search strategy. Latent Dirichlet allocation and the Python programming language were applied to analyze the data and implement the text mining algorithms of topic modeling. Findings The findings indicated that SARS, science, protein, MERS, veterinary, cell, human, RNA, medicine and virology are the most important keywords in the global coronavirus publications. In addition, eight important topics were identified in these publications by implementing the topic modeling algorithm. In descending order of the number of publications, the topics were: “structure and proteomics,” “Cell signaling and immune response,” “clinical presentation and detection,” “Gene sequence and genomics,” “Diagnosis tests,” “vaccine and immune response and outbreak,” “Epidemiology and Transmission” and “gastrointestinal tissue.” Originality/value The originality of this article can be considered in three ways. First, text mining and Latent Dirichlet allocation were applied to analyzing coronavirus literature for the first time. Second, coronavirus is mentioned as a hot topic of research. Finally, in addition to the retrospective approaches to 50 years of data collection and analysis, the results can be exploited with prospective approaches to strategic planning and macro-policymaking.
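
The abstract above names Latent Dirichlet allocation and Python but gives no implementation details. As a rough illustration of what topic modeling with LDA involves, the following is a minimal sketch of collapsed Gibbs sampling in plain Python. It is not the authors' pipeline: the function names `lda_gibbs` and `top_words`, the hyperparameters `alpha` and `beta`, the iteration count and the toy corpus are all assumptions made for this example.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents (illustrative)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]          # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]   # count of word w in topic k
    topic_total = [0] * num_topics                        # tokens assigned to topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional weight of topic t is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed document-topic (theta) and topic-word (phi) estimates.
    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        {w: (topic_word[t][w] + beta) / (topic_total[t] + vocab_size * beta)
         for w in vocab}
        for t in range(num_topics)
    ]
    return theta, phi


def top_words(phi, n=3):
    """Return the n most probable words of each topic."""
    return [sorted(dist, key=dist.get, reverse=True)[:n] for dist in phi]


# Toy corpus (hypothetical, already tokenized and lowercased).
toy_docs = [
    ["sars", "protein", "structure", "protein"],
    ["vaccine", "immune", "response", "outbreak"],
    ["protein", "structure", "rna"],
    ["immune", "response", "vaccine"],
]
toy_theta, toy_phi = lda_gibbs(toy_docs, num_topics=2)
print(top_words(toy_phi))
```

In practice a study of this kind would more likely rely on an established library; the sampler above only shows how per-document topic mixtures and per-topic word distributions arise from token counts.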

Iranian COVID-19 Publications in LitCovid: Text Mining and Topic Modeling

Scientific Programming ◽

10.1155/2021/3315695 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Meisam Dastani ◽

Farshid Danesh

Keyword(s):

Case Report ◽

Text Mining ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Subject Area ◽

Scientific Publications ◽

Statistical Population ◽

Strategic Issues ◽

Number Of Publications ◽

And Control

COVID-19 is a threat to the lives of people all over the world. Because of the new and unknown nature of COVID-19, much research has been conducted recently. To help increase the growth rate of Iranian publications on COVID-19, this article analyzes these publications in LitCovid to identify the topical and content structure and the topic modeling of scientific publications in this subject area. The present article is applied research performed using an analytical approach as well as text mining techniques. The statistical population is all the publications of Iranian researchers in LitCovid. Latent Dirichlet Allocation (LDA) and Python were used to analyze the data and implement text mining and topic modeling algorithms. Data analysis shows that the percentage of Iranian publications in the eight topical groups in LitCovid is as follows: prevention (39.57%), treatment (18.99%), diagnosis (18.99%), forecasting (7.83%), case report (6.52%), mechanism (3.91%), transmission (3.62%), and general (0.58%). The results indicate that patient, pandemic, outbreak, case, Iranian, model, care, health, coronavirus, and disease are the most important words in the publications of Iranian researchers in LitCovid. Implementing the topic modeling algorithm yielded six topics for prevention; four topics for treatment, case report, and forecasting; and three topics for diagnosis, mechanism, and transmission in general. The largest share of Iranian publications in LitCovid is related to the topic “pandemic status,” with 22.47% in the prevention category, and the smallest share is related to the topic “environment,” with 11.11% in the transmission category. The present study provides a better understanding of essential and strategic issues of Iranian publications in LitCovid. The results reveal that many Iranian studies on COVID-19 dealt primarily with issues related to prevention, management, and control. These findings provide a structured and research-based viewpoint of COVID-19 in Iran to guide researchers and policymakers.

Assessment of the history and trends of “The Journal of Intellectual Capital”: a bibliometrics, altmetrics and text mining analysis

Journal of Intellectual Capital ◽

10.1108/jic-02-2020-0057 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Mohammadreza Esmaeili Givi ◽

Mohammad Karim Saberi ◽

Mojtaba Talafidaryani ◽

Mahdi Abdolhamid ◽

Rahim Nikandish ◽

...

Keyword(s):

Text Mining ◽

Bibliometric Analysis ◽

Intellectual Capital ◽

Topic Modeling ◽

Editorial Team ◽

International Studies ◽

Content Type ◽

Number Of Publications ◽

Relative Prevalence ◽

Increasing Trends

Purpose The Journal of Intellectual Capital (JIC) celebrated its 20th anniversary in 2020. Therefore, the present study aims to provide a general overview of the history and key trends in this journal during 2000–2019. Design/methodology/approach Two types of data, citation and textual, covering the journal's 20-year period were retrieved from the Scopus database. The citation structures and contents were explored based on a combination of bibliometric analysis, altmetric analysis and text mining. The journal themes and the trends of their changes were analyzed through citation bursts, mapping and topic modeling. To allow a better comparison, the text mining process for the topic modeling of the IC field was performed in addition to the topic modeling of JIC. Findings Bibliometric analysis indicated that JIC has experienced remarkable growth in terms of the number of publications and citations over the last 20 years. The results indicated that JIC plays a significant role among IC researchers. Additionally, a large number of researchers, institutes and countries have made contributions to this journal and cited its research papers. Altmetric analysis showed that JIC has been shared in different social media such as Twitter, Facebook, Wikipedia, Mendeley, Citeulike, news and blogs. Text mining of JIC article abstracts indicated that “measurement,” “financial performance” and “IC reporting” have been relatively prevalent, with increasing trends over the past 20 years. In addition, “research trends” and “national and international studies” had a stable trend with a low thematic share. Research limitations/implications The findings have important implications for the JIC editorial team in making informed decisions about the further development of JIC, as well as for IC researchers and practitioners in making more valuable contributions to the journal. Originality/value Using bibliometric analysis, altmetric analysis and text mining, this study provided a systematic and comprehensive analysis of JIC. The simultaneous use of these methods provides an interesting, unique and suitable capacity to analyze journals by considering their various aspects.

Discovery of factors affecting tourists' fine dining experiences at five-star hotel restaurants in Istanbul

British Food Journal ◽

10.1108/bfj-02-2021-0138 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Semra Aktas-Polat ◽

Serkan Polat

Keyword(s):

Text Mining ◽

Food Quality ◽

Design Methodology ◽

Latent Dirichlet Allocation ◽

Content Type ◽

Factors Affecting ◽

Customer Delight ◽

Dirichlet Allocation

Purpose The purpose of this study is to discover the factors affecting customer delight, satisfaction and dissatisfaction in fine dining experiences (FDEs). Design/methodology/approach A total of 2,585 online user-generated reviews on TripAdvisor for 46 five-star hotel restaurants operating in Istanbul were analyzed with the latent Dirichlet allocation (LDA) algorithm. Findings LDA created nine, eight and seven topics for delight, satisfaction and dissatisfaction, respectively. The most salient topics for customer delight, satisfaction and dissatisfaction in FDEs are staff (17.3%), view (19%), and food quality (23%), respectively. Originality/value This study is one of the few studies investigating customer delight and satisfaction together. The study shows that FDEs can be analyzed with text mining techniques. Moreover, the study contributes to the literature on customer delight by adding the staff topic as an antecedent.
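
The abstract reports topic salience as percentages but does not say how they were computed. One common convention, shown here purely as an assumption, is to average each topic's weight over all documents. The function name `topic_shares` and the toy document-topic weights are hypothetical.

```python
def topic_shares(doc_topic_weights):
    """Average each topic's weight across documents and return percentages."""
    if not doc_topic_weights:
        return []
    num_topics = len(doc_topic_weights[0])
    totals = [0.0] * num_topics
    for weights in doc_topic_weights:
        norm = sum(weights)  # assumed to be positive for every document
        for k, weight in enumerate(weights):
            totals[k] += weight / norm
    num_docs = len(doc_topic_weights)
    return [100.0 * total / num_docs for total in totals]


# Hypothetical topic weights for three reviews over three topics.
review_topics = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
]
shares = topic_shares(review_topics)
most_salient = max(range(len(shares)), key=shares.__getitem__)
print(shares, most_salient)
```

An alternative convention counts each document once under its single dominant topic; the two can give different rankings, so the choice should be stated alongside any reported percentages.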

What’s yours is mine: exploring customer voice on Airbnb using text-mining approaches

Journal of Consumer Marketing ◽

10.1108/jcm-02-2018-2581 ◽

2019 ◽

Vol 36 (5) ◽

pp. 655-665 ◽

Cited By ~ 10

Author(s):

Jurui Zhang

Keyword(s):

Content Analysis ◽

Text Mining ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Local Community ◽

Negative Emotion ◽

Online Reviews ◽

Sharing Economy ◽

Consumer Reviews ◽

Content Type

Purpose This paper aims to investigate customers’ experiences with Airbnb by text-mining customer reviews posted on the platform and comparing the extracted topics from online reviews between Airbnb and the traditional hotel industry using topic modeling. Design/methodology/approach This research uses text-mining approaches, including content analysis and topic modeling (latent Dirichlet allocation method), to examine 1,026,988 Airbnb guest reviews of 50,933 listings in seven cities in the USA. Findings The content analysis shows that negative reviews are more authentic and credible than positive reviews on Airbnb and that the occurrence of social words is positively related to positive emotion in reviews, but negatively related to negative emotion in reviews. A comparison of reviews on Airbnb and hotel reviews shows unique topics on Airbnb, namely, “late check-in”, “patio and deck view”, “food in kitchen”, “help from host”, “door lock/key”, “sleep/bed condition” and “host response”. Research limitations/implications The topic modeling result suggests that Airbnb guests want to get to know and connect with the local community; thus, help from hosts on ways they can authentically experience the local community would be beneficial. In addition, the results suggest that customers emphasize their interaction with hosts; thus, to improve customer satisfaction, Airbnb hosts should interact with guests and respond to guests’ inquiries quickly. Practical implications Hotel managers should design marketing programs that fulfill customers’ desire for authentic and local experiences. The results also suggest that peer-to-peer accommodation platforms should improve online review systems to facilitate authentic reviews and help guests have a smooth check-in process. Originality/value This study is one of the first to examine consumer reviews in detail in the sharing economy and compare topics from consumer reviews between Airbnb and hotels.

Latent Dirichlet allocation-based temporal summarization

International Journal of Web Information Systems ◽

10.1108/ijwis-04-2018-0023 ◽

2019 ◽

Vol 15 (1) ◽

pp. 83-102 ◽

Cited By ~ 1

Author(s):

Ahmed Amir Tazibt ◽

Farida Aoughlis

Keyword(s):

Topic Modeling ◽

Design Methodology ◽

Latent Dirichlet Allocation ◽

Topic Model ◽

External Source ◽

Decision Makers ◽

Content Type ◽

Available Information ◽

Different Sources ◽

Dirichlet Allocation

Purpose During crises such as accidents or disasters, an enormous volume of information is generated on the Web. Both the public and decision-makers often need to identify relevant and timely content that can help them understand what is happening and make the right decisions as soon as it appears online. However, relevant content can be scattered across document streams. The available information can also contain redundant content published by different sources. Therefore, the need for automatic construction of summaries that aggregate important, non-redundant and non-outdated pieces of information is becoming critical. Design/methodology/approach The aim of this paper is to present a new temporal summarization approach based on a popular topic model in the information retrieval field, the Latent Dirichlet Allocation. The approach consists of filtering documents over streams, extracting relevant parts of information and then using topic modeling to reveal their underlying aspects in order to extract the most relevant and novel pieces of information to be added to the summary. Findings The performance evaluation of the proposed temporal summarization approach based on Latent Dirichlet Allocation, performed on the TREC Temporal Summarization 2014 framework, clearly demonstrates its effectiveness in providing short and precise summaries of events. Originality/value Unlike most state-of-the-art approaches, the proposed method determines the importance of the pieces of information to be added to the summaries by relying solely on their representation in the topic space provided by Latent Dirichlet Allocation, without the use of any external source of evidence.
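
The idea of keeping only relevant and novel sentences based on their representation in topic space can be sketched as follows. This is not the paper's algorithm, whose scoring details are not given in the abstract. The sketch assumes each sentence already has a topic vector (for example from an LDA model), and the function names, the cosine measure and both thresholds are assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_updates(stream, event_topics, relevance_min=0.6, redundancy_max=0.9):
    """Greedy single pass over a time-ordered stream of (sentence, topic_vector).

    A sentence is kept when it is close enough to the event's topic profile
    and not too similar to any sentence already in the summary.
    """
    summary = []
    for sentence, topics in stream:
        if cosine(topics, event_topics) < relevance_min:
            continue  # not relevant to the event
        if any(cosine(topics, kept) >= redundancy_max for _, kept in summary):
            continue  # redundant with an earlier update
        summary.append((sentence, topics))
    return [sentence for sentence, _ in summary]


# Hypothetical stream with hand-made topic vectors over three topics.
event = [0.8, 0.1, 0.1]
stream = [
    ("Quake hits city", [0.8, 0.1, 0.1]),
    ("Earthquake strikes the city", [0.75, 0.15, 0.1]),
    ("Sports results", [0.05, 0.05, 0.9]),
    ("Rescue teams deployed", [0.5, 0.45, 0.05]),
]
print(select_updates(stream, event))
```

Processing the stream in arrival order is what makes the summary temporal: an early sentence is never displaced by a later near-duplicate.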

Analysis of Health Research Topics in Indonesia Using the LDA (Latent Dirichlet Allocation) Topic Modeling Method

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v4i2.1821 ◽

2020 ◽

Vol 4 (2) ◽

pp. 336-344

Author(s):

Yoga Sahria ◽

Dhomas Hatta Fudholi

Keyword(s):

Health Research ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Topic Model ◽

Research Trend ◽

The Public ◽

Know How ◽

The Government ◽

Python Programming ◽

Dirichlet Allocation

At present, the need for health research, and for developing and implementing its results, is increasing among researchers, the government, academics and even the general public. One way to identify health research trends is topic modeling. The method used in this research is Latent Dirichlet Allocation (LDA) topic modeling. The purpose of this research is to examine how the LDA method models the topics of health research in Indonesia drawn from Sinta journals and to determine the coherence value of each topic in the resulting model. In addition, the results are expected to serve as a reference for conducting health research in Indonesia based on the modeled topics. This research was developed using Anaconda3 Python programming language tools and an available LDA library to obtain the topic model. The results were evaluated by respondents consisting of medical workers, health researchers and academics. Of the respondents, 94.1% rated the topic modeling as very good and 5.9% rated it as good.
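
The abstract mentions a coherence value per topic without naming the measure. As one possibility, the UMass coherence score can be computed directly from document co-occurrence counts. The sketch below assumes that measure; the function name `umass_coherence` and the toy corpus are hypothetical, and every top word is assumed to occur in at least one document.

```python
import math


def umass_coherence(top_words, docs):
    """UMass coherence of one topic.

    top_words: the topic's words, ordered from most to least probable.
    docs: tokenized documents used to count (co-)occurrences.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        # Number of documents containing all the given words.
        return sum(1 for words_in_doc in doc_sets
                   if all(w in words_in_doc for w in words))

    score = 0.0
    for j in range(1, len(top_words)):
        for i in range(j):
            # log of (co-document frequency + 1) over the frequency
            # of the higher-ranked word.
            joint = doc_freq(top_words[j], top_words[i])
            score += math.log((joint + 1) / doc_freq(top_words[i]))
    return score


# Hypothetical tokenized abstracts.
corpus = [
    ["health", "hospital", "patient"],
    ["health", "patient"],
    ["bank", "loan"],
    ["bank", "loan", "rate"],
]
print(umass_coherence(["health", "patient"], corpus))
print(umass_coherence(["health", "loan"], corpus))
```

Scores closer to zero indicate words that tend to appear in the same documents, so the first topic above scores higher than the second, which mixes unrelated words.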

Analyzing U.S. Army Officer Evaluation Reports with Natural Language Processing: A Log-Odds and Latent Dirichlet Allocation Exploration

Industrial and Systems Engineering Review ◽

10.37266/iser.2019v7i1.pp44-55 ◽

2019 ◽

Vol 7 (1) ◽

pp. 44-55

Author(s):

Heidy Shi ◽

John Caddell ◽

Julia Lensing

Keyword(s):

Natural Language Processing ◽

Text Mining ◽

Language Processing ◽

Text Analysis ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Data Set ◽

Army Officer ◽

Log Odds ◽

Dirichlet Allocation

Each job field (branch) in the Army requires a unique set of skills and talents of the officers assigned. Officers who demonstrate the required skills are often more successful in their assigned branch. To better understand how success is described across branches, research was conducted using text mining and text analysis of a data set of Officer Evaluation Reports (OERs). This research looked for common trends and discrepancies across varying branches and like groups of branches by analyzing the narrative portion of OERs. Text analysis methods examined words and bigrams commonly used to describe varying degrees of performance by officers. Topic modeling using Latent Dirichlet Allocation (LDA) was also conducted on top-rated narratives to investigate trends and discrepancies in clustering narratives. Findings show that qualitative narratives for the top two performance designations fail to differentiate between officers’ varying levels of performance regardless of branch.
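
To illustrate how words and bigrams can be compared between two groups of narratives, the following sketch computes a plain smoothed log-odds ratio per term. The paper's exact log-odds variant is not stated in the abstract, so this formula, the function names, the smoothing constant and the example sentences are all assumptions. The sketch also assumes the combined vocabulary holds at least two distinct terms.

```python
import math
from collections import Counter


def bigrams(tokens):
    """Adjacent word pairs of a token list."""
    return list(zip(tokens, tokens[1:]))


def log_odds(terms_a, terms_b, smoothing=0.5):
    """Smoothed log-odds ratio of each term in group A relative to group B.

    Positive scores mark terms more characteristic of A, negative of B.
    """
    counts_a, counts_b = Counter(terms_a), Counter(terms_b)
    vocab = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values()) + smoothing * len(vocab)
    total_b = sum(counts_b.values()) + smoothing * len(vocab)
    scores = {}
    for term in vocab:
        a = counts_a[term] + smoothing
        b = counts_b[term] + smoothing
        odds_a = a / (total_a - a)
        odds_b = b / (total_b - b)
        scores[term] = math.log(odds_a / odds_b)
    return scores


# Hypothetical narrative snippets for two performance designations.
top_block = "officer led troops with unlimited potential promote now".split()
next_block = "officer led troops and met standards".split()

word_scores = log_odds(top_block, next_block)
bigram_scores = log_odds(bigrams(top_block), bigrams(next_block))
print(sorted(word_scores, key=word_scores.get, reverse=True)[:3])
```

Terms with scores near zero are used about equally in both groups, which is the pattern one would expect if narratives fail to differentiate performance levels.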

Innovation in an Emerging Market: A Bibliometric and Latent Dirichlet Allocation Based Topic Modeling Study

2020 International Conference on Decision Aid Sciences and Application (DASA) ◽

10.1109/dasa51403.2020.9317278 ◽

2020 ◽

Author(s):

Mohd Faiz Hilmi ◽

Yanti Mustapha ◽

Mohammad Tasyriq Che Omar

Keyword(s):

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Emerging Market ◽

Modeling Study ◽

Dirichlet Allocation

Mining numerical measure of consumers’ product evaluation expressed in words based on latent Dirichlet allocation

Journal of Modelling in Management ◽

10.1108/jm2-07-2021-0163 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Ziang Wang ◽

Feng Yang

Keyword(s):

Latent Dirichlet Allocation ◽

Product Evaluation ◽

Online Reviews ◽

Product Evaluations ◽

Product Attributes ◽

Data Set ◽

Content Type ◽

Face To Face ◽

Online Retailers ◽

Dirichlet Allocation

Purpose Obtaining consumers’ product evaluations from massive online reviews has always been a hot topic for online retailers. In the process of online shopping, there is no face-to-face interaction between online retailers and customers. After collecting online reviews left by customers, online retailers are eager to answer questions such as: which product attributes attract consumers, and which step of the shopping process brings consumers a better experience? This paper aims to associate the latent Dirichlet allocation (LDA) model with consumers’ attitudes and provides a method to calculate the numerical measure of consumers’ product evaluation expressed in each word. Design/methodology/approach First, all possible pairs of reviews are organized as documents to build the corpus. After that, the latent topics of the traditional LDA model, referred to as the standard LDA model, are separated into shared and differential topics. Then, the authors associate the model with consumers’ attitudes, labeling each review as positive or non-positive. The product evaluation reflected in consumers’ binary attitude is extended to each word that appears in the corpus. Finally, a variational optimization is introduced to calculate the parameters of the expanded LDA model. Findings The experimental results illustrate that the LDA model in this research, referred to as the expanded LDA model, can successfully assign sufficient probability to words related to product attributes or consumers’ product evaluation. Compared with the standard LDA model, the expanded model tends to assign higher probability to words that rank higher within each topic. In addition, the expanded model has higher precision on the prediction set, which shows that breaking the topics down into two categories fits the data set better than the standard LDA model. The product evaluation of each word is calculated by the expanded model and depicted at the end of the experiment. Originality/value This research provides a new method to calculate consumers’ product evaluation from reviews at the level of words. Words may be used to describe product attributes or consumers’ experiences in reviews. Assigning numerical measures to words makes it possible to analyze consumers’ product evaluation quantitatively. In addition, since words themselves serve as labels, they can be ranked once a numerical measure is given. Online retailers can benefit from the results for label choosing, advertising or product recommendation.

Spam Diffusion in Social Networking Media using Latent Dirichlet Allocation

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.i7898.1081219 ◽

2019 ◽

Vol 8 (12) ◽

pp. 881-885

Keyword(s):

Online Social Networks ◽

Topic Modeling ◽

Information Diffusion ◽

Latent Dirichlet Allocation ◽

Good Accuracy ◽

Ground Truth ◽

Online Social Media ◽

Diffusion Dynamics ◽

Dirichlet Allocation

Just as web spam has been a major threat to almost every aspect of the current World Wide Web, social spam, especially in information diffusion, has become a serious threat to the utility of online social media. To combat this challenge, the significance and impact of such entities and content should be analyzed critically. To address this issue, this work used Twitter as a case study, modeled the information content through topic modeling and coupled it with user-oriented features to handle it with good accuracy. Latent Dirichlet Allocation (LDA), a widely used topic modeling technique, is applied to capture the latent topics from the tweet documents. The major contribution of this work is twofold: constructing the dataset that serves as the ground truth for analyzing the diffusion dynamics of spam/non-spam information, and analyzing the effects of topics on diffusibility. Exhaustive experiments clearly reveal the variation in topics shared by spam and non-spam tweets. The rise in popularity of online social networks (OSNs) attracts not only legitimate users but also spammers. Legitimate users use the services of OSNs for good purposes, i.e., maintaining relations with friends/colleagues, sharing information of interest, increasing the reach of their business through advertising
