Concerns of Thalassemia Patients, Carriers, and their Caregivers in Malaysia: Text Mining Information Shared on Social Media

2021, Vol 27 (3), pp. 200-213
Author(s): Yuen Chi Phang, Azleena Mohd Kassim, Ernest Mangantig

Objectives: The main aim of this study was to use text mining on social media to analyze information and gain insight into the health-related concerns of thalassemia patients, thalassemia carriers, and their caregivers. Methods: Posts from two Facebook groups whose members consisted of thalassemia patients, thalassemia carriers, and caregivers in Malaysia were extracted using the Data Miner tool. A new framework, Malay-English social media text pre-processing, was proposed for pre-processing the noisy mixed-language (Malay-English) social media posts. Topic modeling was used to identify hidden topics within posts shared among members. Three topic models (latent Dirichlet allocation (LDA) in GenSim, LDA in MALLET, and latent semantic analysis) were applied to the dataset with and without stemming using Python. Results: LDA in MALLET without stemming was found to be the best topic model for this dataset. Eight topics were identified within the posts shared by members. Of those eight topics, four were newly discovered by this study, and the other four corresponded to the findings of previous studies that used an interview approach. Conclusions: Topic 2 (the challenges faced by thalassemia patients) received the highest attention and engagement. Healthcare practitioners and other concerned parties should make an effort to build a stronger support system related to this issue for those affected by thalassemia.

2019, Vol 11 (24), pp. 7108
Author(s): Jun Shao, Qinlin Ying, Shujin Shu, Alastair M. Morrison, Elizabeth Booth

The tourist shopping experience is the sum of the satisfaction or dissatisfaction from the individual attributes of purchased products and services. With the popularity of the Internet and travel review websites, more people choose to upload their tour experiences to their favorite social media platforms, which can influence others’ travel planning and choices. However, there have been few investigations of social media reviews of tourist shopping experiences, especially of satisfaction with museum tourism shopping. This research analyzed the user-generated English-language reviews of the National Gallery (NG) in London on TripAdvisor to learn more about the tourist shopping experience in museums. The Latent Dirichlet Allocation (LDA) topic model was used to discover the underlying themes of online reviews and keywords related to these shopping experiences. Sentiment analysis based on a purpose-developed dictionary was conducted to explore the dissatisfying aspects of tourist shopping experiences. The results provide a framework for museums to improve shopping experiences and enhance their future development.
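As a rough illustration of dictionary-based sentiment analysis, the sketch below sums word polarities for a review and flips a word's polarity when a negator precedes it. The dictionary entries, the negator list, and the function name `score_review` are hypothetical; the study's purpose-developed dictionary is not reproduced here, and its scoring rules may differ.

```python
import re

# Hypothetical mini-dictionary (assumption); not the dictionary built in the study.
SENTIMENT_DICT = {
    "lovely": 1, "great": 1, "friendly": 1,
    "overpriced": -1, "crowded": -1, "rude": -1,
}
NEGATORS = {"not", "never", "no"}


def score_review(text):
    """Sum word polarities; flip a word's polarity if a negator directly precedes it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        polarity = SENTIMENT_DICT.get(token, 0)
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score
```

A negative total flags a review as a candidate for the "dissatisfying aspects" analysis; a real dictionary would need domain-specific terms and weights.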


2019, Vol 52 (9-10), pp. 1289-1298
Author(s): Lei Shi, Gang Cheng, Shang-ru Xie, Gang Xie

The aim of topic detection is to automatically identify events and hot topics in social networks and to continuously track known topics. Applying traditional methods such as Latent Dirichlet Allocation and Probabilistic Latent Semantic Analysis is difficult given the high dimensionality of massive event texts and the short-text sparsity of social networks. Sparsely distributed topics also lead to unclear topics. To address these challenges, we propose a novel word embedding topic model, named Cbow Topic Model (CTM), that combines a topic model with the continuous bag-of-words (Cbow) word embedding method for topic detection and summary in social networks. We cluster similar words in the target social network text dataset using the classic Cbow word vectorization method, which can effectively learn the internal relationships between words and reduce the dimensionality of the input texts. We employ the topic model to model short text, effectively weakening the sparsity problem of social network texts. To detect and summarize topics, we propose a topic detection method that leverages similarity computing for social networks. We collected a Sina microblog dataset to conduct various experiments. The experimental results demonstrate that the CTM method is superior to existing topic model methods.
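The abstract does not say which similarity measure CTM uses. As a minimal sketch, assuming cosine similarity over embedding vectors, a post can be assigned to the topic whose vector it is closest to. The function names and the toy two-dimensional vectors are illustrative only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_topic(post_vec, topic_vecs):
    """Return the name of the topic whose vector is most similar to the post vector."""
    return max(topic_vecs, key=lambda name: cosine_similarity(post_vec, topic_vecs[name]))
```

In practice the vectors would come from a trained word embedding model rather than being written by hand.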


2019, Vol 119 (1), pp. 111-128
Author(s): Jianhong Luo, Xuwei Pan, Shixiong Wang, Yujing Huang

Purpose: Delivering messages and information to potentially interested users is one of the distinguishing applications of online enterprise social networks (ESN). The purpose of this paper is to provide insights to better understand the repost preferences of users and provide personalized information service in enterprise social media marketing. Design/methodology/approach: This is accomplished by constructing a target audience identification framework. A repost preference latent Dirichlet allocation (RPLDA) topic model is proposed to understand mass users' online repost preferences toward different contents. A topic-oriented preference metric is proposed to measure the preference degree of individual users, and a repost forecasting function is formulated to identify the target audience. Findings: The empirical research shows the following: 20 percent of the repost users in ESN are key active users who are particularly interested in the latent topics of messages in ESN, a pattern that fits a Pareto distribution; and the target audience identification framework can successfully identify different target key users for messages with different latent topics. Practical implications: The findings should motivate marketing managers to improve the enterprise brand by identifying the key target audience in ESN and marketing in a way that truthfully reflects personalized preferences. Originality/value: This study runs counter to most current business practices, which tend to use simple popularity to seek important users. Adaptively and dynamically identifying the target audience appears to have considerable potential, especially in the rapidly growing area of enterprise social media information service.
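One simple way to check a Pareto-style concentration in repost activity is to compute the share of all reposts made by the most active 20 percent of users. The sketch below does this for a flat list of user IDs, one per repost event; the function name `top_share` and the input layout are assumptions for illustration and are not the paper's RPLDA model or preference metric.

```python
import math
from collections import Counter


def top_share(repost_user_ids, top_fraction=0.2):
    """Share of all reposts contributed by the most active `top_fraction` of users."""
    if not repost_user_ids:
        raise ValueError("repost log is empty")
    counts = Counter(repost_user_ids)
    ranked = sorted(counts.values(), reverse=True)
    # Always include at least one user, rounding the cutoff up.
    k = max(1, math.ceil(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)
```

For a log in which one of five users made 16 of 20 reposts, `top_share` returns 0.8, the classic 80/20 pattern.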


2022, Vol 9 (3), pp. 1-22
Author(s): Mohammad Daradkeh

This study presents a data analytics framework that aims to analyze topics and sentiments associated with COVID-19 vaccine misinformation in social media. A total of 40,359 tweets related to COVID-19 vaccination were collected between January 2021 and March 2021. Misinformation was detected using multiple predictive machine learning models. Latent Dirichlet Allocation (LDA) topic model was used to identify dominant topics in COVID-19 vaccine misinformation. Sentiment orientation of misinformation was analyzed using a lexicon-based approach. An independent-samples t-test was performed to compare the number of replies, retweets, and likes of misinformation with different sentiment orientations. Based on the data sample, the results show that COVID-19 vaccine misinformation included 21 major topics. Across all misinformation topics, the average number of replies, retweets, and likes of tweets with negative sentiment was 2.26, 2.68, and 3.29 times higher, respectively, than those with positive sentiment.
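For readers unfamiliar with the test, the following sketch computes the pooled-variance (Student's) independent-samples t statistic and its degrees of freedom using only the standard library. The study does not state which variant it used, so the equal-variance form is an assumption, and the sample values in the usage note are made up.

```python
import math
from statistics import mean, variance


def independent_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for a pooled-variance two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    dof = n_a + n_b - 2
    # Pooled estimate of the common variance (assumes equal population variances).
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / dof
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, dof
```

With hypothetical retweet counts `[2, 4, 6, 8]` for negative tweets and `[1, 2, 3, 4]` for positive tweets, the ratio of means is 2.0, and the t statistic would then be compared against a t distribution with 6 degrees of freedom to obtain a p-value.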


Author(s): Wallace Chipidza, Elmira Akbaripourdibazar, Tendai Gwanzura, Nicole M. Gatto

Abstract: Knowledge gaps may initially exist among scientists and medical and public health professionals during pandemics, and these gaps are fertile ground for misinformation in news media. We characterized and compared COVID-19 coverage in newspapers, television, and social media, and discussed implications for public health communication strategies that are relevant to an initial pandemic response. We applied Latent Dirichlet Allocation (LDA), an unsupervised topic modelling technique, to 3,271 newspaper articles, 40 cable news show transcripts, 96,000 Twitter posts, and 1,000 Reddit posts from March 4 - 12, 2020, a period early in the COVID-19 pandemic. Coverage of COVID-19 clustered on topics such as the epidemic, politics, and the economy, and these varied across media sources. Topics dominating news were not predominantly health-related, suggesting a limited presence of public health in news coverage in traditional and social media. Examples of misinformation were identified, particularly in social media. Public health entities should utilize communication specialists to create engaging informational content to be shared on social media sites. Public health officials should be attuned to their target audience to anticipate and prevent the spread of common myths likely to exist within a population. This will help control misinformation in the early stages of pandemics.


2018
Author(s): Shatrunjai P. Singh, Swagata Karkare, Sudhir M. Baswan, Vijendra P. Singh

Abstract: Content summarization is an important area of research in traditional data mining. The volume of studies published on anti-epileptic drugs (AED) has increased exponentially over the last two decades, making it an important area for the application of text mining based summarization algorithms. In the current study, we use text analytics algorithms to mine and summarize 10,000 PubMed abstracts related to anti-epileptic drugs published within the last 10 years. Term frequency-inverse document frequency (TF-IDF) based filtering was applied to identify the drugs with the highest frequency of mentions within these abstracts. The US Food and Drug database was scraped and linked to the results to quantify the most frequently mentioned modes of action and elucidate the pharmaceutical entities marketing these drugs. A sentiment analysis model was created to score the abstracts for sentiment positivity or negativity. Finally, a modified Latent Dirichlet Allocation topic model was generated to extract key topics associated with the most frequently mentioned AEDs. The results of this study provide accurate and data-intensive insights on the progress of anti-epileptic drug research.
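To make the TF-IDF weighting concrete, here is a minimal standard-library sketch that scores each term as its relative frequency in a document multiplied by the log of the inverse document frequency. The whitespace tokenizer and the unsmoothed IDF are simplifying assumptions; the study's exact weighting and filtering thresholds are not given in the abstract.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: TF-IDF score} dict per document (unsmoothed IDF, natural log)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return scores
```

A term that appears in every document scores zero, so generic words drop out while terms concentrated in a few abstracts, such as a specific drug name, rise to the top.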


2021
Author(s): Fei Shen, Wenting Yu, Chen Min, Qianying Ye, Chuanli Xia, ...

Text mining has been a dominant approach to extracting useful information from massive unstructured data online, but existing tools for Chinese word segmentation are not ideal for processing social media text data in Cantonese. This project developed CyberCan (https://github.com/shenfei1010/CyberCan), a lexicon of contemporary Cantonese based on more than 100 million pieces of internet text. We compared CyberCan with existing Mandarin and Cantonese lexicons on word segmentation performance. Findings suggest that CyberCan outperforms all existing lexicons by a considerable margin.
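The abstract does not say which segmentation algorithm was paired with each lexicon. As one illustration of why lexicon coverage matters, the sketch below uses forward maximum matching, a classic dictionary-based approach: at each position it takes the longest lexicon entry that matches and falls back to a single character otherwise. The three-word lexicon in the usage note is a toy example, not taken from CyberCan.

```python
def max_match(text, lexicon, max_len=4):
    """Segment `text` greedily, preferring the longest lexicon entry at each position."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character always succeeds.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens
```

With the lexicon `{"我哋", "鍾意", "食嘢"}`, the string `我哋鍾意食嘢` is split into three words; with a lexicon that lacks the Cantonese-specific entries `我哋` and `食嘢`, the same string fragments into single characters around `鍾意`.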


2021
Author(s): Yuting Guo, Yao Ge, Yuan-Chi Yang, Mohammed Ali Al-Garadi, Abeed Sarker

Motivation: Pretrained contextual language models proposed in the recent past have been reported to achieve state-of-the-art performance in many natural language processing (NLP) tasks. There is a need to benchmark such models for targeted NLP tasks, and to explore effective pretraining strategies to improve machine learning performance. Results: In this work, we addressed the task of health-related social media text classification. We benchmarked five models (RoBERTa, BERTweet, TwitterBERT, BioClinical_BERT, and BioBERT) on 22 tasks. We attempted to boost performance for the best models by comparing distinct pretraining strategies: domain-adaptive pretraining (DAPT), source-adaptive pretraining (SAPT), and topic-specific pretraining (TSPT). RoBERTa and BERTweet performed comparably in most tasks, and better than the others. Among the pretraining strategies, SAPT performed better than or comparably to the off-the-shelf models, and significantly outperformed DAPT. SAPT+TSPT showed consistently high performance, with statistically significant improvement in one task. Our findings demonstrate that RoBERTa and BERTweet are excellent off-the-shelf models for health-related social media text classification, and that extended pretraining using SAPT and TSPT can further improve performance.


2020, Vol ahead-of-print (ahead-of-print)
Author(s): Sudha Cheerkoot-Jalim, Kavi Kumar Khedo

Purpose: This work presents the results of a systematic literature review on biomedical text mining. The purpose of this study is to identify the different text mining approaches used in different application areas of the biomedical domain, the common tools used, and the challenges of biomedical text mining as compared to generic text mining algorithms. This study will be of value to biomedical researchers by allowing them to correlate text mining approaches to specific biomedical application areas. Implications for future research are also discussed. Design/methodology/approach: The review was conducted following the principles of the Kitchenham method. A number of research questions were first formulated, followed by the definition of the search strategy. The papers were then selected based on a list of assessment criteria. Each of the papers was analyzed, and information relevant to the research questions was extracted. Findings: It was found that researchers have mostly harnessed data sources such as electronic health records, biomedical literature, social media, and health-related forums. The most common text mining technique was natural language processing using tools such as MetaMap and Unstructured Information Management Architecture, alongside the use of medical terminologies such as the Unified Medical Language System. The main application area was the detection of adverse drug events. Challenges identified included the need to deal with huge amounts of text, the heterogeneity of the different data sources, the duality of meaning of words in biomedical text, and the amount of noise introduced mainly from social media and health-related forums. Originality/value: To the best of the authors’ knowledge, other reviews in this area have focused on either specific techniques, specific application areas, or specific data sources. The results of this review will help researchers to correlate the most relevant and recent advances in text mining approaches to specific biomedical application areas by providing an up-to-date and holistic view of work done in this research area. The use of emerging text mining techniques has great potential to spur the development of innovative applications, thus considerably impacting the advancement of biomedical research.

