An Automated Method To Enrich Consumer Health Vocabularies Using GloVe Word Embeddings and An Auxiliary Lexical Resource (Preprint)

2020 ◽  
Author(s):  
Mohammed Ibrahim ◽  
Susan Gauch ◽  
Omar Salman ◽  
Mohammed Alqahtani

BACKGROUND Clear language makes communication easier between any two parties. A layman may have difficulty communicating with a professional because of unfamiliarity with the specialized terms common to the domain. In healthcare, it is rare to find a layman knowledgeable in medical jargon, which can lead to poor understanding of their condition and/or treatment. To bridge this gap, several professional vocabularies and ontologies have been created to map laymen medical terms to professional medical terms and vice versa. OBJECTIVE Many of these vocabularies are built manually or semi-automatically, requiring large investments of time and human effort, and consequently they grow slowly. In this paper, we present an automatic method to enrich laymen's vocabularies that can also be applied to vocabularies in any domain. METHODS Our entirely automatic approach uses machine learning, specifically Global Vectors for Word Embeddings (GloVe), on a corpus collected from a social media healthcare platform to extend and enhance consumer health vocabularies (CHV). Our approach further improves the CHV by incorporating synonyms and hyponyms from the WordNet ontology. The basic GloVe and our novel algorithms incorporating WordNet were evaluated using two laymen datasets from the National Library of Medicine (NLM): the Open-Access Consumer Health Vocabulary (OAC CHV) and the MedlinePlus Healthcare Vocabulary. RESULTS The results show that GloVe was able to find new laymen terms with an F-score of 48.44%. Furthermore, our enhanced GloVe approach outperformed basic GloVe with an average F-score of 61%, a relative improvement of 25%. CONCLUSIONS This paper presents an automatic approach to enrich consumer health vocabularies using the GloVe word embeddings and an auxiliary lexical source, WordNet. Our approach was evaluated using healthcare text downloaded from MedHelp.org, a healthcare social media platform, against two standard laymen vocabularies, OAC CHV and MedlinePlus. We used the WordNet ontology to expand the healthcare corpus by including synonyms, hyponyms, and hypernyms for each CHV layman term occurrence in the corpus. Given a seed term selected from a concept in the ontology, we measured our algorithms’ ability to automatically extract synonyms for those terms that appeared in the ground truth concept. We found that enhanced GloVe outperformed GloVe with a relative improvement of 25% in the F-score.
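The evaluation described in this abstract (retrieve candidate synonyms for a seed term, then score them against a ground-truth concept) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the three-dimensional vectors are invented toy values rather than trained GloVe embeddings, the ground-truth set is hypothetical, and the cutoff `top_k` is an arbitrary choice, not a parameter reported by the authors.

```python
import math

# Toy 3-d vectors invented for illustration; real GloVe vectors are learned
# from co-occurrence statistics of a corpus and have many more dimensions.
embeddings = {
    "nausea":  [0.9, 0.1, 0.0],
    "queasy":  [0.8, 0.2, 0.1],
    "sick":    [0.7, 0.3, 0.0],
    "vomit":   [0.6, 0.1, 0.5],
    "aspirin": [0.1, 0.9, 0.1],
    "walk":    [0.0, 0.1, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_terms(seed, vectors, top_k):
    """Return the top_k terms most similar to the seed, excluding the seed."""
    others = [w for w in vectors if w != seed]
    others.sort(key=lambda w: cosine(vectors[seed], vectors[w]), reverse=True)
    return others[:top_k]

def f_score(candidates, ground_truth):
    """Balanced F-score of a candidate list against a ground-truth set."""
    true_positives = len(set(candidates) & ground_truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(candidates)
    recall = true_positives / len(ground_truth)
    return 2 * precision * recall / (precision + recall)

# Hypothetical ground-truth concept: layman synonyms of the seed term.
ground_truth = {"queasy", "sick", "puke"}

candidates = nearest_terms("nausea", embeddings, top_k=3)
score = f_score(candidates, ground_truth)
print(candidates, round(score, 3))
```

In this toy setup two of the three retrieved terms appear in the ground-truth set, and one ground-truth term is missing from the vocabulary entirely, so precision and recall are both below one.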

2021 ◽  
Vol 7 ◽  
pp. e668
Author(s):  
Mohammed Ibrahim ◽  
Susan Gauch ◽  
Omar Salman ◽  
Mohammed Alqahtani

Background Clear language makes communication easier between any two parties. A layman may have difficulty communicating with a professional because of unfamiliarity with the specialized terms common to the domain. In healthcare, it is rare to find a layman knowledgeable in medical terminology, which can lead to poor understanding of their condition and/or treatment. To bridge this gap, several professional vocabularies and ontologies have been created to map laymen medical terms to professional medical terms and vice versa. Objective Many of these vocabularies are built manually or semi-automatically, requiring large investments of time and human effort, and consequently they grow slowly. In this paper, we present an automatic method to enrich laymen’s vocabularies that can also be applied to vocabularies in any domain. Methods Our entirely automatic approach uses machine learning, specifically Global Vectors for Word Embeddings (GloVe), on a corpus collected from a social media healthcare platform to extend and enhance consumer health vocabularies. Our approach further improves the consumer health vocabularies by incorporating synonyms and hyponyms from the WordNet ontology. The basic GloVe and our novel algorithms incorporating WordNet were evaluated using two laymen datasets from the National Library of Medicine (NLM): the Open-Access Consumer Health Vocabulary (OAC CHV) and the MedlinePlus Healthcare Vocabulary. Results The results show that GloVe was able to find new laymen terms with an F-score of 48.44%. Furthermore, our enhanced GloVe approach outperformed basic GloVe with an average F-score of 61%, a relative improvement of 25%. The improvement of enhanced GloVe was also statistically significant on both ground truth datasets, with P < 0.001. Conclusions This paper presents an automatic approach to enrich consumer health vocabularies using the GloVe word embeddings and an auxiliary lexical source, WordNet. Our approach was evaluated using healthcare text downloaded from MedHelp.org, a healthcare social media platform, against two standard laymen vocabularies, OAC CHV and MedlinePlus. We used the WordNet ontology to expand the healthcare corpus by including synonyms, hyponyms, and hypernyms for each layman term occurrence in the corpus. Given a seed term selected from a concept in the ontology, we measured our algorithms’ ability to automatically extract synonyms for those terms that appeared in the ground truth concept. We found that enhanced GloVe outperformed GloVe with a relative improvement of 25% in the F-score.
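The corpus-expansion step mentioned in this abstract can be pictured with a small sketch. It rests on several assumptions: `TOY_LEXICON` is a hypothetical hand-written stand-in for WordNet lookups, the related terms are invented, and placing them immediately after each layman-term occurrence is only one plausible reading, since the abstract does not say where the added terms go.

```python
# Hypothetical stand-in for WordNet; the entries are illustrative only.
TOY_LEXICON = {
    "cough": {
        "synonyms": ["coughing"],
        "hyponyms": ["whoop"],
        "hypernyms": ["symptom"],
    },
}

def expand_sentence(tokens, layman_terms, lexicon):
    """Copy the tokens, adding related terms after each layman-term occurrence."""
    expanded = []
    for token in tokens:
        expanded.append(token)
        if token in layman_terms:
            related = lexicon.get(token, {})
            # Order follows the abstract: synonyms, hyponyms, hypernyms.
            for relation in ("synonyms", "hyponyms", "hypernyms"):
                expanded.extend(related.get(relation, []))
    return expanded

sentence = "my cough got worse".split()
expanded = expand_sentence(sentence, {"cough"}, TOY_LEXICON)
print(" ".join(expanded))
```

The intent of such an expansion would be that the embedding model, trained on the enlarged text, sees the layman term co-occurring with its lexical relatives more often than in the raw corpus.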


Author(s):  
Albert Park ◽  
Mike Conway

Objective We aim to develop an automated method to track opium-related discussions on the social media platform Reddit. As a first step towards this goal, we use a keyword-based approach to track how often Reddit members discuss opium-related issues. Introduction In recent years, the use of social media has increased at an unprecedented rate. For example, the popular social media platform Reddit (http://www.reddit.com) had 83 billion page views from over 88,000 active sub-communities (subreddits) in 2015. Members of Reddit made over 73 million individual posts and over 725 million associated comments in the same year [1]. We use Reddit to track opium-related discussions because Reddit allows throwaway and unidentifiable accounts, which are suitable for stigmatized discussions that may not be appropriate for identifiable accounts. Reddit members exchange conversation via a forum-like platform, and members who have achieved a certain status within the community are able to create new topically focused groups called subreddits. Methods First, we use a dataset archived by a Reddit member who used Reddit’s official Application Programming Interface (API) to collect the data (https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_every_publicly_available_reddit_comment/). The dataset comprises 239,772 subreddits (both active and inactive), 13,213,173 unique user IDs, 114,320,798 posts, and 1,659,361,605 associated comments made from October 2007 to May 2015. Second, we identify 10 terms that are associated with opium: ‘opium’, ‘opioid’, ‘morphine’, ‘opiate’, ‘hydrocodone’, ‘oxycodone’, ‘fentanyl’, ‘oxy’, ‘heroin’, and ‘methadone’. Third, we preprocess the entire dataset, which includes structuring the data into monthly time frames, converting text to lower case, and stemming keywords and text. Fourth, we employ a dictionary approach to count and extract timestamps, user IDs, posts, and comments containing opium-related terms. Fifth, we normalize each frequency count by dividing it by the overall number of the respective variable for that period. Results According to our dataset, Reddit members discuss opium-related topics in social media. The normalized frequency count of posters shows that less than one percent of members, on average, talk about opium-related topics (Figure 1). Although the community as a whole does not frequently talk about opium-related issues, this still amounts to more than 10,000 members in 2015 (Figure 2). Moreover, members of Reddit created a number of subreddits, such as ‘oxycontin’, ‘opioid’, ‘heroin’, and ‘oxycodon’, that explicitly focus on opioids. Conclusions We present preliminary findings on developing an automated method to track opium-related discussions in Reddit. Our initial results suggest that members of the Reddit community discuss opium-related issues in social media, although the discussions are contributed by a small fraction of the members. We identify several directions for future work to better track opium-related discussions in Reddit. First, the automated method needs to be further developed to employ more sophisticated methods, such as knowledge-based and corpus-based approaches, to better extract opium-related discussions. Second, the automated method needs to be thoroughly evaluated by measuring the precision, recall, accuracy, and F1-score of the system. Third, given how many members use social media to discuss these issues, it will be helpful to investigate the specifics of their discussions.
Figure 1: Line graphs of normalized frequency counts for posters, comments, and posts that contained opium-related terms.
Figure 2: Line graphs of raw frequency counts for posters, comments, and posts that contained opium-related terms.
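The counting and normalization steps in the Methods can be sketched as follows. This is a simplified illustration under stated assumptions: the posts are invented toy data, stemming is omitted (so inflected forms are not matched), and the helper name `mentions_keyword` is hypothetical rather than taken from the authors' system.

```python
from collections import Counter

# The ten keywords listed in the abstract.
KEYWORDS = {
    "opium", "opioid", "morphine", "opiate", "hydrocodone",
    "oxycodone", "fentanyl", "oxy", "heroin", "methadone",
}

# Invented toy posts as (month, text) pairs; not real Reddit data.
posts = [
    ("2015-01", "Switching from oxycodone to methadone"),
    ("2015-01", "Best hiking trails near Denver"),
    ("2015-01", "New phone recommendations"),
    ("2015-01", "Favorite pasta recipes"),
    ("2015-02", "Heroin recovery stories"),
    ("2015-02", "Weekend movie thread"),
]

def mentions_keyword(text):
    """True if any lower-cased, punctuation-stripped token is a keyword."""
    tokens = (token.strip(".,!?'\"") for token in text.lower().split())
    return any(token in KEYWORDS for token in tokens)

totals = Counter()   # all posts per month
matches = Counter()  # keyword-bearing posts per month
for month, text in posts:
    totals[month] += 1
    if mentions_keyword(text):
        matches[month] += 1

# Normalize: keyword-bearing posts divided by all posts in the same month.
normalized = {month: matches[month] / totals[month] for month in totals}
print(normalized)
```

The same pattern would apply to posters and comments by swapping the unit being counted; dividing by the monthly total is what makes months with very different activity levels comparable.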


2021 ◽  
Author(s):  
Alejandro Garcia-Rudolph ◽  
Blanca Cegarra ◽  
Joan Sauri ◽  
John D. Kelleher ◽  
Katryna Cisek ◽  
...  

BACKGROUND Topic modeling and word embedding studies of Twitter data related to COVID-19 are being extensively reported. Another social media platform that experienced a tremendous increase in new users and posts due to COVID-19 was Reddit, which offers a much less explored alternative, especially its submission titles, because of their format (≤ 300 characters) and content rules. The positivity of self-presentation on social media influences both the quantity and quality of reactions (upvotes) from other social media contacts. OBJECTIVE 1) Expand on the concept of resilience by identifying possible related topics, considering their number of upvotes and their closest terms, and 2) associate specific emotions obtained from the state-of-the-art literature with their closest terms in order to relate such emotions to experienced situations. METHODS Reddit data were collected from pushshift.io with the pushshiftr R package; data cleaning and preprocessing were performed using the quanteda, tidyverse, and tidytext R packages. A word2vec model (W2V) was trained on submission titles with the wordVectors R package; preliminary validation was performed using a subset of Mikolov’s analogies and a COVID-19 glossary. Main topics (represented as sets of words) were extracted with the number of upvotes as a covariate, using structural topic modelling (STM) with the spectral method in the stm R package. Topic validation was performed using semantic coherence and exclusivity. Clusters were assessed using the Dunn index. RESULTS We collected all 374,421 titles submitted by 104,351 different redditors to the r/Coronavirus subreddit between January 20th 2020 and May 14th 2021. We trained W2V and identified more than 20 valid analogies (e.g. doctor – hospital + teacher = school). We further validated W2V with representative terms extracted from a COVID-19 glossary; all closest terms retrieved by W2V were verified using state-of-the-art publications. STM retrieved 20 topics (with 20 words each) ordered by their number of upvotes. We ran W2V on a representative topic (addressing vaccines) and used two terms as seeds, leading to other related terms (represented using cluster analysis) that we validated using scientific publications. STM did not retrieve any topic containing the term “resilience”, which appeared in fewer than 0.02% of all titles. Nevertheless, we identified several closest terms (e.g. wellbeing, roadmap) and combined terms (e.g. resilience and elderly, resilience and indigenous), as well as specific emotions that W2V related to lived experiences (e.g. the emotion of gratitude associated with applause and balconies). CONCLUSIONS We applied for the first time the combination of STM and a word2vec model trained on a relatively small Coronavirus dataset of Reddit titles, leading to immediate and accurate terms that can be used to expand our knowledge of topics associated with the pandemic (e.g. vaccines) or of specific aspects such as resilience.
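Analogy-based validation of a word2vec model can be illustrated with a toy sketch. The assumptions are significant: the four-dimensional vectors are hand-built for illustration and not learned from Reddit titles, and the abstract's example is read here as "doctor is to hospital as teacher is to school", which in the common vector-offset formulation corresponds to hospital minus doctor plus teacher. The function `solve_analogy` is a hypothetical helper, not part of the wordVectors package.

```python
import math

# Toy vectors invented for illustration.
# Dimensions: [medical, education, person, place].
vectors = {
    "doctor":   [1.0, 0.0, 1.0, 0.0],
    "hospital": [1.0, 0.0, 0.0, 1.0],
    "teacher":  [0.0, 1.0, 1.0, 0.0],
    "school":   [0.0, 1.0, 0.0, 1.0],
    "clinic":   [0.9, 0.0, 0.0, 1.0],
    "student":  [0.0, 0.9, 1.0, 0.0],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def solve_analogy(a, b, c, vectors):
    """Return the word d that best completes 'a is to b as c is to d'."""
    # Vector offset: move from a to b, then apply the same shift to c.
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three input words are excluded, as is usual in analogy tests.
    candidates = [w for w in vectors if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

answer = solve_analogy("doctor", "hospital", "teacher", vectors)
print(answer)
```

With a trained model, an analogy would typically count as valid when the expected word ranks first (or near the top) among the nearest neighbours of the offset vector.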


2017 ◽  
Vol 4 (2) ◽  
pp. 185-200 ◽  
Author(s):  
Servet Kardeş ◽  
Çağla Banko ◽  
Berrin Akman

Perceptions about Syrian refugees on social media: an evaluation of a social media platform. This study examined perceptions of refugees on one of the social media “sözlük” (collaborative dictionary) platforms where posts about refugees are shared. In this qualitative study, posts on a social media site were examined in depth and interpreted through content analysis. The results show that social media users see refugees as individuals who create a climate of great insecurity and unrest, and that experiences with refugees and news in the media contribute to the formation of these views. In addition, social media users think that the state follows the wrong policy on refugees and state that no effective planning has been done for them. In line with these results, it can be recommended that negative and violence-themed news about refugees in the media be reduced; that accurate and informative public service announcements be prepared about the situation of Syrian refugees, the rights they hold, and their effects on society; and that a more planned and effective path be followed at every step of the refugees' integration into society.


2020 ◽  
Vol 48 (3) ◽  
pp. 1-11
Author(s):  
Huiqin Zhang ◽  
Hai Lan ◽  
Xudong Chen

The Weibo social media platform in China has an important role in the value-generation process between a company and a customer. We investigated the relationship between the service quality provided on a company's Weibo page and the two dimensions of customer value cocreation behavior, namely, participation and citizenship, as well as the moderating effect of collectivism on this relationship. Participants were 354 active users of Weibo. Our findings confirmed that the service quality provided on a company's Weibo page was critical to the generation of customer value cocreation behavior. Further, collectivism moderated this relationship, with higher levels of collectivism strengthening the Weibo page service quality and customer value cocreation behavior relationship. In addition, customer citizenship behavior was positively related to customer perceptions of brand image, whereas customer participation was not. Implications for companies in the Chinese context are discussed.


Author(s):  
Piotr Szamrowski ◽  
Adam Pawlewicz

The main objective of this paper is to identify the platforms and social media tools utilized by the brewing industry in communication with stakeholders, mainly potential clients. In addition, the study sought to determine the nature of the published content, identify those responsible for managing it, and present the advantages and disadvantages of using social media for communication and for creating the image of the company. The results indicate that only 25% of the surveyed companies do not use social media in PR. These are exclusively small enterprises with a regional character. All the major brewing companies use at least one type of social media in their public relations activities, focusing in most cases on social networking (Facebook) and video sharing (YouTube). In addition, some of the largest brands within the individual equity groups have their own social media channels for communicating with stakeholders. General promotion of company products and, very importantly, creating a dialogue with the social media platform community were seen as the most important benefits of using social media.


GSA Today ◽  
2017 ◽  
Author(s):  
C.J. Spencer ◽  
K.L. Gunderson ◽  
C.W. Hoiland ◽  
W.K. Schleiffarth

2020 ◽  
Vol 12 (17) ◽  
pp. 7081 ◽  
Author(s):  
Athapol Ruangkanjanases ◽  
Shu-Ling Hsu ◽  
Yenchun Jim Wu ◽  
Shih-Chih Chen ◽  
Jo-Yu Chang

With the growth of social media communities, people now use this new media to engage in many interrelated activities. As a result, social media communities have grown into popular and interactive platforms among users, consumers and enterprises. In the social media era of high competition, increasing continuance intention towards a specific social media platform could transfer extra benefits to such virtual groups. Based on the expectation-confirmation model (ECM), this research proposed a conceptual framework incorporating social influence and social identity as key determinants of social media continuous usage intention. The research findings of this study highlight that: (1) the social influence view of the group norms and image significantly affects social identity; (2) social identity significantly affects perceived usefulness and confirmation; (3) confirmation has a significant impact on perceived usefulness and satisfaction; (4) perceived usefulness and satisfaction have positive effects on usage continuance intention. The results of this study can serve as a guide to better understand the reasons for and implications of social media usage and adoption.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Yahya Albalawi ◽  
Jim Buckley ◽  
Nikola S. Nikolov

Abstract This paper presents a comprehensive evaluation of data pre-processing and word embedding techniques in the context of Arabic document classification in the domain of health-related communication on social media. We evaluate 26 text pre-processings applied to Arabic tweets within the process of training a classifier to identify health-related tweets. For this task we use the (traditional) machine learning classifiers KNN, SVM, Multinomial NB and Logistic Regression. Furthermore, we report experimental results with the deep learning architectures BLSTM and CNN for the same text classification problem. Since word embeddings are more typically used as the input layer in deep networks, in the deep learning experiments we evaluate several state-of-the-art pre-trained word embeddings with the same text pre-processing applied. To achieve these goals, we use two data sets: one for both training and testing, and another only for testing the generality of our models. Our results point to the conclusion that only four out of the 26 pre-processings improve the classification accuracy significantly. For the first data set of Arabic tweets, we found that Mazajak CBOW pre-trained word embeddings as the input to a BLSTM deep network led to the most accurate classifier with F1 score of 89.7%. For the second data set, Mazajak Skip-Gram pre-trained word embeddings as the input to BLSTM led to the most accurate model with F1 score of 75.2% and accuracy of 90.7% compared to F1 score of 90.8% achieved by Mazajak CBOW for the same architecture but with lower accuracy of 70.89%. Our results also show that the performance of the best traditional classifier we trained is comparable to the deep learning methods on the first data set, but significantly worse on the second data set.


Author(s):  
Khyati Mahajan ◽  
Sourav Roy Choudhury ◽  
Sara Levens ◽  
Tiffany Gallicano ◽  
Samira Shaikh
