scholarly journals Domain Independent Automatic Labeling system for Large-scale Social Data using Lexicon and Web-based Augmentation

2020 ◽  
Vol 49 (1) ◽  
pp. 36-54
Author(s):  
Shaheen Khatoon ◽  
Lamis Abu Romman

Recently, with the large-scale adoption of social media, people have begun to express their opinion on these sites in the form of reviews. Potential consumers often forced to wade through huge amount of reviews to make informed decision. Sentiment analysis has become rapid and effective way to automatically gauge consumers’ opinion. However, such analysis often requires tedious process of manual tagging of large training examples or manually building a lexicon for the purpose of classifying reviews as positive or negative. In this paper, we present a method to automate the tedious process of labeling large textual data in an unsupervised, domain independent and scalable manner. The proposed method combines the lexicon-based and Web-based Point Wise Mutual Information (PMI) statistics to find the Semantic Orientation (SO) of opinion expressed in a review.  Based on proposed methods a system called Domain Independent Automatic Labeling System (DIALS) has been implemented, which takes collection of text from any domain as input and generates fully labeled dataset in an unsupervised and scalable manner. The result generated can be used to track and summarize online discussion and/or use to train any classifier in the next stage of development. The effectiveness of system is tested by comparing it with baseline machine learning and lexicon-based methods. Experiments on multi-domains dataset has shown that proposed method consistently shown improved recall and accuracy as compared to baseline machine learning and lexicon-based methods.   

Author(s):  
Aijaz Ahmad Malik ◽  
Warot Chotpatiwetchkul ◽  
Chuleeporn Phanus-umporn ◽  
Chanin Nantasenamat ◽  
Phasit Charoenkwan ◽  
...  

10.2196/30753 ◽  
2021 ◽  
Vol 23 (12) ◽  
pp. e30753
Author(s):  
Mai ElSherief ◽  
Steven A Sumner ◽  
Christopher M Jones ◽  
Royal K Law ◽  
Akadia Kacha-Ochana ◽  
...  

Background Expanding access to and use of medication for opioid use disorder (MOUD) is a key component of overdose prevention. An important barrier to the uptake of MOUD is exposure to inaccurate and potentially harmful health misinformation on social media or web-based forums where individuals commonly seek information. There is a significant need to devise computational techniques to describe the prevalence of web-based health misinformation related to MOUD to facilitate mitigation efforts. Objective By adopting a multidisciplinary, mixed methods strategy, this paper aims to present machine learning and natural language analysis approaches to identify the characteristics and prevalence of web-based misinformation related to MOUD to inform future prevention, treatment, and response efforts. Methods The team harnessed public social media posts and comments in the English language from Twitter (6,365,245 posts), YouTube (99,386 posts), Reddit (13,483,419 posts), and Drugs-Forum (5549 posts). Leveraging public health expert annotations on a sample of 2400 of these social media posts that were found to be semantically most similar to a variety of prevailing opioid use disorder–related myths based on representational learning, the team developed a supervised machine learning classifier. This classifier identified whether a post’s language promoted one of the leading myths challenging addiction treatment: that the use of agonist therapy for MOUD is simply replacing one drug with another. Platform-level prevalence was calculated thereafter by machine labeling all unannotated posts with the classifier and noting the proportion of myth-indicative posts over all posts. Results Our results demonstrate promise in identifying social media postings that center on treatment myths about opioid use disorder with an accuracy of 91% and an area under the curve of 0.9, including how these discussions vary across platforms in terms of prevalence and linguistic characteristics, with the lowest prevalence on web-based health communities such as Reddit and Drugs-Forum and the highest on Twitter. Specifically, the prevalence of the stated MOUD myth ranged from 0.4% on web-based health communities to 0.9% on Twitter. Conclusions This work provides one of the first large-scale assessments of a key MOUD-related myth across multiple social media platforms and highlights the feasibility and importance of ongoing assessment of health misinformation related to addiction treatment.


2021 ◽  
Author(s):  
Tanya Nijhawan ◽  
Girija Attigeri ◽  
Ananthakrishna T

Abstract Cyberspace is a vast soapbox for people to post anything that they witness in their day-to-day lives. Subsequently, it can be used as a very effective tool in detecting the stress levels of an individual based on the posts and comments shared by him/her on social networking platforms. We leverage large-scale datasets with tweets to successfully accomplish sentiment analysis with the aid of machine learning algorithms. We take the help of a capable deep learning pre-trained model called BERT to solve the problems which come with sentiment classification. The BERT model outperforms a lot of other well-known models for this job without any sophisticated architecture. We also adopted Latent Dirichlet Allocation which is an unsupervised machine learning method that’s skilled in scanning a group of documents, recognizing the word and phrase patterns within them, and gathering word groups and alike expressions that most precisely illustrate a set of documents. This helps us predict which topic is linked to the textual data. With the aid of the models suggested, we will be able to detect the emotion of users online. We are primarily working with Twitter data because Twitter is a website where people express their thoughts often. In conclusion, this proposal is for the well- being of one’s mental health. The results are evaluated using various metric at macro and micro level and indicate that the trained model detects the status of emotions bases on social interactions.


2019 ◽  
Author(s):  
Mostafa Karimi ◽  
Di Wu ◽  
Zhangyang Wang ◽  
Yang Shen

AbstractPredicting compound-protein affinity is beneficial for accelerating drug discovery. Doing so without the often-unavailable structure data is gaining interest. However, recent progress in structure-free affinity prediction, made by machine learning, focuses on accuracy but leaves much to be desired for interpretability. Defining inter-molecular contacts underlying affinities as a vehicle for interpretability, our large-scale interpretability assessment finds previously-used attention mechanisms inadequate. We thus formulate a hierarchical multi-objective learning problem whose predicted contacts form the basis for predicted affinities. And we solve the problem by embedding protein sequences (by hierarchical recurrent neural networks) and compound graphs (by graph neural networks) with joint attentions between protein residues and compound atoms. We further introduce three methodological advances to enhance interpretability: (1) structure-aware regularization of attentions using protein sequence-predicted solvent exposure and residue-residue contact maps; (2) supervision of attentions using known inter-molecular contacts in training data; and (3) an intrinsically explainable architecture where atomic-level contacts or “relations” lead to molecular-level affinity prediction. The first two and all three advances result in DeepAffinity+ and DeepRelations, respectively. Our methods show generalizability in affinity prediction for molecules that are new and dissimilar to training examples. Moreover, they show superior interpretability compared to state-of-the-art interpretable methods: with similar or better affinity prediction, they boost the AUPRC of contact prediction by around 33, 35, 10, and 9-fold for the default test, new-compound, new-protein, and both-new sets, respectively. We further demonstrate their potential utilities in contact-assisted docking, structure-free binding site prediction, and structure-activity relationship studies without docking. Our study represents the first model development and systematic model assessment dedicated to interpretable machine learning for structure-free compound-protein affinity prediction.


2021 ◽  
Author(s):  
Mai ElSherief ◽  
Steven A Sumner ◽  
Christopher M Jones ◽  
Royal K Law ◽  
Akadia Kacha-Ochana ◽  
...  

BACKGROUND Expanding access to and use of medication for opioid use disorder (MOUD) is a key component of overdose prevention. An important barrier to the uptake of MOUD is exposure to inaccurate and potentially harmful health misinformation on social media or web-based forums where individuals commonly seek information. There is a significant need to devise computational techniques to describe the prevalence of web-based health misinformation related to MOUD to facilitate mitigation efforts. OBJECTIVE By adopting a multidisciplinary, mixed methods strategy, this paper aims to present machine learning and natural language analysis approaches to identify the characteristics and prevalence of web-based misinformation related to MOUD to inform future prevention, treatment, and response efforts. METHODS The team harnessed public social media posts and comments in the English language from Twitter (6,365,245 posts), YouTube (99,386 posts), Reddit (13,483,419 posts), and Drugs-Forum (5549 posts). Leveraging public health expert annotations on a sample of 2400 of these social media posts that were found to be semantically most similar to a variety of prevailing opioid use disorder–related myths based on representational learning, the team developed a supervised machine learning classifier. This classifier identified whether a post’s language promoted one of the leading myths challenging addiction treatment: that the use of agonist therapy for MOUD is simply replacing one drug with another. Platform-level prevalence was calculated thereafter by machine labeling all unannotated posts with the classifier and noting the proportion of myth-indicative posts over all posts. RESULTS Our results demonstrate promise in identifying social media postings that center on treatment myths about opioid use disorder with an accuracy of 91% and an area under the curve of 0.9, including how these discussions vary across platforms in terms of prevalence and linguistic characteristics, with the lowest prevalence on web-based health communities such as Reddit and Drugs-Forum and the highest on Twitter. Specifically, the prevalence of the stated MOUD myth ranged from 0.4% on web-based health communities to 0.9% on Twitter. CONCLUSIONS This work provides one of the first large-scale assessments of a key MOUD-related myth across multiple social media platforms and highlights the feasibility and importance of ongoing assessment of health misinformation related to addiction treatment.


2013 ◽  
Author(s):  
Laura S. Hamilton ◽  
Stephen P. Klein ◽  
William Lorie

2020 ◽  
Vol 59 (04) ◽  
pp. 294-299 ◽  
Author(s):  
Lutz S. Freudenberg ◽  
Ulf Dittmer ◽  
Ken Herrmann

Abstract Introduction Preparations of health systems to accommodate large number of severely ill COVID-19 patients in March/April 2020 has a significant impact on nuclear medicine departments. Materials and Methods A web-based questionnaire was designed to differentiate the impact of the pandemic on inpatient and outpatient nuclear medicine operations and on public versus private health systems, respectively. Questions were addressing the following issues: impact on nuclear medicine diagnostics and therapy, use of recommendations, personal protective equipment, and organizational adaptations. The survey was available for 6 days and closed on April 20, 2020. Results 113 complete responses were recorded. Nearly all participants (97 %) report a decline of nuclear medicine diagnostic procedures. The mean reduction in the last three weeks for PET/CT, scintigraphies of bone, myocardium, lung thyroid, sentinel lymph-node are –14.4 %, –47.2 %, –47.5 %, –40.7 %, –58.4 %, and –25.2 % respectively. Furthermore, 76 % of the participants report a reduction in therapies especially for benign thyroid disease (-41.8 %) and radiosynoviorthesis (–53.8 %) while tumor therapies remained mainly stable. 48 % of the participants report a shortage of personal protective equipment. Conclusions Nuclear medicine services are notably reduced 3 weeks after the SARS-CoV-2 pandemic reached Germany, Austria and Switzerland on a large scale. We must be aware that the current crisis will also have a significant economic impact on the healthcare system. As the survey cannot adapt to daily dynamic changes in priorities, it serves as a first snapshot requiring follow-up studies and comparisons with other countries and regions.


Sign in / Sign up

Export Citation Format

Share Document