Drug Abuse Ontology for the study of Substance Use Epidemiology on Social Media and Dark Web: Ontology development and usability study (Preprint)

2020 ◽  
Author(s):  
Usha Lokala ◽  
Raminta Daniulaityte ◽  
Francois Lamy ◽  
Manas Gaur ◽  
Krishnaprasad Thirunarayan ◽  
...  

BACKGROUND Web-based resources and social media platforms play an increasingly important role in health-related knowledge and experience sharing. There is a growing interest in the utilization of these novel data sources for epidemiological surveillance of substance use behaviors and trends. OBJECTIVE The key aims are to describe the development and application of the Drug Abuse Ontology as a framework for analyzing web-based data to inform public health surveillance for the following applications: 1) determining user knowledge, attitudes, and behaviors related to non-medical use of buprenorphine and other illicit opioids through analysis of web forum data; 2) understanding patterns and trends of cannabis product use in the context of evolving cannabis legalization policies in the U.S. through analysis of Twitter and web forum data; and 3) gleaning trends in the availability of novel synthetic opioids through analysis of crypto market data. METHODS The domain and scope of the Drug Abuse Ontology were defined using competency questions from two popular ontology methodologies (NeOn and the Ontology Development 101 methodology). The quality of the ontology was evaluated with a set of tools and best practices recognized by the Semantic Web community and by the AI community engaged in natural language processing. The standard ontology metrics are also presented. RESULTS The current version of the Drug Abuse Ontology comprises 315 classes, 31 relationships, and 814 instances among the classes. The ontology is flexible and can easily accommodate new concepts. The integration of the ontology with machine learning algorithms dramatically decreases the false alarm rate by adding external knowledge to the learning process. The ontology is being updated to capture evolving concepts and has been used for four different projects: PREDOSE, eDrugTrends, eDarkTrends, and DAO applications in mental health and COVID-19 scenarios.
CONCLUSIONS The developed Drug Abuse Ontology (DAO) is useful for identifying the most frequently used terms and slang terms related to drug abuse that the general population posts on social media and the dark web.

2020 ◽  
Author(s):  
Usha Lokala

BACKGROUND Web-based resources and social media platforms play an increasingly important role in health-related knowledge and experience sharing. There is a growing interest in the utilization of these novel data sources for epidemiological surveillance of substance use behaviors and trends. OBJECTIVE The key aims are to describe the development and application of the Drug Abuse Ontology as a framework for analyzing web-based data to inform public health surveillance in the following domains: 1) user knowledge, attitudes, and behaviors related to non-medical use of buprenorphine and other illicit opioids through analysis of web forum data; 2) patterns and trends of cannabis product use in the context of evolving cannabis legalization policies in the U.S. through analysis of Twitter and web forum data; and 3) trends in the availability of novel synthetic opioids through analysis of crypto market data. METHODS The domain and scope of the Drug Abuse Ontology were defined using competency questions from two popular ontology methodologies (NeOn and the Ontology Development 101 methodology). The quality of the ontology was evaluated with a set of tools and best practices recognized by the Semantic Web community and by the AI community engaged in natural language processing. The standard ontology metrics are also presented. The ontology was manually developed by domain experts from the Center for Interventions, Treatment, and Addictions Research (CITAR), who used a range of data sources: 1) key epidemiological data sources and reports accessible through the National Institute on Drug Abuse, the Drug Enforcement Administration, the European Monitoring Centre for Drugs and Drug Addiction, RxNorm, and others; 2) prior peer-reviewed publications related to illicit opioids, cannabis, and other drugs; and 3) preliminary assessment and examination of web-based and social media sources related to selected substances.
Sources of types 1 and 2 provided primary concepts, while sources of type 3 were important in identifying alternative concepts, including synonyms and street names. RESULTS The current version of the Drug Abuse Ontology comprises 315 classes, 31 relationships, and 814 instances among the classes. The ontology is flexible and can easily accommodate new concepts. The integration of the ontology with machine learning algorithms dramatically decreases the false alarm rate by adding external knowledge to the learning process. The ontology is being updated to capture evolving concepts and has been used for three different projects: PREDOSE, eDrugTrends, and eDarkTrends. CONCLUSIONS The developed DAO is useful for identifying the most frequently used terms and slang terms related to drug abuse that are posted by the general population on social media and by vendors on the dark web.
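As a rough illustration of how synonyms and street names can be tied back to primary concepts and then counted in posts, the following minimal Python sketch may help. Everything in it is an assumption made for illustration: the lexicon `SLANG_TO_CONCEPT`, the function `count_concepts`, and the sample posts are hypothetical and are not taken from the DAO or from any of the projects named above.

```python
import re
from collections import Counter

# Hypothetical toy lexicon mapping surface terms to a primary concept.
# These entries are illustrative only and are NOT taken from the DAO.
SLANG_TO_CONCEPT = {
    "buprenorphine": "Buprenorphine",
    "bupe": "Buprenorphine",
    "subs": "Buprenorphine",
    "cannabis": "Cannabis",
    "weed": "Cannabis",
}

def count_concepts(posts, lexicon=SLANG_TO_CONCEPT):
    """Count how often each primary concept is mentioned across posts."""
    counts = Counter()
    for post in posts:
        # Lowercase and split into alphabetic tokens before lookup.
        for token in re.findall(r"[a-z]+", post.lower()):
            concept = lexicon.get(token)
            if concept is not None:
                counts[concept] += 1
    return counts

# Made-up example posts.
posts = [
    "Switched to bupe last month",
    "subs and weed do not mix",
    "No drugs mentioned here",
]
concept_counts = count_concepts(posts)
print(concept_counts.most_common())
```

A plain dictionary lookup like this ignores multi-word terms, misspellings, and context; it only shows the general idea of normalizing alternative terms to one concept before counting.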


2019 ◽  
Vol 2 (1) ◽  
Author(s):  
Ari Z. Klein ◽  
Abeed Sarker ◽  
Davy Weissenbacher ◽  
Graciela Gonzalez-Hernandez

Abstract Social media has recently been used to identify and study a small cohort of Twitter users whose pregnancies with birth defect outcomes—the leading cause of infant mortality—could be observed via their publicly available tweets. In this study, we exploit social media on a larger scale by developing natural language processing (NLP) methods to automatically detect, among thousands of users, a cohort of mothers reporting that their child has a birth defect. We used 22,999 annotated tweets to train and evaluate supervised machine learning algorithms—feature-engineered and deep learning-based classifiers—that automatically distinguish tweets referring to the user’s pregnancy outcome from tweets that merely mention birth defects. Because 90% of the tweets merely mention birth defects, we experimented with under-sampling and over-sampling approaches to address this class imbalance. An SVM classifier achieved the best performance for the two positive classes: an F1-score of 0.65 for the “defect” class and 0.51 for the “possible defect” class. We deployed the classifier on 20,457 unlabeled tweets that mention birth defects, which helped identify 542 additional users for potential inclusion in our cohort. Contributions of this study include (1) NLP methods for automatically detecting tweets by users reporting their birth defect outcomes, (2) findings that an SVM classifier can outperform a deep neural network-based classifier for highly imbalanced social media data, (3) evidence that automatic classification can be used to identify additional users for potential inclusion in our cohort, and (4) a publicly available corpus for training and evaluating supervised machine learning algorithms.
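The class-imbalance handling and the per-class F1-score mentioned in this abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the helper names `undersample` and `f1_score`, the label strings, and the toy data are all hypothetical, and random under-sampling is only one of several possible strategies.

```python
import random
from collections import Counter

def undersample(examples, labels, majority_label, ratio=1.0, seed=0):
    """Randomly drop majority-class examples until at most
    ratio * (number of non-majority examples) of them remain."""
    rng = random.Random(seed)
    majority = [i for i, y in enumerate(labels) if y == majority_label]
    minority = [i for i, y in enumerate(labels) if y != majority_label]
    keep_n = min(len(majority), int(ratio * len(minority)))
    kept = sorted(minority + rng.sample(majority, keep_n))
    return [examples[i] for i in kept], [labels[i] for i in kept]

def f1_score(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy data (made up): 90% of the labels belong to the majority class.
labels = ["mention"] * 90 + ["defect"] * 6 + ["possible defect"] * 4
tweets = [f"tweet {i}" for i in range(len(labels))]
balanced_tweets, balanced_labels = undersample(tweets, labels, "mention")
class_counts = Counter(balanced_labels)
print(class_counts)
```

Reporting F1 per class rather than overall accuracy matters here because a classifier that always predicts the majority class would look accurate while finding no positive cases.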


2021 ◽  
Vol 4 (1) ◽  
pp. 01-26
Author(s):  
Muhammad Arif

Social media networks are becoming an essential part of life for most of the world’s population. Detecting cyberbullying using machine learning and natural language processing algorithms is getting the attention of researchers. There is a growing need for automatic detection and mitigation of cyberbullying events on social media. In this study, research directions and the theoretical foundation in this area are investigated. A systematic review of the current state-of-the-art research in this area is conducted. A framework considering all possible actors in the cyberbullying event must be designed, including various aspects of cyberbullying and its effect on the participating actors. Furthermore, future directions and challenges are also discussed.


2019 ◽  
Vol 9 (1) ◽  
pp. 53
Author(s):  
Nfn Bahrawi

Twitter is a social media platform built on a simple, fast concept: short messages make news and information easy to digest. Researchers and industry widely use it as a data source for sentiment analysis in social, economic, political, and other fields. Opinion mining, also commonly called sentiment analysis, is the process of analyzing text to extract the opinions expressed in it. Sentiment analysis is a branch of text mining, which applies natural language processing techniques and analytical methods to text data to obtain relevant information. Because public opinion on Twitter is dynamic and changes quickly, a real-time sentiment analysis system that updates automatically and continuously is needed so that changes can be monitored anytime and anywhere. This research builds a system that analyzes sentiment from Twitter in real time, automatically and continuously. In trials, the system succeeded in collecting data, conducting sentiment analysis, and displaying the results in a web-based graphical view that updates automatically in real time. Future work will focus on the accuracy of the algorithms used in the sentiment analysis process.


Author(s):  
Erick Omuya ◽  
George Okeyo ◽  
Michael Kimwele

Social media has been embraced by different people as a convenient and official medium of communication. People write messages and attach images and videos, which they share on Twitter, Facebook, and other social media. These updates therefore generate a large amount of data that is rich in sentiment. Sentiment analysis has been used to determine the opinions of clients, for instance, about a particular product or company. Knowledge-based and machine learning approaches are among the strategies that have been used to analyze these sentiments. The performance of sentiment analysis is, however, distorted by noise, the curse of dimensionality, the data domains, and the size of the data used for training and testing. This research aims to develop a model for sentiment analysis in which dimensionality reduction and the use of different parts of speech improve sentiment analysis performance. It uses natural language processing for filtering, storing, and performing sentiment analysis on data from social media. The model is tested using Naïve Bayes, Support Vector Machine, and K-Nearest Neighbor machine learning algorithms, and its performance is compared with that of two other sentiment analysis models. Experimental results show that the model improves sentiment analysis performance using machine learning techniques.


2016 ◽  
Author(s):  
Sunny Jung Kim ◽  
Lisa A Marsch ◽  
Jeffrey T Hancock ◽  
Amarendra K Das

BACKGROUND Substance use–related communication for drug use promotion and its prevention is widely prevalent on social media. Social media big data involve naturally occurring communication phenomena that are observable through social media platforms, which can be used in computational or scalable solutions to generate data-driven inferences. Despite the promising potential to utilize social media big data to monitor and treat substance use problems, the characteristics, mechanisms, and outcomes of substance use–related communications on social media are largely unknown. Understanding these aspects can help researchers effectively leverage social media big data and platforms for observation and health communication outreach for people with substance use problems. OBJECTIVE The objective of this critical review was to determine how social media big data can be used to understand communication and behavioral patterns of problematic use of prescription drugs. We elaborate on theoretical applications, ethical challenges and methodological considerations when using social media big data for research on drug abuse and addiction. Based on a critical review process, we propose a typology with key initiatives to address the knowledge gap in the use of social media for research on prescription drug abuse and addiction. METHODS First, we provided a narrative summary of the literature on drug use–related communication on social media. We also examined ethical considerations in the research processes of (1) social media big data mining, (2) subgroup or follow-up investigation, and (3) dissemination of social media data-driven findings. To develop a critical review-based typology, we searched the PubMed database and the entire e-collection theme of “infodemiology and infoveillance” in the Journal of Medical Internet Research / JMIR Publications. 
Studies that met our inclusion criteria (eg, use of social media data concerning non-medical use of prescription drugs, data informatics-driven findings) were reviewed for knowledge synthesis. User characteristics, communication characteristics, mechanisms and predictors of such communications, and the psychological and behavioral outcomes of social media use for problematic drug use–related communications are the dimensions of our typology. In addition to ethical practices and considerations, we also reviewed the methodological and computational approaches used in each study to develop our typology. RESULTS We developed a typology to better understand non-medical, problematic use of prescription drugs through the lens of social media big data. Highly relevant studies that met our inclusion criteria were reviewed for knowledge synthesis. The characteristics of users who shared problematic substance use–related communications on social media were reported by general group terms, such as adolescents, Twitter users, and Instagram users. All reviewed studies examined the communication characteristics, such as linguistic properties, and social networks of problematic drug use–related communications on social media. The mechanisms and predictors of such social media communications were not directly examined or empirically identified in the reviewed studies. The psychological or behavioral consequence (eg, increased behavioral intention for mimicking risky health behaviors) of engaging with and being exposed to social media communications regarding problematic drug use was another area of research that has been understudied. CONCLUSIONS We offer theoretical applications, ethical considerations, and empirical evidence within the scope of social media communication and prescription drug abuse and addiction. Our critical review suggests that social media big data can be a tremendous resource to understand, monitor and intervene on drug abuse and addiction problems.


Author(s):  
Somya Goyal ◽  
Arti Saxena

NLP is a wide and quickly developing segment of today's new digital technology, which falls under the domain of artificial intelligence. Alternative approaches for qualifying and quantifying an individual's creditworthiness have emerged in recent years as a result of recent advancements in AI. Banks and creditors may use AI to rate potential borrowers' creditworthiness based on alternative data, such as social media messages and internet usage, such as which websites people visit and what they buy from e-commerce stores. These digital footprints may show whether or not an individual is able to repay their debts. In this chapter, how the approaches of NLP could offer financial solutions to unbanked communities is explored. This chapter includes the use of various machine learning algorithms and deep learning to find the most accurate credit score of a user. Since NLP is less intrusive than providing direct access to a person's entire contact list or a social media site, it is a more accessible way to measure risk while still having the potential to target a larger audience.


Author(s):  
Sumaya Sulaiman Al Ameri ◽  
Abdulhadi Shoufan

The natural language processing of Arabic dialects faces a major difficulty, which is the lack of lexical resources. This problem complicates the penetration and the business of related technologies such as machine translation, speech recognition, and sentiment analysis. Current solutions frequently use lexica, which are specific to the task at hand and limited to some language variety. Modern communication platforms including social media gather people from different nations and regions. This has increased the demand for general-purpose lexica towards effective natural language processing solutions. This chapter presents a collaborative web-based platform for building a cross-dialectical, general-purpose lexicon for Arabic dialects. This solution was tested by a team of two annotators, a reviewer, and a lexicographer. The lexicon expansion rate was measured and analyzed to estimate the overhead required to reach the desired size of the lexicon. The inter-annotator reliability was analyzed using Cohen's Kappa.
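For readers unfamiliar with Cohen's kappa, the statistic compares the observed agreement between two annotators with the agreement expected by chance, given each annotator's label frequencies. The sketch below is a minimal stdlib-only illustration; the function name `cohens_kappa` and the sample annotations are hypothetical and do not come from the chapter.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)

# Made-up annotations for four lexicon entries.
annotator_1 = ["valid", "valid", "invalid", "invalid"]
annotator_2 = ["valid", "invalid", "invalid", "invalid"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(kappa)
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean agreement worse than chance.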


Social media services such as Facebook, Twitter, Renren, Instagram, and LinkedIn have recently become an enormous and persistent source of daily news. These platforms serve a huge number of users and offer many services, such as content creation and publishing. Not all information published on social media is reliable and accurate. Many people try to publish fake and inaccurate news in order to manipulate public opinion. Fake news may be deliberately created to advance financial, political, and social interests, and can have harmful effects on people's beliefs and decisions. In this paper we examine different methods for detecting fake news on social media. Our aim is to find a reliable and accurate model that classifies a given article as fake or genuine. To identify fake articles we use machine learning algorithms.

