Solution to Detect, Classify, and Report Illicit Online Marketing and Sales of Controlled Substances via Twitter: Using Machine Learning and Web Forensics to Combat Digital Opioid Access (Preprint)

2018 ◽  
Author(s):  
Tim Mackey ◽  
Janani Kalyanam ◽  
Josh Klugman ◽  
Ella Kuzmenko ◽  
Rashmi Gupta

BACKGROUND On December 6 and 7, 2017, the US Department of Health and Human Services (HHS) hosted its first Code-a-Thon, an event aimed at leveraging technology and data-driven solutions to help combat the opioid epidemic. The authors, an interdisciplinary team from academia, the private sector, and the US Centers for Disease Control and Prevention, participated in the Code-a-Thon as part of its prevention track. OBJECTIVE The aim of this study was to develop and deploy a machine learning methodology to accurately detect the marketing and sale of opioids by illicit online sellers via Twitter, as part of our participation in the HHS Opioid Code-a-Thon. METHODS In conjunction with the Code-a-Thon, tweets were collected from November 15 to December 5, 2017, from the Twitter public application programming interface stream, filtered for common prescription opioid keywords. During the 24-hour Code-a-Thon competition, we developed and applied an unsupervised machine learning approach, a biterm topic model (BTM), to summarize the content of the tweets and isolate the clusters associated with illegal online marketing and sale. After isolating relevant tweets, we reviewed the hyperlinks in these tweets to assess the characteristics of illegal online sellers. RESULTS Over the course of the Code-a-Thon, we collected and analyzed 213,041 tweets containing the keywords codeine, percocet, vicodin, oxycontin, oxycodone, fentanyl, and hydrocodone. Using BTM, we identified 692 of the 213,041 tweets (0.32%) as associated with illegal online marketing and sale of prescription opioids. After removing duplicates and dead links, we identified 34 unique “live” tweets: 44% (15/34) directed consumers to illicit online pharmacies, 32% (11/34) linked to individual drug sellers, and 21% (7/34) were used by marketing affiliates. 
In addition to offering “no prescription” sales of opioids, many of these vendors also sold other controlled substances and illicit drugs. CONCLUSIONS The results of this study are in line with prior studies that have identified social media platforms, including Twitter, as a potential conduit for the supply and sale of illicit opioids. To translate these results into action, the authors also developed a prototype wireframe for detecting and classifying illicit online pharmacy tweets that sell controlled substances illegally, and for reporting them to the US Food and Drug Administration and the US Drug Enforcement Administration. Further development of solutions based on these methods has the potential to proactively alert regulators and law enforcement agencies to illegal opioid sales, while also making the online environment safer for the public.
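The biterm topic model used in this study can be illustrated with a minimal collapsed Gibbs sampler. This is a sketch, not the authors' implementation: the hyperparameters (`alpha`, `beta`, `iterations`), the toy tweets, and the helper names `extract_biterms` and `btm_gibbs` are assumptions. BTM treats every unordered word pair in a short text (a biterm) as drawn from one corpus-level topic, which suits tweets that are too short for reliable per-document topic estimates.

```python
import random
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) from one short text."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def btm_gibbs(docs, num_topics, alpha=1.0, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for a biterm topic model (hypothetical defaults).

    Returns the vocabulary, corpus-level topic proportions (theta),
    and per-topic word distributions (phi).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    m = len(vocab)
    biterms = [(index[a], index[b]) for doc in docs for a, b in extract_biterms(doc)]

    n_z = [0] * num_topics                       # biterms assigned to each topic
    n_wz = [[0] * m for _ in range(num_topics)]  # word counts within each topic
    assignment = []

    def update(z, wi, wj, delta):
        n_z[z] += delta
        n_wz[z][wi] += delta
        n_wz[z][wj] += delta

    # Random initial topic for every biterm.
    for wi, wj in biterms:
        z = rng.randrange(num_topics)
        assignment.append(z)
        update(z, wi, wj, +1)

    for _ in range(iterations):
        for b, (wi, wj) in enumerate(biterms):
            update(assignment[b], wi, wj, -1)    # remove the current assignment
            weights = []
            for k in range(num_topics):
                denom = 2 * n_z[k] + m * beta    # words currently in topic k, smoothed
                weights.append(
                    (n_z[k] + alpha)
                    * (n_wz[k][wi] + beta) / denom
                    * (n_wz[k][wj] + beta) / (denom + 1)
                )
            z = rng.choices(range(num_topics), weights=weights)[0]
            assignment[b] = z
            update(z, wi, wj, +1)

    theta = [(n_z[k] + alpha) / (len(biterms) + num_topics * alpha)
             for k in range(num_topics)]
    phi = [[(n_wz[k][w] + beta) / (2 * n_z[k] + m * beta) for w in range(m)]
           for k in range(num_topics)]
    return vocab, theta, phi


# Hypothetical toy tweets, whitespace-tokenized.
tweets = [
    "buy oxycodone online no prescription",
    "oxycodone no prescription overnight shipping",
    "my doctor switched me off percocet",
]
vocab, theta, phi = btm_gibbs([t.split() for t in tweets], num_topics=2)
for k, row in enumerate(phi):
    top = sorted(range(len(vocab)), key=row.__getitem__, reverse=True)[:3]
    print(k, [vocab[i] for i in top])
```

In a workflow like the one described, the top words of each topic would then be inspected to decide which clusters look like illegal marketing before the hyperlinks are reviewed by hand.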

2021 ◽  
Author(s):  
Danny Valdez ◽  
Jennifer B Unger

BACKGROUND In 2018, JUUL Labs Inc, a popular e-cigarette manufacturer, announced it would substantially limit its social media presence in compliance with the Food and Drug Administration’s (FDA) call to curb underage e-cigarette use. However, shortly after the announcement, a series of JUUL-related hashtags emerged on various social media platforms, calling the effectiveness of the FDA’s regulations into question. OBJECTIVE The purpose of this study is to show that hashtags remain a common venue for marketing age-restricted products on social media. METHODS We used Twitter’s standard Application Programming Interface (API) to download the 3200 most recent tweets from JUUL Labs Inc’s official Twitter account (@JUULVapor), as well as a series of tweets containing one or more of the following hashtags: #ecig, #vape, #JUUL. We ran two Latent Dirichlet Allocation (LDA) topic models comparing @JUULVapor’s content with our hashtag corpus. We qualitatively deliberated on the meaning of each topic and substantiated our interpretations with tweets from either corpus. RESULTS The topic model generated for @JUULVapor’s timeline seemingly alluded to compliance with the FDA’s call to prohibit marketing of age-restricted products on social media. However, the topic model generated for the hashtag corpus contained several references to flavors, vaping paraphernalia, and illicit drugs, which may appeal to younger audiences. CONCLUSIONS Our findings underscore the complicated nature of social media regulation. Although JUUL Labs Inc seemingly complied with the FDA’s call to limit its social media presence, JUUL and other e-cigarette manufacturers are still discussed openly in social media spaces. Much discourse about JUUL and e-cigarettes is spread via hashtags, which allow messages to reach a wide audience quickly. This suggests that social media regulations on manufacturers are, by themselves, ineffective. 
Stricter protocols are needed to regulate discourse about age-restricted products on social media.
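Assembling the hashtag corpus described in the Methods amounts to a membership test on each tweet's hashtags. The sketch below assumes tweets are already available as plain strings and that a tweet qualifies only on an exact, case-insensitive hashtag match (so `#vapelife` does not count). The study does not say how hashtag variants were handled, and the names `in_hashtag_corpus` and `TARGET_HASHTAGS` are hypothetical.

```python
import re

# Hashtags named in the study, lowercased; exact matching is an assumption.
TARGET_HASHTAGS = {"ecig", "vape", "juul"}
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def hashtags(text):
    """Extract the lowercased hashtags (without '#') from a tweet."""
    return {tag.lower() for tag in HASHTAG_PATTERN.findall(text)}


def in_hashtag_corpus(text, targets=TARGET_HASHTAGS):
    """True if the tweet carries at least one of the target hashtags."""
    return not hashtags(text).isdisjoint(targets)


# Hypothetical example tweets.
tweets = [
    "New mango pods just dropped #JUUL",
    "Cloud chasing all weekend #vapelife",
    "Quit smoking six months ago today",
]
corpus = [t for t in tweets if in_hashtag_corpus(t)]
print(corpus)  # → ['New mango pods just dropped #JUUL']
```

The resulting hashtag corpus and the @JUULVapor timeline would then be modeled separately, one LDA model each, so that their topics can be compared.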


10.2196/19509 ◽  
2020 ◽  
Vol 6 (2) ◽  
pp. e19509 ◽  
Author(s):  
Tim Mackey ◽  
Vidya Purushothaman ◽  
Jiawei Li ◽  
Neal Shah ◽  
Matthew Nali ◽  
...  

Background The coronavirus disease (COVID-19) pandemic is a global health emergency, with over 6 million cases worldwide as of the beginning of June 2020. The pandemic is historic in scope and precedent given its emergence in an increasingly digital era. Importantly, there have been concerns about the accuracy of COVID-19 case counts due to issues such as lack of access to testing and difficulty in measuring recoveries. Objective The aims of this study were to detect and characterize user-generated conversations that could be associated with COVID-19-related symptoms, experiences with access to testing, and mentions of disease recovery using an unsupervised machine learning approach. Methods Tweets were collected from the Twitter public streaming application programming interface from March 3 to 20, 2020, filtered for general COVID-19-related keywords, and then further filtered for terms that could be related to COVID-19 symptoms as self-reported by users. Tweets were analyzed using an unsupervised machine learning approach called the biterm topic model (BTM), which grouped tweets sharing word-related themes into topic clusters, including conversations about symptoms, testing, and recovery. Tweets in these clusters were then extracted, manually annotated for content analysis, and assessed for their statistical and geographic characteristics. Results A total of 4,492,954 tweets were collected that contained terms that could be related to COVID-19 symptoms. After using BTM to identify relevant topic clusters and removing duplicate tweets, we identified a total of 3465 (<1%) tweets that included user-generated conversations about experiences that users associated with possible COVID-19 symptoms and other disease experiences. 
These tweets were grouped into five main categories including first- and secondhand reports of symptoms, symptom reporting concurrent with lack of testing, discussion of recovery, confirmation of negative COVID-19 diagnosis after receiving testing, and users recalling symptoms and questioning whether they might have been previously infected with COVID-19. The co-occurrence of tweets for these themes was statistically significant for users reporting symptoms with a lack of testing and with a discussion of recovery. A total of 63% (n=1112) of the geotagged tweets were located in the United States. Conclusions This study used unsupervised machine learning for the purposes of characterizing self-reporting of symptoms, experiences with testing, and mentions of recovery related to COVID-19. Many users reported symptoms they thought were related to COVID-19, but they were not able to get tested to confirm their concerns. In the absence of testing availability and confirmation, accurate case estimations for this period of the outbreak may never be known. Future studies should continue to explore the utility of infoveillance approaches to estimate COVID-19 disease severity.
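The two-stage keyword filter and duplicate removal described in the Methods and Results can be sketched as below. The keyword lists are hypothetical placeholders (the study's actual terms are not given here), and matching is a simple case-insensitive substring test. The last lines only check the reported proportion: 3465 of 4,492,954 tweets is about 0.077%, consistent with the reported <1%.

```python
# Hypothetical keyword lists; the study's actual filter terms are not given here.
COVID_TERMS = ("covid", "coronavirus")
SYMPTOM_TERMS = ("fever", "cough", "shortness of breath", "sore throat")


def matches_any(text, terms):
    """Case-insensitive substring match against any of the given terms."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


def two_stage_filter(tweets):
    """Stage 1: general COVID-19 terms; stage 2: possible symptom terms."""
    stage_one = [t for t in tweets if matches_any(t, COVID_TERMS)]
    return [t for t in stage_one if matches_any(t, SYMPTOM_TERMS)]


def deduplicate(tweets):
    """Drop exact duplicate texts while keeping first-seen order."""
    return list(dict.fromkeys(tweets))


# Hypothetical stream sample.
stream = [
    "Fever and dry cough for 3 days, can't get a covid test anywhere",
    "Covid case counts are rising in my county",
    "Fever and dry cough for 3 days, can't get a covid test anywhere",
    "Finally recovered from this cough, was it coronavirus?",
]
kept = deduplicate(two_stage_filter(stream))
print(len(kept))  # → 2

# Reported share of relevant tweets after BTM and de-duplication.
share = 3465 / 4_492_954
print(f"{share:.3%}")  # → 0.077%
```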


2020 ◽  
Author(s):  
Genevieve Fullerton Dash ◽  
Nicholas G. Martin ◽  
Arpana Agrawal ◽  
Michael Lynskey ◽  
Wendy S. Slutske

Background. Drug classes are grouped based on their chemical and pharmacological properties, but prescription and illicit drugs differ in other important ways. Opioid and stimulant classes contain prescription and illicit forms that are differentially associated with salient risk factors (common route of administration, legality), making them useful comparators for examining potential differences in the etiological influences on (mis)use of prescription and illicit drugs. Methods. A total of 2,410 individual Australian twins (mean age 31.77 years [SD 2.48]; 67% women) were interviewed about prescription misuse and illicit use of opioids and stimulants. Univariate and bivariate biometric models partitioned variances and covariances into additive genetic, shared environmental, and unique environmental influences across drug types. Results. Variation in the propensity to misuse prescription opioids was primarily attributable to genes (37%) and unique environment (59%). Illicit opioid use was attributable to shared (71%) and unique (29%) environment. Prescription stimulant misuse was primarily attributable to genes (78%) and unique environment (21%). Illicit stimulant use was influenced by genes (48%), shared environment (29%), and unique environment (23%). There was evidence for genetic influence common to both stimulant types, but limited evidence for genetic influence common to both opioid types. Conclusions. Prescription opioid misuse may share little genetic influence with illicit opioid use. Future research may consider avoiding unitary drug classifications, particularly when examining genetic influences.
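The study partitions variance with fitted univariate and bivariate biometric (ACE) models. What the additive genetic (A), shared environmental (C), and unique environmental (E) components mean can be seen through Falconer's classical approximation from monozygotic (MZ) and dizygotic (DZ) twin correlations. This is a simpler, swapped-in technique, not the authors' model-fitting procedure, and the correlations used below are hypothetical.

```python
def falconer_ace(r_mz, r_dz):
    """Falconer's approximation of ACE variance components from twin correlations.

    MZ twins share all additive genetic effects and DZ twins about half,
    while both share their common environment, so:
      a2 = 2 * (r_mz - r_dz), c2 = 2 * r_dz - r_mz, e2 = 1 - r_mz.
    The three shares sum to 1 but can fall outside [0, 1];
    fitted biometric models constrain them.
    """
    a2 = 2 * (r_mz - r_dz)
    c2 = 2 * r_dz - r_mz
    e2 = 1 - r_mz
    return a2, c2, e2


# Hypothetical twin correlations for one substance-use phenotype.
a2, c2, e2 = falconer_ace(r_mz=0.60, r_dz=0.40)
print(f"A={a2:.2f} C={c2:.2f} E={e2:.2f}")  # → A=0.40 C=0.20 E=0.40
```

Under this approximation, a split like the one reported for illicit opioid use (no genetic share, 71% shared and 29% unique environment) would correspond to MZ and DZ correlations that are both about 0.71.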


2021 ◽  
Author(s):  
Nguyen Minh Khiem ◽  
Yuki Takahashi ◽  
Khuu Thi Phuong Dong ◽  
Hiroki Yasuma ◽  
Nobuo Kimura

Symmetry ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 556
Author(s):  
Thaer Thaher ◽  
Mahmoud Saheb ◽  
Hamza Turabieh ◽  
Hamouda Chantar

Fake or false information on social media platforms is a significant challenge: rumors, propaganda, or deceptive information about a person, organization, or service can deliberately mislead users. Twitter is one of the most widely used social media platforms, especially in the Arab region, where the number of users is steadily increasing, accompanied by an increase in the rate of fake news. This has drawn researchers’ attention to providing a safe online environment free of misleading information. This paper proposes a smart classification model for the early detection of fake news in Arabic tweets using Natural Language Processing (NLP) techniques, Machine Learning (ML) models, and the Harris Hawks Optimizer (HHO) as a wrapper-based feature selection approach. An Arabic Twitter corpus of 1862 previously annotated tweets was used to assess the efficiency of the proposed model. The Bag of Words (BoW) model is used with different term-weighting schemes for feature extraction. Eight well-known learning algorithms are investigated with varying combinations of features, including user-profile, content-based, and word features. The results showed that Logistic Regression (LR) with the Term Frequency-Inverse Document Frequency (TF-IDF) model achieved the best ranking. Moreover, feature selection based on the binary HHO (BHHO) algorithm plays a vital role in reducing dimensionality, thereby improving the learning model’s performance for fake news detection. Notably, the proposed BHHO-LR model can yield a 5% improvement over previous work on the same dataset.
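The best-performing combination reported above, TF-IDF-weighted bag-of-words features fed to logistic regression, can be sketched with the standard library alone. This is a minimal illustration, not the authors' pipeline: the toy English tokens stand in for preprocessed Arabic text, the smoothed IDF formula is one common variant, and the function names and training settings (`lr`, `epochs`) are assumptions.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Smoothed inverse document frequency: log((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    vocab = sorted(df)
    return vocab, {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocab}


def tfidf(doc, vocab, idf):
    """Term frequency (count / length) times IDF; unseen words are ignored."""
    counts = Counter(doc)
    return [counts[w] / len(doc) * idf[w] for w in vocab]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(xs, ys, lr=0.5, epochs=500):
    """Plain stochastic gradient descent on the log-loss."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            error = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            w = [wi - lr * error * xi for wi, xi in zip(w, x)]
            b -= lr * error
    return w, b


def predict(w, b, x):
    """Class 1 (fake) if the linear score is positive, else class 0."""
    return int(b + sum(wi * xi for wi, xi in zip(w, x)) > 0)


# Hypothetical tokenized tweets (English stand-ins); 1 = fake, 0 = genuine.
train_docs = [
    ["shocking", "secret", "cure", "revealed"],
    ["shocking", "secret", "leak"],
    ["ministry", "announces", "official", "schedule"],
    ["official", "report", "published"],
]
labels = [1, 1, 0, 0]
vocab, idf = fit_idf(train_docs)
xs = [tfidf(d, vocab, idf) for d in train_docs]
w, b = train_logistic_regression(xs, labels)
print(predict(w, b, tfidf(["secret", "cure"], vocab, idf)))  # → 1
```

A wrapper method such as binary HHO would sit around this classifier, proposing subsets of the TF-IDF columns and scoring each subset by retraining and evaluating the model; that search loop is not shown here.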


2021 ◽  
pp. 1-13
Author(s):  
C S Pavan Kumar ◽  
L D Dhinesh Babu

Sentiment analysis is widely used to uncover the sentiments hidden in medical discussions on online social networking platforms such as Twitter, Facebook, and Instagram. People often convey their feelings about their medical problems on social media platforms. Practitioners and health care workers have started to observe these discussions to assess the impact of health-related issues on people, which helps them provide better care and improve quality of life. Dementia is a serious disease in Western countries such as the United States of America and the United Kingdom, and the respective governments provide facilities to the affected people. There is much chatter on social media platforms about patient care, healthy measures to follow to avoid the disease, and how to check for early indications. This chatter has to be carefully monitored to help officials take the necessary precautions for the benefit of those affected. A novel feature engineering architecture involving a feature split is proposed for sentiment analysis of medical chatter on online social networks; it is built as a pipeline that can be used with any machine learning model. The proposed model uses a fuzzy membership function to refine its outputs: the sentiment score obtained from the machine learning model is fuzzified using a trapezoidal membership function and defuzzified using the center of sums method. Three datasets are used to compare the proposed model with the regular model. The proposed approach delivered better results than the regular approach and proved to be effective for sentiment analysis of medical discussions on online social networks.
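The fuzzification and defuzzification step described above can be sketched as follows. The three output sets (negative, neutral, positive), their trapezoid corner points on a sentiment scale of [-1, 1], and the one-to-one mapping from fuzzified degrees to output sets are all hypothetical; the abstract specifies only the trapezoidal membership function and the center of sums method. Center of sums adds the clipped area of each fired set separately, so overlapping regions count twice, unlike centroid defuzzification of their union.

```python
# Hypothetical trapezoids (a, b, c, d) on a sentiment scale of [-1, 1].
SETS = {
    "negative": (-1.0, -1.0, -0.6, -0.1),
    "neutral": (-0.5, -0.1, 0.1, 0.5),
    "positive": (0.1, 0.6, 1.0, 1.0),
}


def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 1 on [b, c], linear ramps on (a, b) and (c, d), else 0."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


def fuzzify(score, sets=SETS):
    """Degree of membership of a model's sentiment score in each set."""
    return {name: trapezoid(score, *corners) for name, corners in sets.items()}


def center_of_sums(strengths, sets=SETS, lo=-1.0, hi=1.0, steps=2000):
    """Center-of-sums defuzzification, integrated with the midpoint rule.

    Each set is clipped at its firing strength; the crisp output is
    sum(area_k * centroid_k) / sum(area_k), i.e. the summed first moments
    of the clipped sets divided by their summed areas.
    """
    dx = (hi - lo) / steps
    moment = area = 0.0
    for name, strength in strengths.items():
        if strength <= 0.0:
            continue
        for i in range(steps):
            x = lo + (i + 0.5) * dx
            m = min(strength, trapezoid(x, *sets[name]))
            moment += x * m * dx
            area += m * dx
    return moment / area if area > 0 else 0.0


raw_score = 0.3                 # e.g., a classifier's sentiment score
degrees = fuzzify(raw_score)    # roughly: negative 0.0, neutral 0.5, positive 0.4
print(degrees)
print(round(center_of_sums(degrees), 3))
```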


2021 ◽  
pp. 194016122110091
Author(s):  
Magdalena Wojcieszak ◽  
Ericka Menchen-Trevino ◽  
Joao F. F. Goncalves ◽  
Brian Weeks

The online environment dramatically expands the number of ways people can encounter news, but questions remain about whether these abundant opportunities facilitate news exposure diversity. This project examines key questions about how internet users arrive at news and what kinds of news they encounter. We account for multiple avenues to news online, some of which have never been analyzed: (1) direct access to news websites, (2) social networks, (3) news aggregators, (4) search engines, (5) webmail, and (6) hyperlinks in news. We examine the extent to which each avenue promotes news exposure and exposes users to left-leaning, right-leaning, and centrist news sources. Combining this with information on individual political leanings, we show the extent of dissimilar, centrist, or congenial exposure resulting from each avenue. We rely on web browsing history records from 636 social media users in the US paired with survey self-reports, a unique data set that allows us to examine both aggregate and individual-level exposure. Visits to news websites account for about 2 percent of all URL visits and are unevenly distributed among users. The most widespread ways of accessing news are search engines and social media platforms (and hyperlinks within news sites once people arrive at news). Compared with accessing news directly, search engines and social media also increase dissimilar news exposure, yet direct news access drives the highest proportion of centrist exposure.
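The breakdown of exposure into congenial, dissimilar, and centrist categories per avenue can be sketched as a simple tally over one user's news visits. The coding rule used here is an assumption for illustration: a centrist source counts as centrist exposure, and any other source counts as congenial if it matches the user's leaning and dissimilar if it does not. The avenue labels and the toy visit log are also hypothetical; the study's actual coding of users and sources is not reproduced.

```python
from collections import Counter, defaultdict


def classify_exposure(user_leaning, source_leaning):
    """Hypothetical rule; leanings are 'left', 'right', or 'center'."""
    if source_leaning == "center":
        return "centrist"
    return "congenial" if source_leaning == user_leaning else "dissimilar"


def exposure_by_avenue(visits, user_leaning):
    """Share of each exposure type per avenue for one user's news visits.

    visits: iterable of (avenue, source_leaning) pairs, e.g. ("search", "right").
    """
    tallies = defaultdict(Counter)
    for avenue, source_leaning in visits:
        tallies[avenue][classify_exposure(user_leaning, source_leaning)] += 1
    return {
        avenue: {kind: n / sum(counts.values()) for kind, n in counts.items()}
        for avenue, counts in tallies.items()
    }


# Hypothetical visit log for a left-leaning user.
visits = [
    ("search", "right"), ("search", "left"),
    ("direct", "center"), ("direct", "left"),
    ("social", "right"),
]
print(exposure_by_avenue(visits, user_leaning="left"))
```

Aggregating these per-user shares across all panelists would give avenue-level comparisons of the kind reported, such as which avenue yields the most dissimilar or centrist exposure.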

