Using Twitter Data and Machine Learning to Identify Outpatient Antibiotic Misuse: A Proof-of-Concept Study

2019 ◽  
Vol 6 (Supplement_2) ◽  
pp. S695-S695
Author(s):  
Timothy Sullivan

Abstract Background Outpatient antibiotic misuse is common, yet it is difficult to identify and prevent. Novel methods are needed to better identify unnecessary antibiotic use in the outpatient setting. Methods The Twitter developer platform was accessed to identify Tweets describing outpatient antibiotic use in the United States between November 2018 and March 2019. Unique English-language Tweets reporting recent antibiotic use were aggregated, reviewed, and labeled as describing possible misuse or not describing misuse. Possible misuse was defined as antibiotic use for a diagnosis or symptoms for which antibiotics are not indicated based on national guidelines, or the use of antibiotics without evaluation by a healthcare provider (Figure 1). Tweets were randomly divided into training and testing sets consisting of 80% and 20% of the data, respectively. Training set Tweets were preprocessed via a natural language processing pipeline, converted into numerical vectors, and used to generate a logistic regression algorithm to predict misuse in the testing set. Analyses were performed in Python using the scikit-learn and nltk libraries. Results 4000 Tweets were included, of which 1028 were labeled as describing possible outpatient antibiotic misuse. The algorithm correctly identified Tweets describing possible antibiotic misuse in the testing set with specificity = 94%, sensitivity = 55%, PPV = 75%, NPV = 87%, and area under the ROC curve = 0.91 (Figure 2). Conclusion A machine learning algorithm using Twitter data identified episodes of self-reported antibiotic misuse with good test performance, as defined by the area under the ROC curve. Analysis of Twitter data captured some episodes of antibiotic misuse, such as the use of non-prescribed antibiotics, that are not easily identified by other methods.
This approach could be used to generate novel insights into the causes and extent of antibiotic misuse in the United States, and to monitor antibiotic misuse in real time. Disclosures All authors: No reported disclosures.
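The workflow this abstract describes — vectorizing labeled Tweets, splitting 80/20, and fitting a logistic regression to predict misuse — can be sketched roughly as below. The toy texts, labels, and feature settings are illustrative assumptions, not the authors' actual data or pipeline:

```python
# Minimal sketch: label text, convert to numerical vectors,
# train logistic regression on 80%, evaluate on the held-out 20%.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy labeled examples (1 = possible misuse, 0 = no misuse) -- illustrative only.
texts = [
    "taking leftover antibiotics for my cold",
    "doctor prescribed antibiotics for my strep throat",
    "grabbed some amoxicillin from a friend for this cough",
    "finished the antibiotic course my physician gave me",
] * 10  # repeat so the split has enough samples to work with
labels = [1, 0, 1, 0] * 10

# Convert text to numerical vectors (bag of words).
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# 80% training / 20% testing, as in the study.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, random_state=0, stratify=labels
)

clf = LogisticRegression()
clf.fit(X_train, y_train)
print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")
```

A real pipeline would add nltk-style preprocessing (tokenization, stopword removal) before vectorization and report the confusion-matrix metrics the abstract cites.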

Author(s):  
Ari Z. Klein ◽  
Arjun Magge ◽  
Karen O’Connor ◽  
Haitao Cai ◽  
Davy Weissenbacher ◽  
...  

ABSTRACT The rapidly evolving outbreak of COVID-19 presents challenges for actively monitoring its spread. In this study, we assessed a social media mining approach for automatically analyzing the chronological and geographical distribution of users in the United States reporting personal information related to COVID-19 on Twitter. The results suggest that our natural language processing and machine learning framework could help provide an early indication of the spread of COVID-19.


Among the foremost challenges with big data is how to go about analyzing it. What new tools are needed to be able to properly investigate and model the large quantities of highly complex, often messy data? Chapter 4 addresses this question by introducing and briefly exploring the fields of Machine Learning, Natural Language Processing, and Social Network Analysis, focusing on how these methods and toolsets can be utilized to make sense of big data. The authors provide a broad overview of tools, ideas, and caveats for each of these fields. This chapter ends with a look at how one major public university in the United States, the University of Texas at Arlington, is beginning to address some of the questions surrounding big data in an institutional setting. A list of additional readings is provided.


2018 ◽  
Vol 21 ◽  
pp. 45-48
Author(s):  
Shilpa Balan ◽  
Sanchita Gawand ◽  
Priyanka Purushu

Cybersecurity plays a vital role in protecting the privacy and data of people. In recent times, there have been several issues relating to cyber fraud, data breaches, and cyber theft. Many people in the United States have been victims of identity theft. Thus, an understanding of cybersecurity plays an important role in protecting their information and devices. As the adoption of smart devices and social networking increases, cybersecurity awareness needs to be spread. This research aims to build a classification machine learning algorithm to determine the awareness of cybersecurity by the common masses in the United States. We were able to attain a good F-measure score when evaluating the performance of the classification model built for this study.
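The F-measure reported here is the harmonic mean of precision and recall, computable directly from prediction counts. The counts below are made-up values for illustration, not the study's results:

```python
# F-measure (F1) from raw prediction outcomes -- pure-Python illustration.
def f_measure(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)

# Hypothetical confusion-matrix counts for an awareness classifier.
print(round(f_measure(true_pos=80, false_pos=10, false_neg=20), 3))
```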


2016 ◽  
Vol 3 (2) ◽  
pp. e21 ◽  
Author(s):  
Scott R Braithwaite ◽  
Christophe Giraud-Carrier ◽  
Josh West ◽  
Michael D Barnes ◽  
Carl Lee Hanson

Background One of the leading causes of death in the United States (US) is suicide, and new methods of assessment are needed to track its risk in real time. Objective Our objective is to validate the use of machine learning algorithms for Twitter data against empirically validated measures of suicidality in the US population. Methods Using a machine learning algorithm, the Twitter feeds of 135 Mechanical Turk (MTurk) participants were compared with validated, self-report measures of suicide risk. Results Our findings show that people at high suicidal risk can be easily differentiated from those who are not by machine learning algorithms, which accurately identified clinically significant suicide risk in 92% of cases (sensitivity: 53%, specificity: 97%, positive predictive value: 75%, negative predictive value: 93%). Conclusions Machine learning algorithms are efficient in differentiating people who are at suicidal risk from those who are not. Evidence for suicidality can be measured in nonclinical populations using social media data.
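The four metrics this abstract reports (and the antibiotic-misuse study above) all derive from a 2×2 confusion matrix. As a reference, here is the computation; the counts are chosen to approximately reproduce the reported values and are not the study's actual data:

```python
# Sensitivity, specificity, PPV, and NPV from a 2x2 confusion matrix.
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# Illustrative counts only, picked to roughly match the reported figures.
m = diagnostic_metrics(tp=53, fp=18, tn=582, fn=47)
print({k: round(v, 2) for k, v in m.items()})
```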


2020 ◽  
Vol 6 (30) ◽  
pp. eabb5824 ◽  
Author(s):  
Meysam Alizadeh ◽  
Jacob N. Shapiro ◽  
Cody Buntain ◽  
Joshua A. Tucker

We study how easy it is to distinguish influence operations from organic social media activity by assessing the performance of a platform-agnostic machine learning approach. Our method uses public activity to detect content that is part of coordinated influence operations based on human-interpretable features derived solely from content. We test this method on publicly available Twitter data on Chinese, Russian, and Venezuelan troll activity targeting the United States, as well as the Reddit dataset of Russian influence efforts. To assess how well content-based features distinguish these influence operations from random samples of general and political American users, we train and test classifiers on a monthly basis for each campaign across five prediction tasks. Content-based features perform well across period, country, platform, and prediction task. Industrialized production of influence campaign content leaves a distinctive signal in user-generated content that allows tracking of campaigns from month to month and across different accounts.


Author(s):  
Momen R. Mousa ◽  
Saleh R. Mousa ◽  
Marwa Hassan ◽  
Paul Carlson ◽  
Ibrahim A. Elnaml

Waterborne paint is the most common marking material used throughout the United States. Because of budget constraints, most transportation agencies repaint their markings on a fixed schedule, which is questionable in relation to efficiency and economy. To overcome this problem, state agencies could evaluate marking performance by utilizing measured retroreflectivity of waterborne paints applied in the National Transportation Product Evaluation Program (NTPEP) or by using retroreflectivity degradation models developed in previous studies. Generally, both options lack accuracy because of the high dimensionality and multi-collinearity of retroreflectivity data. Therefore, the objective of this study was to employ an advanced machine learning algorithm to develop performance prediction models for waterborne paints considering the variables that are believed to affect their performance. To achieve this objective, a total of 17,952 skip and wheel retroreflectivity measurements were collected from 10 test decks included in the NTPEP. Based on these data, two CatBoost models were developed with an acceptable level of accuracy, which can predict the skip and wheel retroreflectivity of waterborne paints for up to 3 years using only the initial measured retroreflectivity and the anticipated project conditions over the intended prediction horizon, such as line color, traffic, air temperature, and so forth. These models could be used by transportation agencies throughout the United States to (1) compare different products and select the best product for a specific project, and (2) determine the expected service life of a specific product based on a specified threshold retroreflectivity to plan for future restriping activities.


2020 ◽  
Author(s):  
Carson Lam ◽  
Jacob Calvert ◽  
Gina Barnes ◽  
Emily Pellegrini ◽  
Anna Lynn-Palevsky ◽  
...  

BACKGROUND In the wake of COVID-19, the United States has developed a three-stage plan to outline the parameters for determining when states may reopen businesses and ease travel restrictions. The guidelines also identify subpopulations of Americans that should continue to stay at home due to being at high risk for severe disease should they contract COVID-19. These guidelines were based on population-level demographics, rather than individual-level risk factors. As such, they may misidentify individuals at high risk for severe illness who should therefore not return to work until vaccination or widespread serological testing is available. OBJECTIVE This study evaluated a machine learning algorithm for the prediction of serious illness due to COVID-19 using inpatient data collected from electronic health records. METHODS The algorithm was trained to identify patients for whom a diagnosis of COVID-19 was likely to result in hospitalization, and compared against four U.S. policy-based criteria: age over 65; having a serious underlying health condition; age over 65 or having a serious underlying health condition; and age over 65 and having a serious underlying health condition. RESULTS This algorithm identified 80% of patients at risk for hospitalization due to COVID-19, versus at most 62% identified by government guidelines. The algorithm also achieved a high specificity of 95%, outperforming government guidelines. CONCLUSIONS This algorithm may help to enable a broad reopening of the American economy while ensuring that patients at high risk for serious disease remain home until vaccination and testing become available.


Author(s):  
Timnit Gebru

This chapter discusses the role of race and gender in artificial intelligence (AI). The rapid permeation of AI into society has not been accompanied by a thorough investigation of the sociopolitical issues that cause certain groups of people to be harmed rather than advantaged by it. For instance, recent studies have shown that commercial automated facial analysis systems have much higher error rates for dark-skinned women, while having minimal errors on light-skinned men. Moreover, a 2016 ProPublica investigation uncovered that machine learning–based tools that assess crime recidivism rates in the United States are biased against African Americans. Other studies show that natural language–processing tools trained on news articles exhibit societal biases. While many technical solutions have been proposed to alleviate bias in machine learning systems, a holistic and multifaceted approach must be taken. This includes standardization bodies determining what types of systems can be used in which scenarios, making sure that automated decision tools are created by people from diverse backgrounds, and understanding the historical and political factors that disadvantage certain groups who are subjected to these tools.


2021 ◽  
Vol 14 (5) ◽  
pp. 472
Author(s):  
Tyler C. Beck ◽  
Kyle R. Beck ◽  
Jordan Morningstar ◽  
Menny M. Benjamin ◽  
Russell A. Norris

Roughly 2.8% of annual hospitalizations in the United States are a result of adverse drug interactions, representing more than 245,000 hospitalizations. Drug–drug interactions commonly arise from major cytochrome P450 (CYP) inhibition. Various approaches are routinely employed to reduce the incidence of adverse interactions, such as altering drug dosing schemes and/or minimizing the number of drugs prescribed; however, often, a reduction in the number of medications cannot be achieved without impacting therapeutic outcomes. Nearly 80% of drugs fail in development due to pharmacokinetic issues, outlining the importance of examining cytochrome interactions during preclinical drug design. In this review, we examined the physiochemical and structural properties of small molecule inhibitors of CYPs 3A4, 2D6, 2C19, 2C9, and 1A2. Although CYP inhibitors tend to have distinct physiochemical properties and structural features, these descriptors alone are insufficient to predict major cytochrome inhibition probability and affinity. Machine learning-based in silico approaches may be employed as a more robust and accurate way of predicting CYP inhibition. These various approaches are highlighted in the review.

