Leveraging Machine Learning to Estimate Effect Modification

2021 ◽  
Author(s):  
Ang Yu ◽  
Chan Park ◽  
Hyunseng Kang ◽  
Jason Fletcher

Sociologists are often interested in estimating and testing whether some causal effect varies by a modifier of interest. The conventional regression estimator for effect modification is inflexible in functional form and prone to misspecification bias. Machine learning (ML) algorithms can aid the estimation of effect modification in observational studies by controlling for confounders in a highly flexible, automated, yet principled way. Leveraging ML for effect modification therefore helps reduce misspecification bias and enhances the credibility of causal identification. We introduce a novel estimator that estimates effect modification in a familiar regression framework after using ML algorithms to fit nuisance components of the model. We show that this estimator is more flexible than the conventional regression model, while being more efficient and more suitable for theory-driven sociological research than other ML-based methods. We use the new estimator to study how the effect of a college degree on adult family income in the United States is modified by gender and by family income in adolescence. Along these two dimensions, the benefits of a college degree are rather equally distributed.
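For orientation, the conventional regression estimator that the abstract contrasts against is usually written with a treatment-by-modifier interaction term. The notation below is an assumption made for illustration, since the abstract defines no symbols: $Y_i$ is the outcome, $D_i$ the treatment, $M_i$ the modifier, and $X_i$ the observed confounders. This is the textbook baseline, not the authors' new estimator.

```latex
Y_i = \beta_0 + \beta_1 D_i + \beta_2 M_i + \beta_3 \,(D_i \times M_i) + \gamma^{\top} X_i + \varepsilon_i
```

In this specification $\beta_3$ measures effect modification: it is the difference in the effect of $D_i$ per unit change in $M_i$. The linear, additive adjustment term $\gamma^{\top} X_i$ is the part whose functional form can be misspecified.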

Author(s):  
Navid Asadizanjani ◽  
Sachin Gattigowda ◽  
Mark Tehranipoor ◽  
Domenic Forte ◽  
Nathan Dunn

Counterfeiting is an increasing concern for businesses and governments as greater numbers of counterfeit integrated circuits (ICs) infiltrate the global market. There is an ongoing effort in experimental and national labs inside the United States to detect and prevent such counterfeits as quickly and efficiently as possible. However, a means of automatically detecting counterfeit ICs and properly keeping a record of those detected is still missing. Here, we introduce a web application that allows users to share previous examples of counterfeits through an online database and to obtain statistics regarding the prevalence of known defects. We also investigate automated techniques based on image processing and machine learning to detect different physical defects and to determine whether or not an IC is counterfeit.


2020 ◽  
Author(s):  
Carson Lam ◽  
Jacob Calvert ◽  
Gina Barnes ◽  
Emily Pellegrini ◽  
Anna Lynn-Palevsky ◽  
...  

BACKGROUND In the wake of COVID-19, the United States has developed a three-stage plan to outline the parameters to determine when states may reopen businesses and ease travel restrictions. The guidelines also identify subpopulations of Americans that should continue to stay at home due to being at high risk for severe disease should they contract COVID-19. These guidelines were based on population-level demographics, rather than individual-level risk factors. As such, they may misidentify individuals who are at high risk for severe illness and who should therefore not return to work until vaccination or widespread serological testing is available. OBJECTIVE This study evaluated a machine learning algorithm for the prediction of serious illness due to COVID-19 using inpatient data collected from electronic health records. METHODS The algorithm was trained to identify patients for whom a diagnosis of COVID-19 was likely to result in hospitalization, and compared against four U.S. policy-based criteria: age over 65, having a serious underlying health condition, age over 65 or having a serious underlying health condition, and age over 65 and having a serious underlying health condition. RESULTS This algorithm identified 80% of patients at risk for hospitalization due to COVID-19, versus at most 62% identified by government guidelines. The algorithm also achieved a high specificity of 95%, outperforming government guidelines. CONCLUSIONS This algorithm may help to enable a broad reopening of the American economy while ensuring that patients at high risk for serious disease remain home until vaccination and testing become available.
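The four policy-based criteria in the METHODS section are simple boolean rules, so their sensitivity (share of hospitalized patients flagged) and specificity (share of non-hospitalized patients not flagged) can be computed directly. The sketch below is only an illustration of that comparison: the patient records are invented toy data, and the function and variable names are hypothetical, not taken from the study.

```python
# Toy records, invented for illustration (not data from the study):
# (age, has_serious_condition, was_hospitalized)
patients = [
    (72, True, True),
    (68, False, True),
    (45, True, True),
    (30, False, True),
    (70, False, False),
    (50, True, False),
    (25, False, False),
    (40, False, False),
]

# The four rule-based criteria named in the abstract, written as predicates.
criteria = {
    "age over 65": lambda age, cond: age > 65,
    "serious condition": lambda age, cond: cond,
    "age over 65 OR condition": lambda age, cond: age > 65 or cond,
    "age over 65 AND condition": lambda age, cond: age > 65 and cond,
}


def sensitivity_specificity(rule, records):
    """Return (sensitivity, specificity) of a boolean rule against outcomes."""
    tp = fn = tn = fp = 0
    for age, cond, hospitalized in records:
        flagged = rule(age, cond)
        if hospitalized and flagged:
            tp += 1
        elif hospitalized:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


results = {
    name: sensitivity_specificity(rule, patients)
    for name, rule in criteria.items()
}
for name, (sens, spec) in results.items():
    print(f"{name}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

On toy data like this, the OR rule trades specificity for sensitivity and the AND rule does the reverse, which is the trade-off a learned risk score is meant to improve on.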


Author(s):  
Timnit Gebru

This chapter discusses the role of race and gender in artificial intelligence (AI). The rapid permeation of AI into society has not been accompanied by a thorough investigation of the sociopolitical issues that cause certain groups of people to be harmed rather than advantaged by it. For instance, recent studies have shown that commercial automated facial analysis systems have much higher error rates for dark-skinned women, while having minimal errors on light-skinned men. Moreover, a 2016 ProPublica investigation uncovered that machine learning–based tools that assess crime recidivism rates in the United States are biased against African Americans. Other studies show that natural language–processing tools trained on news articles exhibit societal biases. While many technical solutions have been proposed to alleviate bias in machine learning systems, a holistic and multifaceted approach must be taken. This includes standardization bodies determining what types of systems can be used in which scenarios, making sure that automated decision tools are created by people from diverse backgrounds, and understanding the historical and political factors that disadvantage certain groups who are subjected to these tools.


2021 ◽  
Vol 14 (5) ◽  
pp. 472
Author(s):  
Tyler C. Beck ◽  
Kyle R. Beck ◽  
Jordan Morningstar ◽  
Menny M. Benjamin ◽  
Russell A. Norris

Roughly 2.8% of annual hospitalizations are a result of adverse drug interactions in the United States, representing more than 245,000 hospitalizations. Drug–drug interactions commonly arise from major cytochrome P450 (CYP) inhibition. Various approaches are routinely employed in order to reduce the incidence of adverse interactions, such as altering drug dosing schemes and/or minimizing the number of drugs prescribed; however, often, a reduction in the number of medications cannot be achieved without impacting therapeutic outcomes. Nearly 80% of drugs fail in development due to pharmacokinetic issues, outlining the importance of examining cytochrome interactions during preclinical drug design. In this review, we examined the physiochemical and structural properties of small molecule inhibitors of CYPs 3A4, 2D6, 2C19, 2C9, and 1A2. Although CYP inhibitors tend to have distinct physiochemical properties and structural features, these descriptors alone are insufficient to predict major cytochrome inhibition probability and affinity. Machine learning based in silico approaches may be employed as a more robust and accurate way of predicting CYP inhibition. These various approaches are highlighted in the review.


Author(s):  
Leah H. Schinasi ◽  
Helen V. S. Cole ◽  
Jana A. Hirsch ◽  
Ghassan B. Hamra ◽  
Pedro Gullon ◽  
...  

Neighborhood greenspace may attract new residents and lead to sociodemographic or housing cost changes. We estimated relationships between greenspace and gentrification-related changes in the 43 largest metropolitan statistical areas (MSAs) of the United States (US). We used the US National Land Cover and Brown University Longitudinal Tracts databases, as well as spatial lag models, to estimate census tract-level associations between percentage greenspace (years 1990, 2000) and subsequent changes (1990–2000, 2000–2010) in percentage college-educated, percentage working professional jobs, race/ethnic composition, household income, percentage living in poverty, household rent, and home value. We also investigated effect modification by racial/ethnic composition. We ran models for each MSA and time period and used random-effects meta-analyses to derive summary estimates for each period. Estimates were modest in magnitude and heterogeneous across MSAs. After adjusting for census-tract level population density in 1990, compared to tracts with low percentage greenspace in 1992 (defined as ≤50th percentile of the MSA-specific distribution in 1992), those with high percentage greenspace (defined as >75th percentile of the MSA-specific distribution) experienced higher 1990–2000 increases in percentage of the employed civilian aged 16+ population working professional jobs (β: 0.18, 95% confidence interval (CI): 0.11, 0.26) and in median household income (β: 0.23, 95% CI: 0.15, 0.31). Adjusted estimates for the 2000–2010 period were near the null. We did not observe evidence of effect modification by race/ethnic composition. We observed evidence of modest associations between greenspace and gentrification trends. Further research is needed to explore reasons for heterogeneity and to quantify health implications.
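The pooling step described above, in which per-MSA estimates are combined with a random-effects meta-analysis, can be sketched as follows. The abstract does not say which between-study variance estimator was used; the DerSimonian-Laird estimator here is an assumption, and the input numbers are invented toy values rather than results from the study.

```python
import math


def dersimonian_laird(estimates, std_errors):
    """Pool estimates with a DerSimonian-Laird random-effects model.

    Returns (pooled_estimate, pooled_standard_error, tau_squared).
    """
    k = len(estimates)
    # Inverse-variance (fixed-effect) weights.
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q measures heterogeneity around the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w

    # Method-of-moments between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum_w_star
    pooled_se = math.sqrt(1.0 / sum_w_star)
    return pooled, pooled_se, tau2


# Hypothetical per-MSA coefficients and standard errors (toy values).
betas = [0.10, 0.50]
ses = [0.10, 0.10]
pooled, pooled_se, tau2 = dersimonian_laird(betas, ses)
ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
print(f"pooled={pooled:.3f}, 95% CI=({ci[0]:.3f}, {ci[1]:.3f}), tau^2={tau2:.3f}")
```

When the per-MSA estimates disagree by more than their standard errors suggest, `tau2` becomes positive and the pooled confidence interval widens, which is how heterogeneity across MSAs shows up in the summary estimate.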


2020 ◽  
Vol 8 (1) ◽  
pp. 54-69
Author(s):  
Peter B. Gilbert ◽  
Bryan S. Blette ◽  
Bryan E. Shepherd ◽  
Michael G. Hudgens

While the HVTN 505 trial showed no overall efficacy of the tested vaccine to prevent HIV infection over placebo, markers measuring immune response to vaccination were strongly correlated with infection. This finding generated the hypothesis that some marker-defined vaccinated subgroups were partially protected whereas others had their risk increased. This hypothesis can be assessed using the principal stratification framework (Frangakis and Rubin, 2002) for studying treatment effect modification by an intermediate response variable, using methods in the sub-field of principal surrogate (PS) analysis that studies multiple principal strata. Unfortunately, available methods for PS analysis require an augmented study design not available in HVTN 505, and make untestable structural risk assumptions, motivating a need for more robust PS methods. Fortunately, another sub-field of principal stratification, survivor average causal effect (SACE) analysis (Rubin, 2006), which studies effects in a single principal stratum, provides many methods not requiring an augmented design and making fewer assumptions. We show how, for a binary intermediate response variable, methods developed for SACE analysis can be adapted to PS analysis, providing new and more robust PS methods. Application to HVTN 505 supports that the vaccine partially protected individuals with vaccine-induced T-cells expressing certain combinations of functions.


Agronomy ◽  
2020 ◽  
Vol 11 (1) ◽  
pp. 35
Author(s):  
Xiaodong Huang ◽  
Beth Ziniti ◽  
Michael H. Cosh ◽  
Michele Reba ◽  
Jinfei Wang ◽  
...  

Soil moisture is a key indicator for assessing cropland drought and irrigation status as well as for forecasting production. Compared with optical data, which are obscured by the crop canopy cover, Synthetic Aperture Radar (SAR) is an efficient tool for detecting the surface soil moisture under the vegetation cover due to its strong penetration capability. This paper studies soil moisture retrieval using the L-band polarimetric Phased Array-type L-band SAR 2 (PALSAR-2) data acquired over the study region in Arkansas in the United States. Both two-component model-based decomposition (SAR data alone) and machine learning (SAR + optical indices) methods are tested and compared in this paper. Validation using independent ground measurements shows that both methods achieved a Root Mean Square Error (RMSE) of less than 10 (vol.%), while the machine learning methods outperform the model-based decomposition, achieving an RMSE of 7.70 (vol.%) and R² of 0.60.
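The two validation metrics quoted above can be computed from paired ground measurements and retrievals as in the sketch below. The numbers are invented toy values in vol.%, and R² is taken here to be the coefficient of determination; the abstract does not say whether it reports that or a squared correlation, so that choice is an assumption.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observations and predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy soil moisture values in vol.% (invented, not from the study).
ground = [10.0, 20.0, 30.0, 40.0]
retrieved = [12.0, 18.0, 33.0, 37.0]

print(f"RMSE = {rmse(ground, retrieved):.2f} vol.%")
print(f"R^2  = {r_squared(ground, retrieved):.3f}")
```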


2020 ◽  
Vol 10 (01) ◽  
pp. e97-e103
Author(s):  
Irene Rethemiotaki

Attention-deficit hyperactivity disorder (ADHD) is an increasingly recognized chronic neurodevelopmental disorder. This work aims at studying the prevalence and clinical characteristics of children with ADHD in the United States in the period between 2009 and 2018. Data from the National Health Interview Survey were analyzed by univariate and multivariate statistics to assess the role of socioeconomic factors in the development of ADHD. A total of 615,608 children were studied, 51.2% male and 48.7% female. The prevalence of ADHD was 9.13%, with males predominating over females. The number of children with ADHD increased from 2009 to 2018 by 14.8%. As specified by multiple logistic regression analysis, males (odds ratio [OR] 2.38) who have neither mother nor father (OR 1.76) are twice as likely to have ADHD compared with their peers. In addition, family income (OR 1.40) and parent's education (OR 1.12) were significantly associated with ADHD. The results highlight the significance of deprivation of both family and financial comfort as primary indicators for ADHD in children. Moreover, children with ADHD were more likely to be males in the age group of 12 to 17.


2020 ◽  
Vol 41 (S1) ◽  
pp. s521-s522
Author(s):  
Debarka Sengupta ◽  
Vaibhav Singh ◽  
Seema Singh ◽  
Dinesh Tewari ◽  
Mudit Kapoor ◽  
...  

Background: The rising trend of antibiotic resistance imposes a heavy burden on healthcare both clinically and economically (US$55 billion), with 23,000 estimated annual deaths in the United States as well as increased length of stay and morbidity. Machine-learning–based methods have, of late, been used for leveraging patients’ clinical history and demographic information to predict antimicrobial resistance. We developed a machine-learning model ensemble that maximizes the accuracy of such a drug-sensitivity versus resistivity classification system compared to the existing best-practice methods. Methods: We first performed a comprehensive analysis of the association between infecting bacterial species and patient factors, including patient demographics, comorbidities, and certain healthcare-specific features. We leveraged the predictable nature of these complex associations to infer patient-specific antibiotic sensitivities. Various base learners, including k-NN (k-nearest neighbors) and gradient boosting machine (GBM), were used to train an ensemble model for confident prediction of antimicrobial susceptibilities. Base learner selection and model performance evaluation were performed carefully using a variety of standard metrics, namely accuracy, precision, recall, F1 score, and Cohen κ. Results: We validated performance on the MIMIC-III database, which holds deidentified clinical data from 53,423 distinct patient admissions between 2001 and 2012 to the intensive care units (ICUs) of the Beth Israel Deaconess Medical Center in Boston, Massachusetts. From ~11,000 positive cultures, we used 4 major specimen types, namely urine, sputum, blood, and pus swab, for evaluation of the model performance. Figure 1 shows the receiver operating characteristic (ROC) curves obtained for bloodstream infection cases upon model building and prediction on a 70:30 split of the data. We obtained area under the curve (AUC) values of 0.88, 0.92, 0.92, and 0.94 for urine, sputum, blood, and pus swab samples, respectively. Figure 2 shows the comparative performance of our proposed method as well as some off-the-shelf classification algorithms. Conclusions: Highly accurate, patient-specific predictive antibiogram (PSPA) data can aid clinicians significantly in antibiotic recommendation in the ICU, thereby accelerating patient recovery and curbing antimicrobial resistance. Funding: This study was supported by Circle of Life Healthcare Pvt. Ltd. Disclosures: None
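The evaluation metrics listed in the Methods (accuracy, precision, recall, F1 score, and Cohen κ) all derive from the same binary confusion matrix. The sketch below shows the standard definitions on invented toy labels, where 1 stands for "resistant" and 0 for "sensitive"; the labels and the function name are hypothetical and are not taken from the study.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, F1 and Cohen's kappa (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    n = tp + fp + fn + tn

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)

    # Agreement expected by chance, from the marginal label frequencies.
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_expected) / (1 - p_expected)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
    }


# Toy labels (invented): 1 = resistant, 0 = sensitive.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Cohen κ is the most conservative of the five because it discounts the agreement a classifier would reach by chance given the class balance, which matters when resistant isolates are a minority.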


Information ◽  
2020 ◽  
Vol 11 (6) ◽  
pp. 314 ◽  
Author(s):  
Jim Samuel ◽  
G. G. Md. Nawaz Ali ◽  
Md. Mokhlesur Rahman ◽  
Ek Esawi ◽  
Yana Samuel

Along with the Coronavirus pandemic, another crisis has manifested itself in the form of mass fear and panic phenomena, fueled by incomplete and often inaccurate information. There is therefore a tremendous need to address and better understand COVID-19’s informational crisis and gauge public sentiment, so that appropriate messaging and policy decisions can be implemented. In this research article, we identify public sentiment associated with the pandemic using Coronavirus specific Tweets and R statistical software, along with its sentiment analysis packages. We demonstrate insights into the progress of fear-sentiment over time as COVID-19 approached peak levels in the United States, using descriptive textual analytics supported by necessary textual data visualizations. Furthermore, we provide a methodological overview of two essential machine learning (ML) classification methods, in the context of textual analytics, and compare their effectiveness in classifying Coronavirus Tweets of varying lengths. We observe a strong classification accuracy of 91% for short Tweets, with the Naïve Bayes method. We also observe that the logistic regression classification method provides a reasonable accuracy of 74% with shorter Tweets, and both methods showed relatively weaker performance for longer Tweets. This research provides insights into Coronavirus fear sentiment progression, and outlines associated methods, implications, limitations and opportunities.
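To make the Naïve Bayes method mentioned above concrete, here is a minimal multinomial Naïve Bayes text classifier with Laplace (add-one) smoothing. It is a sketch under stated assumptions, not the authors' pipeline: the study used R and its sentiment packages, whereas this example uses plain Python, whitespace tokenization, and four invented toy messages; all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log posterior for the given text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        # Log prior for the class.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            # Laplace-smoothed log likelihood of the token given the class.
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training messages (invented for illustration).
docs = [
    "scared of the virus",
    "panic and fear everywhere",
    "feeling calm today",
    "hopeful and calm about recovery",
]
labels = ["fear", "fear", "calm", "calm"]

model = train_naive_bayes(docs, labels)
print(predict(model, "so much fear and panic"))
print(predict(model, "calm and hopeful"))
```

Each token in a message contributes one independent likelihood term to the score, so message length directly changes how much evidence the classifier combines.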

