Estimating Heterogeneous Treatment Effects and the Effects of Heterogeneous Treatments with Ensemble Methods

2017 ◽  
Vol 25 (4) ◽  
pp. 413-434 ◽  
Author(s):  
Justin Grimmer ◽  
Solomon Messing ◽  
Sean J. Westwood

Randomized experiments are increasingly used to study political phenomena because they can credibly estimate the average effect of a treatment on a population of interest. But political scientists are often interested in how effects vary across subpopulations (heterogeneous treatment effects) and how differences in the content of the treatment affect responses (the response to heterogeneous treatments). Several new methods have been introduced to estimate heterogeneous effects, but it is difficult to know if a method will perform well for a particular data set. Rather than using only one method, we show how an ensemble of methods (weighted averages of estimates from individual models, increasingly used in machine learning) accurately measures heterogeneous effects. Building on a large literature on ensemble methods, we show how the weighting of methods can contribute to accurate estimation of heterogeneous treatment effects and demonstrate how pooling models leads to performance superior to that of individual methods across diverse problems. We apply the ensemble method to two experiments, illuminating how the ensemble method for heterogeneous treatment effects facilitates exploratory analysis of treatment effects.
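The idea of a weighted average of estimates from individual models can be illustrated with a minimal sketch. The abstract does not state how the weights are chosen, so the inverse held-out error weighting below is an assumption made purely for illustration, and the model names and numbers are hypothetical placeholders rather than anything from the paper.

```python
def mse(pred, truth):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)


def inverse_mse_weights(heldout_preds, truth):
    """Weight each model by 1 / held-out MSE, normalized to sum to 1.

    Assumption: this simple scheme stands in for whatever weighting
    the ensemble actually uses; it is not taken from the paper.
    """
    inv = {name: 1.0 / max(mse(p, truth), 1e-12)
           for name, p in heldout_preds.items()}
    total = sum(inv.values())
    return {name: value / total for name, value in inv.items()}


def ensemble_estimate(estimates, weights):
    """Weighted average of per-model effect estimates."""
    return sum(weights[name] * estimates[name] for name in weights)


# Hypothetical held-out outcomes and per-model predictions (toy values).
truth = [1.0, 2.0, 3.0, 4.0]
heldout_preds = {
    "model_a": [1.1, 1.9, 3.2, 3.8],
    "model_b": [0.5, 2.5, 2.0, 5.0],
}
weights = inverse_mse_weights(heldout_preds, truth)

# Hypothetical effect estimates for one subgroup from each model.
effect_estimates = {"model_a": 0.30, "model_b": 0.10}
pooled = ensemble_estimate(effect_estimates, weights)
```

The model with the smaller held-out error receives most of the weight, so the pooled estimate sits close to that model's estimate while still drawing on the other.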

2018 ◽  
Author(s):  
Weijia Zhang ◽  
Thuc Le ◽  
Lin Liu ◽  
Jiuyong Li

Abstract: Estimating heterogeneous treatment effects is an important problem in many medical and biological applications since treatments may have different effects on the prognoses of different patients. Recently, several recursive partitioning methods have been proposed to identify subgroups that respond differently to a treatment, and they rely on a fitness criterion to minimize the error between the estimated treatment effects and the unobservable true effects. In this paper, we propose that a heterogeneity criterion, which maximizes the differences of treatment effects among the subgroups, also needs to be considered. Moreover, we show that better performance can be achieved when the fitness and the heterogeneity criteria are considered simultaneously. Selecting the optimal splitting points then becomes a multi-objective problem; however, a solution that is optimal in both respects is often not available. To solve this problem, we propose a multi-objective splitting procedure to balance both criteria. The proposed procedure is computationally efficient and fits naturally into the existing recursive partitioning framework. Experimental results show that the proposed multi-objective approach performs consistently better than existing ones.

Author summary: The effects of a treatment are often not the same for different individuals with different gene expressions. Learning to predict the heterogeneous treatment effects from clinical and expression data is an important step towards personalized medical treatment. Existing computational methods are not ideal for the task because they do not address the interpretability of the model and do not consider the limited sample sizes in biological and medical applications. Our method addresses these issues and achieves superior performance in analyzing the treatment effects of radiotherapy on breast cancer patients.
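Why a single split point is rarely optimal on both criteria can be seen in a small sketch. It assumes each candidate split has already been scored with a fitness error (lower is better) and a heterogeneity score (higher is better). The scores are hypothetical toy values, and the Pareto filter shown here only identifies the non-dominated candidates; it is not the paper's balancing procedure, which the abstract does not describe.

```python
def pareto_front(candidates):
    """Return the candidates not dominated on (fitness_error, heterogeneity).

    Each candidate is a tuple (split_point, fitness_error, heterogeneity).
    Candidate o dominates c if o is at least as good on both criteria
    and strictly better on at least one.
    """
    front = []
    for c in candidates:
        dominated = any(
            o[1] <= c[1] and o[2] >= c[2] and (o[1] < c[1] or o[2] > c[2])
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return front


# Hypothetical scores for four candidate split points on one covariate.
candidates = [
    (0.2, 0.50, 1.0),
    (0.4, 0.30, 0.8),  # lowest fitness error
    (0.6, 0.35, 0.7),  # dominated by the split at 0.4
    (0.8, 0.60, 1.5),  # highest heterogeneity
]
front = pareto_front(candidates)
front_points = [c[0] for c in front]
```

In this toy case the split with the lowest error and the split with the highest heterogeneity are different points, so some rule for trading one criterion against the other is needed to pick among the non-dominated candidates.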


2011 ◽  
Vol 19 (1) ◽  
pp. 1-19 ◽  
Author(s):  
Kosuke Imai ◽  
Aaron Strauss

Although a growing number of political scientists are conducting randomized experiments, many of them only report the average treatment effects and do not systematically explore the variation in treatment effects across subpopulations. This is unfortunate from a scientific point of view because heterogeneous treatment effects can provide additional substantive insights. This current state of affairs is also problematic from a policy makers' perspective since such studies do not identify subgroups for which treatments are effective. In this paper, we propose a formal two-step framework that first identifies heterogeneous treatment effects from a randomized experiment and then uses this information to derive an optimal policy about which treatment should be given to whom. Our proposed method avoids the risk of false discoveries that are likely in post hoc subgroup analysis routinely conducted in the discipline. We discuss our methodology in the context of get-out-the-vote randomized field experiments and show how the proposed two-step framework can be applied in real-world settings.


2020 ◽  
Vol 70 (5) ◽  
pp. 1211-1230
Author(s):  
Abdus Saboor ◽  
Hassan S. Bakouch ◽  
Fernando A. Moala ◽  
Sheraz Hussain

Abstract: In this paper, a bivariate extension of the exponentiated Fréchet distribution is introduced, namely a bivariate exponentiated Fréchet (BvEF) distribution whose marginals are univariate exponentiated Fréchet distributions. Several properties of the proposed distribution are discussed, such as the joint survival function, joint probability density function, marginal probability density function, conditional probability density function, moments, and marginal and bivariate moment generating functions. Moreover, the proposed distribution is obtained by the Marshall-Olkin survival copula. Estimation of the parameters is investigated by maximum likelihood with the observed information matrix. In addition to the maximum likelihood estimation method, we consider Bayesian inference and least squares estimation and compare these three methodologies for the BvEF. A simulation study is carried out to compare the performance of the estimators obtained by the presented estimation methods. The proposed bivariate distribution, along with other related bivariate distributions, is fitted to a real-life paired data set. It is shown that the BvEF distribution has superior performance among the compared distributions according to several goodness-of-fit tests.
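The copula construction mentioned in the abstract can be sketched as follows. The Marshall-Olkin copula is min(u^(1-a) v, u v^(1-b)) with 0 <= a, b <= 1, and a joint survival function is obtained by applying it to the two marginal survival functions. The exponentiated Fréchet survival function used below, (1 - exp(-(sigma/x)^lam))^alpha, is one commonly used parameterization and is an assumption here; the paper's exact parameterization and the link between its parameters and a, b are not given in the abstract.

```python
import math


def mo_survival_copula(u, v, a, b):
    """Marshall-Olkin copula min(u**(1-a) * v, u * v**(1-b)), 0 <= a, b <= 1."""
    return min(u ** (1.0 - a) * v, u * v ** (1.0 - b))


def ef_survival(x, alpha, lam, sigma):
    """Assumed exponentiated Frechet survival function for x > 0.

    S(x) = (1 - exp(-(sigma / x) ** lam)) ** alpha; this parameterization
    is an illustrative assumption, not taken from the paper.
    """
    return (1.0 - math.exp(-((sigma / x) ** lam))) ** alpha


def joint_survival(x, y, marg_x, marg_y, a, b):
    """P(X > x, Y > y): the copula applied to the marginal survival values."""
    return mo_survival_copula(
        ef_survival(x, *marg_x), ef_survival(y, *marg_y), a, b
    )


# Hypothetical parameter values (alpha, lam, sigma) for each marginal.
marg_x = (2.0, 1.5, 1.0)
marg_y = (3.0, 1.5, 1.0)
s_xy = joint_survival(1.2, 0.9, marg_x, marg_y, a=0.5, b=0.5)
```

Setting a = b = 0 reduces the copula to the product u * v (independence), while a = b = 1 gives min(u, v), the strongest positive dependence the family allows.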


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Philipp Rentzsch ◽  
Max Schubach ◽  
Jay Shendure ◽  
Martin Kircher

Abstract
Background: Splicing of genomic exons into mRNAs is a critical prerequisite for the accurate synthesis of human proteins. Genetic variants impacting splicing underlie a substantial proportion of genetic disease, but are challenging to identify beyond those occurring at donor and acceptor dinucleotides. To address this, various methods aim to predict variant effects on splicing. Recently, deep neural networks (DNNs) have been shown to achieve better results in predicting splice variants than other strategies.
Methods: It has been unclear how best to integrate such process-specific scores into genome-wide variant effect predictors. Here, we use a recently published experimental data set to compare several machine learning methods that score variant effects on splicing. We integrate the best of those approaches into general variant effect prediction models and observe the effect on classification of known pathogenic variants.
Results: We integrate two specialized splicing scores into CADD (Combined Annotation Dependent Depletion; cadd.gs.washington.edu), a widely used tool for genome-wide variant effect prediction that we previously developed to weight and integrate diverse collections of genomic annotations. With this new model, CADD-Splice, we show that inclusion of splicing DNN effect scores substantially improves predictions across multiple variant categories, without compromising overall performance.
Conclusions: While splice effect scores show superior performance on splice variants, specialized predictors cannot compete with other variant scores in general variant interpretation, as the latter account for nonsense and missense effects that do not alter splicing. Although only shown here for splice scores, we believe that the applied approach will generalize to other specific molecular processes, providing a path for the further improvement of genome-wide variant effect prediction.


Electronics ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 1285
Author(s):  
Mohammed Al-Sarem ◽  
Faisal Saeed ◽  
Zeyad Ghaleb Al-Mekhlafi ◽  
Badiea Abdulkarem Mohammed ◽  
Tawfik Al-Hadhrami ◽  
...  

Security attacks on legitimate websites to steal users’ information, known as phishing attacks, have been increasing. This kind of attack does not just affect individuals’ or organisations’ websites. Although several detection methods for phishing websites have been proposed using machine learning, deep learning, and other approaches, their detection accuracy still needs to be enhanced. This paper proposes an optimized stacking ensemble method for phishing website detection. The optimisation was carried out using a genetic algorithm (GA) to tune the parameters of several ensemble machine learning methods, including random forests, AdaBoost, XGBoost, Bagging, GradientBoost, and LightGBM. The optimized classifiers were then ranked, and the best three models were chosen as base classifiers of a stacking ensemble method. The experiments were conducted on three phishing website datasets that consisted of both phishing websites and legitimate websites: the Phishing Websites Data Set from UCI (Dataset 1); Phishing Dataset for Machine Learning from Mendeley (Dataset 2); and Datasets for Phishing Websites Detection from Mendeley (Dataset 3). The experimental results showed an improvement using the optimized stacking ensemble method, where the detection accuracy reached 97.16%, 98.58%, and 97.39% for Dataset 1, Dataset 2, and Dataset 3, respectively.
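The rank-then-stack step described above can be sketched in a few lines. This is a simplified illustration under several assumptions: the GA tuning is omitted, the validation scores and base-model predictions are made-up toy values (not results from the paper), and the meta-learner is a small logistic regression, since the abstract does not name the one actually used.

```python
import math


def top_k_models(scores, k=3):
    """Names of the k models with the highest validation score."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def fit_meta_logistic(meta_x, y, lr=0.5, epochs=2000):
    """Fit a logistic-regression meta-learner by batch gradient descent.

    meta_x holds one row per sample; each row contains the selected
    base models' held-out predictions for that sample.
    """
    n, d = len(y), len(meta_x[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(meta_x, y):
            z = b + sum(wi * xi for wi, xi in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - target
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_meta(w, b, row):
    """Final class: 1 (phishing) if the meta-learner's score is non-negative."""
    return 1 if b + sum(wi * xi for wi, xi in zip(w, row)) >= 0 else 0


# Toy validation scores for the six tuned classifiers (placeholders only).
scores = {"random_forest": 0.70, "adaboost": 0.60, "xgboost": 0.90,
          "bagging": 0.65, "gradientboost": 0.75, "lightgbm": 0.80}
selected = top_k_models(scores, k=3)

# Toy held-out predictions of the three selected models, one row per site.
meta_x = [[1, 1, 0], [1, 0, 1], [0, 0, 1], [0, 1, 0], [1, 1, 1], [0, 0, 0]]
labels = [1, 1, 0, 0, 1, 0]
w, b = fit_meta_logistic(meta_x, labels)
stacked = [predict_meta(w, b, row) for row in meta_x]
```

The meta-learner is trained on the base models' predictions rather than on the raw features, which is what distinguishes stacking from simple voting among the three selected classifiers.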

