G-computation and machine learning for estimating the causal effects of binary exposure statuses on binary outcomes

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Florent Le Borgne ◽  
Arthur Chatton ◽  
Maxime Léger ◽  
Rémi Lenain ◽  
Yohann Foucher

Abstract: In clinical research, there is growing interest in the use of propensity score-based methods to estimate causal effects. G-computation is an alternative because of its high statistical power. Machine learning is also increasingly used because of its possible robustness to model misspecification. In this paper, we aimed to propose an approach that combines machine learning and G-computation when both the outcome and the exposure status are binary, and that is able to deal with small samples. We evaluated the performance of several methods through simulations, including penalized logistic regressions, a neural network, a support vector machine, boosted classification and regression trees, and a super learner. We proposed six different scenarios characterised by various sample sizes, numbers of covariates, and relationships between covariates, exposure statuses, and outcomes. We also illustrated the application of these methods by using them to estimate the efficacy of barbiturates prescribed during the first 24 h of an episode of intracranial hypertension. In the context of G-computation, for estimating the individual outcome probabilities in the two counterfactual worlds, we found that the super learner tended to outperform the other approaches in terms of both bias and variance, especially for small sample sizes. The support vector machine also performed well, but its mean bias was slightly higher than that of the super learner. In the investigated scenarios, G-computation combined with the super learner was a performant method for drawing causal inferences, even from small sample sizes.
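The G-computation recipe the abstract describes — fit an outcome model, predict each subject's outcome probability in the two counterfactual worlds, then average the contrast — can be sketched as follows. This is a minimal illustration on simulated data, with a plain logistic regression standing in for the super learner; all variable names and the data-generating model are ours, not the authors'.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Simulated data: one confounder L, binary exposure A, binary outcome Y
n = 500
L = rng.normal(size=n)
A = rng.binomial(1, 1 / (1 + np.exp(-L)))              # exposure depends on L
Y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * A + L))))  # outcome depends on A and L

# Step 1: fit an outcome model Q(A, L) ~ P(Y = 1 | A, L)
X = np.column_stack([A, L])
model = LogisticRegression().fit(X, Y)

# Step 2: predict each subject's outcome in the two counterfactual worlds
X1 = np.column_stack([np.ones(n), L])   # everyone exposed
X0 = np.column_stack([np.zeros(n), L])  # everyone unexposed
p1 = model.predict_proba(X1)[:, 1]
p0 = model.predict_proba(X0)[:, 1]

# Step 3: marginal causal contrast (here, the risk difference)
ate = p1.mean() - p0.mean()
print(round(ate, 3))
```

In the paper's approach, the logistic regression in step 1 would be replaced by the super learner, with the rest of the procedure unchanged.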

2016 ◽  
Vol 41 (5) ◽  
pp. 472-505 ◽  
Author(s):  
Elizabeth Tipton ◽  
Kelly Hallberg ◽  
Larry V. Hedges ◽  
Wendy Chan

Background: Policy makers and researchers are frequently interested in understanding how effective a particular intervention may be for a specific population. One approach is to assess the degree of similarity between the sample in an experiment and the population. Another approach is to combine information from the experiment and the population to estimate the population average treatment effect (PATE). Method: Several methods for assessing the similarity between a sample and a population currently exist, as well as methods for estimating the PATE. In this article, we investigate the properties of six of these methods and statistics in the small sample sizes common in education research (i.e., 10–70 sites), evaluating the utility of rules of thumb developed from observational studies in the generalization case. Result: In small random samples, large differences between the sample and population can arise simply by chance, and many of the statistics commonly used in generalization are a function of both the sample size and the number of covariates being compared. The rules of thumb developed in observational studies (which are commonly applied in generalization) are much too conservative given the small sample sizes found in generalization. Conclusion: This article implies that sharp inferences to large populations from small experiments are difficult even with probability sampling. Features of random samples should be kept in mind when evaluating the extent to which results from experiments conducted on nonrandom samples might generalize.
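One common ingredient of such sample-to-population similarity assessments is the standardized mean difference (SMD) on each covariate. The sketch below uses illustrative data (not the article's) to show the point about chance variation: even a truly random sample of 30 sites drifts from its population on some covariates, and the drift grows as more covariates are compared.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data: a population of 5,000 sites measured on 4 covariates,
# and a genuinely random sample of 30 of them
population = rng.normal(loc=0.0, scale=1.0, size=(5000, 4))
sample = population[rng.choice(5000, size=30, replace=False)]

# Standardized mean difference per covariate:
# (sample mean - population mean) / population standard deviation
smd = (sample.mean(axis=0) - population.mean(axis=0)) / population.std(axis=0)
print(np.round(np.abs(smd), 2))
```

At n = 30 the sampling standard error of each SMD is roughly 1/sqrt(30) ≈ 0.18, so individual |SMD| values can exceed observational-study rules of thumb (such as 0.25) purely by chance.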


2016 ◽  
Vol 2 (1) ◽  
pp. 41-54
Author(s):  
Ashleigh Saunders ◽  
Karen E. Waldie

Purpose – Autism spectrum disorder (ASD) is a lifelong neurodevelopmental condition for which there is no known cure. The rate of psychiatric comorbidity in autism is extremely high, which raises questions about the nature of the co-occurring symptoms. It is unclear whether these additional conditions are true comorbid conditions or can simply be accounted for through the ASD diagnosis. The paper aims to discuss this issue.

Design/methodology/approach – A number of questionnaires and a computer-based task were used in the current study. The authors asked the participants about symptoms of ASD, attention deficit hyperactivity disorder (ADHD) and anxiety, as well as overall adaptive functioning.

Findings – The results demonstrate that each condition, in its pure form, can be clearly differentiated from the others (and from neurotypical controls). Further analyses revealed that when ASD occurs together with anxiety, anxiety appears to be a separate condition. In contrast, there is no clear behavioural profile when ASD and ADHD co-occur.

Research limitations/implications – First, due to small sample sizes, some analyses were targeted to specific groups (i.e. comparing ADHD and ASD to comorbid ADHD+ASD). Larger sample sizes would have given the statistical power to perform a full-scale comparative analysis of all experimental groups when split by their comorbid conditions. Second, males were over-represented in the ASD group and females were over-represented in the anxiety group, owing to the uneven gender balance in the prevalence of these conditions. Lastly, the main profiling techniques used were questionnaires. Clinical interviews would have been preferable, as they give a more objective account of behavioural difficulties.

Practical implications – The rate of psychiatric comorbidity in autism is extremely high, which raises questions about the nature of the co-occurring symptoms. It is unclear whether these additional conditions are true comorbid conditions or can simply be accounted for through the ASD diagnosis.

Social implications – This information will be important not only to healthcare practitioners when administering a diagnosis, but also to therapists who need to apply evidence-based treatment to comorbid and stand-alone conditions.

Originality/value – This study is the first to investigate the nature of co-existing conditions in ASD in a New Zealand population.


2005 ◽  
Vol 28 (3) ◽  
pp. 283-294 ◽  
Author(s):  
Jin-Shei Lai ◽  
Jeanne Teresi ◽  
Richard Gershon

An item with differential item functioning (DIF) displays different statistical properties, conditional on a matching variable. The presence of DIF in measures can invalidate the conclusions of medical outcome studies. Numerous approaches have been developed to examine DIF in many areas, including education and health-related quality of life. There is little consensus in the research community regarding selection of one best method, and most methods require large sample sizes. This article describes some approaches to examine DIF with small samples (e.g., less than 200).
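One widely used DIF procedure that remains feasible in smaller samples is the Mantel-Haenszel approach, which compares item performance between groups within strata matched on total score. The article surveys several methods; the sketch below shows only this one standard example, with made-up counts.

```python
import numpy as np

def mantel_haenszel_or(tables):
    """Mantel-Haenszel common odds ratio across 2x2 strata.

    Each table is ((a, b), (c, d)): rows = reference/focal group,
    columns = item correct/incorrect, one table per matched score level.
    An odds ratio near 1 suggests no DIF; values far from 1 flag the item.
    """
    num = sum(a * d / (a + b + c + d) for (a, b), (c, d) in tables)
    den = sum(b * c / (a + b + c + d) for (a, b), (c, d) in tables)
    return num / den

# Illustrative strata (matched on total test score); counts are invented
tables = [((20, 10), (15, 15)),
          ((25, 5), (20, 10)),
          ((28, 2), (27, 3))]
print(round(mantel_haenszel_or(tables), 2))
```

Here the common odds ratio is well above 1, suggesting the item favors the reference group at every matched score level.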


1988 ◽  
Vol 13 (3) ◽  
pp. 142-146 ◽  
Author(s):  
David A. Cole

In the area of severe-profound retardation, researchers are faced with small sample sizes. The question of statistical power is critical. In this article, three commonly used tests for treatment-control group differences are compared with respect to their relative power: the posttest-only approach, the change-score approach, and an analysis of covariance (ANCOVA) approach. In almost all cases, the ANCOVA approach is more powerful than the other two, even when very small samples are involved. Finally, a fourth approach involving ANCOVA plus alternate rank assignments is examined and found to be superior even to the ANCOVA approach, especially in small sample cases. Use of slightly more sophisticated statistics in small sample research is recommended.
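The comparison of the first three approaches can be reproduced in miniature with a Monte Carlo power simulation. The parameters below (10 subjects per group, a 0.8 SD treatment effect, a pre/post correlation of 0.6) are our own illustrative choices, not the article's design.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def simulate_power(n_per_group=10, effect=0.8, rho=0.6, reps=2000, alpha=0.05):
    """Monte Carlo power for three analyses of a pretest/posttest design."""
    hits = {"posttest": 0, "change": 0, "ancova": 0}
    for _ in range(reps):
        # Correlated pre/post scores; treatment shifts the posttest
        pre = rng.normal(size=2 * n_per_group)
        post = rho * pre + np.sqrt(1 - rho**2) * rng.normal(size=2 * n_per_group)
        group = np.repeat([0, 1], n_per_group)
        post = post + effect * group

        # Posttest-only: t test on the posttest scores
        if stats.ttest_ind(post[group == 1], post[group == 0]).pvalue < alpha:
            hits["posttest"] += 1
        # Change score: t test on post - pre
        change = post - pre
        if stats.ttest_ind(change[group == 1], change[group == 0]).pvalue < alpha:
            hits["change"] += 1
        # ANCOVA: regress post on group and pre, test the group coefficient
        X = np.column_stack([np.ones(2 * n_per_group), group, pre])
        beta, res, *_ = np.linalg.lstsq(X, post, rcond=None)
        df = 2 * n_per_group - 3
        sigma2 = res[0] / df
        cov = sigma2 * np.linalg.inv(X.T @ X)
        t = beta[1] / np.sqrt(cov[1, 1])
        if 2 * stats.t.sf(abs(t), df) < alpha:
            hits["ancova"] += 1
    return {k: v / reps for k, v in hits.items()}

print(simulate_power())
```

With settings like these, ANCOVA shows the highest power of the three, mirroring the article's conclusion, because conditioning on the pretest removes residual variance that the other two analyses leave in the error term.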


2019 ◽  
Author(s):  
Andrea Cardini ◽  
Paul O’Higgins ◽  
F. James Rohlf

Abstract: Using sampling experiments, we found that, when there are fewer groups than variables, between-groups PCA (bgPCA) may suggest surprisingly distinct differences among groups for data in which none exist. While apparently not noticed before, the reasons for this problem are easy to understand. A bgPCA captures the g-1 dimensions of variation among the g group means, but only a fraction of the ∑ni − g dimensions of within-group variation (ni are the sample sizes), when the number of variables, p, is greater than g-1. This introduces a distortion in the appearance of the bgPCA plots because the within-group variation will be underrepresented, unless the variables are sufficiently correlated so that the total variation can be accounted for with just g-1 dimensions. The effect is most obvious when sample sizes are small relative to the number of variables, because smaller samples spread out less, but the distortion is present even for large samples. Strong covariance among variables largely reduces the magnitude of the problem, because it effectively reduces the dimensionality of the data and thus enables a larger proportion of the within-group variation to be accounted for within the g-1-dimensional space of a bgPCA. The distortion will still be relevant, though its strength will vary from case to case depending on the structure of the data (p, g, covariances, etc.). These are important problems for a method mainly designed for the analysis of variation among groups when there are very large numbers of variables and relatively small samples. In such cases, users are likely to conclude that the groups they are comparing are much more distinct than they really are. Having many variables but just small sample sizes is a common problem in fields ranging from morphometrics (as in our examples) to molecular analyses.
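The distortion is easy to reproduce: generate data with no group structure at all, project it onto the principal axes of the group means, and the groups nevertheless appear separated. A minimal sketch, with illustrative sizes (g = 3 groups of 10, p = 100 variables) rather than the authors' datasets:

```python
import numpy as np

rng = np.random.default_rng(3)

# Data with NO real group structure: p variables, far more than g - 1
g, n_per_group, p = 3, 10, 100
X = rng.normal(size=(g * n_per_group, p))
labels = np.repeat(np.arange(g), n_per_group)

# between-groups PCA: principal axes of the group means, then project all data
means = np.stack([X[labels == k].mean(axis=0) for k in range(g)])
centered_means = means - X.mean(axis=0)
# the g means span at most g - 1 dimensions
_, _, vt = np.linalg.svd(centered_means, full_matrices=False)
scores = (X - X.mean(axis=0)) @ vt[: g - 1].T

# On the first bgPCA axis, between-group spread dwarfs within-group spread,
# even though every observation came from the same distribution
score1 = scores[:, 0]
group_means1 = np.array([score1[labels == k].mean() for k in range(g)])
between = group_means1.var()
within = np.mean([score1[labels == k].var() for k in range(g)])
print(between / within)
```

The inflated between/within ratio arises exactly as the abstract explains: the axes are chosen to separate the (chance) differences among group means, while most of the within-group variation lies in the p − (g − 1) discarded dimensions.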


2006 ◽  
Vol 361 (1475) ◽  
pp. 2023-2037 ◽  
Author(s):  
Thomas P Curtis ◽  
Ian M Head ◽  
Mary Lunn ◽  
Stephen Woodcock ◽  
Patrick D Schloss ◽  
...  

The extent of microbial diversity is an intrinsically fascinating subject of profound practical importance. The term ‘diversity’ may allude to the number of taxa or species richness as well as their relative abundance. There is uncertainty about both, primarily because sample sizes are too small. Non-parametric diversity estimators make gross underestimates if used with small sample sizes on unevenly distributed communities. One can make richness estimates over many scales using small samples by assuming a species/taxa-abundance distribution. However, no one knows what the underlying taxa-abundance distributions are for bacterial communities. Latterly, diversity has been estimated by fitting data from gene clone libraries and extrapolating from this to taxa-abundance curves to estimate richness. However, since sample sizes are small, we cannot be sure that such samples are representative of the community from which they were drawn. It is however possible to formulate, and calibrate, models that predict the diversity of local communities and of samples drawn from that local community. The calibration of such models suggests that migration rates are small and decrease as the community gets larger. The preliminary predictions of the model are qualitatively consistent with the patterns seen in clone libraries in ‘real life’. The validation of this model is also confounded by small sample sizes. However, if such models were properly validated, they could form invaluable tools for the prediction of microbial diversity and a basis for the systematic exploration of microbial diversity on the planet.
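A concrete example of such a non-parametric estimator is Chao1, which corrects observed richness using the counts of singleton and doubleton taxa; with small samples from uneven communities it is only a lower bound and can fall far short of true richness. A minimal sketch, with made-up abundance counts (the abstract does not name a specific estimator; Chao1 is our illustrative choice):

```python
import numpy as np

def chao1(abundances):
    """Chao1 lower-bound richness estimate from taxon abundance counts."""
    abundances = np.asarray(abundances)
    s_obs = np.count_nonzero(abundances)   # observed richness
    f1 = np.sum(abundances == 1)           # singletons
    f2 = np.sum(abundances == 2)           # doubletons
    if f2 == 0:
        # bias-corrected form used when there are no doubletons
        return s_obs + f1 * (f1 - 1) / 2
    return s_obs + f1 ** 2 / (2 * f2)

# A small sample from a highly uneven (log-series-like) community
sample_counts = [120, 40, 15, 6, 3, 2, 2, 1, 1, 1, 1]
print(chao1(sample_counts))
```

The estimate depends entirely on the rarest observed taxa, which is precisely why small samples from unevenly distributed communities yield gross underestimates: most rare taxa are never seen even once.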


2019 ◽  
Vol 80 (3) ◽  
pp. 499-521
Author(s):  
Ben Babcock ◽  
Kari J. Hodge

Equating and scaling in the context of small-sample exams, such as credentialing exams for highly specialized professions, have received increased attention in recent research. Investigators have proposed a variety of both classical and Rasch-based approaches to the problem. This study attempts to extend past research by (1) directly comparing classical and Rasch techniques for equating exam scores when sample sizes are small (N ≤ 100 per exam form) and (2) attempting to pool multiple forms' worth of data to improve estimation in the Rasch framework. We simulated multiple years of a small-sample exam program by resampling from a larger certification exam program's real data. Results showed that combining multiple administrations' worth of data via the Rasch model can lead to more accurate equating than classical methods designed to work well in small samples. WINSTEPS-based Rasch methods that used multiple exam forms' data worked better than Bayesian Markov chain Monte Carlo methods, as the prior distribution used to estimate the item difficulty parameters biased predicted scores when there were difficulty differences between exam forms.
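Among the classical techniques such studies compare against Rasch-based pooling, linear equating maps new-form scores so that their mean and standard deviation match the reference form. A minimal sketch with simulated score distributions (our own illustration, not the study's resampling design):

```python
import numpy as np

def linear_equate(x, scores_new, scores_ref):
    """Map score x on the new form onto the reference-form scale so that
    equated scores match the reference form's mean and standard deviation."""
    mu_n, sd_n = np.mean(scores_new), np.std(scores_new)
    mu_r, sd_r = np.mean(scores_ref), np.std(scores_ref)
    return mu_r + (sd_r / sd_n) * (x - mu_n)

# Illustrative small-sample score distributions (N = 100 per form)
rng = np.random.default_rng(4)
scores_ref = rng.normal(70, 10, size=100)
scores_new = rng.normal(65, 9, size=100)   # the new form runs harder
print(round(linear_equate(65.0, scores_new, scores_ref), 1))
```

With samples this small, the estimated means and standard deviations are themselves noisy, which is the core difficulty the study addresses by pooling data across administrations.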

