The Prediction-Explanation Fallacy: A Pervasive Problem in Scientific Applications of Machine Learning

2021 ◽  
Author(s):  
Marco Del Giudice

In this paper, I highlight a problem that has become ubiquitous in scientific applications of machine learning methods, and can lead to seriously distorted inferences about the phenomena under study. I call it the prediction-explanation fallacy. The fallacy occurs when researchers use prediction-optimized models for explanatory purposes, without considering the tradeoffs between explanation and prediction. This is a problem for at least two reasons. First, prediction-optimized models are often deliberately biased and unrealistic in order to prevent overfitting, and hence fail to accurately explain the phenomenon of interest. In other cases, they have an exceedingly complex structure that is hard or impossible to interpret, which greatly limits their explanatory value. Second, different predictive models trained on the same or similar data can be biased in different ways, so that multiple models may predict equally well but suggest conflicting explanations of the underlying phenomenon. In this note I introduce the tradeoffs between prediction and explanation in a non-technical fashion, present some illustrative examples from neuroscience, and end by discussing some mitigating factors and methods that can be used to limit or circumvent the problem.
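
The first point in the abstract above, that a prediction-optimized model can be deliberately biased and therefore misleading as an explanation, can be illustrated with a minimal sketch. It is not taken from the paper: the simulated data, the "true" coefficients, the penalty value, and the helper names `ridge_fit` and `mse` are all assumptions chosen for illustration. Two nearly identical predictors are generated, only the first of which affects the outcome, and an unpenalized fit is compared with a ridge-penalized one.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge regression without intercept: (X'X + lam*I)^-1 X'y.

    lam = 0.0 reduces to ordinary least squares.
    """
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


def mse(X, y, beta):
    """Mean squared error of the linear predictions X @ beta."""
    residuals = y - X @ beta
    return float(np.mean(residuals ** 2))


# Hypothetical data-generating process (an assumption for illustration):
# only x1 drives the outcome; x2 is a noisy near-copy of x1.
rng = np.random.default_rng(0)
n = 200
true_beta = np.array([2.0, 0.0])
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = X @ true_beta + rng.normal(size=n)

beta_ols = ridge_fit(X, y, lam=0.0)     # unpenalized fit
beta_ridge = ridge_fit(X, y, lam=50.0)  # penalized (shrunken) fit

print("true coefficients: ", true_beta)
print("OLS coefficients:  ", beta_ols)
print("ridge coefficients:", beta_ridge)
print("training MSE (OLS):  ", mse(X, y, beta_ols))
print("training MSE (ridge):", mse(X, y, beta_ridge))
```

In a setup like this, the ridge penalty tends to split the weight between the two correlated predictors, so a reader who treats the coefficients as an explanation might conclude that both variables matter, even though the simulated truth says otherwise. The penalty is there to stabilize predictions, not to recover the data-generating structure.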

2019 ◽  
Vol 24 (34) ◽  
pp. 3998-4006
Author(s):  
Shijie Fan ◽  
Yu Chen ◽  
Cheng Luo ◽  
Fanwang Meng

Background: On the tide of big data, machine learning is coming into its own. Given the huge amounts of epigenetic data coming from biological experiments and the clinic, machine learning can help in detecting epigenetic features in the genome, finding correlations between phenotypes and modifications in histones or genes, accelerating the screening of lead compounds targeting epigenetic diseases, and many other aspects of the study of epigenetics, thereby advancing the hope of precision medicine. Methods: In this minireview, we focus on reviewing the fundamentals and applications of machine learning methods that are regularly used in the epigenetics field and explain their features. Their advantages and disadvantages are also discussed. Results: Machine learning algorithms have accelerated studies in precision medicine targeting epigenetic diseases. Conclusion: To make full use of machine learning algorithms, one should become familiar with their pros and cons, so as to benefit from big data by choosing the most suitable method(s).


2020 ◽  
Vol 4 (2) ◽  
pp. 61
Author(s):  
Yi Di Boon ◽  
Sunil Chandrakant Joshi ◽  
Somen Kumar Bhudolia ◽  
Goram Gohel

Advanced manufacturing techniques, such as automated fiber placement and additive manufacturing, enable the fabrication of fiber-reinforced polymer composite components with customized material and structural configurations. In order to take advantage of this customizability, the design process for fiber-reinforced polymer composite components needs to be improved. Machine learning methods have been identified as potential techniques capable of handling the complexity of the design problem. In this review, the applications of machine learning methods in various aspects of structural component design are discussed. They include studies on microstructure-based material design, applications of machine learning models in stress analysis, and topology optimization of fiber-reinforced polymer composites. A design automation framework for performance-optimized fiber-reinforced polymer composite components is also proposed. The proposed framework aims to provide a comprehensive and efficient approach for the design and optimization of fiber-reinforced polymer composite components. The challenges in building the models required for the proposed framework are also discussed briefly.


2019 ◽  
Vol 7 ◽  
Author(s):  
Jihyeun Lee ◽  
Surendra Kumar ◽  
Sang-Yoon Lee ◽  
Sung Jean Park ◽  
Mi-hyun Kim

Author(s):  
Francois Charih ◽  
Ashlynn Steeves ◽  
Matthew Bromwich ◽  
Amy E. Mark ◽  
Renee Lefrancois ◽  
...  

2020 ◽  
Author(s):  
Edwin Tse ◽  
Laksh Aithani ◽  
Mark Anderson ◽  
Jonathan Cardoso-Silva ◽  
Giovanni Cincilla ◽  
...  

The discovery of new antimalarial medicines with novel mechanisms of action is key to combating the problem of increasing resistance to our frontline treatments. The Open Source Malaria (OSM) consortium has been developing compounds ("Series 4") that have potent activity against Plasmodium falciparum in vitro and in vivo and that have been suggested to act through the inhibition of PfATP4, an essential membrane ion pump that regulates the parasite’s intracellular Na⁺ concentration. The structure of PfATP4 is yet to be determined. In the absence of structural information about this target, a public competition was created to develop a model that would allow the prediction of anti-PfATP4 activity among Series 4 compounds, thereby reducing project costs associated with the unnecessary synthesis of inactive compounds.
In the first round, in 2016, six participants used the open data collated by OSM to develop moderately predictive models using diverse methods. Notably, all submitted models were available to all other participants in real time. Since then further bioactivity data have been acquired and machine learning methods have rapidly developed, so a second round of the competition was undertaken, in 2019, again with freely-donated models that other participants could see. The best-performing models from this second round were used to predict novel inhibitory molecules, of which several were synthesised and evaluated against the parasite. One such compound, containing a motif that the human chemists familiar with this series would have dismissed as ill-advised, was active. The project demonstrated the abilities of new machine learning methods in the prediction of active compounds where there is no biological target structure, frequently the central problem in phenotypic drug discovery.
Since all data and participant interactions remain in the public domain, this research project “lives” and may be improved by others.

