Mechanistic models versus machine learning, a fight worth fighting for the biological community?

2018 ◽  
Vol 14 (5) ◽  
pp. 20170660 ◽  
Author(s):  
Ruth E. Baker ◽  
Jose-Maria Peña ◽  
Jayaratnam Jayamohan ◽  
Antoine Jérusalem

Ninety per cent of the world's data have been generated in the last 5 years (Machine learning: the power and promise of computers that learn by example. Report no. DES4702. Issued April 2017. Royal Society). A small fraction of these data is collected with the aim of validating specific hypotheses. These studies are led by the development of mechanistic models focused on the causality of input–output relationships. However, the vast majority is aimed at supporting statistical or correlation studies that bypass the need for causality and focus exclusively on prediction. Along these lines, there has been a vast increase in the use of machine learning models, in particular in the biomedical and clinical sciences, to try to keep pace with the rate of data generation. Recent successes now raise the question of whether mechanistic models are still relevant in this area. Put another way, why should we try to understand the mechanisms of disease progression when we can use machine learning tools to directly predict disease outcome?

PLoS ONE ◽  
2019 ◽  
Vol 14 (1) ◽  
pp. e0208141 ◽  
Author(s):  
Monica A. Konerman ◽  
Lauren A. Beste ◽  
Tony Van ◽  
Boang Liu ◽  
Xuefei Zhang ◽  
...  

2020 ◽  
Vol 2 (2) ◽  
pp. 106-119
Author(s):  
Subasish Das ◽  
Minh Le ◽  
Boya Dai

Abstract Crash occurrence is a complex phenomenon, and crashes associated with pedestrians and bicyclists are even more complex. Furthermore, pedestrian- and bicyclist-involved crashes are typically not reported in detail in state or national crash databases. To address this issue, developers created the Pedestrian and Bicycle Crash Analysis Tool (PBCAT). However, it is labour-intensive to manually identify the types of pedestrian and bicycle crashes from crash-narrative reports and to classify different crash attributes from the textual content of police reports. Therefore, there is a need for a supporting tool that can assist practitioners in using PBCAT more efficiently and accurately. The objective of this study is to develop a framework for applying machine-learning models to classify crash types from unstructured textual content. In this study, the research team collected pedestrian crash-typing data from two locations in Texas. The XGBoost model was found to be the best classifier: it classified pedestrian crash types with the highest accuracy rate (up to 77% for training data and 72% for test data). The findings demonstrate that advanced machine-learning models can extract underlying patterns and trends of crash mechanisms. This provides the basis for applying machine-learning techniques in addressing the crash typing issues associated with non-motorist crashes.


Author(s):  
Danielle Beaulieu ◽  
Albert A. Taylor ◽  
Dustin Pierce ◽  
Jonavelle Cuerdo ◽  
Mark Schactman ◽  
...  

Author(s):  
Gyasi Emmanuel Kwabena ◽  
Mageshbabu Ramamurthy ◽  
Akila Wijethunga ◽  
Purushotham Swarnalatha

The world is fascinated by how quickly technology evolves. One emerging trend is smart wearable technology. Little attention has been paid to the data that our bodies radiate and communicate to us, but with the arrival of wearable sensors, numerous devices can now track and collect these data. Apart from benefits such as monitoring our fitness levels, wearable technology also plays an important role in advancing artificial intelligence (AI) and machine learning (ML). Machine learning thrives on the availability of massive data, and wearable technology, which forms part of the internet of things (IoT), generates megabytes of data every single day. The data generated by these wearable devices are used as datasets for training machine learning models. Scientific conclusions are then drawn from the analysis of the outcomes of these models.


2020 ◽  
Vol 245 ◽  
pp. 06019
Author(s):  
Kim Albertsson ◽  
Sitong An ◽  
Sergei Gleyzer ◽  
Lorenzo Moneta ◽  
Joana Niermann ◽  
...  

ROOT provides, through TMVA, machine learning tools for data analysis at HEP experiments and beyond. We present recently included features in TMVA and the strategy for future developments in the diversified machine learning landscape. The focus is on fast machine learning inference, which enables analysts to deploy their machine learning models rapidly on large-scale datasets. The new developments are paired with newly designed C++ and Python interfaces supporting modern C++ paradigms and full interoperability in the Python ecosystem. We also present a new deep learning implementation for convolutional neural networks using the cuDNN library for GPUs. We show benchmarking results in terms of training time and inference time, in comparison with other machine learning libraries such as Keras/TensorFlow.


2019 ◽  
Author(s):  
Jie Zhang ◽  
Søren D. Petersen ◽  
Tijana Radivojevic ◽  
Andrés Ramirez ◽  
Andrés Pérez ◽  
...  

SUMMARY In combination with advanced mechanistic modeling and the generation of high-quality multi-dimensional data sets, machine learning is becoming an integral part of understanding and engineering living systems. Here we show that mechanistic and machine learning models can complement each other and be used in a combined approach to enable accurate genotype-to-phenotype predictions. We use a genome-scale model to pinpoint engineering targets and produce a large combinatorial library of metabolic pathway designs with different promoters which, once phenotyped, provide the basis for machine learning algorithms to be trained and used for new design recommendations. The approach enables successful forward engineering of aromatic amino acid metabolism in yeast, with the new recommended designs improving tryptophan production by up to 17% compared to the best designs used for algorithm training, and ultimately producing a total increase of 106% in tryptophan accumulation compared to optimized reference designs. Based on a single high-throughput data-generation iteration, this study highlights the power of combining mechanistic and machine learning models to enhance their predictive power and effectively direct metabolic engineering efforts.


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Jie Zhang ◽  
Søren D. Petersen ◽  
Tijana Radivojevic ◽  
Andrés Ramirez ◽  
Andrés Pérez-Manríquez ◽  
...  

Abstract Through advanced mechanistic modeling and the generation of large high-quality datasets, machine learning is becoming an integral part of understanding and engineering living systems. Here we show that mechanistic and machine learning models can be combined to enable accurate genotype-to-phenotype predictions. We use a genome-scale model to pinpoint engineering targets, efficient library construction of metabolic pathway designs, and high-throughput biosensor-enabled screening for training diverse machine learning algorithms. From a single data-generation cycle, this enables successful forward engineering of complex aromatic amino acid metabolism in yeast, with the best machine learning-guided design recommendations improving tryptophan titer and productivity by up to 74 and 43%, respectively, compared to the best designs used for algorithm training. Thus, this study highlights the power of combining mechanistic and machine learning models to effectively direct metabolic engineering efforts.


2020 ◽  
Author(s):  
David Meyer

The use of real data for training machine learning (ML) models is often a cause of major limitations. For example, real data may be (a) representative of only a subset of situations and domains, (b) expensive to produce, or (c) limited to specific individuals due to licensing restrictions. Although the use of synthetic data is becoming increasingly popular in computer vision, ML models used in weather and climate models still rely on large real datasets. Here we present some recent work towards the generation of synthetic data for weather and climate applications and outline some of the major challenges and limitations encountered.

