Training set design for machine learning techniques applied to the approximation of computationally intensive first-principles kinetic models

A hybrid vision-map system is presented to solve the road detection problem in urban scenarios. The standardized use of machine learning techniques in classification problems has been merged with digital navigation map information to increase system robustness. The objective of this paper is to create a new environment perception method to detect the road in urban environments, fusing stereo vision with digital maps by detecting road appearance and road limits such as lane markings or curbs. Deep learning approaches make the system hard-coupled to the training set. Even though our approach is based on machine learning techniques, the features are calculated from different sources (GPS, map, curbs, etc.), making our system less dependent on the training set.


Validation of machine learning techniques: decision trees and finite training set

Journal of Electronic Imaging ◽

10.1117/1.482630 ◽

1998 ◽

Vol 7 (1) ◽

pp. 94 ◽

Cited By ~ 1

Author(s):

Geoffrey A. W. West

Keyword(s):

Machine Learning ◽

Decision Trees ◽

Machine Learning Techniques ◽

Training Set ◽

Learning Techniques


Machine learning for transient recognition in difference imaging with minimum sampling effort

Monthly Notices of the Royal Astronomical Society ◽

10.1093/mnras/staa3096 ◽

2020 ◽

Vol 499 (4) ◽

pp. 6009-6017

Author(s):

Y-L Mong ◽

K Ackley ◽

D K Galloway ◽

T Killestein ◽

J Lyman ◽

...

Keyword(s):

Machine Learning ◽

Feature Representation ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Sampling Effort ◽

Training Set ◽

The Real ◽

Learning Techniques ◽

The Difference ◽

Difference Imaging

The amount of observational data produced by time-domain astronomy is exponentially increasing. Human inspection alone is not an effective way to identify genuine transients from the data. An automatic real-bogus classifier is needed and machine learning techniques are commonly used to achieve this goal. Building a training set with a sufficiently large number of verified transients is challenging, due to the requirement of human verification. We present an approach for creating a training set by using all detections in the science images to be the sample of real detections and all detections in the difference images, which are generated by the process of difference imaging to detect transients, to be the samples of bogus detections. This strategy effectively minimizes the labour involved in the data labelling for supervised machine learning methods. We demonstrate the utility of the training set by using it to train several classifiers utilizing as the feature representation the normalized pixel values in 21 × 21 pixel stamps centred at the detection position, observed with the Gravitational-wave Optical Transient Observer (GOTO) prototype. The real-bogus classifier trained with this strategy can provide up to 95 per cent prediction accuracy on the real detections at a false alarm rate of 1 per cent.
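The labelling strategy described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names, the zero padding at image edges, and the min-max normalization are all assumptions (the abstract only says the pixel values are normalized).

```python
from typing import List, Tuple

Image = List[List[float]]

STAMP_SIZE = 21  # stamp width and height in pixels, as stated in the abstract


def cut_stamp(image: Image, row: int, col: int, size: int = STAMP_SIZE) -> Image:
    """Return a size x size stamp centred at (row, col).

    Pixels outside the image are filled with 0.0 (an assumption).
    """
    half = size // 2
    n_rows, n_cols = len(image), len(image[0])
    stamp = []
    for r in range(row - half, row + half + 1):
        stamp_row = []
        for c in range(col - half, col + half + 1):
            inside = 0 <= r < n_rows and 0 <= c < n_cols
            stamp_row.append(image[r][c] if inside else 0.0)
        stamp.append(stamp_row)
    return stamp


def normalize(stamp: Image) -> List[float]:
    """Flatten a stamp and min-max scale it to [0, 1] (assumed normalization)."""
    flat = [v for row in stamp for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    return [(v - lo) / span if span > 0 else 0.0 for v in flat]


def build_training_set(
    science: Image,
    science_detections: List[Tuple[int, int]],
    difference: Image,
    difference_detections: List[Tuple[int, int]],
) -> Tuple[List[List[float]], List[int]]:
    """Label science-image detections as real (1), difference-image ones as bogus (0)."""
    features, labels = [], []
    for r, c in science_detections:
        features.append(normalize(cut_stamp(science, r, c)))
        labels.append(1)
    for r, c in difference_detections:
        features.append(normalize(cut_stamp(difference, r, c)))
        labels.append(0)
    return features, labels
```

Each feature vector has 21 × 21 = 441 entries and could be passed to any standard classifier. No human labelling step appears anywhere in the sketch, which is the point of the strategy.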


AN EFFECTIVE BIAS-CORRECTED BAGGING METHOD FOR THE VALUATION OF LARGE VARIABLE ANNUITY PORTFOLIOS

Astin Bulletin ◽

10.1017/asb.2020.28 ◽

2020 ◽

Vol 50 (3) ◽

pp. 853-871

Author(s):

Hyukjun Gweon ◽

Shu Li ◽

Rogemar Mamon

Keyword(s):

Machine Learning ◽

Predictive Performance ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Insurance Companies ◽

Market Values ◽

Learning Techniques ◽

Fair Market ◽

Computationally Intensive ◽

Bagging Method

To evaluate a large portfolio of variable annuity (VA) contracts, many insurance companies rely on Monte Carlo simulation, which is computationally intensive. To address this computational challenge, machine learning techniques have been adopted in recent years to estimate the fair market values (FMVs) of a large number of contracts. It is shown that bootstrapped aggregation (bagging), one of the most popular machine learning algorithms, performs well in valuing VA contracts using related attributes. In this article, we highlight the presence of prediction bias of bagging and use the bias-corrected (BC) bagging approach to reduce the bias and thus improve the predictive performance. Experimental results demonstrate the effectiveness of BC bagging as compared with bagging, boosting, and model points in terms of prediction accuracy.
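To make the idea of correcting bagging bias concrete, here is a minimal sketch. It is not necessarily the procedure used in the article: it assumes a single numeric feature, regression stumps as base learners, and a constant correction equal to the mean out-of-bag residual. All function names are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression stump by minimizing squared error."""
    overall = mean(ys)
    best = (float("inf"), None, overall, overall)
    # Skip the largest value so the right-hand side is never empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda x: ml if (t is None or x <= t) else mr


def bagging_with_oob(xs, ys, n_models=50, seed=0):
    """Train stumps on bootstrap samples and record each model's out-of-bag indices."""
    rng = random.Random(seed)
    n = len(xs)
    models, oob_sets = [], []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
        oob_sets.append(set(range(n)) - set(idx))
    return models, oob_sets


def predict(models, x):
    """Plain bagging prediction: the average over all base learners."""
    return mean(m(x) for m in models)


def oob_bias(models, oob_sets, xs, ys):
    """Estimate bias as the mean out-of-bag residual (prediction minus target)."""
    residuals = []
    for i, (x, y) in enumerate(zip(xs, ys)):
        preds = [m(x) for m, oob in zip(models, oob_sets) if i in oob]
        if preds:
            residuals.append(mean(preds) - y)
    return mean(residuals) if residuals else 0.0


def predict_bias_corrected(models, bias, x):
    """Bagging prediction with the estimated bias subtracted."""
    return predict(models, x) - bias
```

Out-of-bag residuals are used because each one comes from models that never saw that observation, so the bias estimate is not flattered by the training fit. A richer variant would fit a second model to the residuals rather than subtracting a constant.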


Uncertainty Reduction in Biochemical Kinetic Models: Enforcing Desired Model Properties

10.1101/427716 ◽

2018 ◽

Cited By ~ 1

Author(s):

Ljubisa Miskovic ◽

Jonas Béal ◽

Michael Moret ◽

Vassily Hatzimanikatis

Keyword(s):

Machine Learning ◽

Monte Carlo ◽

Kinetic Parameters ◽

Kinetic Models ◽

Large Scale ◽

Metabolic Networks ◽

Sampling Methods ◽

Machine Learning Techniques ◽

Control Analysis ◽

Learning Techniques

A persistent obstacle for constructing kinetic models of metabolism is uncertainty in the kinetic properties of enzymes. Currently, available methods for building kinetic models can cope indirectly with uncertainties by integrating data from different biological levels and origins into models. In this study, we use the recently proposed computational approach iSCHRUNK (in Silico Approach to Characterization and Reduction of Uncertainty in the Kinetic Models), which combines Monte Carlo parameter sampling methods and machine learning techniques, in the context of Bayesian inference. Monte Carlo parameter sampling methods allow us to exploit synergies between different data sources and generate a population of kinetic models that are consistent with the available data and physicochemical laws. The machine learning allows us to data-mine the a priori generated kinetic parameters together with the integrated datasets and derive posterior distributions of kinetic parameters consistent with the observed physiology. In this work, we used iSCHRUNK to address a design question: can we identify which are the kinetic parameters and what are their values that give rise to a desired metabolic behavior? Such information is important for a wide variety of studies ranging from biotechnology to medicine. To illustrate the proposed methodology, we performed Metabolic Control Analysis, computed the flux control coefficients of the xylose uptake (XTR), and identified parameters that ensure a rate improvement of XTR in a glucose-xylose co-utilizing S. cerevisiae strain. Our results indicate that only three kinetic parameters need to be accurately characterized to describe the studied physiology, and ultimately to design and control the desired responses of the metabolism. This framework paves the way for a new generation of methods that will systematically integrate the wealth of available omics data and efficiently extract the information necessary for metabolic engineering and synthetic biology decisions.

Author Summary: Kinetic models are the most promising tool for understanding the complex dynamic behavior of living cells. The primary goal of kinetic models is to capture the properties of the metabolic networks as a whole, and thus we need large-scale models for dependable in silico analyses of metabolism. However, uncertainty in kinetic parameters impedes the development of kinetic models, and uncertainty levels increase with the model size. Tools that will address the issues with parameter uncertainty and that will be able to reduce the uncertainty propagation through the system are therefore needed. In this work, we applied a method called iSCHRUNK that combines parameter sampling and machine learning techniques to characterize the uncertainties and uncover intricate relationships between the parameters of kinetic models and the responses of the metabolic network. The proposed method allowed us to identify a small number of parameters that determine the responses in the network regardless of the values of other parameters. As a consequence, in future studies of metabolism, it will be sufficient to explore a reduced kinetic space, and more comprehensive analyses of large-scale and genome-scale metabolic networks will be computationally tractable.


Machine Learning augmented docking studies of aminothioureas at the SARS-CoV-2—ACE2 interface

PLoS ONE ◽

10.1371/journal.pone.0256834 ◽

2021 ◽

Vol 16 (9) ◽

pp. e0256834

Author(s):

Monika Rola ◽

Jakub Krassowski ◽

Julita Górska ◽

Anna Grobelna ◽

Wojciech Płonka ◽

...

Keyword(s):

Machine Learning ◽

Docking Studies ◽

Machine Learning Techniques ◽

Training Set ◽

Binding Properties ◽

Learning Techniques ◽

Complete Set ◽

Comparison Of The Results ◽

Receptor Interface ◽

Human Receptor

The current pandemic has made clear the urgent need for tools that quickly predict the bioactivity of large numbers of compounds, whether already available or at least synthesizable. The computational chemistry toolbox offers several such tools, chiefly docking and structure-activity relationship modeling, the latter by either classical linear QSAR or machine learning techniques. In this contribution, we compare the results obtained with different docking protocols, using as an example the search for bioactivity of compounds containing the N-N-C(S)-N scaffold at the interface between the S-protein of the SARS-CoV-2 virus and the human ACE2 receptor. Based on over 1800 structures in the training set, we predicted the binding properties of the complete set of nearly 600000 structures from the same class using the Machine Learning Random Forest Regressor approach.


Using machine learning techniques to reduce data annotation time

PsycEXTRA Dataset ◽

10.1037/e577762012-020 ◽

2006 ◽

Author(s):

Christopher Schreiner ◽

Kari Torkkola ◽

Mike Gardner ◽

Keshu Zhang

Keyword(s):

Machine Learning ◽

Machine Learning Techniques ◽

Data Annotation ◽

Learning Techniques


Using Machine Learning Algorithms on Prediction of Stock Price

Journal of Modeling and Optimization ◽

10.32732/jmo.2020.12.2.84 ◽

2020 ◽

Vol 12 (2) ◽

pp. 84-99

Author(s):

Li-Pang Chen

Keyword(s):

Machine Learning ◽

Stock Price ◽

Short Term Memory ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Short Term ◽

Learning Techniques ◽

Historical Database ◽

Long Short Term Memory

In this paper, we investigate the analysis and prediction of time-dependent data. We focus on four stocks selected from the Yahoo Finance historical database. To build models and predict future stock prices, we consider three machine learning techniques: Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN), and Support Vector Regression (SVR). We show that treating close price, open price, daily low, daily high, adjusted close price, and trading volume as predictors improves the prediction accuracy.
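As an illustration of how the six predictors might be arranged for any of the three models, the sketch below builds sliding-window samples from daily records. The window length, the choice of the next day's close as the target, and the field names are assumptions, not details from the paper.

```python
from typing import Dict, List, Tuple

# The six predictors named in the abstract; the dictionary keys are hypothetical.
PREDICTORS = ["close", "open", "low", "high", "adj_close", "volume"]


def make_windows(
    rows: List[Dict[str, float]], window: int = 3, target: str = "close"
) -> Tuple[List[List[float]], List[float]]:
    """Pair each run of `window` consecutive days with the next day's target value."""
    features, targets = [], []
    for end in range(window, len(rows)):
        past = rows[end - window:end]
        # Flatten the window day by day, keeping predictor order fixed.
        features.append([day[name] for day in past for name in PREDICTORS])
        targets.append(rows[end][target])
    return features, targets
```

With a window of 3 days, each sample has 3 × 6 = 18 features. A regressor such as SVR could consume the flat vectors directly, while an LSTM or CNN would typically reshape each one to 3 × 6.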
