Model Specification, Variable Selection, and Model Building

<div> <div> <div> <p>As interest grows in applying machine learning force-fields and methods to molecular simulation, there is a need for state-of-the-art inference methods to use trained models within efficient molecular simulation engines. We have designed and implemented software that enables integration of a scalable GPU-accelerated molecular mechanics engine, HOOMD-blue, with the machine learning (ML) TensorFlow package. TensorFlow is a GPU-accelerated, scalable, graph-based tensor computation model building package that has been the implementation of many recent innovations in deep learning and other ML tasks. TensorFlow models are constructed in Python and can be visualized or debugged using the rich set of tools implemented in the TensorFlow package. In this article, we present four major examples of tasks this software can accomplish which would normally require multiple different tools: (1) we train a neural network to reproduce a force field of a Lennard-Jones simulation; (2) we perform online force matching of methanol; (3) we compute the maximum entropy bias of a Lennard-Jones collective variable; (4) we calculate the scattering profile of an ongoing TIP4P water molecular dynamics simulation. This work should accelerate both the design of new neural network based models in computational chemistry research and reproducible model specification by leveraging a widely-used ML package.</p></div></div></div>

Download Full-text

Two tales of variable selection for high dimensional regression: Screening and model building

Statistical Analysis and Data Mining The ASA Data Science Journal ◽

10.1002/sam.11219 ◽

2014 ◽

Vol 7 (2) ◽

pp. 140-159 ◽

Cited By ~ 6

Author(s):

Cong Liu ◽

Tao Shi ◽

Yoonkyung Lee

Keyword(s):

Variable Selection ◽

Model Building ◽

High Dimensional ◽

High Dimensional Regression ◽

Selection For

Download Full-text

Hipparcos minor planets: Towards an improvement of the model analysis by detecting influence factors

Symposium - International Astronomical Union ◽

10.1017/s0074180900127846 ◽

1996 ◽

Vol 172 ◽

pp. 447-450 ◽

Cited By ~ 2

Author(s):

M. L. Bougeard ◽

J.-F. Bange ◽

M. Mahfouz ◽

A. Bec-Borsenberger

Keyword(s):

Variable Selection ◽

Statistical Methods ◽

Preliminary Data ◽

Reference Frames ◽

Model Building ◽

Influence Factors ◽

Model Analysis ◽

Short Length ◽

Minor Planets

In order to evaluate a possible rotation between the Hipparcos and the dynamical reference frames, Hipparcos minor planets preliminary data are analysed. The resolution of the problem is very sensitive to correlations induced by the short length of the interval of observation. Several statistical methods are performed to appreciate the factors of bad conditioning. A procedure for variable selection and model building is given.

Download Full-text

Toward an Optimal Procedure for Variable Selection and QSAR Model Building

Journal of Chemical Information and Computer Sciences ◽

10.1021/ci010291a ◽

2001 ◽

Vol 41 (5) ◽

pp. 1218-1227 ◽

Cited By ~ 118

Author(s):

A. Yasri ◽

D. Hartsough

Keyword(s):

Variable Selection ◽

Model Building ◽

Qsar Model ◽

Optimal Procedure

Download Full-text

Specifications and Standards for Insect 3D Data

Biodiversity Information Science and Standards ◽

10.3897/biss.2.26561 ◽

2018 ◽

Vol 2 ◽

pp. e26561

Author(s):

Jiangning Wang ◽

Jing Ren ◽

Tianyu Xi ◽

Siqin Ge ◽

Liqiang Ji

Keyword(s):

3D Reconstruction ◽

3D Model ◽

Model Building ◽

3D Models ◽

Model Specification ◽

Specimen Preparation ◽

Reconstruction Process ◽

Specimen Collection ◽

3D Data ◽

Imaging Results

With the continuous development of imaging technology, the amount of insect 3D data is increasing, but research on data management is still virtually non-existent. This paper will discuss the specifications and standards relevant to the process of insect 3D data acquisition, processing and analysis. The collection of 3D data of insects includes specimen collection, sample preparation, image scanning specifications and 3D model specification. The specimen collection information uses existing biodiversity information standards such as Darwin Core. However, the 3D scanning process contains unique specifications for specimen preparation, depending on the scanning equipment, to achieve the best imaging results. Data processing of 3D images includes 3D reconstruction, tagging morphological structures (such as muscle and skeleton), and 3D model building. There are different algorithms in the 3D reconstruction process, but the processing results generally follow DICOM (Digital Imaging and Communications in Medicine) standards. There is no available standard for marking morphological structures, because this process is currently executed by individual researchers who create operational specifications according to their own needs. 3D models have specific file specifications, such as object files (https://en.wikipedia.org/wiki/Wavefront_.obj_file) and 3D max format (https://en.wikipedia.org/wiki/.3ds), which are widely used at present. There are only some simple tools for analysis of three-dimensional data and there are no specific standards or specifications in Audubon Core (https://terms.tdwg.org/wiki/Audubon_Core), the TDWG standard for biodiversity-related multi-media. There are very few 3D databases of animals at this time. Most of insect 3D data are created by individual entomologists and are not even stored in databases. Specifications for the management of insect 3D data need to be established step-by-step. Based on our attempt to construct a database of 3D insect data, we preliminarily discuss the necessary specifications.

Download Full-text

A GPU-Accelerated Machine Learning Framework for Molecular Simulation: Hoomd-Blue with TensorFlow

10.26434/chemrxiv.8019527.v1 ◽

2019 ◽

Author(s):

Rainier Barrett ◽

Maghesree Chakraborty ◽

Dilnoza Amirkulova ◽

Heta Gandhi ◽

Andrew White

Keyword(s):

Machine Learning ◽

Computational Models ◽

Model Building ◽

Automatic Differentiation ◽

Force Fields ◽

Model Specification ◽

Efficient Computation ◽

Collective Variables ◽

Chemistry Research ◽

The Rich

We have designed and implemented software that enables integration of a scalable GPU-accelerated molecular mechanics engine, Hoomd-blue, with the machine learning (ML) TensorFlow package. TensorFlow is a GPU accelerated, scalable, graph-based tensor computation model building package that has been the implementation of many recent innovations in deep learning and other ML tasks. Tensor computation graphs allow for designation of robust, flexible, and easily replicated computational models for a variety of tasks. Our plugin leverages the generality and speed of computational tensor graphs in TensorFlow to enable four previously challenging tasks in molecular dynamics: (1) the calculation of arbitrary force-fields including neural-network-based, stochastic, and/or automatically-generated force-fields which are differentiated from potential functions; (2) the efficient computation of arbitrary collective variables; (3) the biasing of simulations via automatic differentiation of collective variables and consequently the implementation of many free energy biasing methods; (4) ML on any of the above tasks, including coarse grain force fields, on-the-fly learned biases, and collective variable calculations. The TensorFlow models are constructed in Python and can be visualized or debugged using the rich set of tools implemented in the TensorFlow package. In this article, we present examples of the four major tasks this method can accomplish, benchmark data, and describe the architecture of our implementation. This method should lead to both the design of new models in computational chemistry research and reproducible model specification without requiring recompiling or writing low-level code. <br>

Download Full-text

Modeling acute urinary toxicity after radiotherapy for prostate cancer.

Journal of Clinical Oncology ◽

10.1200/jco.2014.32.4_suppl.156 ◽

2014 ◽

Vol 32 (4_suppl) ◽

pp. 156-156

Author(s):

Viviana Carillo ◽

Tiziana Rancati ◽

Cesare Cozzarini ◽

Sergio Villa ◽

Andrea Botti ◽

...

Keyword(s):

Prostate Cancer ◽

Risk Factors ◽

Variable Selection ◽

Model Building ◽

Selection Process ◽

Final Multivariable Model ◽

Improve Model ◽

Quantitative Analyses ◽

Moderate Hypofractionation ◽

Delivered Dose

156 Background: DUE-01 is a multi-centric observational study aimed at developing predictive models of genito-urinary toxicity and erectile dysfunction for prostate cancer patients treated with conventional (1.8-2Gy/fr, CONV) or moderate hypofractionation (2.5-2.7Gy/fr, HYPO). Current analysis focused on modelling the relationship between the risk of IPSS≥15 (IPSS15end) at the end of radiotherapy and clinincal/dosimetric risk factors. Methods: Planning data and relevant clinical factors were prospectively collected, including DVH/DSH referred to the whole treatment and to the weekly delivered dose (DVHw/DSHw). Best discriminating DVH/DSH parameters were selected by the differences between patients with/without IPSS15end=1 (t-test). Bootstrap variable selection techniques (300 resamples) in the framework of logistic backward feature selection was used to improve model building (El Naqa, IJROBP 2006). Graphical and quantitative analyses of the variable selection process applied to bootstrap data replicates was used to avoid underfitting/overfitting and to assess the final multivariable model. Results: 247 patients were available (CONV:116, HYPO:131). Seventy one out of 247 (28.7%) reported IPSS15end=1. The most predictive dosimetric tools were the absolute weekly delivered dose (DSHw and DVHw). DSHw and DVHw were alternatively inserted in the bootstrap variable selection flow, together with clinical risk factors. Due to the number of events, a logistic model containing six variables was accepted On the basis of observed frequency of variables in the top six positions, a model including basal IPSS (median OR=1.22, p=0.00001), use of anti-hypertensives (median OR=2.7, p=0.01), absolute bladder surface receiving more than 10.5 Gy/week (s10.5w, median OR=1.16, p=0.0001), and s12.5w (median OR=1.07, p=0.005), was choosen. AUC of this model was 0.80. Silmilar results were obtained when using DVHw. Conclusions: Basal IPSS, use of anti-hypertensive drugs, s10.5w/v10.5w and s12.5w/v12.5w are the main predictors of IPSS>=15 at the end of radiotherapy Bootstrap variable selection technique gives the modeler more insight into the importance and stability of the different variables selected and allows development of more robust models

Download Full-text