Deeptime: a Python library for machine learning dynamical models from time series data

Author(s):  
Moritz Hoffmann ◽  
Martin Konrad Scherer ◽  
Tim Hempel ◽  
Andreas Mardt ◽  
Brian de Silva ◽  
...  

Abstract: Generation and analysis of time-series data is relevant to many quantitative fields ranging from economics to fluid mechanics. In the physical sciences, structures such as metastable and coherent sets, slow relaxation processes, collective variables, dominant transition pathways or manifolds and channels of probability flow can be of great importance for understanding and characterizing the kinetic, thermodynamic and mechanistic properties of the system. Deeptime is a general purpose Python library offering various tools to estimate dynamical models based on time-series data, including conventional linear learning methods, such as Markov state models (MSMs), hidden Markov models and Koopman models, as well as kernel and deep learning approaches such as VAMPnets and deep MSMs. The library is largely compatible with scikit-learn, having a range of Estimator classes for these different models, but in contrast to scikit-learn it also provides deep Model classes, e.g. in the case of an MSM, which provide a multitude of analysis methods to compute interesting thermodynamic, kinetic and dynamical quantities, such as free energies, relaxation times and transition paths. The library is designed for ease of use as well as for easily maintainable and extensible code. In this paper we introduce the main features and structure of the deeptime software. Deeptime can be found under https://deeptime-ml.github.io/.
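As a rough illustration of the kind of model the abstract refers to, the sketch below estimates a Markov state model from a discrete trajectory in plain Python. It does not use or reproduce the deeptime API. The function names, the toy trajectory `dtraj` and the choice of a simple row-normalized (non-reversible) estimator are all assumptions made for this example.

```python
import math


def estimate_transition_matrix(dtraj, n_states, lag=1):
    """Count transitions i -> j at the given lag and row-normalize them."""
    counts = [[0] * n_states for _ in range(n_states)]
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i][j] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("each state needs at least one outgoing transition")
        matrix.append([c / total for c in row])
    return matrix


def stationary_distribution(matrix, n_iter=1000):
    """Approximate the stationary distribution by power iteration (pi = pi T)."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return pi


# Hypothetical two-state trajectory, e.g. cluster labels of a time series.
dtraj = [0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0]

T = estimate_transition_matrix(dtraj, n_states=2, lag=1)
pi = stationary_distribution(T)

# Free energy of each state in units of kT, up to an additive constant.
free_energies = [-math.log(p) for p in pi]
```

The transition matrix plays the role of the estimated model, and quantities such as the stationary distribution and free energies are derived from it afterwards, which loosely mirrors the Estimator and Model split described in the abstract.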

Author(s):  
Nobuhiko Yamaguchi ◽  

Gaussian Process Dynamical Models (GPDMs) constitute a nonlinear dimensionality reduction technique that provides a probabilistic representation of time series data in terms of Gaussian process priors. In this paper, we report a method based on GPDMs to visualize the states of time-series data. Conventional GPDMs are unsupervised, and therefore, even when the labels of data are available, it is not possible to use this information. To overcome the problem, we propose a supervised GPDM (S-GPDM) that utilizes both the data and their corresponding labels. We demonstrate experimentally that the S-GPDM can locate related motion data closer together than conventional GPDMs.


2018 ◽  
Author(s):  
Elijah Bogart ◽  
Richard Creswell ◽  
Georg K. Gerber

Abstract: Longitudinal studies are crucial for discovering causal relationships between the microbiome and human disease. We present Microbiome Interpretable Temporal Rule Engine (MITRE), the first machine learning method specifically designed for predicting host status from microbiome time-series data. Our method maintains interpretability by learning predictive rules over automatically inferred time-periods and phylogenetically related microbes. We validate MITRE’s performance on semi-synthetic data and on five real datasets measuring microbiome composition over time in infant and adult cohorts. Our results demonstrate that MITRE performs on par with or outperforms “black box” machine learning approaches, providing a powerful new tool enabling discovery of biologically interpretable relationships between microbiome and human host.


Author(s):  
S. Park ◽  
J. Im

Many satellite sensors, including the Landsat series, have been extensively used for land cover classification. Studies have been conducted to mitigate classification problems associated with the use of single data (e.g., cloud contamination) through multi-sensor data fusion and the use of time series data. This study investigated two areas with different environmental and climatic conditions: one in South Korea and the other in the US. Cropland classification was conducted by using multi-temporal Landsat 5, Radarsat-1 and digital elevation models (DEM) based on two machine learning approaches (i.e., random forest and support vector machines). Seven classification scenarios were examined and evaluated through accuracy assessment. Results show that SVM produced the best performance (overall accuracy of 93.87%) when using all temporal and spectral data as input variables. Normalized Difference Water Index (NDWI), SAR backscattering, and Normalized Difference Vegetation Index (NDVI) were identified as contributing more than the other variables to cropland classification.
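For readers unfamiliar with the two indices named above, a minimal sketch of how normalized-difference indices are computed from band reflectances follows. The abstract does not state which NDWI definition or band combination was used, so the NIR/SWIR form below is an assumption, as are the function name, the sample reflectance values and the convention of returning 0.0 when both bands are zero.

```python
def normalized_difference(band_a, band_b):
    """Return (a - b) / (a + b); 0.0 if both bands are zero (a convention chosen here)."""
    denominator = band_a + band_b
    if denominator == 0:
        return 0.0
    return (band_a - band_b) / denominator


# Hypothetical surface reflectances for one vegetated pixel.
nir, red, swir = 0.50, 0.10, 0.20

# NDVI = (NIR - Red) / (NIR + Red)
ndvi = normalized_difference(nir, red)

# One common NDWI definition, (NIR - SWIR) / (NIR + SWIR); assumed here,
# since other variants (e.g. Green and NIR) also exist.
ndwi = normalized_difference(nir, swir)
```

Both indices lie in the range from -1 to 1, which makes them convenient as input features next to raw band values.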


2021 ◽  
Vol 13 (2) ◽  
pp. 296
Author(s):  
Xing Jin ◽  
Ping Tang ◽  
Thomas Houet ◽  
Thomas Corpetti ◽  
Emilien Gence Alvarez-Vanhard ◽  
...  

Remote-sensing time-series data are significant for global environmental change research and a better understanding of the Earth. However, remote-sensing acquisitions often provide sparse time series due to sensor resolution limitations and environmental factors, such as cloud noise for optical data. Image interpolation is the method that is often used to deal with this issue. This paper considers a deep learning method, called separable convolution network for sequence image interpolation, to learn the complex mapping of an interpolated intermediate image from predecessor and successor images. The separable convolution network uses separable 1D convolution kernels instead of 2D kernels to capture the spatial characteristics of input sequence images and is then trained end-to-end using sequence images. Our experiments, which were performed with unmanned aerial vehicle (UAV) and Landsat-8 datasets, show that the method is effective at producing high-quality time-series interpolated images, and that the data-driven deep model can better simulate complex and diverse nonlinear image data information.
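The idea of replacing a 2D kernel with separable 1D kernels can be illustrated with a small plain-Python sketch. It is not the network from the paper: the kernels here are fixed, hand-picked values rather than learned ones, and the helper names and the toy image are assumptions for this example. It only shows that a vertical pass followed by a horizontal pass gives the same result as one pass with the 2D outer-product kernel, while needing 2n instead of n² coefficients.

```python
def correlate2d_valid(image, kernel):
    """Slide the kernel over the image without padding ('valid' cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def outer(vertical, horizontal):
    """Build the 2D kernel as the outer product of two 1D kernels."""
    return [[v * h for h in horizontal] for v in vertical]


# Hypothetical 1D kernels and a small 5x5 test image.
vertical = [1.0, 2.0, 1.0]
horizontal = [1.0, 0.0, -1.0]
image = [[float((3 * r + 5 * c) % 7) for c in range(5)] for r in range(5)]

# One pass with the full 3x3 kernel (9 coefficients).
full = correlate2d_valid(image, outer(vertical, horizontal))

# Two passes with 1D kernels (3 + 3 coefficients): columns first, then rows.
column_kernel = [[v] for v in vertical]
row_kernel = [horizontal]
two_pass = correlate2d_valid(correlate2d_valid(image, column_kernel), row_kernel)
```

Here `full` and `two_pass` hold the same 3x3 output, which is the property that lets a separable formulation cut the number of kernel parameters.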


2011 ◽  
Vol 9 (70) ◽  
pp. 957-971 ◽  
Author(s):  
Shai Revzen ◽  
John M. Guckenheimer

Dynamical systems with asymptotically stable periodic orbits are generic models for rhythmic processes in dissipative physical systems. This paper presents a method for reconstructing the dynamics near a periodic orbit from multivariate time-series data. It is used to test theories about the control of legged locomotion, a context in which time series are short when compared with previous work in nonlinear time-series analysis. The method presented here identifies appropriate dimensions of reduced order models for the deterministic portion of the dynamics. The paper also addresses challenges inherent in identifying dynamical models with data from different individuals.


2016 ◽  
Author(s):  
Matthew P. Harrigan ◽  
Mohammad M. Sultan ◽  
Carlos X. Hernández ◽  
Brooke E. Husic ◽  
Peter Eastman ◽  
...  

MSMBuilder is a software package for building statistical models of high-dimensional time-series data. It is designed with a particular focus on the analysis of atomistic simulations of biomolecular dynamics such as protein folding and conformational change. MSMBuilder is named for its ability to construct Markov State Models (MSMs), a class of models that has gained favor among computational biophysicists. In addition to both well-established and newer MSM methods, the package includes complementary algorithms for understanding time-series data such as hidden Markov models (HMMs) and time-structure based independent component analysis (tICA). MSMBuilder boasts an easy to use command-line interface, as well as clear and consistent abstractions through its Python API (application programming interface). MSMBuilder is developed with careful consideration for compatibility with the broader machine-learning community by following the design of scikit-learn. The package is used primarily by practitioners of molecular dynamics but is just as applicable to other computational or experimental time-series measurements. http://msmbuilder.org


2018 ◽  
Author(s):  
Tal Zinger ◽  
Pleuni S. Pennings ◽  
Adi Stern

Abstract: With the advent of deep sequencing techniques, it is now possible to track the evolution of viruses with ever-increasing detail. Here we present FITS (Flexible Inference from Time-Series), a computational framework that allows inference of either the fitness of a mutation, the mutation rate or the population size from genomic time-series sequencing data. FITS was designed first and foremost for analysis of either short-term Evolve & Resequence (E&R) experiments or rapidly recombining populations of viruses. We thoroughly explore the performance of FITS on noisy simulated data, and highlight its ability to infer meaningful information even in those circumstances. In particular, FITS is able to categorize a mutation as Advantageous, Neutral or Deleterious. We next apply FITS to empirical data from an E&R experiment on poliovirus where parameters were determined experimentally and demonstrate extremely high accuracy in inference. We highlight the ease of use of FITS for step-wise or iterative inference of mutation rates, population size, and fitness values for each mutation sequenced, when deep sequencing data is available at multiple time-points. Availability: FITS is written in C++ and is available both with a user-friendly graphical user interface and as a command-line program that allows parallel high-throughput analyses. Source code, binaries (Windows and Mac) and complementary scripts are available from GitHub at https://github.com/SternLabTAU/[email protected]

