Forecasting Individual Aging Trajectories and Survival with an Interpretable Network Model

2020 ◽  
Vol 4 (Supplement_1) ◽  
pp. 923-923
Author(s):  
Spencer Farrell ◽  
Arnold Mitnitski ◽  
Kenneth Rockwood ◽  
Andrew Rutenberg

Abstract: We have built a computational model of individual aging trajectories of health and survival, containing physical, functional, and biological variables, conditioned on demographic, lifestyle, and medical background information. We combine techniques of modern machine learning with a network approach, where the health variables are coupled by an interaction network within a stochastic dynamical system. The resulting model is scalable to large longitudinal data sets, is predictive of individual high-dimensional health trajectories and survival, and infers an interpretable network of interactions between the health variables. The interaction network gives us the ability to identify which interactions between variables are used by the model, demonstrating that realistic physiological connections are inferred. We use English Longitudinal Study of Aging (ELSA) data to train our model and show that it performs better than standard linear models for health outcomes and survival, while also revealing the relevant interactions. Our model can be used to generate synthetic individuals that age realistically from input data at baseline, as well as to probe future aging outcomes given an arbitrary initial health state.

2021 ◽  
Vol 5 (Supplement_1) ◽  
pp. 676-676
Author(s):  
Spencer Farrell ◽  
Arnold Mitnitski ◽  
Kenneth Rockwood ◽  
Andrew Rutenberg

Abstract: We have built a computational model of individual aging trajectories of health and survival that contains physical, functional, and biological variables, and is conditioned on demographic, lifestyle, and medical background information. We combine techniques of modern machine learning with an interpretable network approach, where health variables are coupled by an explicit interaction network within a stochastic dynamical system. Our model is scalable to large longitudinal data sets, is predictive of individual high-dimensional health trajectories and survival from baseline health states, and infers an interpretable network of directed interactions between the health variables. The network identifies plausible physiological connections between health variables and clusters of strongly connected health variables. We use English Longitudinal Study of Aging (ELSA) data to train our model and show that it performs better than traditional linear models for health outcomes and survival. Our model can also be used to generate synthetic individuals that age realistically, to impute missing data, and to simulate future aging outcomes given an arbitrary initial health state.


2022 ◽  
Vol 18 (1) ◽  
pp. e1009746
Author(s):  
Spencer Farrell ◽  
Arnold Mitnitski ◽  
Kenneth Rockwood ◽  
Andrew D. Rutenberg

We have built a computational model for individual aging trajectories of health and survival, which contains physical, functional, and biological variables, and is conditioned on demographic, lifestyle, and medical background information. We combine techniques of modern machine learning with an interpretable interaction network, where health variables are coupled by explicit pair-wise interactions within a stochastic dynamical system. Our dynamic joint interpretable network (DJIN) model is scalable to large longitudinal data sets, is predictive of individual high-dimensional health trajectories and survival from baseline health states, and infers an interpretable network of directed interactions between the health variables. The network identifies plausible physiological connections between health variables as well as clusters of strongly connected health variables. We use English Longitudinal Study of Aging (ELSA) data to train our model and show that it performs better than multiple dedicated linear models for health outcomes and survival. We compare our model with flexible lower-dimensional latent-space models to explore the dimensionality required to accurately model aging health outcomes. Our DJIN model can be used to generate synthetic individuals that age realistically, to impute missing data, and to simulate future aging outcomes given arbitrary initial health states.
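The core idea of coupling health variables through a pairwise interaction network within a stochastic dynamical system can be illustrated with a minimal sketch. This is not the authors' DJIN implementation: the interaction matrix, drift, noise level, and all parameter values below are invented for illustration, and the model's survival component and network inference are omitted.

```python
import numpy as np

# Minimal sketch (not the DJIN model itself): health variables x evolve
# under linear pairwise couplings W within a stochastic dynamical
# system, integrated with Euler-Maruyama steps. All values hypothetical.

def simulate_trajectory(x0, W, drift, sigma, dt=0.1, n_steps=100, seed=0):
    """Simulate dx = (drift + W @ x) dt + sigma dB for one individual."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(n_steps):
        noise = rng.normal(size=x.shape) * np.sqrt(dt)
        x = x + (drift + W @ x) * dt + sigma * noise
        traj.append(x.copy())
    return np.array(traj)

# Two coupled health variables: both decay toward baseline, and
# variable 0 pushes variable 1 upward through the off-diagonal coupling.
W = np.array([[-0.5, 0.0],
              [ 0.3, -0.5]])
traj = simulate_trajectory([1.0, 0.0], W, drift=np.zeros(2), sigma=0.05)
print(traj.shape)  # (101, 2)
```

In the paper the entries of a matrix like `W` are what gets learned from longitudinal data, which is what makes the network of directed interactions inspectable after training.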


2020 ◽  
Author(s):  
Collin Edwards ◽  
Elizabeth E. Crone

Abstract: Understanding organismal phenology has been an emerging interest in ecology, in part because phenological shifts are one of the most conspicuous signs of climate change. While we are seeing increased collection of phenological data and creative use of historical data sets, existing statistical tools to measure phenology are generally either limited (e.g., first day of observation, which has problematic biases) or are challenging to implement (often requiring custom coding, or enough data to fit many parameters). We present a method to fit phenological data with Gaussian curves using linear models, and show how robust phenological metrics can be obtained using standard linear regression tools. We then apply this method to eight years of Baltimore checkerspot data using generalized linear mixed models (GLMMs). This case study illustrates the ability of years with extensive data to inform years with less data and shows that butterfly flight activity is somewhat earlier in warmer years. We believe our new method fills a convenient midpoint between ad hoc measures and custom-coded models.
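The trick of fitting Gaussian curves with linear models rests on the fact that a Gaussian activity curve becomes a quadratic on the log scale, so its peak date and spread drop out of an ordinary polynomial regression. A minimal sketch on noise-free synthetic counts (the case study's GLMM machinery, year effects, and count-data error structure are omitted, and all parameter values are hypothetical):

```python
import numpy as np

# A Gaussian activity curve N(t) = A * exp(-(t - mu)^2 / (2 * sd^2))
# is quadratic in t on the log scale, so a standard linear model
# recovers the phenological parameters. Synthetic, noise-free counts.

mu_true, sd_true, amplitude = 180.0, 20.0, 100.0  # hypothetical values
days = np.arange(120, 241, 7, dtype=float)        # weekly survey days
counts = amplitude * np.exp(-(days - mu_true) ** 2 / (2 * sd_true ** 2))

# Fit log(counts) = c2*t^2 + c1*t + c0 by ordinary polynomial regression.
c2, c1, c0 = np.polyfit(days, np.log(counts), 2)

peak_day = -c1 / (2 * c2)        # mean of the Gaussian (peak activity)
spread = np.sqrt(-1 / (2 * c2))  # standard deviation (flight duration)

print(round(peak_day, 2), round(spread, 2))  # 180.0 20.0
```

With real survey counts one would fit the same quadratic terms inside a Poisson or negative-binomial GLM (or GLMM across years) rather than log-transforming, but the mapping from quadratic coefficients to peak date and spread is identical.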


2021 ◽  
pp. 1-36
Author(s):  
Henry Prakken ◽  
Rosa Ratsma

This paper proposes a formal top-level model of explaining the outputs of machine-learning-based decision-making applications and evaluates it experimentally with three data sets. The model draws on AI & law research on argumentation with cases, which models how lawyers draw analogies to past cases and discuss their relevant similarities and differences in terms of relevant factors and dimensions in the problem domain. A case-based approach is natural since the input data of machine-learning applications can be seen as cases. While the approach is motivated by legal decision making, it also applies to other kinds of decision making, such as commercial decisions about loan applications or employee hiring, as long as the outcome is binary and the input conforms to this paper’s factor or dimension format. The model is top-level in that it can be extended with more refined accounts of similarities and differences between cases. It is shown to overcome several limitations of similar argumentation-based explanation models, which only have binary features and do not represent the tendency of features towards particular outcomes. The results of the experimental evaluation studies indicate that the model may be feasible in practice, but that further development and experimentation are needed to confirm its usefulness as an explanation model. Main challenges here are selecting from a large number of possible explanations, reducing the number of features in the explanations, and adding more meaningful information to them. It also remains to be investigated how suitable our approach is for explaining non-linear models.
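The factor-based case comparison underlying this kind of explanation can be sketched in a few lines. This toy version is not the paper's formal model (dimensions, the top-level argument structure, and precedent selection are omitted), and the factor names, outcomes, and helper function are invented for illustration; the key ingredient it does show is that each factor carries a known tendency toward one outcome.

```python
# Toy sketch of factor-based case comparison: each factor has a known
# tendency toward one of two outcomes, and an explanation cites the
# precedent's shared supporting factors plus the differences between
# the cases. All names are invented for illustration.

PRO = "grant_loan"
CON = "deny_loan"

# factor -> outcome the factor tends to favour
tendency = {
    "stable_income": PRO,
    "low_debt": PRO,
    "prior_default": CON,
    "short_credit_history": CON,
}

def explain(focus, precedent, precedent_outcome):
    """Relevant similarities and differences between two fact sets."""
    shared = focus & precedent
    return {
        "shared_supporting": {f for f in shared
                              if tendency[f] == precedent_outcome},
        "focus_only": focus - precedent,       # possible distinctions
        "precedent_only": precedent - focus,   # possible distinctions
    }

focus = {"stable_income", "prior_default"}
precedent = {"stable_income", "low_debt"}
print(explain(focus, precedent, PRO))
```

An explanation built from this comparison would cite `shared_supporting` as the analogy to the precedent and the two difference sets as candidate distinctions an opponent could raise.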


2016 ◽  
Vol 2016 ◽  
pp. 1-18 ◽  
Author(s):  
Mustafa Yuksel ◽  
Suat Gonul ◽  
Gokce Banu Laleci Erturkmen ◽  
Ali Anil Sinaci ◽  
Paolo Invernizzi ◽  
...  

Depending mostly on voluntarily sent spontaneous reports, pharmacovigilance studies are hampered by the low quantity and quality of patient data. Our objective is to improve postmarket safety studies by enabling safety analysts to seamlessly access a wide range of EHR sources for collecting deidentified medical data sets of selected patient populations and tracing the reported incidents back to original EHRs. We have developed an ontological framework in which EHR sources and target clinical research systems can continue using their own local data models, interfaces, and terminology systems, while structural and semantic interoperability are handled through rule-based reasoning on formal representations of the different models and terminology systems maintained in the SALUS Semantic Resource Set. The SALUS Common Information Model at the core of this set acts as the common mediator. We demonstrate the capabilities of our framework through one of the SALUS safety analysis tools, namely the Case Series Characterization Tool, which has been deployed on top of the regional EHR data warehouse of the Lombardy Region, containing about 1 billion records from 16 million patients, and has been validated by several pharmacovigilance researchers with real-life cases. The results confirm significant improvements in signal detection and evaluation compared to traditional methods, which lack this background information.


2007 ◽  
Vol 8 (5) ◽  
pp. 449-464 ◽  
Author(s):  
C. H. Son ◽  
T. A. Shethaji ◽  
C. J. Rutland ◽  
H Barths ◽  
A Lippert ◽  
...  

Three non-linear k-ε models were implemented into the multi-dimensional computational fluid dynamics code GMTEC with the purpose of comparing them with existing linear k-ε models including renormalization group variations. The primary focus of the present study is to evaluate the potential of these non-linear models in engineering applications such as the internal combustion engine. The square duct flow and the backwards-facing step flow were two simple test cases chosen for which experimental data are available for comparison. Successful simulations for these cases were followed by simulations of an engine-type intake flow to evaluate the performance of the non-linear models in comparison with experimental data and the standard linear k-ε models as well as two renormalization group types. All the non-linear models are found to be an improvement over the standard linear model, but mostly in simple flows. For more complex flows, such as the engine-type case, only the cubic non-linear models appear to make a modest improvement in the mean flow but without any improvement in the root-mean-square values. These improvements are overshadowed by the stiffness of the cubic models and the requirements for smaller time steps. The contributions of each non-linear term to the Reynolds stress tensor are analysed in detail in order to identify the different characteristics of the different non-linear models for engine intake flows.


F1000Research ◽  
2014 ◽  
Vol 3 ◽  
pp. 146 ◽  
Author(s):  
Guanming Wu ◽  
Eric Dawson ◽  
Adrian Duong ◽  
Robin Haw ◽  
Lincoln Stein

High-throughput experiments are routinely performed in modern biological studies. However, extracting meaningful results from massive experimental data sets is a challenging task for biologists. Projecting data onto pathway and network contexts is a powerful way to unravel patterns embedded in seemingly scattered large data sets and assist knowledge discovery related to cancer and other complex diseases. We have developed a Cytoscape app called “ReactomeFIViz”, which utilizes a highly reliable gene functional interaction network and human curated pathways from Reactome and other pathway databases. This app provides a suite of features to assist biologists in performing pathway- and network-based data analysis in a biologically intuitive and user-friendly way. Biologists can use this app to uncover network and pathway patterns related to their studies, search for gene signatures from gene expression data sets, reveal pathways significantly enriched by genes in a list, and integrate multiple genomic data types into a pathway context using probabilistic graphical models. We believe our app will give researchers substantial power to analyze intrinsically noisy high-throughput experimental data to find biologically relevant information.
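One of the analyses mentioned above, revealing pathways significantly enriched by genes in a list, is classically a hypergeometric over-representation test. A minimal sketch of that idea (the numbers are hypothetical, and ReactomeFIViz's own statistics, annotation sources, and multiple-testing correction are not reproduced here):

```python
from math import comb

# Sketch of pathway over-representation: given a gene list, test whether
# a pathway's genes appear in it more often than chance predicts, using
# a hypergeometric tail probability. Numbers are hypothetical.

def enrichment_p(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N genes, K in pathway, n drawn)."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)

# 20000 genes total, a pathway of 50 genes, a gene list of 100,
# and an observed overlap of 5 (expected overlap is only 0.25).
p = enrichment_p(20000, 50, 100, 5)
print(p < 0.01)  # strongly enriched
```

In practice this test is run over every pathway in the database, so the resulting p-values need a multiple-testing correction (e.g. Benjamini-Hochberg) before calling a pathway significant.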


2021 ◽  
Author(s):  
Andrew J Kavran ◽  
Aaron Clauset

Abstract: Background: Large-scale biological data sets are often contaminated by noise, which can impede accurate inferences about underlying processes. Such measurement noise can arise from endogenous biological factors like cell cycle and life history variation, and from exogenous technical factors like sample preparation and instrument variation.

Results: We describe a general method for automatically reducing noise in large-scale biological data sets. This method uses an interaction network to identify groups of correlated or anti-correlated measurements that can be combined or “filtered” to better recover an underlying biological signal. Similar to the process of denoising an image, a single network filter may be applied to an entire system, or the system may first be decomposed into distinct modules and a different filter applied to each. Applied to synthetic data with known network structure and signal, network filters accurately reduce noise across a wide range of noise levels and structures. Applied to a machine learning task of predicting changes in human protein expression in healthy and cancerous tissues, network filtering prior to training increases accuracy by up to 43% compared to using unfiltered data.

Conclusions: Network filters are a general way to denoise biological data and can account for both correlation and anti-correlation between different measurements. Furthermore, we find that partitioning a network prior to filtering can significantly reduce errors in networks with heterogeneous data and correlation patterns, and this approach outperforms existing diffusion-based methods. Our results on proteomics data indicate the broad potential utility of network filters for applications in systems biology.
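The filtering idea can be illustrated with a minimal sketch: each node's measurement is replaced by an average over itself and its network neighbors, with anti-correlated neighbors entering with a flipped sign. This is an invented toy version, not the paper's exact filter; the graph, signs, and data below are hypothetical.

```python
import numpy as np

# Toy network filter: average each node's value with its neighbors,
# flipping the sign of anti-correlated neighbors. Not the paper's
# exact construction; graph and data are invented for illustration.

def network_filter(x, adjacency):
    """adjacency[i][j] is +1 (correlated), -1 (anti-correlated), or 0."""
    x = np.asarray(x, dtype=float)
    filtered = np.empty_like(x)
    for i in range(len(x)):
        signs = adjacency[i].copy()
        signs[i] = 1                    # always include the node itself
        mask = signs != 0
        filtered[i] = np.mean(signs[mask] * x[mask])
    return filtered

# Three positively correlated nodes in a triangle: filtering pulls
# their noisy measurements toward the shared underlying signal.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])
print(network_filter([1.0, 2.0, 3.0], A))  # [2. 2. 2.]
```

Partitioning the network first, as the abstract describes, amounts to restricting `adjacency` to within-module edges so that one module's correlation pattern does not contaminate another's.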


2020 ◽  
Author(s):  
Thibaut Sellinger ◽  
Diala Abu Awad ◽  
Aurélien Tellier

Abstract: Many methods based on the Sequentially Markovian Coalescent (SMC) have been and are being developed. These methods make use of genome sequence data to uncover population demographic history. More recently, new methods have extended the original theoretical framework, allowing the simultaneous estimation of the demographic history and other biological variables. These methods can be applied to many different species, under different model assumptions, in the hope of unlocking the population/species evolutionary history. Although convergence proofs in particular cases have been given using simulated data, a clear outline of the performance limits of these methods is lacking. We here explore the limits of this methodology, and present a tool that can be used to help users quantify what information can be confidently retrieved from given data sets. In addition, we study the consequences for inference accuracy of violating the hypotheses and assumptions of SMC approaches, such as the presence of transposable elements, variable recombination and mutation rates along the sequence, and SNP call errors. We also provide a new interpretation of the SMC through the use of the estimated transition matrix and offer recommendations for the most efficient use of these methods under budget constraints, notably through the building of data sets better adapted to the biological question at hand.


Author(s):  
Divya Dasagrandhi ◽  
Arul Salomee Kamalabai Ravindran ◽  
Anusuyadevi Muthuswamy ◽  
Jayachandran K. S.

Understanding the mechanisms of a disease is highly complicated due to the complex pathways involved in disease progression. Despite several decades of research, the occurrence and prognosis of diseases are not completely understood, even with high-throughput experiments like DNA microarray and next-generation sequencing, because of the challenges in analyzing huge data sets. Systems biology, one of the major divisions of bioinformatics, has provided cutting-edge techniques for the better understanding of these pathways. Construction of a protein-protein interaction network (PPIN) guides modern scientists in identifying vital proteins through the network's structure, which facilitates the identification of new drug targets and associated proteins. The chapter focuses on PPI databases, construction of PPINs, and their analysis.
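The workflow the chapter describes, building a PPIN from interaction pairs and flagging vital hub proteins, can be sketched briefly. The interaction pairs below are invented for illustration; real analyses would draw edges from curated databases such as STRING or BioGRID, and would typically use richer centrality measures than plain degree.

```python
from collections import defaultdict

# Sketch of PPIN construction: build an undirected interaction network
# from an edge list, then rank proteins by degree to flag likely hub
# ("vital") proteins. Interaction pairs are hypothetical examples.

interactions = [
    ("TP53", "MDM2"), ("TP53", "EP300"), ("TP53", "ATM"),
    ("MDM2", "MDM4"), ("ATM", "CHEK2"),
]

network = defaultdict(set)
for a, b in interactions:
    network[a].add(b)
    network[b].add(a)

# Degree centrality: hubs interact with many partners and are
# candidate drug targets or essential proteins.
hubs = sorted(network, key=lambda p: len(network[p]), reverse=True)
print(hubs[0], len(network[hubs[0]]))  # TP53 3
```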

