Forecasting Individual Aging Trajectories and Survival with an Interpretable Network Model

2020 ◽  
Vol 4 (Supplement_1) ◽  
pp. 923-923
Author(s):  
Spencer Farrell ◽  
Arnold Mitnitski ◽  
Kenneth Rockwood ◽  
Andrew Rutenberg

Abstract: We have built a computational model of individual aging trajectories of health and survival, containing physical, functional, and biological variables, conditioned on demographic, lifestyle, and medical background information. We combine techniques of modern machine learning with a network approach, where the health variables are coupled by an interaction network within a stochastic dynamical system. The resulting model is scalable to large longitudinal data sets, is predictive of individual high-dimensional health trajectories and survival, and infers an interpretable network of interactions between the health variables. The interaction network gives us the ability to identify which interactions between variables are used by the model, demonstrating that realistic physiological connections are inferred. We use English Longitudinal Study of Aging (ELSA) data to train our model and show that it performs better than standard linear models for health outcomes and survival, while also revealing the relevant interactions. Our model can be used to generate synthetic individuals that age realistically from input data at baseline, as well as to probe future aging outcomes given an arbitrary initial health state.

2021 ◽  
Vol 5 (Supplement_1) ◽  
pp. 676-676
Author(s):  
Spencer Farrell ◽  
Arnold Mitnitski ◽  
Kenneth Rockwood ◽  
Andrew Rutenberg

Abstract: We have built a computational model of individual aging trajectories of health and survival that contains physical, functional, and biological variables, and is conditioned on demographic, lifestyle, and medical background information. We combine techniques of modern machine learning with an interpretable network approach, where health variables are coupled by an explicit interaction network within a stochastic dynamical system. Our model is scalable to large longitudinal data sets, is predictive of individual high-dimensional health trajectories and survival from baseline health states, and infers an interpretable network of directed interactions between the health variables. The network identifies plausible physiological connections between health variables and clusters of strongly connected health variables. We use English Longitudinal Study of Aging (ELSA) data to train our model and show that it performs better than traditional linear models for health outcomes and survival. Our model can also be used to generate synthetic individuals that age realistically, to impute missing data, and to simulate future aging outcomes given an arbitrary initial health state.


2022 ◽  
Vol 18 (1) ◽  
pp. e1009746
Author(s):  
Spencer Farrell ◽  
Arnold Mitnitski ◽  
Kenneth Rockwood ◽  
Andrew D. Rutenberg

We have built a computational model for individual aging trajectories of health and survival, which contains physical, functional, and biological variables, and is conditioned on demographic, lifestyle, and medical background information. We combine techniques of modern machine learning with an interpretable interaction network, where health variables are coupled by explicit pair-wise interactions within a stochastic dynamical system. Our dynamic joint interpretable network (DJIN) model is scalable to large longitudinal data sets, is predictive of individual high-dimensional health trajectories and survival from baseline health states, and infers an interpretable network of directed interactions between the health variables. The network identifies plausible physiological connections between health variables as well as clusters of strongly connected health variables. We use English Longitudinal Study of Aging (ELSA) data to train our model and show that it performs better than multiple dedicated linear models for health outcomes and survival. We compare our model with flexible lower-dimensional latent-space models to explore the dimensionality required to accurately model aging health outcomes. Our DJIN model can be used to generate synthetic individuals that age realistically, to impute missing data, and to simulate future aging outcomes given arbitrary initial health states.
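The core idea of coupling health variables through a pairwise interaction network within a stochastic dynamical system can be illustrated with a minimal sketch. This is not the authors' DJIN implementation: the interaction matrix, drift, noise level, and all parameter values below are invented for illustration, and the model's survival component and network inference are omitted.

```python
import numpy as np

# Minimal sketch (not the DJIN model itself): health variables x evolve
# under linear pairwise couplings W within a stochastic dynamical
# system, integrated with Euler-Maruyama steps. All values hypothetical.

def simulate_trajectory(x0, W, drift, sigma, dt=0.1, n_steps=100, seed=0):
    """Simulate dx = (drift + W @ x) dt + sigma dB for one individual."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(n_steps):
        noise = rng.normal(size=x.shape) * np.sqrt(dt)
        x = x + (drift + W @ x) * dt + sigma * noise
        traj.append(x.copy())
    return np.array(traj)

# Two coupled health variables: both decay toward baseline, and
# variable 0 pushes variable 1 upward through the off-diagonal coupling.
W = np.array([[-0.5, 0.0],
              [ 0.3, -0.5]])
traj = simulate_trajectory([1.0, 0.0], W, drift=np.zeros(2), sigma=0.05)
print(traj.shape)  # (101, 2)
```

In the paper the entries of a matrix like `W` are what gets learned from longitudinal data, which is what makes the network of directed interactions inspectable after training.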


2020 ◽  
Author(s):  
Collin Edwards ◽  
Elizabeth E. Crone

Abstract: Understanding organismal phenology has been an emerging interest in ecology, in part because phenological shifts are one of the most conspicuous signs of climate change. While we are seeing increased collection of phenological data and creative use of historical data sets, existing statistical tools to measure phenology are generally either limited (e.g., first day of observation, which has problematic biases) or are challenging to implement (often requiring custom coding, or enough data to fit many parameters). We present a method to fit phenological data with Gaussian curves using linear models, and show how robust phenological metrics can be obtained using standard linear regression tools. We then apply this method to eight years of Baltimore checkerspot data using generalized linear mixed models (GLMMs). This case study illustrates the ability of years with extensive data to inform years with less data and shows that butterfly flight activity is somewhat earlier in warmer years. We believe our new method fills a convenient midpoint between ad hoc measures and custom-coded models.
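The trick of fitting Gaussian curves with linear models rests on the fact that a Gaussian activity curve becomes a quadratic on the log scale, so its peak date and spread drop out of an ordinary polynomial regression. A minimal sketch on noise-free synthetic counts (the case study's GLMM machinery, year effects, and count-data error structure are omitted, and all parameter values are hypothetical):

```python
import numpy as np

# A Gaussian activity curve N(t) = A * exp(-(t - mu)^2 / (2 * sd^2))
# is quadratic in t on the log scale, so a standard linear model
# recovers the phenological parameters. Synthetic, noise-free counts.

mu_true, sd_true, amplitude = 180.0, 20.0, 100.0  # hypothetical values
days = np.arange(120, 241, 7, dtype=float)        # weekly survey days
counts = amplitude * np.exp(-(days - mu_true) ** 2 / (2 * sd_true ** 2))

# Fit log(counts) = c2*t^2 + c1*t + c0 by ordinary polynomial regression.
c2, c1, c0 = np.polyfit(days, np.log(counts), 2)

peak_day = -c1 / (2 * c2)        # mean of the Gaussian (peak activity)
spread = np.sqrt(-1 / (2 * c2))  # standard deviation (flight duration)

print(round(peak_day, 2), round(spread, 2))  # 180.0 20.0
```

With real survey counts one would fit the same quadratic terms inside a Poisson or negative-binomial GLM (or GLMM across years) rather than log-transforming, but the mapping from quadratic coefficients to peak date and spread is identical.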


2021 ◽  
pp. 1-36
Author(s):  
Henry Prakken ◽  
Rosa Ratsma

This paper proposes a formal top-level model of explaining the outputs of machine-learning-based decision-making applications and evaluates it experimentally with three data sets. The model draws on AI & law research on argumentation with cases, which models how lawyers draw analogies to past cases and discuss their relevant similarities and differences in terms of relevant factors and dimensions in the problem domain. A case-based approach is natural since the input data of machine-learning applications can be seen as cases. While the approach is motivated by legal decision making, it also applies to other kinds of decision making, such as commercial decisions about loan applications or employee hiring, as long as the outcome is binary and the input conforms to this paper’s factor or dimension format. The model is top-level in that it can be extended with more refined accounts of similarities and differences between cases. It is shown to overcome several limitations of similar argumentation-based explanation models, which only have binary features and do not represent the tendency of features towards particular outcomes. The results of the experimental evaluation studies indicate that the model may be feasible in practice, but that further development and experimentation are needed to confirm its usefulness as an explanation model. Main challenges here are selecting from a large number of possible explanations, reducing the number of features in the explanations, and adding more meaningful information to them. It also remains to be investigated how suitable our approach is for explaining non-linear models.
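The factor-based case comparison underlying this kind of explanation can be sketched in a few lines. This toy version is not the paper's formal model (dimensions, the top-level argument structure, and precedent selection are omitted), and the factor names, outcomes, and helper function are invented for illustration; the key ingredient it does show is that each factor carries a known tendency toward one outcome.

```python
# Toy sketch of factor-based case comparison: each factor has a known
# tendency toward one of two outcomes, and an explanation cites the
# precedent's shared supporting factors plus the differences between
# the cases. All names are invented for illustration.

PRO = "grant_loan"
CON = "deny_loan"

# factor -> outcome the factor tends to favour
tendency = {
    "stable_income": PRO,
    "low_debt": PRO,
    "prior_default": CON,
    "short_credit_history": CON,
}

def explain(focus, precedent, precedent_outcome):
    """Relevant similarities and differences between two fact sets."""
    shared = focus & precedent
    return {
        "shared_supporting": {f for f in shared
                              if tendency[f] == precedent_outcome},
        "focus_only": focus - precedent,       # possible distinctions
        "precedent_only": precedent - focus,   # possible distinctions
    }

focus = {"stable_income", "prior_default"}
precedent = {"stable_income", "low_debt"}
print(explain(focus, precedent, PRO))
```

An explanation built from this comparison would cite `shared_supporting` as the analogy to the precedent and the two difference sets as candidate distinctions an opponent could raise.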


2016 ◽  
Vol 2016 ◽  
pp. 1-18 ◽  
Author(s):  
Mustafa Yuksel ◽  
Suat Gonul ◽  
Gokce Banu Laleci Erturkmen ◽  
Ali Anil Sinaci ◽  
Paolo Invernizzi ◽  
...  

Depending mostly on voluntarily sent spontaneous reports, pharmacovigilance studies are hampered by the low quantity and quality of patient data. Our objective is to improve postmarket safety studies by enabling safety analysts to seamlessly access a wide range of EHR sources for collecting deidentified medical data sets of selected patient populations and tracing the reported incidents back to original EHRs. We have developed an ontological framework in which EHR sources and target clinical research systems can continue using their own local data models, interfaces, and terminology systems, while structural and semantic interoperability are handled through rule-based reasoning on formal representations of the different models and terminology systems maintained in the SALUS Semantic Resource Set. The SALUS Common Information Model at the core of this set acts as the common mediator. We demonstrate the capabilities of our framework through one of the SALUS safety analysis tools, namely the Case Series Characterization Tool, which has been deployed on top of the regional EHR data warehouse of the Lombardy Region, containing about 1 billion records from 16 million patients, and has been validated by several pharmacovigilance researchers with real-life cases. The results confirm significant improvements in signal detection and evaluation compared to traditional methods, which lack this background information.


2007 ◽  
Vol 8 (5) ◽  
pp. 449-464 ◽  
Author(s):  
C. H. Son ◽  
T. A. Shethaji ◽  
C. J. Rutland ◽  
H Barths ◽  
A Lippert ◽  
...  

Three non-linear k-ε models were implemented into the multi-dimensional computational fluid dynamics code GMTEC with the purpose of comparing them with existing linear k-ε models including renormalization group variations. The primary focus of the present study is to evaluate the potential of these non-linear models in engineering applications such as the internal combustion engine. The square duct flow and the backwards-facing step flow were two simple test cases chosen for which experimental data are available for comparison. Successful simulations for these cases were followed by simulations of an engine-type intake flow to evaluate the performance of the non-linear models in comparison with experimental data and the standard linear k-ε models as well as two renormalization group types. All the non-linear models are found to be an improvement over the standard linear model, but mostly in simple flows. For more complex flows, such as the engine-type case, only the cubic non-linear models appear to make a modest improvement in the mean flow but without any improvement in the root-mean-square values. These improvements are overshadowed by the stiffness of the cubic models and the requirements for smaller time steps. The contributions of each non-linear term to the Reynolds stress tensor are analysed in detail in order to identify the different characteristics of the different non-linear models for engine intake flows.


F1000Research ◽  
2014 ◽  
Vol 3 ◽  
pp. 146 ◽  
Author(s):  
Guanming Wu ◽  
Eric Dawson ◽  
Adrian Duong ◽  
Robin Haw ◽  
Lincoln Stein

High-throughput experiments are routinely performed in modern biological studies. However, extracting meaningful results from massive experimental data sets is a challenging task for biologists. Projecting data onto pathway and network contexts is a powerful way to unravel patterns embedded in seemingly scattered large data sets and assist knowledge discovery related to cancer and other complex diseases. We have developed a Cytoscape app called “ReactomeFIViz”, which utilizes a highly reliable gene functional interaction network and human curated pathways from Reactome and other pathway databases. This app provides a suite of features to assist biologists in performing pathway- and network-based data analysis in a biologically intuitive and user-friendly way. Biologists can use this app to uncover network and pathway patterns related to their studies, search for gene signatures from gene expression data sets, reveal pathways significantly enriched by genes in a list, and integrate multiple genomic data types into a pathway context using probabilistic graphical models. We believe our app will give researchers substantial power to analyze intrinsically noisy high-throughput experimental data to find biologically relevant information.
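One of the analyses mentioned above, revealing pathways significantly enriched by genes in a list, is classically a hypergeometric over-representation test. A minimal sketch of that idea (the numbers are hypothetical, and ReactomeFIViz's own statistics, annotation sources, and multiple-testing correction are not reproduced here):

```python
from math import comb

# Sketch of pathway over-representation: given a gene list, test whether
# a pathway's genes appear in it more often than chance predicts, using
# a hypergeometric tail probability. Numbers are hypothetical.

def enrichment_p(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N genes, K in pathway, n drawn)."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)

# 20000 genes total, a pathway of 50 genes, a gene list of 100,
# and an observed overlap of 5 (expected overlap is only 0.25).
p = enrichment_p(20000, 50, 100, 5)
print(p < 0.01)  # strongly enriched
```

In practice this test is run over every pathway in the database, so the resulting p-values need a multiple-testing correction (e.g. Benjamini-Hochberg) before calling a pathway significant.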


2021 ◽  
Author(s):  
Andrew J Kavran ◽  
Aaron Clauset

Abstract: Background: Large-scale biological data sets are often contaminated by noise, which can impede accurate inferences about underlying processes. Such measurement noise can arise from endogenous biological factors like cell cycle and life history variation, and from exogenous technical factors like sample preparation and instrument variation.

Results: We describe a general method for automatically reducing noise in large-scale biological data sets. This method uses an interaction network to identify groups of correlated or anti-correlated measurements that can be combined or “filtered” to better recover an underlying biological signal. Similar to the process of denoising an image, a single network filter may be applied to an entire system, or the system may first be decomposed into distinct modules and a different filter applied to each. Applied to synthetic data with known network structure and signal, network filters accurately reduce noise across a wide range of noise levels and structures. Applied to a machine learning task of predicting changes in human protein expression in healthy and cancerous tissues, network filtering prior to training increases accuracy by up to 43% compared to using unfiltered data.

Conclusions: Network filters are a general way to denoise biological data and can account for both correlation and anti-correlation between different measurements. Furthermore, we find that partitioning a network prior to filtering can significantly reduce errors in networks with heterogeneous data and correlation patterns, and this approach outperforms existing diffusion-based methods. Our results on proteomics data indicate the broad potential utility of network filters for applications in systems biology.
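The filtering idea can be illustrated with a minimal sketch: each node's measurement is replaced by an average over itself and its network neighbors, with anti-correlated neighbors entering with a flipped sign. This is an invented toy version, not the paper's exact filter; the graph, signs, and data below are hypothetical.

```python
import numpy as np

# Toy network filter: average each node's value with its neighbors,
# flipping the sign of anti-correlated neighbors. Not the paper's
# exact construction; graph and data are invented for illustration.

def network_filter(x, adjacency):
    """adjacency[i][j] is +1 (correlated), -1 (anti-correlated), or 0."""
    x = np.asarray(x, dtype=float)
    filtered = np.empty_like(x)
    for i in range(len(x)):
        signs = adjacency[i].copy()
        signs[i] = 1                    # always include the node itself
        mask = signs != 0
        filtered[i] = np.mean(signs[mask] * x[mask])
    return filtered

# Three positively correlated nodes in a triangle: filtering pulls
# their noisy measurements toward the shared underlying signal.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])
print(network_filter([1.0, 2.0, 3.0], A))  # [2. 2. 2.]
```

Partitioning the network first, as the abstract describes, amounts to restricting `adjacency` to within-module edges so that one module's correlation pattern does not contaminate another's.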


2020 ◽  
Author(s):  
Thibaut Sellinger ◽  
Diala Abu Awad ◽  
Aurélien Tellier

Abstract: Many methods based on the Sequentially Markovian Coalescent (SMC) have been and are being developed. These methods make use of genome sequence data to uncover population demographic history. More recently, new methods have extended the original theoretical framework, allowing the simultaneous estimation of the demographic history and other biological variables. These methods can be applied to many different species, under different model assumptions, in the hope of unlocking the population/species evolutionary history. Although convergence proofs in particular cases have been given using simulated data, a clear outline of the performance limits of these methods is lacking. We here explore the limits of this methodology, and present a tool that can be used to help users quantify what information can be confidently retrieved from given data sets. In addition, we study the consequences for inference accuracy of violating the hypotheses and assumptions of SMC approaches, such as the presence of transposable elements, variable recombination and mutation rates along the sequence, and SNP call errors. We also provide a new interpretation of the SMC through the use of the estimated transition matrix and offer recommendations for the most efficient use of these methods under budget constraints, notably through the building of data sets better adapted to the biological question at hand.


Author(s):  
Divya Dasagrandhi ◽  
Arul Salomee Kamalabai Ravindran ◽  
Anusuyadevi Muthuswamy ◽  
Jayachandran K. S.

Understanding the mechanisms of a disease is highly complicated due to the complex pathways involved in disease progression. Despite several decades of research, the occurrence and prognosis of diseases are not completely understood, even with high-throughput experiments like DNA microarray and next-generation sequencing, because of the challenges in analyzing huge data sets. Systems biology, one of the major divisions of bioinformatics, has provided cutting-edge techniques for the better understanding of these pathways. Construction of a protein-protein interaction network (PPIN) guides modern scientists in identifying vital proteins through the network's structure, which facilitates the identification of new drug targets and associated proteins. The chapter focuses on PPI databases, construction of PPINs, and their analysis.
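The workflow the chapter describes, building a PPIN from interaction pairs and flagging vital hub proteins, can be sketched briefly. The interaction pairs below are invented for illustration; real analyses would draw edges from curated databases such as STRING or BioGRID, and would typically use richer centrality measures than plain degree.

```python
from collections import defaultdict

# Sketch of PPIN construction: build an undirected interaction network
# from an edge list, then rank proteins by degree to flag likely hub
# ("vital") proteins. Interaction pairs are hypothetical examples.

interactions = [
    ("TP53", "MDM2"), ("TP53", "EP300"), ("TP53", "ATM"),
    ("MDM2", "MDM4"), ("ATM", "CHEK2"),
]

network = defaultdict(set)
for a, b in interactions:
    network[a].add(b)
    network[b].add(a)

# Degree centrality: hubs interact with many partners and are
# candidate drug targets or essential proteins.
hubs = sorted(network, key=lambda p: len(network[p]), reverse=True)
print(hubs[0], len(network[hubs[0]]))  # TP53 3
```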

