Interpretable machine learning for high-dimensional trajectories of aging health

2022 ◽  
Vol 18 (1) ◽  
pp. e1009746
Author(s):  
Spencer Farrell ◽  
Arnold Mitnitski ◽  
Kenneth Rockwood ◽  
Andrew D. Rutenberg

We have built a computational model for individual aging trajectories of health and survival, which contains physical, functional, and biological variables, and is conditioned on demographic, lifestyle, and medical background information. We combine techniques of modern machine learning with an interpretable interaction network, where health variables are coupled by explicit pair-wise interactions within a stochastic dynamical system. Our dynamic joint interpretable network (DJIN) model is scalable to large longitudinal data sets, is predictive of individual high-dimensional health trajectories and survival from baseline health states, and infers an interpretable network of directed interactions between the health variables. The network identifies plausible physiological connections between health variables as well as clusters of strongly connected health variables. We use English Longitudinal Study of Aging (ELSA) data to train our model and show that it performs better than multiple dedicated linear models for health outcomes and survival. We compare our model with flexible lower-dimensional latent-space models to explore the dimensionality required to accurately model aging health outcomes. Our DJIN model can be used to generate synthetic individuals that age realistically, to impute missing data, and to simulate future aging outcomes given arbitrary initial health states.
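The core of the DJIN idea — health variables coupled by explicit pairwise directed interactions inside a stochastic dynamical system — can be sketched in a few lines. This is a minimal illustrative simulation, not the trained model: the interaction matrix `W`, drift `mu`, noise scale `sigma`, and the Euler–Maruyama integration scheme are all stand-in assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n_vars = 5                                  # number of health variables
W = rng.normal(0, 0.1, (n_vars, n_vars))    # directed pairwise interactions (illustrative)
np.fill_diagonal(W, -0.5)                   # self-damping keeps trajectories bounded
mu = rng.normal(0, 0.05, n_vars)            # per-variable baseline drift
sigma = 0.02                                # diffusion (noise) strength
dt = 0.1                                    # integration time step

def step(x, rng):
    """One Euler-Maruyama update: dx = (W x + mu) dt + sigma dW."""
    drift = W @ x + mu
    noise = sigma * np.sqrt(dt) * rng.normal(size=x.shape)
    return x + drift * dt + noise

x = rng.normal(0, 1, n_vars)                # baseline health state
trajectory = [x.copy()]
for _ in range(100):                        # simulate 100 steps of "aging"
    x = step(x, rng)
    trajectory.append(x.copy())
trajectory = np.asarray(trajectory)         # shape: (101, n_vars)
```

Reading off which entries of `W` are large (and in which direction) is what makes this style of model interpretable: each entry is an explicit directed influence of one health variable on another.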

2021 ◽  
Vol 5 (Supplement_1) ◽  
pp. 676-676
Author(s):  
Spencer Farrell ◽  
Arnold Mitnitski ◽  
Kenneth Rockwood ◽  
Andrew Rutenberg

Abstract We have built a computational model of individual aging trajectories of health and survival, which contains physical, functional, and biological variables, and is conditioned on demographic, lifestyle, and medical background information. We combine techniques of modern machine learning with an interpretable network approach, where health variables are coupled by an explicit interaction network within a stochastic dynamical system. Our model is scalable to large longitudinal data sets, is predictive of individual high-dimensional health trajectories and survival from baseline health states, and infers an interpretable network of directed interactions between the health variables. The network identifies plausible physiological connections between health variables and clusters of strongly connected health variables. We use English Longitudinal Study of Aging (ELSA) data to train our model and show that it performs better than traditional linear models for health outcomes and survival. Our model can also be used to generate synthetic individuals that age realistically, to impute missing data, and to simulate future aging outcomes given an arbitrary initial health state.


2020 ◽  
Vol 4 (Supplement_1) ◽  
pp. 923-923
Author(s):  
Spencer Farrell ◽  
Arnold Mitnitski ◽  
Kenneth Rockwood ◽  
Andrew Rutenberg

Abstract We have built a computational model of individual aging trajectories of health and survival, containing physical, functional, and biological variables, conditioned on demographic, lifestyle, and medical background information. We combine techniques of modern machine learning with a network approach, where the health variables are coupled by an interaction network within a stochastic dynamical system. The resulting model is scalable to large longitudinal data sets, is predictive of individual high-dimensional health trajectories and survival, and infers an interpretable network of interactions between the health variables. The interaction network lets us identify which interactions between variables the model uses, demonstrating that realistic physiological connections are inferred. We use English Longitudinal Study of Aging (ELSA) data to train our model and show that it performs better than standard linear models for health outcomes and survival, while also revealing the relevant interactions. Our model can be used to generate synthetic individuals that age realistically from input data at baseline, and to probe future aging outcomes given an arbitrary initial health state.


2020 ◽  
Vol 10 (5) ◽  
pp. 1797 ◽  
Author(s):  
Mera Kartika Delimayanti ◽  
Bedy Purnama ◽  
Ngoc Giang Nguyen ◽  
Mohammad Reza Faisal ◽  
Kunti Robiatul Mahmudah ◽  
...  

Manual classification of sleep stage is a time-consuming but necessary step in the diagnosis and treatment of sleep disorders, and its automation has been an area of active study. Previous works have applied low-dimensional fast Fourier transform (FFT) features together with a variety of machine learning algorithms. In this paper, we demonstrate the utilization of features extracted from EEG signals via FFT to improve the performance of automated sleep stage classification through machine learning methods. Unlike previous works using FFT, we incorporated thousands of FFT features in order to classify the sleep stages into 2–6 classes. Using the expanded version of the Sleep-EDF dataset with 61 recordings, our method outperformed other state-of-the-art methods. This result indicates that high-dimensional FFT features in combination with simple feature selection are effective for the improvement of automated sleep stage classification.
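The feature-extraction step described above can be sketched as follows. This is a hedged illustration, not the authors' pipeline: the sampling rate, the 30-second epoch length (standard in sleep scoring), the stand-in random signals, and the variance-based selection rule are all assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

fs = 100                        # Hz; assumed EEG sampling rate
epoch = 30 * fs                 # 30-second epochs -> 3000 samples each
n_epochs = 20
signals = rng.normal(size=(n_epochs, epoch))   # stand-in EEG epochs

# Thousands of FFT features per epoch: the magnitude spectrum of each epoch.
features = np.abs(np.fft.rfft(signals, axis=1))  # shape (n_epochs, epoch//2 + 1)

# A simple feature-selection step: keep the k highest-variance frequency bins.
k = 200
order = np.argsort(features.var(axis=0))[::-1][:k]
selected = features[:, order]                    # shape (n_epochs, k)
```

Each 3000-sample epoch yields 1501 FFT magnitude features, so even a modest recording produces the kind of high-dimensional feature matrix the abstract refers to; the selected subset would then feed a downstream classifier.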


2021 ◽  
pp. 1-36
Author(s):  
Henry Prakken ◽  
Rosa Ratsma

This paper proposes a formal top-level model of explaining the outputs of machine-learning-based decision-making applications and evaluates it experimentally with three data sets. The model draws on AI & law research on argumentation with cases, which models how lawyers draw analogies to past cases and discuss their relevant similarities and differences in terms of relevant factors and dimensions in the problem domain. A case-based approach is natural since the input data of machine-learning applications can be seen as cases. While the approach is motivated by legal decision making, it also applies to other kinds of decision making, such as commercial decisions about loan applications or employee hiring, as long as the outcome is binary and the input conforms to this paper’s factor- or dimension format. The model is top-level in that it can be extended with more refined accounts of similarities and differences between cases. It is shown to overcome several limitations of similar argumentation-based explanation models, which only have binary features and do not represent the tendency of features towards particular outcomes. The results of the experimental evaluation studies indicate that the model may be feasible in practice, but that further development and experimentation is needed to confirm its usefulness as an explanation model. Main challenges here are selecting from a large number of possible explanations, reducing the number of features in the explanations and adding more meaningful information to them. It also remains to be investigated how suitable our approach is for explaining non-linear models.
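The factor-based comparison at the heart of such case-based explanations can be sketched minimally: each case is a set of factors, each factor has a tendency toward one of two outcomes, and citing a precedent means listing its relevant similarities and differences with the focus case. The factor names, the loan-approval framing, and the three-way breakdown below are invented for illustration and are not taken from the paper's datasets or its formal model.

```python
# Which binary outcome a factor favours (illustrative loan-approval domain).
PRO, CON = "pro", "con"

tendency = {
    "stable_income": PRO, "collateral": PRO,
    "missed_payments": CON, "high_debt_ratio": CON,
}

def explain(focus, precedent, outcome):
    """Relevant similarities and differences of a cited precedent.

    `focus` and `precedent` are sets of factors; `outcome` is the outcome
    the precedent is cited to support.
    """
    shared = focus & precedent
    return {
        # shared factors that favour the cited outcome strengthen the analogy
        "similarities": {f for f in shared if tendency[f] == outcome},
        # supporting factors the precedent had but the focus case lacks
        "precedent_only": {f for f in precedent - focus if tendency[f] == outcome},
        # factors of the focus case that cut against the cited outcome
        "focus_only_against": {f for f in focus - precedent if tendency[f] != outcome},
    }

focus = {"stable_income", "high_debt_ratio"}
precedent = {"stable_income", "collateral"}
expl = explain(focus, precedent, PRO)
```

The paper's model goes further (dimensions with values, not just binary factors, and outcome tendencies per dimension), but this shows the basic similarity/difference bookkeeping that an argumentation-based explanation is built from.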


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Helder Sebastião ◽  
Pedro Godinho

Abstract This study examines the predictability of three major cryptocurrencies—bitcoin, ethereum, and litecoin—and the profitability of trading strategies devised upon machine learning techniques (e.g., linear models, random forests, and support vector machines). The models are validated in a period characterized by unprecedented turmoil and tested in a period of bear markets, allowing the assessment of whether the predictions are good even when the market direction changes between the validation and test periods. The classification and regression methods use attributes from trading and network activity for the period from August 15, 2015 to March 03, 2019, with the test sample beginning on April 13, 2018. For the test period, five out of 18 individual models have success rates of less than 50%. The trading strategies are built on model assembling. The ensemble assuming that five models produce identical signals (Ensemble 5) achieves the best performance for ethereum and litecoin, with annualized Sharpe ratios of 80.17% and 91.35% and annualized returns (after proportional round-trip trading costs of 0.5%) of 9.62% and 5.73%, respectively. These positive results support the claim that machine learning provides robust techniques for exploring the predictability of cryptocurrencies and for devising profitable trading strategies in these markets, even under adverse market conditions.
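The "Ensemble 5" rule described above — trade only when at least five individual models emit the same signal — reduces to a small voting function. This is a sketch of the agreement logic under stated assumptions (signals encoded as +1 long, -1 short, 0 neutral); the example signal lists are made up and are not the paper's model outputs.

```python
def ensemble_signal(signals, quorum=5):
    """Return +1 (long) or -1 (short) when at least `quorum` models agree
    on that direction; otherwise return 0 (stay out of the market)."""
    longs = sum(1 for s in signals if s == 1)
    shorts = sum(1 for s in signals if s == -1)
    if longs >= quorum:
        return 1
    if shorts >= quorum:
        return -1
    return 0

# Usage: five of six models agree on long -> trade long;
# a split panel -> no position.
ensemble_signal([1, 1, 1, 1, 1, -1])
ensemble_signal([1, 1, -1, -1, 0, 0])
```

Requiring a quorum trades off fewer signals against higher-conviction ones, which is consistent with the abstract's finding that the strict-agreement ensemble performed best after round-trip trading costs.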


2021 ◽  
Vol 11 (13) ◽  
pp. 6030
Author(s):  
Daljeet Singh ◽  
Antonella B. Francavilla ◽  
Simona Mancini ◽  
Claudio Guarnaccia

A vehicular road traffic noise prediction methodology based on machine learning techniques has been presented. The road traffic parameters that have been considered are traffic volume, percentage of heavy vehicles, honking occurrences and the equivalent continuous sound pressure level (Leq). A method to include the honking effect in the traffic noise prediction has been illustrated. The techniques that have been used for the prediction of traffic noise are decision trees, random forests, generalized linear models and artificial neural networks. The results obtained by using these methods have been compared on the basis of mean square error, correlation coefficient, coefficient of determination and accuracy. It has been observed that honking is an important parameter and contributes to the overall traffic noise, especially in congested Indian road traffic conditions. The effects of honking noise on human health cannot be ignored, and honking should be included as a parameter in future traffic noise prediction models.
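The prediction setup above — Leq regressed on traffic volume, percentage of heavy vehicles, and honking occurrences — can be sketched with a plain least-squares fit. The paper compares decision trees, random forests, generalized linear models and neural networks; ordinary least squares stands in here only to keep the example dependency-free, and all numbers (coefficients, ranges, noise level) are synthetic assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 200
volume = rng.uniform(100, 2000, n)     # vehicles per hour (synthetic)
heavy_pct = rng.uniform(0, 30, n)      # % heavy vehicles (synthetic)
honks = rng.poisson(20, n)             # honking occurrences per interval (synthetic)

# Synthetic Leq in dB(A): log-of-volume term plus heavy-vehicle and honking
# contributions, with measurement noise. Coefficients are illustrative only.
leq = (40 + 8 * np.log10(volume) + 0.1 * heavy_pct + 0.05 * honks
       + rng.normal(0, 0.5, n))

# Design matrix with intercept; fit by ordinary least squares.
X = np.column_stack([np.ones(n), np.log10(volume), heavy_pct, honks])
coef, *_ = np.linalg.lstsq(X, leq, rcond=None)
pred = X @ coef

mse = np.mean((leq - pred) ** 2)       # mean square error
r = np.corrcoef(leq, pred)[0, 1]       # correlation coefficient
```

Comparing fits with and without the `honks` column is one simple way to quantify the abstract's point that honking carries predictive information beyond volume and fleet composition.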


2021 ◽  
Vol 11 (2) ◽  
pp. 472
Author(s):  
Hyeongmin Cho ◽  
Sangkyun Lee

Machine learning has been proven to be effective in various application areas, such as object and speech recognition on mobile systems. Since a critical key to machine learning success is the availability of large training data, many datasets are being disclosed and published online. From a data consumer or manager point of view, measuring data quality is an important first step in the learning process. We need to determine which datasets to use, update, and maintain. However, not many practical ways to measure data quality are available today, especially when it comes to large-scale high-dimensional data, such as images and videos. This paper proposes two data quality measures that can compute class separability and in-class variability, the two important aspects of data quality, for a given dataset. Classical data quality measures tend to focus only on class separability; however, we suggest that in-class variability is another important data quality factor. We provide efficient algorithms to compute our quality measures based on random projections and bootstrapping with statistical benefits on large-scale high-dimensional data. In experiments, we show that our measures are compatible with classical measures on small-scale data and can be computed much more efficiently on large-scale high-dimensional datasets.
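The two quality aspects named above — class separability and in-class variability — can be illustrated in a deliberately simplified form, with a random projection standing in for the paper's scalability trick on high-dimensional data. The estimators below (between-class distance over mean within-class spread) are illustrative stand-ins, not the paper's exact measures, and the bootstrapping step is omitted.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two synthetic classes in a 512-dimensional space; class 1 has a shifted mean.
d_high, d_low = 512, 16
X0 = rng.normal(0.0, 1.0, (300, d_high))
X1 = rng.normal(1.0, 1.0, (300, d_high))

# Random projection: cheap dimensionality reduction before measuring quality,
# which is what makes this kind of measure tractable on large-scale data.
P = rng.normal(size=(d_high, d_low)) / np.sqrt(d_low)
Z0, Z1 = X0 @ P, X1 @ P

def in_class_variability(Z):
    """Mean distance of points from their class centroid."""
    return np.mean(np.linalg.norm(Z - Z.mean(axis=0), axis=1))

def separability(Za, Zb):
    """Between-class centroid distance relative to mean within-class spread."""
    between = np.linalg.norm(Za.mean(axis=0) - Zb.mean(axis=0))
    within = 0.5 * (in_class_variability(Za) + in_class_variability(Zb))
    return between / within
```

Two halves of the same class should score far lower on separability than the two genuinely different classes, while in-class variability captures how spread out each class is on its own — the second quality factor the abstract argues classical measures neglect.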

