Distributed learning: Developing a predictive model based on data from multiple hospitals without data leaving the hospital – A real life proof of concept

Abstract: The choice of the most appropriate unsupervised machine-learning method for “heterogeneous” or “mixed” data, i.e. data with both continuous and categorical variables, can be challenging. Our aim was to examine the performance of various clustering strategies for mixed data using both simulated and real-life data. We conducted a benchmark analysis of “ready-to-use” tools in R comparing 4 model-based (Kamila algorithm, Latent Class Analysis, Latent Class Model [LCM] and Clustering by Mixture Modeling) and 5 distance/dissimilarity-based (Gower distance or Unsupervised Extra Trees dissimilarity followed by hierarchical clustering or Partitioning Around Medoids, K-prototypes) clustering methods. Clustering performance was assessed by the Adjusted Rand Index (ARI) on 1000 generated virtual populations consisting of mixed variables, using 7 scenarios with varying population sizes, numbers of clusters, numbers of continuous and categorical variables, proportions of relevant (non-noisy) variables and degrees of variable relevance (low, mild, high). The clustering methods were then applied to the EPHESUS randomized clinical trial data (a heart failure trial evaluating the effect of eplerenone) to illustrate the differences between clustering techniques. The simulations revealed the dominance of K-prototypes, Kamila and LCM over all other methods. Overall, methods using dissimilarity matrices in classical algorithms such as Partitioning Around Medoids and Hierarchical Clustering had a lower ARI than model-based methods in all scenarios. When the clustering methods were applied to a real-life clinical dataset, LCM showed promising results with regard to differences in (1) clinical profiles across clusters, (2) prognostic performance (highest C-index) and (3) identification of patient subgroups with substantial treatment benefit.
The present findings suggest key differences in clustering performance between the tested algorithms (limited to tools readily available in R). In most of the tested scenarios, model-based methods (in particular the Kamila and LCM packages) and K-prototypes typically performed best in the setting of heterogeneous data.
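The Adjusted Rand Index used above to score clustering performance can be illustrated with a minimal sketch. The benchmark itself was run with R tools; the Python function below is only an illustration of the standard pair-counting ARI formula, and the name `adjusted_rand_index` and the toy label vectors are assumptions introduced here, not taken from the study.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting Adjusted Rand Index between two partitions.

    Illustrative helper (hypothetical name); labels can be any hashable values.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare; treat as perfect agreement

    # Contingency table cells and its row/column totals.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of point pairs grouped together in both, in truth, in prediction.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)


# Toy example: same grouping, different label names.
same_grouping = adjusted_rand_index([0, 0, 1, 1], ["b", "b", "a", "a"])
# Toy example: every true cluster is split evenly across predicted clusters.
crossed_grouping = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Because the index is corrected for chance, a relabelled but identical partition scores 1, while a partition that agrees less than expected by chance can score below 0.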


A Probabilistic Model for Real-Time Semantic Prediction of Human Motion Intentions from RGBD-Data

Sensors ◽ 10.3390/s21124141 ◽ 2021 ◽ Vol 21 (12) ◽ pp. 4141

Author(s): Wouter Houtman ◽ Gosse Bijlenga ◽ Elena Torta ◽ René van de Molengraft

Keyword(s): Real Time ◽ Collision Avoidance ◽ Probabilistic Model ◽ Real Life ◽ Human Motion ◽ Model Based ◽ Multiple Hypotheses ◽ Navigation Algorithms ◽ The Right ◽ Motion Behavior

For robots to execute their navigation tasks both quickly and safely in the presence of humans, it is necessary to make predictions about the route those humans intend to follow. Within this work, a model-based method is proposed that relates human motion behavior perceived from RGBD input to the constraints imposed by the environment by considering typical human routing alternatives. Multiple hypotheses about routing options of a human towards local semantic goal locations are created and validated, including explicit collision avoidance routes. It is demonstrated, with real-time, real-life experiments, that a coarse discretization based on the semantics of the environment suffices to make a proper distinction between a person going, for example, to the left or the right at an intersection. As such, a scalable and explainable solution is presented, which is suitable for incorporation within navigation algorithms.


The Application of Local Mean Decomposition and Variable Predictive Model-Based Class Discriminate in Gear Fault Diagnosis

Advanced Materials Research ◽ 10.4028/www.scientific.net/amr.1014.510 ◽ 2014 ◽ Vol 1014 ◽ pp. 510-515 ◽ Cited By ~ 1

Author(s): You Cai Xu ◽ Xin Shi Li ◽ Ran Tao ◽ Shu Guo ◽ Min Gou ◽ ...

Keyword(s): Fault Diagnosis ◽ Predictive Model ◽ Feature Vector ◽ Local Mean Decomposition ◽ Outer Race ◽ Model Based ◽ Gear Fault ◽ Local Mean ◽ Gear Fault Diagnosis ◽ Non Stationary Signal

The vibration signals of different gear faults carry different time-domain energy information, so a method based on local mean decomposition (LMD) and variable predictive model-based class discriminate (VPMCD) is proposed to diagnose gear faults. The gear vibration signal studied in this paper is first decomposed into a series of product functions (PFs) by the LMD method. The PF components that contain the main fault information of the gear are then selected, and their energy feature parameters are used to form a fault feature vector. VPMCD, a new multivariate classification approach for pattern recognition, is applied to make full use of this fault feature vector. Finally, the gear condition is classified as normal state, inner race fault or outer race fault. The results show that the LMD method can decompose a complex non-stationary signal into a number of PF components ordered from high to low frequency, and that the method based on LMD and VPMCD shows a strong ability to recognize faults by analyzing the fault feature vector of the PFs.
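The step of turning selected PF components into an energy-based fault feature vector can be sketched as follows. This is only an illustration under stated assumptions: the LMD decomposition itself is not implemented, the PF components are hypothetical toy sequences, and normalizing each energy by the total is one common choice that the abstract does not confirm. The function name `energy_feature_vector` is likewise introduced here for illustration.

```python
def energy_feature_vector(components, normalize=True):
    """Build an energy feature vector from already-decomposed components.

    components: list of equal-role signal components (e.g. selected PFs),
                each given as a sequence of samples.
    normalize:  assumption made here; divides each energy by the total so
                that the features sum to 1.
    """
    # Energy of each component = sum of squared samples.
    energies = [sum(x * x for x in pf) for pf in components]
    if not normalize:
        return energies
    total = sum(energies)
    if total == 0:
        return [0.0 for _ in energies]  # all-zero input: avoid division by zero
    return [e / total for e in energies]


# Hypothetical toy PF components standing in for real LMD output.
toy_pfs = [
    [1.0, -1.0, 1.0, -1.0],  # higher-amplitude component
    [0.5, 0.5, 0.5, 0.5],    # lower-amplitude component
]
raw_energies = energy_feature_vector(toy_pfs, normalize=False)
features = energy_feature_vector(toy_pfs)
```

A vector of this kind would then be passed to a classifier such as VPMCD; that stage is outside the scope of this sketch.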
