Marrying Medical Domain Knowledge With Deep Learning on Electronic Health Records: A Deep Visual Analytics Approach (Preprint)

2020 ◽  
Author(s):  
Rui Li ◽  
Changchang Yin ◽  
Samuel Yang ◽  
Buyue Qian ◽  
Ping Zhang

BACKGROUND Deep learning models have attracted significant interest from health care researchers during the last few decades. There have been many studies that apply deep learning to medical applications and achieve promising results. However, there are three limitations to the existing models: (1) most clinicians are unable to interpret the results from the existing models, (2) existing models cannot incorporate complicated medical domain knowledge (eg, a disease causes another disease), and (3) most existing models lack visual exploration and interaction. Both the electronic health record (EHR) data set and the deep model results are complex and abstract, which impedes clinicians from exploring and communicating with the model directly. OBJECTIVE The objective of this study is to develop an interpretable and accurate risk prediction model as well as an interactive clinical prediction system to support EHR data exploration, knowledge graph demonstration, and model interpretation. METHODS A domain-knowledge–guided recurrent neural network (DG-RNN) model is proposed to predict clinical risks. The model takes medical event sequences as input and incorporates medical domain knowledge by attending to a subgraph of the whole medical knowledge graph. A global pooling operation and a fully connected layer are used to output the clinical outcomes. The middle results and the parameters of the fully connected layer are helpful in identifying which medical events cause clinical risks. DG-Viz is also designed to support EHR data exploration, knowledge graph demonstration, and model interpretation. RESULTS We conducted both risk prediction experiments and a case study on a real-world data set. A total of 554 patients with heart failure and 1662 control patients without heart failure were selected from the data set. The experimental results show that the proposed DG-RNN outperforms the state-of-the-art approaches by approximately 1.5%. 
The case study demonstrates how our medical physician collaborator can effectively explore the data and interpret the prediction results using DG-Viz. CONCLUSIONS In this study, we present DG-Viz, an interactive clinical prediction system, which brings together the power of deep learning (ie, a DG-RNN–based model) and visual analytics to predict clinical risks and visually interpret the EHR prediction results. Experimental results and a case study on heart failure risk prediction tasks demonstrate the effectiveness and usefulness of the DG-Viz system. This study will pave the way for interactive, interpretable, and accurate clinical risk predictions.
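The Methods description above (attention over a knowledge-graph subgraph, then global pooling and a fully connected layer whose parameters help attribute risk to events) can be illustrated with a heavily simplified sketch. This is not the authors' DG-RNN: the recurrent update is omitted, and the embeddings, the toy graph, the weights, and every function name are hypothetical values chosen only to make the data flow visible.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(event_vec, neighbor_vecs):
    """Dot-product attention of one event over its knowledge-graph neighbours."""
    weights = softmax([dot(event_vec, n) for n in neighbor_vecs])
    context = [sum(w * n[d] for w, n in zip(weights, neighbor_vecs))
               for d in range(len(event_vec))]
    return context, weights

def predict_risk(events, embeddings, graph, fc_weights, fc_bias):
    # Enrich each event with its attended graph context (element-wise sum).
    enriched = []
    for e in events:
        vec = embeddings[e]
        neighbors = [embeddings[n] for n in graph.get(e, [])]
        if neighbors:
            context, _ = attend(vec, neighbors)
            vec = [v + c for v, c in zip(vec, context)]
        enriched.append(vec)
    # Global max pooling over the sequence, remembering which event
    # supplied the maximum in each dimension.
    pooled, winners = [], []
    for d in range(len(fc_weights)):
        best = max(range(len(events)), key=lambda i: enriched[i][d])
        pooled.append(enriched[best][d])
        winners.append(events[best])
    # Fully connected layer followed by a sigmoid.
    logit = dot(pooled, fc_weights) + fc_bias
    risk = 1.0 / (1.0 + math.exp(-logit))
    # Each event's contribution: weight * pooled value, summed over the
    # dimensions that event won in the pooling step.
    contributions = {}
    for d, event in enumerate(winners):
        contributions[event] = contributions.get(event, 0.0) + fc_weights[d] * pooled[d]
    return risk, contributions

# Hypothetical toy inputs (not taken from the paper).
embeddings = {
    "checkup": [0.1, 0.1],
    "hypertension": [1.0, 0.0],
    "diabetes": [0.0, 1.0],
    "heart_failure": [1.0, 1.0],
}
graph = {"hypertension": ["heart_failure"], "diabetes": ["heart_failure"]}
events = ["checkup", "hypertension", "diabetes"]
fc_weights, fc_bias = [0.5, 0.25], -1.0

risk, contributions = predict_risk(events, embeddings, graph, fc_weights, fc_bias)
print(risk, contributions)
```

Because max pooling selects exactly one event per dimension, the logit decomposes into per-event terms plus the bias, which is one plausible reading of how the pooled values and layer parameters could point to the events driving a prediction.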

10.2196/20645 ◽  
2020 ◽  
Vol 22 (9) ◽  
pp. e20645
Author(s):  
Rui Li ◽  
Changchang Yin ◽  
Samuel Yang ◽  
Buyue Qian ◽  
Ping Zhang



Author(s):  
Lokukaluge P. Perera ◽  
Brage Mo

Ocean internet of things (IoT - onboard and onshore) collects big data sets of ship performance and navigation information under various data handling processes. These processes extract vessel performance and navigation information that is used for ship energy efficiency and emission control applications. However, the quality of ship performance and navigation data can play an important role in such applications, where sensor faults may introduce various erroneous data regions that may degrade the outcome. This study proposes visual analytics, where hidden data patterns, clusters, correlations and other useful information are visually extracted from the respective data set, to identify such erroneous data regions. The domain knowledge (i.e. ship performance and navigation conditions) has also been used to interpret such erroneous data regions and identify the respective sensors that relate to the same situations. Finally, a ship performance and navigation data set of a selected vessel is analyzed to identify erroneous data regions for three selected sensor fault situations (i.e. wind, log speed and draft sensors) under the proposed visual analytics. Hence, this approach can be categorized as a sensor-specific fault detection methodology by considering the same results.


Machine Learning is formulated by combining different mathematical models, Artificial Intelligence approaches and previously recorded data sets. Machine Learning uses different learning algorithms for different types of data and has been classified into three types. The advantage of this learning is that it uses an Artificial Neural Network and, based on the error rates, adjusts the weights to improve itself in further epochs. However, Machine Learning works well only when the features are defined accurately. Deciding which features to select requires good domain knowledge, which makes Machine Learning dependent on the developer. A lack of domain knowledge degrades performance. This dependency inspired the invention of Deep Learning. Deep Learning can detect features through self-training models and is able to give better results compared to using Artificial Intelligence or Machine Learning. It uses different functions like ReLU, Gradient Descent and Optimizers, which makes it the best thing available so far. To apply such optimizers efficiently, one should know the mathematical computations and convolutions running behind the layers. It also uses different pooling layers to get the features. But these modern approaches need a high level of computation, which requires CPUs and GPUs. If such computational hardware is not available, one can use the Google Colaboratory framework. The Deep Learning approach is proven to improve skin cancer detection, as demonstrated in this paper. The paper also aims to give the reader contextual knowledge of the various practices mentioned above.


2021 ◽  
Author(s):  
Zhisheng Yang ◽  
Jinyong Cheng

Abstract In recommendation algorithms, data sparsity and cold start problems are always inevitable. To address such problems, researchers apply auxiliary information to recommendation algorithms, mining more potential information from users' historical records to improve recommendation performance. This paper proposes a model, ST_RippleNet, which combines a knowledge graph with deep learning. In this model, users' potential interests are mined in the knowledge graph to stimulate the propagation of users' preferences on the set of knowledge entities. In the propagation of preferences, we adopt a triple-based multi-layer attention mechanism, and the distribution of users' preferences for candidate items formed by users' historical click information is used to predict the final click probability. In the ST_RippleNet model, a music data set is added to the original movie and book data sets, and an improved loss function is applied to the model, which is optimized by the RMSProp optimizer. Finally, a tanh function is added to predict the click probability to improve recommendation performance. Compared with the current mainstream recommendation methods, the ST_RippleNet recommendation algorithm performs very well in AUC and ACC, with substantial improvement in movie, book and music recommendation.


2020 ◽  
Vol 197 ◽  
pp. 105765
Author(s):  
Branimir Ljubic ◽  
Shoumik Roychoudhury ◽  
Xi Hang Cao ◽  
Martin Pavlovski ◽  
Stefan Obradovic ◽  
...  

2009 ◽  
Vol 18 (01) ◽  
pp. 81-98 ◽  
Author(s):  
MARTIN ATZMUELLER ◽  
FRANK PUPPE ◽  
HANS-PETER BUSCHER

This paper presents a semi-automatic approach for confounding-aware subgroup discovery: Confounding essentially disturbs the measured effect of an association between variables due to the influence of other parameters that were not considered. The proposed method is embedded into a general subgroup discovery approach, and provides the means for detecting potentially confounded subgroup patterns, other unconfounded relations, and/or patterns that are affected by effect-modification. Since there is no purely automatic test for confounding, the discovered relations are presented to the user in a semi-automatic approach. Furthermore, we utilize (causal) domain knowledge for improving the results of the algorithm, since confounding is itself a causal concept. The applicability and benefit of the presented technique are illustrated by real-world examples from a case-study in the medical domain.


Semantic Web ◽  
2021 ◽  
pp. 1-20
Author(s):  
Pierre Monnin ◽  
Chedy Raïssi ◽  
Amedeo Napoli ◽  
Adrien Coulet

Knowledge graphs are freely aggregated, published, and edited in the Web of data, and thus may overlap. Hence, a key task resides in aligning (or matching) their content. This task encompasses the identification, within an aggregated knowledge graph, of nodes that are equivalent, more specific, or weakly related. In this article, we propose to match nodes within a knowledge graph by (i) learning node embeddings with Graph Convolutional Networks such that similar nodes have low distances in the embedding space, and (ii) clustering nodes based on their embeddings, in order to suggest alignment relations between nodes of the same cluster. We conducted experiments with this approach on the real-world application of aligning knowledge in the field of pharmacogenomics, which motivated our study. We particularly investigated the interplay between domain knowledge and GCN models with the two following focuses. First, we applied inference rules associated with domain knowledge, independently or combined, before learning node embeddings, and we measured the improvements in matching results. Second, while our GCN model is agnostic to the exact alignment relations (e.g., equivalence, weak similarity), we observed that distances in the embedding space are coherent with the “strength” of these different relations (e.g., smaller distances for equivalences), letting us consider clustering and distances in the embedding space as a means to suggest alignment relations in our case study.
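Step (ii) of the abstract above, clustering nodes by embedding distance and suggesting alignments within a cluster, can be sketched as follows. The embeddings are taken as given rather than learned with a GCN, and the clustering rule (connected components under a distance threshold), the node names, and the threshold value are all assumptions made for illustration; the abstract does not specify the clustering algorithm.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_by_threshold(embeddings, threshold):
    """Group nodes into connected components of the 'distance <= threshold' graph."""
    parent = {n: n for n in embeddings}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in combinations(embeddings, 2):
        if euclidean(embeddings[a], embeddings[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for n in embeddings:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(c) for c in clusters.values())

def suggest_alignments(embeddings, threshold):
    """Candidate node pairs within each cluster, closest pairs first."""
    pairs = []
    for cluster in cluster_by_threshold(embeddings, threshold):
        for a, b in combinations(cluster, 2):
            pairs.append((a, b, euclidean(embeddings[a], embeddings[b])))
    return sorted(pairs, key=lambda p: p[2])

# Hypothetical 2-D embeddings; real GCN embeddings would be higher-dimensional.
node_embeddings = {
    "drug:A": [0.0, 0.0],
    "drug:A_alias": [0.1, 0.0],
    "drug:A_salt": [0.4, 0.0],
    "gene:X": [5.0, 5.0],
    "gene:X_alias": [5.0, 5.2],
}

clusters = cluster_by_threshold(node_embeddings, threshold=0.5)
suggestions = suggest_alignments(node_embeddings, threshold=0.5)
for a, b, dist in suggestions:
    print(f"{a} ~ {b} (distance {dist:.2f})")
```

Sorting the candidate pairs by distance mirrors the abstract's observation that smaller distances tend to correspond to stronger relations such as equivalence.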


2021 ◽  
Author(s):  
Jose Luis Guadiana Alvare ◽  
Ruben Morales-Menendez ◽  
Fida Hussain ◽  
Etna Rojas Flores ◽  
Arturo García Zendejas ◽  
...  

Abstract Background: Prognostics study the prediction of an event before it happens, to enable more efficient critical decision making. Prognostics are very useful for front-line physicians to predict how a disease may affect a patient and react accordingly to save the patient's life. The coronavirus (COVID-19) is novel, and there is not enough knowledge about the virus' behaviour and key performance indicators (KPIs) to assess the mortality risk prediction. However, using many complex and expensive medical biomarkers could be impossible for many low-budget hospitals. This motivates the development of a prediction model that not only maximizes performance but does so using the fewest biomarkers possible. Methods: For the mortality risk prediction, this research work proposes a COVID-19 mortality risk calculator based on a Deep Learning (DL) model and on a data set provided by the HM Hospitals from Madrid, Spain. A pre-processing strategy for unbalanced classes and feature selection is proposed. Results: The DL model is tested, and the results achieved include area under the curve (AUC) 0.93, F2 score 0.93, recall 1.00, accuracy 0.95, precision 0.91, specificity 0.9279 and maximum probability of correct decision (MPCD) 0.93. Conclusion: The MPCD score shows that the proposed DL outperforms on the everyday set when evaluating even with an over-sampling technique. The benefits of imputing unavailable biomarker data are also evaluated. The results are compared against a random forest (RF) algorithm and the newly proposed methods. The results show that the proposed method is significantly better for the risk prediction of patients with COVID-19.
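For readers unfamiliar with the threshold-based metrics listed in the Results, the sketch below shows how precision, recall, specificity, accuracy and the F2 score follow from a confusion matrix, using the standard textbook definitions. The counts are hypothetical and are not the study's data; AUC and MPCD are left out because they cannot be computed from a single confusion matrix or are not defined in the abstract.

```python
def classification_metrics(tp, fp, tn, fn, beta=2.0):
    """Standard confusion-matrix metrics; assumes every denominator is non-zero."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called sensitivity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # F-beta weights recall beta times as heavily as precision;
    # beta = 2 gives the F2 score.
    b2 = beta * beta
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_beta": f_beta,
    }

# Hypothetical counts chosen for easy arithmetic.
metrics = classification_metrics(tp=40, fp=10, tn=90, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The F2 score suits a mortality-risk setting where a missed high-risk patient (a false negative) is costlier than a false alarm, since it rewards recall more than precision.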

