scholarly journals Explanation and Prediction of Clinical Data with Imbalanced Class Distribution based on Pattern Discovery and Disentanglement

2020 ◽  
Author(s):  
Peiyuan Zhou ◽  
Andrew K.C. Wong

Abstract Background Statistical data analysis, especially the advanced machine learning (ML) methods, have attracted considerable interest and application in clinical practices. First, the interpretability of the diagnostic/prognostic results will bring confidence to doctors, patients and their relatives in therapeutics and clinical practice. Furthermore, from the clinical aspect, when the datasets are imbalanced in diagnostic categories, the ordinary ML methods might produce results overwhelmed by the majority classes diminishing prediction accuracy. Hence, it is desirable to have a method that could produce explicit transparent and interpretable results in decision-making, even for data with imbalanced groups.Methods In order to interpret the clinical patterns and conduct diagnostic prediction of patients, we present our new method, Pattern Discovery and Disentanglement for Clinical Data Analysis (cPDD), which is able to discover patterns (correlated traits/indicants) and use them to classify clinical data even if the class distribution is imbalanced. In the most general setting, a relational dataset is a large table such that each column represents an attribute (trait/indicant), each row contains a set of attribute values (AVs) of an entity (patient). Compared to the existing pattern discovery approaches, cPDD can discover a small and succinct set of statistically significant high-order patterns from clinical data for interpreting and predicting the disease class of the patients even for small and rare groups.Results Experiments on synthetic and thoracic clinical dataset showed that cPDD can 1) discover fewer patterns compared to other existing pattern discovery methods; 2) allow the users to interpret succinct sets of patterns coming from uncorrelated sources, even the groups are rare/small; and 3) obtain better performance in prediction compared to other interpretable classification approaches.Conclusions In conclusion, cPDD discovers fewer patterns with greater comprehensive coverage to improve the interpretability of patterns discovered. Experimental results on synthetic data validated that cPDD discover all patterns implanted in the data, display them precisely and succinctly with statistical support for interpretation and prediction, a capability which the traditional ML methods lack. The success of cPDD as a novel explainable method in solving the imbalanced class problem shows its great potential to clinical data analysis for years to come.

2020 ◽  
Author(s):  
Peiyuan Zhou ◽  
Andrew K.C. Wong

Abstract Background: Statistical data analysis, especially the advanced machine learning (ML) methods, have attracted considerable interest in clinical practices. We are looking for interpretability of the diagnostic/prognostic results that will bring confidence to doctors, patients and their relatives in therapeutics and clinical practice. When datasets are imbalanced in diagnostic categories, we notice that the ordinary ML methods might produce results overwhelmed by the majority classes diminishing prediction accuracy. Hence, we need methods that could produce explicit transparent and interpretable results in decision-making, without sacrificing accuracy, even for data with imbalanced groups. Methods: In order to interpret the clinical patterns and conduct diagnostic prediction of patients with high accuracy, we develop a novel method, Pattern Discovery and Disentanglement for Clinical Data Analysis (cPDD), which is able to discover patterns (correlated traits/indicants) and use them to classify clinical data even if the class distribution is imbalanced. In the most general setting, a relational dataset is a large table such that each column represents an attribute (trait/indicant), and each row contains a set of attribute values (AVs) of an entity (patient). Compared to the existing pattern discovery approaches, cPDD can discover a small succinct set of statistically significant high-order patterns from clinical data for interpreting and predicting the disease class of the patients even with groups small and rare.Results: Experiments on synthetic and thoracic clinical dataset showed that cPDD can 1) discover fewer patterns compared to other existing pattern discovery methods; 2) allow the users to interpret succinct sets of patterns coming from uncorrelated sources, even the groups are rare/small; and 3) obtain better performance in prediction compared to other interpretable classification approaches. Conclusions: In conclusion, cPDD discovers fewer patterns with greater comprehensive coverage to improve the interpretability of patterns discovered. Experimental results on synthetic data validated that cPDD discovers all patterns implanted in the data, displays them precisely and succinctly with statistical support for interpretation and prediction, a capability which the traditional ML methods lack. The success of cPDD as a novel interpretable method in solving the imbalanced class problem shows its great potential to clinical data analysis for years to come.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Pei-Yuan Zhou ◽  
Andrew K. C. Wong

Abstract Background Statistical data analysis, especially the advanced machine learning (ML) methods, have attracted considerable interest in clinical practices. We are looking for interpretability of the diagnostic/prognostic results that will bring confidence to doctors, patients and their relatives in therapeutics and clinical practice. When datasets are imbalanced in diagnostic categories, we notice that the ordinary ML methods might produce results overwhelmed by the majority classes diminishing prediction accuracy. Hence, it needs methods that could produce explicit transparent and interpretable results in decision-making, without sacrificing accuracy, even for data with imbalanced groups. Methods In order to interpret the clinical patterns and conduct diagnostic prediction of patients with high accuracy, we develop a novel method, Pattern Discovery and Disentanglement for Clinical Data Analysis (cPDD), which is able to discover patterns (correlated traits/indicants) and use them to classify clinical data even if the class distribution is imbalanced. In the most general setting, a relational dataset is a large table such that each column represents an attribute (trait/indicant), and each row contains a set of attribute values (AVs) of an entity (patient). Compared to the existing pattern discovery approaches, cPDD can discover a small succinct set of statistically significant high-order patterns from clinical data for interpreting and predicting the disease class of the patients even with groups small and rare. Results Experiments on synthetic and thoracic clinical dataset showed that cPDD can 1) discover a smaller set of succinct significant patterns compared to other existing pattern discovery methods; 2) allow the users to interpret succinct sets of patterns coming from uncorrelated sources, even the groups are rare/small; and 3) obtain better performance in prediction compared to other interpretable classification approaches. Conclusions In conclusion, cPDD discovers fewer patterns with greater comprehensive coverage to improve the interpretability of patterns discovered. Experimental results on synthetic data validated that cPDD discovers all patterns implanted in the data, displays them precisely and succinctly with statistical support for interpretation and prediction, a capability which the traditional ML methods lack. The success of cPDD as a novel interpretable method in solving the imbalanced class problem shows its great potential to clinical data analysis for years to come.


2020 ◽  
Author(s):  
Peiyuan Zhou ◽  
Andrew K.C. Wong

Abstract Background: Statistical data analysis, especially the advanced machine learning (ML) methods, have attracted considerable interest in clinical practices. We are looking for interpretability of the diagnostic/prognostic results that will bring confidence to doctors, patients and their relatives in therapeutics and clinical practice. When datasets are imbalanced in diagnostic categories, we notice that the ordinary ML methods might produce results overwhelmed by the majority classes diminishing prediction accuracy. Hence, we need methods that could produce explicit transparent and interpretable results in decision-making, without sacrificing accuracy, even for data with imbalanced groups. Methods: In order to interpret the clinical patterns and conduct diagnostic prediction of patients with high accuracy, we develop a novel method, Pattern Discovery and Disentanglement for Clinical Data Analysis (cPDD), which is able to discover patterns (correlated traits/indicants) and use them to classify clinical data even if the class distribution is imbalanced. In the most general setting, a relational dataset is a large table such that each column represents an attribute (trait/indicant), and each row contains a set of attribute values (AVs) of an entity (patient). Compared to the existing pattern discovery approaches, cPDD can discover a small succinct set of statistically significant high-order patterns from clinical data for interpreting and predicting the disease class of the patients even with groups small and rare.Results: Experiments on synthetic and thoracic clinical dataset showed that cPDD can 1) discover fewer patterns compared to other existing pattern discovery methods; 2) allow the users to interpret succinct sets of patterns coming from uncorrelated sources, even the groups are rare/small; and 3) obtain better performance in prediction compared to other interpretable classification approaches. Conclusions: In conclusion, cPDD discovers fewer patterns with greater comprehensive coverage to improve the interpretability of patterns discovered. Experimental results on synthetic data validated that cPDD discovers all patterns implanted in the data, displays them precisely and succinctly with statistical support for interpretation and prediction, a capability which the traditional ML methods lack. The success of cPDD as a novel interpretable method in solving the imbalanced class problem shows its great potential to clinical data analysis for years to come.


2020 ◽  
Author(s):  
Peiyuan Zhou ◽  
Andrew K.C. Wong

Abstract Background: Statistical data analysis, especially the advanced machine learning (ML) methods, have attracted considerable interest in clinical practices. We are looking for interpretability of the diagnostic/prognostic results that will bring confidence to doctors, patients and their relatives in therapeutics and clinical practice. When datasets are imbalanced in diagnostic categories, we notice that the ordinary ML methods might produce results overwhelmed by the majority classes diminishing prediction accuracy. Hence, it needs methods that could produce explicit transparent and interpretable results in decision-making, without sacrificing accuracy, even for data with imbalanced groups. Methods: In order to interpret the clinical patterns and conduct diagnostic prediction of patients with high accuracy, we develop a novel method, Pattern Discovery and Disentanglement for Clinical Data Analysis (cPDD), which is able to discover patterns (correlated traits/indicants) and use them to classify clinical data even if the class distribution is imbalanced. In the most general setting, a relational dataset is a large table such that each column represents an attribute (trait/indicant), and each row contains a set of attribute values (AVs) of an entity (patient). Compared to the existing pattern discovery approaches, cPDD can discover a small succinct set of statistically significant high-order patterns from clinical data for interpreting and predicting the disease class of the patients even with groups small and rare.Results: Experiments on synthetic and thoracic clinical dataset showed that cPDD can 1) discover a smaller set of succinct significant patterns compared to other existing pattern discovery methods; 2) allow the users to interpret succinct sets of patterns coming from uncorrelated sources, even the groups are rare/small; and 3) obtain better performance in prediction compared to other interpretable classification approaches. Conclusions: In conclusion, cPDD discovers fewer patterns with greater comprehensive coverage to improve the interpretability of patterns discovered. Experimental results on synthetic data validated that cPDD discovers all patterns implanted in the data, displays them precisely and succinctly with statistical support for interpretation and prediction, a capability which the traditional ML methods lack. The success of cPDD as a novel interpretable method in solving the imbalanced class problem shows its great potential to clinical data analysis for years to come.


1993 ◽  
Vol 32 (05) ◽  
pp. 365-372 ◽  
Author(s):  
T. Timmeis ◽  
J. H. van Bemmel ◽  
E. M. van Mulligen

AbstractResults are presented of the user evaluation of an integrated medical workstation for support of clinical research. Twenty-seven users were recruited from medical and scientific staff of the University Hospital Dijkzigt, the Faculty of Medicine of the Erasmus University Rotterdam, and from other Dutch medical institutions; and all were given a written, self-contained tutorial. Subsequently, an experiment was done in which six clinical data analysis problems had to be solved and an evaluation form was filled out. The aim of this user evaluation was to obtain insight in the benefits of integration for support of clinical data analysis for clinicians and biomedical researchers. The problems were divided into two sets, with gradually more complex problems. In the first set users were guided in a stepwise fashion to solve the problems. In the second set each stepwise problem had an open counterpart. During the evaluation, the workstation continuously recorded the user’s actions. From these results significant differences became apparent between clinicians and non-clinicians for the correctness (means 54% and 81%, respectively, p = 0.04), completeness (means 64% and 88%, respectively, p = 0.01), and number of problems solved (means 67% and 90%, respectively, p = 0.02). These differences were absent for the stepwise problems. Physicians tend to skip more problems than biomedical researchers. No statistically significant differences were found between users with and without clinical data analysis experience, for correctness (means 74% and 72%, respectively, p = 0.95), and completeness (means 82% and 79%, respectively, p = 0.40). It appeared that various clinical research problems can be solved easily with support of the workstation; the results of this experiment can be used as guidance for the development of the successor of this prototype workstation and serve as a reference for the assessment of next versions.


2020 ◽  
Vol 3 (1) ◽  
pp. 55-61 ◽  
Author(s):  
Wenlong Zhang ◽  
Yong Han ◽  
Weisha Li ◽  
Lin Cao ◽  
Libo Yan ◽  
...  

Sign in / Sign up

Export Citation Format

Share Document