scholarly journals Gaussian noise up-sampling is better suited than SMOTE and ADASYN for clinical decision making

2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Jacqueline Beinecke ◽  
Dominik Heider

AbstractClinical data sets have very special properties and suffer from many caveats in machine learning. They typically show a high-class imbalance, have a small number of samples and a large number of parameters, and have missing values. While feature selection approaches and imputation techniques address the former problems, the class imbalance is typically addressed using augmentation techniques. However, these techniques have been developed for big data analytics, and their suitability for clinical data sets is unclear.This study analyzed different augmentation techniques for use in clinical data sets and subsequent employment of machine learning-based classification. It turns out that Gaussian Noise Up-Sampling (GNUS) is not always but generally, is as good as SMOTE and ADASYN and even outperform those on some datasets. However, it has also been shown that augmentation does not improve classification at all in some cases.

2021 ◽  
Author(s):  
Daria Kurz ◽  
Carlos Salort S&aacutenchez ◽  
Cristian Axenie

For decades, researchers have used the concepts of rate of change and differential equations to model and forecast neoplastic processes. This expressive mathematical apparatus brought significant insights in oncology by describing the unregulated proliferation and host interactions of cancer cells, as well as their response to treatments. Now, these theories have been given a new life and found new applications. With the advent of routine cancer genome sequencing and the resulting abundance of data, oncology now builds an "arsenal" of new modeling and analysis tools. Models describing the governing physical laws of tumor-host-drug interactions can be now challenged with biological data to make predictions about cancer progression. Our study joins the efforts of the mathematical and computational oncology community by introducing a novel machine learning system for data-driven discovery of mathematical and physical relations in oncology. The system utilizes computational mechanisms such as competition, cooperation, and adaptation in neural networks to simultaneously learn the statistics and the governing relations between multiple clinical data covariates. Targeting an easy adoption in clinical oncology, the solutions of our system reveal human-understandable properties and features hidden in the data. As our experiments demonstrate, our system can describe nonlinear conservation laws in cancer kinetics and growth curves, symmetries in tumor's phenotypic staging transitions, the pre-operative spatial tumor distribution, and up to the nonlinear intracellular and extracellular pharmacokinetics of neoadjuvant therapies. The primary goal of our work is to enhance or improve the mechanistic understanding of cancer dynamics by exploiting heterogeneous clinical data. We demonstrate through multiple instantiations that our system is extracting an accurate human-understandable representation of the underlying dynamics of physical interactions central to typical oncology problems. Our results and evaluation demonstrate that using simple - yet powerful - computational mechanisms, such a machine learning system can support clinical decision making. To this end, our system is a representative tool of the field of mathematical and computational oncology and offers a bridge between the data, the modeler, the data scientist, and the practising clinician.


2021 ◽  
Vol 4 ◽  
Author(s):  
Daria Kurz ◽  
Carlos Salort Sánchez ◽  
Cristian Axenie

For decades, researchers have used the concepts of rate of change and differential equations to model and forecast neoplastic processes. This expressive mathematical apparatus brought significant insights in oncology by describing the unregulated proliferation and host interactions of cancer cells, as well as their response to treatments. Now, these theories have been given a new life and found new applications. With the advent of routine cancer genome sequencing and the resulting abundance of data, oncology now builds an “arsenal” of new modeling and analysis tools. Models describing the governing physical laws of tumor–host–drug interactions can be now challenged with biological data to make predictions about cancer progression. Our study joins the efforts of the mathematical and computational oncology community by introducing a novel machine learning system for data-driven discovery of mathematical and physical relations in oncology. The system utilizes computational mechanisms such as competition, cooperation, and adaptation in neural networks to simultaneously learn the statistics and the governing relations between multiple clinical data covariates. Targeting an easy adoption in clinical oncology, the solutions of our system reveal human-understandable properties and features hidden in the data. As our experiments demonstrate, our system can describe nonlinear conservation laws in cancer kinetics and growth curves, symmetries in tumor’s phenotypic staging transitions, the preoperative spatial tumor distribution, and up to the nonlinear intracellular and extracellular pharmacokinetics of neoadjuvant therapies. The primary goal of our work is to enhance or improve the mechanistic understanding of cancer dynamics by exploiting heterogeneous clinical data. We demonstrate through multiple instantiations that our system is extracting an accurate human-understandable representation of the underlying dynamics of physical interactions central to typical oncology problems. Our results and evaluation demonstrate that, using simple—yet powerful—computational mechanisms, such a machine learning system can support clinical decision-making. To this end, our system is a representative tool of the field of mathematical and computational oncology and offers a bridge between the data, the modeler, the data scientist, and the practicing clinician.


2022 ◽  
Vol 22 (1) ◽  
Author(s):  
Huimin Wang ◽  
Jianxiang Tang ◽  
Mengyao Wu ◽  
Xiaoyu Wang ◽  
Tao Zhang

Abstract Background There are often many missing values in medical data, which directly affect the accuracy of clinical decision making. Discharge assessment is an important part of clinical decision making. Taking the discharge assessment of patients with spontaneous supratentorial intracerebral hemorrhage as an example, this study adopted the missing data processing evaluation criteria more suitable for clinical decision making, aiming at systematically exploring the performance and applicability of single machine learning algorithms and ensemble learning (EL) under different data missing scenarios, as well as whether they had more advantages than traditional methods, so as to provide basis and reference for the selection of suitable missing data processing method in practical clinical decision making. Methods The whole process consisted of four main steps: (1) Based on the original complete data set, missing data was generated by simulation under different missing scenarios (missing mechanisms, missing proportions and ratios of missing proportions of each group). (2) Machine learning and traditional methods (eight methods in total) were applied to impute missing values. (3) The performances of imputation techniques were evaluated and compared by estimating the sensitivity, AUC and Kappa values of prediction models. (4) Statistical tests were used to evaluate whether the observed performance differences were statistically significant. Results The performances of missing data processing methods were different to a certain extent in different missing scenarios. On the whole, machine learning had better imputation performance than traditional methods, especially in scenarios with high missing proportions. Compared with single machine learning algorithms, the performance of EL was more prominent, followed by neural networks. Meanwhile, EL was most suitable for missing imputation under MAR (the ratio of missing proportion 2:1) mechanism, and its average sensitivity, AUC and Kappa values reached 0.908, 0.924 and 0.596 respectively. Conclusions In clinical decision making, the characteristics of missing data should be actively explored before formulating missing data processing strategies. The outstanding imputation performance of machine learning methods, especially EL, shed light on the development of missing data processing technology, and provided methodological support for clinical decision making in presence of incomplete data.


Med ◽  
2021 ◽  
Author(s):  
Lorenz Adlung ◽  
Yotam Cohen ◽  
Uria Mor ◽  
Eran Elinav

2021 ◽  
Vol 10 (4) ◽  
pp. 766 ◽  
Author(s):  
Ljiljana Trtica Majnarić ◽  
František Babič ◽  
Shane O’Sullivan ◽  
Andreas Holzinger

Multimorbidity refers to the coexistence of two or more chronic diseases in one person. Therefore, patients with multimorbidity have multiple and special care needs. However, in practice it is difficult to meet these needs because the organizational processes of current healthcare systems tend to be tailored to a single disease. To improve clinical decision making and patient care in multimorbidity, a radical change in the problem-solving approach to medical research and treatment is needed. In addition to the traditional reductionist approach, we propose interactive research supported by artificial intelligence (AI) and advanced big data analytics. Such research approach, when applied to data routinely collected in healthcare settings, provides an integrated platform for research tasks related to multimorbidity. This may include, for example, prediction, correlation, and classification problems based on multiple interaction factors. However, to realize the idea of this paradigm shift in multimorbidity research, the optimization, standardization, and most importantly, the integration of electronic health data into a common national and international research infrastructure is needed. Ultimately, there is a need for the integration and implementation of efficient AI approaches, particularly deep learning, into clinical routine directly within the workflows of the medical professionals.


Author(s):  
E. Amiri Souri ◽  
A. Chenoweth ◽  
A. Cheung ◽  
S. N. Karagiannis ◽  
S. Tsoka

Abstract Background Prognostic stratification of breast cancers remains a challenge to improve clinical decision making. We employ machine learning on breast cancer transcriptomics from multiple studies to link the expression of specific genes to histological grade and classify tumours into a more or less aggressive prognostic type. Materials and methods Microarray data of 5031 untreated breast tumours spanning 33 published datasets and corresponding clinical data were integrated. A machine learning model based on gradient boosted trees was trained on histological grade-1 and grade-3 samples. The resulting predictive model (Cancer Grade Model, CGM) was applied on samples of grade-2 and unknown-grade (3029) for prognostic risk classification. Results A 70-gene signature for assessing clinical risk was identified and was shown to be 90% accurate when tested on known histological-grade samples. The predictive framework was validated through survival analysis and showed robust prognostic performance. CGM was cross-referenced with existing genomic tests and demonstrated the competitive predictive power of tumour risk. Conclusions CGM is able to classify tumours into better-defined prognostic categories without employing information on tumour size, stage, or subgroups. The model offers means to improve prognosis and support the clinical decision and precision treatments, thereby potentially contributing to preventing underdiagnosis of high-risk tumours and minimising over-treatment of low-risk disease.


2021 ◽  
Vol 28 (1) ◽  
pp. e100251
Author(s):  
Ian Scott ◽  
Stacey Carter ◽  
Enrico Coiera

Machine learning algorithms are being used to screen and diagnose disease, prognosticate and predict therapeutic responses. Hundreds of new algorithms are being developed, but whether they improve clinical decision making and patient outcomes remains uncertain. If clinicians are to use algorithms, they need to be reassured that key issues relating to their validity, utility, feasibility, safety and ethical use have been addressed. We propose a checklist of 10 questions that clinicians can ask of those advocating for the use of a particular algorithm, but which do not expect clinicians, as non-experts, to demonstrate mastery over what can be highly complex statistical and computational concepts. The questions are: (1) What is the purpose and context of the algorithm? (2) How good were the data used to train the algorithm? (3) Were there sufficient data to train the algorithm? (4) How well does the algorithm perform? (5) Is the algorithm transferable to new clinical settings? (6) Are the outputs of the algorithm clinically intelligible? (7) How will this algorithm fit into and complement current workflows? (8) Has use of the algorithm been shown to improve patient care and outcomes? (9) Could the algorithm cause patient harm? and (10) Does use of the algorithm raise ethical, legal or social concerns? We provide examples where an algorithm may raise concerns and apply the checklist to a recent review of diagnostic imaging applications. This checklist aims to assist clinicians in assessing algorithm readiness for routine care and identify situations where further refinement and evaluation is required prior to large-scale use.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Imogen Schofield ◽  
David C. Brodbelt ◽  
Noel Kennedy ◽  
Stijn J. M. Niessen ◽  
David B. Church ◽  
...  

AbstractCushing’s syndrome is an endocrine disease in dogs that negatively impacts upon the quality-of-life of affected animals. Cushing’s syndrome can be a challenging diagnosis to confirm, therefore new methods to aid diagnosis are warranted. Four machine-learning algorithms were applied to predict a future diagnosis of Cushing's syndrome, using structured clinical data from the VetCompass programme in the UK. Dogs suspected of having Cushing's syndrome were included in the analysis and classified based on their final reported diagnosis within their clinical records. Demographic and clinical features available at the point of first suspicion by the attending veterinarian were included within the models. The machine-learning methods were able to classify the recorded Cushing’s syndrome diagnoses, with good predictive performance. The LASSO penalised regression model indicated the best overall performance when applied to the test set with an AUROC = 0.85 (95% CI 0.80–0.89), sensitivity = 0.71, specificity = 0.82, PPV = 0.75 and NPV = 0.78. The findings of our study indicate that machine-learning methods could predict the future diagnosis of a practicing veterinarian. New approaches using these methods could support clinical decision-making and contribute to improved diagnosis of Cushing’s syndrome in dogs.


Author(s):  
Gebeyehu Belay Gebremeskel ◽  
Chai Yi ◽  
Zhongshi He ◽  
Dawit Haile

Purpose – Among the growing number of data mining (DM) techniques, outlier detection has gained importance in many applications and also attracted much attention in recent times. In the past, outlier detection researched papers appeared in a safety care that can view as searching for the needles in the haystack. However, outliers are not always erroneous. Therefore, the purpose of this paper is to investigate the role of outliers in healthcare services in general and patient safety care, in particular. Design/methodology/approach – It is a combined DM (clustering and the nearest neighbor) technique for outliers’ detection, which provides a clear understanding and meaningful insights to visualize the data behaviors for healthcare safety. The outcomes or the knowledge implicit is vitally essential to a proper clinical decision-making process. The method is important to the semantic, and the novel tactic of patients’ events and situations prove that play a significant role in the process of patient care safety and medications. Findings – The outcomes of the paper is discussing a novel and integrated methodology, which can be inferring for different biological data analysis. It is discussed as integrated DM techniques to optimize its performance in the field of health and medical science. It is an integrated method of outliers detection that can be extending for searching valuable information and knowledge implicit based on selected patient factors. Based on these facts, outliers are detected as clusters and point events, and novel ideas proposed to empower clinical services in consideration of customers’ satisfactions. It is also essential to be a baseline for further healthcare strategic development and research works. Research limitations/implications – This paper mainly focussed on outliers detections. Outlier isolation that are essential to investigate the reason how it happened and communications how to mitigate it did not touch. Therefore, the research can be extended more about the hierarchy of patient problems. Originality/value – DM is a dynamic and successful gateway for discovering useful knowledge for enhancing healthcare performances and patient safety. Clinical data based outlier detection is a basic task to achieve healthcare strategy. Therefore, in this paper, the authors focussed on combined DM techniques for a deep analysis of clinical data, which provide an optimal level of clinical decision-making processes. Proper clinical decisions can obtain in terms of attributes selections that important to know the influential factors or parameters of healthcare services. Therefore, using integrated clustering and nearest neighbors techniques give more acceptable searched such complex data outliers, which could be fundamental to further analysis of healthcare and patient safety situational analysis.


Sign in / Sign up

Export Citation Format

Share Document