Evaluation of machine learning applications using real-world EHR data for predicting diabetes-related long-term complications

Author(s):  
Abu Saleh Mohammad Mosa ◽  
Chalermpon Thongmotai ◽  
Humayera Islam ◽  
Tanmoy Paul ◽  
K. S. M. Tozammel Hossain ◽  
...  
2019 ◽  
Vol 73 (12) ◽  
pp. 1012-1017
Author(s):  
Andrea M. Burden

Pharmacoepidemiology is the study of the safety and effectiveness of medications following market approval. The increased availability and size of healthcare utilization databases allow for the study of rare adverse events, sub-group analyses, and long-term follow-up. These datasets are large, including thousands of patient records spanning multiple years of observation, and are representative of real-world clinical practice. Thus, one of their main advantages is the possibility of studying the real-world safety and effectiveness of medications in uncontrolled environments. Due to the large size (volume), structure (variety), and availability (velocity) of observational healthcare databases, there is considerable interest in applying natural language processing and machine learning, including the development of novel models to detect drug–drug interactions, identify patient phenotypes, and predict outcomes. This report provides an overview of the current challenges in pharmacoepidemiology and of where machine learning applications may be useful for filling the gap.


2019 ◽  
Vol 2019 (1) ◽  
pp. 26-46 ◽  
Author(s):  
Thee Chanyaswad ◽  
Changchang Liu ◽  
Prateek Mittal

A key challenge facing the design of differential privacy in the non-interactive setting is to maintain the utility of the released data. To overcome this challenge, we utilize the Diaconis-Freedman-Meckes (DFM) effect, which states that most projections of high-dimensional data are nearly Gaussian. Hence, we propose the RON-Gauss model that leverages the novel combination of dimensionality reduction via random orthonormal (RON) projection and the Gaussian generative model for synthesizing differentially-private data. We analyze how RON-Gauss benefits from the DFM effect, and present multiple algorithms for a range of machine learning applications, including both unsupervised and supervised learning. Furthermore, we rigorously prove that (a) our algorithms satisfy the strong ɛ-differential privacy guarantee, and (b) RON projection can lower the level of perturbation required for differential privacy. Finally, we illustrate the effectiveness of RON-Gauss under three common machine learning applications – clustering, classification, and regression – on three large real-world datasets. Our empirical results show that (a) RON-Gauss outperforms previous approaches by up to an order of magnitude, and (b) loss in utility compared to the non-private real data is small. Thus, RON-Gauss can serve as a key enabler for real-world deployment of privacy-preserving data release.
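As a rough illustration of the two building blocks named in this abstract, the sketch below draws a random orthonormal projection, fits a Gaussian to the projected data, and samples synthetic points from it. It deliberately omits the noise that the differential privacy guarantee depends on, so it is not differentially private and is not the authors' algorithm. All function names, the toy data, and the dimensions are assumptions made for illustration.

```python
import math
import random


def random_orthonormal_rows(p, m, rng):
    """Return p orthonormal vectors of length m (p <= m), obtained by
    Gram-Schmidt orthogonalization of random Gaussian vectors."""
    basis = []
    while len(basis) < p:
        v = [rng.gauss(0.0, 1.0) for _ in range(m)]
        # Remove the components along the vectors already in the basis.
        for b in basis:
            dot = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - dot * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 1e-12:  # skip a (vanishingly unlikely) degenerate draw
            basis.append([vi / norm for vi in v])
    return basis


def project(data, basis):
    """Project each m-dimensional row onto the p basis vectors."""
    return [[sum(x * b for x, b in zip(row, vec)) for vec in basis]
            for row in data]


def fit_gaussian(points):
    """Return the sample mean and (biased) sample covariance."""
    n, p = len(points), len(points[0])
    mean = [sum(pt[j] for pt in points) / n for j in range(p)]
    cov = [[sum((pt[i] - mean[i]) * (pt[j] - mean[j]) for pt in points) / n
            for j in range(p)] for i in range(p)]
    return mean, cov


def cholesky(a):
    """Lower-triangular factor of a symmetric positive-definite matrix."""
    p = len(a)
    low = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_gaussian(mean, cov, n, rng):
    """Draw n points from N(mean, cov) using the Cholesky factor."""
    low = cholesky(cov)
    p = len(mean)
    samples = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        samples.append([mean[i] + sum(low[i][k] * z[k] for k in range(i + 1))
                        for i in range(p)])
    return samples


# Toy data (an assumption): 200 records with 10 features, projected to 3.
rng = random.Random(0)
data = [[rng.uniform(-1.0, 1.0) for _ in range(10)] for _ in range(200)]
basis = random_orthonormal_rows(3, 10, rng)
projected = project(data, basis)
mean, cov = fit_gaussian(projected)
synthetic = sample_gaussian(mean, cov, 5, rng)
```

A private version would have to perturb the estimated mean and covariance with noise calibrated to their sensitivity before sampling; that calibration is the part the abstract's proofs cover and is not reproduced here.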


Author(s):  
Xueru Zhang ◽  
Mohammad Mahdi Khalili ◽  
Mingyan Liu

Machine learning models developed from real-world data can inherit potential, preexisting bias in the dataset. When these models are used to inform decisions involving human beings, fairness concerns inevitably arise. Imposing certain fairness constraints in the training of models can be effective only if appropriate criteria are applied. However, a fairness criterion can be defined and assessed only when the interaction between the decisions and the underlying population is well understood. We introduce two feedback models describing how people react when receiving machine-aided decisions and illustrate that some commonly used fairness criteria can lead to undesirable consequences while reinforcing discrimination.


Author(s):  
D.P Mandic ◽  
M Chen ◽  
T Gautama ◽  
M.M Van Hulle ◽  
A Constantinides

The need for the characterization of real-world signals in terms of their linear, nonlinear, deterministic and stochastic nature is highlighted and a novel framework for signal modality characterization is presented. A comprehensive analysis of signal nonlinearity characterization methods is provided, and based upon local predictability in phase space, a new criterion for qualitative performance assessment in machine learning is introduced. This is achieved based on a simultaneous assessment of nonlinearity and uncertainty within a real-world signal. Next, for a given embedding dimension, based on the target variance of delay vectors, a novel framework for heterogeneous data fusion is introduced. The proposed signal modality characterization framework is verified by comprehensive simulations and comparison against other established methods. Case studies covering a range of machine learning applications support the analysis.
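To make the phrase "target variance of delay vectors" concrete, here is a minimal sketch of one way such a quantity can be computed: each window of m consecutive samples (a delay vector) is paired with the sample that follows it (its target), and the variance of the targets of nearby delay vectors is averaged and normalized by the signal variance. The neighbourhood rule, the fixed radius, and the function names are assumptions for illustration, not details taken from the paper.

```python
import math
import random
import statistics


def delay_vectors(signal, m):
    """Pair each window of m consecutive samples with the next sample."""
    return [(signal[k - m:k], signal[k]) for k in range(m, len(signal))]


def normalized_target_variance(signal, m, radius):
    """Average variance of the targets whose delay vectors lie within
    `radius` of each other, divided by the variance of the whole signal.
    Values near 0 suggest a locally predictable signal; values near 1
    suggest the targets are unrelated to their delay vectors."""
    pairs = delay_vectors(signal, m)
    signal_var = statistics.pvariance(signal)
    local_vars = []
    for vec, _ in pairs:
        targets = [t for other, t in pairs if math.dist(vec, other) <= radius]
        if len(targets) >= 2:  # need at least two targets for a variance
            local_vars.append(statistics.pvariance(targets))
    if not local_vars:
        return float("nan")
    return statistics.fmean(local_vars) / signal_var


# Toy comparison (an assumption): a deterministic sine versus white noise.
rng = random.Random(0)
sine = [math.sin(0.3 * k) for k in range(200)]
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]
sine_tv = normalized_target_variance(sine, m=2, radius=0.25)
noise_tv = normalized_target_variance(noise, m=2, radius=0.25)
```

In this toy setting the deterministic sine should give a clearly smaller value than the noise, which is the kind of contrast a predictability-based characterization relies on.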


Information ◽  
2020 ◽  
Vol 11 (7) ◽  
pp. 363
Author(s):  
Ioannis Karamitsos ◽  
Saeed Albarhami ◽  
Charalampos Apostolopoulos

This paper proposes DevOps practices for machine learning applications, integrating the development and operations environments seamlessly. The machine learning processes of development and deployment may seem easy during the experimentation phase. However, if not carefully designed, deploying and using such models may lead to complex, time-consuming approaches that require significant and costly effort for maintenance, improvement, and monitoring. This paper presents how to apply continuous integration (CI) and continuous delivery (CD) principles, practices, and tools so as to minimize waste, support rapid feedback loops, explore the hidden technical debt, improve value delivery and maintenance, and improve operational functions for real-world machine learning applications.


2020 ◽  
Vol 10 (21) ◽  
pp. 7759
Author(s):  
Zhenlong Zhu ◽  
Yilong Liang

In recent years, the number of machine learning applications (especially those involving deep learning) applied to predicting and discovering material properties has been increasing. This paper uses microstructure and carbon content to train machine learning models that predict the residual stress of carburized steel. First, a semantic segmentation model of the material organization structure (SegModel-MOS) was constructed based on the AlexNet network and initially trained on the PASCAL VOC2012 dataset. Then, the trained model was fine-tuned on an enhanced homemade dataset consisting of optical microstructures. The experimental results show that SegModel-MOS can distinguish acicular martensite, retained austenite, and lath martensite in microstructures. Finally, we used both support vector machine (SVM) and decision tree (DT) algorithms to establish a mapping relationship between the microstructure, carbon content, and residual stress to predict the residual stress of steel from its microstructure and carbon content. The experiments verified that the prediction model constructed in this study exhibits high accuracy and can directly predict residual stress without requiring any long-term measurements. Thus, the developed model provides a new approach to the study of residual stress in steel.


Author(s):  
Zhiyun Lu ◽  
Liyu Chen ◽  
Chao-Kai Chiang ◽  
Fei Sha

Hyper-parameter tuning is of crucial importance for real-world machine learning applications. While existing works mainly focus on speeding up the tuning process, we propose to study the problem of hyper-parameter tuning under a budget constraint, which is a more realistic scenario in developing large-scale systems. We formulate the task as a sequential decision-making problem and propose a solution that uses a Bayesian belief model to predict future performance and an action-value function to plan and select the next configuration to run. With long-term prediction and planning capability, our method is able to stop unpromising configurations early and adapt its tuning behavior to different constraints. Experimental results show that our method outperforms existing algorithms, including the state-of-the-art one, on real-world tuning tasks across a range of different budgets.
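The idea of stopping unpromising configurations early under a fixed budget can be illustrated with successive halving, a simpler and well-known scheme. This is a stand-in, not the Bayesian belief model and action-value planner described in the abstract. The scoring function, the learning-rate candidates, and the budget below are all hypothetical.

```python
import math


def successive_halving(configs, evaluate, budget):
    """Train all surviving configurations a little further each round,
    keep the better half, and stop when one configuration remains or the
    next round would exceed `budget` training steps."""
    survivors = list(configs)
    trained = {c: 0 for c in survivors}          # steps given to each config
    scores = {c: float("-inf") for c in survivors}
    spent = 0
    steps_this_round = 1
    while len(survivors) > 1:
        cost = len(survivors) * steps_this_round
        if spent + cost > budget:
            break  # the next round does not fit in the remaining budget
        for c in survivors:
            trained[c] += steps_this_round
            scores[c] = evaluate(c, trained[c])
        spent += cost
        survivors.sort(key=lambda c: scores[c], reverse=True)
        survivors = survivors[:max(1, len(survivors) // 2)]
        steps_this_round *= 2  # survivors earn a larger share next round
    best = max(survivors, key=lambda c: scores[c])
    return best, spent


def toy_score(learning_rate, steps):
    """Hypothetical validation score: peaks at a learning rate of 0.1
    and improves with the number of training steps."""
    quality = 1.0 - abs(math.log10(learning_rate) + 1.0)
    return quality * (1.0 - 0.5 ** steps)


best, spent = successive_halving([0.001, 0.01, 0.1, 1.0], toy_score, budget=8)
```

Unlike the method in the abstract, this loop does not predict future performance; it only ranks configurations by their latest observed score, which is why it can discard a slow starter that a predictive model might have kept.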

