Evaluation of machine learning applications using real-world EHR data for predicting diabetes-related long-term complications

Pharmacoepidemiology is the study of the safety and effectiveness of medications following market approval. The increased availability and size of healthcare utilization databases allow for the study of rare adverse events, sub-group analyses, and long-term follow-up. These datasets are large, including thousands of patient records spanning multiple years of observation, and representative of real-world clinical practice. Thus, one of their main advantages is the possibility of studying the real-world safety and effectiveness of medications in uncontrolled environments. Due to the large size (volume), structure (variety), and availability (velocity) of observational healthcare databases, there is considerable interest in the application of natural language processing and machine learning, including the development of novel models to detect drug–drug interactions, identify patient phenotypes, and predict outcomes. This report will provide an overview of the current challenges in pharmacoepidemiology and where machine learning applications may be useful for filling the gap.

RON-Gauss: Enhancing Utility in Non-Interactive Private Data Release

Proceedings on Privacy Enhancing Technologies ◽

10.2478/popets-2019-0003 ◽

2019 ◽

Vol 2019 (1) ◽

pp. 26-46 ◽

Cited By ~ 2

Author(s):

Thee Chanyaswad ◽

Changchang Liu ◽

Prateek Mittal

Keyword(s):

Machine Learning ◽

Real World ◽

Differential Privacy ◽

Real Data ◽

The Novel ◽

Private Data ◽

Data Release ◽

Order Of Magnitude ◽

Real World Datasets

A key challenge facing the design of differential privacy in the non-interactive setting is to maintain the utility of the released data. To overcome this challenge, we utilize the Diaconis-Freedman-Meckes (DFM) effect, which states that most projections of high-dimensional data are nearly Gaussian. Hence, we propose the RON-Gauss model that leverages the novel combination of dimensionality reduction via random orthonormal (RON) projection and the Gaussian generative model for synthesizing differentially-private data. We analyze how RON-Gauss benefits from the DFM effect, and present multiple algorithms for a range of machine learning applications, including both unsupervised and supervised learning. Furthermore, we rigorously prove that (a) our algorithms satisfy the strong ε-differential privacy guarantee, and (b) RON projection can lower the level of perturbation required for differential privacy. Finally, we illustrate the effectiveness of RON-Gauss under three common machine learning applications – clustering, classification, and regression – on three large real-world datasets. Our empirical results show that (a) RON-Gauss outperforms previous approaches by up to an order of magnitude, and (b) loss in utility compared to the non-private real data is small. Thus, RON-Gauss can serve as a key enabler for real-world deployment of privacy-preserving data release.
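The two building blocks named in this abstract, a random orthonormal (RON) projection followed by a Gaussian generative model, can be illustrated with a minimal sketch. This is not the authors' algorithm: it adds no noise and therefore gives no differential privacy guarantee. The function names `ron_projection` and `synthesize`, the use of a QR decomposition to obtain orthonormal columns, and the uniform toy data are all assumptions made for illustration.

```python
import numpy as np


def ron_projection(d, p, rng):
    """Return a d x p matrix whose columns are random and orthonormal.

    Assumption: orthonormalizing a Gaussian matrix with a reduced QR
    decomposition is one simple way to draw such a projection.
    """
    gaussian = rng.standard_normal((d, p))
    q, _ = np.linalg.qr(gaussian)  # reduced mode: q has shape (d, p)
    return q


def synthesize(data, p, n_samples, rng):
    """Project data to p dimensions, fit a Gaussian, and sample from it.

    NOT differentially private: the mean and covariance are used as-is,
    with no calibrated perturbation.
    """
    centered = data - data.mean(axis=0)
    w = ron_projection(data.shape[1], p, rng)
    projected = centered @ w
    mean = projected.mean(axis=0)
    cov = np.cov(projected, rowvar=False)  # p x p sample covariance
    return rng.multivariate_normal(mean, cov, size=n_samples)


rng = np.random.default_rng(0)
data = rng.uniform(size=(500, 50))  # hypothetical non-Gaussian toy data
synthetic = synthesize(data, p=4, n_samples=200, rng=rng)
print(synthetic.shape)
```

A private variant would need to perturb the estimated statistics before sampling; how that noise is calibrated is specified in the paper, not in this sketch.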

Tutorial on Software Testing & Quality Assurance for Machine Learning Applications from research bench to real world

Proceedings of the 7th ACM IKDD CoDS and 25th COMAD ◽

10.1145/3371158.3371233 ◽

2020 ◽

Author(s):

Sandya Mannarswamy ◽

Shourya Roy ◽

Saravanan Chidambaram

Keyword(s):

Machine Learning ◽

Quality Assurance ◽

Software Testing ◽

Real World

Long-Term Impacts of Fair Machine Learning

Ergonomics in Design The Quarterly of Human Factors Applications ◽

10.1177/1064804619884160 ◽

2019 ◽

Vol 28 (3) ◽

pp. 7-11

Author(s):

Xueru Zhang ◽

Mohammad Mahdi Khalili ◽

Mingyan Liu

Keyword(s):

Machine Learning ◽

Real World ◽

Human Beings ◽

Learning Models ◽

Real World Data ◽

World Data ◽

Fairness Concerns ◽

Fairness Constraints ◽

Machine Learning Models

Machine learning models developed from real-world data can inherit potential, preexisting bias in the dataset. When these models are used to inform decisions involving human beings, fairness concerns inevitably arise. Imposing certain fairness constraints in the training of models can be effective only if appropriate criteria are applied. However, a fairness criterion can be defined/assessed only when the interaction between the decisions and the underlying population is well understood. We introduce two feedback models describing how people react when receiving machine-aided decisions and illustrate that some commonly used fairness criteria can end with undesirable consequences while reinforcing discrimination.

On the characterization of the deterministic/stochastic and linear/nonlinear nature of time series

Proceedings of The Royal Society A Mathematical Physical and Engineering Sciences ◽

10.1098/rspa.2007.0154 ◽

2008 ◽

Vol 464 (2093) ◽

pp. 1141-1160 ◽

Cited By ~ 35

Author(s):

D.P Mandic ◽

M Chen ◽

T Gautama ◽

M.M Van Hulle ◽

A Constantinides

Keyword(s):

Machine Learning ◽

Real World ◽

Heterogeneous Data ◽

Characterization Methods ◽

Nature Of Time ◽

Qualitative Performance ◽

New Criterion ◽

Simultaneous Assessment

The need for the characterization of real-world signals in terms of their linear, nonlinear, deterministic and stochastic nature is highlighted and a novel framework for signal modality characterization is presented. A comprehensive analysis of signal nonlinearity characterization methods is provided, and based upon local predictability in phase space, a new criterion for qualitative performance assessment in machine learning is introduced. This is achieved based on a simultaneous assessment of nonlinearity and uncertainty within a real-world signal. Next, for a given embedding dimension, based on the target variance of delay vectors, a novel framework for heterogeneous data fusion is introduced. The proposed signal modality characterization framework is verified by comprehensive simulations and comparison against other established methods. Case studies covering a range of machine learning applications support the analysis.

Machine Learning Applications in Real World

International Journal of Hybrid Information Technology ◽

10.21742/ijhit.2019.12.1.04 ◽

2019 ◽

Vol 12 (1) ◽

Keyword(s):

Machine Learning ◽

Real World ◽

Applying DevOps Practices of Continuous Automation for Machine Learning

Information ◽

10.3390/info11070363 ◽

2020 ◽

Vol 11 (7) ◽

pp. 363

Author(s):

Ioannis Karamitsos ◽

Saeed Albarhami ◽

Charalampos Apostolopoulos

Keyword(s):

Machine Learning ◽

Real World ◽

Feedback Loops ◽

Learning Processes ◽

Technical Debt ◽

Complex Time ◽

Continuous Integration ◽

Continuous Delivery ◽

Operation Environment

This paper proposes DevOps practices for machine learning applications, integrating both the development and operation environments seamlessly. The machine learning processes of development and deployment during the experimentation phase may seem easy. However, if not carefully designed, deploying and using such models may lead to complex, time-consuming approaches that require significant and costly efforts for maintenance, improvement, and monitoring. This paper presents how to apply continuous integration (CI) and continuous delivery (CD) principles, practices, and tools so as to minimize waste, support rapid feedback loops, explore the hidden technical debt, improve value delivery and maintenance, and improve operational functions for real-world machine learning applications.

Prediction of Residual Stress of Carburized Steel Based on Machine Learning

Applied Sciences ◽

10.3390/app10217759 ◽

2020 ◽

Vol 10 (21) ◽

pp. 7759

Author(s):

Zhenlong Zhu ◽

Yilong Liang

Keyword(s):

Machine Learning ◽

Residual Stress ◽

Carbon Content ◽

Lath Martensite ◽

Semantic Segmentation ◽

Support Vector ◽

New Approach ◽

Carburized Steel

In recent years, the number of machine learning applications (especially those involving deep learning) applied to predicting and discovering material properties has been increasing. This paper is based on using microstructure and carbon content to train machine learning models to predict the residual stress of carburized steel. First, a semantic segmentation model of the material organization structure (SegModel-MOS) was constructed based on the AlexNet network and initially trained on the PASCAL VOC2012 dataset. Then, the trained model was fine-tuned on an enhanced homemade dataset consisting of optical microstructures. The experimental results show that SegModel-MOS can distinguish acicular martensite, retained austenite, and lath martensite in microstructures. Finally, we used both support vector machine (SVM) and decision tree (DT) algorithms to establish a mapping relationship between the microstructure, carbon content, and residual stress to predict the residual stress of steel from its microstructure and carbon content. The experiments verified that the prediction model constructed in this study exhibits high accuracy and can directly predict residual stress without requiring any long-term measurements. Thus, the developed model provides a new approach to the study of residual stress in steel.

Hyper-parameter Tuning under a Budget Constraint

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/796 ◽

2019 ◽

Author(s):

Zhiyun Lu ◽

Liyu Chen ◽

Chao-Kai Chiang ◽

Fei Sha

Keyword(s):

Real World ◽

Large Scale ◽

Parameter Tuning ◽

Budget Constraint ◽

Sequential Decision ◽

Decision Making Problem ◽

Belief Model ◽

Long Term Prediction

Hyper-parameter tuning is of crucial importance for real-world machine learning applications. While existing works mainly focus on speeding up the tuning process, we propose to study the problem of hyper-parameter tuning under a budget constraint, which is a more realistic scenario in developing large-scale systems. We formulate the task as a sequential decision-making problem and propose a solution that uses a Bayesian belief model to predict future performance and an action-value function to plan and select the next configuration to run. With long-term prediction and planning capability, our method is able to stop unpromising configurations early and adapt the tuning behavior to different constraints. Experimental results show that our method outperforms existing algorithms, including the state-of-the-art one, on real-world tuning tasks across a range of different budgets.

Abstract #806387: Real-World Long-Term Safety and Efficacy of Proprotein Convertase Subtilisin/Kexin Type 9 (PCSK9) Inhibitors

Endocrine Practice ◽

10.1016/s1530-891x(20)39700-7 ◽

2020 ◽

Vol 26 ◽

pp. 166-167

Author(s):

Mallory Kuchis

Keyword(s):

Real World ◽

Proprotein Convertase ◽

Pcsk9 Inhibitors ◽

Safety And Efficacy