Abstract
Context
Accurate methods for predicting gestational diabetes mellitus (GDM) early, during the first trimester of pregnancy, are lacking in Chinese and other populations.
Objectives
To establish effective models for the early prediction of GDM.
Setting
Pregnancy data for 73 variables during the first trimester were extracted from the electronic medical record system.
Main measures
Using a machine learning (ML)-driven feature selection method, 17 variables were selected for early GDM prediction. To facilitate clinical application, 7 variables were then chosen from the 17-variable panel. Advanced ML approaches were applied to the 7-variable dataset and to the 73-variable dataset to build models predicting early GDM for different situations.
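The abstract does not specify which feature selection method was used. As a minimal illustrative sketch only, the idea of reducing a large variable panel to a smaller one can be shown with a simple filter: score each variable by its standardized mean difference between GDM and non-GDM cases, then keep the top k. The scoring rule, the function names, and the toy data below are all assumptions made for illustration and are not the authors' method or data.

```python
from statistics import mean, pstdev


def standardized_mean_difference(values, labels):
    """Absolute difference in group means divided by the pooled standard deviation."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    pooled = ((pstdev(pos) ** 2 + pstdev(neg) ** 2) / 2) ** 0.5
    if pooled == 0:
        return 0.0  # a constant variable carries no information
    return abs(mean(pos) - mean(neg)) / pooled


def select_top_k(table, labels, k):
    """Return the names of the k highest-scoring variables, best first."""
    scores = {
        name: standardized_mean_difference(column, labels)
        for name, column in table.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: 1 = GDM, 0 = no GDM. Names and values are made up.
labels = [1, 1, 1, 0, 0, 0]
table = {
    "var_a": [5.6, 5.9, 5.4, 4.6, 4.8, 4.5],  # clearly separated groups
    "var_b": [30, 25, 33, 27, 31, 24],        # weakly separated groups
    "var_c": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],  # identical in both groups
}

selected = select_top_k(table, labels, k=2)
print(selected)
```

A univariate filter like this ignores interactions between variables, which is one reason a model-driven selection method may be preferred in practice.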
Results
The training and testing sets included 16,819 and 14,992 cases, respectively. Using 73 variables, the deep neural network model achieved high discriminative power, with an area under the curve (AUC) of 0.80. The 7-variable logistic regression (LR) model also achieved effective discriminative power (AUC = 0.77). Low body mass index (BMI ≤ 17) was related to an increased risk of GDM compared with a BMI in the range of 17 to 18 (the minimum-risk interval) (11.8% vs 8.7%, P = 0.0935). Total T3 (TT3) and total T4 (TT4) were superior to free T3 (FT3) and free T4 (FT4) in predicting GDM. Lipoprotein(a) demonstrated promising predictive value (AUC = 0.66).
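For readers less familiar with the AUC metric reported here, it can be read as the probability that a randomly chosen GDM case receives a higher predicted risk than a randomly chosen non-GDM case, with ties counted as one half. The sketch below computes it directly from that definition. The function name and the six example risks are hypothetical and are not taken from the study.

```python
def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties count as half a correct pair (the Mann-Whitney U formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical predicted risks for six pregnancies (1 = GDM, 0 = no GDM).
labels = [1, 1, 1, 0, 0, 0]
risks = [0.9, 0.6, 0.3, 0.5, 0.3, 0.1]

print(auc(risks, labels))  # 7.5 correctly ordered pairs out of 9
```

The pairwise loop is quadratic in the number of cases, so for cohorts of the size described here a rank-based implementation from a statistics library would be the practical choice.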
Conclusions
We employed ML models that achieved high accuracy in predicting GDM in early pregnancy. A clinically cost-effective 7-variable LR model was also developed. The relationships of GDM with thyroxine and BMI were investigated in the Chinese population.