Prediction Models of Early Childhood Caries Based on Machine Learning Algorithms

Author(s):  
You-Hyun Park ◽  
Sung-Hwa Kim ◽  
Yoon-Young Choi

In this study, we developed machine learning-based prediction models for early childhood caries (ECC) and compared their performance with that of a traditional regression model. We analyzed data on 4195 children aged 1–5 years from the Korea National Health and Nutrition Examination Survey (2007–2018). We developed prediction models using the XGBoost (version 1.3.1), random forest, and LightGBM (version 3.1.1) algorithms in addition to logistic regression. Two methods were applied for variable selection: a regression-based backward elimination and a random forest-based permutation importance classifier. We compared the area under the receiver operating characteristic curve (AUROC) values and misclassification rates of the different models and observed that all four prediction models had AUROC values between 0.774 and 0.785, with no significant difference among them. These results indicate that both traditional logistic regression and ML-based models can show favorable performance and can be used to predict early childhood caries, identify ECC high-risk groups, and implement active preventive treatments. However, further research is needed to improve the performance of the prediction model using recent methods, such as deep learning.
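The two comparison metrics named in this abstract, AUROC and misclassification rate, can be illustrated with a minimal sketch. The labels and scores below are made-up toy values, not survey data, and the 0.5 decision threshold is an assumption; the abstract does not state which threshold the authors used.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def misclassification_rate(labels, scores, threshold=0.5):
    """Share of cases whose thresholded prediction disagrees with the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    errors = sum(1 for y, p in zip(labels, preds) if y != p)
    return errors / len(labels)


# Hypothetical toy data: 1 = caries, 0 = caries-free.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
model_a = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]  # e.g. scores from one model
model_b = [0.7, 0.9, 0.5, 0.4, 0.1, 0.5, 0.2, 0.6]  # e.g. scores from another

for name, scores in (("model_a", model_a), ("model_b", model_b)):
    print(name, auroc(labels, scores), misclassification_rate(labels, scores))
```

The pairwise definition is quadratic in the number of cases, which is fine for illustration; a rank-based implementation would be preferable for a dataset of several thousand children.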

Genes ◽  
2021 ◽  
Vol 12 (4) ◽  
pp. 462
Author(s):  
Katarzyna Zaorska ◽  
Tomasz Szczapa ◽  
Maria Borysewicz-Lewicka ◽  
Michał Nowicki ◽  
Karolina Gerreth

Background: Several genes and single nucleotide polymorphisms (SNPs) have been associated with early childhood caries. However, these associations are highly age- and population-dependent, and most existing caries prediction models are based on environmental and behavioral factors only and are scarce in infants. Methods: We examined 6 novel and 22 previously analyzed SNPs in a cohort of 95 Polish children (48 with caries, 47 caries-free) aged 2–3 years. All polymorphisms were genotyped from DNA extracted from oral epithelium samples. We used Fisher’s exact test, the receiver operating characteristic (ROC) curve and uni-/multi-variable logistic regression to test the association of SNPs with the disease, followed by neural network (NN) analysis. Results: The logistic regression (LogReg) model showed 90% sensitivity and 96% specificity, an overall accuracy of 93% (p < 0.0001), and an area under the curve (AUC) of 0.970 (95% CI: 0.912–0.994; p < 0.0001). We found 90.9–98.4% and 73.6–87.2% prediction accuracy in the test and validation predictions, respectively. The strongest predictors were AMELX_rs17878486 and TUFT1_rs2337360 (in both LogReg and NN), MMP16_rs1042937 (in NN) and ENAM_rs12640848 (in LogReg). Conclusions: A neural network prediction model might be a substantial tool for screening and early preventive treatment of patients at high risk of caries development in early childhood. Knowledge of potential risk status could allow early targeted training in oral hygiene and modification of eating habits.


2021 ◽  
Author(s):  
Maryam Koopaie ◽  
Mahsa Salamati ◽  
Roshanak Montazeri ◽  
Mansour Davoudi ◽  
Sajad Kolahdooz

Abstract Background: Early childhood caries is the most common infectious disease in childhood, with a high prevalence in developing countries. Recognizing the factors that affect early childhood caries and its pathophysiology allows better control of the disease. Cystatin S, one of the salivary proteins, has an important role in pellicle formation, tooth re-mineralization and protection. The aim of the present study is to assess salivary cystatin S levels and demographic data in children with early childhood caries in comparison with caries-free children using statistical analysis and machine learning methods. Methods: A cross-sectional case-control study was undertaken on 20 cases of early childhood caries and 20 caries-free children as a control group. Unstimulated whole saliva samples were collected by the suction method. Cystatin S concentrations were determined using a human cystatin S ELISA kit. A checklist covering demographic characteristics, oral health status and dietary habits was completed for each participant by interviewing parents. Regression and receiver operating characteristic (ROC) curve analyses were performed to evaluate the potential role of salivary cystatin S level and demographic data using machine learning and statistical analysis. Results: The mean salivary cystatin S concentration was 191.55±81.90 ng/ml in the early childhood caries group and 370.06±128.87 ng/ml in the caries-free group. A t-test showed a statistically significant difference in salivary cystatin S level between the early childhood caries and caries-free groups (p = 0.032). The area under the curve and accuracy of the ROC curve showed that the logistic regression model based on salivary cystatin S levels and birth weight had the highest, and an acceptable, potential for discriminating early childhood caries from caries-free controls, followed by the machine learning models and finally by salivary cystatin S levels alone. Conclusion: Salivary cystatin S was higher in caries-free children than in children with early childhood caries. Therefore, cystatin S protein can be used as a biomarker for early prediction of early childhood caries; furthermore, cystatin S is a protective factor against dental caries.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Maryam Koopaie ◽  
Mahsa Salamati ◽  
Roshanak Montazeri ◽  
Mansour Davoudi ◽  
Sajad Kolahdooz

Abstract Background Early childhood caries is the most common infectious disease in childhood, with a high prevalence in developing countries. The assessment of the variables that influence early childhood caries as well as its pathophysiology leads to improved control of this disease. Cystatin S, as one of the salivary proteins, has an essential role in pellicle formation, tooth re-mineralization, and protection. The present study aims to assess salivary cystatin S levels and demographic data in children with early childhood caries in comparison with caries-free children using statistical analysis and machine learning methods. Methods A cross-sectional, case–control study was undertaken on 20 cases of early childhood caries and 20 caries-free children as a control. Unstimulated whole saliva samples were collected by suction. Cystatin S concentrations in samples were determined using a human cystatin S ELISA kit. A checklist on demographic characteristics, oral health status, and dietary habits was completed for each participant by interviewing parents. Regression and receiver operating characteristic (ROC) curve analyses were done to evaluate the potential role of salivary cystatin S level and demographic data using statistical analysis and machine learning. Results The mean value of salivary cystatin S concentration in the early childhood caries group was 191.55 ± 81.90 (ng/ml) and in the caries-free group was 370.06 ± 128.87 (ng/ml). T-test analysis showed a statistically significant difference between early childhood caries and caries-free groups in salivary cystatin S levels (p = 0.032). Investigation of the area under the curve (AUC) and accuracy of the ROC curve revealed that the logistic regression model based on salivary cystatin S levels and birth weight had the highest, and an acceptable, potential for discriminating early childhood caries from caries-free controls. Furthermore, using salivary cystatin S levels enhanced the capability of machine learning methods to differentiate early childhood caries from caries-free controls. Conclusion Salivary cystatin S levels in caries-free children were higher than in children with early childhood caries. Results of the present study suggest that considering clinical examination, demographic and socioeconomic factors, along with the salivary cystatin S levels, could be useful for early diagnosis of early childhood caries in high-risk children; furthermore, cystatin S is a protective factor against dental caries.


2018 ◽  
Author(s):  
Liyan Pan ◽  
Guangjian Liu ◽  
Xiaojian Mao ◽  
Huixian Li ◽  
Jiexin Zhang ◽  
...  

BACKGROUND Central precocious puberty (CPP) in girls seriously affects their physical and mental development in childhood. The method of diagnosis—gonadotropin-releasing hormone (GnRH)–stimulation test or GnRH analogue (GnRHa)–stimulation test—is expensive and makes patients uncomfortable due to the need for repeated blood sampling. OBJECTIVE We aimed to combine multiple CPP–related features and construct machine learning models to predict response to the GnRHa-stimulation test. METHODS In this retrospective study, we analyzed clinical and laboratory data of 1757 girls who underwent a GnRHa test in order to develop XGBoost and random forest classifiers for prediction of response to the GnRHa test. The local interpretable model-agnostic explanations (LIME) algorithm was used with the black-box classifiers to increase their interpretability. We measured sensitivity, specificity, and area under receiver operating characteristic (AUC) of the models. RESULTS Both the XGBoost and random forest models achieved good performance in distinguishing between positive and negative responses, with the AUC ranging from 0.88 to 0.90, sensitivity ranging from 77.91% to 77.94%, and specificity ranging from 84.32% to 87.66%. Basal serum luteinizing hormone, follicle-stimulating hormone, and insulin-like growth factor-I levels were found to be the three most important factors. In the interpretable models of LIME, the abovementioned variables made high contributions to the prediction probability. CONCLUSIONS The prediction models we developed can help diagnose CPP and may be used as a prescreening tool before the GnRHa-stimulation test.
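Sensitivity and specificity, as reported for the classifiers above, follow directly from the confusion-matrix counts. The sketch below uses hypothetical labels and predictions (1 = positive response, 0 = negative response), not the study data, and the helper names are illustrative.

```python
def confusion_counts(labels, preds):
    """Return (tp, fn, tn, fp) for binary labels and predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    return tp, fn, tn, fp


def sensitivity(tp, fn):
    """True positive rate: share of actual positives that were detected."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: share of actual negatives correctly ruled out."""
    return tn / (tn + fp)


# Hypothetical toy example with 4 positives and 6 negatives.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_preds = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

tp, fn, tn, fp = confusion_counts(toy_labels, toy_preds)
print("sensitivity:", sensitivity(tp, fn))
print("specificity:", specificity(tn, fp))
```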


2020 ◽  
Vol 20 (1) ◽  
Author(s):  
Matthijs Blankers ◽  
Louk F. M. van der Post ◽  
Jack J. M. Dekker

Abstract Background Accurate prediction models for whether patients on the verge of a psychiatric crisis need hospitalization are lacking, and machine learning methods may help improve the accuracy of psychiatric hospitalization prediction models. In this paper we evaluate the accuracy of ten machine learning algorithms, including the generalized linear model (GLM/logistic regression), to predict psychiatric hospitalization in the first 12 months after a psychiatric crisis care contact. We also evaluate an ensemble model to optimize the accuracy and we explore individual predictors of hospitalization. Methods Data from 2084 patients included in the longitudinal Amsterdam Study of Acute Psychiatry with at least one reported psychiatric crisis care contact were included. The target variable for the prediction models was whether the patient was hospitalized in the 12 months following inclusion. The predictive power of 39 variables related to patients’ socio-demographics, clinical characteristics and previous mental health care contacts was evaluated. The accuracy and area under the receiver operating characteristic curve (AUC) of the machine learning algorithms were compared and we also estimated the relative importance of each predictor variable. The best and least performing algorithms were compared with GLM/logistic regression using net reclassification improvement analysis, and the five best performing algorithms were combined in an ensemble model using stacking. Results All models performed above chance level. We found Gradient Boosting to be the best performing algorithm (AUC = 0.774) and K-Nearest Neighbors to be the least performing (AUC = 0.702). The performance of GLM/logistic regression (AUC = 0.76) was slightly above average among the tested algorithms. In a Net Reclassification Improvement analysis, Gradient Boosting outperformed GLM/logistic regression by 2.9% and K-Nearest Neighbors by 11.3%. GLM/logistic regression outperformed K-Nearest Neighbors by 8.7%. Nine of the top-10 most important predictor variables were related to previous mental health care use. Conclusions Gradient Boosting led to the highest predictive accuracy and AUC, while GLM/logistic regression showed average performance among the tested algorithms. Although statistically significant, the magnitude of the differences between the machine learning algorithms was in most cases modest. The results show that a predictive accuracy similar to the best performing model can be achieved when combining multiple algorithms in an ensemble model.
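The net reclassification improvement (NRI) used to compare algorithms can be sketched for the simplest case of two binary classifiers. This is the standard two-category NRI and only an assumption about the variant; the abstract does not say which form or risk categories the authors used. All data below are hypothetical.

```python
def net_reclassification_improvement(labels, old_preds, new_preds):
    """Two-category NRI of new_preds over old_preds.

    NRI = [P(up | event) - P(down | event)]
        + [P(down | non-event) - P(up | non-event)]
    where "up" means the new model predicts 1 where the old one predicted 0.
    """
    events = [(o, n) for y, o, n in zip(labels, old_preds, new_preds) if y == 1]
    nonevents = [(o, n) for y, o, n in zip(labels, old_preds, new_preds) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")

    up_events = sum(1 for o, n in events if n > o)
    down_events = sum(1 for o, n in events if n < o)
    up_nonevents = sum(1 for o, n in nonevents if n > o)
    down_nonevents = sum(1 for o, n in nonevents if n < o)

    event_part = (up_events - down_events) / len(events)
    nonevent_part = (down_nonevents - up_nonevents) / len(nonevents)
    return event_part + nonevent_part


# Hypothetical toy data: 1 = hospitalized within 12 months.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
baseline = [1, 0, 0, 1, 1, 0, 1, 1, 0, 0]    # e.g. a reference model
challenger = [1, 1, 0, 1, 1, 0, 0, 1, 0, 1]  # e.g. a competing model

nri = net_reclassification_improvement(y_true, baseline, challenger)
print("NRI:", nri)
```

A positive value means the challenger moves more cases in the correct direction than in the wrong one; the event and non-event parts are added, so the result ranges from -2 to 2.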


2020 ◽  
Author(s):  
Jun Ke ◽  
Yiwei Chen ◽  
Xiaoping Wang ◽  
Zhiyong Wu ◽  
Qiongyao Zhang ◽  
...  

Abstract Background The purpose of this study is to identify the risk factors of in-hospital mortality in patients with acute coronary syndrome (ACS) and to evaluate the performance of traditional regression and machine learning prediction models. Methods The data of ACS patients who entered the emergency department of Fujian Provincial Hospital from January 1, 2017 to March 31, 2020 for chest pain were retrospectively collected. The study used univariate and multivariate logistic regression analysis to identify risk factors for in-hospital mortality of ACS patients. The traditional regression and machine learning algorithms were used to develop predictive models, and the sensitivity, specificity, and receiver operating characteristic curve were used to evaluate the performance of each model. Results A total of 7810 ACS patients were included in the study, and the in-hospital mortality rate was 1.75%. Multivariate logistic regression analysis found that age and levels of D-dimer, cardiac troponin I, N-terminal pro-B-type natriuretic peptide (NT-proBNP), lactate dehydrogenase (LDH), high-density lipoprotein (HDL) cholesterol, and calcium channel blockers were independent predictors of in-hospital mortality. The study found that the area under the receiver operating characteristic curve of the models developed by logistic regression, gradient boosting decision tree (GBDT), random forest, and support vector machine (SVM) for predicting the risk of in-hospital mortality were 0.963, 0.960, 0.963, and 0.959, respectively. Feature importance evaluation found that NT-proBNP, LDH, and HDL cholesterol were the top three variables contributing most to the prediction performance of the GBDT model and random forest model. Conclusions The predictive models developed using logistic regression, GBDT, random forest, and SVM algorithms can be used to predict the risk of in-hospital death of ACS patients. Based on our findings, we recommend that clinicians focus on monitoring the changes of NT-proBNP, LDH, and HDL cholesterol, as this may improve the clinical outcomes of ACS patients.


2020 ◽  
Author(s):  
Victoria Garcia-Montemayor ◽  
Alejandro Martin-Malo ◽  
Carlo Barbieri ◽  
Francesco Bellocchio ◽  
Sagrario Soriano ◽  
...  

Abstract Background Besides the classic logistic regression analysis, non-parametric methods based on machine learning techniques such as random forest are presently used to generate predictive models. The aim of this study was to evaluate random forest mortality prediction models in haemodialysis patients. Methods Data were acquired from incident haemodialysis patients between 1995 and 2015. Prediction of mortality at 6 months, 1 year and 2 years of haemodialysis was calculated using random forest and the accuracy was compared with logistic regression. Baseline data were constructed with the information obtained during the initial period of regular haemodialysis. Aiming to increase accuracy concerning baseline information of each patient, the period of time used to collect data was set at 30, 60 and 90 days after the first haemodialysis session. Results There were 1571 incident haemodialysis patients included. The mean age was 62.3 years and the average Charlson comorbidity index was 5.99. The mortality prediction models obtained by random forest appear to be adequate in terms of accuracy [area under the curve (AUC) 0.68–0.73] and superior to logistic regression models (ΔAUC 0.007–0.046). Results indicate that both random forest and logistic regression develop mortality prediction models using different variables. Conclusions Random forest is an adequate method, and superior to logistic regression, to generate mortality prediction models in haemodialysis patients.


2021 ◽  
Vol 162 (22) ◽  
pp. 861-869
Author(s):  
Andrea Radácsi ◽  
Tímea Dergez ◽  
Laura Csabai ◽  
Nóra Stáczer ◽  
Krisztián Katona ◽  
...  

Summary. Introduction and objective: To investigate the frequency of severe early childhood caries (S-ECC) in children under 3 years of age and to assess the oral health-related knowledge of parents/guardians of young children. Method: 362 children aged 36 months or younger (mean age: 28.49 ± 5.25 months) were screened and a voluntary questionnaire for their parents was compiled. Statistical analysis was carried out comparing the results of the 306 completed questionnaires with the dental status of each screened child. Results: Caries prevalence in the examined population was 15.46%; df-t index = 0.685 ± 2.20; our proposed df-t index modified by the number of erupted teeth = 0.758 ± 2.42; SiC index = 2.06 ± 3.33. No teeth filled or extracted due to caries were found in the study group. There was no significant difference between children whose parents had and had not previously received oral hygiene information in the number of carious teeth (p = 0.196), the consumption of sugar-containing drinks (81.5%/71.5%) or the way drinks were given (p = 0.453). Mothers who exclusively breastfed until the age of 6 months were more likely to later offer water (75%/52%) from a cup (68.1%/28.8%). The mother’s highest level of education plays a key role both in the commitment to breastfeeding (tertiary: 53.4%, secondary: 34.2%, primary: 37.5%) and in reducing caries frequency (p = 0.015). Conclusion: Parental oral hygiene preventive instruction is currently ineffective. Based on our results, dental screening should begin before the age of 1 year, which would also create an opportunity for effective, regular dental prevention counseling. The basics of early childhood caries should be integrated into the continuing professional training of district nurses and pediatricians. Orv Hetil. 2021; 162(22): 861–869.


2021 ◽  
Author(s):  
Sangil Lee ◽  
Brianna Mueller ◽  
W. Nick Street ◽  
Ryan M. Carnahan

Abstract Introduction Delirium is a cerebral dysfunction seen commonly in the acute care setting. Delirium is associated with increased mortality and morbidity and is frequently missed in the emergency department (ED) by clinical gestalt alone. Identifying those at risk of delirium may help prioritize screening and interventions. Objective Our objective was to identify clinically valuable predictive models for prevalent delirium within the first 24 hours of hospitalization based on the available data by assessing the performance of logistic regression and a variety of machine learning models. Methods This was a retrospective cohort study to develop and validate a predictive risk model to detect delirium using patient data obtained around an ED encounter. Data from electronic health records for patients hospitalized from the ED between January 1, 2014, and December 31, 2019, were extracted. Eligible patients were aged 65 or older, admitted to an inpatient unit from the emergency department, and had at least one DOSS assessment or CAM-ICU recorded while hospitalized. The outcome measure of this study was delirium within one day of hospitalization determined by a positive DOSS or CAM assessment. We developed the model with and without the Barthel index for activity of daily living, since this was measured after hospital admission. Results The area under the ROC curves for delirium ranged from .69 to .77 without the Barthel index. Random forest and gradient-boosted machine showed the highest AUC of .77. At the 90% sensitivity threshold, gradient-boosted machine, random forest, and logistic regression achieved a specificity of 35%. After the Barthel index was included, random forest, gradient-boosted machine, and logistic regression models demonstrated the best predictive ability with respective AUCs of .85 to .86. Conclusion This study demonstrated the use of machine learning algorithms to identify the combination of variables that are predictive of delirium within 24 hours of hospitalization from the ED.
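Reporting specificity "at the 90% sensitivity threshold" means choosing the score cut-off that just reaches 90% sensitivity and then reading off specificity at that cut-off. The sketch below shows one way to do this; the scores are hypothetical, and the rule of taking the highest qualifying cut-off is an assumption, since the abstract does not describe the exact procedure.

```python
def specificity_at_sensitivity(labels, scores, target_sensitivity=0.90):
    """Find the highest cut-off whose sensitivity meets the target.

    A case is predicted positive when score >= cut-off.
    Returns (cutoff, sensitivity, specificity).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    # Walk cut-offs from strictest to most lenient; the first one that
    # reaches the target keeps specificity as high as possible.
    for cutoff in sorted(set(scores), reverse=True):
        sens = sum(1 for s in pos if s >= cutoff) / len(pos)
        if sens >= target_sensitivity:
            spec = sum(1 for s in neg if s < cutoff) / len(neg)
            return cutoff, sens, spec
    raise ValueError("target sensitivity cannot be reached")


# Hypothetical risk scores: 10 patients with delirium, 10 without.
delirium_scores = [0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.35, 0.2]
control_scores = [0.1, 0.15, 0.25, 0.3, 0.4, 0.45, 0.5, 0.55, 0.62, 0.72]
risk_labels = [1] * 10 + [0] * 10
risk_scores = delirium_scores + control_scores

cutoff, sens, spec = specificity_at_sensitivity(risk_labels, risk_scores)
print("cut-off:", cutoff, "sensitivity:", sens, "specificity:", spec)
```

Fixing sensitivity high favours screening use: few cases are missed, at the cost of more false positives, which is why the resulting specificity can be low even for a model with a reasonable AUC.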


Blood ◽  
2021 ◽  
Vol 138 (Supplement 1) ◽  
pp. 3520-3520
Author(s):  
Laurent Miguet ◽  
Caroline Mayeur-Rousse ◽  
Alice Eischen ◽  
Anne-Cecile Galoisy ◽  
Delphine C. M. Rolland ◽  
...  

Abstract Introduction: B-cell immunophenotype can be swiftly assessed by flow cytometry on blood samples or bone marrow aspirate specimens. It provides crucial information later refined with histologic, genetic and molecular features to assert an accurate diagnosis of chronic B-cell lymphoproliferative disorders (B-CLPD). Besides the Matutes score, we identified additional useful markers, i.e. CD148 and CD180, to classify mantle cell lymphoma (MCL) and marginal zone lymphoma (MZL), respectively. Furthermore, CD200 is known to be highly expressed in chronic lymphoid leukemia (CLL) while absent in MCL. Hypothesis: The determination of CD148, CD180 and CD200 expression on B-cells by flow cytometry on blood samples and/or bone marrow aspirates could be a potent tool to accurately identify B-CLPD. We postulated the existence of the following specific expression patterns in B-CLPD: CD148 dim/CD180 dim/CD200 bright for CLL, CD148 dim/CD180 dim/CD200 dim for lymphoplasmacytic lymphoma (LPL), CD148 bright/CD180 dim/CD200 neg/dim for MCL and CD148 dim/CD180 bright/CD200 dim for MZL. Methods: In a prospective study we investigated the expression of CD148/CD180/CD200 on B-cells from 673 patients at the time of B-CLPD diagnosis in our hospital from 2014 to 2020. We analyzed 440 blood and 233 bone marrow aspirate specimens using a BD FACSCanto II flow cytometry instrument. Based solely on CD148/CD180/CD200 specific expression patterns we postulated a diagnosis of CLL, LPL, MCL or MZL. These postulated diagnoses were later compared with the final diagnoses when all histologic, genetic and molecular features were finalized. Sensitivity, specificity, positive and negative predictive values of the expression profiles were determined. In addition, to investigate the relative importance of these three CD markers, we normalized their mean fluorescence intensities (MFI) and applied several supervised machine learning algorithms including Logistic Regression, Random Forest and Light Gradient Boosting Machine (LightGBM). Results: Out of the 673 clinical samples, the CD148/CD180/CD200 expression patterns classified 212 specimens as CLL/SLL (30.8%), 160 as LPL (23.8%), 76 as MCL (11.28%) and 169 as MZL (25%). These diagnosis hypotheses were retrospectively compared to the final diagnoses based on all histologic, genetic and molecular features. The diagnosis hypotheses of CLL, LPL, MCL and MZL were consistent with the final diagnosis in 583 out of the 617 corresponding cases (94%) with high positive and negative predictive values. The characteristics of the diagnosis accuracy are detailed in the table below. HCL and FL were not further investigated as their immunophenotypes usually do not overlap with those of other B-CLPD. Seventeen out of 617 patients (17/617, 5.3%) did not display a clear CD148/CD180/CD200 pattern: 9 LPL, 4 CLL and 4 MZL. In sixteen patients (16/617, 5.0%) the diagnosis hypothesis based on this strategy was not confirmed after completion of the exploration including karyotype, MYD88 L265P mutational status, CCND1 overexpression and pathology explorations. We next investigated the relative importance of these 3 markers. We focused on MFI values of CD148, CD180 and CD200 and three categorical "positive or negative" markers (CD5, CD23, FMC7) that were assembled into a composite marker. After Box-Cox normalization of CD148, CD180 and CD200 MFIs, a set of supervised machine learning algorithms including Logistic Regression, Random Forest and Light Gradient Boosting Machine (LightGBM) was applied to the cohort of CLL, LPL, MCL and MZL. We established that the highest diagnosis weights were obtained for CD200 in CLL, CD200 and CD148 in MCL (negatively and positively, respectively), and CD180 in MZL. In LPL, CD148, CD180 and CD200 had the highest weights using the LightGBM and Random Forest algorithms, while Logistic Regression determined that CD5 and CD23 had the highest (negative) weights. In conclusion, the determination of CD148/CD180/CD200 surface expression patterns by flow cytometry, along with morphology, made it possible to assert an accurate diagnosis hypothesis in CLL, MCL, LPL and MZL with high positive and negative predictive values. Machine learning algorithms made it possible to measure the relative importance of these markers, which could be of great help in case of discordant expression of the main diagnosis markers. Disclosures: No relevant conflicts of interest to declare.
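The Box-Cox power transform mentioned for the MFI values can be sketched as follows. The lambda value and the intensity values are hypothetical; in practice lambda is usually estimated from the data (for example by maximum likelihood), and the abstract does not report how it was chosen.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform: (x**lam - 1) / lam for lam != 0, log(x) for lam == 0.

    Only defined for strictly positive inputs.
    """
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


# Hypothetical right-skewed fluorescence intensities for one marker.
mfi_values = [120.0, 340.0, 560.0, 2100.0, 15800.0]

# lam = 0 reduces to a log transform, a common choice for intensity data.
print(box_cox(mfi_values, 0))
print(box_cox(mfi_values, 0.5))
```

The transform compresses large values more than small ones, so skewed intensity distributions become closer to symmetric before they are passed to a model such as logistic regression.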

