144 Identification of non-vaccinated children using decision trees

2021 ◽  
Vol 50 (Supplement_1) ◽  
Author(s):  
Thiago Melo Santos ◽  
Bianca Cata-Preta ◽  
Cesar G Victora ◽  
Aluisio J D Barros

Abstract. Background: Non-vaccinated children are a particularly vulnerable and understudied group. Machine learning algorithms, such as decision trees, might be useful for identifying subgroups with a high prevalence of zero dose (no BCG, polio, DPT or measles vaccine received). Methods: We developed Classification and Regression Tree models using data from the DHS surveys of India 2015 and Chad 2014 to identify groups at high risk of zero dose. Results: The first split variable for India was the child's place of delivery, followed by the mother's tetanus vaccination status for the higher-risk subgroup of children born in non-institutional settings. For Chad, administrative region was selected first, and two regions with high zero-dose prevalence were defined. Within those regions, children whose mother had not received any dose of tetanus vaccine were also a higher-risk subgroup. Conclusions: Two trees were created with only two splits each. Subgroups with zero-dose prevalence higher than 40% were identified. Key messages: Decision trees might be valuable tools for exploratory data analysis and risk-group identification in epidemiological research.
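
The split selection described in this abstract can be illustrated with a minimal sketch. It assumes binary predictors and the Gini impurity criterion that CART commonly uses for classification; the abstract does not state which criterion or software the authors used. The field names (`institutional_birth`, `mother_tetanus`, `zero_dose`) and all records are made up for illustration and are not DHS data.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_gini(rows, feature, target):
    """Weighted Gini impurity of the two children after a binary split."""
    yes = [r[target] for r in rows if r[feature]]
    no = [r[target] for r in rows if not r[feature]]
    n = len(rows)
    return len(yes) / n * gini(yes) + len(no) / n * gini(no)

def best_split(rows, features, target):
    """Return the feature whose split gives the lowest weighted impurity."""
    return min(features, key=lambda f: weighted_gini(rows, f, target))

def prevalence(rows, target):
    """Share of rows where the target is True."""
    return sum(1 for r in rows if r[target]) / len(rows)

def make(institutional, tetanus, zero_dose, count):
    """Build `count` identical toy records (hypothetical helper)."""
    return [{"institutional_birth": institutional,
             "mother_tetanus": tetanus,
             "zero_dose": zero_dose} for _ in range(count)]

# Toy, made-up records (NOT survey data).
rows = (make(True, True, False, 40) + make(True, False, False, 20)
        + make(False, True, False, 15) + make(False, True, True, 5)
        + make(False, False, True, 12) + make(False, False, False, 8))

features = ["institutional_birth", "mother_tetanus"]
first = best_split(rows, features, "zero_dose")

# Second split: look only inside the higher-risk branch of the first split.
home_births = [r for r in rows if not r["institutional_birth"]]
no_tetanus = [r for r in home_births if not r["mother_tetanus"]]

print("first split:", first)
print("zero-dose prevalence, home birth and no maternal tetanus dose:",
      prevalence(no_tetanus, "zero_dose"))
```

Limiting the search to two levels, as the authors did, keeps the resulting subgroups easy to describe, at the cost of ignoring weaker predictors.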

2014 ◽  
Vol 10 ◽  
pp. P798-P799
Author(s):  
Charlotte Teunissen ◽  
Niki S.M. Schoonenboom ◽  
Pieter Jelle Visser ◽  
Wiesje M. Van der Flier ◽  
Dirk Knol ◽  
...  

2020 ◽  
Vol 27 (1) ◽  
pp. 107327482092472
Author(s):  
Monica E. Reyes ◽  
Heloise Borges ◽  
Muhamed Said Adjao ◽  
Nisha Vijayakumar ◽  
Philippe E. Spiess ◽  
...  

Although penile carcinoma is a rare malignancy, there is still an unmet need to identify prognostic factors associated with poor survival. In this study, we utilized demographic and clinical information to identify the most informative variables associated with overall survival in patients with penile cancer. From a full model including all covariates found to be statistically significant in univariable analyses, we identified a parsimonious reduced model containing tumor site (penis glans: hazard ratio [HR] = 0.48; 95% CI: 0.28-0.85 and penis not otherwise specified: HR = 0.45; 95% CI: 0.25-0.84), undetermined tumor differentiation (HR = 0.48; 95% CI: 0.27-0.86), and TNM stage III/IV (HR = 2.83; 95% CI: 1.68-4.75). When all of the covariates from the full model were subjected to classification and regression tree analysis, we identified 6 novel risk groups. Of particular interest, we found that marriage was associated with a substantial improvement in survival among men with the same stage and disease site. Specifically, among men with TNM stage 0-II and prepuce/penis corpus/overlapping lesions, those who were single, widowed or divorced had worse survival (5-year survival = 18.2%) than married men (5-year survival = 62.5%). Since marital status is linked to social support, these findings warrant a deeper investigation into the relationships between disease prognosis and social support in patients with penile carcinoma.


PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e2849 ◽  
Author(s):  
Chunrong Mi ◽  
Falk Huettmann ◽  
Yumin Guo ◽  
Xuesong Han ◽  
Lijia Wen

Species distribution models (SDMs) have become an essential tool in ecology, biogeography, evolution and, more recently, in conservation biology. How to generalize species distributions in large undersampled areas, especially with few samples, is a fundamental issue of SDMs. To explore this issue, we used the best available presence records for the Hooded Crane (Grus monacha, n = 33), White-naped Crane (Grus vipio, n = 40), and Black-necked Crane (Grus nigricollis, n = 75) in China as three case studies, employing four powerful and commonly used machine learning algorithms to map the breeding distributions of the three species: TreeNet (Stochastic Gradient Boosting, Boosted Regression Tree Model), Random Forest, CART (Classification and Regression Tree) and Maxent (Maximum Entropy Models). In addition, we developed an ensemble forecast by averaging the predicted probabilities of these four models. Commonly used model performance metrics (area under the ROC curve (AUC) and true skill statistic (TSS)) were employed to evaluate model accuracy. The latest satellite tracking data and compiled literature data were used as two independent testing datasets to confront model predictions. We found that Random Forest demonstrated the best performance for most assessment methods, provided a better model fit to the testing data, and achieved better species range maps for each crane species in undersampled areas. Random Forest has been generally available for more than 20 years and has been known to perform extremely well in ecological predictions. However, while its use is on the rise, its potential is still widely underused in conservation, (spatial) ecological applications and for inference. Our results show that it informs ecological and biogeographical theories as well as being suitable for conservation applications, specifically when the study area is undersampled. This method helps to save model-selection time and effort, and allows robust and rapid assessments and decisions for efficient conservation.
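
The ensemble forecast and the two accuracy metrics mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the authors' workflow: the ensemble is taken as an unweighted mean of per-site probabilities, TSS is computed as sensitivity + specificity - 1 at an assumed threshold of 0.5, and AUC is computed as the probability that a randomly chosen presence site scores higher than a randomly chosen absence site. The model names and all numbers are invented.

```python
def ensemble_mean(predictions):
    """Unweighted mean of per-site probabilities across several models."""
    return [sum(site) / len(site) for site in zip(*predictions)]

def true_skill_statistic(y_true, y_prob, threshold=0.5):
    """TSS = sensitivity + specificity - 1 at the given threshold."""
    pairs = list(zip(y_true, y_prob))
    tp = sum(1 for y, p in pairs if y == 1 and p >= threshold)
    fn = sum(1 for y, p in pairs if y == 1 and p < threshold)
    tn = sum(1 for y, p in pairs if y == 0 and p < threshold)
    fp = sum(1 for y, p in pairs if y == 0 and p >= threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1

def auc(y_true, y_prob):
    """AUC as the share of presence/absence pairs ranked correctly (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Invented test sites: 1 = presence, 0 = absence.
y_true = [1, 1, 1, 0, 0, 0]

# Invented probabilities from four hypothetical models, one value per site.
model_a = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
model_b = [0.7, 0.6, 0.6, 0.5, 0.1, 0.3]
model_c = [0.8, 0.4, 0.6, 0.2, 0.3, 0.2]
model_d = [0.6, 0.6, 0.8, 0.6, 0.2, 0.2]

ensemble = ensemble_mean([model_a, model_b, model_c, model_d])

print("ensemble probabilities:", ensemble)
print("TSS of model_a:", true_skill_statistic(y_true, model_a))
print("TSS of ensemble:", true_skill_statistic(y_true, ensemble))
print("AUC of ensemble:", auc(y_true, ensemble))
```

Because AUC depends only on ranking while TSS depends on the chosen threshold, the two metrics can disagree about which model is better.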


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Nathella Pavan Kumar ◽  
Syed Hissar ◽  
Kannan Thiruvengadam ◽  
Velayuthum V. Banurekha ◽  
Sarath Balaji ◽  
...  

Abstract. Background: Diagnosing tuberculosis (TB) in children is challenging because the disease is paucibacillary and microbiologic confirmation is often not possible. Hence, we measured plasma chemokines as biomarkers for the diagnosis of pediatric tuberculosis. Methods: We conducted a prospective case-control study of children with confirmed, unconfirmed and unlikely TB. A multiplex assay was performed to measure plasma levels of CC and CXC chemokines. Results: Baseline levels of CCL1, CCL3, CXCL1, CXCL2 and CXCL10 were significantly higher in children with active TB (confirmed TB and unconfirmed TB) than in children with unlikely TB. Receiver operating characteristic curve analysis revealed that CCL1, CXCL1 and CXCL10 could act as biomarkers distinguishing confirmed or unconfirmed TB from unlikely TB with sensitivity and specificity of more than 80%. In addition, combiROC exhibited more than 90% sensitivity and specificity in distinguishing confirmed and unconfirmed TB from unlikely TB. Finally, classification and regression tree models also offered more than 90% sensitivity and specificity for CCL1 with a cutoff value of 28 pg/ml, which clearly separated active TB from unlikely TB. The levels of CCL1, CXCL1, CXCL2 and CXCL10 exhibited a significant reduction following anti-TB treatment. Conclusion: Thus, a baseline chemokine signature of CCL1/CXCL1/CXCL10 could serve as an accurate biomarker for the diagnosis of pediatric tuberculosis.
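
How a tree arrives at a single cutoff on a continuous marker can be sketched as below. This is a minimal illustration, assuming a Gini-based search over midpoints between adjacent observed values, which is a common way CART handles continuous predictors; the abstract does not describe the authors' exact procedure. The marker values and labels are invented, so the cutoff found here has no relation to the 28 pg/ml reported above.

```python
def gini_binary(positives, n):
    """Gini impurity of a node with `positives` cases out of `n`."""
    if n == 0:
        return 0.0
    p = positives / n
    return 2 * p * (1 - p)

def best_cutoff(values, labels):
    """Pick the midpoint cutoff that minimizes weighted Gini impurity."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    n = len(values)

    def impurity(cutoff):
        above = [y for x, y in zip(values, labels) if x > cutoff]
        below = [y for x, y in zip(values, labels) if x <= cutoff]
        return (len(above) / n * gini_binary(sum(above), len(above))
                + len(below) / n * gini_binary(sum(below), len(below)))

    return min(candidates, key=impurity)

def sensitivity_specificity(values, labels, cutoff):
    """Treat values above the cutoff as test-positive."""
    pairs = list(zip(values, labels))
    tp = sum(1 for x, y in pairs if y == 1 and x > cutoff)
    fn = sum(1 for x, y in pairs if y == 1 and x <= cutoff)
    tn = sum(1 for x, y in pairs if y == 0 and x <= cutoff)
    fp = sum(1 for x, y in pairs if y == 0 and x > cutoff)
    return tp / (tp + fn), tn / (tn + fp)

# Invented marker concentrations (pg/ml) and labels (1 = active TB, 0 = unlikely TB).
marker = [12.0, 18.5, 22.0, 24.0, 26.0, 30.0, 31.0, 35.5, 40.0, 52.0]
status = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

cutoff = best_cutoff(marker, status)
sens, spec = sensitivity_specificity(marker, status, cutoff)
print("cutoff:", cutoff, "sensitivity:", sens, "specificity:", spec)
```

A cutoff chosen and evaluated on the same children tends to look better than it will on new data, so a held-out set or cross-validation is the usual safeguard.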


Author(s):  
Bahareh Ghasemain ◽  
Dawod Talebpoor Asl ◽  
Binh Thai Pham ◽  
Mohammadtghi Avand ◽  
Huu Duy Nguyen ◽  
...  

Shallow landslides, through land degradation, not only threaten human life and property but may also cause severe ecosystem damage. The aim of this study was to compare the performance of two decision tree machine learning algorithms, classification and regression tree (CART) and reduced error pruning tree (REPTree), for shallow landslide susceptibility mapping in Bijar, Kurdistan province, Iran. We first compiled 20 conditioning factors and then screened them with the information gain ratio (IGR) technique to select the most important ones. We then constructed a geodatabase from the selected factors and a total of 111 landslide locations, split 80/20 for calibration and validation. Model performance was assessed using the true positive rate (TP Rate), false positive rate (FP Rate), precision, recall, F1-measure, Kappa, mean absolute error, and area under the receiver operating characteristic curve (AUC). The IGR results indicated that slope angle and TWI contributed most to shallow landslide occurrence in the study area. Moreover, although both models had high goodness-of-fit and prediction accuracy, the CART model (AUC = 0.856) outperformed the REPTree model (AUC = 0.837). Therefore, the CART model can be used as a promising tool, and also as a base classifier to be hybridized with optimization algorithms and meta classifiers, for spatial prediction of shallow landslide-prone areas.
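
The factor screening step can be illustrated with a short sketch of the information gain ratio, here taken in its usual form: information gain divided by the entropy of the factor itself (the split information). It assumes the conditioning factors have already been discretized into classes; the abstract does not say how the authors did this. The factor names, classes and labels are made up.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(factor, labels):
    """Information gain of `factor` about `labels`, divided by split information."""
    n = len(labels)
    groups = {}
    for x, y in zip(factor, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy(factor)
    if split_info == 0:
        return 0.0  # a constant factor carries no information
    return (entropy(labels) - conditional) / split_info

# Invented cells: 1 = landslide, 0 = no landslide.
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Invented, already-discretized conditioning factors.
factors = {
    "slope_class": ["steep"] * 4 + ["gentle"] * 4,
    "aspect_class": ["N", "S", "N", "S", "N", "S", "N", "S"],
}

scores = {name: gain_ratio(values, labels) for name, values in factors.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)
print("ranked factors:", ranking)
```

Dividing by the split information penalizes factors with many classes, which plain information gain would otherwise favor.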


2016 ◽  
Author(s):  
Chunrong Mi ◽  
Falk Huettmann ◽  
Yumin Guo ◽  
Xuesong Han ◽  
Lijia Wen

Species distribution models (SDMs) have become an essential tool in ecology, biogeography, evolution and, more recently, in conservation biology. How to generalize species distributions in large undersampled areas, especially with few samples, is a fundamental issue of SDMs. To explore this issue, we used the best available presence records for the Hooded Crane (Grus monacha, n = 33), White-naped Crane (Grus vipio, n = 40), and Black-necked Crane (Grus nigricollis, n = 75) in China as three case studies, employing four powerful and commonly used machine learning algorithms to map the breeding distributions of the three species: TreeNet (Stochastic Gradient Boosting, Boosted Regression Tree Model), Random Forest, CART (Classification and Regression Tree) and Maxent (Maximum Entropy Models). In addition, we developed an ensemble forecast by averaging the predicted probabilities of these four models. Commonly used model performance metrics (area under the ROC curve (AUC) and true skill statistic (TSS)) were employed to evaluate model accuracy. The latest satellite tracking data and compiled literature data were used as two independent testing datasets to confront model predictions. We found that Random Forest demonstrated the best performance for most assessment methods, provided a better model fit to the testing data, and achieved better species range maps for each crane species in undersampled areas. Random Forest has been generally available for more than 20 years and, by now, has been known to perform extremely well in ecological predictions. However, while its use is on the rise, its potential is still widely underused in conservation, (spatial) ecological applications and for inference. Our results show that it informs ecological and biogeographical theories as well as being suitable for conservation applications, specifically when the study area is undersampled. This method helps to save model-selection time and effort, and it allows robust and rapid assessments and decisions for efficient conservation.

