Machine Learning Methods to Better Predict Post-Hematopoietic Stem Cell Transplant (HSCT) Leukemic Relapse in Pediatric Patients with Acute Lymphoblastic Leukemia: Random Forest (RF) Classification Featuring Serial Post-Transplant Lineage-Specific Chimerism

Blood ◽  
2020 ◽  
Vol 136 (Supplement 1) ◽  
pp. 6-7
Author(s):  
David C Shyr ◽  
Bing Melody Zhang ◽  
Robertson Parkman ◽  
Simon E. Brewer

The ability to accurately predict leukemic relapse post-HSCT would improve outcomes by allowing pre-emptive therapeutic strategies. Recent studies have identified post-transplant T- and CD34 cell chimerism as predictors of relapse in patients who had undergone HSCT for hematologic malignancies (Preuner et al, 2016; Lee et al, 2015). However, these studies assess relapse risk using only a single chimerism threshold and standard regression analysis, which permits only limited consideration of other patient variables. As a result, the findings of these analyses are frequently not applicable to patients in general. Machine learning methods can capture nonlinear relationships and simultaneous interactions between multiple variables, and thus better recapitulate the dynamics and nuances of the relapse process in different patients. We use machine learning methods, specifically random forest (RF) classification, to build a predictive model of post-transplant relapse and to analyze data from a cohort of 46 pediatric patients who received HSCT for acute lymphoblastic leukemia (ALL) and had serial lineage-specific chimerism testing post-transplant. Our model achieved 58% sensitivity and 98% specificity at predicting relapses in cross-validation, compared with a baseline model (24% sensitivity, 76% specificity). Consistent with previous reports, our model implicates both peripheral blood (PB) donor CD34 and CD3 chimerism as important variables for relapse. More importantly, the RF showed how different variables interacted with each other, providing additional insights into how best to interpret post-transplant chimerism results. To our knowledge, this is the first study featuring RF machine learning methods in the clinical setting of relapse after HSCT. We use a dataset of patients with ALL undergoing HSCT at Lucile Packard Children's Hospital from 2012 to 2018. Variables collected are summarized in Table 1. 
The analytical sensitivity of STR-based chimerism testing is 1%. Chimerism results obtained on the day of relapse were excluded from the analysis. The RF model is based on a set of 500 individual decision trees, each built on a bootstrapped sample of the patient data. A 5-fold cross-validation was used to test predictive skill, with 20% of patients held out in each fold. We compared results with a Monte Carlo baseline model in which relapse status was repeatedly assigned randomly to each patient with a probability based on the prevalence of relapse in our cohort. Patient, transplantation, and relapse characteristics are summarized in Table 2. Chimerism data are summarized in Table 3. The cross-validation results show robust predictive skill for relapse within 2 years post-transplant. Our RF achieved 58% sensitivity and 98% specificity, greatly improving on the predictive values of the baseline model (Table 4). Variable importance, the ability of a variable to decrease the error of the prediction model, was calculated for all variables used in our RF (Figure 1). Our analysis shows that age at the time of transplant has the highest importance, followed by PB donor CD34 chimerism. Bone marrow chimerism generally has lower importance, suggesting that PB monitoring alone is adequate in the clinical setting. We showcase the relationships of 1) age at transplant, 2) donor PB CD34 chimerism, and 3) donor PB CD3 chimerism to the odds of relapse using partial dependence plots. Younger patients relapse less often. Donor PB CD34 chimerism exhibits a threshold effect, in which the odds of relapse decrease dramatically when it is above 95%, while donor PB CD3 chimerism has a more gradual, linear profile (Figure 2). A 2D partial dependence plot of donor PB CD34 and PB CD3 chimerism, treated as continuous variables, shows the interaction of the two variables (Figure 3): relapse risk remains low even if donor PB CD3 chimerism is as low as 50%, as long as donor PB CD34 chimerism is > 95%. 
Our study shows that machine learning methods such as RF can be very useful for building accurate predictive models of post-HSCT complications that incorporate multiple variables, allowing for more granular differentiation between patients. Such analyses can enable more effective deployment of risk-adapted, personalized treatment. By building hundreds of independent decision trees, the RF is also able to provide useful insights into the interactions between different variables in a clinically relevant manner. Disclosures: No relevant conflicts of interest to declare.
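The Monte Carlo baseline described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the relapse count (11 of 46) is a hypothetical value chosen only to give a prevalence near 24%, and the function name, trial count, and seed are likewise assumptions.

```python
import random

def monte_carlo_baseline(labels, n_trials=2000, seed=0):
    """Average sensitivity and specificity of random relapse assignment.

    Each patient is labeled 'relapse' with probability equal to the
    observed prevalence, independently of any patient variable.
    """
    rng = random.Random(seed)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    prevalence = n_pos / len(labels)
    sens_total = 0.0
    spec_total = 0.0
    for _ in range(n_trials):
        preds = [1 if rng.random() < prevalence else 0 for _ in labels]
        tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
        tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
        sens_total += tp / n_pos
        spec_total += tn / n_neg
    return sens_total / n_trials, spec_total / n_trials

# Hypothetical cohort: 46 patients, 11 relapses (illustrative numbers only).
labels = [1] * 11 + [0] * 35
sensitivity, specificity = monte_carlo_baseline(labels)
print(f"baseline sensitivity ~ {sensitivity:.2f}, specificity ~ {specificity:.2f}")
```

Under random assignment, the expected sensitivity equals the prevalence and the expected specificity equals one minus the prevalence, which is consistent with the reported baseline figures (24% and 76%) summing to 100%.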

2021 ◽  
Vol 15 ◽  
Author(s):  
Jinyu Zang ◽  
Yuanyuan Huang ◽  
Lingyin Kong ◽  
Bingye Lei ◽  
Pengfei Ke ◽  
...  

Recently, machine learning techniques have been widely applied in discriminative studies of schizophrenia (SZ) patients with multimodal magnetic resonance imaging (MRI); however, the effects of brain atlases and machine learning methods remain largely unknown. In this study, we collected MRI data for 61 first-episode SZ patients (FESZ), 79 chronic SZ patients (CSZ) and 205 normal controls (NC) and calculated four MRI measurements: regional gray matter volume (GMV), regional homogeneity (ReHo), amplitude of low-frequency fluctuation and degree centrality. We systematically analyzed the performance of two classifications (SZ vs NC; FESZ vs CSZ) based on the combinations of three brain atlases, five classifiers, two cross-validation methods and three dimensionality reduction algorithms. Our results showed that the groupwise whole-brain atlas with 268 ROIs outperformed the other two brain atlases. In addition, leave-one-out cross-validation was the best cross-validation method for selecting the best hyperparameter set, but the classification performances of the different classifiers and dimensionality reduction algorithms were quite similar. Importantly, the contributions of input features to both classifications were higher for the GMV and ReHo features of brain regions in the prefrontal and temporal gyri. Furthermore, an ensemble learning method was used to establish an integrated model, in which classification performance was improved. Taken together, these findings indicated the effects of these factors in constructing effective classifiers for psychiatric diseases and showed that the integrated model has the potential to improve the clinical diagnosis and treatment evaluation of SZ.
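To make the size of this comparison concrete, the sketch below enumerates the pipeline combinations implied by the counts in the abstract. The labels are placeholders, because the abstract gives only the number of atlases, classifiers, cross-validation methods and dimensionality reduction algorithms, not their names; it also assumes that every combination was evaluated.

```python
from itertools import product

# Placeholder labels: the abstract gives only the counts, not the names.
atlases = [f"atlas_{i}" for i in range(1, 4)]
classifiers = [f"classifier_{i}" for i in range(1, 6)]
cv_methods = [f"cv_{i}" for i in range(1, 3)]
reducers = [f"reducer_{i}" for i in range(1, 4)]

# Every (atlas, classifier, cross-validation, reducer) combination.
configurations = list(product(atlases, classifiers, cv_methods, reducers))
print(len(configurations))  # 3 * 5 * 2 * 3 = 90 per classification task
```

With two classification tasks (SZ vs NC; FESZ vs CSZ), a full grid of this kind would amount to 180 model evaluations.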


2020 ◽  
pp. 5-18
Author(s):  
N. N. Kiselyova ◽  
◽  
V. A. Dudarev ◽  
V. V. Ryazanov ◽  
O. V. Sen’ko ◽  
...  

New chalcospinels of the most common compositions were predicted: A(I)B(III)C(IV)X4 (X = S or Se) and A(II)B(III)C(III)S4 (A, B, and C are various chemical elements). They are promising candidates in the search for new materials for magneto-optical memory elements, sensors, and anodes in sodium-ion batteries. The values of their crystal lattice parameter “a” were estimated. Only the property values of the chemical elements were used for the predictions. To predict chalcospinels not yet obtained, the calculations used machine learning programs that are part of the information-analytical system developed by the authors (various ensembles of algorithms: binary decision trees, the linear machine, the search for logical regularities of classes, the support vector machine, Fisher's linear discriminant, k-nearest neighbors, and the training of a multilayer perceptron and a neural network). To estimate the lattice parameter of the new chalcospinels, the calculations used an extensive family of regression methods available in the scikit-learn package for Python, together with multilevel machine learning methods proposed by the authors. According to the cross-validation results, the prediction accuracy for new chalcospinels is not lower than 80%, and the prediction accuracy for their crystal lattice parameter, measured as the mean absolute error in leave-one-out cross-validation, is ± 0.1 Å. The effectiveness of multilevel machine learning methods for predicting the physical properties of substances was demonstrated.
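The leave-one-out mean absolute error mentioned in this abstract can be illustrated with a small sketch. It assumes a single descriptor and a plain least-squares line, which stands in for the much richer regression family the authors used; the data values and function names are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def loo_mae(xs, ys):
    """Mean absolute error with each sample held out exactly once."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        errors.append(abs(slope * xs[i] + intercept - ys[i]))
    return sum(errors) / len(errors)

# Synthetic, made-up values: a descriptor x and a target y.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [10.1, 10.3, 10.4, 10.7, 10.8, 11.0]
mae = loo_mae(xs, ys)
print(f"LOO MAE = {mae:.3f}")
```

Because every prediction is made by a model that never saw the held-out sample, this estimate is less optimistic than an error computed on the training data.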


Author(s):  
Razieh Sheikhpour ◽  
Roohallah Fazli ◽  
Sanaz Mehrabani

Background: Microarray experiments can simultaneously determine the expression of thousands of genes. Identifying candidate genes from microarray data is important for the diagnosis of cancer. This study aimed to identify genes for the diagnosis of acute myeloid and lymphoblastic leukemia using a sparse feature selection method. Materials and Methods: In this descriptive study, microarray expression data for 7129 genes from 25 patients with acute myeloid leukemia (AML) and 47 patients with acute lymphoblastic leukemia (ALL) were used. The important genes were then identified using a sparse feature selection method, and AML and ALL tissues were classified with machine learning methods: support vector machine (SVM), Gaussian kernel density estimation based classifier (GKDEC), k-nearest neighbor (KNN), and linear discriminant classifier (LDC). Results: ALL and AML were diagnosed with 100% accuracy by GKDEC and LDC using 8 genes selected from the microarray data by the sparse feature selection method. Moreover, the KNN classifier using 6 genes and the SVM classifier using 7 genes diagnosed AML and ALL with accuracies of 91.18% and 94.12%, respectively. The gene with the description “Paired-box protein PAX2 (PAX2) gene, exon 11 and complete CDs” was identified as the most important gene in the diagnosis of ALL and AML. Conclusion: The experimental results of the current study showed that AML and ALL can be diagnosed with high accuracy using sparse feature selection and machine learning methods. Investigating the expression of the genes selected in this study may be helpful in the diagnosis of ALL and AML.
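As an illustration of the classification step only, here is a minimal k-nearest-neighbor sketch operating on a few already-selected features. The expression values, the choice of two features, and k = 3 are all made up; the sparse feature selection step itself is not shown, and this is not the authors' implementation.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbors."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Made-up expression values for two hypothetical selected genes.
train_x = [(0.9, 0.1), (1.0, 0.2), (0.8, 0.3),
           (0.1, 0.9), (0.2, 1.0), (0.3, 0.8)]
train_y = ["AML", "AML", "AML", "ALL", "ALL", "ALL"]

print(knn_predict(train_x, train_y, (0.85, 0.15)))
print(knn_predict(train_x, train_y, (0.15, 0.95)))
```

Restricting the input to a handful of informative genes matters for a distance-based classifier like this one, since thousands of uninformative dimensions would dominate the Euclidean distance.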


Risks ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 133
Author(s):  
Andrey Koltays ◽  
Anton Konev ◽  
Alexander Shelupanov

The need to assess the risks of the trustworthiness of counterparties is increasing every year. The identification of increasing cases of unfair behavior among counterparties only confirms the relevance of this topic. The existing work in the field of information and economic security does not create a reasonable methodology that allows for a comprehensive study and an adequate assessment of a counterparty (for example, a developer company) in the field of software design and development. The purpose of this work is to assess the risks of a counterparty’s trustworthiness in the context of the digital transformation of the economy, which in turn will reduce the risk of offenses and crimes that constitute threats to the security of organizations. This article discusses the main methods used in the construction of a mathematical model for assessing the trustworthiness of a counterparty. The main difficulties in assessing the accuracy and completeness of the model are identified. The use of cross-validation to eliminate difficulties in building a model is described. The developed model, using machine learning methods, gives an accurate result with a small number of compared counterparties, which corresponds to the order of checking a counterparty in a real system. The results of calculations in this model show the possibility of using machine learning methods in assessing the risks of counterparty trustworthiness.


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Sorayya Rezayi ◽  
Niloofar Mohammadzadeh ◽  
Hamid Bouraghi ◽  
Soheila Saeedi ◽  
Ali Mohammadpour

Background. Leukemia is a fatal cancer in both children and adults and is divided into acute and chronic forms. Acute lymphoblastic leukemia (ALL) is a subtype of this cancer. Early diagnosis can have a significant impact on its treatment. Computational intelligence-oriented techniques can be used to help physicians identify and classify ALL rapidly. Materials and Method. In this study, the dataset was collected from a CodaLab competition to classify leukemic cells from normal cells in microscopic images. Two well-known deep learning networks, the residual neural network (ResNet-50) and VGG-16, were employed. These two networks were trained with our own assigned parameters; that is, we did not use the stored weights but adjusted the weights and learning parameters ourselves. In addition, a convolutional network with ten convolutional layers and 2 × 2 max-pooling layers with stride 2 was proposed, and six common machine learning techniques were developed to classify acute lymphoblastic leukemia into two classes. Results. The validation accuracies (the mean accuracy of training and test networks for 100 training cycles) of the ResNet-50, VGG-16, and the proposed convolutional network were found to be 81.63%, 84.62%, and 82.10%, respectively. Among the applied machine learning methods, the lowest accuracy was obtained with the multilayer perceptron (27.33%) and the highest with random forest (81.72%). Conclusion. This study showed that the proposed convolutional neural network has optimal accuracy in the diagnosis of ALL. By comparing various convolutional neural networks and machine learning methods in diagnosing this disease, the convolutional neural network achieved good performance and optimal execution time without latency. This proposed network is less complex than the two pretrained networks and can be employed by pathologists and physicians in clinical systems for leukemia diagnosis.
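The 2 × 2 max-pooling operation with stride 2 mentioned in this abstract can be sketched in a few lines. This is an illustrative, framework-free version on a plain nested list; the function name and the example feature map are assumptions, and a real network would apply this per channel using a deep learning library.

```python
def max_pool_2x2(image):
    """2 x 2 max pooling with stride 2 over a 2D list.

    A trailing odd row or column, if present, is dropped.
    """
    pooled = []
    for r in range(0, len(image) - 1, 2):
        row = []
        for c in range(0, len(image[0]) - 1, 2):
            window = (image[r][c], image[r][c + 1],
                      image[r + 1][c], image[r + 1][c + 1])
            row.append(max(window))
        pooled.append(row)
    return pooled

# Made-up 4 x 4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 4],
]
pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 5], [6, 7]]
```

Each pooling layer of this kind halves the height and width of its input, which is one reason a stack of such layers keeps the later convolutional layers inexpensive.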


Blood ◽  
2021 ◽  
Author(s):  
Hassan Awada ◽  
Arda Durmaz ◽  
Carmelo Gurnari ◽  
Ashwin Kishtagari ◽  
Manja Meggendorfer ◽  
...  

While genomic alterations drive the pathogenesis of acute myeloid leukemia (AML), traditional classifications are largely based on morphology, and prototypic genetic founder lesions define only a small proportion of AML patients. The historical subdivision into primary/de novo AML (pAML) and secondary AML (sAML) has been shown to correlate variably with genetic patterns. The combinatorial complexity and heterogeneity of AML genomic architecture may so far have precluded a genomic-based subclassification that identifies distinct molecularly defined subtypes more reflective of shared pathogenesis. We integrated cytogenetic and gene sequencing data from a multicenter cohort of 6,788 AML patients and analyzed them using standard and machine learning methods to generate a novel AML molecular subclassification with biological correlates corresponding to underlying pathogenesis. Standard supervised analyses resulted in modest cross-validation accuracy when attempting to use molecular patterns to predict traditional pathomorphological AML classifications. We performed unsupervised analysis by applying a Bayesian latent class method that identified 4 unique genomic clusters with distinct prognoses. Invariant genomic features driving each cluster were extracted and resulted in 97% cross-validation accuracy when used for genomic subclassification. Subclasses of AML defined by molecular signatures overlapped current pathomorphological and clinically defined AML subtypes. We internally and externally validated our results and share an open-access molecular classification scheme for AML patients. Although the heterogeneity inherent in the genomic changes across nearly 7,000 AML patients is too vast for traditional prediction methods, machine learning methods allowed for the definition of novel genomic AML subclasses, indicating that traditional pathomorphological definitions may be less reflective of overlapping pathogenesis.


2021 ◽  
Vol 13 (20) ◽  
pp. 4149
Author(s):  
Soo-In Sohn ◽  
Young-Ju Oh ◽  
Subramani Pandian ◽  
Yong-Ho Lee ◽  
John-Lewis Zinia Zaukuu ◽  
...  

The feasibility of rapid and non-destructive classification of six different Amaranthus species was investigated using visible-near-infrared (Vis-NIR) spectra coupled with chemometric approaches. The focus of this research was to use a handheld spectrometer in the field to classify six Amaranthus sp. in different geographical regions of South Korea. Spectra were obtained from the adaxial side of the leaves at 1.5 nm intervals in the Vis-NIR spectral range between 400 and 1075 nm. The obtained spectra were assessed with four different preprocessing methods in order to find the preprocessing method that gives the highest classification accuracy. Preprocessed spectra of six Amaranthus sp. were used as input for the machine learning-based chemometric analysis. All the classification results were validated using cross-validation to produce robust estimates of classification accuracies. The different combinations of preprocessing and modeling were shown to have a classification accuracy of between 71% and 99.7% after the cross-validation. The combination of Savitzky-Golay preprocessing and a support vector machine showed a maximum mean classification accuracy of 99.7% for the discrimination of Amaranthus sp. Considering the high number of spectra involved in this study, the growth stage of the plants, varying measurement locations, and the scanning position of leaves on the plant are all important. We conclude that Vis-NIR spectroscopy, in combination with appropriate preprocessing and machine learning methods, may be used in the field to effectively classify Amaranthus sp. for the effective management of the weedy species and/or for monitoring their food applications.
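Savitzky-Golay preprocessing, named in this abstract, fits a low-order polynomial in a sliding window. The sketch below uses the standard 5-point quadratic smoothing coefficients; the window length and polynomial order are assumptions, since the abstract does not state the filter settings, and the function name is hypothetical.

```python
# 5-point quadratic Savitzky-Golay smoothing coefficients (they sum to 1).
SG_COEFFS = [c / 35 for c in (-3, 12, 17, 12, -3)]

def savgol_smooth_5(values):
    """Smooth interior points; the two points at each edge are left unchanged."""
    smoothed = list(values)
    for i in range(2, len(values) - 2):
        window = values[i - 2:i + 3]
        smoothed[i] = sum(c * v for c, v in zip(SG_COEFFS, window))
    return smoothed

# Wavelength grid implied by the abstract: 400 to 1075 nm in 1.5 nm steps.
wavelengths = [400 + 1.5 * i for i in range(int((1075 - 400) / 1.5) + 1)]
print(len(wavelengths))  # 451 points per spectrum

# A quadratic signal passes through a quadratic-order filter unchanged
# (up to floating-point rounding), while high-frequency noise is damped.
signal = [0.5 * x * x - x + 2 for x in range(10)]
smoothed = savgol_smooth_5(signal)
```

In practice a library routine such as SciPy's `savgol_filter` would normally be used, since it also handles edge points and derivative orders.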


Author(s):  
Jing Xu ◽  
Fuyi Li ◽  
André Leier ◽  
Dongxu Xiang ◽  
Hsin-Hui Shen ◽  
...  

Antimicrobial peptides (AMPs) are a unique and diverse group of molecules that play a crucial role in a myriad of biological processes and cellular functions. AMP-related studies have become increasingly popular in recent years due to antimicrobial resistance, which is becoming an emerging global concern. Systematic experimental identification of AMPs faces many difficulties due to the limitations of current methods. Given its significance, more than 30 computational methods have been developed for accurate prediction of AMPs. These approaches show high diversity in their data set size, data quality, core algorithms, feature extraction, feature selection techniques and evaluation strategies. Here, we provide a comprehensive survey on a variety of current approaches for AMP identification and point out the differences between these methods. In addition, we evaluate the predictive performance of the surveyed tools based on an independent test data set containing 1536 AMPs and 1536 non-AMPs. Furthermore, we construct six validation data sets based on six different common AMP databases and compare different computational methods based on these data sets. The results indicate that amPEPpy achieves the best predictive performance and outperforms the other compared methods. As the predictive performances are affected by the different data sets used by different methods, we additionally perform the 5-fold cross-validation test to benchmark different traditional machine learning methods on the same data set. These cross-validation results indicate that random forest, support vector machine and eXtreme Gradient Boosting achieve comparatively better performances than other machine learning methods and are often the algorithms of choice of multiple AMP prediction tools.
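The 5-fold cross-validation used for this benchmark can be sketched as an index splitter. This is a minimal, unstratified illustration; the function name and seed are assumptions, and the sample count simply reuses the size of the independent test set from the abstract as an example, not the data set on which the authors ran cross-validation.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test_fold in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test_fold

# Example sample count only: 1536 + 1536 = 3072 sequences.
splits = list(k_fold_indices(3072, k=5))
for train, test in splits:
    print(len(train), len(test))
```

Each sample appears in exactly one test fold, so every method is scored on the same held-out partitions; for class-balanced data a stratified variant would additionally keep the AMP to non-AMP ratio equal across folds.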


2021 ◽  
Vol 23 (1) ◽  
Author(s):  
Asmir Vodencarevic ◽  
◽  
Koray Tascilar ◽  
Fabian Hartmann ◽  
Michaela Reiser ◽  
...  

Background: Biological disease-modifying anti-rheumatic drugs (bDMARDs) can be tapered in some rheumatoid arthritis (RA) patients in sustained remission. The purpose of this study was to assess the feasibility of building a model to estimate the individual flare probability in RA patients tapering bDMARDs using machine learning methods. Methods: Longitudinal clinical data of RA patients on bDMARDs from a randomized controlled trial of treatment withdrawal (RETRO) were used to build a predictive model to estimate the probability of a flare. Four basic machine learning models were trained, and their predictions were additionally combined to train an ensemble learning method, a stacking meta-classifier model, to predict the individual flare probability within 14 weeks after each visit. Prediction performance was estimated using nested cross-validation as the area under the receiver operating characteristic curve (AUROC). Predictor importance was estimated using the permutation importance approach. Results: Data of 135 visits from 41 patients were included. A model selection approach based on nested cross-validation was implemented to find the most suitable modeling formalism for the flare prediction task as well as the optimal model hyper-parameters. Moreover, an approach based on stacking different classifiers was successfully applied to create a powerful and flexible prediction model with the final measured AUROC of 0.81 (95%CI 0.73–0.89). The percent dose change of bDMARDs, clinical disease activity (DAS-28 ESR), disease duration, and inflammatory markers were the most important predictors of a flare. Conclusion: Machine learning methods were deemed feasible to predict flares after tapering bDMARDs in RA patients in sustained remission.
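The AUROC metric reported in this abstract has a simple pairwise interpretation, sketched below: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The labels and predicted probabilities are made up, and the function name is an assumption; the study itself estimated AUROC within nested cross-validation, which is not shown here.

```python
def auroc(labels, scores):
    """AUROC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Made-up flare labels (1 = flare) and predicted flare probabilities.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8, 0.3, 0.4]
print(auroc(labels, scores))  # 14 of 15 positive-negative pairs are ordered correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the reported 0.81.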

