Using Machine Intelligence Algorithms to Develop a Geno-Clinical Model to Predict Responses to Hypomethylating Agents in Myelodysplastic Syndromes

Blood ◽  
2016 ◽  
Vol 128 (22) ◽  
pp. 3193-3193
Author(s):  
Aziz Nazha ◽  
Mikkael A. Sekeres ◽  
Rafael Bejar ◽  
John Barnard ◽  
Karam Al-Issa ◽  
...  

Abstract Background: While treatment with the hypomethylating agents (HMAs) azacitidine (AZA) and decitabine (DAC) improves cytopenias and prolongs survival in MDS patients (pts), only 30-40% of pts respond. Genomic and/or clinical models that can predict which pts will respond could prevent prolonged exposure to ineffective therapy, avoid toxicities and decrease unnecessary treatment costs. Machine learning (ML), a field of artificial intelligence, is an advanced computational analysis of complex data sets that can overcome some of the limitations of standard statistical methods. ML uses computational algorithms to automatically extract hidden information from a dataset by learning from relationships, patterns, and trends in the data. Thus, ML can produce powerful, reliable and reproducible predictive models based on large and complex datasets. The aim of this project is to build a geno-clinical model that uses ML algorithms to predict responses to HMAs. Methods: We screened a cohort of 433 pts with MDS who received HMAs at multiple academic institutions for the presence of common myeloid somatic mutations in 29 genes. Responses were assessed per International Working Group 2006 criteria. Five popular supervised classification ML algorithms, namely random forest (RF), tree ensemble (TE), naive Bayes (NB), decision tree (DT), and support vector machine (SVM), were used individually and in combination to enhance the accuracy of the proposed model (bag of models approach). For each iteration, the dataset was divided randomly into training and validation cohorts. The partition of the dataset was repeated multiple times randomly to minimize biases in pt selection. A 10-fold cross validation was also used on the entire dataset to assure data reproducibility. Important variables were selected using backward feature elimination and tree depth scores. Performance was evaluated according to the area under curve (AUC) and accuracy matrix. 
All analyses were done using KNIME (an open analytic platform for ML). Results: Among 433 pts, 193 (45%) received AZA, 176 (40%) DAC, and 64 (14%) received HMA +/- combination. The median age was 70 years (range, 31-100) and 28% were female. Responses included: 95 (22%) complete remission (CR), 14 (3%) marrow CR, 16 (4%) partial remission (PR), and 59 (14%) hematologic improvement (HI). For the purpose of this analysis, pts with CR/PR/HI were considered as responders. The most commonly mutated genes were: ASXL1 (31%), TET2 (22%), SRSF2 (17%), RUNX1 (15%), and DNMT3A (14%). In univariate analyses, no single mutation was more prevalent in responders compared to non-responders except NF1 (more common in non-responders, p = .04). A logistic regression multivariate analysis did not produce a reliable and reproducible model. When applying ML algorithms on learner (80% randomly selected pts) and predictor cohorts, the accuracy rate in predicting responses for RF was 64%, for TE 60%, for NB 60%, for DT 66%, and for SVM 51%. When results from each model were combined (a bag of models approach), the accuracy increased to 69%. Backward feature elimination and tree depth scores identified the following factors as predictors of response: hemoglobin <10 g/dl, platelets < 30 k/ml, age > 69 years, TP53 with variant allelic frequency (VAF) >15%, CBL VAF >30%, and RUNX1 VAF > 25%. Only ASXL1 mutations at any VAF were predictive of HMA resistance. Interestingly, none of the mutations were selected for response or resistance when the models did not include VAF. Neither treatment modality with azacitidine vs. decitabine vs. combination nor treatment center impacted response. When the analysis was restricted to pts with higher-risk disease by IPSS, the accuracy rate in predicting responses improved: for RF it became 71%, for TE 65%, for NB 60%, for DT 64%, and for SVM 76%. When the analysis was focused on pts who achieved CR vs. No CR, the models predicted the response differently. 
The RF and TE models were able to predict No CR with an accuracy rate of 75% and 76% respectively. Other models were able to predict CR and No CR with lower accuracy. Conclusion: We propose a novel geno-clinical model that uses machine intelligence to predict HMA response/resistance in pts with MDS. The model has a higher accuracy rate in higher-risk MDS pts. ML can open opportunities in translating genomic data into reliable predictive models that can aid physicians in clinical decision making. Disclosures Bejar: Celgene: Consultancy, Honoraria; Foundation Medicine: Consultancy; Genoptix: Consultancy, Honoraria, Patents & Royalties: No royalties.
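The abstract above does not say how the individual classifiers were combined in the "bag of models" approach. One common way to combine label predictions is a simple majority vote, sketched below in plain Python as an illustration only. The function names and the toy labels ("R" for responder, "N" for non-responder) are hypothetical and are not taken from the study.

```python
from collections import Counter


def majority_vote(per_model_predictions):
    """Combine label predictions from several models, one vote per model per patient."""
    combined = []
    # zip(*...) regroups the predictions so each tuple holds all votes for one patient.
    for votes in zip(*per_model_predictions):
        # most_common(1) returns the label with the highest vote count.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def accuracy(predicted, actual):
    """Fraction of patients whose predicted label matches the observed label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical toy data: observed responses and three imperfect models.
actual = ["R", "N", "N", "R", "N"]
model_a = ["R", "N", "R", "R", "N"]
model_b = ["R", "R", "N", "N", "N"]
model_c = ["N", "N", "N", "R", "R"]

ensemble = majority_vote([model_a, model_b, model_c])
print(accuracy(model_a, actual), accuracy(ensemble, actual))
```

In this toy case each model errs on different patients, so the vote corrects every individual mistake. Real gains depend on how correlated the models' errors are.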

2021 ◽  
Author(s):  
Naimul M. Khan

Exploration and visualization of complex data has become an integral part of life. But there is a semantic gap between the users and the visualization scientists. The priority of the users is usability while that of the scientists is techniques. Information-Assisted Visualization (IAV) can help bridge this gap, where additional information extracted from the raw data is presented to the user in an easily interpretable way. This thesis proposes some novel machine intelligence based systems for intuitive IAV. The majority of the thesis focuses on Direct Volume Rendering, where Transfer Functions (TF) are used to color the volume data to expose structures. Existing TF design methods require manipulating complex widgets, which may be difficult for the user. We propose two novel approaches towards TF design. In the data-centric approach, we generate an organized representation of the data through clustering and provide the user with some intuitive control over the output in the cluster domain. We use Spherical Self-Organizing Maps (SSOM) as the core of this approach. Instead of manipulating complex widgets, the user interacts with the simple SSOM color-coded lattice to design the TF. In the image-centric approach, the user interaction with the data is direct and minimal. The user interactions create the training data, and supervised classification is used to generate the TF. First, we propose novel supervised classifiers that combine the local information available through Support Vector Machine-based classifiers and the global information available through Nonparametric Discriminant Analysis-based classifiers. Using these classifiers, we propose a TF design method where the user interacts with the volume slices directly to generate the output. Finally, we explore the use of IAV for home-based physical rehabilitation. 
We propose an information-assisted visual valuation framework which can compare a user’s performance of a physical exercise with that of an expert using our novel Incremental Dynamic Time Warping method and communicate the results visually through our color-mapped skeleton silhouette. All the proposed techniques are accompanied by detailed experimental results comparing them against the state-of-the-art. The results show the potential of using machine learning techniques to achieve visualization tasks in a simpler yet more effective way.
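The Incremental Dynamic Time Warping method itself is not described in this abstract. For orientation, the sketch below shows the classic (non-incremental) dynamic time warping distance between two one-dimensional sequences, which is the general technique such methods build on. The function name and the sample sequences are illustrative assumptions, not the thesis's implementation.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences (absolute-difference cost)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = step + min(cost[i - 1][j],      # a advances, b repeats
                                    cost[i][j - 1],      # b advances, a repeats
                                    cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# Hypothetical joint-angle traces: the second repeats one sample (slower movement).
expert = [1, 2, 3]
user = [1, 2, 2, 3]
print(dtw_distance(expert, user))
```

Because warping lets one sequence dwell on a sample, the slower trace aligns with the expert trace at zero cost, which is why DTW suits comparing exercises performed at different speeds.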


2021 ◽  
Vol 11 ◽  
Author(s):  
Stefania Montemezzi ◽  
Giulio Benetti ◽  
Maria Vittoria Bisighin ◽  
Lucia Camera ◽  
Chiara Zerbato ◽  
...  

Objectives: To test whether 3T MRI radiomics of breast malignant lesions improves the performance of predictive models of complete response to neoadjuvant chemotherapy when added to other clinical, histological and radiological information. Methods: Women who consecutively had pre-neoadjuvant chemotherapy (NAC) 3T DCE-MRI between January 2016 and October 2019 were retrospectively included in the study. 18F-FDG PET-CT and histological information obtained through lesion biopsy were also available. All patients underwent surgery and specimens were analyzed. Subjects were divided between complete responders (Pinder class 1i or 1ii) and non-complete responders to NAC. Geometric, first order or textural (higher order) radiomic features were extracted from pre-NAC MRI and feature reduction was performed. Five radiomic features were added to other available information to build predictive models of complete response to NAC using three different classifiers (logistic regression, support vector machines regression and random forest) and exploring the whole set of possible feature selections. Results: The study population consisted of 20 complete responders and 40 non-complete responders. Models including MRI radiomic features consistently showed better performance compared to combinations of other clinical, histological and radiological information. The AUC (ROC analysis) of predictors that did not include radiomic features reached up to 0.89, while all three classifiers gave AUC higher than 0.90 with the inclusion of radiomic information (range: 0.91-0.98). Conclusions: Radiomic features extracted from 3T DCE-MRI consistently improved predictive models of complete response to neo-adjuvant chemotherapy. However, further investigation is necessary before this information can be used for clinical decision making.
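The AUC values quoted in this abstract summarize how well a model's scores rank responders above non-responders. As a minimal sketch, the AUC can be computed as the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as half. The function name and the scores below are made up for illustration and are unrelated to the study data.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive case outscores a negative case."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0   # correctly ranked pair
            elif p == q:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities of complete response (1 = complete responder).
scores = [0.9, 0.4, 0.6, 0.1]
labels = [1, 1, 0, 0]
print(roc_auc(scores, labels))
```

The pairwise form is quadratic in the number of cases, which is fine for a cohort of this size. Larger datasets usually use a rank-based formula instead.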


2012 ◽  
Vol 594-597 ◽  
pp. 3011-3014
Author(s):  
Jian Guang Niu ◽  
Chun Yan Gao ◽  
Xiu Qing Xing

This paper establishes an index system for projecting quality cost. The quality cost of a construction enterprise is predicted with a mathematical model new to this setting, the Support Vector Regression (SVR) model. SVR is one of the best methods for dealing with small samples, and it avoids the weaknesses of neural networks, which easily fall into local minima and give a lower accuracy rate. A worked example verifies that the Unascertained-SVR model is feasible and accurate.


2021 ◽  
Vol 11 ◽  
Author(s):  
Wei Du ◽  
Yu Wang ◽  
Dongdong Li ◽  
Xueming Xia ◽  
Qiaoyue Tan ◽  
...  

Purpose: To build and evaluate a radiomics-based nomogram that improves the non-invasive, preoperative prediction of lymphovascular space invasion (LVSI) in cervical cancer. Method: This study involved 149 patients with cervical cancer who underwent surgery from February 2017 to October 2019. Radiomics features were extracted from T2 weighted imaging (T2WI). The radiomic features were selected by logistic regression with the least absolute shrinkage and selection operator (LASSO) penalty in the training cohort. Based on the selected features, a support vector machine (SVM) algorithm was used to build the radiomics signature on the training cohort. Incorporating the radiomics signature and clinical risk factors, the radiomics-based nomogram was developed. The sensitivity, specificity, accuracy, and area under the curve (AUC) of the receiver operating characteristic (ROC) curve were calculated to assess these models. Result: The radiomics model performed much better than the clinical model in both training (AUCs 0.925 vs. 0.786, accuracies 87.5% vs. 70.5%, sensitivities 83.6% vs. 41.7% and specificities 90.9% vs. 94.7%) and testing (AUCs 0.911 vs. 0.706, accuracies 84.0% vs. 71.3%, sensitivities 81.1% vs. 43.4% and specificities 86.4% vs. 95.0%). The combined model based on the radiomics signature and tumor stage, tumor infiltration depth and tumor pathology yielded the best performance (training cohort, AUC = 0.943, accuracies 89.5%, sensitivities 85.4% and specificities 92.9%; testing cohort, AUC = 0.923, accuracies 84.6%, sensitivities 84.0% and specificities 85.1%). Conclusion: The radiomics-based nomogram was a useful tool for predicting LVSI of cervical cancer. This would aid the selection of the optimal therapeutic strategy and clinical decision-making for individuals.
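Sensitivity, specificity and accuracy, as reported in this abstract, all derive from the four cells of a binary confusion matrix. The sketch below shows that arithmetic on made-up labels (1 = LVSI present, 0 = absent). The function name and the data are illustrative assumptions and do not reproduce the study's figures.

```python
def binary_metrics(predicted, actual):
    """Return (sensitivity, specificity, accuracy) for 0/1 labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)  # true positives
    tn = sum(1 for p, a in pairs if p == 0 and a == 0)  # true negatives
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)  # false positives
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)  # false negatives
    sensitivity = tp / (tp + fn)   # share of true cases that were detected
    specificity = tn / (tn + fp)   # share of negatives correctly ruled out
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy


# Hypothetical predictions for eight patients.
actual =    [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 1]
print(binary_metrics(predicted, actual))
```

A model can pair high specificity with low sensitivity, as the clinical model in the abstract does (94.7% vs. 41.7% in training), so the three numbers are best read together.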


2020 ◽  
Vol 21 ◽  
Author(s):  
Sukanya Panja ◽  
Sarra Rahem ◽  
Cassandra J. Chu ◽  
Antonina Mitrofanova

Background: In recent years, the availability of high throughput technologies, establishment of large molecular patient data repositories, and advancement in computing power and storage have allowed elucidation of complex mechanisms implicated in therapeutic response in cancer patients. The breadth and depth of such data, alongside experimental noise and missing values, require a sophisticated human-machine interaction that would allow effective learning from complex data and accurate forecasting of future outcomes, ideally embedded in the core of machine learning design. Objective: In this review, we will discuss machine learning techniques utilized for modeling of treatment response in cancer, including Random Forests, support vector machines, neural networks, and linear and logistic regression. We will overview their mathematical foundations and discuss their limitations and alternative approaches, all in light of their application to therapeutic response modeling in cancer. Conclusion: We hypothesize that the increase in the number of patient profiles and potential temporal monitoring of patient data will define even more complex techniques, such as deep learning and causal analysis, as central players in therapeutic response modeling.


2021 ◽  
Vol 5 (2) ◽  
Author(s):  
Alexander Knyshov ◽  
Samantha Hoang ◽  
Christiane Weirauch

Abstract Automated insect identification systems have been explored for more than two decades but have only recently started to take advantage of powerful and versatile convolutional neural networks (CNNs). While typical CNN applications still require large training image datasets with hundreds of images per taxon, pretrained CNNs recently have been shown to be highly accurate, while being trained on much smaller datasets. We here evaluate the performance of CNN-based machine learning approaches in identifying three curated species-level dorsal habitus datasets for Miridae, the plant bugs. Miridae are of economic importance, but species-level identifications are challenging and typically rely on information other than dorsal habitus (e.g., host plants, locality, genitalic structures). Each dataset contained 2–6 species and 126–246 images in total, with a mean of only 32 images per species for the most difficult dataset. We find that closely related species of plant bugs can be identified with 80–90% accuracy based on their dorsal habitus alone. The pretrained CNN performed 10–20% better than a taxon expert who had access to the same dorsal habitus images. We find that feature extraction protocols (selection and combination of blocks of CNN layers) impact identification accuracy much more than the classifying mechanism (support vector machine and deep neural network classifiers). While our network has much lower accuracy on photographs of live insects (62%), overall results confirm that a pretrained CNN can be straightforwardly adapted to collection-based images for a new taxonomic group and successfully extract relevant features to classify insect species.


Sensors ◽  
2019 ◽  
Vol 19 (2) ◽  
pp. 419 ◽  
Author(s):  
Dongdong Du ◽  
Jun Wang ◽  
Bo Wang ◽  
Luyi Zhu ◽  
Xuezhen Hong

Postharvest kiwifruit continues to ripen for a period until it reaches the optimal “eating ripe” stage. Without damaging the fruit, it is very difficult to identify the ripeness of postharvest kiwifruit by conventional means. In this study, an electronic nose (E-nose) with 10 metal oxide semiconductor (MOS) gas sensors was used to predict the ripeness of postharvest kiwifruit. Three different feature extraction methods (the max/min values, the difference values and the 70th s values) were employed to discriminate kiwifruit at different ripening times by linear discriminant analysis (LDA), and results showed that the 70th s values method had the best performance in discriminating kiwifruit at different ripening stages, obtaining a 100% original accuracy rate and a 99.4% cross-validation accuracy rate. Partial least squares regression (PLSR), support vector machine (SVM) and random forest (RF) were employed to build prediction models for overall ripeness, soluble solids content (SSC) and firmness. The regression results showed that the RF algorithm had the best performance in predicting the ripeness indexes of postharvest kiwifruit compared with PLSR and SVM, which illustrated that the E-nose data had high correlations with overall ripeness (training: R2 = 0.9928; testing: R2 = 0.9928), SSC (training: R2 = 0.9749; testing: R2 = 0.9143) and firmness (training: R2 = 0.9814; testing: R2 = 0.9290). This study demonstrated that E-nose could be a comprehensive approach to predict the ripeness of postharvest kiwifruit through aroma volatiles.
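The R2 values reported above are coefficients of determination: one minus the ratio of the residual sum of squares to the total sum of squares. The following is a minimal sketch of that formula. The function name and the firmness readings are hypothetical and are not data from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical firmness measurements and model predictions.
observed = [1.0, 2.0, 3.0]
print(r_squared(observed, [1.0, 2.0, 3.0]))  # perfect predictions
print(r_squared(observed, [2.0, 2.0, 2.0]))  # always predicting the mean
```

A gap between training and testing R2, such as 0.9749 versus 0.9143 for SSC in the abstract, is the usual sign to check for overfitting.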


Author(s):  
M. C. Maya-Piedrahita ◽  
P. M. Herrera-Gomez ◽  
L. Berrío-Mesa ◽  
D. A. Cárdenas-Peña ◽  
A. A. Orozco-Gutierrez

As a neurodevelopmental pathology, Attention Deficit Hyperactivity Disorder (ADHD) mainly arises during childhood. Persistent patterns of generalized inattention, impulsivity, or hyperactivity characterize ADHD that may persist into adulthood. The conventional diagnosis relies on clinical observational processes yielding high rates of overdiagnosis due to varying interpretations among specialists or missing information. Although several studies have designed objective behavioral features to overcome such an issue, they lack significance. Despite electroencephalography (EEG) analyses extracting alternative biomarkers using signal processing techniques, the nonlinearity and nonstationarity of EEG signals restrain performance and generalization of hand-crafted features. This work proposes a methodology to support ADHD diagnosis by characterizing EEG signals from hidden Markov models (HMM), classifying subjects based on similarity measures for probability functions, and spatially interpreting the results using graphic embeddings of stochastic dynamic models. The methodology learns a single HMM for EEG signal from each patient, so favoring the inter-subject variability. Then, the Probability Product Kernel, specifically developed for assessing the similarity between HMMs, fed a support vector machine that classifies subjects according to their stochastic dynamics. Lastly, the kernel variant of Principal Component Analysis provided a means to visualize the EEG transitions in a two-dimensional space, evidencing dynamic differences between ADHD and Healthy Control children. From the electrophysiological perspective, we recorded EEG under the Stop Signal Task modified with reward levels, which considers cognitive features of interest as insufficient motivational circuits recruitment. 
The methodology compares the supported diagnosis in two EEG channel setups (whole channel set and channels of interest in frontocentral area) and four frequency bands (Theta, Alpha, Beta rhythms, and a wideband). Results evidence an accuracy rate of 97.0% in the Beta band and in the channels where previous works found error-related negativity events. Such accuracy rate strongly supports the dual pathway hypothesis and motivational deficit concerning the pathophysiology of ADHD. It also demonstrates the utility of joining inhibitory and motivational paradigms with dynamic EEG analysis into a noninvasive and affordable diagnostic tool for ADHD patients.


2018 ◽  
Vol 7 (2.8) ◽  
pp. 684 ◽  
Author(s):  
V V. Ramalingam ◽  
Ayantan Dandapath ◽  
M Karthik Raja

Heart-related diseases, or Cardiovascular Diseases (CVDs), have been the main cause of a huge number of deaths over the last few decades and have emerged as the most life-threatening diseases, not only in India but in the whole world. So, there is a need for a reliable, accurate and feasible system to diagnose such diseases in time for proper treatment. Machine Learning algorithms and techniques have been applied to various medical datasets to automate the analysis of large and complex data. Many researchers, in recent times, have been using several machine learning techniques to help the health care industry and its professionals in the diagnosis of heart-related diseases. This paper presents a survey of various models based on such algorithms and techniques and analyzes their performance. Models based on supervised learning algorithms such as Support Vector Machines (SVM), K-Nearest Neighbour (KNN), Naïve Bayes, Decision Trees (DT), Random Forest (RF) and ensemble models are found to be very popular among researchers.

