Identifying Individualized Risk Profiles for Radiotherapy-Induced Lymphopenia Among Patients With Esophageal Cancer Using Machine Learning

2021 ◽  
pp. 1044-1053
Author(s):  
Cong Zhu ◽  
Radhe Mohan ◽  
Steven H. Lin ◽  
Goo Jun ◽  
Ashraf Yaseen ◽  
...  

PURPOSE Radiotherapy (RT)-induced lymphopenia (RIL) is commonly associated with adverse clinical outcomes in patients with cancer. Using machine learning techniques, a retrospective study was conducted for patients with esophageal cancer treated with proton and photon therapies to characterize the principal pretreatment clinical and radiation dosimetric risk factors of grade 4 RIL (G4RIL) and to establish G4RIL risk profiles. METHODS Single-institution retrospective data of 746 patients with esophageal cancer treated with photons (n = 500) and protons (n = 246) were reviewed. The primary end point of our study was G4RIL. Clustering techniques were applied to identify patient subpopulations with similar pretreatment clinical and radiation dosimetric characteristics. An XGBoost model was built on a training set (n = 499) to predict G4RIL risk; predictive performance was assessed on the remaining n = 247 patients. SHapley Additive exPlanations (SHAP) were used to rank the importance of individual predictors. Counterfactual analyses compared patients' risk profiles under the assumption that they had switched modalities. RESULTS Baseline absolute lymphocyte count and the volumes of lung and spleen receiving ≥ 15 and ≥ 5 Gy, respectively, were the most important G4RIL risk determinants. The model achieved a sensitivity of 0.798 and a specificity of 0.667 on the testing set, with an area under the receiver operating characteristic curve (AUC) of 0.783. The G4RIL risk for an average patient receiving protons would have increased by 19% had the patient switched to photons. Reductions in G4RIL risk with proton therapy were greatest for patients of older age, with lower baseline absolute lymphocyte count, and with higher lung and heart doses. CONCLUSION G4RIL risk varies across individual patients with esophageal cancer and is modulated by radiotherapy dosimetric parameters. The machine learning framework presented here can be applied broadly to study risk determinants of other adverse events, providing a basis for adapting treatment strategies for mitigation.
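
A minimal sketch of the modeling pipeline this abstract describes, gradient-boosted trees followed by SHAP-based predictor ranking. The file path, column names, and hyperparameters are hypothetical stand-ins; the study's actual features and preprocessing are not given in the abstract:

```python
# Sketch: XGBoost G4RIL classifier with SHAP importance ranking.
import pandas as pd
import xgboost as xgb
import shap
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

df = pd.read_csv("esophageal_rt_cohort.csv")  # placeholder path
features = ["baseline_alc", "lung_v15", "spleen_v5", "age", "heart_mean_dose"]
X, y = df[features], df["g4ril"]  # hypothetical column names

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, stratify=y, random_state=42)

model = xgb.XGBClassifier(n_estimators=300, max_depth=4,
                          learning_rate=0.05, eval_metric="logloss")
model.fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

# SHAP values attribute each prediction to individual predictors,
# giving the per-feature importance ranking described in the abstract.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
shap.summary_plot(shap_values, X_test)
```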

Online discussion forums and blogs are vibrant platforms on which cancer patients express their views in the form of stories. These stories sometimes become a source of inspiration for patients anxiously searching for similar cases. This paper proposes a method using natural language processing and machine learning to analyze unstructured texts accumulated from patients' reviews and stories. The proposed methodology aims to identify the behavior, emotions, side effects, decisions, and demographics associated with cancer victims. The pre-processing phase of our work involves extraction of web text, followed by text cleaning in which special characters and symbols are omitted, and finally tagging the texts using NLTK's (Natural Language Toolkit) POS (parts-of-speech) tagger. The post-processing phase trains seven machine learning classifiers (see Table 6). The decision tree classifier shows the highest precision (0.83) among the classifiers, while the area under the receiver operating characteristic curve (AUC) for the support vector machine (SVM) classifier is the highest (0.98).
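
A minimal sketch of the pre-processing phase described above: cleaning special characters from raw web text and then POS-tagging with NLTK. The sample sentence is illustrative, not from the study:

```python
# Sketch: text cleaning followed by NLTK POS tagging.
import re
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

raw = "I was sooo anxious before chemo... but the forum stories helped!!"

# Omit special characters and symbols, keeping alphanumerics and spaces.
cleaned = re.sub(r"[^A-Za-z0-9\s]", " ", raw)
tokens = nltk.word_tokenize(cleaned)
tagged = nltk.pos_tag(tokens)
print(tagged)  # e.g. [('I', 'PRP'), ('was', 'VBD'), ...]
```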


2020 ◽  
Author(s):  
Serge Assaad ◽  
Lawrence Carin ◽  
Anthony Joseph Viera

Abstract Background: Researchers and consumers have limited options for objectively collecting or tracking data related to food choices. Objective: To develop and pilot test an algorithm that can accurately categorize food items from a meal photograph. Methods: We used a dataset of 7721 meal photographs taken by patrons in a cafeteria setting. We designed 22 broad categories recognizable by image that are parents of the original 1239 types of items in the photographs. We split the dataset into 3 mutually exclusive subsets: a training set (5250 images), a validation set (1312 images), and a test set (1159 images). Using a convolutional neural network and standard machine learning techniques, we tested the operating characteristics of the algorithm. Results: Salad recognition had the lowest specificity (0.74), while multiple categories had specificities close to 1.0 (e.g., cereals, pastries, sushi, yogurt). Areas under the receiver operating characteristic (ROC) curve (AUCs), reflecting the trade-off between sensitivity and specificity, ranged from 0.73 (for yogurt) to 0.97 (for sushi). Conclusions: This work provides proof of concept for an algorithm that can categorize food items from a meal photograph.
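
A minimal sketch of a 22-category food-image classifier, assuming a transfer-learning setup with a pretrained ResNet backbone; the paper's actual architecture, directory layout, and hyperparameters are not specified in the abstract:

```python
# Sketch: fine-tuning a pretrained CNN on 22 broad meal-photo categories.
import torch
import torch.nn as nn
from torchvision import models, transforms, datasets
from torch.utils.data import DataLoader

NUM_CLASSES = 22  # broad parent categories of the 1239 item types

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)  # new head

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# Hypothetical layout: meal_photos/train/<category>/<image>.jpg
train_ds = datasets.ImageFolder("meal_photos/train", transform=preprocess)
train_dl = DataLoader(train_ds, batch_size=32, shuffle=True)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
model.train()
for images, labels in train_dl:  # one epoch shown
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
```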


2019 ◽  
Vol 37 (4_suppl) ◽  
pp. 147-147 ◽  
Author(s):  
David M. Routman ◽  
Thomas J Whitaker ◽  
Courtney N. Day ◽  
William S. Harmsen ◽  
Michelle A. Neben-Wittich ◽  
...  

147 Background: Lymphopenia during radiation therapy (RT) has been associated with worse oncologic outcomes in a number of malignancies, including esophageal cancer (EC). No studies to date have investigated the specific dosimetric parameters associated with this lymphopenia in EC. We performed an analysis of RT dose to multiple organs at risk (OARs) to investigate associations with grade 4 lymphopenia (G4L). Methods: Consecutive EC patients receiving curative-intent chemoradiotherapy +/- surgery between July 2015 and December 2017 were included. Lymphocyte nadir was defined as the lowest lymphocyte count during RT. G4L was defined as an absolute lymphocyte count <200/mm3. Dose to OARs including the aorta, body, bone marrow, heart, liver, lung, and spleen was calculated. Univariate logistic regression analyses were performed for each OAR at the 1, 5, 10, 15, 20, 30, 35, 40, and 50 Gy levels, with the volume receiving dose x (VxGy) analyzed as a continuous variable per 10% increase. Clinical tumor volume (CTV) and RT modality (photon vs. proton), as well as clinical factors including sex, stage (I/II vs. III/IV), age (per 10-year increase), and BMI (per 5-unit increase), were also analyzed. Results: One hundred forty-four pts were identified for inclusion. Seventy-nine pts received photon RT and 65 proton RT. Chemotherapy was weekly carbotaxol (99%). G4L at nadir was 40% overall (56% photon, 22% proton). By organ, body V1-V30Gy (OR 1.45-8.18, p<0.01), heart V1-V30Gy (OR 1.24-1.49, p<0.01), liver V1-V35Gy (OR 1.23-2.75, p<0.01), lung V1-V30Gy (OR 1.26-5.73, p<0.01), and spleen V1-V40Gy (OR 1.26-1.49, p<0.01) were highly associated with G4L, whereas dose to the aorta and bone marrow was not. Advanced stage (OR 3.92, p<0.01), photon vs. proton RT (OR 4.58, p<0.01), and CTV (per 100 cc, OR 1.21, p<0.01) were also associated with G4L. Sex, age, and BMI were not associated with G4L. Conclusions: Low to intermediate dose volumes to OARs including the body, spleen, liver, lungs, and heart were associated with G4L. These findings provide a rationale for the differences seen in rates of G4L for photon versus proton RT.
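
A minimal sketch of the univariate screen described above: a separate logistic regression of G4L on each organ/dose-level VxGy, scaled per 10% increase. The file path and column names are hypothetical:

```python
# Sketch: per-OAR univariate logistic regressions for G4L.
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("ec_dosimetry.csv")  # placeholder path
for col in ["spleen_v5", "spleen_v10", "heart_v5", "lung_v15"]:
    X = sm.add_constant(df[col] / 10.0)  # per 10% increase in VxGy
    fit = sm.Logit(df["g4l"], X).fit(disp=0)
    odds_ratio = np.exp(fit.params[col])  # OR per 10% volume increase
    print(f"{col}: OR per 10% = {odds_ratio:.2f}, p = {fit.pvalues[col]:.3g}")
```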


2021 ◽  
pp. 1-29
Author(s):  
Ahmed Alsaihati ◽  
Mahmoud Abughaban ◽  
Salaheldin Elkatatny ◽  
Abdulazeez Abdulraheem

Abstract Fluid loss into formations is a common operational issue frequently encountered when drilling across naturally or induced fractured formations. It can pose significant operational risks, such as well control, stuck pipe, and wellbore instability, which in turn increase well time and cost. This research aims to use and evaluate different machine learning techniques, namely support vector machines, random forests, and K-nearest neighbors, in detecting loss circulation occurrences while drilling using solely surface drilling parameters. Actual field data from seven wells that had suffered partial or severe loss circulation were used to build predictive models, while Well-8 was used to compare the performance of the developed models. Different performance metrics were used to evaluate the developed models: recall, precision, and F1-score measures assessed their ability to detect loss circulation occurrences. The results showed that the K-nearest neighbors classifier achieved a high F1-score of 0.912 in detecting loss circulation occurrences in the testing set, while the random forest was the second-best classifier with an almost identical F1-score of 0.910. The support vector machine achieved an F1-score of 0.83 in predicting loss circulation occurrences in the testing set. The K-nearest neighbors model also outperformed the other models in detecting loss circulation occurrences in Well-8, with an F1-score of 0.80. The main contribution of this research, compared with previous studies, is that it identifies loss events based on real-time measurements of the active pit volume.
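
A minimal sketch of the best-performing approach described above: a K-nearest neighbors classifier on surface drilling parameters, trained on Wells 1-7 and evaluated on Well-8. The file paths and feature names are hypothetical stand-ins for the study's actual surface measurements:

```python
# Sketch: KNN loss-circulation detector evaluated on a held-out well.
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import f1_score

train = pd.read_csv("wells_1_to_7.csv")  # placeholder paths
test = pd.read_csv("well_8.csv")

features = ["rop", "spp", "flow_in", "hook_load", "active_pit_volume"]
clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
clf.fit(train[features], train["loss_event"])

pred = clf.predict(test[features])
print("F1 on Well-8:", f1_score(test["loss_event"], pred))
```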


2019 ◽  
Vol 11 (23) ◽  
pp. 6669 ◽  
Author(s):  
Raghu Garg ◽  
Himanshu Aggarwal ◽  
Piera Centobelli ◽  
Roberto Cerchione

At present, given the scarcity of natural resources, society should take maximum advantage of data, information, and knowledge to achieve sustainability goals. In today's world, human existence is not possible without the essential proliferation of plants. In photosynthesis, plants use solar energy and convert it into chemical energy. This process is responsible for all life on earth, and the main controlling factor for proper plant growth is soil, since it holds water, air, and all essential nutrients for plant nourishment. However, due to overexposure, soil becomes degraded, so fertilizer is an essential component for maintaining soil quality. In that regard, soil analysis is a suitable method to determine soil quality. Soil analysis examines the soil in laboratories and generates reports of unorganized, unstructured data. In this study, different big-data machine learning methods are used to extract knowledge from the data and determine fertilizer recommendation classes based on the present soil nutrient composition. For this experiment, soil analysis reports were collected from the Tata soil and water testing center. In this paper, the Mahout library is used to analyze the performance of stochastic gradient descent (SGD) and an artificial neural network (ANN) in a Hadoop environment. For better performance evaluation, we also ran single-machine experiments for random forest (RF), K-nearest neighbors (K-NN), regression tree (RT), SVM with a polynomial kernel, and support vector machine (SVM) with a radial basis function (RBF) kernel. A detailed experimental analysis was carried out on the soil-reports dataset using overall accuracy, the receiver operating characteristic (ROC) curve and area under the ROC curve (AUC), mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R2) validation measurements. The results compare the solution classes and conclude that SGD outperforms the other approaches. Finally, the proposed results support selecting a solution, i.e., recommending a class that suggests suitable fertilizer to crops for maximum production.
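
A minimal single-machine analogue of the SGD classifier that the study runs via Apache Mahout on Hadoop, shown here with scikit-learn; the file path and soil-report feature names are hypothetical:

```python
# Sketch: SGD classifier for fertilizer recommendation classes.
import pandas as pd
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

df = pd.read_csv("soil_reports.csv")  # placeholder path
X = df[["ph", "nitrogen", "phosphorus", "potassium", "organic_carbon"]]
y = df["fertilizer_class"]

clf = make_pipeline(StandardScaler(),
                    SGDClassifier(loss="log_loss", max_iter=1000))
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```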


Blood ◽  
2021 ◽  
Vol 138 (Supplement 1) ◽  
pp. 1023-1023
Author(s):  
Taylor Olmsted Kim ◽  
Derek MacMath ◽  
Rowland W Pettit ◽  
Susan E Kirk ◽  
Amanda B Grimes ◽  
...  

Abstract Introduction: Pediatric immune thrombocytopenia (ITP) is the most common acquired bleeding disorder of childhood, with 3,000-4,000 new cases annually. While most children experience self-resolving disease, 25% go on to have chronic ITP (cITP), with thrombocytopenia persisting beyond one year. Given the likelihood of spontaneous resolution and the potential side effects of initial treatments, standard management for patients without severe or life-threatening bleeding is often observation. However, if it were possible to predict development of chronic disease, earlier initiation of long-term therapies could minimize bleeding risk, reduce fatigue, ease activity restrictions, and mitigate the poorer health-related quality of life that characterizes cITP. Though associations between variables and cITP have been reported, none are strong enough alone to dictate clinical decisions regarding ITP management. Machine learning (ML) is a set of statistical tools that can be utilized to make predictions or cluster data from large datasets. ML models are well suited to complex patterns within clinical datasets: ML can incorporate greater numbers of variables than traditional statistical models and can assess the impact of variables contingent upon the state of multiple other variables. In this study, we used a large clinical dataset to test a series of ML models on their ability to predict cITP development. Methods: Our group identified 696 pediatric ITP patients cared for at Texas Children's Hematology Center (TXCH) from 2012 to 2020. Of these, 332 had confirmed acute ITP (self-resolved disease in <1 year), and 253 were diagnosed with cITP. Demographic information, presenting clinical features, and laboratory data drawn within 1 month of diagnosis were tabulated for this cohort. Variables included age, gender, race, ethnicity, presence of primary ITP (defined as ITP that is not caused by another underlying disorder), presenting platelet count, absolute leukocyte count, absolute lymphocyte count, absolute eosinophil count, immature platelet fraction (IPF), mean platelet volume (MPV), direct antiglobulin test (DAT), anti-nuclear antibody (ANA) titer, and immunoglobulin levels. We tested the capabilities of several ML methods in predicting cITP using these presenting clinical and laboratory parameters. We performed a 10-fold cross validation to compare average performance metrics of a 100-tree random forest method against logistic ridge regression, support vector machine (SVM), naïve Bayes, and AdaBoost methods. We tested feature importance of clinical variables with relation to cITP using the Gini index. Cross-validated ML method performance was compared using the area under the receiver operating characteristic (ROC) curve (AUC), as well as the F1 statistic, classification accuracy (CA), precision (positive predictive value), and recall (sensitivity). Analyses were performed using Orange v2.7 (https://orangedatamining.com). Results: The top five most informative clinical features by Gini index were primary ITP, MPV, IPF, absolute lymphocyte count, and ANA titer. Comparing our five ML methods after 10-fold cross validation, the 100-tree random forest model was the top performing method on average (AUC = 0.795, CA = 0.737, F1 = 0.734, Precision = 0.738, Recall = 0.737). An AUC of approximately 0.8 means there is an 80% chance the model will correctly distinguish a randomly selected cITP patient from a randomly selected aITP patient.
A close second performing method was naïve Bayes (AUC = 0.792, CA = 0.698, F1 = 0.671, Precision = 0.737, Recall = 0.698). We present the average cross-validated AUC ROC curves and the full ML method test statistics in Figure 1. Conclusions: Clinical and laboratory features present at the time of initial ITP diagnosis can be utilized to predict the development of cITP in pediatric patients using ML models. Ensemble decision tree methods are promising candidates for further ML method refinement, as the AUC ROC of predicting cITP with a 100-tree RF model is >0.7. Our group is expanding this model through incorporation of genotyping data from both acute and cITP patients. Ultimately, these ML models, in the form of an online tool, could be applied to predict cITP, allowing providers to initiate upfront interventions for those ITP patients who are unlikely to experience spontaneous disease resolution. Figure 1. Disclosures Kirk: Biomarin: Honoraria. Powers: American Regent: Research Funding. Despotovic: Agios: Consultancy; Apellis: Consultancy; UpToDate: Patents & Royalties: Royalties; Novartis: Consultancy, Research Funding.
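
A minimal sketch of the model comparison described in this abstract, using scikit-learn in place of the Orange workflow the authors used; the file path and feature column names are hypothetical stand-ins for the tabulated presenting variables:

```python
# Sketch: 10-fold cross-validated comparison of a 100-tree random forest
# and naive Bayes for predicting chronic ITP.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

df = pd.read_csv("itp_cohort.csv")  # placeholder path
features = ["primary_itp", "mpv", "ipf", "abs_lymphocyte_count", "ana_titer"]
X, y = df[features], df["chronic_itp"]

for name, clf in [("100-tree RF", RandomForestClassifier(n_estimators=100)),
                  ("naive Bayes", GaussianNB())]:
    auc = cross_val_score(clf, X, y, cv=10, scoring="roc_auc").mean()
    print(f"{name}: mean 10-fold AUC = {auc:.3f}")
```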


Circulation ◽  
2018 ◽  
Vol 138 (Suppl_2) ◽  
Author(s):  
Tomohisa Seki ◽  
Tomoyoshi Tamura ◽  
Masaru Suzuki

Introduction and Objective: Early prognostication for patients with cardiogenic out-of-hospital cardiac arrest (OHCA) remains challenging. Recently, advanced machine learning techniques have been employed for clinical diagnosis and prognostication in various conditions. In this study, we therefore attempted to establish a prognostication model for cardiogenic OHCA using an advanced machine learning technique. Methods and Results: Data from a prospective multi-center cohort study of OHCA patients transported by ambulance to 67 medical institutions in the Kanto area of Japan between January 2012 and March 2013 were used in this study. Data for cardiogenic OHCA patients aged ≥18 years were retrieved, and patients were grouped according to the time of the call for an ambulance (training set: between January 1, 2012 and December 12, 2012; test set: between January 1, 2013 and March 31, 2013). From among 421 variables observed during the period between the call for an ambulance and initial in-hospital treatment, either 38 prehospital factors or 56 prehospital and initial in-hospital factors were used for prognostication. Prognostication models for 1-year survival were established with the random forest method, an advanced machine learning method that aggregates a series of decision trees for classification and regression. After 10-fold internal cross-validation in the training set, the prognostication models were validated using the test set. The area under the receiver operating characteristic curve (AUC) was used to evaluate the prediction performance of the models. The prognostication models trained with 38 or 56 variables for 1-year survival showed AUC values of 0.93±0.01 and 0.95±0.01, respectively. Conclusions: Prognostication models trained with an advanced machine learning technique showed favorable prediction capability for 1-year survival of cardiogenic OHCA. These results indicate that advanced machine learning techniques can be applied to establish early prognostication models for cardiogenic OHCA.
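A minimal sketch of the prognostication setup described above: a random forest trained on 2012 calls and validated on early-2013 calls. The file path and variable names are hypothetical stand-ins for the study's 38 prehospital factors:

```python
# Sketch: random forest for 1-year survival with a temporal train/test split.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

df = pd.read_csv("ohca_cohort.csv", parse_dates=["call_time"])  # placeholder
train = df[df["call_time"] < "2013-01-01"]
test = df[df["call_time"] >= "2013-01-01"]

prehospital = ["age", "witnessed", "bystander_cpr", "initial_rhythm_vf",
               "time_to_ems_min"]  # stand-ins for the 38 prehospital factors
rf = RandomForestClassifier(n_estimators=500, random_state=0)
rf.fit(train[prehospital], train["survived_1y"])

probs = rf.predict_proba(test[prehospital])[:, 1]
print("Test AUC:", roc_auc_score(test["survived_1y"], probs))
```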


2019 ◽  
Vol 9 (9) ◽  
pp. 231 ◽  
Author(s):  
Attallah ◽  
Sharkas ◽  
Gadelkarim

Magnetic resonance imaging (MRI) is a common imaging technique used extensively to study human brain activity. Recently, it has been used for scanning the fetal brain. Among every 1,000 pregnant women, 3 have fetuses with a brain abnormality; hence, early detection and classification are important. Machine learning techniques have large potential to aid the early detection of these abnormalities, which could correspondingly enhance the diagnosis process and follow-up plans. Most research on classifying abnormal brains at an early age has focused on newborns and premature infants, with fewer studies focusing on fetal images. Those studies associated fetal scans with scans after birth for the detection and classification of brain defects early in the neonatal age; this type of brain abnormality is named small for gestational age (SGA). This article proposes a novel framework for the classification of fetal brains at an early age (before the fetus is born). To the best of our knowledge, this is the first study to classify brain abnormalities of fetuses across widespread gestational ages (GAs). The study incorporates several machine learning classifiers, such as diagonal quadratic discriminant analysis (DQDA), K-nearest neighbour (K-NN), random forest, naïve Bayes, and radial basis function (RBF) neural network classifiers. Moreover, several bagging and AdaBoost ensemble models have been constructed using random forest, naïve Bayes, and RBF network classifiers, and their performance has been compared with that of the individual models. Our results show that the novel approach can successfully identify and classify numerous types of defects within MRI images of fetal brains of various GAs. Using the K-NN classifier, we achieved the highest classification accuracy and area under the receiver operating characteristic curve of 95.6% and 99%, respectively. In addition, the ensemble classifiers improved on the results of their respective individual models.
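
A minimal sketch of the ensemble construction described above: bagging and AdaBoost wrappers around a naïve Bayes base learner, compared with the individual model. The features here are synthetic placeholders; in the study they would be extracted from fetal MRI scans:

```python
# Sketch: bagging and AdaBoost ensembles vs. an individual naive Bayes model.
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

# Synthetic stand-in for MRI-derived feature vectors and abnormality labels.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

models = {
    "naive Bayes": GaussianNB(),
    "bagged NB": BaggingClassifier(GaussianNB(), n_estimators=50),
    "AdaBoost NB": AdaBoostClassifier(GaussianNB(), n_estimators=50),
}
for name, m in models.items():
    auc = cross_val_score(m, X, y, cv=10, scoring="roc_auc").mean()
    print(f"{name}: AUC = {auc:.3f}")
```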


Diagnostics ◽  
2021 ◽  
Vol 11 (10) ◽  
pp. 1933
Author(s):  
Boris Malyugin ◽  
Sergej Sakhnov ◽  
Svetlana Izmailova ◽  
Ernest Boiko ◽  
Nadezhda Pozdeyeva ◽  
...  

The accurate diagnosis of keratoconus, especially in its early stages of development, allows one to utilise timely and proper treatment strategies for slowing the progression of the disease and providing visual rehabilitation. Various keratometry indices and classifications for quantifying the severity of keratoconus have been developed, and many of them now involve the latest methods of computer processing and data analysis. The main purpose of this work was to develop a machine-learning-based algorithm to precisely determine the stage of keratoconus, allowing optimal management of patients with this disease. A multicentre retrospective study was carried out to obtain a database of patients with keratoconus and to apply machine-learning techniques such as principal component analysis and clustering. The created program allows us to distinguish between a normal state; preclinical keratoconus; and stages 1, 2, 3 and 4 of the disease, with an accuracy in terms of the AUC of 0.95 to 1.00 based on keratotopographer readings, relative to the adapted Amsler–Krumeich algorithm. The predicted stage and additional diagnostic criteria were then used to create a standardised keratoconus management algorithm. We also developed a web-based interface for the algorithm, providing the opportunity to use the software in a clinical environment.
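
A minimal sketch of the unsupervised pipeline named above: principal component analysis followed by clustering of keratotopographer readings into the six target groups. The file path and keratometry feature names are hypothetical:

```python
# Sketch: PCA then k-means clustering of keratotopographer readings.
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("keratoconus_topography.csv")  # placeholder path
X = StandardScaler().fit_transform(
    df[["k1", "k2", "kmax", "pachy_min", "posterior_elevation"]])

# Reduce to a few principal components, then cluster into six groups:
# normal, preclinical keratoconus, and disease stages 1-4.
pcs = PCA(n_components=3).fit_transform(X)
df["cluster"] = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(pcs)
print(df["cluster"].value_counts())
```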

