Beyond the In-Practice CBC: The Research CBC Parameters-Driven Machine Learning Predictive Modeling for Early Differentiation among Leukemias

A targeted and timely treatment can be a beneficial tool for patients with hematological emergencies (particularly acute leukemias). The key challenges in the early diagnosis of leukemias and related hematological disorders are their symptom-sharing nature and prolonged turnaround time as well as the expertise needed in reporting confirmatory tests. The present study made use of the potential morphological and immature fraction-related parameters (research items or cell population data, CPD) generated during complete blood cell count (CBC), through artificial intelligence (AI)/machine learning (ML) predictive modeling for early (at the pre-microscopic level) differentiation of various types of leukemias: acute from chronic as well as myeloid from lymphoid. The routine CBC parameters along with research CBC items from a hematology analyzer in the diagnosis of 1577 study subjects with hematological neoplasms were collected. The statistical and data visualization tools, including heat-map and principal component analysis (PCA), helped in the evaluation of the predictive capacity of research CBC items. Next, research CBC parameter-driven artificial neural network (ANN) predictive modeling was developed to use the hidden trend (disease’s signature) by increasing the predictive accuracy of these potential morphometric parameters in differentiation of leukemias. The classical statistics for routine and research CBC parameters showed that, as a whole, all study items deviated significantly among the various types of leukemias (study groups). The CPD parameter-driven heat-map gave clustering (separation) of myeloid from lymphoid leukemias, followed by the segregation (nodding) of the acute from the chronic class of that particular lineage. Furthermore, acute promyelocytic leukemia (APML) was also well individuated from other types of acute myeloid leukemia (AML). 
The PCA plot guided by research CBC items at notable variance confirmed the aforementioned findings of the CPD-driven heat-map. Through training of ANN predictive modeling, the CPD parameters successfully differentiated chronic myeloid leukemia (CML), AML, APML, acute lymphoid leukemia (ALL), chronic lymphoid leukemia (CLL), and other related hematological neoplasms with AUC values of 0.937, 0.905, 0.805, 0.829, 0.870, and 0.789, respectively, at an acceptable false prediction rate of 10.6%. Overall practical results of using our ANN model were found quite satisfactory, with values of 83.1% and 89.4% for training and testing datasets, respectively. We proposed that research CBC parameters could potentially be used for early differentiation of leukemias in the hematology–oncology unit. The CPD-driven ANN modeling is a novel practice that substantially strengthens the predictive potential of CPD items, allowing the clinicians to be confident about the typical trend of the “disease fingerprint” shown by these automated potential morphometric items.
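
The abstract above reports one AUC value per leukemia class. As a rough illustration of what such a figure measures, the following minimal sketch computes a one-vs-rest AUC from predicted scores by counting correctly ranked positive/negative pairs. The function name, labels, and scores are made up for illustration and are not the authors' data or pipeline.

```python
def auc_one_vs_rest(scores, labels, positive):
    """AUC for one class against all others, via pairwise comparison.

    scores: predicted score for the `positive` class, one per sample
    labels: true class label, one per sample
    """
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, made-up scores for a hypothetical "CML vs. rest" output.
toy_labels = ["CML", "CML", "AML", "ALL", "CLL", "CML"]
toy_scores = [0.90, 0.70, 0.80, 0.20, 0.10, 0.60]
toy_auc = auc_one_vs_rest(toy_scores, toy_labels, "CML")
```

An AUC of 1.0 would mean every sample of the class outranks every other sample, while 0.5 corresponds to random ranking.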

Abstract LB-246: MicroRNA profiling can classify acute leukemias of ambiguous lineage as either acute myeloid leukemia or acute lymphoid leukemia.

10.1158/1538-7445.am2013-lb-246 ◽

2013 ◽

Author(s):

Dave C. de Leeuw ◽

Willemijn van den Ancker ◽

Fedor Denkers ◽

Rene X. Menezes ◽

Theresia M. Westers ◽

...

Keyword(s):

Acute Myeloid Leukemia ◽

Myeloid Leukemia ◽

Lymphoid Leukemia ◽

Acute Leukemias ◽

Acute Lymphoid Leukemia ◽

Microrna Profiling ◽

Acute Myeloid

MicroRNA Profiling Can Classify Acute Leukemias of Ambiguous Lineage as Either Acute Myeloid Leukemia or Acute Lymphoid Leukemia

Clinical Cancer Research ◽

10.1158/1078-0432.ccr-12-3657 ◽

2013 ◽

Vol 19 (8) ◽

pp. 2187-2196 ◽

Cited By ~ 26

Author(s):

David C. de Leeuw ◽

Willemijn van den Ancker ◽

Fedor Denkers ◽

Renée X. de Menezes ◽

Theresia M. Westers ◽

...

Keyword(s):

Acute Myeloid Leukemia ◽

Myeloid Leukemia ◽

Lymphoid Leukemia ◽

Acute Leukemias ◽

Acute Lymphoid Leukemia ◽

Microrna Profiling ◽

Acute Myeloid

Sequential Azacitidine and Lenalidomide for Patients with Relapsed and Refractory Acute Myeloid Leukemia: Clinical Results and Predictive Modeling Using Computational Analysis

SSRN Electronic Journal ◽

10.2139/ssrn.3307702 ◽

2018 ◽

Author(s):

Brett Stevens ◽

Amanda Winters ◽

Jonathan A. Gutman ◽

Aaron Fullerton ◽

Gregory Hemenway ◽

...

Keyword(s):

Acute Myeloid Leukemia ◽

Predictive Modeling ◽

Myeloid Leukemia ◽

Computational Analysis ◽

Clinical Results ◽

Refractory Acute Myeloid Leukemia ◽

Acute Myeloid

AB0652 MACHINE LEARNING TO PREDICT EARLY TNF INHIBITOR USERS IN PATIENTS WITH ANKYLOSING SPONDYLITIS

Annals of the Rheumatic Diseases ◽

10.1136/annrheumdis-2020-eular.3743 ◽

2020 ◽

Vol 79 (Suppl 1) ◽

pp. 1620.1-1621

Author(s):

J. Lee ◽

H. Kim ◽

S. Y. Kang ◽

S. Lee ◽

Y. H. Eun ◽

...

Keyword(s):

Machine Learning ◽

Ankylosing Spondylitis ◽

Tnf Inhibitors ◽

Tnf Inhibitor ◽

Ann Model ◽

Learning Models ◽

Feature Importance ◽

Importance Analysis ◽

Baseline Characteristics ◽

Machine Learning Models

Background: Tumor necrosis factor (TNF) inhibitors are important drugs in treating patients with ankylosing spondylitis (AS). However, they are not used as a first-line treatment for AS. There is an insufficient treatment response to the first-line treatment, non-steroidal anti-inflammatory drugs (NSAIDs), in over 40% of patients. If we can predict who will need TNF inhibitors at an earlier phase, adequate treatment can be provided at an appropriate time and potential damage can be avoided. There is no precise predictive model at present. Recently, various machine learning methods have shown strong performance in predictions using clinical data. Objectives: We aim to generate an artificial neural network (ANN) model to predict early TNF inhibitor users in patients with ankylosing spondylitis. Methods: The baseline demographic and laboratory data of patients who visited Samsung Medical Center rheumatology clinic from Dec. 2003 to Sep. 2018 were analyzed. Patients were divided into two groups: early TNF inhibitor users treated by TNF inhibitors within six months of their follow-up (early-TNF users), and the others (non-early-TNF users). Machine learning models were formulated to predict the early-TNF users using the baseline data. Additionally, feature importance analysis was performed to delineate significant baseline characteristics. Results: The numbers of early-TNF and non-early-TNF users were 90 and 509, respectively. The best performing ANN model utilized 3 hidden layers with 50 hidden nodes each; its performance (area under curve (AUC) = 0.75) was superior to that of the logistic regression, support vector machine, and random forest models (AUC = 0.72, 0.65, and 0.71, respectively) in predicting early-TNF users. Feature importance analysis revealed erythrocyte sedimentation rate (ESR), C-reactive protein (CRP), and height as the top significant baseline characteristics for predicting early-TNF users. 
Among these characteristics, height was revealed by machine learning models but not by conventional statistical techniques. Conclusion: Our model displayed superior performance in predicting early TNF users compared with logistic regression and other machine learning models. Machine learning can be a vital tool in predicting treatment response in various rheumatologic diseases. Disclosure of Interests: None declared

Using Predictive Modeling and Supervised Machine Learning to Identify Patients at Risk for Venous Thromboembolism Following Posterior Lumbar Fusion

Global Spine Journal ◽

10.1177/21925682211019361 ◽

2021 ◽

pp. 219256822110193

Author(s):

Kevin Y. Wang ◽

Ijezie Ikwuezunma ◽

Varun Puvanesarajah ◽

Jacob Babu ◽

Adam Margalit ◽

...

Keyword(s):

Machine Learning ◽

At Risk ◽

Venous Thromboembolism ◽

Risk Stratification ◽

Predictive Model ◽

Predictive Modeling ◽

Lumbar Fusion ◽

Posterior Lumbar Fusion ◽

Single Level ◽

Patients At Risk

Study Design: Retrospective review. Objective: To use predictive modeling and machine learning to identify patients at risk for venous thromboembolism (VTE) following posterior lumbar fusion (PLF) for degenerative spinal pathology. Methods: Patients undergoing single-level PLF in the inpatient setting were identified in the National Surgical Quality Improvement Program database. Our outcome measure of VTE included all patients who experienced a pulmonary embolism and/or deep venous thrombosis within 30 days of surgery. Two different methodologies were used to identify VTE risk: 1) a novel predictive model derived from multivariable logistic regression of significant risk factors, and 2) a tree-based extreme gradient boosting (XGBoost) algorithm using preoperative variables. The methods were compared against legacy risk-stratification measures, ASA and Charlson Comorbidity Index (CCI), using the area-under-the-curve (AUC) statistic. Results: 13,500 patients who underwent single-level PLF met the study criteria. Of these, 0.95% had a VTE within 30 days of surgery. The 5 clinical variables found to be significant in the multivariable predictive model were: age > 65, obesity grade II or above, coronary artery disease, functional status, and prolonged operative time. The predictive model exhibited an AUC of 0.716, which was significantly higher than the AUCs of ASA and CCI (all, P < 0.001), and comparable to that of the XGBoost algorithm (P > 0.05). Conclusion: Predictive analytics and machine learning can be leveraged to aid in identification of patients at risk of VTE following PLF. Surgeons and perioperative teams may find these tools useful to augment clinical decision making and risk stratification.
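
To give a sense of how rare the outcome in the abstract above is, the following short sketch converts the reported cohort size and VTE rate into an approximate event count. The rate is rounded in the abstract, so the count is only an estimate, and the variable names are illustrative.

```python
total_patients = 13_500  # cohort size, from the abstract
vte_rate = 0.0095        # 0.95%, from the abstract (rounded)

# Approximate number of patients with a VTE within 30 days of surgery.
approx_vte_cases = round(total_patients * vte_rate)
approx_non_vte = total_patients - approx_vte_cases

# Non-events per event: a measure of how imbalanced the two classes are.
imbalance_ratio = approx_non_vte / approx_vte_cases
```

With roughly a hundred non-events for every event, a model that always predicts "no VTE" would be highly accurate yet useless, which is one reason a ranking metric such as AUC is the usual choice for comparing models on data like this.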

Spectral bias and task-model alignment explain generalization in kernel regression and infinitely wide neural networks

Nature Communications ◽

10.1038/s41467-021-23103-1 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Abdulkadir Canatar ◽

Blake Bordelon ◽

Cengiz Pehlevan

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Kernel Regression ◽

Learning Task ◽

Learning Curves ◽

Generalization Error ◽

Theoretical Understanding ◽

Classical Statistics ◽

Deep Networks ◽

Model Alignment

A theoretical understanding of generalization remains an open problem for many machine learning models, including deep networks where overparameterization leads to better performance, contradicting the conventional wisdom from classical statistics. Here, we investigate generalization error for kernel regression, which, besides being a popular machine learning method, also describes certain infinitely overparameterized neural networks. We use techniques from statistical mechanics to derive an analytical expression for generalization error applicable to any kernel and data distribution. We present applications of our theory to real and synthetic datasets, and for many kernels including those that arise from training deep networks in the infinite-width limit. We elucidate an inductive bias of kernel regression to explain data with simple functions, characterize whether a kernel is compatible with a learning task, and show that more data may impair generalization when noisy or not expressible by the kernel, leading to non-monotonic learning curves with possibly many peaks.
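
For readers unfamiliar with the method the abstract above analyzes, here is a minimal sketch of kernel (ridge) regression in plain Python. The Gaussian kernel, the bandwidth, the ridge value, and the toy sine data are all assumptions chosen for illustration; the sketch does not reproduce the paper's statistical-mechanics theory.

```python
import math


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_kernel_regression(xs, ys, ridge=1e-3, bandwidth=1.0):
    """Return a predictor f(x) = sum_i alpha_i * k(x, x_i)."""
    n = len(xs)
    gram = [[rbf_kernel(xs[i], xs[j], bandwidth) for j in range(n)] for i in range(n)]
    for i in range(n):
        gram[i][i] += ridge  # solve (K + ridge * I) alpha = y
    alpha = solve_linear_system(gram, ys)
    return lambda x: sum(a * rbf_kernel(x, xi, bandwidth) for a, xi in zip(alpha, xs))


# Toy data: noiseless samples of sin(x) on a small grid.
train_x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
train_y = [math.sin(x) for x in train_x]
predict = fit_kernel_regression(train_x, train_y)
test_error = abs(predict(1.25) - math.sin(1.25))
```

Evaluating the error at points outside the training grid, as `test_error` does, is the simplest empirical counterpart of the generalization error discussed in the abstract.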

The P190, P210, and P230 Forms of the BCR/ABL Oncogene Induce a Similar Chronic Myeloid Leukemia–like Syndrome in Mice but Have Different Lymphoid Leukemogenic Activity

Journal of Experimental Medicine ◽

10.1084/jem.189.9.1399 ◽

1999 ◽

Vol 189 (9) ◽

pp. 1399-1412 ◽

Cited By ~ 346

Author(s):

Shaoguang Li ◽

Robert L. Ilaria ◽

Ryan P. Million ◽

George Q. Daley ◽

Richard A. Van Etten

Keyword(s):

Chronic Myeloid Leukemia ◽

Tyrosine Kinase ◽

Target Cell ◽

Myeloid Leukemia ◽

Kinase Activity ◽

Philadelphia Chromosome ◽

Lymphoid Cells ◽

Tyrosine Kinase Activity ◽

Lymphoid Leukemia ◽

B Lymphoid

The product of the Philadelphia chromosome (Ph) translocation, the BCR/ABL oncogene, exists in three principal forms (P190, P210, and P230 BCR/ABL) that are found in distinct forms of Ph-positive leukemia, suggesting the three proteins have different leukemogenic activity. We have directly compared the tyrosine kinase activity, in vitro transformation properties, and in vivo leukemogenic activity of the P190, P210, and P230 forms of BCR/ABL. P230 exhibited lower intrinsic tyrosine kinase activity than P210 and P190. Although all three oncogenes transformed both myeloid (32D cl3) and lymphoid (Ba/F3) interleukin (IL)-3–dependent cell lines to become independent of IL-3 for survival and growth, their ability to stimulate proliferation of Ba/F3 lymphoid cells differed and correlated directly with tyrosine kinase activity. In a murine bone marrow transduction/transplantation model, the three forms of BCR/ABL were equally potent in the induction of a chronic myeloid leukemia (CML)–like myeloproliferative syndrome in recipient mice when 5-fluorouracil (5-FU)–treated donors were used. Analysis of proviral integration showed the CML-like disease to be polyclonal and to involve multiple myeloid and B lymphoid lineages, implicating a primitive multipotential target cell. Secondary transplantation revealed that only certain minor clones gave rise to day 12 spleen colonies and induced disease in secondary recipients, suggesting heterogeneity among the target cell population. In contrast, when marrow from non–5-FU–treated donors was used, a mixture of CML-like disease, B lymphoid acute leukemia, and macrophage tumors was observed in recipients. P190 BCR/ABL induced lymphoid leukemia with shorter latency than P210 or P230. 
The lymphoid leukemias and macrophage tumors had provirus integration patterns that were oligo- or monoclonal and limited to the tumor cells, suggesting a lineage-restricted target cell with a requirement for additional events in addition to BCR/ABL transduction for full malignant transformation. These results do not support the hypothesis that P230 BCR/ABL induces a distinct and less aggressive form of CML in humans, and suggest that the rarity of P190 BCR/ABL in human CML may reflect infrequent BCR intron 1 breakpoints during the genesis of the Ph chromosome in stem cells, rather than intrinsic differences in myeloid leukemogenicity between P190 and P210.

Combustion Tuning for a Gas Turbine Power Plant Using Data-Driven and Machine Learning Approach

Journal of Engineering for Gas Turbines and Power ◽

10.1115/1.4050020 ◽

2021 ◽

Vol 143 (3) ◽

Author(s):

Suhui Li ◽

Huaxin Zhu ◽

Min Zhu ◽

Gang Zhao ◽

Xiaofeng Wei

Keyword(s):

Machine Learning ◽

Power Plant ◽

Gas Turbine ◽

Operating Conditions ◽

Data Driven ◽

Ann Model ◽

Promising Alternative ◽

Combustion Performance ◽

Wide Range ◽

Gas Turbine Power Plant

Conventional physics-based or experimental-based approaches for gas turbine combustion tuning are time consuming and cost intensive. Recent advances in data analytics provide an alternative method. In this paper, we present a cross-disciplinary study on the combustion tuning of an F-class gas turbine that combines machine learning with physics understanding. An artificial-neural-network-based (ANN) model is developed to predict the combustion performance (outputs), including NOx emissions, combustion dynamics, combustor vibrational acceleration, and turbine exhaust temperature. The inputs of the ANN model are identified by analyzing the key operating variables that impact the combustion performance, such as the pilot and the premixed fuel flow, and the inlet guide vane angle. The ANN model is trained by field data from an F-class gas turbine power plant. The trained model is able to describe the combustion performance at an acceptable accuracy in a wide range of operating conditions. In combination with the genetic algorithm, the model is applied to optimize the combustion performance of the gas turbine. Results demonstrate that the data-driven method offers a promising alternative for combustion tuning at a low cost and fast turn-around.
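
The abstract above pairs a trained model with a genetic algorithm to search for good operating settings. The sketch below shows the general shape of such a search under loud assumptions: `surrogate_cost` is a hypothetical quadratic stand-in for the trained ANN, the two inputs are normalized to [0, 1], and the selection, crossover, and mutation choices are generic ones, not those of the paper.

```python
import random


def surrogate_cost(x):
    """Hypothetical stand-in for a trained model's predicted cost.

    A real study would call the trained ANN here; this quadratic bowl with
    its minimum at (0.3, 0.7) is only for illustration.
    """
    return (x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2


def genetic_minimize(cost, dims=2, pop_size=30, generations=60, seed=0):
    """Minimize `cost` over [0, 1]^dims with a simple genetic algorithm."""
    rng = random.Random(seed)
    # Each individual is a list of normalized settings in [0, 1].
    population = [[rng.random() for _ in range(dims)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        parents = population[: pop_size // 2]  # truncation selection, keeps the best
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover followed by a small Gaussian mutation, clipped to [0, 1].
            child = [rng.choice(pair) for pair in zip(a, b)]
            child = [min(1.0, max(0.0, g + rng.gauss(0.0, 0.05))) for g in child]
            children.append(child)
        population = parents + children
    return min(population, key=cost)


best_settings = genetic_minimize(surrogate_cost)
best_cost = surrogate_cost(best_settings)
```

Because the better half of each generation is carried over unchanged, the best cost found never gets worse from one generation to the next.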

Machine learning-based failure mode identification of RCSPSW

IABSE Congress, Christchurch 2021: Resilient technologies for sustainable infrastructure ◽

10.2749/christchurch.2021.1150 ◽

2021 ◽

Author(s):

Dongqi Jiang ◽

Shanquan Liu ◽

Tao Chen ◽

Gang Bi

Keyword(s):

Machine Learning ◽

Failure Mode ◽

Failure Modes ◽

Shear Failure ◽

Tall Buildings ◽

Shear Walls ◽

Superior Performance ◽

Ann Model ◽

Mode Recognition ◽

Wide Range

Reinforced concrete – steel plate composite shear walls (RCSPSW) have attracted great interest in the construction of tall buildings. From the perspective of life-cycle maintenance, failure mode recognition is critical in determining post-earthquake recovery strategies. This paper presents a comprehensive study on a wide range of existing experimental tests and develops a unique library of 17 parameters that affect RCSPSW’s failure modes. A total of 127 specimens are compiled and three types of failure modes are considered: flexure, shear and flexure-shear failure modes. Various machine learning (ML) techniques such as decision trees, random forests (RF), K-nearest neighbours and artificial neural network (ANN) are adopted to identify the failure mode of RCSPSW. The RF and ANN algorithms show superior performance compared to the other ML approaches. In particular, an ANN model with one hidden layer and 10 neurons is sufficient for failure mode recognition of RCSPSW.
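
The network described in the abstract above is small enough to write out. The sketch below builds an untrained network with the stated sizes (17 input parameters, one hidden layer of 10 neurons, three failure modes) and runs one forward pass. The tanh activation, softmax output, random weights, and the made-up input specimen are assumptions; the abstract does not specify them.

```python
import math
import random

N_INPUTS, N_HIDDEN, N_CLASSES = 17, 10, 3  # sizes taken from the abstract
FAILURE_MODES = ["flexure", "shear", "flexure-shear"]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one dense layer (untrained)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, weights, biases):
    """Affine map: one weighted sum plus bias per output neuron."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(z):
    shift = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, hidden, output):
    h = [math.tanh(v) for v in dense(x, *hidden)]  # tanh is an assumed activation
    return softmax(dense(h, *output))


rng = random.Random(0)
hidden_layer = init_layer(N_INPUTS, N_HIDDEN, rng)
output_layer = init_layer(N_HIDDEN, N_CLASSES, rng)

# Trainable parameters: weights plus biases of both layers.
n_params = (N_INPUTS * N_HIDDEN + N_HIDDEN) + (N_HIDDEN * N_CLASSES + N_CLASSES)

specimen = [rng.random() for _ in range(N_INPUTS)]  # made-up, normalized features
probabilities = forward(specimen, hidden_layer, output_layer)
predicted_mode = FAILURE_MODES[probabilities.index(max(probabilities))]
```

With these sizes the network has 213 trainable parameters, which is a modest count relative to the 127 specimens and may help explain why a single small hidden layer was found sufficient.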

Machine Learning and Artificial Intelligence Provides Wolfcamp Completion Design Insight

10.2118/204199-ms ◽

2021 ◽

Author(s):

Robert Shelley ◽

Oladapo Oduba ◽

Howard Melcher

Keyword(s):

Machine Learning ◽

Water Saturation ◽

Saturation Pressure ◽

Ann Model ◽

Permian Basin ◽

Treatment Rate ◽

Feed Forward Neural Network ◽

Well Production ◽

Reservoir Type ◽

The Impact

The subject of this paper is the application of a unique machine learning approach to the evaluation of Wolfcamp B completions. A database consisting of Reservoir, Completion, Frac and Production information from 301 Multi-Fractured Horizontal Wolfcamp B Completions was assembled. These completions were from a 10-County area located in the Texas portion of the Permian Basin. Within this database there is a wide variation in completion design from many operators: lateral lengths ranging from a low of about 4,000 ft to a high of almost 15,000 ft, proppant intensities from 500 to 4,000 lb/ft and frac stage spacing from 59 to 769 ft. Two independent self-organizing data mappings (SOM) were performed; the first on completion and frac stage parameters, the second on reservoir and geology. Characteristics for wells assigned to each SOM bin were determined. These two mappings were then combined into a reservoir type vs completion type matrix. This type of approach is intended to remove systematic errors in measurement, bias and inconsistencies in the database so that more realistic assessments about well performance can be made. Production for completion and reservoir type combinations was determined. As a final step, a feed forward neural network (ANN) model was developed from the mapped data. This model was used to estimate Wolfcamp B production and economics for completion and frac designs. In the performance of this project, it became apparent that the incorporation of reservoir data was essential to understanding the impact of completion and frac design on multi-fractured horizontal Wolfcamp B well production and economic performance. As we would expect, wells with the most permeability, higher pore pressure, effective porosity and lower water saturation have the greatest potential for hydrocarbon production. 
The most effective completion types have an optimum combination of proppant intensity, fluid intensity, treatment rate, frac stage spacing and perforation clustering. This paper will be of interest to anyone optimizing hydraulically fractured Wolfcamp B completion design or evaluating Permian Basin prospects. Also, of interest is the impact of reservoir and completion characteristics such as permeability, porosity, water saturation, pressure, offset well production, proppant intensity, fluid intensity, frac stage spacing and lateral length on well production and economics. The methodology used to evaluate the impact of reservoir and completion parameters for this Wolfcamp project is unique and novel. In addition, compared to other methodologies, it is low cost and fast. And though the focus of this paper is on the Wolfcamp B Formation in the Midland Basin, this approach and workflow can be applied to any formation in any Basin, provided sufficient data is available.
