Utilizing the Heterogeneity of Clinical Data for Model Refinement and Rule Discovery Through the Application of Genetic Algorithms to Calibrate a High-Dimensional Agent-Based Model of Systemic Inflammation

2021 ◽  
Vol 12 ◽  
Author(s):  
Chase Cockrell ◽  
Gary An

Introduction: Accounting for biological heterogeneity represents one of the greatest challenges in biomedical research. Dynamic computational and mathematical models can be used to enhance the study and understanding of biological systems, but traditional methods for calibration and validation commonly do not account for the heterogeneity of biological data, which may result in overfitting and brittleness of these models. Herein we propose a machine learning approach that utilizes genetic algorithms (GAs) to calibrate and refine an agent-based model (ABM) of acute systemic inflammation, with a focus on accounting for the heterogeneity seen in a clinical data set, thereby avoiding overfitting and increasing the robustness and potential generalizability of the underlying simulation model.

Methods: Agent-based modeling is a frequently used method for multi-scale mechanistic modeling. However, the same properties that make ABMs well suited to representing biological systems also present significant challenges with respect to their construction and calibration, particularly regarding the selection of potential mechanistic rules and the large number of associated free parameters. We have proposed that machine learning approaches (such as GAs) can be used to deal more effectively and efficiently with rule selection and parameter space characterization; the current work applies GAs to the challenge of calibrating a complex ABM to a specific data set while preserving the biological heterogeneity reflected in the range and variance of the data. This project uses a GA to augment the rule set of a previously validated ABM of acute systemic inflammation, the Innate Immune Response ABM (IIRABM), fitting it to clinical time series data of systemic cytokine levels from a population of burn patients. The genome for the GA is a vector generated from the IIRABM's Model Rule Matrix (MRM), a matrix representation not only of the constants/parameters associated with the IIRABM's cytokine interaction rules but also of the existence of the rules themselves. Heterogeneity is captured by a fitness function that incorporates the sample value range ("error bars") of the clinical data.

Results: The GA-enabled parameter space exploration resulted in a set of putative MRM rules and associated parameterizations that closely match the cytokine time course data used to design the fitness function. The number of non-zero elements in the MRM increases significantly as the model parameterizations evolve toward a fitness function minimum, transitioning from a sparse to a dense matrix. This results in a model structure that more closely resembles (at a superficial level) the structure of data generated by a standard differential gene expression experimental study.

Conclusion: We present an HPC-enabled machine learning/evolutionary computing approach to calibrate a complex ABM to complex clinical data while preserving biological heterogeneity. The integration of machine learning, HPC, and multi-scale mechanistic modeling provides a pathway forward to more effectively representing the heterogeneity of clinical populations and their data.
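The heterogeneity-preserving fitness idea can be sketched in miniature: rather than penalizing distance to a mean trajectory, penalize only excursions outside the clinical range. The toy below is not the IIRABM; the quadratic stand-in "model", the band values, and the (mu+lambda)-style selection loop are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical clinical band (sample range, i.e. "error bars") for one
# cytokine over 5 time points
t = np.arange(5)
lo = np.array([1.0, 2.0, 2.5, 2.0, 1.0])
hi = np.array([2.0, 3.5, 4.0, 3.5, 2.0])

def simulate(params):
    # Stand-in for the ABM: a quadratic time course controlled by 3 parameters
    a, b, c = params
    return a + b * t + c * t**2

def fitness(params):
    # Zero penalty wherever the trajectory stays inside the clinical range;
    # squared distance to the nearest bound otherwise
    y = simulate(params)
    below = np.clip(lo - y, 0, None)
    above = np.clip(y - hi, 0, None)
    return float(np.sum(below**2 + above**2))

# Minimal (mu+lambda)-style GA: mutate, pool parents and children, keep the best
pop = rng.normal(0, 1, size=(20, 3))
for _ in range(200):
    children = pop + rng.normal(0, 0.1, size=pop.shape)
    both = np.vstack([pop, children])
    scores = np.array([fitness(p) for p in both])
    pop = both[np.argsort(scores)[:20]]  # lowest fitness = best

best_fitness = fitness(pop[0])
```

Because any trajectory inside the band scores zero, many distinct parameterizations can be equally "fit", which is exactly how the range-based fitness preserves heterogeneity instead of collapsing onto a single mean curve.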

2019 ◽  
Author(s):  
R Chase Cockrell ◽  
Gary An

Abstract

Introduction: Agent-based modeling is a frequently used method for multi-scale mechanistic modeling. However, the same properties that make agent-based models (ABMs) well suited to representing biological systems also present significant challenges with respect to their construction and calibration, particularly regarding the selection of potential mechanistic rules and the large number of free parameters often present in these models. We have proposed that various machine learning approaches (such as genetic algorithms (GAs)) can be used to deal more effectively and efficiently with rule selection and parameter space characterization; the current work applies GAs to the challenge of calibrating a complex ABM to a specific data set while preserving biological heterogeneity.

Methods: This project uses a GA to augment the rule set of a previously validated ABM of acute systemic inflammation, the Innate Immune Response ABM (IIRABM), fitting it to clinical time series data of systemic cytokine levels from a population of burn patients. The genome for the GA is a vector generated from the IIRABM's Model Rule Matrix (MRM), a matrix representation not only of the constants/parameters associated with the IIRABM's cytokine interaction rules but also of the existence of the rules themselves. Heterogeneity is captured by a fitness function that incorporates the sample value range ("error bars") of the clinical data.

Results: The GA-enabled parameter space exploration resulted in a set of putative MRM rules and associated parameterizations that closely match the cytokine time course data used to design the fitness function. The number of non-zero elements in the MRM increases significantly as the model parameterizations evolve toward a fitness function minimum, transitioning from a sparse to a dense matrix. This results in a model structure that more closely resembles (at a superficial level) the structure of data generated by a standard differential gene expression experimental study.

Conclusion: We present an HPC-enabled evolutionary computing approach to calibrate a complex ABM to clinical data while preserving biological heterogeneity. The integration of machine learning, HPC, and multi-scale mechanistic modeling provides a pathway forward to effectively represent the heterogeneity of clinical populations and their data.

Author Summary: In this work, we utilize genetic algorithms (GAs) to operate on the internal rule set of a computational model of the human immune response to injury, the Innate Immune Response Agent-Based Model (IIRABM), such that it is iteratively refined to generate cytokine time series that closely match what is seen in a clinical cohort of burn patients. At the termination of the GA, there exists an ensemble of candidate model rule-sets/parameterizations which are validated by the experimental data;


Large volumes of data, collectively referred to as big data, are generated and stored across many fields. In healthcare, big data comprises enormous clinical data sets of patient records maintained in Electronic Health Records (EHRs). More than 80% of clinical data is in unstructured format and is stored in hundreds of forms. The challenge for data storage and analysis is handling such large data sets efficiently and scalably. The Hadoop MapReduce framework can store and process any kind of big data quickly; it is not solely a storage system but also a platform for data processing, and it is scalable and fault-tolerant. Prediction over these data sets is then handled by machine learning algorithms. This work focuses on the Extreme Learning Machine (ELM) algorithm, which can be used to find disease risk predictions in an optimized way by combining ELM with a Cuckoo Search optimization-based Support Vector Machine (CS-SVM). The proposed work also considers the scalability and accuracy of big data models; the proposed algorithm handles the computational work well and achieves good performance in both veracity and efficiency.
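As a rough illustration of the ELM component only (the Cuckoo Search and SVM stages are omitted), here is a minimal Extreme Learning Machine in NumPy: a random, untrained hidden layer followed by a closed-form least-squares fit of the output weights. The synthetic two-class "clinical" data are an assumption for demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "clinical" data: two separable risk groups in 5 features
X = np.vstack([rng.normal(0, 1, (100, 5)), rng.normal(2, 1, (100, 5))])
y = np.array([0] * 100 + [1] * 100)

class ELM:
    """Extreme Learning Machine: random hidden layer, closed-form output weights."""
    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        # Hidden-layer weights are random and never trained
        self.W = self.rng.normal(0, 1, (X.shape[1], self.n_hidden))
        self.b = self.rng.normal(0, 1, self.n_hidden)
        # Output weights via pseudo-inverse: no iterative optimization at all
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        return (self._hidden(X) @ self.beta > 0.5).astype(int)

acc = (ELM().fit(X, y).predict(X) == y).mean()
```

The single linear solve is what makes ELM training fast enough to be attractive at big-data scale; the trade-off is that the random hidden features are not adapted to the data.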


Author(s):  
Eda Ustaoglu ◽  
Arif Çagdaş Aydinoglu

Land-use change models are tools to support analyses, assessments, and policy decisions concerning the causes and consequences of land-use dynamics, by providing a framework for the analysis of land-use change processes and making projections of future land-use/cover patterns. A variety of modelling approaches have been developed from different disciplinary backgrounds. Following reviews in the literature, this chapter focuses on modelling tools and practices that range from pattern-based methods, such as machine learning and GIS (Geographic Information System)-based approaches, to process-based methods, such as structural economic or agent-based models. For each of these methods, an overview is given of the advances made by geographers, natural scientists, and economists in developing these models of spatial land-use change. It is noted that further progress is needed in model development and in the integration of models operating at various scales to better address the multi-scale characteristics of the land-use system.


2015 ◽  
Vol 9s3 ◽  
pp. BBI.S29473 ◽  
Author(s):  
William Seffens ◽  
Chad Evans ◽  

Health-care initiatives are pushing the development and utilization of clinical data for medical discovery and translational research studies. Machine learning tools implemented for big data have been applied to detect patterns in complex diseases. This study focuses on hypertension and examines phenotype data across a major clinical study, the Minority Health Genomics and Translational Research Repository Database, composed of self-reported African American (AA) participants combined with related cohorts. Prior genome-wide association studies of hypertension in AAs presumed that the increased disease burden in susceptible populations is due to rare variants. But genomic analyses of hypertension, even those designed to focus on rare variants, have yielded marginal genome-wide results over many studies. Machine learning and other nonparametric statistical methods have recently been shown to uncover relationships in complex phenotypes, genotypes, and clinical data. We trained neural networks with phenotype data for missing-data imputation to increase the usable size of a clinical data set. Validity was established by showing performance effects using the expanded data set for the association of phenotype variables with the case/control status of patients. Data mining classification tools were used to generate association rules.
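The imputation step can be sketched as follows. The study trains neural networks; as a hedged stand-in, this sketch fits a simple linear regressor on the complete rows to fill in a knocked-out phenotype column. The synthetic data, the 20% missingness rate, and the linear model are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic phenotype matrix: column 3 depends (roughly linearly) on the others
X = rng.normal(0, 1, (500, 4))
X[:, 3] = 0.5 * X[:, 0] - 0.8 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(0, 0.05, 500)

# Knock out ~20% of column 3 to mimic missing clinical values
mask = rng.random(500) < 0.2
true_vals = X[mask, 3].copy()
X[mask, 3] = np.nan

# Fit a predictor for the incomplete column using only the complete rows
obs = ~mask
A = np.c_[np.ones(obs.sum()), X[obs, :3]]
coef, *_ = np.linalg.lstsq(A, X[obs, 3], rcond=None)

# Impute the missing entries and measure recovery error against the held truth
X[mask, 3] = np.c_[np.ones(mask.sum()), X[mask, :3]] @ coef
rmse = float(np.sqrt(np.mean((X[mask, 3] - true_vals) ** 2)))
```

The payoff described in the abstract is downstream: rows that would otherwise be dropped for missingness become usable in the case/control association analysis.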


2020 ◽  
Author(s):  
Rebecca O'Donovan ◽  
Emre Sezgin ◽  
Sven Bambach ◽  
Eric Butter ◽  
Simon Lin

BACKGROUND Qualitative self- or parent-reports used in assessing children’s behavioral disorders are often inconvenient to collect and can be misleading due to missing information, rater biases, and limited validity. A data-driven approach to quantifying behavioral disorders could alleviate these concerns. This study proposes a machine learning approach to identify screams in voice recordings that avoids the need to gather large amounts of clinical data for model training. OBJECTIVE The goal of this study is to evaluate whether a machine learning model trained only on publicly available audio data sets can detect screaming sounds in audio streams captured in an at-home setting. METHODS Two sets of audio samples were prepared to evaluate the model: a subset of the publicly available AudioSet data set and a set of audio data extracted from the TV show Supernanny, which was chosen for its similarity to clinical data. Scream events were manually annotated for the Supernanny data, and existing annotations were refined for the AudioSet data. Audio feature extraction was performed with a convolutional neural network pretrained on AudioSet. A gradient-boosted tree model was trained and cross-validated for scream classification on the AudioSet data and then validated independently on the Supernanny audio. RESULTS On the held-out AudioSet clips, the model achieved a receiver operating characteristic (ROC) area under the curve (AUC) of 0.86. The same model applied to three full episodes of Supernanny audio achieved an ROC-AUC of 0.95 and an average precision (positive predictive value) of 42%, despite screams making up only 1.3% (n=92/7166 seconds) of the total run time. CONCLUSIONS These results suggest that a scream-detection model trained with publicly available data could be valuable for monitoring clinical recordings and identifying tantrums, rather than depending on costly, privacy-protected clinical data for model training.


Author(s):  
Christos Valelis ◽  
Fotios K. Anagnostopoulos ◽  
Spyros Basilakos ◽  
Emmanuel N. Saridakis

The existence or not of pathologies in the context of Lagrangian theory is studied with the aid of machine learning algorithms. Using an example from classical mechanics, we provide a proof of concept that the construction of new physical theories using machine learning is possible. Specifically, we utilize a fully connected, feed-forward neural network architecture, aiming to discriminate between “healthy” and “non-healthy” Lagrangians without explicitly extracting the relevant equations of motion. The trained network is then used as a fitness function within a genetic algorithm, and new healthy Lagrangians are constructed. These new Lagrangians differ from those contained in the initial data set. Hence, searching for Lagrangians possessing a number of pre-defined properties is significantly simplified within our approach. The framework employed in this work can be used to explore more complex physical theories, such as generalizations of General Relativity in gravitational physics, or constructions in solid state physics, where the standard procedure can be laborious.
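The "network as fitness function" loop might look like the sketch below, where a frozen, randomly initialized scorer stands in for the trained healthy/non-healthy classifier and candidate coefficient vectors stand in for Lagrangians; both stand-ins are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Stand-in for the trained classifier: a frozen one-hidden-layer network that
# scores how "healthy" a candidate Lagrangian's coefficient vector looks
W1, b1 = rng.normal(0, 1, (6, 16)), rng.normal(0, 1, 16)
w2 = rng.normal(0, 1, 16)

def health_score(coeffs):
    return float(np.tanh(coeffs @ W1 + b1) @ w2)

# GA: evolve coefficient vectors toward high scores under the frozen network
pop = rng.normal(0, 1, (30, 6))
start = max(health_score(c) for c in pop)
for _ in range(100):
    children = pop + rng.normal(0, 0.2, pop.shape)
    both = np.vstack([pop, children])
    scores = np.array([health_score(c) for c in both])
    pop = both[np.argsort(scores)[::-1][:30]]  # keep the highest-scoring half

best_score = health_score(pop[0])
```

Because parents survive into the candidate pool each generation, the best score never decreases, and the end population contains candidates the scorer rates as "healthy" that were not in the initial set.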


2019 ◽  
Vol 37 (15_suppl) ◽  
pp. e14138-e14138
Author(s):  
Beung-Chul AHN ◽  
Kyoung Ho Pyo ◽  
Dongmin Jung ◽  
Chun-Feng Xin ◽  
Chang Gon Kim ◽  
...  

e14138 Background: Immune checkpoint inhibitors have become a breakthrough therapy for various types of cancers. However, given their overall response rate of around 20% in clinical trials, accurately predicting the anti-PD-1 (aPD-1) response of an individual patient remains unestablished. The presence of PD-L1 expression or tumor-infiltrating lymphocytes may be used as indicators of response, but these are limited. We developed models using machine learning methods to predict the aPD-1 response. Methods: A total of 126 advanced NSCLC patients treated with aPD-1 were enrolled. Their clinical characteristics, treatment outcomes, and adverse events were collected. The total clinical data set (n = 126), consisting of 15 variables, was divided into two subsets: a discovery set (n = 63) and a test set (n = 63). Thirteen supervised learning algorithms, including support vector machines and regularized regression (lasso, ridge, elastic net), were applied to the discovery set for model development and to the test set for validation. Each model was evaluated according to the ROC curve and cross-validation. The same methods were applied to the subset that had additional flow cytometry data (n = 40). Results: The median age was 64 and 69.8% of patients were male. Adenocarcinoma was predominant (69.8%) and twenty patients (15.1%) were driver-mutation positive. The clinical data set (n = 126) demonstrated that ridge regression (AUC: 0.79) was the best model for prediction. Of the 15 clinical variables, tumor burden, age, ECOG PS, and PD-L1 were the most important based on the random forest algorithm. When we merged the clinical and flow cytometry data, the ridge regression model (AUC: 0.82) showed better performance than using clinical data only. Among the 52 variables of the merged set, the most important immune markers were as follows: CD3+CD8+CD25+/Teff-CD28, CD3+CD8+CD25-/Teff-Ki-67, and CD3+CD8+CD25+/Teff-NY-ESO/Teff-PD-1, which indicate an activated tumor-specific T cell subset. Conclusions: Our machine learning-based model is beneficial for predicting aPD-1 responses. After further validation in an independent patient cohort, the supervised learning-based non-invasive predictive score could be established to predict aPD-1 response.
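A minimal version of the ridge-regression-plus-ROC evaluation might look like this sketch, with synthetic stand-ins for the 126-patient, 15-variable table, an arbitrary regularization strength, and informative columns labeled only by analogy with the reported variables (tumor burden, age, ECOG PS, PD-L1).

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic stand-in for the clinical table: 126 patients, 15 variables,
# with a few informative ones driving the (binary) response label
n, p = 126, 15
X = rng.normal(0, 1, (n, p))
w_true = np.zeros(p)
w_true[:4] = [1.2, -0.9, 0.8, 0.7]
y = (X @ w_true + rng.normal(0, 0.5, n) > 0).astype(float)

# 63/63 discovery/test split, as in the abstract
train, test = np.arange(63), np.arange(63, 126)

# Ridge regression on 0/1 labels (closed form); lambda = 1.0 is arbitrary
lam = 1.0
Xtr = X[train]
w = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(p), Xtr.T @ y[train])
scores = X[test] @ w

# ROC-AUC via the Mann-Whitney rank identity: fraction of correctly
# ordered positive/negative pairs
pos, neg = scores[y[test] == 1], scores[y[test] == 0]
auc = float((pos[:, None] > neg[None, :]).mean())
```

With only 63 discovery patients and 15 variables, the shrinkage of ridge regression is a plausible reason it outperformed less regularized alternatives in the reported comparison.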


2019 ◽  
Vol 8 (2) ◽  
pp. 2550-2563

Chronic kidney disease (CKD) is one of the most widespread diseases across the world. Mysteriously, in some areas of the world, such as Sri Lanka, Nicaragua, and Uddanam (India), the disease is more prevalent and causes thousands of deaths. Nowadays, prevention using statistical analysis and early detection of CKD using machine learning (ML) and neural networks (NNs) are among the most important topics. In this research work, we collected data from Uddanam (a coastal area of the Srikakulam district, Andhra Pradesh, India) covering patients’ clinical data, lifestyles (habits and culture), and environmental conditions (water, land, etc.) from 2016 to 2019. In this paper, we apply statistical analysis, machine learning, and neural networks to the Uddanam CKD clinical data set for prevention and early detection of CKD. Per the statistical analysis, CKD can be prevented in the Uddanam area. Per the ML analysis, the Naive Bayes model is the best: the model is constructed within 0.06 seconds and its prediction accuracy is 99.9%. In the analysis of NNs, the artificial neural network (ANN) with a 9-neuron hidden layer (HL) is more accurate than all other models, achieving 100% accuracy in predicting CKD with a processing time of 0.02 seconds.
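A Gaussian Naive Bayes classifier of the kind reported best here can be written from scratch in a few lines; the four synthetic "clinical markers" and their class separation below are assumptions for illustration, not the Uddanam data.

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy stand-in for clinical features: CKD cases shifted in 4 markers
X = np.vstack([rng.normal(0, 1, (200, 4)), rng.normal(1.5, 1, (200, 4))])
y = np.array([0] * 200 + [1] * 200)

# Gaussian Naive Bayes: per-class, per-feature mean/variance plus a class
# prior, assuming features are conditionally independent given the class
stats = {}
for c in (0, 1):
    Xc = X[y == c]
    stats[c] = (Xc.mean(axis=0), Xc.var(axis=0) + 1e-9, len(Xc) / len(X))

def predict(x):
    def log_post(c):
        mu, var, prior = stats[c]
        return np.log(prior) - 0.5 * np.sum(
            np.log(2 * np.pi * var) + (x - mu) ** 2 / var
        )
    return int(log_post(1) > log_post(0))

acc = np.mean([predict(x) == t for x, t in zip(X, y)])
```

Training reduces to computing a handful of means and variances, which is consistent with the very short model-construction time the abstract reports for Naive Bayes.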


Author(s):  
Elena Hernández-Pereira ◽  
Oscar Fontenla-Romero ◽  
Verónica Bolón-Canedo ◽  
Brais Cancela-Barizo ◽  
Bertha Guijarro-Berdiñas ◽  
...  

Abstract

In this study, we analyze the capability of several state-of-the-art machine learning methods to predict whether patients diagnosed with CoVid-19 (CoronaVirus disease 2019) will need different levels of hospital care assistance (regular hospital admission or intensive care unit admission) during the course of their illness, using only demographic and clinical data. For this research, a data set of 10,454 patients from 14 hospitals in Galicia (Spain) was used. Each patient is characterized by 833 variables, two of which are age and gender, while the others are records of diseases or conditions in their medical history. In addition, each patient’s history of hospital or intensive care unit (ICU) admissions due to CoVid-19 is available. This clinical history serves to label each patient, enabling assessment of the model’s predictions. Our aim is to identify which model delivers the best accuracies for both hospital and ICU admissions using only demographic variables and some structured clinical data, as well as to identify which of those variables are more relevant in each case. The results obtained in the experimental study show that the best models are those based on oversampling as a preprocessing phase to balance the distribution of classes. Using these models and all the available features, we achieved an area under the curve (AUC) of 76.1% and 80.4% for predicting the need for hospital and ICU admissions, respectively. Furthermore, feature selection and oversampling techniques were applied, and it was experimentally verified that the relevant variables for the classification are age and gender, since using only these two features the performance of the models is not degraded for the two prediction problems.
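The oversampling-before-fitting recipe might be sketched as below: duplicate minority-class rows in the training split only, then fit and score by AUC. The synthetic imbalanced data, the logistic model, and the simple random-duplication sampler are assumptions; the study's exact models and sampler are not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(7)

# Imbalanced stand-in for the admissions problem: ~8% positives (admissions)
X = np.vstack([rng.normal(0.7, 1, (80, 10)), rng.normal(0, 1, (920, 10))])
y = np.array([1] * 80 + [0] * 920)

idx = rng.permutation(len(y))
tr, te = idx[:700], idx[700:]

# Random oversampling of the minority class -- in the training split ONLY,
# so the test distribution stays realistic
pos = tr[y[tr] == 1]
extra = rng.choice(pos, size=(y[tr] == 0).sum() - len(pos), replace=True)
tr_bal = np.concatenate([tr, extra])

clf = LogisticRegression(max_iter=1000).fit(X[tr_bal], y[tr_bal])
auc = roc_auc_score(y[te], clf.predict_proba(X[te])[:, 1])
```

Oversampling after the split matters: duplicating minority rows before splitting leaks copies of test patients into training and inflates the measured AUC.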


2021 ◽  
Vol 4 ◽  
Author(s):  
Danielle Barnes ◽  
Luis Polanco ◽  
Jose A. Perea

Many and varied methods currently exist for featurization, the process of mapping persistence diagrams to Euclidean space with the goal of maximally preserving structure. However, to our knowledge, there are presently no methodical comparisons of existing approaches, nor a standardized collection of test data sets. This paper provides a comparative study of several such methods. In particular, we review, evaluate, and compare the stable multi-scale kernel, persistence landscapes, persistence images, the ring of algebraic functions, template functions, and adaptive template systems. Using these approaches for feature extraction, we apply and compare popular machine learning methods on five data sets: MNIST, Shape Retrieval of Non-rigid 3D Human Models (SHREC14), extracts from the Protein Classification Benchmark Collection (Protein), MPEG7 shape matching, and the HAM10000 skin lesion data set. These data sets are commonly used in the above methods for featurization, and we use them to evaluate predictive utility in real-world applications.
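One of the compared featurizations, persistence images, is simple enough to sketch directly: diagram points are moved to (birth, persistence) coordinates and smeared onto a grid with persistence-weighted Gaussians, yielding a fixed-length vector usable by any standard classifier. The resolution, bandwidth, grid limits, and linear weighting below are illustrative choices, not the paper's settings.

```python
import numpy as np

# A small persistence diagram: (birth, death) pairs with death > birth
diagram = np.array([[0.1, 0.9], [0.2, 0.5], [0.6, 0.7]])

def persistence_image(diagram, res=8, sigma=0.1, lims=(0.0, 1.0)):
    """Vectorize a diagram as a Gaussian-smoothed density in
    (birth, persistence) coordinates, weighted by persistence."""
    births = diagram[:, 0]
    pers = diagram[:, 1] - diagram[:, 0]  # persistence = death - birth
    xs = np.linspace(*lims, res)
    gx, gy = np.meshgrid(xs, xs)
    img = np.zeros((res, res))
    for b, p in zip(births, pers):
        # Persistence-weighted Gaussian bump centered at (birth, persistence)
        img += p * np.exp(-((gx - b) ** 2 + (gy - p) ** 2) / (2 * sigma**2))
    return img.ravel()  # fixed-length feature vector for downstream ML

vec = persistence_image(diagram)
```

Weighting by persistence down-weights near-diagonal points, which are the ones most sensitive to noise in the input data; this is a common stability-motivated choice for this featurization.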

