Recapitulation of patient-specific 3D chromatin conformation using machine learning and validation of identified enhancer-gene targets

2021 ◽  
Author(s):  
Duo Xu ◽  
Andre Neil Forbes ◽  
Sandra Cohen ◽  
Ann Palladino ◽  
Tatiana Karadimitriou ◽  
...  

Regulatory networks containing enhancer to gene edges define cellular state and their rewiring is a hallmark of cancer. While efforts, such as ENCODE, have revealed these networks for reference tissues and cell-lines by integrating multi-omics data, the same methods cannot be applied for large patient cohorts due to the constraints on generating ChIP-seq and three-dimensional data from limited material in patient biopsies. We trained a supervised machine learning model using genomic 3D signatures of physical enhancer-gene connections that can predict accurate connections using data from ATAC-seq and RNA-seq assays only, which can be easily generated from patient biopsies. Our method overcomes the major limitations of correlation-based approaches that cannot distinguish between distinct target genes of given enhancers in different samples, which is a hallmark of network rewiring in cancer. Our model achieved an AUROC (area under receiver operating characteristic curve) of 0.91 and, importantly, can distinguish between active regulatory elements with connections to target genes and poised elements with no connections to target genes. Our predicted regulatory elements are validated by multi-omics data, including histone modification marks from ENCODE, with an average specificity of 0.92. Application of our model on chromatin accessibility and transcriptomic data from 400 cancer patients across 22 cancer types revealed novel cancer-type and subtype-specific enhancer-gene connections for known cancer genes. In one example, we identified two enhancers that regulate the expression of ESR1 in only ER+ breast cancer (BRCA) samples but not in ER- samples. These enhancers are predicted to contribute to the high expression of ESR1 in 93% of ER+ BRCA samples. Functional validation using CRISPRi confirms that inhibition of these enhancers decreases the expression of ESR1 in ER+ samples.

Author(s):  
Minsik Oh ◽  
Sungjoon Park ◽  
Sun Kim ◽  
Heejoon Chae

Abstract Gene expression is subtly regulated by quantifiable measures of genetic molecules such as interactions with other genes, methylation, mutations, transcription factors and histone modifications. Integrative analysis of multi-omics data can help scientists understand condition- or patient-specific gene regulation mechanisms. However, analysis of multi-omics data is challenging since it requires not only the analysis of multiple omics data sets but also mining complex relations among different genetic molecules by using state-of-the-art machine learning methods. In addition, analysis of multi-omics data needs quite large computing infrastructure. Moreover, interpretation of the analysis results requires collaboration among many scientists, often requiring the analysis to be repeated from different perspectives. Many of the aforementioned technical issues can be handled well when machine learning tools are deployed on the cloud. In this survey article, we first survey machine learning methods that can be used for gene regulation study, and we categorize them according to five different goals: gene regulatory subnetwork discovery, disease subtype analysis, survival analysis, clinical prediction and visualization. We also summarize the methods in terms of multi-omics input types. Then, we explain why the cloud is potentially a good solution for the analysis of multi-omics data, followed by a survey of two state-of-the-art cloud systems, Galaxy and BioVLAB. Finally, we discuss important issues when the cloud is used for the analysis of multi-omics data for the gene regulation study.


2018 ◽  
Vol 129 (4) ◽  
pp. 675-688 ◽  
Author(s):  
Samir Kendale ◽  
Prathamesh Kulkarni ◽  
Andrew D. Rosenberg ◽  
Jing Wang

Abstract Background Hypotension is a risk factor for adverse perioperative outcomes. Machine-learning methods allow large amounts of data to be used for the development of robust predictive analytics. The authors hypothesized that machine-learning methods can provide prediction for the risk of postinduction hypotension. Methods Data were extracted from the electronic health record of a single quaternary care center from November 2015 to May 2016 for patients over age 12 who underwent general anesthesia, without procedure exclusions. Multiple supervised machine-learning classification techniques were attempted, with postinduction hypotension (mean arterial pressure less than 55 mmHg within 10 min of induction by any measurement) as primary outcome, and preoperative medications, medical comorbidities, induction medications, and intraoperative vital signs as features. Discrimination was assessed using cross-validated area under the receiver operating characteristic curve. The best performing model was tuned and final performance assessed using split-set validation. Results Out of 13,323 cases, 1,185 (8.9%) experienced postinduction hypotension. Area under the receiver operating characteristic curve using logistic regression was 0.71 (95% CI, 0.70 to 0.72), support vector machines was 0.63 (95% CI, 0.58 to 0.60), naive Bayes was 0.69 (95% CI, 0.67 to 0.69), k-nearest neighbor was 0.64 (95% CI, 0.63 to 0.65), linear discriminant analysis was 0.72 (95% CI, 0.71 to 0.73), random forest was 0.74 (95% CI, 0.73 to 0.75), neural nets 0.71 (95% CI, 0.69 to 0.71), and gradient boosting machine 0.76 (95% CI, 0.75 to 0.77). Test set area for the gradient boosting machine was 0.74 (95% CI, 0.72 to 0.77). Conclusions The success of this technique in predicting postinduction hypotension demonstrates feasibility of machine-learning models for predictive analytics in the field of anesthesiology, with performance dependent on model selection and appropriate tuning.
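The discrimination metric reported in this abstract, the area under the receiver operating characteristic curve, can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The following is a minimal sketch of that pairwise definition in plain Python. The function name `auroc` and the toy labels and scores are illustrative assumptions and are not taken from the study, which does not describe its implementation.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 outcomes (1 = event, e.g. hypotension).
    scores: iterable of model scores, higher meaning more likely positive.
    Tied positive/negative pairs count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, for illustration only.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(toy_labels, toy_scores))
```

The double loop is quadratic in the number of cases, which is fine for a sketch. For cohorts of the size reported here, a rank-based computation or an established library routine would be the more practical choice.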


2021 ◽  
Author(s):  
Naveena Yanamala ◽  
Nanda H. Krishna ◽  
Quincy A. Hathaway ◽  
Aditya Radhakrishnan ◽  
Srinidhi Sunkara ◽  
...  

Abstract Patients with influenza and SARS-CoV2/Coronavirus disease 2019 (COVID-19) infections have different clinical courses and outcomes. We developed and validated a supervised machine learning pipeline to distinguish the two viral infections using the available vital signs and demographic dataset from the first hospital/emergency room encounters of 3,883 patients who had confirmed diagnoses of influenza A/B, COVID-19 or negative laboratory test results. The models were able to achieve an area under the receiver operating characteristic curve (ROC AUC) of at least 97% using our multiclass classifier. The predictive models were externally validated on 15,697 encounters in 3,125 patients available on TrinetX database that contains patient-level data from different healthcare organizations. The influenza vs. COVID-19-positive model had an AUC of 98%, and 92% on the internal and external test sets, respectively. Our study illustrates the potential of machine-learning models for accurately distinguishing the two viral infections. The code is made available at https://github.com/ynaveena/COVID-19-vs-Influenza and may have utility as a frontline diagnostic tool to aid healthcare workers in triaging patients once the two viral infections start cocirculating in the communities.


2020 ◽  
Vol 1 (6) ◽  
pp. 236-244
Author(s):  
Matthias A. Verstraete ◽  
Ryan E. Moore ◽  
Martin Roche ◽  
Michael A. Conditt

Aims The use of technology to assess balance and alignment during total knee surgery can provide an overload of numerical data to the surgeon. Meanwhile, this quantification holds the potential to clarify and guide the surgeon through the surgical decision process when selecting the appropriate bone recut or soft tissue adjustment when balancing a total knee. Therefore, this paper evaluates the potential of deploying supervised machine learning (ML) models to select a surgical correction based on patient-specific intra-operative assessments. Methods Based on a clinical series of 479 primary total knees and 1,305 associated surgical decisions, various ML models were developed. These models identified the indicated surgical decision based on available intra-operative alignment and tibiofemoral load data. Results With an associated area under the receiver-operator curve ranging between 0.75 and 0.98, the optimized ML models resulted in good to excellent predictions. The best performing model used a random forest approach while considering both alignment and intra-articular load readings. Conclusion The presented model has the potential to make experience available to surgeons adopting new technology, bringing expert opinion into their operating theatre, and also provides insight into the surgical decision process. More specifically, these promising outcomes indicated the relevance of considering the overall limb alignment in the coronal and sagittal plane to identify the appropriate surgical decision.


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Naveena Yanamala ◽  
Nanda H. Krishna ◽  
Quincy A. Hathaway ◽  
Aditya Radhakrishnan ◽  
Srinidhi Sunkara ◽  
...  

Abstract Patients with influenza and SARS-CoV2/Coronavirus disease 2019 (COVID-19) infections have a different clinical course and outcomes. We developed and validated a supervised machine learning pipeline to distinguish the two viral infections using the available vital signs and demographic dataset from the first hospital/emergency room encounters of 3883 patients who had confirmed diagnoses of influenza A/B, COVID-19 or negative laboratory test results. The models were able to achieve an area under the receiver operating characteristic curve (ROC AUC) of at least 97% using our multiclass classifier. The predictive models were externally validated on 15,697 encounters in 3125 patients available on TrinetX database that contains patient-level data from different healthcare organizations. The influenza vs COVID-19-positive model had an AUC of 98.8%, and 92.8% on the internal and external test sets, respectively. Our study illustrates the potential of machine-learning models for accurately distinguishing the two viral infections. The code is made available at https://github.com/ynaveena/COVID-19-vs-Influenza and may have utility as a frontline diagnostic tool to aid healthcare workers in triaging patients once the two viral infections start cocirculating in the communities.


Hypertension ◽  
2020 ◽  
Vol 76 (Suppl_1) ◽  
Author(s):  
Sachin Aryal ◽  
Ahmad Alimadadi ◽  
Ishan Manandhar ◽  
Bina Joe ◽  
Xi Cheng

In recent years, the microbiome has been recognized as an important factor associated with cardiovascular disease (CVD), which is the leading cause of human mortality worldwide. Disparities in gut microbial compositions between individuals with and without CVD have been reported; we therefore hypothesized that training supervised machine learning (ML) models on such microbiome-based data could be exploited as a new strategy for evaluation of cardiovascular health. To test our hypothesis, we analyzed the metagenomics data extracted from the American Gut Project. Specifically, 16S rRNA reads from stool samples of 478 CVD and 473 non-CVD control samples were analyzed using five supervised ML algorithms: random forest (RF), support vector machine with radial kernel (svmRadial), decision tree (DT), elastic net (ENet) and neural networks (NN). Thirty-nine differential bacterial taxa (LEfSe: LDA > 2) were identified between CVD and non-CVD groups. ML classifications using these taxonomic features achieved an AUC (area under the receiver operating characteristic curve) of ~0.58 (RF). However, by choosing the top 500 high-variance features of operational taxonomic units (OTUs) for training ML models, an improved AUC of ~0.65 (RF) was achieved. Further, by limiting the selection to only the top 25 highly contributing OTU features to reduce the dimensionality of feature space, the AUC was further significantly enhanced to ~0.70 (RF). In summary, this study is the first to demonstrate the successful development of an ML model using microbiome-based datasets for a systematic diagnostic screening of CVD.
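The variance-based filtering step mentioned in this abstract (keeping the highest-variance OTU features before training) can be sketched as follows. This is only an illustration under stated assumptions: the function name `top_variance_features`, the use of population variance, and the toy abundance table are hypothetical, and the abstract does not say how the authors computed or ranked variance. The later selection of the "top 25 highly contributing" features relies on model-derived importance and is not covered here.

```python
from statistics import pvariance


def top_variance_features(rows, k):
    """Return the column indices of the k highest-variance features.

    rows: list of samples, each a list of numeric feature values
          (for example, OTU abundances); all rows have equal length.
    k:    number of features to keep.
    """
    n_features = len(rows[0])
    # Population variance of each column across all samples.
    variances = [pvariance([row[j] for row in rows]) for j in range(n_features)]
    # Rank columns from highest to lowest variance and keep the first k.
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


# Hypothetical toy table: 3 samples x 3 features.
toy_rows = [
    [1, 10, 5],
    [1, 20, 6],
    [1, 30, 7],
]
print(top_variance_features(toy_rows, 2))
```

To avoid leaking information from held-out samples, a filter like this would normally be fitted on the training portion only and then applied unchanged to the test portion.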


ACI Open ◽  
2019 ◽  
Vol 03 (02) ◽  
pp. e88-e97
Author(s):  
Mohammadamin Tajgardoon ◽  
Malarkodi J. Samayamuthu ◽  
Luca Calzoni ◽  
Shyam Visweswaran

Abstract Background Machine learning models that are used for predicting clinical outcomes can be made more useful by augmenting predictions with simple and reliable patient-specific explanations for each prediction. Objectives This article evaluates the quality of explanations of predictions using physician reviewers. The predictions are obtained from a machine learning model that is developed to predict dire outcomes (severe complications including death) in patients with community acquired pneumonia (CAP). Methods Using a dataset of patients diagnosed with CAP, we developed a predictive model to predict dire outcomes. On a set of 40 patients, who were predicted to be either at very high risk or at very low risk of developing a dire outcome, we applied an explanation method to generate patient-specific explanations. Three physician reviewers independently evaluated each explanatory feature in the context of the patient's data and were instructed to disagree with a feature if they did not agree with the magnitude of support, the direction of support (supportive versus contradictory), or both. Results The model used for generating predictions achieved an F1 score of 0.43 and area under the receiver operating characteristic curve (AUROC) of 0.84 (95% confidence interval [CI]: 0.81–0.87). Interreviewer agreement between two reviewers was strong (Cohen's kappa coefficient = 0.87) and fair to moderate between the third reviewer and others (Cohen's kappa coefficient = 0.49 and 0.33). Agreement rates between reviewers and generated explanations (defined as the proportion of explanatory features with which a majority of reviewers agreed) were 0.78 for actual explanations and 0.52 for fabricated explanations, and the difference between the two agreement rates was statistically significant (Chi-square = 19.76, p-value < 0.01). Conclusion There was good agreement among physician reviewers on patient-specific explanations that were generated to augment predictions of clinical outcomes. Such explanations can be useful in interpreting predictions of clinical outcomes.
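Cohen's kappa, the interreviewer agreement statistic quoted in this abstract, corrects raw agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the chance agreement derived from each reviewer's label frequencies. A minimal sketch follows. The function name `cohens_kappa` and the toy ratings are illustrative assumptions, not data from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy ratings of four explanatory features.
reviewer_1 = ["agree", "agree", "disagree", "agree"]
reviewer_2 = ["agree", "disagree", "disagree", "agree"]
print(cohens_kappa(reviewer_1, reviewer_2))
```

In the toy data the reviewers match on three of four items (p_o = 0.75) while chance agreement is 0.5, which is why kappa comes out noticeably lower than the raw agreement rate.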


2021 ◽  
Vol 9 (Suppl 3) ◽  
pp. A858-A858
Author(s):  
Vinnu Bhardwaj ◽  
Amin Momin ◽  
Jonathan Johnston ◽  
Elizabeth Speltz ◽  
Tyler Borrman ◽  
...  

Background PACT Pharma has developed a state-of-the-art approach to validate predicted neoepitopes (neoEs) and their cognate T cell receptors (neoTCRs) by capturing neoepitope-specific T cells from peripheral blood. This neoTCR discovery and validation process is being applied in a clinical trial (NCT03970382) evaluating personalized neoTCR-T cell therapy to treat patients across eight solid tumor types. Extensive pre-, on- and post-treatment data related to this trial have been accumulated in the PACTImmune Database (PIDB), which represents a growing data asset for patient-specific tumor immunogenicity in solid tumors. Here we present a specific use case of applying machine learning (ML) to significantly improve neoE-HLA predictions and further model anticipated improvements of TCR capture as a direct consequence. Methods PACT has developed capabilities for high-throughput manufacturing of a single polypeptide (comPACT protein) which consists of the predicted neoE peptide together with Beta-2-Microglobulin and the HLA heavy chain. comPACT molecules are considered successfully produced when protein yields reach concentrations >1 µM. Data used for this study consisted of >26000 neoE-HLA predictions for 62 different HLA alleles. We applied ML to learn patterns that are predictive of neoE-HLAs that can be successfully produced as comPACTs, using scikit-learn and XGBoost. Data were first split into training and testing data. Models were trained on training data and model hyperparameters were tuned using 5-fold cross validation (5xCV). The performance of the models during 5xCV and on test data was measured using the area under the receiver operating characteristic curve (AUC). We additionally performed experimental prospective validation of the models. To do this, 603 neoE-HLAs (from 7 previously unseen cancer samples) were selected for comPACT production using netMHCpan4.1 and the newly trained models. Results The mean AUC for the 5xCV of the selected models ranged from 0.75 to 0.86 depending upon the HLA allele (SD <0.05 for every model). The AUC on the test data ranged from 0.75 to 0.92 (median = 0.85). Prospective validation resulted on average in a 22% higher success rate (range 11%–39%) using the new models as compared to the netMHCpan4.1 predictions. This is expected to result in increased capture of neoepitope-specific CD8+ T cells as the PIDB indicates that 3.2% of the successful comPACTs result in validated neoTCRs. Conclusions PIDB based ML predictions of neoE-HLAs led to a significant increase in TCR-capturing comPACT success rates. Because of this work, it is predicted that both neoE-specific CD8+ T cell capture and actionable neoTCR options will increase per patient.
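The 5-fold cross validation described in the Methods partitions the training data into five folds, each serving once as the held-out set while the other four are used for fitting. The sketch below shows only the index bookkeeping, in plain Python. The function name `k_fold_indices` is hypothetical, the split is unshuffled and unstratified for simplicity, and nothing here reflects the authors' actual scikit-learn or XGBoost configuration, which the abstract does not specify.

```python
def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous (train, test) index pairs.

    Fold sizes differ by at most one; earlier folds take the remainder.
    """
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    base, remainder = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < remainder else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


# Hypothetical example: 12 samples, 5 folds.
for train_idx, test_idx in k_fold_indices(12, k=5):
    print(len(train_idx), test_idx)
```

In practice the sample order would usually be shuffled (and often stratified by outcome) before splitting, and a hyperparameter setting would be scored by the mean AUC across the five held-out folds.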


2016 ◽  
Vol 6 (1) ◽  
Author(s):  
Hsin-Yun Wu ◽  
Cihun-Siyong Alex Gong ◽  
Shih-Pin Lin ◽  
Kuang-Yi Chang ◽  
Mei-Yung Tsou ◽  
...  

Abstract Patient-controlled epidural analgesia (PCEA) has been applied to reduce postoperative pain in orthopedic surgical patients. Unfortunately, PCEA is occasionally accompanied by nausea and vomiting. The logistic regression (LR) model is widely used to predict vomiting, and recently support vector machines (SVMs), a supervised machine learning method, have been used for classification and prediction. Unlike our previous work, which compared Artificial Neural Networks (ANNs) with LR, this study uses an SVM-based predictive model to identify patients at high risk of vomiting during PCEA and compares the results with those derived from the LR-based model. From January to March 2007, data from 195 patients undergoing PCEA following orthopedic surgery were applied to develop two predictive models. 75% of the data were randomly selected for training, while the remainder was used for testing to validate predictive performance. Discrimination was measured as the area under the receiver operating characteristic (ROC) curve (AUC). The areas under the ROC curves of the LR and SVM models were 0.734 and 0.929, respectively. A computer-based predictive model can be used to identify those who are at high risk for vomiting after PCEA, allowing for patient-specific therapeutic intervention or the use of alternative analgesic methods.

