netDx: Software for building interpretable patient classifiers by multi-'omic data integration using patient similarity networks

F1000Research ◽  
2021 ◽  
Vol 9 ◽  
pp. 1239
Author(s):  
Shraddha Pai ◽  
Philipp Weber ◽  
Ruth Isserlin ◽  
Hussam Kaka ◽  
Shirley Hui ◽  
...  

Patient classification based on clinical and genomic data will further the goal of precision medicine. Interpretability is of particular relevance for models based on genomic data, where sample sizes are relatively small (in the hundreds), increasing overfitting risk. netDx is a machine learning method to integrate multi-modal patient data and build a patient classifier. Patient data are converted into networks of patient similarity, a representation that is intuitive to clinicians, who also use patient similarity for medical diagnosis. Features passing selection are integrated, and new patients are assigned to the class with the greatest profile similarity. netDx has excellent performance, outperforming most machine-learning methods in binary cancer survival prediction. It handles missing data, a common problem in real-world data, without requiring imputation. netDx also has excellent interpretability, with native support to group genes into pathways for mechanistic insight into predictive features. The netDx Bioconductor package provides multiple workflows for users to build custom patient classifiers. It provides turnkey functions for one-step predictor generation from multi-modal data, including feature selection over multiple train/test data splits. Workflows offer versatility through custom feature design and choice of similarity metric; speed is improved by parallel execution. Built-in functions and examples allow users to compute model performance metrics such as AUROC, AUPR, and accuracy. netDx uses RCy3 to visualize top-scoring pathways and the final integrated patient network in Cytoscape. Advanced users can build more complex predictor designs with functional building blocks used in the default design. Finally, the netDx Bioconductor package provides a novel workflow for pathway-based patient classification from sparse genetic data.
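
The idea of assigning a new patient to the class with the greatest profile similarity can be illustrated with a minimal Python sketch. This is not the netDx API (netDx is an R/Bioconductor package) and it omits feature selection and network integration. It assumes a single numeric profile per patient, Pearson correlation as the similarity metric, and mean similarity to each class as the score. The function names, class labels, and toy values are all hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length numeric profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant profile carries no similarity information
    return cov / sqrt(var_a * var_b)


def classify(new_profile, training):
    """Return the label whose training patients are, on average, most similar.

    training maps a class label to a list of patient profiles (assumed layout).
    """
    scores = {
        label: sum(pearson(new_profile, p) for p in profiles) / len(profiles)
        for label, profiles in training.items()
    }
    return max(scores, key=scores.get), scores


# Toy data: four measurements per patient, two hypothetical outcome classes.
training = {
    "good_survival": [[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 6.0]],
    "poor_survival": [[4.0, 3.0, 2.0, 1.0], [5.0, 3.0, 2.0, 2.0]],
}
label, scores = classify([1.5, 2.5, 3.0, 5.0], training)
print(label, scores)
```

In this toy case the new profile rises across the four measurements, so it correlates positively with the "good_survival" patients and negatively with the "poor_survival" patients.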


Sensors ◽  
2019 ◽  
Vol 19 (10) ◽  
pp. 2266 ◽  
Author(s):  
Nikolaos Sideris ◽  
Georgios Bardis ◽  
Athanasios Voulodimos ◽  
Georgios Miaoulis ◽  
Djamchid Ghazanfarpour

The constantly increasing amount and availability of urban data derived from varying sources leads to an assortment of challenges that include, among others, the consolidation, visualization, and maximal exploitation of these data. A preeminent problem affecting urban planning is the appropriate choice of location to host a particular activity (either commercial or common welfare service) or the correct use of an existing building or empty space. In this paper, we propose an approach that addresses these challenges using machine learning techniques. The proposed system fuses various types of data from different sources, encodes them using a novel semantic model that can capture and utilize both low-level geometric information and higher-level semantic information, and subsequently feeds them to a Random Forests classifier, as well as to other supervised machine learning models for comparison. Our experimental evaluation on multiple real-world data sets, comparing the performance of several classifiers (including Feedforward Neural Networks, Support Vector Machines, Bag of Decision Trees, k-Nearest Neighbors and Naïve Bayes), indicated the superiority of Random Forests in terms of the examined performance metrics (Accuracy, Specificity, Precision, Recall, F-measure and G-mean).
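
The six performance metrics named above can all be derived from a binary confusion matrix, as in the Python sketch below. It rests on assumptions: labels are coded 0/1, and G-mean is taken as the geometric mean of recall and specificity, which is one common definition and may differ from the one used in the paper. The function name and toy labels are hypothetical.

```python
from math import sqrt


def binary_metrics(y_true, y_pred):
    """Compute common binary classification metrics from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Guard against empty classes; report 0.0 instead of dividing by zero.
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)        # sensitivity, true positive rate
    specificity = ratio(tn, tn + fp)   # true negative rate
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "specificity": specificity,
        "precision": precision,
        "recall": recall,
        "f_measure": ratio(2 * precision * recall, precision + recall),
        # Assumed definition: geometric mean of recall and specificity.
        "g_mean": sqrt(recall * specificity),
    }


# Toy example: 3 positives, 5 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting specificity and G-mean alongside accuracy matters when classes are imbalanced, because accuracy alone can look good for a model that rarely predicts the minority class.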


2020 ◽  
Vol 14 (Supplement_1) ◽  
pp. S170-S171
Author(s):  
M Madgwick ◽  
P Sudhakar ◽  
N S Tabib ◽  
P Norvaisas ◽  
P Creed ◽  
...  

Background: Inflammatory bowel disease (IBD) has been shown to be associated with alterations in the intestinal microbiome. However, the precise nature of these microbial changes remains unclear. With billions of microbes present within the gut, novel and powerful computational techniques are required to identify the relevant shifts in microbiota contributing to the disease. Machine learning (ML) allows a data-driven approach to identify these discrete dynamic changes, while the findings of the ML algorithms can be interpreted using systems biology (SB) techniques. By combining ML and SB approaches, we aim to characterise key microbial factors in IBD pathogenesis and distinct patterns of variability in a diverse patient cohort, and to provide a method for patient stratification. Methods: The causal relationship between the changes in the gut microbiome and IBD is difficult to establish. Data from cross-sectional studies are plagued by confounding factors and inconsistencies between cohorts. To overcome this, we used rich longitudinal datasets and integrated metagenomic, multi-omic and clinical patient data. This workflow has been validated using large longitudinal IBD databases, including data from IBDMDB. We assessed the performance of the ML models using well-documented performance metrics to ensure the outcomes were robust. Results: As a baseline, we used multiple ML models to predict disease type (UC, CD and non-IBD) from integrated multi-omics profiles. We analysed multiple ML techniques, including linear (e.g. linear mixed model), non-linear (e.g. Random Forest), time-series models (e.g. Rotation Forest) and deep learning models (e.g. long short-term memory network model). We identified the models that would allow flexibility to analyse the dynamic nature of the microbiome and allow integration of the microbiome data with clinical patient data. The cost of greater flexibility was reduced model performance in identifying specific metagenomic features that could be used as biomarkers. However, we were able to identify connections between microbial and host proteins relevant to IBD and were able to stratify these by the patient's metagenomic data. Conclusion: We have developed an integrated ML-based microbiome analysis pipeline to identify biomarkers for IBD from longitudinal metagenomic data. Furthermore, using a variety of SB approaches, we were able to interpret the predicted key microbial features and communities by inferring connections between microbial and host proteins. This pipeline will enable us to analyse vast amounts of patient microbiome data in the context of clinical and metagenomic data, to allow identification of biomarkers for disease subtypes.


2018 ◽  
Author(s):  
Sherif Tawfik ◽  
Olexandr Isayev ◽  
Catherine Stampfl ◽  
Joseph Shapter ◽  
David Winkler ◽  
...  

Materials constructed from different van der Waals two-dimensional (2D) heterostructures offer a wide range of benefits, but these systems have been little studied because of their experimental and computational complexity, and because of the very large number of possible combinations of 2D building blocks. The simulation of the interface between two different 2D materials is computationally challenging due to the lattice mismatch problem, which sometimes necessitates the creation of very large simulation cells for performing density-functional theory (DFT) calculations. Here we use a combination of DFT, linear regression and machine learning techniques in order to rapidly determine the interlayer distance between two different 2D heterostructures that are stacked in a bilayer heterostructure, as well as the band gap of the bilayer. Our work provides an excellent proof of concept by quickly and accurately predicting a structural property (the interlayer distance) and an electronic property (the band gap) for a large number of hybrid 2D materials. This work paves the way for rapid computational screening of the vast parameter space of van der Waals heterostructures to identify new hybrid materials with useful and interesting properties.


2020 ◽  
Vol 21 ◽  
Author(s):  
Sukanya Panja ◽  
Sarra Rahem ◽  
Cassandra J. Chu ◽  
Antonina Mitrofanova

Background: In recent years, the availability of high-throughput technologies, the establishment of large molecular patient data repositories, and advances in computing power and storage have allowed elucidation of complex mechanisms implicated in therapeutic response in cancer patients. The breadth and depth of such data, alongside experimental noise and missing values, require a sophisticated human-machine interaction that would allow effective learning from complex data and accurate forecasting of future outcomes, ideally embedded in the core of machine learning design. Objective: In this review, we will discuss machine learning techniques utilized for modeling of treatment response in cancer, including Random Forests, support vector machines, neural networks, and linear and logistic regression. We will overview their mathematical foundations and discuss their limitations and alternative approaches, all in light of their application to therapeutic response modeling in cancer. Conclusion: We hypothesize that the increase in the number of patient profiles and potential temporal monitoring of patient data will define even more complex techniques, such as deep learning and causal analysis, as central players in therapeutic response modeling.


Diagnostics ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 1263
Author(s):  
Samy Ammari ◽  
Raoul Sallé de Chou ◽  
Tarek Assi ◽  
Mehdi Touat ◽  
Emilie Chouzenoux ◽  
...  

Anti-angiogenic therapy with bevacizumab is a widely used therapeutic option for recurrent glioblastoma (GBM). Nevertheless, the therapeutic response remains highly heterogeneous among GBM patients with discordant outcomes. Recent data have shown that radiomics, an advanced recent imaging analysis method, can help to predict both prognosis and therapy in a multitude of solid tumours. The objective of this study was to identify novel biomarkers, extracted from MRI and clinical data, which could predict overall survival (OS) and progression-free survival (PFS) in GBM patients treated with bevacizumab using machine-learning algorithms. In a cohort of 194 recurrent GBM patients (age range 18–80), radiomics data from pre-treatment T2 FLAIR and gadolinium-injected MRI images along with clinical features were analysed. Binary classification models for OS at 9, 12, and 15 months were evaluated. Our classification models successfully stratified the OS. The AUCs were equal to 0.78, 0.85, and 0.76 on the test sets (0.79, 0.82, and 0.87 on the training sets) for the 9-, 12-, and 15-month endpoints, respectively. Regressions yielded a C-index of 0.64 (0.74) for OS and 0.57 (0.69) for PFS. These results suggest that radiomics could assist in the elaboration of a predictive model for treatment selection in recurrent GBM patients.
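
The two kinds of score reported above, AUC for the binary survival classifiers and C-index for the regressions, can both be computed by pairwise comparison, as the Python sketch below shows. It is illustrative only and is not the authors' evaluation code. It assumes AUC is the fraction of (positive, negative) pairs ranked correctly and that the C-index is Harrell's concordance over comparable pairs under right censoring. All names and toy values are hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise ranking (ties count as 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def c_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    times: follow-up time per patient
    events: 1 if the event was observed, 0 if censored
    risks: predicted risk (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable only when the earlier time is an observed event.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy binary endpoint, e.g. a hypothetical "event before 12 months" label.
auc_value = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])

# Toy survival data in months; the third patient is censored.
c_value = c_index(
    times=[5, 10, 15, 20],
    events=[1, 1, 0, 1],
    risks=[0.9, 0.3, 0.5, 0.1],
)
print(auc_value, c_value)
```

Both scores equal 0.5 for random ranking and 1.0 for perfect ranking, so the AUC and C-index values in the abstract can be read on the same scale.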


2021 ◽  
Vol 10 (4) ◽  
pp. 199
Author(s):  
Francisco M. Bellas Aláez ◽  
Jesus M. Torres Palenzuela ◽  
Evangelos Spyrakos ◽  
Luis González Vilas

This work presents new prediction models based on recent developments in machine learning methods, such as Random Forest (RF) and AdaBoost, and compares them with more classical approaches, i.e., support vector machines (SVMs) and neural networks (NNs). The models predict Pseudo-nitzschia spp. blooms in the Galician Rias Baixas. This work builds on a previous study by the authors (doi.org/10.1016/j.pocean.2014.03.003) but uses an extended database (from 2002 to 2012) and new algorithms. Our results show that RF and AdaBoost provide better prediction results compared to SVMs and NNs, as they show improved performance metrics and a better balance between sensitivity and specificity. Classical machine learning approaches show higher sensitivities, but at a cost of lower specificity and higher percentages of false alarms (lower precision). These results seem to indicate a greater adaptation of new algorithms (RF and AdaBoost) to unbalanced datasets. Our models could be operationally implemented to establish a short-term prediction system.


2021 ◽  
Vol 28 (1) ◽  
pp. e100262
Author(s):  
Mustafa Khanbhai ◽  
Patrick Anyadi ◽  
Joshua Symons ◽  
Kelsey Flott ◽  
Ara Darzi ◽  
...  

Objectives: Unstructured free-text patient feedback contains rich information, and analysing these data manually would require a lot of personnel resources which are not available in most healthcare organisations. The aim was to undertake a systematic review of the literature on the use of natural language processing (NLP) and machine learning (ML) to process and analyse free-text patient experience data. Methods: Databases were systematically searched to identify articles published between January 2000 and December 2019 examining NLP to analyse free-text patient feedback. Due to the heterogeneous nature of the studies, a narrative synthesis was deemed most appropriate. Data related to the study purpose, corpus, methodology, performance metrics and indicators of quality were recorded. Results: Nineteen articles were included. The majority (80%) of studies applied language analysis techniques on patient feedback from social media sites (unsolicited) followed by structured surveys (solicited). Supervised learning was frequently used (n=9), followed by unsupervised (n=6) and semisupervised (n=3). Comments extracted from social media were analysed using an unsupervised approach, and free-text comments held within structured surveys were analysed using a supervised approach. Reported performance metrics included the precision, recall and F-measure, with support vector machine and Naïve Bayes being the best performing ML classifiers. Conclusion: NLP and ML have emerged as an important tool for processing unstructured free text. Both supervised and unsupervised approaches have their role depending on the data source. With the advancement of data analysis tools, these techniques may be useful to healthcare organisations to generate insight from the volumes of unstructured free-text data.

