Interpreting k-mer–based signatures for antibiotic resistance prediction

GigaScience ◽  
2020 ◽  
Vol 9 (10) ◽  
Author(s):  
Magali Jaillard ◽  
Mattia Palmieri ◽  
Alex van Belkum ◽  
Pierre Mahé

Abstract Background: Recent years have witnessed the development of several k-mer–based approaches aiming to predict phenotypic traits of bacteria on the basis of their whole-genome sequences. While often convincing in terms of predictive performance, the underlying models are in general not straightforward to interpret, the interplay between the actual genetic determinant and its translation as k-mers being generally hard to decipher. Results: We propose a simple and computationally efficient strategy allowing one to cope with the high correlation inherent to k-mer–based representations in supervised machine learning models, leading to concise and easily interpretable signatures. We demonstrate the benefit of this approach on the task of predicting the antibiotic resistance profile of a Klebsiella pneumoniae strain from its genome, where our method leads to signatures defined as weighted linear combinations of genetic elements that can easily be identified as genuine antibiotic resistance determinants, with state-of-the-art predictive performance. Conclusions: By enhancing the interpretability of genomic k-mer–based antibiotic resistance prediction models, our approach improves their clinical utility and hence will facilitate their adoption in routine diagnostics by clinicians and microbiologists. While antibiotic resistance was the motivating application, the method is generic and can be transposed to any other bacterial trait. An R package implementing our method is available at https://gitlab.com/biomerieux-data-science/clustlasso.
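To make the correlation issue concrete, the following is a minimal Python sketch, not the clustlasso implementation. It assumes hypothetical toy genomes and handles only the simplest case of correlation: k-mers that share exactly the same presence/absence pattern across genomes. The function names `kmers` and `group_identical_patterns` are illustrative and do not come from the paper or its R package.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of distinct k-mers found in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def group_identical_patterns(genomes, k):
    """Group k-mers that share the same presence/absence pattern.

    genomes: dict mapping a genome name to its sequence (hypothetical input).
    Returns the sorted genome names and a dict mapping each pattern
    (a tuple of 0/1 values, one per genome) to the k-mers showing it.
    """
    names = sorted(genomes)
    present = {name: kmers(genomes[name], k) for name in names}
    all_kmers = set().union(*present.values())
    groups = defaultdict(list)
    for kmer in sorted(all_kmers):
        pattern = tuple(int(kmer in present[name]) for name in names)
        groups[pattern].append(kmer)
    return names, dict(groups)


# Hypothetical toy data: two identical strains and one divergent strain.
toy_genomes = {
    "strain_a": "ACGTACGGA",
    "strain_b": "ACGTACGGA",
    "strain_c": "TTTTACGTT",
}
names, groups = group_identical_patterns(toy_genomes, k=4)
for pattern, members in groups.items():
    print(dict(zip(names, pattern)), members)
```

Each group could then be treated as a single feature in a sparse model, which is one way to see why grouping correlated k-mers can shorten a signature. How the published method defines and selects its groups is not specified in the abstract.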

2020 ◽  
Author(s):  
Jenna Marie Reps ◽  
Ross Williams ◽  
Seng Chan You ◽  
Thomas Falconer ◽  
Evan Minty ◽  
...  

Abstract Objective: To demonstrate how the Observational Healthcare Data Science and Informatics (OHDSI) collaborative network and standardization can be utilized to scale up external validation of patient-level prediction models by enabling validation across a large number of heterogeneous observational healthcare datasets. Materials & Methods: Five previously published prognostic models (ATRIA, CHADS2, CHADS2VASC, Q-Stroke and Framingham) that predict future risk of stroke in patients with atrial fibrillation were replicated using the OHDSI frameworks. A network study was run that enabled the five models to be externally validated across nine observational healthcare datasets spanning three countries and five independent sites. Results: The five existing models were able to be integrated into the OHDSI framework for patient-level prediction and they obtained mean c-statistics ranging between 0.57-0.63 across the 6 databases with sufficient data to predict stroke within 1 year of initial atrial fibrillation diagnosis for females with atrial fibrillation. This was comparable with existing validation studies. The validation network study was run across nine datasets within 60 days once the models were replicated. An R package for the study was published at https://github.com/OHDSI/StudyProtocolSandbox/tree/master/ExistingStrokeRiskExternalValidation. Discussion: This study demonstrates the ability to scale up external validation of patient-level prediction models using a collaboration of researchers and a data standardization that enable models to be readily shared across data sites. External validation is necessary to understand the transportability or reproducibility of a prediction model, but without collaborative approaches it can take three or more years for a model to be validated by one independent researcher.
Conclusion: In this paper we show it is possible to both scale up and speed up external validation by showing how validation can be done across multiple databases in less than 2 months. We recommend that researchers developing new prediction models use the OHDSI network to externally validate their models.
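For readers unfamiliar with the c-statistic reported above, here is a minimal Python sketch of how it can be computed for a binary outcome such as stroke within a fixed horizon. It is not the OHDSI implementation. The function name `c_statistic` and the input values are hypothetical, and a time-to-event analysis with censoring would need a different estimator.

```python
def c_statistic(risks, outcomes):
    """Concordance statistic for a binary outcome.

    risks: predicted risks, one per patient.
    outcomes: 1 if the event occurred, 0 otherwise.
    Returns the fraction of (event, non-event) pairs in which the patient
    with the event received the higher predicted risk; ties count as 0.5.
    """
    if len(risks) != len(outcomes):
        raise ValueError("risks and outcomes must have the same length")
    event_risks = [r for r, y in zip(risks, outcomes) if y == 1]
    non_event_risks = [r for r, y in zip(risks, outcomes) if y == 0]
    if not event_risks or not non_event_risks:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for r_event in event_risks:
        for r_non_event in non_event_risks:
            if r_event > r_non_event:
                score += 1.0
            elif r_event == r_non_event:
                score += 0.5
    return score / (len(event_risks) * len(non_event_risks))


# Hypothetical predictions for four patients, two of whom had the event.
example = c_statistic([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])
print(example)
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking, which puts the 0.57-0.63 range reported in the abstract in context.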


2019 ◽  
Author(s):  
Anna Mikhaylova ◽  
Timothy Thornton

Abstract Predicting gene expression with genetic data has garnered significant attention in recent years. PrediXcan is one of the most widely used gene-based association methods for testing imputed gene expression values with a phenotype due to the invaluable insight the method has shown into the relationship between complex traits and the component of gene expression that can be attributed to genetic variation. The prediction models for PrediXcan, however, were obtained using supervised machine learning methods and training data from the Depression and Gene Network (DGN) and the Genotype-Tissue Expression (GTEx) data, where the majority of subjects are of European descent. Many genetic studies, however, include samples from multi-ethnic populations, and in this paper we assess the accuracy of gene expression predictions with PrediXcan in diverse populations. Using transcriptomic data from the GEUVADIS (Genetic European Variation in Health and Disease) RNA sequencing project and whole genome sequencing data from the 1000 Genomes project, we evaluate and compare the predictive performance of PrediXcan in an African population (Yoruban) and four European populations. Prediction results are obtained using a range of models from PrediXcan weight databases, and Pearson’s correlation coefficient is used to measure prediction accuracy. We demonstrate that the predictive performance of PrediXcan varies across populations (F-test p-value < 0.001), with prediction accuracy being worst in the Yoruban sample. Moreover, the performance of PrediXcan varies not only among distant populations, but also among closely related populations. We also find that the qualitative performance of PrediXcan for the populations considered is consistent across all weight databases used.
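The accuracy measure named above, Pearson's correlation between predicted and observed expression, can be sketched in a few lines of Python. This is an illustration only and does not use PrediXcan. The function name `pearson_r` and the per-population numbers are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical (predicted, observed) expression values for one gene,
# grouped by population label; the numbers are made up for illustration.
by_population = {
    "population_1": ([0.1, 0.4, 0.5, 0.9], [0.2, 0.3, 0.6, 0.8]),
    "population_2": ([0.1, 0.4, 0.5, 0.9], [0.7, 0.2, 0.9, 0.4]),
}
accuracy = {name: pearson_r(pred, obs) for name, (pred, obs) in by_population.items()}
print(accuracy)
```

Computing the coefficient separately per population, as here, is what allows accuracy to be compared across groups.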


2018 ◽  
Vol 28 (9) ◽  
pp. 2768-2786 ◽  
Author(s):  
Thomas PA Debray ◽  
Johanna AAG Damen ◽  
Richard D Riley ◽  
Kym Snell ◽  
Johannes B Reitsma ◽  
...  

It is widely recommended that any developed prediction model, whether diagnostic or prognostic, be externally validated in terms of its predictive performance measured by calibration and discrimination. When multiple validations have been performed, a systematic review followed by a formal meta-analysis helps to summarize overall performance across multiple settings, and reveals under which circumstances the model performs suboptimally and may need adjustment. We discuss how to undertake meta-analysis of the performance of prediction models with either a binary or a time-to-event outcome. We address how to deal with incomplete availability of study-specific results (performance estimates and their precision), and how to produce summary estimates of the c-statistic, the observed:expected ratio and the calibration slope. Furthermore, we discuss the implementation of frequentist and Bayesian meta-analysis methods, and propose novel empirically-based prior distributions to improve estimation of between-study heterogeneity in small samples. Finally, we illustrate all methods using two examples: meta-analysis of the predictive performance of EuroSCORE II and of the Framingham Risk Score. All examples and meta-analysis models have been implemented in our newly developed R package “metamisc”.
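As a rough illustration of what a frequentist summary estimate of the c-statistic involves, the Python sketch below pools study-level c-statistics with a random-effects model. It rests on several assumptions not taken from the abstract: pooling is done on the logit scale, standard errors are transformed with the delta method, and between-study variance is estimated with the DerSimonian-Laird method. It is not the metamisc implementation, and the input numbers are hypothetical.

```python
import math


def pool_c_statistics(c_stats, std_errors):
    """Random-effects pooling of c-statistics on the logit scale.

    Uses the DerSimonian-Laird estimate of between-study variance (tau2).
    Returns the pooled c-statistic, an approximate 95% confidence interval,
    and tau2 (on the logit scale).
    """
    k = len(c_stats)
    if k != len(std_errors) or k < 2:
        raise ValueError("need at least two studies with matching inputs")
    # Logit transform; delta method: se_logit = se / (c * (1 - c)).
    thetas = [math.log(c / (1 - c)) for c in c_stats]
    variances = [(se / (c * (1 - c))) ** 2 for c, se in zip(c_stats, std_errors)]
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    theta_fixed = sum(w * t for w, t in zip(weights, thetas)) / sum_w
    q = sum(w * (t - theta_fixed) ** 2 for w, t in zip(weights, thetas))
    scale = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / scale)
    re_weights = [1 / (v + tau2) for v in variances]
    theta_re = sum(w * t for w, t in zip(re_weights, thetas)) / sum(re_weights)
    se_re = math.sqrt(1 / sum(re_weights))

    def expit(value):
        return 1 / (1 + math.exp(-value))

    return {
        "pooled": expit(theta_re),
        "ci_low": expit(theta_re - 1.96 * se_re),
        "ci_high": expit(theta_re + 1.96 * se_re),
        "tau2": tau2,
    }


# Hypothetical c-statistics and standard errors from three validation studies.
summary = pool_c_statistics([0.60, 0.65, 0.72], [0.02, 0.03, 0.025])
print(summary)
```

Working on the logit scale keeps the back-transformed interval inside (0, 1). Other heterogeneity estimators or a Bayesian model with informative priors, as the abstract mentions, would replace the `tau2` step.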


Author(s):  
Renáta Németh ◽  
Fanni Máté ◽  
Eszter Katona ◽  
Márton Rakovics ◽  
Domonkos Sik

Abstract Supervised machine learning on textual data has successful industrial/business applications, but it is an open question whether it can be utilized in social knowledge building outside the scope of hermeneutically more trivial cases. Combining sociology and data science raises several methodological and epistemological questions. In our study the discursive framing of depression is explored in online health communities. Three discursive frameworks are introduced: the bio-medical, psychological, and social framings of depression. Approximately 80,000 posts were collected, and a sample of them was manually classified. Conventional bag-of-words models, Gradient Boosting Machine, word-embedding-based models and a state-of-the-art Transformer-based model with transfer learning, called DistilBERT, were applied to extend this classification to the whole database. In our experience, ‘discursive framing’ proves to be a complex and hermeneutically difficult concept, which affects the degree of both inter-annotator agreement and predictive performance. Our finding confirms that the level of inter-annotator disagreement provides a good estimate for the objective difficulty of the classification. By identifying the most important terms, we also interpreted the classification algorithms, which is of great importance in social sciences. We are convinced that machine learning techniques can extend the horizon of qualitative text analysis. Our paper supports a smooth fit of the new techniques into the traditional toolbox of social sciences.


2019 ◽  
Author(s):  
Jenna Marie Reps ◽  
Ross Williams ◽  
Seng Chan You ◽  
Thomas Falconer ◽  
Evan Minty ◽  
...  

Abstract Objective: To demonstrate how the Observational Healthcare Data Science and Informatics (OHDSI) collaborative network and standardization can be utilized to externally validate patient-level prediction models at scale. Materials & Methods: Five previously published prognostic models (ATRIA, CHADS2, CHADS2VASC, Q-Stroke and Framingham) that predict future risk of stroke in patients with atrial fibrillation were replicated using the OHDSI frameworks and a network study was run that enabled the five models to be externally validated across nine datasets spanning three countries and five independent sites. Results: The five existing models were able to be integrated into the OHDSI framework for patient-level prediction and their performances in predicting stroke within 1 year of initial atrial fibrillation diagnosis for females were comparable with existing studies. The validation network study took 60 days once the models were replicated and an R package for the study was published to collaborators at https://github.com/OHDSI/StudyProtocolSandbox/tree/master/ExistingStrokeRiskExternalValidation. Discussion: This study demonstrates the ability to scale up external validation of patient-level prediction models using a collaboration of researchers and data standardization that enable models to be readily shared across data sites. External validation is necessary to understand the transportability and reproducibility of prediction models, but without collaborative approaches it can take three or more years for a model to be validated by one independent researcher. Conclusion: In this paper we show it is possible to both scale up and speed up external validation by showing how validation can be done across multiple databases in less than 2 months.


2018 ◽  
Vol 69 (5) ◽  
pp. 1240-1243
Author(s):  
Manuela Arbune ◽  
Mioara Decusara ◽  
Luana Andreea Macovei ◽  
Aurelia Romila ◽  
Alina Viorica Iancu ◽  
...  

The aim of the present study was to characterize the antibiotic resistance profile of Enterobacteriaceae strains isolated in Infectious Diseases Hospital Galati, Romania, during 2016, in order to guide the local antibiotic stewardship strategy. There were 597 biological samples with positive cultures for Enterobacteriaceae, related to invasive and non-invasive infections. The main bacterial genera were E. coli 62%, Klebsiella spp 15%, Proteus spp 11% and Salmonella spp 6%. Over half of the isolated strains were resistant to one or more antibiotics. The resistance level depends on the bacterial genus, with the highest level found among the rare isolates: Enterobacter spp, Citrobacter spp, Morganella spp and Serratia spp. The rate of MDR was 17.6% for E. coli, 40.9% for Klebsiella spp and 50.7% for Proteus spp, while the rate of extended-spectrum beta-lactamase-producing strains was 7.2% for E. coli, 28.4% for Klebsiella spp and 12.3% for Proteus spp. Carbapenem-resistant strains were found in 1.1% of cases.


2020 ◽  
Vol 70 (12) ◽  
pp. 4287-4294

Cancer is the second leading cause of death in Romania and worldwide. Cancer patients are at increasing risk of acquiring bacterial infection with multi-resistant germs, including multidrug-resistant (MDR) strains of Gram-negative bacteria involved in nosocomial infection. Romania is among the South-Eastern European countries with the highest prevalence rates of MDR pathogens. The aim of this study was to determine the bacterial profile and antibiotic resistance pattern in cancer patients admitted to the County Emergency Clinical Hospital Craiova, Romania. A retrospective study of bacterial pathogens was carried out on 90 adult cancer patients admitted from January to December 2018. The analysis of the resistance patterns for the action of the appropriate antibiotics was performed using the Vitek 2 Compact system and the diffusion method. In this study, 92 samples from 90 oncological patients (37-86 years) were analysed. A total of 157 bacterial isolates were obtained, of which 37 were strains of Staphylococcus aureus (23.56%), followed by Streptococcus pneumoniae (23 - 14.64%), Klebsiella spp. and Escherichia coli (22 - 14.01%). The most common isolates were from the respiratory tract (86 isolates - 54.77%). High rates of MDR were found for E. coli (63.63%), MRSA (61.11%) and Klebsiella spp. (54.54%), while one third of the isolated strains of Pseudomonas aeruginosa, Acinetobacter spp. and Proteus spp. were MDR. The findings of this study may be the basis for further, more extensive studies highlighting the germs involved in the infectious pathology of cancer patients, in order to determine the antimicrobial resistance and to improve the methods of prophylaxis and treatment. Keywords: multidrug resistance (MDR), cancer patients, bacterial pathogen


2020 ◽  
Vol 28 (2) ◽  
pp. 253-265 ◽  
Author(s):  
Gabriela Bitencourt-Ferreira ◽  
Amauri Duarte da Silva ◽  
Walter Filgueira de Azevedo

Background: The elucidation of the structure of cyclin-dependent kinase 2 (CDK2) made it possible to develop targeted scoring functions for virtual screening aimed to identify new inhibitors for this enzyme. CDK2 is a protein target for the development of drugs intended to modulate cell-cycle progression and control. Such drugs have potential anticancer activities. Objective: Our goal here is to review recent applications of machine learning methods to predict ligand-binding affinity for protein targets. To assess the predictive performance of classical scoring functions and targeted scoring functions, we focused our analysis on CDK2 structures. Methods: We have experimental structural data for hundreds of binary complexes of CDK2 with different ligands, many of them with inhibition constant information. We investigate here computational methods to calculate the binding affinity of CDK2 through classical scoring functions and machine-learning models. Results: Analysis of the predictive performance of classical scoring functions available in docking programs such as Molegro Virtual Docker, AutoDock4, and AutoDock Vina indicated that these methods failed to predict binding affinity with significant correlation with experimental data. Targeted scoring functions developed through supervised machine learning techniques showed a significant correlation with experimental data. Conclusion: Here, we described the application of supervised machine learning techniques to generate a scoring function to predict binding affinity. Machine learning models showed superior predictive performance when compared with classical scoring functions. Analysis of the computational models obtained through machine learning could capture essential structural features responsible for binding affinity against CDK2.

