Identifying genes associated with invasive disease in S. pneumoniae by applying a machine learning approach to whole genome sequence typing data

2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Uri Obolski ◽  
Andrea Gori ◽  
José Lourenço ◽  
Craig Thompson ◽  
Robin Thompson ◽  
...  
2017 ◽  
Vol 5 (3) ◽  
Author(s):  
Mariam Iskander ◽  
Kristy Hayden ◽  
Gary Van Domselaar ◽  
Raymond Tsang

ABSTRACT Haemophilus influenzae is an important human pathogen that primarily infects small children. In recent years, H. influenzae serotype a has emerged as a significant cause of invasive disease among indigenous populations. Here, we present the first complete whole-genome sequence of H. influenzae serotype a.


2017 ◽  
Vol 5 (13) ◽  
Author(s):  
Maria Giufrè ◽  
Rita Cardines ◽  
Marina Cerquetti

ABSTRACT In the present era of conjugate vaccines against Haemophilus influenzae type b, non-vaccine-preventable strains are of concern. Here, we report the first whole-genome sequence of an invasive H. influenzae type e strain. This genomic information will enable further investigations on encapsulated non-type b H. influenzae strains.


2018 ◽  
Author(s):  
Uri Obolski ◽  
Andrea Gori ◽  
José Lourenço ◽  
Craig Thompson ◽  
Robin Thompson ◽  
...  

Abstract Streptococcus pneumoniae is a normal commensal of the upper respiratory tract but can also invade the bloodstream or cerebrospinal fluid (CSF), causing invasive pneumococcal disease (IPD). In this study, we attempt to identify genes associated with IPD by applying a random forest machine-learning algorithm to whole genome sequence (WGS) data. We find 43 genes consistently associated with IPD across three geographically distinct WGS data sets of pneumococcal carriage isolates. Of these genes, 23 genes have previously been shown to be directly relevant to IPD, while the other 18 are uncharacterized.
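One way to read "consistently associated across three data sets" is as an intersection of the top-ranked genes from each data set. The following is a minimal sketch under that assumption, not the authors' pipeline: the gene names, the importance scores, and the helper functions `top_genes` and `consistent_genes` are all hypothetical, and the scores stand in for whatever feature-importance measure a random forest would report.

```python
# Hypothetical illustration: gene names and scores are invented,
# and this is not the method used in the study.

def top_genes(importance, k):
    """Return the set of the k genes with the highest importance score.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(importance, key=lambda gene: (-importance[gene], gene))
    return set(ranked[:k])


def consistent_genes(datasets, k):
    """Return the genes that rank in the top k of every data set."""
    tops = [top_genes(scores, k) for scores in datasets]
    return set.intersection(*tops)


# Made-up importance scores for three geographically distinct data sets.
dataset_a = {"geneA": 0.30, "geneB": 0.20, "geneC": 0.05, "geneD": 0.25}
dataset_b = {"geneA": 0.28, "geneB": 0.10, "geneC": 0.27, "geneD": 0.22}
dataset_c = {"geneA": 0.35, "geneB": 0.08, "geneC": 0.07, "geneD": 0.30}

all_sets = [dataset_a, dataset_b, dataset_c]
print(sorted(consistent_genes(all_sets, k=2)))
print(sorted(consistent_genes(all_sets, k=3)))
```

Raising `k` loosens the criterion: more genes qualify in each data set, so the intersection can only grow or stay the same.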


2018 ◽  
Author(s):  
Nathan Wan ◽  
David Weinberg ◽  
Tzu-Yu Liu ◽  
Katherine Niehaus ◽  
Daniel Delubac ◽  
...  

Abstract Background: Blood-based methods using cell-free DNA (cfDNA) are under development as an alternative to existing screening tests. However, early-stage detection of cancer using tumor-derived cfDNA has proven challenging because of the small proportion of cfDNA derived from tumor tissue in early-stage disease. A machine learning approach to discover signatures in cfDNA, potentially reflective of both tumor and non-tumor contributions, may represent a promising direction for the early detection of cancer. Methods: Whole-genome sequencing was performed on cfDNA extracted from plasma samples (N=546 colorectal cancer and 271 non-cancer controls). Reads aligning to protein-coding gene bodies were extracted, and read counts were normalized. cfDNA tumor fraction was estimated using IchorCNA. Machine learning models were trained using k-fold cross-validation and confounder-based cross-validation to assess generalization performance. Results: In a colorectal cancer cohort heavily weighted towards early-stage cancer (80% stage I/II), we achieved a mean AUC of 0.92 (95% CI 0.91-0.93) with a mean sensitivity of 85% (95% CI 83-86%) at 85% specificity. Sensitivity generally increased with tumor stage and increasing tumor fraction. Stratification by age, sequencing batch, and institution demonstrated the impact of these confounders and provided a more accurate assessment of generalization performance. Conclusions: A machine learning approach using cfDNA achieved high sensitivity and specificity in a large, predominantly early-stage, colorectal cancer cohort. The possibility of systematic technical and institution-specific biases warrants similar confounder analyses in other studies. Prospective validation of this machine learning method and evaluation of a multi-analyte approach are underway.
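Two ideas in this abstract can be made concrete with a short sketch: reporting sensitivity at a fixed specificity, and holding out samples by confounder. The code below is illustrative only. It assumes that "confounder-based cross-validation" means keeping all samples that share a confounder value (for example, one institution) in the same fold, which the abstract does not spell out. The function names, scores, labels, and site names are hypothetical.

```python
def sensitivity_at_specificity(scores, labels, target_specificity):
    """Return (sensitivity, threshold) at the lowest threshold meeting the target.

    scores: higher means more likely to be a case.
    labels: 1 = case, 0 = control.
    A sample is called positive when its score is strictly above the threshold.
    """
    controls = sorted(s for s, y in zip(scores, labels) if y == 0)
    cases = [s for s, y in zip(scores, labels) if y == 1]
    if not controls or not cases:
        raise ValueError("need at least one case and one control")
    # Try each observed control score as the threshold, lowest first.
    # The highest control score always gives specificity 1.0, so the loop
    # is guaranteed to stop for any target up to 1.0.
    for threshold in controls:
        true_negatives = sum(1 for s in controls if s <= threshold)
        if true_negatives / len(controls) >= target_specificity:
            break
    true_positives = sum(1 for s in cases if s > threshold)
    return true_positives / len(cases), threshold


def grouped_folds(groups, n_folds):
    """Assign a fold index per sample so that a group never spans two folds."""
    unique = sorted(set(groups))
    fold_of_group = {g: i % n_folds for i, g in enumerate(unique)}
    return [fold_of_group[g] for g in groups]


# Made-up classifier scores: five cases followed by five controls.
scores = [0.9, 0.8, 0.7, 0.4, 0.25, 0.6, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
sens, thr = sensitivity_at_specificity(scores, labels, target_specificity=0.8)
print(sens, thr)

# Made-up institution labels used as the grouping confounder.
sites = ["site1", "site2", "site1", "site3", "site2", "site3"]
print(grouped_folds(sites, n_folds=2))
```

Holding out whole groups means a model is never evaluated on a sample from an institution or batch it saw during training, which is why such splits tend to give a more conservative estimate of generalization than a plain random k-fold split.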

