Epidemiological data analysis of viral quasispecies in the next-generation sequencing era

Author(s):  
Sergey Knyazev ◽  
Lauren Hughes ◽  
Pavel Skums ◽  
Alexander Zelikovsky

Abstract The unprecedented coverage offered by next-generation sequencing (NGS) technology has facilitated the assessment of the population complexity of intra-host RNA viral populations at an unprecedented level of detail. Consequently, analysis of NGS datasets could be used to extract and infer crucial epidemiological and biomedical information on the levels of both infected individuals and susceptible populations, thus enabling the development of more effective prevention strategies and antiviral therapeutics. Such information includes drug resistance, infection stage, transmission clusters and structures of transmission networks. However, NGS data require sophisticated analysis dealing with millions of error-prone short reads per patient. Prior to the NGS era, epidemiological and phylogenetic analyses were geared toward Sanger sequencing technology; now, they must be redesigned to handle the large-scale NGS datasets and properly model the evolution of heterogeneous rapidly mutating viral populations. Additionally, dedicated epidemiological surveillance systems require big data analytics to handle millions of reads obtained from thousands of patients for rapid outbreak investigation and management. We survey bioinformatics tools analyzing NGS data for (i) characterization of intra-host viral population complexity including single nucleotide variant and haplotype calling; (ii) downstream epidemiological analysis and inference of drug-resistant mutations, age of infection and linkage between patients; and (iii) data collection and analytics in surveillance systems for fast response and control of outbreaks.

2016 ◽  
Vol 62 (4) ◽  
pp. 647-654 ◽  
Author(s):  
Tyler F Beck ◽  
James C Mullikin ◽  
Leslie G Biesecker ◽  

Abstract BACKGROUND Next-generation sequencing (NGS) data are used for both clinical care and clinical research. DNA sequence variants identified using NGS are often returned to patients/participants as part of clinical or research protocols. The current standard of care is to validate NGS variants using Sanger sequencing, which is costly and time-consuming. METHODS We performed a large-scale, systematic evaluation of Sanger-based validation of NGS variants using data from the ClinSeq® project. We first used NGS data from 19 genes in 5 participants, comparing them to high-throughput Sanger sequencing results on the same samples, and found no discrepancies among 234 NGS variants. We then compared NGS variants in 5 genes from 684 participants against data from Sanger sequencing. RESULTS Of over 5800 NGS-derived variants, 19 were not validated by Sanger data. Using newly designed sequencing primers, Sanger sequencing confirmed 17 of the NGS variants, and the remaining 2 variants had low quality scores from exome sequencing. Overall, we measured a validation rate of 99.965% for NGS variants using Sanger sequencing, which was higher than many existing medical tests that do not necessitate orthogonal validation. CONCLUSIONS A single round of Sanger sequencing is more likely to incorrectly refute a true-positive variant from NGS than to correctly identify a false-positive variant from NGS. Validation of NGS-derived variants using Sanger sequencing has limited utility, and best practice standards should not include routine orthogonal Sanger validation of NGS variants.
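The headline validation rate in this abstract follows from simple arithmetic on the reported counts. The sketch below reproduces it in Python. It assumes a denominator of exactly 5800 variants (the abstract says only "over 5800", so the result is approximate) and counts the 2 variants never confirmed by Sanger sequencing as the only failures. The function and variable names are illustrative and do not come from the study.

```python
def validation_rate(total_variants: int, unconfirmed: int) -> float:
    """Return the percentage of NGS variants confirmed by Sanger sequencing."""
    if total_variants <= 0:
        raise ValueError("total_variants must be positive")
    return 100.0 * (total_variants - unconfirmed) / total_variants


# Assumption: the abstract reports "over 5800" variants; 5800 is a stand-in.
TOTAL_VARIANTS = 5800
INITIALLY_DISCORDANT = 19    # not validated in the first Sanger round
RESCUED_BY_NEW_PRIMERS = 17  # confirmed after primers were redesigned

# Variants that Sanger sequencing never confirmed.
still_unconfirmed = INITIALLY_DISCORDANT - RESCUED_BY_NEW_PRIMERS

first_pass_rate = validation_rate(TOTAL_VARIANTS, INITIALLY_DISCORDANT)
final_rate = validation_rate(TOTAL_VARIANTS, still_unconfirmed)

print(f"first-pass validation rate: {first_pass_rate:.3f}%")
print(f"final validation rate:      {final_rate:.3f}%")
```

Under these assumptions, 17 of the 19 first-round discordances were Sanger failures rather than NGS errors, which is the basis for the authors' conclusion that a single Sanger round is more likely to refute a true positive than to catch a false positive.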


Cancers ◽  
2021 ◽  
Vol 13 (13) ◽  
pp. 3148
Author(s):  
Youngjun Park ◽  
Dominik Heider ◽  
Anne-Christin Hauschild

The rapid improvement of next-generation sequencing (NGS) technologies and their application in large-scale cohorts in cancer research led to common challenges of big data. It opened a new research area incorporating systems biology and machine learning. As large-scale NGS data accumulated, sophisticated data analysis methods became indispensable. In addition, NGS data have been integrated with systems biology to build better predictive models to determine the characteristics of tumors and tumor subtypes. Therefore, various machine learning algorithms were introduced to identify underlying biological mechanisms. In this work, we review novel technologies developed for NGS data analysis, and we describe how these computational methodologies integrate systems biology and omics data. Subsequently, we discuss how deep neural networks outperform other approaches, the potential of graph neural networks (GNN) in systems biology, and the limitations in NGS biomedical research. To reflect on the various challenges and corresponding computational solutions, we will discuss the following three topics: (i) molecular characteristics, (ii) tumor heterogeneity, and (iii) drug discovery. We conclude that machine learning and network-based approaches can add valuable insights and build highly accurate models. However, a well-informed choice of learning algorithm and biological network information is crucial for the success of each specific research question.


2019 ◽  
Vol 25 (31) ◽  
pp. 3350-3357 ◽  
Author(s):  
Pooja Tripathi ◽  
Jyotsna Singh ◽  
Jonathan A. Lal ◽  
Vijay Tripathi

Background: With the advent of high-throughput next-generation sequencing (NGS), biological research in drug discovery has been directed towards the oncology and infectious disease therapeutic areas, with extensive use in biopharmaceutical development and vaccine production. Method: In this review, an effort was made to address the basic background of NGS technologies and the potential applications of NGS in drug design. Our purpose is also to provide a brief introduction to various next-generation sequencing techniques. Discussions: The high-throughput methods execute Large-scale Unbiased Sequencing (LUS), which comprises Massively Parallel Sequencing (MPS) or NGS technologies. These are related terms that describe a DNA sequencing technology which has revolutionized genomic research. Using NGS, an entire human genome can be sequenced within a single day. Conclusion: Analysis of NGS data unravels important clues in the quest for the treatment of various life-threatening diseases and other scientific problems related to human welfare.


Author(s):  
Anne Krogh Nøhr ◽  
Kristian Hanghøj ◽  
Genis Garcia Erill ◽  
Zilong Li ◽  
Ida Moltke ◽  
...  

Abstract Estimation of relatedness between pairs of individuals is important in many genetic research areas. When estimating relatedness, it is important to account for admixture if it is present. However, the methods that can account for admixture all take genotype data as input, which is a problem for low-depth next-generation sequencing (NGS) data, from which genotypes are called with high uncertainty. Here we present a software tool, NGSremix, for maximum likelihood estimation of relatedness between pairs of admixed individuals from low-depth NGS data, which takes the uncertainty of the genotypes into account via genotype likelihoods. Using both simulated and real NGS data for admixed individuals with an average depth of 4x or below, we show that our method works well and clearly outperforms the commonly used state-of-the-art relatedness estimation methods PLINK, KING, relateAdmix, and ngsRelate, all of which perform quite poorly. Hence, NGSremix is a useful new tool for estimating relatedness in admixed populations from low-depth NGS data. NGSremix is implemented in C/C++ as multi-threaded software and is freely available on GitHub: https://github.com/KHanghoj/NGSremix.
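To illustrate why hard genotype calls are unreliable at low depth, here is a minimal sketch of a textbook diploid genotype-likelihood model. It is not taken from NGSremix, and the abstract does not describe how the genotype likelihoods are computed. The function names, the biallelic A/G site, the flat prior implied by normalizing, and the 1% per-base error rate are all illustrative assumptions.

```python
def genotype_likelihoods(bases, error_rates, ref="A", alt="G"):
    """Return P(reads | g) for g = 0, 1, 2 copies of the alt allele.

    Assumed model: each read samples one of the two chromosomes at random,
    and a sequencing error turns the true base into any of the other three
    bases with equal probability.
    """
    likelihoods = []
    for alt_copies in (0, 1, 2):
        alt_fraction = alt_copies / 2
        likelihood = 1.0
        for base, err in zip(bases, error_rates):
            p_if_ref = 1 - err if base == ref else err / 3
            p_if_alt = 1 - err if base == alt else err / 3
            likelihood *= (1 - alt_fraction) * p_if_ref + alt_fraction * p_if_alt
        likelihoods.append(likelihood)
    return likelihoods


def normalize(values):
    """Scale values to sum to 1 (posterior under a flat genotype prior)."""
    total = sum(values)
    return [v / total for v in values]


# Hypothetical site covered by 2 reads, both showing the reference base.
low_depth = normalize(genotype_likelihoods(["A", "A"], [0.01, 0.01]))

# The same site covered by 20 reads, all showing the reference base.
high_depth = normalize(genotype_likelihoods(["A"] * 20, [0.01] * 20))

for label, probs in (("depth 2", low_depth), ("depth 20", high_depth)):
    print(label, [round(p, 4) for p in probs])
```

In this toy model, two reference reads still leave a substantial probability that the site is heterozygous, because both reads may have sampled the same chromosome. Passing the three likelihoods downstream, rather than a single called genotype, preserves that uncertainty.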


Molecules ◽  
2018 ◽  
Vol 23 (2) ◽  
pp. 399 ◽  
Author(s):  
Sima Taheri ◽  
Thohirah Lee Abdullah ◽  
Mohd Yusop ◽  
Mohamed Hanafi ◽  
Mahbod Sahebi ◽  
...  

PLoS ONE ◽  
2015 ◽  
Vol 10 (10) ◽  
pp. e0139868 ◽  
Author(s):  
Mohan A. V. S. K. Katta ◽  
Aamir W. Khan ◽  
Dadakhalandar Doddamani ◽  
Mahendar Thudi ◽  
Rajeev K. Varshney

Blood ◽  
2015 ◽  
Vol 126 (23) ◽  
pp. 3854-3854 ◽  
Author(s):  
Amy E Knight Johnson ◽  
Lucia Guidugli ◽  
Kelly Arndt ◽  
Gorka Alkorta-Aranburu ◽  
Viswateja Nelakuditi ◽  
...  

Abstract Introduction: Myelodysplastic syndrome (MDS) and acute leukemia (AL) are a clinically diverse and genetically heterogeneous group of hematologic malignancies. Familial forms of MDS/AL have been increasingly recognized in recent years, and can occur as a primary event or secondary to genetic syndromes, such as inherited bone marrow failure syndromes (IBMFS). It is critical to confirm a genetic diagnosis in patients with hereditary predisposition to hematologic malignancies in order to provide prognostic information and cancer risk assessment, and to aid in identification of at-risk or affected family members. In addition, a molecular diagnosis can help tailor medical management, including informing the selection of family members as allogeneic stem cell transplantation donors. Until recently, clinical testing options for this diverse group of hematologic malignancy predisposition genes were limited to the evaluation of single genes by Sanger sequencing, which is a time-consuming and expensive process. To improve the diagnosis of hereditary predisposition to hematologic malignancies, our CLIA-licensed laboratory has recently developed Next-Generation Sequencing (NGS) panel-based testing for these genes. Methods: Thirty-six patients with personal and/or family history of aplastic anemia, MDS or AL were referred for clinical diagnostic testing. DNA from the referred patients was obtained from cultured skin fibroblasts or peripheral blood and was used to prepare libraries with the SureSelectXT Enrichment System. Libraries were sequenced on an Illumina MiSeq instrument and the NGS data were analyzed with a custom bioinformatic pipeline, targeting a panel of 76 genes associated with IBMFS and/or familial MDS/AL. Results: Pathogenic and highly likely pathogenic variants were identified in 7 out of 36 patients analyzed, providing a positive molecular diagnostic rate of 20%. Overall, 6 out of the 7 pathogenic changes identified were novel.
In 2 unrelated patients with MDS, heterozygous pathogenic sequence changes were identified in the GATA2 gene. Heterozygous pathogenic changes in the following autosomal dominant genes were each identified in a single patient: RPS26 (Diamond-Blackfan anemia 10), RUNX1 (familial platelet disorder with propensity to myeloid malignancy), TERT (dyskeratosis congenita 4) and TINF2 (dyskeratosis congenita 3). In addition, one novel heterozygous sequence change (c.826+5_826+9del, p.?) in the Fanconi anemia associated gene FANCA was identified. RNA analysis demonstrated that this variant causes skipping of exon 9 and results in a premature stop codon in exon 10. Further review of the NGS data provided evidence of an additional large heterozygous multi-exon deletion in FANCA in the same patient. This large deletion was confirmed using array-CGH (comparative genomic hybridization). Conclusions: This study demonstrates the effectiveness of using NGS technology to identify patients with a hereditary predisposition to hematologic malignancies. As many of the genes associated with hereditary predisposition to hematologic malignancies have similar or overlapping clinical presentations, analysis of a diverse panel of genes is an efficient and cost-effective approach to molecular diagnostics for these disorders. Unlike Sanger sequencing, NGS technology also has the potential to identify large exonic deletions and duplications. In addition, the RNA splicing assay proved helpful in clarifying the pathogenicity of variants suspected to affect splicing. This approach will also allow for identification of a molecular defect in patients who may have an atypical presentation of disease. Disclosures: No relevant conflicts of interest to declare.


2020 ◽  
Vol 22 (Supplement_2) ◽  
pp. ii164-ii164
Author(s):  
Mary Jane Lim-Fat ◽  
Gilbert Youssef ◽  
Mehdi Touat ◽  
Bryan Iorgulescu ◽  
Eleanor Woodward ◽  
...  

Abstract BACKGROUND Comprehensive next-generation sequencing (NGS) is available through many academic institutions and commercial entities, and is incorporated in practice guidelines for glioblastoma (GBM). We retrospectively evaluated the practice patterns and utility of incorporating NGS data into routine care of GBM patients at a clinical trials-focused academic center. METHODS We identified 1,011 consecutive adult patients with histologically confirmed GBM with OncoPanel testing, a targeted exome NGS platform of 447 cancer-associated genes at Dana-Farber Cancer Institute (DFCI), from 2013-2019. We selected and retrospectively reviewed clinical records of all IDH-wildtype GBM patients treated at DFCI. RESULTS We identified 557 GBM IDH-wildtype patients, of whom 227 were male (40.7%). OncoPanel testing revealed 833 single nucleotide variants and indels in 44 therapeutically relevant genes (Tier 1 or 2 mutations) including PIK3CA (n=51), BRAF (n=9), FGFR1 (n=8), MSH2 (n=4), MSH6 (n=2) and MLH1 (n=1). Copy number analysis revealed 509 alterations in 18 therapeutically relevant genes including EGFR amplification (n=186), PDGFRA amplification (n=39) and CDKN2A/2B homozygous loss (n=223). Median overall survival was 17.5 months for the whole cohort. Seventy-four therapeutic clinical trials accrued 144 patients in the upfront setting (25.9%) and 203 patients (36.4%) at recurrence. Altogether, NGS data for 107 patients (19.2%) were utilized for clinical trial enrollment or targeted therapy indications. High mutational burden (>17 mutations/Mb) was identified in 11/464 samples (2.4%); 3 of these 11 patients received immune checkpoint blockade. Four patients received compassionate use therapy targeting EGFRvIII (rindopepimut, n=2), CDK4/6 (abemaciclib, n=1) and BRAFV600E (dabrafenib/trametinib, n=1).
CONCLUSION While NGS has greatly improved diagnosis and molecular classification, we highlight that NGS remains underutilized in selecting therapy in GBM, even in a setting where clinical trials and off-label therapies are relatively accessible. Continued efforts to develop better targeted therapies and efficient clinical trial design are required to maximize the potential benefits of genomically-stratified data.


F1000Research ◽  
2015 ◽  
Vol 4 ◽  
pp. 50 ◽  
Author(s):  
Michael T. Wolfinger ◽  
Jörg Fallmann ◽  
Florian Eggenhofer ◽  
Fabian Amman

Recent achievements in next-generation sequencing (NGS) technologies have led to a high demand for reusable software components to easily compile customized analysis workflows for big genomics data. We present ViennaNGS, an integrated collection of Perl modules focused on building efficient pipelines for NGS data processing. It comes with functionality for extracting and converting features from common NGS file formats, computation and evaluation of read mapping statistics, as well as normalization of RNA abundance. Moreover, ViennaNGS provides software components for identification and characterization of splice junctions from RNA-seq data, parsing and condensing sequence motif data, automated construction of Assembly and Track Hubs for the UCSC genome browser, as well as wrapper routines for a set of commonly used NGS command line tools.

