Integrative Analysis of Next-Generation Sequencing for Next-Generation Cancer Research toward Artificial Intelligence

Cancers ◽  
2021 ◽  
Vol 13 (13) ◽  
pp. 3148
Author(s):  
Youngjun Park ◽  
Dominik Heider ◽  
Anne-Christin Hauschild

The rapid improvement of next-generation sequencing (NGS) technologies and their application to large-scale cancer research cohorts have led to the common challenges of big data and opened a new research area that incorporates systems biology and machine learning. As large-scale NGS data accumulated, sophisticated data analysis methods became indispensable. In addition, NGS data have been integrated with systems biology to build better predictive models for determining the characteristics of tumors and tumor subtypes, and various machine learning algorithms have been introduced to identify the underlying biological mechanisms. In this work, we review novel technologies developed for NGS data analysis and describe how these computational methodologies integrate systems biology and omics data. Subsequently, we discuss how deep neural networks outperform other approaches, the potential of graph neural networks (GNNs) in systems biology, and the limitations of NGS-based biomedical research. To reflect on the various challenges and corresponding computational solutions, we discuss three topics: (i) molecular characteristics, (ii) tumor heterogeneity, and (iii) drug discovery. We conclude that machine learning and network-based approaches can add valuable insights and build highly accurate models. However, a well-informed choice of learning algorithm and biological network information is crucial for the success of each specific research question.
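The review highlights graph neural networks (GNNs) as a way to combine omics data with biological network information. As a hedged illustration, not the authors' code, the sketch below implements a single graph-convolution layer in the style of the Kipf and Welling GCN over a hypothetical four-gene pathway. The gene features, the network and the weight matrix are all made up; a real model would learn the weights by gradient descent.

```python
import math

def normalized_adjacency(n, edges):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected graph with n nodes."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0  # self-loop so each gene keeps its own signal
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(x, y):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def gcn_layer(a_hat, h, w):
    """One graph-convolution layer: ReLU(A_hat @ H @ W)."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in z]

# Hypothetical pathway 0-1-2-3; features per gene: [expression level, mutated (0/1)]
edges = [(0, 1), (1, 2), (2, 3)]
h = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [2.0, 0.0]]
w = [[1.0, 0.0], [0.0, 1.0]]  # identity weights: output shows pure neighbourhood mixing
a_hat = normalized_adjacency(4, edges)
out = gcn_layer(a_hat, h, w)
for gene, row in enumerate(out):
    print(gene, [round(v, 3) for v in row])
```

Gene 2 starts with all-zero features but receives a non-zero representation from its neighbours (genes 1 and 3). This neighbourhood aggregation is what lets a GNN use network context when making per-gene or per-sample predictions.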

2016 ◽  
Vol 62 (4) ◽  
pp. 647-654 ◽  
Author(s):  
Tyler F Beck ◽  
James C Mullikin ◽  
Leslie G Biesecker ◽  

Abstract
BACKGROUND: Next-generation sequencing (NGS) data are used for both clinical care and clinical research. DNA sequence variants identified using NGS are often returned to patients/participants as part of clinical or research protocols. The current standard of care is to validate NGS variants using Sanger sequencing, which is costly and time-consuming.
METHODS: We performed a large-scale, systematic evaluation of Sanger-based validation of NGS variants using data from the ClinSeq® project. We first used NGS data from 19 genes in 5 participants, comparing them to high-throughput Sanger sequencing results on the same samples, and found no discrepancies among 234 NGS variants. We then compared NGS variants in 5 genes from 684 participants against data from Sanger sequencing.
RESULTS: Of over 5800 NGS-derived variants, 19 were not validated by Sanger data. Using newly designed sequencing primers, Sanger sequencing confirmed 17 of these NGS variants, and the remaining 2 variants had low quality scores from exome sequencing. Overall, we measured a validation rate of 99.965% for NGS variants using Sanger sequencing, which is higher than that of many existing medical tests that do not require orthogonal validation.
CONCLUSIONS: A single round of Sanger sequencing is more likely to incorrectly refute a true-positive variant from NGS than to correctly identify a false-positive variant from NGS. Validation of NGS-derived variants using Sanger sequencing has limited utility, and best practice standards should not include routine orthogonal Sanger validation of NGS variants.
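The validation rate and the conclusion about Sanger errors follow from simple arithmetic on the counts in this abstract. The sketch below reproduces that arithmetic under two stated assumptions: 5800 is used as the denominator although the abstract only says "over 5800", and the 2 unconfirmed low-quality variants are treated as candidate NGS false positives. The function name is hypothetical.

```python
import math

def validation_summary(total_variants, not_validated, confirmed_on_retest):
    """Summarize Sanger validation of NGS variants (hypothetical helper).

    total_variants: NGS variants checked by Sanger sequencing
    not_validated: variants not validated in the first Sanger round
    confirmed_on_retest: of those, variants confirmed with redesigned primers
    """
    unresolved = not_validated - confirmed_on_retest  # candidate NGS false positives
    return {
        "first_round_concordance": 1 - not_validated / total_variants,
        "final_validation_rate": 1 - unresolved / total_variants,
        # true NGS calls wrongly refuted by Sanger, per candidate NGS false positive
        "sanger_errors_per_unresolved": (
            confirmed_on_retest / unresolved if unresolved else math.inf
        ),
    }

# Counts from the abstract; 5800 is a lower bound ("over 5800"), not the exact total.
summary = validation_summary(5800, 19, 17)
print(f"First-round Sanger concordance:    {summary['first_round_concordance']:.4%}")
print(f"Final validation rate:             {summary['final_validation_rate']:.4%}")
print(f"Sanger errors per unresolved call: {summary['sanger_errors_per_unresolved']:.1f}")
```

With 5800 as the denominator, the final rate is 1 - 2/5800, about 99.9655%, which is consistent with the reported 99.965%; a larger true denominator would push it slightly higher. The 17-to-2 ratio (8.5) is the arithmetic behind the conclusion that a single Sanger round more often refutes a true NGS call than it catches a false one.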


Author(s):  
Sergey Knyazev ◽  
Lauren Hughes ◽  
Pavel Skums ◽  
Alexander Zelikovsky

Abstract
The unprecedented coverage offered by next-generation sequencing (NGS) technology has made it possible to assess the complexity of intra-host RNA viral populations in great detail. Consequently, NGS datasets can be analyzed to extract and infer crucial epidemiological and biomedical information at the level of both infected individuals and susceptible populations, enabling the development of more effective prevention strategies and antiviral therapeutics. Such information includes drug resistance, infection stage, transmission clusters and the structure of transmission networks. However, NGS data require sophisticated analysis that can handle millions of error-prone short reads per patient. Before the NGS era, epidemiological and phylogenetic analyses were geared toward Sanger sequencing technology; now they must be redesigned to handle large-scale NGS datasets and to properly model the evolution of heterogeneous, rapidly mutating viral populations. Additionally, dedicated epidemiological surveillance systems require big data analytics to handle millions of reads obtained from thousands of patients for rapid outbreak investigation and management. We survey bioinformatics tools that analyze NGS data for (i) characterization of intra-host viral population complexity, including single nucleotide variant and haplotype calling; (ii) downstream epidemiological analysis and inference of drug-resistant mutations, age of infection and linkage between patients; and (iii) data collection and analytics in surveillance systems for fast response to and control of outbreaks.
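Intra-host single nucleotide variant calling has to separate genuine low-frequency alleles from errors in millions of short reads. A common idea behind low-frequency callers is to ask whether an allele's read count exceeds what sequencing error alone would produce. The sketch below is a simplified illustration, not any tool surveyed in the paper: it applies a one-sided binomial test per allele. The uniform 1% per-base error rate, the strict threshold (a crude guard against testing many positions), the pileup format and the choice of the most frequent base as the major allele are all assumptions.

```python
import math

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid underflow."""
    if k <= 0:
        return 1.0
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k, n + 1)
    ]
    m = max(log_terms)
    return min(1.0, math.exp(m) * sum(math.exp(t - m) for t in log_terms))

def call_minor_variants(pileup, error_rate=0.01, alpha=1e-6):
    """Flag minor alleles whose read counts exceed what sequencing error explains.

    pileup: {position: {base: read_count}} (hypothetical format); the most
    frequent base at each position is treated as the major allele.
    Returns (position, base, frequency, p_value) tuples for significant alleles.
    """
    calls = []
    for pos, counts in sorted(pileup.items()):
        depth = sum(counts.values())
        major = max(counts, key=counts.get)
        for base, k in counts.items():
            if base == major or k == 0:
                continue
            p_value = binom_sf(k, depth, error_rate)
            if p_value < alpha:
                calls.append((pos, base, k / depth, p_value))
    return calls

# Hypothetical pileup: a 5% minor allele at 101, error-level noise (0.5%) at 102
pileup = {101: {"A": 1900, "G": 100}, 102: {"C": 1990, "T": 10}}
for pos, base, freq, p in call_minor_variants(pileup):
    print(pos, base, f"{freq:.1%}", f"p={p:.2e}")
```

Only position 101 is reported: at a depth of 2000 reads and 1% error, about 20 erroneous reads per position are expected, so 10 alternative reads are unremarkable while 100 are not. Real callers also model position- and quality-specific error rates and strand bias.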


2017 ◽  
Vol 3 (3) ◽  
pp. 44 ◽  
Author(s):  
Radek Vodicka ◽  
Radek Vrtel ◽  
Katerina Mensikova ◽  
Petr Kanovsky ◽  
Iva Dolinova ◽  
...  

Parkinson's disease (PD) can be caused by genetic changes in many genes. The effect of these changes is determined by the nature of the mutation and ranges from weak associations to pathogenic mutations that lead to loss of protein function. Our study is based on epidemiological data showing a significantly increased prevalence of PD (2.9%) in an isolated population of South-Eastern Moravia in the Czech Republic. We compared two different Next Generation Sequencing (NGS) data analysis approaches on DNA from 28 PD patients in the genes responsible for Parkinsonism (ADH1C, ATP13A2, EIF4G1, FBXO7, GBA + GBAP1, GIGYF2, HTRA2, LRRK2, MAPT, PARK2, PARK7, PINK1, PLA2G6, SNCA, UCHL1 and VPS35), using: 1) already described rare missense variants or pathogenic mutations, and 2) twelve control DNA samples from the same isolated population. Ion Torrent NGS data processing and trimming, from FASTQ through BAM to VCF files, was done in parallel with Torrent Suite/Ion Reporter and NextGENe software. After filtering, three missense mutations were found in the LRRK2 gene: rs33995883 in 6/0 patients/controls (p/c); rs33958906 in 1/1 p/c; rs781737269 in 3/0 p/c; one missense mutation in the MAPT gene, rs63750072, in 6/1 p/c; and one mutation in the HTRA2 gene, rs72470545, in 3/1 p/c. The variant calls from NextGENe (with Ion Torrent adaptation) and from Ion Reporter correlated significantly. Our study may help to further explain the genetic background of Parkinsonism.
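Comparing two pipelines, as done here with NextGENe and Ion Reporter, usually starts by checking how many variant calls the two VCF outputs share. The sketch below is a minimal, hypothetical concordance check, not the authors' procedure: it keys each call on (CHROM, POS, REF, ALT), splits multi-allelic ALT fields, and reports shared and pipeline-specific calls plus the Jaccard index. It measures overlap rather than correlation, assumes both files are already normalized (same reference, left-aligned indels), and the VCF records are invented.

```python
def parse_vcf_variants(vcf_text):
    """Return a set of (CHROM, POS, REF, ALT) keys from VCF text.

    Header lines starting with '#' are skipped; multi-allelic ALT fields are split.
    """
    variants = set()
    for line in vcf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        for allele in alt.split(","):
            variants.add((chrom, int(pos), ref, allele))
    return variants

def concordance(calls_a, calls_b):
    """Count shared and pipeline-specific calls and the Jaccard index."""
    shared = calls_a & calls_b
    union = calls_a | calls_b
    return {
        "shared": len(shared),
        "only_a": len(calls_a - calls_b),
        "only_b": len(calls_b - calls_a),
        "jaccard": len(shared) / len(union) if union else 1.0,
    }

HEADER = ["##fileformat=VCFv4.2", "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"]
# Hypothetical outputs of two pipelines run on the same sample
vcf_a = "\n".join(HEADER + [
    "chr1\t1000\t.\tA\tG\t50\tPASS\t.",
    "chr1\t2000\t.\tC\tT\t40\tPASS\t.",
    "chr2\t3000\t.\tG\tA,C\t60\tPASS\t.",
])
vcf_b = "\n".join(HEADER + [
    "chr1\t1000\t.\tA\tG\t55\tPASS\t.",
    "chr2\t3000\t.\tG\tA\t58\tPASS\t.",
    "chr3\t4000\t.\tT\tC\t30\tPASS\t.",
])
print(concordance(parse_vcf_variants(vcf_a), parse_vcf_variants(vcf_b)))
```

Keying on the full allele rather than only the position avoids counting two pipelines as concordant when they report different alternative bases at the same site.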


2021 ◽  
pp. 401-410
Author(s):  
Anna S. Sowa ◽  
Lisa Dussling ◽  
Jörg Hagmann ◽  
Sebastian J. Schultheiss

Abstract
The wide application of next-generation sequencing (NGS) has facilitated and accelerated causal gene finding and breeding in the field of plant sciences. A wide variety of techniques and computational strategies is available, and these need to be appropriately tailored to the species, the genetic architecture of the trait of interest, the breeding system and the available resources. Using these NGS methods, the typical computational steps of marker discovery, genetic mapping and identification of causal mutations can be achieved in a single step in a cost- and time-efficient manner. Rather than focusing on a few high-impact genetic variants that explain phenotypes, increased computational power allows phenotypes to be modelled from genome-wide molecular markers, an approach known as genomic selection (GS). Based solely on this genotype information, modern GS approaches can accurately predict breeding values for a given trait (the average effects of alleles over all loci that are anticipated to be transferred from the parent to the progeny), provided they are trained on a large population of genotyped and phenotyped individuals (Crossa et al., 2017). Once trained, the model greatly reduces breeding time and costs. We advocate improving conventional GS methods by applying advanced techniques based on machine learning (ML) and outline how this approach can also be used for causal gene finding. Beyond the genetic causes of agronomically important traits, epigenetic mechanisms such as DNA methylation play a crucial role in shaping phenotypes and can become interesting targets in breeding pipelines. We highlight an ML approach shown to sensitively detect functional methylation changes from NGS data. We give an overview of commonly applied strategies and provide practical considerations for choosing and performing NGS-based gene finding and NGS-assisted breeding.
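Genomic selection, as described in this abstract, predicts breeding values from genome-wide markers using a model trained on genotyped and phenotyped individuals. The sketch below shows one conventional GS baseline, ridge regression on allele dosages (often called RR-BLUP), in pure Python. It is not the ML method the authors advocate. The 0/1/2 marker coding, the penalty value, the helper names and the toy data are assumptions for illustration.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_ridge_gs(genotypes, phenotypes, lam=1.0):
    """Ridge-regression genomic selection (RR-BLUP style).

    genotypes: rows = individuals, columns = markers coded 0/1/2 (allele dosage).
    Returns (marker_means, phenotype_mean, marker_effects).
    """
    n, p = len(genotypes), len(genotypes[0])
    means = [sum(g[j] for g in genotypes) / n for j in range(p)]
    x = [[g[j] - means[j] for j in range(p)] for g in genotypes]  # centre markers
    y_mean = sum(phenotypes) / n
    y = [v - y_mean for v in phenotypes]
    # Penalized normal equations: (X'X + lam * I) beta = X'y
    xtx = [[sum(x[i][a] * x[i][b] for i in range(n)) + (lam if a == b else 0.0)
            for b in range(p)] for a in range(p)]
    xty = [sum(x[i][a] * y[i] for i in range(n)) for a in range(p)]
    return means, y_mean, solve(xtx, xty)

def predict_genomic_values(model, genotypes):
    """Phenotype mean plus the summed marker effects for each individual."""
    means, y_mean, beta = model
    return [y_mean + sum((g[j] - means[j]) * beta[j] for j in range(len(beta)))
            for g in genotypes]

# Toy training population: 6 individuals x 3 markers (hypothetical data)
genotypes = [[0, 1, 2], [1, 0, 1], [2, 2, 0], [1, 1, 1], [0, 2, 1], [2, 0, 2]]
phenotypes = [9, 12, 12, 11, 8, 14]  # generated as 10 + 2*m0 - m1 for illustration
model = fit_ridge_gs(genotypes, phenotypes, lam=0.1)
print("marker effects:", [round(b, 3) for b in model[2]])
print("prediction for [2, 1, 0]:", round(predict_genomic_values(model, [[2, 1, 0]])[0], 3))
```

The penalty `lam` shrinks all marker effects toward zero, which keeps the model stable when markers far outnumber individuals, the usual situation in GS; ML approaches replace this linear model with more flexible learners.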

