A practical, bioinformatic workflow system for large data sets generated by next-generation sequencing

Context.—Advances in sequencing technology with the commercialization of next-generation sequencing (NGS) has substantially increased the feasibility of sequencing human genomes and exomes. Next-generation sequencing has been successfully applied to the discovery of disease-causing genes in rare, inherited disorders. By necessity, the advent of NGS has fostered the concurrent development of bioinformatics approaches to expeditiously analyze the large data sets generated. Next-generation sequencing has been used for important discoveries in the research setting and is now being implemented into the clinical diagnostic arena. Objective.—To review the current literature on technical and bioinformatics approaches for exome and genome sequencing and highlight examples of successful disease gene discovery in inherited disorders. To discuss the challenges for implementing NGS in the clinical research and diagnostic arenas. Data Sources.—Literature review and authors' experience. Conclusions.—Next-generation sequencing approaches are powerful and require an investment in infrastructure and personnel expertise for effective use; however, the potential for improvement of patient care through faster and more accurate molecular diagnoses is high.

Download Full-text

Next Generation Sequencing Technology in the Clinic and Its Challenges

Cancers ◽

10.3390/cancers13081751 ◽

2021 ◽

Vol 13 (8) ◽

pp. 1751

Author(s):

Lau K. Vestergaard ◽

Douglas N. P. Oliveira ◽

Claus K. Høgdall ◽

Estrid V. Høgdall

Keyword(s):

Next Generation Sequencing ◽

Large Data ◽

Data Sets ◽

Next Generation ◽

Biological Drugs ◽

Next Generation Sequencing Technology ◽

Bioinformatic Tools ◽

Sequencing Technologies ◽

Genome Complexity ◽

Generation Sequencing

Data analysis has become a crucial aspect in clinical oncology to interpret output from next-generation sequencing-based testing. NGS being able to resolve billions of sequencing reactions in a few days has consequently increased the demand for tools to handle and analyze such large data sets. Many tools have been developed since the advent of NGS, featuring their own peculiarities. Increased awareness when interpreting alterations in the genome is therefore of utmost importance, as the same data using different tools can provide diverse outcomes. Hence, it is crucial to evaluate and validate bioinformatic pipelines in clinical settings. Moreover, personalized medicine implies treatment targeting efficacy of biological drugs for specific genomic alterations. Here, we focused on different sequencing technologies, features underlying the genome complexity, and bioinformatic tools that can impact the final annotation. Additionally, we discuss the clinical demand and design for implementing NGS.

Download Full-text

k-mer-Based Metagenomics Tools Provide a Fast and Sensitive Approach for the Detection of Viral Contaminants in Biopharmaceutical and Vaccine Manufacturing Applications Using Next-Generation Sequencing

mSphere ◽

10.1128/msphere.01336-20 ◽

2021 ◽

Vol 6 (2) ◽

Author(s):

Madolyn L. MacDonald ◽

Shawn W. Polson ◽

Kelvin H. Lee

Keyword(s):

Next Generation Sequencing ◽

Sensitivity And Specificity ◽

Data Sets ◽

Next Generation ◽

Viral Detection ◽

Read Alignment ◽

Ensure Patient Safety ◽

Hela Cell Lines ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing

ABSTRACT Adventitious agent detection during the production of vaccines and biotechnology-based medicines is of critical importance to ensure the final product is free from any possible viral contamination. Increasing the speed and accuracy of viral detection is beneficial as a means to accelerate development timelines and to ensure patient safety. Here, several rapid viral metagenomics approaches were tested on simulated next-generation sequencing (NGS) data sets and existing data sets from virus spike-in studies done in CHO-K1 and HeLa cell lines. It was observed that these rapid methods had comparable sensitivity to full-read alignment methods used for NGS viral detection for these data sets, but their specificity could be improved. A method that first filters host reads using KrakenUniq and then selects the virus classification tool based on the number of remaining reads is suggested as the preferred approach among those tested to detect nonlatent and nonendogenous viruses. Such an approach shows reasonable sensitivity and specificity for the data sets examined and requires less time and memory as full-read alignment methods. IMPORTANCE Next-generation sequencing (NGS) has been proposed as a complementary method to detect adventitious viruses in the production of biotherapeutics and vaccines to current in vivo and in vitro methods. Before NGS can be established in industry as a main viral detection technology, further investigation into the various aspects of bioinformatics analyses required to identify and classify viral NGS reads is needed. In this study, the ability of rapid metagenomics tools to detect viruses in biopharmaceutical relevant samples is tested and compared to recommend an efficient approach. The results showed that KrakenUniq can quickly and accurately filter host sequences and classify viral reads and had comparable sensitivity and specificity to slower full read alignment approaches, such as BLASTn, for the data sets examined.

Download Full-text

VikNGS: A C ++ Variant Integration Kit for Next Generation Sequencing Association Analysis

Bioinformatics ◽

10.1093/bioinformatics/btz716 ◽

2019 ◽

Cited By ~ 1

Author(s):

Zeynep Baskurt ◽

Scott Mastromatteo ◽

Jiafen Gong ◽

Richard F Wintle ◽

Stephen W Scherer ◽

...

Keyword(s):

Next Generation Sequencing ◽

Genetic Association ◽

Association Analysis ◽

Supplementary Information ◽

Next Generation Sequencing Data ◽

Data Sets ◽

Next Generation ◽

Sequencing Data ◽

Combining Data ◽

Generation Sequencing

Abstract Integration of next generation sequencing data (NGS) across different research studies can improve the power of genetic association testing by increasing sample size and can obviate the need for sequencing controls. If differential genotype uncertainty across studies is not accounted for, combining data sets can produce spurious association results. We developed the Variant Integration Kit for NGS (VikNGS), a fast cross-platform software package, to enable aggregation of several data sets for rare and common variant genetic association analysis of quantitative and binary traits with covariate adjustment. VikNGS also includes a graphical user interface, power simulation functionality and data visualization tools. Availability The VikNGS package can be downloaded at http://www.tcag.ca/tools/index.html. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

Application of SNPViz v2.0 using next-generation sequencing data sets in the discovery of potential causative mutations in candidate genes associated with phenotypes

International Journal of Data Mining and Bioinformatics ◽

10.1504/ijdmb.2021.116886 ◽

2021 ◽

Vol 25 (1/2) ◽

pp. 65

Author(s):

Shuai Zeng ◽

Mária Škrabišová ◽

Zhen Lyu ◽

Yen On Chan ◽

Nicholas Dietz ◽

...

Keyword(s):

Next Generation Sequencing ◽

Candidate Genes ◽

Next Generation Sequencing Data ◽

Data Sets ◽

Next Generation ◽

Sequencing Data ◽

Generation Sequencing

Download Full-text

A Cluster-Based Approach for the Discovery of Copy Number Variations From Next-Generation Sequencing Data

Frontiers in Genetics ◽

10.3389/fgene.2021.699510 ◽

2021 ◽

Vol 12 ◽

Author(s):

Guojun Liu ◽

Junying Zhang

Keyword(s):

Next Generation Sequencing ◽

Copy Number ◽

Read Depth ◽

Copy Number Variations ◽

Experimental Results ◽

Data Sets ◽

Next Generation ◽

Sequencing Data ◽

Generation Sequencing ◽

Cnv Detection

The next-generation sequencing technology offers a wealth of data resources for the detection of copy number variations (CNVs) at a high resolution. However, it is still challenging to correctly detect CNVs of different lengths. It is necessary to develop new CNV detection tools to meet this demand. In this work, we propose a new CNV detection method, called CBCNV, for the detection of CNVs of different lengths from whole genome sequencing data. CBCNV uses a clustering algorithm to divide the read depth segment profile, and assigns an abnormal score to each read depth segment. Based on the abnormal score profile, Tukey’s fences method is adopted in CBCNV to forecast CNVs. The performance of the proposed method is evaluated on simulated data sets, and is compared with those of several existing methods. The experimental results prove that the performance of CBCNV is better than those of several existing methods. The proposed method is further tested and verified on real data sets, and the experimental results are found to be consistent with the simulation results. Therefore, the proposed method can be expected to become a routine tool in the analysis of CNVs from tumor-normal matched samples.

Download Full-text

Assembling and Validating Bioinformatic Pipelines for Next-Generation Sequencing Clinical Assays

Archives of Pathology & Laboratory Medicine ◽

10.5858/arpa.2019-0476-ra ◽

2020 ◽

Vol 144 (9) ◽

pp. 1118-1130 ◽

Cited By ~ 2

Author(s):

Jeffrey A SoRelle ◽

Megan Wachsmann ◽

Brandi L. Cantarel

Keyword(s):

Genetic Variation ◽

Next Generation Sequencing ◽

Proficiency Testing ◽

Real World ◽

Data Exchange ◽

Genetic Alterations ◽

Data Sets ◽

Next Generation ◽

Clinical Laboratories ◽

Generation Sequencing

Context.— Clinical next-generation sequencing (NGS) is being rapidly adopted, but analysis and interpretation of large data sets prompt new challenges for a clinical laboratory setting. Clinical NGS results rely heavily on the bioinformatics pipeline for identifying genetic variation in complex samples. The choice of bioinformatics algorithms, genome assembly, and genetic annotation databases are important for determining genetic alterations associated with disease. The analysis methods are often tuned to the assay to maximize accuracy. Once a pipeline has been developed, it must be validated to determine accuracy and reproducibility for samples similar to real-world cases. In silico proficiency testing or institutional data exchange will ensure consistency among clinical laboratories. Objective.— To provide molecular pathologists a step-by-step guide to bioinformatics analysis and validation design in order to navigate the regulatory and validation standards of implementing a bioinformatic pipeline as a part of a new clinical NGS assay. Data Sources.— This guide uses published studies on genomic analysis, bioinformatics methods, and methods comparison studies to inform the reader on what resources, including open source software tools and databases, are available for genetic variant detection and interpretation. Conclusions.— This review covers 4 key concepts: (1) bioinformatic analysis design for detecting genetic variation, (2) the resources for assessing genetic effects, (3) analysis validation assessment experiments and data sets, including a diverse set of samples to mimic real-world challenges that assess accuracy and reproducibility, and (4) if concordance between clinical laboratories will be improved by proficiency testing designed to test bioinformatic pipelines.

Download Full-text

Visualizing Next-Generation Sequencing Cancer Data Sets with Cloud Computing

Lecture Notes in Computer Science - Advanced Visual Interfaces. Supporting Big Data Applications ◽

10.1007/978-3-319-50070-6_4 ◽

2016 ◽

pp. 50-62 ◽

Cited By ~ 1

Author(s):

Paul Walsh ◽

Brendan Lawlor ◽

Brian Kelly ◽

Timmy Manning ◽

Timm Heuss ◽

...

Keyword(s):

Cloud Computing ◽

Next Generation Sequencing ◽

Data Sets ◽

Next Generation ◽

Cancer Data ◽

Generation Sequencing

Download Full-text

Adaptation and Validation of E-Probe Diagnostic Nucleic Acid Analysis for Detection of Escherichia coli O157:H7 in Metagenomic Data from Complex Food Matrices

Journal of Food Protection ◽

10.4315/0362-028x.jfp-15-440 ◽

2016 ◽

Vol 79 (4) ◽

pp. 574-581 ◽

Cited By ~ 5

Author(s):

TRENNA BLAGDEN ◽

WILLIAM SCHNEIDER ◽

ULRICH MELCHER ◽

JON DANIELS ◽

JACQUELINE FLETCHER

Keyword(s):

Escherichia Coli ◽

Next Generation Sequencing ◽

Nucleic Acid ◽

Next Generation Sequencing Data ◽

Data Sets ◽

Escherichia Coli O157 ◽

Next Generation ◽

Sequencing Data ◽

Nucleic Acid Analysis ◽

Generation Sequencing

ABSTRACT The Centers for Disease Control and Prevention recently emphasized the need for enhanced technologies to use in investigations of outbreaks of foodborne illnesses. To address this need, e-probe diagnostic nucleic acid analysis (EDNA) was adapted and validated as a tool for the rapid, effective identification and characterization of multiple pathogens in a food matrix. In EDNA, unassembled next generation sequencing data sets from food sample metagenomes are queried using pathogen-specific sequences known as electronic probes (e-probes). In this study, the query of mock sequence databases demonstrated the potential of EDNA for the detection of foodborne pathogens. The method was then validated using next generation sequencing data sets created by sequencing the metagenome of alfalfa sprouts inoculated with Escherichia coli O157:H7. Nonspecific hits in the negative control sample indicated the need for additional filtration of the e-probes to enhance specificity. There was no significant difference in the ability of an e-probe to detect the target pathogen based upon the length of the probe set oligonucleotides. The results from the queries of the sample database using E. coli e-probe sets were significantly different from those obtained using random decoy probe sets and exhibited 100% precision. The results support the use of EDNA as a rapid response methodology in foodborne outbreaks and investigations for establishing comprehensive microbial profiles of complex food samples.

Download Full-text