D3Oncoprint: Stand-Alone Software to Visualize and Dynamically Explore Annotated Genomic Mutation Files

2018 ◽  
pp. 1-9 ◽  
Author(s):  
Alida Palmisano ◽  
Yingdong Zhao ◽  
Richard M. Simon

Purpose Advances in next-generation sequencing technologies have led to a reduction in sequencing costs, which has increased the availability of genomic data sets to many laboratories. Increasing amounts of sequencing data require effective analysis tools to use genomic data for biologic discovery and patient management. Available packages typically require advanced programming knowledge and system administration privileges, or they are Web services that force researchers to work on outside servers. Methods To support the interactive exploration of genomic data sets on local machines with no programming skills required, we developed D3Oncoprint, a standalone application to visualize and dynamically explore annotated genomic mutation files. D3Oncoprint provides links to curated variant lists from CIViC, My Cancer Genome, OncoKB, and Food and Drug Administration–approved drugs to facilitate the use of genomic data for biomedical discovery and application. D3Oncoprint also includes curated gene lists from BioCarta pathways and FoundationOne cancer panels to explore commonly investigated biologic processes. Results This software provides a flexible environment to dynamically explore one or more variant mutation profiles provided as input. The focus on interactive visualization with biologic and medical annotation significantly lowers the barriers between complex genomics data and biomedical investigators. We describe how D3Oncoprint helps researchers explore their own data without the need for an extensive computational background. Conclusion D3Oncoprint is free software for noncommercial use. It is available for download from the Web site of the Biometric Research Program of the Division of Cancer Treatment and Diagnosis at the National Cancer Institute ( https://brb.nci.nih.gov/d3oncoprint ). We believe that this tool provides an important means of empowering researchers to translate information from collected data sets to biologic insights and clinical development.
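As an illustration of the kind of input such a tool consumes, the sketch below (not D3Oncoprint's code; the MAF-style column names are assumptions about the input format) builds a gene-by-sample oncoprint matrix from an annotated mutation table with pandas.

```python
# Minimal sketch (not D3Oncoprint's implementation): build an oncoprint-style
# gene-by-sample matrix from a MAF-like annotated mutation table.
# Column names are assumptions; the tool's actual input format may differ.
import pandas as pd

def oncoprint_matrix(maf_path: str) -> pd.DataFrame:
    muts = pd.read_csv(maf_path, sep="\t", comment="#")
    # One cell per (gene, sample); keep the variant classification for colouring.
    piv = muts.pivot_table(
        index="Hugo_Symbol",
        columns="Tumor_Sample_Barcode",
        values="Variant_Classification",
        aggfunc=lambda v: ";".join(sorted(set(v))),
    )
    # Order genes by how many samples carry a mutation in them.
    return piv.loc[piv.notna().sum(axis=1).sort_values(ascending=False).index]

# matrix = oncoprint_matrix("cohort_mutations.maf")   # hypothetical file name
# print(matrix.head())
```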

Author(s):  
Giulio Caravagna

Cancers progress through the accumulation of somatic mutations which accrue during tumour evolution, allowing some cells to proliferate in an uncontrolled fashion. This growth process is intimately related to latent evolutionary forces moulding the genetic and epigenetic composition of tumour subpopulations. Understanding cancer therefore requires understanding these selective pressures. The widespread adoption of next-generation sequencing technologies opens up the possibility of measuring molecular profiles of cancers at multiple resolutions, across one or multiple patients. In this review we discuss how cancer genome sequencing data from a single tumour can be used to understand these evolutionary forces, giving an overview of the mathematical models and inferential methods adopted in the field of Cancer Evolution.
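One concrete example of the kind of inferential model reviewed in this literature is the neutral-growth test relating the subclonal site frequency spectrum of a single bulk sample to a power law (e.g. Williams et al., 2016); the formulation below is added here for illustration and follows the standard notation.

```latex
% Under neutral exponential growth, the expected cumulative number of
% subclonal mutations M(f) present at variant allele frequency >= f
% follows a 1/f power law, with mutation rate \mu per effective division
% and growth rate \beta:
\[
  M(f) \;=\; \frac{\mu}{\beta}\left(\frac{1}{f} - \frac{1}{f_{\max}}\right),
\]
% so the linearity of M(f) against 1/f in a single bulk sample provides a
% simple test for the absence of subclonal selection.
```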


Genes ◽  
2021 ◽  
Vol 12 (1) ◽  
pp. 124
Author(s):  
Alessio Iannucci ◽  
Alexey I. Makunin ◽  
Artem P. Lisachov ◽  
Claudio Ciofi ◽  
Roscoe Stanyon ◽  
...  

The study of vertebrate genome evolution is currently facing a revolution, brought about by next generation sequencing technologies that allow researchers to produce nearly complete and error-free genome assemblies. Novel approaches, however, do not always provide a direct link to information on vertebrate genome evolution gained from cytogenetic approaches. It is useful to preserve and link cytogenetic data with novel genomic discoveries. Sequencing of DNA from single isolated chromosomes (ChromSeq) is an elegant approach to determine the chromosome content and assign genome assemblies to chromosomes, thus bridging the gap between cytogenetics and genomics. The aim of this paper is to describe how ChromSeq can support the study of vertebrate genome evolution and how it can help link cytogenetic and genomic data. We show key examples of ChromSeq application in the refinement of vertebrate genome assemblies and in the study of vertebrate chromosome and karyotype evolution. We also provide a general overview of the approach and a concrete example of genome refinement using this method in the species Anolis carolinensis.
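As an illustration of the assignment step, the sketch below (not the authors' pipeline; file names, the mapping-quality cutoff and the read-count threshold are assumptions) maps reads sequenced from each isolated chromosome to an assembly and assigns every scaffold to the chromosome whose reads hit it most.

```python
# Illustrative sketch of the ChromSeq assignment idea: reads from a single
# isolated chromosome are aligned to the genome assembly, and each scaffold
# is assigned to the chromosome whose reads cover it best.
from collections import defaultdict
import pysam

def assign_scaffolds(bam_by_chromosome: dict[str, str], min_reads: int = 100) -> dict[str, str]:
    # counts[scaffold][chromosome_label] = number of well-mapped reads
    counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for chrom_label, bam_path in bam_by_chromosome.items():
        with pysam.AlignmentFile(bam_path) as bam:
            for read in bam:
                if not read.is_unmapped and read.mapping_quality >= 20:
                    counts[read.reference_name][chrom_label] += 1
    assignment = {}
    for scaffold, per_chrom in counts.items():
        best_label, best_n = max(per_chrom.items(), key=lambda kv: kv[1])
        if best_n >= min_reads:          # ignore weakly supported assignments
            assignment[scaffold] = best_label
    return assignment

# assignment = assign_scaffolds({"chr1": "chr1_reads.bam", "chr2": "chr2_reads.bam"})
```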


2018 ◽  
Author(s):  
Adrian Fritz ◽  
Peter Hofmann ◽  
Stephan Majda ◽  
Eik Dahms ◽  
Johannes Dröge ◽  
...  

Shotgun metagenome data sets of microbial communities are highly diverse, not only due to the natural variation of the underlying biological systems, but also due to differences in laboratory protocols, replicate numbers, and sequencing technologies. Accordingly, to effectively assess the performance of metagenomic analysis software, a wide range of benchmark data sets are required. Here, we describe the CAMISIM microbial community and metagenome simulator. The software can model different microbial abundance profiles, multi-sample time series and differential abundance studies, includes real and simulated strain-level diversity, and generates second and third generation sequencing data from taxonomic profiles or de novo. Gold standards are created for sequence assembly, genome binning, taxonomic binning, and taxonomic profiling. CAMISIM generated the benchmark data sets of the first CAMI challenge. For two simulated multi-sample data sets of the human and mouse gut microbiomes, we observed high functional congruence to the real data. As further applications, we investigated the effect of varying evolutionary genome divergence, sequencing depth, and read error profiles on two popular metagenome assemblers, MEGAHIT and metaSPAdes, on several thousand small data sets generated with CAMISIM. CAMISIM can simulate a wide variety of microbial communities and metagenome data sets together with truth standards for method evaluation. All data sets and the software are freely available at: https://github.com/CAMI-challenge/CAMISIM
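The sketch below illustrates only the community-design principle (it is not CAMISIM code; the log-normal parameters and function names are assumptions): draw relative genome abundances from a log-normal distribution and allocate a total read budget in proportion to abundance and genome length.

```python
# Conceptual sketch of simulating a microbial abundance profile and a
# corresponding read budget for each genome.
import numpy as np

def simulate_profile(genome_lengths: dict[str, int], total_reads: int,
                     sigma: float = 2.0, seed: int = 42) -> dict[str, int]:
    rng = np.random.default_rng(seed)
    names = list(genome_lengths)
    # Relative abundances drawn from a log-normal distribution, then normalised.
    abundances = rng.lognormal(mean=0.0, sigma=sigma, size=len(names))
    abundances /= abundances.sum()
    # Reads per genome scale with relative abundance times genome length.
    weights = abundances * np.array([genome_lengths[n] for n in names])
    reads = rng.multinomial(total_reads, weights / weights.sum())
    return dict(zip(names, reads))

# profile = simulate_profile({"genomeA": 4_600_000, "genomeB": 2_100_000}, total_reads=1_000_000)
```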


2019 ◽  
Author(s):  
Kate Chkhaidze ◽  
Timon Heide ◽  
Benjamin Werner ◽  
Marc J. Williams ◽  
Weini Huang ◽  
...  

Quantification of the effect of spatial tumour sampling on the patterns of mutations detected in next-generation sequencing data is largely lacking. Here we use a spatial stochastic cellular automaton model of tumour growth that accounts for somatic mutations, selection, drift and spatial constraints, to simulate multi-region sequencing data derived from spatial sampling of a neoplasm. We show that the spatial structure of a solid cancer has a major impact on the detection of clonal selection and genetic drift from bulk sequencing data and single-cell sequencing data. Our results indicate that spatial constraints can introduce significant sampling biases when performing multi-region bulk sampling and that such bias becomes a major confounding factor for the measurement of the evolutionary dynamics of human tumours. We present a statistical inference framework that takes into account the spatial effects of a growing tumour and allows inferring the evolutionary dynamics from patient genomic data. Our analysis shows that measuring cancer evolution using next-generation sequencing while accounting for the numerous confounding factors requires a mechanistic model-based approach that captures the sources of noise in the data. Summary Sequencing the DNA of cancer cells from human tumours has become one of the main tools to study cancer biology. However, sequencing data are complex and often difficult to interpret. In particular, the way in which the tissue is sampled and the data are collected significantly impacts the interpretation of the results. We argue that understanding cancer genomic data requires mathematical models and computer simulations that tell us what we expect the data to look like, with the aim of understanding the impact of confounding factors and biases in the data generation step. In this study, we develop a spatial simulation of tumour growth that also simulates the data generation process, and demonstrate that biases in the sampling step and current technological limitations severely impact the interpretation of the results. We then provide a statistical framework that can be used to overcome these biases and more robustly measure aspects of the biology of tumours from the data.
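A minimal sketch of this kind of model is shown below; it is not the authors' simulator, all parameters are illustrative, mutations are treated as neutral, and variants are assumed heterozygous in diploid cells when converting to allele fractions.

```python
# Toy 2D cellular automaton: cells divide into empty neighbouring lattice
# sites (the spatial constraint), daughters accumulate new mutations, and
# square "biopsies" are taken to mimic multi-region bulk sampling.
import random

def grow_tumour(size=60, mean_muts=1, seed=1):
    """Grow a tumour on a size x size lattice; each cell carries a set of mutation ids."""
    random.seed(seed)
    grid = {(size // 2, size // 2): set()}          # founder cell, no mutations
    next_id = 0
    frontier = [(size // 2, size // 2)]
    while frontier:
        new_frontier = []
        for (x, y) in frontier:
            empty = [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0)
                     and 0 <= x + dx < size and 0 <= y + dy < size
                     and (x + dx, y + dy) not in grid]
            if not empty:
                continue                             # boxed in: cannot divide
            daughter = random.choice(empty)
            n_new = random.randint(0, 2 * mean_muts)  # new mutations in the daughter
            grid[daughter] = grid[(x, y)] | set(range(next_id, next_id + n_new))
            next_id += n_new
            new_frontier += [(x, y), daughter]        # both may divide again next round
        frontier = new_frontier
    return grid

def bulk_sample(grid, x0, y0, width=10):
    """Variant allele fractions in a square biopsy, assuming diploid heterozygous variants."""
    cells = [m for (x, y), m in grid.items() if x0 <= x < x0 + width and y0 <= y < y0 + width]
    muts = set().union(*cells) if cells else set()
    return {m: sum(m in c for c in cells) / (2 * len(cells)) for m in muts}

# tumour = grow_tumour()
# region_a, region_b = bulk_sample(tumour, 5, 25), bulk_sample(tumour, 45, 25)
```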


Author(s):  
Zeynep Baskurt ◽  
Scott Mastromatteo ◽  
Jiafen Gong ◽  
Richard F Wintle ◽  
Stephen W Scherer ◽  
...  

Integration of next-generation sequencing (NGS) data across different research studies can improve the power of genetic association testing by increasing sample size and can obviate the need for sequencing controls. If differential genotype uncertainty across studies is not accounted for, combining data sets can produce spurious association results. We developed the Variant Integration Kit for NGS (VikNGS), a fast cross-platform software package, to enable aggregation of several data sets for rare and common variant genetic association analysis of quantitative and binary traits with covariate adjustment. VikNGS also includes a graphical user interface, power simulation functionality and data visualization tools. Availability The VikNGS package can be downloaded at http://www.tcag.ca/tools/index.html. Supplementary information Supplementary data are available at Bioinformatics online.
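To illustrate the general idea of carrying genotype uncertainty into the test rather than relying on hard calls, the sketch below computes expected genotype dosages from genotype likelihoods (assuming a flat genotype prior) and applies a simple 1-df score test for a quantitative trait. This is illustrative only and is not VikNGS's statistic, which also handles covariates, binary traits and rare-variant aggregation.

```python
# Expected-dosage association sketch for a single common variant.
import numpy as np
from scipy import stats

def expected_dosage(genotype_likelihoods: np.ndarray) -> np.ndarray:
    """genotype_likelihoods: (n_samples, 3) likelihoods for 0/1/2 alternate alleles.
    A flat genotype prior is assumed when converting to posteriors."""
    post = genotype_likelihoods / genotype_likelihoods.sum(axis=1, keepdims=True)
    return post @ np.array([0.0, 1.0, 2.0])

def score_test(dosage: np.ndarray, trait: np.ndarray) -> float:
    """P-value of a 1-df score test of trait ~ dosage (no covariates)."""
    y = trait - trait.mean()
    g = dosage - dosage.mean()
    u = float(g @ y)
    var_u = float(y.var() * (g @ g))
    chi2 = u * u / var_u
    return stats.chi2.sf(chi2, df=1)

# gl = np.array([[0.9, 0.09, 0.01], [0.1, 0.8, 0.1], [0.01, 0.09, 0.9]])
# p = score_test(expected_dosage(gl), np.array([1.2, 2.3, 3.1]))
```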


2020 ◽  
Vol 15 (1) ◽  
pp. 2-16
Author(s):  
Yuwen Luo ◽  
Xingyu Liao ◽  
Fang-Xiang Wu ◽  
Jianxin Wang

Transcriptome assembly plays a critical role in studying biological properties and examining the expression levels of genomes in specific cells. It is also the basis of many downstream analyses. With increasing speed and decreasing cost, massive amounts of sequencing data continue to accumulate. A large number of assembly strategies based on different computational methods and experiments have been developed. How to perform transcriptome assembly efficiently, with high sensitivity and accuracy, has become a key issue. In this work, the issues with transcriptome assembly are explored based on different sequencing technologies. Specifically, transcriptome assemblies with next-generation sequencing reads are divided into reference-based assemblies and de novo assemblies. Examples from different species are used to illustrate that long reads produced by third-generation sequencing technologies can cover full-length transcripts without assembly. In addition, different transcriptome assemblies using Hybrid-seq methods and other tools are also summarized. Finally, we discuss the future directions of transcriptome assemblies.
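As a toy illustration of the data structure at the heart of many de novo assemblers, the sketch below builds a k-mer de Bruijn graph from reads; real transcriptome assemblers add error correction, bubble resolution and isoform-aware graph traversal on top of this.

```python
# Build a de Bruijn graph: each (k-1)-mer prefix points to the set of
# (k-1)-mer suffixes observed immediately after it in the reads.
from collections import defaultdict

def de_bruijn_graph(reads, k=5):
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

# reads = ["ATGGCGTGCA", "GCGTGCAATG"]
# for node, successors in de_bruijn_graph(reads).items():
#     print(node, "->", sorted(successors))
```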


2015 ◽  
Vol 2 (8) ◽  
pp. 150143 ◽  
Author(s):  
V. G. Gurzadyan ◽  
H. Yan ◽  
G. Vlahovic ◽  
A. Kashin ◽  
P. Killela ◽  
...  

The Kolmogorov–Arnold stochasticity parameter technique is applied for the first time to the study of cancer genome sequencing, to reveal mutations. Using data generated by next-generation sequencing technologies, we have analysed the exome sequences of brain tumour patients with matched tumour and normal blood samples. We show that mutations contained in sequencing data can be revealed using this technique, thus providing a new methodology for determining subsequences of given length that contain mutations, i.e. subsequences whose parameter value differs from that of subsequences without mutations. A potential application of this technique is to simplify the procedure of finding segments with mutations, speeding up genomic research and accelerating its implementation in clinical diagnostics. Moreover, the prediction of a mutation associated with a family of frequent mutations in numerous types of cancers, based purely on the value of the Kolmogorov function, indicates that this marker may recognize genomic sequences that are in extremely low abundance and can be used in revealing new types of mutations.
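The abstract does not restate the definition; for context, the standard form of the Kolmogorov stochasticity parameter and its limiting distribution (Kolmogorov's function) is given below, with the mapping from genomic subsequences to the empirical distribution left to the paper's methodology.

```latex
% For a sample of n values with empirical distribution F_n and theoretical
% distribution F, the Kolmogorov stochasticity parameter is
\[
  \lambda_n \;=\; \sqrt{n}\,\sup_x \bigl|F_n(x) - F(x)\bigr|,
\]
% and its limiting distribution is the Kolmogorov function
\[
  \Phi(\lambda) \;=\; \sum_{k=-\infty}^{+\infty} (-1)^k e^{-2k^2\lambda^2},
  \qquad \lambda > 0 .
\]
% Subsequences whose \Phi(\lambda_n) values deviate from those of the
% surrounding sequence flag candidate segments containing mutations.
```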


2018 ◽  
Vol 16 (05) ◽  
pp. 1850018 ◽  
Author(s):  
Sanjeev Kumar ◽  
Suneeta Agarwal ◽  
Ranvijay

Genomic data nowadays play a vital role in a number of fields such as personalized medicine, forensics, drug discovery, sequence alignment and agriculture. With the advancements and reduction in cost of next-generation sequencing (NGS) technology, these data are growing exponentially. NGS data are being generated more rapidly than they can be meaningfully analyzed. Thus, there is much scope for developing novel data compression algorithms to facilitate data analysis as well as data transfer and storage. An innovative compression technique is proposed here to address the problem of transmission and storage of large NGS data. This paper presents a lossless, non-reference-based FastQ file compression approach that segregates the data into three different streams and then applies appropriate and efficient compression algorithms to each. Experiments show that the proposed approach (WBFQC) outperforms other state-of-the-art approaches for compressing NGS data in terms of compression ratio (CR) and compression and decompression time. It also has random access capability over compressed genomic data. An open source FastQ compression tool is also provided here ( http://www.algorithm-skg.com/wbfqc/home.html ).
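The sketch below shows only the stream-separation idea (it is not WBFQC's actual method; the choice of zlib and bz2 as back-end codecs is an assumption for illustration): identifier, base and quality lines have very different statistics and compress better when handled separately.

```python
# Split a FASTQ file into identifier, sequence and quality streams and
# compress each stream with a general-purpose compressor.
import bz2
import zlib

def compress_fastq(path: str) -> dict[str, bytes]:
    ids, seqs, quals = [], [], []
    with open(path) as fh:
        for i, line in enumerate(fh):
            record_line = i % 4          # FASTQ records span four lines
            if record_line == 0:
                ids.append(line)
            elif record_line == 1:
                seqs.append(line)
            elif record_line == 3:
                quals.append(line)
    return {
        "ids": zlib.compress("".join(ids).encode(), 9),    # repetitive headers
        "seqs": bz2.compress("".join(seqs).encode()),       # 4-letter alphabet
        "quals": bz2.compress("".join(quals).encode()),     # quality scores
    }

# streams = compress_fastq("reads.fastq")
# print({name: len(blob) for name, blob in streams.items()})
```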


10.2196/14710 ◽  
2020 ◽  
Vol 8 (4) ◽  
pp. e14710 ◽  
Author(s):  
Phillip Park ◽  
Soo-Yong Shin ◽  
Seog Yun Park ◽  
Jeonghee Yun ◽  
Chulmin Shin ◽  
...  

Background The analytical capacity and speed of next-generation sequencing (NGS) technology have improved. Many genetic variants associated with various diseases have been discovered using NGS. Applying NGS in clinical practice therefore enables precision or personalized medicine. However, because clinical sequencing reports in electronic health records (EHRs) are not structured according to recommended standards, clinical decision support systems have not been fully utilized. In addition, integrating genomic data with clinical data for translational research remains a great challenge. Objective To apply international standards to clinical sequencing reports and to develop a clinical research information system to integrate standardized genomic data with clinical data. Methods We applied the recently published ISO/TS 20428 standard to 367 clinical sequencing reports generated by panel (91 genes) sequencing in EHRs and implemented a clinical NGS research system by extending the clinical data warehouse to integrate the necessary clinical data for each patient. We also developed a user interface with a clinical research portal and an NGS result viewer. Results A single clinical sequencing report with 28 items was restructured into four database tables and 49 entities. As a result, 367 patients’ clinical sequencing data were connected with clinical data in EHRs, such as diagnosis, surgery, and death information. This system can also support the development of cohort or case-control datasets. Conclusions The standardized clinical sequencing data are useful not only for clinical practice but can also be applied to translational research.
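A hypothetical sketch of what such a restructuring can look like is given below; the table split and field names are illustrative assumptions and do not reproduce the ISO/TS 20428 data elements or the study's four tables and 49 entities.

```python
# Hypothetical relational split of a clinical sequencing report into
# linked tables, in the spirit of the approach described above.
from dataclasses import dataclass

@dataclass
class SequencingReport:
    report_id: str
    patient_id: str
    panel_name: str          # e.g. a multi-gene cancer panel
    report_date: str

@dataclass
class Specimen:
    report_id: str
    specimen_type: str       # e.g. FFPE tumour tissue
    tumour_purity_pct: float

@dataclass
class DetectedVariant:
    report_id: str
    gene: str
    hgvs_c: str              # coding-level variant description
    hgvs_p: str              # protein-level variant description
    allele_frequency: float

# Each table keys back to report_id, which in turn links the sequencing data
# to clinical information (diagnosis, surgery, survival) in the data warehouse.
```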


2020 ◽  
Author(s):  
Steffen Klasberg ◽  
Alexander H. Schmidt ◽  
Vinzenz Lange ◽  
Gerhard Schöfl

Background High-resolution HLA genotyping of donors and recipients is a crucially important prerequisite for haematopoietic stem-cell transplantation and relies heavily on the quality and completeness of immunogenetic reference sequence databases of allelic variation. Results Here, we report on DR2S, an R package that leverages the strengths of two sequencing technologies – the accuracy of next-generation sequencing and the read length of third-generation sequencing technologies such as PacBio’s SMRT sequencing or ONT’s nanopore sequencing – to reconstruct fully phased, high-quality, full-length haplotype sequences. Although optimised for HLA and KIR genes, DR2S is applicable to all loci with known reference sequences, provided that full-length sequencing data are available for analysis. In addition, DR2S integrates supporting tools for easy visualisation and quality control of the reconstructed haplotypes to ensure suitability for submission to public allele databases. Conclusions DR2S is a largely automated workflow designed to create high-quality, fully phased reference allele sequences for highly polymorphic gene regions such as HLA or KIR. It has been used by biologists to successfully characterise and submit more than 500 HLA alleles and more than 500 KIR alleles to the IPD-IMGT/HLA and IPD-KIR databases.
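The sketch below shows the core phasing idea behind such hybrid workflows in simplified form (it is not the DR2S algorithm): heterozygous positions are identified from accurate short-read data, and long reads are partitioned into two haplotype groups by agreement with a seed read. The per-read allele strings and the 50% agreement threshold are assumptions for illustration.

```python
# Partition long reads into two haplotype groups using their alleles at
# heterozygous positions ('-' marks a position not covered by that read).
def partition_long_reads(read_alleles: dict[str, str]) -> tuple[set[str], set[str]]:
    reads = list(read_alleles)
    seed = reads[0]
    hap_a, hap_b = {seed}, set()
    for name in reads[1:]:
        a, b = read_alleles[name], read_alleles[seed]
        matches = sum(x == y for x, y in zip(a, b) if "-" not in (x, y))
        compared = sum(1 for x, y in zip(a, b) if "-" not in (x, y))
        # Reads agreeing with the seed at most covered sites join its haplotype.
        (hap_a if compared and matches / compared >= 0.5 else hap_b).add(name)
    return hap_a, hap_b

# Alleles at four heterozygous sites:
# groups = partition_long_reads({"r1": "ACGT", "r2": "ACG-", "r3": "TGCA", "r4": "-GCA"})
```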

