SCSIM: Jointly simulating correlated single-cell and bulk next-generation DNA sequencing data

Abstract. Background: Recently, it has become possible to collect next-generation DNA sequencing data sets composed of multiple samples from multiple biological units, where each sample may come from a single cell or from bulk tissue. However, no tool yet exists for simulating DNA sequencing data from such a nested sampling arrangement of single-cell and bulk samples, which developers of analysis methods need in order to assess accuracy and precision. Results: We have developed a tool that simulates DNA sequencing data from hierarchically grouped (correlated) samples, where each sample is designated bulk or single-cell. Our tool uses a simple configuration file to define the experimental arrangement and can be integrated into software pipelines for testing variant callers or other genomic tools. Conclusions: The DNA sequencing data generated by our simulator are representative of real data and integrate seamlessly with standard downstream analysis tools.

Splicing Express: a software suite for alternative splicing analysis using next-generation sequencing data

PeerJ ◽ 10.7717/peerj.1419 ◽ 2015 ◽ Vol 3 ◽ pp. e1419 ◽ Cited By ~ 6

Author(s): Jose E. Kroll ◽ Jihoon Kim ◽ Lucila Ohno-Machado ◽ Sandro J. de Souza

Keyword(s): Alternative Splicing ◽ DNA Sequencing ◽ Next Generation Sequencing Data ◽ Next Generation ◽ Sequencing Data ◽ Software Suite ◽ Next Generation DNA Sequencing ◽ Sequencing Technologies ◽ Body Map ◽ Sequencing Platforms

Motivation. Alternative splicing events (ASEs) are prevalent in the transcriptomes of eukaryotic species and are known to influence many biological phenomena. Identifying and quantifying these events is crucial for a better understanding of biological processes. Next-generation DNA sequencing technologies have allowed deep characterization of transcriptomes and made it possible to address these issues. ASE analysis, however, remains a challenging task, especially when many different samples need to be compared. Some popular tools for the analysis of ASEs are known to report thousands of events without annotations and/or graphical representations. Here we describe a new tool for the identification and visualization of ASEs that can be used by biologists without a solid bioinformatics background. Results. A software suite named Splicing Express was created to perform ASE analysis on transcriptome sequencing data derived from next-generation DNA sequencing platforms. Its major goal is to serve the needs of biomedical researchers who do not have bioinformatics skills. Splicing Express performs automatic annotation of transcriptome data (GTF files) using gene coordinates available from the UCSC genome browser and allows the analysis of data from all available species. ASEs are identified by a known algorithm previously implemented in another tool named Splooce. As a final result, Splicing Express creates a set of HTML files composed of graphics and tables designed to describe the expression profile of ASEs among all analyzed samples. Using RNA-Seq data from the Illumina Human Body Map and the Rat Body Map, we show that Splicing Express is able to perform all tasks in a straightforward way, identifying well-known specific events. Availability and Implementation. Splicing Express is written in Perl and runs only on UNIX-like systems. More details can be found at: http://www.bioinformatics-brazil.org/splicingexpress.

The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data

Genome Research ◽ 10.1101/gr.107524.110 ◽ 2010 ◽ Vol 20 (9) ◽ pp. 1297-1303 ◽ Cited By ~ 11694

Author(s): A. McKenna ◽ M. Hanna ◽ E. Banks ◽ A. Sivachenko ◽ K. Cibulskis ◽ ...

Keyword(s): DNA Sequencing ◽ Genome Analysis ◽ Next Generation ◽ Sequencing Data ◽ MapReduce Framework ◽ Next Generation DNA Sequencing ◽ Genome Analysis Toolkit

P14 Very High Quality Next-generation DNA Sequencing Data From Human Genomic DNA Samples Stored, And Intermittently Defrosted Over Two Decades

Thorax ◽ 10.1136/thoraxjnl-2014-206260.164 ◽ 2014 ◽ Vol 69 (Suppl 2) ◽ pp. A83-A83

Author(s): C. Shovlin ◽ F. Govani ◽ I. Mollet ◽ E. Thomas ◽ M. Jones ◽ ...

Keyword(s): DNA Sequencing ◽ Genomic DNA ◽ Next Generation ◽ Sequencing Data ◽ High Quality ◽ Next Generation DNA Sequencing ◽ Human Genomic ◽ Very High ◽ Human Genomic DNA

A highly parallel next-generation DNA sequencing data analysis pipeline in Hadoop

2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) ◽ 10.1109/bibm.2015.7359781 ◽ 2015 ◽ Cited By ~ 2

Author(s): Kareem S. Aggour ◽ Vijay S. Kumar ◽ Dipen P. Sangurdekar ◽ Lee A. Newberg ◽ Chinnappa D. Kodira

Keyword(s): Data Analysis ◽ DNA Sequencing ◽ Next Generation ◽ Sequencing Data ◽ Analysis Pipeline ◽ Next Generation DNA Sequencing ◽ Data Analysis Pipeline ◽ Sequencing Data Analysis

A framework for variation discovery and genotyping using next-generation DNA sequencing data

Nature Genetics ◽ 10.1038/ng.806 ◽ 2011 ◽ Vol 43 (5) ◽ pp. 491-498 ◽ Cited By ~ 6258

Author(s): Mark A DePristo ◽ Eric Banks ◽ Ryan Poplin ◽ Kiran V Garimella ◽ Jared R Maguire ◽ ...

Keyword(s): DNA Sequencing ◽ Next Generation ◽ Sequencing Data ◽ Next Generation DNA Sequencing

Statistical method evaluation for differentially methylated CpGs in base resolution next-generation DNA sequencing data

Briefings in Bioinformatics ◽ 10.1093/bib/bbw133 ◽ 2016 ◽ pp. bbw133 ◽ Cited By ~ 7

Author(s): Yun Zhang ◽ Saurabh Baheti ◽ Zhifu Sun

Keyword(s): DNA Sequencing ◽ Statistical Method ◽ Next Generation ◽ Sequencing Data ◽ Method Evaluation ◽ Next Generation DNA Sequencing

Distinguishing linear and branched evolution given single-cell DNA sequencing data of tumors

Algorithms for Molecular Biology ◽ 10.1186/s13015-021-00194-5 ◽ 2021 ◽ Vol 16 (1)

Author(s): Leah L. Weber ◽ Mohammed El-Kebir

Keyword(s): DNA Sequencing ◽ Single Cell ◽ Evolutionary Process ◽ Treatment Decision ◽ Real Data ◽ Current Data ◽ Fast Method ◽ Sequencing Data ◽ Evolutionary Trajectory ◽ Cancer Types

Abstract. Background: Cancer arises from an evolutionary process in which somatic mutations give rise to clonal expansions. Reconstructing this evolutionary process is useful for treatment decision-making as well as for understanding evolutionary patterns across patients and cancer types. In particular, classifying a tumor’s evolutionary process as either linear or branched, and understanding which cancer types and which patients follow each of these trajectories, could provide useful insights for both clinicians and researchers. While comprehensive cancer phylogeny inference from single-cell DNA sequencing data is challenging due to the limitations of current sequencing technology and the complexity of the resulting problem, current data might provide sufficient signal to accurately classify a tumor’s evolutionary history as either linear or branched. Results: We introduce the Linear Perfect Phylogeny Flipping (LPPF) problem, which we prove to be NP-hard, as a means of testing two alternative hypotheses for the pattern of evolution. We develop Phyolin, which uses constraint programming to solve the LPPF problem. Through both in silico experiments and application to real data, we demonstrate the performance of our method, which outperforms a competing machine learning approach. Conclusion: Phyolin is an accurate, easy-to-use, and fast method for classifying an evolutionary trajectory as linear or branched given a tumor’s single-cell DNA sequencing data.
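To make the linear-versus-branched distinction concrete, here is a minimal Python sketch. It is not Phyolin and does not solve the LPPF problem. It only checks an error-free binary cell-by-mutation matrix under one assumption: a linear history means that, for every pair of mutations, the set of cells carrying one contains the set carrying the other. The function name and the toy matrices are hypothetical.

```python
from itertools import combinations


def is_linear_perfect_phylogeny(matrix):
    """Return True if every pair of mutation columns is nested.

    `matrix` is a list of rows (cells); each row is a list of 0/1 flags,
    one per mutation. Assumption: the data are error-free, so no entries
    are flipped here (the flipping step of LPPF is out of scope).
    """
    n_mutations = len(matrix[0]) if matrix else 0
    # For each mutation, collect the set of cells that carry it.
    carriers = [
        frozenset(i for i, row in enumerate(matrix) if row[j])
        for j in range(n_mutations)
    ]
    # Linear evolution: carrier sets form a chain under set inclusion.
    return all(a <= b or b <= a for a, b in combinations(carriers, 2))


# Toy example: mutations accumulate along a single lineage.
linear = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
]
# Toy example: two mutations arise in disjoint subclones of a shared ancestor.
branched = [
    [1, 1, 0],
    [1, 0, 1],
]

print(is_linear_perfect_phylogeny(linear))
print(is_linear_perfect_phylogeny(branched))
```

Real single-cell data contain false negatives and other errors, which is why the abstract frames the task as an optimization problem and not as a direct check like this one.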
