GRIDSS, PURPLE, LINX: Unscrambling the tumor genome via integrated analysis of structural variation and copy number

2019
Author(s):
Daniel L. Cameron
Jonathan Baber
Charles Shale
Anthony T. Papenfuss
Jose Espejo Valle-Inclan
...  

Abstract
We have developed a novel, integrated and comprehensive toolkit for somatic analysis of purity, ploidy, structural variants and copy number in whole genome sequencing data from paired tumor/normal samples. We show that combining GRIDSS for somatic structural variant calling with PURPLE for somatic copy number alteration calling enables highly sensitive, precise and consistent copy number and structural variant determination. The combination also provides novel insights into short structural variants and regions of complex local topology. LINX, an interpretation tool, leverages the integrated structural variant and copy number calls to cluster individual structural variants into higher-order events and to chain them together, predicting local derivative chromosome structure. LINX classifies and extensively annotates genomic rearrangements, including simple and reciprocal breaks, LINE, viral and pseudogene insertions, and complex events such as chromothripsis. LINX also comprehensively calls genic fusions, including chained fusions. Finally, our toolkit provides novel visualisation methods that give insight into complex genomic rearrangements.
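Joint purity, ploidy and copy number analysis rests on how tumor purity and average ploidy shape the sequencing depth of a segment. The sketch below uses one commonly used relationship, stated here as an assumption. It assumes a diploid normal genome, and normal contamination contributes two copies everywhere. Under those assumptions it converts between a segment's absolute copy number and its expected tumor/normal depth ratio. PURPLE's actual fitting procedure is not reproduced, and both function names are hypothetical.

```python
def expected_depth_ratio(copy_number: float, purity: float, ploidy: float) -> float:
    """Tumor/normal depth ratio expected for a segment, relative to the sample average.

    Assumes a diploid normal genome: the (1 - purity) normal fraction adds two copies.
    """
    segment_copies = purity * copy_number + 2.0 * (1.0 - purity)
    average_copies = purity * ploidy + 2.0 * (1.0 - purity)
    return segment_copies / average_copies


def copy_number_from_ratio(ratio: float, purity: float, ploidy: float) -> float:
    """Invert expected_depth_ratio to recover the absolute tumor copy number."""
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    average_copies = purity * ploidy + 2.0 * (1.0 - purity)
    return (ratio * average_copies - 2.0 * (1.0 - purity)) / purity


# A single-copy segment in a hypothetical 60%-pure, triploid tumor.
r = expected_depth_ratio(1, purity=0.6, ploidy=3.0)
print(round(r, 3))                                            # 0.538
print(round(copy_number_from_ratio(r, purity=0.6, ploidy=3.0), 3))  # 1.0
```

In practice, purity and ploidy are unknown and must be fitted so that many segments land near integer copy numbers. This relationship only shows why those two parameters must be estimated together.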

2021
Author(s):
Pierre Morisse
Fabrice Legeai
Claire Lemaitre

Linked-Reads technologies, popularized by 10x Genomics, combine the high quality and low cost of short-read sequencing with long-range information. They do this by adding barcodes that tag reads originating from the same long DNA fragment. Because of this combination, such reads are particularly useful for applications such as genome scaffolding and structural variant calling, and multiple structural variant calling methods have been developed in the last few years. However, these methods were mainly tested on human data. They do not run well on non-human organisms, whose reference genomes are highly fragmented or whose sequencing data display high levels of heterozygosity. Moreover, even on human data, most tools still require large amounts of computing resources. We present LEVIATHAN, a new structural variant calling tool that aims to address these issues, and in particular to scale better and apply to a wide variety of organisms. Our method relies on a barcode index that makes it possible to quickly compare all possible pairs of regions by the number of barcodes they share. Region pairs sharing a sufficient number of barcodes are considered potential structural variants. Complementary, classical short-read methods are then applied to refine the breakpoint coordinates. Our experiments on simulated data show that our method compares well to the state of the art in terms of recall, precision and resource consumption. Moreover, LEVIATHAN was successfully applied to a real dataset from a non-model organism, on which all other tools either failed to run or required unreasonable amounts of resources. LEVIATHAN is implemented in C++, supported on Linux platforms, and available under the AGPL-3.0 License at https://github.com/morispi/LEVIATHAN.
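The barcode-index idea can be illustrated with a small Python sketch under stated assumptions, and it is not LEVIATHAN's C++ implementation. Regions are assumed to be fixed-size bins. A barcode-to-regions index is built, and shared barcodes are counted for every region pair. Pairs above a threshold are kept as structural variant candidates. The bin size, the `min_shared` threshold and the `min_gap_bins` filter for neighbouring bins are hypothetical parameters, and breakpoint refinement is omitted.

```python
from collections import defaultdict
from itertools import combinations


def candidate_sv_pairs(read_barcodes, region_size, min_shared, min_gap_bins=2):
    """Return {(region_a, region_b): shared_barcode_count} for candidate SV region pairs.

    read_barcodes: iterable of (chromosome, position, barcode) for aligned reads.
    Regions are fixed-size bins identified by (chromosome, position // region_size).
    Same-chromosome pairs closer than `min_gap_bins` bins are skipped, because
    neighbouring bins share barcodes simply through ordinary long molecules.
    """
    # Barcode index: the set of regions touched by each barcode.
    regions_per_barcode = defaultdict(set)
    for chrom, pos, barcode in read_barcodes:
        regions_per_barcode[barcode].add((chrom, pos // region_size))

    # For each pair of regions, count how many barcodes they have in common.
    shared = defaultdict(int)
    for regions in regions_per_barcode.values():
        for a, b in combinations(sorted(regions), 2):
            if a[0] != b[0] or abs(a[1] - b[1]) >= min_gap_bins:
                shared[(a, b)] += 1

    return {pair: n for pair, n in shared.items() if n >= min_shared}


reads = [
    ("chr1", 1_000, "AAA"), ("chr1", 5_000_000, "AAA"),
    ("chr1", 1_200, "CCC"), ("chr1", 5_000_100, "CCC"),
    ("chr1", 1_500, "GGG"), ("chr1", 2_500, "GGG"),
]
print(candidate_sv_pairs(reads, region_size=10_000, min_shared=2))
# {(('chr1', 0), ('chr1', 500)): 2}
```

Two distant bins linked by two barcodes suggest a long-range adjacency, such as a large deletion or inversion. Barcode "GGG" stays within one bin and contributes nothing.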


2020
Author(s):
Charles Shale
Jonathan Baber
Daniel L. Cameron
Marie Wong
Mark J. Cowley
...  

Abstract
Complex somatic genomic rearrangements and copy number alterations (CNAs) are hallmarks of nearly all cancers. Whilst whole genome sequencing (WGS) in principle allows comprehensive profiling of these events, biological and clinical interpretation remains challenging. We have developed LINX, a novel algorithm for interpreting structural variant and CNA data derived from short-read paired-end WGS. LINX clusters raw structural variant calls into distinct events, predicts their impact on the local structure of the derivative chromosome, and annotates their functional impact on affected genes. Novel visualisations facilitate further investigation of complex genomic rearrangements. We show that LINX provides insights into a diverse range of structural variation events, including single and double break-junction events, mobile element insertions, complex shattering and high amplification events. We demonstrate that LINX can reliably detect a wide range of pathogenic rearrangements, including gene fusions, immunoglobulin enhancer rearrangements, intragenic deletions and duplications. Uniquely, LINX also predicts chained fusions, which we show account for 13% of clinically relevant oncogenic fusions. LINX also reports a class of inactivation events we term homozygous disruptions. These may be a driver mutation in up to 8.8% of tumors, frequently affect PTEN, TP53 and RB1, and are likely missed by many standard WGS analysis pipelines.
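Clustering raw structural variant calls into distinct events can be illustrated with one simple rule: proximity of breakends. The Python sketch below shows only that rule. LINX applies further clustering criteria that are not reproduced here. The 5 kb default distance and the `(sv_id, breakends)` input layout are assumptions for illustration. SVs are merged with a union-find structure whenever two of their breakends fall within the distance on the same chromosome.

```python
from collections import defaultdict


def cluster_svs(svs, max_distance=5_000):
    """Group SVs whose breakends lie within `max_distance` bp of each other.

    svs: list of (sv_id, [(chrom, pos), ...]) with one or two breakends per SV.
    Returns a sorted list of clusters, each a sorted list of sv_ids.
    """
    parent = {sv_id: sv_id for sv_id, _ in svs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # After sorting, any two breakends within range are linked through a chain of
    # consecutive breakends that are also within range, so adjacent checks suffice.
    breakends = sorted((chrom, pos, sv_id) for sv_id, ends in svs for chrom, pos in ends)
    for (c1, p1, s1), (c2, p2, s2) in zip(breakends, breakends[1:]):
        if c1 == c2 and p2 - p1 <= max_distance:
            union(s1, s2)

    clusters = defaultdict(list)
    for sv_id, _ in svs:
        clusters[find(sv_id)].append(sv_id)
    return sorted(sorted(members) for members in clusters.values())


svs = [
    ("DEL1", [("chr3", 10_000), ("chr3", 12_000)]),
    ("INV1", [("chr3", 14_500), ("chr3", 90_000)]),
    ("TRA1", [("chr3", 93_000), ("chr8", 500_000)]),
    ("DUP1", [("chr5", 1_000), ("chr5", 60_000)]),
]
print(cluster_svs(svs))  # [['DEL1', 'INV1', 'TRA1'], ['DUP1']]
```

The result is single-linkage clustering. DEL1 and TRA1 end up in the same event even though no breakend of one lies near the other, because INV1 bridges them.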


2017
Author(s):
Marek Cmero
Cheng Soon Ong
Ke Yuan
Jan Schröder
Kangbo Mo
...  

We present SVclone, a computational method for inferring the cancer cell fraction of structural variant breakpoints from whole-genome sequencing data. We validate our approach using simulated and real tumour samples, and demonstrate its utility on 2,778 whole-genome sequenced tumours. We find a subset of liver, breast and ovarian cancer cases with decreased overall survival that have subclonally enriched copy-number neutral rearrangements, an observation that could not have been made with currently available methods.
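The cancer cell fraction (CCF) of a breakpoint links its variant allele fraction (VAF) to tumour purity and local copy number. The Python sketch below gives a naive single-breakpoint point estimate. It assumes that the number of breakpoint-carrying copies per cell (`multiplicity`) is known. It also assumes the locus has the same copy number in all tumour cells, and that normal cells carry two copies. SVclone's own model, which works from read counts across many breakpoints, is not reproduced. The function name and parameters are hypothetical.

```python
def cancer_cell_fraction(var_reads, total_reads, purity, tumor_cn,
                         multiplicity=1, normal_cn=2):
    """Naive point estimate of the CCF of one breakpoint from its VAF.

    Expected VAF = CCF * purity * multiplicity
                   / (purity * tumor_cn + (1 - purity) * normal_cn),
    which is solved here for CCF.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    vaf = var_reads / total_reads
    total_copies = purity * tumor_cn + (1.0 - purity) * normal_cn
    return vaf * total_copies / (purity * multiplicity)


# Hypothetical counts at a copy-number-neutral locus in a 50%-pure tumour.
print(cancer_cell_fraction(15, 60, purity=0.5, tumor_cn=2))  # 1.0 (clonal)
print(cancer_cell_fraction(6, 60, purity=0.5, tumor_cn=2))   # 0.4 (subclonal)
```

Point estimates like these are noisy at typical sequencing depths and can exceed 1. Pooling breakpoints into clusters with shared CCFs is what makes subclonal structure detectable in practice.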


GigaScience
2021
Vol 10 (9)
Author(s):
Lanying Wei
Martin Dugas
Sarah Sandmann

Abstract
Background: Artifact chimeric reads are enriched in next-generation sequencing data generated from formalin-fixed paraffin-embedded (FFPE) samples. Previous work indicated that these reads produce erroneous split-read support that is interpreted as evidence of structural variants, so a large number of false-positive structural variants are detected. To our knowledge, no tool is currently available to specifically call or filter structural variants in FFPE samples. To close this gap, we developed 2 R packages: SimFFPE and FilterFFPE.
Results: SimFFPE is a read simulator specifically designed for next-generation sequencing data from FFPE samples. It generates a mixture of characteristic artifact chimeric reads and normal reads. FilterFFPE is a filtration algorithm that removes artifact chimeric reads from sequencing data while keeping real chimeric reads. To evaluate the performance of FilterFFPE, we performed structural variant calling with 3 common tools (Delly, Lumpy and Manta) with and without prior filtration with FilterFFPE. After applying FilterFFPE, the mean positive predictive value improved from 0.27 to 0.48 in simulated samples and from 0.11 to 0.27 in real samples. Sensitivity remained essentially unchanged or even increased slightly.
Conclusions: FilterFFPE improves the performance of SV calling in FFPE samples, as validated by analysis of simulated and real data.
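The evaluation reports positive predictive value (PPV) and sensitivity. The short Python sketch below uses the standard definitions of both metrics. Its hypothetical counts, which are not from the study, show why removing false-positive calls raises PPV while leaving sensitivity unchanged, provided true calls are kept.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Fraction of reported SV calls that are real: TP / (TP + FP)."""
    calls = true_positives + false_positives
    return true_positives / calls if calls else 0.0


def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of real SVs that were reported: TP / (TP + FN)."""
    real = true_positives + false_negatives
    return true_positives / real if real else 0.0


# Hypothetical counts: a filter removes 40 artifact calls and no true calls.
print(positive_predictive_value(20, 60), positive_predictive_value(20, 20))  # 0.25 0.5
print(sensitivity(20, 5))  # 0.8 before and after, since TP and FN are unchanged
```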


2014
Vol 32 (15_suppl)
pp. e22171-e22171
Author(s):
Yan W. Asmann
Chen Wang
Brian M. Necela
Xianfeng Chen
Jean-Pierre A. Kocher
...  

2014
Vol 13s3
pp. CIN.S14023
Author(s):
Hatice Gulcin Ozer
Aisulu Usubalieva
Adrienne Dorrance
Ayse Selen Yilmaz
Michael Caligiuri
...  

Genome-wide discoveries, such as the detection of copy number alterations (CNAs) from high-throughput whole-genome sequencing data, have enabled new developments in personalized medicine. CNAs have been reported to be associated with various diseases and cancers, including acute myeloid leukemia. However, multiple challenges in the use of current CNA detection tools lead to high false-positive rates and thus impede their widespread use in cancer research. In this paper, we discuss these issues and propose possible solutions. First, because some regions lack sequence uniqueness, the entire genome cannot be mapped, and current methods cannot be appropriately adjusted to handle these regions in the analyses. As a result, detection of medium-sized CNAs is directly affected by these mappability problems. The requirement for matched control samples is another important limitation, because acquiring matched controls might not be possible or cost-efficient. Here we present an approach that addresses these issues and detects medium-sized CNAs in cancer genomes in three ways. It (1) masks unmappable regions during the initial CNA detection phase, (2) uses a pool of a few normal samples as the control, and (3) employs median filtering to adjust CNA ratios to their surrounding coverage and eliminate false positives.
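The three steps can be sketched in Python under stated assumptions, and this is not the authors' implementation. Read counts are assumed to come in fixed-size bins with a per-bin mappability flag. Each sample is normalised by its total count over mappable bins, and the pooled control is the per-bin median of the normal samples. Step (3) is read as a sliding-window median filter, which suppresses isolated single-bin spikes. The paper's exact filtering formulation may differ, and all names are hypothetical.

```python
from statistics import median


def pooled_normal_ratios(tumor, normals, mappable):
    """Per-bin tumor / pooled-normal depth ratios; unmappable bins are masked as None.

    tumor: per-bin read counts; normals: list of per-bin count lists;
    mappable: per-bin booleans (step 1: mask unmappable bins).
    """
    keep = [i for i, ok in enumerate(mappable) if ok]

    def normalise(counts):
        total = sum(counts[i] for i in keep)  # library size over mappable bins only
        return [c / total for c in counts]

    tumor_n = normalise(tumor)
    normals_n = [normalise(n) for n in normals]
    ratios = [None] * len(tumor)
    for i in keep:
        pooled = median(n[i] for n in normals_n)  # step 2: pooled normal control
        if pooled > 0:
            ratios[i] = tumor_n[i] / pooled
    return ratios


def median_filter(values, half_window=2):
    """Step 3: replace each unmasked ratio with the median of unmasked neighbours."""
    out = []
    for i, v in enumerate(values):
        if v is None:
            out.append(None)
            continue
        window = values[max(0, i - half_window): i + half_window + 1]
        out.append(median(x for x in window if x is not None))
    return out


normals = [[100] * 10, [110] * 10, [90] * 10]
tumor = [100, 100, 300, 100, 100, 0, 200, 200, 200, 200]
mappable = [True] * 5 + [False] + [True] * 4
smoothed = median_filter(pooled_normal_ratios(tumor, normals, mappable))
print([None if r is None else round(r, 2) for r in smoothed])
# [0.6, 0.6, 0.6, 0.6, 0.9, None, 1.2, 1.2, 1.2, 1.2]
```

The single-bin spike at bin 2 is smoothed away. The four-bin gain in bins 6 to 9 survives, and the unmappable bin 5 never contributes a ratio.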


2019
Author(s):
Xin Zhou
Lu Zhang
Xiaodong Fang
Yichen Liu
David L. Dill
...  

Abstract
Human diploid genome assembly enables the identification of maternal and paternal genetic variants. Algorithms based on 10x linked-read sequencing have been developed for de novo assembly, variant calling and haplotyping. Another linked-read technology, single tube long fragment read (stLFR), has recently provided a low-cost, single-tube solution for generating long-fragment data. However, no existing software was available for human diploid assembly and variant calling from stLFR data. We developed Aquila stLFR to adapt to the key characteristics of stLFR. Aquila stLFR assembles near-perfect diploid contigs, and assembly-based variant calling shows that Aquila stLFR detects large numbers of structural variants that were not easily spanned by Illumina short reads. Furthermore, the hybrid assembly mode, Aquila hybrid, allows a hybrid assembly based on both stLFR and 10x linked-read libraries. This demonstrates that the two technologies can always complement each other to improve assembly contiguity and variant detection, regardless of the assembly quality each library achieves on its own. The structural variants (SVs) that overlap between two independent sequencing datasets of the same individual, together with the SVs from hybrid assemblies, provide a high-confidence profile for further study.
Availability: Source code and documentation are available at https://github.com/maiziex/Aquila_stLFR.
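Building the high-confidence SV set requires deciding when two calls from independent datasets are "the same" SV. A common convention, assumed here rather than taken from the paper, is a minimum reciprocal overlap of 50% between calls of the same type. The Python sketch below implements that rule for interval-like SVs such as deletions. Insertions have near-zero reference span and would need a different matching rule, for example breakpoint distance plus length similarity. The dictionary layout and function names are hypothetical.

```python
def reciprocal_overlap(a, b):
    """Smallest fraction of either interval covered by their intersection.

    a, b: (chrom, start, end) half-open intervals with start < end.
    """
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def shared_svs(calls_a, calls_b, min_fraction=0.5):
    """Calls in calls_a matched by a same-type call in calls_b at >= min_fraction overlap."""
    return [
        sv for sv in calls_a
        if any(sv["type"] == other["type"]
               and reciprocal_overlap(sv["interval"], other["interval"]) >= min_fraction
               for other in calls_b)
    ]


calls_a = [{"type": "DEL", "interval": ("chr1", 1_000, 2_000)},
           {"type": "DEL", "interval": ("chr2", 5_000, 5_600)}]
calls_b = [{"type": "DEL", "interval": ("chr1", 1_100, 2_100)},
           {"type": "DUP", "interval": ("chr2", 5_000, 5_600)}]
print(len(shared_svs(calls_a, calls_b)))  # 1: only the chr1 deletion is supported by both
```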


2017
Author(s):
Jeremiah Wala
Pratiti Bandopadhayay
Noah Greenwald
Ryan O’Rourke
Ted Sharpe
...  

Abstract
Structural variants (SVs), including small insertion and deletion variants (indels), are challenging to detect with standard alignment-based variant calling methods. Sequence assembly offers a powerful approach to identifying SVs. However, it is difficult to apply genome-wide at scale because of its computational complexity and the difficulty of extracting SVs from assembly contigs. We describe SvABA, an efficient and accurate method for detecting SVs from short-read sequencing data using genome-wide local assembly with low memory and computing requirements. We evaluated SvABA's performance on the NA12878 human genome and on simulated and real cancer genomes. Compared with existing methods, SvABA demonstrates superior sensitivity and specificity across a large spectrum of SVs and substantially improved detection performance for variants in the 20-300 bp range. SvABA also identifies complex somatic rearrangements with chains of short (< 1,000 bp) templated-sequence insertions copied from distant genomic regions. We applied SvABA to 344 cancer genomes from 11 cancer types and found that templated-sequence insertions occur in ~4% of all somatic rearrangements. Finally, we demonstrate that SvABA can identify sites of viral integration and cancer driver alterations containing medium-sized SVs.
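Local assembly means assembling only the reads from a small genomic window into contigs, which are then compared with the reference. The Python sketch below illustrates the idea with a simple de Bruijn-style greedy extension. SvABA's own assembler is not reproduced, so this is a stand-in technique. The k-mer size, the `min_count` error threshold and the function name are assumptions. k-mers seen at least `min_count` times are kept, and the contig grows from the best-supported k-mer by repeatedly taking the most supported unused overlapping k-mer.

```python
from collections import Counter


def greedy_local_contig(reads, k=5, min_count=2):
    """Assemble one contig from the reads of a small window by greedy k-mer extension.

    k-mers seen fewer than `min_count` times are treated as sequencing errors.
    Returns an empty string when no k-mer passes the threshold.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counts = Counter(read[i:i + k] for read in reads for i in range(len(read) - k + 1))
    solid = {kmer: n for kmer, n in counts.items() if n >= min_count}
    if not solid:
        return ""

    seed = max(solid, key=solid.get)  # start from the best-supported k-mer
    contig, used = seed, {seed}

    def best_unused(candidates):
        """Most supported solid k-mer among candidates that is not yet in the contig."""
        options = [(solid[c], c) for c in candidates if c in solid and c not in used]
        return max(options)[1] if options else None

    while True:  # extend right: successors share the contig's last k-1 bases
        nxt = best_unused(contig[-(k - 1):] + b for b in "ACGT")
        if nxt is None:
            break
        contig += nxt[-1]
        used.add(nxt)

    while True:  # extend left: predecessors share the contig's first k-1 bases
        prv = best_unused(b + contig[:k - 1] for b in "ACGT")
        if prv is None:
            break
        contig = prv[0] + contig
        used.add(prv)
    return contig


target = "GATTACAGGCTTCCAAGT"
reads = [target[0:10], target[4:14], target[8:18], target[0:12], target[6:18]]
print(greedy_local_contig(reads) == target)  # True
```

In a real caller, the contig would then be aligned back to the reference, and split or gapped alignments would reveal the SV. Repeats and heterozygous variants create branches that this greedy walk resolves arbitrarily.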

