ClinSV: Clinical grade structural and copy number variant detection from whole genome sequencing data

AbstractWhole genome sequencing (WGS) has the potential to outperform clinical microarrays for the detection of structural variants (SV) including copy number variants (CNVs), but has been challenged by high false positive rates. Here we present ClinSV, a WGS based SV integration, annotation, prioritisation and visualisation method, which identified 99.8% of pathogenic ClinVar CNVs >10kb and 11/11 pathogenic variants from matched microarrays. The false positive rate was low (1.5–4.5%) and reproducibility high (95–99%). In clinical practice, ClinSV identified reportable variants in 22 of 485 patients (4.7%) of which 35–63% were not detectable by current clinical microarray designs.

Download Full-text

ClinSV: clinical grade structural and copy number variant detection from whole genome sequencing data

Genome Medicine ◽

10.1186/s13073-021-00841-x ◽

2021 ◽

Vol 13 (1) ◽

Author(s):

Andre E. Minoche ◽

Ben Lundie ◽

Greg B. Peters ◽

Thomas Ohnesorg ◽

Mark Pinese ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

False Positive ◽

Copy Number ◽

False Positive Rate ◽

Copy Number Variants ◽

Copy Number Variant ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data

AbstractWhole genome sequencing (WGS) has the potential to outperform clinical microarrays for the detection of structural variants (SV) including copy number variants (CNVs), but has been challenged by high false positive rates. Here we present ClinSV, a WGS based SV integration, annotation, prioritization, and visualization framework, which identified 99.8% of simulated pathogenic ClinVar CNVs > 10 kb and 11/11 pathogenic variants from matched microarrays. The false positive rate was low (1.5–4.5%) and reproducibility high (95–99%). In clinical practice, ClinSV identified reportable variants in 22 of 485 patients (4.7%) of which 35–63% were not detectable by current clinical microarray designs. ClinSV is available at https://github.com/KCCG/ClinSV.

Download Full-text

15. A validation study of copy number variant (CNVs) detection to replace constitutional microarray from low resolution whole genome sequencing data

Cancer Genetics ◽

10.1016/j.cancergen.2018.04.018 ◽

2018 ◽

Vol 224-225 ◽

pp. 56

Author(s):

Alka Chaubey ◽

C. Alexander Valencia ◽

Zeqiang Ma ◽

Gerard Irzyk ◽

Edward Szekeres ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Validation Study ◽

Copy Number ◽

Copy Number Variant ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Low Resolution

Download Full-text

Evaluation of the performance of copy number variant prediction tools for the detection of deletions from whole genome sequencing data

Journal of Biomedical Informatics ◽

10.1016/j.jbi.2019.103174 ◽

2019 ◽

Vol 94 ◽

pp. 103174 ◽

Cited By ~ 4

Author(s):

Whitney Whitford ◽

Klaus Lehnert ◽

Russell G. Snell ◽

Jessie C. Jacobsen

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Copy Number ◽

Copy Number Variant ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Prediction Tools

Download Full-text

Detection and characterization of copy number variants based on whole-genome sequencing by DNBSEQ platforms

10.1101/786962 ◽

2019 ◽

Author(s):

Junhua Rao ◽

Lihua Peng ◽

Fang Chen ◽

Hui Jiang ◽

Chunyu Geng ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Copy Number ◽

Copy Number Variants ◽

Copy Number Variant ◽

Whole Genome ◽

Genome Wide ◽

Wide Range ◽

Distribution Sensitivity ◽

Cnv Detection

AbstractBackgroundNext-generation sequence (NGS) has rapidly developed in past years which makes whole-genome sequencing (WGS) becoming a more cost- and time-efficient choice in wide range of biological researches. We usually focus on some variant detection via WGS data, such as detection of single nucleotide polymorphism (SNP), insertion and deletion (Indel) and copy number variant (CNV), which playing an important role in many human diseases. However, the feasibility of CNV detection based on WGS by DNBSEQ™ platforms was unclear. We systematically analysed the genome-wide CNV detection power of DNBSEQ™ platforms and Illumina platforms on NA12878 with five commonly used tools, respectively.ResultsDNBSEQ™ platforms showed stable ability to detect slighter more CNVs on genome-wide (average 1.24-fold than Illumina platforms). Then, CNVs based on DNBSEQ™ platforms and Illumina platforms were evaluated with two public benchmarks of NA12878, respectively. DNBSEQ™ and Illumina platforms showed similar sensitivities and precisions on both two benchmarks. Further, the difference between tools for CNV detection was analyzed, and indicated the selection of tool for CNV detection could affected the CNV performance, such as count, distribution, sensitivity and precision.ConclusionThe major contribution of this paper is providing a comprehensive guide for CNV detection based on WGS by DNBSEQ™ platforms for the first time.

Download Full-text

SEG - A Software Program for Finding Somatic Copy Number Alterations in Whole Genome Sequencing Data of Cancer

Computational and Structural Biotechnology Journal ◽

10.1016/j.csbj.2018.09.001 ◽

2018 ◽

Vol 16 ◽

pp. 335-341 ◽

Cited By ~ 2

Author(s):

Mucheng Zhang ◽

Deli Liu ◽

Jie Tang ◽

Yuan Feng ◽

Tianfang Wang ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Copy Number ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Copy Number Alterations ◽

Sequencing Data ◽

Software Program ◽

Somatic Copy Number Alterations

Download Full-text

Identification of Medium-Sized Copy Number Alterations in Whole-Genome Sequencing

Cancer Informatics ◽

10.4137/cin.s14023 ◽

2014 ◽

Vol 13s3 ◽

pp. CIN.S14023

Author(s):

Hatice Gulcin Ozer ◽

Aisulu Usubalieva ◽

Adrienne Dorrance ◽

Ayse Selen Yilmaz ◽

Michael Caligiuri ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Copy Number ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Copy Number Alterations ◽

Sequencing Data ◽

Entire Genome ◽

Multiple Challenges ◽

Cost Efficient

The genome-wide discoveries such as detection of copy number alterations (CNA) from high-throughput whole-genome sequencing data enabled new developments in personalized medicine. The CNAs have been reported to be associated with various diseases and cancers including acute myeloid leukemia. However, there are multiple challenges to the use of current CNA detection tools that lead to high false-positive rates and thus impede widespread use of such tools in cancer research. In this paper, we discuss these issues and propose possible solutions. First, since the entire genome cannot be mapped due to some regions lacking sequence uniqueness, current methods cannot be appropriately adjusted to handle these regions in the analyses. Thus, detection of medium-sized CNAs is also being directly affected by these mappability problems. The requirement for matching control samples is also an important limitation because acquiring matching controls might not be possible or might not be cost efficient. Here we present an approach that addresses these issues and detects medium-sized CNAs in cancer genomes by (1) masking unmappable regions during the initial CNA detection phase, (2) using pool of a few normal samples as control, and (3) employing median filtering to adjust CNA ratios to its surrounding coverage and eliminate false positives.

Download Full-text

SECNVs: A Simulator of Copy Number Variants and Whole-Exome Sequences from Reference Genomes

10.1101/824128 ◽

2019 ◽

Cited By ~ 1

Author(s):

Yue Xing ◽

Alan R. Dabney ◽

Xiao Li ◽

Guosong Wang ◽

Clare A. Gill ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Copy Number ◽

Copy Number Variants ◽

Whole Genome ◽

Sequencing Data ◽

Software Applications ◽

Exome Sequencing Data ◽

Whole Exome ◽

Whole Exome Sequencing Data

AbstractCopy number variants are insertions and deletions of 1 kb or larger in a genome that play an important role in phenotypic changes and human disease. Many software applications have been developed to detect copy number variants using either whole-genome sequencing or whole-exome sequencing data. However, there is poor agreement in the results from these applications. Simulated datasets containing copy number variants allow comprehensive comparisons of the operating characteristics of existing and novel copy number variant detection methods. Several software applications have been developed to simulate copy number variants and other structural variants in whole-genome sequencing data. However, none of the applications reliably simulate copy number variants in whole-exome sequencing data. We have developed and tested SECNVs (Simulator of Exome Copy Number Variants), a fast, robust and customizable software application for simulating copy number variants and whole-exome sequences from a reference genome. SECNVs is easy to install, implements a wide range of commands to customize simulations, can output multiple samples at once, and incorporates a pipeline to output rearranged genomes, short reads and BAM files in a single command. Variants generated by SECNVs are detected with high sensitivity and precision by tools commonly used to detect copy number variants. SECNVs is publicly available at https://github.com/YJulyXing/SECNVs.

Download Full-text

Whole-genome sequencing analysis of clozapine-induced myocarditis

10.1101/2021.07.26.21261157 ◽

2021 ◽

Author(s):

Ankita Narang ◽

Paul Lacaze ◽

Kathlyn Ronaldson ◽

John McNeil ◽

Mahesh Jayaram ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Genome Wide Association Study ◽

Copy Number Variants ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Analysis ◽

Sequencing Data ◽

Dna Variation ◽

Rare Genetic Variants

One of the concerns limiting the use of clozapine in schizophrenia treatment is the risk of rare but potentially fatal myocarditis. Our previous genome-wide association study and human leucocyte antigen analyses identified putative loci associated with clozapine-induced myocarditis. However, the contribution of DNA variation in cytochrome P450 genes, copy number variants and rare deleterious variants have not been investigated. We explored these unexplored classes of DNA variation using whole-genome sequencing data from 25 cases with clozapine-induced myocarditis and 25 demographically-matched clozapine-tolerant control subjects. We identified 15 genes based on rare variant gene-burden analysis (MLLT6, CADPS, TACC2, L3MBTL4, NPY, SLC25A21, PARVB, GPR179, ACAD9, NOL8, C5orf33, FAM127A, AFDN, SLC6A11, PXDN) nominally associated (p<0.05) with clozapine-induced myocarditis. Of these genes, 13 were expressed in human myocardial tissue. Although independent replication of these findings is required, our study provides preliminary insights into the potential role of rare genetic variants in susceptibility to clozapine-induced myocarditis.

Download Full-text

A Bioinformatics Pipeline for Estimating Mitochondria DNA Copy Number and Heteroplasmy Levels from Whole Genome Sequencing Data

10.1101/2021.12.28.21268452 ◽

2021 ◽

Author(s):

Stephanie L Battle ◽

Daniela Puiu ◽

Eric Boerwinkle ◽

Kent Taylor ◽

Jerome Rotter ◽

...

Keyword(s):

Mitochondrial Genome ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Copy Number ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Dna Molecules ◽

Sequencing Data ◽

Bioinformatics Pipeline ◽

Accurate Identification

Mitochondrial diseases are a heterogeneous group of disorders that can be caused by mutations in the nuclear or mitochondrial genome. Mitochondrial DNA variants may exist in a state of heteroplasmy, where a percentage of DNA molecules harbor a variant, or homoplasmy, where all DNA molecules have a variant. The relative quantity of mtDNA in a cell, or copy number (mtDNA-CN), is associated with mitochondrial function, human disease, and mortality. To facilitate accurate identification of heteroplasmy and quantify mtDNA-CN, we built a bioinformatics pipeline that takes whole genome sequencing data and outputs mitochondrial variants, and mtDNA-CN. We incorporate variant annotations to facilitate determination of variant significance. Our pipeline yields uniform coverage by remapping to a circularized chrM and recovering reads falsely mapped to nuclear-encoded mitochondrial sequences. Notably, we construct a consensus chrM sequence for each sample and recall heteroplasmy against the sample's unique mitochondrial genome. We observe an approximately 3-fold increased association with age for heteroplasmic variants in non-homopolymer regions and, are better able to capture genetic variation in the D-loop of chrM compared to existing software. Our bioinformatics pipeline more accurately captures features of mitochondrial genetics than existing pipelines that are important in understanding how mitochondrial dysfunction contributes to disease.

Download Full-text

Comprehensive analysis of chromothripsis in 2,658 human cancers using whole-genome sequencing

10.1101/333617 ◽

2018 ◽

Cited By ~ 11

Author(s):

Isidro Cortés-Ciriano ◽

June-Koo Lee ◽

Ruibin Xi ◽

Dhawal Jain ◽

Youngsook L. Jung ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Copy Number ◽

Human Cancer ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

End Joining ◽

Cancer Types ◽

Non Homologous End Joining

SummaryChromothripsis is a newly discovered mutational phenomenon involving massive, clustered genomic rearrangements that occurs in cancer and other diseases. Recent studies in cancer suggest that chromothripsis may be far more common than initially inferred from low resolution DNA copy number data. Here, we analyze the patterns of chromothripsis across 2,658 tumors spanning 39 cancer types using whole-genome sequencing data. We find that chromothripsis events are pervasive across cancers, with a frequency of >50% in several cancer types. Whereas canonical chromothripsis profiles display oscillations between two copy number states, a considerable fraction of the events involves multiple chromosomes as well as additional structural alterations. In addition to non-homologous end-joining, we detect signatures of replicative processes and templated insertions. Chromothripsis contributes to oncogene amplification as well as to inactivation of genes such as mismatch-repair related genes. These findings show that chromothripsis is a major process driving genome evolution in human cancer.

Download Full-text