A Bioinformatics Pipeline for Estimating Mitochondrial DNA Copy Number and Heteroplasmy Levels from Whole Genome Sequencing Data

Author(s): Stephanie L Battle, Daniela Puiu, Eric Boerwinkle, Kent Taylor, Jerome Rotter, ...

Mitochondrial diseases are a heterogeneous group of disorders that can be caused by mutations in the nuclear or mitochondrial genome. Mitochondrial DNA (mtDNA) variants may exist in a state of heteroplasmy, where only a fraction of mtDNA molecules carry a variant, or homoplasmy, where all mtDNA molecules carry it. The relative quantity of mtDNA in a cell, or copy number (mtDNA-CN), is associated with mitochondrial function, human disease, and mortality. To facilitate accurate identification of heteroplasmy and quantification of mtDNA-CN, we built a bioinformatics pipeline that takes whole genome sequencing data and outputs mitochondrial variants and mtDNA-CN. We incorporate variant annotations to help determine variant significance. Our pipeline yields uniform coverage by remapping reads to a circularized chrM and by recovering reads falsely mapped to nuclear-encoded mitochondrial sequences. Notably, we construct a consensus chrM sequence for each sample and recall heteroplasmy against the sample's unique mitochondrial genome. We observe an approximately 3-fold increased association with age for heteroplasmic variants in non-homopolymer regions and are better able to capture genetic variation in the D-loop of chrM compared with existing software. Compared with existing pipelines, our bioinformatics pipeline more accurately captures the features of mitochondrial genetics that are important for understanding how mitochondrial dysfunction contributes to disease.
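
As an illustration only, the two quantities the pipeline reports can be sketched in Python under common assumptions that the abstract does not spell out: mtDNA-CN is taken as twice the ratio of mean chrM depth to mean autosomal depth (assuming a diploid nuclear genome), and the heteroplasmy level at a site is the fraction of reads that differ from the chosen reference base. The function names and the per-position base-count input are hypothetical; this is not the authors' implementation. The example also shows that the same pileup yields a different reported level depending on whether the reference base or the sample's own consensus base is used.

```python
from collections import Counter


def mtdna_copy_number(mean_chrm_depth: float, mean_autosomal_depth: float) -> float:
    """Estimate mtDNA copies per cell from sequencing depth.

    Assumption: a diploid nuclear genome, so autosomal depth reflects two
    copies per cell and the chrM/autosome depth ratio is multiplied by 2.
    """
    if mean_autosomal_depth <= 0:
        raise ValueError("mean autosomal depth must be positive")
    return 2.0 * mean_chrm_depth / mean_autosomal_depth


def consensus_base(counts: Counter) -> str:
    """Majority base at one chrM position (one position of a per-sample consensus)."""
    if not counts:
        raise ValueError("no reads at this position")
    return counts.most_common(1)[0][0]


def heteroplasmy(counts: Counter, ref_base: str) -> float:
    """Fraction of reads at one position that do not carry `ref_base`."""
    depth = sum(counts.values())
    if depth == 0:
        return 0.0
    return 1.0 - counts[ref_base] / depth


# Hypothetical pileup at one chrM position: 980 reads show G, 20 show A.
pileup = Counter({"G": 980, "A": 20})
print(mtdna_copy_number(3000.0, 30.0))  # 200.0
print(consensus_base(pileup))           # G
print(heteroplasmy(pileup, "A"))        # about 0.98 against a reference base A
print(heteroplasmy(pileup, "G"))        # about 0.02 against the sample consensus G
```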

2019
Author(s): Clare Puttick, Kishore R Kumar, Ryan L Davis, Mark Pinese, David M Thomas, ...

Motivation: Mitochondrial diseases (MDs) are the most common group of inherited metabolic disorders and are often challenging to diagnose due to extensive genotype-phenotype heterogeneity. MDs are caused by mutations in the nuclear or mitochondrial genome, where pathogenic mitochondrial variants are usually heteroplasmic and typically present at a much lower allelic fraction in blood than in affected tissues. Both genomes can now be readily analysed using unbiased whole genome sequencing (WGS), but most nuclear variant detection methods fail to detect low-heteroplasmy variants in the mitochondrial genome.
Results: We present mity, a bioinformatics pipeline for detecting and interpreting heteroplasmic SNVs and INDELs in the mitochondrial genome using WGS data. In 2,980 healthy controls, we observed an average of 3,166× coverage of the mitochondrial genome using WGS from blood. mity utilises this high depth to detect pathogenic mitochondrial variants, even at low heteroplasmy. mity enables easy interpretation of mitochondrial variants and can be incorporated into existing diagnostic WGS pipelines. This could simplify the diagnostic pathway, avoid invasive tissue biopsies and increase the diagnostic rate for MDs and other conditions caused by impaired mitochondrial function.
Availability: mity is available from https://github.com/KCCG/mity under an MIT license.
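
To see why depth matters for low-heteroplasmy detection, the number of reads supporting a variant at heteroplasmy fraction h and depth d can be modelled, as a simplifying assumption that ignores sequencing error and mapping bias, as Binomial(d, h). The sketch below computes the chance of observing at least k supporting reads. The minimum-read threshold k = 10 and the 30x comparison depth are hypothetical values chosen for illustration, not mity's settings.

```python
def prob_at_least_k_alt_reads(depth: int, het: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(depth, het).

    Simplified model (assumption): every read at the site independently
    carries the variant with probability `het`; sequencing errors and
    mapping bias are ignored.
    """
    if k <= 0:
        return 1.0
    if het <= 0.0:
        return 0.0
    if het >= 1.0:
        return 1.0 if k <= depth else 0.0
    # P(X = 0); computed directly, which is fine for depths of a few
    # thousand but underflows for extremely deep data (use log-space there).
    pmf = (1.0 - het) ** depth
    below_k = 0.0
    # Accumulate P(X = i) for i < k using the ratio P(X = i + 1) / P(X = i).
    for i in range(min(k, depth + 1)):
        below_k += pmf
        pmf *= (depth - i) / (i + 1) * het / (1.0 - het)
    return max(0.0, 1.0 - below_k)


# Chance of at least 10 supporting reads (a hypothetical caller threshold)
# for a 1% heteroplasmic variant at the mean mtDNA depth reported above
# versus a hypothetical 30x depth typical of nuclear WGS.
for depth in (3166, 30):
    print(depth, prob_at_least_k_alt_reads(depth, het=0.01, k=10))
```

Under this model, a 1% variant is almost always supported by ten or more reads at about 3,000x, but essentially never at 30x.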


2014, Vol 13s3, pp. CIN.S14023
Author(s): Hatice Gulcin Ozer, Aisulu Usubalieva, Adrienne Dorrance, Ayse Selen Yilmaz, Michael Caligiuri, ...

Genome-wide discoveries, such as the detection of copy number alterations (CNAs) from high-throughput whole-genome sequencing data, have enabled new developments in personalized medicine. CNAs have been reported to be associated with various diseases and cancers, including acute myeloid leukemia. However, multiple challenges in the use of current CNA detection tools lead to high false-positive rates and thus impede the widespread use of such tools in cancer research. In this paper, we discuss these issues and propose possible solutions. First, because some regions lack sequence uniqueness, the entire genome cannot be mapped, and current methods cannot appropriately handle these regions in the analyses. As a result, the detection of medium-sized CNAs is directly affected by these mappability problems. Second, the requirement for matched control samples is an important limitation, because acquiring matched controls may not be possible or cost-efficient. Here we present an approach that addresses these issues and detects medium-sized CNAs in cancer genomes by (1) masking unmappable regions during the initial CNA detection phase, (2) using a pool of a few normal samples as the control, and (3) employing median filtering to adjust CNA ratios to their surrounding coverage and eliminate false positives.
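
The three steps listed above can be sketched on binned read counts. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the input layout (per-bin counts and a boolean mappability mask), library-size normalization, the pooled control taken as the per-bin median of the normal samples, and the five-bin window are all hypothetical choices. The abstract does not say how wide the median filter is or exactly how the ratios are adjusted; here a running median simply pulls an isolated outlying bin back to the level of its neighbors.

```python
import math
from statistics import median


def cna_log2_ratios(tumor, normals, mappable, window=5):
    """Per-bin log2 copy ratios for one chromosome (hypothetical interface).

    tumor    -- read counts per genomic bin for the tumor sample
    normals  -- list of read-count lists, one per pooled normal sample
    mappable -- booleans; False marks bins to mask as unmappable
    Returns a list with None for masked bins.
    """
    keep = [i for i, ok in enumerate(mappable) if ok]

    def scale(sample):
        # Normalize for library size using mappable bins only.
        total = sum(sample[i] for i in keep)
        if total <= 0:
            raise ValueError("sample has no reads in mappable bins")
        return [x / total for x in sample]

    t_norm = scale(tumor)
    n_norm = [scale(n) for n in normals]

    # Steps (1) and (2): mask unmappable bins and compare the tumor with
    # the median of the pooled normals instead of a single matched control.
    raw = []
    for i in range(len(tumor)):
        control = median(n[i] for n in n_norm)
        if not mappable[i] or control <= 0 or t_norm[i] <= 0:
            raw.append(None)
        else:
            raw.append(math.log2(t_norm[i] / control))

    # Step (3): running median over unmasked neighbors, so a single
    # outlying bin does not produce a spurious one-bin CNA call.
    half = window // 2
    smoothed = []
    for i, r in enumerate(raw):
        if r is None:
            smoothed.append(None)
            continue
        nearby = [x for x in raw[max(0, i - half): i + half + 1] if x is not None]
        smoothed.append(median(nearby))
    return smoothed
```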


2018
Author(s): Isidro Cortés-Ciriano, June-Koo Lee, Ruibin Xi, Dhawal Jain, Youngsook L. Jung, ...

Summary: Chromothripsis is a recently discovered mutational phenomenon involving massive, clustered genomic rearrangements that occurs in cancer and other diseases. Recent studies in cancer suggest that chromothripsis may be far more common than initially inferred from low-resolution DNA copy number data. Here, we analyze the patterns of chromothripsis across 2,658 tumors spanning 39 cancer types using whole-genome sequencing data. We find that chromothripsis events are pervasive across cancers, with a frequency of >50% in several cancer types. Whereas canonical chromothripsis profiles display oscillations between two copy number states, a considerable fraction of events involve multiple chromosomes as well as additional structural alterations. In addition to non-homologous end-joining, we detect signatures of replicative processes and templated insertions. Chromothripsis contributes to oncogene amplification as well as to the inactivation of genes such as mismatch repair-related genes. These findings show that chromothripsis is a major process driving genome evolution in human cancer.
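
The idea of a profile that oscillates between two copy number states can be made concrete with a small heuristic. The sketch below is a hypothetical illustration, not the authors' detection criteria: it takes the integer copy numbers of consecutive segments along one chromosome and returns the length of the longest stretch that alternates between exactly two values.

```python
def longest_two_state_oscillation(segment_cn: list[int]) -> int:
    """Length of the longest run of consecutive segments whose copy number
    alternates between exactly two values (e.g. 2, 1, 2, 1, ...)."""
    if not segment_cn:
        return 0
    best = run = 1
    for i in range(1, len(segment_cn)):
        if segment_cn[i] == segment_cn[i - 1]:
            run = 1  # no change between neighbors: the oscillation is broken
        elif run >= 2 and segment_cn[i] != segment_cn[i - 2]:
            run = 2  # a third state appeared: restart from the last two segments
        else:
            run += 1  # still alternating between the same two states
        best = max(best, run)
    return best


# Hypothetical integer copy numbers of consecutive segments on one chromosome.
profile = [2, 1, 2, 1, 2, 1, 3, 3]
print(longest_two_state_oscillation(profile))  # 6
```

A long two-state run fits the canonical pattern; the multi-chromosome and more complex events described above would need additional criteria beyond this sketch.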


2016, Vol 56 (1), pp. 15.9.1-15.9.17
Author(s): Keiran M. Raine, Peter Van Loo, David C. Wedge, David Jones, Andrew Menzies, ...

Gene, 2019, Vol 699, pp. 145-154
Author(s): Mengqin Duan, Liang Chen, Qinyu Ge, Na Lu, Junji Li, ...

2018
Author(s): Eric R. Lucas, Alistair Miles, Nicholas J. Harding, Chris S. Clarkson, Mara K. N. Lawniczak, ...

Background: Polymorphisms in the copy number of a genetic region can influence gene expression, coding sequence and zygosity, making them powerful actors in the evolutionary process. Copy number variants (CNVs) are, however, understudied because they are more difficult to detect than single nucleotide polymorphisms. We take advantage of the intense selective pressures on the major malaria vector Anopheles gambiae, caused by the widespread use of insecticides for malaria control, to investigate the role of CNVs in the evolution of insecticide resistance.
Results: Using whole-genome sequencing data from 1142 samples in the An. gambiae 1000 genomes project, we identified 1557 independent increases in copy number, encompassing a total of 267 genes, which were enriched for gene families linked to metabolic insecticide resistance. The five major candidate genes for metabolic resistance were all found in at least one CNV and were often the target of multiple independent CNVs, reaching as many as 16 CNVs in Cyp9k1. These CNVs have, furthermore, been spreading due to positive selection, as indicated by high local CNV frequencies and extended haplotype homozygosity.
Conclusions: Our results demonstrate the importance of CNVs in the response to selection, with CNVs closely associated with genes involved in the evolution of insecticide resistance. This highlights the urgent need to identify their relative contributions to resistance and to track their spread as the application of insecticide in malaria-endemic countries intensifies. Our detailed descriptions of CNVs found across the species range provide the tools to do so.
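
Extended haplotype homozygosity (EHH) measures how far haplotypes carrying a given core allele remain identical: at a given distance from the core, it is the probability that two randomly chosen carrier haplotypes are identical over the whole interval, that is, the sum over distinct extended haplotypes of C(n_i, 2) divided by C(n, 2). A minimal sketch of this standard statistic follows. The 0/1 string encoding of haplotypes and the one-sided (rightward) extension are simplifying assumptions, and the abstract does not describe how the authors computed it.

```python
from collections import Counter
from math import comb


def ehh(carriers: list[str], core: int, distance: int) -> float:
    """EHH at `distance` sites to the right of `core`.

    carriers -- haplotypes (one character per site) that all carry the same
                allele at index `core`; the encoding is a hypothetical choice
    """
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two carrier haplotypes")
    # Group carriers by their sequence from the core site to core + distance.
    groups = Counter(h[core:core + distance + 1] for h in carriers)
    # Pairs of identical extended haplotypes divided by all possible pairs.
    return sum(comb(c, 2) for c in groups.values()) / comb(n, 2)


# Hypothetical carrier haplotypes over four sites, core allele at index 0.
carriers = ["0000", "0001", "0011", "0011"]
for d in range(4):
    print(d, ehh(carriers, core=0, distance=d))
```

EHH starts at 1.0 at the core and decays as haplotypes diverge; under positive selection, a recently spread allele tends to keep high EHH over longer distances.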


2021
Author(s): Carolin M Sauer, Matthew D Eldridge, Maria Vias, James A Hall, Samantha E Boyle, ...

Low-coverage or shallow whole genome sequencing (sWGS) approaches can efficiently detect somatic copy number aberrations (SCNAs) at low cost. This is clinically important for many cancers, particularly cancers with severe chromosomal instability (CIN), which frequently lack actionable point mutations and are characterised by poor outcomes. Absolute copy number (ACN), measured in DNA copies per cancer cell, is required for meaningful comparisons between copy number states, but it is challenging to estimate and in practice often requires manual curation. Using 60 cancer cell lines, 148 patient-derived xenograft (PDX) samples and 142 clinical tissue samples, we evaluate the performance of available tools for obtaining ACN from sWGS. We present Rascal (relative to absolute copy number scaling), a validated and refined tool that offers improved fitting algorithms and enables interactive visualisation of copy number profiles. These approaches are highly applicable to both pre-clinical and translational research studies on SCNA-driven cancers and provide more robust ACN fits from sWGS data than currently available tools.
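
The abstract does not give Rascal's equations, but the conversion it names, from relative to absolute copy number, is commonly written in terms of tumour purity p and average tumour ploidy ψ, assuming the contaminating normal cells are diploid: a segment with absolute copy number ACN has relative copy number r = (p·ACN + 2(1 - p)) / (p·ψ + 2(1 - p)). The sketch below implements this relation and its inverse. The function names are hypothetical, and the fitting of p and ψ, which the abstract identifies as the hard part, is not shown.

```python
def absolute_copy_number(relative_cn: float, purity: float, ploidy: float) -> float:
    """Convert a segment's relative copy number (segment depth / sample mean
    depth) to copies per cancer cell, assuming diploid normal contamination."""
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    normal = 2.0 * (1.0 - purity)  # DNA contributed by normal cells
    return (relative_cn * (purity * ploidy + normal) - normal) / purity


def relative_copy_number(acn: float, purity: float, ploidy: float) -> float:
    """Forward relation: expected relative copy number for a given ACN."""
    normal = 2.0 * (1.0 - purity)
    return (purity * acn + normal) / (purity * ploidy + normal)


# At 50% purity and ploidy 3, a 4-copy segment is only about 1.2x the mean depth.
r = relative_copy_number(4, purity=0.5, ploidy=3.0)
print(r)                                                # about 1.2
print(absolute_copy_number(r, purity=0.5, ploidy=3.0))  # about 4.0
```

The example shows why ACN is hard to read directly from sWGS depth: normal-cell contamination compresses copy number differences, so each candidate (p, ψ) pair implies a different set of absolute values.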

