Measuring the invisible - The sequences causal of genome size differences in eyebrights (Euphrasia) revealed by k-mers

2021 ◽  
Author(s):  
Hannes Becher ◽  
Jacob Sampson ◽  
Alex D Twyford

Genome size variation within plant (and other) taxa may be due to presence/absence variation in low-copy sequences or copy number variation in genomic repeats of various frequency classes. However, identifying the sequences underpinning genome size variation has been challenging because genome assemblies commonly contain collapsed representations of repetitive sequences and because genome skimming studies miss low-copy number sequences. Here, we take a novel approach based on k-mers, short sub-sequences of equal length k, generated from whole genome sequencing data of diploid eyebrights (Euphrasia), a group of plants which have considerable genome size variation within a ploidy level. We compare k-mer inventories within and between closely related species, and quantify the contribution of different copy number classes to genome size differences. We further assign high-copy number k-mers to specific repeat types as retrieved from the RepeatExplorer2 pipeline. We find complex patterns of k-mer differences between samples. While all copy number classes contributed to genome size variation, the largest contribution came from repeats with 1000-10,000 genomic copies including the 45S rDNA satellite DNA and, unexpectedly, a repeat associated with an Angela transposable element. We also find size differences in the low-copy number class, likely indicating differences in gene space between our samples. In this study, we demonstrate that it is possible to pinpoint the sequences causing genome size variation within species without use of a reference genome. Such sequences can serve as targets for future cytogenetic studies. We also show that studies of genome size variation should go beyond repeats and consider the whole genome. To allow future work with other taxonomic groups, we share our analysis pipeline, which is straightforward to run, relying largely on standard GNU command line tools.
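The k-mer comparison described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline (which the abstract says relies largely on standard GNU command line tools); the function names `count_kmers` and `difference_by_copy_class`, the toy reads, and the class bounds are all hypothetical. The sketch also works on raw counts and assumes that any normalisation for sequencing depth has been done beforehand.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (sub-sequence of length k) across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def difference_by_copy_class(counts_a, counts_b, class_bounds):
    """Sum per-k-mer count differences (a minus b) within copy-number classes.

    class_bounds is a list of (low, high) inclusive bounds. Each k-mer is
    assigned to the class that contains the larger of its two counts
    (an assumption made for this illustration).
    """
    totals = {bounds: 0 for bounds in class_bounds}
    for kmer in set(counts_a) | set(counts_b):
        a, b = counts_a.get(kmer, 0), counts_b.get(kmer, 0)
        top = max(a, b)
        for low, high in class_bounds:
            if low <= top <= high:
                totals[(low, high)] += a - b
                break
    return totals


# Toy example: two "samples" with hypothetical reads and k = 3.
sample_a = count_kmers(["ACGTACGT", "ACGTAC"], 3)
sample_b = count_kmers(["ACGTTT"], 3)
print(difference_by_copy_class(sample_a, sample_b, [(1, 2), (3, 10)]))
```

Grouping the differences by class is what lets one ask which copy number range contributes most to the overall difference between two samples.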

2014 ◽  
Vol 13s3 ◽  
pp. CIN.S14023
Author(s):  
Hatice Gulcin Ozer ◽  
Aisulu Usubalieva ◽  
Adrienne Dorrance ◽  
Ayse Selen Yilmaz ◽  
Michael Caligiuri ◽  
...  

Genome-wide discoveries such as the detection of copy number alterations (CNAs) from high-throughput whole-genome sequencing data have enabled new developments in personalized medicine. CNAs have been reported to be associated with various diseases and cancers, including acute myeloid leukemia. However, there are multiple challenges to the use of current CNA detection tools that lead to high false-positive rates and thus impede widespread use of such tools in cancer research. In this paper, we discuss these issues and propose possible solutions. First, since the entire genome cannot be mapped due to some regions lacking sequence uniqueness, current methods cannot be appropriately adjusted to handle these regions in the analyses. Thus, detection of medium-sized CNAs is also being directly affected by these mappability problems. The requirement for matching control samples is also an important limitation because acquiring matching controls might not be possible or might not be cost efficient. Here we present an approach that addresses these issues and detects medium-sized CNAs in cancer genomes by (1) masking unmappable regions during the initial CNA detection phase, (2) using a pool of a few normal samples as control, and (3) employing median filtering to adjust CNA ratios to their surrounding coverage and eliminate false positives.
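Steps (1) and (3) above can be sketched as a generic sliding-window median filter that skips masked bins. The abstract does not specify the window size, the bin size, or how masked regions are handled, so the function name `median_filter`, the `window` and `mask` parameters, and the toy values below are assumptions for illustration only.

```python
from statistics import median


def median_filter(ratios, window=5, mask=None):
    """Replace each bin's ratio with the median of its unmasked neighbours.

    ratios: per-bin tumor/control coverage ratios.
    window: odd number of bins in the sliding window.
    mask:   optional list of booleans, True where a bin is unmappable.
    Masked bins are left out of every median and returned as None.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    if mask is None:
        mask = [False] * len(ratios)
    half = window // 2
    smoothed = []
    for i in range(len(ratios)):
        if mask[i]:
            smoothed.append(None)
            continue
        lo, hi = max(0, i - half), min(len(ratios), i + half + 1)
        # The window always contains bin i itself, so it is never empty here.
        neighbours = [ratios[j] for j in range(lo, hi) if not mask[j]]
        smoothed.append(median(neighbours))
    return smoothed


# A single-bin spike is smoothed away; a masked bin is ignored entirely.
print(median_filter([1.0, 1.0, 3.0, 1.0, 1.0], window=3))
print(median_filter([1.0, 1.0, 3.0, 1.0, 1.0], window=3,
                    mask=[False, False, True, False, False]))
```

An isolated one-bin outlier disappears under the median, whereas a run of altered bins longer than half the window survives, which is why this kind of filter suits medium-sized events.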


2021 ◽  
Author(s):  
Stephanie L Battle ◽  
Daniela Puiu ◽  
Eric Boerwinkle ◽  
Kent Taylor ◽  
Jerome Rotter ◽  
...  

Mitochondrial diseases are a heterogeneous group of disorders that can be caused by mutations in the nuclear or mitochondrial genome. Mitochondrial DNA variants may exist in a state of heteroplasmy, where a percentage of DNA molecules harbor a variant, or homoplasmy, where all DNA molecules have a variant. The relative quantity of mtDNA in a cell, or copy number (mtDNA-CN), is associated with mitochondrial function, human disease, and mortality. To facilitate accurate identification of heteroplasmy and quantify mtDNA-CN, we built a bioinformatics pipeline that takes whole genome sequencing data and outputs mitochondrial variants and mtDNA-CN. We incorporate variant annotations to facilitate determination of variant significance. Our pipeline yields uniform coverage by remapping to a circularized chrM and recovering reads falsely mapped to nuclear-encoded mitochondrial sequences. Notably, we construct a consensus chrM sequence for each sample and recall heteroplasmy against the sample's unique mitochondrial genome. We observe an approximately 3-fold increased association with age for heteroplasmic variants in non-homopolymer regions, and are better able to capture genetic variation in the D-loop of chrM compared to existing software. Our bioinformatics pipeline more accurately captures features of mitochondrial genetics than existing pipelines that are important in understanding how mitochondrial dysfunction contributes to disease.


2018 ◽  
Author(s):  
Isidro Cortés-Ciriano ◽  
June-Koo Lee ◽  
Ruibin Xi ◽  
Dhawal Jain ◽  
Youngsook L. Jung ◽  
...  

Summary: Chromothripsis is a newly discovered mutational phenomenon involving massive, clustered genomic rearrangements that occurs in cancer and other diseases. Recent studies in cancer suggest that chromothripsis may be far more common than initially inferred from low resolution DNA copy number data. Here, we analyze the patterns of chromothripsis across 2,658 tumors spanning 39 cancer types using whole-genome sequencing data. We find that chromothripsis events are pervasive across cancers, with a frequency of >50% in several cancer types. Whereas canonical chromothripsis profiles display oscillations between two copy number states, a considerable fraction of the events involves multiple chromosomes as well as additional structural alterations. In addition to non-homologous end-joining, we detect signatures of replicative processes and templated insertions. Chromothripsis contributes to oncogene amplification as well as to inactivation of genes such as mismatch-repair related genes. These findings show that chromothripsis is a major process driving genome evolution in human cancer.


2017 ◽  
Author(s):  
James HR Farmery ◽  
Mike L Smith ◽  
Andy G Lynch ◽  

Abstract: Telomere length is a risk factor in disease and the dynamics of telomere length are crucial to our understanding of cell replication and vitality. The proliferation of whole genome sequencing represents an unprecedented opportunity to glean new insights into telomere biology on a previously unimaginable scale. To this end, a number of approaches for estimating telomere length from whole-genome sequencing data have been proposed. Here we present Telomerecat, a novel approach to the estimation of telomere length. Previous methods have been dependent on the number of telomeres present in a cell being known, which may be problematic when analysing aneuploid cancer data and non-human samples. Telomerecat is designed to be agnostic to the number of telomeres present, making it suited for the purpose of estimating telomere length in cancer studies. Telomerecat also accounts for interstitial telomeric reads and presents a novel approach to dealing with sequencing errors. We show that Telomerecat performs well at telomere length estimation when compared to leading experimental and computational methods. Furthermore, we show that it detects expected patterns in longitudinal data, technical replicates, and cross-species comparisons. We also apply the method to cancer cell data, uncovering an interesting relationship with the underlying telomerase genotype.


2016 ◽  
Vol 56 (1) ◽  
pp. 15.9.1-15.9.17 ◽  
Author(s):  
Keiran M. Raine ◽  
Peter Van Loo ◽  
David C. Wedge ◽  
David Jones ◽  
Andrew Menzies ◽  
...  

Author(s):  
Pavel Neumann ◽  
Ludmila Oliveira ◽  
Jana Čížková ◽  
Tae-Soo Jang ◽  
Sonja Klemme ◽  
...  

Summary: The parasitic genus Cuscuta (Convolvulaceae) is exceptional among plants with respect to centromere organization, including both monocentric and holocentric chromosomes, and substantial variation in genome size and chromosome number. We investigated 12 species representing the diversity of the genus in a phylogenetic context to reveal the molecular and evolutionary processes leading to diversification of their genomes. We measured genome sizes and investigated karyotypes and centromere organization using molecular cytogenetic techniques. We also performed low-pass whole genome sequencing and comparative analysis of repetitive DNA composition. A remarkable 102-fold variation in genome sizes (342–34,734 Mbp/1C) was detected for monocentric Cuscuta species, while genomes of holocentric species were of moderate sizes (533–1,545 Mbp/1C). The genome size variation was primarily driven by the differential accumulation of repetitive sequences. The transition to holocentric chromosomes in the subgenus Cuscuta was associated with loss of histone H2A phosphorylation and elimination of centromeric retrotransposons. In addition, the basic chromosome number (x) decreased from 15 to 7, presumably due to chromosome fusions. We demonstrated that the transition to holocentricity in Cuscuta was accompanied by significant changes in epigenetic marks, chromosome number and the repetitive DNA sequence composition.


2020 ◽  
Author(s):  
Yuan yuan Lei ◽  
Guo chao Zhang ◽  
Chao qi Zhang ◽  
Li yan Xue ◽  
Zhen lin Yang ◽  
...  

Abstract Background: Genomic instability plays a large role in the process of cancer. Tumor mutational burden (TMB) is closely related to immunotherapy outcome and is an important manifestation of genomic instability. However, the cost of TMB detection is extremely high, which limits the use of TMB in clinical practice. Another new indicator of genome instability, CNVA (the average copy number variation), which calculates the changes of 0.5 Mb chromosomal fragments, requires extremely low sequencing depth and is expected to replace TMB as a new marker of immune efficacy. Methods: A total of 50 samples (23 of which came from patients who received immunotherapy) were subjected to low-depth (10X) chromosome sequencing on the MGI platform. CNVA was calculated by the formula avg(abs(copy number - 2)). Then, we analyzed the relationship between CNVA and immune infiltration or immunotherapy efficacy. In addition, through the analysis of whole genome sequencing data of 509 lung adenocarcinomas in the TCGA database, we compared CNVA with the classic marker TMB to evaluate the value of CNVA as an immune evaluation index. Results: Compared with the low CNVA group, the high CNVA group had higher expression of PD-L1, CD39 and CD19, and more infiltration of CD8+ T cells and CD3+ T cells. Among the 23 patients treated with immunotherapy, the average CNVA value of the SD (stable disease)/PR (partial response) group was higher than that of the PD (progressive disease) group (P < 0.05). Whole genome sequencing data of 509 lung adenocarcinomas from TCGA and real-time quantitative PCR results from 22 frozen specimens showed that CNVA was more strongly correlated with CD8 and PD-L1 than TMB was. In addition, CNVA showed a specific positive correlation with TMB (r = 0.2728, p < 0.0001). Conclusion: CNVA can be a good indicator of immune infiltration and of immunotherapy efficacy. With its low cost and potential clinical application for testing, it is expected to become a substitute for TMB.
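The CNVA formula quoted in the Methods, the average of the absolute deviation of copy number from 2, can be written out as a short Python sketch. The function name `cnva` and the example values are hypothetical; the sketch assumes one copy number estimate per 0.5 Mb fragment and a diploid baseline of 2, as the formula implies.

```python
def cnva(copy_numbers):
    """Average absolute deviation of per-fragment copy number from diploid (2).

    copy_numbers: one estimated copy number per chromosomal fragment
    (0.5 Mb fragments in the abstract). Raises ValueError on empty input.
    """
    if not copy_numbers:
        raise ValueError("at least one fragment is required")
    return sum(abs(cn - 2) for cn in copy_numbers) / len(copy_numbers)


# A fully diploid profile scores 0; gains and losses both raise the score.
print(cnva([2, 2, 2, 2]))
print(cnva([1, 2, 3, 4]))
```

Because the absolute value is taken before averaging, losses and gains do not cancel each other out, so the score reflects the overall extent of copy number change rather than its direction.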

