Comparative Analysis of Genomic Repeat Content in Gomphocerine Grasshoppers Reveals Expansion of Satellite DNA and Helitrons in Species with Unusually Large Genomes

2020 ◽  
Vol 12 (7) ◽  
pp. 1180-1193
Author(s):  
Abhijeet Shah ◽  
Joseph I Hoffman ◽  
Holger Schielzeth

Abstract Eukaryotic organisms vary widely in genome size and much of this variation can be explained by differences in the abundance of repetitive elements. However, the phylogenetic distributions and turnover rates of repetitive elements are largely unknown, particularly for species with large genomes. We therefore used de novo repeat identification based on low coverage whole-genome sequencing to characterize the repeatomes of six species of gomphocerine grasshoppers, an insect clade characterized by unusually large and variable genome sizes. Genome sizes of the six species ranged from 8.4 to 14.0 pg DNA per haploid genome and thus include the second largest insect genome documented so far (with the largest being another acridid grasshopper). Estimated repeat content ranged from 79% to 96% and was strongly correlated with genome size. Averaged over species, these grasshopper repeatomes comprised significant amounts of DNA transposons (24%), LINE elements (21%), helitrons (13%), LTR retrotransposons (12%), and satellite DNA (8.5%). The contribution of satellite DNA was particularly variable (ranging from <1% to 33%) as was the contribution of helitrons (ranging from 7% to 20%). The age distribution of divergence within clusters was unimodal with peaks ∼4–6%. The phylogenetic distribution of repetitive elements was suggestive of an expansion of satellite DNA in the lineages leading to the two species with the largest genomes. Although speculative at this stage, we suggest that the expansion of satellite DNA could be secondary and might possibly have been favored by selection as a means of stabilizing greatly expanded genomes.

Author(s):  
Aleksandra Beric ◽  
Makenzie E Mabry ◽  
Alex E Harkess ◽  
Julia Brose ◽  
M Eric Schranz ◽  
...  

Abstract Genome sizes of plants have long piqued the interest of researchers due to the vast differences among organisms. However, the mechanisms that drive size differences have yet to be fully understood. Two important contributing factors to genome size are expansions of repetitive elements, such as transposable elements (TEs), and whole-genome duplications (WGD). Although studies have found correlations between genome size and both TE abundance and polyploidy, these studies typically test for these patterns within a genus or species. The plant order Brassicales provides an excellent system to further test if genome size evolution patterns are consistent across larger time scales, as there are numerous WGDs. This order is also home to one of the smallest plant genomes, Arabidopsis thaliana, chosen as the model plant system for this reason, as well as to species with very large genomes. With new methods that allow for TE characterization from low-coverage genome shotgun data and 71 taxa across the Brassicales, we confirm a correlation between genome size and TE content; however, we are unable to reconstruct phylogenetic relationships and do not detect any shift in TE abundance associated with WGD.


Genes ◽  
2021 ◽  
Vol 12 (11) ◽  
pp. 1710
Author(s):  
J. Antonio Baeza ◽  
José Luis Molina-Quirós ◽  
Sebastián Hernández-Muñoz

The ‘Pez Gallo’ or the Roosterfish, Nematistius pectoralis, is an ecologically relevant species in shallow-water soft-bottom environments and the target of a highly lucrative recreational sport fishery in the Central Eastern Pacific Ocean. According to the International Union for Conservation of Nature, N. pectoralis is assessed globally as Data Deficient. Using low-coverage sequencing with short Illumina 300 bp paired-end reads, this study reports, for the first time, the genome size, single/low-copy genome content, and nuclear repetitive elements, including the 45S rRNA DNA operon and microsatellites, in N. pectoralis. The haploid genome size estimated using a k-mer approach was 816.04 Mbp, which is within the range previously reported for other representatives of the order Carangiformes. Single/low-copy genome content (63%) was relatively high. A large portion of repetitive sequences could not be assigned to known repeat element families. Considering only annotated repetitive elements, the most common were classified as Satellite DNA, which was considerably more abundant than Class I-Long Interspersed Nuclear Elements and Class I-LTR Retroviral elements. The nuclear ribosomal operon in N. pectoralis consists of, in the following order: a 5′ ETS (length = 948 bp), ssrDNA (1835 bp), ITS1 (724 bp), a 5.8S rDNA (158 bp), ITS2 (508 bp), lsrDNA (3924 bp), and a 3′ ETS (32 bp). A total of 44 SSRs were identified. These newly developed genomic resources are most relevant for improving the understanding of biology, developing conservation plans, and managing the fishery of the iconic N. pectoralis.
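
The abstract above does not say which k-mer tool or model was used, so the following is only a minimal sketch of the general idea behind k-mer-based genome size estimation: divide the total number of k-mers in the reads by the depth at the main peak of the k-mer histogram. The function name `estimate_genome_size`, the `min_depth` cutoff for discarding likely sequencing-error k-mers, and the toy histogram are all illustrative assumptions, not values from the study.

```python
def estimate_genome_size(histogram, min_depth=3):
    """Rough genome size estimate from a k-mer depth histogram.

    histogram maps depth -> number of distinct k-mers seen at that depth.
    Depths below min_depth are treated as sequencing errors and ignored
    (an assumed, simplistic cutoff).
    """
    solid = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    # Depth with the most distinct k-mers approximates single-copy coverage.
    peak_depth = max(solid, key=solid.get)
    # Total k-mer occurrences, counting each distinct k-mer depth times.
    total_kmers = sum(depth * n for depth, n in solid.items())
    return total_kmers / peak_depth


# Toy histogram (made-up numbers): error k-mers at depths 1-2,
# a single-copy peak at depth 10, and a few two-copy k-mers at depth 20.
toy_histogram = {1: 5000, 2: 800, 9: 100, 10: 300, 11: 100, 20: 10}
size = estimate_genome_size(toy_histogram)
print(size)
```

Dedicated tools fit a statistical model to the histogram rather than taking a raw peak, so this sketch should be read as an illustration of the arithmetic only.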


2017 ◽  
Author(s):  
Teri Evans ◽  
Andrew Johnson ◽  
Matt Loose

Abstract Large repeat-rich genomes present challenges for assembly and identification of gene models with short read technologies. Here we present a method we call Virtual Genome Walking, which uses an iterative assembly approach to first identify exons from de novo assembled transcripts and then assemble whole genome reads against each exon. This process is iterated, allowing the extension of exons. These linked assemblies are refined to generate gene models including upstream and downstream genomic sequence as well as intronic sequence. We test this method using a 20X genomic read set for the axolotl, the genome of which is estimated to be 30 Gb in size. These reads were previously reported to be effectively impossible to assemble. Here we provide almost 1 Gb of assembled sequence describing over 19,000 gene models for the axolotl. Gene models stop assembling either due to localised low coverage in the genomic reads, or the presence of repeats. We validate our observations by comparison with previously published axolotl bacterial artificial chromosome (BAC) sequences. In addition, we analysed axolotl intron length, intron-exon structure, repeat content and synteny. These gene models, sequences and annotations are freely available for download from https://tinyurl.com/y8gydc6n. The software pipeline including a docker image is available from https://github.com/LooseLab/iterassemble. These methods will increase the value of low coverage sequencing of understudied model systems.
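
To make the idea of iterated extension concrete, here is a toy sketch that grows a seed sequence (standing in for an exon) by repeatedly attaching reads that overlap either end. It is not the iterassemble pipeline, which the abstract describes as an assembly-based approach; the function `extend_seed`, the exact-match overlap rule, and the example sequences are hypothetical and chosen only for illustration.

```python
def extend_seed(seed, reads, min_overlap=6, max_rounds=100):
    """Greedily extend seed in both directions using exact read overlaps."""
    contig = seed
    for _ in range(max_rounds):
        grew = False
        for read in reads:
            if read in contig:
                continue  # read adds no new sequence
            max_k = min(len(read) - 1, len(contig))
            # Try the longest overlap first, down to min_overlap.
            for k in range(max_k, min_overlap - 1, -1):
                if contig.endswith(read[:k]):      # read extends the right end
                    contig = contig + read[k:]
                    grew = True
                    break
                if contig.startswith(read[-k:]):   # read extends the left end
                    contig = read[:-k] + contig
                    grew = True
                    break
        if not grew:
            break  # no read overlaps either end: extension stops
    return contig


# Made-up 37 bp "genome", tiled by 12 bp reads every 5 bp (7 bp overlaps).
genome = "ACGTTGCAAGGCTTACGATCGGATCCTAGGCATTAGC"
reads = [genome[i:i + 12] for i in range(0, 26, 5)]
seed = genome[14:24]  # the "exon" we start from
contig = extend_seed(seed, reads)
print(contig == genome)
```

In this toy, extension halts when no read overlaps an end by at least `min_overlap` bases, which loosely mirrors the abstract's observation that gene models stop assembling where coverage is locally low; repeats, the other stopping cause it names, would instead create ambiguous overlaps that this greedy rule does not handle.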


2021 ◽  
Vol 11 ◽  
Author(s):  
Ljudevit Luka Boštjančić ◽  
Lena Bonassin ◽  
Lucija Anušić ◽  
Leona Lovrenčić ◽  
Višnja Besendorfer ◽  
...  

Pontastacus leptodactylus is a native European crayfish species found in both freshwater and brackish environments. It has commercial importance for fisheries and aquaculture industries. Until now, most studies concerning P. leptodactylus have focused on its phylogeny and population genetics. However, little is known about the chromosomal evolution and genome organization of this species. Therefore, we performed clustering analysis of a low coverage genomic dataset to identify and characterize repetitive DNA in the P. leptodactylus genome. In addition, the karyogram of P. leptodactylus (2n = 180) is presented here for the first time, consisting of 75 metacentric, 14 submetacentric, and a submetacentric/metacentric heteromorphic chromosome pair. We determined the genome size to be ~18.7 gigabase pairs. Repetitive DNA represents about 54.85% of the genome. Satellite DNA repeats are the most abundant type of repetitive DNA, making up ~28% of the total amount of repetitive elements, followed by the Ty3/Gypsy retroelements (~15%). Our study established a surprisingly high diversity of satellite repeats in P. leptodactylus. The genome of P. leptodactylus is by far the most satellite-rich genome discovered to date, with 258 satellite families described. Of the five satellite DNA families mapped on chromosomes, PlSAT3-411 co-localizes with the AT-rich DAPI positive probable (peri)centromeric heterochromatin on all chromosomes, while PlSAT14-79 co-localizes with the AT-rich DAPI positive (peri)centromeric heterochromatin on one chromosome and is also located subterminally and intercalary on some chromosomes. PlSAT1-21 is located intercalary in the vicinity of the (peri)centromeric heterochromatin on some chromosomes, while PlSAT6-70 and PlSAT7-134 are located intercalary on some P. leptodactylus chromosomes. The FISH results reveal amplification of interstitial telomeric repeats (ITRs) in P. leptodactylus. The prevalence of repetitive elements, especially the satellite DNA repeats, may have provided a driving force for the evolution of the P. leptodactylus genome.


Genome ◽  
2013 ◽  
Vol 56 (9) ◽  
pp. 487-494 ◽  
Author(s):  
Kate L. Hertweck

The research field of comparative genomics is moving from a focus on genes to a more holistic view including the repetitive complement. This study aimed to characterize relative proportions of the repetitive fraction of large, complex genomes in a nonmodel system. The monocotyledonous plant order Asparagales (onion, asparagus, agave) comprises some of the largest angiosperm genomes and represents variation in both genome size and structure (karyotype). Anonymous, low coverage, single-end Illumina data from 11 exemplar Asparagales taxa were assembled using a de novo method. Resulting contigs were annotated using a reference library of available monocot repetitive sequences. Mapping reads to contigs provided rough estimates of relative proportions of each type of transposon in the nuclear genome. The results were parsed into general repeat types and synthesized with genome size estimates and a phylogenetic context to describe the pattern of transposable element evolution among these lineages. The major finding is that although some lineages in Asparagales exhibit conservation in repeat proportions, there is generally wide variation in types and frequency of repeats. This approach is an appropriate first step in characterizing repeats in evolutionary lineages with a paucity of genomic resources.


BMC Biology ◽  
2021 ◽  
Vol 19 (1) ◽  
Author(s):  
C. P. Stelzer ◽  
J. Blommaert ◽  
A. M. Waldvogel ◽  
M. Pichler ◽  
B. Hecox-Lea ◽  
...  

Abstract Background Eukaryotic genomes are known to display an enormous variation in size, but the evolutionary causes of this phenomenon are still poorly understood. To obtain mechanistic insights into such variation, previous studies have often employed comparative genomics approaches involving closely related species or geographically isolated populations within a species. Genome comparisons among individuals of the same population have so far remained understudied, despite their great potential in providing a microevolutionary perspective to genome size evolution. The rotifer Brachionus asplanchnoidis represents one of the most extreme cases of within-population genome size variation among eukaryotes, displaying almost twofold variation within a geographic population. Results Here, we used a whole-genome sequencing approach to identify the underlying DNA sequence differences by assembling a high-quality reference genome draft for one individual of the population and aligning short reads of 15 individuals from the same geographic population including the reference individual. We identified several large, contiguous copy number variable regions (CNVs), up to megabases in size, which exhibited striking coverage differences among individuals, and whose coverage overall scaled with genome size. CNVs were of remarkably low complexity, being mainly composed of tandemly repeated satellite DNA with only a few interspersed genes or other sequences, and were characterized by a significantly elevated GC-content. CNV patterns in offspring of two parents with divergent genome size and CNV patterns in several individuals from an inbred line differing in genome size demonstrated inheritance and accumulation of CNVs across generations. 
Conclusions By identifying the exact genomic elements that cause within-population genome size variation, our study paves the way for studying genome size evolution in contemporary populations rather than inferring patterns and processes a posteriori from species comparisons.


2021 ◽  
Vol 70 (1) ◽  
pp. 156-169
Author(s):  
Deepak Ohri

Abstract Gymnosperms show a significantly higher mean (1C=18.16, 1Cx=16.80) and a narrower range (16.89-fold) of genome sizes as compared with angiosperms. Among the 12 families, the largest ranges of 1C values are shown by Ephedraceae (4.73-fold) and Cupressaceae (4.45-fold), which are partly due to polyploidy, as 1Cx values vary 2.41-fold and 1.37-fold, respectively. In the rest of the families, which have only diploid taxa, the range of 1C values is from 1.18-fold (Cycadaceae) to 4.36-fold (Podocarpaceae). The question is how gymnosperms acquired such big genome sizes despite the rarity of recent instances of polyploidy. A general survey of different families and genera shows that gymnosperms have experienced both increases and decreases in their genome size during evolution. Various genomic components which have accounted for these large genomes are discussed. The major contributors are the transposable elements, particularly LTR-retrotransposons comprising the Ty3gypsy, Ty1copia and gymny superfamilies, which are most widespread. The genomes of gymnosperms have been acquiring diverse LTR-RTs over their long evolution in the absence of any efficient mechanism for their elimination. The epigenetic machinery which silences these large tracts of repeat sequences into stretches of heterochromatin and the adaptive value of these silenced repeat sequences need further investigation.


Blood ◽  
2020 ◽  
Vol 136 (Supplement 1) ◽  
pp. 52-53
Author(s):  
Kylee H Maclachlan ◽  
Binbin Zheng-Lin ◽  
Venkata Yellapantula ◽  
Andriy Derkach ◽  
Even H Rustad ◽  
...  

Chromothripsis is emerging as a strong and independent prognostic factor in multiple myeloma (MM), predicting shorter progression-free (PFS) and overall survival (Rustad BioRxiv 2019). Reliable detection requires whole genome sequencing (WGS), with 24% prevalence in 752 newly diagnosed multiple myeloma (NDMM) from CoMMpass (NCT01454297, Rustad BioRxiv 2019) compared with 1.3% by array-based techniques (Magrangeas Blood 2011). In MM, chromothripsis presents differently from solid cancers. Although the biological impact is similar across malignancies, in MM the structural complexity of chromothriptic events is typically lower. In addition, chromothripsis can occur early in MM development and remain stable over time (Maura Nat Comm 2019). Computational algorithms for chromothripsis detection (e.g. ShatterSeek; Cortes-Ciriano Nat Gen 2018) were developed in solid cancers and are accurate in that setting. Running ShatterSeek on 752 NDMM patients with low coverage WGS from CoMMpass, we observed a high specificity for chromothripsis (98.3%) but poor sensitivity (30.2%). ShatterSeek detected chromothripsis in 64/752 samples (8.5%), with 85% confirmed on manual curation; however, it missed 114 cases located by manual curation. This indicates that MM-specific computational methods are required. We hypothesized that a signature analysis approach using copy number variation (CNV) may provide an accurate estimation of chromothripsis. We adapted CNV signature analysis, developed in ovarian cancer (Macintyre Nat Gen 2018), to detect MM-specific CNV and structural features. The analysis utilizes 6 fundamental CN features: i) absolute CN of segments, ii) difference in CN in adjacent segments, iii) breakpoints per 10 Mb, iv) breakpoints per chromosome arm, v) lengths of oscillating CN segment chains, and vi) the size of segments. The optimal number of categories in each CNV feature was established using a mixed effect model (mclust R package). 
Using CoMMpass low-coverage WGS, de novo extraction using the hierarchical dirichlet process defined 5 signatures, 2 of which (CNV-SIG 4 and CNV-SIG 5) contain features associated with chromothripsis: longer lengths of oscillating CN states, higher numbers of breakpoints / chromosome arm, and higher total numbers of small segments of CN change. Next, we demonstrate that CNV signatures are highly predictive of chromothripsis (average area-under-the-curve /AUC = 0.9, based on 10-fold cross validation). Chromothripsis-associated CNV signatures are correlated with biallelic TP53 inactivation (p= 0.01) and gain1q21 (p<0.001) and show negative association with t(11;14) (p<0.001). Chromothriptic signatures were associated with shorter PFS, with multivariate analysis after correction for ISS, age, biallelic TP53 inactivation, t(4;14) and gain1q21 producing a hazard ratio of 2.9 (95% CI 1.07-7.7, p = 0.036). A validation set of 29 NDMM WGS confirmed the ability of CNV signatures to predict chromothripsis (AUC 0.87). As WGS is currently too expensive and computationally intensive to employ in routine practice, we investigated if CNV signatures can predict chromothripsis without using WGS. First, we performed de novo signature extraction using whole exome data from 865 CoMMpass samples. CNV signatures extracted without reference to WGS produced an AUC = 0.81 for predicting chromothripsis (in those with WGS to confirm; n =752), and the chromothriptic-signatures confirmed the association with a shorter PFS (HR=7.2, 95%CI 1.32-39.4, p = 0.022). Second, we applied CNV signature analysis to NDMM having either the myTYPE targeted sequencing panel (n= 113; Yellapantula, Blood Can J 2019) or a single nucleotide polymorphism (SNP) array (n= 217). CNV signature assessment by each technology was predictive of clinical outcome, likely due to the detection of chromothripsis. 
As with WGS, multivariate analysis confirmed CNV signatures to be independently prognostic (myTYPE; p = 0.003, SNP; p = 0.004). Overall, we demonstrate that CNV signature analysis in NDMM provides a highly accurate prediction of chromothripsis. CNV signature assessment remains reliable by multiple surrogate measures, without requiring WGS. Chromothripsis-associated CNV signatures are an independent and adverse prognostic factor, potentially allowing refinement of standard prognostic scores for NDMM patients and providing a more accurate risk stratification for clinical trials. Disclosures Hultcrantz: Amgen: Research Funding; Daiichi Sankyo: Research Funding; GSK: Research Funding; Intellisphere LLC: Consultancy. Dogan:Takeda: Consultancy; National Cancer Institute: Research Funding; Roche: Consultancy, Research Funding; Seattle Genetics: Consultancy; AbbVie: Consultancy; EUSA Pharma: Consultancy; Physicians Education Resource: Consultancy; Corvus Pharmaceuticals: Consultancy. Morgan:Bristol-Myers Squibb: Consultancy, Honoraria; Janssen: Research Funding; Karyopharm: Consultancy, Honoraria; Amgen: Consultancy, Honoraria; Takeda: Consultancy, Honoraria; Celgene: Consultancy, Honoraria, Research Funding; Roche: Consultancy, Honoraria; GSK: Consultancy, Honoraria. 
Landgren:Cellectis: Consultancy, Honoraria; Takeda: Other: Independent Data Monitoring Committees for clinical trials, Research Funding; BMS: Consultancy, Honoraria; Adaptive: Consultancy, Honoraria; Takeda: Other: Independent Data Monitoring Committees for clinical trials, Research Funding; Glenmark: Consultancy, Honoraria, Research Funding; Seattle Genetics: Research Funding; Binding Site: Consultancy, Honoraria; Karyopharma: Research Funding; Merck: Other; BMS: Consultancy, Honoraria; Karyopharma: Research Funding; Merck: Other; Pfizer: Consultancy, Honoraria; Celgene: Consultancy, Honoraria, Research Funding; Seattle Genetics: Research Funding; Juno: Consultancy, Honoraria; Juno: Consultancy, Honoraria; Janssen: Consultancy, Honoraria, Other: Independent Data Monitoring Committees for clinical trials, Research Funding; Celgene: Consultancy, Honoraria, Research Funding; Janssen: Consultancy, Honoraria, Other: Independent Data Monitoring Committees for clinical trials, Research Funding; Pfizer: Consultancy, Honoraria; Amgen: Consultancy, Honoraria, Research Funding; Cellectis: Consultancy, Honoraria; Glenmark: Consultancy, Honoraria, Research Funding; Binding Site: Consultancy, Honoraria.


2020 ◽  
Vol 37 (9) ◽  
pp. 2549-2567 ◽  
Author(s):  
Gavin C Woodruff ◽  
Anastasia A Teterina

Abstract The abundance, diversity, and genomic distribution of repetitive elements are highly variable among species. These patterns are thought to be driven in part by reproductive mode and the interaction of selection and recombination, and recombination rates typically vary by chromosomal position. In the nematode Caenorhabditis elegans, repetitive elements are enriched at chromosome arms and depleted on centers, and this mirrors the chromosomal distributions of other genomic features such as recombination rate. How conserved is this genomic landscape of repeats, and what evolutionary forces maintain it? To address this, we compared the genomic organization of repetitive elements across five Caenorhabditis species with chromosome-level assemblies. As previously reported, repeat content is enriched on chromosome arms in most Caenorhabditis species, and no obvious patterns of repeat content associated with reproductive mode were observed. However, the fig-associated C. inopinata has experienced repetitive element expansion and reveals no association of global repeat density with chromosome position. Patterns of repeat superfamily specific distributions reveal this global pattern is driven largely by a few repeat superfamilies that in C. inopinata have expanded in number and have weak associations with chromosome position. Additionally, 15% of predicted protein-coding genes in C. inopinata align to transposon-related proteins. When these are excluded, C. inopinata has no enrichment of genes in chromosome centers, in contrast to its close relatives, which all have such clusters. Forward evolutionary simulations reveal that chromosomal heterogeneity in recombination rate alone can generate structured repetitive genomic landscapes when insertions are weakly deleterious, whereas chromosomal heterogeneity in the fitness effects of transposon insertion can promote such landscapes across a variety of evolutionary scenarios. 
Thus, patterns of gene density along chromosomes likely contribute to global repetitive landscapes in this group, although other historical or genomic factors are needed to explain the idiosyncrasy of genomic organization of various transposable element taxa within C. inopinata. Taken together, these results highlight the power of comparative genomics and evolutionary simulations in testing hypotheses regarding the causes of genome organization.

