Evolutionary Dynamics of Abundant 7-bp Satellites in the Genome of Drosophila virilis

Abstract The factors that drive the rapid changes in abundance of tandem arrays of highly repetitive sequences, known as satellite DNA, are not well understood. Drosophila virilis has one of the highest relative amounts of simple satellites of any organism that has been studied, with an estimated >40% of its genome composed of a few related 7-bp satellites. Here, we use D. virilis as a model to understand technical biases affecting satellite sequencing and the evolutionary processes that drive satellite composition. By analyzing sequencing data from Illumina, PacBio, and Nanopore platforms, we identify platform-specific biases and suggest best practices for accurate characterization of satellites by sequencing. We use comparative genomics and cytogenetics to demonstrate that the highly abundant AAACTAC satellite family arose from a related satellite in the branch leading to the virilis phylad 4.5–11 Ma before exploding in abundance in some species of the clade. The most abundant satellite is conserved in sequence and location in the pericentromeric region but has diverged widely in abundance among species, whereas the satellites nearest the centromere are rapidly turning over in sequence composition. By analyzing multiple strains of D. virilis, we saw that the abundances of two centromere-proximal satellites are anticorrelated along a geographical gradient, which we suggest could be caused by ongoing conflicts at the centromere. In conclusion, we illuminate several key attributes of satellite evolutionary dynamics that we hypothesize to be driven by processes including selection, meiotic drive, and constraints on satellite sequence and abundance.
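As a rough illustration of what "characterization of satellites by sequencing" can involve, the following minimal sketch estimates the fraction of read bases that lie in perfect tandem runs of a 7-bp unit such as AAACTAC. It is not the authors' pipeline: the function names, the `min_copies` threshold, and the requirement for perfect repeats are all assumptions made for this example, and real analyses would also need to handle sequencing errors and platform biases.

```python
import re

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_forms(unit):
    """All rotations of the repeat unit and of its reverse complement."""
    forms = set()
    for s in (unit, revcomp(unit)):
        for i in range(len(s)):
            forms.add(s[i:] + s[:i])
    return forms


def tandem_bases(read, unit, min_copies=3):
    """Count bases of `read` inside runs of at least `min_copies` perfect
    tandem copies of `unit`, in any rotation and on either strand.
    (Hypothetical helper; the threshold is an assumption.)"""
    covered = [False] * len(read)
    for form in canonical_forms(unit):
        pattern = re.compile("(?:%s){%d,}" % (form, min_copies))
        for match in pattern.finditer(read):
            for i in range(match.start(), match.end()):
                covered[i] = True
    return sum(covered)


def satellite_fraction(reads, unit, min_copies=3):
    """Fraction of all read bases covered by tandem runs of `unit`."""
    total = sum(len(r) for r in reads)
    if total == 0:
        return 0.0
    return sum(tandem_bases(r, unit, min_copies) for r in reads) / total


# Toy reads: one pure satellite read and one non-satellite read of equal length.
reads = ["AAACTAC" * 4, "GATTACAGATTACAGGCCTTAAGGCCAA"]
fraction = satellite_fraction(reads, "AAACTAC")
```

Counting rotations and the reverse complement together matters because a read can start at any position within the repeat and can come from either strand.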

Evolutionary dynamics of abundant 7 bp satellites in the genome of Drosophila virilis

10.1101/693077 ◽

2019 ◽

Cited By ~ 3

Author(s):

Jullien M. Flynn ◽

Manyuan Long ◽

Rod A. Wing ◽

Andrew G. Clark

Keyword(s):

Best Practices ◽

Evolutionary Dynamics ◽

Pericentromeric Region ◽

Drosophila Virilis ◽

Sequencing Data ◽

Satellite Sequence ◽

Sequence Composition ◽

Rapid Changes ◽

Multiple Strains

Abstract The factors that drive the rapid changes in satellite DNA genomic composition we see in eukaryotes are not well understood. Drosophila virilis has one of the highest relative amounts of simple satellites of any organism that has been studied, with an estimated >40% of its genome composed of a few related 7 bp satellites. Here we use D. virilis as a model to understand technical biases affecting satellite sequencing and the evolutionary processes that drive satellite composition. By analyzing sequencing data from Illumina, PacBio, and Nanopore platforms, we identify platform-specific biases and suggest best practices for accurate characterization of satellites by sequencing. We use comparative genomics and cytogenetics to demonstrate that the highly abundant satellite family arose from a related satellite in the branch leading to the virilis phylad 4.5–11 million years ago before exploding in abundance in some species of the clade. The most abundant satellite is conserved in sequence and location in the pericentromeric region but has diverged widely in abundance among species, whereas the satellites nearest the centromere are rapidly turning over in sequence composition. By analyzing multiple strains of D. virilis, we saw that one centromere-proximal satellite is increasing in abundance along a geographical gradient while the other is contracting in an anti-correlated manner, suggesting ongoing conflicts at the centromere. In conclusion, we illuminate several key attributes of satellite evolutionary dynamics that we hypothesize to be driven by processes like selection, meiotic drive, and constraints on satellite sequence and abundance.

Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data

BMC Bioinformatics ◽

10.1186/1471-2105-11-378 ◽

2010 ◽

Vol 11 (1) ◽

pp. 378 ◽

Cited By ~ 246

Author(s):

Petr Novák ◽

Pavel Neumann ◽

Jiří Macas

Keyword(s):

Next Generation Sequencing ◽

Repetitive Sequences ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Generation Sequencing ◽

Graph Based Clustering

The Tritryps comparative repeatome: insights on repetitive element evolution in Trypanosomatid pathogens

10.1101/387217 ◽

2018 ◽

Author(s):

Sebastián Pita ◽

Florencia Díaz-Viraqué ◽

Gregorio Iraola ◽

Carlos Robello

Keyword(s):

Repetitive Dna ◽

Leishmania Major ◽

Evolutionary Dynamics ◽

De Novo ◽

Low Cost ◽

Repetitive Sequences ◽

Human Pathogens ◽

Genome Wide ◽

Low Coverage

Abstract The major human pathogens Trypanosoma cruzi, Trypanosoma brucei and Leishmania major are collectively known as the Tritryps. The initial comparative analysis of their genomes has uncovered that Tritryps share a great number of genes, but repetitive DNA seems to be extremely variable between them. However, the in-depth characterization of repetitive DNA in these pathogens has been in part neglected, mainly due to the well-known technical challenges of studying repetitive sequences from de novo assemblies using short reads. Here, we compared the repetitive DNA repertoires between the Tritryps genomes using genome-wide, low-coverage Illumina sequencing coupled to RepeatExplorer analysis. Our work demonstrates that this extensively implemented approach for studying higher eukaryote repeatomes is also useful for protozoan parasites like trypanosomatids, as we recovered previously observed differences in the presence and amount of repetitive DNA families. Additionally, our estimations of repetitive DNA abundance were comparable to those obtained from enhanced-quality assemblies using longer reads. Importantly, our methodology allowed us to describe a previously unknown transposable element in Leishmania major (TATE family), highlighting its potential to accurately recover distinctive features from poorly characterized repeatomes. Together, our results support the application of this low-cost, low-coverage sequencing approach for the extensive characterization of repetitive DNA evolutionary dynamics in trypanosomatid and other protozoan genomes.

Evolutionary dynamics and chromosomal distribution of repetitive sequences on chromosomes of Aegilops speltoides revealed by genomic in situ hybridization

Heredity ◽

10.1046/j.1365-2540.2001.00891.x ◽

2001 ◽

Vol 86 (6) ◽

pp. 738-742 ◽

Cited By ~ 7

Author(s):

Alexander Belyayev ◽

Olga Raskina ◽

Eviatar Nevo

Keyword(s):

In Situ Hybridization ◽

Evolutionary Dynamics ◽

Repetitive Sequences ◽

Genomic In Situ Hybridization ◽

Aegilops Speltoides ◽

Chromosomal Distribution

Chromosomal Map of the Model Legume Lotus japonicus

Genetics ◽

10.1093/genetics/161.4.1661 ◽

2002 ◽

Vol 161 (4) ◽

pp. 1661-1672 ◽

Cited By ~ 7

Author(s):

Andrea Pedrosa ◽

Niels Sandal ◽

Jens Stougaard ◽

Dieter Schweizer ◽

Andreas Bachmair

Keyword(s):

Repetitive Sequences ◽

Lotus Japonicus ◽

Chromosome Complement ◽

Linkage Groups ◽

Closely Related Species ◽

F1 Hybrid ◽

Bac Clones ◽

Genomic Regions ◽

Chromosomal Map

Abstract Lotus japonicus is a model plant for the legume family. To facilitate map-based cloning approaches and genome analysis, we performed an extensive characterization of the chromosome complement of the species. A detailed karyotype of L. japonicus Gifu was built and plasmid and BAC clones, corresponding to genetically mapped markers (see the accompanying article by Sandal et al. 2002, this issue), were used for FISH to correlate genetic and chromosomal maps. Hybridization of DNA clones from 32 different genomic regions enabled the assignment of linkage groups to chromosomes, the comparison between genetic and physical distances throughout the genome, and the partial characterization of different repetitive sequences, including telomeric and centromeric repeats. Additional analysis of L. filicaulis and its F1 hybrid with L. japonicus demonstrated the occurrence of inversions between these closely related species, suggesting that these chromosome rearrangements are early events in speciation of this group.

Development of a User-Friendly Pipeline for Mutational Analyses of HIV Using Ultra-Accurate Maximum-Depth Sequencing

Viruses ◽

10.3390/v13071338 ◽

2021 ◽

Vol 13 (7) ◽

pp. 1338

Author(s):

Morgan E. Meissner ◽

Emily J. Julik ◽

Jonathan P. Badalamenti ◽

William G. Arndt ◽

Lauren J. Mills ◽

...

Keyword(s):

Error Rates ◽

Maximum Depth ◽

Sequencing Data ◽

Background Error ◽

High Background ◽

Immunodeficiency Virus ◽

User Friendly ◽

Viral Mutagenesis ◽

Hiv 1

Human immunodeficiency virus type 2 (HIV-2) accumulates fewer mutations during replication than HIV type 1 (HIV-1). Advanced studies of HIV-2 mutagenesis, however, have historically been confounded by high background error rates in traditional next-generation sequencing techniques. In this study, we describe the adaptation of the previously described maximum-depth sequencing (MDS) technique to studies of both HIV-1 and HIV-2 for the ultra-accurate characterization of viral mutagenesis. We also present the development of a user-friendly Galaxy workflow for the bioinformatic analyses of sequencing data generated using the MDS technique, designed to improve replicability and accessibility to molecular virologists. This adapted MDS technique and analysis pipeline were validated by comparisons with previously published analyses of the frequency and spectra of mutations in HIV-1 and HIV-2 and are readily expandable to studies of viral mutation across the genomes of both viruses. Using this novel sequencing pipeline, we observed that the background error rate was reduced 100-fold over standard Illumina error rates, and 10-fold over traditional unique molecular identifier (UMI)-based sequencing. This technical advancement will allow for the exploration of novel and previously unrecognized sources of viral mutagenesis in both HIV-1 and HIV-2, which will expand our understanding of retroviral diversity and evolution.

Genomic Tackling of Human Satellite DNA: Breaking Barriers through Time

International Journal of Molecular Sciences ◽

10.3390/ijms22094707 ◽

2021 ◽

Vol 22 (9) ◽

pp. 4707

Author(s):

Mariana Lopes ◽

Sandra Louzada ◽

Margarida Gama-Carvalho ◽

Raquel Chaves

Keyword(s):

Human Genome ◽

Satellite Dna ◽

Repetitive Sequences ◽

Nucleotide Composition ◽

Genomic Component ◽

Genomic Studies ◽

Human Genomic ◽

Definition Of ◽

High Degree

(Peri)centromeric repetitive sequences and, more specifically, satellite DNA (satDNA) sequences, constitute a major human genomic component. SatDNA sequences can vary on a large number of features, including nucleotide composition, complexity, and abundance. Several satDNA families have been identified and characterized in the human genome through time, albeit at different speeds. Human satDNA families present a high degree of sub-variability, leading to the definition of various subfamilies with different organization and clustered localization. Evolution of satDNA analysis has enabled the progressive characterization of satDNA features. Despite recent advances in the sequencing of centromeric arrays, comprehensive genomic studies to assess their variability are still required to provide accurate and proportional representation of satDNA (peri)centromeric/acrocentric short arm sequences. Approaches combining multiple techniques have been successfully applied and seem to be the path to follow for generating integrated knowledge in the promising field of human satDNA biology.

Genetic characterization of novel polymorphic microsatellite markers for Epilobium nankotaizanense (Onagraceae), an endemic and threatened herb in Taiwan

Plant Genetic Resources ◽

10.1017/s1479262121000289 ◽

2021 ◽

pp. 1-4

Author(s):

Yu-Wei Tseng ◽

Chi-Chun Huang ◽

Chih-Chiang Wang ◽

Chiuan-Yu Li ◽

Kuo-Hsiang Hung

Keyword(s):

Microsatellite Markers ◽

Natural Populations ◽

Conservation Strategy ◽

Sequencing Data ◽

Germplasm Collections ◽

The Family ◽

Conservation Genetic ◽

Population Sizes ◽

Polymorphic Microsatellite

Abstract Epilobium belongs to the family Onagraceae, which consists of approximately 200 species distributed worldwide, and some species have been used as medicinal plants. Epilobium nankotaizanense is an endemic and endangered herb that grows in the high mountains in Taiwan at an elevation of more than 3300 m. Alpine herbs are severely threatened by climate change, which leads to a reduction in their habitats and population sizes. However, only a few studies have addressed genetic diversity and population genetics. In the present study, we developed a new set of microsatellite markers for E. nankotaizanense using high-throughput genome sequencing data. Twenty polymorphic microsatellite markers were developed and tested on 30 individuals collected from three natural populations. These loci were successfully amplified, and polymorphisms were observed in E. nankotaizanense. The number of alleles per locus (A) ranged from 2.000 to 3.000, and the observed (Ho) and expected (He) heterozygosities ranged from 0.000 to 0.929 and from 0.034 to 0.631, respectively. The developed polymorphic microsatellite markers will be useful in future conservation genetic studies of E. nankotaizanense as well as for developing an effective conservation strategy for this species and facilitating germplasm collections and sustainable utilization of other Epilobium species.
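The per-locus statistics reported above (number of alleles, observed and expected heterozygosity) can be sketched for a single diploid locus as follows. The abstract does not state which estimator was used, so this example assumes the simple uncorrected form He = 1 - sum(p_i^2); the function names and the toy genotypes (allele sizes in bp) are hypothetical.

```python
from collections import Counter


def allele_counts(genotypes):
    """Count alleles across diploid genotypes given as (allele1, allele2) pairs."""
    return Counter(allele for genotype in genotypes for allele in genotype)


def observed_heterozygosity(genotypes):
    """Ho: fraction of individuals carrying two different alleles."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def expected_heterozygosity(genotypes):
    """He = 1 - sum(p_i^2), with p_i the allele frequencies.
    Assumption: no small-sample correction is applied."""
    counts = allele_counts(genotypes)
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


# Toy data for one microsatellite locus in four individuals.
locus = [(150, 150), (150, 154), (154, 154), (150, 154)]
n_alleles = len(allele_counts(locus))
ho = observed_heterozygosity(locus)
he = expected_heterozygosity(locus)
```

A locus where Ho falls well below He, as at the lower end of the ranges reported above, is the kind of pattern that can point to inbreeding or null alleles, although distinguishing these requires further analysis.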

Bayesian Classification of Microbial Communities Based on 16S rRNA Metagenomic Data

10.1101/340653 ◽

2018 ◽

Cited By ~ 1

Author(s):

Arghavan Bahadorinejad ◽

Ivan Ivanov ◽

Johanna W Lampe ◽

Meredith AJ Hullar ◽

Robert S Chapkin ◽

...

Keyword(s):

16S Rrna ◽

Sample Size ◽

Microbial Communities ◽

State Of The Art ◽

Metagenomic Data ◽

Data Sets ◽

Sequencing Data ◽

Sample Data

Abstract We propose a Bayesian method for the classification of 16S rRNA metagenomic profiles of bacterial abundance, by introducing a Poisson-Dirichlet-Multinomial hierarchical model for the sequencing data, constructing a prior distribution from sample data, calculating the posterior distribution in closed form, and deriving an Optimal Bayesian Classifier (OBC). The proposed algorithm is compared to state-of-the-art classification methods for 16S rRNA metagenomic data, including Random Forests and the phylogeny-based Metaphyl algorithm, for varying sample size, classification difficulty, and dimensionality (number of OTUs), using both synthetic and real metagenomic data sets. The results demonstrate that the proposed OBC method, with either noninformative or constructed priors, is competitive or superior to the other methods. In particular, in the case where the ratio of sample size to dimensionality is small, it was observed that the proposed method can vastly outperform the others. Author summary Recent studies have highlighted the interplay between host genetics, gut microbes, and colorectal tumor initiation/progression. The characterization of microbial communities using metagenomic profiling has therefore received renewed interest. In this paper, we propose a method for classification, i.e., prediction of different outcomes, based on 16S rRNA metagenomic data. The proposed method employs a Bayesian approach, which is suitable for data sets with a small ratio of the number of available instances to the dimensionality. Results using both synthetic and real metagenomic data show that the proposed method can outperform other state-of-the-art metagenomic classification algorithms.
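To make the idea of a closed-form Bayesian classifier on OTU count data concrete, here is a deliberately simplified sketch. It is not the paper's OBC: it drops the Poisson layer and uses a plain Dirichlet-multinomial model with a symmetric prior and equal class priors, all of which are assumptions for this example, as are the function names and toy counts. Each class gets a Dirichlet posterior from its training counts, and a new sample is assigned to the class with the highest posterior predictive probability.

```python
from math import lgamma


def fit_class_alphas(training, prior=1.0):
    """Dirichlet posterior parameters per class: prior + summed OTU counts.
    `training` maps a class label to a list of count vectors (hypothetical format)."""
    alphas = {}
    for label, samples in training.items():
        n_otus = len(samples[0])
        alphas[label] = [prior + sum(s[j] for s in samples) for j in range(n_otus)]
    return alphas


def log_predictive(counts, alpha):
    """Log Dirichlet-multinomial probability of `counts` given `alpha`,
    omitting the multinomial coefficient (identical for every class)."""
    n = sum(counts)
    a = sum(alpha)
    score = lgamma(a) - lgamma(a + n)
    for x, al in zip(counts, alpha):
        score += lgamma(al + x) - lgamma(al)
    return score


def classify(counts, alphas):
    """Pick the class with the highest predictive score (equal class priors assumed)."""
    return max(alphas, key=lambda label: log_predictive(counts, alphas[label]))


# Toy training data: three OTUs, two classes with opposite dominant OTUs.
training = {
    "A": [[8, 2, 0], [9, 1, 0]],
    "B": [[0, 2, 8], [1, 1, 8]],
}
alphas = fit_class_alphas(training)
label_1 = classify([9, 1, 0], alphas)
label_2 = classify([0, 1, 9], alphas)
```

Because the prior contributes pseudo-counts, this kind of model still yields a usable predictive distribution when there are few training samples relative to the number of OTUs, which is the regime the abstract highlights.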

Divergent and convergent evolution of housekeeping genes in human–pig lineage

PeerJ ◽

10.7717/peerj.4840 ◽

2018 ◽

Vol 6 ◽

pp. e4840 ◽

Cited By ~ 4

Author(s):

Kai Wei ◽

Tingting Zhang ◽

Lei Ma

Keyword(s):

Active Sites ◽

Evolutionary Dynamics ◽

Purifying Selection ◽

Housekeeping Genes ◽

Neutral Evolution ◽

Structure Evolution ◽

Tissue Cell ◽

Sequencing Data ◽

Cellular Functions ◽

Species Specific

Housekeeping genes are ubiquitously expressed and maintain basic cellular functions across tissue/cell type conditions. The present study aimed to develop a set of pig housekeeping genes and compare the structure, evolution and function of housekeeping genes in the human–pig lineage. By using RNA sequencing data, we identified 3,136 pig housekeeping genes. Compared with human housekeeping genes, we found that pig housekeeping genes were longer and subjected to slightly weaker purifying selection pressure and faster neutral evolution. Common housekeeping genes, shared by the two species, achieve stronger purifying selection than species-specific genes. However, pig- and human-specific housekeeping genes have similar functions. Some species-specific housekeeping genes have evolved independently to form similar protein active sites or structure, such as the classical catalytic serine–histidine–aspartate triad, implying that they have converged for maintaining the basic cellular function, which allows them to adapt to the environment. Human and pig housekeeping genes have varied structures and gene lists, but they have converged to maintain basic cellular functions essential for the existence of a cell, regardless of its specific role in the species. The results of our study shed light on the evolutionary dynamics of housekeeping genes.
