Diverse origins of high copy tandem repeats in grass genomes

In studying genomic architecture, highly repetitive regions have historically posed a challenge when investigating sequence variation and content. High-throughput sequencing has enabled researchers to use whole-genome shotgun sequencing to estimate the abundance of repetitive sequence, and these methodologies have been recently applied to centromeres. Here, we utilize sequence assembly and read mapping to identify and quantify the genomic abundance of different tandem repeat sequences. Previous research has posited that the highest abundance tandem repeat in eukaryotic genomes is often the centromeric repeat, and we pair our bioinformatic pipeline with fluorescent in-situ hybridization data to test this hypothesis. We find that de novo assembly and bioinformatic filters can successfully identify repeats with homology to known tandem repeats. Fluorescent in-situ hybridization, however, shows that de novo assembly fails to identify novel centromeric repeats, instead identifying other potentially important repetitive sequences. Together, our results test the applicability and limitations of using de novo repeat assembly of tandem repeats to identify novel centromeric repeats. Building on our findings of genomic composition, we also set forth a method for exploring the repetitive regions of non-model genomes whose diversity limits the applicability of established genetic resources.
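
The read-mapping step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes that whole-genome shotgun reads sample the genome uniformly, so that the share of reads mapping to a repeat consensus approximates that repeat's genomic fraction. The function names, repeat names, read counts, and genome size are all hypothetical.

```python
def repeat_abundance(mapped_reads, total_reads, genome_size_bp=None):
    """Estimate the genomic fraction of each repeat from read-mapping counts.

    Assumption (not from the source): reads sample the genome uniformly,
    so reads_mapped / total_reads approximates the genomic proportion.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    abundance = {}
    for name, count in mapped_reads.items():
        if count < 0 or count > total_reads:
            raise ValueError(f"invalid read count for {name}: {count}")
        fraction = count / total_reads
        entry = {"fraction": fraction}
        if genome_size_bp is not None:
            # Scale the fraction to base pairs when a genome size estimate exists.
            entry["estimated_bp"] = fraction * genome_size_bp
        abundance[name] = entry
    return abundance


def rank_by_abundance(abundance):
    """Return repeat names ordered from most to least abundant."""
    return sorted(abundance, key=lambda name: abundance[name]["fraction"], reverse=True)


# Hypothetical counts for two assembled tandem repeats.
counts = {"repeat_A": 20_000, "repeat_B": 5_000}
result = repeat_abundance(counts, total_reads=1_000_000, genome_size_bp=2_000_000_000)
ranking = rank_by_abundance(result)
```

Ranking by fraction only identifies the highest-abundance candidate; as the abstract notes, that candidate is not necessarily centromeric, so cytological confirmation is still needed.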

10.7287/peerj.preprints.2314v1 ◽

2016 ◽

Author(s):

Paul Bilinski ◽

Yonghua Han ◽

Matthew B Hufford ◽

Anne Lorant ◽

Pingdong Zhang ◽

...

Keyword(s):

Tandem Repeat ◽

Tandem Repeats ◽

High Throughput Sequencing ◽

De Novo ◽

Repetitive Sequences ◽

Read Mapping ◽

Hybridization Data ◽

Centromeric Repeat ◽

Tandem Repeat Sequences

The pericentromeric heterochromatin of the grass Zingeria biebersteiniana (2n = 4) is composed of Zbcen1-type tandem repeats that are intermingled with accumulated dispersedly organized sequences

Genome ◽

10.1139/g01-092 ◽

2001 ◽

Vol 44 (6) ◽

pp. 955-961 ◽

Cited By ~ 19

Author(s):

Verity A Saunders ◽

Andreas Houben

Keyword(s):

In Situ Hybridization ◽

Tandem Repeat ◽

Tandem Repeats ◽

Repetitive Sequences ◽

Pericentromeric Region ◽

Pericentromeric Heterochromatin ◽

Repeat Family ◽

DNA Reassociation ◽

Copy DNA

DNA reassociation and hydroxyapatite chromatography were used to isolate high-copy DNA of the grass Zingeria biebersteiniana (2n = 4). In situ hybridization demonstrated that the DNA isolated was enriched for pericentromere-specific repetitive sequences. One abundant pericentromere-specific component is the differentially methylated tandem-repeat family Zbcen1. Other sequences isolated, Zb46 and Zb47A, are dispersed and display similarity to parts of the gypsy- and copia-like retrotransposable elements of other grasses. In situ hybridization with the copia-like sequence Zb47A resulted in dispersed labelling along the chromosome arms, with a significant signal accumulation in the pericentromeric region of all chromosomes. It is concluded that the pericentromeric heterochromatin of Z. biebersteiniana is composed of members of the Zbcen1 tandem repeat family and that these tandem arrays are intermingled with accumulated putative copia-like retrotransposon sequences. An observed Rabl interphase orientation suggests that the length of the chromosomes rather than the genome size is the determining factor of the Rabl phenomenon.

Key Words: centromere, heterochromatin, tandemly repeated DNA, retrotransposon-like, DNA reassociation.

HP1 drives de novo 3D genome reorganization in early Drosophila embryos

Nature ◽

10.1038/s41586-021-03460-z ◽

2021 ◽

Author(s):

Fides Zenk ◽

Yinxiu Zhan ◽

Pavel Kos ◽

Eva Löser ◽

Nazerke Atinbayeva ◽

...

Keyword(s):

Genome Organization ◽

Molecular Mechanisms ◽

High Throughput Sequencing ◽

De Novo ◽

Early Embryo ◽

Heterochromatin Protein ◽

Chromosome Conformation ◽

3D Genome ◽

Genome Reorganization

Fundamental features of 3D genome organization are established de novo in the early embryo, including clustering of pericentromeric regions, the folding of chromosome arms and the segregation of chromosomes into active (A-) and inactive (B-) compartments. However, the molecular mechanisms that drive de novo organization remain unknown. Here, by combining chromosome conformation capture (Hi-C), chromatin immunoprecipitation with high-throughput sequencing (ChIP–seq), 3D DNA fluorescence in situ hybridization (3D DNA FISH) and polymer simulations, we show that heterochromatin protein 1a (HP1a) is essential for de novo 3D genome organization during Drosophila early development. The binding of HP1a at pericentromeric heterochromatin is required to establish clustering of pericentromeric regions. Moreover, HP1a binding within chromosome arms is responsible for overall chromosome folding and has an important role in the formation of B-compartment regions. However, depletion of HP1a does not affect the A-compartment, which suggests that a different molecular mechanism segregates active chromosome regions. Our work identifies HP1a as an epigenetic regulator that is involved in establishing the global structure of the genome in the early embryo.

Karyotypes and Distribution of Tandem Repeat Sequences in Brassica nigra Determined by Fluorescence in situ Hybridization

Cytogenetic and Genome Research ◽

10.1159/000479179 ◽

2017 ◽

Vol 152 (3) ◽

pp. 158-165 ◽

Cited By ~ 6

Author(s):

Gui-xiang Wang ◽

Qun-yan He ◽

Jiri Macas ◽

Petr Novák ◽

Pavel Neumann ◽

...

Keyword(s):

In Situ Hybridization ◽

Fluorescence In Situ Hybridization ◽

Fluorescence Intensity ◽

Tandem Repeat ◽

Brassica Nigra ◽

Metaphase Chromosomes ◽

Repeat Sequences ◽

Tandem Repeat Sequences ◽

Black Mustard

Whole-genome shotgun reads were analyzed to determine the repeat sequence composition in the genome of black mustard, Brassica nigra (L.) Koch. The analysis showed that satellite DNA sequences are very abundant in the black mustard genome. The distribution pattern of 7 new tandem repeats (BnSAT13, BnSAT28, BnSAT68, BnSAT76, BnSAT114, BnSAT180, and BnSAT200) on black mustard chromosomes was visualized using fluorescence in situ hybridization (FISH). The FISH signals of BnSAT13 and BnSAT76 provided useful cytogenetic markers; their position and fluorescence intensity allowed for unambiguous identification of all 8 somatic metaphase chromosomes. A karyotype showing the location and fluorescence intensity of these tandem repeat sequences together with the position of rDNAs and centromeric retrotransposons of Brassica (CRB) was constructed. The establishment of the FISH-based karyotype in B. nigra provides valuable information that can be used in detailed analyses of B. nigra accessions and derived allopolyploid Brassica species containing the B genome.

De novo identification of satellite DNAs in the sequenced genomes of Drosophila virilis and D. americana using the RepeatExplorer and TAREAN pipelines

10.1101/781146 ◽

2019 ◽

Author(s):

Bráulio S.M.L. Silva ◽

Pedro Heringer ◽

Guilherme B. Dias ◽

Marta Svartman ◽

Gustavo C.S. Kuhn

Keyword(s):

Transposable Elements ◽

Tandem Repeat ◽

Tandem Repeats ◽

De Novo ◽

Chromosome Mapping ◽

Drosophila Virilis ◽

Satellite DNAs ◽

Bioinformatic Tools ◽

A Genome ◽

Genome Assemblies

Satellite DNAs are among the most abundant repetitive DNAs found in eukaryote genomes, where they participate in a variety of biological roles, from being components of important chromosome structures to gene regulation. Experimental methodologies used before the genomic era were laborious and time-consuming, yet insufficient to recover the full collection of satDNAs from a genome. Today, the availability of whole sequenced genomes, combined with the development of specific bioinformatic tools, is expected to foster the identification of virtually all of the “satellitome” of a particular species. While whole genome assemblies are important to obtain a global view of genome organization, most assemblies are incomplete and lack repetitive regions. Here, we applied short-read sequencing and similarity clustering in order to perform a de novo identification of the most abundant satellite families in two Drosophila species from the virilis group: Drosophila virilis and D. americana. These species were chosen because they have been used as a model to understand satDNA biology since the early 1970s. We combined computational tandem repeat detection via similarity-based read clustering (implemented in the Tandem Repeat Analyzer pipeline, “TAREAN”) with data from the literature and chromosome mapping to obtain an overview of satDNAs in D. virilis and D. americana. The fact that all of the abundant tandem repeats we detected were previously identified in the literature allowed us to evaluate the efficiency of TAREAN in correctly identifying true satDNAs. Our results indicate that raw sequencing reads can be efficiently used to detect satDNAs, but that abundant tandem repeats present in dispersed arrays or associated with transposable elements are frequent false positives. We demonstrate that TAREAN, together with its parent method RepeatExplorer, may be used as a resource to detect tandem repeats associated with transposable elements and also to reveal families of dispersed tandem repeats.

Robust detection of tandem repeat expansions from long DNA reads

10.1101/356931 ◽

2018 ◽

Cited By ~ 1

Author(s):

Satomi Mitsuhashi ◽

Martin C Frith ◽

Takeshi Mizuguchi ◽

Satoko Miyatake ◽

Tomoko Toyota ◽

...

Keyword(s):

Tandem Repeat ◽

Tandem Repeats ◽

Genetic Diseases ◽

Error Rates ◽

Robust Detection ◽

Sequencing Errors ◽

Tandem Repeat Sequences ◽

Long Read ◽

Repeat Expansions ◽

The Many

Tandemly repeated sequences are highly mutable and variable features of genomes. Tandem repeat expansions are responsible for a growing list of human diseases, even though it is hard to determine tandem repeat sequences with current DNA sequencing technology. Recent long-read technologies are promising, because the DNA reads are often longer than the repetitive regions, but are hampered by high error rates. Here, we report robust detection of human repeat expansions from careful alignments of long (PacBio and nanopore) reads to a reference genome. Our method (tandem-genotypes) is robust to systematic sequencing errors, inexact repeats with fuzzy boundaries, and low sequencing coverage. By comparing to healthy controls, we can prioritize pathological expansions within the top 10 out of 700,000 tandem repeats in the genome. This may help to elucidate the many genetic diseases whose causes remain unknown.

The Beginning of the End: A Chromosomal Assembly of the New World Malaria Mosquito Ends with a Novel Telomere

G3 Genes|Genome|Genetics ◽

10.1534/g3.120.401654 ◽

2020 ◽

Vol 10 (10) ◽

pp. 3811-3819 ◽

Cited By ~ 3

Author(s):

Austin Compton ◽

Jiangtao Liang ◽

Chujia Chen ◽

Varvara Lukyanchikova ◽

Yumin Qi ◽

...

Keyword(s):

New World ◽

Tandem Repeats ◽

De Novo ◽

Mosquito Species ◽

Repeat Unit ◽

Repetitive Sequences ◽

Malaria Mosquito ◽

A Genome ◽

Significant Step ◽

Reference Quality

Chromosome level assemblies are accumulating in various taxonomic groups including mosquitoes. However, even in the few reference-quality mosquito assemblies, a significant portion of the heterochromatic regions including telomeres remain unresolved. Here we produce a de novo assembly of the New World malaria mosquito, Anopheles albimanus by integrating Oxford Nanopore sequencing, Illumina, Hi-C and optical mapping. This 172.6 Mbps female assembly, which we call AalbS3, is obtained by scaffolding polished large contigs (contig N50 = 13.7 Mbps) into three chromosomes. All chromosome arms end with telomeric repeats, which is the first in mosquito assemblies and represents a significant step toward the completion of a genome assembly. These telomeres consist of tandem repeats of a novel 30-32 bp Telomeric Repeat Unit (TRU) and are confirmed by analyzing the termini of long reads and through both chromosomal in situ hybridization and a Bal31 sensitivity assay. The AalbS3 assembly included previously uncharacterized centromeric and rDNA clusters and more than doubled the content of transposable elements and other repetitive sequences. This telomere-to-telomere assembly, although still containing gaps, represents a significant step toward resolving biologically important but previously hidden genomic components. The comparison of different scaffolding methods will also inform future efforts to obtain reference-quality genomes for other mosquito species.

Induction of Recombinant Lectin Expression by an Artificially Constructed Tandem Repeat Structure: A Case Study Using Bryopsis plumosa Mannose-Binding Lectin

Biomolecules ◽

10.3390/biom8040146 ◽

2018 ◽

Vol 8 (4) ◽

pp. 146 ◽

Cited By ~ 3

Author(s):

Hyun-Ju Hwang ◽

Jin-Woo Han ◽

Hancheol Jeon ◽

Jong Han

Keyword(s):

Tandem Repeat ◽

Large Scale ◽

Tandem Repeats ◽

Expression System ◽

Mannose Binding Lectin ◽

Repeat Structure ◽

Tandem Repeat Sequences ◽

Mannose Binding ◽

Bryopsis Plumosa ◽

Binding Lectin

Lectin is an important protein in medical and pharmacological applications. Impurities in lectin derived from natural sources and the generation of inactive proteins by recombinant technology are major obstacles for the use of lectins. Expressing recombinant lectin with a tandem repeat structure can potentially overcome these problems, but few studies have systematically examined this possibility. This was investigated in the present study using three distinct forms of recombinant mannose-binding lectin from Bryopsis plumosa (BPL2): the monomer (rD1BPL2), the dimer (rD2BPL2), and the tetramer (rD4BPL2), with the latter two arranged as tandem repeats. The concentration of the inducer molecule isopropyl β-D-1-thiogalactopyranoside and the induction time had no effect on the efficiency of the expression of each construct. Of the tested constructs, only rD4BPL2 showed hemagglutination activity towards horse erythrocytes; the activity towards the former was 64 times higher than that of native BPL2. Recombinant and native BPL2 showed differences in carbohydrate specificity; the activity of rD4BPL2 was inhibited by the glycoprotein fetuin, whereas that of native BPL2 was also inhibited by d-mannose. Our results indicate that expression as tandem repeat sequences can increase the efficiency of lectin production on a large scale using a bacterial expression system.

Genome-wide profiling of heritable and de novo STR variations

10.1101/077727 ◽

2016 ◽

Cited By ~ 7

Author(s):

Thomas Willems ◽

Dina Zielinski ◽

Assaf Gordon ◽

Melissa Gymrek ◽

Yaniv Erlich

Keyword(s):

Tandem Repeats ◽

High Throughput Sequencing ◽

De Novo ◽

Genetic Diseases ◽

Whole Genome Sequencing Data ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Genome Wide ◽

A Genome ◽

Short Tandem

Short tandem repeats (STRs) are highly variable elements that play a pivotal role in multiple genetic diseases, population genetics applications, and forensic casework. However, STRs have proven problematic to genotype from high-throughput sequencing data. Here, we describe HipSTR, a novel haplotype-based method for robustly genotyping, haplotyping, and phasing STRs from whole genome sequencing data and report a genome-wide analysis and validation of de novo STR mutations.

GtTR: Bayesian estimation of absolute tandem repeat copy number using sequence capture and high throughput sequencing

10.1101/246108 ◽

2018 ◽

Cited By ~ 1

Author(s):

Devika Ganesamoorthy ◽

Minh Duc Cao ◽

Tania Duarte ◽

Wenhan Chen ◽

Lachlan Coin

Keyword(s):

High Throughput ◽

Tandem Repeat ◽

Copy Number ◽

Tandem Repeats ◽

High Throughput Sequencing ◽

Sequence Data ◽

Complex Diseases ◽

Sequencing Analysis ◽

Reference Dataset ◽

Long Read

Background: Tandem repeats comprise a significant proportion of the human genome, including coding and regulatory regions. They are highly prone to repeat number variation and nucleotide mutation due to their repetitive and unstable nature, making them a major source of genomic variation between individuals. Despite recent advances in high throughput sequencing, analysis of tandem repeats in the context of complex diseases is still hindered by technical limitations. Methods: We report a novel targeted sequencing approach, which allows simultaneous analysis of hundreds of repeats. We developed a Bayesian algorithm, GtTR, which combines information from a reference long-read dataset with a short read counting approach to genotype tandem repeats at population scale. PCR sizing analysis was used for validation. Results: We used a PacBio long-read sequenced sample to generate a reference tandem repeat genotype dataset with on average 13% absolute deviation from PCR sizing results. Using this reference dataset, GtTR generated estimates of VNTR copy number with accuracy within 95% high posterior density (HPD) intervals of 68% and 83% for capture sequence data and 200X WGS data respectively, improving to 87% and 94% with use of a PCR reference. We show that the genotype resolution increases as a function of depth, such that the median 95% HPD interval lies within 25%, 14%, 12% and 8% of its midpoint copy number value for 30X WGS, 200X WGS, 395X and 800X capture sequence data respectively. We validated nine targets by PCR sizing analysis, and genotype estimates from sequencing results correlated well with PCR results. Conclusions: The novel genotyping approach described here presents a new cost-effective method to explore a previously unrecognized class of repeat variation in GWAS studies of complex diseases at the population level. Further improvements in accuracy can be obtained by improving the accuracy of the reference dataset.
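
The HPD intervals reported in this abstract can be made concrete with a small sketch. It is not the GtTR implementation: it assumes the posterior over copy number is available as a list of samples and uses the common empirical estimate of an HPD interval, the narrowest window that contains the requested share of sorted samples. The function names and sample values are hypothetical.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Return the narrowest interval containing `mass` of the samples.

    Generic empirical estimate from posterior samples; an illustration only,
    not the algorithm used by GtTR.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < mass <= 1:
        raise ValueError("mass must be in (0, 1]")
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples the interval must cover
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def relative_half_width(lower, upper):
    """Half-width of an interval as a fraction of its midpoint."""
    midpoint = (lower + upper) / 2
    if midpoint == 0:
        raise ValueError("midpoint is zero; relative width is undefined")
    return (upper - lower) / 2 / midpoint


# Hypothetical posterior samples of a repeat's copy number.
draws = [0, 10, 11, 12, 13, 14, 50, 70, 90, 100]
lo, hi = hpd_interval(draws, mass=0.5)
width = relative_half_width(lo, hi)
```

The second helper mirrors how the abstract summarizes resolution, expressing the interval's half-width as a percentage of its midpoint copy number.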
