In-solution Y-chromosome capture-enrichment on ancient DNA libraries

2017 ◽  
Author(s):  
Diana I Cruz-Dávalos ◽  
María A Nieves-Colón ◽  
Alexandra Sockell ◽  
G David Poznik ◽  
Hannes Schroeder ◽  
...  

Abstract

Background: As most ancient biological samples have low levels of endogenous DNA, it is advantageous to enrich for specific genomic regions prior to sequencing. One approach, in-solution capture-enrichment, retrieves sequences of interest and reduces the fraction of microbial DNA. In this work, we implement a capture-enrichment approach targeting informative regions of the Y chromosome in six human archaeological remains excavated in the Caribbean and dated between 200 and 3,000 years BP. We compare the recovery rate of Y-chromosome capture (YCC) alone and of whole-genome capture followed by YCC (WGC+Y) against that of non-enriched (pre-capture) libraries.

Results: We recovered 17–4,152 times more targeted unique Y-chromosome sequences after capture, where 0.01–6.2% (WGC+Y) and 0.01–23.5% (YCC) of the sequence reads were on-target, compared to 0.0002–0.004% pre-capture. In samples with endogenous DNA content greater than 0.1%, we found that WGC followed by YCC (WGC+Y) yields lower enrichment due to the loss of complexity in consecutive capture experiments, whereas in samples with lower endogenous content, WGC+Y yielded greater enrichment than YCC alone. Finally, increasing recovery of informative sites enabled us to assign Y-chromosome haplogroups to some of the archaeological remains and gain insights about their paternal lineages and origins.

Conclusions: We present, to our knowledge, the first in-solution capture-enrichment method targeting the human Y-chromosome in aDNA sequencing libraries. YCC and WGC+Y enrichments lead to an increase in the amount of Y-DNA sequences, as compared to libraries not enriched for the Y-chromosome. Our probe design effectively recovers regions of the Y-chromosome bearing phylogenetically informative sites, allowing us to identify paternal lineages with less sequencing than needed for pre-capture libraries. Finally, we recommend considering the endogenous content in the experimental design and avoiding consecutive rounds of capture for low-complexity libraries, as clonality increases considerably with each round.
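The quantities reported in this abstract (on-target percentage, fold enrichment, clonality) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names and read counts are hypothetical, and fold enrichment is taken here to be the ratio of on-target fractions, which is an assumption: the abstract reports enrichment in terms of unique targeted sequences and may compute it differently.

```python
def on_target_fraction(on_target_reads: int, total_reads: int) -> float:
    """Fraction of sequenced reads that map to the targeted regions."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return on_target_reads / total_reads


def fold_enrichment(post_fraction: float, pre_fraction: float) -> float:
    """Ratio of the on-target fraction after capture to that before capture.

    Assumption: this ratio is one common way to express enrichment; the
    abstract does not state the exact formula used.
    """
    if pre_fraction <= 0:
        raise ValueError("pre_fraction must be positive")
    return post_fraction / pre_fraction


def clonality(unique_reads: int, mapped_reads: int) -> float:
    """Share of mapped reads that are duplicates of another mapped read."""
    if mapped_reads <= 0:
        raise ValueError("mapped_reads must be positive")
    return 1.0 - unique_reads / mapped_reads


# Hypothetical read counts, chosen only to fall inside the reported ranges.
pre = on_target_fraction(on_target_reads=40, total_reads=1_000_000)       # 0.004%
post = on_target_fraction(on_target_reads=62_000, total_reads=1_000_000)  # 6.2%

print(f"fold enrichment: {fold_enrichment(post, pre):.0f}")
print(f"clonality: {clonality(unique_reads=30_000, mapped_reads=120_000):.2f}")
```

A clonality that rises from one capture round to the next, as the abstract warns for low-complexity libraries, means that extra sequencing mostly re-reads the same molecules rather than adding unique on-target sequences.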

This paper reviews past and present trends in mapping the human Y chromosome. So far, mapping has essentially used a combination of cytogenetic and molecular analyses of Y-chromosomal anomalies and sex reversal syndromes. This deletion mapping culminated recently in the isolation of the putative sex-determining locus TDF. With the availability of new separation and cloning techniques suited to large fragments (over 100 kilobases), the next step will instead be to establish a physical map built from fragments of known physical size. This may allow the definition of several variants of the human Y chromosome that differ in the order or location of DNA sequences along the molecule.


Genomics ◽  
1989 ◽  
Vol 5 (1) ◽  
pp. 153-156 ◽  
Author(s):  
Ulrich Müller ◽  
Marc Lalande ◽  
Timothy A. Donlon ◽  
Michael W. Heartlein

Nature ◽  
1983 ◽  
Vol 303 (5920) ◽  
pp. 831-832 ◽  
Author(s):  
C. E. Bishop ◽  
G. Guellaen ◽  
D. Geldwerth ◽  
R. Voss ◽  
M. Fellous ◽  
...  

Development ◽  
1987 ◽  
Vol 101 (Supplement) ◽  
pp. 77-92
Author(s):  
Kirby D. Smith ◽  
Keith E. Young ◽  
C. Conover Talbot ◽  
Barbara J. Schmeckpeper

A significant fraction of the human Y chromosome is composed of DNA sequences which have homologues on the X chromosome or autosomes in humans and non-human primates. However, most human Y-chromosome sequences so far examined do not have homologues on the Y chromosomes of other primates. This observation suggests that a significant proportion of the human Y chromosome is composed of sequences that have acquired their Y-chromosome association since humans diverged from other primates. More than 50% of the human Y chromosome is composed of a variety of repeated DNAs which, with one known exception, can be distinguished from homologues elsewhere in the genome. These include the alphoid repeats, the major human SINE (Alu repeats) and several additional families of repeats which account for the majority of Y-chromosome repeated DNA. The alphoid sequences tandemly clustered near the centromere on the Y chromosome can be distinguished from those on other chromosomes by both sequence and repeat organization, while the majority of Y-chromosome Alu repeats have little homology with genomic consensus Alu sequences. In contrast, the Y-chromosome LINE repeats cannot be distinguished from LINEs found on other chromosomes. It has been proposed that both SINE and LINE repeats have been dispersed throughout the genome by mechanisms that involve RNA intermediates. The difference in the relationship of the Y-chromosome Alu and LINE repeats to their respective family members elsewhere in the genome makes it possible that their dispersal to the Y chromosome has occurred by different mechanisms or at different rates. In addition to the SINE and LINE repeats, the human Y chromosome contains a group of repeated DNA elements originally identified as 3·4 and 2·1 kb fragments in HaeIII digests of male genomic DNA. Although the 3·4 and 2·1 kb Y repeats do not cross-react, both exist as tandem clusters of alternating Y-specific and non-Y-specific sequences.
The 3·4 kb Y repeats contain at least three distinct sequences with autosomal homologies interspersed in various ways with a collection of several different Y-specific repeat sequences. Individual recombinant clones derived from isolated 3·4 kb HaeIII Y fragments have been identified which do not cross-react. Thus, the 3·4 kb HaeIII Y fragments are a heterogeneous mixture of sequences which have in common the regular occurrence of HaeIII restriction sites at 3·4 kb intervals and an organization as tandem clusters at various sites along the Y-long arm. The 2·1 kb HaeIII Y fragment cross-reacts with a 1·9 kb HaeIII autosomal fragment. Both the Y-chromosomal and autosomal fragments are part of tandem clusters which have a unit length of 2·4 kb. All of the 2·4 kb Y repeats are similar and contain a 1·6 kb Y-specific repeat and an 800 bp sequence which has homology with an 800 bp sequence in the autosomal 2·4 kb repeats. While this 800 bp sequence is common to both Y and autosomal 2·4 kb repeats and is associated with a single Y-specific repeat, it is associated with at least four non-cross-reacting autosome-specific sequences. Like the Y repeat, the autosomal repeats exist as tandem clusters of 2·4 kb units and are composed of an 800 bp common sequence alternating with a 1·6 kb autosome-specific sequence. Thus, in humans, the common sequence is associated with several different sequences yet always occurs as part of a tandem cluster of 2·4 kb repeats. The common and autosome-specific sequences of the 2·4 kb repeats are also present in gorillas as part of organized repeat units. However, in gorillas the two are not associated with each other. The Y-chromosome repeats described here are a heterogeneous mixture of sequences organized into specific sets of alternating Y-specific and non-Y-specific sequences. They do not have an identified function and the mechanisms by which they are generated are unknown.
Nevertheless, their marked chromosomal specificity and the regularity of the basic repeat unit in each type of repeat seem inconsistent with stochastic mechanisms of sequence diffusion between chromosomes.
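Why a tandem cluster with regularly spaced HaeIII sites produces one dominant fragment size in a digest can be illustrated with a small in-silico sketch in Python. HaeIII recognizes GGCC and cuts between GG and CC. The 10 bp repeat unit below is a toy, hypothetical sequence and not one of the repeats described in the abstract. The `digest` helper is likewise an illustrative name.

```python
HAEIII_SITE = "GGCC"  # HaeIII recognition sequence
CUT_OFFSET = 2        # blunt cut between GG and CC


def digest(seq: str, site: str = HAEIII_SITE, cut_offset: int = CUT_OFFSET) -> list[int]:
    """Return fragment lengths from a complete digest of a linear sequence."""
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    # Fragment boundaries: start of molecule, every cut, end of molecule.
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Toy tandem array: five copies of a hypothetical 10 bp unit with one site each.
unit = "ATGGCCTAAT"
fragments = digest(unit * 5)
print(fragments)
```

Every interior fragment has the length of the repeat unit, whatever sequence lies between the sites. This matches the observation that the 3·4 kb HaeIII fragments share a site spacing while differing in sequence content.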


2006 ◽  
Vol 04 (02) ◽  
pp. 523-536 ◽  
Author(s):  
YURIY L. ORLOV ◽  
RENE TE BOEKHORST ◽  
IRINA I. ABNIZOVA

Identifying regions of DNA with extreme statistical characteristics is an important aspect of the structural analysis of complete genomes. Linguistic methods, mainly based on estimating word frequency, can be used for this as they allow for the delineation of regions of low complexity. Low complexity may be due to biased nucleotide composition, tandem or dispersed repeats, palindrome-hairpin structures, or a combination of these features. We developed software tools in which various numerical measures of text complexity are implemented, including combinatorial and linguistic ones. We also added a Hurst exponent estimate to the software to measure dependencies in DNA sequences. By applying these tools to various functional genomic regions, we demonstrate that the complexity of introns and regulatory regions is lower than that of coding regions, whilst their Hurst exponent is larger. Further analysis of promoter sequences revealed that the lower complexity of these regions is associated with long-range correlations caused by transcription factor binding sites.
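As a rough illustration of a word-frequency-based complexity measure, the Python sketch below scores a sequence by the share of distinct k-mers it contains relative to the maximum possible. This is a simplified, single-k vocabulary measure chosen for illustration. It is not the authors' software, and the function names, window sizes, and test sequence are assumptions.

```python
def kmer_complexity(seq: str, k: int = 3) -> float:
    """Distinct k-mers observed divided by the maximum number possible.

    The maximum is limited both by the alphabet (4**k words) and by the
    number of k-mer positions in the sequence.
    """
    seq = seq.upper()
    n_windows = len(seq) - k + 1
    if k <= 0 or n_windows <= 0:
        raise ValueError("sequence must be at least k long and k positive")
    observed = {seq[i:i + k] for i in range(n_windows)}
    return len(observed) / min(4 ** k, n_windows)


def sliding_complexity(seq: str, window: int, step: int, k: int = 3) -> list[tuple[int, float]]:
    """Complexity profile as (start position, score) for each full window."""
    return [
        (start, kmer_complexity(seq[start:start + window], k))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Hypothetical sequence: a homopolymer run followed by a mixed stretch.
example = "A" * 20 + "ACGTTGCAATCGGATCCTAG"
for start, score in sliding_complexity(example, window=20, step=20):
    print(start, round(score, 3))
```

Windows with low scores mark candidate low-complexity regions. Such a measure does not distinguish biased composition from tandem repeats, which is one reason to combine several measures as the abstract describes.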


2004 ◽  
Vol 24 (2) ◽  
pp. 308-312 ◽  
Author(s):  
Fadi J. Charchar ◽  
Maciej Tomaszewski ◽  
Beata Lacka ◽  
Jaroslaw Zakrzewski ◽  
Ewa Zukowska-Szczechowska ◽  
...  

2002 ◽  
Vol 70 (5) ◽  
pp. 1197-1214 ◽  
Author(s):  
Fulvio Cruciani ◽  
Piero Santolamazza ◽  
Peidong Shen ◽  
Vincent Macaulay ◽  
Pedro Moral ◽  
...  
