RelocaTE2: a high resolution transposable element insertion site mapping tool for population resequencing

Author(s):  
Jinfeng Chen ◽  
Travis Wrightsman ◽  
Susan R Wessler ◽  
Jason E. Stajich

Background: Transposable element (TE) polymorphisms are important components of population genetic variation. The functional impacts of TEs in gene regulation and in generating genetic diversity have been observed in multiple species, but the frequency and magnitude of TE variation are underappreciated. Inexpensive and deep sequencing technology has made it affordable to apply population genetic methods to whole genomes with methods that identify single nucleotide and insertion/deletion polymorphisms. However, identifying TE polymorphisms, particularly transposition events or non-reference insertion sites, can be challenging due to the repetitive nature of these sequences, which hampers both the sensitivity and specificity of analysis tools.

Methods: We have developed the tool RelocaTE2 ( http://github.com/stajichlab/RelocaTE2 ) for identification of TE insertion sites at high sensitivity and specificity. RelocaTE2 searches for known TE sequences in whole genome sequencing reads from second-generation sequencing platforms such as Illumina. These sequence reads are used as seeds to pinpoint chromosome locations where TEs have transposed. RelocaTE2 detects the target site duplication (TSD) of TE insertions, allowing it to report TE polymorphism loci with single-base-pair precision.

Results and Discussion: The performance of RelocaTE2 is evaluated using both simulated and real sequence data. RelocaTE2 demonstrates a high level of sensitivity and specificity, particularly when the sequence coverage is not shallow. In comparison to other tools tested, RelocaTE2 achieves the best balance between sensitivity and specificity. In particular, RelocaTE2 performs best in prediction of TSDs for TE insertions. Even in highly repetitive regions, such as those tested on rice chromosome 4, RelocaTE2 is able to report up to 95% of simulated TE insertions with a false positive rate of less than 0.1% using 10-fold genome coverage resequencing data. RelocaTE2 provides a robust solution to identify TE insertion sites and can be incorporated into analysis workflows in support of describing the complete genotype from light-coverage genome sequencing.
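
The role of the TSD in locating an insertion at single-base resolution can be illustrated with a small sketch. This is not RelocaTE2's code; it only shows the general idea under stated assumptions. Because the target site is duplicated on both sides of a new insertion, the read flank upstream of the TE and the read flank downstream of it both contain the TSD, so their alignments to the reference overlap by exactly the TSD length. The function name `infer_tsd`, its parameters, and the 0-based inclusive coordinate convention are hypothetical choices made for this example.

```python
from typing import Optional, Tuple


def infer_tsd(
    reference: str,
    upstream_flank_end: int,
    downstream_flank_start: int,
) -> Optional[Tuple[int, int, str]]:
    """Infer a target site duplication from two junction alignments.

    upstream_flank_end: last reference position (0-based, inclusive) covered
        by the genomic flank of reads that run into the TE's 5' end.
    downstream_flank_start: first reference position (0-based, inclusive)
        covered by the genomic flank of reads that leave the TE's 3' end.

    Returns (tsd_start, tsd_end, tsd_sequence), or None when the two flanks
    do not overlap, in which case no TSD is supported by these coordinates.
    """
    if downstream_flank_start > upstream_flank_end:
        return None  # flanks do not overlap: no duplicated target site
    if downstream_flank_start < 0 or upstream_flank_end >= len(reference):
        raise ValueError("junction coordinates fall outside the reference")
    # The overlap between the two flanks is the duplicated target site.
    tsd = reference[downstream_flank_start : upstream_flank_end + 1]
    return downstream_flank_start, upstream_flank_end, tsd


# Toy reference; the upstream flank ends at position 5 and the downstream
# flank starts at position 3, so the flanks overlap on positions 3..5.
reference = "ACGTTAGGCA"
print(infer_tsd(reference, upstream_flank_end=5, downstream_flank_start=3))
# (3, 5, 'TTA')
```

In practice the junction coordinates would come from read alignments, and a real tool would need to combine evidence from many reads; the sketch takes the two coordinates as given.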

PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e2942 ◽  
Author(s):  
Jinfeng Chen ◽  
Travis R. Wrightsman ◽  
Susan R. Wessler ◽  
Jason E. Stajich

Background: Transposable element (TE) polymorphisms are important components of population genetic variation. The functional impacts of TEs in gene regulation and in generating genetic diversity have been observed in multiple species, but the frequency and magnitude of TE variation are underappreciated. Inexpensive and deep sequencing technology has made it affordable to apply population genetic methods to whole genomes with methods that identify single nucleotide and insertion/deletion polymorphisms. However, identifying TE polymorphisms, particularly transposition events or non-reference insertion sites, can be challenging due to the repetitive nature of these sequences, which hampers both the sensitivity and specificity of analysis tools.

Methods: We have developed the tool RelocaTE2 for identification of TE insertion sites at high sensitivity and specificity. RelocaTE2 searches for known TE sequences in whole genome sequencing reads from second-generation sequencing platforms such as Illumina. These sequence reads are used as seeds to pinpoint chromosome locations where TEs have transposed. RelocaTE2 detects the target site duplication (TSD) of TE insertions, allowing it to report TE polymorphism loci with single-base-pair precision.

Results and Discussion: The performance of RelocaTE2 is evaluated using both simulated and real sequence data. RelocaTE2 demonstrates a high level of sensitivity and specificity, particularly when the sequence coverage is not shallow. In comparison to other tools tested, RelocaTE2 achieves the best balance between sensitivity and specificity. In particular, RelocaTE2 performs best in prediction of TSDs for TE insertions. Even in highly repetitive regions, such as those tested on rice chromosome 4, RelocaTE2 is able to report up to 95% of simulated TE insertions with a false positive rate of less than 0.1% using 10-fold genome coverage resequencing data. RelocaTE2 provides a robust solution to identify TE insertion sites and can be incorporated into analysis workflows in support of describing the complete genotype from light-coverage genome sequencing.


Author(s):  
Jinfeng Chen ◽  
Travis Wrightsman ◽  
Susan R Wessler ◽  
Jason E. Stajich

Background: Transposable element (TE) polymorphisms are important components of population genetic variation. The functional impacts of TEs in gene regulation and in generating genetic diversity have been observed in multiple species, but the frequency and magnitude of TE variation are underappreciated. Inexpensive and deep sequencing technology has made it affordable to apply population genetic methods to whole genomes with methods that identify single nucleotide and insertion/deletion polymorphisms. However, identifying TE transposition events or polymorphisms can be challenging due to the repetitive nature of these sequences, which hampers both the sensitivity and specificity of analysis tools.

Methods: We have developed the tool RelocaTE2 ( http://github.com/stajichlab/RelocaTE2 ) for identification of TE polymorphisms at high sensitivity and specificity. RelocaTE2 searches for known TE sequences in whole genome sequencing reads from second-generation sequencing platforms such as Illumina. These sequence reads are used as seeds to pinpoint chromosome locations where TEs have transposed. RelocaTE2 detects the target site duplication (TSD) of TE insertions, allowing it to report TE polymorphism loci with single-base-pair precision.

Results and Discussion: The performance of RelocaTE2 is evaluated using both simulated and real sequence data. RelocaTE2 demonstrates a higher level of sensitivity and specificity when compared to other tools. Even in highly repetitive regions, such as those tested on rice chromosome 4, RelocaTE2 is able to report up to 95% of simulated TE insertions with a false positive rate of less than 0.1% using 10-fold genome coverage resequencing data. RelocaTE2 provides a robust solution to identify TE polymorphisms and can be incorporated into analysis workflows in support of describing the complete genotype from light-coverage genome sequencing.


2018 ◽  
Author(s):  
Zoltán Maróti ◽  
Zsolt Boldogkői ◽  
Dóra Tombácz ◽  
Michael Snyder ◽  
Tibor Kalmár

ABSTRACT Understanding the underlying genetic structure of human populations is of fundamental interest to both biological and social sciences. Advances in high-throughput genotyping technology have markedly improved our understanding of global patterns of human genetic variation. The most widely used methods for collecting variant information at the DNA level include whole genome sequencing, which remains costly, and the more economical array-based techniques, which are capable of simultaneously genotyping a pre-selected set of variable DNA sites in the human genome. The largest publicly accessible set of human genomic sequence data available today originates from exome sequencing that comprises around 1.2% of the whole genome (approximately 30 million base pairs). In this study, we compared the application of the exome dataset to the array-based dataset and to the gold standard whole genome dataset using the same population genetic analysis methods. Our results draw attention to some of the inherent problems that arise from using pre-selected SNP sets for population genetic analysis. Additionally, we demonstrate that exome sequencing provides a better alternative to the array-based methods for population genetic analysis. We also propose a strategy for unbiased variant collection from exome data and offer a bioinformatics protocol for proper data processing.


2002 ◽  
Vol 46 (8) ◽  
pp. 2337-2343 ◽  
Author(s):  
Julien Haroche ◽  
Jeanine Allignet ◽  
Névine El Solh

ABSTRACT We characterized a new transposon, Tn5406 (5,467 bp), in a clinical isolate of Staphylococcus aureus (BM3327). It carries a variant of vgaA, which encodes a putative ABC protein conferring resistance to streptogramin A but not to mixtures of streptogramins A and B. It also carries three putative genes, the products of which exhibit significant similarities (61 to 73% amino acid identity) to the three transposases of the staphylococcal transposon Tn554. Like Tn554, Tn5406 failed to generate target repeats. In BM3327, the single copy of Tn5406 was inserted into the chromosomal att554 site, which is the preferential insertion site of Tn554. In three other independent S. aureus clinical isolates, Tn5406 was either present as a single plasmid copy (BM3318), as two chromosomal copies (BM3252), or both in the chromosome and on a plasmid (BM3385). The Tn5406-carrying plasmids also contain two other genes, vgaB and vatB. The insertion sites of Tn5406 in BM3252 were studied: one copy was in att554, and one copy was in the additional SCCmec element. Amplification experiments revealed circular forms of Tn5406, indicating that this transposon might be active. To our knowledge, a transposon conferring resistance to streptogramin A and related compounds has not been previously described.


1988 ◽  
Vol 8 (2) ◽  
pp. 737-746
Author(s):  
D Eide ◽  
P Anderson

The transposable element Tc1 is responsible for most spontaneous mutations that occur in Caenorhabditis elegans variety Bergerac. We investigated the genetic and molecular properties of Tc1 transposition and excision. We show that Tc1 insertion into the unc-54 myosin heavy-chain gene was strongly site specific. The DNA sequences of independent Tc1 insertion sites were similar to each other, and we present a consensus sequence for Tc1 insertion that describes these similarities. We show that Tc1 excision was usually imprecise. Tc1 excision was imprecise in both germ line and somatic cells. Imprecise excision generated novel unc-54 alleles that had amino acid substitutions, amino acid insertions, and, in certain cases, probably altered mRNA splicing. The DNA sequences remaining after Tc1 somatic excision were the same as those remaining after germ line excision, but the frequency of somatic excision was at least 1,000-fold higher than that of germ line excision. The genetic properties of Tc1 excision, combined with the DNA sequences of the resulting unc-54 alleles, demonstrated that excision was dependent on Tc1 transposition functions in both germ line and somatic cells. Somatic excision was not regulated in the same strain-specific manner as germ-line excision was. In a genetic background where Tc1 transposition and excision in the germ line was not detectable, Tc1 excision in the soma still occurred at high frequency.


2017 ◽  
Vol 31 (6) ◽  
pp. 781 ◽  
Author(s):  
Savel R. Daniels ◽  
Megan Dreyer ◽  
Prashant P. Sharma

During the present study, we examined the phylogeography and systematics of two species of velvet worm (Peripatopsis Pocock, 1894) in the forested region of the southern Cape of South Africa. A total of 89 P. moseleyi (Wood-Mason, 1879) and 65 P. sedgwicki (Purcell, 1899) specimens were collected and sequenced for the cytochrome c oxidase subunit I mtDNA (COI). In addition, a single P. sedgwicki specimen per sample locality was sequenced for the 18S rRNA locus. Furthermore, morphological variation among P. sedgwicki sample localities was explored using traditional alpha taxonomic characters. DNA sequence data were subjected to phylogenetic analyses using Bayesian inference and population genetic analyses using haplotype networks and analyses of molecular variance (AMOVAs). Phylogenetic results revealed the presence of four and three clades within P. moseleyi and P. sedgwicki, respectively. Haplotype networks were characterised by the absence of shared haplotypes between clades, suggesting genetic isolation, a result corroborated by the AMOVA and highly significant FST values. Specimens from Fort Fordyce Nature Reserve were both genetically and morphologically distinct from the two remaining P. sedgwicki clades. The latter result suggests the presence of a novel lineage nested within P. sedgwicki and suggests that species boundaries within this taxon require re-examination.


1991 ◽  
Vol 57 (1) ◽  
pp. 83-91 ◽  
Author(s):  
Norman Kaplan ◽  
Richard R. Hudson ◽  
Masaru Iizuka

Summary A population genetic model with a single locus at which balancing selection acts and many linked loci at which neutral mutations can occur is analysed using the coalescent approach. The model incorporates geographic subdivision with migration, as well as mutation, recombination, and genetic drift of neutral variation. It is found that geographic subdivision can affect genetic variation even with high rates of migration, provided that selection is strong enough to maintain different allele frequencies at the selected locus. Published sequence data from the alcohol dehydrogenase locus of Drosophila melanogaster are found to fit the proposed model slightly better than a similar model without subdivision.

