The wide distribution and horizontal transfers of beta satellite DNA in eukaryotes

2019
Author(s): Jiawen Yang, Bin Yuan, Yu Wu, Meiyu Li, Jian Li, ...

Abstract
Beta satellite DNA (satDNA), also known as Sau3A sequences, consists of repeated DNA sequences reported in human and other primate genomes. It was previously thought that beta satDNAs originated in Old World monkeys and expanded in great apes. In this study, we searched 7,821 genome assemblies of 3,767 eukaryotic species and found that beta satDNAs are widely distributed across eukaryotes. All four major branches of eukaryotes (animals, fungi, plants, and Harosa/SAR) have multiple clades containing beta satDNAs. These results were confirmed by searching whole-genome sequencing data in the Sequence Read Archive (SRA) and by PCR assays. Beta satDNA sequences were found in all primate clades, as well as in Dermoptera and Scandentia, indicating that the beta satDNAs in primates might have originated in the common ancestor of Primatomorpha or Euarchonta. In contrast, the patchy distribution of beta satDNAs across eukaryotes presents a typical scenario of multiple horizontal transfers.
One-sentence summary: Beta satDNAs in Opimoda could be the result of horizontal transfer (HT) from Diaphoretickes, and those in primates might have originated in the common ancestor of Primatomorpha.

2019
Author(s): Yabin Guo, Jiawen Yang, Bin Yuan, Yu Wu, Meiyu Li, ...

Abstract
Beta satellite DNA (satDNA) sequences, also known as Sau3A sequences, are repeated DNA elements reported in human and other primate genomes. Beta satDNAs may play roles in genome stability and in chromosome segregation during mitosis. It was previously thought that beta satDNAs originated in Old World monkeys and expanded in great apes. However, global, high-throughput studies of beta satDNAs are still lacking.
Results: In this study, we searched 7,821 genome assemblies of 3,767 eukaryotic species and found that beta satDNAs are in fact widely distributed across eukaryotes. All four major branches of eukaryotes (animals, fungi, plants, and Harosa/SAR) have multiple clades containing beta satDNAs. These results were confirmed by searching whole-genome sequencing data in the Sequence Read Archive (SRA) and by PCR assays. Beta satDNA might have originated during the early evolution of eukaryotes. The patchy distribution of beta satDNAs across eukaryotes presents a typical scenario of multiple horizontal transfers (HTs). In contrast, beta satDNA sequences were found in all primate clades and across Primatomorpha and Euarchonta, indicating an origin in their common ancestor followed by vertical transmission. Beyond eukaryotes, beta satDNAs were even found in some archaea and bacteria, which were likely acquired from eukaryotes via HT.
Conclusion: Beta satDNAs are widespread in eukaryotes, and their current distribution is the result of countless HTs. Our study shows for the first time that satellite DNAs can also undergo HT, and it provides new ideas for future investigations in the HT/HGT field.
Keywords: Beta satellite DNA, Sau3A sequences, Eukaryotes, Horizontal gene transfer, Primates


Genomics, 2020, Vol 112 (6), pp. 5295-5304
Author(s): Jiawen Yang, Bin Yuan, Yu Wu, Meiyu Li, Jian Li, ...

GigaScience, 2020, Vol 9 (6)
Author(s): Lisa K Johnson, Ruta Sahasrabudhe, James Anthony Gill, Jennifer L Roach, Lutz Froenicke, ...

Abstract
Background: Whole-genome sequencing data from wild-caught individuals of closely related North American killifish species (Fundulus xenicus, Fundulus catenatus, Fundulus nottii, and Fundulus olivaceus) were obtained using the long-read Oxford Nanopore Technologies (ONT) PromethION and short-read Illumina platforms.
Findings: Draft de novo reference genome assemblies were generated using a combination of long and short sequencing reads. For each species, the PromethION platform was used to generate 30–45× sequence coverage, and the Illumina platform was used to generate 50–160× sequence coverage. Illumina-only assemblies were fragmented, with high numbers of contigs, while ONT-only assemblies were error-prone, with low BUSCO scores. The highest N50 values, ranging from 0.4 to 2.7 Mb, came from assemblies generated using a combination of short- and long-read data. BUSCO completeness was consistently >90% using the Eukaryota database.
Conclusions: High-quality genomes can be obtained by using short-read Illumina data to polish assemblies generated with long-read ONT data. Draft assemblies and raw sequencing data are available for public use. We encourage use and reuse of these data for assembly benchmarking and other analyses.
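The N50 and fold-coverage figures quoted in the killifish abstract follow standard definitions. The following is a minimal Python sketch of both, assuming a plain list of contig lengths and a known genome size; the function names and numbers are hypothetical and this is not the authors' pipeline.

```python
def n50(contig_lengths):
    """Return the N50: the largest length L such that contigs of length >= L
    together contain at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length  # accumulate from the longest contig downward
        if running >= half:
            return length
    raise ValueError("contig_lengths must be non-empty")


def fold_coverage(total_read_bases, genome_size):
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return total_read_bases / genome_size
```

For example, with hypothetical contig lengths of 100, 80, 50, 30 and 20 kb (280 kb in total), the running sum first reaches half the total (140 kb) at the 80 kb contig, so `n50([100, 80, 50, 30, 20])` returns 80. Likewise, 30 Gb of reads over a hypothetical 1 Gb genome gives `fold_coverage(30_000_000_000, 1_000_000_000) == 30.0`, i.e. 30× coverage.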


2020
Author(s): Hannes P. Eggertsson, Bjarni V. Halldorsson

Abstract
Motivation: Data analysis depends on reliable data. In genetics, this includes verifying that a sample is not contaminated with another sample, a problem ubiquitous in biology.
Results: In humans and other diploid species, DNA contamination from the same species can be detected by the presence of three haplotypes between polymorphic SNPs. read_haps is a tool that detects sample contamination from short-read whole-genome sequencing data.
Availability: github.com/DecodeGenetics/read_haps
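The three-haplotype signal can be illustrated with a toy example. The sketch below is a simplification under stated assumptions, not read_haps's implementation: it assumes reads have already been reduced to the alleles they carry at two nearby polymorphic SNPs, and it uses a hypothetical `min_reads` threshold to discount one-off sequencing errors. A single uncontaminated diploid individual carries at most two haplotypes across such a pair, so a third well-supported haplotype points to DNA from another individual.

```python
from collections import Counter


def count_supported_haplotypes(read_alleles, min_reads=2):
    """Count distinct two-SNP haplotypes supported by at least min_reads reads.

    read_alleles: iterable of (allele_at_snp1, allele_at_snp2) tuples, one per
    read spanning both SNPs. min_reads is a hypothetical error filter.
    """
    counts = Counter(read_alleles)
    return sum(1 for n in counts.values() if n >= min_reads)


def looks_contaminated(read_alleles, min_reads=2):
    """Flag a SNP pair if more haplotypes are seen than one diploid genome allows."""
    return count_supported_haplotypes(read_alleles, min_reads) >= 3
```

With `clean = [("A", "C")] * 10 + [("G", "T")] * 9 + [("A", "T")]`, the lone `("A", "T")` read falls below the threshold, so two haplotypes are counted and the pair is not flagged; adding four `("G", "C")` reads yields a third supported haplotype and `looks_contaminated` returns `True`. A real tool would aggregate this evidence over many SNP pairs genome-wide rather than judge a single pair.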


2020
Author(s): Eric S. Tvedte, Mark Gasser, Benjamin C. Sparklin, Jane Michalski, Xuechu Zhao, ...

Abstract
Background: The newest generation of DNA sequencing technology is highlighted by the ability to sequence reads hundreds of kilobases in length, and the increased availability of long-read data has democratized the genome sequencing and assembly process. PacBio and Oxford Nanopore Technologies (ONT) have pioneered competitive long-read platforms, with more recent work focused on improving sequencing throughput and per-base accuracy. Released in 2019, the PacBio Sequel II platform advertises substantial enhancements over previous PacBio systems.
Results: We used whole-genome sequencing data produced by two PacBio platforms (Sequel II and RS II) and two ONT protocols (Rapid Sequencing and Ligation Sequencing) to compare assemblies of the bacterium Escherichia coli and the fruit fly Drosophila ananassae. Sequel II assemblies had higher contiguity and consensus accuracy than those from the other methods, even after accounting for differences in sequencing throughput. ONT Rapid libraries had the fewest chimeric reads and quantified E. coli plasmids better than ligation-based libraries. Assembly quality can be improved by hybrid approaches: adding Illumina libraries for bacterial genome assemblies, or combining ONT and Sequel II libraries for eukaryotic genome assemblies. Genome-wide DNA methylation could be detected with both technologies; however, ONT libraries enabled the identification of a broader range of known E. coli methyltransferase recognition motifs, as well as previously undocumented D. ananassae motifs.
Conclusions: The ideal choice of long-read technology may depend on several factors, including the question or hypothesis under examination. No single technology outperformed the others in all metrics examined.
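The long-read comparison abstract does not say how consensus accuracy was measured. One common convention in assembly benchmarking, assumed here purely for illustration, is the Phred-scaled quality value QV = -10 · log10(error rate), where the error rate is the number of consensus errors divided by the number of consensus bases. A minimal sketch:

```python
import math


def phred_qv(errors, total_bases):
    """Phred-scaled consensus quality: QV = -10 * log10(errors / total_bases)."""
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    if errors == 0:
        return math.inf  # no observed errors: QV is unbounded
    return -10 * math.log10(errors / total_bases)
```

For example, one error per 10,000 consensus bases gives a QV of about 40, and one error per 1,000,000 bases gives about 60. Because the scale is logarithmic, each 10-point gain corresponds to a tenfold reduction in errors.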


2020
Author(s): Diem Nguyen, Valentina Peona, Per Unneberg, Alexander Suh, Patric Jern, ...

Abstract
Background: A large portion of nuclear DNA is composed of transposable element (TE) sequences, whose transposition is controlled by diverse host defense strategies that maintain genomic integrity. One such strategy is the fungal-specific Repeat-Induced Point (RIP) mutation, which hypermutates repetitive DNA sequences. While RIP is found across Fungi, its efficiency has been shown to vary. To date, detailed information on TE landscapes and associated RIP patterns exists for only a few species belonging to highly divergent lineages.
Results: We investigated 18 nearly gapless genome assemblies of ten Neurospora species, which diverged from a common ancestor about 7 million years ago (MYA), to determine genome-wide TE distribution and the associated RIP patterns. We showed that TE content, ranging from 8.7% to 18.9%, covaries with genome size, which ranges from 37.8 to 43.9 Mb. Degraded copies of Long Terminal Repeat (LTR) retrotransposons were abundant among the identified TEs and are distributed across the genome at varying frequencies. In all investigated genomes, TE sequences showed signs of numerous C-to-T substitutions, suggesting that RIP occurred in all species. RIP signatures in all genomes correlated with TE-dense regions.
Conclusions: Essentially gapless genome assemblies allowed us to identify TEs in Neurospora genomes and revealed that TEs contribute to genome size variation in this group. Our study suggests that TEs and RIP are highly correlated in Neurospora and, hence, that their pattern of interaction is conserved over the investigated evolutionary timescale. We show that RIP signatures can be used to facilitate the identification of TE-rich regions in the genome.
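The C-to-T signature described in the Neurospora abstract is often quantified with dinucleotide-based RIP indices; whether the authors used these exact indices is not stated in the abstract. The sketch below computes the widely used RIP product index (TpA/ApT), substrate index ((CpA + TpG)/(ApC + GpT)) and their difference, the composite RIP index, where a positive composite value is commonly read as a RIP signature. In Neurospora, RIP preferentially mutates CpA to TpA, which raises the product index and lowers the substrate index. Real analyses compute these indices in sliding windows along the genome; this version scores a single sequence, and the helper names are hypothetical.

```python
def dinucleotide_counts(seq):
    """Count overlapping dinucleotides (e.g. 'TA', 'CA') in a DNA sequence."""
    seq = seq.upper()
    counts = {}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        counts[pair] = counts.get(pair, 0) + 1
    return counts


def _ratio(numerator, denominator, name):
    """Divide, refusing to score sequences that lack the denominator dinucleotides."""
    if denominator == 0:
        raise ValueError(f"{name} is undefined: denominator dinucleotides absent")
    return numerator / denominator


def rip_indices(seq):
    """Return (product index, substrate index, composite index) for one sequence.

    product   = TpA / ApT                  (rises as RIP converts CpA to TpA)
    substrate = (CpA + TpG) / (ApC + GpT)  (falls as RIP consumes CpA sites)
    composite = product - substrate        (commonly > 0 in RIP-affected DNA)
    """
    c = dinucleotide_counts(seq)
    product = _ratio(c.get("TA", 0), c.get("AT", 0), "product index")
    substrate = _ratio(c.get("CA", 0) + c.get("TG", 0),
                       c.get("AC", 0) + c.get("GT", 0), "substrate index")
    return product, substrate, product - substrate
```

On the toy sequence `"TAGTACAT"`, `rip_indices` returns `(2.0, 0.5, 1.5)`, a positive composite index, whereas `"ATCATGAC"` returns `(0.0, 2.0, -2.0)`.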


Author(s): Hannes P Eggertsson, Bjarni V Halldorsson

Abstract
Motivation: Data analysis depends on reliable data. In genetics, this includes verifying that a sample is not contaminated with another sample, a problem ubiquitous in biology.
Results: In humans and other diploid species, DNA contamination from the same species can be detected by the presence of three haplotypes between polymorphic SNPs. read_haps is a tool that detects sample contamination from short-read whole-genome sequencing data.
Availability and implementation: github.com/DecodeGenetics/read_haps.


Genes, 2019, Vol 10 (7), pp. 509
Author(s): Tian Lan, Yu Lin, Jacob Njaramba-Ngatia, Xiao Guo, Ren Li, ...

Taxonomic identification of ancient remains based on morphology alone is often difficult. Therefore, universal or specific PCR amplification followed by sequencing and a BLAST (Basic Local Alignment Search Tool) search has become the most frequently used genetic method for the species identification of biological samples, including ancient remains. However, these methods struggle to process extremely ancient samples with severe DNA fragmentation and contamination. Here, we used whole-genome sequencing data from 12 ancient samples, with ages ranging from 2.7 to 700 kya, to compare different mapping algorithms and to test different reference databases, mapping similarities, and query coverages, in order to find the method and mapping parameters that best improve the accuracy of ancient mammal species identification. The selected method and parameters were tested on 152 ancient samples, 150 of which were successfully identified. We further screened the BLAST-based mapping results according to the deamination characteristics of ancient DNA to improve the ability to identify ancient species. Our findings demonstrate a marked improvement over the normal procedures used for ancient species identification, achieved by defining mapping and filtering guidelines to identify true ancient DNA sequences. The guidelines summarized in this study could be valuable in archaeology, paleontology, evolution, and forensic science. For the convenience of the scientific community, we wrote a Perl script, called AncSid, which is available on GitHub.
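Ancient DNA typically shows cytosine deamination, which appears as excess C-to-T mismatches concentrated at the 5' ends of reads (and, in double-stranded libraries, G-to-A mismatches at the 3' ends). The sketch below is a hypothetical illustration of screening on that signal, not AncSid (which is written in Perl): it assumes gapless, same-strand read/reference alignments, examines only the 5' end, and uses a hypothetical window `k` for the per-read filter.

```python
def c_to_t_by_position(pairs, n_positions=5):
    """For gapless (read, reference) alignments, return the C->T mismatch
    rate at each of the first n_positions from the read's 5' end."""
    c_sites = [0] * n_positions   # reference C observed at this position
    c_to_t = [0] * n_positions    # ...and the read shows T instead
    for read, ref in pairs:
        if len(read) != len(ref):
            raise ValueError("this sketch assumes gapless, equal-length alignments")
        for i in range(min(n_positions, len(ref))):
            if ref[i] == "C":
                c_sites[i] += 1
                if read[i] == "T":
                    c_to_t[i] += 1
    return [t / c if c else 0.0 for t, c in zip(c_to_t, c_sites)]


def has_terminal_deamination(read, ref, k=3):
    """Hypothetical per-read filter: True if any of the first k positions
    shows a reference C read as T."""
    return any(ref[i] == "C" and read[i] == "T" for i in range(min(k, len(ref))))
```

For the toy alignments `[("TAGG", "CAGG"), ("CAGG", "CAGG"), ("TTGC", "CTGC")]`, two of the three reference C bases at the first position are read as T, so the first rate is 2/3 while the interior positions show none. In practice, an elevated terminal C-to-T rate across many reads, rather than a single mismatch, is what distinguishes authentic ancient molecules from modern contamination.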

