Intra-genomic rDNA gene variability of Nassellaria and Spumellaria (Rhizaria, Radiolaria) assessed by Sanger, MinION and Illumina sequencing

Mapping Intimacies ◽

10.1101/2021.10.05.463214 ◽

2021 ◽

Author(s):

Miguel Mendez Sandin ◽

Sarah Romac ◽

Fabrice Not

Keyword(s):

Large Scale ◽

High Throughput Sequencing ◽

Phylogenetic Reconstruction ◽

Environmental Dna ◽

Genomic Diversity ◽

Genomic Variability ◽

Rdna Gene ◽

Sequencing Errors ◽

Sequencing Platforms ◽

Gene Variability

Ribosomal DNA (rDNA) genes are known to be valuable markers for the barcoding of eukaryotic life and its phylogenetic classification at various taxonomic levels. The large-scale exploration of environmental microbial diversity through metabarcoding approaches has focused mainly on the hypervariable regions V4 and V9 of the 18S rDNA gene. Yet the accurate interpretation of such environmental surveys is hampered by technical (e.g., PCR and sequencing errors) and biological biases (e.g., intra-genomic variability). Here we explored the intra-genomic diversity of Nassellaria and Spumellaria specimens (Radiolaria) by comparing Sanger sequencing with two different high-throughput sequencing platforms: Illumina and Oxford Nanopore Technologies (MinION). Our analysis determined that intra-genomic variability of Nassellaria and Spumellaria is generally low, yet in some Spumellaria specimens we found two different copies of the V4 with a similarity lower than 97%. Of the different sequencing methods, Illumina showed the highest number of contaminations (i.e., environmental DNA, cross-contamination, tag-jumping), revealed by its high sequencing depth, and MinION showed the highest sequencing error rate (~14%). Yet the long reads produced by MinION (~2900 bp) allowed accurate phylogenetic reconstruction studies. These results highlight the requirement for a careful interpretation of Illumina-based metabarcoding studies, in particular regarding low-abundance amplicons, and open future perspectives towards full environmental rDNA metabarcoding surveys.
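
The 97% similarity figure in this abstract can be illustrated with a minimal sketch of pairwise identity between two sequences that are already aligned. The function name, the toy sequences, and the rule that a gap opposite a base counts as a mismatch are assumptions made for illustration; the abstract does not describe how similarity was computed.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: columns where both sequences carry a gap ('-') are ignored,
    and a gap opposite a base counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Hypothetical toy alignment (20 columns, one mismatch); not real V4 data.
copy_1 = "ACGTTGCAAGCTTAGGCATC"
copy_2 = "ACGTTGCTAGCTTAGGCATC"
identity = percent_identity(copy_1, copy_2)
print(f"{identity:.1f}% identity; below 97% threshold: {identity < 97.0}")
```

With real data the two copies would first need to be aligned by a dedicated tool; this sketch only covers the final counting step.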

Technological advancements and their importance for nematode identification

SOIL ◽

10.5194/soil-2-257-2016 ◽

2016 ◽

Vol 2 (2) ◽

pp. 257-270 ◽

Cited By ~ 6

Author(s):

Mohammed Ahmed ◽

Melanie Sapp ◽

Thomas Prior ◽

Gerrit Karssen ◽

Matthew Alan Back

Keyword(s):

High Throughput ◽

Crop Production ◽

Large Scale ◽

High Throughput Sequencing ◽

Biological Indicators ◽

Rapid Identification ◽

Community Studies ◽

Terrestrial Environments ◽

Traditional Taxonomy ◽

Sequencing Platforms

Abstract. Nematodes represent a species-rich and morphologically diverse group of metazoans known to inhabit both aquatic and terrestrial environments. Their role as biological indicators and as key players in nutrient cycling has been well documented. Some plant-parasitic species are also known to cause significant losses to crop production. In spite of this, there still exists a huge gap in our knowledge of their diversity due to the enormity of time and expertise often involved in characterising species using phenotypic features. Molecular methodology provides useful means of complementing the limited number of reliable diagnostic characters available for morphology-based identification. We discuss herein some of the limitations of traditional taxonomy and how molecular methodologies, especially the use of high-throughput sequencing, have assisted in carrying out large-scale nematode community studies and characterisation of phytonematodes through rapid identification of multiple taxa. We also provide brief descriptions of some of the current and almost-outdated high-throughput sequencing platforms and their applications in both plant nematology and soil ecology.

An evaluation of the accuracy and speed of metagenome analysis tools

Scientific Reports ◽

10.1038/srep19233 ◽

2016 ◽

Vol 6 (1) ◽

Cited By ~ 187

Author(s):

Stinus Lindgreen ◽

Karen L. Adair ◽

Paul P. Gardner

Keyword(s):

Aquatic Ecosystems ◽

Large Scale ◽

High Throughput Sequencing ◽

Data Sets ◽

Metagenome Analysis ◽

Analysis Tools ◽

Sequencing Platforms ◽

Capacity Data ◽

High Degree ◽

Realistic Data

Abstract Metagenome studies are becoming increasingly widespread, yielding important insights into microbial communities covering diverse environments from terrestrial and aquatic ecosystems to human skin and gut. With the advent of high-throughput sequencing platforms, the use of large scale shotgun sequencing approaches is now commonplace. However, a thorough independent benchmark comparing state-of-the-art metagenome analysis tools is lacking. Here, we present a benchmark where the most widely used tools are tested on complex, realistic data sets. Our results clearly show that the most widely used tools are not necessarily the most accurate, that the most accurate tool is not necessarily the most time consuming and that there is a high degree of variability between available tools. These findings are important as the conclusions of any metagenomics study are affected by errors in the predicted community composition and functional capacity. Data sets and results are freely available from http://www.ucbioinformatics.org/metabenchmark.html

Gradients Do Grow on Trees: A Linear-Time O(N)-Dimensional Gradient for Statistical Phylogenetics

Molecular Biology and Evolution ◽

10.1093/molbev/msaa130 ◽

2020 ◽

Vol 37 (10) ◽

pp. 3047-3060

Author(s):

Xiang Ji ◽

Zhenyu Zhang ◽

Andrew Holbrook ◽

Akihiko Nishimura ◽

Guy Baele ◽

...

Keyword(s):

Large Scale ◽

High Throughput Sequencing ◽

Linear Time ◽

Phylogenetic Reconstruction ◽

Fold Increase ◽

Time Algorithm ◽

Data Sets ◽

Lassa Virus ◽

Computational Performance ◽

Computational Bottleneck

Abstract. Calculation of the log-likelihood stands as the computational bottleneck for many statistical phylogenetic algorithms. Even worse is its gradient evaluation, often used to target regions of high probability. Order O(N)-dimensional gradient calculations based on the standard pruning algorithm require O(N^2) operations, where N is the number of sampled molecular sequences. With the advent of high-throughput sequencing, recent phylogenetic studies have analyzed hundreds to thousands of sequences, with an apparent trend toward even larger data sets as a result of advancing technology. Such large-scale analyses challenge phylogenetic reconstruction by requiring inference on larger sets of process parameters to model the increasing data heterogeneity. To make these analyses tractable, we present a linear-time algorithm for O(N)-dimensional gradient evaluation and apply it to general continuous-time Markov processes of sequence substitution on a phylogenetic tree without a need to assume either stationarity or reversibility. We apply this approach to learn the branch-specific evolutionary rates of three pathogenic viruses: West Nile virus, Dengue virus, and Lassa virus. Our proposed algorithm significantly improves inference efficiency with a 126- to 234-fold increase in maximum-likelihood optimization and a 16- to 33-fold computational performance increase in a Bayesian framework.

An evaluation of the accuracy and speed of metagenome analysis tools

10.1101/017830 ◽

2015 ◽

Cited By ~ 10

Author(s):

Stinus Lindgreen ◽

Karen L Adair ◽

Paul Gardner

Keyword(s):

Aquatic Ecosystems ◽

Large Scale ◽

High Throughput Sequencing ◽

State Of The Art ◽

Data Sets ◽

Metagenome Analysis ◽

Analysis Tools ◽

Sequencing Platforms ◽

High Degree ◽

Realistic Data

Metagenome studies are becoming increasingly widespread, yielding important insights into microbial communities covering diverse environments from terrestrial and aquatic ecosystems to human skin and gut. With the advent of high-throughput sequencing platforms, the use of large scale shotgun sequencing approaches is now commonplace. However, a thorough independent benchmark comparing state-of-the-art metagenome analysis tools is lacking. Here, we present a benchmark where the most widely used tools are tested on complex, realistic data sets. Our results clearly show that the most widely used tools are not necessarily the most accurate, that the most accurate tool is not necessarily the most time consuming, and that there is a high degree of variability between available tools. These findings are important as the conclusions of any metagenomics study are affected by errors in the predicted community composition. Data sets and results are freely available from http://www.ucbioinformatics.org/metabenchmark.html

Environmental DNA analysis shows high potential as a tool for estimating intraspecific genetic diversity in a wild fish population

10.1101/829770 ◽

2019 ◽

Cited By ~ 1

Author(s):

Satsuki Tsuji ◽

Atsushi Maruyama ◽

Masaki Miya ◽

Masayuki Ushio ◽

Hirotoshi Sato ◽

...

Keyword(s):

Genetic Diversity ◽

Water Sample ◽

Sanger Sequencing ◽

Large Scale ◽

High Throughput Sequencing ◽

Dna Analysis ◽

Environmental Dna ◽

Intraspecific Diversity ◽

Survey Method ◽

Intraspecific Genetic Diversity

Abstract. Environmental DNA (eDNA) analysis has recently been used as a new tool for estimating intraspecific diversity. However, whether known haplotypes contained in a sample can be detected correctly using eDNA-based methods has been examined only by an aquarium experiment. Here, we tested whether the haplotypes of Ayu fish (Plecoglossus altivelis altivelis) detected in a capture survey could also be detected from an eDNA sample derived from the field that contained various haplotypes with low concentrations and foreign substances. A water sample and Ayu specimens collected from a river on the same day were analysed by eDNA analysis and Sanger sequencing, respectively. The 10 L water sample was divided into 20 filters, for each of which 15 PCR replications were performed. After high-throughput sequencing, denoising was performed using two of the most widely used denoising packages, UNOISE3 and DADA2. Of the 42 haplotypes obtained from the Sanger sequencing of 96 specimens, 38 (UNOISE3) and 41 (DADA2) haplotypes were detected by eDNA analysis. When DADA2 was used, except for one haplotype, haplotypes owned by at least two specimens were detected from all the filter replications. This study showed that eDNA analysis for evaluating intraspecific genetic diversity provides results comparable to those of large-scale capture-based conventional methods, suggesting that it could become a more efficient survey method for investigating intraspecific genetic diversity in the field.
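
The counts reported in this abstract translate into detection rates with simple arithmetic. The short sketch below only restates the published numbers (42 Sanger haplotypes, 38 and 41 detected, 20 filters with 15 PCR replicates each); the function and variable names are illustrative assumptions.

```python
def detection_rate(detected: int, reference: int) -> float:
    """Share of reference haplotypes recovered by eDNA analysis, in percent."""
    if reference <= 0:
        raise ValueError("reference count must be positive")
    return 100.0 * detected / reference


SANGER_HAPLOTYPES = 42                      # from Sanger sequencing of 96 specimens
detected_by_tool = {"UNOISE3": 38, "DADA2": 41}

rates = {
    tool: detection_rate(n, SANGER_HAPLOTYPES)
    for tool, n in detected_by_tool.items()
}
for tool, rate in rates.items():
    print(f"{tool}: {detected_by_tool[tool]}/{SANGER_HAPLOTYPES} = {rate:.1f}%")

# 20 filters, each with 15 PCR replicates.
total_pcr_replicates = 20 * 15
print(f"Total PCR replicates: {total_pcr_replicates}")
```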

Indel-correcting DNA barcodes for high-throughput sequencing

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1802640115 ◽

2018 ◽

Vol 115 (27) ◽

pp. E6217-E6226 ◽

Cited By ~ 16

Author(s):

John A. Hawkins ◽

Stephen K. Jones ◽

Ilya J. Finkelstein ◽

William H. Press

Keyword(s):

High Throughput ◽

Dna Sequences ◽

Large Scale ◽

High Throughput Sequencing ◽

Gc Content ◽

Error Correcting Codes ◽

Dna Barcodes ◽

Library Size ◽

Sequencing Errors ◽

Insertion And Deletion

Many large-scale, high-throughput experiments use DNA barcodes, short DNA sequences prepended to DNA libraries, for identification of individuals in pooled biomolecule populations. However, DNA synthesis and sequencing errors confound the correct interpretation of observed barcodes and can lead to significant data loss or spurious results. Widely used error-correcting codes borrowed from computer science (e.g., Hamming, Levenshtein codes) do not properly account for insertions and deletions (indels) in DNA barcodes, even though deletions are the most common type of synthesis error. Here, we present and experimentally validate filled/truncated right end edit (FREE) barcodes, which correct substitution, insertion, and deletion errors, even when these errors alter the barcode length. FREE barcodes are designed with experimental considerations in mind, including balanced guanine-cytosine (GC) content, minimal homopolymer runs, and reduced internal hairpin propensity. We generate and include lists of barcodes with different lengths and error correction levels that may be useful in diverse high-throughput applications, including >10^6 single-error–correcting 16-mers that strike a balance between decoding accuracy, barcode length, and library size. Moreover, concatenating two or more FREE codes into a single barcode increases the available barcode space combinatorially, generating lists with >10^15 error-correcting barcodes. The included software for creating barcode libraries and decoding sequenced barcodes is efficient and designed to be user-friendly for the general biology community.
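
Why indels are awkward for fixed-length barcode windows can be seen in a small sketch comparing Hamming distance with ordinary edit (Levenshtein) distance. This is not the FREE code or its decoder; the barcode, the assumed downstream base, and the function names are hypothetical and chosen only to show how one deletion shifts every later position.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Edit distance allowing substitutions, insertions, and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute or match
            ))
        prev = cur
    return prev[-1]


barcode = "ACGTACGT"            # hypothetical 8-mer barcode
after_deletion = barcode[1:]    # first base lost during synthesis
# Reading a fixed 8-base window pulls in the next library base
# (assumed here to be 'A'), so every position is shifted by one.
observed_window = after_deletion + "A"

print(hamming(barcode, observed_window))      # all 8 positions mismatch
print(levenshtein(barcode, after_deletion))   # a single deletion
print(levenshtein(barcode, observed_window))  # deletion plus the filled-in base
```

A single synthesis error therefore looks like eight substitutions to a Hamming-style decoder, and like two edits once the read is filled to a fixed length, which is the situation the abstract says standard codes do not handle properly.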

Advances in understanding the evolution of fungal genome architecture

F1000Research ◽

10.12688/f1000research.25424.1 ◽

2020 ◽

Vol 9 ◽

pp. 776 ◽

Cited By ~ 1

Author(s):

Shelby J. Priest ◽

Vikas Yadav ◽

Joseph Heitman

Keyword(s):

Large Scale ◽

High Throughput Sequencing ◽

Rapid Evolution ◽

Genomic Diversity ◽

Fungal Genome ◽

Small Scale ◽

Genomic Changes ◽

Wide Range ◽

Domains Of Life ◽

Fungal Genomes

Diversity within the fungal kingdom is evident from the wide range of morphologies fungi display as well as the various ecological roles and industrial purposes they serve. Technological advances, particularly in long-read sequencing, coupled with the increasing efficiency and decreasing costs across sequencing platforms have enabled robust characterization of fungal genomes. These sequencing efforts continue to reveal the rampant diversity in fungi at the genome level. Here, we discuss studies that have furthered our understanding of fungal genetic diversity and genomic evolution. These studies revealed the presence of both small-scale and large-scale genomic changes. In fungi, research has recently focused on many small-scale changes, such as how hypermutation and allelic transmission impact genome evolution as well as how and why a few specific genomic regions are more susceptible to rapid evolution than others. High-throughput sequencing of a diverse set of fungal genomes has also illuminated the frequency, mechanisms, and impacts of large-scale changes, which include chromosome structural variation and changes in chromosome number, such as aneuploidy, polyploidy, and the presence of supernumerary chromosomes. The studies discussed herein have provided great insight into how the architecture of the fungal genome varies within species and across the kingdom and how modern fungi may have evolved from the last common fungal ancestor and might also pave the way for understanding how genomic diversity has evolved in all domains of life.

The landscape of somatic mutation in sporadic Chinese colorectal cancer

10.1101/155671 ◽

2017 ◽

Author(s):

Zhe Liu ◽

Chao Yang ◽

Xiangchun Li ◽

Wen Luo ◽

Bhaskar Roy ◽

...

Keyword(s):

Colorectal Cancer ◽

Wnt Signaling ◽

Cancer Progression ◽

Large Scale ◽

High Throughput Sequencing ◽

Gene Mutations ◽

Chinese Patients ◽

Sequencing Analysis ◽

Scale Characterization ◽

Sequencing Platforms

Abstract. Colorectal cancer is the fifth most prevalent cancer in China. Nevertheless, a large-scale characterization of the mutation spectrum of Chinese colorectal cancer has not been carried out. In this study, we have performed whole-exome sequencing analysis of 98 patients’ tumor samples with matched pairs of normal colon tissues using Illumina and Complete Genomics high-throughput sequencing platforms. Canonical CRC somatic gene mutations with high prevalence (>10%) have been verified, including TP53, APC, KRAS, SMAD4, FBXW7 and PIK3CA. PEG3 is identified as a novel frequently mutated gene (10.6%). APC and Wnt signaling exhibit significantly lower mutation frequencies than those in TCGA data. Analysis with clinical characteristics indicates that the APC gene and Wnt signaling display a lower mutation rate in lymph node-positive cancers than in lymph node-negative ones, a difference that is not observed in TCGA data. The APC gene and Wnt signaling are considered the key molecule and pathway for colorectal cancer initiation, and these findings greatly undermine their importance in tumor progression for Chinese patients. Taken together, the application of next-generation sequencing has led to the determination of novel somatic mutations and alternative disease mechanisms in colorectal cancer progression, which may be useful for understanding disease mechanism and personalizing treatment for Chinese patients.

DDBJ update: streamlining submission and access of human data

Nucleic Acids Research ◽

10.1093/nar/gkaa982 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D71-D75

Author(s):

Asami Fukuda ◽

Yuichi Kodama ◽

Jun Mashima ◽

Takatomo Fujisawa ◽

Osamu Ogasawara

Keyword(s):

Large Scale ◽

High Throughput Sequencing ◽

Group Structure ◽

Biological Data ◽

Sequencing Data ◽

The Public ◽

Human Data ◽

Phenotype Data ◽

Sequencing Platforms ◽

Authorized Access

Abstract The Bioinformation and DDBJ Center (DDBJ Center, https://www.ddbj.nig.ac.jp) provides databases that capture, preserve and disseminate diverse biological data to support research in the life sciences. This center collects nucleotide sequences with annotations, raw sequencing data, and alignment information from high-throughput sequencing platforms, and study and sample information, in collaboration with the National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI). This collaborative framework is known as the International Nucleotide Sequence Database Collaboration (INSDC). In collaboration with the National Bioscience Database Center (NBDC), the DDBJ Center also provides a controlled-access database, the Japanese Genotype–phenotype Archive (JGA), which archives and distributes human genotype and phenotype data, requiring authorized access. The NBDC formulates guidelines and policies for sharing human data and reviews data submission and use applications. To streamline all of the processes at NBDC and JGA, we have integrated the two systems by introducing a unified login platform with a group structure in September 2020. In addition to the public databases, the DDBJ Center provides a computer resource, the NIG supercomputer, for domestic researchers to analyze large-scale genomic data. This report describes updates to the services of the DDBJ Center, focusing on the NBDC and JGA system enhancements.

Simultaneous absolute quantification and sequencing of fish environmental DNA in a mesocosm by quantitative sequencing technique

Scientific Reports ◽

10.1038/s41598-021-83318-6 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Tatsuhiko Hoshino ◽

Ryohei Nakao ◽

Hideyuki Doi ◽

Toshifumi Minamoto

Keyword(s):

Fish Species ◽

High Throughput Sequencing ◽

Correlation Coefficients ◽

Environmental Dna ◽

Digital Pcr ◽

Absolute Quantification ◽

Taqman Probe ◽

Individual Species ◽

Natural Environments ◽

Target Sequence

Abstract. The combination of high-throughput sequencing technology and environmental DNA (eDNA) analysis has the potential to be a powerful tool for comprehensive, non-invasive monitoring of species in the environment. To understand the correlation between the abundance of eDNA and that of species in natural environments, we have to obtain quantitative eDNA data, usually via individual assays for each species. The recently developed quantitative sequencing (qSeq) technique enables simultaneous phylogenetic identification and quantification of individual species by counting random tags added to the 5′ end of the target sequence during the first DNA synthesis. Here, we applied qSeq to eDNA analysis to test its effectiveness in biodiversity monitoring. eDNA was extracted from water samples taken over 4 days from aquaria containing five fish species (Hemigrammocypris neglectus, Candidia temminckii, Oryzias latipes, Rhinogobius flumineus, and Misgurnus anguillicaudatus), and quantified by qSeq and microfluidic digital PCR (dPCR) using a TaqMan probe. The eDNA abundance quantified by qSeq was consistent with that quantified by dPCR for each fish species at each sampling time. The correlation coefficients between qSeq and dPCR were 0.643, 0.859, and 0.786 for H. neglectus, O. latipes, and M. anguillicaudatus, respectively, indicating that qSeq accurately quantifies fish eDNA.
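
The tag-counting idea described in this abstract can be sketched in a few lines: reads that share a random tag are treated as copies of the same original molecule, so distinct tags per species are counted instead of reads. This is a simplified illustration, not the authors' pipeline; the input format, the tag sequences, and the function name are assumptions, and the sketch ignores sequencing errors in tags and any conversion from tag counts to absolute copy numbers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def count_unique_tags(reads: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct random tags per species.

    Each read is a (species, tag) pair; duplicate tags within a species are
    assumed to be PCR copies of one tagged molecule and are counted once.
    """
    tags_by_species = defaultdict(set)
    for species, tag in reads:
        tags_by_species[species].add(tag)
    return {species: len(tags) for species, tags in tags_by_species.items()}


# Hypothetical reads already assigned to species; tags are made up.
reads = [
    ("O. latipes", "AACGTTGA"),
    ("O. latipes", "AACGTTGA"),   # PCR duplicate of the read above
    ("O. latipes", "GGTACCTA"),
    ("H. neglectus", "TTGACCAG"),
]
tag_counts = count_unique_tags(reads)
print(tag_counts)
```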
