Index-based map-to-sequence alignment in large eukaryotic genomes

2015
Author(s): Davide Verzotto, Axel M Hillmer, Audrey S M Teo, Niranjan Nagarajan

Resolution of complex repeat structures and rearrangements in the assembly and analysis of large eukaryotic genomes is often aided by a combination of high-throughput sequencing and mapping technologies (e.g. optical restriction mapping). In particular, mapping technologies can generate sparse maps of large DNA fragments (150 kbp to 2 Mbp) and thus provide a unique source of information for disambiguating complex rearrangements in cancer genomes. Despite their utility, combining high-throughput sequencing and mapping technologies has been challenging because of the lack of efficient, freely available software for robustly aligning maps to sequences. Here we introduce two new map-to-sequence alignment algorithms that efficiently and accurately align high-throughput mapping datasets to large eukaryotic genomes while accounting for high error rates. To do so, these methods (OPTIMA for glocal alignment and OPTIMA-Overlap for overlap alignment) build efficient data structures that index continuous-valued mapping data while accounting for errors. We also introduce an approach for evaluating the significance of alignments that avoids expensive permutation-based tests and is agnostic to technology-dependent error rates. Our benchmarking results suggest that OPTIMA and OPTIMA-Overlap outperform state-of-the-art approaches in sensitivity (a 1.6- to 2-fold improvement) while also being more efficient (170-200%) and more precise in their alignments (99% precision). These advantages are independent of data quality, suggesting that our indexing approach and statistical evaluation are robust, improving sensitivity while guaranteeing high precision.
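The abstract does not describe OPTIMA's index in detail, so the following is only a minimal Python sketch of the general idea it names: indexing continuous-valued restriction-fragment sizes so that an error-prone optical map can be looked up within a tolerance rather than by exact match. The run length `k`, the hypothetical `tolerance` helper (5% of fragment size with a 500 bp floor), and the names `build_index` and `query` are assumptions for illustration, not the published method.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

# (sorted first sizes, sorted (size run, start position) entries)
Index = Tuple[List[float], List[Tuple[Tuple[float, ...], int]]]

def tolerance(size: float, rel: float = 0.05, floor: float = 500.0) -> float:
    # Hypothetical sizing-error model: relative error with an absolute floor (bp).
    return max(floor, rel * size)

def build_index(fragments: List[float], k: int = 3) -> Index:
    """Index every run of k consecutive reference fragment sizes by its first size."""
    entries = [(tuple(fragments[i:i + k]), i) for i in range(len(fragments) - k + 1)]
    entries.sort()  # sorting by the size tuple puts the first sizes in ascending order
    keys = [sizes[0] for sizes, _ in entries]
    return keys, entries

def query(index: Index, seed: Tuple[float, ...]) -> List[int]:
    """Return start positions whose k sizes all match the seed within tolerance."""
    keys, entries = index
    t0 = tolerance(seed[0])
    # Binary search restricts candidates to runs whose first size is in range.
    lo = bisect_left(keys, seed[0] - t0)
    hi = bisect_right(keys, seed[0] + t0)
    # Check the remaining sizes of each candidate against the seed.
    hits = [pos for sizes, pos in entries[lo:hi]
            if all(abs(ref - obs) <= tolerance(obs) for ref, obs in zip(sizes, seed))]
    return sorted(hits)
```

For example, with reference fragments [12000, 8000, 15000, 3000, 22000, 8100, 15200, 3050] and k = 3, the observed seed (8050, 15100, 3020) matches start positions 1 and 5. Keying the index on the first fragment size lets a binary search discard most candidates before the remaining sizes are compared.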

MycoKeys, 2018, Vol 39, pp. 29-40
Author(s): Sten Anslan, R. Henrik Nilsson, Christian Wurzbacher, Petr Baldrian, Leho Tedersoo, ...

Along with recent developments in high-throughput sequencing (HTS) technologies and the resulting rapid accumulation of HTS data, there has been a growing need for, and interest in, tools for HTS data processing and communication. In particular, a number of bioinformatics tools have been designed for analysing metabarcoding data, each with its own features, assumptions and outputs. To evaluate how the choice of bioinformatics workflow affects the results, we compared the performance of different analysis platforms on two contrasting high-throughput sequencing data sets. Our analysis revealed that the computation time, the quality of error filtering and hence the output of a given bioinformatics process depend largely on the platform used. Our results show that none of the bioinformatics workflows perfectly filters out the accumulated errors when generating Operational Taxonomic Units (OTUs), although PipeCraft, LotuS and PIPITS perform better than QIIME2 and Galaxy on the tested fungal amplicon dataset. We conclude that the output of each platform requires manual validation of the OTUs by examining the taxonomy assignment values.


2020, Vol 10 (12), pp. 4063
Author(s): Beata Nalepa, Sławomir Ciesielski, Marek Aljewicz

The aim of this study was to evaluate the microbiome of industrially produced ripened Edam cheeses using next-generation sequencing. Samples for analysis were collected in spring and autumn. Spring samples had significantly higher Lactococcus and Bacillus counts and lower counts of Enterobacteriaceae, Enterococcus, and yeasts than autumn samples. The predominant microorganisms identified by Illumina high-throughput sequencing belonged to four phyla: Firmicutes, Actinobacteria, Proteobacteria and Bacteroidetes. The dominant species were starter culture bacteria. Lactobacillus rhamnosus, Lactobacillus kefiri, Lactobacillus kefiranofaciens, Lactobacillus casei, Streptococcus thermophilus, and Bifidobacterium accounted for the largest shares of the cheese microbial communities. The number of γ-Proteobacteria reads was higher in autumn cheese samples. A high number of reads was also assigned to the genus Clostridium. Counts of spore-forming bacteria of the genus Bacillus were higher in cheeses produced in spring. The study revealed highly similar relationships between the two production periods analyzed. These results add to existing knowledge of cheese microbiota and can be used to improve and modify production processes based on the composition of microbial communities, as well as to improve the quality of the final product.


2021, Vol 12
Author(s): Cheng Wang, Yunhe Xu, Bin Yu, Aibo Xiao, Yuhong Su, ...

The microbial composition of sour porridge at different fermentation times was analyzed through high-throughput sequencing, and a pure culture fermentation process was established to optimize the production process and improve the edible quality of the porridge. In natural fermentation, Firmicutes and Proteobacteria were abundant throughout the process. Specifically, Aeromonas, Acinetobacter, and Klebsiella were dominant on fermentation days 1–5 (groups NF-1, NF-3, and NF-5), while Lactobacillus and Acetobacter gradually became the dominant bacteria by fermentation day 7 (group NF-7). Further, we isolated one strain of acid-producing bacteria from sour porridge, identified it as Lacticaseibacillus paracasei by 16S rRNA sequencing, and designated it strain SZ02. Pure culture fermentation using this strain significantly increased the relative starch and amylose contents of the porridge, while decreasing the lipid, protein, and ash contents (P < 0.05). These findings suggest that sour porridge produced using strain SZ02 has superior edible qualities and that this strategy may be exploited for its industrial production.


2021, Vol 12
Author(s): Harihara Subrahmaniam Muralidharan, Nidhi Shah, Jacquelyn S. Meisel, Mihai Pop

High-throughput sequencing has revolutionized the field of microbiology; however, reconstructing complete genomes of organisms from whole metagenomic shotgun sequencing data remains a challenge. Recovered genomes are often highly fragmented because of uneven abundances of organisms, repeats within and across genomes, sequencing errors, and strain-level variation. To address the fragmented nature of metagenomic assemblies, scientists rely on a process called binning, which clusters together contigs inferred to originate from the same organism. Existing binning algorithms use oligonucleotide frequencies and contig abundance (coverage) within and across samples to group together contigs from the same organism. However, these algorithms often miss short contigs and contigs from regions with unusual coverage or DNA composition characteristics, such as mobile elements. Here, we propose that information from assembly graphs can assist current strategies for metagenomic binning. We use MetaCarvel, a metagenomic scaffolding tool, to construct assembly graphs in which contigs are nodes and edges are inferred from paired-end reads. We developed a tool, Binnacle, that extracts information from the assembly graphs and clusters scaffolds into comprehensive bins. Binnacle also provides wrapper scripts to integrate with existing binning methods. The Binnacle pipeline can be found on GitHub (https://github.com/marbl/binnacle). We show that binning graph-based scaffolds, rather than contigs, improves the contiguity and quality of the resulting bins and captures a broader set of the genes of the organisms being reconstructed.
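As a simplified illustration of how assembly-graph links can group contigs (not Binnacle's actual algorithm, which the abstract does not detail), the Python sketch below places contigs joined by paired-end read links into the same connected component using union-find. The `(contig_a, contig_b, n_read_pairs)` link format, the `min_support` threshold, and the function name `cluster_contigs` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def cluster_contigs(contigs: Iterable[str],
                    links: Iterable[Tuple[str, str, int]],
                    min_support: int = 3) -> List[List[str]]:
    """Group contigs into connected components of a paired-end link graph.

    Every contig named in `links` must also appear in `contigs`. Links
    supported by fewer than `min_support` read pairs (a hypothetical
    threshold) are ignored as likely spurious.
    """
    parent: Dict[str, str] = {c: c for c in contigs}

    def find(x: str) -> str:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, support in links:
        if support >= min_support:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra  # merge the two components

    groups: Dict[str, List[str]] = defaultdict(list)
    for c in parent:
        groups[find(c)].append(c)
    # Deterministic output: each group sorted, groups sorted.
    return sorted(sorted(g) for g in groups.values())
```

For example, with contigs c1 to c5 and links (c1, c2, 10), (c2, c3, 5) and (c4, c5, 1), the default threshold yields the groups [c1, c2, c3], [c4] and [c5]: the weakly supported c4-c5 link is dropped. A real binner would combine such graph structure with coverage and composition signals rather than rely on connectivity alone.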

