Mapping Contigs onto Reference Genomes

Author(s):  
Nalvo F. Almeida ◽  
André C. Lima ◽  
Said S. Adi ◽  
Carlos J. M. Viana ◽  
Marcel Y. Nakazaki ◽  
...  
Microbiome ◽  
2021 ◽  
Vol 9 (1) ◽  
Author(s):  
Lars Snipen ◽  
Inga-Leena Angell ◽  
Torbjørn Rognes ◽  
Knut Rudi

Background: Studies of shifts in microbial community composition have many applications. For studies at the species or subspecies level, 16S amplicon sequencing lacks resolution and is often replaced by full shotgun sequencing. Because of its higher cost, however, shotgun sequencing restricts the number of samples that can be sequenced. As an alternative to full shotgun sequencing, we have investigated the use of Reduced Metagenome Sequencing (RMS) to estimate the composition of a microbial community. RMS uses double-digest restriction-site associated DNA sequencing, which means that only a small fraction of each genome is sequenced. The read sets obtained by this approach have properties different from both amplicon and shotgun data, and existing amplicon and shotgun pipelines either cannot be used at all or do not exploit the full potential of RMS data.

Results: We suggest a procedure for analyzing such data, based on fragment clustering and a constrained ordinary least-squares deconvolution for estimating the relative abundance of all community members. Mock community datasets show the potential to clearly separate strains even when their 16S sequences are 100% identical and genome-wide differences are < 0.02, indicating that RMS has a very high resolution. In a simulation study, we compare RMS to shotgun sequencing and show that RMS gives improved abundance estimates when the community contains many very closely related genomes. In a real dataset of infant gut samples, we show that RMS is capable of detecting a strain diversity gradient for Escherichia coli across time.

Conclusion: We find that RMS is a good alternative to either metabarcoding or shotgun sequencing for resolving microbial communities at the strain level. Like shotgun metagenomics, it requires a good database of reference genomes and is well suited for studies of the human gut or other communities where many reference genomes exist. A data analysis pipeline is offered as an R package at https://github.com/larssnip/microRMS.
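The abstract estimates relative abundances with a constrained ordinary least-squares deconvolution of fragment counts. As a rough illustration only, the sketch below assumes the constraint is non-negativity, fits it with plain cyclic coordinate descent, and then rescales the estimates to sum to 1. The matrix layout (rows = fragment clusters, columns = reference genomes) and the function names are hypothetical; this is not the microRMS implementation.

```python
def nnls_coordinate_descent(A, b, n_sweeps=1000):
    """Minimise ||A x - b||^2 subject to x >= 0 by cyclic coordinate descent.

    A is a list of rows (fragment clusters) by columns (genomes);
    b holds the observed read count for each fragment cluster.
    """
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols
    # Residual r = b - A x, kept up to date whenever x changes.
    r = [float(v) for v in b]
    col_sq = [sum(A[i][j] ** 2 for i in range(n_rows)) for j in range(n_cols)]
    for _ in range(n_sweeps):
        for j in range(n_cols):
            if col_sq[j] == 0:
                continue  # genome with no fragments: leave its estimate at zero
            # Exact minimiser along coordinate j, then clipped at zero.
            grad = sum(A[i][j] * r[i] for i in range(n_rows))
            new_xj = max(0.0, x[j] + grad / col_sq[j])
            delta = new_xj - x[j]
            if delta != 0.0:
                for i in range(n_rows):
                    r[i] -= A[i][j] * delta
                x[j] = new_xj
    return x


def relative_abundance(A, b, n_sweeps=1000):
    """Non-negative least-squares fit rescaled so the estimates sum to 1."""
    x = nnls_coordinate_descent(A, b, n_sweeps)
    total = sum(x)
    return [v / total for v in x] if total > 0 else x
```

The non-negativity constraint matters when fragments are shared between genomes: with `A = [[1, 0], [0, 1], [1, 1]]` and `b = [0, 10, 0]`, the unconstrained least-squares solution is negative for the first genome, whereas `nnls_coordinate_descent` returns `[0.0, 5.0]`. Coordinate descent was chosen only because it needs no external libraries; any NNLS solver would serve.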


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Nae-Chyun Chen ◽  
Brad Solomon ◽  
Taher Mun ◽  
Sheila Iyer ◽  
Ben Langmead

Most sequencing data analyses start by aligning sequencing reads to a linear reference genome, but failure to account for genetic variation leads to reference bias and confounding of results downstream. Other approaches replace the linear reference with structures like graphs that can include genetic variation, incurring major computational overhead. We propose the reference flow alignment method that uses multiple population reference genomes to improve alignment accuracy and reduce reference bias. Compared to the graph aligner vg, reference flow achieves a similar level of accuracy and bias avoidance but with 14% of the memory footprint and 5.5 times the speed.
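The abstract only outlines the idea of aligning against several population reference genomes. Below is a minimal sketch of one plausible read-selection step, under assumptions that are ours rather than the paper's: reads are first aligned to a single initial reference; reads with no alignment, or with mapping quality below a hypothetical threshold (10 here), are realigned to each population reference; and the highest-scoring alignment is kept. The `Alignment` type, the `realign` callback and the reference names are hypothetical. A real pipeline would also need to convert coordinates from each population reference back to a common reference, which this sketch omits.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Alignment:
    reference: str  # name of the reference the read was aligned to
    score: int      # aligner alignment score (higher is better)
    mapq: int       # mapping quality


def reference_flow(
    first_pass: Dict[str, Optional[Alignment]],
    realign: Callable[[str], List[Alignment]],
    mapq_threshold: int = 10,
) -> Dict[str, Optional[Alignment]]:
    """Keep confident first-pass alignments; otherwise take the best-scoring
    alignment among the population references (and the first pass, if any)."""
    final: Dict[str, Optional[Alignment]] = {}
    for read_id, aln in first_pass.items():
        # Confidently placed reads keep their first-pass alignment.
        if aln is not None and aln.mapq >= mapq_threshold:
            final[read_id] = aln
            continue
        # Otherwise realign to the population references and pick the best score.
        candidates = list(realign(read_id))
        if aln is not None:
            candidates.append(aln)
        final[read_id] = max(candidates, key=lambda a: a.score) if candidates else None
    return final
```

For example, a read whose first-pass alignment has MAPQ 3 and score 90, and which scores 120 against a hypothetical "EUR" reference and 130 against a hypothetical "AFR" reference, ends up assigned to "AFR". Reads that align nowhere stay `None`.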


Forests ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 222
Author(s):  
Bartosz Ulaszewski ◽  
Joanna Meger ◽  
Jaroslaw Burczyk

Next-generation sequencing of reduced representation genomic libraries (RRL) can provide large numbers of genetic markers for population genetic studies at relatively low cost. However, a major concern with these markers is genotyping precision, which is related to the common problem of missing data and appears to be particularly important in association and genomic selection studies. We evaluated three RRL approaches (GBS, RADseq, ddRAD) and different SNP identification methods (de novo or based on a reference genome) to find the best solutions for future population genomics studies in two economically and ecologically important broadleaved tree species, Fagus sylvatica and Quercus robur. We found that the ddRAD method coupled with reference-based SNP calling provided the largest numbers of markers (28 k and 36 k for beech and oak, respectively) under standard filtering criteria. Using technical replicates of samples, we demonstrated that more than 80% of SNP loci should be considered reliable markers in GBS and ddRAD data, but not in RADseq data. According to the reference genome annotations, more than 30% of the identified ddRAD loci appear to be related to genes. Our findings provide solid support for using ddRAD-based SNPs in future population genomics studies of beech and oak.
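What distinguishes ddRAD from single-enzyme approaches is that the genome is cut with two restriction enzymes and only fragments carrying one end from each enzyme, within a size window, are kept. The sketch below performs such an in-silico double digest on a single sequence. EcoRI and MspI are used purely as an illustrative enzyme pair (the enzymes used in the study are not stated here), the size window is arbitrary, and real size selection also accounts for adapter lengths, which is ignored.

```python
def find_cuts(seq, site, offset):
    """Return cut positions for every (possibly overlapping) occurrence of site."""
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = seq.find(site, start + 1)
    return cuts


def double_digest(seq, enzyme_a, enzyme_b, min_len, max_len):
    """Fragments flanked by one cut from each enzyme, within a size window.

    Each enzyme is (name, recognition_site, cut_offset). Both example sites
    are palindromic, so scanning the forward strand is enough.
    """
    seq = seq.upper()
    cuts = sorted(
        (pos, name)
        for name, site, off in (enzyme_a, enzyme_b)
        for pos in find_cuts(seq, site, off)
    )
    fragments = []
    for (start, left), (end, right) in zip(cuts, cuts[1:]):
        # ddRAD keeps fragments whose two ends come from different enzymes.
        if left != right and min_len <= end - start <= max_len:
            fragments.append((start, end, seq[start:end]))
    return fragments


ECORI = ("EcoRI", "GAATTC", 1)  # cuts G^AATTC
MSPI = ("MspI", "CCGG", 1)      # cuts C^CGG
```

For example, `double_digest("AAGAATTCAAAACCGGTTTTGAATTCAA", ECORI, MSPI, 5, 20)` returns `[(3, 13, "AATTCAAAAC"), (13, 21, "CGGTTTTG")]`. Raising `min_len` to 9 keeps only the first fragment, which illustrates how size selection further reduces the fraction of the genome that is sequenced.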


2020 ◽  
Vol 41 (S1) ◽  
pp. s434-s434
Author(s):  
Grant Vestal ◽  
Steven Bruzek ◽  
Amanda Lasher ◽  
Amorce Lima ◽  
Suzane Silbert

Background: Hospital-acquired infections pose a significant threat to patient health. Laboratories are starting to consider whole-genome sequencing (WGS) as a molecular method for outbreak detection and epidemiological surveillance. The objective of this study was to assess the iSeq100 platform (Illumina, San Diego, CA) for accurate sequencing and for WGS-based outbreak detection using bioMérieux EPISEQ CS, a novel cloud-based software for sequence assembly and data analysis. Methods: In total, 25 isolates, including 19 MRSA isolates and 6 ATCC strains, were evaluated in this study: A. baumannii ATCC 19606, B. cepacia ATCC 25416, E. faecalis ATCC 29212, E. coli ATCC 25922, P. aeruginosa ATCC 27853 and S. aureus ATCC 25923. DNA was extracted from all isolates on the QIAcube (Qiagen, Hilden, Germany) using the DNeasy UltraClean Microbial Kit protocol. DNA libraries were prepared for WGS using the Nextera DNA Flex Library Prep Kit (Illumina) and sequenced at 2×150 bp on the iSeq100 according to the manufacturer’s instructions. The 19 MRSA isolates had previously been characterized with the DiversiLab system (bioMérieux, France). After validation of the iSeq100 platform, a new outbreak analysis was performed by WGS using EPISEQ CS. ATCC sequences were compared to assembled reference genomes from NCBI GenBank to assess the accuracy of the iSeq100 platform. FASTQ files were aligned with Bowtie2 version 2.2.6 using default parameters, and FreeBayes version 1.1.0.46-0 was used to call homozygous single-nucleotide polymorphisms (SNPs) with a minimum coverage of 5 and an allele frequency threshold of 0.87, otherwise using default parameters. ATCC sequences were also analyzed with ResFinder version 3.2 and compared in silico to the reference genome.
Results: EPISEQ CS classified 8 MRSA isolates as unrelated and grouped 11 isolates into 2 separate clusters: cluster A (5 isolates) and cluster B (6 isolates), with similarity scores of ≥99.63% and ≥99.50%, respectively. This contrasted with the previous characterization by DiversiLab, which identified 3 clusters of 2, 8, and 11 isolates. The EPISEQ CS resistome data detected the mecA gene in 18 of 19 MRSA isolates. Comparative analysis of the ATCC sequences against the reference genomes showed 99.9986% SNP concordance and 100.00% concordance in the resistance genes present. Conclusions: The iSeq100 platform accurately sequenced the bacterial isolates and, in conjunction with EPISEQ CS, could be an affordable alternative for epidemiological surveillance and infection prevention.

Funding: None

Disclosures: None
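The Methods state that homozygous SNPs were kept at a minimum coverage of 5 and an allele frequency of 0.87. The sketch below applies those two thresholds as a post hoc filter to already-called variants, reading "allele frequency" as the fraction of reads supporting the alternate allele (our interpretation). It is not the FreeBayes command line used in the study, and the `SnpCall` record is hypothetical; only the VCF convention that `1/1` denotes a homozygous-alternate genotype is standard.

```python
from dataclasses import dataclass
from typing import Iterable, List

MIN_COVERAGE = 5         # minimum read depth at the site
MIN_ALT_FRACTION = 0.87  # minimum fraction of reads supporting the alternate allele


@dataclass
class SnpCall:
    pos: int
    ref: str
    alt: str
    depth: int      # total reads covering the site
    alt_reads: int  # reads supporting the alternate allele
    genotype: str   # VCF-style genotype, "1/1" = homozygous alternate


def passes_filter(call: SnpCall) -> bool:
    # Depth is checked first, which also guards the division below.
    if call.genotype != "1/1" or call.depth < MIN_COVERAGE:
        return False
    return call.alt_reads / call.depth >= MIN_ALT_FRACTION


def filter_homozygous_snps(calls: Iterable[SnpCall]) -> List[SnpCall]:
    """Keep homozygous-alternate calls meeting both thresholds."""
    return [c for c in calls if passes_filter(c)]
```

A call with 7 of 8 reads supporting the alternate allele (0.875) passes, one with 17 of 20 (0.85) does not, and any call covered by fewer than 5 reads is dropped regardless of its allele fraction.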


Genes ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 144
Author(s):  
Ulrike H. Taron ◽  
Johanna L. A. Paijmans ◽  
Axel Barlow ◽  
Michaela Preick ◽  
Arati Iyengar ◽  
...  

The Asiatic wild dog (Cuon alpinus), restricted today largely to South and Southeast Asia, was widespread throughout Eurasia and even reached North America during the Pleistocene. Like many other species, it suffered from a huge range loss towards the end of the Pleistocene and went extinct in most of its former distribution. The fossil record of the dhole is scattered and the identification of fossils can be complicated by an overlap in size and a high morphological similarity between dholes and other canid species. We generated almost complete mitochondrial genomes for six putative dhole fossils from Europe. By using three lines of evidence, i.e., the number of reads mapping to various canid mitochondrial genomes, the evaluation and quantification of the mapping evenness along the reference genomes and phylogenetic analysis, we were able to identify two out of six samples as dhole, whereas four samples represent wolf fossils. This highlights the contribution genetic data can make when trying to identify the species affiliation of fossil specimens. The ancient dhole sequences are highly divergent when compared to modern dhole sequences, but the scarcity of dhole data for comparison impedes a more extensive analysis.
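One of the three lines of evidence is how evenly reads map along each candidate mitochondrial reference. The abstract does not say which statistic was used, so the sketch below assumes two common, illustrative choices: breadth (fraction of reference bases covered) and the coefficient of variation of mean depth across fixed-size windows. The intuition, as we read it, is that reads from the true species should cover the reference broadly and evenly, whereas reads mapping to a wrong reference tend to pile up in conserved regions. Read intervals, window size and function names are hypothetical.

```python
from statistics import mean, pstdev


def per_base_depth(reads, genome_length):
    """Depth at each position from (start, end) read intervals, 0-based half-open."""
    diff = [0] * (genome_length + 1)
    for start, end in reads:
        start, end = max(0, start), min(genome_length, end)  # clip to the reference
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    depth, running = [], 0
    for d in diff[:genome_length]:
        running += d
        depth.append(running)
    return depth


def mapping_evenness(reads, genome_length, window=500):
    """Return (breadth, cv): the fraction of bases covered and the coefficient
    of variation of mean depth across windows (0 means perfectly even)."""
    depth = per_base_depth(reads, genome_length)
    breadth = sum(1 for d in depth if d > 0) / genome_length
    window_means = [mean(depth[i:i + window]) for i in range(0, genome_length, window)]
    m = mean(window_means)
    cv = pstdev(window_means) / m if m > 0 else float("nan")
    return breadth, cv
```

On a 100 bp toy reference with 25 bp windows, two reads tiling the whole sequence give breadth 1.0 and a CV of 0, while four reads stacked on the first 25 bp give breadth 0.25 and a CV of about 1.73, flagging the uneven pattern.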


2021 ◽  
Author(s):  
Matthew Kirchhof ◽  
Christopher Jf Cameron ◽  
Stefan C Kremer

2018 ◽  
Vol 35 (15) ◽  
pp. 2654-2656 ◽  
Author(s):  
Guoli Ji ◽  
Wenbin Ye ◽  
Yaru Su ◽  
Moliang Chen ◽  
Guangzao Huang ◽  
...  

Summary: Alternative splicing (AS) is a well-established mechanism for increasing transcriptome and proteome diversity; however, detecting AS events and distinguishing among AS types in organisms without available reference genomes remains challenging. We developed a de novo approach called AStrap for AS analysis without a reference genome. AStrap identifies AS events by extensive pairwise alignment of transcript sequences and predicts AS types with a machine-learning model integrating more than 500 assembled features. We evaluated AStrap using AS events collected from the reference genomes of rice and human, as well as single-molecule real-time sequencing data from Amborella trichopoda. Results show that AStrap identifies many more AS events than the competing method, with comparable or higher accuracy. AStrap also has the unique ability to predict AS types, achieving an overall accuracy of ∼0.87 across species. Extensive evaluation of AStrap with different parameters, sample sizes and machine-learning models on different species also demonstrates its robustness and flexibility. AStrap could be a valuable addition to the community for studying AS in non-model organisms with limited genetic resources.

Availability and implementation: AStrap is available for download at https://github.com/BMILAB/AStrap.

Supplementary information: Supplementary data are available at Bioinformatics online.
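AStrap finds AS events by pairwise alignment of transcripts and classifies them with a machine-learning model. The sketch below is not AStrap's algorithm; it only illustrates the simplest signal such a comparison looks for: two transcripts that are identical except that the longer one carries a single contiguous extra segment, the sequence-level pattern expected from, for example, an exon-skipping or retained-intron isoform pair. Prefix and suffix matching stands in for a full alignment, and the function name is hypothetical.

```python
from typing import Optional, Tuple


def detect_insertion(t1: str, t2: str) -> Optional[Tuple[int, str]]:
    """If the longer transcript equals the shorter one plus a single contiguous
    insertion, return (position, inserted_sequence); otherwise return None."""
    long_t, short_t = (t1, t2) if len(t1) >= len(t2) else (t2, t1)
    if len(long_t) == len(short_t):
        return None  # equal lengths cannot differ by one insertion
    # Longest common prefix.
    p = 0
    while p < len(short_t) and long_t[p] == short_t[p]:
        p += 1
    # Longest common suffix, not allowed to overlap the prefix in the short transcript.
    s = 0
    while s < len(short_t) - p and long_t[-1 - s] == short_t[-1 - s]:
        s += 1
    if p + s != len(short_t):
        return None  # the transcripts differ outside one contiguous block
    ins_len = len(long_t) - len(short_t)
    return p, long_t[p:p + ins_len]
```

With exons `ATGGCT`, `CCCAGT` and `TTGTAA`, comparing the full transcript with the isoform lacking the middle exon returns `(6, "CCCAGT")`. Because the prefix is extended greedily, the insertion is reported at its rightmost possible placement; here the same pair could equally be explained by inserting `TCCCAG` at position 5, which is one reason real tools need more than sequence identity to call splice boundaries.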

