Challenging a bioinformatic tool’s ability to detect microbial contaminants using in silico whole genome sequencing data

PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e3729 ◽  
Author(s):  
Nathan D. Olson ◽  
Justin M. Zook ◽  
Jayne B. Morrow ◽  
Nancy J. Lin

High sensitivity methods such as next generation sequencing and polymerase chain reaction (PCR) are adversely impacted by organismal and DNA contaminants. Current methods for detecting contaminants in microbial materials (genomic DNA and cultures) are not sensitive enough and require either a known or culturable contaminant. Whole genome sequencing (WGS) is a promising approach for detecting contaminants due to its sensitivity and lack of need for a priori assumptions about the contaminant. Prior to applying WGS, we must first understand its limitations for detecting contaminants and potential for false positives. Herein we demonstrate and characterize a WGS-based approach to detect organismal contaminants using an existing metagenomic taxonomic classification algorithm. Simulated WGS datasets from ten genera as individuals and binary mixtures of eight organisms at varying ratios were analyzed to evaluate the role of contaminant concentration and taxonomy on detection. For the individual genomes the false positive contaminants reported depended on the genus, with Staphylococcus, Escherichia, and Shigella having the highest proportion of false positives. For nearly all binary mixtures the contaminant was detected in the in silico datasets at the equivalent of 1 in 1,000 cells, though F. tularensis was not detected in any of the simulated contaminant mixtures and Y. pestis was only detected at the equivalent of one in 10 cells. Once a WGS method for detecting contaminants is characterized, it can be applied to evaluate microbial material purity, in efforts to ensure that contaminants are characterized in microbial materials used to validate pathogen detection assays, generate genome assemblies for database submission, and benchmark sequencing methods.
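To make "the equivalent of 1 in 1,000 cells" concrete, the sketch below converts a contaminant cell fraction into read counts for a simulated binary mixture. It is an illustration only, not the authors' simulation procedure: the function name `mixture_read_counts` is hypothetical, and it assumes that each cell contributes DNA in proportion to its genome size and that reads are sampled uniformly from the pooled DNA.

```python
def mixture_read_counts(total_reads, contaminant_cell_fraction,
                        material_genome_bp, contaminant_genome_bp):
    """Split a read budget between material and contaminant genomes.

    Assumption (not from the paper): the share of reads from each organism
    is proportional to (fraction of cells) x (genome size in base pairs).
    """
    material_dna = (1.0 - contaminant_cell_fraction) * material_genome_bp
    contaminant_dna = contaminant_cell_fraction * contaminant_genome_bp
    contaminant_share = contaminant_dna / (material_dna + contaminant_dna)
    contaminant_reads = round(total_reads * contaminant_share)
    return total_reads - contaminant_reads, contaminant_reads


# Hypothetical example: two 5 Mbp genomes, contaminant at 1 in 1,000 cells.
material_reads, contaminant_reads = mixture_read_counts(
    total_reads=1_000_000,
    contaminant_cell_fraction=0.001,
    material_genome_bp=5_000_000,
    contaminant_genome_bp=5_000_000,
)
```

Under these assumptions, a contaminant with a smaller genome than the material contributes proportionally fewer reads at the same cell ratio, which is one reason the detectable concentration can depend on the genome pair.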

2017 ◽  
Author(s):  
Nathan D Olson ◽  
Justin M Zook ◽  
Jayne B Morrow ◽  
Nancy J Lin

High sensitivity methods such as next generation sequencing and polymerase chain reaction (PCR) are adversely impacted by organismal and DNA contaminants. Current methods for detecting contaminants in microbial materials (genomic DNA and cultures) are not sensitive enough and require either a known or culturable contaminant. Therefore, high sensitivity methods not requiring a priori assumptions about the contaminant are needed. We demonstrate the use of whole genome sequencing (WGS) and a metagenomic taxonomic classification algorithm for assessing the organismal purity of a microbial material. Using this proposed method we characterized the types of false positive contaminants reported and the dependence of detectable contaminant concentration on material and contaminant genome using simulated WGS data. Using the proposed method to characterize microbial material purity will help to ensure that the materials used to validate pathogen detection assays, generate genome assemblies for database submission, and benchmark sequencing methods are free of contaminants adversely impacting measurement results.


2020 ◽  
Author(s):  
Kyle Fletcher ◽  
Lin Zhang ◽  
Juliana Gil ◽  
Rongkui Han ◽  
Keri Cavanaugh ◽  
...  

Abstract Background: Genetic maps are an important resource for validation of genome assemblies, trait discovery, and breeding. Next generation sequencing has enabled production of high-density genetic maps constructed with 10,000s of markers. Most current approaches require a genome assembly to identify markers. Our Assembly Free Linkage Analysis Pipeline (AFLAP) removes this requirement by using uniquely segregating k-mers as markers to rapidly construct a genotype table and perform subsequent linkage analysis. This avoids potential biases including preferential read alignment and variant calling. Results: The performance of AFLAP was determined in simulations and contrasted to a conventional workflow. We tested AFLAP using 100 F2 individuals of Arabidopsis thaliana, sequenced to low coverage. Genetic maps generated using k-mers contained over 130,000 markers that were concordant with the genomic assembly. The utility of AFLAP was then demonstrated by generating an accurate genetic map using genotyping-by-sequencing data of 235 recombinant inbred lines of Lactuca spp. AFLAP was then applied to 83 F1 individuals of the oomycete Bremia lactucae, sequenced to >5x coverage. The genetic map contained over 90,000 markers ordered in 19 large linkage groups. This genetic map was used to fragment, order, orient, and scaffold the genome, resulting in a much-improved reference assembly. Conclusions: AFLAP can be used to generate high density linkage maps and improve genome assemblies of any organism when a mapping population is available using whole genome sequencing or genotyping-by-sequencing data. Genetic maps produced for B. lactucae were accurately aligned to the genome and guided significant improvements of the reference assembly.
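The idea of using k-mers as assembly-free markers can be illustrated with a toy sketch. This is not AFLAP's implementation: the function names are hypothetical, and the sketch ignores reverse complements, sequencing errors, and the k-mer count thresholds a real pipeline would need. It keeps k-mers seen in only one parent and scores each progeny for their presence or absence.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def read_kmers(reads, k):
    """Union of k-mers over a collection of reads."""
    found = set()
    for read in reads:
        found |= kmers(read, k)
    return found


def parent_specific_markers(parent_a_reads, parent_b_reads, k):
    """K-mers present in parent A but absent from parent B (toy markers)."""
    return sorted(read_kmers(parent_a_reads, k) - read_kmers(parent_b_reads, k))


def genotype_table(markers, progeny_reads, k):
    """Presence (1) / absence (0) of each marker in each progeny."""
    table = {}
    for name, reads in progeny_reads.items():
        present = read_kmers(reads, k)
        table[name] = [1 if m in present else 0 for m in markers]
    return table


# Made-up toy reads for illustration only.
markers = parent_specific_markers(["ACGTAC"], ["ACGTTC"], k=4)
table = genotype_table(
    markers,
    {"p1": ["CGTACG"], "p2": ["ACGTTC"]},
    k=4,
)
```

A table of this shape (markers by individuals) is the kind of input that linkage analysis then groups and orders; no alignment to a reference is involved at any step.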


Author(s):  
Lihong Huang ◽  
Bin Hong ◽  
Wenxian Yang ◽  
Liansheng Wang ◽  
Rongshan Yu

Abstract Metagenomics data provide rich information for the detection of foodborne pathogens from food and environmental samples that are mixed with complex background bacteria strains. While pathogen detection from metagenomic sequencing data has become an activity of increasing interest, shotgun sequencing of uncultured food samples typically produces data that contain reads from many different organisms, making accurate strain typing a challenging task. Particularly, as many pathogens may contain a common set of genes that are highly similar to those from normal bacteria in food samples, traditional strain-level abundance profiling approaches do not perform well at detecting pathogens of very low abundance levels. To overcome this limitation, we propose an abundance correction method based on species-specific genomic regions to achieve high sensitivity and high specificity in target pathogen detection at low abundance.


2020 ◽  
Author(s):  
Lihong Huang ◽  
Bin Hong ◽  
Wenxian Yang ◽  
Liansheng Wang ◽  
Rongshan Yu

Metagenomics data provide rich information for the detection of foodborne pathogens from food and environmental samples that are mixed with complex background bacteria strains. While pathogen detection from metagenomic sequencing data has become an activity of increasing interest, shotgun sequencing of uncultured food samples typically produces data that contain reads from many different organisms, making accurate strain typing a challenging task. Particularly, as many pathogens may contain a common set of genes that are highly similar to those from normal bacteria in food samples, traditional strain-level abundance profiling approaches do not perform well at detecting pathogens of very low abundance levels. To overcome this limitation, we propose an abundance correction method based on species-specific genomic regions to achieve high sensitivity and high specificity in target pathogen detection at low abundance.


2021 ◽  
Vol 84 (1) ◽  
Author(s):  
Hayato Ogawa ◽  
Keita Horitani ◽  
Yasuhiro Izumiya ◽  
Soichi Sano

Contrary to earlier beliefs, every cell in the individual is genetically different due to somatic mutations. Consequently, tissues become a mixture of cells with distinct genomes, a phenomenon termed somatic mosaicism. Recent advances in genome sequencing technology have unveiled possible causes of mutations and how they shape the unique mutational landscape of the tissues. Moreover, the analysis of sequencing data in combination with clinical information has revealed the impacts of somatic mosaicism on disease processes. In this review, we discuss somatic mosaicism in various tissues and its clinical implications for human disease. Expected final online publication date for the Annual Review of Physiology, Volume 84 is February 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Jason M. Neal-McKinney ◽  
Kun C. Liu ◽  
Christopher M. Lock ◽  
Wen-Hsin Wu ◽  
Jinxin Hu

Abstract The sequencing, assembly, and analysis of bacterial genomes is central to tracking and characterizing foodborne pathogens. The bulk of bacterial genome sequencing at the US Food and Drug Administration is performed using short-read Illumina MiSeq technology, resulting in highly accurate but fragmented genomic sequences. The MinION sequencer from Oxford Nanopore is an evolving technology that produces long-read sequencing data with low equipment cost. The goal of this study was to compare Campylobacter genome assemblies generated from MiSeq and MinION data independently, as well as hybrid genome assemblies combining both data types. Two reference strains and two field isolates of C. jejuni were sequenced using MiSeq and MinION, and the sequence data were assembled using the software programs SPAdes and Canu, respectively. Hybrid genome assembly was performed using the program Unicycler. Comparison of the C. jejuni 81-176 and RM1221 genome assemblies to the PacBio reference genomes revealed that the SPAdes assemblies had the most accurate nucleotide identity, while the hybrid assemblies were the most contiguous. Assemblies generated only from MinION data using Canu were the least accurate, containing many indels and substitutions that affected downstream analyses. The hybrid sequencing approach was the most useful for detecting plasmids, large genome rearrangements, and repetitive elements such as rRNA and tRNA genes. The full genomes of both C. jejuni field isolates were completed and circularized using hybrid sequencing, and a plasmid was detected in one isolate. Continued development of nanopore sequencing technologies will likely enhance the accuracy of hybrid genome assemblies and enable public health laboratories to routinely generate complete circularized bacterial genome sequences.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Kyle Fletcher ◽  
Lin Zhang ◽  
Juliana Gil ◽  
Rongkui Han ◽  
Keri Cavanaugh ◽  
...  

Abstract Our assembly-free linkage analysis pipeline (AFLAP) identifies segregating markers as k-mers in the raw reads without using a reference genome assembly for calling variants and provides genotype tables for the construction of unbiased, high-density genetic maps without a genome assembly. AFLAP is validated and contrasted to a conventional workflow using simulated data. AFLAP is applied to whole genome sequencing and genotype-by-sequencing data of F1, F2, and recombinant inbred populations of two different plant species, producing genetic maps that are concordant with genome assemblies. The AFLAP-based genetic map for Bremia lactucae enables the production of a chromosome-scale genome assembly.


2019 ◽  
Vol 20 (2) ◽  
pp. 331
Author(s):  
Wesley Oliveira Barbosa ◽  
Antonio Wilson Vieira

The automatic detection of lines and curves in color images is a very important task in many applications, such as object recognition and scene reconstruction. Although there are closed formulations for fitting a curve to a set of points, if the point set describes more than one instance of the object, for example two circles, there is no closed formulation for obtaining the individual sets of parameters without a priori information about which points belong to each object. However, multiple instances of objects such as lines and circles are commonly present in an image. The well-known Hough Transform is an efficient tool for recovering multiple objects from images using a voting process, in which the usual presence of false positives is an issue. In our work, we present an improvement to the voting process for detecting multiple circles using the Hough Transform in order to avoid false positives. Our experiments show that our voting process leads to more robust detection, reducing the number of false positives and providing more accurate detection even with a large number of circles.
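For readers unfamiliar with the voting process mentioned above, the following is a minimal sketch of the standard Hough Transform for circles of a known radius. It shows only the baseline voting scheme, not the improved voting process the authors propose, and the helper names and the synthetic test points are assumptions made for illustration.

```python
import math
from collections import Counter


def hough_circle_votes(points, radius, n_angles=360):
    """Accumulate votes for candidate circle centers at a fixed, known radius.

    Each edge point votes for every grid cell lying (approximately) at
    distance `radius` from it; true centers collect votes from many points.
    """
    votes = Counter()
    for x, y in points:
        cells = set()
        for step in range(n_angles):
            theta = 2 * math.pi * step / n_angles
            cells.add((round(x - radius * math.cos(theta)),
                       round(y - radius * math.sin(theta))))
        # A point votes at most once per accumulator cell.
        for cell in cells:
            votes[cell] += 1
    return votes


def circle_points(cx, cy, radius, n=60):
    """Synthetic edge points on a circle, rounded to integer pixel positions."""
    return [(round(cx + radius * math.cos(2 * math.pi * i / n)),
             round(cy + radius * math.sin(2 * math.pi * i / n)))
            for i in range(n)]


# Two synthetic circles of radius 10 sharing one point set.
points = circle_points(20, 20, 10) + circle_points(50, 35, 10)
votes = hough_circle_votes(points, radius=10)
top_two = [center for center, _ in votes.most_common(2)]
```

In this clean synthetic case the two strongest accumulator cells are the two true centers. With noisy edges or many overlapping circles, spurious cells can collect comparable vote counts, which is the false-positive problem the abstract refers to.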


2020 ◽  
Author(s):  
Eric S. Tvedte ◽  
Mark Gasser ◽  
Benjamin C. Sparklin ◽  
Jane Michalski ◽  
Xuechu Zhao ◽  
...  

Abstract Background: The newest generation of DNA sequencing technology is highlighted by the ability to sequence reads hundreds of kilobases in length, and the increased availability of long read data has democratized the genome sequencing and assembly process. PacBio and Oxford Nanopore Technologies (ONT) have pioneered competitive long read platforms, with more recent work focused on improving sequencing throughput and per-base accuracy. Released in 2019, the PacBio Sequel II platform advertises substantial enhancements over previous PacBio systems. Results: We used whole-genome sequencing data produced by two PacBio platforms (Sequel II and RS II) and two ONT protocols (Rapid Sequencing and Ligation Sequencing) to compare assemblies of the bacteria Escherichia coli and the fruit fly Drosophila ananassae. Sequel II assemblies had higher contiguity and consensus accuracy relative to other methods, even after accounting for differences in sequencing throughput. ONT RAPID libraries had the fewest chimeric reads in addition to superior quantification of E. coli plasmids versus ligation-based libraries. The quality of assemblies can be enhanced by adopting hybrid approaches using Illumina libraries for bacterial genome assemblies or combined ONT and Sequel II libraries for eukaryotic genome assemblies. Genome-wide DNA methylation could be detected using both technologies, however ONT libraries enabled the identification of a broader range of known E. coli methyltransferase recognition motifs in addition to undocumented D. ananassae motifs. Conclusions: The ideal choice of long read technology may depend on several factors including the question or hypothesis under examination. No single technology outperformed others in all metrics examined.

