DepthFinder: a tool to determine the optimal read depth for reduced-representation sequencing

2019 · Vol 36 (1) · pp. 26-32
Author(s): Davoud Torkamaneh, Jérôme Laroche, Brian Boyle, François Belzile

Abstract
Motivation: Identification of DNA sequence variations such as single nucleotide polymorphisms (SNPs) is a fundamental step in genetic studies. Reduced-representation sequencing methods have been developed as alternatives to whole-genome sequencing to reduce costs and enable the analysis of many more individuals. Among these methods, restriction site associated sequencing (RSAS) methodologies have been widely used for rapid, cost-effective SNP discovery and for high-throughput genotyping in a wide range of species. Despite extensive improvements to RSAS methods over the last decade, estimating the number of reads (i.e. read depth) required per sample for efficient and effective genotyping has remained largely a matter of trial and error.
Results: Here we describe a bioinformatics tool, DepthFinder, designed to estimate the read counts required for RSAS methods. To illustrate its performance, we estimated the required read counts in six species (human, cattle, spruce budworm, salmon, barley and soybean) covering a range of biological factors (genome size, genome complexity, level of DNA methylation and ploidy) and technical factors (library preparation protocol and sequencing platform). To assess its prediction accuracy, we compared DepthFinder-derived results with independent datasets obtained from an RSAS experiment, which yielded an estimated accuracy of nearly 94%. Moreover, we present DepthFinder as a powerful tool to predict the most effective size-selection interval in RSAS work. We conclude that DepthFinder constitutes an efficient, reliable and useful tool for a broad array of users in different research communities.
Availability and implementation: https://bitbucket.org/jerlar73/DepthFinder
Supplementary information: Supplementary data are available at Bioinformatics online.
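The core arithmetic behind such an estimate can be made concrete. Below is a minimal back-of-envelope sketch, assuming the required read count scales as (number of size-selected fragments) × (target per-fragment depth) ÷ (on-target read fraction); the function name, parameters and default values are illustrative assumptions, not DepthFinder's actual model.

```python
# Back-of-envelope estimate of the per-sample read count needed for an
# RSAS (e.g. GBS) library. Illustration of the reasoning only, not
# DepthFinder's actual model; all parameter names are assumptions.

def required_reads(n_fragments: int,
                   target_depth: float,
                   on_target_fraction: float = 0.8) -> int:
    """n_fragments: restriction fragments expected in the size-selection
    window (e.g. from an in silico digest of the reference genome).
    target_depth: desired mean read depth per fragment.
    on_target_fraction: fraction of reads mapping to expected loci."""
    return int(n_fragments * target_depth / on_target_fraction)

# Example: ~50,000 fragments retained after size selection, aiming for
# 10x mean depth, assuming 80% of reads land on target.
print(required_reads(50_000, 10))   # -> 625000 reads per sample
```

In practice, the fragment count would come from an in silico digest of the reference genome with the chosen restriction enzyme.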

2019 · Vol 35 (17) · pp. 3160-3162
Author(s): Davoud Torkamaneh, Jérôme Laroche, Istvan Rajcan, François Belzile

Abstract
Motivation: Reduced-representation sequencing is a genome-wide scanning method for the simultaneous discovery and genotyping of thousands to millions of single nucleotide polymorphisms (SNPs), used across a wide range of species. In this method, however, a reproducible but very small fraction of the genome is captured for sequencing, while the resulting reads are typically aligned against the entire reference genome.
Results: Here we present a skinny reference genome approach, in which a simplified reference genome is used to decrease computing time for data processing and to increase SNP counts and accuracy. A skinny reference genome can be integrated into any reduced-representation sequencing analytical pipeline.
Availability and implementation: https://bitbucket.org/jerlar73/SRG-Extractor
Supplementary information: Supplementary data are available at Bioinformatics online.
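To make the idea concrete, here is a minimal sketch, assuming a skinny reference keeps only the sequence flanking each restriction site and masks the rest with N so that read coordinates are preserved; the enzyme motif, flank size and masking strategy are assumptions for illustration, not SRG-Extractor's implementation.

```python
# Conceptual sketch of a "skinny" reference: keep only the sequence
# flanking each restriction site, masking everything else with N so
# read coordinates are preserved. Illustrative only; the site motif
# and flank width are arbitrary assumptions.

def skinny_reference(seq: str, site: str = "CAGC", flank: int = 150) -> str:
    keep = bytearray(b"N" * len(seq))       # start fully masked
    start = seq.find(site)
    while start != -1:                      # unmask a window at each site
        lo = max(0, start - flank)
        hi = min(len(seq), start + len(site) + flank)
        keep[lo:hi] = seq[lo:hi].encode()
        start = seq.find(site, start + 1)
    return keep.decode()

genome = "ATGCAGCTTACG" * 100               # toy chromosome
print(skinny_reference(genome)[:60])        # flanks kept, the rest masked
```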


2019 · Vol 35 (21) · pp. 4442-4444
Author(s): Jia-Xing Yue, Gianni Liti

Abstract
Summary: Simulated genomes with pre-defined and random genomic variants can be very useful for benchmarking genomic and bioinformatics analyses. Here we introduce simuG, a lightweight tool for simulating the full spectrum of genomic variants (single nucleotide polymorphisms, insertions/deletions, copy number variants, inversions and translocations) for any organism (including human). Its simplicity and versatility make simuG a unique general-purpose genome simulator for a wide range of simulation-based applications.
Availability and implementation: Code in Perl, along with a user manual and testing data, is available at https://github.com/yjx1217/simuG. This software is free for use under the MIT license.
Supplementary information: Supplementary data are available at Bioinformatics online.
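simuG itself is a Perl command-line tool; purely as a concept illustration, the Python sketch below shows the essential operation of such a simulator: injecting random SNPs into a reference sequence while recording the ground-truth variants for later benchmarking. All names here are illustrative assumptions, not simuG's code or CLI.

```python
# Minimal illustration of simulating random SNPs into a reference
# sequence while keeping a truth set -- the general idea behind a
# genome simulator used for benchmarking variant callers.
import random

def simulate_snps(ref: str, n: int, seed: int = 42):
    rng = random.Random(seed)
    positions = rng.sample(range(len(ref)), n)   # distinct SNP sites
    mutated, truth = list(ref), []
    for pos in sorted(positions):
        alt = rng.choice([b for b in "ACGT" if b != ref[pos]])
        truth.append((pos + 1, ref[pos], alt))   # 1-based, VCF-style
        mutated[pos] = alt
    return "".join(mutated), truth

sim, truth = simulate_snps("ACGT" * 250, n=5)
print(truth)   # ground-truth records to score a caller against
```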


2019
Author(s): Lee E. Korshoj, Prashant Nagpal

Abstract
Advances in precision medicine require high-throughput, inexpensive, point-of-care diagnostic methods with multi-omics capability for detecting a wide range of biomolecules and their molecular variants. Optical techniques have offered many promising advances towards such diagnostics. However, the inability to squeeze light with several-hundred-nanometer wavelengths into angstrom-scale volumes for single-nucleotide measurements has hindered further progress. Recently, a block optical sequencing (BOS) method was demonstrated for determining relative nucleobase content in DNA k-mer blocks with Raman spectroscopy, and a block optical content scoring (BOCS) algorithm was developed for robust content-based genetic biomarker database searching. Here, we performed BOS measurements on positively charged silver nanoparticles, achieving 93.3% accuracy in predicting nucleobase content in DNA k-mer blocks (where k = 10), as well as measurements on RNA and chemically modified nucleobases for extensions to transcriptomic and epigenetic studies. Our high-accuracy BOS measurements were then used with BOCS to correctly identify a β-lactamase gene from the MEGARes antibiotic resistance database and to confirm Pseudomonas aeruginosa as the pathogen of origin from <12 content measurements (<15% coverage) of the gene. These results demonstrate the integration of BOS/BOCS as a diagnostic optical sequencing platform. With the versatile range of available plasmonic substrates offering simple data acquisition, varying resolution (single-molecule to ensemble) and multiplexing, this optical sequencing platform has the potential to become the rapid, cost-effective method needed for broad-spectrum biomarker detection.
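The content-scoring idea can be sketched compactly: reduce a sequence to per-block nucleobase counts (k = 10, as in BOS) and rank candidate database genes by the distance between their content profiles and the measured one. This is a simplified illustration, not the published BOCS algorithm; the toy database and the L1 distance are assumptions.

```python
# Sketch of content-based gene identification in the spirit of BOCS:
# genes are reduced to per-block base counts, and a measured content
# profile is matched to the closest candidate. Illustrative only.
from collections import Counter

def block_contents(seq: str, k: int = 10):
    """Counts of A/C/G/T in consecutive k-mer blocks (k=10 as in BOS)."""
    return [tuple(Counter(seq[i:i + k]).get(b, 0) for b in "ACGT")
            for i in range(0, len(seq) - k + 1, k)]

def score(measured, candidate):
    """Lower = better: summed L1 distance over aligned blocks."""
    return sum(sum(abs(m - c) for m, c in zip(mb, cb))
               for mb, cb in zip(measured, candidate))

gene_db = {"beta-lactamase-like": "ATGAGTATTCAACATTTCCGTGTCGCCCTTATTCCCTTTTTTGC"}
measured = block_contents(gene_db["beta-lactamase-like"])   # stand-in "measurement"
print(min(gene_db, key=lambda g: score(measured, block_contents(gene_db[g]))))
```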


2019
Author(s): Adam P. A. Cardilini, Katarina C. Stuart, Phillip Cassey, Mark F. Richardson, William Sherwin, ...

Abstract
A detailed understanding of population genetics in invasive populations helps us to identify drivers of successful introductions. Here, we investigate putative signals of selection in Australian populations of the invasive common starling, Sturnus vulgaris, and seek to understand how these have been influenced by introduction history. We use reduced-representation sequencing to determine population structure and to identify single nucleotide polymorphisms (SNPs) that are putatively under selection. We found that since their introduction into Australia, starling populations have become genetically differentiated despite the potential for high levels of dispersal, and that selection has facilitated their adaptation to the wide range of environmental conditions across their geographic range. Isolation by distance appears to have played a strong role in determining genetic substructure across the starling's Australian range. Analyses of candidate SNPs putatively under selection indicate that aridity, precipitation and temperature may be important factors driving adaptive variation across the starling's invasive range in Australia. However, we also note that the historic introduction regime may leave footprints on sites flagged as being under adaptive selection, and we encourage critical interpretation of selection analyses.
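Isolation by distance of the kind reported here is commonly assessed with a Mantel test, which correlates pairwise genetic and geographic distance matrices and judges significance by permutation. The NumPy sketch below illustrates that generic procedure with toy matrices; it is not the study's actual analysis pipeline.

```python
# Generic Mantel test: correlation between two distance matrices,
# with a permutation p-value. Toy data only.
import numpy as np

def mantel(gen: np.ndarray, geo: np.ndarray, perms: int = 999, seed: int = 1):
    iu = np.triu_indices_from(gen, k=1)          # upper-triangle pairs
    r_obs = np.corrcoef(gen[iu], geo[iu])[0, 1]
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(perms):
        p = rng.permutation(gen.shape[0])        # relabel one matrix
        count += np.corrcoef(gen[p][:, p][iu], geo[iu])[0, 1] >= r_obs
    return r_obs, (count + 1) / (perms + 1)

gen = np.abs(np.subtract.outer(np.arange(5.0), np.arange(5.0)))   # toy FST
geo = gen * 100 + np.random.default_rng(0).normal(0, 10, (5, 5))  # toy km
geo = (geo + geo.T) / 2
np.fill_diagonal(geo, 0)
print(mantel(gen, geo))   # (correlation, permutation p-value)
```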


2022
Author(s): Nadin Rohland, Swapan Mallick, Matthew Mah, Robert M Maier, Nick J Patterson, ...

In-solution enrichment for hundreds of thousands of single nucleotide polymorphisms (SNPs) has been the source of >70% of all genome-scale ancient human DNA data published to date. This approach has made it possible to generate data at one to two orders of magnitude lower cost than random shotgun sequencing, making it economical to study ancient samples with low proportions of human DNA and increasing the rate of conversion of sampled remains into working data, thereby facilitating ethical stewardship of human remains. So far, nearly all ancient DNA data obtained using in-solution enrichment have been generated using a set of bait sequences targeting about 1.24 million SNPs (the 1240k reagent). These sequences were published in 2015, but synthesis of the reagent has been cost-effective for only a few laboratories. In 2021, two companies made available reagents that target the same core set of SNPs along with supplementary content. Here, we test the properties of the three reagents on a common set of 27 ancient DNA libraries spanning a range of DNA richness and percentages of human molecules. All three reagents are highly effective at enriching many hundreds of thousands of SNPs. For all three reagents and a wide range of conditions, one round of enrichment produces data that are as useful as two rounds when tens of millions of sequences are read out, as is typical for such experiments. In our testing, the Twist Ancient DNA reagent produces the highest coverage, the greatest uniformity on targeted positions, and almost no bias toward enriching one allele over another relative to shotgun sequencing. Allelic bias in 1240k enrichment has made it challenging to carry out joint analysis of these data with shotgun data, creating a situation in which the ancient DNA community has been publishing two important bodies of data that cannot easily be co-analyzed by population genetic methods. To address this challenge, we introduce a subset of hundreds of thousands of SNPs for which 1240k data can be effectively co-analyzed with all other major data types.
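Allelic bias of the sort discussed above can be summarized as the deviation of the reference-allele read fraction from the 50% expected at heterozygous sites. The sketch below illustrates the comparison with made-up counts; the numbers are placeholders, not data from the study.

```python
# Summarizing allelic bias: pooled reference-allele read fraction at
# heterozygous SNPs, enrichment vs. shotgun. Counts are invented.
import numpy as np

# (ref_reads, alt_reads) per heterozygous site
capture = np.array([[14, 6], [18, 4], [11, 9], [16, 5]])
shotgun = np.array([[10, 9], [12, 11], [9, 10], [11, 12]])

def ref_fraction(counts: np.ndarray) -> float:
    """Pooled fraction of reads carrying the reference allele."""
    return counts[:, 0].sum() / counts.sum()

print(f"capture ref fraction: {ref_fraction(capture):.2f}")  # biased > 0.5
print(f"shotgun ref fraction: {ref_fraction(shotgun):.2f}")  # ~0.5
```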


2020 · pp. 1192-1198
Author(s): M.S. Mohammad, Tibebe Tesfaye, Kim Ki-Seong

Ultrasonic thickness gauges are easy to operate, reliable, and able to measure a wide range of thicknesses and inspect all engineering materials. Simple gauges present results either as a digital readout or as an A-scan; supplementing them with a system that correlates the measured values with their positions on the inspected surface to produce a two-dimensional (2D) thickness map can extend their benefits and provide a cost-effective alternative to expensive advanced C-scan machines. In previous work, the authors introduced a system for positioning and mapping the values measured by ultrasonic thickness gauges and flaw detectors (Tesfaye et al. 2019). That system, an alternative to systems that use mechanical scanners, encoders, and sophisticated UT machines, used a camera to record the probe's movement and a laser grid projected by a laser pattern generator to locate the probe on the inspected surface. In this paper, a novel system is proposed that can be applied to flat surfaces and that overcomes the other limitations arising from the use of laser projection. The proposed system uses two video cameras: one to monitor the probe's movement on the inspected surface and the other to capture the corresponding digital readout of the thickness gauge. The acquired images of the probe's position and the gauge readout are processed to plot the measured data in a 2D color-coded map. The system is intended to be simpler and more effective than its predecessor.
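The final plotting step lends itself to a compact sketch: given probe positions recovered from one camera and thickness readings recovered from the other, a color-coded 2D map is a straightforward scatter plot. The data below are synthetic placeholders standing in for the image-processing outputs.

```python
# Plot a 2D color-coded thickness map from (x, y, thickness) triples,
# as produced by the camera-based tracking described above.
# Synthetic data: a simulated wall-loss defect on a 200 x 100 mm plate.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(0, 200, 400)                 # probe path x, mm
y = rng.uniform(0, 100, 400)                 # probe path y, mm
thickness = 8.0 - 3.0 * np.exp(-((x - 120)**2 + (y - 40)**2) / 400)

plt.scatter(x, y, c=thickness, cmap="jet", s=12)
plt.colorbar(label="thickness (mm)")
plt.xlabel("x (mm)")
plt.ylabel("y (mm)")
plt.title("2D color-coded thickness map")
plt.show()
```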


Author(s): Allan Matthews, Adrian Leyland

Over the past twenty years or so, there have been major steps forward both in the understanding of tribological mechanisms and in the development of new coating and treatment techniques to better “engineer” surfaces to achieve reductions in wear and friction. Particularly in the coatings tribology field, improved techniques and theories which enable us to study and understand the mechanisms occurring at the “nano”, “micro” and “macro” scale have allowed considerable progress to be made in (for example) understanding contact mechanisms and the influence of “third bodies” [1–5]. Over the same period, we have seen the emergence of the discipline which we now call “Surface Engineering”, by which, ideally, a bulk material (the ‘substrate’) and a coating are combined in a way that provides a cost-effective performance enhancement of which neither would be capable without the presence of the other. It is probably fair to say that the emergence and recognition of Surface Engineering as a field in its own right has been driven largely by the availability of “plasma”-based coating and treatment processes, which can provide surface properties which were previously unachievable. In particular, plasma-assisted (PA) physical vapour deposition (PVD) techniques, allowing wear-resistant ceramic thin films such as titanium nitride (TiN) to be deposited on a wide range of industrial tooling, gave a step-change in industrial productivity and manufactured product quality, and caught the attention of engineers due to the remarkable cost savings and performance improvements obtained. Subsequently, so-called 2nd- and 3rd-generation ceramic coatings (with multilayered or nanocomposite structures) have recently been developed [6–9], to further extend tool performance — the objective typically being to increase coating hardness further, or extend hardness capabilities to higher temperatures.


Biostatistics · 2019
Author(s): Dane R Van Domelen, Emily M Mitchell, Neil J Perkins, Enrique F Schisterman, Amita K Manatunga, ...

Summary
Measuring a biomarker in pooled samples from multiple cases or controls can lead to cost-effective estimation of a covariate-adjusted odds ratio, particularly for expensive assays. But pooled measurements may be affected by assay-related measurement error (ME) and/or pooling-related processing error (PE), which can induce bias if ignored. Building on recently developed methods for a normal biomarker subject to additive errors, we present two related estimators for a right-skewed biomarker subject to multiplicative errors: one based on logistic regression and the other based on a Gamma discriminant function model. Applied to a reproductive health dataset with a right-skewed cytokine measured in pools of size 1 and 2, both methods suggest no association with spontaneous abortion. The fitted models indicate little ME but fairly severe PE, the latter much too large to ignore. Simulations mimicking these data with a non-unity odds ratio confirm the validity of the estimators and illustrate how PE can detract from pooling-related gains in statistical efficiency. These methods address a key issue associated with the homogeneous-pools study design and should facilitate valid odds ratio estimation at lower cost in a wide range of scenarios.
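The data structure at issue can be illustrated with a toy simulation: a right-skewed (Gamma) biomarker averaged within pools and then distorted by multiplicative ME and PE. This sketch shows why severe PE masks the pooled signal; the error magnitudes are arbitrary assumptions, and the paper's estimators are not reproduced here.

```python
# Toy simulation of the homogeneous-pools design with multiplicative
# measurement error (ME) and processing error (PE). Illustrative only.
import numpy as np

rng = np.random.default_rng(7)
n_pools, pool_size = 500, 2
# true right-skewed biomarker per pool member (Gamma distributed)
x = rng.gamma(shape=2.0, scale=1.0, size=(n_pools, pool_size))
pool_mean = x.mean(axis=1)                    # what pooling should measure
me = rng.lognormal(0.0, 0.1, n_pools)         # mild assay ME (assumed)
pe = rng.lognormal(0.0, 0.4, n_pools)         # severe pooling PE (assumed)
observed = pool_mean * me * pe                # multiplicative error model

# PE attenuates the correlation between truth and observation,
# eroding the efficiency gain that pooling was meant to deliver.
print(np.corrcoef(pool_mean, observed)[0, 1])
```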


Author(s): Richard Jiang, Bruno Jacob, Matthew Geiger, Sean Matthew, Bryan Rumsey, ...

Abstract
Summary: We present StochSS Live!, a web-based service for modeling, simulation and analysis of a wide range of mathematical, biological and biochemical systems. Using an epidemiological model of COVID-19, we demonstrate the power of StochSS Live! to enable researchers to quickly develop a deterministic or a discrete stochastic model, infer its parameters and analyze the results.
Availability and implementation: StochSS Live! is freely available at https://live.stochss.org/.
Supplementary information: Supplementary data are available at Bioinformatics online.
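Discrete stochastic models of the kind StochSS Live! builds are simulated with Gillespie-type algorithms. As a self-contained illustration (not StochSS code; parameter values are arbitrary), here is a bare stochastic simulation algorithm for an SIR epidemic model of the kind mentioned above.

```python
# Direct-method Gillespie simulation of a stochastic SIR epidemic.
# Parameter values are arbitrary placeholders for illustration.
import random

def sir_ssa(S=990, I=10, R=0, beta=0.3, gamma=0.1, t_end=100.0, seed=3):
    random.seed(seed)
    t, N, out = 0.0, S + I + R, [(0.0, S, I, R)]
    while t < t_end and I > 0:
        a_inf = beta * S * I / N          # infection propensity
        a_rec = gamma * I                 # recovery propensity
        a0 = a_inf + a_rec
        t += random.expovariate(a0)       # exponential waiting time
        if random.random() < a_inf / a0:  # choose which event fires
            S, I = S - 1, I + 1
        else:
            I, R = I - 1, R + 1
        out.append((t, S, I, R))
    return out

print(sir_ssa()[-1])   # final (t, S, I, R) of one trajectory
```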


Author(s): Darawan Rinchai, Jessica Roelands, Mohammed Toufiq, Wouter Hendrickx, Matthew C Altman, ...

Abstract
Motivation: We previously described the construction and characterization of generic and reusable blood transcriptional module repertoires. More recently we released a third iteration ("BloodGen3" module repertoire) comprising 382 functionally annotated gene sets (modules) and encompassing 14,168 transcripts. Custom bioinformatics tools are needed to support downstream analysis, visualization and interpretation relying on such fixed module repertoires.
Results: We have developed, and describe here, an R package, BloodGen3Module. The functions of our package permit group-comparison analyses to be performed at the module level and the results to be displayed as annotated fingerprint grid plots. A parallel workflow is available for computing module repertoire changes for individual samples rather than groups of samples; these results are displayed as fingerprint heatmaps. An illustrative case demonstrates the steps involved in generating blood transcriptome repertoire fingerprints of septic patients. Taken together, this resource could facilitate the analysis and interpretation of changes in blood transcript abundance observed across a wide range of pathological and physiological states.
Availability: The BloodGen3Module package and documentation are freely available from GitHub: https://github.com/Drinchai/BloodGen3Module
Supplementary information: Supplementary data are available at Bioinformatics online.
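The package itself is written in R; purely as a language-agnostic illustration of the group-comparison step it implements, the Python sketch below summarizes, for each module, the percentage of member genes increased or decreased in cases versus controls. Module names, gene lists and fold-changes are invented for the example.

```python
# Module-level summary behind a transcriptional "fingerprint": for each
# fixed gene set (module), report the share of member genes up- or
# down-regulated in cases vs. controls. All data here are invented.
import numpy as np

modules = {"interferon-like module": ["IFI44", "ISG15", "OAS1"],
           "neutrophil-like module": ["ELANE", "MPO", "DEFA4"]}
# toy log2 fold-changes, case group vs. controls
fc = {"IFI44": 2.1, "ISG15": 1.8, "OAS1": 0.2,
      "ELANE": -1.4, "MPO": -0.3, "DEFA4": -1.9}

for name, genes in modules.items():
    vals = np.array([fc[g] for g in genes])
    up = (vals > 1).mean() * 100      # % of module genes increased
    down = (vals < -1).mean() * 100   # % of module genes decreased
    print(f"{name}: {up:.0f}% up, {down:.0f}% down")
```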

