PacBio sequencing output increased through uniform and directional fivefold concatenation

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Nisha Kanwar ◽  
Celia Blanco ◽  
Irene A. Chen ◽  
Burckhard Seelig

Abstract Advances in sequencing technology have allowed researchers to sequence DNA with greater ease and at decreasing cost. Development has focused mainly on sequencing either many short sequences or fewer large sequences. Methods for sequencing mid-sized sequences of 600–5,000 bp are currently less efficient. For example, the PacBio Sequel I system yields ~100,000–300,000 reads with a per-base-pair accuracy of 90–99%. We sought to sequence several DNA populations of ~870 bp in length with a sequencing accuracy of 99% and to the greatest depth possible. We optimised a simple, robust method to concatenate genes of ~870 bp five times and then sequenced the resulting ~5,000 bp DNA by PacBio SMRT long-read sequencing. Our method improved upon previously published concatenation attempts, yielding greater sequencing depth, high-quality reads, and limited sample preparation at little expense. We applied this efficient concatenation protocol to sequence nine DNA populations from a protein engineering study. The improved method is accompanied by a simple and user-friendly analysis pipeline, DeCatCounter, to sequence medium-length sequences efficiently at one-fifth of the cost.
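As an illustration of the deconcatenation step such a pipeline must perform, here is a minimal Python sketch (hypothetical; not the published DeCatCounter code) that splits a concatemer read on a known linker sequence and keeps fragments near the expected unit length:

```python
def deconcatenate(read, linker, unit_len, tol=0.2):
    """Split a concatemer read on the linker sequence and keep
    fragments whose length is within +/- tol of the expected unit."""
    lo, hi = unit_len * (1 - tol), unit_len * (1 + tol)
    return [f for f in read.split(linker) if lo <= len(f) <= hi]

# Toy example: a 40 bp stand-in unit (the real units were ~870 bp).
unit = "ACGT" * 10
linker = "CCCCC"
read = linker.join([unit] * 5)          # fivefold concatemer
units = deconcatenate(read, linker, unit_len=40)
print(len(units))  # 5
```

In practice the tolerance filter discards chimeric or truncated fragments before the units are counted per variant.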

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Edwin A. Solares ◽  
Yuan Tao ◽  
Anthony D. Long ◽  
Brandon S. Gaut

Abstract Background Despite marked recent improvements in long-read sequencing technology, the assembly of diploid genomes remains a difficult task. A major obstacle is distinguishing between alternative contigs that represent highly heterozygous regions. If primary and secondary contigs are not properly identified, the primary assembly will overrepresent both the size and complexity of the genome, which complicates downstream analyses such as scaffolding. Results Here we illustrate a new method, which we call HapSolo, that identifies secondary contigs and defines a primary assembly based on multiple pairwise contig alignment metrics. HapSolo evaluates candidate primary assemblies using BUSCO scores and then distinguishes among candidate assemblies using a cost function. The cost function can be defined by the user but by default considers the number of missing, duplicated and single BUSCO genes within the assembly. HapSolo performs hill climbing to minimize cost over thousands of candidate assemblies. We illustrate the performance of HapSolo on genome data from three species: the Chardonnay grape (Vitis vinifera), with a genome of 490 Mb; a mosquito (Anopheles funestus; 200 Mb); and the thorny skate (Amblyraja radiata; 2650 Mb). Conclusions HapSolo rapidly identified candidate assemblies that yield improvements in assembly metrics, including decreased genome size and improved N50 scores. Contig N50 scores improved by 35%, 9% and 9% for Chardonnay, mosquito and thorny skate, respectively, relative to unreduced primary assemblies. The benefits of HapSolo were amplified by downstream analyses, which we illustrated by scaffolding with Hi-C data. We found, for example, that prior to the application of HapSolo, only 52% of the Chardonnay genome was captured in the largest 19 scaffolds, corresponding to the number of chromosomes. After the application of HapSolo, this value increased to ~84%.
The improvements for the mosquito's largest three scaffolds, again corresponding to the number of chromosomes, went from 61% to 86%, and the improvement was even more pronounced for thorny skate. We compared the scaffolding results to assemblies that used PurgeDups for identifying secondary contigs, with generally superior results for HapSolo.
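The BUSCO-driven hill-climbing search described above can be sketched as follows; both the cost formula and the toy evaluation landscape are illustrative stand-ins, not HapSolo's actual implementation:

```python
import random

def busco_cost(busco):
    """Illustrative cost in the spirit of HapSolo's default: penalize
    missing and duplicated BUSCO genes relative to single-copy ones
    (not the tool's exact formula)."""
    return (busco["missing"] + busco["duplicated"]) / max(busco["single"], 1)

def hill_climb(evaluate, params, step=0.05, iters=300, seed=0):
    """Randomly perturb the alignment-cutoff parameters and accept a
    move only when it lowers the cost (basic hill climbing)."""
    rng = random.Random(seed)
    best, best_cost = dict(params), evaluate(params)
    for _ in range(iters):
        cand = {k: min(1.0, max(0.0, v + rng.uniform(-step, step)))
                for k, v in best.items()}
        c = evaluate(cand)
        if c < best_cost:
            best, best_cost = cand, c
    return best, best_cost

# Toy landscape standing in for "purge contigs at these cutoffs, then
# run BUSCO": fewest missing genes near an identity cutoff of 0.8.
def evaluate(p):
    missing = int(100 * abs(p["identity"] - 0.8))
    return busco_cost({"missing": missing, "duplicated": 10, "single": 90})

best, c = hill_climb(evaluate, {"identity": 0.5})
print(round(best["identity"], 2), round(c, 3))
```

In the real tool each evaluation reduces the assembly under candidate cutoffs and rescores it with BUSCO, which is why the search is run over thousands of candidate assemblies.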


2020 ◽  
Author(s):  
Edwin A. Solares ◽  
Yuan Tao ◽  
Anthony D. Long ◽  
Brandon S. Gaut

ABSTRACT Background Despite marked recent improvements in long-read sequencing technology, the assembly of diploid genomes remains a difficult task. A major obstacle is distinguishing between alternative contigs that represent highly heterozygous regions. If primary and secondary contigs are not properly identified, the primary assembly will overrepresent both the size and complexity of the genome, which complicates downstream analyses such as scaffolding. Results Here we illustrate a new method, which we call HapSolo, that identifies secondary contigs and defines a primary assembly based on multiple pairwise contig alignment metrics. HapSolo evaluates candidate primary assemblies using BUSCO scores and then distinguishes among candidate assemblies using a cost function. The cost function can be defined by the user but by default considers the number of missing, duplicated and single BUSCO genes within the assembly. HapSolo performs hill climbing to minimize cost over thousands of candidate assemblies. We illustrate the performance of HapSolo on genome data from three species: the Chardonnay grape (Vitis vinifera), with a genome of 490 Mb; a mosquito (Anopheles funestus; 200 Mb); and the thorny skate (Amblyraja radiata; 2,650 Mb). Conclusions HapSolo rapidly identified candidate assemblies that yield improvements in assembly metrics, including decreased genome size and improved N50 scores. Contig N50 scores improved by 35%, 9% and 9% for Chardonnay, mosquito and thorny skate, respectively, relative to unreduced primary assemblies. The benefits of HapSolo were amplified by downstream analyses, which we illustrated by scaffolding with Hi-C data. We found, for example, that prior to the application of HapSolo, only 52% of the Chardonnay genome was captured in the largest 19 scaffolds, corresponding to the number of chromosomes. After the application of HapSolo, this value increased to ~84%.
The improvements for mosquito scaffolding were similar to those for Chardonnay (from 61% to 86%), and even more pronounced for thorny skate. We compared the scaffolding results to assemblies that were based on another published method for identifying secondary contigs, with generally superior results for HapSolo.


mSystems ◽  
2018 ◽  
Vol 3 (3) ◽  
Author(s):  
Gabriel A. Al-Ghalith ◽  
Benjamin Hillmann ◽  
Kaiwei Ang ◽  
Robin Shields-Cutler ◽  
Dan Knights

ABSTRACT Next-generation sequencing technology is of great importance for many biological disciplines; however, due to technical and biological limitations, the short DNA sequences produced by modern sequencers require numerous quality control (QC) measures to reduce errors, remove technical contaminants, or merge paired-end reads together into longer or higher-quality contigs. Many tools for each step exist, but choosing the appropriate methods and usage parameters can be challenging because the parameterization of each step depends on the particularities of the sequencing technology used, the type of samples being analyzed, and the stochasticity of the instrumentation and sample preparation. Furthermore, end users may not know all of the relevant information about how their data were generated, such as the expected overlap for paired-end sequences or the type of adaptors used, and so cannot make informed choices. This increasing complexity and nuance demand a pipeline that combines existing steps together in a user-friendly way and, when possible, learns reasonable quality parameters from the data automatically. We propose a user-friendly quality control pipeline called SHI7 (canonically pronounced “shizen”), which aims to simplify quality control of short-read data for the end user by predicting the presence and/or type of common sequencing adaptors, what quality scores to trim, whether the data set is shotgun or amplicon sequencing, whether reads are paired end or single end, and whether pairs are stitchable, including the expected amount of pair overlap. We hope that SHI7 will make it easier for all researchers, expert and novice alike, to follow reasonable practices for short-read data quality control. IMPORTANCE Quality control of high-throughput DNA sequencing data is an important but sometimes laborious task requiring background knowledge of the sequencing protocol used (such as adaptor type, sequencing technology, insert size/stitchability, paired-endedness, etc.).
Quality control protocols typically require applying this background knowledge to selecting and executing numerous quality control steps with the appropriate parameters, which is especially difficult when working with public data or data from collaborators who use different protocols. We have created a streamlined quality control pipeline intended to substantially simplify the process of DNA quality control from raw machine output files to actionable sequence data. In contrast to other methods, our proposed pipeline is easy to install and use and attempts to learn the necessary parameters from the data automatically with a single command.
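As a rough illustration of learning a quality parameter from the data itself, the sketch below (a simplified stand-in, not SHI7's actual algorithm) picks a trim point at the first sequencing cycle whose mean quality falls below a threshold:

```python
def learn_trim_point(quality_rows, min_q=32):
    """Pick the last read position to keep: the first cycle whose mean
    per-cycle Phred quality drops below min_q marks where to trim."""
    n_cycles = len(quality_rows[0])
    for pos in range(n_cycles):
        mean_q = sum(row[pos] for row in quality_rows) / len(quality_rows)
        if mean_q < min_q:
            return pos          # trim everything from this cycle onward
    return n_cycles             # nothing to trim

# Quality matrix: 3 reads x 6 cycles; quality decays toward the 3' end.
quals = [[38, 37, 36, 34, 30, 25],
         [39, 38, 35, 33, 29, 24],
         [38, 36, 36, 34, 31, 26]]
print(learn_trim_point(quals))  # 4
```

A pipeline like this one would estimate such cutoffs from a sample of reads, sparing the user from choosing trimming parameters by hand.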


1972 ◽  
Vol 3 (2) ◽  
pp. 191-194
Author(s):  
N. D. Novikova ◽  
E. N. Arnoldova ◽  
N. P. Bogatova ◽  
Z. V. Bobrova

Author(s):  
W. Ostrowski ◽  
K. Hanus

One of the popular uses of UAVs in photogrammetry is providing archaeological documentation. The wide range of low-cost (consumer-grade) UAVs, together with the popularity of user-friendly photogrammetric software that yields satisfactory results, has made it easier to prepare documentation for small archaeological sites. However, using solutions of this kind is much more problematic for larger areas. Limited autonomous-flight capability makes it significantly harder to obtain data for areas too large to be covered during a single mission. Moreover, the platforms used are sometimes not equipped with telemetry systems, which makes navigating and guaranteeing a similar quality of data during separate flights difficult. The simplest solution is using a better UAV; however, the cost of such devices often exceeds the financial capabilities of archaeological expeditions.

The aim of this article is to present a methodology for obtaining data over medium-scale areas using only a basic UAV. The proposed methodology assumes a simple multirotor not equipped with any flight-planning system or telemetry. Navigation of the platform is based solely on live-view images sent from the camera attached to the UAV. The presented survey was carried out using a simple GoPro camera, which, from a photogrammetric perspective, was not the optimal configuration due to the fisheye geometry of the camera. Another limitation is the actual operational range of UAVs, which in the case of cheaper systems rarely exceeds 1 kilometre and is often much smaller in practice. The surveyed area must therefore be divided into sub-blocks that correspond to the range of the drone. This is inconvenient because the blocks must overlap so that they can later be merged during processing, which increases both the length of the required flights and the computing power needed to process a greater number of images.

These issues make prospection highly inconvenient, but not impossible. Our paper presents our experiences through two case studies: surveys conducted in Nepal under the aegis of UNESCO, and work carried out as part of a Polish archaeological expedition in Cyprus, both of which show that the proposed methodology yields satisfactory results. The article is an important voice in the ongoing debate between commercial and academic archaeologists about the balance between the required standards of archaeological work and the economic capabilities of archaeological missions.
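The sub-block planning described above can be sketched numerically; the square-block geometry and the 20% overlap fraction below are illustrative assumptions, not values taken from the surveys:

```python
import math

def plan_blocks(site_len_m, site_wid_m, uav_range_m, overlap=0.2):
    """Count the square sub-blocks needed to cover a rectangular site
    with a UAV of limited operational radius, where neighbouring
    blocks share a fractional overlap so they can be merged later."""
    block = uav_range_m * math.sqrt(2)   # square inscribed in range circle
    step = block * (1 - overlap)         # advance less than a full block
    nx = math.ceil(max(site_len_m - block, 0) / step) + 1
    ny = math.ceil(max(site_wid_m - block, 0) / step) + 1
    return nx * ny

# A 3 km x 2 km site with an 800 m operational radius.
print(plan_blocks(3000, 2000, uav_range_m=800))  # 8
```

The overlap term shows the trade-off noted above: every extra percent of overlap between blocks adds flight time and images to process, but is what makes the blocks mergeable.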


Energy is an essential component of people's daily lives and a significant economic factor in a country's development. The eventual depletion of conventional energy resources and their harmful environmental impacts, together with rising energy costs and the limitations of new energy resources and technologies, have pushed efficient energy management to the top of the agenda. But how can energy utilization be managed? A simple answer is viable, real-time metering, which enables calculation of run-time energy consumption and of the real-time as well as cumulative cost. In this research, an innovative hardware- and IoT-based solution to this problem is presented that provides live information on the electricity consumed by various appliances. The methodology is based mainly on a hardware tool named Elite 440, a meter that provides data on various electrical parameters. The data so obtained are displayed on a dashboard in a user-friendly format and include parameters such as voltage, current, and power factor. The dashboard data are updated every five minutes, and the cost is updated simultaneously, making this a real-time monitoring system.
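The run-time cost calculation can be sketched as follows; the 5-minute sampling interval matches the dashboard update cycle described above, while the tariff value is an arbitrary placeholder:

```python
def energy_cost(power_readings_kw, interval_min=5, tariff_per_kwh=0.12):
    """Accumulate energy (kWh) and cumulative cost from power samples
    taken at a fixed interval, as a metering dashboard would."""
    hours = interval_min / 60
    kwh = sum(p * hours for p in power_readings_kw)
    return kwh, kwh * tariff_per_kwh

# Twelve 5-minute samples = one hour at a steady 2 kW load.
kwh, cost = energy_cost([2.0] * 12)
print(round(kwh, 3), round(cost, 2))  # 2.0 0.24
```

Each new meter reading simply extends the sample list, so both the cumulative energy and the cumulative cost can be refreshed on every update cycle.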


Author(s):  
Huan Zhong ◽  
Zongwei Cai ◽  
Zhu Yang ◽  
Yiji Xia

Abstract NAD tagSeq has recently been developed for the identification and characterization of NAD+-capped RNAs (NAD-RNAs). This method adopts a chemo-enzymatic labelling strategy to tag NAD-RNAs with a synthetic RNA tag before subjecting them to Oxford Nanopore direct RNA sequencing. A computational tool designed for analyzing the sequencing data of tagged RNA will facilitate the broader application of this method. Hence, we introduce TagSeqTools, a flexible, general pipeline for the identification and quantification of tagged RNAs (i.e., NAD+-capped RNAs) using long-read transcriptome sequencing data generated by the NAD tagSeq method. TagSeqTools comprises two major modules: TagSeek, for differentiating tagged and untagged reads, and TagSeqQuant, for quantitative and further characterization analysis of genes and isoforms. The pipeline also integrates advanced functions to identify antisense transcripts and splicing events, and supports reformatting the data for visualization. TagSeqTools therefore provides a convenient and comprehensive workflow for researchers to analyze data produced by the NAD tagSeq method or other tagging-based experiments using Oxford Nanopore direct RNA sequencing. The pipeline is available at https://github.com/dorothyzh/TagSeqTools under the Apache License 2.0.
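A minimal sketch of the tagged/untagged classification that a module like TagSeek performs might look like the following (a simplification; the actual tool must cope with nanopore indel errors and uses its own matching logic):

```python
def is_tagged(read, tag, max_mismatch=2, window=None):
    """Classify a read as tagged if the synthetic RNA tag matches near
    its 5' end within max_mismatch substitutions."""
    window = window if window is not None else len(tag) + 5
    prefix = read[:window]
    for i in range(len(prefix) - len(tag) + 1):
        mism = sum(a != b for a, b in zip(prefix[i:i + len(tag)], tag))
        if mism <= max_mismatch:
            return True
    return False

# Toy tag and reads (real tags are longer synthetic RNA sequences).
tag = "GGAUUC"
print(is_tagged("AGGAUUCAAACCCGGG", tag))   # True
print(is_tagged("AAACCCGGGUUUAAAC", tag))   # False
```

Splitting reads this way is what allows the downstream module to quantify NAD-RNA genes and isoforms separately from the untagged transcriptome.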


2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Shiju Wang ◽  
Zhiying Lu ◽  
Shaoyun Ge ◽  
Chengshan Wang

Substation locating and sizing is an important component of urban power network planning. In this paper, an improved method based on the weighted Voronoi diagram and the transportation model is proposed for substation planning; it optimizes the location, capacity, and power supply range of each substation for minimum investment, comprising the cost of lines and substations and the annual operating expense. The weighted Voronoi diagram (WVD), whose weights can be adaptively adjusted, calculates the location and capacity of each substation with good global convergence and improved convergence speed. The transportation model determines the optimal assignment of loads to substations. The impact of geographical factors is also considered. Extensive experiments show that the improved method produces more reasonable and better-optimized planning results in less time than the original WVD and other algorithms.
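The weighted-Voronoi assignment idea can be sketched as follows; the coordinates and weights are toy values, and a real WVD planner would adjust the weights iteratively against substation capacities:

```python
import math

def assign_loads(loads, substations):
    """Assign each load point to the substation with the smallest
    weighted distance d(load, s) / weight(s); a larger weight enlarges
    a substation's Voronoi cell (minimal weighted-Voronoi sketch)."""
    assignment = {}
    for name, (lx, ly) in loads.items():
        best = min(substations,
                   key=lambda s: math.hypot(lx - substations[s][0],
                                            ly - substations[s][1])
                   / substations[s][2])
        assignment[name] = best
    return assignment

# Two substations (x, y, weight) and two load points (x, y).
subs = {"S1": (0.0, 0.0, 1.0), "S2": (10.0, 0.0, 2.0)}
loads = {"L1": (2.0, 0.0), "L2": (4.0, 0.0)}
print(assign_loads(loads, subs))  # {'L1': 'S1', 'L2': 'S2'}
```

Here L2 is assigned to the more distant S2 because S2's higher weight (standing in for spare capacity) shrinks its weighted distance, which is the mechanism that lets weight adjustment reshape supply ranges.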


Author(s):  
Craig Billington ◽  
Joanne M. Kingsbury ◽  
Lucia Rivas

Advancements in next-generation sequencing technology have dramatically reduced the cost and increased the ease of microbial whole-genome sequencing. This is revolutionizing the identification and analysis of foodborne microbial pathogens, facilitating expedited detection and mitigation of foodborne outbreaks, improving public health outcomes, and limiting costly recalls. However, this approach is still anchored in traditional laboratory practice involving the selection and culture of a single isolate. Metagenomic-based approaches, including metabarcoding, shotgun and long-read metagenomics, comprise the next disruptive revolution in food safety diagnostics and offer the potential to directly identify entire microbial communities in a single food, ingredient, or environmental sample. In this review, metagenomic-based approaches are introduced and placed within the context of conventional detection and diagnostic techniques, and essential considerations for undertaking metagenomic assays and data analysis are described. Recent applications of the use of metagenomics for food safety are discussed, alongside current limitations and knowledge gaps, and new opportunities arising from the use of this technology.


2019 ◽  
Vol 159 ◽  
pp. 138-147 ◽  
Author(s):  
Alexander Lim ◽  
Bryan Naidenov ◽  
Haley Bates ◽  
Karyn Willyerd ◽  
Timothy Snider ◽  
...  
