Effective Machine-Learning Assembly For Next-Generation Sequencing With Very Low Coverage

2018 ◽  
Author(s):  
Louis Ranjard ◽  
Thomas K. F. Wong ◽  
Allen G. Rodrigo

ABSTRACT In short-read DNA sequencing experiments, the read coverage is a key parameter to successfully assemble the reads and reconstruct the sequence of the input DNA. When coverage is very low, the original sequence reconstruction from the reads can be difficult because of the occurrence of uncovered gaps. Reference guided assembly can then improve these assemblies. However, when the available reference is phylogenetically distant from the sequencing reads, the mapping rate of the reads can be extremely low. Some recent improvements in read mapping approaches aim at modifying the reference according to the reads dynamically. Such approaches can significantly improve the alignment rate of the reads onto distant references but the processing of insertions and deletions remains challenging. Here, we introduce a dynamic programming algorithm to update the reference sequence according to previously aligned reads. Substitutions, insertions and deletions are performed in the reference sequence dynamically. We evaluate this approach to assemble a western-grey kangaroo mitochondrial amplicon. Our results show that more reads can be aligned and that this method produces assemblies of length comparable to the truth while limiting error rate when classic approaches fail to recover the correct length. Our method allows us to assemble the first full mitochondrial genome for the western-grey kangaroo. Finally, we discuss how the core algorithm of this method could be improved and combined with other approaches to analyse larger genomic sequences.

2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Louis Ranjard ◽  
Thomas K. F. Wong ◽  
Allen G. Rodrigo

Abstract Background: In short-read DNA sequencing experiments, the read coverage is a key parameter to successfully assemble the reads and reconstruct the sequence of the input DNA. When coverage is very low, the original sequence reconstruction from the reads can be difficult because of the occurrence of uncovered gaps. Reference guided assembly can then improve these assemblies. However, when the available reference is phylogenetically distant from the sequencing reads, the mapping rate of the reads can be extremely low. Some recent improvements in read mapping approaches aim at modifying the reference according to the reads dynamically. Such approaches can significantly improve the alignment rate of the reads onto distant references but the processing of insertions and deletions remains challenging. Results: Here, we introduce a new algorithm to update the reference sequence according to previously aligned reads. Substitutions, insertions and deletions are performed in the reference sequence dynamically. We evaluate this approach to assemble a western-grey kangaroo mitochondrial amplicon. Our results show that more reads can be aligned and that this method produces assemblies of length comparable to the truth while limiting error rate when classic approaches fail to recover the correct length. Finally, we discuss how the core algorithm of this method could be improved and combined with other approaches to analyse larger genomic sequences. Conclusions: We introduced an algorithm to perform dynamic alignment of reads on a distant reference. We showed that such an approach can improve the reconstruction of an amplicon compared to classically used bioinformatic pipelines. Although the method is not portable to genomic scale in its current form, we suggested several improvements to be investigated to make it more flexible and allow dynamic alignment to be used for large genome assemblies.
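
To make the idea of updating a reference from aligned reads concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a plain unit-cost edit distance, finds the reference window that best fits a read with a semi-global dynamic programme, and then splices the read over that window, which applies the read's substitutions, insertions and deletions to the reference in one step. The names `fit_read`, `update_reference` and the `max_cost` threshold are hypothetical.

```python
def fit_read(reference: str, read: str):
    """Align the whole read to the best-matching window of the reference.

    Returns (cost, start, end) so that reference[start:end] is the window.
    Unit costs for substitution, insertion and deletion are an assumption.
    """
    n = len(reference)
    # Row 0: an empty read prefix can start at any reference position for free.
    prev_cost = [0] * (n + 1)
    prev_start = list(range(n + 1))
    for i in range(1, len(read) + 1):
        cur_cost = [i] + [0] * n   # i read bases inserted before position 0
        cur_start = [0] * (n + 1)
        for j in range(1, n + 1):
            sub = prev_cost[j - 1] + (read[i - 1] != reference[j - 1])
            ins = prev_cost[j] + 1       # read base missing from the reference
            dele = cur_cost[j - 1] + 1   # reference base missing from the read
            best = min(sub, ins, dele)
            cur_cost[j] = best
            # Carry along where the winning alignment started in the reference.
            if best == sub:
                cur_start[j] = prev_start[j - 1]
            elif best == ins:
                cur_start[j] = prev_start[j]
            else:
                cur_start[j] = cur_start[j - 1]
        prev_cost, prev_start = cur_cost, cur_start
    end = min(range(n + 1), key=prev_cost.__getitem__)
    return prev_cost[end], prev_start[end], end


def update_reference(reference: str, read: str, max_cost: int) -> str:
    """Replace the best-fitting window with the read if it aligns well enough."""
    cost, start, end = fit_read(reference, read)
    if cost > max_cost:
        return reference  # read is too divergent, leave the reference unchanged
    return reference[:start] + read + reference[end:]


reference = "GGGGACGTTCAAGGGG"
print(update_reference(reference, "ACGATCAA", max_cost=2))  # one substitution
print(update_reference(reference, "ACGTCAA", max_cost=2))   # one deletion
```

Splicing the whole read in is a deliberately crude update rule; a realistic implementation would weigh evidence from several overlapping reads before changing a reference base.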


2019 ◽  
Author(s):  
Hasindu Gamaarachchi ◽  
Chun Wai Lam ◽  
Gihan Jayatilaka ◽  
Hiruna Samarakoon ◽  
Jared T. Simpson ◽  
...  

Abstract Nanopore sequencing has the potential to revolutionise genomics by realising portable, real-time sequencing applications, including point-of-care diagnostics and in-the-field genotyping. Achieving these applications requires efficient bioinformatic algorithms for the analysis of raw nanopore signal data. For instance, comparing raw nanopore signals to a biological reference sequence is a computationally complex task despite leveraging a dynamic programming algorithm for Adaptive Banded Event Alignment (ABEA), a commonly used approach to polish sequencing data and identify non-standard nucleotides, such as measuring DNA methylation. Here, we parallelise and optimise an implementation of the ABEA algorithm (termed f5c) to efficiently run on heterogeneous CPU-GPU architectures. By optimising memory, compute and load balancing between CPU and GPU, we demonstrate how f5c can perform ~3-5× faster than the original implementation of ABEA in the Nanopolish software package. We also show that f5c enables DNA methylation detection on-the-fly using an embedded System on Chip (SoC) equipped with GPUs. Our work not only demonstrates that complex genomics analyses can be performed on lightweight computing systems, but also benefits High-Performance Computing (HPC). The associated source code for f5c along with GPU optimised ABEA is available at https://github.com/hasindu2008/f5c.


2019 ◽  
Author(s):  
Chirag Jain ◽  
Alexander Dilthey ◽  
Sanchit Misra ◽  
Haowen Zhang ◽  
Srinivas Aluru

Abstract Aligning DNA sequences to an annotated reference is a key step for genotyping in biology. Recent scientific studies have demonstrated improved inference by aligning reads to a variation graph, i.e., a reference sequence augmented with known genetic variations. Given a variation graph in the form of a directed acyclic string graph, the sequence to graph alignment problem seeks to find the best matching path in the graph for an input query sequence. Solving this problem exactly using a sequential dynamic programming algorithm takes quadratic time in terms of the graph size and query length, making it difficult to scale to high throughput DNA sequencing data. In this work, we propose the first parallel algorithm for computing sequence to graph alignments that leverages multiple cores and single-instruction multiple-data (SIMD) operations. We take advantage of the available inter-task parallelism, and provide a novel blocked approach to compute the score matrix while ensuring high memory locality. Using a 48-core Intel Xeon Skylake processor, the proposed algorithm achieves peak performance of 317 billion cell updates per second (GCUPS), and demonstrates near linear weak and strong scaling on up to 48 cores. It delivers significant performance gains compared to existing algorithms, and results in run-time reduction from multiple days to three hours for the problem of optimally aligning high coverage long (PacBio/ONT) or short (Illumina) DNA reads to an MHC human variation graph containing 10 million vertices. Availability: The implementation of our algorithm is available at https://github.com/ParBLiSS/PaSGAL. Data sets used for evaluation are accessible using https://alurulab.cc.gatech.edu/PaSGAL.
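
The sequential quadratic-time dynamic programme mentioned in this abstract can be sketched as follows. This is an illustrative baseline only, not the parallel SIMD algorithm of PaSGAL. It assumes each vertex carries a single character, that vertices are listed in topological order, and that unit edit costs are used; the function name `align_to_dag` and the `labels`/`preds` representation are hypothetical.

```python
def align_to_dag(labels, preds, query):
    """Minimum edit distance between `query` and any path in a DAG.

    labels[v] is the character of vertex v; preds[v] lists predecessors of v.
    Vertices are assumed to be numbered in topological order (preds < v).
    The path may start and end at any vertex; the query is aligned in full.
    """
    m = len(query)
    # Column used when a path starts at the current vertex:
    # query[:i] placed before the path costs i insertions.
    virtual = list(range(m + 1))
    cols = []      # cols[v][i]: best cost of query[:i] on a path ending at v
    best = m       # aligning to the empty path costs m insertions
    for v, ch in enumerate(labels):
        sources = [virtual] + [cols[u] for u in preds[v]]
        col = [0] * (m + 1)
        col[0] = min(s[0] for s in sources) + 1   # vertex character deleted
        for i in range(1, m + 1):
            match = min(s[i - 1] for s in sources) + (query[i - 1] != ch)
            delete = min(s[i] for s in sources) + 1   # skip vertex character
            insert = col[i - 1] + 1                   # extra query character
            col[i] = min(match, delete, insert)
        cols.append(col)
        best = min(best, col[m])
    return best


# Tiny variation graph: A -> C -> (G | T) -> A, i.e. paths ACGA and ACTA.
labels = ["A", "C", "G", "T", "A"]
preds = [[], [0], [1], [1], [2, 3]]
print(align_to_dag(labels, preds, "ACTA"))   # query spells one of the paths
print(align_to_dag(labels, preds, "ACGTA"))  # no path contains both G and T
```

Each vertex column depends only on the columns of its predecessors, so the work is proportional to the number of vertices plus edges, multiplied by the query length.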


Author(s):  
Jin Yu ◽  
Pengfei Shen ◽  
Zhao Wang ◽  
Yurun Song ◽  
Xiaohan Dong

Heavy-duty vehicles, especially special-purpose vehicles such as wheel loaders and sprinklers, generally operate under drastic load changes. With a conventional hydraulic mechanical transmission, they suffer from problems such as low efficiency and high fuel consumption. Some scholars have focused on solving these issues. However, few optimal strategies take into account the fluctuation caused by speed ratio changes, which can itself cause many problems. In this study, a novel speed regulation strategy is proposed that not only solves the problems above but also overcomes the impact caused by speed ratio changes. First, building on earlier research on the Compound Coupled Hydro-mechanical Transmission (CCHMT), the basic characteristics of the CCHMT are analyzed. Then, a dynamic programming algorithm is used to formulate a basic speed regulation strategy under a specific operating condition. To reduce the problems caused by speed ratio changes, a new optimization is applied. The results indicate that the proposed DP optimal speed regulation strategy reduces fuel consumption by up to 1.16% and 6.66% in the JN1015 driving cycle and the ECE R15 working condition, respectively, and smooths the fluctuation of the speed ratio by up to 12.65% and 19.01% in those two driving cycles, respectively. The process of determining the speed regulation strategy can provide a new method for formulating CCHMT control strategies under different operating conditions, particularly under real-world conditions.
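
The general shape of such a dynamic programme can be sketched as below. This is not the authors' CCHMT model: the discretised speed ratios, the per-step fuel cost table and the `shift_penalty` weight on ratio changes are all hypothetical placeholders, chosen only to show how a penalty on ratio changes trades fuel cost against smoothness.

```python
def plan_ratios(fuel_cost, ratios, shift_penalty):
    """Pick one speed ratio per time step by dynamic programming.

    fuel_cost[t][k]: hypothetical fuel cost of using ratios[k] at step t.
    shift_penalty:   hypothetical weight on |ratio change| between steps.
    Returns (total_cost, list of chosen ratio indices).
    """
    k = len(ratios)
    cost = list(fuel_cost[0])   # best cost of ending step 0 in each ratio
    back = []                   # back[t-1][j]: best previous index for j at t
    for t in range(1, len(fuel_cost)):
        new_cost, choice = [], []
        for j in range(k):
            # Cheapest way to arrive at ratio j from any previous ratio i.
            best_i = min(
                range(k),
                key=lambda i: cost[i] + shift_penalty * abs(ratios[j] - ratios[i]),
            )
            step = shift_penalty * abs(ratios[j] - ratios[best_i])
            new_cost.append(cost[best_i] + step + fuel_cost[t][j])
            choice.append(best_i)
        cost = new_cost
        back.append(choice)
    # Trace the cheapest final state back to the first step.
    j = min(range(k), key=cost.__getitem__)
    total = cost[j]
    path = [j]
    for choice in reversed(back):
        j = choice[j]
        path.append(j)
    path.reverse()
    return total, path


ratios = [1.0, 2.0]
fuel = [[1, 3], [3, 1], [1, 3]]   # made-up costs for three time steps
print(plan_ratios(fuel, ratios, shift_penalty=0))  # follows the cheapest fuel
print(plan_ratios(fuel, ratios, shift_penalty=5))  # prefers a steady ratio
```

With no penalty the plan switches ratio at every step to chase the lowest fuel cost; with a large penalty it holds one ratio, which mirrors the trade-off between fuel consumption and speed ratio fluctuation described above.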

