DeepSimulator: a deep simulator for Nanopore sequencing

2017
Author(s): Yu Li, Renmin Han, Chongwei Bi, Mo Li, Sheng Wang, ...

ABSTRACT Motivation: Oxford Nanopore sequencing is a rapidly developed sequencing technology in recent years. To keep pace with the explosion of the downstream data analytical tools, a versatile Nanopore sequencing simulator is needed to complement the experimental data as well as to benchmark those newly developed tools. However, all the currently available simulators are based on simple statistics of the produced reads, which have difficulty in capturing the complex nature of the Nanopore sequencing procedure, the main task of which is the generation of raw electrical current signals. Results: Here we propose a deep learning based simulator, DeepSimulator, to mimic the entire pipeline of Nanopore sequencing. Starting from a given reference genome or assembled contigs, we simulate the electrical current signals by a context-dependent deep learning model, followed by a base-calling procedure to yield simulated reads. This workflow mimics the sequencing procedure more naturally. The thorough experiments performed across four species show that the signals generated by our context-dependent model are more similar to the experimentally obtained signals than the ones generated by the official context-independent pore model. In terms of the simulated reads, we provide a parameter interface to users so that they can obtain the reads with different accuracies ranging from 83% to 97%. The reads generated by the default parameter have almost the same properties as the real data. Two case studies demonstrate the application of DeepSimulator to benefit the development of tools in de novo assembly and in low coverage SNP detection. Availability: The software can be accessed freely at: https://github.com/lykaust15/deep_simulator.
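
As a rough illustration of the baseline that the abstract compares against, the sketch below shows what a context-independent pore model amounts to: each k-mer maps to one fixed current level, and samples are drawn around that level without regard to the neighbouring sequence. Everything here is an assumption made for illustration (the k-mer length, the randomly generated current table, the noise levels, and the function names); none of it is taken from DeepSimulator or from any official pore model.

```python
import random
from itertools import product

K = 3  # illustrative k-mer length, chosen only to keep the toy table small


def make_toy_pore_model(k=K, seed=0):
    """Build a made-up table mapping each k-mer to a (mean, stdev) current level."""
    rng = random.Random(seed)
    return {
        "".join(kmer): (rng.uniform(60.0, 120.0), rng.uniform(1.0, 3.0))
        for kmer in product("ACGT", repeat=k)
    }


def simulate_signal(sequence, pore_model, k=K, samples_per_kmer=8, seed=1):
    """Emit a fixed number of noisy samples per k-mer, ignoring wider context."""
    rng = random.Random(seed)
    signal = []
    for i in range(len(sequence) - k + 1):
        mean, stdev = pore_model[sequence[i:i + k]]
        signal.extend(rng.gauss(mean, stdev) for _ in range(samples_per_kmer))
    return signal


model = make_toy_pore_model()
signal = simulate_signal("ACGTTGCA", model)
print(len(signal))  # 6 k-mers x 8 samples each = 48
```

A context-dependent simulator of the kind the abstract describes would replace the table lookup with a learned model that sees the surrounding sequence; the fixed number of samples per k-mer is also a simplification of this sketch.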

2020
Author(s): Hongxu Ding, Ioannis Anastopoulos, Andrew D. Bailey, Joshua Stuart, Benedict Paten

ABSTRACT The characteristic ionic currents of nucleotide kmers are commonly used in analyzing nanopore sequencing readouts. We present a graph convolutional network-based deep learning framework for predicting kmer characteristic ionic currents from corresponding chemical structures. We show such a framework can generalize the chemical information of the 5-methyl group from thymine to cytosine by correctly predicting 5-methylcytosine-containing DNA 6mers, thus shedding light on the de novo detection of nucleotide modifications.


2020, Vol 48 (9), pp. 4940-4945
Author(s): Pieter Spealman, Jaden Burrell, David Gresham

Abstract Inverted duplicated DNA sequences are a common feature of structural variants (SVs) and copy number variants (CNVs). Analysis of CNVs containing inverted duplicated DNA sequences using nanopore sequencing identified recurrent aberrant behavior characterized by low confidence, incorrect and missed base calls. Inverted duplicate DNA sequences in both yeast and human samples were observed to have systematic elevation in the electrical current detected at the nanopore, increased translocation rates and decreased sampling rates. The coincidence of inverted duplicated DNA sequences with dramatically reduced sequencing accuracy and an increased translocation rate suggests that secondary DNA structures may interfere with the dynamics of transit of the DNA through the nanopore.


2020
Author(s): Oguzhan Begik, Morghan C Lucas, Leszek P Pryszcz, Jose Miguel Ramirez, Rebeca Medina, ...

ABSTRACT A broad diversity of modifications decorate RNA molecules. Originally conceived as static components, evidence is accumulating that some RNA modifications may be dynamic, contributing to cellular responses to external signals and environmental circumstances. A major difficulty in studying these modifications, however, is the need of tailored protocols to map each modification type individually. Here, we present a new approach that uses direct RNA nanopore sequencing to identify and quantify RNA modifications present in native RNA molecules. First, we show that each RNA modification type results in a distinct and characteristic base-calling ‘error’ signature, which we validate using a battery of genetic strains lacking either pseudouridine (Y) or 2’-O-methylation (Nm) modifications. We then demonstrate the value of these signatures for de novo prediction of Y modifications transcriptome-wide, confirming known Y-modified sites as well as uncovering novel Y sites in mRNAs, ncRNAs and rRNAs, including a previously unreported Pus4-dependent Y modification in yeast mitochondrial rRNA, which we validate using orthogonal methods. To explore the dynamics of pseudouridylation across environmental stresses, we treat the cells with oxidative, cold and heat stresses, finding that yeast ribosomal rRNA modifications do not change upon environmental exposures, contrary to the general belief. By contrast, our method reveals many novel heat-sensitive Y-modified sites in snRNAs, snoRNAs and mRNAs, in addition to recovering previously reported sites. Finally, we develop a novel software, nanoRMS, which we show can estimate per-site modification stoichiometries from individual RNA molecules by identifying the reads with altered current intensity and trace profiles, and quantify the RNA modification stoichiometry changes between two conditions.
Our work demonstrates that Y RNA modifications can be predicted de novo and in a quantitative manner using native RNA nanopore sequencing.


2020, Vol 2020, pp. 1-12
Author(s): Shih-Lin Lin, Hua-Wei Huang

Financial forecasting is based on the use of past and present financial information to make the best prediction of the future financial situation, to avoid high-risk situations, and to increase benefits. Such forecasts are of interest to anyone who wants to know the state of possible finances in the future, including investors and decision-makers. However, the complex nature of financial data makes it difficult to get accurate forecasts. Artificial intelligence, which has been shown to be suitable for analyzing very complex problems, can be applied to financial forecasting. Financial data is both nonlinear and nonstationary, with broadband frequency features. In other words, there is a large range of fluctuation, meaning that predictions made only using long short-term memory (LSTM) are not enough to ensure accuracy. This study uses an LSTM model for analysis of financial data, followed by a comparison of the analytical results with the actual data to see which has a larger root-mean-square-error (RMSE). The proposed method combines deep learning with empirical mode decomposition (EMD) to understand and predict financial trends from financial data. The financial data for this study are from the Taiwan corporate social responsibility (CSR) index. First, the EMD method is used to transform the CSR index data into a limited number of intrinsic mode functions (IMF). The bandwidth of these IMFs becomes narrower, with regular cyclic, periodic, or seasonal components in the time domain. In other words, the range of fluctuation is small. LSTM is a good way to forecast cyclic or seasonal data. The forecast result is obtained by adding all the IMFs together. It has been verified in past studies that only the LSTM and LSTM combined with the EMD can be used. The analytical results show that smaller RMSEs can be obtained using the LSTM combined with EMD compared to real data.
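
The decompose, forecast, and recombine idea in this abstract can be sketched as follows, under loud assumptions: the components are hand-written toy lists rather than IMFs produced by EMD, and the per-component forecaster is a trivial last-value rule standing in for the LSTM. All function names are hypothetical. The sketch shows only the additive recombination of per-component forecasts and the RMSE used to score them.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(actual):
        raise ValueError("sequences must have the same length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def persistence_forecast(series, horizon):
    """Placeholder forecaster (NOT an LSTM): repeat the last observed value."""
    return [series[-1]] * horizon


def forecast_by_components(components, horizon, forecaster=persistence_forecast):
    """Forecast each component separately, then add the forecasts point by point."""
    per_component = [forecaster(component, horizon) for component in components]
    return [sum(values) for values in zip(*per_component)]


# Toy components standing in for IMFs; real ones would come from EMD.
components = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
combined = forecast_by_components(components, horizon=2)
print(combined)                     # [3.5, 3.5]
print(rmse(combined, [3.5, 4.5]))   # sqrt(0.5), about 0.7071
```

Swapping `persistence_forecast` for a trained sequence model would leave the recombination and scoring steps unchanged, which is the part of the method this sketch is meant to show.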


2018
Author(s): Carlos de Lannoy, Judith Risse, Dick de Ridder

Abstract Nanopore sequencing is a novel approach to nucleic acid analysis that generates long, error-prone reads. Since device components, base calling software and best practices for sample preparation are updated frequently and extensively, the nature of the produced data also changes frequently. As a result, peer-reviewed publications on de novo assembly pipeline benchmarking efforts are quickly rendered outdated by the next major improvement to the sequencing platforms. To provide the user community with a faster, more flexible alternative to peer-reviewed benchmark papers for de novo assembly tool performance we constructed poreTally, a comprehensive benchmarking tool. poreTally automatically assembles a given read set using several often-used assembly pipelines, analyzes the resulting assemblies for correctness and continuity, and finally generates a quality report. Results can immediately be shared with peers in a Github/Gitlab repository. Furthermore, we aim to give a more inclusive overview of assembly pipeline performance than any individual research group can, by offering users the possibility to submit their results to a collective benchmarking effort. poreTally is available on Github.


2021, Vol 12 (1)
Author(s): Hongxu Ding, Ioannis Anastopoulos, Andrew D. Bailey, Joshua Stuart, Benedict Paten

Abstract The characteristic ionic currents of nucleotide kmers are commonly used in analyzing nanopore sequencing readouts. We present a graph convolutional network-based deep learning framework for predicting kmer characteristic ionic currents from corresponding chemical structures. We show such a framework can generalize the chemical information of the 5-methyl group from thymine to cytosine by correctly predicting 5-methylcytosine-containing DNA 6mers, thus shedding light on the de novo detection of nucleotide modifications.


Sensors, 2021, Vol 21 (6), pp. 1962
Author(s): Enrico Buratto, Adriano Simonetto, Gianluca Agresti, Henrik Schäfer, Pietro Zanuttigh

In this work, we propose a novel approach for correcting multi-path interference (MPI) in Time-of-Flight (ToF) cameras by estimating the direct and global components of the incoming light. MPI is an error source linked to the multiple reflections of light inside a scene; each sensor pixel receives information coming from different light paths which generally leads to an overestimation of the depth. We introduce a novel deep learning approach, which estimates the structure of the time-dependent scene impulse response and from it recovers a depth image with a reduced amount of MPI. The model consists of two main blocks: a predictive model that learns a compact encoded representation of the backscattering vector from the noisy input data and a fixed backscattering model which translates the encoded representation into the high dimensional light response. Experimental results on real data show the effectiveness of the proposed approach, which reaches state-of-the-art performances.


2021, Vol 11 (9), pp. 3863
Author(s): Ali Emre Öztürk, Ergun Erçelebi

A large amount of training image data is required for solving image classification problems using deep learning (DL) networks. In this study, we aimed to train DL networks with synthetic images generated by using a game engine and determine the effects of the networks on performance when solving real-image classification problems. The study presents the results of using corner detection and nearest three-point selection (CDNTS) layers to classify bird and rotary-wing unmanned aerial vehicle (RW-UAV) images, provides a comprehensive comparison of two different experimental setups, and emphasizes the significant improvements in the performance in deep learning-based networks due to the inclusion of a CDNTS layer. Experiment 1 corresponds to training the commonly used deep learning-based networks with synthetic data and an image classification test on real data. Experiment 2 corresponds to training the CDNTS layer and commonly used deep learning-based networks with synthetic data and an image classification test on real data. In experiment 1, the best area under the curve (AUC) value for the image classification test accuracy was measured as 72%. In experiment 2, using the CDNTS layer, the AUC value for the image classification test accuracy was measured as 88.9%. A total of 432 different combinations of trainings were investigated in the experimental setups. The experiments were trained with various DL networks using four different optimizers by considering all combinations of batch size, learning rate, and dropout hyperparameters. The test accuracy AUC values for networks in experiment 1 ranged from 55% to 74%, whereas the test accuracy AUC values in experiment 2 networks with a CDNTS layer ranged from 76% to 89.9%. It was observed that the CDNTS layer has considerable effects on the image classification accuracy performance of deep learning-based networks. AUC, F-score, and test accuracy measures were used to validate the success of the networks.


2021, Vol 61 (2), pp. 621-630
Author(s): Sowmya Ramaswamy Krishnan, Navneet Bung, Gopalakrishnan Bulusu, Arijit Roy

BioChem, 2021, Vol 1 (1), pp. 36-48
Author(s): Ivan Jacobs, Manolis Maragoudakis

Computer-assisted de novo design of natural product mimetics offers a viable strategy to reduce synthetic efforts and obtain natural-product-inspired bioactive small molecules, but suffers from several limitations. Deep learning techniques can help address these shortcomings. We propose the generation of synthetic molecule structures that optimizes the binding affinity to a target. To achieve this, we leverage important advancements in deep learning. Our approach generalizes to systems beyond the source system and achieves the generation of complete structures that optimize the binding to a target unseen during training. Translating the input sub-systems into the latent space permits the ability to search for similar structures, and the sampling from the latent space for generation.

