Homopolish: a method for the removal of systematic errors in nanopore sequencing by homologous polishing

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yao-Ting Huang ◽  
Po-Yu Liu ◽  
Pei-Wen Shih

Abstract Nanopore sequencing has been widely used for the reconstruction of microbial genomes. Owing to the higher error rates of Nanopore reads, errors in the assembled genome are typically corrected via neural networks trained on Nanopore reads. However, the systematic errors usually remain uncorrected. This paper designs a model that is trained on homologous sequences for the correction of Nanopore systematic errors. The developed program, Homopolish, outperforms Medaka and HELEN on bacterial, viral, fungal, and metagenomic datasets. When combined with Medaka/HELEN, the genome quality can exceed Q50 on R9.4 flow cells. We show that Nanopore-only sequencing can produce high-quality microbial genomes sufficient for downstream analysis.

2020 ◽  
Author(s):  
Yao-Ting Huang ◽  
Po-Yu Liu ◽  
Pei-Wen Shih

Abstract Nanopore sequencing has been widely used for the reconstruction of a variety of microbial genomes. Owing to the higher error rate, the assembled genome requires further error correction. Existing methods remove many of these errors via deep neural networks trained on Nanopore reads. However, quite a few systematic errors are still left in the genome. This paper proposes a new model trained on homologous sequences extracted from closely related genomes, which provide valuable features missing from Nanopore reads. The developed program (called Homopolish) outperforms the state-of-the-art Racon/Medaka and MarginPolish/HELEN pipelines on metagenomic datasets and on isolates of bacteria, viruses and fungi. When Homopolish is combined with Medaka or with HELEN, the genome quality can exceed Q50 on R9.4 flow cells. The genome quality can also be improved on R10.3 flow cells (Q50-Q90). We demonstrate that Nanopore-only sequencing can now produce high-quality genomes without the need for Illumina hybrid sequencing.
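The quality values quoted in the abstract above (Q50, Q90) are Phred-scaled: Q = -10 · log10(p), where p is the per-base error probability. The short sketch below converts between the two and estimates how many errors a given Q value implies for a whole genome. The function names and the 5 Mbp genome length are illustrative assumptions, not values taken from the paper.

```python
import math


def phred_to_error_rate(q: float) -> float:
    """Convert a Phred-scaled quality value Q to a per-base error probability."""
    return 10 ** (-q / 10)


def error_rate_to_phred(p: float) -> float:
    """Convert a per-base error probability to a Phred-scaled quality value."""
    return -10 * math.log10(p)


def expected_errors(q: float, genome_length: int) -> float:
    """Expected number of erroneous bases in a genome of the given length."""
    return genome_length * phred_to_error_rate(q)


if __name__ == "__main__":
    # Hypothetical 5 Mbp bacterial genome, chosen only for illustration.
    genome_length = 5_000_000
    for q in (30, 40, 50):
        print(q, phred_to_error_rate(q), expected_errors(q, genome_length))
```

Under this definition, Q50 corresponds to roughly one error per 100,000 bases, which is why it is often treated as a threshold for reliable downstream gene prediction.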


2017 ◽  
Author(s):  
Philippe Faucon ◽  
Robert Trevino ◽  
Parithi Balachandran ◽  
Kylie Standage-Beier ◽  
Xiao Wang

Abstract Nanopore sequencing has introduced the ability to sequence long stretches of DNA, enabling the resolution of repeating segments or paired SNPs across long stretches of DNA. Unfortunately, significant error rates (>15%), introduced through systematic and random noise, inhibit downstream analysis. We propose a novel method, using unsupervised learning, to correct biologically amplified reads before downstream analysis proceeds. We also demonstrate that our method has performance comparable to existing techniques without limiting the detection of repeats or the length of the input sequence.


2018 ◽  
Author(s):  
Denis Bertrand ◽  
Jim Shaw ◽  
Manesh Kalathiappan ◽  
Amanda Hui Qi Ng ◽  
Senthil Muthiah ◽  
...  

Abstract The analysis of information-rich whole-metagenome datasets acquired from complex microbial communities is often restricted by the fragmented nature of assembly from short-read sequencing. The availability of long reads from third-generation sequencing technologies (e.g. PacBio or Oxford Nanopore) can help improve assembly quality in principle, but high error rates and low throughput have limited their application in metagenomics. In this work, we describe the first hybrid metagenomic assembler which combines the advantages of short- and long-read technologies, providing an order of magnitude improvement in contiguity compared to short-read assemblies, and high base-pair level accuracy. The proposed approach (OPERA-MS) integrates a novel assembly-based metagenome clustering technique with an exact scaffolding algorithm that can efficiently assemble repeat-rich sequences. Based on evaluations with defined in vitro communities and virtual gut microbiomes, we show that it is possible to assemble near-complete genomes from metagenomes with as little as 9× long-read coverage, thus enabling high-quality assembly of low-abundance species (<1%). Furthermore, OPERA-MS’s fine-grained clustering is able to deconvolute and assemble multiple genomes of the same species in a single sample, allowing us to study the complex dynamics of the human microbiome at the sub-species level. Applying nanopore sequencing to gut metagenomes of patients undergoing antibiotic treatment, we show that long reads can be obtained from stool samples in clinical studies to produce more meaningful metagenomic assemblies (up to 200× improvement over short-read assemblies), including the closed assembly of >80 putative plasmid/phage sequences and a 263 kbp jumbo phage. Our results highlight that high-quality hybrid assemblies provide an unprecedented view of the gut resistome in these patients, including strain dynamics and identification of novel plasmid sequences.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Jin Young Lee ◽  
Minyoung Kong ◽  
Jinjoo Oh ◽  
JinSoo Lim ◽  
Sung Hee Chung ◽  
...  

Abstract Assembling high-quality microbial genomes using only cost-effective Nanopore long-read systems such as Flongle is important for accelerating research on microbial genomes, and the most critical step is the polishing process. In this study, we evaluated eight state-of-the-art Nanopore polishing tools, and the available combinations of them, for microbial genome assembly, based on BUSCO and Prokka gene prediction. In the evaluation of individual tools, Homopolish, PEPPER, and Medaka demonstrated better results than the others. In combination polishing, second-round Homopolish and the PEPPER × Medaka combination also showed better results than the others. However, individual tools and combinations have specific limitations on usage and results. Depending on the target organism and the purpose of the downstream research, some difficulties remain in fully replacing hybrid polishing, which relies on additional short-read data. Nevertheless, through continuous improvement of the protein pores, related base-calling algorithms, and polishing tools based on improved error models, a high-quality microbial genome can be achieved using only Nanopore reads without the production of additional short-read data. The polishing strategy proposed in this study is expected to provide useful information for assembling microbial genomes using only Nanopore reads, depending on the target microorganism and the purpose of the research.


Author(s):  
Dimitrios Boursinos ◽  
Xenofon Koutsoukos

Abstract Machine learning components such as deep neural networks are used extensively in cyber-physical systems (CPS). However, such components may introduce new types of hazards that can have disastrous consequences and need to be addressed for engineering trustworthy systems. Although deep neural networks offer advanced capabilities, they must be complemented by engineering methods and practices that allow effective integration in CPS. In this paper, we propose an approach for assurance monitoring of learning-enabled CPS based on the conformal prediction framework. In order to allow real-time assurance monitoring, the approach employs distance learning to transform high-dimensional inputs into smaller embedding representations. By leveraging conformal prediction, the approach provides well-calibrated confidence and ensures a bounded small error rate while limiting the number of inputs for which an accurate prediction cannot be made. We demonstrate the approach using three datasets: a mobile robot following a wall, speaker recognition, and traffic sign recognition. The experimental results demonstrate that the error rates are well-calibrated while the number of alarms is very small. Furthermore, the method is computationally efficient and allows real-time assurance monitoring of CPS.
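To make the conformal prediction idea in the abstract above more concrete, here is a minimal sketch of generic inductive conformal prediction. It is not the authors' implementation: the nonconformity scores are assumed to be precomputed numbers (higher means "stranger"), and the function names and example values are hypothetical.

```python
from typing import Dict, List, Sequence


def conformal_p_value(calibration_scores: Sequence[float], test_score: float) -> float:
    """Fraction of calibration examples at least as nonconforming as the test
    example, counting the test example itself (standard inductive formula)."""
    n = len(calibration_scores)
    at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (at_least + 1) / (n + 1)


def prediction_set(
    calibration_scores: Sequence[float],
    candidate_scores: Dict[str, float],
    epsilon: float,
) -> List[str]:
    """Keep every candidate label whose p-value exceeds the significance level
    epsilon. candidate_scores maps each label to the nonconformity score the
    test input would have under that label."""
    return [
        label
        for label, score in candidate_scores.items()
        if conformal_p_value(calibration_scores, score) > epsilon
    ]


if __name__ == "__main__":
    calibration = [0.1, 0.2, 0.3, 0.4]       # hypothetical held-out scores
    candidates = {"stop": 0.05, "yield": 0.9}  # hypothetical per-label scores
    print(prediction_set(calibration, candidates, epsilon=0.25))
```

A monitor built on this sketch could raise an alarm whenever the prediction set does not contain exactly one label; that alarm rule is an assumption for illustration, chosen to mirror the abstract's distinction between bounded error rates and the number of alarms.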


Processes ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 1241
Author(s):  
Véronique Gomes ◽  
Marco S. Reis ◽  
Francisco Rovira-Más ◽  
Ana Mendes-Ferreira ◽  
Pedro Melo-Pinto

The high quality of Port wine is the result of a sequence of winemaking operations, such as harvesting, maceration, fermentation, extraction and aging. These stages require proper monitoring and control, in order to consistently achieve the desired wine properties. The present work focuses on the harvesting stage, where the sugar content of grapes plays a key role as one of the critical maturity parameters. Our approach makes use of hyperspectral imaging technology to rapidly extract information from wine grape berries; the collected spectra are fed to machine learning algorithms that produce estimates of the sugar level. A consistent predictive capability is important for establishing the harvest date, as well as to select the best grapes to produce specific high-quality wines. We compared four different machine learning methods (including deep learning), assessing their generalization capacity for different vintages and varieties not included in the training process. Ridge regression, partial least squares, neural networks and convolutional neural networks were the methods considered to conduct this comparison. The results show that the estimated models can successfully predict the sugar content from hyperspectral data, with the convolutional neural network outperforming the other methods.


2021 ◽  
Vol 11 (13) ◽  
pp. 5931
Author(s):  
Ji’an You ◽  
Zhaozheng Hu ◽  
Chao Peng ◽  
Zhiqiang Wang

Large amounts of high-quality image data are the basis and premise of high-accuracy object detection in the field of convolutional neural networks (CNN). It is challenging to collect varied, high-quality ship image data in the marine environment. To address this, a novel method based on CNN is proposed to generate a large number of high-quality ship images. We obtained ship images with different perspectives and different sizes by adjusting the ships’ postures and sizes in three-dimensional (3D) simulation software, then the 3D ship data were transformed into 2D ship images according to the principle of pinhole imaging. We selected specific experimental scenes as background images, and the target ships of the 2D ship images were superimposed onto the background images to generate “Simulation–Real” ship images (named SRS images hereafter). Additionally, an image annotation method based on SRS images was designed. Finally, the target detection algorithm based on CNN was trained and tested on the generated SRS images. The proposed method is suitable for quickly generating a large number of high-quality ship image samples and the corresponding annotation data, significantly improving the accuracy of ship detection. The proposed annotation method is superior to labeling images with the image annotation software Label-me and Label-img in terms of labeling the SRS images.
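The pinhole imaging step mentioned in the abstract above can be sketched with the standard pinhole camera model, which maps a 3D point (X, Y, Z) in camera coordinates to pixel coordinates u = f·X/Z + cx and v = f·Y/Z + cy. This is a generic illustration, not the authors' code; the focal length, principal point, and sample point are assumed values.

```python
from typing import Tuple


def project_point(
    point: Tuple[float, float, float],
    focal_length: float,
    cx: float,
    cy: float,
) -> Tuple[float, float]:
    """Project a 3D point in camera coordinates onto the image plane using
    the ideal pinhole model (no lens distortion)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    u = focal_length * x / z + cx
    v = focal_length * y / z + cy
    return u, v


if __name__ == "__main__":
    # Hypothetical camera: 100-pixel focal length, 640x480 image centre.
    print(project_point((1.0, 2.0, 4.0), focal_length=100.0, cx=320.0, cy=240.0))
```

Because the projected size scales with 1/Z, moving the simulated ship closer to or farther from the virtual camera changes its apparent size in the 2D image, which is one way to obtain the differently sized ship images the abstract describes.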


Agronomy ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 1342
Author(s):  
Shaghayegh Mehravi ◽  
Gholam Ali Ranjbar ◽  
Ghader Mirzaghaderi ◽  
Anita Alice Severn-Ellis ◽  
Armin Scheben ◽  
...  

The species of Pimpinella, one of the largest genera of the family Apiaceae, are traditionally cultivated for medicinal purposes. In this study, high-throughput double digest restriction-site associated DNA sequencing technology (ddRAD-seq) was used to identify single nucleotide polymorphisms (SNPs) in eight Pimpinella species from Iran. After double-digestion with the enzymes HpyCH4IV and HinfI, a total of 334,702,966 paired-end reads were de novo assembled into 1,270,791 loci with an average of 28.8 reads per locus. After stringent filtering, 2440 high-quality SNPs were identified for downstream analysis. Analysis of genetic relationships and population structure, based on these retained SNPs, indicated the presence of three major groups. Gene ontology and pathway analyses were performed by comparing SNP-associated flanking sequences against a public non-redundant database. Due to the lack of genomic resources in this genus, our present study is the first report to provide high-quality SNPs in Pimpinella based on a de novo analysis pipeline using ddRAD-seq. These data will enhance the molecular knowledge of the genus Pimpinella and will provide an important source of information for breeders and the research community to enhance breeding programs and support the management of Pimpinella genomic resources.


2021 ◽  
Vol 11 (10) ◽  
pp. 4617
Author(s):  
Daehee Park ◽  
Cheoljun Lee

Because smartphones support various functions, they are carried by users everywhere. Whenever a user believes that a moment is interesting, important, or meaningful to them, they can record a video to preserve such memories. The main problem with video recording an important moment is that the user needs to look at the scene through the mobile phone screen rather than seeing the actual real-world event. This occurs owing to the uncertainty the user might feel when recording the video. For example, the user might not be sure whether the recording is of high quality and might worry about missing the target object. To overcome this, we developed a new camera application that utilizes two main algorithms, the minimum output sum of squared error (MOSSE) and histogram of oriented gradients (HOG) algorithms, to track the target object and recognize the direction of the user’s head. We assumed that the functions of the new camera application can relieve the user’s anxiety while recording a video. To test the effectiveness of the proposed application, we conducted a case study and measured the emotional responses of users and the error rates, comparing them with those for a regular camera application. The results indicate that the new camera application induces greater feelings of pleasure, excitement, and independence than a regular camera application. Furthermore, it effectively reduces the error rates during video recording.


2018 ◽  
Vol 20 (4) ◽  
pp. 1542-1559 ◽  
Author(s):  
Damla Senol Cali ◽  
Jeremie S Kim ◽  
Saugata Ghose ◽  
Can Alkan ◽  
Onur Mutlu

Abstract Nanopore sequencing technology has the potential to render other sequencing technologies obsolete with its ability to generate long reads and provide portability. However, high error rates of the technology pose a challenge while generating accurate genome assemblies. The tools used for nanopore sequence analysis are of critical importance, as they should overcome the high error rates of the technology. Our goal in this work is to comprehensively analyze current publicly available tools for nanopore sequence analysis to understand their advantages, disadvantages and performance bottlenecks. It is important to understand where the current tools do not perform well to develop better tools. To this end, we (1) analyze the multiple steps and the associated tools in the genome assembly pipeline using nanopore sequence data, and (2) provide guidelines for determining the appropriate tools for each step. Based on our analyses, we make four key observations: (1) The choice of the tool for basecalling plays a critical role in overcoming the high error rates of nanopore sequencing technology. (2) Read-to-read overlap finding tools, GraphMap and Minimap, perform similarly in terms of accuracy. However, Minimap has a lower memory usage, and it is faster than GraphMap. (3) There is a trade-off between accuracy and performance when deciding on the appropriate tool for the assembly step. The fast but less accurate assembler Miniasm can be used for quick initial assembly, and further polishing can be applied on top of it to increase the accuracy, which leads to faster overall assembly. (4) The state-of-the-art polishing tool, Racon, generates high-quality consensus sequences while providing a significant speedup over another polishing tool, Nanopolish. We analyze various combinations of different tools and expose the trade-offs between accuracy, performance, memory usage and scalability.
We conclude that our observations can guide researchers and practitioners in making conscious and effective choices for each step of the genome assembly pipeline using nanopore sequence data. Also, with the help of bottlenecks we have found, developers can improve the current tools or build new ones that are both accurate and fast, to overcome the high error rates of the nanopore sequencing technology.

