A deep learning framework for real-time detection of novel pathogens during sequencing

2021
Author(s):
Jakub M. Bartoszewicz
Ulrich Genske
Bernhard Y. Renard

Abstract Motivation Novel pathogens evolve quickly and may emerge rapidly, causing dangerous outbreaks or even global pandemics. Next-generation sequencing is the state of the art in open-view pathogen detection, and one of the few methods available at the earliest stages of an epidemic, even when the biological threat is unknown. Analyzing the samples while the sequencer is running can greatly reduce the turnaround time, but existing tools rely on close matches to lists of known pathogens and perform poorly on novel species. Machine learning approaches can predict whether single reads originate from more distant, unknown pathogens, but they require relatively long input sequences and processed data from a finished sequencing run. Results We present DeePaC-Live, a Python package for real-time pathogenic potential prediction directly from incomplete sequencing reads. We train deep neural networks to classify Illumina and Nanopore reads and integrate our models with HiLive2, a real-time Illumina mapper. DeePaC-Live outperforms alternatives based on machine learning and sequence alignment on simulated and real data, including SARS-CoV-2 sequencing runs. After just 50 Illumina cycles, we increase the true positive rate 80-fold compared to the live-mapping approach. The first 250 bp of Nanopore reads, corresponding to 0.5 s of sequencing time, are enough to yield predictions more accurate than mapping the finished long reads. Our approach could also be used for screening synthetic sequences against biosecurity threats. Availability The code is available at https://gitlab.com/dacs-hpi/deepac-live and https://gitlab.com/dacs-hpi/deepac. The package can be installed with Bioconda, Docker or Pip. Contact [email protected], [email protected] Supplementary information Supplementary data are available online.
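As an illustration of the core idea (not the published DeePaC-Live architecture; the read length, layer sizes and padding scheme below are assumptions), a classifier can score a read from only its first sequenced bases by zero-padding the unsequenced remainder, so one network handles any cycle count:

```python
# A minimal sketch, assuming a fixed maximum read length and a small CNN;
# the real DeePaC-Live models and training data are not reproduced here.
import numpy as np
from tensorflow import keras

BASES = {"A": 0, "C": 1, "G": 2, "T": 3}
READ_LEN = 250  # assumed full read length the network is trained on

def one_hot_prefix(read, cycles):
    """One-hot encode the first `cycles` bases; later positions stay zero."""
    x = np.zeros((READ_LEN, 4), dtype=np.float32)
    for i, base in enumerate(read[:cycles]):
        if base in BASES:
            x[i, BASES[base]] = 1.0
    return x

def build_model():
    # Small convolutional classifier over one-hot DNA input.
    return keras.Sequential([
        keras.layers.Input(shape=(READ_LEN, 4)),
        keras.layers.Conv1D(128, 15, activation="relu"),
        keras.layers.GlobalMaxPooling1D(),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),  # P(pathogenic potential)
    ])

model = build_model()
model.compile(optimizer="adam", loss="binary_crossentropy")
# After 50 cycles, only the first 50 bases are known:
x = one_hot_prefix("ACGT" * 60, cycles=50)[None, ...]
print(model.predict(x))  # untrained demo prediction
```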

Author(s):  
Jakub M Bartoszewicz
Anja Seidel
Robert Rentzsch
Bernhard Y Renard

Abstract Motivation We expect novel pathogens to arise due to their fast-paced evolution, and new species to be discovered thanks to advances in DNA sequencing and metagenomics. Moreover, recent developments in synthetic biology raise concerns that some strains of bacteria could be modified for malicious purposes. Traditional approaches to open-view pathogen detection depend on databases of known organisms, which limits their performance on unknown, unrecognized and unmapped sequences. In contrast, machine learning methods can infer pathogenic phenotypes from single NGS reads, even though the biological context is unavailable. Results We present DeePaC, a Deep Learning Approach to Pathogenicity Classification. It includes a flexible framework allowing easy evaluation of neural architectures with reverse-complement parameter sharing. We show that convolutional neural networks and LSTMs outperform the state of the art based on both sequence homology and machine learning. Combining deep learning with the integration of predictions for both mates in a read pair cuts the error rate almost in half compared to the previous state of the art. Availability and implementation The code and the models are available at: https://gitlab.com/rki_bioinformatics/DeePaC. Supplementary information Supplementary data are available at Bioinformatics online.
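Two of the ingredients above lend themselves to a short sketch. Assuming a placeholder `model` that maps a one-hot-encoded read (length L x 4, channel order A, C, G, T) to a score in [0, 1] (the real DeePaC networks share parameters inside the layers, whereas this sketch only averages the two strand predictions):

```python
# A minimal sketch of strand-averaged prediction and read-pair integration,
# under the assumptions stated in the lead-in; not the DeePaC implementation.
import numpy as np

def reverse_complement(onehot):
    # With channel order A,C,G,T, complementing = reversing the channel axis;
    # reversing the sequence axis reads the strand in the opposite direction.
    return onehot[::-1, ::-1]

def predict_rc_averaged(model, onehot_read):
    """Average predictions over the read and its reverse complement."""
    fwd = model(onehot_read)
    rev = model(reverse_complement(onehot_read))
    return 0.5 * (fwd + rev)

def predict_read_pair(model, mate1, mate2):
    """Integrate both mates of a pair by averaging their scores."""
    return 0.5 * (predict_rc_averaged(model, mate1)
                  + predict_rc_averaged(model, mate2))

# Demo with a dummy "model" (GC fraction) on a random read:
rng = np.random.default_rng(0)
def gc_model(x):
    return x[:, 1:3].sum() / len(x)  # fraction of C/G positions
read = np.eye(4, dtype=np.float32)[rng.integers(0, 4, size=100)]
print(predict_read_pair(gc_model, read, read))
```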


Author(s):  
Zhi Zhang
Dagang Wang
Jianxiu Qiu
Jinxin Zhu
Tingli Wang

Abstract The Global Precipitation Measurement (GPM) mission provides satellite precipitation products with an unprecedented spatio-temporal resolution and spatial coverage. However, its near-real-time (NRT) product still suffers from low accuracy. This study aims to improve the early run of the Integrated Multi-satellitE Retrievals for GPM (IMERG) by using four machine learning approaches, i.e., support vector machine (SVM), random forest (RF), artificial neural network (ANN), and Extreme Gradient Boosting (XGB). Cloud properties are selected as predictors in addition to the original IMERG in these approaches. All four approaches show similar improvement, with a 53%-60% reduction in root-mean-square error (RMSE) compared with the original IMERG in a humid area, the Dongjiang River Basin (DJR) in southeastern China. The improvements are even greater in a semi-arid area, the Fenhe River Basin (FHR) in central China, where the RMSE reduction ranges from 63% to 66%. The products generated by the machine learning methods perform similarly to, or even outperform, the final run of IMERG. Feature importance analysis, a technique to evaluate input features based on how useful they are in predicting a target variable, indicates that cloud height and brightness temperature are the most useful information for improving satellite precipitation products, followed by atmospheric reflectivity and surface temperature. This study shows that a more accurate NRT precipitation product can be produced by combining machine learning approaches and cloud information, which is important for hydrological applications that require NRT precipitation information, such as flood monitoring.
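A minimal sketch of this correction setup, with synthetic data and made-up feature names standing in for the real IMERG and cloud-property inputs; random forest is shown, and SVM, ANN or XGB would slot in the same way:

```python
# A minimal sketch: the original IMERG estimate plus cloud properties are the
# predictors, gauge precipitation is the target. Data below are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 2000
X = np.column_stack([
    rng.gamma(2.0, 2.0, n),      # imerg_nrt: original early-run estimate
    rng.uniform(1, 12, n),       # cloud_top_height_km
    rng.uniform(190, 300, n),    # brightness_temperature_K
    rng.uniform(0, 40, n),       # atmospheric_reflectivity_dBZ
    rng.uniform(270, 310, n),    # surface_temperature_K
])
y = 0.7 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0, 1, n)  # synthetic "gauge"

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

rmse_raw = mean_squared_error(y_te, X_te[:, 0]) ** 0.5   # original estimate
rmse_rf = mean_squared_error(y_te, rf.predict(X_te)) ** 0.5
print(f"RMSE raw: {rmse_raw:.2f}, corrected: {rmse_rf:.2f}")
print("feature importances:", rf.feature_importances_.round(3))
```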


Author(s):  
Sachin Kumar
Karan Veer

Aims: The objective of this research is to predict COVID-19 cases in India based on machine learning approaches. Background: COVID-19, a respiratory disease caused by a member of the coronavirus family, led to a worldwide pandemic in 2020. The virus was first detected in Wuhan, China, in December 2019 and took less than three months to spread across the globe. Objective: In this paper, we propose a regression model based on the support vector machine (SVM) to forecast the number of deaths, the number of recovered cases, and the total confirmed cases for the next 30 days. Method: For prediction, the data were collected from GitHub and India's Ministry of Health and Family Welfare, covering March 14, 2020 to December 3, 2020. The model was implemented in Python 3.6 under Anaconda to forecast coronavirus trends until September 21, 2020. The proposed methodology predicts values using an SVM-based regression model with polynomial, linear, and RBF kernels. The dataset was divided into training and test sets with test sizes of 40% and 60% and verified against real data. Model performance was evaluated using mean squared error, mean absolute error, and percentage accuracy. Results and Conclusion: The results show that the polynomial model achieved an accuracy score above 95%, the linear model above 90%, and the RBF model above 85% in predicting cumulative deaths, confirmed cases, and recovered cases.
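A minimal sketch of the described setup using scikit-learn's SVR with the three kernels on a synthetic cumulative-count series (the real data, dates and tuning are not reproduced here):

```python
# A minimal sketch: SVM regression of a day index against cumulative counts,
# comparing polynomial, linear and RBF kernels on a held-out test split.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

days = np.arange(1, 266).reshape(-1, 1)   # ~265 days, Mar 14 to Dec 3, 2020
cases = 1000 * days.ravel() ** 1.5 \
        + np.random.default_rng(1).normal(0, 5e4, 265)  # synthetic series

# Chronological split, so the test set is the most recent stretch:
X_tr, X_te, y_tr, y_te = train_test_split(days, cases, test_size=0.4,
                                          shuffle=False)
for kernel in ("poly", "linear", "rbf"):
    model = SVR(kernel=kernel, C=1e6, degree=3).fit(X_tr, y_tr)
    pred = model.predict(X_te)
    print(kernel,
          "MAE=%.0f" % mean_absolute_error(y_te, pred),
          "RMSE=%.0f" % (mean_squared_error(y_te, pred) ** 0.5))

# 30-day-ahead forecast from the last observed day:
future = np.arange(266, 296).reshape(-1, 1)
print(SVR(kernel="poly", C=1e6).fit(days, cases).predict(future)[:5])
```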


Author(s):  
Mamata Rath
Sushruta Mishra

Machine learning is a field that grew out of artificial intelligence (AI). Applying AI, we wanted to build better and more intelligent machines. However, apart from a few simple tasks, such as finding the shortest path between two points, it is difficult to program solutions to more complex and constantly evolving challenges. It was recognized that the only way to accomplish this was to let machines learn on their own, much like a child does. Machine learning was therefore developed as a new capability for computers, and today it is present in so many areas of technology that we do not even notice it while using it. This chapter explores advanced-level security in network and real-time applications using machine learning.


2017
Author(s):
Sebastian Deorowicz
Agnieszka Debudaj-Grabysz
Adam Gudyś
Szymon Grabowski

Abstract Motivation Mapping reads to a reference genome is often the first step in a sequencing data analysis pipeline. Mistakes made at this computationally challenging stage cannot be recovered easily. Results We present Whisper, an accurate and high-performance mapping tool based on the idea of sorting reads and then mapping them against suffix arrays for the reference genome and its reverse complement. Employing task and data parallelism as well as storing temporary data on disk results in superior time efficiency at reasonable memory requirements. Whisper excels at large NGS read collections, in particular Illumina reads with typical WGS coverage. Experiments with real data indicate that our solution works in about 15% of the time needed by the well-known Bowtie2 and BWA-MEM tools at comparable accuracy (validated in a variant calling pipeline). Availability Whisper is available for free from https://github.com/refresh-bio/Whisper or http://sun.aei.polsl.pl/REFRESH/Whisper/. Contact [email protected] Supplementary information Supplementary data are available at the publisher's Web site.
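A toy sketch of the mapping idea named above, not Whisper's implementation: build a suffix array over the reference concatenated with its reverse complement, sort the reads, and locate each read by binary search over the suffixes:

```python
# A toy exact-matching sketch; real tools use linear-time suffix array
# construction, handle mismatches, and scale far beyond this demo.
COMP = str.maketrans("ACGT", "TGCA")

def suffix_array(text):
    # O(n^2 log n) naive construction, fine for a demo string.
    return sorted(range(len(text)), key=lambda i: text[i:])

def map_read(read, text, sa):
    """Positions in `text` where `read` occurs, via binary search on suffixes."""
    lo, hi = 0, len(sa)
    while lo < hi:  # find leftmost suffix whose prefix is >= read
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + len(read)] < read:
            lo = mid + 1
        else:
            hi = mid
    hits = []
    while lo < len(sa) and text[sa[lo]:sa[lo] + len(read)] == read:
        hits.append(sa[lo])
        lo += 1
    return hits

reference = "ACGTACGTTAGC"
# Index the reference and its reverse complement in one string:
both = reference + "$" + reference.translate(COMP)[::-1]
sa = suffix_array(both)
for read in sorted(["ACGT", "TAGC", "GGGG"]):  # reads sorted, as in Whisper
    print(read, "->", map_read(read, both, sa))
```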


2017
Vol 3 (10)
Author(s):
Anjum Khan
Anjana Nigam

As network-based applications grow rapidly, network security mechanisms require more attention to improve speed and precision. Ever-evolving intrusion types pose a significant threat to network security. Although various network security tools have been developed, the rapid growth of intrusive activities remains a major issue. Intrusion detection systems (IDSs) are used to detect intrusive activities on the network. Research has shown that applying machine learning techniques to intrusion detection can achieve high detection rates. Machine learning and classification algorithms help design intrusion detection models that can classify network traffic as intrusive or normal. This paper discusses some commonly used machine learning techniques in intrusion detection systems and also reviews some of the existing machine learning IDSs proposed by researchers at different times. An experimental analysis is performed to demonstrate the performance of some existing techniques, so that they can be used further in developing a hybrid classifier for real data packet classification. The result analysis shows that KNN, RF and SVM perform best on the NSL-KDD dataset.
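A minimal sketch of such a comparison in scikit-learn, with synthetic data standing in for NSL-KDD (whose files must be obtained separately); KNN, RF and SVM are scored on the same train/test split:

```python
# A minimal sketch comparing the three classifiers named in the abstract.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# 41 features, mirroring the NSL-KDD record layout; labels are synthetic.
X, y = make_classification(n_samples=3000, n_features=41, n_informative=12,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(5)),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, "accuracy: %.3f" % accuracy_score(y_te, model.predict(X_te)))
```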


Author(s):  
Zhixiang Chen
Binhai Zhu
Xiannong Meng

In this chapter, machine-learning approaches to real-time intelligent Web search are discussed. The goal is to build an intelligent Web search system that can find the user’s desired information with as little relevance feedback from the user as possible. The system can achieve a significant search precision increase with a small number of iterations of user relevance feedback. A new machine-learning algorithm is designed as the core of the intelligent search component. This algorithm is applied to three different search engines with different emphases. This chapter presents the algorithm, the architectures, and the performances of these search engines. Future research issues regarding real-time intelligent Web search are also discussed.
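The chapter's own algorithm is not reproduced here; as a stand-in, the sketch below shows classic Rocchio relevance feedback, the textbook form of the loop the chapter describes: the query vector is nudged toward documents the user marks relevant and away from those marked non-relevant, then the collection is re-ranked:

```python
# A sketch of Rocchio relevance feedback over TF-IDF-style vectors; the
# alpha/beta/gamma weights and random "documents" below are illustrative.
import numpy as np

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Update a query vector from one round of user feedback."""
    q = alpha * query
    if len(relevant):
        q += beta * np.mean(relevant, axis=0)
    if len(nonrelevant):
        q -= gamma * np.mean(nonrelevant, axis=0)
    return np.clip(q, 0, None)  # negative term weights are usually dropped

def rank(query, docs):
    # Cosine similarity ranking of document vectors against the query.
    sims = docs @ query / (np.linalg.norm(docs, axis=1)
                           * np.linalg.norm(query) + 1e-12)
    return np.argsort(-sims)

rng = np.random.default_rng(0)
docs = rng.random((20, 50))   # 20 documents, 50 terms
query = rng.random(50)
order = rank(query, docs)
# Suppose the user marks the top result relevant, the second non-relevant:
query = rocchio(query, docs[order[:1]], docs[order[1:2]])
print("new top results:", rank(query, docs)[:5])
```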

