DeePaC: predicting pathogenic potential of novel DNA with reverse-complement neural networks

Bioinformatics ◽

10.1093/bioinformatics/btz541 ◽

2019 ◽

Cited By ~ 1

Author(s):

Jakub M Bartoszewicz ◽

Anja Seidel ◽

Robert Rentzsch ◽

Bernhard Y Renard

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Learning ◽

Pathogen Detection ◽

State Of The Art ◽

Supplementary Information ◽

Learning Approach ◽

Pathogenic Potential ◽

Reverse Complement ◽

Recent Developments

Abstract. Motivation: We expect novel pathogens to arise due to their fast-paced evolution, and new species to be discovered thanks to advances in DNA sequencing and metagenomics. Moreover, recent developments in synthetic biology raise concerns that some strains of bacteria could be modified for malicious purposes. Traditional approaches to open-view pathogen detection depend on databases of known organisms, which limits their performance on unknown, unrecognized and unmapped sequences. In contrast, machine learning methods can infer pathogenic phenotypes from single NGS reads, even though the biological context is unavailable. Results: We present DeePaC, a Deep Learning Approach to Pathogenicity Classification. It includes a flexible framework allowing easy evaluation of neural architectures with reverse-complement parameter sharing. We show that convolutional neural networks and LSTMs outperform the state-of-the-art based on both sequence homology and machine learning. Combining a deep learning approach with integrating the predictions for both mates in a read pair results in cutting the error rate almost in half in comparison to the previous state-of-the-art. Availability and implementation: The code and the models are available at: https://gitlab.com/rki_bioinformatics/DeePaC. Supplementary information: Supplementary data are available at Bioinformatics online.
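The read-pair integration mentioned in this abstract can be pictured with a small sketch. The abstract does not state how the two per-mate predictions are combined, so the arithmetic mean, the 0.5 decision threshold and the function name `classify_read_pair` are all assumptions made for illustration only.

```python
def classify_read_pair(score_mate1, score_mate2, threshold=0.5):
    """Combine two per-read pathogenicity scores into one pair-level call.

    Assumption: the scores are probabilities in [0, 1] and are combined
    by a plain average; the source abstract does not specify the rule.
    """
    mean_score = (score_mate1 + score_mate2) / 2.0
    # The pair is called "pathogenic" when the averaged score exceeds the threshold.
    return mean_score, mean_score > threshold


# Hypothetical scores: one confident mate can outweigh an uncertain one.
pair_score, is_pathogenic = classify_read_pair(0.9, 0.3)
print(pair_score, is_pathogenic)
```

Averaging lets a confident prediction on one mate compensate for an ambiguous prediction on the other, which is one plausible reason a pair-level call could be more accurate than a single-read call.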


DeePaC: Predicting pathogenic potential of novel DNA with a universal framework for reverse-complement neural networks

10.1101/535286 ◽

2019 ◽

Cited By ~ 1

Author(s):

Jakub M. Bartoszewicz ◽

Anja Seidel ◽

Robert Rentzsch ◽

Bernhard Y. Renard

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Pathogen Detection ◽

State Of The Art ◽

Model Performance ◽

Pathogenic Potential ◽

Reverse Complement ◽

Recent Developments ◽

Undesirable Property ◽

Simple Character

Abstract. Motivation: We expect novel pathogens to arise due to their fast-paced evolution, and new species to be discovered thanks to advances in DNA sequencing and metagenomics. What is more, recent developments in synthetic biology raise concerns that some strains of bacteria could be modified for malicious purposes. Traditional approaches to open-view pathogen detection depend on databases of known organisms, limiting their performance on unknown, unrecognized, and unmapped sequences. In contrast, machine learning methods can infer pathogenic phenotypes from single NGS reads even though the biological context is unavailable. However, modern neural architectures treat DNA as a simple character string and may predict conflicting labels for a given sequence and its reverse-complement. This undesirable property may impact model performance. Results: We present DeePaC, a Deep Learning Approach to Pathogenicity Classification. It includes a universal, extensible framework for neural architectures ensuring identical predictions for any given DNA sequence and its reverse-complement. We implement reverse-complement convolutional neural networks and LSTMs, which outperform the state-of-the-art methods based on both sequence homology and machine learning. Combining a reverse-complement architecture with integrating the predictions for both mates in a read pair results in cutting the error rate almost in half in comparison to the previous state-of-the-art. Availability: The code and the models are available at: https://gitlab.com/rki_bioinformatics/DeePaC
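The property described here, identical predictions for a sequence and its reverse-complement, can be demonstrated with a minimal sketch. It is not the DeePaC architecture: the abstract describes invariance built into the network, whereas the sketch below uses a simpler substitute, averaging the output of an arbitrary scorer over both strands. The names `rc_invariant` and `toy_score` are hypothetical.

```python
# Translation table mapping each base to its complement (other symbols, e.g. N, are left as-is).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse-complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def rc_invariant(score_fn):
    """Wrap a scorer so that a sequence and its reverse-complement score identically.

    Assumption: post-hoc averaging over both strands stands in for the
    architectural approach named in the abstract.
    """
    def wrapped(seq):
        return 0.5 * (score_fn(seq) + score_fn(reverse_complement(seq)))
    return wrapped


def toy_score(seq):
    """Hypothetical strand-sensitive scorer: the fraction of 'A' bases."""
    return seq.count("A") / len(seq)


symmetric_score = rc_invariant(toy_score)
read = "AACG"
# The raw scorer disagrees between strands; the wrapped scorer does not.
print(toy_score(read), toy_score(reverse_complement(read)))
print(symmetric_score(read), symmetric_score(reverse_complement(read)))
```

Because taking the reverse-complement twice returns the original sequence, the wrapped scorer sums the same two values for either strand, so the two outputs are equal by construction.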


Virtual Screening Meets Deep Learning

Current Computer-Aided Drug Design ◽

10.2174/1573409914666181018141602 ◽

2018 ◽

Vol 15 (1) ◽

pp. 6-28 ◽

Cited By ~ 6

Author(s):

Javier Pérez-Sianes ◽

Horacio Pérez-Sánchez ◽

Fernando Díaz

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Deep Learning ◽

Virtual Screening ◽

Great Increase ◽

New Drugs ◽

Learning Approach ◽

Screening Strategies ◽

Computer Aided ◽

Recent Developments

Background: Automated compound testing is currently the de facto standard method for drug screening, but it has not brought the expected large increase in the number of new drugs. Computer-aided compound search, known as virtual screening, has shown benefits in this field as a complement or even an alternative to robotic drug discovery. There are different methods and approaches to address this problem, and most of them fall under one of the main screening strategies. Machine learning, however, has established itself as a virtual screening methodology in its own right, and it may grow in popularity with the new trends in artificial intelligence. Objective: This paper will attempt to provide a comprehensive and structured review that collects the most important proposals made so far in this area of research. Particular attention is given to some recent developments in the machine learning field: the deep learning approach, which is pointed out as a future key player in the virtual screening landscape.


Representing Deep Neural Networks Latent Space Geometries with Graphs

Algorithms ◽

10.3390/a14020039 ◽

2021 ◽

Vol 14 (2) ◽

pp. 39

Author(s):

Carlos Lassance ◽

Vincent Gripon ◽

Antonio Ortega

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Learning ◽

Objective Function ◽

Learning Process ◽

Deep Neural Networks ◽

State Of The Art ◽

The Core ◽

Learning Tasks ◽

Latent Space

Deep Learning (DL) has attracted a lot of attention for its ability to reach state-of-the-art performance in many machine learning tasks. The core principle of DL methods consists of training composite architectures in an end-to-end fashion, where inputs are associated with outputs trained to optimize an objective function. Because of their compositional nature, DL architectures naturally exhibit several intermediate representations of the inputs, which belong to so-called latent spaces. When treated individually, these intermediate representations are most of the time unconstrained during the learning process, as it is unclear which properties should be favored. However, when processing a batch of inputs concurrently, the corresponding set of intermediate representations exhibits relations (what we call a geometry) on which desired properties can be sought. In this work, we show that it is possible to introduce constraints on these latent geometries to address various problems. In more detail, we propose to represent geometries by constructing similarity graphs from the intermediate representations obtained when processing a batch of inputs. By constraining these Latent Geometry Graphs (LGGs), we address the following three problems: (i) reproducing the behavior of a teacher architecture is achieved by mimicking its geometry, (ii) designing efficient embeddings for classification is achieved by targeting specific geometries, and (iii) robustness to deviations on inputs is achieved via enforcing smooth variation of geometry between consecutive latent spaces. Using standard vision benchmarks, we demonstrate the ability of the proposed geometry-based methods to solve the considered problems.
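The construction of a similarity graph from a batch of intermediate representations can be sketched as follows. The abstract does not say which similarity measure or sparsification the authors use, so cosine similarity and a k-nearest-neighbour rule are assumptions, and the function name `latent_geometry_graph` is hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def latent_geometry_graph(features, k=1):
    """Build a symmetric weighted adjacency matrix over one batch.

    features: one latent vector per input in the batch.
    Assumption: each node keeps edges to its k most similar neighbours,
    and an edge kept by either endpoint is stored in both directions.
    """
    n = len(features)
    sim = [[0.0 if i == j else cosine(features[i], features[j])
            for j in range(n)] for i in range(n)]
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        nearest = sorted(others, key=lambda j: sim[i][j], reverse=True)[:k]
        for j in nearest:
            adj[i][j] = sim[i][j]
            adj[j][i] = sim[i][j]
    return adj


# Hypothetical 2-D latent vectors for a batch of three inputs.
batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
for row in latent_geometry_graph(batch, k=1):
    print(row)
```

Computing such a graph at two layers, or for a teacher and a student network on the same batch, gives two matrices of the same shape that can be compared directly, which is the kind of constraint the abstract refers to.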


Single-Cell Phenotype Classification Using Deep Convolutional Neural Networks

Journal of Biomolecular Screening ◽

10.1177/1087057116631284 ◽

2016 ◽

Vol 21 (9) ◽

pp. 998-1003 ◽

Cited By ~ 42

Author(s):

Oliver Dürr ◽

Beate Sick

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Learning ◽

Single Cell ◽

Convolutional Neural Networks ◽

State Of The Art ◽

Misclassification Rate ◽

Support Vector ◽

Learning Methods ◽

Phenotype Classification

Deep learning methods are currently outperforming traditional state-of-the-art computer vision algorithms in diverse applications and recently even surpassed human performance in object recognition. Here we demonstrate the potential of deep learning methods for high-content screening–based phenotype classification. We trained a deep learning classifier in the form of convolutional neural networks with approximately 40,000 publicly available single-cell images from samples treated with compounds from four classes known to lead to different phenotypes. The input data consisted of multichannel images. The construction of appropriate feature definitions was part of the training and carried out by the convolutional network, without the need for expert knowledge or handcrafted features. We compare our results against the recent state-of-the-art pipeline in which predefined features are extracted from each cell using specialized software and then fed into various machine learning algorithms (support vector machine, Fisher linear discriminant, random forest) for classification. The performance of all classification approaches is evaluated on an untouched test image set with known phenotype classes. Compared to the best reference machine learning algorithm, the misclassification rate is reduced from 8.9% to 6.6%.


State of the Art in Computational Bioacoustics and Machine Learning: How far have we come?

Biodiversity Information Science and Standards ◽

10.3897/biss.3.37227 ◽

2019 ◽

Vol 3 ◽

Author(s):

Dan Stowell

Keyword(s):

Machine Learning ◽

Big Data ◽

Deep Learning ◽

State Of The Art ◽

Learning Networks ◽

Recent Developments ◽

Audio Data ◽

Ecological Applications ◽

Near Future ◽

Statistical Ecology

Terrestrial bioacoustics, like many other domains, has recently witnessed some transformative results from the application of deep learning and big data (Stowell 2017, Mac Aodha et al. 2018, Fairbrass et al. 2018, Mercado III and Sturdy 2017). Generalising over specific projects, which bioacoustic tasks can we consider "solved"? What can we expect in the near future, and what remains hard to do? What does a bioacoustician need to understand about deep learning? This contribution will address these questions, giving the audience a concise summary of recent developments and ways forward. It builds on recent projects and evaluation campaigns led by the author (Stowell et al. 2015, Stowell et al. 2018), as well as broader developments in signal processing, machine learning and bioacoustic applications of these. We will discuss which types of deep learning networks are appropriate for audio data, how to address zoological/ecological applications which often have few available data, and issues in integrating deep learning predictions with existing workflows in statistical ecology.


Assessment of bilateral knee pain from MR imaging using deep neural networks

10.1101/463497 ◽

2018 ◽

Author(s):

Gary H. Chang ◽

David T. Felson ◽

Shangran Qiu ◽

Terence D. Capellini ◽

Vijaya B. Kolachalama

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Neural Networks ◽

Deep Learning ◽

Knee Pain ◽

Deep Neural Networks ◽

Learning Approach ◽

Mr Images ◽

Bilateral Knee ◽

Machine Learning Approach

Abstract. Background and objective: It remains difficult to characterize pain in knee joints with osteoarthritis solely by radiographic findings. We sought to understand how advanced machine learning methods such as deep neural networks can be used to analyze raw MRI scans and predict bilateral knee pain, independent of other risk factors. Methods: We developed a deep learning framework to associate information from MRI slices taken from the left and right knees of subjects from the Osteoarthritis Initiative with bilateral knee pain. Model training was performed by first extracting features from two-dimensional (2D) sagittal intermediate-weighted turbo spin echo slices. The features extracted from all the 2D slices were subsequently combined in a fused deep neural network and associated directly with the output of interest, framed as a binary classification problem. Results: The deep learning model predicted bilateral knee pain on test data with 70.1% mean accuracy, 51.3% mean sensitivity, and 81.6% mean specificity. Systematic analysis of the predictions on the test data revealed that the model performance was consistent across subjects of different Kellgren-Lawrence grades. Conclusion: The study demonstrates a proof of principle that a machine learning approach can be applied to associate MR images with bilateral knee pain. Significance and innovation: Knee pain is typically considered an early indicator of osteoarthritis (OA) risk. Emerging evidence suggests that MRI changes are linked to pre-clinical OA, thus underscoring the need for building image-based models to predict knee pain. We leveraged a state-of-the-art machine learning approach to associate raw MR images with bilateral knee pain, independent of other risk factors.
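The three reported metrics follow from the standard confusion-matrix definitions, sketched below for readers less familiar with them. The counts in the usage example are hypothetical and are not taken from the study; they only show how moderate sensitivity and high specificity can coexist with a mid-range accuracy.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn: subjects with pain predicted correctly / missed.
    tn/fp: subjects without pain predicted correctly / flagged wrongly.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }


# Hypothetical counts, for illustration only.
print(binary_metrics(tp=40, fp=10, tn=90, fn=60))
```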


A deep learning framework for real-time detection of novel pathogens during sequencing

10.1101/2021.01.26.428301 ◽

2021 ◽

Author(s):

Jakub M. Bartoszewicz ◽

Ulrich Genske ◽

Bernhard Y. Renard

Keyword(s):

Machine Learning ◽

Real Time ◽

Pathogen Detection ◽

Real Data ◽

True Positive Rate ◽

Turnaround Time ◽

Supplementary Information ◽

Learning Approaches ◽

Pathogenic Potential ◽

Link Type

Abstract. Motivation: Novel pathogens evolve quickly and may emerge rapidly, causing dangerous outbreaks or even global pandemics. Next-generation sequencing is the state of the art in open-view pathogen detection, and one of the few methods available at the earliest stages of an epidemic, even when the biological threat is unknown. Analyzing the samples as the sequencer is running can greatly reduce the turnaround time, but existing tools rely on close matches to lists of known pathogens and perform poorly on novel species. Machine learning approaches can predict if single reads originate from more distant, unknown pathogens, but require relatively long input sequences and processed data from a finished sequencing run. Results: We present DeePaC-Live, a Python package for real-time pathogenic potential prediction directly from incomplete sequencing reads. We train deep neural networks to classify Illumina and Nanopore reads and integrate our models with HiLive2, a real-time Illumina mapper. DeePaC-Live outperforms alternatives based on machine learning and sequence alignment on simulated and real data, including SARS-CoV-2 sequencing runs. After just 50 Illumina cycles, we increase the true positive rate 80-fold compared to the live-mapping approach. The first 250 bp of Nanopore reads, corresponding to 0.5 s of sequencing time, are enough to yield predictions more accurate than mapping the finished long reads. Our approach could also be used for screening synthetic sequences against biosecurity threats. Availability: The code is available at: https://gitlab.com/dacs-hpi/deepac-live and https://gitlab.com/dacs-hpi/deepac. The package can be installed with Bioconda, Docker or [email protected], [email protected]. Supplementary information: Supplementary data are available online.


Towards the Automatic Mathematician

Automated Deduction – CADE 28 - Lecture Notes in Computer Science ◽

10.1007/978-3-030-79876-5_2 ◽

2021 ◽

pp. 25-37

Author(s):

Markus N. Rabe ◽

Christian Szegedy

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Learning ◽

Mathematical Reasoning ◽

The Road ◽

Fine Grained ◽

Recent Developments ◽

On The Road

Abstract: Over recent years, deep learning has found successful applications in mathematical reasoning. Today, we can predict fine-grained proof steps, relevant premises, and even useful conjectures using neural networks. This extended abstract summarizes recent developments of machine learning in mathematical reasoning and the vision of the N2Formal group at Google Research to create an automatic mathematician. The second part discusses the key challenges on the road ahead.


Emotional quantification of soundscapes by learning between samples

Multimedia Tools and Applications ◽

10.1007/s11042-020-09430-3 ◽

2020 ◽

Vol 79 (41-42) ◽

pp. 30387-30395

Author(s):

Stavros Ntalampiras

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Learning ◽

State Of The Art ◽

Emotional Responses ◽

Machine Learning Techniques ◽

Specific Domain ◽

Learning Framework ◽

Learning Techniques ◽

Wide Range

Abstract: Predicting the emotional responses of humans to soundscapes is a relatively recent field of research with a wide range of promising applications. This work presents the design of two convolutional neural networks, namely ArNet and ValNet, responsible for quantifying the arousal and the valence evoked by soundscapes, respectively. We build on the knowledge acquired from applying traditional machine learning techniques to this specific domain and design a suitable deep learning framework. Moreover, we propose the use of artificially created mixed soundscapes, whose distributions lie between those of the available samples, a process that increases the variance of the dataset and leads to significantly better performance. The reported results outperform the state of the art on a soundscape dataset following Schafer’s standardized categorization, which considers both the sound’s identity and the respective listening context.
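One way to picture a mixed soundscape lying "between" two available samples is a weighted sum of two waveforms. The abstract does not give the mixing rule or say how the emotional label of a mixture is set, so the convex combination of both the signal and the label below is an assumption, and `mix_samples` is a hypothetical name.

```python
def mix_samples(wave_a, wave_b, label_a, label_b, weight=0.5):
    """Blend two equal-length waveforms and their scalar labels.

    Assumption: the mixture and its arousal/valence label are both
    convex combinations controlled by the same weight in [0, 1].
    """
    if len(wave_a) != len(wave_b):
        raise ValueError("waveforms must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    mixed_wave = [weight * a + (1.0 - weight) * b
                  for a, b in zip(wave_a, wave_b)]
    mixed_label = weight * label_a + (1.0 - weight) * label_b
    return mixed_wave, mixed_label


# Hypothetical two-sample waveforms with arousal labels 0.2 and 0.8.
wave, label = mix_samples([1.0, 0.0], [0.0, 1.0], 0.2, 0.8, weight=0.5)
print(wave, label)
```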


Levenshtein Augmentation Improves Performance of SMILES Based Deep-Learning Synthesis Prediction

10.26434/chemrxiv.12562121 ◽

2020 ◽

Author(s):

Dean Sumner ◽

Jiazhen He ◽

Amol Thakkar ◽

Ola Engkvist ◽

Esben Jannik Bjerrum

Keyword(s):

Neural Networks ◽

Pattern Recognition ◽

Deep Learning ◽

Recurrent Neural Networks ◽

Data Augmentation ◽

State Of The Art ◽

Sequence Similarity ◽

Learning Models ◽

Underlying Network

SMILES randomization, a form of data augmentation, has previously been shown to increase the performance of deep learning models compared to non-augmented baselines. Here, we propose a novel data augmentation method we call “Levenshtein augmentation”, which considers local SMILES sub-sequence similarity between reactants and their respective products when creating training pairs. The performance of Levenshtein augmentation was tested using two state-of-the-art models: transformer and sequence-to-sequence based recurrent neural networks with attention. Levenshtein augmentation demonstrated increased performance over both non-augmented data and data augmented with conventional SMILES randomization when used for training of baseline models. Furthermore, Levenshtein augmentation seemingly results in what we define as attentional gain: an enhancement in the pattern recognition capabilities of the underlying network to molecular motifs.
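The string similarity underlying this idea can be sketched with the classic Levenshtein (edit) distance. The abstract does not describe the actual pairing procedure, so the selection rule below, picking the reactant SMILES variant with the smallest whole-string edit distance to the product, is an assumption. The candidate strings are hand-written equivalent SMILES for ethanol, not the output of any cheminformatics toolkit, and `closest_variant` is a hypothetical name.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def closest_variant(reactant_variants, product):
    """Pick the reactant SMILES variant most similar to the product string.

    Assumption: whole-string distance stands in for the local
    sub-sequence similarity mentioned in the abstract.
    """
    return min(reactant_variants, key=lambda s: levenshtein(s, product))


# Three equivalent ways of writing ethanol, paired with acetaldehyde.
variants = ["OCC", "C(O)C", "CCO"]
print(closest_variant(variants, "CC=O"))
```

Choosing the variant that is textually closest to the product keeps the unchanged part of the molecule aligned between input and output strings, which is one plausible reading of why such pairs could be easier for a sequence model to learn.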
