PEDLA: predicting enhancers with a deep learning-based algorithmic framework

Abstract: Transcriptional enhancers are non-coding segments of DNA that play a central role in the spatiotemporal regulation of gene expression programs. However, systematically and precisely predicting enhancers remains a major challenge. Although existing methods have achieved some success in enhancer prediction, they still suffer from many issues. We developed a deep learning-based algorithmic framework named PEDLA (https://github.com/wenjiegroup/PEDLA), which can directly learn an enhancer predictor from massively heterogeneous data and generalize in ways that are mostly consistent across various cell types/tissues. We first trained PEDLA with 1,114-dimensional heterogeneous features in H1 cells, and we demonstrated that the framework integrates diverse heterogeneous features and gives state-of-the-art performance relative to five existing methods for enhancer prediction. We further extended PEDLA to iteratively learn from 22 training cell types/tissues. Our results showed that PEDLA performed consistently well on both the training and independent test sets. On average, PEDLA achieved 95.0% accuracy and a 96.8% geometric mean (GM) across 22 training cell types/tissues, as well as 95.7% accuracy and a 96.8% GM across 20 independent test cell types/tissues. Together, our work illustrates the power of harnessing state-of-the-art deep learning techniques to consistently identify regulatory elements at a genome-wide scale from massively heterogeneous data across diverse cell types/tissues.
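The abstract above reports accuracy and geometric mean (GM) without giving formulas. The following is a minimal sketch, assuming GM is the square root of sensitivity times specificity, a common definition for imbalanced classification that the abstract itself does not confirm. The function name and the confusion-matrix counts are illustrative only.

```python
import math

def accuracy_and_gm(tp, fp, tn, fn):
    """Return (accuracy, GM) from confusion-matrix counts.

    Assumes GM = sqrt(sensitivity * specificity) and that each
    class has at least one sample (no zero-division guard).
    """
    sensitivity = tp / (tp + fn)   # fraction of true enhancers recovered
    specificity = tn / (tn + fp)   # fraction of non-enhancers rejected
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    gm = math.sqrt(sensitivity * specificity)
    return accuracy, gm

# Toy, made-up counts with a 1:10 class imbalance.
acc, gm = accuracy_and_gm(tp=90, fp=50, tn=950, fn=10)
```

Unlike accuracy, GM drops sharply when either class is predicted poorly, which is presumably why it is reported alongside accuracy for imbalanced enhancer data.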


Classification of Hyperspectral Image Based on Double-Branch Dual-Attention Mechanism Network

Remote Sensing ◽

10.3390/rs12030582 ◽

2020 ◽

Vol 12 (3) ◽

pp. 582 ◽

Cited By ~ 4

Author(s):

Rui Li ◽

Shunyi Zheng ◽

Chenxi Duan ◽

Yang Yang ◽

Xiqi Wang

Keyword(s):

Deep Learning ◽

Hyperspectral Image ◽

State Of The Art ◽

Attention Mechanism ◽

Superior Performance ◽

Feature Maps ◽

Spatial Features ◽

Training Samples ◽

Series Of Experiments

In recent years, researchers have paid increasing attention to hyperspectral image (HSI) classification using deep learning methods. To improve accuracy and reduce the number of training samples required, we propose a double-branch dual-attention mechanism network (DBDA) for HSI classification. Two branches are designed in DBDA to capture the abundant spectral and spatial features contained in HSI. Furthermore, a channel attention block and a spatial attention block are applied to these two branches respectively, which enables DBDA to refine and optimize the extracted feature maps. A series of experiments on four hyperspectral datasets shows that the proposed framework outperforms the state-of-the-art algorithm, especially when training samples are severely limited.


SeqEnhDL: sequence-based classification of cell type-specific enhancers using deep learning models

10.21203/rs.3.rs-94396/v1 ◽

2020 ◽

Author(s):

Yupeng Wang ◽

Rosario Jaime-Lara ◽

Abhrarup Roy ◽

Ying Sun ◽

Xinyue Liu ◽

...

Keyword(s):

Neural Network ◽

Deep Learning ◽

Cell Types ◽

Regulatory Elements ◽

Learning Models ◽

Cell Type ◽

Coding Sequences ◽

Sequence Features ◽

A Genome ◽

Cell Type Specific

Objective: Computational identification of cell type-specific regulatory elements on a genome-wide scale is very challenging. Results: We propose SeqEnhDL, a deep learning framework for classifying cell type-specific enhancers based on sequence features. DNA sequences of “strong enhancer” chromatin states in nine cell types from the ENCODE project were retrieved to build and test enhancer classifiers. For any DNA sequence, sequential k-mer (k=5, 7, 9 and 11) fold changes relative to randomly selected non-coding sequences were used as features for deep learning models. Three deep learning models were implemented, including multi-layer perceptron (MLP), Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN). All models in SeqEnhDL outperform state-of-the-art enhancer classifiers including gkm-SVM and DanQ, with regard to distinguishing cell type-specific enhancers from randomly selected non-coding sequences. Moreover, SeqEnhDL is able to directly discriminate enhancers from different cell types, which has not been achieved by other enhancer classifiers. Our analysis suggests that both enhancers and their tissue-specificity can be accurately identified according to their sequence features. SeqEnhDL is publicly available at https://github.com/wyp1125/SeqEnhDL.
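The k-mer fold-change features described above can be sketched as follows. This is an illustration only, not the SeqEnhDL implementation: the pseudocount smoothing, the function names, and the toy sequences are all assumptions, and a small k is used so the example stays readable.

```python
from collections import Counter

def kmer_frequencies(sequences, k, pseudocount=1.0):
    """Build a smoothed k-mer frequency lookup from a set of sequences.

    The pseudocount (an assumption) keeps unseen k-mers from producing
    zero frequencies. Ambiguous bases such as N are not treated specially.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    vocab = 4 ** k  # number of possible k-mers over A, C, G, T

    def freq(kmer):
        return (counts[kmer] + pseudocount) / (total + pseudocount * vocab)

    return freq

def sequential_fold_changes(seq, k, enhancer_freq, background_freq):
    """One fold-change value per k-mer position along the sequence."""
    seq = seq.upper()
    return [
        enhancer_freq(seq[i:i + k]) / background_freq(seq[i:i + k])
        for i in range(len(seq) - k + 1)
    ]

# Toy data (made up); the abstract uses k = 5, 7, 9 and 11.
enhancers = ["ACGTACGT", "ACGTTTGA"]
background = ["TTTTTTTT", "GGGGCCCC"]
k = 3
features = sequential_fold_changes(
    "ACGTAC", k,
    kmer_frequencies(enhancers, k),
    kmer_frequencies(background, k),
)
```

Values above 1 mark k-mers that are more frequent in the enhancer set than in the background. The resulting per-position vector is the kind of input an MLP, CNN or RNN could consume.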


GraphProt2: A novel deep learning-based method for predicting binding sites of RNA-binding proteins

10.1101/850024 ◽

2019 ◽

Cited By ~ 2

Author(s):

Michael Uhl ◽

Van Dinh Tran ◽

Rolf Backofen

Keyword(s):

Gene Expression ◽

Neural Networks ◽

Deep Learning ◽

Binding Sites ◽

Binding Proteins ◽

Rna Binding ◽

Rna Binding Proteins ◽

State Of The Art ◽

Prediction Method ◽

Superior Performance

Abstract: CLIP-seq is the state-of-the-art technique to experimentally determine transcriptome-wide binding sites of RNA-binding proteins (RBPs). However, it relies on gene expression which can be highly variable between conditions, and thus cannot provide a complete picture of the RBP binding landscape. This necessitates the use of computational methods to predict missing binding sites. Here we present GraphProt2, a computational RBP binding site prediction method based on graph convolutional neural networks (GCN). In contrast to current CNN methods, GraphProt2 supports variable length input as well as the possibility to accurately predict nucleotide-wise binding profiles. We demonstrate its superior performance compared to GraphProt and a CNN-based method on single as well as combined CLIP-seq datasets.


Continuous Training and Deployment of Deep Learning Models

Datenbank-Spektrum ◽

10.1007/s13222-021-00386-8 ◽

2021 ◽

Author(s):

Ioannis Prapas ◽

Behrouz Derakhshan ◽

Alireza Rezaei Mahdiraji ◽

Volker Markl

Keyword(s):

Deep Learning ◽

Historical Data ◽

State Of The Art ◽

Streaming Data ◽

Superior Performance ◽

Learning Models ◽

Model Quality ◽

Continuous Training ◽

Training Time ◽

Machine Learning Methods

Abstract: Deep Learning (DL) has consistently surpassed other Machine Learning methods and achieved state-of-the-art performance in multiple cases. Several modern applications like financial and recommender systems require models that are constantly updated with fresh data. The prominent approach for keeping a DL model fresh is to trigger full retraining from scratch when enough new data are available. However, retraining large and complex DL models is time-consuming and compute-intensive. This makes full retraining costly, wasteful, and slow. In this paper, we present an approach to continuously train and deploy DL models. First, we enable continuous training through proactive training that combines samples of historical data with new streaming data. Second, we enable continuous deployment through gradient sparsification that allows us to send a small percentage of the model updates per training iteration. Our experimental results with LeNet5 on MNIST and modern DL models on CIFAR-10 show that proactive training keeps models fresh with comparable, if not superior, performance to full retraining at a fraction of the time. Combined with gradient sparsification, sparse proactive training enables very fast updates of a deployed model with arbitrarily large sparsity, reducing communication per iteration by up to four orders of magnitude, with minimal, if any, losses in model quality. Sparse training, however, comes at a price: it incurs overhead on the training that depends on the size of the model and increases the training time by factors ranging from 1.25 to 3 in our experiments. Arguably, this is a small price to pay for successfully enabling the continuous training and deployment of large DL models.
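The two ideas in this abstract, mixing sampled historical data with new streaming data and sending only a small part of each gradient, can be sketched as below. This is a hedged illustration, not the authors' system: top-k selection by magnitude is one common form of gradient sparsification and is assumed here, and all function names and values are hypothetical.

```python
import random

def proactive_batch(history, new_data, n_history, rng):
    """Combine a random sample of historical data with the new streaming data."""
    sampled = rng.sample(history, min(n_history, len(history)))
    return sampled + list(new_data)

def sparsify_top_k(gradient, keep_fraction):
    """Keep only the largest-magnitude gradient entries (assumed top-k scheme).

    Returns {index: value}, i.e. the small update that would be sent
    to the deployed model instead of the full gradient.
    """
    k = max(1, int(len(gradient) * keep_fraction))
    ranked = sorted(range(len(gradient)),
                    key=lambda i: abs(gradient[i]), reverse=True)
    return {i: gradient[i] for i in sorted(ranked[:k])}

def apply_sparse_update(weights, sparse_grad, lr):
    """Apply one SGD step using only the transmitted gradient entries."""
    updated = list(weights)
    for i, g in sparse_grad.items():
        updated[i] -= lr * g
    return updated

# Toy usage with made-up numbers.
batch = proactive_batch(list(range(100)), [100, 101], n_history=3,
                        rng=random.Random(0))
sparse = sparsify_top_k([0.1, -2.0, 0.05, 1.5, -0.3], keep_fraction=0.4)
deployed = apply_sparse_update([1.0] * 5, sparse, lr=0.1)
```

With `keep_fraction=0.4`, only two of the five gradient entries are transmitted. Smaller fractions cut communication further, at the cost of the training overhead the abstract mentions.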


DNC4mC-Deep: Identification and Analysis of DNA N4-Methylcytosine Sites Based on Different Encoding Schemes By Using Deep Learning

Cells ◽

10.3390/cells9081756 ◽

2020 ◽

Vol 9 (8) ◽

pp. 1756 ◽

Cited By ~ 4

Author(s):

Abdul Wahab ◽

Omid Mahmoudi ◽

Jeehong Kim ◽

Kil To Chong

Keyword(s):

Deep Learning ◽

Protein Interactions ◽

State Of The Art ◽

Critical Role ◽

Regulation Of Gene Expression ◽

The State ◽

Superior Performance ◽

Training Dataset ◽

Conformation Stability ◽

Deep Learning Model

N4-methylcytosine (4mC), one kind of DNA modification, plays a critical role in altering genetic performance, including protein interactions, conformation, and stability of DNA, as well as in the regulation of gene expression, cell development, and genomic imprinting. Several 4mC site identifiers have been proposed for various species. Herein, we propose a computational model, DNC4mC-Deep, which combines six encoding techniques with a deep learning model to predict 4mC sites in the genomes of F. vesca and R. chinensis and in a cross-species dataset. It showed superior performance in a 10-fold cross-validation test. DNC4mC-Deep obtained an MCC of 0.829 and 0.929 on the F. vesca and R. chinensis training datasets, respectively, and 0.814 on the cross-species dataset. The proposed method thus outperforms the state-of-the-art predictors by at least 0.284 and 0.265 on the F. vesca and R. chinensis training datasets, respectively. Furthermore, DNC4mC-Deep achieved an MCC of 0.635 and 0.565 on the F. vesca and R. chinensis independent datasets, respectively, and 0.562 on the cross-species dataset, which shows that it predicts 4mC sites better than the state-of-the-art predictor.
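The results above are reported as MCC, the Matthews correlation coefficient. As a reminder of what that number measures, here is a minimal sketch of the standard formula computed from confusion-matrix counts. The function name and the convention of returning 0.0 when the denominator is zero are choices made for this example, not details taken from the paper.

```python
import math

def matthews_corrcoef(tp, fp, tn, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Ranges from -1 (total disagreement) through 0 (no better than
    chance) to +1 (perfect prediction).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention assumed here for degenerate cases
    return (tp * tn - fp * fn) / denom

perfect = matthews_corrcoef(tp=50, fp=0, tn=50, fn=0)
chance = matthews_corrcoef(tp=25, fp=25, tn=25, fn=25)
inverted = matthews_corrcoef(tp=0, fp=50, tn=0, fn=50)
```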


A Deep Learning Framework for Malware Classification

International Journal of Digital Crime and Forensics ◽

10.4018/ijdcf.2020010105 ◽

2020 ◽

Vol 12 (1) ◽

pp. 90-108

Author(s):

Mahmoud Kalash ◽

Mrigank Rochan ◽

Noman Mohammed ◽

Neil Bruce ◽

Yang Wang ◽

...

Keyword(s):

Deep Learning ◽

State Of The Art ◽

Learning Algorithms ◽

Superior Performance ◽

Traditional Learning ◽

Security Threats ◽

Learning Approaches ◽

Learning Framework ◽

Malware Classification ◽

New Strategies

In this article, the authors propose a deep learning framework for malware classification. There has been a huge increase in the volume of malware in recent years which poses serious security threats to financial institutions, businesses, and individuals. In order to combat the proliferation of malware, new strategies are essential to quickly identify and classify malware samples. Nowadays, machine learning approaches are becoming popular for malware classification. However, most of these approaches are based on shallow learning algorithms (e.g. SVM). Recently, convolutional neural networks (CNNs), a deep learning approach, have shown superior performance compared to traditional learning algorithms, especially in tasks such as image classification. Inspired by this, the authors propose a CNN-based architecture to classify malware samples. They convert malware binaries to grayscale images and subsequently train a CNN for classification. Experiments on two challenging malware classification datasets, namely Malimg and Microsoft, demonstrate that their method outperforms competing state-of-the-art algorithms.
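The conversion of a binary into a grayscale image mentioned above can be sketched as follows, assuming each byte becomes one pixel intensity (0 to 255), rows have a fixed width, and the last row is zero-padded. The width, the padding, and the function name are assumptions for illustration; the abstract does not specify how the authors lay out the image.

```python
def bytes_to_grayscale(data, width):
    """Reshape raw bytes into rows of pixel intensities (0-255).

    The final row is padded with zeros so every row has `width`
    pixels (an assumed convention).
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])  # iterating bytes yields ints
        row.extend([0] * (width - len(row)))
        rows.append(row)
    return rows

# Toy five-byte "binary" laid out two pixels per row.
image = bytes_to_grayscale(bytes([0, 255, 16, 32, 64]), width=2)
```

The resulting 2-D array can then be resized and passed to a CNN like any other single-channel image.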


Automated recognition of ultrasound cardiac views based on deep learning with graph constraint

10.1101/2020.05.07.20094045 ◽

2020 ◽

Author(s):

Yanhua Gao ◽

Yuan Zhu ◽

Bo Liu ◽

Yue Hu ◽

Youmin Guo

Keyword(s):

Deep Learning ◽

Cardiac Cycle ◽

State Of The Art ◽

The State ◽

Automated Recognition ◽

Cardiac Image ◽

Independent Test ◽

The Mean ◽

Shape Changes ◽

Generalization Accuracy

Objective: In transthoracic echocardiographic (TTE) examination, it is essential to identify the cardiac views accurately. Computer-aided recognition is expected to improve the accuracy of the TTE examination. Methods: This paper proposes a new method for automatic recognition of cardiac views based on deep learning, comprising three strategies. First, a spatial transform network is used to learn cardiac shape changes during the cardiac cycle, which reduces intra-class variability. Second, a channel attention mechanism is introduced to adaptively recalibrate channel-wise feature responses. Finally, unlike conventional deep learning methods, which learn from each input image individually, structured signals are supplied by a graph of similarities among images. These signals are transformed into graph-based image embeddings, which act as unsupervised regularization constraints to improve the generalization accuracy. Results: The proposed method was trained and tested on 171792 cardiac images from 584 subjects. Compared with the known state-of-the-art result, the overall accuracy of the proposed method on cardiac image classification is 99.10% vs. 91.7%, and the mean AUC is 99.36%. Moreover, the overall accuracy is 98.15% and the mean AUC is 98.96% on an independent test set with 34211 images from 100 subjects. Conclusion: The proposed method achieves state-of-the-art results and is expected to serve as an automated recognition tool for cardiac views. The work confirms the potential of deep learning in ultrasound medicine.


MetaRNN: Differentiating Rare Pathogenic and Rare Benign Missense SNVs and InDels Using Deep Learning

10.1101/2021.04.09.438706 ◽

2021 ◽

Author(s):

Chang Li ◽

Degui Zhi ◽

Kai Wang ◽

Xiaoming Liu

Keyword(s):

Deep Learning ◽

Prediction Models ◽

State Of The Art ◽

Single Nucleotide Variants ◽

Score Distribution ◽

Single Nucleotide ◽

Pathogenicity Prediction ◽

New Models ◽

Independent Test

We present the pathogenicity prediction models MetaRNN and MetaRNN-indel to help identify and prioritize rare nonsynonymous single nucleotide variants (nsSNVs) and non-frameshift insertion/deletions (nfINDELs) using deep learning and context annotations. Employing independent test datasets, we demonstrate that these new models outperform state-of-the-art competitors and achieve a more interpretable score distribution. MetaRNN executables and precomputed scores are available at http://www.liulab.science/MetaRNN.


MarkerCapsule: Explainable Single Cell Typing using Capsule Networks

10.1101/2020.09.22.307512 ◽

2020 ◽

Author(s):

Sumanta Ray ◽

Alexander Schönhuth

Keyword(s):

Single Cell ◽

State Of The Art ◽

Activity Patterns ◽

Cell Types ◽

Heterogeneous Data ◽

The State ◽

Manual Annotation ◽

Human Knowledge ◽

Typing Methods ◽

Cell Typing

Abstract: Many single cell typing methods require manual annotation, which poses problems with respect to resolution of (sub-)types, manpower resources and bias towards existing human knowledge. The integration of heterogeneous data and biologically meaningful interpretation of results are further current key challenges. We introduce MarkerCapsule, which brings the landmark advantages that capsule networks achieved in their original applications to single cell typing. Its exemplary strengths are the small amount of labeled data required and the naturally arising, biologically meaningful interpretation of cell types in terms of characteristic gene activity patterns, in addition to outperforming the state of the art in terms of basic typing accuracy. MarkerCapsule is available at: https://github.com/sumantaray/MarkerCapsule.


Applications of Deep Learning for Dense Scenes Analysis in Agriculture: A Review

Sensors ◽

10.3390/s20051520 ◽

2020 ◽

Vol 20 (5) ◽

pp. 1520 ◽

Cited By ~ 5

Author(s):

Qian Zhang ◽

Yeqi Liu ◽

Chuanyang Gong ◽

Yingyi Chen ◽

Huihui Yu

Keyword(s):

Deep Learning ◽

Language Processing ◽

Deep Neural Networks ◽

State Of The Art ◽

Semantic Segmentation ◽

Scene Analysis ◽

Superior Performance ◽

Learning Technology ◽

Future Work ◽

Modern Image

Deep Learning (DL) is the state-of-the-art machine learning technology, which shows superior performance in computer vision, bioinformatics, natural language processing, and other areas. As a modern image processing technology in particular, DL has been successfully applied in various tasks, such as object detection, semantic segmentation, and scene analysis. However, as dense scenes become more common in practice, severe occlusions and the small size of objects make their analysis particularly challenging. To overcome these problems, DL has recently been increasingly applied to dense scenes and has begun to be used in dense agricultural scenes. The purpose of this review is to explore the applications of DL for dense scenes analysis in agriculture. To better elaborate the topic, we first describe the types of dense scenes in agriculture, as well as the challenges. Next, we introduce various popular deep neural networks used in these dense scenes. Then, the applications of these structures in various agricultural tasks are comprehensively introduced, including recognition and classification, detection, counting and yield estimation. Finally, the surveyed DL applications, their limitations and future work on the analysis of dense images in agriculture are summarized.
