Supervised Promoter Recognition: A benchmark framework

Abstract: Deep learning has become a prevalent method for identifying genomic regulatory sequences such as promoters. In a number of recent papers, the performance of deep learning models has repeatedly been reported as an improvement over alternatives for sequence-based promoter recognition. However, these reported improvements do not account for the different datasets on which the models are evaluated. The lack of a consensus dataset and benchmarking procedure has made each model’s true performance difficult to compare. We present a framework called Supervised Promoter Recognition Framework (‘SUPR REF’) that streamlines the complete process of training, validating, testing, and comparing promoter recognition models in a systematic manner. SUPR REF includes the creation of biologically relevant benchmark datasets for evaluating deep learning promoter recognition models. We showcase this framework by comparing the models’ performance on alternative datasets and by evaluating previously published models on new benchmark datasets. Our results show that the reliability of deep learning ab initio promoter recognition models on eukaryotic genomic sequences is still not at a sufficient level, as precision is severely lacking. Furthermore, given the observational nature of these data, cross-validation results from small datasets need to be interpreted with caution. Availability: Source code and documentation of the framework are available online at https://github.com/ivanpmartell/suprref
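
One way to see why precision can be the limiting metric is to apply the same classifier to a balanced test set and to a heavily skewed one. The sketch below uses made-up class sizes and rates (the 95% sensitivity and specificity, and the 1:999 class ratio, are assumptions for illustration and are not taken from the paper or from SUPR REF).

```python
def confusion_from_rates(n_pos, n_neg, sensitivity, specificity):
    """Build confusion counts from class sizes and per-class rates."""
    tp = round(n_pos * sensitivity)   # promoters correctly flagged
    fn = n_pos - tp                   # promoters missed
    tn = round(n_neg * specificity)   # non-promoters correctly rejected
    fp = n_neg - tn                   # non-promoters wrongly flagged
    return tp, fp, fn, tn


def precision_recall(tp, fp, fn):
    """Return (precision, recall), guarding against empty denominators."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical classifier: 95% sensitivity and 95% specificity.
# Balanced benchmark: as many negatives as positives.
tp, fp, fn, _ = confusion_from_rates(1_000, 1_000, 0.95, 0.95)
balanced_precision, balanced_recall = precision_recall(tp, fp, fn)

# Skewed benchmark: 999 negatives for every positive (assumed ratio).
tp, fp, fn, _ = confusion_from_rates(1_000, 999_000, 0.95, 0.95)
skewed_precision, skewed_recall = precision_recall(tp, fp, fn)

print(f"balanced: precision={balanced_precision:.3f} recall={balanced_recall:.3f}")
print(f"skewed:   precision={skewed_precision:.3f} recall={skewed_recall:.3f}")
```

With these illustrative numbers, recall is identical on both sets while precision falls below 2% on the skewed one, which is why results on balanced datasets can overstate how a model would behave on genome-scale input.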


Evaluation of Deep Learning Strategies for Nucleus Segmentation in Fluorescence Images

10.1101/335216 ◽

2018 ◽

Cited By ~ 13

Author(s):

Juan C. Caicedo ◽

Jonathan Roth ◽

Allen Goodman ◽

Tim Becker ◽

Kyle W Karhohs ◽

...

Keyword(s):

Deep Learning ◽

Learning Strategies ◽

Source Code ◽

Evaluation Framework ◽

Evaluation Methodology ◽

Biologically Relevant ◽

Recent Developments ◽

Classical Image ◽

Microscopy Images ◽

Nucleus Segmentation

Identifying nuclei is often a critical first step in analyzing microscopy images of cells, and classical image processing algorithms are most commonly used for this task. Recent developments in deep learning can yield superior accuracy, but typical evaluation metrics for nucleus segmentation do not satisfactorily capture error modes that are relevant in cellular images. We present an evaluation framework to measure accuracy, types of errors, and computational efficiency; and use it to compare deep learning strategies and classical approaches. We publicly release a set of 23,165 manually annotated nuclei and source code to reproduce experiments and run the proposed evaluation methodology. Our evaluation framework shows that deep learning improves accuracy and can reduce the number of biologically relevant errors by half.
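
A small sketch can show the kind of error mode that a pixel-level score hides. It assumes object-level matching by intersection over union (IoU) with a 0.5 threshold; the abstract does not state which metric the authors use, so the matching rule, the threshold, and the toy masks below are all assumptions.

```python
def iou(a, b):
    """Intersection over union of two objects given as sets of pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def match_objects(truth, pred, threshold=0.5):
    """Greedy one-to-one matching; returns (tp, fp, fn) object counts."""
    used = set()
    tp = 0
    for t in truth:
        best_j, best_score = None, 0.0
        for j, p in enumerate(pred):
            if j in used:
                continue
            score = iou(t, p)
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None and best_score >= threshold:
            used.add(best_j)
            tp += 1
    return tp, len(pred) - tp, len(truth) - tp


def block(rows, cols):
    """Rectangular toy mask as a set of (row, col) pixels."""
    return {(r, c) for r in rows for c in cols}


# Two neighbouring nuclei in the ground truth...
truth = [block(range(4), range(0, 4)), block(range(4), range(5, 9))]
# ...predicted as a single merged object.
pred = [block(range(4), range(0, 9))]

pixel_iou = iou(truth[0] | truth[1], pred[0])
counts = match_objects(truth, pred)
print(f"pixel-level IoU: {pixel_iou:.2f}")
print(f"object-level (tp, fp, fn): {counts}")
```

Here the foreground overlap is high, yet at the object level both nuclei count as missed and the merged prediction counts as a false positive.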


An Anatomy of a Hybrid Color Descriptor with a Neural Network Model to Enhance the Retrieval Accuracy of an Image Retrieval System

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813666191122113801 ◽

2019 ◽

Vol 13 ◽

Author(s):

Shikha Bhardwaj ◽

Gitanjali Pandove ◽

Pawan Kumar Dahiya

Keyword(s):

Neural Network ◽

Deep Learning ◽

Image Retrieval ◽

Hybrid System ◽

Back Propagation ◽

Back Propagation Neural Network ◽

Retrieval Accuracy ◽

Color Descriptor ◽

Benchmark Datasets ◽

Color Moment

Background: Retrieving a particular image from a vast repository of images requires an efficient system; such a system is known as a content-based image retrieval (CBIR) system. Color is an important attribute of an image, and the proposed system consists of a hybrid color descriptor that is used for color feature extraction. Because deep learning has gained prominence, the performance of this fusion-based color descriptor is also analyzed with deep learning classifiers. Method: This paper presents a comparative experimental analysis of various color descriptors, and the best two are chosen to form an efficient color-based hybrid system denoted combined color moment-color autocorrelogram (Co-CMCAC). Then, to increase the retrieval accuracy of the hybrid system, a cascade forward back propagation neural network (CFBPNN) is used. The classification accuracy obtained with CFBPNN is also compared to that of the Patternnet neural network. Results: The proposed hybrid color descriptor achieves results of the order of 95.4%, 88.2%, 84.4% and 96.05% on the Corel-1K, Corel-5K, Corel-10K and Oxford flower benchmark datasets, respectively, which is superior to many state-of-the-art related techniques. Conclusion: This paper presents an experimental and analytical comparison of different color feature descriptors, namely color moment (CM), color auto-correlogram (CAC), color histogram (CH), color coherence vector (CCV) and dominant color descriptor (DCD). The proposed hybrid color descriptor (Co-CMCAC) is used to extract color features, and a cascade forward back propagation neural network (CFBPNN) is used as a classifier on four benchmark datasets, namely Corel-1K, Corel-5K, Corel-10K and Oxford flower.
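
As an illustration of the color moment (CM) component, the sketch below computes the first three moments (mean, standard deviation, and a signed cube root of the third central moment) for each channel of an RGB image. This is a common textbook definition and is assumed here; the abstract does not give the exact formulation or color space used in Co-CMCAC, and the function name is hypothetical.

```python
import math


def color_moments(pixels):
    """Return 9 features: (mean, std, skewness) for each of 3 channels.

    `pixels` is a non-empty list of (r, g, b) tuples.
    """
    n = len(pixels)
    features = []
    for channel in range(3):
        values = [p[channel] for p in pixels]
        mean = sum(values) / n
        variance = sum((v - mean) ** 2 for v in values) / n
        third = sum((v - mean) ** 3 for v in values) / n
        # Signed cube root keeps the skewness in the same units as the data.
        skewness = math.copysign(abs(third) ** (1.0 / 3.0), third)
        features.extend([mean, math.sqrt(variance), skewness])
    return features


# Toy "image": three pixels that differ only in the red channel.
toy_pixels = [(0, 10, 255), (2, 10, 255), (4, 10, 255)]
toy_features = color_moments(toy_pixels)
print(toy_features)
```

A descriptor like this yields a short fixed-length vector per image, which is one reason it combines easily with other features before being passed to a classifier.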


Literature survey of deep learning-based vulnerability analysis on source code

IET Software ◽

10.1049/iet-sen.2020.0084 ◽

2020 ◽

Vol 14 (6) ◽

pp. 654-664

Author(s):

Abubakar Omari Abdallah Semasaba ◽

Wei Zheng ◽

Xiaoxue Wu ◽

Samuel Akwasi Agyemang

Keyword(s):

Deep Learning ◽

Source Code ◽

Vulnerability Analysis ◽

Literature Survey


Real-Time Environment Monitoring Using a Lightweight Image Super-Resolution Network

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18115890 ◽

2021 ◽

Vol 18 (11) ◽

pp. 5890

Author(s):

Qiang Yu ◽

Feiqiang Liu ◽

Long Xiao ◽

Zitao Liu ◽

Xiaomin Yang

Keyword(s):

Deep Learning ◽

Real Time ◽

Super Resolution ◽

Model Complexity ◽

Practical Application ◽

Single Image ◽

Feature Maps ◽

Benchmark Datasets ◽

Image Super Resolution ◽

Single Image Super Resolution

Deep-learning (DL)-based methods are of growing importance in the field of single image super-resolution (SISR). The practical application of these DL-based models remains a problem because they require heavy computation and large storage resources. The powerful feature maps of hidden layers in convolutional neural networks (CNN) help the model learn useful information. However, there is redundancy among feature maps, which can be further exploited. To address these issues, this paper proposes a lightweight efficient feature generating network (EFGN) for SISR, built from the efficient feature generating block (EFGB). Specifically, the EFGB applies plain operations to the original features to produce more feature maps with only a slight increase in parameters. With the help of these extra feature maps, the network can extract more useful information from low resolution (LR) images to reconstruct the desired high resolution (HR) images. Experiments conducted on the benchmark datasets demonstrate that the proposed EFGN outperforms other deep-learning-based methods in most cases and has relatively lower model complexity. Additionally, the running time measurement indicates the feasibility of real-time monitoring.


Named Entity Recognition and Relation Extraction

ACM Computing Surveys ◽

10.1145/3445965 ◽

2021 ◽

Vol 54 (1) ◽

pp. 1-39

Author(s):

Zara Nasar ◽

Syed Waqar Jaffry ◽

Muhammad Kamran Malik

Keyword(s):

Deep Learning ◽

State Of The Art ◽

Named Entity Recognition ◽

Relation Extraction ◽

The State ◽

Entity Recognition ◽

Joint Models ◽

Named Entity ◽

Textual Data ◽

Benchmark Datasets

With the advent of Web 2.0, many online platforms produce massive amounts of textual data. With ever-increasing textual data at hand, it is of immense importance to extract information nuggets from this data. One approach to harnessing this unstructured textual data effectively is to transform it into structured text. Hence, this study presents an overview of approaches that can be applied to extract key insights from textual data in a structured way. To this end, this review mainly addresses Named Entity Recognition and Relation Extraction. The former deals with the identification of named entities, and the latter deals with the problem of extracting relations between sets of entities. This study covers early approaches as well as the developments made so far using machine learning models. Survey findings conclude that deep-learning-based hybrid and joint models currently define the state of the art. It is also observed that annotated benchmark datasets for various textual-data generators, such as Twitter and other social forums, are not available. This scarcity of datasets has resulted in relatively little progress in these domains. Additionally, the majority of the state-of-the-art techniques are offline and computationally expensive. Last, with the increasing focus on deep-learning frameworks, there is a need to understand and explain the underlying processes in deep architectures.


A Systematic Deep Learning Based Overhead Tracking and Counting System Using RGB-D Remote Cameras

Applied Sciences ◽

10.3390/app11125503 ◽

2021 ◽

Vol 11 (12) ◽

pp. 5503

Author(s):

Munkhjargal Gochoo ◽

Syeda Amna Rizwan ◽

Yazeed Yasin Ghadi ◽

Ahmad Jalal ◽

Kibum Kim

Keyword(s):

Deep Learning ◽

Human Head ◽

Point Clouds ◽

Head Tracking ◽

Complex Environments ◽

People Tracking ◽

Practical Applications ◽

Remote Cameras ◽

Video Frames ◽

Benchmark Datasets

Automatic head tracking and counting using depth imagery has various practical applications in security, logistics, queue management, space utilization and visitor counting. However, no currently available system can clearly distinguish between a human head and other objects in order to track and count people accurately. For this reason, we propose a novel system that can track people by monitoring their heads and shoulders in complex environments and also count the number of people entering and exiting the scene. Our system is split into six phases. First, preprocessing is done by converting videos of a scene into frames and removing the background from the video frames. Second, heads are detected using the Hough Circular Gradient Transform, and shoulders are detected by HOG-based symmetry methods. Third, three robust features, namely fused joint HOG-LBP, energy-based point clouds and fused intra-inter trajectories, are extracted. Fourth, Apriori-Association is implemented to select the best features. Fifth, deep learning is used for accurate people tracking. Finally, heads are counted using cross-line judgment. The system was tested on three benchmark datasets, the PCDS dataset, the MICC people counting dataset and the GOTPD dataset, and achieved counting accuracies of 98.40%, 98%, and 99%, respectively. Our system obtained remarkable results.
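
The final counting phase can be pictured with a simple virtual-line rule. The sketch below is only one plausible reading of "cross-line judgment": it assumes each tracked head is reduced to a sequence of vertical positions, and that moving across a fixed line in one direction counts as entering and in the other as exiting. The track format, the line position, and the function name are hypothetical and not taken from the paper.

```python
def count_crossings(tracks, line_y):
    """Count tracks crossing a horizontal line, split by direction.

    `tracks` maps a track id to its per-frame y positions.
    Returns (entering, exiting), where entering means y increases past the line.
    """
    entering = exiting = 0
    for positions in tracks.values():
        for prev, curr in zip(positions, positions[1:]):
            if prev < line_y <= curr:
                entering += 1
            elif prev >= line_y > curr:
                exiting += 1
    return entering, exiting


# Toy tracks: one person walks in, one walks out, one never reaches the line.
toy_tracks = {
    1: [10, 40, 60, 90],
    2: [90, 70, 45, 20],
    3: [10, 20, 30],
}
crossings = count_crossings(toy_tracks, line_y=50)
print(crossings)
```

A rule like this depends entirely on the quality of the tracks fed into it, which is why the earlier detection and tracking phases carry most of the difficulty.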


NIMG-08. PREDICTION OF LOWER-GRADE GLIOMA MOLECULAR SUBTYPES USING DEEP LEARNING

Neuro-Oncology ◽

10.1093/neuonc/noaa215.621 ◽

2020 ◽

Vol 22 (Supplement_2) ◽

pp. ii148-ii148

Author(s):

Yoshihiro Muragaki ◽

Yutaka Matsui ◽

Takashi Maruyama ◽

Masayuki Nitta ◽

Taiichi Saito ◽

...

Keyword(s):

Deep Learning ◽

Cross Validation ◽

Molecular Subtype ◽

Learning Model ◽

Group Classification ◽

Training Dataset ◽

Lower Grade ◽

Test Dataset ◽

Ct Data ◽

Deep Learning Model

Abstract. Introduction: It is useful to know the molecular subtype of lower-grade gliomas (LGG) when deciding on a treatment strategy. This study aims to diagnose the subtype preoperatively. Methods: A deep learning model was developed to predict the 3-group molecular subtype using multimodal data including magnetic resonance imaging (MRI), positron emission tomography (PET), and computed tomography (CT). The performance was evaluated using leave-one-out cross-validation with a dataset containing information from 217 LGG patients. Results: The model performed best when the dataset contained MRI, PET, and CT data. The model could predict the molecular subtype with an accuracy of 96.6% for the training dataset and 68.7% for the test dataset. The model achieved test accuracies of 58.5%, 60.4%, and 59.4% when the dataset contained only MRI, MRI and PET, and MRI and CT data, respectively. The conventional method, which predicts mutations in the isocitrate dehydrogenase (IDH) gene and the codeletion of chromosome arms 1p and 19q (1p/19q) sequentially, had an overall accuracy of 65.9%. This is 2.8 percentage points lower than the proposed method, which predicts the 3-group molecular subtype directly. Conclusions and future perspective: A deep learning model was developed to diagnose the molecular subtype preoperatively based on multi-modality data in order to predict the 3-group classification directly. Cross-validation showed that the proposed model had an overall accuracy of 68.7% for the test dataset. This is the first model to double the expected value for a 3-group classification problem when predicting the LGG molecular subtype. We plan to apply heat map and/or segmentation techniques to increase prediction accuracy.
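
The evaluation protocol and the reported comparisons can be checked with a short sketch. Leave-one-out cross-validation is shown here with a toy one-dimensional nearest-centroid classifier standing in for the deep learning model; the classifier, the toy data, and the assumption that the "expected value" means a chance level of 1/3 (balanced classes) are illustrative and not taken from the study.

```python
def loo_accuracy(samples, labels, fit, predict):
    """Leave-one-out cross-validation: hold out each sample once."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        correct += predict(model, samples[i]) == labels[i]
    return correct / len(samples)


def fit_centroids(xs, ys):
    """Toy model: the mean feature value of each class."""
    sums = {}
    for x, y in zip(xs, ys):
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    return {y: total / count for y, (total, count) in sums.items()}


def predict_centroid(centroids, x):
    """Assign the class whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(centroids[y] - x))


# Well-separated toy data for three groups.
xs = [0.0, 0.2, 0.4, 5.0, 5.2, 5.4, 10.0, 10.2, 10.4]
ys = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
toy_accuracy = loo_accuracy(xs, ys, fit_centroids, predict_centroid)

# Figures quoted in the abstract.
chance_level = 1 / 3                        # assumes three balanced groups
reported_test_accuracy = 0.687
gap_in_points = round(68.7 - 65.9, 1)       # proposed vs. conventional method

print(toy_accuracy, reported_test_accuracy > 2 * chance_level, gap_in_points)
```

Under the balanced-class assumption, twice the chance level is about 66.7%, so the reported 68.7% does exceed it, and the gap to the conventional method is 2.8 percentage points.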


Vulnerability Feature Extraction Model for Source Code Based on Deep Learning

10.1109/iccnea53019.2021.00016 ◽

2021 ◽

Author(s):

Zhengyuan Wang ◽

Junjun Guo ◽

Haonan Li

Keyword(s):

Feature Extraction ◽

Deep Learning ◽

Source Code ◽

Extraction Model


Astrid

Proceedings of the VLDB Endowment ◽

10.14778/3436905.3436907 ◽

2020 ◽

Vol 14 (4) ◽

pp. 471-484

Author(s):

Suraj Shetiya ◽

Saravanan Thirumuruganathan ◽

Nick Koudas ◽

Gautam Das

Keyword(s):

Deep Learning ◽

Objective Function ◽

Pattern Matching ◽

Language Processing ◽

Language Model ◽

Language Models ◽

Selectivity Estimation ◽

Statistical Correlations ◽

Benchmark Datasets ◽

Traditional Approaches

Accurate selectivity estimation for string predicates is a long-standing research challenge in databases. Supporting pattern matching on strings (such as prefix, substring, and suffix) makes this problem much more challenging, thereby necessitating a dedicated study. Traditional approaches often build pruned summary data structures such as tries, followed by selectivity estimation using statistical correlations. However, this produces insufficiently accurate cardinality estimates, resulting in the selection of sub-optimal plans by the query optimizer. Recently proposed deep learning based approaches leverage techniques from natural language processing, such as embeddings, to encode the strings and use them to train a model. While this is an improvement over traditional approaches, there is still considerable room for improvement. We propose Astrid, a framework for string selectivity estimation that synthesizes ideas from traditional and deep learning based approaches. We make two complementary contributions. First, we propose an embedding algorithm that is query-type (prefix, substring, and suffix) and selectivity aware. Consider three strings 'ab', 'abc' and 'abd' whose prefix frequencies are 1000, 800 and 100 respectively. Our approach would ensure that the embedding for 'ab' is closer to 'abc' than 'abd'. Second, we describe how neural language models could be used for selectivity estimation. While they work well for prefix queries, their performance for substring queries is sub-optimal. We modify the objective function of the neural language model so that it can be used for estimating selectivities of pattern matching queries. We also propose a novel and efficient algorithm for optimizing the new objective function. We conduct extensive experiments over benchmark datasets and show that our proposed approaches achieve state-of-the-art results.
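
To make the quantity being estimated concrete, the sketch below computes exact prefix, substring, and suffix match counts by brute force over a toy string column. This is the ground truth that an estimator tries to approximate, not Astrid's algorithm; the toy data mirrors the 'ab', 'abc', 'abd' example at a scale 100 times smaller, and the function name is hypothetical.

```python
def count_matches(strings, pattern, kind):
    """Exact number of strings matching a prefix, suffix, or substring query."""
    if kind == "prefix":
        return sum(s.startswith(pattern) for s in strings)
    if kind == "suffix":
        return sum(s.endswith(pattern) for s in strings)
    if kind == "substring":
        return sum(pattern in s for s in strings)
    raise ValueError(f"unknown query kind: {kind}")


# Toy column: 'ab' is a prefix of 10 rows, 'abc' of 8, 'abd' of 1.
column = ["abc"] * 8 + ["abd"] + ["abx"] + ["xab"] * 2

freq_ab = count_matches(column, "ab", "prefix")
freq_abc = count_matches(column, "abc", "prefix")
freq_abd = count_matches(column, "abd", "prefix")

# 'ab' and 'abc' have similar prefix frequencies, 'ab' and 'abd' do not,
# which is the relationship a selectivity-aware embedding aims to preserve.
print(abs(freq_ab - freq_abc), abs(freq_ab - freq_abd))

# The same pattern has different counts under different query types.
print(count_matches(column, "ab", "substring"), count_matches(column, "ab", "suffix"))
```

The last line shows why the query type matters: one pattern yields three different counts depending on whether it is matched as a prefix, substring, or suffix.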


Epileptic Seizure Prediction Using Deep Transformer Model

International Journal of Neural Systems ◽

10.1142/s0129065721500581 ◽

2021 ◽

Author(s):

Abhijeet Bhattacharya ◽

Tanmay Baweja ◽

S. P. K. Karri

Keyword(s):

Signal Processing ◽

Deep Learning ◽

False Positive Rate ◽

Superior Performance ◽

Seizure Prediction ◽

Advantages And Disadvantages ◽

Positive Rate ◽

Benchmark Datasets ◽

Automated Screening ◽

Transformer Model

The electroencephalogram (EEG) is the most promising and efficient technique for studying epilepsy and recording the electrical activity of the brain. Automated screening of epilepsy through data-driven algorithms reduces the manual workload of doctors diagnosing epilepsy. New algorithms are biased towards either signal processing or deep learning, each of which has its own advantages and disadvantages. The proposed pipeline is an end-to-end automated seizure prediction framework that blends signal processing and deep learning by combining Fourier transform feature extraction with a deep learning-based transformer model. This combination captures the features needed to automatically identify the attentive regions in EEG signals for effective screening. The proposed pipeline demonstrated superior performance on the benchmark datasets, with average sensitivities of 98.46% and 94.83% and false-positive rates per hour (FPR/h) of 0.12439 and 0, respectively. The proposed work shows strong results on the benchmark datasets and considerable potential for clinics as a support system for medical experts monitoring patients.
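
The Fourier feature extraction step can be sketched with a plain discrete Fourier transform followed by per-band power. The sampling rate, window length, band edges, and the synthetic 10 Hz signal below are assumptions chosen for illustration; the abstract does not specify the transform variant or the features fed to the transformer.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Magnitudes of the DFT for frequency bins 0 .. n/2 (direct O(n^2) form)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


def band_power(magnitudes, sample_rate, n, low_hz, high_hz):
    """Sum of squared magnitudes for bins with low_hz <= frequency < high_hz."""
    resolution = sample_rate / n  # Hz per bin
    return sum(m * m for k, m in enumerate(magnitudes)
               if low_hz <= k * resolution < high_hz)


# Synthetic one-second window: a 10 Hz sine sampled at 64 Hz (assumed values).
sample_rate = 64
window = [math.sin(2 * math.pi * 10 * t / sample_rate) for t in range(sample_rate)]

magnitudes = dft_magnitudes(window)
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)

# Conventional EEG band edges, used here as illustrative feature bins.
alpha_power = band_power(magnitudes, sample_rate, len(window), 8, 13)
beta_power = band_power(magnitudes, sample_rate, len(window), 13, 30)
print(peak_bin, alpha_power > beta_power)
```

In practice a fast Fourier transform routine would replace the direct sum; the direct form is used here only to keep the example dependency-free.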
