Compound dataset and custom code for deep generative multi-target compound design

Aim: Generating a data and software infrastructure for evaluating multi-target compound (MT-CPD) design via deep generative modeling. Methodology: The REINVENT 2.0 approach for generative modeling was extended for MT-CPD design and a large benchmark data set was curated. Exemplary results & data: Proof-of-concept for deep generative MT-CPD design was established. Custom code and the benchmark set comprising 2809 MT-CPDs, 61,928 single-target and 295,395 inactive compounds from biological screens are made freely available. Limitations & next steps: MT-CPD design via deep learning is still at its conceptual stages. It will be required to demonstrate experimental impact. The data and software we provide enable further investigation of MT-CPD design and generation of candidate molecules for experimental programs.

Download Full-text

Subtype-GAN: a deep learning approach for integrative cancer subtyping of multi-omics data

Bioinformatics ◽

10.1093/bioinformatics/btab109 ◽

2021 ◽

Author(s):

Hai Yang ◽

Rui Chen ◽

Dongdong Li ◽

Zhe Wang

Keyword(s):

Neural Network ◽

Deep Learning ◽

Latent Variables ◽

Supplementary Information ◽

Learning Approach ◽

Data Sets ◽

Omics Data ◽

Data Set ◽

Benchmark Data ◽

Cancer Pathogenesis

Abstract Motivation The discovery of cancer subtyping can help explore cancer pathogenesis, determine clinical actionability in treatment, and improve patients' survival rates. However, due to the diversity and complexity of multi-omics data, it is still challenging to develop integrated clustering algorithms for tumor molecular subtyping. Results We propose Subtype-GAN, a deep adversarial learning approach based on the multiple-input multiple-output neural network to model the complex omics data accurately. With the latent variables extracted from the neural network, Subtype-GAN uses consensus clustering and the Gaussian Mixture model to identify tumor samples' molecular subtypes. Compared with other state-of-the-art subtyping approaches, Subtype-GAN achieved outstanding performance on the benchmark data sets consisting of ∼4,000 TCGA tumors from 10 types of cancer. We found that on the comparison data set, the clustering scheme of Subtype-GAN is not always similar to that of the deep learning method AE but is identical to that of NEMO, MCCA, VAE, and other excellent approaches. Finally, we applied Subtype-GAN to the BRCA data set and automatically obtained the number of subtypes and the subtype labels of 1031 BRCA tumors. Through the detailed analysis, we found that the identified subtypes are clinically meaningful and show distinct patterns in the feature space, demonstrating the practicality of Subtype-GAN. Availability The source codes, the clustering results of Subtype-GAN across the benchmark data sets are available at https://github.com/haiyang1986/Subtype-GAN. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

A Benchmark Data Set and Evaluation of Deep Learning Architectures for Ball Detection in the RoboCup SPL

RoboCup 2017: Robot World Cup XXI - Lecture Notes in Computer Science ◽

10.1007/978-3-030-00308-1_33 ◽

2018 ◽

pp. 398-409 ◽

Cited By ~ 5

Author(s):

Simon O’Keeffe ◽

Rudi Villing

Keyword(s):

Deep Learning ◽

Data Set ◽

Benchmark Data ◽

Learning Architectures ◽

Ball Detection

Download Full-text

Deep Learning Approaches for Whiteboard Image Quality Enhancement

Color and Imaging Conference ◽

10.2352/j.imagingsci.technol.2019.63.4.040404 ◽

2019 ◽

Vol 2019 (1) ◽

pp. 360-368

Author(s):

Mekides Assefa Abebe ◽

Jon Yngve Hardeberg

Keyword(s):

Deep Learning ◽

Image Quality ◽

Image Data ◽

Quality Enhancement ◽

Network Architectures ◽

Learning Approaches ◽

Data Set ◽

Image Quality Enhancement ◽

Processing Techniques ◽

White Balancing

Different whiteboard image degradations highly reduce the legibility of pen-stroke content as well as the overall quality of the images. Consequently, different researchers addressed the problem through different image enhancement techniques. Most of the state-of-the-art approaches applied common image processing techniques such as background foreground segmentation, text extraction, contrast and color enhancements and white balancing. However, such types of conventional enhancement methods are incapable of recovering severely degraded pen-stroke contents and produce artifacts in the presence of complex pen-stroke illustrations. In order to surmount such problems, the authors have proposed a deep learning based solution. They have contributed a new whiteboard image data set and adopted two deep convolutional neural network architectures for whiteboard image quality enhancement applications. Their different evaluations of the trained models demonstrated their superior performances over the conventional methods.

Download Full-text

A Deep Learning based Arabic Script Recognition System: Benchmark on KHAT

The International Arab Journal of Information Technology ◽

10.34028/iajit/17/3/3 ◽

2020 ◽

Vol 17 (3) ◽

pp. 299-305 ◽

Cited By ~ 1

Author(s):

Riaz Ahmad ◽

Saeeda Naz ◽

Muhammad Afzal ◽

Sheikh Rashid ◽

Marcus Liwicki ◽

...

Keyword(s):

Deep Learning ◽

Character Recognition ◽

Data Augmentation ◽

Short Term Memory ◽

Recognition System ◽

Learning Approach ◽

Arabic Text ◽

Data Set ◽

Processing Step ◽

Handwritten Arabic

This paper presents a deep learning benchmark on a complex dataset known as KFUPM Handwritten Arabic TexT (KHATT). The KHATT data-set consists of complex patterns of handwritten Arabic text-lines. This paper contributes mainly in three aspects i.e., (1) pre-processing, (2) deep learning based approach, and (3) data-augmentation. The pre-processing step includes pruning of white extra spaces plus de-skewing the skewed text-lines. We deploy a deep learning approach based on Multi-Dimensional Long Short-Term Memory (MDLSTM) networks and Connectionist Temporal Classification (CTC). The MDLSTM has the advantage of scanning the Arabic text-lines in all directions (horizontal and vertical) to cover dots, diacritics, strokes and fine inflammation. The data-augmentation with a deep learning approach proves to achieve better and promising improvement in results by gaining 80.02% Character Recognition (CR) over 75.08% as baseline.

Download Full-text

Human Activity Recognition using Fourier Transform Inspired Deep Learning Combination Model

International Journal of Sensors Wireless Communications and Control ◽

10.2174/2210327908666180727123657 ◽

2019 ◽

Vol 9 (1) ◽

pp. 16-31

Author(s):

Kyungkoo Jun

Keyword(s):

Fourier Transform ◽

Deep Learning ◽

Short Term Memory ◽

Window Size ◽

Sensor Data ◽

Data Sets ◽

Data Set ◽

Proposed Model ◽

Testing Data ◽

Labeling Scheme

Background & Objective: This paper proposes a Fourier transform inspired method to classify human activities from time series sensor data. Methods: Our method begins by decomposing 1D input signal into 2D patterns, which is motivated by the Fourier conversion. The decomposition is helped by Long Short-Term Memory (LSTM) which captures the temporal dependency from the signal and then produces encoded sequences. The sequences, once arranged into the 2D array, can represent the fingerprints of the signals. The benefit of such transformation is that we can exploit the recent advances of the deep learning models for the image classification such as Convolutional Neural Network (CNN). Results: The proposed model, as a result, is the combination of LSTM and CNN. We evaluate the model over two data sets. For the first data set, which is more standardized than the other, our model outperforms previous works or at least equal. In the case of the second data set, we devise the schemes to generate training and testing data by changing the parameters of the window size, the sliding size, and the labeling scheme. Conclusion: The evaluation results show that the accuracy is over 95% for some cases. We also analyze the effect of the parameters on the performance.

Download Full-text

Investigating the impact of pre-processing techniques and pre-trained word embeddings in detecting Arabic health information on social media

Journal Of Big Data ◽

10.1186/s40537-021-00488-w ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Yahya Albalawi ◽

Jim Buckley ◽

Nikola S. Nikolov

Keyword(s):

Social Media ◽

Deep Learning ◽

Comprehensive Evaluation ◽

Classification Problem ◽

Data Sets ◽

Word Embeddings ◽

Data Set ◽

Lower Accuracy ◽

Health Related ◽

The Impact

AbstractThis paper presents a comprehensive evaluation of data pre-processing and word embedding techniques in the context of Arabic document classification in the domain of health-related communication on social media. We evaluate 26 text pre-processings applied to Arabic tweets within the process of training a classifier to identify health-related tweets. For this task we use the (traditional) machine learning classifiers KNN, SVM, Multinomial NB and Logistic Regression. Furthermore, we report experimental results with the deep learning architectures BLSTM and CNN for the same text classification problem. Since word embeddings are more typically used as the input layer in deep networks, in the deep learning experiments we evaluate several state-of-the-art pre-trained word embeddings with the same text pre-processing applied. To achieve these goals, we use two data sets: one for both training and testing, and another for testing the generality of our models only. Our results point to the conclusion that only four out of the 26 pre-processings improve the classification accuracy significantly. For the first data set of Arabic tweets, we found that Mazajak CBOW pre-trained word embeddings as the input to a BLSTM deep network led to the most accurate classifier with F1 score of 89.7%. For the second data set, Mazajak Skip-Gram pre-trained word embeddings as the input to BLSTM led to the most accurate model with F1 score of 75.2% and accuracy of 90.7% compared to F1 score of 90.8% achieved by Mazajak CBOW for the same architecture but with lower accuracy of 70.89%. Our results also show that the performance of the best of the traditional classifier we trained is comparable to the deep learning methods on the first dataset, but significantly worse on the second dataset.

Download Full-text

Semi-automatic Labelling of Scientific Articles using Deep Learning to Enlarge Benchmark Data for Scientific Summarization

Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval ◽

10.1145/3404835.3463271 ◽

2021 ◽

Author(s):

Alaa El-Ebshihy

Keyword(s):

Deep Learning ◽

Benchmark Data

Download Full-text

Using Convolutional Neural Networks to Map Houses Suitable for Electric Vehicle Home Charging

AI ◽

10.3390/ai2010009 ◽

2021 ◽

Vol 2 (1) ◽

pp. 135-149

Author(s):

James Flynn ◽

Cinzia Giannetti

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Urban Areas ◽

Proof Of Concept ◽

Major Step ◽

Dominant Form ◽

Residential Properties ◽

The Uk ◽

Ev Charging ◽

Unique Dataset

With Electric Vehicles (EV) emerging as the dominant form of green transport in the UK, it is critical that we better understand existing infrastructures in place to support the uptake of these vehicles. In this multi-disciplinary paper, we demonstrate a novel end-to-end workflow using deep learning to perform automated surveys of urban areas to identify residential properties suitable for EV charging. A unique dataset comprised of open source Google Street View images was used to train and compare three deep neural networks and represents the first attempt to classify residential driveways from streetscape imagery. We demonstrate the full system workflow on two urban areas and achieve accuracies of 87.2% and 89.3% respectively. This proof of concept demonstrates a promising new application of deep learning in the field of remote sensing, geospatial analysis, and urban planning, as well as a major step towards fully autonomous artificially intelligent surveying techniques of the built environment.

Download Full-text

Research on imbalanced data set preprocessing based on deep learning

2021 Asia-Pacific Conference on Communications Technology and Computer Science (ACCTCS) ◽

10.1109/acctcs52002.2021.00023 ◽

2021 ◽

Author(s):

Wang Fangyu ◽

Zhang Jianhui ◽

Bu Youjun ◽

Chen Bo

Keyword(s):

Deep Learning ◽

Imbalanced Data ◽

Data Set

Download Full-text

Training and deploying a deep learning model for endoscopic severity grading in ulcerative colitis using multicenter clinical trial data

Therapeutic Advances in Gastrointestinal Endoscopy ◽

10.1177/2631774521990623 ◽

2021 ◽

Vol 14 ◽

pp. 263177452199062

Author(s):

Benjamin Gutierrez Becker ◽

Filippo Arcadu ◽

Andreas Thalhammer ◽

Citlalli Gamez Serna ◽

Owen Feehan ◽

...

Keyword(s):

Artificial Intelligence ◽

Ulcerative Colitis ◽

Clinical Trials ◽

Deep Learning ◽

Mayo Clinic ◽

Automated System ◽

Phase Iii ◽

Plain Language ◽

Data Set ◽

Endoscopic Videos

Introduction: The Mayo Clinic Endoscopic Subscore is a commonly used grading system to assess the severity of ulcerative colitis. Correctly grading colonoscopies using the Mayo Clinic Endoscopic Subscore is a challenging task, with suboptimal rates of interrater and intrarater variability observed even among experienced and sufficiently trained experts. In recent years, several machine learning algorithms have been proposed in an effort to improve the standardization and reproducibility of Mayo Clinic Endoscopic Subscore grading. Methods: Here we propose an end-to-end fully automated system based on deep learning to predict a binary version of the Mayo Clinic Endoscopic Subscore directly from raw colonoscopy videos. Differently from previous studies, the proposed method mimics the assessment done in practice by a gastroenterologist, that is, traversing the whole colonoscopy video, identifying visually informative regions and computing an overall Mayo Clinic Endoscopic Subscore. The proposed deep learning–based system has been trained and deployed on raw colonoscopies using Mayo Clinic Endoscopic Subscore ground truth provided only at the colon section level, without manually selecting frames driving the severity scoring of ulcerative colitis. Results and Conclusion: Our evaluation on 1672 endoscopic videos obtained from a multisite data set obtained from the etrolizumab Phase II Eucalyptus and Phase III Hickory and Laurel clinical trials, show that our proposed methodology can grade endoscopic videos with a high degree of accuracy and robustness (Area Under the Receiver Operating Characteristic Curve = 0.84 for Mayo Clinic Endoscopic Subscore ⩾ 1, 0.85 for Mayo Clinic Endoscopic Subscore ⩾ 2 and 0.85 for Mayo Clinic Endoscopic Subscore ⩾ 3) and reduced amounts of manual annotation. Plain language summary Patient, caregiver and provider thoughts on educational materials about prescribing and medication safety Artificial intelligence can be used to automatically assess full endoscopic videos and estimate the severity of ulcerative colitis. In this work, we present an artificial intelligence algorithm for the automatic grading of ulcerative colitis in full endoscopic videos. Our artificial intelligence models were trained and evaluated on a large and diverse set of colonoscopy videos obtained from concluded clinical trials. We demonstrate not only that artificial intelligence is able to accurately grade full endoscopic videos, but also that using diverse data sets obtained from multiple sites is critical to train robust AI models that could potentially be deployed on real-world data.

Download Full-text