Benchmarking Geometric Deep Learning for Cortical Segmentation and Neurodevelopmental Phenotype Prediction

Abstract: The emerging field of geometric deep learning extends the application of convolutional neural networks to irregular domains such as graphs, meshes and surfaces. Several recent studies have explored the potential for using these techniques to analyse and segment the cortical surface. However, there has been no comprehensive comparison of these approaches to one another, nor to existing Euclidean methods, to date. This paper benchmarks a collection of geometric and traditional deep learning models on phenotype prediction and segmentation of sphericalised neonatal cortical surface data, from the publicly available Developing Human Connectome Project (dHCP). Tasks include prediction of postmenstrual age at scan, gestational age at birth and segmentation of the cortical surface into anatomical regions defined by the M-CRIB-S atlas. Performance was assessed not only in terms of model precision, but also in terms of network dependence on image registration, and model interpretation via occlusion. Networks were trained both on sphericalised and anatomical cortical meshes. Findings suggest that the utility of geometric deep learning over traditional deep learning is highly task-specific, which has implications for the design of future deep learning models on the cortical surface. The code, and instructions for data access, are available from https://github.com/Abdulah-Fawaz/Benchmarking-Surface-DL.

PlotMI: visualization of pairwise interactions and positional preferences learned by a deep learning model from sequence data

10.1101/2021.03.14.435285 ◽

2021 ◽

Author(s):

Tuomo Hartonen ◽

Teemu Kivioja ◽

Jussi Taipale

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Sequence Data ◽

Predictive Performance ◽

Learning Model ◽

Biological Research ◽

Learning Approaches ◽

Learning Models ◽

Model Interpretation ◽

Pairwise Interactions

Deep learning models have in recent years succeeded in various tasks related to understanding the information encoded in DNA sequence. Rapidly developing genome-wide measurement technologies provide large quantities of data ideally suited for modeling with deep learning or other powerful machine learning approaches. Although they offer state-of-the-art predictive performance, the predictions made by deep learning models can be difficult to understand. In virtually all biological research, understanding how a predictive model works is as important as its raw predictive performance. Interpretation of deep learning models is therefore an emerging hot topic, especially in the context of biological research. Here we describe plotMI, a mutual-information-based model interpretation strategy that can intuitively visualize positional preferences and pairwise interactions learned by any machine learning model trained on sequence data with a defined alphabet as input. PlotMI is freely available at https://github.com/hartonen/plotMI.
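
As a rough illustration of the quantity involved, the sketch below computes the mutual information between two positions across a set of equal-length sequences. It is not plotMI's implementation: the function name `pairwise_mi` and the toy inputs are assumptions made for this example, and the abstract does not state how plotMI selects the sequences it analyses.

```python
import math
from collections import Counter


def pairwise_mi(seqs, i, j):
    """Mutual information (in bits) between positions i and j.

    `seqs` is a list of equal-length strings over any fixed alphabet.
    Hypothetical helper for illustration; not part of plotMI.
    """
    n = len(seqs)
    count_i = Counter(s[i] for s in seqs)            # marginal counts at position i
    count_j = Counter(s[j] for s in seqs)            # marginal counts at position j
    count_ij = Counter((s[i], s[j]) for s in seqs)   # joint counts for the pair
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        # p_ab / (p_a * p_b) simplifies to c * n / (count_i[a] * count_j[b])
        mi += p_ab * math.log2(c * n / (count_i[a] * count_j[b]))
    return mi


# Positions that always vary together carry information about each other;
# positions that vary independently do not.
coupled = ["AT", "AT", "CG", "CG"]
independent = ["AA", "AC", "CA", "CC"]
print(pairwise_mi(coupled, 0, 1), pairwise_mi(independent, 0, 1))
```

Evaluating such a function for every pair of positions would give a matrix that can be drawn as a heat map, which is one plausible way to visualize pairwise interactions.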

Learned cardinality estimation

Proceedings of the VLDB Endowment ◽

10.14778/3485450.3485459 ◽

2021 ◽

Vol 15 (1) ◽

pp. 85-97

Author(s):

Ji Sun ◽

Jintao Zhang ◽

Zhaoyan Sun ◽

Guoliang Li ◽

Nan Tang

Keyword(s):

Deep Learning ◽

Design Space Exploration ◽

Design Space ◽

Space Exploration ◽

Data Distribution ◽

Learning Models ◽

Cardinality Estimation ◽

Comprehensive Comparison ◽

Query Optimizers ◽

Relational Table

Cardinality estimation is core to the query optimizers of DBMSs. Non-learned methods, especially those based on histograms and sampling, have been widely used in commercial and open-source DBMSs. However, histograms and samples can only summarize one or a few columns, so they fall short of capturing the joint data distribution over an arbitrary combination of columns: they oversimplify the original relational table(s). Consequently, these traditional methods typically make poor predictions for hard cases such as queries over multiple columns, queries with multiple predicates, and joins between multiple tables. Recently, learned cardinality estimators have been widely studied. Because these learned estimators, empowered by recent advances in (deep learning) models, can better capture the data distribution and query characteristics, they outperform non-learned methods in many cases. The goals of this paper are to provide a design space exploration of learned cardinality estimators and a comprehensive comparison of the SOTA learned approaches, so as to guide practitioners in deciding which method to use under various practical scenarios.
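
To make the limitation of per-column statistics concrete, the sketch below estimates the cardinality of a two-predicate query by multiplying single-column selectivities, which is the usual independence assumption, and compares it with the true count. The toy table, column names and function names are all assumptions made for this example; no particular DBMS or estimator from the paper is being reproduced.

```python
def selectivity(rows, column, value):
    """Fraction of rows whose `column` equals `value` (a one-column statistic)."""
    return sum(1 for row in rows if row[column] == value) / len(rows)


def independence_estimate(rows, predicates):
    """Estimate cardinality by multiplying per-column selectivities."""
    estimate = float(len(rows))
    for column, value in predicates.items():
        estimate *= selectivity(rows, column, value)
    return estimate


def true_cardinality(rows, predicates):
    """Count the rows that satisfy every equality predicate."""
    return sum(
        1 for row in rows
        if all(row[column] == value for column, value in predicates.items())
    )


# Hypothetical table in which city and country are strongly correlated.
rows = (
    [{"city": "Paris", "country": "France"}] * 50
    + [{"city": "Lyon", "country": "France"}] * 10
    + [{"city": "Berlin", "country": "Germany"}] * 40
)
query = {"city": "Paris", "country": "France"}
print(independence_estimate(rows, query), true_cardinality(rows, query))
```

Here the per-column estimate comes out near 30 rows while the true answer is 50, because every Paris row is also a France row. Capturing that kind of correlation is what a model of the joint distribution is meant to do.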

Deep GONet: self-explainable deep neural network based on Gene Ontology for phenotype prediction from gene expression data

BMC Bioinformatics ◽

10.1186/s12859-021-04370-7 ◽

2021 ◽

Vol 22 (S10) ◽

Author(s):

Victoria Bourgeais ◽

Farida Zehraoui ◽

Mohamed Ben Hamdoune ◽

Blaise Hanczar

Keyword(s):

Neural Network ◽

Gene Expression ◽

Gene Ontology ◽

Deep Learning ◽

Precision Medicine ◽

Gene Expression Data ◽

Biological Knowledge ◽

Expression Data ◽

Learning Models ◽

Phenotype Prediction

Background: With the rapid advancement of genomic sequencing techniques, massive production of gene expression data is becoming possible, which prompts the development of precision medicine. Deep learning is a promising approach for phenotype prediction (clinical diagnosis, prognosis, and drug response) based on gene expression profiles. Existing deep learning models are usually considered black boxes that provide accurate predictions but are not interpretable. However, accuracy and interpretation are both essential for precision medicine. In addition, most models do not integrate domain knowledge. Hence, making deep learning models interpretable for medical applications using prior biological knowledge is the main focus of this paper. Results: In this paper, we propose a new self-explainable deep learning model, called Deep GONet, which integrates the Gene Ontology into the hierarchical architecture of the neural network. The model is based on a fully-connected architecture constrained by the Gene Ontology annotations, such that each neuron represents a biological function. Experiments on cancer diagnosis datasets demonstrate that Deep GONet is both easily interpretable and performs well at discriminating cancer from non-cancer samples. Conclusions: Our model explains its predictions by identifying the most important neurons and associating them with biological functions, making the model understandable for biologists and physicians.

Levenshtein Augmentation Improves Performance of SMILES Based Deep-Learning Synthesis Prediction

10.26434/chemrxiv.12562121 ◽

2020 ◽

Author(s):

Dean Sumner ◽

Jiazhen He ◽

Amol Thakkar ◽

Ola Engkvist ◽

Esben Jannik Bjerrum

Keyword(s):

Neural Networks ◽

Pattern Recognition ◽

Deep Learning ◽

Recurrent Neural Networks ◽

Data Augmentation ◽

State Of The Art ◽

Sequence Similarity ◽

Learning Models ◽

Underlying Network

SMILES randomization, a form of data augmentation, has previously been shown to increase the performance of deep learning models compared to non-augmented baselines. Here, we propose a novel data augmentation method we call “Levenshtein augmentation”, which considers local SMILES sub-sequence similarity between reactants and their respective products when creating training pairs. The performance of Levenshtein augmentation was tested using two state-of-the-art models: a transformer and a sequence-to-sequence recurrent neural network with attention. When used to train baseline models, Levenshtein augmentation increased performance over both non-augmented data and data augmented by conventional SMILES randomization. Furthermore, Levenshtein augmentation seemingly results in what we define as attentional gain: an enhancement in the pattern recognition capabilities of the underlying network with respect to molecular motifs.
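
For readers unfamiliar with the metric the method is named after, the sketch below implements the standard Levenshtein (edit) distance and uses it to pick, from several SMILES strings for the same reactant, the one closest to a product string. The selection rule in `closest_variant` is an assumption made for illustration; the abstract does not specify how the authors build training pairs, and the example strings are hand-written rather than generated by a cheminformatics toolkit.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string `a` into string `b`."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep on a match
            ))
        prev = curr
    return prev[-1]


def closest_variant(candidates, target):
    """Hypothetical helper: the candidate string with the smallest
    edit distance to `target`."""
    return min(candidates, key=lambda s: levenshtein(s, target))


# Two hand-written SMILES strings for acetic acid, compared against
# a string for methyl acetate (illustrative inputs only).
reactant_variants = ["OC(C)=O", "CC(=O)O"]
product = "CC(=O)OC"
print(closest_variant(reactant_variants, product))
```

A pairing chosen this way keeps shared substructures at similar positions in the reactant and product strings, which is the kind of local sub-sequence similarity the abstract refers to.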

Improving the Accuracy of Protein-Ligand Binding Affinity Prediction by Deep Learning Models: Benchmark and Model

10.26434/chemrxiv.9866912 ◽

2019 ◽

Author(s):

Mohammad Rezaei ◽

Yanjun Li ◽

Xiaolin Li ◽

Chenglong Li

Keyword(s):

Deep Learning ◽

Drug Design ◽

Binding Affinity ◽

Benchmark Dataset ◽

Rational Drug Design ◽

Learning Models ◽

Structure Based Drug Design ◽

Binding Affinity Prediction ◽

Affinity Prediction ◽

Rational Drug

Introduction: The ability to discriminate among ligands binding to the same protein target in terms of their relative binding affinity lies at the heart of structure-based drug design. Any improvement in the accuracy and reliability of binding affinity prediction methods decreases the discrepancy between experimental and computational results. Objectives: The primary objectives were to find the features most relevant to binding affinity prediction, to minimize manual feature engineering, and to improve the reliability of binding affinity prediction with efficient deep learning models by tuning the model hyperparameters. Methods: The binding site of each target protein was represented as a grid box around its bound ligand. Both binary and distance-dependent occupancies were examined for how an atom affects its neighboring voxels in this grid. A combination of different features, including ANOLEA, ligand elements, and Arpeggio atom types, was used to represent the input. An efficient convolutional neural network (CNN) architecture, DeepAtom, was developed, trained and tested on the PDBbind v2016 dataset. Additionally, an extended benchmark dataset was compiled to train and evaluate the models. Results: The best DeepAtom model showed improved accuracy in binding affinity prediction on the PDBbind core subset (Pearson’s R=0.83) and outperformed recent state-of-the-art models in this field. In addition, when the DeepAtom model was trained on our proposed benchmark dataset, it yielded a higher correlation than the baseline, which confirms the value of our model. Conclusions: The promising results for the predicted binding affinities are expected to pave the way for embedding deep learning models in virtual screening and rational drug design.
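
The difference between binary and distance-dependent occupancy can be sketched as follows. Everything here is an assumption made for illustration: the function name `occupancy_grid`, the grid layout, and in particular the Gaussian decay used for the distance-dependent mode, since the abstract does not give the functional form used in DeepAtom.

```python
import math


def occupancy_grid(atoms, size, spacing, radius, mode="binary"):
    """Voxelize atom positions into a size x size x size grid.

    atoms   : list of (x, y, z) coordinates; grid origin is (0, 0, 0)
    spacing : voxel edge length; voxel centers sit at (index + 0.5) * spacing
    radius  : cutoff in binary mode, length scale of the decay otherwise
    mode    : "binary" marks voxels within `radius` of an atom as 1.0;
              any other value uses a Gaussian decay (an assumed form)
    """
    grid = [[[0.0] * size for _ in range(size)] for _ in range(size)]
    for ix in range(size):
        for iy in range(size):
            for iz in range(size):
                center = tuple((k + 0.5) * spacing for k in (ix, iy, iz))
                for atom in atoms:
                    d = math.dist(center, atom)
                    if mode == "binary":
                        value = 1.0 if d <= radius else 0.0
                    else:
                        value = math.exp(-(d * d) / (2.0 * radius * radius))
                    # Keep the strongest contribution from any atom.
                    grid[ix][iy][iz] = max(grid[ix][iy][iz], value)
    return grid


atoms = [(0.5, 0.5, 0.5)]  # a single hypothetical atom
binary = occupancy_grid(atoms, size=3, spacing=1.0, radius=1.0, mode="binary")
smooth = occupancy_grid(atoms, size=3, spacing=1.0, radius=1.0, mode="gaussian")
print(binary[2][0][0], smooth[2][0][0])
```

In binary mode a voxel two units away from the atom is simply empty, whereas the distance-dependent mode still assigns it a small positive value, so the network sees a graded rather than all-or-nothing signal.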

Data science in economics: comprehensive review of advanced machine learning and deep learning methods

10.31232/osf.io/4pxq2 ◽

2020 ◽

Author(s):

Saeed Nosratabadi ◽

Amir Mosavi ◽

Puhong Duan ◽

Pedram Ghamisi ◽

Ferdinand Filip ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Data Science ◽

State Of The Art ◽

Science Methods ◽

Learning Models ◽

Diverse Range ◽

Hybrid Machine ◽

Economics Research

This paper provides a state-of-the-art investigation of advances in data science in emerging economic applications. The analysis covers novel data science methods in four classes: deep learning models, hybrid deep learning models, hybrid machine learning, and ensemble models. Application domains include a wide and diverse range of economics research, from the stock market, marketing, and e-commerce to corporate banking and cryptocurrency. The PRISMA method, a systematic literature review methodology, was used to ensure the quality of the survey. The findings reveal that the trends follow the advancement of hybrid models, which, based on the accuracy metric, outperform other learning algorithms. It is further expected that the trends will converge toward the advancement of sophisticated hybrid deep learning models.

A Study on the Auxiliary Diagnosis of Thyroid Disease Images Based on Multiple Dimensional Deep Learning Algorithms

Current Medical Imaging (formerly Current Medical Imaging Reviews) ◽

10.2174/1573405615666190115155223 ◽

2020 ◽

Vol 16 (3) ◽

pp. 199-205

Author(s):

Yuejun Liu ◽

Yifei Xu ◽

Xiangzheng Meng ◽

Xuguang Wang ◽

Tianxu Bai

Keyword(s):

Deep Learning ◽

Learning Algorithms ◽

Region Of Interest ◽

Classification Performance ◽

Thyroid Diseases ◽

Great Success ◽

Learning Models ◽

Good Classification Performance ◽

Spect Images

Background: Medical imaging plays an important role in the diagnosis of thyroid diseases. In the field of machine learning, multiple dimensional deep learning algorithms are widely used in image classification and recognition, and have achieved great success. Objective: A method based on multiple dimensional deep learning is employed for the auxiliary diagnosis of thyroid diseases from SPECT images. The performances of different deep learning models are evaluated and compared. Methods: Thyroid SPECT images of three types are collected: hyperthyroidism, normal and hypothyroidism. In pre-processing, the region of interest of the thyroid is segmented and the number of data samples is expanded. Four models (CNN, Inception, VGG16 and RNN) are used to evaluate deep learning methods. Results: Deep learning based methods show good classification performance, with accuracy of 92.9%-96.2% and AUC of 97.8%-99.6%. The VGG16 model performs best, with an accuracy of 96.2% and an AUC of 99.6%; in particular, the VGG16 model with a changing learning rate works best. Conclusion: The four deep learning models (standard CNN, Inception, VGG16, and RNN) are efficient for the classification of thyroid diseases from SPECT images. The accuracy of the assisted diagnostic method based on deep learning is higher than that of other methods reported in the literature.

Deep Learning in Disease Diagnosis: Models and Datasets

Current Bioinformatics ◽

10.2174/1574893615999201002124021 ◽

2020 ◽

Vol 15 ◽

Author(s):

Deeksha Saxena ◽

Mohammed Haris Siddiqui ◽

Rajnish Kumar

Keyword(s):

Biological Sciences ◽

Machine Learning ◽

Deep Learning ◽

Disease Diagnosis ◽

Learning Models ◽

Data Types ◽

Related Data ◽

Abstract Level ◽

Experimental Validations ◽

Selection Of

Background: Deep learning (DL) is an artificial neural network-driven framework with multiple levels of representation, in which non-linear modules are combined so that the representation is raised from a lower level to a much more abstract level. Though DL is used widely in almost every field, it has brought a particular breakthrough in the biological sciences, where it is used in disease diagnosis and clinical trials. DL can be combined with machine learning, but at times the two are also used individually. DL seems to be a better platform than machine learning, as it does not require intermediate feature extraction and works well with larger datasets. DL is currently one of the most discussed fields among scientists and researchers for diagnosing and solving various biological problems. However, deep learning models need some improvement and experimental validation to be more productive. Objective: To review the available DL models and datasets that are used in disease diagnosis. Methods: Available DL models and their applications in disease diagnosis were reviewed, discussed and tabulated. Types of datasets and some of the popular disease-related data sources for DL were highlighted. Results: We have analyzed the frequently used DL methods and data types, and discussed some of the recent deep learning models used for solving different biological problems. Conclusion: The review presents useful insights into DL methods, data types, and the selection of DL models for disease diagnosis.

Deep Learning Model Comparison for Vision-Based Classification of Full/Empty-Load Trucks in Earthmoving Operations

Applied Sciences ◽

10.3390/app9224871 ◽

2019 ◽

Vol 9 (22) ◽

pp. 4871 ◽

Cited By ~ 4

Author(s):

Quan Liu ◽

Chen Feng ◽

Zida Song ◽

Joseph Louis ◽

Jian Zhou

Keyword(s):

Deep Learning ◽

Model Comparison ◽

Surveillance Systems ◽

Comparison Study ◽

Learning Models ◽

The Core ◽

Dump Trucks ◽

Deep Learning Model ◽

Contact Field

Earthmoving is an integral and significant civil engineering operation, and tracking its productivity requires statistics on the loads moved by dump trucks. Since current methods for collecting truck load statistics are laborious, costly, and limited in application, this paper presents the framework of a novel, automated, non-contact field earthmoving quantity statistics (FEQS) method for projects with large earthmoving demands that use uniform and uncovered trucks. The proposed FEQS framework utilizes field surveillance systems and adopts vision-based deep learning for full/empty-load truck classification as its core work. Since the convolutional neural network (CNN) and its transfer learning (TL) forms are popular vision-based deep learning models and come in many types, a comparison study is conducted to test the feasibility of the framework’s core work and to evaluate the performance of different deep learning models in implementation. The comparison study involved 12 CNN or CNN-TL models in full/empty-load truck classification, and the results revealed that while several provided satisfactory performance, VGG16-FineTune provided the best performance. This demonstrated the feasibility of the core work of the proposed FEQS framework. Further discussion provides model choice suggestions: CNN-TL models are more feasible than CNN prototypes, and models that adopt different TL methods have advantages in either working accuracy or speed for different tasks.

Deep Learning Models for Estimation of the SuperDARN Cross Polar Cap Potential

Earth and Space Science ◽

10.1029/2020ea001219 ◽

2020 ◽

Vol 7 (8) ◽

Author(s):

Erxiao Liu ◽

Hongqiao Hu ◽

Jianjun Liu ◽

Lei Qiao

Keyword(s):

Deep Learning ◽

Learning Models ◽

Polar Cap ◽

Cross Polar Cap Potential
