forgeNet: a graph deep neural network model using tree-based ensemble classifiers for feature graph construction

Yunchuan Kong; Tianwei Yu

doi:10.1093/bioinformatics/btaa164

Bioinformatics ◽

10.1093/bioinformatics/btaa164 ◽

2020 ◽

Vol 36 (11) ◽

pp. 3507-3515

Author(s):

Yunchuan Kong ◽

Tianwei Yu

Keyword(s):

Deep Learning ◽

Model Building ◽

Classification Model ◽

Supplementary Information ◽

Sparse Learning ◽

Omics Data ◽

Ensemble Classifiers ◽

Feedforward Network ◽

Unique Challenge ◽

Functional Relationships

Abstract. Motivation: A unique challenge in predictive model building for omics data has been the small number of samples (n) versus the large number of features (p). This ‘n≪p’ property makes disease outcome classification with deep learning techniques difficult. Sparse learning that incorporates known functional relationships between the biological units, as in the graph-embedded deep feedforward network (GEDFN) model, has been one solution to this issue. However, such methods require an existing feature graph, and potential mis-specification of the feature graph can harm classification and feature selection. Results: To address this limitation and develop a robust classification model without relying on external knowledge, we propose a forest graph-embedded deep feedforward network (forgeNet) model, which integrates the GEDFN architecture with a forest feature graph extractor so that the feature graph can be learned in a supervised manner and constructed specifically for a given prediction task. To validate the method’s capability, we tested the forgeNet model on both synthetic and real datasets. The resulting high classification accuracy suggests that the method is a valuable addition to sparse deep learning models for omics data. Availability and implementation: The method is available at https://github.com/yunchuankong/forgeNet. Supplementary information: Supplementary data are available at Bioinformatics online.
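
As a rough illustration of how a feature graph might be read off a tree ensemble, the sketch below assumes each fitted tree is available as a nested dictionary (a hypothetical representation, not the API of the forgeNet repository) and links the feature split at a parent node to the features split at its children. The edge rule and every name here are assumptions made for illustration; the abstract does not specify how the forest feature graph extractor builds its graph.

```python
from collections import defaultdict

# Hypothetical tree format (an assumption): each internal node is a dict with
# the index of the feature it splits on and optional "left"/"right" children.
# A missing child (leaf side) is represented by None.


def tree_edges(node, edges=None):
    """Collect directed (parent_feature, child_feature) pairs from one tree."""
    if edges is None:
        edges = set()
    if node is None:
        return edges
    for side in ("left", "right"):
        child = node.get(side)
        if child is not None:
            edges.add((node["feature"], child["feature"]))
            tree_edges(child, edges)
    return edges


def forest_feature_graph(forest):
    """Union the per-tree edges into one adjacency map over feature indices."""
    graph = defaultdict(set)
    for root in forest:
        for parent, child in tree_edges(root):
            graph[parent].add(child)
    return {feature: sorted(children) for feature, children in graph.items()}


# Two toy trees over features 0..4 (illustrative values only).
toy_forest = [
    {"feature": 0,
     "left": {"feature": 2, "left": None, "right": None},
     "right": {"feature": 3, "left": None, "right": None}},
    {"feature": 2,
     "left": {"feature": 0, "left": None, "right": None},
     "right": None},
]

feature_graph = forest_feature_graph(toy_forest)
print(feature_graph)
```

Features that no tree ever splits on (1 and 4 in the toy forest) do not appear in the graph at all, which is one way a supervised extractor of this kind could yield a sparse, task-specific feature graph.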

SMILE: Mutual Information Learning for Integration of Single-cell Omics Data

Bioinformatics ◽

10.1093/bioinformatics/btab706 ◽

2021 ◽

Author(s):

Yang Xu ◽

Priyojit Das ◽

Rachel Patton McCord

Keyword(s):

Deep Learning ◽

Mutual Information ◽

Single Cell ◽

Learning Algorithm ◽

Cellular Systems ◽

Supplementary Information ◽

Omics Data ◽

Learning Approaches ◽

Rna Seq ◽

Integrate Data

Abstract. Motivation: Deep learning approaches have empowered single-cell omics data analysis in many ways and generated new insights from complex cellular systems. As there is an increasing need for single-cell omics data to be integrated across sources, types, and features of data, the challenges of integrating single-cell omics data are rising. Here, we present SMILE (Single-cell Mutual Information Learning), an unsupervised deep learning algorithm that learns discriminative representations for single-cell data by maximizing mutual information. Results: Using a unique cell-pairing design, SMILE successfully integrates multi-source single-cell transcriptome data, removing batch effects and projecting similar cell types, even from different tissues, into the shared space. SMILE can also integrate data from two or more modalities, such as joint profiling technologies using single-cell ATAC-seq, RNA-seq, DNA methylation, Hi-C, and ChIP data. When paired cells are known, SMILE can integrate data with unmatched features, such as genes for RNA-seq and genome-wide peaks for ATAC-seq. Integrated representations learned from joint profiling technologies can then be used as a framework for comparing independent single-source data. Supplementary information: Supplementary data are available at Bioinformatics online. The source code of SMILE, including analyses of key results in the study, can be found at https://github.com/rpmccordlab/SMILE.

A Novel Context Aware Joint Segmentation and Classification Framework for Glaucoma Detection

Computational and Mathematical Methods in Medicine ◽

10.1155/2021/2921737 ◽

2021 ◽

Vol 2021 ◽

pp. 1-19

Author(s):

S. Sankar Ganesh ◽

G. Kannayeram ◽

Alagar Karthick ◽

M. Muhibbullah

Keyword(s):

Deep Learning ◽

Binary Classification ◽

Ocular Disease ◽

Classification Model ◽

Context Aware ◽

Learning Approaches ◽

Unique Challenge ◽

Classification Framework ◽

Glaucoma Detection ◽

Insidious Onset

Glaucoma is a chronic ocular disease characterized by damage to the optic nerve resulting in progressive and irreversible visual loss. Early detection and timely clinical interventions are critical in improving glaucoma-related outcomes. As a typical and complicated ocular disease, glaucoma detection presents a unique challenge due to its insidious onset and high intra- and interpatient variabilities. Recent studies have demonstrated that robust glaucoma detection systems can be realized with deep learning approaches. The optic disc (OD) is the most commonly studied retinal structure for screening and diagnosing glaucoma. This paper proposes a novel context aware deep learning framework called GD-YNet, for OD segmentation and glaucoma detection. It leverages the potential of aggregated transformations and the simplicity of the YNet architecture in context aware OD segmentation and binary classification for glaucoma detection. Trained with the RIGA and RIMOne-V2 datasets, this model achieves glaucoma detection accuracies of 99.72%, 98.02%, 99.50%, and 99.41% with the ACRIMA, Drishti-gs, REFUGE, and RIMOne-V1 datasets. Further, the proposed model can be extended to a multiclass segmentation and classification model for glaucoma staging and severity assessment.

Subtype-GAN: a deep learning approach for integrative cancer subtyping of multi-omics data

Bioinformatics ◽

10.1093/bioinformatics/btab109 ◽

2021 ◽

Author(s):

Hai Yang ◽

Rui Chen ◽

Dongdong Li ◽

Zhe Wang

Keyword(s):

Neural Network ◽

Deep Learning ◽

Latent Variables ◽

Supplementary Information ◽

Learning Approach ◽

Data Sets ◽

Omics Data ◽

Data Set ◽

Benchmark Data ◽

Cancer Pathogenesis

Abstract. Motivation: The discovery of cancer subtypes can help explore cancer pathogenesis, determine clinical actionability in treatment, and improve patients' survival rates. However, due to the diversity and complexity of multi-omics data, it is still challenging to develop integrated clustering algorithms for tumor molecular subtyping. Results: We propose Subtype-GAN, a deep adversarial learning approach based on a multiple-input multiple-output neural network to model the complex omics data accurately. With the latent variables extracted from the neural network, Subtype-GAN uses consensus clustering and the Gaussian mixture model to identify tumor samples' molecular subtypes. Compared with other state-of-the-art subtyping approaches, Subtype-GAN achieved outstanding performance on the benchmark data sets consisting of ∼4,000 TCGA tumors from 10 types of cancer. We found that on the comparison data set, the clustering scheme of Subtype-GAN is not always similar to that of the deep learning method AE but is identical to that of NEMO, MCCA, VAE, and other excellent approaches. Finally, we applied Subtype-GAN to the BRCA data set and automatically obtained the number of subtypes and the subtype labels of 1031 BRCA tumors. Through detailed analysis, we found that the identified subtypes are clinically meaningful and show distinct patterns in the feature space, demonstrating the practicality of Subtype-GAN. Availability: The source code and the clustering results of Subtype-GAN across the benchmark data sets are available at https://github.com/haiyang1986/Subtype-GAN. Supplementary information: Supplementary data are available at Bioinformatics online.

Exploring generative deep learning for omics data using log-linear models

Bioinformatics ◽

10.1093/bioinformatics/btaa623 ◽

2020 ◽

Vol 36 (20) ◽

pp. 5045-5053

Author(s):

Moritz Hess ◽

Maren Hackenberg ◽

Harald Binder

Keyword(s):

Deep Learning ◽

Linear Models ◽

Synthetic Data ◽

Simulated Data ◽

Image Data ◽

Supplementary Information ◽

Underlying Structure ◽

Omics Data ◽

Latent Representations ◽

Log Linear

Abstract. Motivation: Following many successful applications to image data, deep learning is now also increasingly considered for omics data. In particular, generative deep learning not only provides competitive prediction performance, but also allows for uncovering structure by generating synthetic samples. However, exploration and visualization are not as straightforward as with image applications. Results: We demonstrate how log-linear models fitted to the generated synthetic data can be used to extract patterns from omics data learned by deep generative techniques. Specifically, interactions between latent representations learned by the approaches and generated synthetic data are used to determine sets of joint patterns. Distances of patterns with respect to the distribution of latent representations are then visualized in low-dimensional coordinate systems, e.g. for monitoring training progress. This is illustrated with simulated data and subsequently with cortical single-cell gene expression data. Using different kinds of deep generative techniques, specifically variational autoencoders and deep Boltzmann machines, the proposed approach highlights how the techniques uncover underlying structure. It facilitates the real-world use of such generative deep learning techniques to gain biological insights from omics data. Availability and implementation: The code for the approach, as well as an accompanying Jupyter notebook that illustrates its application, is available via the GitHub repository: https://github.com/ssehztirom/Exploring-generative-deep-learning-for-omics-data-by-using-log-linear-models. Supplementary information: Supplementary data are available at Bioinformatics online.

Effectiveness of transfer learning for enhancing tumor classification with a convolutional neural network on frozen sections

Scientific Reports ◽

10.1038/s41598-020-78129-0 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Young-Gon Kim ◽

Sungchul Kim ◽

Cristina Eunbee Cho ◽

In Hye Song ◽

Hee Jin Lee ◽

...

Keyword(s):

Neural Network ◽

Deep Learning ◽

Convolutional Neural Network ◽

Transfer Learning ◽

Frozen Section ◽

Medical Center ◽

External Validation ◽

Model Performance ◽

Classification Model ◽

Training Dataset

Abstract. Fast and accurate confirmation of metastasis on the frozen tissue section of intraoperative sentinel lymph node biopsy is an essential tool for critical surgical decisions. However, accurate diagnosis by pathologists is difficult within the time limitations. Training a robust and accurate deep learning model is also difficult owing to the limited number of frozen datasets with high quality labels. To overcome these issues, we validated the effectiveness of transfer learning from CAMELYON16 to improve performance of the convolutional neural network (CNN)-based classification model on our frozen dataset (N = 297) from Asan Medical Center (AMC). Among the 297 whole slide images (WSIs), 157 and 40 WSIs were used to train deep learning models with different dataset ratios at 2, 4, 8, 20, 40, and 100%. The remaining, i.e., 100 WSIs, were used to validate model performance in terms of patch- and slide-level classification. An additional 228 WSIs from Seoul National University Bundang Hospital (SNUBH) were used as an external validation. Three initial weights, i.e., scratch-based (random initialization), ImageNet-based, and CAMELYON16-based models were used to validate their effectiveness in external validation. In the patch-level classification results on the AMC dataset, CAMELYON16-based models trained with a small dataset (up to 40%, i.e., 62 WSIs) showed a significantly higher area under the curve (AUC) of 0.929 than those of the scratch- and ImageNet-based models at 0.897 and 0.919, respectively, while CAMELYON16-based and ImageNet-based models trained with 100% of the training dataset showed comparable AUCs at 0.944 and 0.943, respectively. For the external validation, CAMELYON16-based models showed higher AUCs than those of the scratch- and ImageNet-based models. Model performance for slide feasibility of the transfer learning to enhance model performance was validated in the case of frozen section datasets with limited numbers.

Uncertainty-Aware Deep Learning-Based Cardiac Arrhythmias Classification Model of Electrocardiogram Signals

Computers ◽

10.3390/computers10060082 ◽

2021 ◽

Vol 10 (6) ◽

pp. 82

Author(s):

Ahmad O. Aseeri

Keyword(s):

Deep Learning ◽

Cardiac Arrhythmias ◽

Large Scale ◽

Clinical Decision Making ◽

Probabilistic Approach ◽

Classification Model ◽

Gating Mechanism ◽

Uncertainty Estimates ◽

Wide Range

Deep learning-based methods have emerged as one of the most effective and practical solutions to a wide range of medical problems, including the diagnosis of cardiac arrhythmias. Early diagnosis of many heart diseases starts with the accurate detection and classification of cardiac arrhythmias, which can be achieved via electrocardiograms (ECGs). Motivated by the desire to enhance conventional clinical methods for diagnosing cardiac arrhythmias, we introduce an uncertainty-aware deep learning-based predictive model for accurate large-scale classification of cardiac arrhythmias, trained and evaluated on three benchmark medical datasets. In addition, because quantifying uncertainty is vital for clinical decision-making, our method incorporates a probabilistic approach that captures the model’s uncertainty using a Bayesian-based approximation method, without introducing additional parameters or significant changes to the network’s architecture. Although many arrhythmia classification solutions with various ECG feature engineering techniques have been reported in the literature, the AI-based probabilistic method introduced in this paper outperforms existing methods in multiclass classification, with F1 scores of 98.62% and 96.73% on the MIT-BIH dataset (20 annotations), 99.23% and 96.94% on the INCART dataset (eight annotations), and 97.25% and 96.73% on the BIDMC dataset (six annotations), for the deep ensemble and probabilistic modes, respectively. We demonstrate our method’s high performance and statistical reliability in numerical experiments on language modeling using the gating mechanism of recurrent neural networks.
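
To make the idea of an uncertainty estimate concrete, the sketch below assumes the model can be run several times with stochastic behaviour (for example, dropout left active at test time) so that each pass yields a slightly different class-probability vector. That sampling scheme is an assumption: the abstract only mentions a Bayesian-based approximation and does not name the technique. The function name and the probability values are likewise hypothetical.

```python
import math


def summarize_passes(prob_samples):
    """Average class probabilities over passes and compute predictive entropy.

    prob_samples: list of probability vectors, one per stochastic forward pass.
    Returns (predicted_class, mean_probabilities, entropy_in_nats).
    """
    n_passes = len(prob_samples)
    n_classes = len(prob_samples[0])
    mean_probs = [
        sum(sample[c] for sample in prob_samples) / n_passes
        for c in range(n_classes)
    ]
    # Entropy of the averaged distribution; higher means less certain.
    entropy = -sum(p * math.log(p) for p in mean_probs if p > 0)
    predicted = max(range(n_classes), key=lambda c: mean_probs[c])
    return predicted, mean_probs, entropy


# Made-up outputs for two heartbeats over three classes (illustrative only).
confident = [[0.97, 0.02, 0.01], [0.95, 0.03, 0.02], [0.98, 0.01, 0.01]]
uncertain = [[0.60, 0.30, 0.10], [0.20, 0.70, 0.10], [0.40, 0.35, 0.25]]

pred_c, mean_c, ent_c = summarize_passes(confident)
pred_u, mean_u, ent_u = summarize_passes(uncertain)
print(pred_c, round(ent_c, 3))
print(pred_u, round(ent_u, 3))
```

In a clinical setting, a beat whose entropy exceeds some chosen threshold could be flagged for human review instead of being labeled automatically; the threshold itself would have to be calibrated on held-out data.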

Understanding Natural Disaster Scenes from Mobile Images Using Deep Learning

Applied Sciences ◽

10.3390/app11093952 ◽

2021 ◽

Vol 11 (9) ◽

pp. 3952

Author(s):

Shimin Tang ◽

Zhiqiang Chen

Keyword(s):

Deep Learning ◽

Natural Disaster ◽

Scene Understanding ◽

Computing Methods ◽

Classification Model ◽

Learning Approach ◽

Learning Models ◽

Damage Level ◽

Feature Extractor ◽

Mobile Imaging

With the ubiquitous use of mobile imaging devices, the collection of perishable disaster-scene data has become unprecedentedly easy. However, computing methods are unable to understand these images, which involve significant complexity and uncertainties. In this paper, the authors investigate the problem of disaster-scene understanding through a deep-learning approach. Two image attributes are considered: hazard type and damage level. Three deep-learning models are trained, and their performance is assessed. Specifically, the best model for hazard-type prediction has an overall accuracy (OA) of 90.1%, and the best damage-level classification model has an explainable OA of 62.6%; both models adopt the Faster R-CNN architecture with a ResNet50 network as a feature extractor. It is concluded that hazard types are more identifiable than damage levels in disaster-scene images. Insights are revealed, including that damage-level recognition suffers more from inter- and intra-class variations, and the treatment of hazard-agnostic damage leveling further contributes to the underlying uncertainties.

Performance Comparison of Deep Learning Autoencoders for Cancer Subtype Detection Using Multi-Omics Data

Cancers ◽

10.3390/cancers13092013 ◽

2021 ◽

Vol 13 (9) ◽

pp. 2013

Author(s):

Edian F. Franco ◽

Pratip Rana ◽

Aline Cruz ◽

Víctor V. Calderón ◽

Vasco Azevedo ◽

...

Keyword(s):

Deep Learning ◽

Data Fusion ◽

Similarity Measures ◽

Research Problem ◽

Optimal Number ◽

Performance Comparison ◽

The Cancer Genome Atlas ◽

Cancer Type ◽

Omics Data ◽

Cancer Subtype

A heterogeneous disease such as cancer is activated through multiple pathways and different perturbations. Depending upon the activated pathway(s), the survival of the patients varies significantly and shows different efficacy to various drugs. Therefore, cancer subtype detection using genomics level data is a significant research problem. Subtype detection is often a complex problem, and in most cases, needs multi-omics data fusion to achieve accurate subtyping. Different data fusion and subtyping approaches have been proposed over the years, such as kernel-based fusion, matrix factorization, and deep learning autoencoders. In this paper, we compared the performance of different deep learning autoencoders for cancer subtype detection. We performed cancer subtype detection on four different cancer types from The Cancer Genome Atlas (TCGA) datasets using four autoencoder implementations. We also predicted the optimal number of subtypes in a cancer type using the silhouette score and found that the detected subtypes exhibit significant differences in survival profiles. Furthermore, we compared the effect of feature selection and similarity measures for subtype detection. For further evaluation, we used the Glioblastoma multiforme (GBM) dataset and identified the differentially expressed genes in each of the subtypes. The results obtained are consistent with other genomic studies and can be corroborated with the involved pathways and biological functions. Thus, it shows that the results from the autoencoders, obtained through the interaction of different datatypes of cancer, can be used for the prediction and characterization of patient subgroups and survival profiles.
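
The silhouette-based choice of the number of subtypes can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: it uses Euclidean distance, hand-written toy points in place of autoencoder latent features, and hand-written candidate labelings in place of real clustering output. None of these details come from the paper.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_dist(i, members):
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes a score of 0
        a = mean_dist(i, own)  # cohesion: mean distance within own cluster
        b = min(mean_dist(i, members)  # separation: nearest other cluster
                for other, members in clusters.items() if other != label)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Toy 2-D "latent" points forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Hypothetical cluster assignments for k = 2 and k = 3.
candidates = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 1, 2, 2, 2],
}

scores = {k: silhouette_score(points, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

The labeling with the highest mean silhouette is taken as the preferred number of clusters; on real multi-omics data the labelings would come from clustering the learned latent features for each candidate k.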

Toward an Automatic Quality Assessment of Voice-Based Telemedicine Consultations: A Deep Learning Approach

Sensors ◽

10.3390/s21093279 ◽

2021 ◽

Vol 21 (9) ◽

pp. 3279

Author(s):

Maria Habib ◽

Mohammad Faris ◽

Raneem Qaddoura ◽

Manal Alomari ◽

Alaa Alomari ◽

...

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Quality Assessment ◽

Transcript Level ◽

Assessment Process ◽

Classification Model ◽

Systematic Evaluation ◽

Transcript Levels ◽

Perceptual Evaluation

Maintaining a high quality of conversation between doctors and patients is essential in telehealth services, where efficient and competent communication is important to promote patient health. Assessing the quality of medical conversations is often handled based on a human auditory-perceptual evaluation. Typically, trained experts are needed for such tasks, as they follow systematic evaluation criteria. However, the daily rapid increase of consultations makes the evaluation process inefficient and impractical. This paper investigates the automation of the quality assessment process of patient–doctor voice-based conversations in a telehealth service using a deep-learning-based classification model. For this, the data consist of audio recordings obtained from Altibbi. Altibbi is a digital health platform that provides telemedicine and telehealth services in the Middle East and North Africa (MENA). The objective is to assist Altibbi’s operations team in the evaluation of the provided consultations in an automated manner. The proposed model is developed using three sets of features: features extracted from the signal level, the transcript level, and the signal and transcript levels. At the signal level, various statistical and spectral information is calculated to characterize the spectral envelope of the speech recordings. At the transcript level, a pre-trained embedding model is utilized to encompass the semantic and contextual features of the textual information. Additionally, the hybrid of the signal and transcript levels is explored and analyzed. The designed classification model relies on stacked layers of deep neural networks and convolutional neural networks. Evaluation results show that the model achieved a higher level of precision when compared with the manual evaluation approach followed by Altibbi’s operations team.

KArSL

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3423420 ◽

2021 ◽

Vol 20 (1) ◽

pp. 1-19

Author(s):

Ala Addin I. Sidig ◽

Hamzah Luqman ◽

Sabri Mahmoud ◽

Mohamed Mohandes

Keyword(s):

Deep Learning ◽

Sign Language ◽

Markov Models ◽

Body Language ◽

Arab Countries ◽

Microsoft Kinect ◽

Classification Model ◽

Language Recognition ◽

Sign Language Recognition ◽

Arabic Sign Language

Sign language is the major means of communication for the deaf community. It uses body language and gestures such as hand shapes, lip patterns, and facial expressions to convey a message. Sign language is geography-specific, as it differs from one country to another. Arabic Sign Language (ArSL) is used in all Arab countries. The lack of a comprehensive benchmarking database for ArSL is one of the challenges of the automatic recognition of Arabic Sign Language. This article introduces the KArSL database for ArSL, consisting of 502 signs that cover 11 chapters of the ArSL dictionary. Signs in the KArSL database are performed by three professional signers, and each sign is repeated 50 times by each signer. The database is recorded using the state-of-the-art multi-modal Microsoft Kinect V2. We also propose three approaches for sign language recognition using this database. The proposed systems are Hidden Markov Models, a deep learning image classification model applied to an image composed of shots of the video of the sign, and an attention-based deep learning captioning system. Recognition accuracies of these systems indicate their suitability for such a large number of Arabic signs. The techniques are also tested on a publicly available database. The KArSL database will be made freely available for interested researchers.
