Classification of Liver Cancer Subtypes Based on Hierarchical Integrated Stacked Autoencoder

Abstract The recent accumulation of cancer genomic data provides an opportunity to understand how a tumor’s genomic characteristics can affect its responses to drugs. This field, called pharmacogenomics, is a key area in the development of precision oncology. Deep learning (DL) methodology has emerged as a powerful technique to characterize and learn from rapidly accumulating pharmacogenomics data. We introduce the fundamentals and typical model architectures of DL. We review the use of DL in classification of cancers and cancer subtypes (diagnosis and treatment stratification of patients), prediction of drug response and drug synergy for individual tumors (treatment prioritization for a patient), drug repositioning and discovery and the study of mechanism/mode of action of treatments. For each topic, we summarize current genomics and pharmacogenomics data resources such as pan-cancer genomics data for cancer cell lines (CCLs) and tumors, and systematic pharmacologic screens of CCLs. By revisiting the published literature, including our in-house analyses, we demonstrate the unprecedented capability of DL enabled by rapid accumulation of data resources to decipher complex drug response patterns, thus potentially improving cancer medicine. Overall, this review provides an in-depth summary of state-of-the-art DL methods and up-to-date pharmacogenomics resources and future opportunities and challenges to realize the goal of precision oncology.


A Novel Deep Flexible Neural Forest Model for Classification of Cancer Subtypes Based on Gene Expression Data

IEEE Access, 2019, Vol 7, pp. 22086-22095. DOI: 10.1109/access.2019.2898723. Cited by ~8

Author(s): Jing Xu, Peng Wu, Yuehui Chen, Qingfang Meng, Hussain Dawood, ...

Keyword(s): Gene Expression, Gene Expression Data, Expression Data, Cancer Subtypes, Forest Model


MLW-gcForest: A Multi-Weighted gcForest Model for Cancer Subtype Classification by Methylation Data

Applied Sciences, 2019, Vol 9 (17), pp. 3589. DOI: 10.3390/app9173589. Cited by ~2

Author(s): Yunyun Dong, Wenkai Yang, Jiawen Wang, Juanjuan Zhao, Yan Qiang

Keyword(s): Machine Learning, Small Sample Size, Small Sample, The Cancer Genome Atlas, High Dimensionality, Methylation Data, Learning Methods, Cancer Subtypes, Machine Learning Methods

Effective cancer treatment requires clear identification of the cancer subtype. Due to the small sample size, high dimensionality, and class imbalance of cancer gene data, classifying cancer subtypes with traditional machine learning methods remains challenging. The gcForest algorithm combines machine learning methods with a deep neural network and has been shown to classify small-sample data better. However, gcForest still faces many challenges when applied to the classification of cancer subtypes. In this paper, we propose an improved gcForest algorithm (MLW-gcForest) and study its applicability to genetic data with small sample sizes, high dimensionality, and class imbalance. The main contributions of this algorithm are as follows: (1) Different weights are assigned to different random forests according to the classification ability of the forests. (2) We propose a sorting optimization algorithm that assigns different weights to the feature vectors generated under different sliding windows. The MLW-gcForest model is trained on the methylation data of five data sets from The Cancer Genome Atlas (TCGA). The experimental results show that the MLW-gcForest algorithm achieves higher accuracy and area under the curve (AUC) values for the classification of cancer subtypes than traditional machine learning methods and state-of-the-art methods. The results also show that methylation data can be effectively used to diagnose cancer.
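Contribution (1) can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so the version below simply assumes that each forest's weight is its validation accuracy divided by the sum of all accuracies; the function names and all numbers are hypothetical and are not taken from the paper.

```python
def forest_weights(val_accuracies):
    """Normalize per-forest validation accuracies into weights that sum to 1.

    Assumption: weight is proportional to accuracy. The paper may use a
    different rule.
    """
    total = sum(val_accuracies)
    if total <= 0:
        raise ValueError("at least one forest needs a positive accuracy")
    return [a / total for a in val_accuracies]


def weighted_class_vector(forest_probs, weights):
    """Combine per-forest class-probability vectors into one weighted vector."""
    n_classes = len(forest_probs[0])
    combined = [0.0] * n_classes
    for probs, w in zip(forest_probs, weights):
        for k in range(n_classes):
            combined[k] += w * probs[k]
    return combined


# Hypothetical example: three forests voting on two subtypes for one sample.
val_accuracies = [0.9, 0.6, 0.5]
forest_probs = [
    [0.8, 0.2],  # most accurate forest favours subtype 0
    [0.4, 0.6],
    [0.3, 0.7],
]
weights = forest_weights(val_accuracies)
combined = weighted_class_vector(forest_probs, weights)
predicted = max(range(len(combined)), key=lambda k: combined[k])
```

With these made-up numbers an unweighted average would tie at 0.5 per subtype, while the accuracy-weighted vector leans toward the subtype preferred by the strongest forest.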


Data Perturbation Independent Diagnosis and Validation of Breast Cancer Subtypes Using Clustering and Patterns

Cancer Informatics, 2006, Vol 2, pp. 117693510600200. DOI: 10.1177/117693510600200006. Cited by ~6

Author(s): G. Alexe, G.S. Dalgin, R. Ramaswamy, C. Delisi, G. Bhanot

Keyword(s): Breast Cancer, Data Perturbation, Luminal A, Cancer Subtypes, Luminal B, Molecular Stratification, Core Cluster, Therapeutic Decisions, Disease Subtypes

Molecular stratification of disease based on expression levels of sets of genes can help guide therapeutic decisions if such classifications can be shown to be stable against variations in sample source and data perturbation. Classifications inferred from one set of samples in one lab should be able to consistently stratify a different set of samples in another lab. We present a method for assessing such stability and apply it to the breast cancer (BCA) datasets of Sorlie et al. 2003 and Ma et al. 2003. We find that, within the now commonly accepted BCA categories identified by Sorlie et al., Luminal A and Basal are robust, but Luminal B and ERBB2+ are not. In particular, 36% of the samples identified as Luminal B and 55% identified as ERBB2+ cannot be assigned an accurate category because the classification is sensitive to data perturbation. We identify a “core cluster” of samples for each category, and from these we determine “patterns” of gene expression that distinguish the core clusters from each other. We find that the best markers for Luminal A and Basal are (ESR1, LIV1, GATA-3) and (CCNE1, LAD1, KRT5), respectively. Pathways enriched in the patterns regulate apoptosis, tissue remodeling and the immune response. We use a different dataset (Ma et al. 2003) to test the accuracy with which samples can be allocated to the four disease subtypes. We find, as expected, that the classification of samples identified as Luminal A and Basal is robust but classification into the other two subtypes is not.
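The idea of testing a classification against data perturbation, and keeping only a stable "core" of samples, can be sketched as follows. This is not the authors' procedure: it assumes fixed subtype centroids, a single one-dimensional expression value per sample, Gaussian noise, and a 0.95 stability cut-off, all of which are hypothetical choices made for illustration.

```python
import random


def nearest_centroid(value, centroids):
    """Return the label of the centroid closest to a 1-D expression value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


def stability(value, centroids, sigma, trials, rng):
    """Fraction of noisy copies of `value` that keep the unperturbed label."""
    base = nearest_centroid(value, centroids)
    kept = sum(
        nearest_centroid(value + rng.gauss(0.0, sigma), centroids) == base
        for _ in range(trials)
    )
    return kept / trials


# Hypothetical 1-D "expression" values and two subtype centroids.
centroids = {"A": 0.0, "B": 10.0}
samples = {"s1": 0.5, "s2": 9.0, "s3": 5.1}

rng = random.Random(0)  # fixed seed so the run is repeatable
scores = {
    name: stability(value, centroids, sigma=0.5, trials=200, rng=rng)
    for name, value in samples.items()
}

# Samples whose label survives almost every perturbation form the core set.
core = sorted(name for name, score in scores.items() if score >= 0.95)
```

Here `s1` and `s2` sit far from the decision boundary and keep their labels under noise, whereas `s3` lies close to the boundary and flips often, so it is excluded from the core set.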
