An ensemble learning approach for modeling the systems biology of drug-induced injury

Abstract

Background: Drug-induced liver injury (DILI) is an adverse reaction in which commonly used drugs cause liver damage. DILI is estimated to affect around 20 in 100,000 inhabitants worldwide each year. Despite being one of the main causes of liver failure, the pathophysiology and mechanisms of DILI are poorly understood. In the present study, we developed an ensemble learning approach based on different features (CMap gene expression, chemical structures, drug targets) to predict drugs that might cause DILI and to gain a better understanding of the mechanisms linked to the adverse reaction.

Results: We searched for gene signatures in CMap gene expression data using two approaches: phenotype-gene association data from DisGeNET, and a non-parametric test comparing gene expression of DILI-Concern and No-DILI-Concern drugs (as per DILIrank definitions). The average accuracy of the classifiers in both approaches was 69%. Using chemical structures as features gave an accuracy of 65%. Combining both types of features produced an accuracy of around 63%, but improved performance on the independent hold-out test to 67%. Using drug-target associations as a feature gave the best accuracy (70%) in the independent hold-out test.

Conclusions: When using CMap gene expression data, searching for a specific gene signature among the landmark genes improves the quality of the classifiers, but the result is still limited by the intrinsic noise of the dataset. When using chemical structures as a feature, the structural diversity of the known DILI-causing drugs hampers the prediction, a problem similar to the one seen with gene expression information. Combining both features did not improve the quality of the classifiers but increased their robustness, as shown on independent hold-out tests. Using drug-target associations as a feature improved the prediction, especially the specificity, and the results were comparable to those of previous studies.
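The abstract does not name the non-parametric test used to compare the two drug groups. As a minimal sketch, assuming a Mann-Whitney U statistic, the following ranks genes by how strongly their expression separates the groups. The function names, gene names, and expression values are hypothetical and are not taken from CMap, DILIrank, or the paper.

```python
def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: number of pairs (a, b) with a > b; ties count 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def rank_genes(expr_a, expr_b):
    """Rank genes by how far U / (n_a * n_b) departs from 0.5 (no separation)."""
    scores = {}
    for gene in expr_a:
        a, b = expr_a[gene], expr_b[gene]
        effect = mann_whitney_u(a, b) / (len(a) * len(b))
        scores[gene] = abs(effect - 0.5)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy expression values, one list per gene and drug group.
dili = {"GENE_A": [2.1, 2.5, 3.0], "GENE_B": [0.9, 1.1, 1.0]}
no_dili = {"GENE_A": [0.5, 0.8, 1.2], "GENE_B": [1.0, 0.8, 1.2]}

ranking = rank_genes(dili, no_dili)
print(ranking)
```

Genes at the top of the ranking would be candidates for a signature. A real analysis would also need p-values and multiple-testing correction, which this sketch omits.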

An Efficient PCA Ensemble Learning Approach for Prediction of RNA-Seq Malaria Vector Gene Expression Data Classification

International Journal of Engineering Research and Technology ◽

10.37624/ijert/13.1.2020.163-169 ◽

2020 ◽

Vol 13 (1) ◽

pp. 163

Author(s):

Micheal Olaolu Arowolo ◽

Marion O. Adebiyi ◽

Ayodele A. Adebiyi

Keyword(s):

Gene Expression ◽

Ensemble Learning ◽

Gene Expression Data ◽

Malaria Vector ◽

Data Classification ◽

Learning Approach ◽

Expression Data ◽

RNA-Seq

Gene Expression Data Based Deep Learning Model for Accurate Prediction of Drug-Induced Liver Injury in Advance

Journal of Chemical Information and Modeling ◽

10.1021/acs.jcim.9b00143 ◽

2019 ◽

Vol 59 (7) ◽

pp. 3240-3250 ◽

Cited By ~ 3

Author(s):

Chunlai Feng ◽

Hengwei Chen ◽

Xianqin Yuan ◽

Mengqiu Sun ◽

Kexin Chu ◽

...

Keyword(s):

Gene Expression ◽

Deep Learning ◽

Liver Injury ◽

Gene Expression Data ◽

Learning Model ◽

Accurate Prediction ◽

Expression Data ◽

Drug Induced ◽

Drug Induced Liver Injury ◽

Deep Learning Model

Quality of Feature Selection Based on Microarray Gene Expression Data

Computational Science – ICCS 2008 - Lecture Notes in Computer Science ◽

10.1007/978-3-540-69389-5_17 ◽

2008 ◽

pp. 140-147 ◽

Cited By ~ 4

Author(s):

Henryk Maciejewski

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Expression Data ◽

Microarray Gene Expression Data ◽

Expression Data ◽

Microarray Gene Expression ◽

Microarray Gene

Meta‐learning approach to gene expression data classification

International Journal of Intelligent Computing and Cybernetics ◽

10.1108/17563780910959901 ◽

2009 ◽

Vol 2 (2) ◽

pp. 285-303 ◽

Cited By ~ 8

Author(s):

Bruno Feres de Souza ◽

Carlos Soares ◽

André C.P.L.F. de Carvalho

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Data Classification ◽

Learning Approach ◽

Expression Data ◽

Meta Learning

An ensemble learning approach to reverse-engineering transcriptional regulatory networks from time-series gene expression data

BMC Genomics ◽

10.1186/1471-2164-10-s1-s8 ◽

2009 ◽

Vol 10 (Suppl 1) ◽

pp. S8 ◽

Cited By ~ 1

Author(s):

Jianhua Ruan ◽

Youping Deng ◽

Edward J Perkins ◽

Weixiong Zhang

Keyword(s):

Gene Expression ◽

Time Series ◽

Reverse Engineering ◽

Gene Expression Data ◽

Regulatory Networks ◽

Learning Approach ◽

Expression Data ◽

Transcriptional Regulatory Networks ◽

Transcriptional Regulatory ◽

Time Series Gene Expression

Cross-platform proteomics to advance genetic prioritisation strategies

10.1101/2021.03.18.435919 ◽

2021 ◽

Author(s):

Maik Pietzner ◽

Eleanor Wheeler ◽

Julia Carrasco-Zanini ◽

Nicola D. Kerrison ◽

Erin Oerton ◽

...

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Drug Target ◽

Large Scale ◽

Expression Data ◽

Protein Targets ◽

Proteomic Data ◽

Sufficient Power ◽

Cross Platform ◽

Proteomic Techniques

Discovery of protein quantitative trait loci (pQTLs) has been enabled by affinity-based proteomic techniques and is increasingly used to guide genetically informed drug target evaluation. Large-scale proteomic data are now being created, but a systematic, bidirectional assessment of platform differences is lacking, which restricts clinical translation. We compared genetic, technical, and phenotypic determinants of 871 protein targets measured using both aptamer-based (SomaScan® Platform v4) and antibody-based (Olink) assays in up to 10,708 individuals. Correlation coefficients for overlapping protein targets varied widely (median 0.38, IQR: 0.08-0.64). Among all 608 cis- and 1,315 trans-pQTLs identified with sufficient power for replication, 64% were shared across both platforms, but correlations of effect estimates were lower than previously reported (cis: 0.41, trans: 0.34). We identified technical, protein, and variant characteristics that contributed significantly to platform differences and found contradictory phenotypic associations attributable to them. We demonstrate how integrating phenomic and gene expression data improves genetic prioritisation strategies, including platform-specific pQTLs.

Tumor Classification from Gene Expression Data: A Coding-Based Multiclass Learning Approach

Biological and Medical Data Analysis - Lecture Notes in Computer Science ◽

10.1007/11573067_22 ◽

2005 ◽

pp. 211-222

Author(s):

Alexander Hüntemann ◽

José C. González ◽

Elizabeth Tapia

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Tumor Classification ◽

Learning Approach ◽

Expression Data

Network Insights into Improving Drug Target Inference Algorithms

10.1101/2020.01.17.910885 ◽

2020 ◽

Cited By ~ 1

Author(s):

Muying Wang ◽

Heeju Noh ◽

Ericka Mochan ◽

Jason E. Shoemaker

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Protein Interactions ◽

Drug Target ◽

Drug Targets ◽

Interaction Network ◽

Expression Data ◽

Protein Protein Interactions ◽

Cell Tissue ◽

Inference Algorithms

To improve the efficacy of drug research and development (R&D), a better understanding of drug mechanisms of action (MoA) is needed. Computational algorithms such as ProTINA, which integrate protein-protein interactions (PPIs), protein-gene interactions (PGIs) and gene expression data, have shown promising performance in drug target inference. In this work, we evaluated how network and gene expression data affect ProTINA’s accuracy. Network data, rather than gene expression data, predominantly determine the accuracy of ProTINA, while the size of the interaction network and the choice of cell/tissue-specific networks have limited effects. However, we found that protein network betweenness values showed high accuracy in predicting drug targets. We therefore propose a new algorithm, TREAP (https://github.com/ImmuSystems-Lab/TREAP), that combines betweenness values and adjusted p-values for target inference. This algorithm achieved higher accuracy than ProTINA on the same datasets.

Microarray Gene Expression Data Clustering Using Red Black Tree Based K-Means Algorithm

International Journal of Management & Information Technology ◽

10.24297/ijmit.v1i3.1428 ◽

2012 ◽

Vol 1 (3) ◽

pp. 54-58 ◽

Cited By ~ 1

Author(s):

E K Jasila ◽

K A Abdul Nazeer

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Microarray Gene Expression Data ◽

Data Sets ◽

Expression Data ◽

Microarray Gene Expression ◽

Gene Expression Data Clustering ◽

Modern Era ◽

Red Black Tree

High-quality clustering is essential in the modern era of information processing. Clustering is one of the most important data analysis methods, and k-means clustering is commonly used for diverse applications. Despite its simplicity and ease of implementation, the k-means algorithm is computationally expensive, and the quality of its clusters depends on the random choice of initial centroids. Different methods have been proposed for improving the accuracy and efficiency of the k-means algorithm. In this paper, we propose a new approach that improves the accuracy of clustering microarray-based gene expression data sets. In the proposed method, the initial centroids are determined using a Red Black Tree, and an improved heuristic approach is used to assign the data items to the nearest centroids. Experimental results show that the proposed algorithm performs better than other existing algorithms.
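The dependence of k-means on its initial centroids can be illustrated with a small sketch. This is standard Lloyd's k-means with explicitly supplied starting centroids, not the Red Black Tree method of the paper, whose details the abstract does not give. The data points and both sets of starting centroids are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's k-means, started from the given initial centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c, members in zip(centroids, clusters):
            if members:
                updated.append(tuple(sum(x) / len(members)
                                     for x in zip(*members)))
            else:
                updated.append(c)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # converged
        centroids = updated
    return centroids


def inertia(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


# Hypothetical data: corners of a wide rectangle, two natural clusters.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

good = kmeans(points, [(0.0, 0.5), (4.0, 0.5)])  # splits left from right
bad = kmeans(points, [(2.0, 0.0), (2.0, 1.0)])   # stuck splitting top from bottom

print(inertia(points, good), inertia(points, bad))  # 1.0 16.0
```

Both runs converge, but the second settles in a poor local optimum with a much larger within-cluster sum of squares. This is the kind of sensitivity that centroid-initialization methods aim to reduce.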
