PEAK Predicts Gene Regulatory Network Linkages during Sea Urchin Development with High Sensitivity from Gene Expression Data Alone

Abstract. Background: Gene regulatory network (GRN) inference can now take advantage of powerful machine learning algorithms to predict the entire landscape of gene-to-gene interactions with the potential to complement traditional experimental methods in building gene networks. However, the dynamical nature of embryonic development, which represents the time-dependent interactions between thousands of transcription factors, signaling molecules, and effector genes, is one of the most challenging arenas for GRN prediction. Results: In this work, we show that successful GRN predictions for developmental systems from gene expression data alone can be obtained with the Priors Enriched Absent Knowledge (PEAK) network inference algorithm. PEAK is a noise-robust method that models gene expression dynamics via ordinary differential equations and selects the best network based on information-theoretic criteria coupled with the machine learning algorithm Elastic Net. We test our GRN prediction methodology using two gene expression data sets for the purple sea urchin (S. purpuratus) and cross-check our results against existing GRN models that have been constructed and validated by over 30 years of experimental results. We found a remarkably high degree of sensitivity in identifying known gene interactions in the network (maximum 76.32%). We also generated 838 novel predictions for interactions that have not yet been described, which provide a resource for researchers to use to further complete the sea urchin GRN. Conclusions: GRN predictions that match known gene interactions can be produced using gene expression data alone from developmental time series experiments.
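
The sensitivity figure reported above is the fraction of known interactions that the prediction recovers. A minimal sketch of that calculation follows, assuming each interaction is represented as a (regulator, target) pair. The gene names and edge lists are hypothetical placeholders, not data from the paper.

```python
def sensitivity(predicted_edges, known_edges):
    """Fraction of known regulator -> target edges that appear in the prediction."""
    known = set(known_edges)
    if not known:
        raise ValueError("known_edges must not be empty")
    true_positives = len(known & set(predicted_edges))
    return true_positives / len(known)


# Hypothetical toy network; the gene names are placeholders.
known = [("geneA", "geneB"), ("geneB", "geneC"),
         ("geneA", "geneC"), ("geneC", "geneD")]
predicted = [("geneA", "geneB"), ("geneB", "geneC"),
             ("geneC", "geneD"), ("geneD", "geneA")]

score = sensitivity(predicted, known)

# Predicted edges absent from the reference network are the "novel" candidates.
novel = sorted(set(predicted) - set(known))

print(f"sensitivity = {score:.2%}")
print("novel predictions:", novel)
```

Sensitivity says nothing about false positives, so a predicted edge missing from the reference network may be either an error or an undescribed interaction.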

Developmental gene regulatory network connections predicted by machine learning from gene expression data alone

PLoS ONE ◽

10.1371/journal.pone.0261926 ◽

2021 ◽

Vol 16 (12) ◽

pp. e0261926

Author(s):

Jingyi Zhang ◽

Farhan Ibrahim ◽

Emily Najmulski ◽

George Katholos ◽

Doaa Altarawy ◽

...

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Gene Regulatory Network ◽

Sea Urchin ◽

Gene Expression Data ◽

Regulatory Network ◽

Machine Learning Algorithms ◽

Gene Interactions ◽

Expression Data ◽

Gene Regulatory

Gene regulatory network (GRN) inference can now take advantage of powerful machine learning algorithms to complement traditional experimental methods in building gene networks. However, the dynamical nature of embryonic development, which represents the time-dependent interactions between thousands of transcription factors, signaling molecules, and effector genes, is one of the most challenging arenas for GRN prediction. In this work, we show that successful GRN predictions for a developmental network from gene expression data alone can be obtained with the Priors Enriched Absent Knowledge (PEAK) network inference algorithm. PEAK is a noise-robust method that models gene expression dynamics via ordinary differential equations and selects the best network based on information-theoretic criteria coupled with the machine learning algorithm Elastic Net. We test our GRN prediction methodology using two gene expression datasets for the purple sea urchin, Strongylocentrotus purpuratus, and cross-check our results against existing GRN models that have been constructed and validated by over 30 years of experimental results. Our results show a remarkably high degree of sensitivity in identifying known gene interactions in the network (maximum 81.58%). We also generate novel predictions for interactions that have not yet been described, which provide a resource for researchers to use to further complete the sea urchin GRN. Published ChIP-seq data and spatial co-expression analysis further support a subset of the top novel predictions. We conclude that GRN predictions that match known gene interactions can be produced using gene expression data alone from developmental time series experiments.
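
The general idea of combining an ordinary differential equation model with Elastic Net can be sketched as follows. This is not the PEAK implementation: it assumes a linear model d(target)/dt = sum_j w_j * x_j(t), estimates the derivative by finite differences, and fits the weights with a plain coordinate-descent Elastic Net. The information-theoretic model selection step is omitted, and the time series, the penalty settings `alpha` and `l1_ratio`, and all names are illustrative assumptions.

```python
import math


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 part of Elastic Net)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, alpha=0.05, l1_ratio=0.5, n_iter=100):
    """Coordinate descent for the objective
    (1/2n)*||y - Xw||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1-l1_ratio)*||w||^2.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                partial = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            w[j] = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
    return w


# Synthetic (hypothetical) time series: two candidate regulators and one target.
dt = 0.1
steps = 63
reg_a = [math.sin(k * dt) for k in range(steps)]
reg_b = [math.sin(2 * k * dt) for k in range(steps)]

# The target is driven only by reg_a: Euler steps of d(target)/dt = 1.0 * reg_a.
target = [0.0]
for k in range(steps):
    target.append(target[-1] + dt * reg_a[k])

# Finite-difference estimate of the target's time derivative.
dydt = [(target[k + 1] - target[k]) / dt for k in range(steps)]

# Candidate regulators at each time point, including the target itself.
X = [[reg_a[k], reg_b[k], target[k]] for k in range(steps)]
weights = elastic_net(X, dydt)
print(weights)
```

In this toy setting a nonzero weight is read as a predicted regulatory link. Only `reg_a` receives one, and its value is shrunk below the true coefficient of 1.0 by the penalty.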

New Ensemble Machine Learning Method for Classification and Prediction on Gene Expression Data

Encyclopedia of Healthcare Information Systems ◽

10.4018/978-1-59904-889-5.ch122 ◽

2008 ◽

pp. 982-989 ◽

Cited By ~ 1

Author(s):

Ching Wei Wang

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Gene Expression Data ◽

Learning Algorithm ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Experimental Results ◽

Machine Learning Algorithm ◽

Expression Data ◽

Ensemble Machine Learning

One of the most active areas of research in supervised machine learning has been the study of methods for constructing good ensembles of classifiers. The main discovery is that an ensemble classifier often performs much better than the single classifiers that make it up. Recent studies (Dettling, 2004; Tan & Gilbert, 2003) have confirmed the utility of ensemble machine learning algorithms for gene expression analysis. The motivation of this work is to investigate a suitable machine learning algorithm for classification and prediction on gene expression data. The research starts by analyzing the behavior and weaknesses of three popular ensemble machine learning methods (Bagging, Boosting, and Arcing), followed by the presentation of a new ensemble machine learning algorithm. The proposed method is evaluated against the existing ensemble machine learning algorithms over 12 gene expression datasets (Alon et al., 1999; Armstrong et al., 2002; Ash et al., 2000; Catherine et al., 2003; Dinesh et al., 2002; Gavin et al., 2002; Golub et al., 1999; Scott et al., 2002; van ’t Veer et al., 2002; Yeoh et al., 2002; Zembutsu et al., 2002). The experimental results show that the proposed algorithm greatly outperforms existing methods, achieving high accuracy in classification. The outline of this chapter is as follows: the ensemble machine learning approach and three popular ensembles (i.e., Bagging, Boosting, and Arcing) are introduced first in the Background section; second, the analyses of existing ensembles, details of the proposed algorithm, and experimental results are presented in the Method section, followed by discussions of future trends and the conclusion.
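
As an illustration of one of the ensembles named above, here is a minimal sketch of Bagging: each base classifier is trained on a bootstrap resample of the data, and the ensemble predicts by majority vote. The base learner (a one-feature threshold "stump"), the toy expression values, and all function names are assumptions made for the example. This is not the new algorithm proposed in the chapter.

```python
import random
from collections import Counter


def train_stump(samples):
    """Fit a threshold rule on one feature: predict `right` if x > thr, else `left`."""
    xs = sorted({x for x, _ in samples})
    # Candidates: one threshold below all values (a constant rule) plus midpoints.
    thresholds = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best = None  # (errors, threshold, left_label, right_label)
    for thr in thresholds:
        for left, right in ((0, 1), (1, 0)):
            errors = sum((right if x > thr else left) != y for x, y in samples)
            if best is None or errors < best[0]:
                best = (errors, thr, left, right)
    _, thr, left, right = best
    return lambda x: right if x > thr else left


def bagging(samples, n_models=25, seed=0):
    """Train n_models stumps on bootstrap resamples; predict by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        bootstrap = [rng.choice(samples) for _ in samples]  # sample with replacement
        models.append(train_stump(bootstrap))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict


# Hypothetical data: expression level of one gene and a binary class label.
data = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 0), (5.0, 0),
        (6.0, 1), (7.0, 1), (8.0, 1), (9.0, 1), (10.0, 1)]

predict = bagging(data)
print(predict(1.0), predict(9.0))
```

Boosting and Arcing differ mainly in how the training sets are drawn: instead of uniform resampling, they reweight examples that earlier classifiers got wrong.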

Integrating Gene Ontology Based Grouping and Ranking into the Machine Learning Algorithm for Gene Expression Data Analysis

10.1007/978-3-030-87101-7_20 ◽

2021 ◽

pp. 205-214

Author(s):

Malik Yousef ◽

Ahmet Sayıcı ◽

Burcu Bakir-Gungor

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Gene Ontology ◽

Data Analysis ◽

Gene Expression Data ◽

Learning Algorithm ◽

Machine Learning Algorithm ◽

Expression Data ◽

Gene Expression Data Analysis

A Robust Procedure for Machine Learning Algorithms Using Gene Expression Data

Biointerface Research in Applied Chemistry ◽

10.33263/briac122.24222439 ◽

2021 ◽

Vol 12 (2) ◽

pp. 2422-2439

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Gene Expression Data ◽

Learning Algorithms ◽

Simulated Data ◽

Cancer Classification ◽

Machine Learning Algorithms ◽

Expression Data ◽

Traditional Procedure

Cancer classification is one of the main objectives of analyzing big biological datasets. Machine learning algorithms (MLAs) have been used extensively to accomplish this task, and several popular MLAs are available in the literature to classify new samples into normal or cancer populations. Nevertheless, most of them yield lower accuracies in the presence of outliers, which leads to incorrect classification of samples. Hence, in this study, we present a robust approach for the efficient and precise classification of samples using noisy gene expression datasets (GEDs). We examine the performance of the proposed procedure against five popular traditional MLAs (SVM, LDA, KNN, Naïve Bayes, and Random Forest) using both simulated and real gene expression data. We also considered several rates of outliers (10%, 20%, and 50%). Results from simulated data confirm that, in the presence of outliers, the traditional MLAs perform better when applied to the modified datasets produced by our procedure. Further transcriptome analysis found significant involvement of these extra features in cancer. Overall, the results indicate that our procedure improves the performance of the traditional MLAs. Hence, we recommend applying the proposed procedure instead of the traditional procedure for cancer classification.

Unsupervised Machine Learning for Data Encoding applied to Ovarian Cancer Transcriptomes

10.1101/855593 ◽

2019 ◽

Author(s):

Tom M George ◽

Pietro Lio

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Ovarian Cancer ◽

Gene Expression Data ◽

Clustering Algorithm ◽

Machine Learning Algorithms ◽

Data Sets ◽

Dimensional Manifold ◽

Expression Data ◽

Wide Range

Abstract: Machine learning algorithms are revolutionising how information can be extracted from complex and high-dimensional data sets via intelligent compression. For example, unsupervised autoencoders train a deep neural network with a low-dimensional “bottlenecked” central layer to reconstruct input vectors. Variational Autoencoders (VAEs) have shown promise at learning meaningful latent spaces for text, image and, more recently, gene-expression data. In the latter case they have been shown capable of capturing biologically relevant features such as a patient's sex or tumour type. Here we train a VAE on ovarian cancer transcriptomes from The Cancer Genome Atlas and show that, in many cases, the latent space learns an encoding predictive of cisplatin chemotherapy resistance. We analyse the effectiveness of such an architecture across a wide range of hyperparameters and use a state-of-the-art clustering algorithm, t-SNE, to embed the data in a two-dimensional manifold and visualise the predictive power of the trained latent spaces. By correlating genes to resistance-predictive encodings we are able to extract biological processes likely responsible for platinum resistance. Finally, we demonstrate that variational autoencoders can reliably encode gene expression data contaminated with significant amounts of Gaussian and dropout noise, a necessary feature if this technique is to be applicable to other data sets, including those in non-medical fields.
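
The two kinds of contamination mentioned at the end of the abstract can be sketched as below. This is only an illustration of what Gaussian and dropout noise might look like on an expression vector, not the authors' procedure: the function name, the noise level `sigma`, the `dropout_rate`, and the sample values are all assumptions.

```python
import random


def contaminate(expression, sigma=0.1, dropout_rate=0.2, seed=0):
    """Return a noisy copy: each value is either zeroed (dropout)
    or perturbed by additive Gaussian noise with standard deviation sigma."""
    rng = random.Random(seed)
    noisy = []
    for value in expression:
        if rng.random() < dropout_rate:
            noisy.append(0.0)  # dropout: the measurement is lost
        else:
            noisy.append(value + rng.gauss(0.0, sigma))
    return noisy


# Hypothetical expression values for four genes in one sample.
sample = [2.0, 0.5, 3.1, 1.2]
print(contaminate(sample))
```

A robustness check along these lines would encode both the clean and the contaminated vectors and compare the resulting latent representations.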

Machine-Learning Algorithms for Feature Selection from Gene Expression Data

Statistical Modelling and Machine Learning Principles for Bioinformatics Techniques, Tools, and Applications - Algorithms for Intelligent Systems ◽

10.1007/978-981-15-2445-5_10 ◽

2020 ◽

pp. 151-161 ◽

Cited By ~ 1

Author(s):

Nimrita Koul ◽

Sunilkumar S. Manvi

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Feature Selection ◽

Gene Expression Data ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Expression Data

A Structure Learning Algorithm for Inference of Gene Networks from Microarray Gene Expression Data Using Bayesian Networks

2007 IEEE 7th International Symposium on BioInformatics and BioEngineering ◽

10.1109/bibe.2007.4375731 ◽

2007 ◽

Cited By ~ 6

Author(s):

Kazuyuki Numata ◽

Seiya Imoto ◽

Satoru Miyano

Keyword(s):

Gene Expression ◽

Bayesian Networks ◽

Gene Expression Data ◽

Gene Networks ◽

Structure Learning ◽

Learning Algorithm ◽

Microarray Gene Expression Data ◽

Expression Data ◽

Microarray Gene Expression ◽

Microarray Gene

Prediction Errors in Learning Drug Response from Gene Expression Data – Influence of Labeling, Sample Size, and Machine Learning Algorithm

PLoS ONE ◽

10.1371/journal.pone.0070294 ◽

2013 ◽

Vol 8 (7) ◽

pp. e70294 ◽

Cited By ~ 10

Author(s):

Immanuel Bayer ◽

Philip Groth ◽

Sebastian Schneckener

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Sample Size ◽

Gene Expression Data ◽

Drug Response ◽

Learning Algorithm ◽

Machine Learning Algorithm ◽

Prediction Errors ◽

Expression Data

Machine Learning Algorithms for Predicting Chronic Obstructive Pulmonary Disease from Gene Expression Data with Class Imbalance

Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies ◽

10.5220/0010316501480153 ◽

2021 ◽

Author(s):

Kunti Mahmudah ◽

Bedy Purnama ◽

Fatma Indriani ◽

Kenji Satou

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Chronic Obstructive Pulmonary Disease ◽

Pulmonary Disease ◽

Gene Expression Data ◽

Class Imbalance ◽

Machine Learning Algorithms ◽

Chronic Obstructive ◽

Expression Data ◽

Obstructive Pulmonary Disease

Performance evaluation of different machine learning algorithms in presence of outliers using gene expression data

Journal of Bio-Science ◽

10.3329/jbs.v28i0.44712 ◽

2019 ◽

Vol 28 ◽

pp. 69-80

Author(s):

M Shahjaman ◽

MM Rashid ◽

MI Asifuzzaman ◽

H Akter ◽

SMS Islam ◽

...

Keyword(s):

Gene Expression ◽

Machine Learning ◽

Gene Expression Data ◽

Naive Bayes ◽

Learning Algorithms ◽

Naïve Bayes ◽

Machine Learning Algorithms ◽

Expression Data ◽

Data Generating Process

Classification of samples into one or more populations is one of the main objectives of gene expression data (GED) analysis. Many machine learning algorithms have been employed in several studies to perform this task. However, these studies did not consider the problem of outliers. GEDs are often contaminated by outliers because of the several steps involved in the data generating process, from hybridization of DNA samples to image analysis. Most of the algorithms produce higher false positive rates and lower accuracies in the presence of outliers, particularly when the number of replicates in the biological conditions is low. Therefore, in this paper, a comprehensive study has been carried out among five popular machine learning algorithms (SVM, RF, Naïve Bayes, k-NN and LDA) using both simulated and real gene expression datasets, in the absence and presence of outliers. Three different rates of outliers (5%, 10% and 50%) and six performance indices (TPR, FPR, TNR, FNR, FDR and AUC) were considered to investigate the performance of the five machine learning algorithms. Both simulated and real GED analysis results revealed that SVM performed comparatively better than the other four algorithms (RF, Naïve Bayes, k-NN and LDA) for both small and large sample sizes. J. bio-sci. 28: 69-80, 2020
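
The six performance indices listed above can be computed from true labels, predicted labels, and classifier scores. The sketch below uses the standard definitions. The labels and scores are hypothetical, the 0.5 decision threshold is an assumption, and AUC is computed by its pairwise-ranking interpretation rather than by tracing a ROC curve.

```python
def performance_indices(y_true, y_pred):
    """TPR, FPR, TNR, FNR and FDR from binary labels (1 = positive class).
    Assumes both classes are present and at least one positive prediction."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return {
        "TPR": tp / (tp + fn),  # true positive rate (sensitivity)
        "FNR": fn / (tp + fn),  # false negative rate
        "FPR": fp / (fp + tn),  # false positive rate
        "TNR": tn / (fp + tn),  # true negative rate (specificity)
        "FDR": fp / (tp + fp),  # false discovery rate
    }


def auc(y_true, scores):
    """Probability that a random positive scores above a random negative
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and classifier scores for ten samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05, 0.45]
y_pred = [1 if s > 0.5 else 0 for s in scores]  # assumed decision threshold

indices = performance_indices(y_true, y_pred)
area = auc(y_true, scores)
print(indices, area)
```

TPR and FNR sum to one, as do FPR and TNR, so the six indices carry four independent quantities.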
