A Bayesian method for biological pathway discovery from high-throughput experimental data

High-throughput experiments are routinely performed in modern biological studies. However, extracting meaningful results from massive experimental data sets is a challenging task for biologists. Projecting data onto pathway and network contexts is a powerful way to unravel patterns embedded in seemingly scattered large data sets and assist knowledge discovery related to cancer and other complex diseases. We have developed a Cytoscape app called “ReactomeFIViz”, which utilizes a highly reliable gene functional interaction network and human curated pathways from Reactome and other pathway databases. This app provides a suite of features to assist biologists in performing pathway- and network-based data analysis in a biologically intuitive and user-friendly way. Biologists can use this app to uncover network and pathway patterns related to their studies, search for gene signatures from gene expression data sets, reveal pathways significantly enriched by genes in a list, and integrate multiple genomic data types into a pathway context using probabilistic graphical models. We believe our app will give researchers substantial power to analyze intrinsically noisy high-throughput experimental data to find biologically relevant information.
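One way to make "pathways significantly enriched by genes in a list" concrete is a one-sided hypergeometric test. The sketch below is an illustration only: the abstract does not say which statistical test the app uses, and the function name and parameters are hypothetical.

```python
from math import comb


def enrichment_p_value(n_background, n_pathway, n_list, n_overlap):
    """One-sided hypergeometric p-value for pathway over-representation.

    Hypothetical helper, not taken from ReactomeFIViz.
    n_background: total number of genes considered
    n_pathway:    genes in the background annotated to the pathway
    n_list:       genes in the submitted gene list
    n_overlap:    genes in the list that are also in the pathway
    """
    upper = min(n_pathway, n_list)
    total = comb(n_background, n_list)
    # Probability of seeing n_overlap or more pathway genes by chance.
    tail = sum(
        comb(n_pathway, i) * comb(n_background - n_pathway, n_list - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total
```

In practice the test would be repeated for every pathway and the resulting p-values corrected for multiple testing; that step is omitted here.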

Biclustering of DNA Microarray Data

Bioinformatics ◽

10.4018/978-1-4666-3604-0.ch029 ◽

2013 ◽

pp. 513-551 ◽

Cited By ~ 2

Author(s):

Alain B. Tchagang ◽

Youlian Pan ◽

Fazel Famili ◽

Ahmed H. Tewfik ◽

Panayiotis V. Benos

Keyword(s):

Experimental Data ◽

Data Analysis ◽

DNA Microarray ◽

High Throughput ◽

Microarray Data ◽

Evaluation Methods ◽

Microarray Data Analysis ◽

DNA Microarray Data ◽

Daunting Task

This chapter discusses and compares the methods and applications of biclustering algorithms developed in recent years for DNA microarray data analysis. Identifying biologically significant clusters of genes from microarray experimental data is a daunting task, and it has become more pressing with the development of high-throughput technologies. Various computational and evaluation methods based on diverse principles have been introduced to identify new similarities among genes. Mathematical aspects of the models are highlighted, and applications to biological problems are discussed.

HypercubeME: two hundred million combinatorially complete datasets from a single experiment

Bioinformatics ◽

10.1093/bioinformatics/btz841 ◽

2019 ◽

Author(s):

Laura Avino Esteban ◽

Lyubov R Lonishin ◽

Daniil Bobrovskiy ◽

Gregory Leleytner ◽

Natalya S Bogatyreva ◽

...

Keyword(s):

Experimental Data ◽

High Throughput ◽

Recursive Algorithm ◽

Random Mutagenesis ◽

Higher Order ◽

Supplementary Information ◽

Single Experiment ◽

Manual Curation ◽

Complete Dataset ◽

Genotype Space

Abstract

Motivation: Epistasis, the context-dependence of the contribution of an amino acid substitution to fitness, is common in evolution. To detect epistasis, fitness must be measured for at least four genotypes: the reference genotype, two different single mutants and a double mutant with both of the single mutations. For higher-order epistasis of the order n, fitness has to be measured for all 2^n genotypes of an n-dimensional hypercube in genotype space forming a “combinatorially complete dataset”. So far, only a handful of such datasets have been produced by manual curation. Concurrently, random mutagenesis experiments have produced measurements of fitness and other phenotypes in a high-throughput manner, potentially containing a number of combinatorially complete datasets.

Results: We present an effective recursive algorithm for finding all hypercube structures in random mutagenesis experimental data. To test the algorithm, we applied it to the data from a recent HIS3 protein dataset and found all 199,847,053 unique combinatorially complete genotype combinations of dimensionality ranging from two to twelve. The algorithm may be useful for researchers looking for higher-order epistasis in their high-throughput experimental data.

Availability: https://github.com/ivankovlab/HypercubeME.git

Supplementary information: Supplementary data are available at Bioinformatics online.
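The definitions in this abstract can be illustrated with a small sketch. It is a brute-force check, not the recursive HypercubeME algorithm, and it assumes genotypes are stored as sets of mutation labels and that epistasis is measured on an additive fitness scale; the function names and the example mutations are hypothetical.

```python
from itertools import combinations


def hypercube_genotypes(background, mutations):
    """All 2^n genotypes formed by adding each subset of `mutations` to `background`."""
    muts = list(mutations)
    return [
        frozenset(background) | frozenset(subset)
        for k in range(len(muts) + 1)
        for subset in combinations(muts, k)
    ]


def is_combinatorially_complete(measured, background, mutations):
    """True if every vertex of the hypercube has a measurement in `measured`."""
    return all(g in measured for g in hypercube_genotypes(background, mutations))


def pairwise_epistasis(fitness, background, mut_a, mut_b):
    """Deviation of the double mutant from additivity (assumed additive model)."""
    base = frozenset(background)
    f0 = fitness[base]
    fa = fitness[base | {mut_a}]
    fb = fitness[base | {mut_b}]
    fab = fitness[base | {mut_a, mut_b}]
    return fab - fa - fb + f0


# Made-up fitness values for illustration only.
fitness = {
    frozenset(): 1.0,
    frozenset({"A10V"}): 0.8,
    frozenset({"G25S"}): 0.9,
    frozenset({"A10V", "G25S"}): 0.4,
}
```

With these four genotypes the two-dimensional hypercube is complete, so `pairwise_epistasis(fitness, frozenset(), "A10V", "G25S")` can be evaluated; adding a third mutation without measuring the extra four genotypes makes `is_combinatorially_complete` return `False`.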

Application: Biological Pathway Discovery

Web Service Mining ◽

10.1007/978-1-4419-6539-4_5 ◽

2010 ◽

pp. 77-109

Author(s):

George Zheng ◽

Athman Bouguettaya

Keyword(s):

Biological Pathway ◽

Pathway Discovery

Web Service Mining for Biological Pathway Discovery

Lecture Notes in Computer Science - Data Integration in the Life Sciences ◽

10.1007/11530084_25 ◽

2005 ◽

pp. 292-295 ◽

Cited By ~ 2

Author(s):

George Zheng ◽

Athman Bouguettaya

Keyword(s):

Web Service ◽

Biological Pathway ◽

Pathway Discovery

Cellular function prediction and biological pathway discovery in Arabidopsis thaliana using microarray data

The 26th Annual International Conference of the IEEE Engineering in Medicine and Biology Society ◽

10.1109/iembs.2004.1403820 ◽

2005 ◽

Author(s):

T. Joshi ◽

Y. Chen ◽

N. Alexandrov ◽

D. Xu

Keyword(s):

Arabidopsis Thaliana ◽

Microarray Data ◽

Cellular Function ◽

Function Prediction ◽

Biological Pathway ◽

Pathway Discovery

Discovering reliable protein interactions from high-throughput experimental data using network topology

Artificial Intelligence in Medicine ◽

10.1016/j.artmed.2005.02.004 ◽

2005 ◽

Vol 35 (1-2) ◽

pp. 37-47 ◽

Cited By ~ 32

Author(s):

Jin Chen ◽

Wynne Hsu ◽

Mong Li Lee ◽

See-Kiong Ng

Keyword(s):

Experimental Data ◽

High Throughput ◽

Network Topology ◽

Protein Interactions

Identification and analysis of the regulatory network of Myc and microRNAs from high-throughput experimental data

Computers in Biology and Medicine ◽

10.1016/j.compbiomed.2013.06.002 ◽

2013 ◽

Vol 43 (9) ◽

pp. 1252-1260 ◽

Cited By ~ 8

Author(s):

Lili Xiong ◽

Wei Jiang ◽

Rui Zhou ◽

Canquan Mao ◽

Zhiyun Guo

Keyword(s):

Experimental Data ◽

High Throughput ◽

Regulatory Network

HypercubeME: two hundred million combinatorially complete datasets from a single experiment

10.1101/741827 ◽

2019 ◽

Author(s):

Laura Avino Esteban ◽

Lyubov R. Lonishin ◽

Daniil Bobrovskiy ◽

Gregory Leleytner ◽

Natalya S. Bogatyreva ◽

...

Keyword(s):

Experimental Data ◽

High Throughput ◽

Recursive Algorithm ◽

Random Mutagenesis ◽

Higher Order ◽

Single Experiment ◽

Manual Curation ◽

Complete Dataset ◽

Genotype Space ◽

High Throughput Manner

Abstract

Motivation: Epistasis, the context-dependence of the contribution of an amino acid substitution to fitness, is common in evolution. To detect epistasis, fitness must be measured for at least four genotypes: the reference genotype, two different single mutants and a double mutant with both of the single mutations. For higher-order epistasis of the order n, fitness has to be measured for all 2^n genotypes of an n-dimensional hypercube in genotype space forming a “combinatorially complete dataset”. So far, only a handful of such datasets have been produced by manual curation. Concurrently, random mutagenesis experiments have produced measurements of fitness and other phenotypes in a high-throughput manner, potentially containing a number of combinatorially complete datasets.

Results: We present an effective recursive algorithm for finding all hypercube structures in random mutagenesis experimental data. To test the algorithm, we applied it to the data from a recent HIS3 protein dataset and found all 199,847,053 unique combinatorially complete genotype combinations of dimensionality ranging from two to twelve. The algorithm may be useful for researchers looking for higher-order epistasis in their high-throughput experimental data.

Availability: https://github.com/ivankovlab/HypercubeME.git

reactIDR: Evaluation of the statistical reproducibility of high-throughput structural analyses for a robust RNA reactivity classification

10.1101/275016 ◽

2018 ◽

Author(s):

Risa Kawaguchi ◽

Hisanori Kiryu ◽

Junichi Iwakiri ◽

Jun Sese

Keyword(s):

Experimental Data ◽

High Throughput ◽

Structure Prediction ◽

Secondary Structure Prediction ◽

Classification Problem ◽

Supplementary Information ◽

Dimensional Structure ◽

Data Generation ◽

Multiple Sources ◽

Stem Loop

Abstract

Motivation: Recently, next-generation sequencing techniques have been applied to the detection of RNA secondary structures, an approach called high-throughput RNA structural (HTS) analysis, and dozens of different protocols have been used to detect comprehensive RNA structures at single-nucleotide resolution. However, existing computational analyses depend heavily on the methodology used to generate the experimental data, which makes it difficult to compare results in a statistically sound way or to combine results obtained with different HTS methods.

Results: Here, we introduce a statistical framework, reactIDR, which is applicable to experimental data obtained using multiple HTS methodologies and classifies nucleotides into three structural categories: stem, loop, and unmapped. reactIDR uses the irreproducible discovery rate (IDR) with a hidden Markov model (HMM) to discriminate accurately between true and spurious signals obtained in replicated HTS experiments. In reactIDR, the IDR and HMM parameters are efficiently optimized using an expectation-maximization algorithm. Furthermore, if known reference structures are given, supervised learning can be applied in a semi-supervised manner. Our analyses of real HTS data showed that reactIDR achieved the highest accuracy in classifying stem/loop structures of rRNA using both individual and integrated HTS datasets, as well as the best correspondence with the three-dimensional structure. Because reactIDR is the first method to compare HTS datasets obtained from multiple sources in a single unified model, it has great potential to increase the accuracy of RNA secondary structure prediction at the transcriptome-wide level as further experiments are performed.

Availability: reactIDR is implemented in Python. Source code is publicly available at https://github.com/carushi/reactIDR.

Supplementary information: Supplementary data are available online.