A bayesian method for biological pathway discovery from high-throughput experimental data

Author(s):  
Wei Wang ◽  
G.F. Cooper
F1000Research ◽  
2014 ◽  
Vol 3 ◽  
pp. 146 ◽  
Author(s):  
Guanming Wu ◽  
Eric Dawson ◽  
Adrian Duong ◽  
Robin Haw ◽  
Lincoln Stein

High-throughput experiments are routinely performed in modern biological studies. However, extracting meaningful results from massive experimental data sets is a challenging task for biologists. Projecting data onto pathway and network contexts is a powerful way to unravel patterns embedded in seemingly scattered large data sets and assist knowledge discovery related to cancer and other complex diseases. We have developed a Cytoscape app called “ReactomeFIViz”, which utilizes a highly reliable gene functional interaction network and human curated pathways from Reactome and other pathway databases. This app provides a suite of features to assist biologists in performing pathway- and network-based data analysis in a biologically intuitive and user-friendly way. Biologists can use this app to uncover network and pathway patterns related to their studies, search for gene signatures from gene expression data sets, reveal pathways significantly enriched by genes in a list, and integrate multiple genomic data types into a pathway context using probabilistic graphical models. We believe our app will give researchers substantial power to analyze intrinsically noisy high-throughput experimental data to find biologically relevant information.


2013 ◽  
pp. 513-551 ◽  
Author(s):  
Alain B. Tchagang ◽  
Youlian Pan ◽  
Fazel Famili ◽  
Ahmed H. Tewfik ◽  
Panayiotis V. Benos

In this chapter, different methods and applications of biclustering algorithms to DNA microarray data analysis that have been developed in recent years are discussed and compared. Identification of biological significant clusters of genes from microarray experimental data is a very daunting task that emerged, especially with the development of high throughput technologies. Various computational and evaluation methods based on diverse principles were introduced to identify new similarities among genes. Mathematical aspects of the models are highlighted, and applications to solve biological problems are discussed.


2019 ◽  
Author(s):  
Laura Avino Esteban ◽  
Lyubov R Lonishin ◽  
Daniil Bobrovskiy ◽  
Gregory Leleytner ◽  
Natalya S Bogatyreva ◽  
...  

Abstract Motivation Epistasis, the context-dependence of the contribution of an amino acid substitution to fitness, is common in evolution. To detect epistasis, fitness must be measured for at least four genotypes: the reference genotype, two different single mutants and a double mutant with both of the single mutations. For higher-order epistasis of the order n, fitness has to be measured for all 2n genotypes of an n-dimensional hypercube in genotype space forming a “combinatorially complete dataset”. So far, only a handful of such datasets have been produced by manual curation. Concurrently, random mutagenesis experiments have produced measurements of fitness and other phenotypes in a high-throughput manner, potentially containing a number of combinatorially complete datasets. Results We present an effective recursive algorithm for finding all hypercube structures in random mutagenesis experimental data. To test the algorithm, we applied it to the data from a recent HIS3 protein dataset and found all 199,847,053 unique combinatorially complete genotype combinations of dimensionality ranging from two to twelve. The algorithm may be useful for researchers looking for higher-order epistasis in their high-throughput experimental data. Availability https://github.com/ivankovlab/HypercubeME.git Supplementary information Supplementary data are available at Bioinformatics online.


2010 ◽  
pp. 77-109
Author(s):  
George Zheng ◽  
Athman Bouguettaya

2013 ◽  
Vol 43 (9) ◽  
pp. 1252-1260 ◽  
Author(s):  
Lili Xiong ◽  
Wei Jiang ◽  
Rui Zhou ◽  
Canquan Mao ◽  
Zhiyun Guo

2019 ◽  
Author(s):  
Laura Avino Esteban ◽  
Lyubov R. Lonishin ◽  
Daniil Bobrovskiy ◽  
Gregory Leleytner ◽  
Natalya S. Bogatyreva ◽  
...  

AbstractMotivationEpistasis, the context-dependence of the contribution of an amino acid substitution to fitness, is common in evolution. To detect epistasis, fitness must be measured for at least four genotypes: the reference genotype, two different single mutants and a double mutant with both of the single mutations. For higher-order epistasis of the order n, fitness has to be measured for all 2n genotypes of an n-dimensional hypercube in genotype space forming a “combinatorially complete dataset”. So far, only a handful of such datasets have been produced by manual curation. Concurrently, random mutagenesis experiments have produced measurements of fitness and other phenotypes in a high-throughput manner, potentially containing a number of combinatorially complete datasets.ResultsWe present an effective recursive algorithm for finding all hypercube structures in random mutagenesis experimental data. To test the algorithm, we applied it to the data from a recent HIS3 protein dataset and found all 199,847,053 unique combinatorially complete genotype combinations of dimensionality ranging from two to twelve. The algorithm may be useful for researchers looking for higher-order epistasis in their high-throughput experimental data.Availabilityhttps://github.com/ivankovlab/HypercubeME.git.


2018 ◽  
Author(s):  
Risa Kawaguchi ◽  
Hisanori Kiryu ◽  
Junichi Iwakiri ◽  
Jun Sese

AbstractMotivationRecently, next-generation sequencing techniques have been applied for the detection of RNA secondary structures called high-throughput RNA structural (HTS) analy- sis, and dozens of different protocols were used to detect comprehensive RNA structures at single-nucleotide resolution. However, the existing computational analyses heavily depend on experimental data generation methodology, which results in many difficulties associated with statistically sound comparisons or combining the results obtained using different HTS methods.ResultsHere, we introduced a statistical framework, reactIDR, which is applicable to the experimental data obtained using multiple HTS methodologies, and it classifies the nucleotides into three structural categories, stem, loop, and unmapped. reactIDR uses the irreproducible discovery rate (IDR) with a hidden Markov model (HMM) to discriminate accurately between the true and spurious signals obtained in the replicated HTS experiments. In reactIDR, IDR and HMM parameters are efficiently optimized by using an expectation-maximization algorithm. Furthermore, if known reference structures are given, a supervised learning can be applicable in a semi-supervised manner. The results of our analyses for real HTS data showed that reactIDR achieved the highest accuracy in the classification problem of stem/loop structures of rRNA using both individual and integrated HTS datasets as well as the best correspondence with the three-dimensional structure. Because reactIDR is the first method to compare HTS datasets obtained from multiple sources in a single unified model, it has a great potential to increase the accuracy of RNA secondary structure prediction at transcriptome-wide level with further experiments performed.AvailabilityreactIDR is implemented in Python. Source code is publicly available at https://github.com/carushi/reactIDRhttps://github.com/carushi/[email protected] informationSupplementary data are available at online.


Sign in / Sign up

Export Citation Format

Share Document