HypercubeME: two hundred million combinatorially complete datasets from a single experiment

2019 ◽  
Author(s):  
Laura Avino Esteban ◽  
Lyubov R Lonishin ◽  
Daniil Bobrovskiy ◽  
Gregory Leleytner ◽  
Natalya S Bogatyreva ◽  
...  

Abstract Motivation Epistasis, the context-dependence of the contribution of an amino acid substitution to fitness, is common in evolution. To detect epistasis, fitness must be measured for at least four genotypes: the reference genotype, two different single mutants and a double mutant with both of the single mutations. For higher-order epistasis of the order n, fitness has to be measured for all 2^n genotypes of an n-dimensional hypercube in genotype space forming a “combinatorially complete dataset”. So far, only a handful of such datasets have been produced by manual curation. Concurrently, random mutagenesis experiments have produced measurements of fitness and other phenotypes in a high-throughput manner, potentially containing a number of combinatorially complete datasets. Results We present an effective recursive algorithm for finding all hypercube structures in random mutagenesis experimental data. To test the algorithm, we applied it to the data from a recent HIS3 protein dataset and found all 199,847,053 unique combinatorially complete genotype combinations of dimensionality ranging from two to twelve. The algorithm may be useful for researchers looking for higher-order epistasis in their high-throughput experimental data. Availability https://github.com/ivankovlab/HypercubeME.git Supplementary information Supplementary data are available at Bioinformatics online.
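The four-genotype condition and the 2^n-genotype hypercube described in this abstract can be illustrated with a minimal Python sketch. It is not the HypercubeME algorithm (which is recursive and searches a whole dataset); it only checks one candidate hypercube and computes pairwise epistasis on an additive scale. The representation of a genotype as a frozenset of mutation labels, the function names, and the example fitness values are all assumptions made for illustration.

```python
from itertools import combinations


def hypercube_genotypes(base, mutations):
    """Return all 2**n genotypes obtained by adding every subset of
    `mutations` (n of them) to the `base` genotype."""
    muts = sorted(mutations)
    return [
        frozenset(base) | frozenset(subset)
        for k in range(len(muts) + 1)
        for subset in combinations(muts, k)
    ]


def is_combinatorially_complete(fitness, base, mutations):
    """True if fitness was measured for every vertex of the hypercube."""
    return all(g in fitness for g in hypercube_genotypes(base, mutations))


def pairwise_epistasis(fitness, base, mut_a, mut_b):
    """Additive-scale epistasis: f(ab) - f(a) - f(b) + f(reference)."""
    base = frozenset(base)
    f_ref = fitness[base]
    f_a = fitness[base | {mut_a}]
    f_b = fitness[base | {mut_b}]
    f_ab = fitness[base | {mut_a, mut_b}]
    return f_ab - f_a - f_b + f_ref


# Hypothetical fitness measurements (mutation labels and values are made up).
fitness = {
    frozenset(): 1.0,
    frozenset({"A12V"}): 0.75,
    frozenset({"G45S"}): 0.5,
    frozenset({"A12V", "G45S"}): 0.5,
}

complete = is_combinatorially_complete(fitness, frozenset(), {"A12V", "G45S"})
epistasis = pairwise_epistasis(fitness, frozenset(), "A12V", "G45S")
```

In this toy example the two-dimensional hypercube is complete, and a non-zero value of `epistasis` indicates that the effect of one substitution depends on the presence of the other. Adding a third mutation with no measured genotypes makes the completeness check fail, which is why complete datasets of higher dimensionality are rare in randomly mutagenized libraries.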

2019 ◽  
Author(s):  
Laura Avino Esteban ◽  
Lyubov R. Lonishin ◽  
Daniil Bobrovskiy ◽  
Gregory Leleytner ◽  
Natalya S. Bogatyreva ◽  
...  

Abstract Motivation Epistasis, the context-dependence of the contribution of an amino acid substitution to fitness, is common in evolution. To detect epistasis, fitness must be measured for at least four genotypes: the reference genotype, two different single mutants and a double mutant with both of the single mutations. For higher-order epistasis of the order n, fitness has to be measured for all 2^n genotypes of an n-dimensional hypercube in genotype space forming a “combinatorially complete dataset”. So far, only a handful of such datasets have been produced by manual curation. Concurrently, random mutagenesis experiments have produced measurements of fitness and other phenotypes in a high-throughput manner, potentially containing a number of combinatorially complete datasets. Results We present an effective recursive algorithm for finding all hypercube structures in random mutagenesis experimental data. To test the algorithm, we applied it to the data from a recent HIS3 protein dataset and found all 199,847,053 unique combinatorially complete genotype combinations of dimensionality ranging from two to twelve. The algorithm may be useful for researchers looking for higher-order epistasis in their high-throughput experimental data. Availability https://github.com/ivankovlab/HypercubeME.git.


2018 ◽  
Author(s):  
Risa Kawaguchi ◽  
Hisanori Kiryu ◽  
Junichi Iwakiri ◽  
Jun Sese

Abstract Motivation Recently, next-generation sequencing techniques have been applied to the detection of RNA secondary structures, an approach called high-throughput RNA structural (HTS) analysis, and dozens of different protocols have been used to detect comprehensive RNA structures at single-nucleotide resolution. However, existing computational analyses depend heavily on the methodology used to generate the experimental data, which makes it difficult to compare or combine results obtained with different HTS methods in a statistically sound way. Results Here, we introduce a statistical framework, reactIDR, which is applicable to experimental data obtained using multiple HTS methodologies and classifies nucleotides into three structural categories: stem, loop, and unmapped. reactIDR uses the irreproducible discovery rate (IDR) with a hidden Markov model (HMM) to discriminate accurately between true and spurious signals obtained in replicated HTS experiments. In reactIDR, IDR and HMM parameters are efficiently optimized using an expectation-maximization algorithm. Furthermore, if known reference structures are given, supervised learning can be applied in a semi-supervised manner. The results of our analyses of real HTS data showed that reactIDR achieved the highest accuracy in classifying stem/loop structures of rRNA using both individual and integrated HTS datasets, as well as the best correspondence with the three-dimensional structure. Because reactIDR is the first method to compare HTS datasets obtained from multiple sources in a single unified model, it has great potential to increase the accuracy of RNA secondary structure prediction at the transcriptome-wide level as further experiments are performed. Availability reactIDR is implemented in Python. Source code is publicly available at https://github.com/carushi/reactIDR. Supplementary information Supplementary data are available online.


2019 ◽  
Author(s):  
Huifang Xu ◽  
Weinan Liang ◽  
Linlin Ning ◽  
Yuanyuan Jiang ◽  
Wenxia Yang ◽  
...  

P450 fatty acid decarboxylases (FADCs) have recently been attracting considerable attention owing to their one-step direct production of industrially important 1-alkenes from biologically abundant feedstock free fatty acids under mild conditions. However, attempts to improve the catalytic activity of FADCs have met with little success. Protein engineering has been limited to selected residues and small mutant libraries due to lack of an effective high-throughput screening (HTS) method. Here, we devise a catalase-deficient Escherichia coli host strain and report an HTS approach based on colorimetric detection of H₂O₂-consumption activity of FADCs. Directed evolution enabled by this method has led to effective identification for the first time of improved FADC variants for medium-chain 1-alkene production from both DNA shuffling and random mutagenesis libraries. Advantageously, this screening method can be extended to other enzymes that stoichiometrically utilize H₂O₂ as co-substrate.


F1000Research ◽  
2014 ◽  
Vol 3 ◽  
pp. 146 ◽  
Author(s):  
Guanming Wu ◽  
Eric Dawson ◽  
Adrian Duong ◽  
Robin Haw ◽  
Lincoln Stein

High-throughput experiments are routinely performed in modern biological studies. However, extracting meaningful results from massive experimental data sets is a challenging task for biologists. Projecting data onto pathway and network contexts is a powerful way to unravel patterns embedded in seemingly scattered large data sets and assist knowledge discovery related to cancer and other complex diseases. We have developed a Cytoscape app called “ReactomeFIViz”, which utilizes a highly reliable gene functional interaction network and human curated pathways from Reactome and other pathway databases. This app provides a suite of features to assist biologists in performing pathway- and network-based data analysis in a biologically intuitive and user-friendly way. Biologists can use this app to uncover network and pathway patterns related to their studies, search for gene signatures from gene expression data sets, reveal pathways significantly enriched by genes in a list, and integrate multiple genomic data types into a pathway context using probabilistic graphical models. We believe our app will give researchers substantial power to analyze intrinsically noisy high-throughput experimental data to find biologically relevant information.


2013 ◽  
pp. 513-551 ◽  
Author(s):  
Alain B. Tchagang ◽  
Youlian Pan ◽  
Fazel Famili ◽  
Ahmed H. Tewfik ◽  
Panayiotis V. Benos

In this chapter, methods and applications of biclustering algorithms for DNA microarray data analysis developed in recent years are discussed and compared. Identifying biologically significant clusters of genes from microarray experimental data is a daunting task that has grown more pressing with the development of high-throughput technologies. Various computational and evaluation methods based on diverse principles have been introduced to identify new similarities among genes. Mathematical aspects of the models are highlighted, and applications to biological problems are discussed.


2019 ◽  
Vol 36 (8) ◽  
pp. 2584-2586 ◽  
Author(s):  
Dominika Labudová ◽  
Jiří Hon ◽  
Matej Lexa

Abstract Motivation G-quadruplex is a DNA or RNA form in which four guanine-rich regions are held together by base pairing between guanine nucleotides in coordination with potassium ions. G-quadruplexes are increasingly seen as a biologically important component of genomes. Their detection in vivo is problematic; however, sequencing and spectrometric techniques exist for their in vitro detection. We previously devised the pqsfinder algorithm for PQS identification, implemented it in C++ and published it as an R/Bioconductor package. We looked for ways to optimize pqsfinder for faster and more user-friendly sequence analysis. Results We identified two weak points where pqsfinder could be optimized. We modified the internals of the recursive algorithm to avoid matching and scoring many sub-optimal PQS conformations that are later discarded. To accommodate the needs of a broader range of users, we created a website for submission of sequence analysis jobs that does not require knowledge of R to use pqsfinder. Availability and implementation https://pqsfinder.fi.muni.cz, https://bioconductor.org/packages/pqsfinder. Supplementary information Supplementary data are available at Bioinformatics online.
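As a rough illustration of what searching a sequence for four guanine-rich regions looks like, the Python sketch below uses a simple, widely used regular-expression motif: four runs of at least three guanines separated by loops of one to seven nucleotides. This is not the pqsfinder algorithm, which the abstract describes as a recursive procedure that matches and scores candidate conformations; the pattern, the loop-length limits, and the function name are assumptions chosen for illustration.

```python
import re

# Assumed canonical motif: four G-runs (>= 3 G each) separated by three
# loops of 1 to 7 nucleotides. U is allowed so RNA sequences also work.
G4_MOTIF = re.compile(r"G{3,}(?:[ACGTU]{1,7}G{3,}){3}")


def find_g4_motifs(sequence):
    """Return (start, end, matched_text) for each non-overlapping motif
    found on the given strand; coordinates are 0-based, end-exclusive."""
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_MOTIF.finditer(seq)]


hits = find_g4_motifs("ttGGGAGGGTGGGAGGGtt")
no_hits = find_g4_motifs("ATATATATATAT")
```

A fixed pattern like this reports only whether a motif is present on the scanned strand; it does not rank alternative conformations or tolerate imperfections in the guanine runs, which is the kind of task a scoring approach is designed for.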


2019 ◽  
Vol 35 (18) ◽  
pp. 3279-3286 ◽  
Author(s):  
Enrico Siragusa ◽  
Niina Haiminen ◽  
Richard Finkers ◽  
Richard Visser ◽  
Laxmi Parida

Abstract Summary Haplotype assembly of polyploids is an open issue in plant genomics. Recent experimental studies on highly heterozygous autotetraploid potato have shown that available methods do not deliver satisfying results in practice. We propose an optimal method to assemble haplotypes of highly heterozygous polyploids from Illumina short-sequencing reads. Our method is based on a generalization of the existing minimum fragment removal model to the polyploid case and on new integer linear programs to reconstruct optimal haplotypes. We validate our methods experimentally by means of a combined evaluation on simulated and experimental data based on 83 previously sequenced autotetraploid potato cultivars. Results on simulated data show that our methods produce highly accurate haplotype assemblies, while results on experimental data confirm a sensible improvement over the state of the art. Availability and implementation Executables for Linux at http://github.com/ComputationalGenomics/HaplotypeAssembler. Supplementary information Supplementary data are available at Bioinformatics online.

