HypercubeME: two hundred million combinatorially complete datasets from a single experiment

Bioinformatics ◽

10.1093/bioinformatics/btz841 ◽

2019 ◽

Author(s):

Laura Avino Esteban ◽

Lyubov R Lonishin ◽

Daniil Bobrovskiy ◽

Gregory Leleytner ◽

Natalya S Bogatyreva ◽

...

Keyword(s):

Experimental Data ◽

High Throughput ◽

Recursive Algorithm ◽

Random Mutagenesis ◽

Higher Order ◽

Supplementary Information ◽

Single Experiment ◽

Manual Curation ◽

Complete Dataset ◽

Genotype Space

Abstract. Motivation: Epistasis, the context-dependence of the contribution of an amino acid substitution to fitness, is common in evolution. To detect epistasis, fitness must be measured for at least four genotypes: the reference genotype, two different single mutants and a double mutant with both of the single mutations. For higher-order epistasis of the order n, fitness has to be measured for all 2^n genotypes of an n-dimensional hypercube in genotype space forming a “combinatorially complete dataset”. So far, only a handful of such datasets have been produced by manual curation. Concurrently, random mutagenesis experiments have produced measurements of fitness and other phenotypes in a high-throughput manner, potentially containing a number of combinatorially complete datasets. Results: We present an effective recursive algorithm for finding all hypercube structures in random mutagenesis experimental data. To test the algorithm, we applied it to the data from a recent HIS3 protein dataset and found all 199,847,053 unique combinatorially complete genotype combinations of dimensionality ranging from two to twelve. The algorithm may be useful for researchers looking for higher-order epistasis in their high-throughput experimental data. Availability: https://github.com/ivankovlab/HypercubeME.git Supplementary information: Supplementary data are available at Bioinformatics online.
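The idea of searching for hypercubes recursively can be illustrated with a small sketch. This is not the HypercubeME implementation; it is a simplified illustration under stated assumptions: each genotype is modelled as a set of mutation labels relative to a reference, each site has only one possible mutant state, and the function name `find_hypercubes` and the labels "a", "b", "c" are hypothetical. It relies on the fact that an (n+1)-dimensional cube is complete exactly when two parallel n-dimensional cubes, offset by one extra mutation, are both complete.

```python
def find_hypercubes(genotypes):
    """Enumerate complete hypercubes in a set of measured genotypes.

    Simplifying assumption: a genotype is a set of mutation labels, and a
    cube is a pair (base, muts) such that base plus every subset of muts
    was measured. Returns {dimension: set of (base, muts)}.
    """
    measured = {frozenset(g) for g in genotypes}
    # Every mutation label seen in the data, in a fixed order.
    alphabet = sorted(set().union(*measured)) if measured else []
    # Dimension 0: each measured genotype is a trivial cube.
    level = {(g, ()) for g in measured}
    result = {}
    dim = 0
    while level:
        nxt = set()
        for base, muts in level:
            for m in alphabet:
                # Only extend with a mutation that is absent from the base
                # and larger than the last one, so each cube appears once.
                if m in base or (muts and m <= muts[-1]):
                    continue
                # The parallel cube shifted by m must also be complete.
                if (base | {m}, muts) in level:
                    nxt.add((base, muts + (m,)))
        dim += 1
        if nxt:
            result[dim] = nxt
        level = nxt
    return result


# Hypothetical toy data: reference, three single mutants, two double mutants.
data = [set(), {"a"}, {"b"}, {"c"}, {"a", "b"}, {"a", "c"}]
cubes = find_hypercubes(data)
print({d: len(c) for d, c in cubes.items()})
```

On this toy input the sketch finds seven one-dimensional cubes (pairs of genotypes differing by a single mutation) and two squares; no cube of dimension three exists because the genotypes carrying both "b" and "c" were not measured.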

HypercubeME: two hundred million combinatorially complete datasets from a single experiment

10.1101/741827 ◽

2019 ◽

Author(s):

Laura Avino Esteban ◽

Lyubov R. Lonishin ◽

Daniil Bobrovskiy ◽

Gregory Leleytner ◽

Natalya S. Bogatyreva ◽

...

Keyword(s):

Experimental Data ◽

High Throughput ◽

Recursive Algorithm ◽

Random Mutagenesis ◽

Higher Order ◽

Single Experiment ◽

Manual Curation ◽

Complete Dataset ◽

Genotype Space ◽

High Throughput Manner

Abstract. Motivation: Epistasis, the context-dependence of the contribution of an amino acid substitution to fitness, is common in evolution. To detect epistasis, fitness must be measured for at least four genotypes: the reference genotype, two different single mutants and a double mutant with both of the single mutations. For higher-order epistasis of the order n, fitness has to be measured for all 2^n genotypes of an n-dimensional hypercube in genotype space forming a “combinatorially complete dataset”. So far, only a handful of such datasets have been produced by manual curation. Concurrently, random mutagenesis experiments have produced measurements of fitness and other phenotypes in a high-throughput manner, potentially containing a number of combinatorially complete datasets. Results: We present an effective recursive algorithm for finding all hypercube structures in random mutagenesis experimental data. To test the algorithm, we applied it to the data from a recent HIS3 protein dataset and found all 199,847,053 unique combinatorially complete genotype combinations of dimensionality ranging from two to twelve. The algorithm may be useful for researchers looking for higher-order epistasis in their high-throughput experimental data. Availability: https://github.com/ivankovlab/HypercubeME.git
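Once a combinatorially complete set of 2^n genotypes is available, an epistasis coefficient can be computed from it. The sketch below uses one common convention, an alternating-sign sum over all genotypes of the cube on an additive fitness scale; the abstract does not state which measure the authors use, so the formula, the function name `epistasis` and the fitness values are illustrative assumptions. For n = 2 it reduces to f(ab) - f(a) - f(b) + f(reference), which is why four genotypes are the minimum.

```python
from itertools import combinations


def epistasis(fitness, base, mutations):
    """Order-n epistasis over one complete hypercube (assumed convention).

    fitness maps frozenset genotypes to additive-scale fitness values;
    the cube consists of base plus every subset of mutations.
    """
    n = len(mutations)
    total = 0.0
    for k in range(n + 1):
        for subset in combinations(mutations, k):
            genotype = frozenset(base) | frozenset(subset)
            # The sign alternates with the number of mutations left out.
            total += (-1) ** (n - k) * fitness[genotype]
    return total


# Hypothetical fitness values for a reference, two singles and the double.
toy_fitness = {
    frozenset(): 1.0,
    frozenset({"a"}): 1.2,
    frozenset({"b"}): 1.1,
    frozenset({"a", "b"}): 1.6,
}
eps = epistasis(toy_fitness, set(), ["a", "b"])
print(round(eps, 6))
```

A value of zero would mean the two mutations combine additively; here the double mutant is fitter than the sum of the single effects predicts, so the coefficient is positive.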

reactIDR: Evaluation of the statistical reproducibility of high-throughput structural analyses for a robust RNA reactivity classification

10.1101/275016 ◽

2018 ◽

Author(s):

Risa Kawaguchi ◽

Hisanori Kiryu ◽

Junichi Iwakiri ◽

Jun Sese

Keyword(s):

Experimental Data ◽

High Throughput ◽

Structure Prediction ◽

Secondary Structure Prediction ◽

Classification Problem ◽

Supplementary Information ◽

Dimensional Structure ◽

Data Generation ◽

Multiple Sources ◽

Stem Loop

Abstract. Motivation: Recently, next-generation sequencing techniques have been applied to the detection of RNA secondary structures, an approach called high-throughput RNA structural (HTS) analysis, and dozens of different protocols have been used to detect comprehensive RNA structures at single-nucleotide resolution. However, the existing computational analyses depend heavily on the experimental data generation methodology, which makes it difficult to compare or combine results obtained using different HTS methods in a statistically sound way. Results: Here, we introduce a statistical framework, reactIDR, which is applicable to experimental data obtained using multiple HTS methodologies and classifies nucleotides into three structural categories: stem, loop, and unmapped. reactIDR uses the irreproducible discovery rate (IDR) with a hidden Markov model (HMM) to discriminate accurately between the true and spurious signals obtained in replicated HTS experiments. In reactIDR, IDR and HMM parameters are efficiently optimized using an expectation-maximization algorithm. Furthermore, if known reference structures are given, supervised learning can be applied in a semi-supervised manner. The results of our analyses of real HTS data showed that reactIDR achieved the highest accuracy in the classification of stem/loop structures of rRNA using both individual and integrated HTS datasets, as well as the best correspondence with the three-dimensional structure. Because reactIDR is the first method to compare HTS datasets obtained from multiple sources in a single unified model, it has great potential to increase the accuracy of RNA secondary structure prediction at the transcriptome-wide level as further experiments are performed. Availability: reactIDR is implemented in Python. Source code is publicly available at https://github.com/carushi/reactIDR Supplementary information: Supplementary data are available online.

Directed Evolution of P450 Fatty Acid Decarboxylases via High-Throughput Screening Towards Improved Catalytic Activity

10.26434/chemrxiv.9791162 ◽

2019 ◽

Author(s):

Huifang Xu ◽

Weinan Liang ◽

Linlin Ning ◽

Yuanyuan Jiang ◽

Wenxia Yang ◽

...

Keyword(s):

Catalytic Activity ◽

Fatty Acid ◽

Directed Evolution ◽

High Throughput ◽

High Throughput Screening ◽

Colorimetric Detection ◽

Screening Method ◽

Random Mutagenesis ◽

Direct Production ◽

Consumption Activity

P450 fatty acid decarboxylases (FADCs) have recently been attracting considerable attention owing to their one-step direct production of industrially important 1-alkenes from biologically abundant feedstock free fatty acids under mild conditions. However, attempts to improve the catalytic activity of FADCs have met with little success. Protein engineering has been limited to selected residues and small mutant libraries due to lack of an effective high-throughput screening (HTS) method. Here, we devise a catalase-deficient Escherichia coli host strain and report an HTS approach based on colorimetric detection of H2O2-consumption activity of FADCs. Directed evolution enabled by this method has led to effective identification for the first time of improved FADC variants for medium-chain 1-alkene production from both DNA shuffling and random mutagenesis libraries. Advantageously, this screening method can be extended to other enzymes that stoichiometrically utilize H2O2 as co-substrate.

Pyrrolizin-3-one and its 1,2-dihydro derivative: structures of the free molecules determined by electron diffraction and ab initio calculations and in the crystal by X-ray diffraction. Electronic supplementary information (ESI) available: further experimental data. See http://www.rsc.org/suppdata/p2/b1/b102475m/

Journal of the Chemical Society Perkin Transactions 2 ◽

10.1039/b102475m ◽

2001 ◽

pp. 2195-2201 ◽

Cited By ~ 4

Author(s):

Frank Blockhuys ◽

Sarah L. Hinchley ◽

Heather E. Robertson ◽

Alexander J. Blake ◽

Hamish McNab ◽

...

Keyword(s):

Experimental Data ◽

Electron Diffraction ◽

Ab Initio Calculations ◽

Ab Initio ◽

Supplementary Information ◽

X Ray ◽

Dihydro Derivative ◽

Free Molecules

ReactomeFIViz: the Reactome FI Cytoscape app for pathway and network-based data analysis

F1000Research ◽

10.12688/f1000research.4431.1 ◽

2014 ◽

Vol 3 ◽

pp. 146 ◽

Cited By ~ 2

Author(s):

Guanming Wu ◽

Eric Dawson ◽

Adrian Duong ◽

Robin Haw ◽

Lincoln Stein

Keyword(s):

Experimental Data ◽

Data Analysis ◽

Graphical Models ◽

High Throughput ◽

Interaction Network ◽

Large Data ◽

Relevant Information ◽

Data Sets ◽

Data Types ◽

Biological Studies

High-throughput experiments are routinely performed in modern biological studies. However, extracting meaningful results from massive experimental data sets is a challenging task for biologists. Projecting data onto pathway and network contexts is a powerful way to unravel patterns embedded in seemingly scattered large data sets and assist knowledge discovery related to cancer and other complex diseases. We have developed a Cytoscape app called “ReactomeFIViz”, which utilizes a highly reliable gene functional interaction network and human curated pathways from Reactome and other pathway databases. This app provides a suite of features to assist biologists in performing pathway- and network-based data analysis in a biologically intuitive and user-friendly way. Biologists can use this app to uncover network and pathway patterns related to their studies, search for gene signatures from gene expression data sets, reveal pathways significantly enriched by genes in a list, and integrate multiple genomic data types into a pathway context using probabilistic graphical models. We believe our app will give researchers substantial power to analyze intrinsically noisy high-throughput experimental data to find biologically relevant information.

Aqueous solutions that model the cytosol: studies on polarity, chemical reactivity and enzyme kinetics. Electronic supplementary information (ESI) available: further results and discussion and tables of experimental data. See http://www.rsc.org/suppdata/ob/b4/b402886d/

Organic & Biomolecular Chemistry ◽

10.1039/b402886d ◽

2004 ◽

Vol 2 (9) ◽

pp. 1404 ◽

Cited By ~ 11

Author(s):

Nabil Asaad ◽

Marie Jetta den Otter ◽

Jan B. F. N. Engberts

Keyword(s):

Experimental Data ◽

Aqueous Solutions ◽

Chemical Reactivity ◽

Supplementary Information

Biclustering of DNA Microarray Data

Bioinformatics ◽

10.4018/978-1-4666-3604-0.ch029 ◽

2013 ◽

pp. 513-551 ◽

Cited By ~ 2

Author(s):

Alain B. Tchagang ◽

Youlian Pan ◽

Fazel Famili ◽

Ahmed H. Tewfik ◽

Panayiotis V. Benos

Keyword(s):

Experimental Data ◽

Data Analysis ◽

Dna Microarray ◽

High Throughput ◽

Microarray Data ◽

Evaluation Methods ◽

Microarray Data Analysis ◽

Dna Microarray Data ◽

Daunting Task

In this chapter, different methods and applications of biclustering algorithms for DNA microarray data analysis that have been developed in recent years are discussed and compared. Identifying biologically significant clusters of genes from microarray experimental data is a daunting task that emerged with the development of high-throughput technologies. Various computational and evaluation methods based on diverse principles have been introduced to identify new similarities among genes. Mathematical aspects of the models are highlighted, and applications to biological problems are discussed.

A bayesian method for biological pathway discovery from high-throughput experimental data

Proceedings. 2004 IEEE Computational Systems Bioinformatics Conference, 2004. CSB 2004. ◽

10.1109/csb.2004.1332530 ◽

2004 ◽

Author(s):

Wei Wang ◽

G.F. Cooper

Keyword(s):

Experimental Data ◽

High Throughput ◽

Bayesian Method ◽

Biological Pathway ◽

Pathway Discovery

pqsfinder web: G-quadruplex prediction using optimized pqsfinder algorithm

Bioinformatics ◽

10.1093/bioinformatics/btz928 ◽

2019 ◽

Vol 36 (8) ◽

pp. 2584-2586 ◽

Cited By ~ 1

Author(s):

Dominika Labudová ◽

Jiří Hon ◽

Matej Lexa

Keyword(s):

Sequence Analysis ◽

Recursive Algorithm ◽

Supplementary Information ◽

Potassium Ions ◽

Bioconductor Package ◽

G Quadruplex ◽

Weak Points ◽

User Friendly

Abstract. Motivation: G-quadruplex is a DNA or RNA form in which four guanine-rich regions are held together by base pairing between guanine nucleotides in coordination with potassium ions. G-quadruplexes are increasingly seen as a biologically important component of genomes. Their detection in vivo is problematic; however, sequencing and spectrometric techniques exist for their in vitro detection. We previously devised the pqsfinder algorithm for PQS identification, implemented it in C++ and published it as an R/Bioconductor package. We looked for ways to optimize pqsfinder for faster and user-friendly sequence analysis. Results: We identified two weak points where pqsfinder could be optimized. We modified the internals of the recursive algorithm to avoid matching and scoring many sub-optimal PQS conformations that are later discarded. To accommodate the needs of a broader range of users, we created a website for submission of sequence analysis jobs that does not require knowledge of R to use pqsfinder. Availability and implementation: https://pqsfinder.fi.muni.cz, https://bioconductor.org/packages/pqsfinder. Supplementary information: Supplementary data are available at Bioinformatics online.

Haplotype assembly of autotetraploid potato using integer linear programing

Bioinformatics ◽

10.1093/bioinformatics/btz060 ◽

2019 ◽

Vol 35 (18) ◽

pp. 3279-3286 ◽

Cited By ~ 4

Author(s):

Enrico Siragusa ◽

Niina Haiminen ◽

Richard Finkers ◽

Richard Visser ◽

Laxmi Parida

Keyword(s):

Experimental Data ◽

Experimental Studies ◽

Simulated Data ◽

Supplementary Information ◽

Linear Programs ◽

Plant Genomics ◽

Optimal Method ◽

Haplotype Assembly ◽

Open Issue ◽

Removal Model

Abstract. Summary: Haplotype assembly of polyploids is an open issue in plant genomics. Recent experimental studies on highly heterozygous autotetraploid potato have shown that available methods do not deliver satisfying results in practice. We propose an optimal method to assemble haplotypes of highly heterozygous polyploids from Illumina short-sequencing reads. Our method is based on a generalization of the existing minimum fragment removal model to the polyploid case and on new integer linear programs to reconstruct optimal haplotypes. We validate our methods experimentally by means of a combined evaluation on simulated and experimental data based on 83 previously sequenced autotetraploid potato cultivars. Results on simulated data show that our methods produce highly accurate haplotype assemblies, while results on experimental data confirm a sensible improvement over the state of the art. Availability and implementation: Executables for Linux at http://github.com/ComputationalGenomics/HaplotypeAssembler. Supplementary information: Supplementary data are available at Bioinformatics online.
