Prioritizing transcriptomic and epigenomic experiments by using an optimization strategy that leverages imputed data

2019 ◽  
Author(s):  
Jacob Schreiber ◽  
Jeffrey Bilmes ◽  
William Stafford Noble

Abstract Successful science often involves not only performing experiments well, but also choosing well among many possible experiments. In a hypothesis generation setting, choosing an experiment well means choosing an experiment whose results are interesting or novel. In this work, we formalize this selection procedure in the context of genomics and epigenomics data generation. Specifically, we consider the task faced by a scientific consortium such as the National Institutes of Health ENCODE Consortium, whose goal is to characterize all of the functional elements in the human genome. Given a list of possible cell types or tissue types (“biosamples”) and a list of possible high-throughput sequencing assays, we ask “Which experiments should ENCODE perform next?” We demonstrate how to represent this task as an optimization problem, where the goal is to maximize the information gained in each successive experiment. Compared with previous work that has addressed a similar problem, our approach has the advantage that it can use imputed data to tailor the selected list of experiments based on data collected previously by the consortium. We demonstrate the utility of our proposed method in simulations, and we provide a general software framework, named Kiwano, for selecting genomic and epigenomic experiments.

Abstract Motivation Successful science often involves not only performing experiments well, but also choosing well among many possible experiments. In a hypothesis generation setting, choosing an experiment well means choosing an experiment whose results are interesting or novel. In this work, we formalize this selection procedure in the context of genomics and epigenomics data generation. Specifically, we consider the task faced by a scientific consortium such as the National Institutes of Health ENCODE Consortium, whose goal is to characterize all of the functional elements in the human genome. Given a list of possible cell types or tissue types (“biosamples”) and a list of possible high-throughput sequencing assays, where at least one experiment has been performed in each biosample and for each assay, we ask “Which experiments should ENCODE perform next?” Results We demonstrate how to represent this task as a submodular optimization problem, where the goal is to choose a panel of experiments that maximizes the facility location function. A key aspect of our approach is that we use imputed data, rather than experimental data, to directly answer the posed question. We find that, across several evaluations, our method chooses a panel of experiments that spans a diversity of biochemical activity. Finally, we propose two modifications of the facility location function, including a novel submodular-supermodular function, that allow incorporation of domain knowledge or constraints into the optimization procedure. Availability and Implementation Our method is available as a Python package at https://github.com/jmschrei/kiwano and can be installed using the command pip install kiwano. The source code used here and the similarity matrix can be found at http://doi.org/10.5281/zenodo.3708538. Supplementary information Supplementary data are available at Bioinformatics online.
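The facility location objective described above rewards a panel in which every experiment has at least one similar experiment already selected, and greedy maximization of such a submodular function carries the classic (1 - 1/e) approximation guarantee. The following is an illustrative sketch of that greedy scheme, not the Kiwano implementation; the 4x4 similarity matrix is invented toy data standing in for the imputation-derived similarities the paper uses.

```python
# Greedy maximization of the facility location function
# f(S) = sum_i max_{j in S} sim[i][j], a submodular objective.
# Illustrative sketch only; the toy similarity matrix is invented.

def facility_location(sim, selected):
    """Value of a panel: each experiment i is 'covered' by its
    most similar selected experiment."""
    return sum(max((row[j] for j in selected), default=0.0)
               for row in sim)

def greedy_select(sim, k):
    """Pick k experiments by repeatedly adding the one with the
    largest marginal gain."""
    selected = []
    for _ in range(k):
        base = facility_location(sim, selected)
        best, best_gain = None, -1.0
        for j in range(len(sim)):
            if j in selected:
                continue
            gain = facility_location(sim, selected + [j]) - base
            if gain > best_gain:
                best, best_gain = j, gain
        selected.append(best)
    return selected

# Toy similarity matrix: experiments {0, 1} are near-duplicates,
# as are {2, 3}; a good panel of two picks one from each pair.
sim = [[1.0, 0.9, 0.1, 0.2],
       [0.9, 1.0, 0.2, 0.1],
       [0.1, 0.2, 1.0, 0.8],
       [0.2, 0.1, 0.8, 1.0]]

print(greedy_select(sim, 2))  # → [0, 2]
```

Because the objective is submodular, the marginal gain of a redundant experiment collapses once its near-duplicate is selected, which is exactly why the greedy panel spans diverse activity.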


2021 ◽  
Vol 22 (S2) ◽  
Author(s):  
Daniele D’Agostino ◽  
Pietro Liò ◽  
Marco Aldinucci ◽  
Ivan Merelli

Abstract Background High-throughput sequencing Chromosome Conformation Capture (Hi-C) allows the study of DNA interactions and 3D chromosome folding at the genome-wide scale. Usually, these data are represented as matrices describing the binary contacts among the different chromosome regions. On the other hand, a graph-based representation can be advantageous for describing the complex topology achieved by the DNA in the nucleus of eukaryotic cells. Methods Here we discuss the use of a graph database for storing and analysing data produced by Hi-C experiments. The main issue is the size of the produced data and, with a graph-based representation, the consequent necessity of adequately managing a large number of edges (contacts) connecting nodes (genes), which represent the sources of information. Currently available graph visualisation tools and libraries fall short with Hi-C data of this size. The use of graph databases, instead, supports both the analysis and the visualisation of the spatial patterns present in Hi-C data, in particular for efficiently comparing different experiments or for re-mapping omics data in a space-aware context. In particular, the possibility of describing graphs through statistical indicators and, even more, the capability of correlating them through statistical distributions makes it possible to highlight similarities and differences among Hi-C experiments performed in different cell conditions or on different cell types. Results These concepts have been implemented in NeoHiC, an open-source and user-friendly web application for the progressive visualisation and analysis of Hi-C networks based on the Neo4j graph database (version 3.5). Conclusion With the accumulation of more experiments, the tool will provide invaluable support for comparing the neighbours of genes across experiments and conditions, helping to highlight changes in functional domains and to identify new co-organised genomic compartments.
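The graph view of Hi-C data and the idea of comparing experiments through statistical indicators can be sketched in a few lines. This is a minimal stand-in, not the NeoHiC implementation (which stores contacts in Neo4j and queries them with Cypher): a plain adjacency structure plays the role of the database, and the contact lists are invented toy data.

```python
# Nodes are genomic regions, edges are observed Hi-C contacts.
# Two experiments are compared via a statistical indicator of their
# graphs (here, the degree distribution). Toy data only.
from collections import Counter

def degree_distribution(contacts):
    """Histogram of node degrees: how many regions have k contacts."""
    graph = {}
    for a, b in contacts:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return Counter(len(neighbours) for neighbours in graph.values())

def l1_distance(dist_a, dist_b):
    """Compare two experiments by their degree distributions."""
    keys = set(dist_a) | set(dist_b)
    return sum(abs(dist_a.get(k, 0) - dist_b.get(k, 0)) for k in keys)

# Invented contact lists for two experiments on the same regions.
exp1 = [("chr1:0", "chr1:1"), ("chr1:1", "chr1:2"), ("chr1:0", "chr2:0")]
exp2 = [("chr1:0", "chr1:1"), ("chr1:1", "chr1:2")]

print(l1_distance(degree_distribution(exp1), degree_distribution(exp2)))
```

A graph database performs the same kind of aggregation server-side over millions of edges, which is what makes the comparison tractable at Hi-C scale.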


2021 ◽  
Vol 9 (2) ◽  
pp. 18-34
Author(s):  
Abhishek Pandey ◽  
Soumya Banerjee

This article discusses the application of an improved version of the firefly algorithm to the test suite optimization problem. Software test optimization refers to optimizing test data generation and selection against structural criteria in white-box testing, thereby reducing the two most costly aspects of testing: time and cost. Recently, various search-based approaches have produced very promising results for the software test optimization problem. Moreover, because of the no-free-lunch theorem, researchers continue to search for more efficient and better-converging methods for this optimization problem. In this paper, the firefly algorithm is modified to improve its local search ability by incorporating Levy flights. The modified algorithm is applied to the software test optimization problem; this is the first application of a Levy-flight-based firefly algorithm to software test optimization. Results are reported and compared with some existing metaheuristic approaches.
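The general scheme the article modifies can be sketched as follows: each firefly moves toward brighter (better-scoring) fireflies with distance-decaying attractiveness, and a heavy-tailed Levy step replaces the usual uniform random perturbation. This is a hedged sketch under stated assumptions, not the authors' implementation: the objective below is a toy sphere function standing in for a real test-suite coverage score, and the parameter values are illustrative defaults.

```python
# Firefly algorithm with Levy-flight randomization (sketch).
# Levy steps via Mantegna's algorithm with stability index beta = 1.5.
import math
import random

random.seed(0)

def levy_step(beta=1.5):
    """Mantegna's algorithm: draws a heavy-tailed step length."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2) /
             (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
             ) ** (1 / beta)
    u = random.gauss(0, sigma)
    v = abs(random.gauss(0, 1)) or 1e-12
    return u / v ** (1 / beta)

def firefly_levy(objective, dim=2, n=10, iters=100,
                 beta0=1.0, gamma=1.0, alpha=0.1):
    """Minimize objective; brighter (lower-cost) fireflies attract others,
    and each attracted move is jittered by a Levy step."""
    pop = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n)]
    cost = [objective(x) for x in pop]
    for _ in range(iters):
        for i in range(n):
            for j in range(n):
                if cost[j] < cost[i]:  # j is brighter: move i toward j
                    r2 = sum((a - b) ** 2 for a, b in zip(pop[i], pop[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    pop[i] = [a + beta * (b - a) + alpha * levy_step()
                              for a, b in zip(pop[i], pop[j])]
                    cost[i] = objective(pop[i])
    best = min(range(n), key=lambda i: cost[i])
    return pop[best], cost[best]

sphere = lambda x: sum(v * v for v in x)  # toy stand-in for a coverage score
pos, val = firefly_levy(sphere)
print("best cost:", round(val, 3))
```

The Levy flights are the article's key modification: their occasional long jumps let fireflies escape local optima that a purely Gaussian perturbation would trap them in.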


2020 ◽  
Vol 49 (D1) ◽  
pp. D687-D693
Author(s):  
Javier Macho Rendón ◽  
Benjamin Lang ◽  
Marc Ramos Llorens ◽  
Gian Gaetano Tartaglia ◽  
Marc Torrent Burgas

Abstract Despite antibiotic resistance being a matter of growing concern worldwide, the bacterial mechanisms of pathogenesis remain underexplored, restraining our ability to develop new antimicrobials. The rise of high-throughput sequencing technology has made available a massive amount of transcriptomic data that could help elucidate the mechanisms underlying bacterial infection. Here, we introduce the DualSeqDB database, a resource that helps the identification of gene transcriptional changes in both pathogenic bacteria and their natural hosts upon infection. DualSeqDB comprises nearly 300 000 entries from eight different studies, with information on bacterial and host differential gene expression under in vivo and in vitro conditions. Expression data values were calculated entirely from raw data and analyzed through a standardized pipeline to ensure consistency between different studies. It includes information on seven different strains of pathogenic bacteria and a variety of cell types and tissues in Homo sapiens, Mus musculus and Macaca fascicularis at different time points. We envisage that DualSeqDB can help the research community in the systematic characterization of genes involved in host infection and help the development and tailoring of new molecules against infectious diseases. DualSeqDB is freely available at http://www.tartaglialab.com/dualseq.
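The differential expression values stored in DualSeqDB are computed from raw counts through a standardized pipeline. As a hedged illustration of the core quantity only, here is a minimal log2 fold-change computation on invented toy counts; real pipelines (e.g. DESeq2-style workflows) additionally normalize for library size and model count dispersion.

```python
# Minimal log2 fold-change between infected and control conditions.
# Toy replicate counts for a single gene; not the DualSeqDB pipeline.
import math

def log2_fold_change(infected, control, pseudocount=1.0):
    """log2 ratio of mean expression; the pseudocount avoids division
    by zero for genes unexpressed in one condition."""
    mean_i = sum(infected) / len(infected)
    mean_c = sum(control) / len(control)
    return math.log2((mean_i + pseudocount) / (mean_c + pseudocount))

# Three invented replicates per condition: ~4-fold induction.
print(round(log2_fold_change([30, 34, 32], [7, 9, 8]), 2))  # → 1.87
```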


GigaScience ◽  
2020 ◽  
Vol 9 (8) ◽  
Author(s):  
Marcela Sandoval-Velasco ◽  
Juan Antonio Rodríguez ◽  
Cynthia Perez Estrada ◽  
Guojie Zhang ◽  
Erez Lieberman Aiden ◽  
...  

Abstract Background Hi-C experiments couple DNA-DNA proximity with next-generation sequencing to yield an unbiased description of genome-wide interactions. Previous methods describing Hi-C experiments have focused on the industry-standard Illumina sequencing. With new next-generation sequencing platforms such as BGISEQ-500 becoming more widely available, protocol adaptations to fit platform-specific requirements are useful to give increased choice to researchers who routinely generate sequencing data. Results We describe an in situ Hi-C protocol adapted to be compatible with the BGISEQ-500 high-throughput sequencing platform. Using zebra finch (Taeniopygia guttata) as a biological sample, we demonstrate how Hi-C libraries can be constructed to generate informative data using the BGISEQ-500 platform, following circularization and DNA nanoball generation. Our protocol is a modification of an Illumina-compatible method, based around blunt-end ligations in library construction, using un-barcoded, distally overhanging double-stranded adapters, followed by amplification using indexed primers. The resulting libraries are ready for circularization and subsequent sequencing on the BGISEQ series of platforms and yield data similar to what can be expected using Illumina-compatible approaches. Conclusions Our straightforward modification to an Illumina-compatible in situ Hi-C protocol enables data generation on the BGISEQ series of platforms, thus expanding the options available for researchers who wish to utilize the powerful Hi-C techniques in their research.


Viruses ◽  
2020 ◽  
Vol 12 (6) ◽  
pp. 633 ◽  
Author(s):  
Maria Paola Pisano ◽  
Nicole Grandi ◽  
Enzo Tramontano

Human endogenous retroviruses (HERVs) are remnants of ancient retroviral infections that represent a large fraction of our genome. Their transcriptional activity is finely regulated in early developmental stages, and their expression is modulated in different cell types and tissues. Such activity has an impact on human physiology and pathology that is, to date, only partially understood. Novel high-throughput sequencing tools have recently allowed great advances in elucidating the various HERV expression patterns in different tissues, as well as the mechanisms controlling their transcription, and overall have helped to build a more comprehensive understanding of the impact of HERVs on the biology of the host.


2020 ◽  
Vol 40 (5) ◽  
pp. 723-733
Author(s):  
Sergey Lupuleac ◽  
Tatiana Pogarskaia ◽  
Maria Churilova ◽  
Michael Kokkolaras ◽  
Elodie Bonhomme

Purpose The authors consider the problem of optimizing temporary fastener patterns in aircraft assembly. Minimizing the number of fasteners while maintaining final product quality is one of the key enablers for intensifying production in the aerospace industry. The purpose of this study is to formulate the fastener pattern optimization problem and to compare different solution approaches on both test benchmarks and the rear wing-to-fuselage assembly of an Airbus A350-900. Design/methodology/approach The first algorithm considered is based on a local exhaustive search. It proves efficient and reliable but requires much computational effort. Secondly, Mesh Adaptive Direct Search (MADS), as implemented in the NOMAD software (Nonlinear Optimization by Mesh Adaptive Direct Search), is used to apply the powerful mathematical machinery of surrogate modeling and its associated optimization strategy. In addition, another popular optimization algorithm, simulated annealing (SA), was implemented. Since a single fastener pattern must be used for the entire aircraft series, cross-validation of the obtained results was applied, using measured initial gaps from 340 different aircraft of the A350-900 series. Findings The results indicated that SA is not applicable, as its random character does not provide repeatable results and would require tens of runs for any optimization analysis. Both the local variations (LV) method and MADS proved appropriate, as they improved the existing fastener pattern for all available gaps. The MADS search step was modified to exploit all the information the authors have about the problem. Originality/value The paper presents deterministic and probabilistic formulations of the optimization problem and considers three different approaches to their solution. The existing fastener pattern was improved.
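The local-variations idea can be sketched as a simple bit-flip descent: toggle one fastener at a time, keep any flip that improves the objective, and repeat until no single flip helps. This is a hedged toy sketch, not the study's method: the objective below is invented (a penalty on the largest unfastened span plus a cost per fastener, with illustrative weights), whereas the real study evaluates residual gaps with a contact model of the assembly.

```python
# Local exhaustive search ("local variations") over a binary fastener
# pattern: pattern[i] == 1 means a temporary fastener in hole i.
# Toy objective and weights; invented for demonstration.

def objective(pattern, span_weight=1.0, fastener_cost=3.0):
    """Lower is better: penalize the largest unfastened span and
    charge a cost per installed fastener."""
    positions = [i for i, used in enumerate(pattern) if used]
    if not positions:
        return float("inf")
    edges = [-1] + positions + [len(pattern)]
    max_span = max(b - a for a, b in zip(edges, edges[1:]))
    return span_weight * max_span + fastener_cost * sum(pattern)

def local_variations(pattern):
    """Flip single fasteners greedily until no flip improves."""
    pattern = list(pattern)
    improved = True
    while improved:
        improved = False
        for i in range(len(pattern)):
            trial = list(pattern)
            trial[i] = 1 - trial[i]
            if objective(trial) < objective(pattern):
                pattern, improved = trial, True
    return pattern

start = [1] * 12            # a fastener in every hole
best = local_variations(start)
print(best, objective(best))
```

Each pass costs one objective evaluation per hole, which is why the paper notes that the local exhaustive search is reliable but computationally expensive when the objective is a full assembly simulation.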


2012 ◽  
Vol 49 (1) ◽  
pp. R19-R27 ◽  
Author(s):  
Richard D Emes ◽  
William E Farrell

Epigenetic changes, which target DNA and associated histones, can be described as a pivotal mechanism of interaction between genes and the environment. The field of epigenomics aims to detect and interpret epigenetic modifications at the whole-genome level. These approaches have the potential to resolve epigenetic changes to the single-base level in multiple disease states or across a population of individuals. Identification and comparison of the epigenomic landscape has challenged our understanding of the regulation of phenotype. Additionally, the inclusion of these marks as biomarkers in the early detection or progression monitoring of disease is providing novel avenues for biomedical research. Cells of the endocrine organs, which include the pituitary, thyroid, thymus, pancreas, ovary and testes, have been shown to be susceptible to epigenetic alteration, leading to both local and systemic changes that often result in life-threatening metabolic disease. As with other cell types and populations, endocrine cells are susceptible to tumour development, which in turn may result from aberrant epigenetic control. Techniques for investigating these changes, including high-throughput sequencing and array-based analysis, have rapidly emerged and are continually evolving. Here, we present a review of these methods and their promise to influence studies of the epigenome in endocrine research and perhaps to uncover novel therapeutic options in disease states.


2016 ◽  
Vol 23 (18) ◽  
pp. 2942-2961 ◽  
Author(s):  
Abdollah Bagheri ◽  
Ali Zare Hosseinzadeh ◽  
Piervincenzo Rizzo ◽  
Gholamreza Ghodrati Amiri

This paper presents a new algorithm to determine the occurrence, location, and severity of damage in structures subjected to earthquakes. The algorithm is based on the analysis of displacement or acceleration time series provided by a limited number of sensors, and is formulated as an optimization problem. An objective function is defined based on the moment generating function of a segment of the time histories, and an evolutionary optimization strategy, based on the competitive optimization algorithm, is employed to detect damage. The efficiency of the proposed method is numerically validated by studying the response of several structures subjected to the 1940 El Centro earthquake and the 1994 Northridge earthquake. In order to simulate real conditions, different levels of noise are added to the response signals, and the discrete wavelet transform is then used to de-noise them. Moreover, the robustness of the method is evaluated by considering an error in the model of the structures. Overall, we find that the proposed algorithm detects and localizes damage even in the presence of noisy signals and model errors.
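The quantity at the heart of the objective function, the moment generating function of a response segment, is easy to estimate empirically. The following is a hedged sketch under stated assumptions, not the paper's formulation: the response segment is invented toy data, the evaluation points t are arbitrary, and a simple least-squares mismatch stands in for the paper's objective, which the evolutionary optimizer would minimize over candidate damage parameters.

```python
# Empirical moment generating function (MGF) of a time-series segment
# and a least-squares mismatch between two segments' MGFs.
import math

def empirical_mgf(segment, t_values):
    """M(t) = E[exp(t X)] estimated from the samples of the segment."""
    n = len(segment)
    return [sum(math.exp(t * x) for x in segment) / n for t in t_values]

def mgf_mismatch(measured, predicted, t_values):
    """Sum of squared differences between the two empirical MGFs;
    zero when the segments have identical sample statistics."""
    m1 = empirical_mgf(measured, t_values)
    m2 = empirical_mgf(predicted, t_values)
    return sum((a - b) ** 2 for a, b in zip(m1, m2))

ts = [0.5, 1.0]                              # arbitrary evaluation points
healthy = [0.0, 0.1, -0.1, 0.05, -0.05]      # invented response segment
damaged = [2 * x for x in healthy]           # amplified response

print(round(mgf_mismatch(healthy, damaged, ts), 6))
```

A damaged structure changes the distribution of its response, so the MGFs diverge and the mismatch grows, which is what lets the optimizer localize and quantify damage.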


2020 ◽  
Vol 16 (11) ◽  
pp. e1008422
Author(s):  
Azusa Tanaka ◽  
Yasuhiro Ishitsuka ◽  
Hiroki Ohta ◽  
Akihiro Fujimoto ◽  
Jun-ichirou Yasunaga ◽  
...  

The huge amount of data acquired by high-throughput sequencing requires data reduction for effective analysis. Here we present a clustering algorithm for genome-wide open chromatin data that uses a new data reduction method. The method regards the genome as a string of 1s and 0s based on a set of peaks and calculates the Hamming distances between these strings. Combined with a systematically optimized set of peaks, this algorithm enables us to quantitatively evaluate differences between samples of hematopoietic cells and to classify cell types, potentially leading to a better understanding of leukemia pathogenesis.
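The data reduction described above can be sketched directly: each sample's open-chromatin profile becomes a 0/1 vector over a fixed set of peaks, and samples are compared by Hamming distance. The peak vectors below are invented toy data, and this sketch omits the paper's key step of optimizing the peak set itself before clustering.

```python
# Binary peak representation of open chromatin data and pairwise
# Hamming distances between samples. Toy data only.

def hamming(a, b):
    """Number of peak positions where two samples disagree."""
    return sum(x != y for x, y in zip(a, b))

# 1 = peak present (open chromatin), 0 = absent, over six toy peaks.
samples = {
    "T_cell_1": [1, 1, 0, 0, 1, 0],
    "T_cell_2": [1, 1, 0, 0, 1, 1],
    "B_cell_1": [0, 0, 1, 1, 0, 1],
}

# Pairwise distance matrix: small within a cell type, large across.
names = sorted(samples)
for a in names:
    print(a, [hamming(samples[a], samples[b]) for b in names])
```

Any standard clustering method (e.g. hierarchical clustering) applied to this distance matrix then groups samples by cell type, which is the classification task the paper evaluates.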

