Epigenetic research focuses on understanding factors beyond the DNA sequence that influence gene regulation. It covers cellular mechanisms such as DNA methylation, histone modification, miRNA function and transcription factor binding. Recent advances in high-throughput profiling technologies allow data on each of these mechanisms to be collected systematically in large-scale experiments. These efforts are fostered and coordinated by international collaborations, such as the International Human Epigenome Consortium (IHEC) and its members. As a result, researchers can exploit massive amounts of publicly available epigenomic data on dozens of cell types, cell lines and tissues. Existing data portals streamline access to these data, which, in principle, allows researchers to answer important biomedical questions.
However, working with such data requires a suitable computational infrastructure that is not available to every researcher. This creates a serious bottleneck in research and, as a result, data from these costly experiments are currently underused. To address this issue, we developed a new web resource, the DeepBlue Epigenomic Data Server, which provides access to more than 40,000 experimental files from six major epigenome projects: ENCODE, ROADMAP, BLUEPRINT, the German Epigenome Program DEEP, the Canadian CEEHRC, and the Japanese CREST.
A common challenge with these resources is that researchers typically need only a small fraction of the available epigenomic data to answer a specific biomedical question. With a typical data repository, the user would have to download several files amounting to gigabytes of data and then filter them locally. In addition, it is often necessary to perform memory- and CPU-intensive operations to transform or aggregate these data, and the required computational resources are not accessible to every user. Therefore, the DeepBlue Data Server offers features beyond those of a centralized epigenomic data repository. It has a comprehensive programmatic interface (API) that enables users to perform complex data operations, such as searching, selecting, filtering, summarizing, and downloading epigenomic data of interest. These operations can be combined into custom workflows, thus offering nearly the same degree of flexibility as a local programming environment.
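The idea of chaining select, filter and summarize operations so that only a small result leaves the repository can be illustrated with a minimal sketch. This is not the DeepBlue API: the `Region` type, the toy `REPOSITORY` and the function names are hypothetical stand-ins, and all values are made up for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Region:
    """One genomic region with a signal value (hypothetical record layout)."""
    experiment: str
    chromosome: str
    start: int
    end: int
    signal: float


# Toy stand-in for a remote repository; every value here is invented.
REPOSITORY = [
    Region("exp_a", "chr1", 100, 200, 5.0),
    Region("exp_a", "chr1", 500, 900, 1.5),
    Region("exp_a", "chr2", 100, 300, 7.0),
    Region("exp_b", "chr1", 150, 250, 9.0),
]


def select(regions, experiment, chromosome):
    """Keep only the regions of one experiment on one chromosome."""
    return [r for r in regions
            if r.experiment == experiment and r.chromosome == chromosome]


def filter_by_signal(regions, minimum):
    """Keep only the regions whose signal is at least `minimum`."""
    return [r for r in regions if r.signal >= minimum]


def summarize(regions):
    """Reduce a list of regions to a small summary record."""
    if not regions:
        return {"count": 0, "mean_signal": None}
    return {"count": len(regions),
            "mean_signal": mean(r.signal for r in regions)}


def run_workflow(experiment, chromosome, minimum):
    """Combine the three operations; only the summary is returned."""
    selected = select(REPOSITORY, experiment, chromosome)
    kept = filter_by_signal(selected, minimum)
    return summarize(kept)


summary = run_workflow("exp_a", "chr1", minimum=2.0)
print(summary)
```

In this sketch the caller receives only the summary record rather than the full list of regions, which mirrors the motivation for running such operations on the server side.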
Here, we present DeepBlueR, a new R/Bioconductor package that enables users to work with the DeepBlue server seamlessly from within the R environment. DeepBlueR mirrors all DeepBlue data operations as R commands and provides additional features for compressing, downloading and transforming aggregated epigenomic data into suitable R data structures. A local caching mechanism ensures that complex scripts can be re-executed without downloading previously requested data from the server again.
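The general idea behind such a local cache can be sketched as follows. This is a minimal illustration under stated assumptions, not DeepBlueR's implementation: the `RequestCache` class, the request identifier `"r1"` and the `fake_fetch` function are hypothetical, and the sketch assumes that results can be serialized as JSON and that request identifiers are safe to use as file names.

```python
import json
import tempfile
from pathlib import Path


class RequestCache:
    """Disk-backed cache keyed by a request identifier (hypothetical design)."""

    def __init__(self, directory, fetch):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch  # callable that retrieves the data for a request id

    def get(self, request_id):
        """Return cached data if present; otherwise fetch and store it."""
        path = self.directory / f"{request_id}.json"
        if path.exists():
            return json.loads(path.read_text())
        data = self.fetch(request_id)
        path.write_text(json.dumps(data))
        return data


calls = []  # records every request that actually reaches the "server"


def fake_fetch(request_id):
    """Stand-in for a server download; returns invented data."""
    calls.append(request_id)
    return {"request": request_id, "rows": [[1, 2], [3, 4]]}


with tempfile.TemporaryDirectory() as tmp:
    cache = RequestCache(tmp, fake_fetch)
    first = cache.get("r1")   # fetched and written to disk
    second = cache.get("r1")  # served from disk, no second fetch

print(first == second, len(calls))
```

Keying the cache on the request identifier means that rerunning a script touches the server only for requests it has not seen before.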