Biomedical Data Annotation

Big data bioinformatics aims to draw biological conclusions from huge and complex biological datasets. Analysis of big data adds value, however, only if the data is accompanied by accurate metadata annotation. Particularly in high-throughput experiments, intelligent approaches are needed to keep track of the experimental design, including the conditions studied and any information that might be useful for failure analysis or future experiments. Beyond managing this information, researchers urgently need means for integrated design and interfaces for structured data annotation. Here, we propose a factor-based experimental design approach that enables scientists to easily create large-scale experiments with the help of a web-based system. We present a novel implementation of a web-based interface that allows the collection of arbitrary metadata. To exchange and edit information, we provide a spreadsheet-based, human-readable format. Sample sheets with identifiers and metainformation can then be created for data generation facilities. Data files produced by measuring the samples can be uploaded to a datastore, where they are automatically linked to the previously created experimental design model.
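As an illustration of the factor-based idea described in this abstract, the following is a minimal sketch, not the authors' system: it crosses every level of every experimental factor, assigns each resulting sample an identifier, and writes a tab-separated sample sheet. The factor names, the identifier scheme (`S0001`, ...), and the function names `build_design` and `to_sample_sheet` are assumptions made for the example.

```python
import csv
import io
from itertools import product


def build_design(factors, replicates=1, prefix="S"):
    """Return one sample record per factor-level combination and replicate.

    `factors` maps a factor name to its list of levels (hypothetical input).
    """
    names = list(factors)
    samples = []
    for combo in product(*(factors[name] for name in names)):
        for rep in range(1, replicates + 1):
            # Hypothetical identifier scheme: prefix plus a running number.
            sample_id = f"{prefix}{len(samples) + 1:04d}"
            record = {"sample_id": sample_id}
            record.update(zip(names, combo))
            record["replicate"] = rep
            samples.append(record)
    return samples


def to_sample_sheet(samples):
    """Serialize sample records as tab-separated, spreadsheet-friendly text."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(samples[0]), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(samples)
    return buf.getvalue()


# Illustrative factors only; real studies define their own.
factors = {
    "genotype": ["wildtype", "mutant"],
    "treatment": ["control", "drug"],
    "timepoint_h": [0, 24],
}
design = build_design(factors, replicates=3)
sheet = to_sample_sheet(design)
```

With two levels for each of three factors and three replicates, the sketch yields 2 × 2 × 2 × 3 = 24 sample records, each carrying its full set of factor levels so that later data files could be linked back by `sample_id`.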

Evaluation of a large-scale biomedical data annotation initiative

BMC Bioinformatics ◽

10.1186/1471-2105-10-s9-s10 ◽

2009 ◽

Vol 10 (S9) ◽

Cited By ~ 7

Author(s):

Ronilda Lacson ◽

Erik Pitzer ◽

Christian Hinske ◽

Pedro Galante ◽

Lucila Ohno-Machado

Keyword(s):

Large Scale ◽

Biomedical Data ◽

Data Annotation

Biomedical Data Annotation

Encyclopedia of Database Systems ◽

10.1007/978-0-387-39940-9_2116 ◽

2009 ◽

pp. 224-224

Keyword(s):

Biomedical Data ◽

Data Annotation

Crowd control: Effectively utilizing unscreened crowd workers for biomedical data annotation

Journal of Biomedical Informatics ◽

10.1016/j.jbi.2017.04.003 ◽

2017 ◽

Vol 69 ◽

pp. 86-92 ◽

Cited By ~ 8

Author(s):

Anne Cocos ◽

Ting Qian ◽

Chris Callison-Burch ◽

Aaron J. Masino

Keyword(s):

Biomedical Data ◽

Data Annotation ◽

Crowd Control

Informatics management of tumor specimens in the era of Big Data: Challenges and solutions (Preprint)

10.2196/preprints.20363 ◽

2020 ◽

Author(s):

Peifen Zhang ◽

XiaoHui Zheng ◽

XiZhao Li ◽

Lin Sun ◽

WeiHua Jia

Keyword(s):

Experimental Data ◽

Big Data ◽

Heterogeneous Databases ◽

Biomedical Data ◽

High Quality ◽

Data Annotation ◽

Data Standardization ◽

Management Procedure ◽

And Storage ◽

Primary Mission

Biomedical data has the potential to facilitate personalized diagnosis and precision treatment in the era of Big Data. Consequently, high-quality annotation of human specimens has become the primary mission of bio-bankers, especially for tumor bio-banks with large amounts of “omics” and clinical data. However, the lack of agreed-upon standards and the gaps among heterogeneous databases make applying and communicating information a major challenge. International efforts are under way to develop national projects on informatics management. The aim of this paper is to provide references for data annotation and processing in order to standardize biomedical information and take full advantage of it. First, information categories that are vital for specimen applications, including sample attributes and external clinical and experimental data, are systematically listed to provide references for subsequent data mining. Second, commonly used approaches to data collection, recording, extraction, transformation, integration and storage are summarized in support of data processing. In particular, a practical workflow for information annotation in daily bio-banking is presented to help handle each step of the informatics management procedure. This review highlights the importance of informatics management of tumor specimens, presents the process of data standardization, and provides practical instructions for bio-bankers in specimen annotation and data management.

Linking big biomedical datasets to modular analysis with Portable Encapsulated Projects

10.1101/2020.10.08.331322 ◽

2020 ◽

Author(s):

Nathan C. Sheffield ◽

Michał Stolarczyk ◽

Vincent P. Reuter ◽

André F. Rendeiro

Keyword(s):

Biological Sample ◽

Single Cells ◽

Biological Research ◽

Biomedical Data ◽

Data Annotation ◽

Data Intensive ◽

Sample Data ◽

Modular Analysis ◽

Data Source ◽

Definition Of

Organizing and annotating biological sample data is critical in data-intensive bioinformatics. Unfortunately, the metadata format of a data source is often incompatible with the format required by a processing tool. There is no broadly accepted standard for organizing metadata across biological projects and bioinformatics tools, which restricts the portability and reusability of both annotated datasets and analysis software. To address this, we present Portable Encapsulated Projects (PEP), a formal specification for biological sample metadata structure. The PEP specification accommodates typical features of data-intensive bioinformatics projects with many samples, whether from individual experiments, organisms, or single cells. In addition to standardization, the PEP specification provides descriptors and modifiers for different organizational layers of a project, which improve portability among computing environments and facilitate the use of different processing tools. PEP includes a schema validator framework, allowing formal definition of required metadata attributes for any type of biomedical data analysis. We have implemented packages for reading PEPs in both Python and R to provide a language-agnostic interface for organizing project metadata. PEP therefore represents an important step toward unifying data annotation and processing tools in data-intensive biological research projects.
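The idea of formally defining required metadata attributes can be sketched in plain Python. This is an illustrative stand-in only: it does not use the PEP packages or the PEP schema format, and the schema layout (`required`, `types`), the attribute names, and the function `validate_samples` are assumptions made for the example.

```python
def validate_samples(samples, schema):
    """Check sample records against a simple schema.

    Returns a list of human-readable problems; an empty list means
    every sample passed. The schema layout here is hypothetical.
    """
    problems = []
    for index, sample in enumerate(samples):
        label = sample.get("sample_name", f"row {index}")
        # Required attributes must be present and non-empty.
        for attr in schema.get("required", []):
            if sample.get(attr) in ("", None):
                problems.append(f"{label}: missing required attribute '{attr}'")
        # Typed attributes are checked only when a value is given.
        for attr, expected in schema.get("types", {}).items():
            value = sample.get(attr)
            if value not in ("", None) and not isinstance(value, expected):
                problems.append(
                    f"{label}: attribute '{attr}' should be "
                    f"{expected.__name__}, got {type(value).__name__}"
                )
    return problems


# Hypothetical schema for a tool that needs an organism per sample.
schema = {
    "required": ["sample_name", "organism"],
    "types": {"read_count": int},
}
samples = [
    {"sample_name": "s1", "organism": "human", "read_count": 1000},
    {"sample_name": "s2", "organism": ""},
    {"sample_name": "s3", "organism": "mouse", "read_count": "many"},
]
problems = validate_samples(samples, schema)
```

Separating the schema from the sample table in this way lets each processing tool declare its own requirements while the same annotated dataset is reused unchanged.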
