OME-NGFF: scalable format strategies for interoperable bioimaging data

2021 ◽  
Author(s):  
Josh Moore ◽  
Chris Allan ◽  
Sebastien Besson ◽  
Jean-Marie Burel ◽  
Erin Diel ◽  
...  

Biological imaging is one of the most innovative fields in the modern biological sciences. New imaging modalities, probes, and analysis tools appear every few months and often prove decisive for enabling new directions in scientific discovery. One feature of this dynamic field is the need to capture new types of data and data structures. While there is a strong drive to make scientific data Findable, Accessible, Interoperable and Reusable (FAIR, 1), the rapid rate of innovation in imaging impedes the unification and adoption of standardized data formats. Despite this, the opportunities for sharing and integrating bioimaging data and, in particular, linking these data to other "omics" datasets have never been greater; therefore, to every extent possible, increasing "FAIRness" of bioimaging data is critical for maximizing scientific value, as well as for promoting openness and integrity. In the absence of a common, FAIR format, two approaches have emerged to provide access to bioimaging data: translation and conversion. On-the-fly translation produces a transient representation of bioimage metadata and binary data but must be repeated on each use. In contrast, conversion produces a permanent copy of the data, ideally in an open format that makes the data more accessible and improves performance and parallelization in reads and writes. Both approaches have been implemented successfully in the bioimaging community but both have limitations. At cloud-scale, those shortcomings limit scientific analysis and the sharing of results. We introduce here next-generation file formats (NGFF) as a solution to these challenges.
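The core idea behind NGFF is to store large images as many small, independently addressable chunks plus sidecar metadata, so that cloud clients can read (and write) in parallel without downloading the whole file. The following is a minimal stdlib sketch of that layout; the directory structure and the `.zmeta.json` filename are illustrative stand-ins, not the actual OME-NGFF/Zarr specification.

```python
import json, os, struct, tempfile

def write_chunked(path, data, chunk):
    """Write a 1-D list of floats as fixed-size binary chunk files plus JSON
    metadata, mimicking the chunked layout used by NGFF-style stores."""
    os.makedirs(path, exist_ok=True)
    meta = {"length": len(data), "chunk": chunk, "dtype": "f8"}
    with open(os.path.join(path, ".zmeta.json"), "w") as f:
        json.dump(meta, f)
    for i in range(0, len(data), chunk):
        block = data[i:i + chunk]
        with open(os.path.join(path, str(i // chunk)), "wb") as f:
            f.write(struct.pack(f"<{len(block)}d", *block))

def read_chunk(path, index):
    """Read a single chunk without touching the rest of the array --
    the property that makes parallel and cloud reads cheap."""
    with open(os.path.join(path, str(index)), "rb") as f:
        raw = f.read()
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))

store = os.path.join(tempfile.mkdtemp(), "image")
write_chunked(store, [float(v) for v in range(10)], chunk=4)
print(read_chunk(store, 1))  # second chunk only: [4.0, 5.0, 6.0, 7.0]
```

Because each chunk is a separate object, a reader on one machine (or one cloud worker) can fetch exactly the region it needs, which is what distinguishes conversion to an NGFF-style store from repeated on-the-fly translation.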

2010 ◽  
Vol 189 (5) ◽  
pp. 777-782 ◽  
Author(s):  
Melissa Linkert ◽  
Curtis T. Rueden ◽  
Chris Allan ◽  
Jean-Marie Burel ◽  
Will Moore ◽  
...  

Data sharing is important in the biological sciences to prevent duplication of effort, to promote scientific integrity, and to facilitate and disseminate scientific discovery. Sharing requires centralized repositories, and submission to and utility of these resources require common data formats. This is particularly challenging for multidimensional microscopy image data, which are acquired from a variety of platforms with a myriad of proprietary file formats (PFFs). In this paper, we describe an open standard format that we have developed for microscopy image data. We call on the community to use open image data standards and to insist that all imaging platforms support these file formats. This will build the foundation for an open image data repository.


2017 ◽  
Vol 9 (3) ◽  
pp. 267-276 ◽  
Author(s):  
Daiga Plase ◽  
Laila Niedrite ◽  
Romans Taranovs

In this paper, file formats such as Avro and Parquet are compared with text formats to evaluate the performance of data queries. Different data query patterns have been evaluated. Cloudera's open-source Apache Hadoop distribution CDH 5.4 was chosen for the experiments presented in this article. The results show that the compact data formats (Avro and Parquet) take up less storage space than plain text formats because of their binary encoding and compression advantage. Furthermore, data queries against the column-based Parquet format are faster than against text formats and Avro.


2011 ◽  
Vol 6 (2) ◽  
pp. 245-252 ◽  
Author(s):  
René van Horik ◽  
Dirk Roorda

Data Archiving and Networked Services (DANS), the Dutch scientific data archive for the social sciences and humanities, is engaged in the Migration to Intermediate XML for Electronic Data (MIXED) project to develop open-source software that implements the smart migration strategy for the long-term archiving of file formats. Smart migration concerns the conversion, upon ingest, of specific kinds of data formats, such as spreadsheets and databases, to an intermediate XML-formatted file. It is assumed that the long-term curation of the XML files is much less problematic than the migration of binary source files, and that the intermediate XML file can be converted efficiently to whatever file formats are common in the future. The features of the intermediate XML files are specified in the so-called Standard Data Formats for Preservation (SDFP) specification. This XML schema can be considered an umbrella, as it contains existing formal descriptions of file formats developed by others. SDFP also contains schemata developed by DANS, for example a schema for file-oriented databases. It can be used, for example, for the binary DataPerfect format, which was used on a large scale about twenty years ago and for which no existing XML schema could be found. The software developed in the MIXED project has been set up as a generic framework together with a number of plug-ins. It can be considered a repository of durable file format conversions. This paper contains an overview of the results of the MIXED project.
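The essence of smart migration is converting a binary or delimited table on ingest into a self-describing intermediate XML document. The sketch below converts a small exported spreadsheet (CSV) into such a document; the `table`/`record`/`field` element names are illustrative only and do not reproduce the actual SDFP schema.

```python
import csv, io
import xml.etree.ElementTree as ET

def table_to_xml(csv_text, table_name):
    """Convert a delimited table (e.g. an exported spreadsheet) into a simple
    intermediate XML document, in the spirit of MIXED's smart migration.
    Element names here are hypothetical, not the SDFP schema itself."""
    reader = csv.DictReader(io.StringIO(csv_text))
    root = ET.Element("table", name=table_name)
    for row in reader:
        rec = ET.SubElement(root, "record")
        for field, value in row.items():
            cell = ET.SubElement(rec, "field", name=field)
            cell.text = value
    return ET.tostring(root, encoding="unicode")

xml_doc = table_to_xml("id,name\n1,alpha\n2,beta\n", "specimens")
print(xml_doc)
```

Once the data is in a well-documented XML form, future migrations become XML-to-X transformations rather than reverse engineering of an obsolete binary format, which is exactly the curation advantage the abstract claims.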


AI & Society ◽  
2021 ◽  
Author(s):  
Suzanne Anker

This paper addresses three aspects of Bio Art: iconography, artificial life, and wetware. The development of models for innovation requires hybrid practices which generate knowledge through epistemic experimental practices. The intersection of art and the biological sciences contains both scientific data and the visualization of its cultural imagination. In the Bio Art Lab at the School of Visual Arts, artists use the tools of science to make art.


2014 ◽  
Vol 22 (2) ◽  
pp. 173-185 ◽  
Author(s):  
Eli Dart ◽  
Lauren Rotman ◽  
Brian Tierney ◽  
Mary Hester ◽  
Jason Zurawski

The ever-increasing scale of scientific data has become a significant challenge for researchers who rely on networks to interact with remote computing systems and transfer results to collaborators worldwide. Despite the availability of high-capacity connections, scientists struggle with inadequate cyberinfrastructure that cripples data transfer performance and impedes scientific progress. The Science DMZ paradigm comprises a proven set of network design patterns that collectively address these problems for scientists. We explain the Science DMZ model, including the network architecture, system configuration, cybersecurity, and performance tools that together create an optimized network environment for science. We describe use cases from universities, supercomputing centers, and research laboratories, highlighting the effectiveness of the Science DMZ model in diverse operational settings. In all, the Science DMZ model is a solid platform that supports any science workflow and flexibly accommodates emerging network technologies. As a result, the Science DMZ vastly improves collaboration, accelerating scientific discovery.


Author(s):  
Francesco Gagliardi

The author introduces a machine learning system for cluster analysis to take on the problem of syndrome discovery in the clinical domain. A syndrome is a set of typical clinical features (a prototype) that appear together often enough to suggest they may represent a single, unknown disease. The discovery of syndromes and the related taxonomy formation is therefore the critical early phase of the process of scientific discovery in the medical domain. The proposed system discovers syndromes following Eleanor Rosch's prototype theory of how the human mind categorizes and forms taxonomies, with the aims of understanding how humans perform these activities and of automating or assisting the process of scientific discovery. The implemented system can be considered a scientific discovery support system, as it can discover unknown syndromes to the benefit of subsequent clinical practice and research activities.
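A prototype, in this setting, is a summary feature vector that typical members of a cluster resemble. The toy sketch below forms prototypes by k-means-style averaging over artificial patient feature vectors; it illustrates the general idea of prototype formation only and is not Gagliardi's actual system, whose algorithm the abstract does not specify.

```python
import math

def kmeans(points, k, iters=20):
    """Toy k-means: each cluster is summarized by its mean vector, a numeric
    stand-in for a Rosch-style prototype. Initial centers are the first k points."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            best = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[best].append(p)
        for c, members in enumerate(groups):
            if members:
                centers[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return centers

# Two artificial "syndromes": patients whose feature vectors cluster apart.
patients = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0),   # cluster A
            (5.0, 5.1), (4.9, 5.0), (5.1, 4.8)]   # cluster B
prototypes = kmeans(patients, k=2)
print(prototypes)  # one prototype near (0.1, 0.1), one near (5.0, 5.0)
```

Each recovered prototype is a candidate syndrome description: the feature values a typical member of that patient group exhibits, which a clinician could then inspect and name.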


2008 ◽  
Vol 3 (1) ◽  
pp. 44-62 ◽  
Author(s):  
Jeremy Frey

The explosion in the production of scientific data in recent years is placing strains upon conventional systems supporting integration, analysis, interpretation and dissemination of data and thus constraining the whole scientific process. Support for handling large quantities of diverse information can be provided by e-Science methodologies and the cyber-infrastructure that enables collaborative handling of such data. Regard needs to be taken of the whole process involved in scientific discovery. This includes the consideration of the requirements of the users and consumers further down the information chain and what they might ideally prefer to impose on the generators of those data. As the degree of digital capture in the laboratory increases, it is possible to improve the automatic acquisition of the ‘context of the data’ as well as the data themselves. This process provides an opportunity for the data creators to ensure that many of the problems they often encounter in later stages are avoided. We wish to elevate curation to an operation to be considered by the laboratory scientist as part of good laboratory practice, not a procedure of concern merely to the few specialising in archival processes. Designing curation into experiments is an effective solution to the provision of high-quality metadata that leads to better, more re-usable data and to better science.


Author(s):  
Glen L. Niebur ◽  
Thomas R. Chase

Integration of engineering software continues to be an important topic in mechanical design and manufacturing. One integration technique that has been proposed is to store a complete product representation in a single database using a database management system (DBMS). In order to integrate existing CAE applications that are not designed for use with a DBMS, a method for importing and exporting data to the database is needed. A system is proposed for recognizing and translating a large class of engineering data: those data formats which can be described by regular grammars.
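A format describable by a regular grammar can be recognized with a regular expression and translated field by field into database-ready records. The record layout and field names below are hypothetical, invented only to show the recognize-and-translate step the abstract describes.

```python
import re

# An illustrative line-oriented engineering data record (part id, quantity,
# mass); the format and field names are hypothetical.
RECORD = re.compile(
    r"^(?P<part>[A-Z]+\d+)\s+(?P<qty>\d+)\s+(?P<mass>\d+\.\d+)$"
)

def translate(lines):
    """Recognize records describable by a regular grammar and translate them
    into typed dictionaries ready for loading into a DBMS."""
    out = []
    for line in lines:
        m = RECORD.match(line.strip())
        if m:  # non-matching lines (comments, headers) are skipped
            out.append({"part": m["part"],
                        "qty": int(m["qty"]),
                        "mass": float(m["mass"])})
    return out

rows = translate(["BOLT12 4 0.25", "NUT7 8 0.05", "# comment, ignored"])
print(rows)
```

Because regular grammars compose, a library of such patterns can cover many flat CAE export formats with one generic import/export framework, which is the spirit of the proposed system.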


2016 ◽  
Author(s):  
Edmund Hart ◽  
Pauline Barmby ◽  
David LeBauer ◽  
François Michonneau ◽  
Sarah Mount ◽  
...  

Data is the central currency of science, but the nature of scientific data has changed dramatically with the rapid pace of technology. This change has led to the development of a wide variety of data formats, dataset sizes, data complexity, data use cases, and data sharing practices. Improvements in high throughput DNA sequencing, sustained institutional support for large sensor networks, and sky surveys with large-format digital cameras have created massive quantities of data. At the same time, the combination of increasingly diverse research teams and data aggregation in portals (e.g. for biodiversity data, GBIF or iDigBio) necessitates increased coordination among data collectors and institutions. As a consequence, “data” can now mean anything from petabytes of information stored in professionally-maintained databases, through spreadsheets on a single computer, to hand-written tables in lab notebooks on shelves. All remain important, but data curation practices must continue to keep pace with the changes brought about by new forms and practices of data collection and storage.


2008 ◽  
Vol 2 (2) ◽  
pp. 31-40 ◽  
Author(s):  
Carole L. Palmer ◽  
Bryan P. Heidorn ◽  
Dan Wright ◽  
Melissa H. Cragin

Scientific data problems do not stand in isolation. They are part of a larger set of challenges associated with the escalation of scientific information and changes in scholarly communication in the digital environment. Biologists in particular are generating enormous sets of data at a high rate, and new discoveries in the biological sciences will increasingly depend on the integration of data across multiple scales. This work will require new kinds of information expertise in key areas. To build this professional capacity we have developed two complementary educational programs: a Biological Information Specialist (BIS) masters degree and a concentration in Data Curation (DC). We believe that BISs will be central in the development of cyberinfrastructure and information services needed to facilitate interdisciplinary and multi-scale science. Here we present three sample cases from our current research projects to illustrate areas in which we expect information specialists to make important contributions to biological research practice.

