OME Files - An open source reference library for the OME-XML metadata model and the OME-TIFF file format

Abstract. Here we present the Pop-Gen Pipeline Platform (PPP), a software platform with the goal of reducing the computational expertise required for conducting population genomic analyses. The PPP was designed as a collection of scripts that facilitate common population genomic workflows in a consistent and standardized Python environment. Functions were developed to encompass entire workflows, including input preparation, file format conversion, various population genomic analyses, output generation, and visualization. By facilitating entire workflows, the PPP offers several benefits to prospective end users: it reduces the need for redundant in-house software and scripts that would require development time and may be error-prone or incorrect. The platform has also been developed with reproducibility and extensibility of analyses in mind. The PPP is an open-source package that is available for download and use at https://ppp.readthedocs.io/en/latest/PPP_pages/install.html

Pairs and Pairix: a file format and a tool for efficient storage and retrieval for Hi-C read pairs

10.1101/2021.08.24.457552 ◽

2021 ◽

Author(s):

Soohyun Lee ◽

Carl Vitzthum ◽

Burak H. Alver ◽

Peter J. Park

Keyword(s):

Quality Control ◽

Open Source ◽

Source Code ◽

Three Dimensional ◽

File Format ◽

Interaction Data ◽

Text File ◽

Storage And Retrieval ◽

Link Type ◽

Efficient Storage

Abstract. Summary: As the amount of three-dimensional chromosomal interaction data continues to increase, storing and accessing such data efficiently becomes paramount. We introduce Pairs, a block-compressed text file format for storing paired genomic coordinates from Hi-C data, and Pairix, an open-source C application to index and query Pairs files. Pairix (also available in Python and R) extends the functionalities of Tabix to paired coordinates data. We have also developed PairsQC, a collapsible HTML quality control report generator for Pairs files. Availability: The format specification and source code are available at https://github.com/4dn-dcic/pairix, https://github.com/4dn-dcic/Rpairix and https://github.com/4dn-dcic/[email protected] or [email protected]
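
As a rough illustration of what a query on paired coordinates involves, the sketch below filters plain-text rows by a two-dimensional region. The column order (readID, chr1, pos1, chr2, pos2, strand1, strand2), the header lines, the sample rows, and the helper names `parse_pairs` and `query_2d` are assumptions made for this example. It performs a linear scan in plain Python and does not use Pairix or its index.

```python
from typing import Iterable, Iterator, NamedTuple


class Pair(NamedTuple):
    read_id: str
    chr1: str
    pos1: int
    chr2: str
    pos2: int
    strand1: str
    strand2: str


def parse_pairs(lines: Iterable[str]) -> Iterator[Pair]:
    """Yield Pair records, skipping '#' header lines and blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        yield Pair(f[0], f[1], int(f[2]), f[3], int(f[4]), f[5], f[6])


def query_2d(pairs, region1, region2):
    """Return pairs whose first mate lies in region1 and second mate in region2.

    Each region is a (chromosome, start, end) tuple with inclusive bounds.
    """
    c1, s1, e1 = region1
    c2, s2, e2 = region2
    return [
        p for p in pairs
        if p.chr1 == c1 and s1 <= p.pos1 <= e1
        and p.chr2 == c2 and s2 <= p.pos2 <= e2
    ]


# Made-up rows for illustration only (assumed column layout, tab-separated).
EXAMPLE = (
    "## pairs format v1.0\n"
    "#columns: readID chr1 pos1 chr2 pos2 strand1 strand2\n"
    "r1\tchr1\t1000\tchr1\t5000\t+\t-\n"
    "r2\tchr1\t1500\tchr2\t300\t+\t+\n"
    "r3\tchr1\t90000\tchr2\t350\t-\t+\n"
)

hits = query_2d(parse_pairs(EXAMPLE.splitlines()),
                ("chr1", 1, 2000), ("chr2", 1, 1000))
print([p.read_id for p in hits])
```

An indexed tool avoids the full scan shown here by jumping directly to the compressed blocks that can contain the requested region pair.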

Realising a Push Button Modality for Video-Based Forensics

Infrastructures ◽

10.3390/infrastructures6040054 ◽

2021 ◽

Vol 6 (4) ◽

pp. 54

Author(s):

Bako Zawali ◽

Richard A. Ikuesan ◽

Victor R. Kebande ◽

Steven Furnell ◽

Arafat A-Dhaqm

Keyword(s):

Open Source ◽

Comprehensive Evaluation ◽

File Format ◽

Video Content ◽

Forensic Investigation ◽

Push Button ◽

Video File

Complexity and sophistication among multimedia-based tools have made it easy for perpetrators to conduct digital crimes such as counterfeiting, modification, and alteration without being detected. It may not be easy to verify the integrity of video content that, for example, has been manipulated digitally. To address this perennial investigative challenge, this paper proposes the integration of a forensically sound push button forensic modality (PBFM) model for the investigation of the MP4 video file format as a step towards automated video forensic investigation. An open-source multimedia forensic tool was developed based on the proposed PBFM model. A comprehensive evaluation of the efficiency of the tool against file alteration showed that the tool was capable of identifying falsified files, which satisfied the underlying assertion of the PBFM model. Furthermore, the outcome can be used as a complementary process for enhancing the evidence admissibility of MP4 video for forensic investigation.

illuminaio: An open source IDAT parsing tool for Illumina microarrays

F1000Research ◽

10.12688/f1000research.2-264.v1 ◽

2013 ◽

Vol 2 ◽

pp. 264 ◽

Cited By ~ 31

Author(s):

Mike L Smith ◽

Keith A. Baggerly ◽

Henrik Bengtsson ◽

Matthew E. Ritchie ◽

Kasper D. Hansen

Keyword(s):

Open Source ◽

Research Community ◽

Probe Type ◽

File Format ◽

Affymetrix Platform ◽

Text Format ◽

Expression Arrays ◽

Gene Expression Arrays ◽

Proprietary Format ◽

Readable Text

The IDAT file format is used to store BeadArray data from the myriad of genomewide profiling platforms on offer from Illumina Inc. This proprietary format is output directly from the scanner and stores summary intensities for each probe-type on an array in a compact manner. A lack of open source tools to process IDAT files has hampered their uptake by the research community beyond the standard step of using the vendor’s software to extract the data they contain in a human readable text format. To fill this void, we have developed the illuminaio package that parses IDAT files from any BeadArray platform, including the decryption of files from Illumina’s gene expression arrays. illuminaio provides the first open-source package for this task, and will promote wider uptake of the IDAT format as a standard for sharing Illumina BeadArray data in public databases, in the same way that the CEL file serves as the standard for the Affymetrix platform.

Using Cesium for 3D Thematic Visualisations on the Web

Proceedings of the ICA ◽

10.5194/ica-proc-1-45-2018 ◽

2018 ◽

Vol 1 ◽

pp. 1-4

Author(s):

Mátyás Gede

Keyword(s):

Open Source ◽

File Format ◽

Virtual Globes ◽

The Creation ◽

Gis Software ◽

Excellent Tool ◽

The Web

Cesium (http://cesiumjs.org) is an open source, WebGL-based JavaScript library for virtual globes and 3D maps. It is an excellent tool for 3D thematic visualisations, but to use its full functionality it has to be fed data in its own file format, CZML. Unfortunately, this format is not yet supported by any major GIS software. This paper introduces a plugin for QGIS, developed by the author, which facilitates the creation of CZML files for various types of visualisations. The usability of Cesium is also examined in various hardware/software environments.
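
To give a sense of what a CZML file contains, the sketch below builds a small document as a JSON array of packets: a leading "document" packet followed by one point per feature. It is not the QGIS plugin described above. The function name `build_czml`, the feature tuples, the size scaling, and the colour are all assumptions chosen for illustration.

```python
import json


def build_czml(features, name="thematic points"):
    """Build a CZML document (a list of packets) from (id, lon, lat, value) tuples."""
    # The first packet identifies the document and the CZML version.
    packets = [{"id": "document", "name": name, "version": "1.0"}]
    for fid, lon, lat, value in features:
        packets.append({
            "id": fid,
            "name": fid,
            # Longitude, latitude in degrees and height in metres.
            "position": {"cartographicDegrees": [lon, lat, 0.0]},
            "point": {
                # Hypothetical scaling: larger thematic value, larger symbol.
                "pixelSize": 6 + 2 * value,
                "color": {"rgba": [220, 60, 60, 255]},
            },
        })
    return packets


# Made-up features for illustration only.
demo = [("site-a", 19.0, 47.5, 3.0), ("site-b", 21.6, 47.5, 8.0)]
czml = build_czml(demo)
print(json.dumps(czml, indent=2))
```

The printed JSON can be saved with a `.czml` extension; how it is loaded into a viewer depends on the Cesium version in use.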

Digitization of Text Documents Using PDF/A

Information Technology and Libraries ◽

10.6017/ital.v37i1.9878 ◽

2018 ◽

Vol 37 (1) ◽

pp. 52-64

Author(s):

Yan Han ◽

Xueheng Wan

Keyword(s):

Theoretical Analysis ◽

Open Source ◽

Open Source Software ◽

Real Life ◽

Use Case ◽

File Format ◽

Text Documents

The purpose of this article is to demonstrate a practical use case of the PDF/A file format for the digitization of textual documents, following the recommendation of PDF/A as a preferred digitization file format. The authors show how to convert and combine all the TIFFs with associated metadata into a single PDF/A-2b file for a document. Using open source software with real-life examples, the authors show readers how to convert TIFF images, extract associated metadata and ICC profiles, and validate against the newly released PDF/A validator. The generated PDF/A file is a self-contained and self-described container which accommodates all the data from the digitization of textual materials, including page-level metadata and/or ICC profiles. Based on theoretical analysis and empirical examples, the PDF/A file format has many advantages over the traditionally preferred file formats TIFF and JPEG2000 for the digitization of textual documents.

The evolution of an open source file format: a version control story

Microscopy and Microanalysis ◽

10.1017/s1431927621004116 ◽

2021 ◽

Vol 27 (S1) ◽

pp. 1092-1094

Author(s):

Benjamin Savitzky ◽

Steven Zeltmann ◽

Luis Rangel DaCosta ◽

Peter Ercius ◽

Mary Scott ◽

...

Keyword(s):

Open Source ◽

Version Control ◽

File Format ◽

Source File

PDBeCIF: an open-source mmCIF/CIF parsing and processing package

BMC Bioinformatics ◽

10.1186/s12859-021-04271-9 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Glen van Ginkel ◽

Lukáš Pravda ◽

José M. Dana ◽

Mihaly Varadi ◽

Peter Keller ◽

...

Keyword(s):

Open Source ◽

Protein Data Bank ◽

Hybrid Methods ◽

Structural Data ◽

Data Bank ◽

File Format ◽

Macromolecular Complexes ◽

Link Type ◽

Information File ◽

Multiscale Structures

Abstract. Background: Biomacromolecular structural data outgrew the legacy Protein Data Bank (PDB) format which the scientific community relied on for decades, yet the use of its successor PDBx/Macromolecular Crystallographic Information File format (PDBx/mmCIF) is still not widespread. Perhaps one of the reasons is the availability of easy to use tools that only support the legacy format, but also the inherent difficulties of processing mmCIF files correctly, given the number of edge cases that make efficient parsing problematic. Nevertheless, to fully exploit macromolecular structure data and their associated annotations such as multiscale structures from integrative/hybrid methods or large macromolecular complexes determined using traditional methods, it is necessary to fully adopt the new format as soon as possible. Results: To this end, we developed PDBeCIF, an open-source Python project for manipulating mmCIF and CIF files. It is part of the official list of mmCIF parsers recorded by the wwPDB and is heavily employed in the processes of the Protein Data Bank in Europe. The package is freely available both from the PyPI repository (http://pypi.org/project/pdbecif) and from GitHub (https://github.com/pdbeurope/pdbecif) along with rich documentation and many ready-to-use examples. Conclusions: PDBeCIF is an efficient and lightweight Python 2.6+/3+ package with no external dependencies. It can be readily integrated with 3rd party libraries as well as adopted for broad scientific analyses.
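
One of the parsing edge cases can be sketched in a few lines: in CIF-style files a quote character only closes a quoted value when it is followed by whitespace or the end of the line, so values such as `"O5'"` are legal. The toy tokenizer below handles single-line key/value items only (no loops, comments, or multi-line text fields). It is not PDBeCIF's implementation, and the names `tokenize`, `parse_items`, and the sample entry are made up for illustration.

```python
import re

# A quoted token ends at a quote that is followed by whitespace or end of line;
# anything else is a bare, whitespace-delimited token.
_TOKEN = re.compile(r"""'(.*?)'(?=\s|$)|"(.*?)"(?=\s|$)|(\S+)""")


def tokenize(line):
    """Split one line into tokens, honouring the CIF quoting rule."""
    tokens = []
    for match in _TOKEN.finditer(line):
        single, double, bare = match.groups()
        if single is not None:
            tokens.append(single)
        elif double is not None:
            tokens.append(double)
        else:
            tokens.append(bare)
    return tokens


def parse_items(text):
    """Collect '_category.item value' pairs that sit on a single line."""
    items = {}
    for line in text.splitlines():
        tokens = tokenize(line.strip())
        if len(tokens) == 2 and tokens[0].startswith("_"):
            items[tokens[0]] = tokens[1]
    return items


# Made-up entry for illustration only.
SAMPLE = """data_demo
_entry.id               DEMO
_struct.title           'A "toy" entry'
_chem_comp_atom.atom_id "O5'"
_chem_comp.name         'adenosine 5'-monophosphate'
"""

for key, value in parse_items(SAMPLE).items():
    print(key, "->", value)
```

A naive split on quote characters would cut the last two values short, which is the kind of mistake a dedicated parser is meant to prevent.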

Personalization of structural PDB files.

Acta Biochimica Polonica ◽

10.18388/abp.2013_2025 ◽

2013 ◽

Vol 60 (4) ◽

Author(s):

Tomasz Woźniak ◽

Ryszard W Adamiak

Keyword(s):

Open Source ◽

Three Dimensional ◽

Dimensional Structure ◽

File Format ◽

Three Dimensional Structure

The PDB format is the one most commonly used by various programs to define the three-dimensional structure of biomolecules. However, the programs often use different versions of the format. Thus far, no comprehensive solution for unifying the PDB formats has been developed. Here we present an open-source, Python-based tool called PDBinout for processing and converting various versions of the PDB file format for biostructural applications. Moreover, PDBinout allows users to create their own PDB versions. PDBinout is freely available under the LGPL licence at http://pdbinout.ibch.poznan.pl.
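
The kind of difference between format versions can be sketched with a single ATOM record: older files may stop after the temperature factor, while newer ones carry an element symbol in columns 77-78. The example below parses the fixed-width columns and writes the record back in one layout. It is not PDBinout's code. The function names, the sample record, and the rule that guesses a missing element from the first letter of the atom name are assumptions for illustration, and atom-name alignment is simplified.

```python
def parse_atom(line: str) -> dict:
    """Parse one ATOM/HETATM record using the fixed column layout."""
    return {
        "record": line[0:6].strip(),
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21].strip(),
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "temp_factor": float(line[60:66]),
        # Older files may omit the element columns (77-78).
        "element": line[76:78].strip() if len(line) >= 78 else "",
    }


def format_atom(a: dict) -> str:
    """Write the record in one fixed 78-column layout.

    A missing element is guessed from the first letter of the atom name,
    which is a crude, hypothetical rule (it would mislabel calcium, for one).
    """
    element = a["element"] or a["name"][:1]
    return (
        f"{a['record']:<6s}{a['serial']:5d} {a['name']:<4s} {a['res_name']:>3s} "
        f"{a['chain']:1s}{a['res_seq']:4d}    "
        f"{a['x']:8.3f}{a['y']:8.3f}{a['z']:8.3f}"
        f"{a['occupancy']:6.2f}{a['temp_factor']:6.2f}"
        f"          {element:>2s}"
    )


# Made-up 66-column record, written field by field so the columns are visible.
legacy = ("ATOM  " "    1" " " " CA " " " "ALA" " " "A" "   1" "    "
          "  11.104" "   6.134" "  -6.504" "  1.00" "  0.00")

converted = format_atom(parse_atom(legacy))
print(converted)
```

Parsing the converted line again yields the same coordinates, so the conversion only changes layout and the guessed element field.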

XLIFF Mapping to RDF

The Journal of Internationalization and Localization ◽

10.1075/jial.2.04ana ◽

2012 ◽

Vol 2 ◽

pp. 66-96

Author(s):

Dimitra Anastasiou

Keyword(s):

Resource Description Framework ◽

Symbiotic Relationship ◽

File Format ◽

Metadata Standard ◽

Metadata Model ◽

File Formats ◽

Description Framework ◽

Conversion Tool ◽

Resource Description ◽

The Web

This paper discusses the lack of interoperability between file formats, standards, and applications. We suggest a mapping from the ‘XML Localisation Interchange File Format’ (XLIFF) into the ‘Resource Description Framework’ (RDF) in order to enhance interoperability between a metadata standard and a metadata model. Three use cases are provided (a minimal, a modular and one with alternative translations); each one with a source (XLIFF), an output (RDF), and an ‘Extensible Stylesheet Language Transformations’ (XSLT) file. We explain in detail how the XLIFF file elements and attributes can be matched by the XSLT. Believing in the symbiotic relationship for a more effective way of presenting multilingual content on the Web, we developed a conversion tool to translate from XLIFF into RDF in order to automate the process. Our contribution is to translate XLIFF into RDF in order to facilitate ontology localisation, i.e. localise monolingual ontologies and populate Semantic Web approaches with localisation-related metadata.
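
The general idea of such a mapping can be sketched as follows. The paper performs the transformation with XSLT; this sketch swaps in Python's standard `xml.etree.ElementTree` instead and emits N-Triples lines. The vocabulary URI, the subject prefix, the function names, and the sample XLIFF 1.2 document are hypothetical and do not reproduce the paper's mapping.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}
VOCAB = "http://example.org/xliff#"   # hypothetical vocabulary
BASE = "http://example.org/unit/"     # hypothetical subject prefix


def literal(text, lang):
    """Format a language-tagged literal with minimal escaping."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n"))
    return f'"{escaped}"@{lang}'


def xliff_to_ntriples(xml_text):
    """Map each trans-unit's source and target text to one triple each."""
    root = ET.fromstring(xml_text)
    triples = []
    for file_el in root.findall("x:file", XLIFF_NS):
        src_lang = file_el.get("source-language", "und")
        tgt_lang = file_el.get("target-language")
        for unit in file_el.iterfind(".//x:trans-unit", XLIFF_NS):
            subject = f"<{BASE}{unit.get('id')}>"
            source = unit.find("x:source", XLIFF_NS)
            target = unit.find("x:target", XLIFF_NS)
            if source is not None and source.text:
                triples.append(
                    f"{subject} <{VOCAB}source> {literal(source.text, src_lang)} .")
            if target is not None and target.text and tgt_lang:
                triples.append(
                    f"{subject} <{VOCAB}target> {literal(target.text, tgt_lang)} .")
    return triples


# Made-up minimal document for illustration only.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="demo.txt" datatype="plaintext"
        source-language="en" target-language="de">
    <body>
      <trans-unit id="1">
        <source>Hello</source>
        <target>Hallo</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""

for triple in xliff_to_ntriples(SAMPLE):
    print(triple)
```

Only plain text content is handled here; inline markup inside `source` or `target` elements, alternative translations, and the modular use case would need additional rules.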
