Assessing and assuring interoperability of a genomics file format

Defining File Format Obsolescence: A Risky Journey

International Journal of Digital Curation, 2008, Vol 3 (1), pp. 89-106
DOI: 10.2218/ijdc.v3i1.44
Cited by: ~5
Author(s): David Pearson, Colin Webb
Keyword(s): Risk Factor, Software Tool, Digital Information, File Format, Sources Of Information, National Library, File Formats, Wide Range, Information Collections, Further Development

File format obsolescence is a major risk factor threatening the ongoing usefulness of digital information collections. While the preservation community has become increasingly interested in tools for assessing a wide range of risks, the National Library of Australia is developing mechanisms specifically focused on the risks of format obsolescence. The paper reports on the AONS II Project, undertaken in conjunction with the Australian Partnership for Sustainable Repositories (APSR). The project aimed to refine and develop a software tool that would automatically find and report indicators of obsolescence risks, to help repository managers decide if preservation action is needed. The paper discusses the current mismatch between this objective and the available sources of information on file formats, and emphasises the need to take account of both local and global factors in assessing risk. The paper calls for the preservation community to engage with the further development of thinking about file format obsolescence.

Keemei: cloud-based validation of tabular bioinformatics file formats in Google Sheets

2016
DOI: 10.7287/peerj.preprints.1670v1
Author(s): Jai Ram Rideout, John H Chase, Evan Bolyen, Gail Ackermann, Antonio Gonzalez, ...
Keyword(s): Data Entry, File Format, Tabular Data, Bioinformatics Analyses, Web Browser, File Formats, Bioinformatics Software, Data Files, Metadata Mapping, Mapping File

Bioinformatics software often requires human-generated tabular text files as input and has specific requirements for how those data are formatted. Users frequently manage these data in spreadsheet programs, which are convenient for researchers compiling the requisite information because they can easily be used on different platforms, including laptops and tablets, and because they provide a familiar interface. It is increasingly common for many different researchers to be involved in compiling these data, including study coordinators, clinicians, lab technicians, and bioinformaticians. As a result, many research groups are shifting toward cloud-based spreadsheet programs, such as Google Sheets, which support concurrent editing of a single spreadsheet by different users working on different platforms. Most of the researchers entering data are often unfamiliar with the formatting requirements of the bioinformatics programs that will be used, so validating and correcting file formats is often a bottleneck before bioinformatics analysis can begin. We present Keemei, a Google Sheets Add-on for validating tabular files used in bioinformatics analyses. Keemei is available free of charge from Google’s Chrome Web Store and can be installed and run on any web browser supported by Google Sheets. Keemei currently supports validation of two widely used tabular bioinformatics formats, the QIIME sample metadata mapping file format and the Spatially Referenced Genetic Data (SRGD) format, but is designed to easily support the addition of others. Keemei will save researchers time and frustration by providing a convenient interface for tabular bioinformatics file format validation. By allowing everyone involved with data entry for a project to easily validate their data, it will reduce the validation and formatting bottlenecks that are commonly encountered when human-generated data files are first used with a bioinformatics system. Simplifying the validation of essential tabular data files, such as sample metadata, will reduce common errors and thereby improve the quality and reliability of research outcomes.
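
The kind of check a tabular validator performs can be sketched in Python. The rules below (unique, non-empty column names; sample IDs limited to letters, digits and periods; no empty cells) and the name validate_table are illustrative assumptions only. They are not Keemei's actual rule set, and they are not the QIIME or SRGD specifications.

```python
import csv
import io
import re

# Hypothetical ID rule chosen for illustration; not taken from Keemei or QIIME.
ID_PATTERN = re.compile(r"^[A-Za-z0-9.]+$")


def validate_table(text, delimiter="\t"):
    """Return a list of (row, column, message) problems; row/column are 1-based.

    Column 0 means the problem applies to the whole row.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return [(0, 0, "file is empty")]

    problems = []
    header = rows[0]

    # Header check: every column name must be non-empty and unique.
    seen_names = set()
    for col, name in enumerate(header, start=1):
        if not name.strip():
            problems.append((1, col, "empty column name"))
        elif name in seen_names:
            problems.append((1, col, f"duplicate column name {name!r}"))
        seen_names.add(name)

    # Row checks: correct width, valid and unique ID in column 1, no empty cells.
    seen_ids = set()
    for r, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append((r, 0, f"expected {len(header)} cells, found {len(row)}"))
            continue
        sample_id = row[0]
        if not ID_PATTERN.match(sample_id):
            problems.append((r, 1, f"invalid sample ID {sample_id!r}"))
        elif sample_id in seen_ids:
            problems.append((r, 1, f"duplicate sample ID {sample_id!r}"))
        seen_ids.add(sample_id)
        for col, cell in enumerate(row[1:], start=2):
            if not cell.strip():
                problems.append((r, col, "empty cell"))
    return problems
```

Reporting each problem with its row and column, rather than stopping at the first error, is what lets a spreadsheet front end highlight every offending cell in one pass.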

GPSRdocker: A Docker-based Resource for Genomics, Proteomics and Systems biology

2019
DOI: 10.1101/827766
Cited by: ~3
Author(s): Piyush Agrawal, Rajesh Kumar, Salman Sadullah Usmani, Anjali Dhall, Sumeet Patiyal, ...
Keyword(s): Function Class, General Purpose, System Level, Full Potential, Biomedical Sciences, Web Based, Computing Power, Software Packages, Wide Range, Bioinformatics Software

Background: In the past, a number of web-based resources have been developed in the field of bioinformatics. These resources are heavily used by the scientific community to address challenges faced by experimental researchers, particularly in the biomedical sciences. Several challenges limit the full use of these services, including internet speed, limits on computing power, and data security. To enhance the utility of these web-based assets, we developed a Docker-based container that integrates a large number of resources available in the literature. Results: This paper describes GPSRdocker, a Docker-based container developed to provide a wide range of computational tools in bioinformatics, particularly in genomics, proteomics and systems biology. The majority of the tools integrated in GPSRdocker are based on web services developed by Raghava’s group over the last two decades. Broadly, these tools fall into three categories: i) general scripts, ii) supporting software and iii) major standalone software. To assist students and developers working in bioinformatics, we developed general scripts in Perl and Python. These general-purpose scripts serve as building blocks for bioinformatics tools, for example by computing features/descriptors of a protein. Supporting software packages include SCIKIT, WEKA, SVMlight, and PSI-BLAST; these packages allow one to develop and implement bioinformatics software. The major standalone software is the core of this container and allows prediction of the function/class of biomolecules. These tools can be classified broadly into the following categories: protein annotation, epitope-based vaccines, prediction of interaction and drug discovery. Conclusion: A Docker-based container has been developed that can be run easily on any operating system and can be ported directly to the cloud. Scripts can be run to build pipelines for addressing system-level problems, such as predicting vaccine candidates for a pathogen. GPSRdocker, including its manual, is available free for academic use from https://webs.iiitd.edu.in/gpsrdocker.

Micro-tomographic characterization of the root and canal system morphology of mandibular first premolars in a Chilean population

Scientific Reports, 2021, Vol 11 (1)
DOI: 10.1038/s41598-020-80046-1
Author(s): Alfredo Sierra-Cristancho, Luis González-Osuna, Daniela Poblete, Emilio A. Cafferata, Paola Carvajal, ...
Keyword(s): Root Canal, Multiple Root, Root Anatomy, Type I, Micro Computed Tomography, Canal System, Root Canals, Wide Range, Root Canal System, Chilean Population

This study aimed to analyze the root anatomy and root canal system morphology of mandibular first premolars in a Chilean population. A total of 186 teeth were scanned using micro-computed tomography and reconstructed three-dimensionally. The root canal system morphology was classified using both Vertucci’s and Ahmed’s criteria. The radicular grooves were categorized using the ASUDAS system, and the presence of Tomes’ anomalous root was associated with Ahmed’s score. A single root canal was identified in 65.05% of teeth, corresponding to configuration type I according to Vertucci’s criteria and code 1MP1 according to Ahmed’s criteria. Radicular grooves were observed in 39.25% of teeth. The ASUDAS scores for radicular grooves were 60.75%, 13.98%, 12.36%, 10.22%, 2.15%, and 0.54%, from grade 0 to grade 5, respectively. Tomes’ anomalous root was identified only in teeth with multiple root canals, and it was most frequently associated with code 1MP1–2 of Ahmed’s criteria. The root canal system morphology of mandibular first premolars showed a wide range of anatomical variations in the Chilean population. Teeth with multiple root canals had a higher incidence of radicular grooves, which were closely related to more complex internal anatomy.

Risk Factors and Prediction Models for Venous Thromboembolism in Ambulatory Patients with Lung Cancer

Healthcare, 2021, Vol 9 (6), pp. 778
DOI: 10.3390/healthcare9060778
Author(s): Ann-Rong Yan, Indira Samarawickrema, Mark Naunton, Gregory M. Peterson, Desmond Yip, ...
Keyword(s): Risk Factors, Lung Cancer, Venous Thromboembolism, High Risk, Prediction Models, Poor Performance, Risk Models, Related Risk, Wide Range, Ambulatory Patients

Venous thromboembolism (VTE) is a significant cause of mortality in patients with lung cancer. Despite the availability of a wide range of anticoagulants to help prevent thrombosis, thromboprophylaxis in ambulatory patients is a challenge because of the associated risk of haemorrhage. As a result, anticoagulation is only recommended in patients with a relatively high risk of VTE. Efforts have been made to develop predictive models for VTE risk assessment in cancer patients, but it is unclear whether a reliable predictive model exists for ambulatory patients with lung cancer. We have analysed the latest information on this topic, with a focus on the lung cancer-related risk factors for VTE and on the risk prediction models developed and validated in this group of patients. The existing risk models, such as the Khorana score, the PROTECHT score and the CONKO score, have shown poor performance in external validations, failing to identify many high-risk individuals. Some of the newly developed and updated models may be promising, but further validation is needed.

Research on the Three-Dimensional Displaying of STL ASCII and Binary File

Advanced Materials Research, 2014, Vol 940, pp. 433-436
DOI: 10.4028/www.scientific.net/amr.940.433
Cited by: ~1
Author(s): Ying Zhang, Xin Shi
Keyword(s): Detailed Analysis, Programming Language, Coordinate Transformation, Graphical Representation, Three Dimensional, File Format, Binary File, Stl File, File Formats, Three Dimensional Display

Based on a detailed analysis of the STL file format, the VC++ 6.0 programming language was used to extract the information in STL ASCII and binary files. OpenGL triangle drawing was then used to render the STL file graphically, with rendering functions such as material, coordinate transformation, and lighting, achieving the loading and three-dimensional display of both STL ASCII and binary file formats.
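
The extraction step can be illustrated with a minimal Python sketch (the paper itself used VC++ 6.0, so this is not the authors' code). It relies on the commonly documented STL layout: a binary file has an 80-byte header, a little-endian 32-bit triangle count, and 50 bytes per triangle (a normal, three vertices, and a 16-bit attribute), while an ASCII file lists `vertex x y z` lines. The function names and the size-based detection heuristic are assumptions made for illustration.

```python
import struct


def is_binary_stl(data):
    """Heuristic: the file size must match the triangle count stored at byte 80."""
    if len(data) < 84:
        return False
    (count,) = struct.unpack_from("<I", data, 80)
    return len(data) == 84 + 50 * count


def read_binary_stl(data):
    (count,) = struct.unpack_from("<I", data, 80)
    triangles = []
    for i in range(count):
        # 12 floats (normal + 3 vertices) followed by one uint16 attribute.
        values = struct.unpack_from("<12fH", data, 84 + 50 * i)
        v = values[3:12]  # skip the normal, keep the nine vertex coordinates
        triangles.append((v[0:3], v[3:6], v[6:9]))
    return triangles


def read_ascii_stl(text):
    triangles, current = [], []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[0] == "vertex":
            current.append(tuple(float(p) for p in parts[1:]))
            if len(current) == 3:  # every three vertices close one facet
                triangles.append(tuple(current))
                current = []
    return triangles


def read_stl(data):
    """Return a list of triangles, each a tuple of three (x, y, z) vertices."""
    if is_binary_stl(data):
        return read_binary_stl(data)
    return read_ascii_stl(data.decode("ascii", errors="replace"))
```

The resulting vertex triples are the input a renderer would pass to its triangle-drawing calls. Facet normals are skipped here because they can be recomputed from the vertices.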

Using Relion software within Scipion framework

2020
DOI: 10.1101/2020.12.06.399808
Author(s): Grigory Sharov, Dustin R. Morado, Marta Carroni, José Miguel de la Rosa-Trevín
Keyword(s): Image Processing, User Interfaces, Published Data, New Developments, Processing Pipeline, Software Packages, File Formats, Processing Framework, Taking Care

Scipion is a modular image processing framework that integrates several software packages under a unified interface while taking care of file formats and conversions. Here, new developments and capabilities of the Scipion plugin for the Relion software are presented and illustrated with the image processing pipeline of published data. The user interfaces of Scipion and Relion are compared and the key differences highlighted, allowing this manuscript to be used as a guide for both new and experienced users of these software packages. Different streaming image processing options are also discussed, demonstrating the flexibility of the Scipion framework. Synopsis: An overview of the Scipion plugin for the Relion software is presented, and various capabilities of image processing within the Scipion framework are discussed.

Minimum cost network flows: Problems, algorithms, and software

Yugoslav journal of operations research, 2013, Vol 23 (1), pp. 3-17
DOI: 10.2298/yjor121120001s
Cited by: ~24
Author(s): Angelo Sifaleras
Keyword(s): Network Flow, Flow Problem, Network Flows, Minimum Cost, Network Flow Problem, Software Packages, Wide Range, Solution Methods, Minimum Cost Network Flow, Algorithmic Approaches

We present a wide range of problems concerning minimum cost network flows, and give an overview of the classic linear single-commodity Minimum Cost Network Flow Problem (MCNFP) and some other closely related problems, either tractable or intractable. We also discuss state-of-the-art algorithmic approaches and recent advances in the solution methods for the MCNFP. Finally, optimization software packages for the MCNFP are presented.
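
For readers unfamiliar with the MCNFP, a small instance can be solved with the textbook successive-shortest-paths method, sketched below in Python. This is an illustration only, not one of the algorithms or packages surveyed in the paper. The function name min_cost_flow and its input layout are assumptions, as are integer data, supplies that sum to zero, and the absence of negative-cost cycles.

```python
def min_cost_flow(n, arcs, supply):
    """Successive shortest paths with Bellman-Ford on the residual graph.

    n: number of nodes, labelled 0..n-1
    arcs: list of (tail, head, capacity, cost)
    supply: supply[i] > 0 is a supply, supply[i] < 0 a demand
    Returns (total_cost, flows) with flows[k] the flow on arcs[k].
    """
    source, sink = n, n + 1          # artificial super source and super sink
    edges = []                       # edges[k] = [head, residual capacity, cost]
    adj = [[] for _ in range(n + 2)]

    def add_edge(u, v, cap, cost):
        # Forward edge at an even index, its reverse at the next odd index.
        adj[u].append(len(edges))
        edges.append([v, cap, cost])
        adj[v].append(len(edges))
        edges.append([u, 0, -cost])

    for u, v, cap, cost in arcs:
        add_edge(u, v, cap, cost)
    required = 0
    for i, b in enumerate(supply):
        if b > 0:
            add_edge(source, i, b, 0)
            required += b
        elif b < 0:
            add_edge(i, sink, -b, 0)

    inf = float("inf")
    sent = total_cost = 0
    while sent < required:
        dist = [inf] * (n + 2)
        prev_edge = [-1] * (n + 2)
        dist[source] = 0
        for _ in range(n + 1):       # at most (number of nodes - 1) passes
            changed = False
            for u in range(n + 2):
                if dist[u] == inf:
                    continue
                for k in adj[u]:
                    v, cap, cost = edges[k]
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        prev_edge[v] = k
                        changed = True
            if not changed:
                break
        if dist[sink] == inf:
            raise ValueError("infeasible: supplies cannot reach demands")

        # Find the bottleneck on the shortest path, then push flow along it.
        push = required - sent
        v = sink
        while v != source:
            k = prev_edge[v]
            push = min(push, edges[k][1])
            v = edges[k ^ 1][0]
        v = sink
        while v != source:
            k = prev_edge[v]
            edges[k][1] -= push
            edges[k ^ 1][1] += push
            v = edges[k ^ 1][0]
        sent += push
        total_cost += push * dist[sink]

    # The residual capacity of each reverse edge equals the flow on the arc.
    flows = [edges[2 * k + 1][1] for k in range(len(arcs))]
    return total_cost, flows
```

As a usage example, take four nodes with 4 units of supply at node 0 and 4 units of demand at node 3, and arcs `[(0, 1, 3, 1), (0, 2, 2, 4), (1, 2, 2, 1), (1, 3, 2, 3), (2, 3, 4, 1)]`. The call returns a total cost of 15 with arc flows `[3, 1, 2, 1, 3]`.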

Digital File Formats for Digital Preservation

Digital Curation, 2018, pp. 218-233
DOI: 10.4018/978-1-5225-6921-3.ch010
Author(s): Mayank Yuvaraj
Keyword(s): Digital Libraries, Digital Preservation, Institutional Repository, File Format, Institutional Repositories, File Formats, Digital File, The Common, Common Understanding

When planning an institutional repository, digital library collection or digital preservation service, drafting file format policies is unavoidable in order to ensure long-term digital preservation, accessibility and compatibility. Sincere efforts have been made to encourage the adoption of standard formats, yet digital preservation policies vary from library to library. Against this background, the present paper aims to give the digital preservation community a common understanding of the file formats commonly used in digital libraries and institutional repositories. The paper discusses both open and proprietary file formats for several media.
