Assessing and assuring interoperability of a genomics file format

2022 ◽  
Author(s):  
Yi Nian Niu ◽  
Eric G. Roberts ◽  
Danielle Denisko ◽  
Michael M. Hoffman

Background: Bioinformatics software tools operate largely through the use of specialized genomics file formats. Often these formats lack formal specification, and only rarely do the creators of these tools robustly test them for correct handling of input and output. This causes problems in interoperability between different tools that, at best, wastes time and frustrates users. At worst, interoperability issues could lead to undetected errors in scientific results. Methods: We sought (1) to assess the interoperability of a wide range of bioinformatics software using a shared genomics file format and (2) to provide a simple, reproducible method for enhancing interoperability. As a focus, we selected the popular BED file format for genomic interval data. Based on the file format's original documentation, we created a formal specification. We developed a new verification system, Acidbio (https://github.com/hoffmangroup/acidbio), which tests for correct behavior in bioinformatics software packages. We crafted tests to unify correct behavior when tools encounter various edge cases—potentially unexpected inputs that exemplify the limits of the format. To analyze the performance of existing software, we tested the input validation of 80 Bioconda packages that parsed the BED format. We also used a fuzzing approach to automatically perform additional testing. Results: Of 80 software packages examined, 75 achieved less than 70% correctness on our test suite. We categorized multiple root causes for the poor performance of different types of software. Fuzzing detected other errors that the manually designed test suite could not. We also created a badge system that developers can use to indicate more precisely which BED variants their software accepts and to advertise the software's performance on the test suite. 
Discussion: Acidbio makes it easy to assess interoperability of software using the BED format, and therefore to identify areas for improvement in individual software packages. Applying our approach to other file formats would increase the reliability of bioinformatics software and data.
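
To make the idea of edge-case testing concrete, here is a minimal sketch in Python of a check on the first three BED fields. It is not Acidbio's test suite and not the authors' formal specification: the function name `check_bed3_line` and the particular rules applied (tab-separated fields, non-negative integer coordinates, start not greater than end) are assumptions chosen for illustration only.

```python
def check_bed3_line(line: str) -> list:
    """Return a list of problems found in one BED line (empty if none).

    Hypothetical helper: the rules below are illustrative assumptions,
    not the formal specification described in the abstract.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        return [f"expected at least 3 tab-separated fields, got {len(fields)}"]

    chrom, start, end = fields[:3]
    problems = []

    if not chrom:
        problems.append("empty chrom field")

    def is_uint(text: str) -> bool:
        # Accept only plain ASCII digits: no sign, no decimal point, no spaces.
        return text.isascii() and text.isdigit()

    if not is_uint(start):
        problems.append(f"chromStart is not a non-negative integer: {start!r}")
    if not is_uint(end):
        problems.append(f"chromEnd is not a non-negative integer: {end!r}")

    # Compare coordinates only when both parsed cleanly.
    if is_uint(start) and is_uint(end) and int(start) > int(end):
        problems.append("chromStart is greater than chromEnd")

    return problems
```

A tool under test could be fed lines that this kind of check rejects (space-delimited fields, negative coordinates, reversed intervals) to see whether it reports an error or silently accepts them.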

2016 ◽  
Author(s):  
Jai Ram Rideout ◽  
John H Chase ◽  
Evan Bolyen ◽  
Gail Ackermann ◽  
Antonio Gonzalez ◽  
...  

Bioinformatics software often requires human-generated tabular text files as input and has specific requirements for how those data are formatted. Users frequently manage these data in spreadsheet programs, which is convenient for researchers who are compiling the requisite information because the spreadsheet programs can easily be used on different platforms including laptops and tablets, and because they provide a familiar interface. It is increasingly common for many different researchers to be involved in compiling these data, including study coordinators, clinicians, lab technicians, and bioinformaticians. As a result, many research groups are shifting toward using cloud-based spreadsheet programs, such as Google Sheets, which support concurrent editing of a single spreadsheet by different users working on different platforms. Often most of the researchers who are entering data will not be familiar with the formatting requirements of the bioinformatics programs that will be used, so validating and correcting file formats is often a bottleneck prior to beginning bioinformatics analysis. We present Keemei, a Google Sheets Add-on for validating tabular files used in bioinformatics analyses. Keemei is available free of charge from Google’s Chrome Web Store. Keemei can be installed and run on any web browser supported by Google Sheets. Keemei currently supports validation of two widely used tabular bioinformatics formats, the QIIME sample metadata mapping file format, and the Spatially Referenced Genetic Data (SRGD) format, but is designed to easily support the addition of others. Keemei will save researchers time and frustration by providing a convenient interface for tabular bioinformatics file format validation.
By allowing everyone involved with data entry for a project to easily validate their data, it will reduce the validation and formatting bottlenecks that are commonly encountered when human-generated data files are first used with a bioinformatics system. Simplifying the validation of essential tabular data files, such as sample metadata, will reduce common errors and thereby improve the quality and reliability of research outcomes.


2008 ◽  
Vol 3 (1) ◽  
pp. 89-106 ◽  
Author(s):  
David Pearson ◽  
Colin Webb

File format obsolescence is a major risk factor threatening the ongoing usefulness of digital information collections. While the preservation community has become increasingly interested in tools for assessing a wide range of risks, the National Library of Australia is developing mechanisms specifically focused on the risks of format obsolescence. The paper reports on the AONS II Project, undertaken in conjunction with the Australian Partnership for Sustainable Repositories (APSR). The project aimed to refine and develop a software tool that would automatically find and report indicators of obsolescence risks, to help repository managers decide if preservation action is needed. The paper discusses the current mismatch between this objective and the available sources of information on file formats, and emphasises the need to take account of both local and global factors in assessing risk. The paper calls for the preservation community to engage with the further development of thinking about file format obsolescence.


2019 ◽  
Author(s):  
Piyush Agrawal ◽  
Rajesh Kumar ◽  
Salman Sadullah Usmani ◽  
Anjali Dhall ◽  
Sumeet Patiyal ◽  
...  

Background: In the past, a number of web-based resources have been developed in the field of bioinformatics. These resources are heavily used by the scientific community to address challenges faced by experimental researchers, particularly in the biomedical sciences. Several challenges limit the full use of these services, including internet speed, limits on computing power, and data security. To enhance the utility of these web-based assets, we developed a Docker-based container that integrates a large number of resources available in the literature. Results: This paper describes GPSRdocker, a Docker-based container developed to provide a wide range of computational tools in bioinformatics, particularly in genomics, proteomics and systems biology. The majority of tools integrated in GPSRdocker are based on web services developed by Raghava’s group over the last two decades. Broadly, these tools fall into three categories: (i) general scripts, (ii) supporting software and (iii) major standalone software. To help students and developers working in bioinformatics, we developed general scripts in Perl and Python. These general-purpose scripts serve as building blocks for bioinformatics tools, for example computing features/descriptors of a protein. Supporting software packages include SCIKIT, WEKA, SVMlight, and PSI-BLAST; these packages allow one to develop and implement bioinformatics software. The major standalone software is the core of this container and allows prediction of the function or class of biomolecules. These tools can be classified broadly into the following categories: protein annotation, epitope-based vaccines, prediction of interaction and drug discovery. Conclusion: A Docker-based container has been developed that can easily be run on any operating system and can be ported directly to the cloud.
Scripts can be run to build pipelines that address system-level problems, such as predicting vaccine candidates for a pathogen. GPSRdocker, including its manual, is available free for academic use from https://webs.iiitd.edu.in/gpsrdocker.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Alfredo Sierra-Cristancho ◽  
Luis González-Osuna ◽  
Daniela Poblete ◽  
Emilio A. Cafferata ◽  
Paola Carvajal ◽  
...  

This study aimed to analyze the root anatomy and root canal system morphology of mandibular first premolars in a Chilean population. A total of 186 teeth were scanned using micro-computed tomography and reconstructed three-dimensionally. The root canal system morphology was classified using both Vertucci’s and Ahmed’s criteria. The radicular grooves were categorized using the ASUDAS system, and the presence of Tomes’ anomalous root was associated with Ahmed’s score. A single root canal was identified in 65.05% of teeth, being configuration type I according to Vertucci’s criteria and code 1MP1 according to Ahmed’s criteria. Radicular grooves were observed in 39.25% of teeth. The ASUDAS scores for radicular grooves were 60.75%, 13.98%, 12.36%, 10.22%, 2.15%, and 0.54%, from grade 0 to grade 5, respectively. The presence of Tomes’ anomalous root was identified only in teeth with multiple root canals, and it was more frequently associated with code 1MP1–2 of Ahmed’s criteria. The root canal system morphology of mandibular first premolars showed a wide range of anatomical variations in the Chilean population. Teeth with multiple root canals had a higher incidence of radicular grooves, which were closely related to more complex internal anatomy. Only teeth with multiple root canals presented Tomes’ anomalous root.


Healthcare ◽  
2021 ◽  
Vol 9 (6) ◽  
pp. 778
Author(s):  
Ann-Rong Yan ◽  
Indira Samarawickrema ◽  
Mark Naunton ◽  
Gregory M. Peterson ◽  
Desmond Yip ◽  
...  

Venous thromboembolism (VTE) is a significant cause of mortality in patients with lung cancer. Despite the availability of a wide range of anticoagulants to help prevent thrombosis, thromboprophylaxis in ambulatory patients is a challenge due to its associated risk of haemorrhage. As a result, anticoagulation is only recommended in patients with a relatively high risk of VTE. Efforts have been made to develop predictive models for VTE risk assessment in cancer patients, but the availability of a reliable predictive model for ambulatory patients with lung cancer is unclear. We have analysed the latest information on this topic, with a focus on the lung cancer-related risk factors for VTE, and risk prediction models developed and validated in this group of patients. The existing risk models, such as the Khorana score, the PROTECHT score and the CONKO score, have shown poor performance in external validations, failing to identify many high-risk individuals. Some of the newly developed and updated models may be promising, but their further validation is needed.


2014 ◽  
Vol 940 ◽  
pp. 433-436 ◽  
Author(s):  
Ying Zhang ◽  
Xin Shi

Based on a detailed analysis of the STL file format, VC++ 6.0 was used to extract the information stored in ASCII and binary STL files. OpenGL triangle drawing was then used to render the STL file graphically, with rendering functions such as material, coordinate transformation, and lighting, finally realizing the loading and three-dimensional display of both ASCII and binary STL files.
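
As a rough illustration of what extracting triangle data from the two STL variants involves, the sketch below uses Python rather than the VC++ 6.0 and OpenGL of the paper, and it covers loading only, not display. The function names are hypothetical, and the size check used to tell binary from ASCII is a common heuristic assumed here, not a method taken from the paper. The binary layout it relies on is the usual one: an 80-byte header, a little-endian 32-bit triangle count, then 50 bytes per triangle (a normal and three vertices as 32-bit floats, plus a 16-bit attribute field).

```python
import struct

def parse_binary_stl(data: bytes) -> list:
    """Read triangles (three (x, y, z) vertices each) from binary STL bytes."""
    if len(data) < 84:
        raise ValueError("binary STL needs an 80-byte header and a 4-byte count")
    (count,) = struct.unpack_from("<I", data, 80)
    if len(data) < 84 + 50 * count:
        raise ValueError("file is shorter than its triangle count implies")
    triangles = []
    for i in range(count):
        # 12 floats (normal + 3 vertices) followed by a 16-bit attribute field.
        values = struct.unpack_from("<12fH", data, 84 + 50 * i)
        # values[0:3] is the facet normal, which this sketch ignores.
        triangles.append((values[3:6], values[6:9], values[9:12]))
    return triangles

def parse_ascii_stl(text: str) -> list:
    """Read triangles from ASCII STL text by collecting 'vertex' lines."""
    vertices = []
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0] == "vertex":
            vertices.append(tuple(float(p) for p in parts[1:4]))
    if len(vertices) % 3 != 0:
        raise ValueError("vertex count is not a multiple of three")
    return [tuple(vertices[i:i + 3]) for i in range(0, len(vertices), 3)]

def load_stl(data: bytes) -> list:
    """Pick a parser: assume binary when the size matches the stored count."""
    if len(data) >= 84:
        (count,) = struct.unpack_from("<I", data, 80)
        if len(data) == 84 + 50 * count:
            return parse_binary_stl(data)
    return parse_ascii_stl(data.decode("ascii", errors="replace"))
```

The resulting list of triangles is the kind of data a renderer would then hand to its triangle-drawing routine.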


2020 ◽  
Author(s):  
Grigory Sharov ◽  
Dustin R. Morado ◽  
Marta Carroni ◽  
José Miguel de la Rosa-Trevín

Scipion is a modular image processing framework integrating several software packages under a unified interface while taking care of file formats and conversions. Here new developments and capabilities of the Scipion plugin for the Relion software are presented and illustrated with the image processing pipeline of published data. The user interfaces of Scipion and Relion are compared and the key differences highlighted, allowing this manuscript to be used as a guide for both new and experienced users of these software packages. Different streaming image processing options are also discussed, demonstrating the flexibility of the Scipion framework. Synopsis: An overview of the Scipion plugin for the Relion software is presented and various capabilities of image processing within the Scipion framework are discussed.


2013 ◽  
Vol 23 (1) ◽  
pp. 3-17 ◽  
Author(s):  
Angelo Sifaleras

We present a wide range of problems concerning minimum cost network flows, and give an overview of the classic linear single-commodity Minimum Cost Network Flow Problem (MCNFP) and some other closely related problems, either tractable or intractable. We also discuss state-of-the-art algorithmic approaches and recent advances in the solution methods for the MCNFP. Finally, optimization software packages for the MCNFP are presented.


2018 ◽  
pp. 218-233
Author(s):  
Mayank Yuvaraj

When planning an institutional repository, digital library collection or digital preservation service, file format policies must be drafted to ensure long-term digital preservation, accessibility and compatibility. Sincere efforts have been made to encourage the adoption of standard formats, yet digital preservation policies vary from library to library. Against this background, the present paper aims to give the digital preservation community a common understanding of the file formats commonly used in digital libraries and institutional repositories. The paper discusses both open and proprietary file formats for several media.

