programmatic access Latest Research Papers

insectDisease: programmatic access to the Ecological Database of the World's Insect Pathogens

10.32942/osf.io/yd3x5 ◽

2021 ◽

Author(s):

Tad Dallas ◽

Colin J. Carlson ◽

Patrick Stephens ◽

Sadie Jane Ryan ◽

David Onstad

Keyword(s):

Species Interactions ◽

Original Work ◽

Biotic Interactions ◽

R Package ◽

Insect Host ◽

Time Data ◽

Current Effort ◽

Host Pathogen ◽

Insect Pathogens ◽

Programmatic Access

Curated databases of species interactions are instrumental to exploring and understanding the spatial distribution of species and their biotic interactions. In the process of conducting such projects, data development and curation efforts may give rise to a data product with utility beyond the scope of the original work, but which becomes inaccessible over time. Data describing insect host-pathogen interactions are fairly rare, and should thus be preserved and curated with appropriate metadata. Here, we introduce the insectDisease R package, a mechanism for curating, updating, and distributing data from the Ecological Database of the World's Insect Pathogens, a database of insect host-pathogen associations, including attempted inoculations and infection outcomes for insect hosts and pathogens (bacteria, fungi, nematodes, protozoans, and viruses). This dataset has been utilized for several projects since its inception, but without a well defined, curated and permanent repository, its existence and access have been limited to word-of-mouth connections. The current effort presented here aims to provide a means to preserve, augment, and disseminate the database in a documented and versioned format. This project is an example of the type of effort that will be necessary to maintain valuable databases after the original funding disappears.

ppx: Programmatic Access to Proteomics Data Repositories

Journal of Proteome Research ◽

10.1021/acs.jproteome.1c00454 ◽

2021 ◽

Author(s):

William E. Fondrie ◽

Wout Bittremieux ◽

William S. Noble

Keyword(s):

Proteomics Data ◽

Data Repositories ◽

Programmatic Access

ppx: Programmatic access to proteomics data repositories

10.1101/2021.05.29.446304 ◽

2021 ◽

Author(s):

William E Fondrie ◽

Wout Bittremieux ◽

William S Noble

Keyword(s):

Mass Spectrometry ◽

Open Science ◽

Mass Spectrometry Data ◽

Reproducible Research ◽

Easy Access ◽

Proteomics Data ◽

Data Repositories ◽

Access To Data ◽

Python Package ◽

Programmatic Access

The volume of proteomics and mass spectrometry data available in public repositories continues to grow at a rapid pace as more researchers embrace open science practices. Open access to the data behind scientific discoveries has become critical to validate published findings and develop new computational tools. Here, we present ppx, a Python package that provides easy, programmatic access to the data stored in ProteomeXchange repositories, such as PRIDE and MassIVE. The ppx package can either be used as a command line tool or a Python package to retrieve the files and metadata associated with a project when provided its identifier. To demonstrate how ppx enhances reproducible research, we used ppx within a Snakemake workflow to reanalyze a published dataset with the open modification search tool ANN-SoLo and compared our reanalysis to the original results. We show that ppx readily integrates into workflows and our reanalysis produced results consistent with the original analysis. We envision that ppx will be a valuable tool for creating reproducible analyses, providing tool developers easy access to data for development, testing, and benchmarking, and enabling the use of mass spectrometry data in data-intensive analyses. The ppx package is freely available and open source under the MIT license at: https://github.com/wfondrie/ppx
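The abstract above says ppx retrieves a project's files and metadata "when provided its identifier". As a minimal sketch of that first step only, the snippet below guesses which repository an identifier belongs to from its prefix. The function name `guess_repository` and the identifier patterns (PXD plus six digits for PRIDE-hosted ProteomeXchange accessions, MSV plus nine digits for MassIVE) are illustrative assumptions, not the ppx API or its implementation.

```python
import re

# Assumed identifier shapes (illustrative, not taken from the ppx source):
#   "PXD" + 6 digits -> ProteomeXchange accession, treated here as PRIDE
#   "MSV" + 9 digits -> MassIVE accession
_PATTERNS = {
    "PRIDE": re.compile(r"^PXD\d{6}$"),
    "MassIVE": re.compile(r"^MSV\d{9}$"),
}


def guess_repository(identifier: str) -> str:
    """Return the repository name whose pattern matches the identifier."""
    cleaned = identifier.strip().upper()
    for repository, pattern in _PATTERNS.items():
        if pattern.match(cleaned):
            return repository
    raise ValueError(f"Unrecognized project identifier: {identifier!r}")


if __name__ == "__main__":
    for example in ("PXD000001", "MSV000079843"):
        print(example, "->", guess_repository(example))
```

A real client would follow this routing step with repository-specific requests for the file listing and metadata; the ppx documentation describes how the package itself does that.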

quincunx: an R package to query, download and wrangle PGS Catalog data

10.1101/2021.02.19.431997 ◽

2021 ◽

Author(s):

Ramiro Magno ◽

Isabel Duarte ◽

Ana-Teresa Maia

Keyword(s):

R Package ◽

Polygenic Score ◽

Link Type ◽

Polygenic Scores ◽

Rest Api ◽

Programmatic Access

Abstract. Motivation: The Polygenic Score (PGS) Catalog is a recently established open database of published polygenic scores that, to date, has collected, curated, and made available 721 polygenic scores from over 133 publications. The PGS Catalog REST API is the only method allowing programmatic access to this resource. Results: Here, we describe quincunx, an R package that provides the first client interface to the PGS Catalog REST API. quincunx enables users to query and quickly retrieve, filter and integrate metadata associated with polygenic scores, as well as polygenic scoring files in tidy table format. Availability: quincunx is freely available under an MIT License, and can be accessed from https://github.com/maialab/quincunx.
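To illustrate what a client for a REST API such as the PGS Catalog's has to do before it sends any request, here is a minimal sketch that validates a score identifier and builds the resource URL. The base URL, the `score/<id>` path, and the six-digit `PGS` identifier format are assumptions made for illustration; `score_url` is a hypothetical helper and is not part of quincunx, which is an R package.

```python
import re

# Assumed base URL and path layout of the REST API (illustrative only).
BASE_URL = "https://www.pgscatalog.org/rest"

# Assumed identifier format: "PGS" followed by six digits.
_PGS_ID = re.compile(r"PGS\d{6}")


def score_url(pgs_id: str) -> str:
    """Build the URL for one polygenic score after validating its identifier."""
    if not _PGS_ID.fullmatch(pgs_id):
        raise ValueError(f"Not a valid PGS identifier: {pgs_id!r}")
    return f"{BASE_URL}/score/{pgs_id}"


if __name__ == "__main__":
    print(score_url("PGS000001"))
```

Validating the identifier locally keeps malformed requests from reaching the server. A full client would then fetch the URL and reshape the JSON response into tables.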

HTSlib: C library for reading/writing high-throughput sequencing data

GigaScience ◽

10.1093/gigascience/giab007 ◽

2021 ◽

Vol 10 (2) ◽

Cited By ~ 2

Author(s):

James K Bonfield ◽

John Marshall ◽

Petr Danecek ◽

Heng Li ◽

Valeriu Ohan ◽

...

Keyword(s):

High Throughput Sequencing ◽

International Standards ◽

Software Library ◽

Sequencing Data ◽

Global Alliance ◽

Access Protocols ◽

High Throughput Sequencing Data ◽

File Formats ◽

Data Files ◽

Programmatic Access

Abstract. Background: Since the original publication of the VCF and SAM formats, an explosion of software tools has been created to process these data files. To facilitate this, a library was produced out of the original SAMtools implementation, with a focus on performance and robustness. The file formats themselves have become international standards under the jurisdiction of the Global Alliance for Genomics and Health. Findings: We present a software library for providing programmatic access to sequencing alignment and variant formats. It was born out of the widely used SAMtools and BCFtools applications. Considerable improvements have been made to the original code, plus many new features including newer access protocols, the addition of the CRAM file format, better indexing and iterators, and better use of threading. Conclusion: Since the original Samtools release, performance has been considerably improved, with a BAM read-write loop running 5 times faster and BAM to SAM conversion 13 times faster (both using 16 threads, compared to Samtools 0.1.19). Widespread adoption has seen HTSlib downloaded >1 million times from GitHub and conda. The C library has been used directly by an estimated 900 GitHub projects and has been incorporated into Perl, Python, Rust, and R, significantly expanding the number of uses via other languages. HTSlib is open source and is freely available from htslib.org under MIT/BSD license.
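As a rough illustration of what "programmatic access to sequencing alignment formats" involves at the lowest level, the sketch below parses one text-format SAM alignment line into its eleven mandatory fields plus optional tags. It is a plain-Python teaching example; `SamRecord`, `parse_sam_line`, and `is_unmapped` are hypothetical names, and none of this reflects the HTSlib C API or its performance-oriented design.

```python
from typing import NamedTuple, Tuple


class SamRecord(NamedTuple):
    """The eleven mandatory SAM fields, followed by any optional tags."""
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: Tuple[str, ...]


def parse_sam_line(line: str) -> SamRecord:
    """Split a tab-separated SAM alignment line and convert numeric fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError(f"Expected at least 11 fields, got {len(fields)}")
    return SamRecord(
        qname=fields[0],
        flag=int(fields[1]),
        rname=fields[2],
        pos=int(fields[3]),
        mapq=int(fields[4]),
        cigar=fields[5],
        rnext=fields[6],
        pnext=int(fields[7]),
        tlen=int(fields[8]),
        seq=fields[9],
        qual=fields[10],
        tags=tuple(fields[11:]),
    )


def is_unmapped(record: SamRecord) -> bool:
    """Bit 0x4 of the FLAG field marks an unmapped segment."""
    return bool(record.flag & 0x4)


if __name__ == "__main__":
    example = "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0"
    record = parse_sam_line(example)
    print(record.qname, record.rname, record.pos, is_unmapped(record))
```

Header lines (those starting with `@`) and the binary BAM and CRAM encodings are out of scope for this sketch; handling them efficiently is the kind of work a dedicated library takes on.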

rfaRm: An R client-side interface to facilitate the analysis of the Rfam database of RNA families

PLoS ONE ◽

10.1371/journal.pone.0245280 ◽

2021 ◽

Vol 16 (1) ◽

pp. e0245280

Author(s):

Lara Sellés Vidal ◽

Rafael Ayala ◽

Guy-Bart Stan ◽

Rodrigo Ledesma-Amaro

Keyword(s):

R Package ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments ◽

Rfam Database ◽

Non Coding Rna ◽

Covariance Models ◽

Available Information ◽

Client Side ◽

Programmatic Access

rfaRm is an R package providing a client-side interface for the Rfam database of non-coding RNA and other structured RNA elements. The package facilitates the search of the Rfam database by keywords or sequences, as well as the retrieval of all available information about specific Rfam families, such as member sequences, multiple sequence alignments, secondary structures and covariance models. By providing such programmatic access to the Rfam database, rfaRm enables genomic workflows to incorporate information about non-coding RNA, whose potential cannot be fully exploited just through interactive access to the database. The features of rfaRm are demonstrated by using it to analyze the SARS-CoV-2 genome as an example case.

HTSlib - C library for reading/writing high-throughput sequencing data

10.1101/2020.12.16.423064 ◽

2020 ◽

Author(s):

James K. Bonfield ◽

John Marshall ◽

Petr Danecek ◽

Heng Li ◽

Valeriu Ohan ◽

...

Keyword(s):

High Throughput Sequencing ◽

International Standards ◽

Software Library ◽

Sequencing Data ◽

Global Alliance ◽

Access Protocols ◽

High Throughput Sequencing Data ◽

File Formats ◽

Data Files ◽

Programmatic Access

Abstract. Background: Since the original publication of the VCF and SAM formats, an explosion of software tools has been created to process these data files. To facilitate this, a library was produced out of the original SAMtools implementation, with a focus on performance and robustness. The file formats themselves have become international standards under the jurisdiction of the Global Alliance for Genomics and Health. Findings: We present a software library for providing programmatic access to sequencing alignment and variant formats. It was born out of the widely used SAMtools and BCFtools applications. Considerable improvements have been made to the original code, plus many new features including newer access protocols, the addition of the CRAM file format, better indexing and iterators, and better use of threading. Conclusion: Since the original Samtools release, performance has been considerably improved, with a BAM read-write loop running 5 times faster and BAM to SAM conversion 13 times faster (both using 16 threads, compared to Samtools 0.1.19). Widespread adoption has seen HTSlib downloaded over a million times from GitHub and conda. The C library has been used directly by an estimated 900 GitHub projects and has been incorporated into Perl, Python, Rust and R, significantly expanding the number of uses via other languages. HTSlib is open source and is freely available from htslib.org under MIT / BSD license.

From ArrayExpress to BioStudies

Nucleic Acids Research ◽

10.1093/nar/gkaa1062 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D1502-D1506

Author(s):

Ugis Sarkans ◽

Anja Füllgrabe ◽

Ahmed Ali ◽

Awais Athar ◽

Ehsan Behrangi ◽

...

Keyword(s):

Functional Genomics ◽

Microarray Data ◽

Archival Data ◽

Central Concept ◽

Multimodal Data ◽

Online Tool ◽

Technical Aspects ◽

Data Infrastructure ◽

European Nucleotide Archive ◽

Programmatic Access

Abstract. ArrayExpress (https://www.ebi.ac.uk/arrayexpress) is an archive of functional genomics data at EMBL-EBI. It was established in 2002, initially as an archive for publication-related microarray data, and was later extended to accept sequencing-based data. Over the last decade, an increasing share of biological experiments has involved multiple technologies assaying different biological modalities, such as epigenetics, and RNA and protein expression, and thus the BioStudies database (https://www.ebi.ac.uk/biostudies) was established to deal with such multimodal data. Its central concept is a study, which typically is associated with a publication. BioStudies stores metadata describing the study, provides links to the relevant databases, such as the European Nucleotide Archive (ENA), and hosts the types of data for which specialized databases do not exist. With BioStudies now fully functional, we are able to further harmonize the archival data infrastructure at EMBL-EBI, and ArrayExpress is being migrated to BioStudies. In the future, all functional genomics data will be archived at BioStudies. The process will be seamless for the users, who will continue to submit data using the online tool Annotare and will be able to query and download data largely in the same manner as before. Nevertheless, some technical aspects, particularly programmatic access, will change. This update guides the users through these changes.

Europe PMC: Programmatic access

10.6019/tol.epmc-api-w.2020.00001.1 ◽

2020 ◽

Keyword(s):

Programmatic Access

ThermoMutDB: a thermodynamic database for missense mutations

Nucleic Acids Research ◽

10.1093/nar/gkaa925 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D475-D479

Author(s):

Joicymara S Xavier ◽

Thanh-Binh Nguyen ◽

Malancha Karmarkar ◽

Stephanie Portelli ◽

Pâmela M Rezende ◽

...

Keyword(s):

Structural Information ◽

Amino Acid Sequences ◽

Missense Mutations ◽

Mutant Proteins ◽

Manual Curation ◽

Data Points ◽

Dynamic Structures ◽

Further Development ◽

Programmatic Access ◽

Annotation Errors

Abstract. Proteins are intricate, dynamic structures, and small changes in their amino acid sequences can lead to large effects on their folding, stability and dynamics. To facilitate the further development and evaluation of methods to predict these changes, we have developed ThermoMutDB, a manually curated database containing >14,669 experimental data points of thermodynamic parameters for wild type and mutant proteins. This represents an increase of 83% in unique mutations over previous databases and includes thermodynamic information on 204 new proteins. During manual curation we have also corrected annotation errors in previously curated entries. Associated with each entry, we have included information on the unfolding Gibbs free energy and melting temperature change, and have associated entries with available experimental structural information. ThermoMutDB lets users contribute new data points and supports programmatic access to the database via a RESTful API. ThermoMutDB is freely available at: http://biosig.unimelb.edu.au/thermomutdb.
