Europe PMC: Programmatic access

Abstract The Omics Discovery Index is an open source platform that can be used to access, discover and disseminate omics datasets. OmicsDI integrates proteomics, genomics, metabolomics, models and transcriptomics datasets. Using an efficient indexing system, OmicsDI integrates different biological entities including genes, transcripts, proteins, metabolites and the corresponding publications from PubMed. In addition, it implements a group of pipelines to estimate the impact of each dataset by tracing the number of citations, reanalysis and biological entities reported by each dataset. Here, we present the OmicsDI REST interface to enable programmatic access to any dataset in OmicsDI or all the datasets for a specific provider (database). Clients can perform queries on the API using different metadata information such as sample details (species, tissues, etc.), instrumentation (mass spectrometer, sequencer), keywords and other provided annotations. In addition, we present two different libraries in R and Python to facilitate the development of tools that can programmatically interact with the OmicsDI REST interface.
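
To make the idea of querying by metadata more concrete, the following is a minimal sketch of how a client might compose a search request from keywords plus sample and instrument filters. It is not the OmicsDI client library described in the abstract: the base URL, the `query` and `size` parameter names, and the `species`, `tissue` and `instrument` field names are all assumptions chosen for illustration, and no request is actually sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: endpoint and parameter names are illustrative, not taken from the abstract.
BASE_URL = "https://www.omicsdi.org/ws/dataset/search"


def build_search_url(keywords, species=None, tissue=None, instrument=None, size=20):
    """Compose a dataset search URL from free-text keywords and optional metadata filters."""
    terms = [keywords]
    # Hypothetical field names for the metadata filters.
    for field, value in (("species", species), ("tissue", tissue), ("instrument", instrument)):
        if value:
            terms.append(f'{field}:"{value}"')
    query = " AND ".join(terms)
    # urlencode percent-encodes quotes, colons and spaces in the query string.
    return f"{BASE_URL}?{urlencode({'query': query, 'size': size})}"


def parse_search_url(url):
    """Recover the query parameters, useful for checking what would be sent."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


url = build_search_url("cancer", species="Homo sapiens", size=5)
print(url)
print(parse_search_url(url))
```

Keeping URL construction separate from the HTTP call makes the filter logic easy to check without network access; a real client would pass the resulting URL to an HTTP library and decode the JSON response.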

R, Python, and Ruby clients for GBIF species occurrence data

10.7287/peerj.preprints.3304v1 ◽ 2017 ◽ Cited By ~ 5

Author(s): Scott A Chamberlain ◽ Carl Boettiger

Keyword(s): Programming Languages ◽ Significant Contribution ◽ Species Occurrence ◽ Global Biodiversity Information Facility ◽ Occurrence Data ◽ Research Questions ◽ Global Biodiversity ◽ Number Of Individuals ◽ Biodiversity Information ◽ Programmatic Access

Background. The number of individuals of each species in a given location forms the basis for many sub-fields of ecology and evolution. Data on individuals, including which species they belong to and where they are found, can be used to address a large number of research questions. The Global Biodiversity Information Facility (hereafter, GBIF) is the largest provider of such data. Programmatic clients for GBIF would make research dealing with GBIF data much easier and more reproducible. Methods. We have developed clients to access GBIF data for each of the R, Python, and Ruby programming languages: rgbif, pygbif, gbifrb. Results. For all clients we describe their design and utility, and demonstrate some use cases. Discussion. Programmatic access to GBIF will facilitate more open and reproducible science; the three GBIF clients described herein are a significant contribution towards this goal.
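
As an illustration of the kind of analysis such clients enable, the sketch below tallies the number of individuals per species from a list of occurrence records. It does not call any of the three clients. The record layout (dicts with `species` and `individualCount` keys) is an assumption loosely modelled on common occurrence-record fields, and the sample records are invented purely for the example.

```python
from collections import Counter


def individuals_per_species(records):
    """Sum reported individuals per species.

    Assumption: each record is a dict with a `species` name and an optional
    `individualCount`; a record without a count is treated as one individual.
    Records with no species name are skipped.
    """
    totals = Counter()
    for record in records:
        species = record.get("species")
        if species is None:
            continue
        count = record.get("individualCount")
        totals[species] += count if count is not None else 1
    return dict(totals)


# Invented sample records, for illustration only.
sample_records = [
    {"species": "Puma concolor", "individualCount": 2},
    {"species": "Puma concolor"},
    {"species": "Ursus americanus", "individualCount": 1},
    {"individualCount": 4},
]

print(individuals_per_species(sample_records))
```

In a reproducible workflow, the `records` argument would be filled from a scripted download rather than a manual export, so the whole path from query to tally can be rerun.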

quincunx: an R package to query, download and wrangle PGS Catalog data

10.1101/2021.02.19.431997 ◽ 2021

Author(s): Ramiro Magno ◽ Isabel Duarte ◽ Ana-Teresa Maia

Keyword(s): R Package ◽ Polygenic Score ◽ Link Type ◽ Polygenic Scores ◽ Rest Api ◽ Programmatic Access

Abstract Motivation: The Polygenic Score (PGS) Catalog is a recently established open database of published polygenic scores that, to date, has collected, curated, and made available 721 polygenic scores from over 133 publications. The PGS Catalog REST API is the only method allowing programmatic access to this resource. Results: Here, we describe quincunx, an R package that provides the first client interface to the PGS Catalog REST API. quincunx enables users to query and quickly retrieve, filter and integrate metadata associated with polygenic scores, as well as polygenic scoring files in tidy table format. Availability: quincunx is freely available under an MIT License, and can be accessed from https://github.com/maialab/quincunx.
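
A client for a REST API of this kind typically has to walk through paginated listings before it can assemble a table. The sketch below shows one way to do that in Python; it is not how quincunx (an R package) is implemented. The response envelope with `results` and `next` keys and the example URL in the comment are assumptions for illustration, and the fetch function is injected so the pagination logic can be exercised without network access.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """Download one page and decode it as JSON (requires network access)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def iter_results(first_url, fetch=fetch_json):
    """Yield every item from a paginated listing by following `next` links.

    Assumption: each page is a JSON object with a `results` list and a `next`
    URL that is null/None on the last page.
    """
    url = first_url
    while url:
        page = fetch(url)
        yield from page.get("results", [])
        url = page.get("next")


# Offline stand-in for the remote service, keyed by made-up page URLs.
fake_pages = {
    "page-1": {"results": [{"id": "A"}, {"id": "B"}], "next": "page-2"},
    "page-2": {"results": [{"id": "C"}], "next": None},
}

ids = [item["id"] for item in iter_results("page-1", fetch=fake_pages.__getitem__)]
print(ids)

# Against a live service one might call, for example (URL is an assumption):
# scores = list(iter_results("https://www.pgscatalog.org/rest/score/all"))
```

Because `iter_results` is a generator, a caller can stop early or stream items into a table without holding every page in memory.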

DisProt: intrinsic protein disorder annotation in 2020

Nucleic Acids Research ◽ 10.1093/nar/gkz975 ◽ 2019 ◽ Cited By ~ 16

Author(s): András Hatos ◽ Borbála Hajdu-Soltész ◽ Alexander M Monzon ◽ Nicolas Palopoli ◽ Lucía Álvarez ◽ ...

Keyword(s): Text Mining ◽ Search Engine ◽ Intrinsically Disordered Proteins ◽ Disordered Proteins ◽ Protein Disorder ◽ Graphical Interface ◽ Intrinsically Disordered ◽ Intrinsic Protein Disorder ◽ Recent Developments ◽ Programmatic Access

Abstract The Database of Protein Disorder (DisProt, URL: https://disprot.org) provides manually curated annotations of intrinsically disordered proteins from the literature. Here we report recent developments with DisProt (version 8), including the doubling of protein entries, a new disorder ontology, improvements of the annotation format and a completely new website. The website includes a redesigned graphical interface, a better search engine, a clearer API for programmatic access and a new annotation interface that integrates text mining technologies. The new entry format provides greater flexibility, simplifies maintenance and allows the capture of more information from the literature. The new disorder ontology has been formalized and made interoperable by adopting the OWL format, and its structure and term definitions have been improved. The new annotation interface has made the curation process faster and more effective. We recently showed that new DisProt annotations can be effectively used to train and validate disorder predictors. We believe the growth of DisProt will accelerate, contributing to the improvement of function and disorder predictors and therefore helping to illuminate the ‘dark’ proteome.

From ArrayExpress to BioStudies

Nucleic Acids Research ◽ 10.1093/nar/gkaa1062 ◽ 2020 ◽ Vol 49 (D1) ◽ pp. D1502-D1506

Author(s): Ugis Sarkans ◽ Anja Füllgrabe ◽ Ahmed Ali ◽ Awais Athar ◽ Ehsan Behrangi ◽ ...

Keyword(s): Functional Genomics ◽ Microarray Data ◽ Archival Data ◽ Central Concept ◽ Multimodal Data ◽ Online Tool ◽ Technical Aspects ◽ Data Infrastructure ◽ European Nucleotide Archive ◽ Programmatic Access

Abstract ArrayExpress (https://www.ebi.ac.uk/arrayexpress) is an archive of functional genomics data at EMBL-EBI. It was established in 2002, initially as an archive for publication-related microarray data, and was later extended to accept sequencing-based data. Over the last decade an increasing share of biological experiments have involved multiple technologies assaying different biological modalities, such as epigenetics and RNA and protein expression, and thus the BioStudies database (https://www.ebi.ac.uk/biostudies) was established to deal with such multimodal data. Its central concept is a study, which is typically associated with a publication. BioStudies stores metadata describing the study, provides links to the relevant databases, such as the European Nucleotide Archive (ENA), and hosts the types of data for which specialized databases do not exist. With BioStudies now fully functional, we are able to further harmonize the archival data infrastructure at EMBL-EBI, and ArrayExpress is being migrated to BioStudies. In future, all functional genomics data will be archived at BioStudies. The process will be seamless for the users, who will continue to submit data using the online tool Annotare and will be able to query and download data largely in the same manner as before. Nevertheless, some technical aspects, particularly programmatic access, will change. This update guides the users through these changes.

ThermoMutDB: a thermodynamic database for missense mutations

Nucleic Acids Research ◽ 10.1093/nar/gkaa925 ◽ 2020 ◽ Vol 49 (D1) ◽ pp. D475-D479

Author(s): Joicymara S Xavier ◽ Thanh-Binh Nguyen ◽ Malancha Karmarkar ◽ Stephanie Portelli ◽ Pâmela M Rezende ◽ ...

Keyword(s): Structural Information ◽ Amino Acid Sequences ◽ Missense Mutations ◽ Mutant Proteins ◽ Manual Curation ◽ Data Points ◽ Dynamic Structures ◽ Further Development ◽ Programmatic Access ◽ Annotation Errors

Abstract Proteins are intricate, dynamic structures, and small changes in their amino acid sequences can lead to large effects on their folding, stability and dynamics. To facilitate the further development and evaluation of methods to predict these changes, we have developed ThermoMutDB, a manually curated database containing >14,669 experimental data points on thermodynamic parameters for wild type and mutant proteins. This represents an increase of 83% in unique mutations over previous databases and includes thermodynamic information on 204 new proteins. During manual curation we have also corrected annotation errors in previously curated entries. For each entry, we have included information on the unfolding Gibbs free energy and melting temperature change, and have linked entries to available experimental structural information. ThermoMutDB allows users to contribute new data points and supports programmatic access to the database via a RESTful API. ThermoMutDB is freely available at: http://biosig.unimelb.edu.au/thermomutdb.

Clinical Trials in Precision Oncology

Clinical Chemistry ◽ 10.1373/clinchem.2015.247437 ◽ 2016 ◽ Vol 62 (3) ◽ pp. 442-448 ◽ Cited By ~ 5

Author(s): Susan M Mockus ◽ Sara E Patterson ◽ Cara Statz ◽ Carol J Bult ◽ Gregory J Tsongalis

Keyword(s): Clinical Trial ◽ Clinical Trials ◽ Egfr Mutation ◽ Bioinformatics Analysis ◽ Precision Oncology ◽ Reporting Standards ◽ Clinical Trial Registries ◽ Genomics Research ◽ Trial Outcomes ◽ Programmatic Access

Abstract BACKGROUND Availability of genomic information used in the management of cancer treatment has outpaced both regulatory and reimbursement efforts. Many types of clinical trials are underway to validate the utility of emerging genome-based biomarkers for diagnostic, prognostic, and predictive applications. Clinical trials are a key source of evidence required for US Food and Drug Administration approval of therapies and companion diagnostics and for establishing the acceptance criteria for reimbursement. CONTENT Determining the eligibility of patients for molecular-based clinical trials and the interpretation of data emerging from clinical trials are significantly hampered by 2 primary factors: the lack of specific reporting standards for biomarkers in clinical trials and the lack of adherence to official gene and variant naming standards. Clinical trial registries need specifics on the mutation required for enrollment as opposed to allowing a generic mutation entry such as “EGFR mutation.” The use of clinical trials data in bioinformatics analysis and reporting is also gated by the lack of robust, state-of-the-art programmatic access support. An initiative is needed to develop community standards for clinical trial descriptions and outcome reporting that are modeled after similar efforts in the genomics research community. SUMMARY Systematic implementation of reporting standards is needed to ensure consistency and specificity of biomarker data, which will in turn enable better comparison and assessment of clinical trial outcomes across multiple studies. Reporting standards will facilitate improved identification of relevant clinical trials, aggregation and comparison of information across independent trials, and programmatic access to clinical trials databases.
