Towards an Ecological Trait-data Standard Vocabulary

Author(s):  
Florian Schneider ◽  
David Fichtmüller ◽  
Martin Gossner ◽  
Anton Güntsch ◽  
Malte Jochum ◽  
...  

Trait-based research spans from evolutionary studies of individual-level properties to global patterns of biodiversity and ecosystem functioning. An increasing amount of trait data is available for many different organism groups, published as open access data on a variety of file hosting services. However, standardization between datasets is generally lacking due to heterogeneous data formats and types. The compilation of these published data into centralised databases remains a difficult and time-consuming task. We reviewed existing trait databases and online services, as well as initiatives for trait data standardization. Together with data providers and users participating in a large long-term observation project on multiple taxa and research questions (the Biodiversity Exploratories, www.biodiversity-exploratories.de), we identified a need for a minimal trait-data terminology that is flexible enough to include traits from all types of organisms but simple enough to be adopted by different research communities. In order to facilitate reproducibility of analyses, the reuse of data and the combination of datasets from multiple sources, we propose a standardized vocabulary for trait data, the Ecological Trait-data Standard Vocabulary (ETS, hosted on GFBio Terminology Service, https://terminologies.gfbio.org/terms/ets/pages), which builds upon and is compatible with existing ontologies. By relying on unambiguous identifiers, the proposed minimal vocabulary for trait data captures the different degrees of resolution and measurement detail for multiple use cases of trait-based research. It further encourages the use of global Uniform Resource Identifiers (URIs) for taxa and trait definitions, methods and units, thereby readying the data publication for the semantic web. An accompanying R package (traitdataform) facilitates the upload of data to hosting services and also simplifies access to published trait data.
Although they originate from a current need in ecological research, the described products are now being developed to fit seamlessly with broader initiatives on biodiversity data standardisation, fostering a better linkage between ecological trait data and global e-infrastructures for biological data. The ETS is maintained, and discussions on terms are managed, via GitHub (https://github.com/EcologicalTraitData/ETS).
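As a rough illustration of what a minimal, long-format trait table could look like, the sketch below reshapes a taxon-by-trait table into one record per measurement. The column names (scientificName, traitName, traitValue, traitUnit) are assumptions modelled on the kind of terms the abstract describes, and the taxa, traits and values are invented placeholders; the ETS term list remains the authoritative reference for the actual definitions.

```python
# Hypothetical wide table: one row per taxon, one column per trait.
# All values are invented placeholders for illustration only.
wide_rows = [
    {"species": "Taxon A", "body_length": 12.5, "body_mass": 0.8},
    {"species": "Taxon B", "body_length": 7.1, "body_mass": 0.3},
]

# Assumed unit for each trait column (not taken from the source).
units = {"body_length": "mm", "body_mass": "g"}


def to_long_format(rows, units, taxon_column="species"):
    """Return one record per (taxon, trait) measurement."""
    records = []
    for row in rows:
        for trait, unit in units.items():
            if trait not in row or row[trait] is None:
                continue  # skip traits that were not measured
            records.append({
                "scientificName": row[taxon_column],
                "traitName": trait,
                "traitValue": row[trait],
                "traitUnit": unit,
            })
    return records


long_rows = to_long_format(wide_rows, units)
for record in long_rows:
    print(record)
```

Keeping one measurement per record is what lets datasets with different trait sets be stacked without reconciling column layouts first.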

Author(s):  
Fabian Schmich ◽  
Jack Kuipers ◽  
Gunter Merdes ◽  
Niko Beerenwinkel

Abstract In the post-genomic era of big data in biology, computational approaches to integrate multiple heterogeneous data sets become increasingly important. Despite the availability of large amounts of omics data, the prioritisation of genes relevant for a specific functional pathway based on genetic screening experiments remains a challenging task. Here, we introduce netprioR, a probabilistic generative model for semi-supervised integrative prioritisation of hit genes. The model integrates multiple network data sets representing gene–gene similarities and prior knowledge about gene functions from the literature with gene-based covariates, such as phenotypes measured in genetic perturbation screens, for example, by RNA interference or CRISPR/Cas9. We evaluate netprioR on simulated data and show that the model outperforms current state-of-the-art methods in many scenarios and is on par otherwise. In an application to real biological data, we integrate 22 network data sets, 1784 prior knowledge class labels and 3840 RNA interference phenotypes in order to prioritise novel regulators of Notch signalling in Drosophila melanogaster. The biological relevance of our predictions is evaluated using in silico and in vivo experiments. An efficient implementation of netprioR is available as an R package at http://bioconductor.org/packages/netprioR.


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Yixin Kong ◽  
Ariangela Kozik ◽  
Cindy H. Nakatsu ◽  
Yava L. Jones-Hall ◽  
Hyonho Chun

Abstract A latent factor model for count data is widely used to deconvolute mixed signals in biological data, as exemplified by sequencing data for transcriptome or microbiome studies. Due to the availability of pure samples such as single-cell transcriptome data, the accuracy of the estimates could be much improved. However, the advantage quickly disappears in the presence of excessive zeros. To correctly account for this phenomenon in both mixed and pure samples, we propose a zero-inflated non-negative matrix factorization and derive an effective multiplicative parameter updating rule. In simulation studies, our method yielded the smallest bias. We applied our approach to brain gene expression as well as fecal microbiome datasets, illustrating the superior performance of the approach. Our method is implemented as a publicly available R package, iNMF.
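For readers unfamiliar with multiplicative updating rules, the sketch below shows the classic Lee-Seung updates for plain non-negative matrix factorization under a squared-error objective. It is not the zero-inflated rule derived in the paper and does not reproduce the iNMF package; the function names, the toy matrix and the default settings are all assumptions chosen for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Plain NMF via Lee-Seung multiplicative updates (no zero inflation)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Toy non-negative count matrix (invented values, including an all-zero column).
V = [[5, 3, 0, 1], [4, 0, 0, 1], [1, 1, 0, 5], [1, 0, 0, 4]]
W, H = nmf(V, rank=2)
print(round(frobenius_error(V, W, H), 3))
```

Because each update only multiplies by a ratio of non-negative quantities, the factors stay non-negative without any explicit projection step, which is the property that makes this family of rules attractive for count data.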


2020 ◽  
Vol 10 (1) ◽  
pp. 7
Author(s):  
Miguel R. Luaces ◽  
Jesús A. Fisteus ◽  
Luis Sánchez-Fernández ◽  
Mario Munoz-Organero ◽  
Jesús Balado ◽  
...  

Providing citizens with the ability to move around in an accessible way is a requirement for all cities today. However, modeling city infrastructures so that accessible routes can be computed is a challenge because it involves collecting information from multiple, large-scale and heterogeneous data sources. In this paper, we propose and validate the architecture of an information system that creates an accessibility data model for cities by ingesting data from different types of sources and provides an application that can be used by people with different abilities to compute accessible routes. The article describes the processes that allow building a network of pedestrian infrastructures from the OpenStreetMap information (i.e., sidewalks and pedestrian crossings), improving the network with information extracted from mobile-sensed LiDAR data (i.e., ramps, steps, and pedestrian crossings), detecting obstacles using volunteered information collected from the hardware sensors of the mobile devices of the citizens (i.e., ramps and steps), and detecting accessibility problems with software sensors in social networks (i.e., Twitter). The information system is validated through its application in a case study in the city of Vigo (Spain).


2015 ◽  
Vol 66 (12) ◽  
pp. 1278 ◽  
Author(s):  
Diriba B. Kumssa ◽  
Edward J. M. Joy ◽  
E. Louise Ander ◽  
Michael J. Watts ◽  
Scott D. Young ◽  
...  

Magnesium (Mg) is an essential mineral micronutrient in humans. Risks of dietary Mg deficiency are affected by the quantity of Mg ingested and its bioavailability, which is influenced by the consumption of other nutrients and ‘anti-nutrients’. Here, we assess global dietary Mg supplies and risks of dietary deficiency, including the influence of other nutrients. Food supply and food composition data were used to derive the amount of Mg available per capita at national levels. Supplies of Mg were compared with estimated national per capita average requirement ‘cut points’. In 2011, global weighted mean Mg supply was 613 ± 69 mg person⁻¹ day⁻¹ compared with a weighted estimated average requirement for Mg of 173 mg person⁻¹ day⁻¹. This indicates a low risk of dietary Mg deficiency of 0.26% based on supply. This contrasts with published data from national individual-level dietary surveys, which indicate greater Mg deficiency risks. However, individuals in high-income countries are likely to under-report food consumption, which could lead to overestimation of deficiency risks. Furthermore, estimates of deficiency risk based on supply do not account for potential inhibitors of Mg absorption, including calcium, phytic acid and oxalate, and do not consider household food wastage.
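The cut-point comparison described above can be sketched as follows: if individual intake is assumed to be normally distributed around the mean supply, the deficiency risk is the share of that distribution falling below the estimated average requirement. The coefficient of variation used here (0.25) is an assumed value, not one stated in the abstract, and the function name is hypothetical. The published 0.26% figure was derived from national-level data, so this single global calculation is only indicative and is not expected to reproduce it.

```python
from statistics import NormalDist


def deficiency_risk(mean_supply, ear, cv=0.25):
    """Share of a population with intake below the EAR.

    Assumes intake is normally distributed around the mean supply with
    a coefficient of variation `cv` (an assumed value, not from the source).
    """
    intake = NormalDist(mu=mean_supply, sigma=cv * mean_supply)
    return intake.cdf(ear)


# Global figures quoted in the abstract (mg per person per day).
risk = deficiency_risk(mean_supply=613, ear=173)
print(f"{risk:.2%}")
```

The result is sensitive to the assumed spread of intake: a larger coefficient of variation places more of the population below the requirement even when the mean supply is unchanged.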


Author(s):  
Jessica Bell ◽  
Megan Prictor ◽  
Lauren Davenport ◽  
Lynda O’Brien ◽  
Melissa Wake

‘Digital Mega-Studies’ are entirely or extensively digitised, longitudinal, population-scale initiatives, collecting, storing, and making available individual-level research data of different types and from multiple sources, shaped by technological developments and unforeseeable risks over time. The Australian ‘Gen V’ project exemplifies this new research paradigm. In 2019, we undertook a multidisciplinary, multi-stakeholder process to map Digital Mega-Studies’ key characteristics, legal and governance challenges and likely solutions. We conducted large and small group processes within a one-day symposium and directed online synthesis and group prioritisation over subsequent weeks. We present our methods (including elicitation, affinity mapping and prioritisation processes) and findings, proposing six priority governance principles across three areas—data, participation, trust—to support future high-quality, large-scale digital research in health.


AI Magazine ◽  
2015 ◽  
Vol 36 (1) ◽  
pp. 75-86 ◽  
Author(s):  
Jennifer Sleeman ◽  
Tim Finin ◽  
Anupam Joshi

We describe an approach for identifying fine-grained entity types in heterogeneous data graphs that is effective for unstructured data or when the underlying ontologies or semantic schemas are unknown. Identifying fine-grained entity types, rather than a few high-level types, supports coreference resolution in heterogeneous graphs by reducing the number of possible coreference relations that must be considered. Big data problems that involve integrating data from multiple sources can benefit from our approach when the data's ontologies are unknown, inaccessible or semantically trivial. For such cases, we use supervised machine learning to map entity attributes and relations to a known set of attributes and relations from appropriate background knowledge bases to predict instance entity types. We evaluated this approach in experiments on data from DBpedia, Freebase, and Arnetminer using DBpedia as the background knowledge base.


2017 ◽  
Author(s):  
Josine Min ◽  
Gibran Hemani ◽  
George Davey Smith ◽  
Caroline Relton ◽  
Matthew Suderman

Abstract Background: Technological advances in high throughput DNA methylation microarrays have allowed dramatic growth of a new branch of epigenetic epidemiology. DNA methylation datasets are growing ever larger in terms of the number of samples profiled, the extent of genome coverage, and the number of studies being meta-analysed. Novel computational solutions are required to efficiently handle these data. Methods: We have developed meffil, an R package designed to quality control, normalize and perform epigenome-wide association studies (EWAS) efficiently on large samples of Illumina Infinium HumanMethylation450 and MethylationEPIC BeadChip microarrays. We tested meffil by applying it to 6000 450k microarrays generated from blood collected for two different datasets, Accessible Resource for Integrative Epigenomic Studies (ARIES) and The Genetics of Overweight Young Adults (GOYA) study. Results: A complete reimplementation of functional normalization minimizes computational memory requirements to 5% of that required by other R packages, without increasing running time. Incorporating fixed and random effects alongside functional normalization, and automated estimation of functional normalisation parameters reduces technical variation in DNA methylation levels, thus reducing false positive associations and improving power. We also demonstrate that the ability to normalize datasets distributed across physically different locations without sharing any biologically-based individual-level data may reduce heterogeneity in meta-analyses of epigenome-wide association studies. However, we show that when batch is perfectly confounded with cases and controls, functional normalization is unable to prevent spurious associations. Conclusions: meffil is available online (https://github.com/perishky/meffil/) along with tutorials covering typical use cases.


2020 ◽  
Author(s):  
Cindy Veldhuis

Intimate relationships provide protections against excess stress. Little research has investigated this in same-sex/gender couples, and particularly interracial/interethnic same-sex/gender couples. In a sample of N = 215 women in same-sex/gender couple relationships, 43% of whom were in interracial/interethnic relationships, we examined differences in general stressors and both individual- and couple-level minority stressors. Women in interracial/interethnic couple relationships reported higher levels of individual-level childhood stress, microaggressions, stress related to race/ethnicity, and couple-level expectations and stereotypes. We also examined the associations between stressors and relationship outcomes and whether these associations differed comparing women in monoracial and interracial/interethnic couple relationships. We found multiple sources of general stressors and individual- and couple-level stressors that were associated with poorer relationship outcomes but found few differences by whether couples were monoracial or interracial/interethnic. Our findings have implications for couple-level interventions and highlight the importance of taking intersectional approaches to research on same-sex couples, as well as the importance of examining multiple sources and levels of stress.


2021 ◽  
Author(s):  
Sara B Mullaney ◽  
Heather Bayko ◽  
Gerald D Moore ◽  
Hannah E Funke ◽  
Matthew J Enroth ◽  
...  

ABSTRACT Introduction U.S. Army Veterinary Corps provides highly skilled and adaptive veterinary professionals to protect and improve the health of people and animals while enhancing readiness throughout the DOD. Army veterinarians must be trained and credentialed for critical tasks within the animal health and food protection missions across all components. The Veterinary Metrics Division in the U.S. Army Public Health Center’s Veterinary Services and Public Health Sanitation Directorate is responsible for tracking readiness metrics of Army veterinarians and maintains a robust online Readiness Metrics Platform. Readiness targets were developed based on trends in readiness platform data, input of senior veterinary subject matter experts, and feedback from the field. To date, no data have been published describing the cases presented to DOD-owned Veterinary Treatment Facilities (VTFs). Without capturing and codifying the types of cases that present to the VTF and comparing to cases typically encountered during deployments, it is difficult to determine whether the VTF serves as an adequate readiness platform. In this study, we compare a representative random sample of non-wellness VTF patient encounters in garrison to cases reported from two different combat zones to determine if the VTF is a suitable clinical readiness platform. Materials and Methods Multiple data sources, including pre-existing published data and new data extracted from multiple sources, were used. The Iraq 2009-2010 dataset includes data collected from a Medical Detachment, Veterinary Service Support (MDVSS) deployed to Iraq from January 5, 2009 through August 23, 2010. The Iraq 2003-2007 dataset originated from a retrospective cross-sectional survey that included database and medical record abstraction. The Afghanistan 2014-2015 dataset includes data collected from the MDVSS deployed to Afghanistan from June 2014 to March 2015. 
Working dog veterinary encounter data were manually extracted from monthly and daily clinical reports. Data for the Garrison 2016-2018 dataset were extracted from the Remote Online Veterinary Record. A random representative sample of government-owned animal (GOA) and privately owned animal (POA) encounters seen across all DOD-owned VTFs from June 2016 to May 2018 was selected. Results We found that animals present to the VTF for a wide variety of illnesses. Overall, the top 10 encounter categories (90.3%) align with 84.2%, 92.4%, and 85.9% of all the encounter types seen in the three combat zone datasets. Comparing these datasets identifies potential gaps in readiness training relying solely on the VTF, especially in the areas of traumatic and combat-related injuries. Conclusions Ultimately, the success of the DOD Veterinary Services Animal Health mission depends on both the competence and confidence of the individual Army veterinarian. As the MHS transitions and DOD Veterinary Services continues to transform, emphasizing readiness through a public health and prevention-based Army medicine approach, Army veterinarians must strike a delicate balance to continue to provide comprehensive health care to GOAs and POAs in the VTFs. Leaders at all levels must recognize the roles VTFs play in overall public health readiness and disease prevention through the proper appropriation and allocation of resources while fostering the development, confidence, and competence of Army veterinarians training within these readiness platforms.


2021 ◽  
Vol 8 (12) ◽  
Author(s):  
Daniel J. Lawson ◽  
Vinesh Solanki ◽  
Igor Yanovich ◽  
Johannes Dellert ◽  
Damian Ruck ◽  
...  

Integrating datasets from different disciplines is hard because the data are often qualitatively different in meaning, scale and reliability. When two datasets describe the same entities, many scientific questions can be phrased around whether the (dis)similarities between entities are conserved across such different data. Our method, CLARITY, quantifies consistency across datasets, identifies where inconsistencies arise and aids in their interpretation. We illustrate this using three diverse comparisons: gene methylation versus expression, evolution of language sounds versus word use, and country-level economic metrics versus cultural beliefs. The non-parametric approach is robust to noise and differences in scaling, and makes only weak assumptions about how the data were generated. It operates by decomposing similarities into two components: a ‘structural’ component analogous to a clustering, and an underlying ‘relationship’ between those structures. This allows a ‘structural comparison’ between two similarity matrices using their predictability from ‘structure’. Significance is assessed with the help of re-sampling appropriate for each dataset. The software, CLARITY, is available as an R package from github.com/danjlawson/CLARITY.

