GlyGen data model and processing workflow

2020 ◽  
Vol 36 (12) ◽  
pp. 3941-3943 ◽  
Author(s):  
Robel Kahsay ◽  
Jeet Vora ◽  
Rahi Navelkar ◽  
Reza Mousavi ◽  
Brian C Fochtman ◽  
...  

Abstract Summary: Glycoinformatics plays a major role in glycobiology research, and the development of a comprehensive glycoinformatics knowledgebase is critical. This application note describes the GlyGen data model, processing workflow and data access interfaces, featuring programmatic use-case example queries based on specific biological questions. The GlyGen project is a data integration, harmonization and dissemination project for carbohydrate- and glycoconjugate-related data retrieved from multiple international data sources, including UniProtKB, GlyTouCan, UniCarbKB and other key resources. Availability and implementation: The GlyGen web portal is freely accessible at https://glygen.org. The data portal, web services, SPARQL endpoint and GitHub repository are also freely available at https://data.glygen.org, https://api.glygen.org, https://sparql.glygen.org and https://github.com/glygener, respectively. All code is released under the GNU General Public License version 3 (GNU GPLv3) and is available on GitHub (https://github.com/glygener). The datasets are made available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. Supplementary information: Supplementary data are available at Bioinformatics online.
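As a hedged illustration of the programmatic access described above, the sketch below sends a generic SPARQL query to the public endpoint listed in the abstract. The exact endpoint path and result-format parameter are assumptions, and the query uses no GlyGen-specific vocabulary; consult the GlyGen documentation and https://api.glygen.org for the actual schema and REST routes.

```python
# Minimal sketch of programmatic access to the GlyGen SPARQL endpoint
# (https://sparql.glygen.org, as listed in the abstract). The query is a
# generic class-listing query following SPARQL 1.1; the "/sparql" path and
# the result-format parameter are assumptions.
import requests

SPARQL_ENDPOINT = "https://sparql.glygen.org/sparql"  # exact path assumed

query = """
SELECT DISTINCT ?type (COUNT(?s) AS ?count)
WHERE { ?s a ?type }
GROUP BY ?type
ORDER BY DESC(?count)
LIMIT 10
"""

response = requests.get(
    SPARQL_ENDPOINT,
    params={"query": query, "format": "application/sparql-results+json"},
    timeout=30,
)
response.raise_for_status()

# Standard SPARQL 1.1 JSON results structure: results -> bindings -> variable -> value
for binding in response.json()["results"]["bindings"]:
    print(binding["type"]["value"], binding["count"]["value"])
```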

2014 ◽  
Vol 635-637 ◽  
pp. 1948-1951
Author(s):  
Yao Guang Hu ◽  
Dong Feng Wu ◽  
Jing Qian Wen

Based on an analysis of electronic-component business processes and the related quality data, a model centred on the object entity of the product life cycle is proposed. With the object entity as the carrier of the related data, the model merges and reorganizes the associated business processes and links entities through the information revolving around the quality data model, thereby achieving business integrity in both time and space. Taking this data model as a basis, the integration and sharing of quality data can be realized effectively, which facilitates quality data analysis and quality traceability and improves an enterprise's quality data management capabilities.
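A minimal sketch of the idea, under assumed names (not the paper's schema): an object entity carries the quality records collected at different life-cycle stages, so that they can be integrated and traced through time.

```python
# Illustrative sketch (hypothetical names, not the paper's data model): an
# object entity acts as the carrier that links quality records collected at
# different life-cycle stages, enabling integration and traceability queries.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class QualityRecord:
    stage: str              # life-cycle stage, e.g. "solder paste inspection"
    metric: str             # measured quality characteristic
    value: float
    recorded_at: datetime

@dataclass
class ObjectEntity:
    serial_number: str      # unique identity of the physical product instance
    product_type: str
    records: list[QualityRecord] = field(default_factory=list)

    def add_record(self, record: QualityRecord) -> None:
        self.records.append(record)

    def trace(self) -> list[QualityRecord]:
        """Return quality records ordered by time for traceability."""
        return sorted(self.records, key=lambda r: r.recorded_at)

board = ObjectEntity(serial_number="PCB-000123", product_type="controller board")
board.add_record(QualityRecord("solder paste inspection", "offset_um", 42.0, datetime(2014, 3, 1, 9, 30)))
board.add_record(QualityRecord("final test", "leakage_mA", 0.7, datetime(2014, 3, 2, 14, 5)))
for rec in board.trace():
    print(rec.stage, rec.metric, rec.value)
```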


Author(s):  
Sharif Islam

The European Loans and Visits System (ELViS) is an e-service in development designed to improve access to natural history collections across Europe. Bringing together heterogeneous datasets about institutions, people, collections and specimens, ELViS will provide an e-service (with application programming interfaces (APIs) and portal) that handles various stages of collections-based research. One of the main functionalities of ELViS is to facilitate loan and visit requests related to collections. To facilitate activities such as searching for collections, requesting loans, generating reports on collection usage, and ensuring interoperability with existing and new systems and services, ELViS must use a standard way of describing collections. In this talk, I show how ELViS can use the Collection Descriptions (CD) standard currently being developed by the CD Task Group at TDWG. I will provide a brief introduction to ELViS, summarise the current development efforts, and show how the Collection Description standard can support specific user requirements (gathered via an extensive set of user stories). I will also provide insight into the data elements within ELViS (see Fig. 1) and how they relate to the Collection Description data model.


2021 ◽  
Author(s):  
Sarah Bauermeister ◽  
Joshua R Bauermeister ◽  
R Bridgman ◽  
C Felici ◽  
M Newbury ◽  
...  

Abstract Research-ready data (that is, data curated to a defined standard) increases scientific opportunity and rigour by integrating the data environment. The development of research platforms has highlighted the value of research-ready data, particularly for multi-cohort analyses. Following user consultation, a standard data model (C-Surv), optimised for data discovery, was developed using data from 12 Dementias Platform UK (DPUK) population and clinical cohort studies. The model uses a four-tier nested structure based on 18 data themes selected according to user behaviour or technology. Standard variable naming conventions are applied to uniquely identify variables within the context of longitudinal studies. The data model was used to develop a harmonised dataset for 11 cohorts. This dataset populated the Cohort Explorer data discovery tool for assessing the feasibility of an analysis prior to making a data access request. It was concluded that developing and applying a standard data model (C-Surv) for research cohort data is feasible and useful.
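As a purely illustrative sketch of the naming idea (the actual C-Surv conventions are defined by DPUK and are not reproduced in the abstract), a variable name might encode study, theme, measure and wave so that it is unique across a harmonised longitudinal dataset:

```python
# Hypothetical illustration of a standard variable naming convention for a
# harmonised longitudinal dataset. The components and separators here are
# assumptions for illustration only, not the C-Surv specification.
def make_variable_name(study: str, theme: str, variable: str, wave: int) -> str:
    """Compose a unique, machine-parseable variable name."""
    return f"{study.lower()}_{theme.lower()}_{variable.lower()}_w{wave}"

print(make_variable_name("ELSA", "cognition", "mmse_total", 3))
# -> elsa_cognition_mmse_total_w3
```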


2021 ◽  
Vol 2083 (4) ◽  
pp. 042001
Author(s):  
Nan Zhang ◽  
Wenqiang Zhang ◽  
Yingnan Shang

Abstract The emergence of data related to computer big data provides a new method for constructing knowledge links in a knowledge map, realizing an objective knowledge network of practical significance that is easier for machines to understand. The article combines the four principles of linked data publishing with content objects and their semantic characteristics, and uses the RDF data model to convert unstructured data on the Internet, together with structured data that follows different standards, into structured data in a unified standard so that it can be linked. The resulting system forms a large knowledge map with semantic, intelligent and dynamic properties.
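A minimal sketch of the RDF conversion step using rdflib; the namespace and predicates are illustrative placeholders rather than a published vocabulary.

```python
# Minimal sketch of converting a structured record into RDF triples with
# rdflib, in the spirit of the linked-data approach described above. The
# namespace and predicate names are illustrative assumptions.
from rdflib import Graph, Literal, Namespace, RDF, URIRef

EX = Namespace("http://example.org/kg/")

g = Graph()
g.bind("ex", EX)

article = URIRef(EX["article/42"])
g.add((article, RDF.type, EX.Article))
g.add((article, EX.title, Literal("Knowledge links from big data")))
g.add((article, EX.mentions, URIRef(EX["concept/linked-data"])))

# Serialize to Turtle so the triples can be published and linked elsewhere.
print(g.serialize(format="turtle"))
```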


Author(s):  
Eugenia Rinaldi ◽  
Sylvia Thun

HiGHmed is a German consortium in which eight university hospitals have agreed to cross-institutional data exchange through novel medical informatics solutions. The HiGHmed Use Case Infection Control group has modelled a set of infection-related data in the openEHR format. In order to establish interoperability with the other German consortia belonging to the same national initiative, we mapped the openEHR information to the Fast Healthcare Interoperability Resources (FHIR) format recommended within the initiative. FHIR enables fast exchange of data thanks to the discrete and independent data elements into which information is organized. Furthermore, to explore the possibility of maximizing analysis capabilities for our data set, we subsequently mapped the FHIR elements to the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM). The OMOP data model is designed to support research that identifies and evaluates associations between interventions and the outcomes caused by those interventions. Mapping across standards allows their respective strengths to be exploited while establishing and/or maintaining interoperability. This article provides an overview of our experience in mapping infection-control-related data across three different standards: openEHR, FHIR and OMOP CDM.
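A simplified sketch of one such mapping step, from a FHIR R4 Observation to a row of the OMOP CDM MEASUREMENT table. The FHIR and OMOP field names follow the public specifications, but the concept lookup is stubbed out; a real pipeline would map source codes (e.g. LOINC) to OMOP standard concepts via the OHDSI vocabulary tables.

```python
# Simplified, illustrative mapping of a FHIR Observation to an OMOP CDM
# MEASUREMENT row. This is not the HiGHmed pipeline; the concept-mapping
# step is a stub standing in for a vocabulary lookup.
def lookup_standard_concept(coding: dict) -> int:
    # Placeholder: in practice, map e.g. LOINC codes to OMOP standard
    # concept IDs using the OHDSI vocabulary tables.
    return 0

def fhir_observation_to_omop_measurement(obs: dict, person_id: int) -> dict:
    coding = obs["code"]["coding"][0]
    value = obs.get("valueQuantity", {})
    return {
        "person_id": person_id,
        "measurement_source_value": coding.get("code"),
        "measurement_concept_id": lookup_standard_concept(coding),
        "measurement_date": obs.get("effectiveDateTime", "")[:10],
        "value_as_number": value.get("value"),
        "unit_source_value": value.get("unit"),
    }

observation = {
    "resourceType": "Observation",
    "code": {"coding": [{"system": "http://loinc.org", "code": "94500-6"}]},
    "effectiveDateTime": "2021-03-15T10:00:00Z",
    "valueQuantity": {"value": 1.0, "unit": "1"},
}
print(fhir_observation_to_omop_measurement(observation, person_id=123))
```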


Author(s):  
Matt Woodburn ◽  
Gabriele Droege ◽  
Sharon Grant ◽  
Quentin Groom ◽  
Janeen Jones ◽  
...  

The utopian vision is of a future where a digital representation of each object in our collections is accessible through the internet and sustainably linked to other digital resources. This is a long-term goal, however, and in the meantime there is an urgent need to share data about our collections at a higher level with a range of stakeholders (Woodburn et al. 2020). To sustainably achieve this, and to aggregate this information across all natural science collections, the data need to be standardised (Johnston and Robinson 2002). To this end, the Biodiversity Information Standards (TDWG) Collection Descriptions (CD) Interest Group has developed a data standard for describing collections, which is approaching formal review for ratification as a new TDWG standard. It proposes 20 classes (Suppl. material 1) and over 100 properties that can be used to describe, categorise, quantify, link and track digital representations of natural science collections, from high-level approximations to detailed breakdowns depending on the purpose of a particular implementation.

The wide range of use cases identified for representing collection description data means that a flexible approach to the standard and the underlying modelling concepts is essential. These are centred around the 'ObjectGroup' (Fig. 1), a class that may represent any group (of any size) of physical collection objects which have one or more common characteristics. This generic definition of the 'collection' in 'collection descriptions' is an important factor in making the standard flexible enough to support the breadth of use cases. For any use case or implementation, only a subset of classes and properties within the standard are likely to be relevant. In some cases, this subset may have little overlap with those selected for other use cases. This additional need for flexibility means that very few classes and properties, representing the core concepts, are proposed to be mandatory. Metrics, facts and narratives are represented in a normalised structure using an extended MeasurementOrFact class, so that these can be user-defined rather than constrained to a set identified by the standard. Finally, rather than a rigid underlying data model as part of the normative standard, documentation will be developed to provide guidance on how the classes in the standard may be related and quantified according to relational, dimensional and graph-like models.

So, in summary, the standard has, by design, been made flexible enough to be used in a number of different ways. The corresponding risk is that it could be used in ways that may not deliver what is needed in terms of outputs, manageability and interoperability with other resources of collection-level or object-level data. To mitigate this, it is key for any new implementer of the standard to establish how it should be used in that particular instance, and to define any necessary constraints within the wider scope of the standard and model. This is the concept of the 'collection description scheme', a profile that defines elements such as: which classes and properties should be included, which should be mandatory, and which should be repeatable; which controlled vocabularies and hierarchies should be used to make the data interoperable; how the collections should be broken down into individual ObjectGroups and interlinked; and how the various classes should be related to each other.
Various factors might influence these decisions, including the types of information that are relevant to the use case, whether quantitative metrics need to be captured and aggregated across collection descriptions, and how many resources can be dedicated to amassing and maintaining the data. This process has particular relevance to the Distributed System of Scientific Collections (DiSSCo) consortium, the design of which incorporates use cases for storing, interlinking and reporting on the collections of its member institutions. These include helping users of the European Loans and Visits System (ELViS) (Islam 2020) to discover specimens for physical and digital loans by providing descriptions and breakdowns of the collections of holding institutions, and monitoring digitisation progress across European collections through a dynamic Collections Digitisation Dashboard. In addition, DiSSCo will be part of a global collections data ecosystem requiring interoperation with other infrastructures such as the GBIF (Global Biodiversity Information Facility) Registry of Scientific Collections, the CETAF (Consortium of European Taxonomic Facilities) Registry of Collections and Index Herbariorum. In this presentation, we will introduce the draft standard, discuss the process of defining new collection description schemes using the standard and data model, and focus on DiSSCo requirements as examples of real-world collection description use cases.
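As a non-normative illustration of the modelling concepts discussed above, an ObjectGroup with MeasurementOrFact-style metrics might be represented as follows; the property names are simplified assumptions, not the ratified CD terms.

```python
# Illustrative (non-normative) sketch of an ObjectGroup from the draft TDWG
# Collection Descriptions standard, with metrics held in a generic
# MeasurementOrFact-style structure. Property names are simplified assumptions.
object_group = {
    "type": "ObjectGroup",
    "title": "Palaeontology: European Jurassic ammonites",
    "institution": "Example Natural History Museum",   # hypothetical holder
    "discipline": "Palaeontology",
    "geographicOrigin": "Europe",
    "measurementsOrFacts": [
        {"measurementType": "objectCount", "measurementValue": 12500, "measurementUnit": "specimens"},
        {"measurementType": "digitisedPercentage", "measurementValue": 37, "measurementUnit": "%"},
    ],
}

# A collection description scheme for a given implementation would constrain
# which of these properties are mandatory, repeatable, or drawn from
# controlled vocabularies.
total = sum(
    m["measurementValue"]
    for m in object_group["measurementsOrFacts"]
    if m["measurementType"] == "objectCount"
)
print(f"{object_group['title']}: {total} specimens recorded")
```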


2020 ◽  
Vol 1 ◽  
pp. 1-23
Author(s):  
Majid Hojati ◽  
Colin Robertson

Abstract. With new forms of digital spatial data driving new applications for monitoring and understanding environmental change, there are growing demands on traditional GIS tools for spatial data storage, management and processing. Discrete Global Grid Systems (DGGS) are methods to tessellate the globe into multi-resolution grids, which represent a global spatial fabric capable of storing heterogeneous spatial data and offering improved performance in data access, retrieval and analysis. While DGGS-based GIS may hold potential for next-generation big-data GIS platforms, few studies have tried to implement them as a framework for operational spatial analysis. Cellular Automata (CA) is a classic dynamic modelling framework which has been used with the traditional raster data model for various kinds of environmental modelling, such as wildfire modelling and urban expansion modelling. The main objectives of this paper are to (i) investigate the possibility of using DGGS for running dynamic spatial analysis, (ii) evaluate CA as a generic data model for modelling dynamic phenomena within a DGGS data model and (iii) evaluate an in-database approach for CA modelling. To do so, a case study in wildfire spread modelling is developed. Results demonstrate that using a DGGS data model not only provides the ability to integrate different data sources, but also provides a framework for doing spatial analysis without using geometry-based analysis. This results in a simplified architecture and a common spatial fabric to support the development of a wide array of spatial algorithms. While considerable work remains to be done, CA modelling within a DGGS-based GIS is a robust and flexible modelling framework for big-data GIS analysis in an environmental monitoring context.
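A toy sketch of the core idea, assuming cells are addressed by identifiers rather than raster row/column indices; the adjacency mapping stands in for a real DGGS neighbourhood query, and the paper's in-database approach would express the same update as a database operation.

```python
# Toy sketch of a cellular-automaton step over a DGGS-like grid, where cells
# are keyed by cell identifiers. The neighbour function is a stand-in: a real
# DGGS (e.g. a hexagonal grid) supplies its own indexing and neighbourhood
# queries, and the in-database approach would run this step as a query.
from typing import Callable

UNBURNT, BURNING, BURNT = 0, 1, 2

def ca_step(state: dict[str, int], neighbours: Callable[[str], list[str]]) -> dict[str, int]:
    """One synchronous update: unburnt cells ignite if any neighbour burns."""
    new_state = {}
    for cell, value in state.items():
        if value == BURNING:
            new_state[cell] = BURNT
        elif value == UNBURNT and any(state.get(n) == BURNING for n in neighbours(cell)):
            new_state[cell] = BURNING
        else:
            new_state[cell] = value
    return new_state

# Tiny illustrative grid with hand-coded adjacency (stands in for DGGS lookup).
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
state = {"A": BURNING, "B": UNBURNT, "C": UNBURNT, "D": UNBURNT}
for _ in range(3):
    state = ca_step(state, lambda c: adjacency[c])
    print(state)
```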


Author(s):  
Денис Валерьевич Сикулер

The article reviews 10 Internet resources that can be used to find data for a variety of tasks related to machine learning and artificial intelligence. Both widely known sites (for example, Kaggle and the Registry of Open Data on AWS) and less popular or highly specialised resources (for example, The Big Bad NLP Database and Common Crawl) are examined. All of the resources provide free access to data, and in most cases registration is not even required. For each resource, the characteristics and features relevant to searching for and obtaining datasets are described. The following sites are covered: Kaggle, Google Research, Microsoft Research Open Data, Registry of Open Data on AWS, Harvard Dataverse Repository, Zenodo, the Open Data portal of the Russian Federation, World Bank, The Big Bad NLP Database and Common Crawl.


2016 ◽  
pp. 1756-1773
Author(s):  
Grzegorz Spyra ◽  
William J. Buchanan ◽  
Peter Cruickshank ◽  
Elias Ekonomou

This paper proposes a new identity model and its underlying metadata model. The approach enables secure spanning of identity metadata across many boundaries, such as healthcare, financial and educational institutions, including all others that store and process sensitive personal data. It introduces the new concepts of a Compound Personal Record (CPR) and a Compound Identifiable Data (CID) ontology, which aim to move toward an 'own your own data' model. The CID model ensures authenticity of identity metadata; high availability via a unified Cloud-hosted XML data structure; and privacy through encryption, obfuscation and anonymity applied to ontology-based XML distributed content. Additionally, CID via XML ontologies is enabled for identity federation. The paper also suggests that access to sensitive data should be strictly governed through an access control model with granular policy enforcement on the service side. This includes the involvement of the relevant access control model entities, which can authorize ad hoc break-glass data access while providing high accountability for data access attempts.
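A minimal sketch, under hypothetical names and policy rules, of service-side enforcement with an audited break-glass override in the spirit of the model described above.

```python
# Minimal sketch (hypothetical policy and names, not the paper's model) of
# service-side access control with an auditable break-glass override. A real
# deployment would enforce this in the access-control service and write audit
# records to tamper-evident storage.
from datetime import datetime, timezone

AUDIT_LOG: list[dict] = []

def request_access(user: str, role: str, record_id: str, break_glass: bool = False) -> bool:
    allowed = role in {"treating_clinician", "records_officer"}  # granular policy (illustrative)
    if not allowed and break_glass:
        allowed = True  # ad hoc break-glass access is granted but fully audited
    AUDIT_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "record": record_id,
        "granted": allowed,
        "break_glass": break_glass and allowed,
    })
    return allowed

print(request_access("dr_smith", "treating_clinician", "CPR-001"))
print(request_access("on_call_nurse", "nurse", "CPR-001", break_glass=True))
print(AUDIT_LOG)
```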


2020 ◽  
Vol 10 (1) ◽  
pp. 140-146
Author(s):  
Klaus Böhm ◽  
Tibor Kubjatko ◽  
Daniel Paula ◽  
Hans-Georg Schweiger

Abstract With the upcoming new legislative rules in the EU on Event Data Recorders beginning in 2022, the question is whether the data base under discussion is sufficient for the needs of clarifying accidents involving automated vehicles. Based on the reconstruction of real accidents involving vehicles with ADAS, combined with specially designed crash tests, a broader data base than that of the US EDR regulation (NHTSA 49 CFR Part 563.7) is proposed. The working group AHEAD, to which the authors contribute, has already elaborated a data model that fits the needs of automated driving, and the structure of this data model is shown. Moreover, the special benefits of storing internal video or photo feeds from the vehicle camera systems, combined with object data, are illustrated. When a sophisticated 3D measurement method is used at the accident scene, the videos or photos can also serve as a control instance for the stored vehicle data. The AHEAD data model, enhanced with the storage of the video and photo feeds, should be considered in the planned roadmap of the Informal Working Group (IWG) on EDR/DSSAD (Data Storage System for Automated Driving) reporting to UNECE WP29. Also, over-the-air data access using technology already applied in China for electric vehicles, called Real Time Monitoring, would allow a quantum leap in forensic accident reconstruction.
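Purely as an illustration (not the AHEAD data model, whose structure is defined by the working group), an event record combining conventional EDR signals with references to camera frames and detected objects might look like this:

```python
# Purely illustrative sketch of an event data record that places references to
# camera frames and detected-object data alongside conventional EDR signals.
# All field names are assumptions; this is NOT the AHEAD data model.
from dataclasses import dataclass, field

@dataclass
class DetectedObject:
    object_type: str                        # e.g. "pedestrian", "vehicle"
    relative_position_m: tuple[float, float]
    relative_speed_mps: float

@dataclass
class EventDataRecord:
    timestamp_utc: str
    vehicle_speed_kph: float
    automation_level: int                   # automation level active at the event
    camera_frame_refs: list[str] = field(default_factory=list)   # stored frames around the event
    detected_objects: list[DetectedObject] = field(default_factory=list)

record = EventDataRecord(
    timestamp_utc="2022-06-01T08:14:03Z",
    vehicle_speed_kph=47.5,
    automation_level=3,
    camera_frame_refs=["front_cam/frame_1021.jpg"],
    detected_objects=[DetectedObject("pedestrian", (12.4, -1.8), -1.2)],
)
print(record)
```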

