Towards FAIRer Biological Knowledge Networks Using a Hybrid Linked Data and Graph Database Approach

2018 ◽  
Vol 15 (3) ◽  
Author(s):  
Marco Brandizi ◽  
Ajit Singh ◽  
Christopher Rawlings ◽  
Keywan Hassani-Pak

Abstract The speed and accuracy of new scientific discoveries, whether made by humans or by artificial intelligence, depend on the quality of the underlying data and on the technology to connect, search and share the data efficiently. In recent years, we have seen the rise of graph databases and semi-formal data models such as knowledge graphs to facilitate software approaches to scientific discovery. These approaches extend work based on formalised models, such as the Semantic Web. In this paper, we present our developments to connect, search and share data about genome-scale knowledge networks (GSKN). We have developed a simple application ontology based on OWL/RDF with mappings to standard schemas. We are employing the ontology to power data access services like resolvable URIs, SPARQL endpoints, JSON-LD web APIs and Neo4j-based knowledge graphs. We demonstrate how the proposed ontology and graph databases considerably improve search and access to interoperable and reusable biological knowledge (i.e. the FAIR data principles).
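As a rough illustration of what a JSON-LD representation of one knowledge-network node might look like, the following Python sketch builds a small document for a gene and one outgoing relation. The namespace `http://example.org/gskn/`, the `ex:Gene` type, the `ex:encodes` property and the function name are hypothetical and are not taken from the ontology described in the abstract; only the JSON-LD keywords (`@context`, `@id`, `@type`) are standard.

```python
import json

# Hypothetical namespace, used only for illustration (not the paper's ontology).
EX = "http://example.org/gskn/"


def gene_as_jsonld(gene_id: str, label: str, protein_id: str) -> dict:
    """Describe one gene node and one outgoing relation as a JSON-LD document."""
    return {
        "@context": {
            "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
            "ex": EX,
            # Map the plain key "label" to rdfs:label.
            "label": "rdfs:label",
            # Values of "encodes" are identifiers (URIs), not string literals.
            "encodes": {"@id": "ex:encodes", "@type": "@id"},
        },
        "@id": EX + "gene/" + gene_id,
        "@type": "ex:Gene",
        "label": label,
        "encodes": EX + "protein/" + protein_id,
    }


doc = gene_as_jsonld("g1", "example gene", "p1")
print(json.dumps(doc, indent=2))
```

Because the node carries a resolvable-looking `@id`, the same identifier could in principle serve as a URI, as a SPARQL resource and as a property-graph node key, which is the kind of hybrid use the abstract describes.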

Author(s):  
Yang Zhao ◽  
Jiajun Zhang ◽  
Yu Zhou ◽  
Chengqing Zong

Knowledge graphs (KGs) store much structured information on various entities, many of which are not covered by the parallel sentence pairs of neural machine translation (NMT). To improve the translation quality of these entities, in this paper we propose a novel KG-enhanced NMT method. Specifically, we first induce new translation results for these entities by transforming the source and target KGs into a unified semantic space. We then generate adequate pseudo-parallel sentence pairs that contain these induced entity pairs. Finally, the NMT model is jointly trained on the original and pseudo sentence pairs. Extensive experiments on Chinese-to-English and English-to-Japanese translation tasks demonstrate that our method significantly outperforms strong baseline models in translation quality, especially in handling the induced entities.


F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 1651 ◽  
Author(s):  
Ajit Singh ◽  
Christopher J. Rawlings ◽  
Keywan Hassani-Pak

KnetMaps is a BioJS component for the interactive visualization of biological knowledge networks. It is well suited for applications that need to visualise complementary, connected and content-rich data in a single view in order to help users to traverse pathways linking entities of interest, for example to go from genotype to phenotype. KnetMaps loads data in JSON format, visualizes the structure and content of knowledge networks using lightweight JavaScript libraries, and supports interactive touch gestures. KnetMaps uses effective visualization techniques to prevent information overload and to allow researchers to progressively build their knowledge.


2019 ◽  
Author(s):  
Kokulapalan Wimalanathan ◽  
Carolyn J. Lawrence-Dill

Abstract Annotating gene structures and functions to genome assemblies is a must to make assembly resources useful for biological inference. Gene Ontology (GO) term assignment is the most pervasively used functional annotation system, and new methods for GO assignment have improved the quality of GO-based function predictions. GOMAP, the Gene Ontology Meta Annotator for Plants, is an optimized, high-throughput, and reproducible pipeline for genome-scale GO annotation of plant genomes. GOMAP's methods have been shown to expand and improve the number of genes annotated and annotations assigned per gene as well as the quality (based on F-score) of GO assignments in maize. Here we report on the pipeline's availability and performance for annotating large, repetitive plant genomes and describe how to deploy GOMAP to annotate additional plant genomes. We containerized GOMAP to increase portability and reproducibility, and optimized its performance for HPC environments. GOMAP has been used to annotate multiple maize lines, and is currently being deployed to annotate other species including wheat, rice, barley, cotton, soy, and others. Instructions along with access to the GOMAP Singularity container are freely available online at https://gomap-singularity.readthedocs.io/en/latest/. A list of annotated genomes and links to data is maintained at https://dill-picl.org/projects/gomap/gomap-datasets/.


BMJ Open ◽  
2018 ◽  
Vol 8 (3) ◽  
pp. e019082 ◽  
Author(s):  
Filipa Landeiro ◽  
Katie Walsh ◽  
Isaac Ghinai ◽  
Seher Mughal ◽  
Elsbeth Nye ◽  
...  

Introduction. Dementia is the fastest growing major cause of disability globally and may have a profound impact on the health-related quality of life (HRQoL) of both the patient with dementia and those who care for them. This review aims to systematically identify and synthesise the measurements of HRQoL for people with dementia and their caregivers across the full spectrum of the disease, from its preceding stage of predementia to end of life. Methods and analysis. A systematic literature review was conducted in Medical Literature Analysis and Retrieval System Online, Excerpta Medica dataBASE, Cochrane Database of Systematic Reviews, Cochrane Central Register of Controlled Trials, Database of Abstracts of Reviews of Effect, National Health Service Economic Evaluation Database and PsycINFO between January 1990 and the end of April 2017. Two reviewers will independently assess each study for inclusion and disagreements will be resolved by a third reviewer. Data will be extracted using a predefined data extraction form following best practice. Study quality will be assessed with the Effective Public Health Practice Project quality assessment tool. HRQoL measurements will be presented separately for people with dementia and caregivers by instrument used and, when possible, HRQoL will be reported by disease type and stage of the disease. Descriptive statistics of the results will be provided. A narrative synthesis of studies will also be provided, discussing differences in HRQoL measurements by the instrument used to estimate it, type of dementia and disease severity. Ethics and dissemination. This systematic literature review is exempt from ethics approval because the work is carried out on published documents. The findings of the review will be disseminated in a related peer-reviewed journal and presented at conferences. They will also contribute to the work developed in the Real World Outcomes across the Alzheimer's disease spectrum for better care: multimodal data access platform (ROADMAP). Trial registration number. CRD42017071416.


2018 ◽  
Vol 10 (3) ◽  
pp. 707-715 ◽  
Author(s):  
Guillaume Bernard ◽  
Jananan S Pathmanathan ◽  
Romain Lannes ◽  
Philippe Lopez ◽  
Eric Bapteste

2011 ◽  
pp. 544-549
Author(s):  
Ning Chen

In many large-scale enterprise information system solutions, process design, data modeling and software component design are performed relatively independently by different people using various tools and methodologies. This usually leads to gaps among business process modeling, component design and data modeling. Currently, these functional or non-functional disconnections are fixed manually, which increases the complexity and decreases the efficiency and quality of development. In this chapter, a pattern-based approach is proposed to bridge the gaps with automatically generated data access components. Data access rules and patterns are applied to optimize these data access components. In addition, the authors present the design of a toolkit that automatically applies these patterns to bridge the gaps, ensuring reduced development time and higher solution quality.


2013 ◽  
Vol 5 (1) ◽  
pp. 53-69
Author(s):  
Jacques Jorda ◽  
Aurélien Ortiz ◽  
Abdelaziz M’zoughi ◽  
Salam Traboulsi

Grid computing is commonly used for large-scale applications requiring huge computation capabilities. In such distributed architectures, the data storage on the distributed storage resources must be handled by a dedicated storage system to ensure the required quality of service. In order to simplify the data placement on nodes and to increase the performance of applications, a storage virtualization layer can be used. This layer can be a single parallel filesystem (like GPFS) or a more complex middleware. The latter is preferred as it allows the data placement on the nodes to be tuned to increase both the reliability and the performance of data access. Thus, in such a middleware, a dedicated monitoring system must be used to ensure optimal performance. In this paper, the authors briefly introduce Visage, a middleware for storage virtualization. They present the most broadly used grid monitoring systems, and explain why they are not adequate for virtualized storage monitoring. The authors then present the architecture of their monitoring system dedicated to storage virtualization. They introduce the workload prediction model used to select the best node for data placement, and demonstrate its accuracy in a simple experiment.
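The abstract does not say which prediction model is used, so the following Python sketch only illustrates the general idea of choosing a placement node from predicted workload. It swaps in a simple exponentially weighted moving average as the predictor; the `NodeLoad` class, the `predict_load` and `best_node` functions, the `alpha` parameter and the sample values are all assumptions made for illustration and are not part of Visage.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NodeLoad:
    """Observed load samples (oldest first) for one storage node. Hypothetical type."""
    name: str
    samples: List[float] = field(default_factory=list)


def predict_load(samples: List[float], alpha: float = 0.5) -> float:
    """Exponentially weighted moving average: recent samples weigh more."""
    if not samples:
        raise ValueError("at least one load sample is required")
    estimate = samples[0]
    for sample in samples[1:]:
        estimate = alpha * sample + (1 - alpha) * estimate
    return estimate


def best_node(nodes: List[NodeLoad], alpha: float = 0.5) -> str:
    """Return the name of the node with the lowest predicted load."""
    return min(nodes, key=lambda n: predict_load(n.samples, alpha)).name


nodes = [
    NodeLoad("A", [0.2, 0.4, 0.9]),  # load is rising
    NodeLoad("B", [0.8, 0.6, 0.3]),  # load is falling
]
print(best_node(nodes))
```

With these sample values node B is selected, because its load trend is falling even though its historical average is higher; a monitoring system that only reported mean load would not make that distinction.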


2015 ◽  
Author(s):  
Pablo Pareja-Tobes ◽  
Raquel Tobes ◽  
Marina Manrique ◽  
Eduardo Pareja ◽  
Eduardo Pareja-Tobes

Background. Next Generation Sequencing and other high-throughput technologies have brought a revolution to the bioinformatics landscape by offering vast amounts of data about previously inaccessible domains in a cheap and scalable way. However, fast, reproducible, and cost-effective data analysis at such scale remains elusive. A key need for achieving it is being able to access and query the vast amount of publicly available data, especially in the case of knowledge-intensive, semantically rich data: incredibly valuable information about proteins and their functions, genes, pathways, and all sorts of biological knowledge encoded in ontologies remains scattered, semantically and physically fragmented. Methods and Results. Guided by this, we have designed and developed Bio4j. It aims to offer a platform for the integration of semantically rich biological data using typed graph models. We have modeled and integrated most publicly available data linked with proteins into a set of interdependent graphs. Data querying is possible through a data-model-aware domain-specific language implemented in Java, letting the user write typed graph traversals over the integrated data. A ready-to-use cloud-based data distribution, based on the Titan graph database engine, is provided; generic data import code can also be used for in-house deployment. Conclusion. Bio4j represents a unique resource for the current bioinformatician, providing at once a solution for several key problems: data integration; expressive, high-performance data access; and a cost-effective, scalable cloud deployment model.


2019 ◽  
Vol 29 (Supplement_4) ◽  
Author(s):  
T Makovski ◽  
G Le Coroller ◽  
P Putrik ◽  
S Stranges ◽  
L Huiart ◽  
...  

Abstract Multimorbidity, defined most commonly as the co-existence of 2+ diseases, is one of the major challenges of an ageing society. It is often accompanied by declining quality of life (QoL). The study aims to 1) assess the relationship between increasing number of diseases and QoL over time, and 2) explore the differences between several European countries. Longitudinal data analysis was performed on the relevant waves (2004 to 2017) of the Survey of Health, Ageing and Retirement in Europe (SHARE). Data were collected every two years among participants aged 50+. Health conditions were identified through an open-ended questionnaire containing 17 prelisted conditions. QoL was evaluated by the Control, Autonomy, Self-Realization and Pleasure questionnaire (CASP-12v). The maximum QoL score, describing the best state, was 48 points; the minimum was 12 points. The association between increasing number of diseases and QoL is being assessed with multilevel analysis accounting for time and clustering within household and country. Minimum follow-up is 2 time points. Confounding variables include age, sex, socio-economic status, social support and health care parameters. Preliminary findings show that 20 countries and 87,087 individuals participated in at least 2 waves; 80,041 answered CASP at least twice. The number of diseases when first reported was on average 1.65 (IQR=0,2) and increased to 1.88 (IQR=1,3) when last reported. Similarly, between the first and last reported points, QoL changed on average by -0.32 (SD: ± 5.9), estimated on the non-rescaled CASP scale. Greece showed the strongest decrease, -1.73 (SD: ± 6.36), while QoL increased in some countries, most of all in Portugal, by 0.76 (SD: ± 5.62). Our preliminary findings suggest high geographic variations in QoL, possibly driven by differential clustering of multimorbidity across Europe, design issues and other factors. This may underline the need for country-specific analysis and initiatives to address the growing burden of multimorbidity in our ageing populations. Key messages. First longitudinal study to address this research question across a wide range of European countries using SHARE. The study accounts for a large number of confounding factors owing to the abundance of collected information.
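The score range quoted in the abstract (12 to 48) is consistent with 12 items each scored from 1 to 4. The Python sketch below makes that arithmetic explicit under this assumption; it also assumes the item responses have already been coded so that a higher value means better QoL. The function names and the example responses are hypothetical, not part of SHARE.

```python
from typing import Sequence

N_ITEMS = 12      # assumed number of CASP-12 items
ITEM_MIN = 1      # assumed lowest score per item
ITEM_MAX = 4      # assumed highest score per item


def casp12_total(items: Sequence[int]) -> int:
    """Sum 12 item scores, each assumed to lie in 1..4 with higher = better."""
    if len(items) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} items, got {len(items)}")
    for value in items:
        if not ITEM_MIN <= value <= ITEM_MAX:
            raise ValueError(f"item score {value} outside {ITEM_MIN}..{ITEM_MAX}")
    return sum(items)


def qol_change(first: Sequence[int], last: Sequence[int]) -> int:
    """Change in total score between first and last report (negative = decline)."""
    return casp12_total(last) - casp12_total(first)


# Made-up responses for one hypothetical participant.
first_wave = [3, 3, 4, 2, 3, 3, 4, 3, 2, 3, 3, 3]
last_wave = [3, 2, 4, 2, 3, 3, 3, 3, 2, 3, 3, 3]
print(casp12_total(first_wave), casp12_total(last_wave), qol_change(first_wave, last_wave))
```

A per-participant change computed this way is on the non-rescaled scale, the same scale on which the abstract reports the average change.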

