federated queries
Recently Published Documents


TOTAL DOCUMENTS

30
(FIVE YEARS 14)

H-INDEX

6
(FIVE YEARS 1)

Author(s):  
Chuming Chen ◽  
Karen E Ross ◽  
Sachin Gavali ◽  
Julie E Cowart ◽  
Cathy H Wu

Abstract Summary The global response to the COVID-19 pandemic has led to a rapid increase of scientific literature on this deadly disease. Extracting knowledge from biomedical literature and integrating it with relevant information from curated biological databases is essential to gain insight into COVID-19 etiology, diagnosis, and treatment. We used Semantic Web technology RDF to integrate COVID-19 knowledge mined from literature by iTextMine, PubTator, and SemRep with relevant biological databases and formalized the knowledge in a standardized and computable COVID-19 Knowledge Graph (KG). We published the COVID-19 KG via a SPARQL endpoint to support federated queries on the Semantic Web and developed a knowledge portal with browsing and searching interfaces. We also developed a RESTful API to support programmatic access and provided RDF dumps for download. Availability and implementation The COVID-19 Knowledge Graph is publicly available under CC-BY 4.0 license at https://research.bioinformatics.udel.edu/covid19kg/.


Author(s):  
Lucía Prieto Santamaría ◽  
David Fernández Lobón ◽  
Antonio Jesús Díaz-Honrubia ◽  
Ernestina Menasalvas Ruiz ◽  
Sokratis Nifakos ◽  
...  

Abstract Objectives The aim of the study is to design an ontology model for representing assets and their features in distributed health care environments, and to allow information about these assets to be interchanged through specific ontology-based vocabularies. Methods Ontologies are a formal way to represent knowledge by means of triples composed of a subject, a predicate, and an object. Given the sensitivity of network assets in health care institutions, this work uses an ontology-based representation of information that complies with the FAIR principles. Federated queries to the ontology systems allow users to obtain data from multiple sources (i.e., several hospitals belonging to the same public body). Therefore, this representation makes it possible for network administrators in health care institutions to have a clear understanding of possible threats that may emerge in the network. Results As a result of this work, the “Software Defined Networking Description Language—CUREX Asset Discovery Tool Ontology” (SDNDL-CAO) has been developed. This ontology uses the main concepts in network assets to represent the knowledge extracted from the distributed health care environments: interface, device, port, service, etc. Conclusion The developed SDNDL-CAO ontology makes it possible to represent the aforementioned knowledge about distributed health care environments. Network administrators of these institutions will benefit, as they will be able to monitor emerging threats in real time, which is critical when managing personal medical information.
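
The subject-predicate-object representation and the idea of querying several sources at once can be illustrated with a small in-memory sketch. Everything below is hypothetical: the device names, predicates, and hospital stores are invented for illustration and are not taken from the SDNDL-CAO ontology, and a real deployment would query RDF stores rather than Python lists.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical asset descriptions held separately by two hospitals.
HOSPITAL_A: List[Triple] = [
    ("deviceA1", "hasInterface", "eth0"),
    ("deviceA1", "runsService", "ssh"),
]
HOSPITAL_B: List[Triple] = [
    ("deviceB7", "runsService", "ssh"),
    ("deviceB7", "hasPort", "22"),
]


def match(triples: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def federated_match(sources: Dict[str, List[Triple]],
                    s: Optional[str] = None,
                    p: Optional[str] = None,
                    o: Optional[str] = None) -> List[Tuple[str, Triple]]:
    """Apply the same pattern to every source and tag each hit with its origin."""
    results = []
    for name, triples in sources.items():
        for t in match(triples, s, p, o):
            results.append((name, t))
    return results


sources = {"hospital_a": HOSPITAL_A, "hospital_b": HOSPITAL_B}
# Which devices, at any hospital, run the ssh service?
ssh_hosts = federated_match(sources, p="runsService", o="ssh")
```

Tagging each result with its source keeps the origin of every asset visible, which is the kind of cross-site view an administrator would need.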


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Jakub Galgonek ◽  
Jiří Vondrášek

Abstract The Resource Description Framework (RDF), together with well-defined ontologies, significantly increases data interoperability and usability. The SPARQL query language was introduced to retrieve requested RDF data and to explore links between them. Among other useful features, SPARQL supports federated queries that combine multiple independent data source endpoints. This allows users to obtain insights that are not possible using only a single data source. Owing to all of these useful features, many biological and chemical databases present their data in RDF, and support SPARQL querying. In our project, we primarily focused on PubChem, ChEMBL and ChEBI small-molecule datasets. These datasets are already being exported to RDF by their creators. However, none of them has an official and currently supported SPARQL endpoint. This omission makes it difficult to construct complex or federated queries that could access all of the datasets, thus underutilising the main advantage of the availability of RDF data. Our goal is to address this gap by integrating the datasets into one database called the Integrated Database of Small Molecules (IDSM) that will be accessible through a SPARQL endpoint. Beyond that, we will also focus on increasing mutual interoperability of the datasets. To realise the endpoint, we decided to implement an in-house developed SPARQL engine based on the PostgreSQL relational database for data storage. In our approach, data are stored in the traditional relational form, and the SPARQL engine translates incoming SPARQL queries into equivalent SQL queries. An important feature of the engine is that it optimises the resulting SQL queries. Together with optimisations performed by PostgreSQL, this allows efficient evaluations of SPARQL queries. The endpoint provides not only querying in the dataset, but also the compound substructure and similarity search supported by our Sachem project. 
Although the endpoint is accessible from an internet browser, it is mainly intended to be used for programmatic access by other services, for example as a part of federated queries. For regular users, we offer a rich web application called ChemWebRDF using the endpoint. The application is publicly available at https://idsm.elixir-czech.cz/chemweb/.
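
As a rough illustration of what a federated SPARQL query looks like, the sketch below assembles a query string that uses the SPARQL 1.1 `SERVICE` keyword to delegate part of the pattern to a second endpoint. The endpoint URL `https://example.org/sparql` and the `http://example.org/vocab/Compound` class are placeholders, not the IDSM endpoint or its vocabulary, and the helper `build_federated_query` is a hypothetical name; the code only builds the text and does not contact any server.

```python
def build_federated_query(remote_endpoint: str, limit: int = 10) -> str:
    """Build a SPARQL query whose inner pattern is evaluated at a remote endpoint.

    The outer pattern runs on whichever endpoint receives the query; the
    SERVICE block is forwarded to `remote_endpoint` and joined on ?compound.
    """
    return f"""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?compound ?label WHERE {{
  ?compound a <http://example.org/vocab/Compound> .
  SERVICE <{remote_endpoint}> {{
    ?compound rdfs:label ?label .
  }}
}}
LIMIT {limit}
""".strip()


# Placeholder endpoint, assumed for illustration only.
REMOTE = "https://example.org/sparql"
query = build_federated_query(REMOTE, limit=5)
print(query)
```

The resulting string could be sent to any SPARQL 1.1 endpoint that permits federation; whether a given public endpoint allows outgoing `SERVICE` calls depends on its configuration.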


2021 ◽  
Author(s):  
Ziye Tao ◽  
Griffin M. Weber ◽  
Yun William Yu

Abstract Motivation The rapid growth of electronic medical records provides immense potential to researchers, but these records are often siloed at separate hospitals. As a result, federated networks have arisen, which allow medical databases at a group of connected institutions to be queried simultaneously. The most basic such query is the aggregate count, e.g., how many patients have diabetes? However, depending on the protocol used to estimate that total, there is always a trade-off between the accuracy of the estimate and the risk of leaking confidential data. Prior work has shown that it is possible to empirically control that trade-off by using the HyperLogLog (HLL) probabilistic sketch. Results In this article, we prove complementary theoretical bounds on the k-anonymity privacy risk of using HLL sketches, and we also provide code to efficiently compute those bounds. Availability https://github.com/tzyRachel/[email protected] information N/A


2021 ◽  
Author(s):  
Marvin Martens ◽  
Chris Evelo ◽  
Egon Willighagen

The AOP-Wiki is the main environment for the development and storage of Adverse Outcome Pathways. These Adverse Outcome Pathways describe mechanistic information about toxicodynamic processes and can be used to develop effective risk assessment strategies. However, it is challenging to automatically and systematically parse, filter, and use its contents. We explored solutions to better structure the AOP-Wiki content and to link it with chemical and biological resources. Together, these allow more detailed exploration that can be automated. We converted the complete AOP-Wiki content into Resource Description Framework. We used over twenty ontologies for the semantic annotation of property-object relations, including the ChemInformatics Ontology, Dublin Core, and the Adverse Outcome Pathway Ontology. The latter was used over 8,000 times. Furthermore, over 3,500 link-outs were added to twelve chemical databases and over 6,500 link-outs to four gene and protein databases. SPARQL queries can be used against the Resource Description Framework to answer biological and toxicological questions, such as listing measurement methods for all Key Events leading to an Adverse Outcome of interest. The full power of this new resource becomes apparent when combining the content with external databases using federated queries. For example, we can link genes related to Key Events with the molecular pathways on WikiPathways in which they occur and find all Adverse Outcome Pathways caused by stressors that are part of a particular chemical group. Overall, the AOP-Wiki Resource Description Framework allows new ways to explore the rapidly growing Adverse Outcome Pathway knowledge and makes the integration of this database in automated workflows possible.


2021 ◽  
Vol 7 (1) ◽  
pp. 6659-6673
Author(s):  
Gabriel Lucas Pimenta ◽  
Gisane Aparecida Michelon ◽  
Lúcelia de Souza ◽  
Josiane Michalak Hauagge Dall'Agnol ◽  
Sandro Rautenberg

10.2196/18735 ◽  
2020 ◽  
Vol 22 (11) ◽  
pp. e18735
Author(s):  
Yun William Yu ◽  
Griffin M Weber

Background Over the past decade, the emergence of several large federated clinical data networks has enabled researchers to access data on millions of patients at dozens of health care organizations. Typically, queries are broadcast to each of the sites in the network, which then return aggregate counts of the number of matching patients. However, because patients can receive care from multiple sites in the network, simply adding the numbers frequently double counts patients. Various methods such as the use of trusted third parties or secure multiparty computation have been proposed to link patient records across sites. However, they either have large trade-offs in accuracy and privacy or are not scalable to large networks. Objective This study aims to enable accurate estimates of the number of patients matching a federated query while providing strong guarantees on the amount of protected medical information revealed. Methods We introduce a novel probabilistic approach to running federated network queries. It combines an algorithm called HyperLogLog with obfuscation in the form of hashing, masking, and homomorphic encryption. It is tunable, in that it allows networks to balance accuracy versus privacy, and it is computationally efficient even for large networks. We built a user-friendly free open-source benchmarking platform to simulate federated queries in large hospital networks. Using this platform, we compare the accuracy, k-anonymity privacy risk (with k=10), and computational runtime of our algorithm with several existing techniques. Results In simulated queries matching 1 to 100 million patients in a 100-hospital network, our method was significantly more accurate than adding aggregate counts while maintaining k-anonymity. On average, it required a total of 12 kilobytes of data to be sent to the network hub and added only 5 milliseconds to the overall federated query runtime. 
This was orders of magnitude better than other approaches, which guaranteed the exact answer. Conclusions Using our method, it is possible to run highly accurate federated queries of clinical data repositories that both protect patient privacy and scale to large networks.
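
The double-counting problem and the HyperLogLog idea can be sketched in a few lines. This is a minimal, plain HyperLogLog with a union by register-wise maximum; it deliberately omits the hashing, masking, and homomorphic encryption layers described in the abstract and is not the authors' protocol. The class name, the register count (2^12), the use of SHA-256 as the hash, and the synthetic patient identifiers are all assumptions made for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch with 2**p registers (illustrative only)."""

    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        # Take 64 bits of a SHA-256 digest as the hash value.
        h = int.from_bytes(hashlib.sha256(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                 # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        """Union of two sketches: register-wise maximum."""
        if other.p != self.p:
            raise ValueError("sketches must use the same precision")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)    # bias constant for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


# Two hypothetical hospitals sharing 2,000 patients (ids 4000..5999).
site_a = [f"patient-{i}" for i in range(0, 6000)]
site_b = [f"patient-{i}" for i in range(4000, 10000)]

sketch_a, sketch_b = HyperLogLog(), HyperLogLog()
for pid in site_a:
    sketch_a.add(pid)
for pid in site_b:
    sketch_b.add(pid)

naive_sum = len(site_a) + len(site_b)            # counts shared patients twice
estimate = sketch_a.merge(sketch_b).estimate()   # approximates distinct patients
print(naive_sum, round(estimate))
```

Because the union is a register-wise maximum, a patient seen at both sites affects the merged sketch only once, so the hub can estimate the distinct count from the sketches alone without receiving patient identifiers.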


2020 ◽  
Vol 30 (02) ◽  
pp. 2050007
Author(s):  
Abdulelah Algosaibi ◽  
Khaled Ragab ◽  
Saleh Albahli

In recent years, data have been generated at a rapid pace, which has advanced the evolution of linked data. Modern data are globally distributed over semantically linked graphs. The distributed nature of data over the semantic graph raises new demands for further investigation into improving performance on semantic graphs. In this work, we analyzed time latency as an important factor to be further investigated and improved. We evaluated parallel computing on these distributed data in order to better utilize parallelism approaches. A federation framework based on a multi-threaded environment supporting federated SPARQL queries was introduced. In our experiments, we show the achievability and effectiveness of our model on a set of real-world queries over the real-world Linked Open Data cloud. A significant performance improvement was observed. Further, we highlight shortcomings that could open an avenue for research on federated queries. Keywords: Semantic web; distributed query processing; query federation; linked data; join methods.
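
The general idea of multi-threaded federation can be sketched as follows, assuming that each sub-query is dispatched to its endpoint on a separate thread so that network latencies overlap instead of adding up. This is not the framework described in the abstract: the endpoints, latencies, and rows are simulated in memory, and the function names are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Row = Tuple[str, str, str]

# Simulated endpoints: name -> (latency in seconds, rows returned).
ENDPOINTS: Dict[str, Tuple[float, List[Row]]] = {
    "endpoint_a": (0.05, [("alice", "knows", "bob")]),
    "endpoint_b": (0.05, [("bob", "knows", "carol")]),
    "endpoint_c": (0.05, [("carol", "knows", "dave")]),
}


def query_endpoint(name: str) -> List[Row]:
    """Stand-in for a remote SPARQL call; sleeps to mimic network latency."""
    latency, rows = ENDPOINTS[name]
    time.sleep(latency)
    return rows


def federated_query(names: List[str], max_workers: int = 4) -> List[Row]:
    """Query all endpoints concurrently and concatenate their results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of `names`, not completion order.
        per_source = list(pool.map(query_endpoint, names))
    return [row for rows in per_source for row in rows]


start = time.perf_counter()
rows = federated_query(list(ENDPOINTS))
elapsed = time.perf_counter() - start
print(rows, f"{elapsed:.3f}s")
```

With enough worker threads, the elapsed time should be close to the slowest single endpoint rather than the sum of all latencies; threads suit this case because the work is I/O-bound waiting rather than computation.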


2020 ◽  
Author(s):  
Yun William Yu ◽  
Griffin M Weber

BACKGROUND Over the past decade, the emergence of several large federated clinical data networks has enabled researchers to access data on millions of patients at dozens of health care organizations. Typically, queries are broadcast to each of the sites in the network, which then return aggregate counts of the number of matching patients. However, because patients can receive care from multiple sites in the network, simply adding the numbers frequently double counts patients. Various methods such as the use of trusted third parties or secure multiparty computation have been proposed to link patient records across sites. However, they either have large trade-offs in accuracy and privacy or are not scalable to large networks. OBJECTIVE This study aims to enable accurate estimates of the number of patients matching a federated query while providing strong guarantees on the amount of protected medical information revealed. METHODS We introduce a novel probabilistic approach to running federated network queries. It combines an algorithm called HyperLogLog with obfuscation in the form of hashing, masking, and homomorphic encryption. It is tunable, in that it allows networks to balance accuracy versus privacy, and it is computationally efficient even for large networks. We built a user-friendly free open-source benchmarking platform to simulate federated queries in large hospital networks. Using this platform, we compare the accuracy, k-anonymity privacy risk (with k=10), and computational runtime of our algorithm with several existing techniques. RESULTS In simulated queries matching 1 to 100 million patients in a 100-hospital network, our method was significantly more accurate than adding aggregate counts while maintaining k-anonymity. On average, it required a total of 12 kilobytes of data to be sent to the network hub and added only 5 milliseconds to the overall federated query runtime. 
This was orders of magnitude better than other approaches, which guaranteed the exact answer. CONCLUSIONS Using our method, it is possible to run highly accurate federated queries of clinical data repositories that both protect patient privacy and scale to large networks.

