A Survey of RDF Stores & SPARQL Engines for Querying Knowledge Graphs

Author(s):  
Waqas Ali ◽  
Muhammad Saleem ◽  
Yao Bin ◽  
Aidan Hogan ◽  
A.-C. Ngonga Ngomo

Recent years have seen the growing adoption of non-relational data models for representing diverse, incomplete data. Among these, the RDF graph-based data model has seen ever-broadening adoption, particularly on the Web. This adoption has prompted the standardization of the SPARQL query language for RDF, as well as the development of a variety of local and distributed engines for processing queries over RDF graphs. These engines implement a diverse range of specialized techniques for storage, indexing, and query processing. A number of benchmarks, based on both synthetic and real-world data, have also emerged to allow for contrasting the performance of different query engines, often at large scale. This survey paper draws together these developments, providing a comprehensive review of the techniques, engines and benchmarks for querying RDF knowledge graphs.
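To make the notions of storage, indexing, and query processing concrete, the following is a minimal sketch of a toy in-memory triple store that evaluates a SPARQL-style basic graph pattern with nested-loop joins. It is not taken from the survey or from any particular engine: the class `TinyTripleStore`, its predicate index, and the `ex:` identifiers are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def is_var(term):
    """Variables are written SPARQL-style, with a leading '?'."""
    return term.startswith("?")


class TinyTripleStore:
    """Toy store: a set of triples plus one index keyed by predicate (assumed design)."""

    def __init__(self):
        self.triples = set()
        self.by_predicate = defaultdict(set)  # predicate -> set of triples

    def add(self, s, p, o):
        triple = (s, p, o)
        self.triples.add(triple)
        self.by_predicate[p].add(triple)

    def match(self, pattern, binding):
        """Yield every extension of `binding` under which `pattern` matches a stored triple."""
        # Substitute variables that are already bound.
        terms = [binding.get(t, t) if is_var(t) else t for t in pattern]
        # Use the predicate index when the predicate is a constant; otherwise scan.
        if is_var(terms[1]):
            candidates = self.triples
        else:
            candidates = self.by_predicate.get(terms[1], ())
        for triple in candidates:
            new_binding = dict(binding)
            ok = True
            for term, value in zip(terms, triple):
                if is_var(term):
                    # A repeated variable must bind to the same value each time.
                    if new_binding.setdefault(term, value) != value:
                        ok = False
                        break
                elif term != value:
                    ok = False
                    break
            if ok:
                yield new_binding

    def query(self, patterns):
        """Evaluate a list of triple patterns as a conjunctive query (nested-loop join)."""
        bindings = [{}]
        for pattern in patterns:
            bindings = [b2 for b in bindings for b2 in self.match(pattern, b)]
        return bindings


store = TinyTripleStore()
store.add("ex:alice", "ex:knows", "ex:bob")
store.add("ex:bob", "ex:knows", "ex:carol")
store.add("ex:bob", "ex:name", '"Bob"')

# Who knows someone who knows someone else?
results = store.query([("?x", "ex:knows", "?y"), ("?y", "ex:knows", "?z")])
print(results)
```

Real engines differ mainly in what this sketch leaves out: several index permutations instead of one, dictionary encoding of terms, and join ordering driven by selectivity estimates.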

2021 ◽  
Author(s):  
Waqas Ali ◽  
Bin Yao ◽  
Muhammad Saleem ◽  
Aidan Hogan ◽  
A.-C. Ngonga Ngomo

Recent years have seen the growing adoption of non-relational data models for representing diverse, incomplete data. Among these, the RDF graph-based data model has seen ever-broadening adoption, particularly on the Web. This adoption has prompted the standardization of the SPARQL query language for RDF, as well as the development of a variety of local and distributed engines for processing queries over RDF graphs. These engines implement a diverse range of specialized techniques for storage, indexing, and query processing. A number of benchmarks, based on both synthetic and real-world data, have also emerged to allow for contrasting the performance of different query engines, often at large scale. This survey paper draws together these developments, providing a comprehensive review of the techniques, engines and benchmarks for querying RDF knowledge graphs.


2017 ◽  
Vol 1 (2) ◽  
pp. 84-103 ◽  
Author(s):  
Dong Wang ◽  
Lei Zou ◽  
Dongyan Zhao

The SPARQL Protocol and RDF Query Language (SPARQL) allows users to issue structural queries over a Resource Description Framework (RDF) graph. However, the lack of a spatiotemporal query language limits the use of RDF data in spatiotemporal applications. As the amount of spatiotemporal information in RDF data continues to grow, it is necessary to design an effective and efficient spatiotemporal RDF data management system. In this paper, we formally define spatiotemporal information-integrated RDF data, introduce a spatiotemporal query language that extends SPARQL with spatiotemporal assertions to query such data, and design a novel index and the corresponding query algorithm. Experimental results on a large, real RDF graph integrating spatial and temporal information (> 180 million triples) confirm the superiority of our approach: gst-store outperforms its competitors by more than 20%-30% in most cases.


2016 ◽  
Author(s):  
John W. Williams ◽  
Simon Goring ◽  
Eric Grimm ◽  
Jason McLachlan

Author(s):  
Andrew Reid ◽  
Julie Ballantyne

In an ideal world, assessment should be synonymous with effective learning and reflect the intricacies of the subject area. It should also be aligned with the ideals of education: to provide equitable opportunities for all students to achieve and to allow both appropriate differentiation for varied contexts and students and comparability across various contexts and students. This challenge is made more difficult in circumstances in which the contexts are highly heterogeneous, for example in the state of Queensland, Australia. Assessment in music challenges schooling systems in unique ways because teaching and learning in music are often naturally differentiated and diverse, yet assessment often calls for standardization. While each student and teacher has individual, evolving musical pathways in life, the syllabus and the system require consistency and uniformity. The challenge, then, is to provide diverse, equitable, and quality opportunities for all children to learn and achieve to the best of their abilities. This chapter discusses the design and implementation of large-scale curriculum as experienced in secondary schools in Queensland, Australia. The experiences detailed explore the possibilities offered through externally moderated school-based assessment. Also discussed is the centrality of system-level clarity of purpose, principles and processes, and the provision of supportive networks and mechanisms to foster autonomy for a diverse range of music educators and contexts. Implications for education systems that desire diversity, equity, and quality are discussed, and the conclusion provokes further conceptualization and action on behalf of students, teachers, and the subject area of music.


2021 ◽  
Vol 15 (5) ◽  
pp. 1-52
Author(s):  
Lorenzo De Stefani ◽  
Erisa Terolli ◽  
Eli Upfal

We introduce Tiered Sampling, a novel technique for estimating the count of sparse motifs in massive graphs whose edges are observed in a stream. Our technique requires only a single pass over the data and uses a memory of fixed size M, which can be orders of magnitude smaller than the number of edges. Our methods address the challenging task of counting sparse motifs (sub-graph patterns) that have a low probability of appearing in a sample of M edges in the graph, which is the maximum amount of data available to the algorithms in each step. To obtain an unbiased and low-variance estimate of the count, we partition the available memory into tiers (layers) of reservoir samples. While the base layer is a standard reservoir sample of edges, other layers are reservoir samples of sub-structures of the desired motif. By storing more frequent sub-structures of the motif, we increase the probability of detecting an occurrence of the sparse motif we are counting, thus decreasing the variance and error of the estimate. While we focus on the design and analysis of algorithms for counting 4-cliques, we present a method that generalizes Tiered Sampling to obtain high-quality estimates for the number of occurrences of any sub-graph of interest, while reducing the analysis effort due to specific properties of the pattern of interest. We present a complete analysis and an extensive experimental evaluation of our proposed method using both synthetic and real-world data. Our results demonstrate the advantage of our method in obtaining high-quality approximations for the number of 4- and 5-cliques in large graphs using a very limited amount of memory, significantly outperforming the single-edge-sample approach for counting sparse motifs in large-scale graphs.
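The base layer described above, a standard reservoir sample of edges held in a memory of fixed size M, can be sketched as follows. This is only the classic single-pass reservoir sampling scheme, not the Tiered Sampling estimator itself; the function name `reservoir_sample_edges` and the toy edge stream are assumptions made for illustration.

```python
import random


def reservoir_sample_edges(edge_stream, M, rng=None):
    """Keep a uniform random sample of at most M edges from a stream, in one pass."""
    rng = rng or random.Random()
    sample = []
    for t, edge in enumerate(edge_stream, start=1):
        if t <= M:
            # The first M edges always fit in memory.
            sample.append(edge)
        else:
            # Edge number t is kept with probability M / t,
            # replacing a uniformly chosen edge in the reservoir.
            j = rng.randrange(t)  # uniform integer in [0, t)
            if j < M:
                sample[j] = edge
    return sample


# Hypothetical stream: a path graph with 1000 edges, observed one edge at a time.
stream = [(i, i + 1) for i in range(1000)]
sample = reservoir_sample_edges(stream, M=5, rng=random.Random(42))
print(len(sample))
```

With M much smaller than the stream, a sparse motif such as a 4-clique rarely has all of its edges in this sample at once, which is the motivation the abstract gives for adding further tiers that store sub-structures of the motif.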


2020 ◽  
Vol 10 (1) ◽  
pp. 7
Author(s):  
Miguel R. Luaces ◽  
Jesús A. Fisteus ◽  
Luis Sánchez-Fernández ◽  
Mario Munoz-Organero ◽  
Jesús Balado ◽  
...  

Providing citizens with the ability to move around in an accessible way is a requirement for all cities today. However, modeling city infrastructures so that accessible routes can be computed is a challenge because it involves collecting information from multiple, large-scale and heterogeneous data sources. In this paper, we propose and validate the architecture of an information system that creates an accessibility data model for cities by ingesting data from different types of sources and provides an application that can be used by people with different abilities to compute accessible routes. The article describes the processes that allow building a network of pedestrian infrastructures from the OpenStreetMap information (i.e., sidewalks and pedestrian crossings), improving the network with information extracted from mobile-sensed LiDAR data (i.e., ramps, steps, and pedestrian crossings), detecting obstacles using volunteered information collected from the hardware sensors of the mobile devices of the citizens (i.e., ramps and steps), and detecting accessibility problems with software sensors in social networks (i.e., Twitter). The information system is validated through its application in a case study in the city of Vigo (Spain).
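As a rough illustration of how accessible routes might be computed over such a pedestrian network, the sketch below runs Dijkstra's shortest-path algorithm while skipping any segment tagged with a feature the user cannot traverse. The graph layout, the feature tags, and the function `accessible_route` are hypothetical and are not taken from the system described in the article.

```python
import heapq


def accessible_route(graph, start, goal, blocked_features):
    """Shortest path that avoids segments carrying any blocked feature.

    graph maps a node to a list of (neighbor, length_in_metres, features) tuples,
    where features is a set of tags such as {"steps"} (an assumed data layout).
    Returns (path, total_length) or None if no accessible route exists.
    """
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, length, features in graph.get(node, []):
            if features & blocked_features:
                continue  # this segment is not traversable for the user
            candidate = d + length
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if goal not in dist:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], dist[goal]


# Hypothetical network: the short way from A to D uses steps, the long way a ramp.
graph = {
    "A": [("B", 10.0, {"steps"}), ("C", 15.0, {"ramp"})],
    "B": [("D", 10.0, set())],
    "C": [("D", 15.0, set())],
}
print(accessible_route(graph, "A", "D", blocked_features=set()))
print(accessible_route(graph, "A", "D", blocked_features={"steps"}))
```

Filtering edges by user profile at query time keeps a single shared network for all users; a system could equally well precompute one graph per profile.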


Author(s):  
Miguel Ángel Hernández-Rodríguez ◽  
Ermengol Sempere-Verdú ◽  
Caterina Vicens-Caldentey ◽  
Francisca González-Rubio ◽  
Félix Miguel-García ◽  
...  

We aimed to identify and compare medication profiles in populations with polypharmacy between 2005 and 2015. We conducted a cross-sectional study using information from the Computerized Database for Pharmacoepidemiologic Studies in Primary Care (BIFAP, Spain). We estimated the prevalence of therapeutic subgroups in all individuals 15 years of age and older with polypharmacy (≥5 drugs during ≥6 months) using the Anatomical Therapeutic Chemical classification system level 4, by sex and age group, for both calendar years. The most prescribed drugs were proton-pump inhibitors (PPIs), statins, antiplatelet agents, benzodiazepine derivatives, and angiotensin-converting enzyme inhibitors. The greatest increases between 2005 and 2015 were observed in PPIs, statins, other antidepressants, and β-blockers, while the prevalence of antiepileptics almost tripled. We observed increases in psychotropic drugs in women and cardiovascular medications in men. By patient age group, there were notable increases in antipsychotics, antidepressants, and antiepileptics (15–44 years); antidepressants, PPIs, and selective β-blockers (45–64 years); selective β-blockers, biguanides, PPIs, and statins (65–79 years); and statins, selective β-blockers, and PPIs (80 years and older). Our results revealed important increases in the use of specific therapeutic subgroups, like PPIs, statins, and psychotropic drugs, highlighting opportunities to design and implement strategies to analyze the appropriateness of such prescriptions.


Author(s):  
Trung-Kien Tran ◽  
Mohamed H. Gad-Elrab ◽  
Daria Stepanova ◽  
Evgeny Kharlamov ◽  
Jannik Strötgen

2021 ◽  
Vol 15 (3) ◽  
pp. 1-28
Author(s):  
Xueyan Liu ◽  
Bo Yang ◽  
Hechang Chen ◽  
Katarzyna Musial ◽  
Hongxu Chen ◽  
...  

The stochastic blockmodel (SBM) is a widely used statistical network representation model with good interpretability, expressiveness, generalization, and flexibility, and it has become prevalent and important in network science in recent years. However, learning an optimal SBM for a given network is an NP-hard problem. This significantly limits the application of SBMs to large-scale networks because of the computational overhead of existing SBM models and their learning methods. Reducing the cost of SBM learning and making it scalable to large-scale networks, while maintaining the good theoretical properties of SBM, remains an unresolved problem. In this work, we address this challenging task from the novel perspective of model redefinition. We propose a redefined SBM with a Poisson distribution, together with a block-wise learning algorithm that can efficiently analyse large-scale networks. Extensive validation conducted on both artificial and real-world data shows that our proposed method significantly outperforms state-of-the-art methods in terms of a reasonable trade-off between accuracy and scalability.
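For readers unfamiliar with Poisson-based blockmodels, the sketch below evaluates the log-likelihood of an undirected network under a plain Poisson SBM, in which the number of edges between nodes i and j is Poisson-distributed with a rate that depends only on their blocks. This is a generic textbook-style formulation assumed here for illustration; it is not the redefined model or the block-wise learning algorithm proposed in the paper, and the function name and toy data are hypothetical.

```python
import math


def poisson_sbm_log_likelihood(adj, blocks, rates):
    """Log-likelihood of an undirected graph without self-loops under a Poisson SBM.

    adj[i][j]   : observed number of edges between nodes i and j
    blocks[i]   : block label of node i
    rates[r][s] : Poisson rate (> 0) for a pair of nodes in blocks r and s
    """
    n = len(adj)
    log_likelihood = 0.0
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair is counted once
            lam = rates[blocks[i]][blocks[j]]
            k = adj[i][j]
            # log of the Poisson pmf: k*log(lam) - lam - log(k!)
            log_likelihood += k * math.log(lam) - lam - math.lgamma(k + 1)
    return log_likelihood


# Hypothetical toy network: two disjoint edges, 0-1 and 2-3.
adj = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
rates = [[0.9, 0.1], [0.1, 0.9]]  # dense within blocks, sparse between them

good = poisson_sbm_log_likelihood(adj, [0, 0, 1, 1], rates)
bad = poisson_sbm_log_likelihood(adj, [0, 1, 0, 1], rates)
print(good > bad)
```

The double loop visits every node pair, so its cost grows quadratically with the number of nodes; this kind of overhead is what makes naive SBM learning impractical on large networks.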

