Directing the Development of Constraint Languages by Checking Constraints on RDF Data

2016 ◽  
Vol 10 (02) ◽  
pp. 193-217
Author(s):  
Thomas Hartmann ◽  
Benjamin Zapilko ◽  
Joachim Wackerow ◽  
Kai Eckert

For research institutes, data libraries, and data archives, validating RDF data according to predefined constraints is a much sought-after feature, particularly as this is taken for granted in the XML world. Based on our work in two international working groups on RDF validation and on jointly identified requirements for formulating constraints and validating RDF data, we have published 81 types of constraints that are required by various stakeholders for data applications. In this paper, we evaluate the usability of the identified constraint types for assessing RDF data quality by (1) collecting and classifying 115 constraints on vocabularies commonly used in the social, behavioral, and economic sciences, either from the vocabularies themselves or from domain experts, and (2) validating 15,694 data sets (4.26 billion triples) of research data against these constraints. We classify each constraint according to (1) the severity of the violations it causes and (2) the types of constraint languages that are able to express its constraint type. Based on this large-scale evaluation, we formulate several findings to direct the further development of constraint languages.
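As a rough illustration of what checking such a constraint involves (not the authors' framework), a mandatory-property constraint can be expressed as a SPARQL query and run over RDF data with rdflib; the FOAF-based sample data and the constraint itself are purely hypothetical:

```python
# Sketch: checking a "mandatory property" constraint on RDF data with rdflib.
# The FOAF-based example data and the constraint are illustrative only.
from rdflib import Graph

data = """
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://example.org/> .

ex:alice a foaf:Person ; foaf:name "Alice" .
ex:bob   a foaf:Person .
"""

g = Graph()
g.parse(data=data, format="turtle")

# Constraint: every foaf:Person must have at least one foaf:name.
violations = g.query("""
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?person WHERE {
        ?person a foaf:Person .
        FILTER NOT EXISTS { ?person foaf:name ?name }
    }
""")

for row in violations:
    print(f"Constraint violation: {row.person} has no foaf:name")
```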

2017 ◽  
Vol 44 (2) ◽  
pp. 203-229 ◽  
Author(s):  
Javier D Fernández ◽  
Miguel A Martínez-Prieto ◽  
Pablo de la Fuente Redondo ◽  
Claudio Gutiérrez

The publication of semantic web data, commonly represented in Resource Description Framework (RDF), has experienced outstanding growth over the last few years. Data from all fields of knowledge are shared publicly and interconnected in active initiatives such as Linked Open Data. However, despite the increasing availability of applications managing large-scale RDF information such as RDF stores and reasoning tools, little attention has been given to the structural features emerging in real-world RDF data. Our work addresses this issue by proposing specific metrics to characterise RDF data. We specifically focus on revealing the redundancy of each data set, as well as common structural patterns. We evaluate the proposed metrics on several data sets, which cover a wide range of designs and models. Our findings provide a basis for more efficient RDF data structures, indexes and compressors.
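To make the idea of structural metrics concrete, the following sketch computes two simple quantities of this kind (subject out-degree and predicate frequency) with rdflib; the metric definitions are illustrative rather than the ones proposed in the paper, and the input file name is a placeholder:

```python
# Sketch of simple structural metrics over an RDF graph (subject out-degree and
# predicate frequency); illustrative only, not the paper's exact definitions.
from collections import Counter
from rdflib import Graph

g = Graph()
g.parse("dataset.ttl", format="turtle")   # hypothetical input file

subject_out_degree = Counter()
predicate_usage = Counter()

for s, p, o in g:
    subject_out_degree[s] += 1
    predicate_usage[p] += 1

mean_out_degree = sum(subject_out_degree.values()) / max(len(subject_out_degree), 1)
print(f"triples: {len(g)}, distinct subjects: {len(subject_out_degree)}")
print(f"mean subject out-degree: {mean_out_degree:.2f}")
print("most-used predicates:", predicate_usage.most_common(5))
```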


2020 ◽  
Vol 1 (4) ◽  
pp. 1493-1509
Author(s):  
Christian Zingg ◽  
Vahan Nanumyan ◽  
Frank Schweitzer

To what extent is the citation rate of new papers influenced by the past social relations of their authors? To answer this question, we present a data-driven analysis of nine different physics journals. Our analysis is based on a two-layer network representation constructed from two large-scale data sets, INSPIREHEP and APS. The social layer contains authors as nodes and coauthorship relations as links. This allows us to quantify the social relations of each author, prior to the publication of a new paper. The publication layer contains papers as nodes and citations between papers as links. This layer allows us to quantify scientific attention as measured by the change of the citation rate over time. We particularly study how this change correlates with the social relations of their authors, prior to publication. We find that on average the maximum value of the citation rate is reached sooner for authors who have either published more papers or who have had more coauthors in previous papers. We also find that for these authors the decay in the citation rate is faster, meaning that their papers are forgotten sooner.
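A minimal sketch of how a citation-rate curve and its peak could be derived from the publication layer, assuming citations are available as (citing paper, cited paper, citation year) records; the record format and sample values are assumptions for illustration:

```python
# Sketch: per-year citation counts for each paper from citation records, then
# the year in which the citation rate peaks. The record layout is assumed.
from collections import defaultdict

citations = [
    ("p2", "p1", 2011), ("p3", "p1", 2011), ("p4", "p1", 2012),
    ("p5", "p1", 2014), ("p5", "p2", 2014),
]

rate = defaultdict(lambda: defaultdict(int))   # paper -> year -> citations
for citing, cited, year in citations:
    rate[cited][year] += 1

for paper, per_year in rate.items():
    peak_year = max(per_year, key=per_year.get)
    print(f"{paper}: peak citation rate {per_year[peak_year]} in {peak_year}")
```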


2020 ◽  
Author(s):  
Dylan G Rees

The contact centre industry employs 4% of the entire United Kingdom and United States' working population and generates gigabytes of operational data that require analysis, to provide insight and to improve efficiency. This thesis is the result of a collaboration with QPC Limited, who provide data collection and analysis products for call centres. They provided a large data set featuring almost 5 million calls to be analysed. This thesis utilises novel visualisation techniques to create tools for the exploration of the large, complex call centre data set and to facilitate unique observations into the data.

A survey of information visualisation books is presented, providing a thorough background of the field. Following this, a feature-rich application that visualises large call centre data sets using scatterplots that support millions of points is presented. The application utilises both CPU and GPU acceleration for processing and filtering, and is exhibited with millions of call events.

This is expanded upon with the use of glyphs to depict agent behaviour in a call centre. A technique is developed to cluster overlapping glyphs into a single parent glyph dependent on zoom level and a customisable distance metric. This hierarchical glyph represents the mean value of all child agent glyphs, removing overlap and reducing visual clutter. A novel technique for visualising individually tailored glyphs using a Graphics Processing Unit is also presented, and demonstrated rendering over 100,000 glyphs at interactive frame rates. An open-source code example is provided for reproducibility.

Finally, a novel interaction and layout method is introduced for improving the scalability of chord diagrams to visualise call transfers. An exploration of sketch-based methods for showing multiple links and direction is made, and a sketch-based brushing technique for filtering is proposed. Feedback from domain experts in the call centre industry is reported for all applications developed.
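A minimal sketch of the glyph-aggregation idea described in the abstract, assuming a greedy screen-space grouping with a zoom-dependent distance threshold; the class and function names are hypothetical, and the thesis' actual clustering and distance metric may differ:

```python
# Sketch: glyphs closer together (in screen space) than a zoom-dependent
# threshold are merged into a single parent glyph carrying the mean of their
# values. A simple greedy grouping is used purely for illustration.
from dataclasses import dataclass

@dataclass
class Glyph:
    x: float
    y: float
    value: float

def aggregate(glyphs, zoom, base_radius=20.0):
    """Greedily merge glyphs that overlap at the current zoom level."""
    threshold = base_radius / zoom          # more merging when zoomed out
    parents = []
    for g in glyphs:
        for p in parents:
            if (g.x - p["x"]) ** 2 + (g.y - p["y"]) ** 2 <= threshold ** 2:
                p["children"].append(g)
                break
        else:
            parents.append({"x": g.x, "y": g.y, "children": [g]})
    # Each parent glyph represents the mean value of its children.
    return [
        Glyph(p["x"], p["y"], sum(c.value for c in p["children"]) / len(p["children"]))
        for p in parents
    ]

merged = aggregate([Glyph(0, 0, 1.0), Glyph(3, 4, 3.0), Glyph(200, 200, 5.0)], zoom=0.5)
print(merged)   # two parent glyphs: one averaging the first two, one for the outlier
```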


Author(s):  
DONG-HYUK IM ◽  
SANG-WON LEE ◽  
HYOUNG-JOO KIM

RDF is widely used as an ontology language for representing metadata in the Semantic Web, knowledge management systems, and e-commerce. Since ontologies model the knowledge in a particular domain, they may change over time. Furthermore, ontologies are usually developed and controlled in a distributed and collaborative way. Thus, it is very important to be able to manage multiple versions of RDF data. Earlier studies on RDF versioning have focused on providing access to different versions (i.e. snapshots) and computing the differences between two versions. However, the existing approaches suffer from space overhead for large-scale data, since all snapshots must be kept redundantly in the repository. Moreover, it is very time-consuming to compute the delta between two specific versions, which is a very common operation in RDF applications. In this paper, we propose a framework for RDF version management in relational databases. It stores the original version and the deltas between consecutive versions, thereby reducing the space requirement considerably. Another benefit is that our approach is well suited to change queries. On the flip side, in order to answer a query on a specific logical version, that version must be constructed on the fly by applying the deltas between the original version and the logical version, which can slow down query performance. To overcome this, we propose a compression technique for deltas, called Aggregated Delta, which creates a logical version directly rather than executing the sequence of deltas. An experimental study with real-life RDF data sets shows that our framework maintains multiple versions efficiently.
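A minimal in-memory sketch of the delta idea, assuming each delta is a pair of added and deleted triple sets; the paper's actual framework stores versions and deltas in a relational database, so this only illustrates how replaying consecutive deltas and applying a single aggregated delta yield the same logical version:

```python
# Sketch of delta-based RDF versioning: each delta records the triples added and
# deleted relative to the previous version; a logical version is rebuilt by
# replaying deltas, and an "aggregated" delta collapses the chain into one step.
# This in-memory model is illustrative; the paper's framework is relational.

def apply_delta(triples, delta):
    added, deleted = delta
    return (triples - deleted) | added

def aggregate_deltas(deltas):
    """Collapse a chain of consecutive deltas into a single equivalent delta."""
    added, deleted = set(), set()
    for step_added, step_deleted in deltas:
        added = (added - step_deleted) | step_added
        deleted = (deleted - step_added) | step_deleted
    return added, deleted

v0 = {("ex:a", "ex:p", "1")}
deltas = [
    ({("ex:b", "ex:p", "2")}, set()),                      # v0 -> v1: add one triple
    ({("ex:a", "ex:p", "3")}, {("ex:a", "ex:p", "1")}),    # v1 -> v2: change a value
]

# Replaying deltas one by one vs. applying the single aggregated delta:
v2_replayed = v0
for d in deltas:
    v2_replayed = apply_delta(v2_replayed, d)
v2_direct = apply_delta(v0, aggregate_deltas(deltas))
assert v2_replayed == v2_direct
print(sorted(v2_direct))
```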


2021 ◽  
Vol 2021 (4) ◽  
pp. 30-50
Author(s):  
Yurii RADIONOV

The theoretical bases of the establishment and development of institutional theory as a new direction of economic science are analyzed. The preconditions for the emergence of institutionalism are studied, and the fundamental differences between the new economic trend and classical economic theory are considered. The weakness of economic theories regarding the role and importance of the state in economic development is noted, and the need to synthesize the strengths of institutionalism with neoclassicism in order to link the social attitudes and interests of individuals is emphasized. The stages of development of institutional theory, the different approaches of institutional scholars, and the emergence of a new, modern direction – neo-institutionalism – are studied. Differences in the interpretation of the term “institution” between traditional institutionalists and neo-institutionalists are outlined, which points to different methodologies of its perception. It is emphasized that views on the fundamental nature of institutions and their interpretation divided institutionalism into old and new schools. Whereas the old school questioned the individualistic worldview inherent in the neoclassical paradigm, the new institutionalists do not reject the individualistic approach. Economic institutions that operate within the social environment are the frameworks or constraints that govern the behavior of society under economic conditions. Emphasis is placed on the prospects for the further development of institutional theory, which enables the emergence and development of other theories and social sciences and reveals hitherto unexplored or little-studied phenomena and processes. In modern conditions, the economic difficulties faced by the world economy convincingly confirm the relevance of institutional theory, and the construction of an efficient economy is not limited to an approach based solely on the methodology of the classical school of economic theory. The contradictions posed by modern globalization are becoming a large-scale source of social, political, economic and even military challenges for less developed countries in relation to more prosperous ones, and international institutionalization is the mechanism designed to alleviate this instability.


2014 ◽  
Vol 17 (07n08) ◽  
pp. 1430001 ◽  
Author(s):  
MARCELO CATALDO ◽  
INGO SCHOLTES ◽  
GIUSEPPE VALETTO

Large collaborative software engineering projects are interesting examples of evolving complex systems. The complexity of these systems unfolds both in evolving software structures and in the social dynamics and organization of development teams. Due to the adoption of Open Source practices and the increasing use of online support infrastructures, large-scale data sets covering both the social and technical dimensions of collaborative software engineering processes are increasingly becoming available. In the analysis of these data, a growing number of studies employ a network perspective, using methods and abstractions from network science to generate insights about software engineering processes. With this topical issue, featuring a collection of inspiring works in this area, we intend to give an overview of state-of-the-art research. We hope that this collection of articles will stimulate downstream applications of network-based data mining techniques in empirical software engineering.


2014 ◽  
Vol 10 (1) ◽  
pp. 57-76 ◽  
Author(s):  
Hongjun Yin ◽  
Jing Li ◽  
Yue Niu

Social network partitioning has become a very important task. One objective of partitioning is to identify communities of interest to target for marketing and advertising activities. The bottleneck in detecting these communities is the sheer scale of the social network. Previous methods did not address the problem effectively because they considered the overall network. Social networks have strong locality, so a local algorithm that finds a community of interest is needed to address this objective. In this paper, we develop a local partition algorithm, named Personalized PageRank Partitioning, to identify such communities. We compute the conductance of the social network using a Personalized PageRank and the Markov chain stationary distribution of the network, and then sweep over the resulting ordering to find the smallest cut. The efficiency of the cut can reach. In order to handle larger-scale social networks, we design and implement the algorithm on a MapReduce programming framework. Finally, we run experiments on several real social network data sets and compare our method with others. The experimental results show that our algorithm is feasible and very effective.
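A single-machine sketch of a Personalized PageRank sweep cut using networkx, as an illustration of the general technique only; the paper's algorithm additionally uses the Markov chain stationary distribution and runs on MapReduce, and the seed node here is arbitrary:

```python
# Sketch of a Personalized PageRank sweep cut: rank nodes by degree-normalized
# PPR score and keep the prefix set with the smallest conductance. Illustrative
# only; not the paper's MapReduce implementation.
import networkx as nx

G = nx.karate_club_graph()
seed = 0

# Personalized PageRank concentrated on the seed node.
personalization = {v: (1.0 if v == seed else 0.0) for v in G}
ppr = nx.pagerank(G, alpha=0.85, personalization=personalization)

# Sweep: grow the candidate set in order of ppr / degree, tracking conductance.
order = sorted(G.nodes(), key=lambda v: ppr[v] / max(G.degree(v), 1), reverse=True)
best_set, best_cond = None, float("inf")
prefix = set()
for v in order[:-1]:                 # the full node set has no complement to cut
    prefix.add(v)
    cond = nx.conductance(G, prefix)
    if cond < best_cond:
        best_set, best_cond = set(prefix), cond

print(f"community of size {len(best_set)} with conductance {best_cond:.3f}")
```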


2016 ◽  
Vol 4 (4) ◽  
pp. 508-530 ◽  
Author(s):  
CHRISTIAN L. STAUDT ◽  
ALEKSEJS SAZONOVS ◽  
HENNING MEYERHENKE

We introduce NetworKit, an open-source software package for analyzing the structure of large complex networks. Appropriate algorithmic solutions are required to handle increasingly common large graph data sets containing up to billions of connections. We describe the methodology applied to develop scalable solutions to network analysis problems, including techniques like parallelization, heuristics for computationally expensive problems, efficient data structures, and modular software architecture. Our goal for the software is to package results of our algorithm engineering efforts and put them into the hands of domain experts. NetworKit is implemented as a hybrid, combining kernels written in C++ with a Python frontend, enabling integration into the Python ecosystem of tested tools for data analysis and scientific computing. The package provides a wide range of functionality (including common and novel analytics algorithms and graph generators) and does so via a convenient interface. In an experimental comparison with related software, NetworKit shows the best performance on a range of typical analysis tasks.
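A minimal usage sketch of the hybrid design described above, assuming the NetworKit Python API (exact names can vary between versions):

```python
# Minimal usage sketch of NetworKit's Python frontend: generate a random graph
# with a C++-backed generator, then run community detection on it.
import networkit as nk

# Erdos-Renyi graph with 10,000 nodes and edge probability 0.001.
G = nk.generators.ErdosRenyiGenerator(10_000, 0.001).generate()
print(G.numberOfNodes(), G.numberOfEdges())

# Parallel Louvain community detection; the heavy lifting runs in C++.
communities = nk.community.detectCommunities(G, algo=nk.community.PLM(G))
print("number of communities:", communities.numberOfSubsets())
```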


1994 ◽  
Vol 6 (3) ◽  
pp. 133-142 ◽  
Author(s):  
Steve King

Re-creating the social, economic and demographic life cycles of ordinary people is one way in which historians might engage with the complex continuities and changes which underlay the development of early modern communities. Little, however, has been written on the ways in which historians might deploy computers, rather than card indexes, for the task of identifying such life cycles from the jumble of sources generated by local and national administration. This article suggests that multiple-source linkage is central to historical and demographic analysis, and reviews, in broad outline, some of the procedures adopted in a study aiming at large-scale life-cycle reconstruction.


2019 ◽  
Vol 73 (2) ◽  
pp. 72-79
Author(s):  
Carla Marcantonio

FQ books editor Carla Marcantonio guides readers through the 33rd edition of Il Cinema Ritrovato Festival, held each year in Bologna at the end of June. Highlights of this year's festival included a restoration of one of Vittorio De Sica's hard-to-find and hence lesser-known films, the social justice fairy tale Miracolo a Milano (Miracle in Milan, 1951). The film was presented by De Sica's daughter, Emi De Sica, and was an example of the ongoing project to restore De Sica's archive, which was given to the Cineteca di Bologna in 2016. Marcantonio also notes her unexpected responses to certain re-viewings; Apocalypse Now: Final Cut (2019), presented by Francis Ford Coppola on the large-scale screen of Piazza Maggiore and accompanied by remastered Dolby Atmos sound, struck her as a tour de force, while a restoration of David Lynch's Blue Velvet (1986) had lost some of its strange allure.

