Directing the Development of Constraint Languages by Checking Constraints on RDF Data

2016 ◽  
Vol 10 (02) ◽  
pp. 193-217
Author(s):  
Thomas Hartmann ◽  
Benjamin Zapilko ◽  
Joachim Wackerow ◽  
Kai Eckert

For research institutes, data libraries, and data archives, validating RDF data according to predefined constraints is a much sought-after feature, particularly as this is taken for granted in the XML world. Based on our work in two international working groups on RDF validation and on jointly identified requirements for formulating constraints and validating RDF data, we have published 81 types of constraints that are required by various stakeholders for data applications. In this paper, we evaluate the usability of the identified constraint types for assessing RDF data quality by (1) collecting and classifying 115 constraints on vocabularies commonly used in the social, behavioral, and economic sciences, either from the vocabularies themselves or from domain experts, and (2) validating 15,694 data sets (4.26 billion triples) of research data against these constraints. We classify each constraint according to (1) the severity of the violations it causes and (2) the types of constraint languages that are able to express its constraint type. Based on this large-scale evaluation, we formulate several findings to direct the further development of constraint languages.
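As a rough illustration of what checking such a constraint involves (not the authors' framework), a mandatory-property constraint can be expressed as a SPARQL query and run over RDF data with rdflib; the FOAF-based sample data and the constraint itself are purely hypothetical:

```python
# Sketch: checking a "mandatory property" constraint on RDF data with rdflib.
# The FOAF-based example data and the constraint are illustrative only.
from rdflib import Graph

data = """
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://example.org/> .

ex:alice a foaf:Person ; foaf:name "Alice" .
ex:bob   a foaf:Person .
"""

g = Graph()
g.parse(data=data, format="turtle")

# Constraint: every foaf:Person must have at least one foaf:name.
violations = g.query("""
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?person WHERE {
        ?person a foaf:Person .
        FILTER NOT EXISTS { ?person foaf:name ?name }
    }
""")

for row in violations:
    print(f"Constraint violation: {row.person} has no foaf:name")
```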

2017 ◽  
Vol 44 (2) ◽  
pp. 203-229 ◽  
Author(s):  
Javier D Fernández ◽  
Miguel A Martínez-Prieto ◽  
Pablo de la Fuente Redondo ◽  
Claudio Gutiérrez

The publication of semantic web data, commonly represented in Resource Description Framework (RDF), has experienced outstanding growth over the last few years. Data from all fields of knowledge are shared publicly and interconnected in active initiatives such as Linked Open Data. However, despite the increasing availability of applications managing large-scale RDF information such as RDF stores and reasoning tools, little attention has been given to the structural features emerging in real-world RDF data. Our work addresses this issue by proposing specific metrics to characterise RDF data. We specifically focus on revealing the redundancy of each data set, as well as common structural patterns. We evaluate the proposed metrics on several data sets, which cover a wide range of designs and models. Our findings provide a basis for more efficient RDF data structures, indexes and compressors.
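To make the idea of structural metrics concrete, the following sketch computes two simple quantities of this kind (subject out-degree and predicate frequency) with rdflib; the metric definitions are illustrative rather than the ones proposed in the paper, and the input file name is a placeholder:

```python
# Sketch of simple structural metrics over an RDF graph (subject out-degree and
# predicate frequency); illustrative only, not the paper's exact definitions.
from collections import Counter
from rdflib import Graph

g = Graph()
g.parse("dataset.ttl", format="turtle")   # hypothetical input file

subject_out_degree = Counter()
predicate_usage = Counter()

for s, p, o in g:
    subject_out_degree[s] += 1
    predicate_usage[p] += 1

mean_out_degree = sum(subject_out_degree.values()) / max(len(subject_out_degree), 1)
print(f"triples: {len(g)}, distinct subjects: {len(subject_out_degree)}")
print(f"mean subject out-degree: {mean_out_degree:.2f}")
print("most-used predicates:", predicate_usage.most_common(5))
```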


2020 ◽  
Vol 1 (4) ◽  
pp. 1493-1509
Author(s):  
Christian Zingg ◽  
Vahan Nanumyan ◽  
Frank Schweitzer

To what extent is the citation rate of new papers influenced by the past social relations of their authors? To answer this question, we present a data-driven analysis of nine different physics journals. Our analysis is based on a two-layer network representation constructed from two large-scale data sets, INSPIREHEP and APS. The social layer contains authors as nodes and coauthorship relations as links. This allows us to quantify the social relations of each author, prior to the publication of a new paper. The publication layer contains papers as nodes and citations between papers as links. This layer allows us to quantify scientific attention as measured by the change of the citation rate over time. We particularly study how this change correlates with the social relations of their authors, prior to publication. We find that on average the maximum value of the citation rate is reached sooner for authors who have either published more papers or who have had more coauthors in previous papers. We also find that for these authors the decay in the citation rate is faster, meaning that their papers are forgotten sooner.
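A minimal sketch of how a citation-rate curve and its peak could be derived from the publication layer, assuming citations are available as (citing paper, cited paper, citation year) records; the record format and sample values are assumptions for illustration:

```python
# Sketch: per-year citation counts for each paper from citation records, then
# the year in which the citation rate peaks. The record layout is assumed.
from collections import defaultdict

citations = [
    ("p2", "p1", 2011), ("p3", "p1", 2011), ("p4", "p1", 2012),
    ("p5", "p1", 2014), ("p5", "p2", 2014),
]

rate = defaultdict(lambda: defaultdict(int))   # paper -> year -> citations
for citing, cited, year in citations:
    rate[cited][year] += 1

for paper, per_year in rate.items():
    peak_year = max(per_year, key=per_year.get)
    print(f"{paper}: peak citation rate {per_year[peak_year]} in {peak_year}")
```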


2020 ◽  
Author(s):  
Dylan G Rees

The contact centre industry employs 4% of the entire United Kingdom and United States' working population and generates gigabytes of operational data that require analysis, to provide insight and to improve efficiency. This thesis is the result of a collaboration with QPC Limited, who provide data collection and analysis products for call centres. They provided a large data set featuring almost 5 million calls to be analysed. This thesis utilises novel visualisation techniques to create tools for the exploration of the large, complex call centre data set and to facilitate unique observations into the data.

A survey of information visualisation books is presented, providing a thorough background of the field. Following this, a feature-rich application that visualises large call centre data sets using scatterplots that support millions of points is presented. The application utilises both CPU and GPU acceleration for processing and filtering, and is exhibited with millions of call events.

This is expanded upon with the use of glyphs to depict agent behaviour in a call centre. A technique is developed to cluster overlapping glyphs into a single parent glyph dependent on zoom level and a customisable distance metric. This hierarchical glyph represents the mean value of all child agent glyphs, removing overlap and reducing visual clutter. A novel technique for visualising individually tailored glyphs using a Graphics Processing Unit is also presented, and demonstrated rendering over 100,000 glyphs at interactive frame rates. An open-source code example is provided for reproducibility.

Finally, a novel interaction and layout method is introduced for improving the scalability of chord diagrams to visualise call transfers. An exploration of sketch-based methods for showing multiple links and direction is made, and a sketch-based brushing technique for filtering is proposed. Feedback from domain experts in the call centre industry is reported for all applications developed.
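A minimal sketch of the glyph-aggregation idea described in the abstract, assuming a greedy screen-space grouping with a zoom-dependent distance threshold; the class and function names are hypothetical, and the thesis' actual clustering and distance metric may differ:

```python
# Sketch: glyphs closer together (in screen space) than a zoom-dependent
# threshold are merged into a single parent glyph carrying the mean of their
# values. A simple greedy grouping is used purely for illustration.
from dataclasses import dataclass

@dataclass
class Glyph:
    x: float
    y: float
    value: float

def aggregate(glyphs, zoom, base_radius=20.0):
    """Greedily merge glyphs that overlap at the current zoom level."""
    threshold = base_radius / zoom          # more merging when zoomed out
    parents = []
    for g in glyphs:
        for p in parents:
            if (g.x - p["x"]) ** 2 + (g.y - p["y"]) ** 2 <= threshold ** 2:
                p["children"].append(g)
                break
        else:
            parents.append({"x": g.x, "y": g.y, "children": [g]})
    # Each parent glyph represents the mean value of its children.
    return [
        Glyph(p["x"], p["y"], sum(c.value for c in p["children"]) / len(p["children"]))
        for p in parents
    ]

merged = aggregate([Glyph(0, 0, 1.0), Glyph(3, 4, 3.0), Glyph(200, 200, 5.0)], zoom=0.5)
print(merged)   # two parent glyphs: one averaging the first two, one for the outlier
```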


Author(s):  
DONG-HYUK IM ◽  
SANG-WON LEE ◽  
HYOUNG-JOO KIM

RDF is widely used as an ontology language for representing metadata in the Semantic Web, knowledge management systems, and e-commerce. Since ontologies model the knowledge in a particular domain, they may change over time. Furthermore, ontologies are usually developed and controlled in a distributed and collaborative way. Thus, it is very important to be able to manage multiple versions of RDF data. Earlier studies on RDF versioning have focused on providing access to different versions (i.e. snapshots) and computing the differences between two versions. However, the existing approaches suffer from space overhead for large-scale data, since all snapshots must be kept redundantly in the repository. Moreover, it is very time-consuming to compute the delta between two specific versions, which is a very common operation in RDF applications. In this paper, we propose a framework for RDF version management in relational databases. It stores the original version and the deltas between consecutive versions, thereby reducing the space requirement considerably. Another benefit is that our approach is well suited to change queries. On the flip side, in order to answer a query on a specific logical version, that version must be constructed on the fly by applying the deltas between the original version and the logical version, which can slow down query performance. To overcome this, we propose a compression technique for deltas, called Aggregated Delta, which creates a logical version directly rather than executing the sequence of deltas. An experimental study with real-life RDF data sets shows that our framework maintains multiple versions efficiently.
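A minimal in-memory sketch of the delta idea, assuming each delta is a pair of added and deleted triple sets; the paper's actual framework stores versions and deltas in a relational database, so this only illustrates how replaying consecutive deltas and applying a single aggregated delta yield the same logical version:

```python
# Sketch of delta-based RDF versioning: each delta records the triples added and
# deleted relative to the previous version; a logical version is rebuilt by
# replaying deltas, and an "aggregated" delta collapses the chain into one step.
# This in-memory model is illustrative; the paper's framework is relational.

def apply_delta(triples, delta):
    added, deleted = delta
    return (triples - deleted) | added

def aggregate_deltas(deltas):
    """Collapse a chain of consecutive deltas into a single equivalent delta."""
    added, deleted = set(), set()
    for step_added, step_deleted in deltas:
        added = (added - step_deleted) | step_added
        deleted = (deleted - step_added) | step_deleted
    return added, deleted

v0 = {("ex:a", "ex:p", "1")}
deltas = [
    ({("ex:b", "ex:p", "2")}, set()),                      # v0 -> v1: add one triple
    ({("ex:a", "ex:p", "3")}, {("ex:a", "ex:p", "1")}),    # v1 -> v2: change a value
]

# Replaying deltas one by one vs. applying the single aggregated delta:
v2_replayed = v0
for d in deltas:
    v2_replayed = apply_delta(v2_replayed, d)
v2_direct = apply_delta(v0, aggregate_deltas(deltas))
assert v2_replayed == v2_direct
print(sorted(v2_direct))
```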


2021 ◽  
Vol 2021 (4) ◽  
pp. 30-50
Author(s):  
Yurii RADIONOV

The theoretical bases of the establishment and development of institutional theory as a new direction of economic science are analyzed. The preconditions for the emergence of institutionalism are studied, and the fundamental differences between the new economic trend and classical economic theory are considered. The weakness of economic theories regarding the role and importance of the state in economic development is noted, and the need to synthesize the strengths of institutionalism with neoclassicism in order to link the social attitudes and interests of individuals is emphasized. The stages of development of institutional theory, the different approaches of institutional scholars, and the emergence of a new, modern direction – neo-institutionalism – are studied. Differences in the interpretation of the term “institution” between traditional institutionalists and neo-institutionalists are outlined, which points to different methodologies of its perception. It is emphasized that views on the fundamental nature of institutions and their interpretation divided institutionalism into old and new schools. Whereas the old school questioned the individualistic worldview inherent in the neoclassical paradigm, the new institutionalists do not reject the individualistic approach. Economic institutions that operate within the social environment are the frameworks or constraints that govern the behavior of society under economic conditions. Emphasis is placed on the prospects for the further development of institutional theory, which enables the emergence and development of other theories and social sciences and reveals hitherto unexplored or little-studied phenomena and processes. In modern conditions, the economic difficulties faced by the world economy convincingly confirm the relevance of institutional theory, and the construction of an efficient economy is not limited to an approach based solely on the methodology of the classical school of economic theory. The contradictions posed by modern globalization are becoming a large-scale source of social, political, economic and even military challenges for less developed countries in relation to more prosperous ones, and international institutionalization is the mechanism designed to alleviate this instability.


2014 ◽  
Vol 17 (07n08) ◽  
pp. 1430001 ◽  
Author(s):  
MARCELO CATALDO ◽  
INGO SCHOLTES ◽  
GIUSEPPE VALETTO

Large collaborative software engineering projects are interesting examples of evolving complex systems. The complexity of these systems unfolds both in evolving software structures and in the social dynamics and organization of development teams. Due to the adoption of Open Source practices and the increasing use of online support infrastructures, large-scale data sets covering both the social and technical dimensions of collaborative software engineering processes are increasingly becoming available. In the analysis of these data, a growing number of studies employ a network perspective, using methods and abstractions from network science to generate insights about software engineering processes. With this topical issue, featuring a collection of inspiring works in this area, we intend to give an overview of state-of-the-art research. We hope that this collection of articles will stimulate downstream applications of network-based data mining techniques in empirical software engineering.


2014 ◽  
Vol 10 (1) ◽  
pp. 57-76 ◽  
Author(s):  
Hongjun Yin ◽  
Jing Li ◽  
Yue Niu

Social network partitioning has become a very important task. One objective of partitioning is to identify communities of interest to target for marketing and advertising activities. The bottleneck in detecting these communities is the sheer scale of the social network. Previous methods did not address the problem effectively because they considered the overall network. Social networks have strong locality, so a local algorithm that finds a community of interest is needed to address this objective. In this paper, we develop a local partition algorithm, named Personalized PageRank Partitioning, to identify such communities. We compute the conductance of the social network using a Personalized PageRank and the Markov chain stationary distribution of the network, and then sweep over the resulting ordering to find the smallest cut. The efficiency of the cut can reach. In order to handle larger-scale social networks, we design and implement the algorithm on a MapReduce programming framework. Finally, we run experiments on several real social network data sets and compare our method with others. The experimental results show that our algorithm is feasible and very effective.
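A single-machine sketch of a Personalized PageRank sweep cut using networkx, as an illustration of the general technique only; the paper's algorithm additionally uses the Markov chain stationary distribution and runs on MapReduce, and the seed node here is arbitrary:

```python
# Sketch of a Personalized PageRank sweep cut: rank nodes by degree-normalized
# PPR score and keep the prefix set with the smallest conductance. Illustrative
# only; not the paper's MapReduce implementation.
import networkx as nx

G = nx.karate_club_graph()
seed = 0

# Personalized PageRank concentrated on the seed node.
personalization = {v: (1.0 if v == seed else 0.0) for v in G}
ppr = nx.pagerank(G, alpha=0.85, personalization=personalization)

# Sweep: grow the candidate set in order of ppr / degree, tracking conductance.
order = sorted(G.nodes(), key=lambda v: ppr[v] / max(G.degree(v), 1), reverse=True)
best_set, best_cond = None, float("inf")
prefix = set()
for v in order[:-1]:                 # the full node set has no complement to cut
    prefix.add(v)
    cond = nx.conductance(G, prefix)
    if cond < best_cond:
        best_set, best_cond = set(prefix), cond

print(f"community of size {len(best_set)} with conductance {best_cond:.3f}")
```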


2016 ◽  
Vol 4 (4) ◽  
pp. 508-530 ◽  
Author(s):  
CHRISTIAN L. STAUDT ◽  
ALEKSEJS SAZONOVS ◽  
HENNING MEYERHENKE

We introduce NetworKit, an open-source software package for analyzing the structure of large complex networks. Appropriate algorithmic solutions are required to handle increasingly common large graph data sets containing up to billions of connections. We describe the methodology applied to develop scalable solutions to network analysis problems, including techniques like parallelization, heuristics for computationally expensive problems, efficient data structures, and modular software architecture. Our goal for the software is to package results of our algorithm engineering efforts and put them into the hands of domain experts. NetworKit is implemented as a hybrid, combining kernels written in C++ with a Python frontend, enabling integration into the Python ecosystem of tested tools for data analysis and scientific computing. The package provides a wide range of functionality (including common and novel analytics algorithms and graph generators) and does so via a convenient interface. In an experimental comparison with related software, NetworKit shows the best performance on a range of typical analysis tasks.
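A minimal usage sketch of the hybrid design described above, assuming the NetworKit Python API (exact names can vary between versions):

```python
# Minimal usage sketch of NetworKit's Python frontend: generate a random graph
# with a C++-backed generator, then run community detection on it.
import networkit as nk

# Erdos-Renyi graph with 10,000 nodes and edge probability 0.001.
G = nk.generators.ErdosRenyiGenerator(10_000, 0.001).generate()
print(G.numberOfNodes(), G.numberOfEdges())

# Parallel Louvain community detection; the heavy lifting runs in C++.
communities = nk.community.detectCommunities(G, algo=nk.community.PLM(G))
print("number of communities:", communities.numberOfSubsets())
```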


1994 ◽  
Vol 6 (3) ◽  
pp. 133-142 ◽  
Author(s):  
Steve King

Re-creating the social, economic and demographic life cycles of ordinary people is one way in which historians might engage with the complex continuities and changes which underlay the development of early modern communities. Little, however, has been written on the ways in which historians might deploy computers, rather than card indexes, for the task of identifying such life cycles from the jumble of sources generated by local and national administration. This article suggests that multiple-source linkage is central to historical and demographic analysis, and reviews, in broad outline, some of the procedures adopted in a study aiming at large-scale life-cycle reconstruction.


2019 ◽  
Vol 73 (2) ◽  
pp. 72-79
Author(s):  
Carla Marcantonio

FQ books editor Carla Marcantonio guides readers through the 33rd edition of Il Cinema Ritrovato Festival, held each year in Bologna at the end of June. Highlights of this year's festival included a restoration of one of Vittorio De Sica's hard-to-find and hence lesser-known films, the social justice fairy tale Miracolo a Milano (Miracle in Milan, 1951). The film was presented by De Sica's daughter, Emi De Sica, and was an example of the ongoing project to restore De Sica's archive, which was given to the Cineteca di Bologna in 2016. Marcantonio also notes her unexpected responses to certain re-viewings; Apocalypse Now: Final Cut (2019), presented by Francis Ford Coppola on the large-scale screen of Piazza Maggiore and accompanied by remastered Dolby Atmos sound, struck her as a tour de force, while a restoration of David Lynch's Blue Velvet (1986) had lost some of its strange allure.

