Benchmarking Graph Database Backends—What Works Well with Wikidata?

2019 ◽  
Vol 24 (1) ◽  
pp. 43-60
Author(s):  
Tibor Kovács ◽  
Gábor Simon ◽  
Gergely Mezei

Knowledge bases often use graphs as their logical model. RDF-based knowledge bases (KBs) are prime examples, as RDF (Resource Description Framework) uses a graph as its logical model. Graph databases are an emerging breed of NoSQL-type databases that offer a graph as the logical model. Although there are specialized databases, the so-called triple stores, for storing RDF data, graph databases can also be promising candidates for storing knowledge. In this paper, we benchmark different graph database implementations loaded with Wikidata, a real-life, large-scale knowledge base. Graph databases come in all shapes and sizes and offer different APIs and graph models. Hence, we used a measurement system that can abstract away the API differences. For the modeling aspect, we made measurements with different graph encodings previously suggested in the literature, in order to observe the impact of the encoding on overall performance.

Author(s):  
Krzysztof Jurczuk ◽  
Marcin Czajkowski ◽  
Marek Kretowski

This paper concerns the evolutionary induction of decision trees (DT) for large-scale data. Such a global approach is one of the alternatives to top-down inducers. It searches for the tree structure and tests simultaneously and thus improves the prediction accuracy and size of the resulting classifiers in many situations. However, it is a population-based, iterative approach that can be too computationally demanding to apply directly to big data mining. The paper demonstrates that this barrier can be overcome by smart distributed/parallel processing. Moreover, we ask whether the global approach can truly compete with greedy systems on large-scale data. For this purpose, we propose a novel multi-GPU approach. It combines knowledge of global DT induction and evolutionary algorithm parallelization with efficient utilization of GPU memory and computing resources. The searches for the tree structure and tests are performed simultaneously on a CPU, while the fitness calculations are delegated to GPUs. A data-parallel decomposition strategy and the CUDA framework are applied. Experimental validation is performed on both artificial and real-life datasets. In both cases, the obtained acceleration is very satisfactory. The solution is able to process even billions of instances in a few hours on a single workstation equipped with 4 GPUs. The impact of data characteristics (size and dimension) on the convergence and speedup of the evolutionary search is also shown. When the number of GPUs grows, nearly linear scalability is observed, which suggests that data size boundaries for evolutionary DT mining are fading.


Author(s):  
Gianluca Bardaro ◽  
Alessio Antonini ◽  
Enrico Motta

Over the last two decades, several deployments of robots for in-house assistance of older adults have been trialled. However, these solutions are mostly prototypes and remain unused in real-life scenarios. In this work, we review the historical and current landscape of the field to try to understand why robots have yet to succeed as personal assistants in daily life. Our analysis focuses on two complementary aspects: the capabilities of the physical platform and the logic of the deployment. The former analysis shows regularities in hardware configurations and functionalities, leading to the definition of a set of six application-level capabilities (exploration, identification, remote control, communication, manipulation, and digital situatedness). The latter focuses on the impact of robots on the daily life of users and categorises the deployment of robots for healthcare interventions using three types of services: support, mitigation, and response. Our investigation reveals that the value of healthcare interventions is limited by a stagnation of functionalities and a disconnection between the robotic platform and the design of the intervention. To address this issue, we propose a novel co-design toolkit, which uses an ecological framework for robot interventions in the healthcare domain. Our approach connects robot capabilities with known geriatric factors to create a holistic view encompassing both the physical platform and the logic of the deployment. As a case study-based validation, we discuss the use of the toolkit in the pre-design of the robotic platform for a pilot intervention, part of the large-scale pilot of the EU H2020 GATEKEEPER project.


2020 ◽  
pp. 1-7
Author(s):  
Sumit Kumar Gupta ◽  

Nanotechnology is a new frontier of this century. The world is facing great challenges in meeting rising demands for basic commodities (e.g., food, water and energy), finished goods (e.g., cellphones, cars and airplanes) and services (e.g., shelter, healthcare and employment) while reducing and minimizing the impact of human activities on Earth’s global environment and climate. Nanotechnology has emerged as a versatile platform that could provide efficient, cost-effective and environmentally acceptable solutions to the global sustainability challenges facing society. In recent years there has been a rapid increase in the use of nanotechnology in medicine, and more specifically in targeted drug delivery. There are opportunities to use nanotechnology to address global challenges in (1) water purification, (2) clean energy technologies, (3) greenhouse gas management, (4) materials supply and utilization, and (5) green manufacturing and chemistry. Smart delivery of nutrients, bio-separation of proteins, rapid sampling of biological and chemical contaminants, and nanoencapsulation of nutraceuticals are some of the emerging topics of nanotechnology for food and agriculture. Nanotechnology is helping to considerably improve, even revolutionize, many technology and industry sectors: information technology, energy, environmental science, medicine, homeland security, food safety, and transportation, among many others. Today’s nanotechnology harnesses current progress in chemistry, physics, materials science, and biotechnology to create novel materials that have unique properties because their structures are determined on the nanometer scale. This paper summarizes the various applications of nanotechnology in recent decades. Nanotechnology is one of the leading scientific fields today since it combines knowledge from the fields of physics, chemistry, biology, medicine, informatics, and engineering. It is an emerging technological field with great potential to lead to major breakthroughs that can be applied in real life. Novel nano- and biomaterials and nanodevices are fabricated and controlled by nanotechnology tools and techniques, which investigate and tune the properties, responses, and functions of living and non-living matter at sizes below 100 nm. The application and use of nanomaterials in electronic and mechanical devices, in optical and magnetic components, quantum computing, tissue engineering, and other biotechnologies, with smallest feature widths well below 100 nm, are the economically most important parts of nanotechnology today and presumably in the near future. The number of nano products is growing rapidly as more and more nanoengineered materials reach the global market. The continuous revolution in nanotechnology will result in the fabrication of nanomaterials with properties and functionalities that will bring positive changes to the lives of our citizens, be it in health, environment, electronics or any other field. In the energy generation challenge, where conventional fuel resources cannot remain the dominant energy source given the increasing consumption demand and CO2 emissions, alternative renewable energy sources based on new technologies have to be promoted. Innovative solar cell technologies that use nanostructured materials and composite systems, such as organic photovoltaics, offer great technological potential due to attractive properties such as the potential for large-scale, low-cost roll-to-roll manufacturing processes.


2017 ◽  
Vol 44 (2) ◽  
pp. 203-229 ◽  
Author(s):  
Javier D Fernández ◽  
Miguel A Martínez-Prieto ◽  
Pablo de la Fuente Redondo ◽  
Claudio Gutiérrez

The publication of semantic web data, commonly represented in Resource Description Framework (RDF), has experienced outstanding growth over the last few years. Data from all fields of knowledge are shared publicly and interconnected in active initiatives such as Linked Open Data. However, despite the increasing availability of applications managing large-scale RDF information such as RDF stores and reasoning tools, little attention has been given to the structural features emerging in real-world RDF data. Our work addresses this issue by proposing specific metrics to characterise RDF data. We specifically focus on revealing the redundancy of each data set, as well as common structural patterns. We evaluate the proposed metrics on several data sets, which cover a wide range of designs and models. Our findings provide a basis for more efficient RDF data structures, indexes and compressors.


Author(s):  
Zongmin Ma ◽  
Li Yan

The resource description framework (RDF) is a model for representing information resources on the web. With the widespread acceptance of RDF as the de facto standard recommended by W3C (World Wide Web Consortium) for the representation and exchange of information on the web, a huge amount of RDF data is proliferating and becoming available. RDF data management is therefore of increasing importance and has attracted attention in the database community as well as the Semantic Web community. Much work has been devoted to proposing different solutions for storing large-scale RDF data efficiently. In order to manage massive RDF data, NoSQL (not only SQL) databases have been used for scalable RDF data storage. This chapter focuses on using various NoSQL databases to store massive RDF data. An up-to-date overview of the current state of the art in RDF data storage in NoSQL databases is provided. The chapter also aims to offer suggestions for future research.


Author(s):  
Zongmin Ma ◽  
Li Yan

The Resource Description Framework (RDF) is a model for representing information resources on the Web. With the widespread acceptance of RDF as the de facto standard recommended by W3C (World Wide Web Consortium) for the representation and exchange of information on the Web, a huge amount of RDF data is proliferating and becoming available. RDF data management is therefore of increasing importance and has attracted attention in the database community as well as the Semantic Web community. Much work has been devoted to proposing different solutions for storing large-scale RDF data efficiently. In order to manage massive RDF data, NoSQL (“not only SQL”) databases have been used for scalable RDF data storage. This chapter focuses on using various NoSQL databases to store massive RDF data. An up-to-date overview of the current state of the art in RDF data storage in NoSQL databases is provided. The chapter also aims to offer suggestions for future research.


Author(s):  
Aatif Ahmad Khan ◽  
Sanjay Kumar Malik

Semantic search refers to a set of approaches that use Semantic Web technologies for information retrieval in order to make the process machine-understandable and fetch precise results. Knowledge bases (KBs) act as the backbone of semantic search approaches, providing machine-interpretable information for query processing and retrieval of results. These KBs include Resource Description Framework (RDF) datasets and populated ontologies. This paper presents an assessment of the largest cross-domain KBs that are exploited in large-scale semantic search and are freely available on the Linked Open Data Cloud. Analysis of these datasets is a prerequisite for modeling effective semantic search approaches because it determines their suitability for particular applications. Only large-scale, cross-domain datasets with more than 10 million RDF triples are considered. A survey of dataset sizes in triple counts is presented along with the triple data format(s) each dataset supports, which is significant for developing effective semantic search models.


2018 ◽  
Vol 231 ◽  
pp. 05003 ◽  
Author(s):  
Arkadiusz Matysiak ◽  
Paula Razin

The article presents an analysis of the performance of vehicles equipped with automated driving systems (ADS) that were tested in real-life road conditions from 2015 to 2017 in the state of California. It aims to assess the impact that continuous technological advancements in driving automation might have on road safety, based on the first large-scale, real-life test deployments. Vehicle manufacturers and other stakeholders testing highly automated vehicles in California are obliged to issue yearly reports, which provide insight into the test scale as well as the technology maturity. The so-called 'disengagement reports' highlight the range and number of control takeovers between the ADS and the driver, which are made based either on the driver's decision or on information provided by the vehicle itself. The analysis of these reports made it possible to investigate the development of automated driving technology throughout the years of tests, as well as the direct or indirect influence of external factors (e.g. various weather conditions) on ADS performance. The results show that there is still a significant gap in reliability and safety between human drivers and highly automated vehicles, although it has been steadily decreasing due to technology advancements made while driving in the specific infrastructure and traffic conditions of California.


Publications ◽  
2019 ◽  
Vol 7 (2) ◽  
pp. 38 ◽  
Author(s):  
Lyubomir Penev ◽  
Mariya Dimitrova ◽  
Viktor Senderov ◽  
Georgi Zhelezov ◽  
Teodor Georgiev ◽  
...  

Hundreds of years of biodiversity research have resulted in the accumulation of a substantial pool of communal knowledge; however, most of it is stored in silos isolated from each other, such as published articles or monographs. The need for a system to store and manage collective biodiversity knowledge in a community-agreed and interoperable open format has evolved into the concept of the Open Biodiversity Knowledge Management System (OBKMS). This paper presents OpenBiodiv: An OBKMS that utilizes semantic publishing workflows, text and data mining, common standards, ontology modelling and graph database technologies to establish a robust infrastructure for managing biodiversity knowledge. It is presented as a Linked Open Dataset generated from scientific literature. OpenBiodiv encompasses data extracted from more than 5000 scholarly articles published by Pensoft and many more taxonomic treatments extracted by Plazi from journals of other publishers. The data from both sources are converted to Resource Description Framework (RDF) and integrated in a graph database using the OpenBiodiv-O ontology and an RDF version of the Global Biodiversity Information Facility (GBIF) taxonomic backbone. Through the application of semantic technologies, the project showcases the value of open publishing of Findable, Accessible, Interoperable, Reusable (FAIR) data towards the establishment of open science practices in the biodiversity domain.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Nils Kohn ◽  
Jan Heidkamp ◽  
Guillén Fernández ◽  
Jurgen Fütterer ◽  
Indira Tendolkar

People often experience a high level of distress during invasive interventions, which may exceed their coping abilities. This may be particularly evident when they are confronted with the suspicion of cancer. Taking the example of prostate biopsy sampling, we aimed to investigate the impact of an MRI-guided prostate biopsy on the acute stress response and its mechanistic basis. We recruited 20 men with a clinical suspicion of prostate cancer. Immediately before an MRI-guided biopsy procedure, we conducted fMRI in the same scanner to assess resting-state brain connectivity. Physiological and hormonal stress measures were taken during the procedure and associated with questionnaires, hair cortisol levels and brain measures to elucidate mechanistic factors for elevated stress. As expected, patients reported a stress-related change in affect. Decreased positive affect was associated with higher hair but not saliva cortisol concentration. Stronger use of maladaptive emotion regulation techniques, elevated depression scores and higher within-salience-network connectivity were associated with a stronger increase in negative affect and/or decrease in positive affect during the procedure. While limited in its generalizability due to age, sample size and gender, our proof-of-concept study demonstrates the utility of real-life stressors and large-scale brain network measures in stress regulation research, with potential impact on clinical practice.

