Towards Flexible Retrieval, Integration and Analysis of JSON Data Sets through Fuzzy Sets: A Case Study

Information ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 258
Author(s):  
Paolo Fosci ◽  
Giuseppe Psaila

How can analysts exploit the incredible variety of JSON data sets currently available on the Internet, for example, on Open Data portals? The traditional approach would require downloading them from the portals, storing them in some JSON document store and integrating them within that store. However, once the data are integrated, the lack of a query language with flexible querying capabilities could prevent analysts from successfully completing their analysis. In this paper, we show how the J-CO Framework, a novel framework that we developed at the University of Bergamo (Italy) to manage large collections of JSON documents, is a unique and innovative tool that provides analysts with querying capabilities based on fuzzy sets over JSON data sets. Its query language, called J-CO-QL, is continuously evolving to broaden its potential applications; the most recent extensions allow analysts to retrieve data sets directly from web portals and introduce constructs that apply fuzzy set theory to JSON documents, so that imprecise queries can be expressed by means of flexible soft conditions. This paper presents a practical case study in which real data sets are retrieved, integrated and analyzed to show the unique and innovative capabilities of the J-CO Framework.
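The fuzzy-querying idea described in the abstract can be illustrated outside the framework itself. The sketch below is not J-CO-QL syntax and not the authors' implementation; it is a minimal, hypothetical Python illustration of how a soft condition (a fuzzy membership function over a JSON field) ranks documents by degree of satisfaction instead of filtering them with a crisp predicate. All field names and values are invented.

```python
import json

def close_to(value, target, tolerance):
    """Triangular membership: 1.0 at the target, decreasing linearly to 0.0
    at +/- tolerance. Acts as a 'soft condition' instead of a crisp filter."""
    distance = abs(value - target)
    return max(0.0, 1.0 - distance / tolerance)

# Hypothetical JSON documents, e.g. retrieved from an Open Data portal.
documents = [
    {"station": "A", "pm10": 18.0},
    {"station": "B", "pm10": 42.0},
    {"station": "C", "pm10": 55.0},
]

# Soft condition: "pm10 close to 40", with tolerance 20.
for doc in documents:
    doc["membership"] = close_to(doc["pm10"], target=40.0, tolerance=20.0)

# Rank documents by membership degree rather than discarding them outright.
ranked = sorted(documents, key=lambda d: d["membership"], reverse=True)
print(json.dumps(ranked, indent=2))
```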

Electronics ◽  
2021 ◽  
Vol 10 (5) ◽  
pp. 621
Author(s):  
Giuseppe Psaila ◽  
Paolo Fosci

Internet technology and mobile technology have enabled the production and diffusion of massive data sets concerning almost every aspect of day-by-day life. Remarkable examples are social media and apps for volunteered information production, as well as Open Data portals on which public administrations publish authoritative and (often) geo-referenced data sets. In this context, JSON has become the most popular standard for representing and exchanging possibly geo-referenced data sets over the Internet. Analysts wishing to manage, integrate and cross-analyze such data sets need a framework that allows them to access possibly remote storage systems for JSON data sets and to retrieve and query data sets by means of a unique query language (independent of the specific storage technology), exploiting possibly remote computational resources (such as cloud servers) while comfortably working on the PC in their office, more or less unaware of the real location of the resources. In this paper, we present the current state of the J-CO Framework, a platform-independent and analyst-oriented software framework to manipulate and cross-analyze possibly geo-tagged JSON data sets. The paper presents the general approach behind the J-CO Framework, illustrating the query language by means of a simple, yet non-trivial, example of geographical cross-analysis. The paper also presents the novel features introduced by the re-engineered version of the execution engine and the most recent components, i.e., the storage service for large single JSON documents and the user interface that allows analysts to comfortably share data sets and computational resources with other analysts, possibly working in different parts of the world. Finally, the paper reports the results of an experimental campaign, which show that the execution engine performs in a more than satisfactory way, proving that our framework can actually be used by analysts to process JSON data sets.
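The abstract mentions a geographical cross-analysis of geo-tagged JSON data sets. The snippet below is not J-CO-QL and not the paper's example; it is a small, hypothetical Python sketch of the kind of operation meant here: a distance-based join between two geo-tagged JSON collections. All names and coordinates are invented.

```python
import json
from math import radians, sin, cos, asin, sqrt

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

# Two hypothetical geo-tagged JSON collections.
stations = [{"name": "S1", "lon": 9.67, "lat": 45.70},
            {"name": "S2", "lon": 9.75, "lat": 45.65}]
schools = [{"name": "School A", "lon": 9.68, "lat": 45.69},
           {"name": "School B", "lon": 9.90, "lat": 45.80}]

# Cross-analysis: pair each school with the stations within 3 km of it.
pairs = [
    {"school": sc["name"], "station": st["name"],
     "distance_km": round(haversine_km(sc["lon"], sc["lat"], st["lon"], st["lat"]), 2)}
    for sc in schools for st in stations
    if haversine_km(sc["lon"], sc["lat"], st["lon"], st["lat"]) <= 3.0
]
print(json.dumps(pairs, indent=2))
```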


Author(s):  
Khayra Bencherif ◽  
Mimoun Malki ◽  
Djamel Amar Bensaber

This article describes how the Linked Open Data Cloud project allows data providers to publish structured data on the web according to the Linked Data principles. In this context, several link discovery frameworks have been developed for connecting entities contained in knowledge bases. In order to achieve high effectiveness for the link discovery task, a suitable link configuration is required to specify the similarity conditions. Unfortunately, such configurations are specified manually, which makes the link discovery task tedious and more difficult for the users. In this article, the authors address this drawback by proposing a novel approach for the automatic determination of link specifications. The proposed approach is based on a neural network model that combines a set of existing metrics into a compound one. The authors evaluate the effectiveness of the proposed approach in three experiments using real data sets from the LOD Cloud. In addition, the proposed approach is compared against existing link specification approaches to show that it outperforms them in most experiments.
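The core idea named in the abstract, combining several existing similarity metrics into a compound one with a neural network, can be sketched in a few lines. The code below is not the authors' model or data; it is a minimal, hypothetical illustration using a tiny one-hidden-layer network trained with plain NumPy on invented metric scores for candidate entity pairs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: each row holds similarity scores produced by
# existing metrics (e.g. edit-distance, Jaccard, trigram similarity) for a
# candidate entity pair; y says whether the pair is a correct link.
X = rng.random((200, 3))
y = (0.5 * X[:, 0] + 0.3 * X[:, 1] + 0.2 * X[:, 2] > 0.55).astype(float)

# One hidden layer that learns a compound similarity measure.
W1, b1 = rng.normal(scale=0.5, size=(3, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.5, size=(8, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(2000):                      # plain gradient descent
    h = np.tanh(X @ W1 + b1)               # hidden representation
    p = sigmoid(h @ W2 + b2).ravel()       # compound similarity in [0, 1]
    grad_out = (p - y)[:, None] / len(y)   # cross-entropy gradient at the output
    grad_h = grad_out @ W2.T * (1 - h ** 2)
    W2 -= h.T @ grad_out
    b2 -= grad_out.sum(0)
    W1 -= X.T @ grad_h
    b1 -= grad_h.sum(0)

# Accept a candidate link when the compound similarity exceeds a threshold.
pred = sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2).ravel() > 0.5
print("training accuracy:", (pred == y.astype(bool)).mean())
```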


2019 ◽  
Author(s):  
Benedict C Jones ◽  
Lisa Marie DeBruine ◽  
Urszula M. Marcinkowska

Secondary data analyses (analyses of open data from published studies) can play a critical role in hypothesis generation and in maximizing the contribution of collected data to the accumulation of scientific knowledge. However, assessing the evidentiary value of results from secondary data analyses is often challenging because analytical decisions can be biased by knowledge of the results of (and analytical choices made in) the original study and by unacknowledged exploratory analyses of open data sets (Scott & Kline, 2019; Weston, Ritchie, Rohrer, & Przybylski, 2018). Using the secondary data analyses reported by Gangestad et al. (this issue) as a case study, we outline several approaches that, if implemented, would allow readers to assess the evidentiary value of results from secondary data analyses with greater confidence.


Author(s):  
R. E. Abd EL-Kader ◽  
A. M. Abd AL-Fattah ◽  
G. R. AL-Dayian ◽  
A. A. EL-Helbawy

Statistical prediction is one of the most important problems in life testing; it has been applied in medicine, engineering, business and other areas. In this paper, the exponentiated generalized xgamma distribution is introduced as an application of the exponentiated generalized general class of distributions. Bayesian point and interval prediction for the exponentiated generalized xgamma distribution based on dual generalized order statistics are considered. All results are specialized to lower records. The results are verified using a simulation study as well as applications to real data sets that demonstrate the flexibility and potential applications of the distribution.
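For orientation, the construction named in the abstract composes the exponentiated generalized class with the xgamma baseline. The block below is a sketch under the standard textbook definitions of both ingredients; the authors' exact parametrisation is not given in the abstract and may differ.

```latex
% Exponentiated generalized (EG) class applied to the xgamma baseline
% (standard definitions; the authors' parametrisation may differ).
\[
  F_{EG}(x) \;=\; \bigl\{\,1 - \bigl[\,1 - G(x)\,\bigr]^{a}\,\bigr\}^{b},
  \qquad a, b > 0,
\]
% xgamma baseline CDF and pdf:
\[
  G(x;\theta) \;=\; 1 - \frac{1 + \theta + \theta x + \tfrac{\theta^{2}x^{2}}{2}}{1+\theta}\,
                    e^{-\theta x},
  \qquad
  g(x;\theta) \;=\; \frac{\theta^{2}}{1+\theta}
                    \Bigl(1 + \tfrac{\theta}{2}x^{2}\Bigr) e^{-\theta x},
  \qquad x > 0,\ \theta > 0.
\]
```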


2021 ◽  
Author(s):  
Anna Laurinavichyute ◽  
Shravan Vasishth

In 2019 the Journal of Memory and Language instituted an open data and code policy; this policy requires that, as a rule, code and data be released at the latest upon publication. Does this policy lead to reproducible results? We examined 57 papers published between 2019 and 2021 and asked whether they were reproducible, in the sense that it should be possible to regenerate the published summary statistics from the released data and, where provided, the released code. We found that for 10 of the 57 papers, the data sets were inaccessible; 29 of the remaining 47 papers provided code, of which 16 were reproducible. Of the 18 papers that did not provide code, one was reproducible. Overall, the reproducibility rate was about 30%. This estimate is similar to the ones reported for psychology, economics, and other areas, but it is probably possible to do better. We provide some suggestions on how reproducibility can be improved in future work.
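The headline rate follows from the counts given in the abstract; the short check below simply restates that arithmetic, under the assumption that the denominator is all 57 examined papers (papers with inaccessible data counting as not reproducible).

```python
# Counts taken from the abstract; the ~30% figure assumes the denominator is
# all 57 examined papers, with the 10 inaccessible-data papers counted as
# not reproducible.
total_papers = 57
reproducible_with_code = 16      # out of 29 papers that provided code
reproducible_without_code = 1    # out of 18 papers that did not provide code

rate = (reproducible_with_code + reproducible_without_code) / total_papers
print(f"reproducibility rate: {rate:.1%}")   # -> 29.8%, i.e. about 30%
```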


2003 ◽  
pp. 88-120
Author(s):  
Tanguy Chateau ◽  
Cecile Leroy ◽  
Johanna W. Rahayu ◽  
David Taniar

The combined use of object-relational databases and Web technologies has only recently begun to emerge. This chapter discusses a practical realization of an application using this technology. The aim is to show readers how to construct a full application, from a design using object-oriented features up to the implementation. In this chapter, we highlight important or difficult stages, with an emphasis on the mapping of the object design into Oracle 8i and the use of stored procedures with Oracle 8i's extended features for object manipulation. This enables developers to construct professional Web applications that achieve high modularity and evolution capacity with an accelerated development phase, in comparison with the traditional approach.


Author(s):  
John S. Miller

Geographic information systems (GIS) have been used for more than a decade to display crash sites. Now that their novelty has worn off, the traffic records community should ask what additional analytical benefits GIS can uniquely provide. A literature review illustrates that safety-related GIS analytical capabilities surpass the common practice of producing crash location pin maps. Additional useful GIS techniques include using grid-based modeling; producing increasingly accurate collision diagrams; verifying disparate sources of crash data; applying spatially based statistical applications; examining crash location patterns for causal factors; aligning public opinion with real data; and improving routing capabilities for pedestrians, bicyclists, and hazardous material carriers. This research explores GIS analytic capabilities that can be practically applied to crash data evaluation. A case study demonstrates that despite limited budgets and imperfect data, GIS can still help to identify potential crash countermeasures. Using a typical non-GIS source of crash data (a software package that records crashes at either an intersection or a midblock location), it was possible to place approximately 82 percent of crash locations within a GIS. When private property crashes were excluded from this process, the placement rate climbed to an estimated 94 percent for intersections and 87 percent for midblock locations. By focusing the case study on understanding the practical spatial analytical capabilities rather than merely the mechanics of specific GIS software, one can eventually take advantage of the more extensive crash and roadway data sets that will become available. Despite its many capabilities, however, GIS has not eliminated the need for a comprehensive safety analysis framework integrating the spatial and statistical queries necessary for engineering, enforcement, or educational improvements.


2021 ◽  
Vol 9 (1) ◽  
pp. 62-81
Author(s):  
Kjersti Aas ◽  
Thomas Nagler ◽  
Martin Jullum ◽  
Anders Løland

In this paper the goal is to explain predictions from complex machine learning models. One method that has become very popular during the last few years is Shapley values. The original development of Shapley values for prediction explanation relied on the assumption that the features are independent. If the features in reality are dependent, this may lead to incorrect explanations. Hence, there have recently been attempts at appropriately modelling/estimating the dependence between the features. Although the previously proposed methods clearly outperform the traditional approach of assuming independence, they have their weaknesses. In this paper we propose two new approaches for modelling the dependence between the features. Both approaches are based on vine copulas, which are flexible tools for modelling multivariate non-Gaussian distributions and are able to characterise a wide range of complex dependencies. The performance of the proposed methods is evaluated on simulated data sets and a real data set. The experiments demonstrate that the vine copula approaches give more accurate approximations to the true Shapley values than their competitors.
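For reference, the Shapley value of a feature is the usual game-theoretic weighted average of marginal contributions; estimating the value function v(S) under dependent features is where approaches such as the vine copula modelling described above enter. The block below gives the standard definitions only, not the authors' specific estimator.

```latex
% Standard Shapley value for feature i, with value function v over
% feature subsets S of the full feature set M (|M| = p).
\[
  \phi_i \;=\; \sum_{S \subseteq M \setminus \{i\}}
    \frac{|S|!\,\bigl(p - |S| - 1\bigr)!}{p!}
    \bigl( v(S \cup \{i\}) - v(S) \bigr).
\]
% For prediction explanation, v(S) is typically the conditional expectation
% of the model output f given the observed values of the features in S:
\[
  v(S) \;=\; \mathbb{E}\bigl[\, f(\mathbf{x}) \mid \mathbf{x}_S = \mathbf{x}_S^{*} \,\bigr].
\]
```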


Author(s):  
Álvaro Lozano Murciego ◽  
Gabriel Villarrubia González ◽  
Alberto López Barriuso ◽  
Daniel Hernández De La Iglesia ◽  
Jorge Revuelta Herrero

In this paper, we present a new multi-agent-based system to gather waste in cities and villages. We have developed a low-cost wireless sensor prototype to measure the fill level of the containers. Furthermore, a routing system is developed to optimize the routes of the trucks, and a mobile application has been developed to assist drivers during their working day. In order to evaluate and validate the proposed system, a practical case study in a real city environment is modeled using available open data, with the purpose of identifying the limitations of the system.
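The routing step such a system performs can be sketched compactly. The code below is not the authors' system; it is a minimal, hypothetical Python illustration that collects only the containers whose reported fill level exceeds a threshold and orders them with a greedy nearest-neighbour heuristic. All container positions and readings are invented.

```python
from math import hypot

# Hypothetical container readings: (x, y) position in km and fill level in %.
containers = [
    {"id": "C1", "pos": (1.0, 2.0), "fill": 85},
    {"id": "C2", "pos": (4.0, 1.0), "fill": 30},
    {"id": "C3", "pos": (3.0, 3.0), "fill": 95},
    {"id": "C4", "pos": (0.5, 4.0), "fill": 70},
]
DEPOT = (0.0, 0.0)
THRESHOLD = 60  # only visit containers reported as more than 60% full

def nearest_neighbour_route(depot, stops):
    """Greedy heuristic: repeatedly drive to the closest unvisited container."""
    route, current, remaining = [], depot, list(stops)
    while remaining:
        nxt = min(remaining,
                  key=lambda c: hypot(c["pos"][0] - current[0],
                                      c["pos"][1] - current[1]))
        route.append(nxt["id"])
        current = nxt["pos"]
        remaining.remove(nxt)
    return route

to_visit = [c for c in containers if c["fill"] >= THRESHOLD]
print("truck route:", nearest_neighbour_route(DEPOT, to_visit))
```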


2020 ◽  
Vol 21 (2) ◽  
Author(s):  
Bogumiła Hnatkowska ◽  
Zbigniew Huzar ◽  
Lech Tuzinkiewicz

A conceptual model is a high-level, graphical representation of a specific domain, presenting its key concepts and relationships between them. In particular, these dependencies can be inferred from concepts' instances being a part of big raw data files. The paper aims to propose a method for constructing a conceptual model from data frames encompassed in data files. The result is presented in the form of a class diagram. The method is explained with several examples and verified by a case study in which the real data sets are processed. It can also be applied for checking the quality of the data set.
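The paper's method is not reproduced here; the sketch below only illustrates, hypothetically, the first step such a method might take: deriving a candidate class with typed attributes from a data frame, in a textual form that could later be rendered as a class diagram. Column names, data and the type mapping are all invented.

```python
import pandas as pd

# Hypothetical raw data file content loaded into a data frame.
df = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer": ["Ada", "Bob", "Cleo"],
    "total": [19.99, 5.00, 42.50],
    "shipped": [True, False, True],
})

# Map pandas dtypes to UML-like attribute types.
TYPE_MAP = {"int64": "Integer", "float64": "Real", "bool": "Boolean", "object": "String"}

def class_from_frame(name: str, frame: pd.DataFrame) -> str:
    """Render a candidate class (name + typed attributes) as PlantUML-like text."""
    lines = [f"class {name} {{"]
    for column, dtype in frame.dtypes.items():
        lines.append(f"  {column} : {TYPE_MAP.get(str(dtype), 'String')}")
    lines.append("}")
    return "\n".join(lines)

print(class_from_frame("Order", df))
```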

