The (ir)reproducibility of published analyses: A case study of 57 JML articles published between 2019 and 2021

2021 ◽  
Author(s):  
Anna Laurinavichyute ◽  
Shravan Vasishth

In 2019 the Journal of Memory and Language instituted an open data and code policy; this policy requires that, as a rule, code and data be released at the latest upon publication. Does this policy lead to reproducible results? We looked at whether 57 papers published between 2019 and 2021 were reproducible, in the sense that the published summary statistics could be regenerated given the data and, when provided, the code. We found that for 10 of the 57 papers, data sets were inaccessible; 29 of the remaining 47 papers provided code, of which 16 were reproducible. Of the 18 papers that did not provide code, one was reproducible. Overall, the reproducibility rate was about 30%. This estimate is similar to those reported for psychology, economics, and other areas, but it is probably possible to do better. We provide some suggestions on how reproducibility can be improved in future work.
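Taking the counts in the abstract at face value, the overall rate can be confirmed with a quick sanity check (a sketch; the variable names are ours):

```python
total = 57
inaccessible = 10                  # papers whose data sets were inaccessible
with_code = 29                     # of the remaining 47 papers, those providing code
reproducible_with_code = 16        # reproducible among papers with code
reproducible_without_code = 1      # reproducible among the 18 papers without code

reproducible = reproducible_with_code + reproducible_without_code
rate = reproducible / total
print(f"{reproducible}/{total} = {rate:.0%}")  # 17/57 = 30%
```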

2019 ◽  
Author(s):  
Benedict C. Jones ◽  
Lisa Marie DeBruine ◽  
Urszula M. Marcinkowska

Secondary data analyses (analyses of open data from published studies) can play a critical role in hypothesis generation and in maximizing the contribution of collected data to the accumulation of scientific knowledge. However, assessing the evidentiary value of results from secondary data analyses is often challenging because analytical decisions can be biased by knowledge of the results of (and analytical choices made in) the original study and by unacknowledged exploratory analyses of open data sets (Scott & Kline, 2019; Weston, Ritchie, Rohrer, & Przybylski, 2018). Using the secondary data analyses reported by Gangestad et al. (this issue) as a case study, we outline several approaches that, if implemented, would allow readers to assess the evidentiary value of results from secondary data analyses with greater confidence.


Information ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 258
Author(s):  
Paolo Fosci ◽  
Giuseppe Psaila

How can analysts exploit the wide variety of JSON data sets currently available on the Internet, for example on Open Data portals? The traditional approach would be to download them from the portals, store them in a JSON document store, and integrate them within the document store. However, once the data are integrated, the lack of a query language with flexible querying capabilities could prevent analysts from successfully completing their analysis. In this paper, we show how the J-CO Framework, a novel framework that we developed at the University of Bergamo (Italy) to manage large collections of JSON documents, provides analysts with querying capabilities based on fuzzy sets over JSON data sets. Its query language, called J-CO-QL, is continuously evolving to widen its potential applications; the most recent extensions give analysts the capability to retrieve data sets directly from web portals, as well as constructs that apply fuzzy set theory to JSON documents and allow imprecise queries on documents by means of flexible soft conditions. The paper presents a practical case study in which real data sets are retrieved, integrated and analyzed to demonstrate the unique capabilities of the J-CO Framework.
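J-CO-QL's actual syntax is not reproduced here, but the fuzzy-set idea behind a "soft condition" can be sketched in plain Python. The field name, documents, and trapezoid breakpoints below are hypothetical, chosen only to illustrate ranking by membership degree:

```python
# Trapezoidal membership function: a standard fuzzy-set building block,
# here modelling the imprecise condition "price is affordable".
def trapezoid(x, a, b, c, d):
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

docs = [
    {"name": "flat A", "price": 90_000},
    {"name": "flat B", "price": 160_000},
    {"name": "flat C", "price": 320_000},
]

def affordable(doc):
    return trapezoid(doc["price"], 0, 0, 100_000, 250_000)

# Rank documents by membership degree instead of filtering with a crisp cut.
for doc in sorted(docs, key=affordable, reverse=True):
    print(doc["name"], round(affordable(doc), 2))
```

A soft condition yields a degree in [0, 1] rather than a boolean, so documents can be ordered by how well they satisfy an imprecise criterion instead of being discarded outright.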


2013 ◽  
Vol 9 (4) ◽  
pp. 19-43 ◽  
Author(s):  
Bo Hu ◽  
Nuno Carvalho ◽  
Takahide Matsutsuka

In light of the challenges of effectively managing Big Data, the authors are witnessing a gradual shift towards the increasingly popular Linked Open Data (LOD) paradigm. LOD aims to impose a machine-readable semantic layer over structured as well as unstructured data and hence automate some data analysis tasks over data that were not designed for computers. The convergence of Big Data and LOD is, however, not straightforward: the semantic layer of LOD and large-scale Big Data storage do not get along easily. Meanwhile, the sheer data size envisioned by Big Data rules out certain computationally expensive semantic technologies, rendering them much less efficient than they are on relatively small data sets. In this paper, the authors propose a mechanism allowing LOD to take advantage of existing large-scale data stores while sustaining its “semantic” nature. The authors demonstrate how RDF-based semantic models can be distributed across multiple storage servers and examine how a fundamental semantic operation can be tuned to meet the requirements of distributed and parallel data processing. Future work will focus on stress tests of the platform at the magnitude of tens of billions of triples, as well as comparative studies of usability and performance against similar offerings.
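One common way to distribute RDF triples across storage servers is to hash the subject, so that all triples about the same resource co-locate on one node. The minimal sketch below illustrates that idea only; the server count and routing rule are our assumptions, not necessarily the paper's placement strategy:

```python
import hashlib

SERVERS = 4  # hypothetical number of storage servers

def server_for(triple):
    """Route a triple to a server by hashing its subject, so every
    triple about a given resource lands on the same node."""
    subject, _predicate, _object = triple
    digest = hashlib.sha1(subject.encode("utf-8")).hexdigest()
    return int(digest, 16) % SERVERS

triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:bob", "foaf:name", '"Bob"'),
]

# Group the triples into per-server shards.
shards = {}
for t in triples:
    shards.setdefault(server_for(t), []).append(t)
```

Subject-hash partitioning keeps star-shaped queries (all properties of one resource) local to a single server, at the cost of shuffling data for joins that cross subjects.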


2017 ◽  
Vol 118 (7/8) ◽  
pp. 420-432 ◽  
Author(s):  
Stuti Saxena

Purpose
While “transparency-by-design” serves as the antecedent of any Open Government Data (OGD) initiative (Janssen et al., 2017), its logical objective is the extent to which data “usage” is facilitated. This paper aims to underscore the significance of, drivers of and barriers to the “usage” of data sets, conceding that re-use of data sets is one of the key objectives of any OGD initiative.

Design/methodology/approach
With a documentary analysis approach, the OGD initiative of Sri Lanka is investigated for the present purpose. Furthermore, the theoretical model of citizen engagement in OGD suggested by Sieber and Johnson (2015) is used to assess the extent to which the usage of data sets is facilitated via the OGD platform.

Findings
There are drivers as well as barriers to facilitating the usage of data sets in the Sri Lankan OGD initiative. Among the drivers are the provision for suggesting data sets and the possibility of referring to historical data sets. However, the barriers to usage far outnumber the drivers: metadata are absent from the data sets; the data sets are not updated regularly; many data sets are only historical; the formats of the data sets are limited and not user-friendly; there is no facility for data visualization or analytics; and there is no collaborative approach towards building the OGD initiative further.

Research limitations/implications
As only one case study is probed in the paper, further research is warranted to undertake a comparative approach by taking two or more case studies into consideration.

Practical implications
This study holds relevance for the Sri Lankan Government and other stakeholders (policy makers, citizens, developers and the like) so far as the furthering of user engagement in the OGD initiative is concerned.

Social implications
Facilitating more usage by citizens would increase their engagement, and they might derive value out of the data sets. At the same time, the government’s objective of ensuring increased usage of the data sets would be better realized.

Originality/value
The “transparency-by-design” approach focuses on the publishing phase of OGD; this paper seeks to provide its logical conclusion by emphasizing “usage by stakeholders”, because by opening data sets the government aims to ensure that these open data sets are used and re-used. The outcome is therefore discussed with the support of a case study set against the background of Sri Lanka’s Open Data initiative. Besides, this is the first study to probe the OGD initiative of Sri Lanka; therein lies its major contribution.


Author(s):  
Edgar Meij ◽  
Marc Bron ◽  
Laura Hollink ◽  
Bouke Huurnink ◽  
Maarten de Rijke

Author(s):  
Harrison Togia ◽  
Oceana P. Francis ◽  
Karl Kim ◽  
Guohui Zhang

Hazards to roadways and travelers can differ drastically from region to region because they depend largely on the regional environment and climate. This paper describes the development of a qualitative method for assessing infrastructure importance and hazard exposure for rural highway segments in Hawai‘i under different conditions. Multiple indicators of roadway importance are considered, including traffic volume, population served, accessibility, connectivity, reliability, land use, and roadway connection to critical infrastructures, such as hospitals and police stations. The method of evaluating roadway hazards and importance can be tailored to fit different regional hazard scenarios. It assimilates data from diverse sources to estimate risks of disruption. A case study for Highway HI83 in Hawai‘i, which is exposed to multiple hazards, is conducted. Weakening of the road by coastal erosion, inundation from sea level rise, and rockfall hazards require adaptation solutions. By analyzing the risk of disruption to highway segments, adaptation approaches can be prioritized. Using readily available geographic information system data sets for the exposure and impacts of potential hazards, this method could be adapted not only for emergency management but also for planning, design, and engineering of resilient highways.


Sensors ◽  
2021 ◽  
Vol 21 (15) ◽  
pp. 5204
Author(s):  
Anastasija Nikiforova

Nowadays, governments launch open government data (OGD) portals that provide data that can be accessed and used by everyone for their own needs. Although the potential economic value of open (government) data is assessed in the millions and billions, not all open data are reused. Moreover, the open (government) data initiative as well as users’ intent for open (government) data are changing continuously, and today, in line with IoT and smart city trends, real-time and sensor-generated data are of higher interest to users. These “smarter” open (government) data are also considered one of the crucial drivers of the sustainable economy, and might have an impact on information and communication technology (ICT) innovation and become a creativity bridge in developing a new ecosystem in Industry 4.0 and Society 5.0. The paper inspects the OGD portals of 60 countries in order to understand how well their content corresponds to Society 5.0 expectations. It reports on the extent to which countries provide these data, focusing on open (government) data success-facilitating factors both for the portal in general and for the data sets of interest in particular. The presence of “smarter” data, their level of accessibility, availability, currency and timeliness, as well as support for users, are analyzed. A list of the most competitive countries per data category is provided. This makes it possible to understand which OGD portals react to users’ needs and to Industry 4.0 and Society 5.0 requests by opening and updating data for further potential reuse, which is essential in the digital, data-driven world.


2021 ◽  
Vol 10 (4) ◽  
pp. 251
Author(s):  
Christina Ludwig ◽  
Robert Hecht ◽  
Sven Lautenbach ◽  
Martin Schorcht ◽  
Alexander Zipf

Public urban green spaces are important for the urban quality of life. Still, comprehensive open data sets on urban green spaces are not available for most cities. As open and globally available data sets, the potential of Sentinel-2 satellite imagery and OpenStreetMap (OSM) data for urban green space mapping is high but limited due to their respective uncertainties. Sentinel-2 imagery cannot distinguish public from private green spaces and its spatial resolution of 10 m fails to capture fine-grained urban structures, while in OSM green spaces are not mapped consistently and with the same level of completeness everywhere. To address these limitations, we propose to fuse these data sets under explicit consideration of their uncertainties. The Sentinel-2 derived Normalized Difference Vegetation Index was fused with OSM data using the Dempster–Shafer theory to enhance the detection of small vegetated areas. The distinction between public and private green spaces was achieved using a Bayesian hierarchical model and OSM data. The analysis was performed based on land use parcels derived from OSM data and tested for the city of Dresden, Germany. The overall accuracy of the final map of public urban green spaces was 95% and was mainly influenced by the uncertainty of the public accessibility model.
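The Dempster–Shafer fusion step can be illustrated with a minimal two-source example over the frame {vegetated, not vegetated}; the mass values below are hypothetical and are not taken from the paper, which fuses NDVI evidence with OSM evidence in this general manner:

```python
def intersect(a, b):
    """Intersection of focal elements over the frame {"veg", "not"},
    where "any" denotes the whole frame (ignorance)."""
    if a == "any":
        return b
    if b == "any":
        return a
    return a if a == b else None  # None = empty intersection (conflict)

def combine(m1, m2):
    """Dempster's rule of combination for two mass functions."""
    masses = {}
    conflict = 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = intersect(a, b)
            if inter is None:
                conflict += ma * mb
            else:
                masses[inter] = masses.get(inter, 0.0) + ma * mb
    # Normalize by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in masses.items()}

# Hypothetical evidence: NDVI weakly supports vegetation, OSM strongly does.
ndvi = {"veg": 0.6, "not": 0.1, "any": 0.3}
osm  = {"veg": 0.8, "not": 0.0, "any": 0.2}
fused = combine(ndvi, osm)
```

Combining the two sources concentrates mass on "veg" while retaining some residual ignorance, which is the behavior exploited when weak satellite evidence for small vegetated areas is reinforced by OSM tags.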

