Open Data Availability and Suitability for Financial Analyses

Data are a key resource for modern societies and are expected to improve the quality, accessibility, affordability, safety, and equity of health care. Dental care and research are currently transforming into what we term data dentistry, with 3 main applications: 1) Medical data analysis uses deep learning, allowing one to master unprecedented amounts of data (language, speech, imagery) and put them to productive use. 2) Data-enriched clinical care integrates data from the individual (e.g., demographic, social, clinical and omics data, consumer data), setting (e.g., geospatial, environmental, provider-related data), and systems level (payer or regulatory data to characterize input, throughput, output, and outcomes of health care) to provide a comprehensive and continuous real-time assessment of biologic perturbations, individual behaviors, and context. Such care may contribute to a deeper understanding of health and disease and to more precise, personalized, predictive, and preventive care. 3) Data for research include open research data and data sharing, allowing one to appraise, benchmark, pool, replicate, and reuse data. Concerns about and confidence in data-driven applications, stakeholders’ and systems’ capabilities, and a lack of data standardization and harmonization currently limit the development and implementation of data dentistry. Aspects of bias and data-user interaction require attention. Action items for the dental community center on increasing data availability, refinement, and usage; demonstrating safety, value, and usefulness of applications; educating the dental workforce and consumers; providing performant and standardized infrastructure and processes; and incentivizing and adopting open data and data sharing.

The Open Data Challenge: An Analysis of 124,000 Data Availability Statements and an Ironic Lesson about Data Management Plans

Data Intelligence ◽

10.1162/dint_a_00061 ◽

2020 ◽

Vol 2 (4) ◽

pp. 554-568

Author(s):

Chris Graf ◽

Dave Flanagan ◽

Lisa Wylie ◽

Deirdre Silver

Keyword(s):

Machine Learning ◽

Data Management ◽

Open Data ◽

Research Data ◽

Data Availability ◽

Policy Changes ◽

Management Plans ◽

Trends Over Time ◽

New Research ◽

Over Time

Data availability statements can provide useful information about how researchers actually share research data. We used unsupervised machine learning to analyze 124,000 data availability statements submitted by research authors to 176 Wiley journals between 2013 and 2019. We categorized the data availability statements, and looked at trends over time. We found expected increases in the number of data availability statements submitted over time, and marked increases that correlate with policy changes made by journals. Our open data challenge becomes to use what we have learned to present researchers with relevant and easy options that help them to share and make an impact with new research data.
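The abstract above does not say which unsupervised method was used, so the following is only an illustrative sketch of the general idea: grouping data availability statements by wording without predefined labels. It uses simple leader clustering over Jaccard token overlap, which is a stand-in technique and not the authors' method. The sample statements, the function names, and the 0.4 threshold are all hypothetical.

```python
import re

# Hypothetical example statements; not taken from the study's data.
STATEMENTS = [
    "Data are available from the corresponding author upon reasonable request.",
    "The data are available from the corresponding author on request.",
    "Data are openly available in a public repository at a DOI.",
    "The data are openly available in a public repository.",
    "Data sharing is not applicable to this article.",
]


def tokens(text):
    """Return the set of lowercase alphabetic words in a statement."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def leader_cluster(statements, threshold=0.4):
    """Greedy leader clustering: each statement joins the first cluster
    whose leader (first member) is similar enough, else starts a new one."""
    leaders = []   # token set of each cluster's first statement
    clusters = []  # list of lists of statements
    for statement in statements:
        toks = tokens(statement)
        for i, leader in enumerate(leaders):
            if jaccard(toks, leader) >= threshold:
                clusters[i].append(statement)
                break
        else:
            leaders.append(toks)
            clusters.append([statement])
    return clusters


if __name__ == "__main__":
    for number, group in enumerate(leader_cluster(STATEMENTS), start=1):
        print(f"Cluster {number}: {len(group)} statement(s)")
        for statement in group:
            print("   ", statement)
```

On the five sample statements this yields three groups ("on request", "public repository", and "not applicable"). The result of leader clustering depends on input order and on the threshold, so a real analysis of this size would likely need a more robust method.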

Open Data Policies among Library and Information Science Journals

Publications ◽

10.3390/publications9020025 ◽

2021 ◽

Vol 9 (2) ◽

pp. 25

Author(s):

Brian Jackson

Keyword(s):

Information Science ◽

Open Data ◽

Research Data ◽

Data Availability ◽

Academic Publishing ◽

Library And Information Science ◽

Data Archiving ◽

Open Research ◽

Open Access Journals ◽

Public Data

Journal publishers play an important role in the open research data ecosystem. Through open data policies that include public data archiving mandates and data availability statements, journal publishers help promote transparency in research and wider access to a growing scholarly record. The library and information science (LIS) discipline has a unique relationship with both open data initiatives and academic publishing and may be well-positioned to adopt rigorous open data policies. This study examines the information provided on public-facing websites of LIS journals in order to describe the extent, and nature, of open data guidance provided to prospective authors. Open access journals in the discipline have disproportionately adopted detailed, strict open data policies. Commercial publishers, which account for the largest share of publishing in the discipline, have largely adopted weaker policies. Rigorous policies, adopted by a minority of journals, describe the rationale, application, and expectations for open research data, while most journals that provide guidance on the matter use hesitant and vague language. Recommendations are provided for strengthening journal open data policies.

Open Data and Open Access Articles: Exploring Connections in the Life Sciences

Journal of eScience Librarianship ◽

10.7191/jeslib.2020.1184 ◽

2020 ◽

Vol 9 (1) ◽

Author(s):

Sarah Williams

Keyword(s):

Open Access ◽

Life Sciences ◽

Open Data ◽

Data Bank ◽

Data Availability ◽

Research Articles ◽

Future Research ◽

Small Scale ◽

Data Repositories ◽

Current State

Objectives: This small-scale study explores the current state of connections between open data and open access (OA) articles in the life sciences. Methods: This study involved 44 openly available life sciences datasets from the Illinois Data Bank that had 45 related research articles. For each article, I gathered the OA status of the journal and the article on the publisher website and checked whether the article was openly available via Unpaywall and ResearchGate. I also examined how and where the open data was included in the HTML and PDF versions of the related articles. Results: Of the 45 articles studied, fewer than half were published in Gold/Full OA journals, and while the remaining articles were published in Gold/Hybrid journals, none of them were OA. This study found that OA articles pointed to the Illinois Data Bank datasets similarly to all of the related articles, most commonly with a data availability statement containing a DOI. Conclusions: The findings indicate that Gold OA in hybrid journals does not appear to be a popular option, even for articles connected to open data, and this study emphasizes the importance of data repositories providing DOIs, since the related articles frequently used DOIs to point to the Illinois Data Bank datasets. This study also revealed concerns about free (not licensed OA) access to articles on publisher websites, which will be a significant topic for future research.

A descriptive analysis of the data availability statements accompanying medRxiv preprints and a comparison with their published counterparts

PLoS ONE ◽

10.1371/journal.pone.0250887 ◽

2021 ◽

Vol 16 (5) ◽

pp. e0250887

Author(s):

Luke A. McGuinness ◽

Athena L. Sheppard

Keyword(s):

Data Sharing ◽

Descriptive Analysis ◽

Open Data ◽

System Change ◽

Research Data ◽

Data Availability ◽

Published Data ◽

Editorial Policies ◽

Journal Editors ◽

Closed Data

Objective: To determine whether medRxiv data availability statements describe open or closed data (that is, whether the data used in the study are openly available without restriction) and to examine if this changes on publication based on journal data-sharing policy. Additionally, to examine whether data availability statements are sufficient to capture code availability declarations. Design: Observational study, following a pre-registered protocol, of preprints posted on the medRxiv repository between 25th June 2019 and 1st May 2020 and their published counterparts. Main outcome measures: Distribution of preprinted data availability statements across nine categories, determined by a prespecified classification system. Change in the percentage of data availability statements describing open data between the preprinted and published versions of the same record, stratified by journal sharing policy. Number of code availability declarations reported in the full-text preprint which were not captured in the corresponding data availability statement. Results: 3938 medRxiv preprints with an applicable data availability statement were included in our sample, of which 911 (23.1%) were categorized as describing open data. 379 (9.6%) preprints were subsequently published, and of these published articles, only 155 contained an applicable data availability statement. Similar to the preprint stage, a minority (59 (38.1%)) of these published data availability statements described open data. Of the 151 records eligible for the comparison between preprinted and published stages, 57 (37.7%) were published in journals which mandated open data sharing. Data availability statements more frequently described open data on publication when the journal mandated data sharing (open at preprint: 33.3%, open at publication: 61.4%) compared to when the journal did not mandate data sharing (open at preprint: 20.2%, open at publication: 22.3%). Conclusion: Requiring that authors submit a data availability statement is a good first step, but is insufficient to ensure data availability. Strict editorial policies that mandate data sharing (where appropriate) as a condition of publication appear to be effective in making research data available. We would strongly encourage all journal editors to examine whether their data availability policies are sufficiently stringent and consistently enforced.
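The percentages reported in the abstract above can be re-derived from the stated counts. The short sketch below does that arithmetic and also computes the percentage-point change in open data between preprint and publication for each journal policy group. The counts and percentages come from the abstract; the function and variable names are illustrative assumptions.

```python
def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Counts as stated in the abstract.
PREPRINTS_INCLUDED = 3938
PREPRINTS_OPEN = 911
PREPRINTS_PUBLISHED = 379
PUBLISHED_WITH_STATEMENT = 155
PUBLISHED_OPEN = 59
ELIGIBLE_FOR_COMPARISON = 151
IN_MANDATING_JOURNALS = 57

# Percent open at (preprint, publication), as stated in the abstract.
OPEN_BY_POLICY = {
    "mandated": (33.3, 61.4),
    "not mandated": (20.2, 22.3),
}


def change_in_points(before, after):
    """Percentage-point change, rounded to avoid floating-point noise."""
    return round(after - before, 1)


if __name__ == "__main__":
    print("Open at preprint:", pct(PREPRINTS_OPEN, PREPRINTS_INCLUDED), "%")
    print("Published:", pct(PREPRINTS_PUBLISHED, PREPRINTS_INCLUDED), "%")
    print("Open when published:", pct(PUBLISHED_OPEN, PUBLISHED_WITH_STATEMENT), "%")
    print("In mandating journals:", pct(IN_MANDATING_JOURNALS, ELIGIBLE_FOR_COMPARISON), "%")
    for policy, (before, after) in OPEN_BY_POLICY.items():
        print(f"Change ({policy}):", change_in_points(before, after), "points")
```

The computed values match the reported ones (23.1%, 9.6%, 38.1%, 37.7%). The change in open data is 28.1 percentage points where sharing was mandated and 2.1 points where it was not.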

Loch Prospector: Metadata Visualization for Lakes of Open Data

10.31219/osf.io/2s76d ◽

2020 ◽

Author(s):

Neha Makhija ◽

Mansi Jain ◽

Nikolaos Tziavelis ◽

Laura Di Rocco ◽

Sara Di Bartolomeo ◽

...

Keyword(s):

Data Management ◽

Data Science ◽

Open Data ◽

Data Availability ◽

Great Promise ◽

Management Techniques ◽

New Challenges ◽

Integration Data ◽

New Algorithms ◽

Structural Aspects

Data lakes are an emerging storage paradigm that promotes data availability over integration. A prime example is repositories of Open Data, which show great promise for transparent data science. Due to the lack of proper integration, data lakes may not have a common consistent schema, and traditional data management techniques fall short with these repositories. Much recent research has tried to address the new challenges associated with these data lakes. Researchers in this area are mainly interested in the structural properties of the data for developing new algorithms, yet typical Open Data portals offer limited functionality in that respect and instead focus on data semantics. We propose Loch Prospector, a visualization to assist data management researchers in exploring and understanding the most crucial structural aspects of Open Data (in particular, metadata attributes) and the associated task abstraction for their work. Our visualization enables researchers to navigate the contents of data lakes effectively and easily accomplish what were previously laborious tasks. A copy of this paper with all supplemental material is available at osf.io/zkxv9

Digital preservation

Library Hi Tech ◽

10.1108/lht-07-2016-0078 ◽

2016 ◽

Vol 34 (4) ◽

pp. 733-747 ◽

Cited By ~ 12

Author(s):

Kofi Koranteng Adu ◽

Luyande Dube ◽

Emmanuel Adjei

Keyword(s):

Open Data ◽

Digital Preservation ◽

Data Availability ◽

Electronic Government ◽

Future Research ◽

Efficient System ◽

Content Type ◽

Open Archival Information System ◽

Right To Information ◽

The Right

Purpose: The purpose of this paper is to explore the extent to which digital preservation facilitates the implementation of electronic government, open data and the right to information (RTI). Design/methodology/approach: A case study that chronicles the link between transparency and data availability. It makes use of a theoretical framework based on the open archival information system to analyse, explain, clarify and justify the application of open data, electronic government and the right to information. Findings: The paper argues that e-government, open data and the RTI will remain elusive if a digital preservation infrastructure is not pursued. Within the context of e-government, the paper outlines how government agencies can incorporate e-government legislation into their digital preservation activities, precisely because the relationship between digital preservation and e-government has always been symbiotic. It notes that an obligation will be placed on all public authorities and private agencies covered by the RTI law to create, keep and organise an effective and efficient system of record keeping, so as to give meaning to the right to information when citizens apply for information. Practical implications: Future research should examine closely the implications of open government data within the context of digital preservation. Whilst digital preservation looks forward to the longevity of digital records and their accessibility, open data focusses on the utility of these records through online services, reuse and distribution for the purposes of transparency and citizens’ participation. Originality/value: The application of digital preservation to open data in this paper appears to be more relevant at a time when most governments of the world are striving to obtain data to fight poverty, achieve universal primary education, fight HIV and foster maternal health. Its originality can further be established from the symbiotic relationship between digital preservation and electronic government, open data and the right to information.

Paving the Way to Open Data

Data Intelligence ◽

10.1162/dint_a_00021 ◽

2019 ◽

Vol 1 (4) ◽

pp. 368-380 ◽

Cited By ~ 1

Author(s):

Yan Wu ◽

Elizabeth Moylan ◽

Hope Inman ◽

Chris Graf

Keyword(s):

Data Sharing ◽

Open Data ◽

Data Availability ◽

Peer Reviews ◽

Open Research ◽

Research Communities ◽

The Way

It is easy to argue that open data are critical to enabling faster and more effective research discovery. In this article, we describe the approach we have taken at Wiley to support open data and to start enabling more data to be FAIR data (Findable, Accessible, Interoperable and Reusable) with the implementation of four data policies: “Encourages”, “Expects”, “Mandates” and “Mandates and Peer Reviews Data”. We describe the rationale for these policies and levels of adoption so far. In the coming months we plan to measure and monitor the implementation of these policies via the publication of data availability statements and data citations. With this information, we'll be able to celebrate adoption of data-sharing practices by the research communities we work with and serve, and we hope to showcase researchers from those communities leading in open research.

Analysis of Open Data Availability in Czech Republic Agrarian Sector

Agris on-line Papers in Economics and Informatics ◽

10.7160/aol.2016.080306 ◽

2016 ◽

Vol 08 (03) ◽

pp. 57-67 ◽

Cited By ~ 1

Author(s):

Jan Jarolímek ◽

R. Martinec

Keyword(s):

Czech Republic ◽

Open Data ◽

Data Availability ◽

Agrarian Sector

The Italian Node of the European Integrated Data Archive

Seismological Research Letters ◽

10.1785/0220200409 ◽

2021 ◽

Vol 92 (3) ◽

pp. 1726-1737 ◽

Cited By ~ 1

Author(s):

Peter Danecek ◽

Stefano Pintore ◽

Salvatore Mazza ◽

Alfonso Mandiello ◽

Massimo Fares ◽

...

Keyword(s):

Service Management ◽

Open Data ◽

Data Availability ◽

Data Archive ◽

Waveform Data ◽

Data Archiving ◽

Seismological Data ◽

Fair Principles ◽

Key Aspects ◽

Management Capabilities

The Orfeus European Integrated Data Archive (EIDA) provides a federated approach to the dissemination of seismological waveform data and ensures access to 12 regional seismological data centers, the EIDA nodes. The Istituto Nazionale di Geofisica e Vulcanologia (INGV) is one of the founding partners of this EIDA federation and manages the EIDA data distribution node in Italy. INGV has actively managed the smaller MedNet archive since 1990 and adopted a more comprehensive and systematic approach to seismological data archiving since 2007. The Italian EIDA node data archive currently totals 90 TB of waveform data available for download, originating from 25 networks and 974 stations, provided by INGV, MedNet, or contributed by various partner institutions. Geographically, it covers mainly Italy and some stations from the Mediterranean region. The archive is currently growing at a rate of approximately 11 TB/yr. INGV recently strengthened its data management capabilities, resources, and infrastructure to effectively respond to the growing scale of station inventory, archive, and volumes of delivered data, and to acknowledge increasing attention toward open data sharing, appropriate attribution, and FAIR principles (Findability, Accessibility, Interoperability, and Reuse), as well as higher demands on data quality and expectations of the scientific user community. To this end, it established a dedicated internal unit in charge of all relevant activities related to the Italian EIDA node. In this article, we address key aspects of the EIDA node in Italy such as evolution and status of the seismological waveform archive, and we describe the technical, organizational, and operational setup of data and service management. We also outline ongoing activities and future evolutions aiming to further increase the quality of services, data availability, data and metadata quality, resilience, and sustainability.
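As a rough illustration of what the stated archive size (90 TB) and growth rate (about 11 TB/yr) imply for capacity planning, the sketch below projects the archive size forward. It assumes the growth rate stays constant, which the abstract does not claim; the function names and the 200 TB target are hypothetical.

```python
# Figures stated in the abstract.
CURRENT_SIZE_TB = 90.0
GROWTH_TB_PER_YEAR = 11.0


def projected_size_tb(years_ahead, current=CURRENT_SIZE_TB, growth=GROWTH_TB_PER_YEAR):
    """Archive size after `years_ahead` years, ASSUMING constant linear growth."""
    if years_ahead < 0:
        raise ValueError("years_ahead must be non-negative")
    return current + growth * years_ahead


def years_until(target_tb, current=CURRENT_SIZE_TB, growth=GROWTH_TB_PER_YEAR):
    """Years until the archive reaches `target_tb` under the same assumption."""
    return max(0.0, (target_tb - current) / growth)


if __name__ == "__main__":
    print("Size in 5 years (TB):", projected_size_tb(5))
    print("Years until 200 TB:", years_until(200))
```

Under this linear assumption the archive would hold 145 TB after five years and reach a hypothetical 200 TB target in ten years. Actual growth would depend on new stations and networks joining the node.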
