Has open data arrived at the British Medical Journal (BMJ)? An observational study

Objective: To quantify data sharing trends and data sharing policy compliance at the British Medical Journal (BMJ) by analysing the rate of data sharing practices, and to investigate attitudes and examine barriers towards data sharing. Design: Observational study. Setting: The BMJ research archive. Participants: 160 randomly sampled BMJ research articles from 2009 to 2015, excluding meta-analyses and systematic reviews. Main outcome measures: Percentages of research articles that indicated the availability of their raw data sets in their data sharing statements, and those that easily made their data sets available on request. Results: Three articles contained the data in the article. Of the remaining 157 articles, 50 (32%) indicated the availability of their data sets. Twelve used publicly available data and the remaining 38 were sent email requests to access their data sets. Only 1 publicly available data set could be accessed and only 6 out of 38 shared their data via email. So only 7/157 research articles shared their data sets, 4.5% (95% CI 1.8% to 9%). For 21 clinical trials bound by the BMJ data sharing policy, the proportion shared was 24% (8% to 47%). Conclusions: Despite the BMJ's strong data sharing policy, sharing rates are low. Possible explanations for low data sharing rates could be: the wording of the BMJ data sharing policy, which leaves room for individual interpretation and possible loopholes; that our email requests ended up in researchers' spam folders; and that researchers are not rewarded for sharing their data. It might be time for a more effective data sharing policy and better incentives for health and medical researchers to share their data.
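
The sharing rates and intervals quoted in the results can be checked by hand. The sketch below assumes an exact (Clopper-Pearson) binomial interval, which the abstract does not name, and assumes that 24% of 21 trials corresponds to 5 trials; `binom_cdf` and `exact_ci` are illustrative names, not anything from the study.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_ci(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Clopper-Pearson interval for x successes in n trials, found by bisection."""

    def solve(g) -> float:
        # g is increasing in p on [0, 1]; find the p where it crosses zero.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if g(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= x | p) = alpha / 2.
    lower = 0.0 if x == 0 else solve(lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper bound: P(X <= x | p) = alpha / 2.
    upper = 1.0 if x == n else solve(lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


# 7 of 157 articles overall; 5 of 21 trials is an assumption inferred from "24%".
for shared, total in [(7, 157), (5, 21)]:
    low, high = exact_ci(shared, total)
    print(f"{shared}/{total} = {shared / total:.1%} (95% CI {low:.1%} to {high:.1%})")
```

If the assumptions hold, the printed intervals should land close to the figures reported above; a different interval method (for example Wilson) would shift the bounds slightly.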

Patterns in research and data sharing for the study of form and function in caviomorph rodents

Journal of Mammalogy ◽

10.1093/jmammal/gyaa002 ◽

2020 ◽

Vol 101 (2) ◽

pp. 604-612

Author(s):

Luis D Verde Arregoitia ◽

Pablo Teta ◽

Guillermo D’Elía

Keyword(s):

Data Sharing ◽

Open Data ◽

Data Sets ◽

Ecological Data ◽

Data Set ◽

Form And Function ◽

Information Collections ◽

Phylogenetic Hypotheses ◽

Single Data ◽

And Function

Abstract: The combination of morphometrics, phylogenetic comparative methods, and open data sets has renewed interest in relating morphology to adaptation and ecological opportunities. Focusing on the Caviomorpha, a well-studied mammalian group, we evaluated patterns in research and data sharing in studies relating form and function. Caviomorpha encompasses a radiation of rodents that is diverse both taxonomically and ecologically. We reviewed 41 publications investigating ecomorphology in this group. We recorded the type of data used in each study and whether these data were made available, and we re-digitized all provided data. We tracked two major lines of information: collections material examined and trait data for morphological and ecological traits. Collectively, the studies considered 63% of extant caviomorph species; all extant families and genera were represented. We found that species-level trait data rarely were provided. Specimen-level data were even less common. Morphological and ecological data were too heterogeneous and sparse to aggregate into a single data set, so we created relational tables with the data. Additionally, we concatenated all specimen lists into a single data set and standardized all relevant data for phylogenetic hypotheses and gene sequence accessions to facilitate future morphometric and phylogenetic comparative research. This work highlights the importance and ongoing use of scientific collections, and it allows for the integration of specimen information with species trait data.

Resumen (translated from the Spanish): Interest in studying the relationship between morphology, ecology, and adaptation has recently resurged. This is due to the development of new morphometric and phylogenetic tools and to access to large databases for comparative studies. We reviewed 41 publications on the ecomorphology of caviomorph rodents, a diverse and well-studied group, to evaluate research patterns and transparency in data release. We recorded the types of data used in each study and whether the data are available. When these data were shared, we re-digitized them. We focused on the specimens examined and on data describing ecological and morphological traits for the species studied. The studies we reviewed cover 63% of extant caviomorph species. We found that data collected for species were rarely shared, and data for specimens even less so. The morphological and ecological data were too heterogeneous and sparse to consolidate into a single data set; for this reason, we created relational tables with the data. In addition, we linked all individual specimen lists to create a single data set and standardized all data pertinent to phylogenetic hypotheses, as well as gene sequence accession numbers, to facilitate future comparative studies of morphometrics and phylogeny. This work highlights the importance of scientific collections and documents their use, while also allowing the future integration of specimen-derived data with species-level ecomorphological trait data.

Getting Started Creating Data Dictionaries: How to Create a Shareable Data Set

Advances in Methods and Practices in Psychological Science ◽

10.1177/2515245920928007 ◽

2021 ◽

Vol 4 (1) ◽

pp. 251524592092800

Author(s):

Erin M. Buchanan ◽

Sarah E. Crain ◽

Ari L. Cunningham ◽

Hannah R. Johnson ◽

Hannah Stash ◽

...

Keyword(s):

Data Collection ◽

Data Sharing ◽

Search Engine ◽

Web Applications ◽

Data Sets ◽

Data Dictionary ◽

Data Set ◽

Entire Process ◽

Shared Data ◽

Source Data

As researchers embrace open and transparent data sharing, they will need to provide information about their data that effectively helps others understand their data sets’ contents. Without proper documentation, data stored in online repositories such as OSF will often be rendered unfindable and unreadable by other researchers and indexing search engines. Data dictionaries and codebooks provide a wealth of information about variables, data collection, and other important facets of a data set. This information, called metadata, provides key insights into how the data might be further used in research and facilitates search-engine indexing to reach a broader audience of interested parties. This Tutorial first explains terminology and standards relevant to data dictionaries and codebooks. Accompanying information on OSF presents a guided workflow of the entire process from source data (e.g., survey answers on Qualtrics) to an openly shared data set accompanied by a data dictionary or codebook that follows an agreed-upon standard. Finally, we discuss freely available Web applications to assist this process of ensuring that psychology data are findable, accessible, interoperable, and reusable.
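
As a rough illustration of what a data dictionary records, the sketch below derives one from a small CSV. It is not the Tutorial's workflow or any particular metadata standard: the sample data, the `infer_type` and `build_data_dictionary` helpers, and the four output fields are all assumptions chosen for brevity.

```python
import csv
import io

# Hypothetical survey export; the second participant did not report an age.
SAMPLE = "participant_id,age,condition\n1,23,control\n2,,treatment\n3,31,control\n"


def infer_type(values: list[str]) -> str:
    """Guess a coarse type label from the non-empty string values of a column."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "empty"
    for label, cast in (("integer", int), ("number", float)):
        try:
            for v in non_empty:
                cast(v)
            return label
        except ValueError:
            continue
    return "string"


def build_data_dictionary(csv_text: str, descriptions: dict[str, str]) -> list[dict]:
    """Return one entry per variable: name, inferred type, missing count, description."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames or []:
        column = [(row[name] or "") for row in rows]
        entries.append(
            {
                "variable": name,
                "type": infer_type(column),
                "n_missing": sum(1 for v in column if v == ""),
                "description": descriptions.get(name, ""),
            }
        )
    return entries


for entry in build_data_dictionary(SAMPLE, {"age": "Age in years at enrolment"}):
    print(entry)
```

The human-written descriptions are the part no script can infer, which is why they are passed in separately here.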

Characterising RDF data sets

Journal of Information Science ◽

10.1177/0165551516677945 ◽

2017 ◽

Vol 44 (2) ◽

pp. 203-229 ◽

Cited By ~ 6

Author(s):

Javier D Fernández ◽

Miguel A Martínez-Prieto ◽

Pablo de la Fuente Redondo ◽

Claudio Gutiérrez

Keyword(s):

Data Structures ◽

Large Scale ◽

Open Data ◽

Structural Features ◽

Data Sets ◽

Data Set ◽

Wide Range ◽

Rdf Data ◽

Description Framework ◽

Resource Description

The publication of semantic web data, commonly represented in Resource Description Framework (RDF), has experienced outstanding growth over the last few years. Data from all fields of knowledge are shared publicly and interconnected in active initiatives such as Linked Open Data. However, despite the increasing availability of applications managing large-scale RDF information such as RDF stores and reasoning tools, little attention has been given to the structural features emerging in real-world RDF data. Our work addresses this issue by proposing specific metrics to characterise RDF data. We specifically focus on revealing the redundancy of each data set, as well as common structural patterns. We evaluate the proposed metrics on several data sets, which cover a wide range of designs and models. Our findings provide a basis for more efficient RDF data structures, indexes and compressors.
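
To give a feel for what structural metrics over RDF data can look like, the sketch below computes a few simple statistics over a list of triples. The abstract does not define the authors' metrics, so the quantities here (subject out-degree and the number of distinct predicate sets per subject) are illustrative assumptions, as are the sample triples and the `structural_summary` name.

```python
from collections import Counter, defaultdict

# Hypothetical triples written as (subject, predicate, object) strings.
TRIPLES = [
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", '"Bob"'),
    ("ex:bob", "foaf:knows", "ex:alice"),
    ("ex:carol", "foaf:name", '"Carol"'),
]


def structural_summary(triples: list[tuple[str, str, str]]) -> dict:
    """Summarise subject out-degrees and predicate-set repetition (non-empty input)."""
    out_degree = Counter(s for s, _, _ in triples)
    predicates_by_subject = defaultdict(set)
    for s, p, _ in triples:
        predicates_by_subject[s].add(p)
    # Subjects described by the same set of predicates share one "predicate list".
    predicate_lists = {frozenset(ps) for ps in predicates_by_subject.values()}
    n_subjects = len(out_degree)
    return {
        "subjects": n_subjects,
        "max_out_degree": max(out_degree.values()),
        "mean_out_degree": len(triples) / n_subjects,
        "distinct_predicate_lists": len(predicate_lists),
        "predicate_list_ratio": len(predicate_lists) / n_subjects,
    }


print(structural_summary(TRIPLES))
```

A low ratio of distinct predicate sets to subjects suggests that many subjects follow the same schema-like pattern, which is the kind of regularity an index or compressor could exploit.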

Sharing Open Data in Agriculture

Advances in Library and Information Science - Open Access Implications for Sustainable Social, Political, and Economic Development ◽

10.4018/978-1-7998-5018-2.ch013 ◽

2021 ◽

pp. 244-266

Author(s):

Liah Shonhe

Keyword(s):

Agricultural Sector ◽

Open Data ◽

Research Data ◽

Data Sets ◽

Research Activity ◽

African Countries ◽

Data Set ◽

Data Repositories ◽

Bibliographic Data ◽

Prolific Authors

The main focus of the study was to explore the practices of open data sharing in the agricultural sector, including establishing the research outputs concerning open data in agriculture. The study adopted a desktop research methodology based on a literature review and bibliographic data from the WoS database. Bibliometric indicators discussed include yearly productivity, most prolific authors, and enhanced countries. Study findings revealed that research activity in the field of agriculture and open access is very low. There were 36 OA articles and only 6 publications had an open data badge. Most researchers do not yet embrace the need to openly publish their data sets despite the availability of numerous open data repositories. Unfortunately, most African countries are still lagging behind in the management of agricultural open data. The study therefore recommends that researchers publish their research data sets as OA. African countries need to put more effort into establishing open data repositories and implementing the necessary policies to facilitate OA.

A systematical approach to classification problems with feature space heterogeneity

Kybernetes ◽

10.1108/k-06-2018-0313 ◽

2019 ◽

Vol 48 (9) ◽

pp. 2006-2029

Author(s):

Hongshan Xiao ◽

Yu Wang

Keyword(s):

Factor Analysis ◽

Meta Analysis ◽

Feature Space ◽

Classification Performance ◽

Classification Algorithm ◽

Significant Feature ◽

Data Sets ◽

Data Set ◽

Classification Techniques ◽

Content Type

Purpose: Feature space heterogeneity exists widely in various application fields of classification techniques, such as customs inspection decision, credit scoring and medical diagnosis. This paper aims to study the relationship between feature space heterogeneity and classification performance. Design/methodology/approach: A measurement is first developed for measuring and identifying any significant heterogeneity that exists in the feature space of a data set. The main idea of this measurement is derived from meta-analysis. For a data set with significant feature space heterogeneity, a classification algorithm based on factor analysis and clustering is proposed to learn the data patterns, which, in turn, are used for data classification. Findings: The proposed approach has two main advantages over previous methods. The first advantage lies in feature transformation using orthogonal factor analysis, which results in new features without redundancy and irrelevance. The second advantage rests on partitioning the samples to capture the feature space heterogeneity reflected by differences in factor scores. The validity and effectiveness of the proposed approach are verified on a number of benchmark data sets. Research limitations/implications: The measurement should be used to guide the heterogeneity elimination process, which is an interesting topic for future research. In addition, developing a classification algorithm that enables scalable and incremental learning for large data sets with significant feature space heterogeneity is also an important issue. Practical implications: Measuring and eliminating the feature space heterogeneity that may exist in the data is important for accurate classification. This study provides a systematic approach to feature space heterogeneity measurement and elimination for better classification performance, which is favorable for applications of classification techniques to real-world problems. Originality/value: A measurement based on meta-analysis for measuring and identifying any significant feature space heterogeneity in a classification problem is developed, and an ensemble classification framework is proposed to deal with the feature space heterogeneity and improve the classification accuracy.
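
The abstract does not give the formula for its heterogeneity measurement, only that it is derived from meta-analysis. As background, the sketch below computes Cochran's Q and the I² statistic, the standard meta-analytic heterogeneity measures. It is not the authors' measurement: the function name and the sample estimates (for example, a feature's effect estimated separately in three subsets of a data set) are assumptions for illustration.

```python
def cochran_q(estimates: list[float], variances: list[float]) -> tuple[float, int, float]:
    """Return (Q, degrees of freedom, I^2) for per-group estimates and their variances."""
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    # I^2: share of total variation attributed to heterogeneity rather than chance.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i2


# Hypothetical per-subset estimates: two subsets agree, the third differs markedly.
q, df, i2 = cochran_q([0.10, 0.12, 0.55], [0.01, 0.01, 0.01])
print(f"Q = {q:.2f} on {df} df, I^2 = {i2:.0%}")
```

Under homogeneity Q is approximately chi-squared with `df` degrees of freedom, so a Q far above `df` signals that the groups do not share a common effect.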

Open-Source Data Collection and Data Sets for Activity Recognition in Smart Homes

Sensors ◽

10.3390/s20030879 ◽

2020 ◽

Vol 20 (3) ◽

pp. 879 ◽

Cited By ~ 2

Author(s):

Uwe Köckemann ◽

Marjan Alirezaie ◽

Jennifer Renoux ◽

Nicolas Tsiftes ◽

Mobyen Uddin Ahmed ◽

...

Keyword(s):

Data Collection ◽

Activity Recognition ◽

Care Home ◽

Open Data ◽

Ground Truth ◽

Smart Homes ◽

Sensor Data ◽

Data Sets ◽

Data Set ◽

Home Setting

As research in smart homes and activity recognition increases, it is ever more important to have benchmark systems and data against which researchers can compare methods. While synthetic data can be useful for certain method developments, real data sets that are open and shared are equally important. This paper presents the E-care@home system, its installation in a real home setting, and a series of data sets that were collected using the E-care@home system. Our first contribution, the E-care@home system, is a collection of software modules for data collection, labeling, and various reasoning tasks such as activity recognition, person counting, and configuration planning. It supports a heterogeneous set of sensors that can be extended easily and connects collected sensor data to higher-level Artificial Intelligence (AI) reasoning modules. Our second contribution is a series of open data sets which can be used to recognize activities of daily living. In addition to these data sets, we describe the technical infrastructure that we have developed to collect the data and the physical environment. Each data set is annotated with ground-truth information, making it relevant for researchers interested in benchmarking different algorithms for activity recognition.

Data-sharing recommendations in biomedical journals and randomised controlled trials: an audit of journals following the ICMJE recommendations

BMJ Open ◽

10.1136/bmjopen-2020-038887 ◽

2020 ◽

Vol 10 (5) ◽

pp. e038887

Author(s):

Maximilian Siebert ◽

Jeanne Fabiola Gaba ◽

Laura Caquelin ◽

Henri Gouraud ◽

Alain Dupuy ◽

...

Keyword(s):

Medical Journal ◽

Data Sharing ◽

Randomised Controlled Trials ◽

Primary Outcome ◽

Controlled Trials ◽

Cross Sectional Survey ◽

Eligibility Criteria ◽

Cross Sectional ◽

Data Set ◽

Randomised Controlled

Objective: To explore the implementation of the International Committee of Medical Journal Editors (ICMJE) data-sharing policy which came into force on 1 July 2018 by ICMJE-member journals and by ICMJE-affiliated journals declaring they follow the ICMJE recommendations. Design: A cross-sectional survey of data-sharing policies in 2018 on journal websites and in data-sharing statements in randomised controlled trials (RCTs). Setting: ICMJE website; PubMed/Medline. Eligibility criteria: ICMJE-member journals and 489 ICMJE-affiliated journals that published an RCT in 2018, had an accessible online website and were not considered as predatory journals according to Beall’s list. One hundred RCTs for member journals and 100 RCTs for affiliated journals with a data-sharing policy, submitted after 1 July 2018. Main outcome measures: The primary outcome for the policies was the existence of a data-sharing policy (explicit data-sharing policy, no data-sharing policy, policy merely referring to ICMJE recommendations) as reported on the journal website, especially in the instructions for authors. For RCTs, our primary outcome was the intention to share individual participant data set out in the data-sharing statement. Results: Eight (out of 14; 57%) member journals had an explicit data-sharing policy on their website (three were more stringent than the ICMJE requirements, one was less demanding and four were compliant), five (35%) additional journals stated that they followed the ICMJE requirements, and one (8%) had no policy online. In RCTs published in these journals, there were data-sharing statements in 98 out of 100, with expressed intention to share individual patient data reaching 77 out of 100 (77%; 95% CI 67% to 85%). One hundred and forty-five (out of 489) ICMJE-affiliated journals (30%; 26% to 34%) had an explicit data-sharing policy on their website (11 were more stringent than the ICMJE requirements, 85 were less demanding and 49 were compliant) and 276 (56%; 52% to 61%) merely referred to the ICMJE requirements. In RCTs published in affiliated journals with an explicit data-sharing policy, data-sharing statements were rare (25%), and expressed intentions to share data were found in 22% (15% to 32%). Conclusion: The implementation of ICMJE data-sharing requirements in online journal policies was suboptimal for ICMJE-member journals and poor for ICMJE-affiliated journals. The implementation of the policy was good in member journals and of concern for affiliated journals. We suggest the conduct of continuous audits of medical journal data-sharing policies in the future. Registration: The protocol was registered before the start of the research on the Open Science Framework (https://osf.io/n6whd/).

Artificial intelligence in oral and maxillofacial radiology: what is currently possible?

Dentomaxillofacial Radiology ◽

10.1259/dmfr.20200375 ◽

2020 ◽

pp. 20200375

Author(s):

Min-Suk Heo ◽

Jo-Eun Kim ◽

Jae-Joon Hwang ◽

Sang-Sun Han ◽

Jin-Soo Kim ◽

...

Keyword(s):

Artificial Intelligence ◽

Open Data ◽

Data Sets ◽

Radiographic Images ◽

Data Set ◽

Actual Clinical Practice ◽

Area Of Interest ◽

The Future ◽

Treatment Plans ◽

Image Quality Improvement

Artificial intelligence, which has been actively applied in a broad range of industries in recent years, is an active area of interest for many researchers. Dentistry is no exception to this trend, and the applications of artificial intelligence are particularly promising in the field of oral and maxillofacial (OMF) radiology. Recent research on artificial intelligence in OMF radiology has mainly used convolutional neural networks, which can perform image classification, detection, segmentation, registration, generation, and refinement. Artificial intelligence systems in this field have been developed for the purposes of radiographic diagnosis, image analysis, forensic dentistry, and image quality improvement. Tremendous amounts of data are needed to achieve good results, and the involvement of OMF radiologists is essential for making accurate and consistent data sets, which is a time-consuming task. For artificial intelligence to be widely used in actual clinical practice in the future, many problems remain to be solved, such as building up a large, finely labeled open data set, understanding the judgment criteria of artificial intelligence, and countering DICOM hacking threats that use artificial intelligence. If solutions to these problems emerge as artificial intelligence develops, it is expected to play an important role in the development of automatic diagnosis systems, the establishment of treatment plans, and the fabrication of treatment tools. OMF radiologists, as professionals who thoroughly understand the characteristics of radiographic images, will play a very important role in the development of artificial intelligence applications in this field.
