Supporting open data: the key role of data managers

Author(s):  
Alice Fremand

<p>Open data is not a new concept. In 1959, over sixty years ago, knowledge sharing was at the heart of the Antarctic Treaty, whose Article III(1)(c) states that “scientific observations and results from Antarctica shall be exchanged and made freely available”. Around the same time, the World Data Centre (WDC) system was created to manage and distribute the data collected during the International Geophysical Year (1957-1958), which was coordinated by the International Council of Scientific Unions (ICSU). This laid the foundations of today’s research data management practices.</p>
<p>What about now? The WDC system lives on in the World Data System (WDS). Open data has been endorsed by most funders and stakeholders. Technology has evolved dramatically. And the profession of data manager or data curator has emerged. Drawing on their professional expertise, data managers now play a role far wider than the long-term curation and publication of data sets.</p>
<p>Data managers are involved in all stages of the data life cycle, from data management planning and data accessioning to data publication and re-use. They implement open data policies, help write data management plans and advise on how to manage data during, and beyond, the life of a science project. In liaison with software developers and scientists, they are developing new strategies to publish data via data catalogues, via more sophisticated map-based viewer services, or in machine-readable form via APIs. They often bring expertise from the field they work in to help scientists satisfy the Findable, Accessible, Interoperable and Re-usable (FAIR) principles. Recent years have seen the growth of a large community of experts, which is essential for sharing, discussing and setting new standards and procedures. Data are published to be re-used, and data managers are key to promoting high-quality datasets and participation in large data compilations.</p>
<p>To date, there is no magic formula for FAIR data. The Research Data Alliance is a valuable platform where data managers and researchers work together to develop and adopt infrastructure that promotes data sharing and data-driven research. However, properly describing each data set remains a challenge. Scientists now expect more and more from their data publications and data requests: they want interactive maps and more complex data systems, and they want to query data, combine data from different sources and publish them rapidly. By developing new procedures and standards, and by exploring new technologies, data managers help lay the foundations for data science.</p>

2020 ◽  
Vol 6 ◽  
Author(s):  
Christoph Steinbeck ◽  
Oliver Koepler ◽  
Felix Bach ◽  
Sonja Herres-Pawlis ◽  
Nicole Jung ◽  
...  

The vision of NFDI4Chem is the digitalisation of all key steps in chemical research to support scientists in their efforts to collect, store, process, analyse, disclose and re-use research data. Measures to promote Open Science and Research Data Management (RDM) in agreement with the FAIR data principles are fundamental aims of NFDI4Chem, serving the chemistry community with a holistic concept for access to research data. To this end, the overarching objective is to develop and maintain a national research data infrastructure for chemistry in Germany, and to enable innovative, easy-to-use services and novel scientific approaches based on the re-use of research data. NFDI4Chem intends to represent all disciplines of chemistry in academia. We aim to collaborate closely with thematically related consortia. In the initial phase, NFDI4Chem focuses on data related to molecules and reactions, including data for their experimental and theoretical characterisation. This overarching goal is pursued through a number of key objectives:
Key Objective 1: Establish a virtual environment of federated repositories for storing, disclosing, searching and re-using research data across distributed data sources. Connect existing data repositories and, based on a requirements analysis, establish domain-specific research data repositories for the national research community, and link them to international repositories.
Key Objective 2: Initiate international community processes to establish minimum information (MI) standards for data and machine-readable metadata, as well as open data standards in key areas of chemistry. Identify and recommend open data standards in key areas of chemistry in order to support the FAIR principles for research data. Finally, develop standards where they are lacking.
Key Objective 3: Foster cultural and digital change towards Smart Laboratory Environments by promoting the use of digital tools in all stages of research, and promote RDM at all levels of academia, beginning with undergraduate curricula.
Key Objective 4: Engage with the chemistry community in Germany through a wide range of measures to raise awareness of, and foster the adoption of, FAIR data management. Initiate processes to integrate RDM and data science into curricula. Offer a wide range of training opportunities for researchers.
Key Objective 5: Explore synergies with other consortia and promote cross-cutting development within the NFDI.
Key Objective 6: Provide a legally reliable framework of policies and guidelines for FAIR and open RDM.
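As a rough illustration of the minimum information (MI) idea in Key Objective 2, the following Python sketch checks a machine-readable (JSON) metadata record against a required-field profile. The field names in MINIMUM_FIELDS and the helper functions are hypothetical assumptions chosen for illustration; they are not taken from NFDI4Chem or from any published chemistry MI standard.

```python
import json

# Hypothetical minimum-information (MI) profile. These field names are
# illustrative assumptions, not an NFDI4Chem or community standard.
MINIMUM_FIELDS = ("identifier", "title", "creators", "license", "technique")


def missing_fields(record: dict, required=MINIMUM_FIELDS) -> list:
    """Return the required fields that are absent or empty in a record."""
    return [name for name in required if not record.get(name)]


def validate_json(text: str) -> list:
    """Parse a machine-readable (JSON) metadata record and report its gaps."""
    record = json.loads(text)
    if not isinstance(record, dict):
        raise ValueError("metadata record must be a JSON object")
    return missing_fields(record)


if __name__ == "__main__":
    example = json.dumps({
        "identifier": "doi:10.xxxx/example",  # placeholder, not a real DOI
        "title": "IR spectrum of ethanol",
        "creators": ["A. Researcher"],
        "license": "",  # an empty value counts as missing
    })
    print(validate_json(example))  # ['license', 'technique']
```

Treating empty values as missing is a deliberate choice here: a record that carries a field name but no content is no more reusable than one that omits the field.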


Neuroforum ◽  
2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Michael Denker ◽  
Sonja Grün ◽  
Thomas Wachtler ◽  
Hansjörg Scherberger

Preparing a neurophysiological data set for sharing and publication is hard. Many of the tools and services available for a smooth data publication workflow are still maturing and are not well integrated. Best practices and concrete examples of how to create a rigorous and complete package of an electrophysiology experiment are also still lacking. Given the heterogeneity of the field, such unifying guidelines and processes can only be formulated as a community effort. One of the goals of the NFDI-Neuro consortium initiative is to build such a community for systems and behavioral neuroscience. NFDI-Neuro aims to address the needs of the community, making data management easier and tackling these challenges in collaboration with various international initiatives (e.g., INCF, EBRAINS). This will give scientists the opportunity to spend more time analyzing the wealth of electrophysiological data they work with, rather than dealing with data formats and data integrity.


2021 ◽  
pp. 1-5
Author(s):  
Cosima Meyer

This article describes how to teach an interactive, semester-long statistics and programming class. The setting can also be applied to shorter and longer classes, as well as to introductory and advanced courses. I propose a project-based seminar that also incorporates elements of an inverted classroom. This combination supports students’ learning progress and creates engaging virtual classes. To demonstrate how to apply a project-based seminar setting to statistics and programming classes, I use an introductory class on data wrangling and management with the statistical software R. Students are guided through a typical data science workflow that requires data management and data wrangling and concludes with visualizing and presenting initial research results at a simulated mini-conference.


2020 ◽  
Vol 2 (4) ◽  
pp. 554-568
Author(s):  
Chris Graf ◽  
Dave Flanagan ◽  
Lisa Wylie ◽  
Deirdre Silver

Data availability statements can provide useful information about how researchers actually share research data. We used unsupervised machine learning to analyze 124,000 data availability statements submitted by research authors to 176 Wiley journals between 2013 and 2019. We categorized the data availability statements and looked at trends over time. We found expected increases in the number of data availability statements submitted over time, and marked increases that correlate with policy changes made by journals. Our open data challenge now is to use what we have learned to present researchers with relevant, easy options that help them share new research data and make an impact with it.
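The study categorized statements with unsupervised machine learning. As a much simpler stand-in, and not the authors' method, the following Python sketch assigns each statement to a category with hand-written keyword rules and then counts categories per submission year, which is the kind of tabulation needed to spot trends over time. The categories and keywords in RULES are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical keyword rules, checked in order. The study itself used
# unsupervised machine learning; these categories are illustrative only.
RULES = [
    ("in repository", ("repository", "zenodo", "dryad", "figshare")),
    ("on request", ("on request", "upon request", "reasonable request")),
    ("in article", ("supplementary", "in the article", "within the article")),
    ("not shared", ("not available", "not applicable", "no data")),
]


def categorize(statement: str) -> str:
    """Return the first category whose keywords occur in the statement."""
    text = statement.lower()
    for category, keywords in RULES:
        if any(keyword in text for keyword in keywords):
            return category
    return "other"


def trends(records):
    """Count categories per year from (year, statement) pairs."""
    by_year = defaultdict(Counter)
    for year, statement in records:
        by_year[year][categorize(statement)] += 1
    # Plain dicts, ordered by year, are easier to print or plot.
    return {year: dict(counts) for year, counts in sorted(by_year.items())}


if __name__ == "__main__":
    sample = [
        (2013, "Data available on request from the authors."),
        (2019, "Data are openly available in Dryad."),
        (2019, "The data are available upon reasonable request."),
    ]
    print(trends(sample))
    # {2013: {'on request': 1}, 2019: {'in repository': 1, 'on request': 1}}
```

Rule order matters: a statement mentioning both a repository and "not available" is assigned to the first matching category, which is one reason a learned, unsupervised grouping scales better to 124,000 free-text statements.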


Author(s):  
Liah Shonhe

The main focus of the study was to explore the practices of open data sharing in the agricultural sector, including identifying research outputs concerning open data in agriculture. The study adopted a desktop research methodology based on a literature review and bibliographic data from the Web of Science (WoS) database. The bibliometric indicators discussed include yearly productivity, the most prolific authors, and the leading countries. The findings revealed that research activity on agriculture and open access (OA) is very low: there were 36 OA articles, and only 6 publications had an open data badge. Most researchers do not yet embrace the need to publish their data sets openly, despite the availability of numerous open data repositories. Most African countries are still lagging behind in the management of agricultural open data. The study therefore recommends that researchers publish their research data sets as OA. African countries need to put more effort into establishing open data repositories and implementing the policies needed to facilitate OA.
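Bibliometric indicators such as yearly productivity, most prolific authors and country counts reduce to frequency counts over bibliographic records. The Python sketch below assumes the records have already been exported and parsed into a hypothetical Record structure; it does not parse the actual Web of Science export format.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical parsed bibliographic record (not the WoS export format)."""
    year: int
    authors: list = field(default_factory=list)
    countries: list = field(default_factory=list)


def yearly_productivity(records):
    """Publications per year, in chronological order."""
    return dict(sorted(Counter(r.year for r in records).items()))


def top_authors(records, n=3):
    """Most prolific authors using whole counting (1 per authored paper)."""
    return Counter(a for r in records for a in r.authors).most_common(n)


def top_countries(records, n=3):
    """Countries ranked by the number of publications they appear on."""
    # set() ensures a country is counted once per publication even if
    # several co-authors share it.
    return Counter(c for r in records for c in set(r.countries)).most_common(n)


if __name__ == "__main__":
    sample = [
        Record(2018, ["Author A", "Author B"], ["ZA", "ZA"]),
        Record(2020, ["Author A"], ["KE"]),
        Record(2020, ["Author C", "Author A"], ["ZA"]),
    ]
    print(yearly_productivity(sample))  # {2018: 1, 2020: 2}
    print(top_authors(sample, 1))  # [('Author A', 3)]
    print(top_countries(sample, 1))  # [('ZA', 2)]
```

Whole counting is only one convention; fractional counting, which splits each paper's credit among its authors, would rank authors and countries differently.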


2019 ◽  
Vol 18 ◽  
pp. 160940691882386 ◽  
Author(s):  
Amelia Chauvette ◽  
Kara Schick-Makaroff ◽  
Anita E. Molzahn

There is a growing movement for research data to be accessed, used, and shared by multiple stakeholders for various purposes. The changing technological landscape makes it possible to store data digitally, creating opportunities to share and reuse data anywhere in the world. This movement is growing rapidly and becoming widely accepted as publicly funded agencies mandate that researchers open their research data for sharing and reuse. While open data has numerous advantages, such as facilitating accountability and transparency, not all data are created equal. Accordingly, reusing data in qualitative research presents epistemological, methodological, legal, and ethical issues that must be addressed in the movement toward open data. We examine some of these challenges and make the case that some qualitative research data should not be reused in secondary analysis.


F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 292
Author(s):  
Michael Hewera ◽  
Daniel Hänggi ◽  
Björn Gerlach ◽  
Ulf Dietrich Kahlert

Reports of non-replicable research demand new methods of research data management. Electronic laboratory notebooks (ELNs) have been suggested as tools to improve the documentation of research data and make them universally accessible. Taking a self-guided approach, we introduced the open-source ELN eLabFTW into our lab group. After using it for a while, we think it is a useful tool for overcoming the hurdles of ELN introduction, because it combines properties that make it suitable for small preclinical labs like ours. We set up our own instance of eLabFTW without any additional programming. Our effort to embrace an open data approach by introducing an ELN fits well with other institutionally organized ELN initiatives in academic research.


The 2017 SIS Conference aims to highlight the crucial role of statistics in data science. In this new domain of ‘meaning’ extracted from data, the increasing amount of data produced and available in databases has brought new challenges. These challenges involve several fields: statistics, machine learning, information and computer science, optimization and pattern recognition. Together, these fields make a considerable contribution to the analysis of ‘big data’, open data, and relational and complex data, both structured and unstructured. The aim is to collect contributions from the different domains of statistics on high-dimensional data quality validation, sampling extraction, dimensionality reduction, pattern selection, data modelling, hypothesis testing and confirming conclusions drawn from the data.


2020 ◽  
Author(s):  
Neha Makhija ◽  
Mansi Jain ◽  
Nikolaos Tziavelis ◽  
Laura Di Rocco ◽  
Sara Di Bartolomeo ◽  
...  

Data lakes are an emerging storage paradigm that promotes data availability over integration. Repositories of Open Data are a prime example and show great promise for transparent data science. Because of the lack of proper integration, data lakes may not have a common, consistent schema, and traditional data management techniques fall short with these repositories. Much recent research has tried to address the new challenges associated with data lakes. Researchers in this area are mainly interested in the structural properties of the data for developing new algorithms, yet typical Open Data portals offer limited functionality in that respect and instead focus on data semantics. We propose Loch Prospector, a visualization to assist data management researchers in exploring and understanding the most crucial structural aspects of Open Data (in particular, metadata attributes), and the associated task abstraction for their work. Our visualization enables researchers to navigate the contents of data lakes effectively and easily accomplish what were previously laborious tasks. A copy of this paper with all supplemental material is available at osf.io/zkxv9
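Loch Prospector itself is a visualization, and the following Python sketch is not its implementation. It only illustrates the kind of structural statistics over metadata attributes that such a view could summarize: how many datasets share each attribute, and which dataset pairs have overlapping schemas, measured by Jaccard similarity. The name normalization rule, the threshold and the example file names are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def normalise(attr: str) -> str:
    """Crude name normalisation so that 'Zip Code' and 'zip_code' match."""
    return attr.strip().lower().replace(" ", "_")


def attribute_frequency(schemas: dict) -> Counter:
    """Number of datasets in the lake that contain each normalised attribute."""
    return Counter(
        attr for cols in schemas.values() for attr in {normalise(c) for c in cols}
    )


def jaccard(a: set, b: set) -> float:
    """Size of intersection over size of union; 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(schemas: dict, threshold: float = 0.5) -> list:
    """Dataset pairs whose attribute sets overlap by at least `threshold`."""
    sets = {name: {normalise(c) for c in cols} for name, cols in schemas.items()}
    pairs = []
    for x, y in combinations(sorted(sets), 2):
        score = jaccard(sets[x], sets[y])
        if score >= threshold:
            pairs.append((x, y, round(score, 2)))
    return pairs


if __name__ == "__main__":
    # Hypothetical lake: dataset name -> list of column headers.
    lake = {
        "permits.csv": ["Zip Code", "Year", "Type"],
        "schools.csv": ["zip_code", "year", "name"],
        "weather.csv": ["date", "temp"],
    }
    print(attribute_frequency(lake)["zip_code"])  # 2
    print(similar_pairs(lake))  # [('permits.csv', 'schools.csv', 0.5)]
```

Comparing all pairs is quadratic in the number of datasets, which is fine for a sketch but is exactly the kind of cost that makes exploring large Open Data lakes laborious without better tooling.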

