Data Management and Sharing for Collaborative Science: Lessons Learnt From the Euromammals Initiative

2021
Vol 9
Author(s):
Ferdinando Urbano
Francesca Cagnacci

The current and future consequences of anthropogenic impacts such as climate change and habitat loss on ecosystems will be better understood and therefore addressed if diverse ecological data from multiple environmental contexts are more effectively shared. Re-use requires that data are readily available to the scientific scrutiny of the research community. A number of repositories to store shared data have emerged in different ecological domains and developments are underway to define common data and metadata standards. Nevertheless, the goal is far from being achieved and many challenges still need to be addressed. The definition of best practices for data sharing and re-use can benefit from the experience accumulated by pilot collaborative projects. The Euromammals bottom-up initiative has pioneered collaborative science in spatial animal ecology since 2007. It involves more than 150 institutes to address scientific, management and conservation questions regarding terrestrial mammal species in Europe using data stored in a shared database. In this manuscript we present some key lessons that we have learnt from the process of making shared data and knowledge accessible to researchers and we stress the importance of data management for data quality assurance. We suggest putting in place a pro-active data review before data are made available in shared repositories via robust technical support and users’ training in data management and standards. We recommend pursuing the definition of common data collection protocols, data and metadata standards, and shared vocabularies with direct involvement of the community to boost their implementation. We stress the importance of knowledge sharing, in addition to data sharing. We show the crucial relevance of collaborative networking with pro-active involvement of data providers in all stages of the scientific process. 
Our main message is that for data-sharing collaborative efforts to obtain substantial and durable scientific returns, the goals should not only consist in the creation of e-infrastructures and software tools but primarily in the establishment of a network and community trust. This requires moderate investment, but over long-term horizons.

2007
Vol 4 (1)
pp. 115-131
Author(s):
Hee-Jeong Jin
Jeong-Won Lee
Hwan-Gue Cho

Summary A microarray is a principal technology in molecular biology. It generates thousands of genotype expression measurements at once. Typically, a microarray experiment contains many kinds of information, such as gene names, sequences, expression profiles, scanned images, and annotations, so the organization and analysis of vast amounts of data are required. A microarray LIMS (Laboratory Information Management System) provides data management, search, and basic analysis. Recently, collaborative microarray research, for example on skeletal system diseases and anti-cancer medicines, has been widely conducted. Such research requires data sharing among the laboratories within a joint research group. In this paper, we introduce a web-based microarray LIMS, SMILE (Small and solid MIcroarray Lims for Experimenters), designed especially for shared data management. The data sharing function of SMILE is based on Friend-to-Friend (F2F) networking, a form of anonymous P2P (Peer-to-Peer) in which people connect directly with their “friends”: it only allows friends to exchange data directly, using IP addresses or digital signatures that the user trusts. In SMILE there are two types of friends: the “service provider”, which provides data, and the “client”, which is provided with data, so a service provider provides shared data only to its clients. SMILE offers useful functions for microarray experiments, such as variant data management, image analysis, normalization, system management, project schedule management, and shared data management. Moreover, it connects with two other systems: ArrayMall for analyzing microarray images and GENAW for constructing genetic networks. SMILE is available at http://neobio.cs.pusan.ac.kr:8080/smile.
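The provider/client relationship described in the abstract can be sketched as a simple access-control check. All class and method names below are illustrative assumptions, not SMILE's real API:

```python
# Minimal sketch of F2F-style sharing: a service provider keeps an explicit
# friend list (its clients) and serves shared data only to those clients.
# Names and identifiers here are hypothetical, for illustration only.

class ServiceProvider:
    def __init__(self):
        self.clients = set()    # trusted friends, e.g. IP addresses or key fingerprints
        self.shared_data = {}   # dataset name -> content

    def add_client(self, client_id):
        self.clients.add(client_id)

    def share(self, name, data):
        self.shared_data[name] = data

    def request(self, client_id, name):
        # Unlike open P2P, only explicitly trusted clients may fetch data.
        if client_id not in self.clients:
            raise PermissionError(f"{client_id} is not a registered client")
        return self.shared_data[name]

provider = ServiceProvider()
provider.add_client("lab-b.example.org")
provider.share("chip_042_expression", [0.8, 1.2, 0.5])
print(provider.request("lab-b.example.org", "chip_042_expression"))
```

The key design point is that trust is explicit and directional: a provider never serves data to an unlisted peer, which mirrors the F2F model's contrast with open peer discovery.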


Author(s):  
Urmas Kõljalg
Kessy Abarenkov
Allan Zirk
Veljo Runnel
Timo Piirmann
...  

The PlutoF online platform (https://plutof.ut.ee) is built for the management of biodiversity data. The concept is to provide a common workbench where the full data lifecycle can be managed, and to support seamless data sharing between single users, workgroups and institutions. Today, large and sophisticated biodiversity datasets are increasingly developed and managed by international workgroups. PlutoF's ambition is to serve such collaborative projects as well as to provide data management services to single users, museum or private collections and research institutions. Data management in PlutoF follows the logical order of the data lifecycle (Fig. 1). At first, project metadata is uploaded, including the project description, data management plan, participants, sampling areas, etc. Data upload and management activities then follow, often linked to internal data sharing. Some data analyses can be performed directly in the workbench, or data can be exported in standard formats. PlutoF also includes a data publishing module. Users can publish their data, generating a citable DOI, without the datasets leaving the PlutoF workbench. PlutoF is part of the DataCite collaboration (https://datacite.org) and has so far released more than 600,000 DOIs. Another option is to publish observation or collection datasets via the GBIF (Global Biodiversity Information Facility) portal. A new feature implemented in 2019 allows users to publish High Throughput Sequencing data as taxon occurrences in GBIF. There is an additional option to send specific datasets directly to the Pensoft online journals. Ultimately, PlutoF works as a data archive, which completes the data lifecycle. In PlutoF, users can manage different data types. The most common types include specimen and living specimen data, nucleotide sequences, human observations, material samples, taxonomic backbones and ecological data. Another important feature is that these data types can be managed as single datasets or projects.
PlutoF follows several biodiversity standards. Examples include Darwin Core, GGBN (Global Genome Biodiversity Network), EML (Ecological Metadata Language), MCL (Microbiological Common Language), and MIxS (Minimum Information about any (x) Sequence).
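To make the standards list concrete, the sketch below shows a minimal occurrence record using a handful of genuine Darwin Core terms (occurrenceID, basisOfRecord, scientificName, eventDate, decimalLatitude, decimalLongitude), serialised as flat CSV, the format commonly used for such exports. The values are invented for illustration and do not come from PlutoF:

```python
import csv
import io

# A minimal Darwin Core-style occurrence record. The field names are
# standard DwC terms; the values are made up for this example.
record = {
    "occurrenceID": "urn:example:occ:0001",
    "basisOfRecord": "HumanObservation",
    "scientificName": "Amanita muscaria",
    "eventDate": "2019-09-14",
    "decimalLatitude": "58.3776",
    "decimalLongitude": "26.7290",
}

# Serialise as a one-row CSV: a header line of DwC terms, then the values.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(record))
writer.writeheader()
writer.writerow(record)
print(buf.getvalue())
```

Using the shared vocabulary in the header line is what lets aggregators such as GBIF ingest records from many platforms without per-source mapping.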


2021
Vol 4 (1)
pp. 251524592092800
Author(s):
Erin M. Buchanan
Sarah E. Crain
Ari L. Cunningham
Hannah R. Johnson
Hannah Stash
...  

As researchers embrace open and transparent data sharing, they will need to provide information about their data that effectively helps others understand their data sets’ contents. Without proper documentation, data stored in online repositories such as OSF will often be rendered unfindable and unreadable by other researchers and indexing search engines. Data dictionaries and codebooks provide a wealth of information about variables, data collection, and other important facets of a data set. This information, called metadata, provides key insights into how the data might be further used in research and facilitates search-engine indexing to reach a broader audience of interested parties. This Tutorial first explains terminology and standards relevant to data dictionaries and codebooks. Accompanying information on OSF presents a guided workflow of the entire process from source data (e.g., survey answers on Qualtrics) to an openly shared data set accompanied by a data dictionary or codebook that follows an agreed-upon standard. Finally, we discuss freely available Web applications to assist this process of ensuring that psychology data are findable, accessible, interoperable, and reusable.
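The kind of machine-readable codebook the Tutorial describes can be sketched as follows. The variables and field names below are hypothetical and loosely follow common codebook conventions (variable label, type, value labels, missing codes); no specific standard is implied:

```python
import json

# Hypothetical data dictionary for a two-variable survey data set.
# Each entry documents one variable: what it means, its type, its
# value labels, and which codes mark missing data.
codebook = {
    "age": {
        "label": "Participant age in years",
        "type": "integer",
        "missing_values": [-99],
    },
    "satisfaction": {
        "label": "Overall satisfaction (5-point Likert scale)",
        "type": "integer",
        "values": {1: "Very dissatisfied", 2: "Dissatisfied",
                   3: "Neutral", 4: "Satisfied", 5: "Very satisfied"},
    },
}

# Publishing the codebook alongside the data as JSON keeps the metadata
# human-readable and also indexable by dataset search engines.
print(json.dumps(codebook, indent=2))
```

Shipping this file next to the raw data is what turns an opaque column such as `satisfaction = 4` into an interpretable, reusable measurement.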


2018
Vol 55 (4)
pp. 657-696
Author(s):
Steve Myran
Ian Sutherland

Purpose: The purpose of this article is to reframe our field’s narrative around the science of learning. We seek to (1) describe the patterns within educational leadership and administration that are conceptually tethered to scientific management and highlight the absence of clearly defined conceptions of learning, (2) provide a synthesis of the science of learning, and (3) offer a “progressive problem shift” that promotes such a reframing. Methods: An integration of theory building methods with problem posing/identification strategies is designed to deconstruct the field of educational leadership through a science of learning lens and build toward theory that is more adaptive to our goals of leading for learning. Findings: Our findings stem from the central observation that educational leadership and administration has to date produced no conceptual or explicit operational definition of learning. Lacking such a definition, the field has been vulnerable to outlooks about learning that default to assumptions notably shaped by scientific management. This is in contrast to our review of the learning sciences literature, which emphasizes that learning is dependent on the active and deliberate agency of the learner and a host of introspective outlooks and behaviors and that these individual learning characteristics are situated within complex and dynamic social contexts that serve to mediate and shape learning. Implications and Conclusions: We argue that the future of our field rests, in large measure, on our ability to address the incongruences between our field’s foundations in scientific management and the science of learning.


2021
Vol 9
Author(s):
Javad Chamanara
Jitendra Gaikwad
Roman Gerlach
Alsayed Algergawy
Andreas Ostrowski
...  

Obtaining fit-for-use data associated with diverse aspects of biodiversity, ecology and environment is challenging, since such data are often fragmented, sub-optimally managed and available in heterogeneous formats. Recently, with the universal acceptance of the FAIR data principles, the requirements and standards of data publication have changed substantially. Researchers are encouraged to manage data as per the FAIR data principles and to ensure that raw data, metadata, processed data, software, code and associated material are securely stored, and that the data are made available upon completion of the research. We have developed BEXIS2 as an open-source, community-driven, web-based research data management system to support the research data management needs of mid- to large-scale research projects with multiple sub-projects and up to several hundred researchers. BEXIS2 is a modular and extensible system providing a range of functions to realise the complete data lifecycle, from data structure design to data collection, data discovery, dissemination, integration, quality assurance and research planning. It is an extensible and customisable system that allows for the development of new functions and the customisation of its various components, from database schemas to the user interface layout, elements, and look and feel. During the development of BEXIS2, we aimed to incorporate key aspects of what is encoded in the FAIR data principles. To investigate the extent to which BEXIS2 conforms to these principles, we conducted a self-assessment using the FAIR indicators, definitions and criteria provided in the FAIR Data Maturity Model. Even though the FAIR Data Maturity Model was initially developed to judge the conformance of datasets, the self-assessment results indicated that BEXIS2 conforms to and supports the FAIR indicators remarkably well. BEXIS2 strongly conforms to the Findability and Accessibility indicators.
Interoperability is moderately supported as of now; however, for many of the less-supported facets we have concrete plans for improvement. Reusability (as defined by the FAIR data principles) is partially achieved. This paper also illustrates community deployment examples of BEXIS2 instances as success stories, to exemplify its capacity to meet the biodiversity and ecological data management needs of differently sized projects and to serve as an organisational research data management system.
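A self-assessment of this kind can be summarised programmatically. The sketch below aggregates indicator ratings into per-principle averages, in the spirit of the RDA FAIR Data Maturity Model's ordinal indicator scale; the indicator identifiers and all scores here are invented for illustration and are not BEXIS2's actual results:

```python
# Hypothetical indicator scores on a 0-4 ordinal scale, grouped by FAIR
# principle (F = Findable, A = Accessible, I = Interoperable, R = Reusable).
scores = {
    "F": {"F1-01M": 4, "F2-01M": 4, "F4-01M": 3},
    "A": {"A1-01M": 4, "A1.1-01M": 4},
    "I": {"I1-01M": 3, "I2-01M": 2},
    "R": {"R1-01M": 2, "R1.1-01M": 3},
}

def summarise(scores):
    # Average the indicator scores within each FAIR principle.
    return {p: round(sum(v.values()) / len(v), 2) for p, v in scores.items()}

for principle, mean in summarise(scores).items():
    print(f"{principle}: {mean}")
```

With these made-up numbers the summary would show the same qualitative pattern the abstract reports: high averages for F and A, lower ones for I and R.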


2015
Vol 10 (1)
pp. 260-267
Author(s):
Kevin Read
Jessica Athens
Ian Lamb
Joey Nicholson
Sushan Chin
...  

A need was identified by the Department of Population Health (DPH) for an academic medical center to facilitate research using large, externally funded datasets. Barriers identified included difficulty in accessing and working with the datasets, and a lack of knowledge about institutional licenses. A need to facilitate sharing and reuse of datasets generated by researchers at the institution (internal datasets) was also recognized. The library partnered with a researcher in the DPH to create a catalog of external datasets, which provided detailed metadata and access instructions. The catalog listed researchers at the medical center and the main campus with expertise in using these external datasets, in order to facilitate research and cross-campus collaboration. Data description standards were reviewed to create a set of metadata to facilitate access to both the externally generated datasets and the internally generated datasets that would constitute the next phase of development of the catalog. Interviews with a range of investigators at the institution identified DPH researchers as those most interested in data sharing; therefore, targeted outreach to this group was undertaken. Initial outreach resulted in additional external datasets being described, new local experts volunteering, proposals for additional functionality, and interest from researchers in inclusion of their internal datasets in the catalog. Despite limited outreach, the catalog has had ~250 unique page views in the three months since it went live. The establishment of the catalog also led to partnerships with the medical center's data management core and the main university library. The Data Catalog in its present state serves a direct user need from the Department of Population Health to describe large, externally funded datasets. The library will use this initial strong community of users to expand the catalog and include internally generated research datasets.
Future expansion plans will include working with DataCore and the main university library.


2021
Author(s):
Diana Kapiszewski
Sebastian Karcher

This chapter argues that the benefits of data sharing will accrue more quickly, and will be more significant and more enduring, if researchers make their data “meaningfully accessible.” Data are meaningfully accessible when they can be interpreted and analyzed by scholars far beyond those who generated them. Making data meaningfully accessible requires that scholars take the appropriate steps to prepare their data for sharing, and avail themselves of the increasingly sophisticated infrastructure for publishing and preserving research data. The better other researchers can understand shared data and the more researchers who can access them, the more those data will be re-used for secondary analysis, producing knowledge. Likewise, the richer an understanding an instructor and her students can gain of the shared data being used to teach and learn a particular research method, the more useful those data are for that pedagogical purpose. And the more a scholar who is evaluating the work of another can learn about the evidence that underpins its claims and conclusions, the better their ability to identify problems and biases in data generation and analysis, and the better informed and thus stronger an endorsement of the work they can offer.


2021
Author(s):
Kevin B Read
Heather Ganshorn
Sarah Rutley
David R. Scott

Background: As Canada increases requirements for research data management (RDM) and sharing, there is value in identifying how research data are shared, and what has been done to make them findable and reusable. This study aims to understand Canada's data sharing landscape by reviewing how Canadian Institutes of Health Research (CIHR) funded data are shared, and comparing researchers' data sharing practices to RDM and sharing best practices. Methods: We performed a descriptive analysis of CIHR-funded publications from PubMed and PubMed Central that were published between 1946 and Dec 31, 2019 and that indicated the research data underlying the results of the publication were shared. Each publication was analyzed to identify how and where data were shared, who shared data, and what documentation was included to support data reuse. Results: Of 4,144 CIHR-funded publications, 45.2% (n=1,876) included accessible data, 21.9% (n=909) stated data were available by request, 7.3% (n=304) stated data sharing was not applicable/possible, and we found no evidence of data sharing in 37.6% (n=1,558) of publications. Frequently used data sharing methods included sharing via a repository (n=1,549, 37.3%), within supplementary files (n=1,048, 25.2%), and by request (n=919, 22.1%). Only 13.1% (n=554) of publications included documentation that would facilitate data reuse. Interpretation: Our findings reveal that CIHR-funded publications largely lack the metadata, access instructions, and documentation needed to facilitate data discovery and reuse. Without measures to address these concerns, and enhanced support for researchers seeking to implement RDM and sharing best practices, most CIHR-funded research data will remain hidden, inaccessible, and unusable.

