data curation
Recently Published Documents


Total documents: 409 (five years: 166)
H-index: 20 (five years: 4)

Semantic Web ◽  
2022 ◽  
pp. 1-8
Author(s):  
Robert Forkel ◽  
Harald Hammarström

Glottocodes constitute the backbone identification system for the language, dialect and family inventory Glottolog (https://glottolog.org). In this paper, we summarize the motivation and history behind the system of glottocodes and describe the principles and practices of data curation, technical infrastructure and update/version-tracking systematics. Since our understanding of the target domain – the dialects, languages and language families of the entire world – is continually evolving, changes and updates are relatively common. The resulting data is assessed in terms of the FAIR (Findable, Accessible, Interoperable, Reusable) Guiding Principles for scientific data management and stewardship. As such, the glottocode system responds to an important challenge in the realm of Linguistic Linked Data, with numerous NLP applications.


2022 ◽  
Author(s):  
Jesse Piburn ◽  
Robert Stewart ◽  
Jason Kaufman ◽  
Alexandre Sorokine ◽  
David Axley

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
SiZhe Xiao ◽  
Tsz Yan Ng ◽  
Tao T. Yang

Purpose: The purpose of this paper is to look at the journey and experience of the University of Hong Kong (HKU) in its Research Data Management (RDM) practice, responding to the needs of researchers in an academic library. Design/methodology/approach: The research data services (RDS) practice is based on the FAIR data principles, and the authors designed the RDM Stewardship framework to implement the RDS step by step. Findings: The HKU Libraries developed and implemented a set of RDS under a research data stewardship framework in response to the evolving RDM needs of academic communities. The services cover policy and procedure setting for research data planning, establishment of research data infrastructure, data curation services, and provision of online resources and instructional guidelines. Originality/value: This study provides an example of how academic libraries can start offering RDS, including data policy, data repository, data librarianship and data curation.


Author(s):  
Meike Klettke ◽  
Uta Störl

Data-driven methods and data science are important scientific methods in many research fields. All data science approaches require professional data engineering components. At the moment, computer science experts are needed to solve these data engineering tasks. At the same time, scientists from many fields (such as natural sciences, medicine, environmental sciences, and engineering) want to analyse their data autonomously. The resulting task for data engineering is to develop tools that support automated data curation and are usable by domain experts. In this article, we introduce four generations of data engineering approaches, classifying the data engineering technologies of the past and present. We show which data engineering tools are needed for the scientific landscape of the next decade.


2021 ◽  
pp. e1232

Data Soup is a collaboration between the Journal of eScience Librarianship (JeSLIB) and the Data Curation Network to host a series of community-focused webinars and discussions in which data curators exchange practices for curating research data of different formats or subject areas. The lineup of the inaugural webinar includes the following speakers and topics from the recent JeSLIB special issue "Data Curation in Practice": "Creating Guidance for Canadian Dataverse Curators: Portage Network's Dataverse Curation Guide" by Alexandra Cooper, Michael Steeleworthy, Ève Paquette-Bigras, Erin Clary, Erin MacPherson, Louise Gillis, and Jason Brodeur, https://escholarship.umassmed.edu/jeslib/vol10/iss3/2; "Active Curation of Large Longitudinal Surveys: A Case Study" by Inna Kouper, Karen L. Tucker, Kevin Tharp, Mary Ellen van Booven, and Ashley Clark, https://doi.org/10.7191/jeslib.2021.1210; and "Data Curation through Catalogs: A Repository-Independent Model for Data Discovery" by Helenmary Sheridan, Anthony J. Dellureficio, Melissa A. Ratajeski, Sara Mannheimer, and Terrie R. Wheeler, https://doi.org/10.7191/jeslib.2021.1203.


2021 ◽  
Vol 5 ◽  
Author(s):  
Adriana E. Radulovici ◽  
Pedro E. Vieira ◽  
Sofia Duarte ◽  
Marcos A. L. Teixeira ◽  
Luisa M. S. Borges ◽  
...  

The accuracy of specimen identification through DNA barcoding and metabarcoding relies on reference libraries containing records with reliable taxonomy and sequence quality. The considerable growth in barcode data requires stringent data curation, especially in taxonomically difficult groups such as marine invertebrates. A major effort in curating marine barcode data in the Barcode of Life Data Systems (BOLD) was undertaken during the 8th International Barcode of Life Conference (Trondheim, Norway, 2019). Major taxonomic groups (crustaceans, echinoderms, molluscs, and polychaetes) were reviewed to identify records with disagreement between Linnaean names and Barcode Index Numbers (BINs). The records with disagreement were annotated with four tags: a) MIS-ID (misidentified, mislabeled, or contaminated records), b) AMBIG (ambiguous records unresolved with the existing data), c) COMPLEX (species names occurring in multiple BINs), and d) SHARE (barcodes shared between species). A total of 83,712 specimen records corresponding to 7,576 species were reviewed and 39% of the species were tagged (7% MIS-ID, 17% AMBIG, 14% COMPLEX, and 1% SHARE). High percentages (>50%) of AMBIG tags were recorded in gastropods, whereas COMPLEX tags dominated in crustaceans and polychaetes. The high proportion of tagged species reflects either flaws in the barcoding workflow (e.g., misidentification, cross-contamination) or taxonomic difficulties (e.g., synonyms, undescribed species). Although data curation is essential for barcode applications, such manual attempts to examine large datasets are unsustainable and automated solutions are extremely desirable.
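The COMPLEX and SHARE tags are defined only by how Linnaean names map onto BINs, so they are the natural candidates for automated flagging. The following is a minimal Python sketch under stated assumptions, not the authors' workflow, which was a manual review. It assumes each record has been reduced to a hypothetical `(record_id, species_name, bin_id)` tuple, and it reads "barcodes shared between species" as "one BIN holding more than one species name". The function name and the sample species and BIN labels are illustrative placeholders. MIS-ID and AMBIG still require expert judgment and are not covered.

```python
from collections import defaultdict


def flag_complex_and_share(records):
    """Flag name/BIN disagreements (hypothetical helper, not part of BOLD).

    records: iterable of (record_id, species_name, bin_id) tuples.
    Returns (complex_species, shared_bins):
      complex_species maps each species name found in more than one BIN
        to the set of its BINs (candidate COMPLEX tags);
      shared_bins maps each BIN holding more than one species name
        to the set of those names (candidate SHARE tags).
    """
    bins_per_species = defaultdict(set)
    species_per_bin = defaultdict(set)
    for _record_id, species, bin_id in records:
        bins_per_species[species].add(bin_id)
        species_per_bin[bin_id].add(species)

    # One name spread over several BINs -> COMPLEX candidate.
    complex_species = {s: b for s, b in bins_per_species.items() if len(b) > 1}
    # One BIN carrying several names -> SHARE candidate.
    shared_bins = {b: s for b, s in species_per_bin.items() if len(s) > 1}
    return complex_species, shared_bins


# Placeholder records: one name split across two BINs, two names in one BIN.
records = [
    ("r1", "Species alpha", "BIN-A"),
    ("r2", "Species alpha", "BIN-B"),
    ("r3", "Species beta", "BIN-C"),
    ("r4", "Species gamma", "BIN-C"),
]
complex_species, shared_bins = flag_complex_and_share(records)
print(sorted(complex_species))  # ['Species alpha']
print(sorted(shared_bins))      # ['BIN-C']
```

The sketch only surfaces candidates for a curator to inspect. It does not decide whether a split BIN reflects cryptic diversity or contamination, and it does not resolve records that meet both conditions.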


Author(s):  
Jaime C. Acosta ◽  
Stephanie Medina ◽  
Jason Ellis ◽  
Luisana Clarke ◽  
Veronica Rivas ◽  
...  

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Benjamin Murray ◽  
Eric Kerfoot ◽  
Liyuan Chen ◽  
Jie Deng ◽  
Mark S. Graham ◽  
...  

The Covid Symptom Study, a smartphone-based surveillance study of COVID-19 symptoms in the population, is an exemplar of big data citizen science. As of May 23rd, 2021, over 5 million participants had collectively logged over 360 million self-assessment reports since its launch in March 2020. The success of the Covid Symptom Study creates significant technical challenges for effective data curation. The primary issue is scale: the dataset can no longer be readily processed on commodity hardware using standard Python-based data analytics software such as Pandas. Alternative technologies exist but carry higher technical complexity and are less accessible to many researchers. We present ExeTera, a Python-based open-source software package designed to provide Pandas-like data analytics on datasets that approach terabyte scale. We describe its design and capabilities, and show how it is a critical component of a data curation pipeline that enables reproducible research across an international research group for the Covid Symptom Study.


2021 ◽  
Vol 17 (2) ◽  
pp. 223-237
Author(s):  
Seno Yudhanto ◽  
Laksmi Laksmi

Introduction. This study aims to identify research data curation activities and business processes at PDDI LIPI. Data Collection Methods. This research used a case study approach with interviews and observations of five informants from June to July 2021. Data Analysis. Three stages of coding were used to sort, identify, and associate categories with existing theories. Data were presented by describing the entities of the business processes using Business Process Model and Notation (BPMN). Results and Discussion. Four main curation activities were carried out. Data owners and curators are the actors in the process, and its implementation requires trust, communication, and collaboration between the two. In addition, the business processes were validated to show that the process has been running scientifically. Conclusion. The flow of knowledge in these activities is documented in a structured manner based on mapped business procedures and processes. Periodic reviews and analyses of time and resources are necessary. Further research should focus on human resources, policy documents, and facilities related to research data curation activities.


2021 ◽  
Vol 10 (4) ◽  
Author(s):  
Sara Mannheimer

Objective: Big social data (such as social media and blogs) and archived qualitative data (such as interview transcripts, field notebooks, and diaries) are similar, but their respective communities of practice are under-connected. This paper explores shared challenges in qualitative data reuse and big social research and identifies implications for data curation. Methods: This paper uses a broad literature search and inductive coding of 300 articles relating to qualitative data reuse and big social research. The literature review produces six key challenges relating to data use and reuse that are present in both qualitative data reuse and big social research—context, data quality, data comparability, informed consent, privacy & confidentiality, and intellectual property & data ownership. Results: This paper explores six key challenges related to data use and reuse for qualitative data and big social research and discusses their implications for data curation practices. Conclusions: Data curators can benefit from understanding these six key challenges and examining data curation implications. Data curation implications from these challenges include strategies for: providing clear documentation; linking and combining datasets; supporting trustworthy repositories; using and advocating for metadata standards; discussing alternative consent strategies with researchers and IRBs; understanding and supporting deidentification challenges; supporting restricted access for data; creating data use agreements; supporting rights management and data licensing; developing and supporting alternative archiving strategies. Considering these data curation implications will help data curators support sounder practices for both qualitative data reuse and big social research.

