Data Curation Network: A Cross-Institutional Staffing Model for Curating Research Data

2018 ◽  
Vol 13 (1) ◽  
pp. 125-140 ◽  
Author(s):  
Lisa R Johnston ◽  
Jake Carlson ◽  
Cynthia Hudson-Vitale ◽  
Heidi Imker ◽  
Wendy Kozlowski ◽  
...  

Funders increasingly require that data sets arising from sponsored research be preserved and shared, and many publishers either require or encourage that data sets accompanying articles be made available through a publicly accessible repository. Additionally, many researchers wish to make their data available regardless of funder requirements, both to enhance their impact and to propel the concept of open science. However, the data curation activities that support these preservation and sharing activities are costly, requiring advanced curation practices, training, specific technical competencies, and relevant subject expertise. Few colleges or universities will be able to hire and sustain locally all of the data curation expertise that their researchers will require, and even those with the means to do more will benefit from a collective approach that allows them to supplement staffing at peak times, access specialized capacity when infrequently curated data types arise, and stabilize service levels through local staff transitions, such as turnover periods. The Data Curation Network (DCN) provides a solution for partners of all sizes to develop or supplement local curation expertise with the expertise of a resilient, distributed network, and creates a funding stream to both sustain central services and support expansion of distributed expertise over time. This paper presents our next steps for piloting the DCN, scheduled to launch in the spring of 2018 across nine partner institutions. Our implementation plan is based on planning-phase research performed from 2016 to 2017 that monitored the types, disciplines, frequency, and curation needs of data sets passing through the curation services at the six planning-phase institutions. Our DCN implementation plan includes a well-coordinated and tiered staffing model, a technology-agnostic submission workflow, standardized curation procedures, and a sustainability approach that will allow the DCN to persist beyond the grant-supported implementation phase as a curation-as-service model.

2020 ◽  
Vol 9 (1) ◽  
Author(s):  
Elizabeth Coburn ◽  
Lisa Johnston

Objective: Data curation is becoming widely accepted as a necessary component of data sharing. Yet, because there are so many different types of data with varying curation needs, the Data Curation Network (DCN) project anticipated that a collaborative approach to data curation across a network of repositories would expand what any single institution could offer alone. Now, halfway through a three-year implementation phase, we are testing our assumptions using one year of data from the DCN. Methods: Ten institutions participated in the implementation phase of a shared staffing model for curating research data. Starting on January 1, 2019, we tracked for 12 months the number, file types, and disciplines of data sets submitted to the DCN. Participating curators were matched to data sets based on their self-reported curation expertise. Aspects such as curation time, level of satisfaction with the assignment, and lack of appropriate expertise in the network were tracked and analyzed. Results: Seventy-four data sets were submitted to the DCN in year one. Seventy-one of them were successfully curated by DCN curators. Each curation assignment took 2.4 hours on average, and data sets took a median of three days to pass through the network. By analyzing the domains and file types of first-year submissions, we found that our expertise coverage was well distributed across domains and that our capacity exceeded demand, but we also observed that the high volume of data sets containing software code drew on certain curators' expertise more often than others', creating a potential imbalance. Conclusions: The data from year one of the DCN pilot have verified key assumptions about our collaborative approach to data curation, and these results have raised additional questions about capacity, equitable use of network resources, and sustained growth that we hope to answer by the end of this implementation phase.
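To make the matching step concrete, here is a minimal Python sketch of how a network might assign submitted data sets to curators by self-reported expertise while balancing load. It is a hypothetical illustration, not the DCN's actual workflow or tooling; the tags, names, and tie-breaking rule are all invented.

```python
from dataclasses import dataclass

@dataclass
class Curator:
    name: str
    expertise: set        # self-reported file types and domains
    assignments: int = 0  # running load, used for balancing

def match(dataset_tags: set, curators: list[Curator]) -> Curator | None:
    """Assign the least-loaded curator whose expertise covers the data set;
    return None when the network lacks the needed expertise."""
    qualified = [c for c in curators if dataset_tags <= c.expertise]
    if not qualified:
        return None
    chosen = min(qualified, key=lambda c: c.assignments)
    chosen.assignments += 1
    return chosen

curators = [Curator("A", {"tabular", "R"}),
            Curator("B", {"code", "Python", "tabular"})]
print(match({"code", "Python"}, curators))  # Curator B covers both tags
print(match({"GIS"}, curators))             # None: no matching expertise
```

A load-aware rule like the `min` over running assignments is one way a network could mitigate the imbalance noted above, where code-heavy submissions repeatedly fall to the same few curators.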


2020 ◽  
Author(s):  
Annika Tjuka ◽  
Robert Forkel ◽  
Johann-Mattis List

Psychologists and linguists have collected a great diversity of data on word and concept properties. In psychology, many studies accumulate norms and ratings, such as word frequency or age of acquisition, often for a large number of words. Linguistics, on the other hand, provides valuable insights into relations between word meanings. We present 'NoRaRe,' a collection of such data sets of norms, ratings, and relations covering different languages. To enable comparison across these diverse data types, we established workflows that facilitate the expansion of the database. A web application allows convenient access to the data (https://digling.org/norare/). Furthermore, a software API ensures consistent data curation by providing tests to validate the data sets. The NoRaRe collection is linked to the database curated by the Concepticon project (https://concepticon.clld.org), which offers a reference catalog of unified concept sets. The link between words in the data sets and the Concepticon concept sets makes cross-linguistic comparison possible. In three case studies, we test the validity of our approach, the accuracy of our workflow, and the applicability of our database. The results indicate that the NoRaRe database can be used to study word properties across multiple languages, allowing psychologists and linguists to benefit from the knowledge rooted in both research disciplines.
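The cross-linguistic link works roughly as follows: words in each language-specific data set are mapped to Concepticon concept sets, and data sets are then joined on the shared concept identifiers. The Python sketch below illustrates the idea with invented identifiers and toy ratings; the real collection is accessed through the NoRaRe web application and its API.

```python
# Gloss -> Concepticon concept set ID (illustrative values, not real IDs)
concepticon = {"dog": "2009", "water": "948", "hand": "1277"}

# Language-specific norms keyed by orthographic word (toy age-of-acquisition)
english_aoa = {"dog": 2.4, "water": 2.1, "hand": 2.3}
german_aoa = {"Hund": 2.6, "Wasser": 2.2, "Hand": 2.4}
german_gloss = {"Hund": "dog", "Wasser": "water", "Hand": "hand"}

# Join both data sets on the shared concept set ID
linked = {}
for word, rating in english_aoa.items():
    linked.setdefault(concepticon[word], {})["eng"] = rating
for word, rating in german_aoa.items():
    linked.setdefault(concepticon[german_gloss[word]], {})["deu"] = rating

for concept_id, ratings in sorted(linked.items()):
    print(concept_id, ratings)  # e.g. 948 {'eng': 2.1, 'deu': 2.2}
```

Because the join key is the concept set rather than the word form, any norm, rating, or relation attached to a word in one language becomes comparable with data attached to the corresponding word in another.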


2017 ◽  
Author(s):  
Federica Rosetta

Within the Open Science discussions, the current call for "reproducibility" comes from the rising awareness that results as presented in research papers are not as easily reproducible as expected, and that some reproduction efforts have even contradicted the original results. In this context, transparency and openness are seen as key components to facilitate good scientific practices, as well as scientific discovery. As a result, many funding agencies now require the deposit of research data sets, institutions are improving training on the application of statistical methods, and journals are beginning to mandate a high level of detail on the methods and materials used. How can researchers be supported and encouraged to provide that level of transparency? An important component is the underlying research data, which is currently often only partly available within the article. At Elsevier we have therefore been working on journal data guidelines which clearly explain to researchers when and how they are expected to make their research data available. Simultaneously, we have also developed the corresponding infrastructure to make it as easy as possible for researchers to share their data in a way that is appropriate in their field. To ensure researchers get credit for the work they do on managing and sharing data, all our journals support data citation in line with the FORCE11 data citation principles – a key step toward addressing the lack of credits and incentives that emerged from the Open Data analysis (Open Data - the Researcher Perspective, https://www.elsevier.com/about/open-science/research-data/open-data-report ) recently carried out by Elsevier together with CWTS. Finally, the presentation will also touch upon a number of initiatives to ensure the reproducibility of software, protocols and methods. With STAR Methods, for instance, methods are submitted in a Structured, Transparent, Accessible Reporting format; this approach promotes rigor and robustness, and makes reporting easier for the author and replication easier for the reader.


2012 ◽  
pp. 862-880
Author(s):  
Russ Miller ◽  
Charles Weeks

Grids represent an emerging technology that allows geographically and organizationally distributed resources (e.g., computer systems, data repositories, sensors, imaging systems, and so forth) to be linked in a fashion that is transparent to the user. The New York State Grid (NYS Grid) is an integrated computational and data grid that provides access to a wide variety of resources to users from around the world. NYS Grid can be accessed via a Web portal, where users have access to their data sets and applications but do not need to be aware of the details of the data storage or computational devices that are employed in solving their problems. Grid-enabled versions of the SnB and BnP programs, which implement the Shake-and-Bake method of molecular structure (SnB) and substructure (BnP) determination, respectively, have been deployed on NYS Grid. Further, through the Grid Portal, SnB has been run simultaneously on all computational resources on NYS Grid as well as on more than 1100 of the over 3000 processors available through the Open Science Grid.
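The workload pattern that makes SnB well suited to a grid is that Shake-and-Bake structure determination proceeds as many independent trials from random starting points, so trials can be farmed out to whatever processors are available. The Python sketch below is a toy local analogue using a process pool; `run_trial` is an invented stand-in for one SnB trial, not the actual program, and the real deployment submits work through the grid portal rather than to local cores.

```python
from concurrent.futures import ProcessPoolExecutor
import random

def run_trial(seed: int) -> float:
    # Stand-in for one independent Shake-and-Bake trial; the real SnB
    # program refines a random starting structure and scores the result.
    rng = random.Random(seed)
    return min(rng.random() for _ in range(1000))  # toy "figure of merit"

if __name__ == "__main__":
    # Distribute independent trials across local cores, a small-scale
    # analogue of farming trials out to grid processors via the portal.
    with ProcessPoolExecutor() as pool:
        scores = list(pool.map(run_trial, range(200)))
    print(f"best trial score: {min(scores):.6f}")
```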


Curationis ◽  
1998 ◽  
Vol 21 (2) ◽  
Author(s):  
P.A. Mc Inerney

The reasons for changing from a traditional curriculum to a problem-based learning curriculum are outlined, and the process used in preparing for this change is described. The planning phase made use of workshops, core committees, and international workshops and visits. The preparation of the necessary resources is enumerated, as is the preparation of the human resources of the institutions with which the department is affiliated. The description of the early implementation phase covers some of the problems that were encountered and the solutions that were devised. Finally, an informal evaluation of the first experiences of problem-based learning is presented.


2020 ◽  
Vol 43 (4) ◽  
pp. 1-23 ◽  
Author(s):  
Jessica Mozersky ◽  
Heidi Walsh ◽  
Meredith Parsons ◽  
Tristan McIntosh ◽  
Kari Baldwin ◽  
...  

Data sharing maximizes the value of data, which is time- and resource-intensive to collect. Major funding bodies in the United States (US), like the National Institutes of Health (NIH), require data sharing, and researchers frequently share de-identified quantitative data. In contrast, qualitative data are rarely shared in the US, but the increasing trend towards data sharing and open science suggests this may be required in the future. Qualitative methods are often used to explore sensitive health topics, raising unique ethical challenges regarding protecting confidentiality while maintaining enough contextual detail for secondary analyses. Here, we report findings from semi-structured in-depth interviews with 30 data repository curators, 30 qualitative researchers, and 30 IRB staff members to explore their experience and knowledge of qualitative data sharing (QDS). Our findings indicate that all stakeholder groups lack preparedness for QDS. Researchers are the least knowledgeable and are often unfamiliar with the concept of sharing qualitative data in a repository. Curators are highly supportive of QDS, but not all have experience curating qualitative data sets, and they indicated they would like guidance and standards specific to QDS. IRB members lack familiarity with QDS, although they support it as long as proper legal and regulatory procedures are followed. IRB members and data curators are not prepared to advise researchers on legal and regulatory matters, potentially leaving researchers, who have the least knowledge, with no guidance. Ethical and productive QDS will require overcoming barriers, creating standards, and changing long-held practices among all stakeholder groups.


2014 ◽  
Author(s):  
Susanta Tewari ◽  
John L Spouge

Importance sampling is widely used in coalescent theory to compute data likelihoods. Efficient importance sampling requires a trial distribution close to the target distribution of the genealogies conditioned on the data. Moreover, an efficient proposal requires intuition about how the data influence the target distribution. Different proposals might work under similar conditions, and sometimes the corresponding concepts overlap extensively. Currently, no framework is available for coalescent theory that evaluates proposals in an integrated manner. Typically, the underlying problems are not modeled explicitly, optimization is performed vigorously on limited data sets, user interaction requires thorough knowledge, and programs are not aligned with the current demands of open science. We have designed a general framework (http://coalescent.sourceforge.net) for importance sampling to compute the data likelihood under the infinite-sites model of mutation. The framework models the necessary core concepts, comes integrated with several data sets of varying size, implements the standard competing proposals, and integrates tightly with our previous framework for calculating exact probabilities. The framework computes the data likelihood and provides maximum-likelihood estimates of the mutation parameter. Well-known benchmarks in the coalescent literature validate the framework's accuracy. We evaluate several proposals from the coalescent literature and discover that the order of efficiency among three standard proposals changes when running time is considered along with the effective sample size. The framework provides an intuitive user interface with minimal clutter. For speed, the framework automatically uses modern multicore hardware when available. It runs on three major platforms (Windows, Mac, and Linux). Extensive tests and coverage make the framework accessible to a large community.
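For readers unfamiliar with the technique, the following Python sketch shows the generic importance-sampling recipe the framework builds on: sample from a trial (proposal) distribution, weight by the target-to-proposal density ratio, and report both the likelihood estimate and the effective sample size used in the efficiency comparison above. The one-dimensional densities are toy stand-ins, not coalescent genealogy distributions, and none of this reflects the framework's actual API.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    # Toy unnormalized target density, standing in for the distribution
    # of genealogies conditioned on the data (the real target is richer).
    return -0.5 * (x - 2.0) ** 2

def log_proposal(x):
    # Trial (proposal) density: log of the standard normal N(0, 1).
    return -0.5 * x ** 2 - 0.5 * np.log(2.0 * np.pi)

n = 100_000
x = rng.normal(0.0, 1.0, size=n)         # draw from the proposal
log_w = log_target(x) - log_proposal(x)  # log importance weights

# Numerically stable estimate of the normalizing constant ("likelihood").
m = log_w.max()
w = np.exp(log_w - m)
log_estimate = m + np.log(w.mean())

# Effective sample size: how many ideal draws the weighted sample is worth.
# A proposal far from the target drives the ESS down, as happens here.
ess = w.sum() ** 2 / (w ** 2).sum()

print(f"log estimate: {log_estimate:.4f} (truth: {0.5 * np.log(2 * np.pi):.4f})")
print(f"effective sample size: {ess:.0f} of {n}")
```

The trade-off the abstract describes, where the efficiency ranking of proposals changes once running time is considered alongside the effective sample size, arises because a cheap proposal can afford many more draws per second than a well-matched but expensive one.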


Author(s):  
Jacob Deichmann

The presentation describes challenges and possible solutions for achieving truly accessible high-class urban public transportation, based on a case from Trondheim, where a new high-class bus system was implemented. The implemented solution did not reflect wheelchair users' needs, despite clearly stated ambitions for accessibility. Ramboll conducted a study comprising a screening of the international market for relevant solutions, combined with interviews with representatives of public transport authorities. The results were presented to the local users' representatives, and some solutions were tested on location. Based on this process, recommendations for short-, medium- and long-term solutions were made. The project highlights the need to involve sufficient professional knowledge of universal design in the planning phase as well as in the implementation phase.


2018 ◽  
Author(s):  
Diana E Kornbrot ◽  
George J Georgiou ◽  
Mike Page

Identifying the best framework for two-choice decision-making has been a goal of psychology theory for many decades (Bohil, Szalma, & Hancock, 2015; Macmillan & Creelman, 1991). There are two main candidates: the theory of signal detectability (TSD) (Swets, Tanner Jr, & Birdsall, 1961; Thurstone, 1927), based on the normal distribution and the probit function, and choice-model theory (Link, 1975; Luce, 1959), which uses the logistic distribution and the logit function. A probit link function, and hence TSD, was shown to have a better Bayesian goodness of fit than the logit function for every one of eighteen diverse psychology data sets (Open Science Collaboration, 2015a); these conclusions were obtained using generalized linear mixed models (Lindstrom & Bates, 1990; Nelder & Wedderburn, 1972). These findings are important not only for the psychology of perceptual, cognitive, and social decision-making, but for any science that uses binary proportions to measure effectiveness, as well as for the meta-analysis of such studies.
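A minimal way to run this kind of link-function comparison on a single data set is sketched below with Python and statsmodels, using a plain binomial GLM rather than the generalized linear mixed models of the cited analysis, and AIC rather than a Bayesian goodness-of-fit criterion; the simulated stimulus and coefficients are invented for illustration.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

# Simulate two-choice data from a probit ("TSD-style") process, then fit
# both links and compare. Coefficients and sample size are illustrative.
rng = np.random.default_rng(1)
n = 5_000
stimulus = rng.normal(size=n)              # per-trial signal strength
X = sm.add_constant(stimulus)
p_true = norm.cdf(-0.3 + 1.2 * stimulus)   # probit-generated probabilities
y = rng.binomial(1, p_true)

for name, link in [("probit", sm.families.links.Probit()),
                   ("logit ", sm.families.links.Logit())]:
    res = sm.GLM(y, X, family=sm.families.Binomial(link=link)).fit()
    print(f"{name} AIC = {res.aic:.1f}")
# When the generating process really is normal (TSD), the probit fit
# should tend to show the lower AIC, mirroring the abstract's finding.
```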


2007 ◽  
Vol 40 (5) ◽  
pp. 938-944 ◽  
Author(s):  
Russ Miller ◽  
Naimesh Shah ◽  
Mark L. Green ◽  
William Furey ◽  
Charles M. Weeks

Computational and data grids represent an emerging technology that allows geographically and organizationally distributed resources (e.g. computing and storage resources) to be linked and accessed in a fashion that is transparent to the user, presenting an extension of the desktop for users whose computational, data and visualization needs extend beyond their local systems. The New York State Grid is an integrated computational and data grid that provides web-based access for users from around the world to computational, application and data storage resources. This grid is used in a ubiquitous fashion, where the users have virtual access to their data sets and applications, but do not need to be made aware of the details of the data storage or computational devices that are specifically employed. Two of the applications that users worldwide have access to on a variety of grids, including the New York State Grid, are the SnB and BnP programs, which implement the Shake-and-Bake method of molecular structure (SnB) and substructure (BnP) determination, respectively. In particular, through our grid portal (i.e. logging on to a web site), SnB has been run simultaneously on all computational resources on the New York State Grid as well as on more than 1100 of the over 3000 processors available through the Open Science Grid.

