How to automatically document data with the codebook package to facilitate data re-use

Author(s):  
Ruben C. Arslan

Data documentation in psychology lags behind not only many other disciplines, but also basic standards of usefulness. Psychological scientists often prefer to invest the time and effort that would be necessary to document existing data well in other duties, such as writing and collecting more data. Codebooks therefore tend to be unstandardised and stored in proprietary formats, and are rarely properly indexed in search engines. This means that rich datasets are sometimes used only once—by their creators—and left to disappear into oblivion. Even if they can find an existing dataset, researchers are unlikely to publish analyses based on it if they cannot be confident they understand it well enough. My codebook package makes it easier to generate rich metadata in human- and machine-readable codebooks. By using metadata from existing sources and by automating some tedious tasks such as documenting psychological scales and reliabilities, summarising descriptives, and identifying missingness patterns, I aim to encourage researchers to use the package for their own or their team's benefit. The codebook R package and web app make it possible to generate rich codebooks in a few minutes and just three clicks. Over time, this could lead to psychological data becoming findable, accessible, interoperable, and reusable, and to reduced research waste, thereby benefiting the scientific community as a whole.
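To make the three-click claim concrete, the following is a minimal sketch of the package's documented workflow; the data file name is hypothetical, and the call is meant to run inside an R Markdown chunk as the package's vignettes describe.

```r
library(codebook)  # generates human- and machine-readable codebooks
library(rio)       # flexible data import

# Import labelled survey data (the file name here is hypothetical)
survey <- rio::import("bfi_survey.sav")

# Inside an R Markdown chunk (results = "asis"), a single call renders
# the full codebook: variable labels, descriptives, missingness
# patterns, and scale reliabilities are documented automatically.
codebook(survey)
```

The rendered HTML embeds machine-readable metadata alongside the human-readable tables, which is what makes the resulting codebooks indexable by dataset search engines.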

2019 ◽  
Vol 2 (2) ◽  
pp. 169-187 ◽  
Author(s):  
Ruben C. Arslan

Data documentation in psychology lags behind not only many other disciplines, but also basic standards of usefulness. Psychological scientists often prefer to invest the time and effort that would be necessary to document existing data well in other duties, such as writing and collecting more data. Codebooks therefore tend to be unstandardized and stored in proprietary formats, and they are rarely properly indexed in search engines. This means that rich data sets are sometimes used only once—by their creators—and left to disappear into oblivion. Even if they can find an existing data set, researchers are unlikely to publish analyses based on it if they cannot be confident that they understand it well enough. My codebook package makes it easier to generate rich metadata in human- and machine-readable codebooks. It uses metadata from existing sources and automates some tedious tasks, such as documenting psychological scales and reliabilities, summarizing descriptive statistics, and identifying patterns of missingness. The codebook R package and Web app make it possible to generate a rich codebook in a few minutes and just three clicks. Over time, its use could lead to psychological data becoming findable, accessible, interoperable, and reusable, thereby reducing research waste and benefiting both its users and the scientific community as a whole.


2020 ◽  
Author(s):  
Jeff W. Atkins ◽  
Elizabeth Agee ◽  
Alexandra Barry ◽  
Kyla M. Dahlin ◽  
Kalyn Dorheim ◽  
...  

Abstract. The fortedata R package is an open data notebook from the Forest Resilience Threshold Experiment (FoRTE) – a modeling and manipulative field experiment that tests the effects of disturbance severity and disturbance type on carbon cycling dynamics in a temperate forest. Package data consist of measurements of carbon pools and fluxes and ancillary measurements that help users analyze and interpret carbon cycling over time. Currently the package includes data and metadata from the first two years of FoRTE, serves as a central, updatable resource for the FoRTE project team, and is intended as a resource for external users over the course of the experiment and in perpetuity. Further, it supports all associated FoRTE publications, analyses, and modeling efforts. This increases efficiency, consistency, compatibility, and productivity while minimizing the duplicated effort and error propagation that can arise in a large, distributed, collaborative effort. More broadly, fortedata represents an innovative, collaborative way of approaching science that unites and expedites the delivery of complementary datasets in near real time to the broader scientific community, increasing the transparency and reproducibility of taxpayer-funded science. fortedata is available via GitHub: https://github.com/FoRTExperiment/fortedata, and detailed documentation on the access, use, and applications of fortedata is available at https://fortexperiment.github.io/fortedata/. The first public release, version 1.0.1, is also archived at https://doi.org/10.5281/zenodo.3936146 (Atkins et al., 2020b). All level-one data products are also available outside of the package as .csv files: https://doi.org/10.6084/m9.figshare.12292490.v3 (Atkins et al., 2020c).


2021 ◽  
Vol 13 (3) ◽  
pp. 943-952
Author(s):  
Jeff W. Atkins ◽  
Elizabeth Agee ◽  
Alexandra Barry ◽  
Kyla M. Dahlin ◽  
Kalyn Dorheim ◽  
...  

Abstract. The fortedata R package is an open data notebook from the Forest Resilience Threshold Experiment (FoRTE) – a modeling and manipulative field experiment that tests the effects of disturbance severity and disturbance type on carbon cycling dynamics in a temperate forest. Package data consist of measurements of carbon pools and fluxes and ancillary measurements that help users analyze and interpret carbon cycling over time. Currently the package includes data and metadata from the first three FoRTE field seasons, serves as a central, updatable resource for the FoRTE project team, and is intended as a resource for external users over the course of the experiment and in perpetuity. Further, it supports all associated FoRTE publications, analyses, and modeling efforts. This increases efficiency, consistency, compatibility, and productivity while minimizing the duplicated effort and error propagation that can arise in a large, distributed, collaborative effort. More broadly, fortedata represents an innovative, collaborative way of approaching science that unites and expedites the delivery of complementary datasets to the broader scientific community, increasing the transparency and reproducibility of taxpayer-funded science. The fortedata package is available via GitHub: https://github.com/FoRTExperiment/fortedata (last access: 19 February 2021), and detailed documentation on the access, use, and applications of fortedata is available at https://fortexperiment.github.io/fortedata/ (last access: 19 February 2021). The first public release, version 1.0.1, is also archived at https://doi.org/10.5281/zenodo.4399601 (Atkins et al., 2020b). All data products are also available outside of the package as .csv files: https://doi.org/10.6084/m9.figshare.13499148.v1 (Atkins et al., 2020c).
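Since the abstract points users to GitHub, a short, hedged sketch of how one might access the package follows; fd_inventory() is assumed here as one of the package's fd_-prefixed data functions, so check the documentation site for the actual function list.

```r
# Install from the GitHub repository named in the abstract
# install.packages("remotes")
remotes::install_github("FoRTExperiment/fortedata")

library(fortedata)

# The package exposes each dataset through a function; fd_inventory()
# (stem inventory measurements) is shown as an assumed example of
# that interface.
inventory <- fd_inventory()
summary(inventory)
```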


2020 ◽  
Vol 2020 (11) ◽  
pp. 267-1-267-8
Author(s):  
Mitchell J.P. van Zuijlen ◽  
Sylvia C. Pont ◽  
Maarten W.A. Wijntjes

The human face is a popular motif in art, and depictions of faces can be found throughout history in nearly every culture. Artists mastered the depiction of faces through careful experimentation with the relatively limited means of paints and oils. Many of the results of these experimentations are now available to the scientific domain thanks to the digitization of large art collections. In this paper we study the depiction of the face throughout history. We used an automated facial detection network to detect a set of 11,659 faces in 15,534 predominantly Western artworks from 6 international, digitized art galleries. We analyzed the pose and color of these faces and related those to changes over time and to gender differences. We find a number of previously known conventions, such as the convention of depicting the left cheek of female sitters and the right cheek of male sitters, as well as previously unknown conventions, such as a tendency for females to be depicted looking slightly downward. Our set of faces will be released to the scientific community for further study.
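The abstract does not name the detection network used, but the general step can be illustrated with off-the-shelf R tooling; the sketch below uses the magick and image.libfacedetection packages as stand-ins, so treat it as an illustration of automated face detection rather than the authors' pipeline.

```r
library(magick)                  # image loading and cropping
library(image.libfacedetection)  # pre-trained face detector

# Load a digitized artwork (the file name is hypothetical)
painting <- image_read("portrait_1650.jpg")

# Detect faces; each detection is a bounding box that could then be
# cropped out for downstream pose and color analysis.
faces <- image_detect_faces(painting)
faces$detections  # one row per face: x, y, width, height, ...
```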


2021 ◽  
pp. 193229682110289
Author(s):  
Evan Olawsky ◽  
Yuan Zhang ◽  
Lynn E Eberly ◽  
Erika S Helgeson ◽  
Lisa S Chow

Background: With the development of continuous glucose monitoring systems (CGMS), detailed glycemic data are now available for analysis. Yet analysis of this data-rich information can be formidable. The power of CGMS-derived data lies in its characterization of glycemic variability. In contrast, many standard glycemic measures, such as hemoglobin A1c (HbA1c) and self-monitored blood glucose, inadequately describe glycemic variability and run the risk of bias toward overreporting hyperglycemia. Methods that adjust for this bias are often overlooked in clinical research because of the difficulty of computation and the lack of accessible analysis tools. Methods: In response, we have developed a new R package, rGV, which calculates a suite of 16 glycemic variability metrics when provided a single individual's CGM data. rGV is versatile and robust; it is capable of handling data of many formats from many sensor types. We also created a companion R Shiny web app that provides these glycemic variability analysis tools without requiring prior knowledge of R coding. We analyzed the statistical reliability of all the glycemic variability metrics included in rGV and illustrate the clinical utility of rGV by analyzing CGM data from three studies. Results: In subjects without diabetes, greater glycemic variability was associated with higher HbA1c values. In patients with type 2 diabetes mellitus (T2DM), we found that high glucose is the primary driver of glycemic variability. In patients with type 1 diabetes (T1DM), we found that naltrexone use may potentially reduce glycemic variability. Conclusions: We present a new R package and an accompanying web app to facilitate quick and easy computation of a suite of glycemic variability metrics.
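As a concrete illustration of what such metrics capture, shown in base R rather than through rGV's own interface (the abstract does not list the package's function names), the coefficient of variation and a simplified excursion metric can be computed directly from a glucose trace:

```r
# Simulated CGM trace: one reading every 5 minutes for 24 hours
set.seed(42)
time_min <- seq(0, 24 * 60 - 5, by = 5)
glucose  <- 110 + 25 * sin(time_min / 180) + rnorm(length(time_min), 0, 10)

# Coefficient of variation (%), a standard variability metric
cv <- 100 * sd(glucose) / mean(glucose)

# Simplified mean amplitude of glycemic excursions (MAGE): the mean
# size of peak-to-trough swings exceeding one standard deviation
d      <- diff(glucose)
turns  <- which(diff(sign(d)) != 0) + 1   # turning points in the trace
swings <- abs(diff(glucose[turns]))
mage   <- mean(swings[swings > sd(glucose)])

c(cv = cv, mage = mage)
```

Packages like rGV add, on top of metrics such as these, the input handling for different sensor formats and the reliability analysis described above.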


GigaScience ◽  
2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Jon Ison ◽  
Hans Ienasescu ◽  
Emil Rydza ◽  
Piotr Chmura ◽  
Kristoffer Rapacki ◽  
...  

Abstract Background Life scientists routinely face massive and heterogeneous data analysis tasks and must find and access the most suitable databases or software in a jungle of web-accessible resources. The diversity of information used to describe life-scientific digital resources presents an obstacle to their utilization. Although several standardization efforts are emerging, no information schema has been sufficiently detailed to enable uniform semantic and syntactic description—and cataloguing—of bioinformatics resources. Findings Here we describe biotoolsSchema, a formalized information model that balances the needs of conciseness for rapid adoption against the provision of rich technical information and scientific context. biotoolsSchema results from a series of community-driven workshops and is deployed in the bio.tools registry, providing the scientific community with >17,000 machine-readable and human-understandable descriptions of software and other digital life-science resources. We compare our approach to related initiatives and provide alignments to foster interoperability and reusability. Conclusions biotoolsSchema supports the formalized, rigorous, and consistent specification of the syntax and semantics of bioinformatics resources, and enables cataloguing efforts such as bio.tools that help scientists to find, comprehend, and compare resources. The use of biotoolsSchema in bio.tools promotes the FAIRness of research software, a key element of open and reproducible developments for data-intensive sciences.
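To show what "machine-readable and human-understandable" means in practice, one description can be retrieved from the registry's public web API; the endpoint pattern and field names below follow the publicly documented API but should be verified against the current docs, and the tool ID is just a familiar example.

```r
library(httr)      # HTTP requests
library(jsonlite)  # JSON parsing

# Fetch one biotoolsSchema-conformant record from bio.tools
resp <- GET("https://bio.tools/api/tool/signalp", query = list(format = "json"))
tool <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))

# A few of the schema's fields: name, description, and EDAM topic terms
tool$name
tool$description
tool$topic$term
```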


2021 ◽  
Vol 8 (3) ◽  
pp. 135-142
Author(s):  
Shamsulhadi Bandi

This communication assessed IJBES's performance since 2015, using metrics data from Clarivate and the OJS Report Generator. Raw data were analyzed to report the journal's performance to readers using the performance metrics available to the editor. Key performance metrics, such as submissions, acceptance and rejection rates, and citation trends over time, were reported. Ensuring balanced content and continuously developing a niche remain among the journal's priorities, as does attracting relevant, high-quality manuscripts that can increase the journal's citations in other publications. Despite these challenges, the journal, though relatively young, has withstood the initial test of time and improved its visibility in the scientific community.


2020 ◽  
Author(s):  
Daniel Lakens ◽  
Lisa Marie DeBruine

Making scientific information machine-readable greatly facilitates its re-use. Many scientific articles have the goal of testing a hypothesis, so making the tests of statistical predictions easier to find and access could be very beneficial. We propose an approach that can be used to make hypothesis tests machine-readable. We believe there are two benefits to specifying a hypothesis test in a way that a computer can evaluate whether the statistical prediction is corroborated or not. First, hypothesis tests will become more transparent, falsifiable, and rigorous. Second, scientists will benefit if information related to hypothesis tests in scientific articles is easily findable and re-usable, for example when performing meta-analyses, during peer review, and when examining meta-scientific research questions. We examine what a machine-readable hypothesis test should look like and demonstrate the feasibility of machine-readable hypothesis tests in a real-life example using the fully operational prototype R package scienceverse.
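To make the proposal concrete, the sketch below encodes a single hypothesis test as a data structure and lets the computer decide whether the prediction was corroborated; this is a conceptual base-R illustration of the idea, not scienceverse's actual interface.

```r
# A hypothesis encoded as data: the analysis to run plus the machine-
# checkable criterion that decides whether the prediction holds.
hypothesis <- list(
  id        = "H1",
  statement = "Group A scores higher than group B",
  analysis  = function(d) t.test(score ~ group, data = d,
                                 alternative = "greater"),
  criterion = list(parameter = "p.value", comparator = "<", value = 0.05)
)

# Simulated data standing in for a real study
set.seed(1)
dat <- data.frame(group = rep(c("A", "B"), each = 30),
                  score = c(rnorm(30, mean = 0.5), rnorm(30, mean = 0)))

# Evaluate the criterion mechanically -- no human judgment involved
result       <- hypothesis$analysis(dat)
observed     <- result[[hypothesis$criterion$parameter]]
corroborated <- do.call(hypothesis$criterion$comparator,
                        list(observed, hypothesis$criterion$value))
cat(hypothesis$id, if (corroborated) "corroborated" else "not corroborated", "\n")
```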


Daedalus ◽  
2015 ◽  
Vol 144 (1) ◽  
pp. 67-82 ◽  
Author(s):  
Brendon O. Watson ◽  
György Buzsáki

Sleep occupies roughly one-third of our lives, yet the scientific community is still not entirely clear on its purpose or function. Existing data point most strongly to its role in memory and homeostasis: that sleep helps maintain basic brain functioning via a homeostatic mechanism that loosens connections between overworked synapses, and that sleep helps consolidate and re-form important memories. In this review, we will summarize these theories, but also focus on substantial new information regarding the relation of electrical brain rhythms to sleep. In particular, while REM sleep may contribute to the homeostatic weakening of overactive synapses, a prominent and transient oscillatory rhythm called “sharp-wave ripple” seems to allow for consolidation of behaviorally relevant memories across many structures of the brain. We propose that a theory of sleep involving the division of labor between two states of sleep (REM and non-REM, the latter of which has an abundance of ripple electrical activity) might allow for a fusion of the two main sleep theories. This theory then postulates that sleep performs a combination of consolidation and homeostasis that promotes optimal knowledge retention as well as optimal waking brain function.


2019 ◽  
Vol 5 (Supplement_1) ◽  
Author(s):  
Chenlu Di ◽  
Andreas L S Meyer ◽  
John J Wiens

Abstract The diversity of life is shaped by rates of speciation and extinction, and so estimating these rates correctly is crucial for understanding diversity patterns among clades, regions, and habitats. In 2011, Morlon and collaborators developed a promising likelihood-based approach to estimate speciation and extinction rates and to infer, based on AICc, the model describing how these rates change over time. This approach is now implemented in an R package (RPANDA). Here, we test the accuracy of this approach under simulated conditions, to evaluate its ability to correctly estimate rates of speciation, extinction, and diversification (speciation minus extinction) and to choose the correct underlying model of diversification (e.g. constant or changing rates of speciation and extinction over time). We found that this likelihood-based approach frequently picked the incorrect model. For example, with changing speciation rates over time, the correct model was chosen in only ∼10 per cent of replicates. There were significant relationships between true and estimated speciation rates using this approach, but the relationships were weak when speciation rates were constant within clades. Relationships were consistently weak between true and estimated rates of extinction and of diversification. Overall, we suggest that results from this approach should be interpreted with considerable caution.
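For readers who want to see the workflow under test, here is a minimal sketch of fitting two competing diversification models to a simulated tree and comparing them by AICc; it assumes RPANDA's documented fit_bd() interface (argument names may vary between versions), with rates chosen arbitrarily.

```r
library(ape)     # phylogeny simulation and utilities
library(RPANDA)  # likelihood-based diversification models

# Simulate a birth-death tree with known, constant rates
set.seed(7)
tree     <- rphylo(n = 100, birth = 0.3, death = 0.1)
tot_time <- max(node.depth.edgelength(tree))

# Model 1: constant speciation and extinction
f.lamb.cst <- function(t, y) y[1]
f.mu.cst   <- function(t, y) y[1]
m1 <- fit_bd(tree, tot_time, f.lamb.cst, f.mu.cst,
             lamb_par = 0.2, mu_par = 0.05,
             cst.lamb = TRUE, cst.mu = TRUE)

# Model 2: exponentially time-varying speciation, constant extinction
f.lamb.exp <- function(t, y) y[1] * exp(y[2] * t)
m2 <- fit_bd(tree, tot_time, f.lamb.exp, f.mu.cst,
             lamb_par = c(0.2, 0.01), mu_par = 0.05,
             expo.lamb = TRUE, cst.mu = TRUE)

# The model-choice step the authors show is error-prone: pick by AICc
c(constant = m1$aicc, time_varying = m2$aicc)
```

Given the authors' simulation results, the winner of such an AICc comparison should be interpreted with caution.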

