Global Research on Coronaviruses: An R Package

Background In these trying times, we developed an R package about bibliographic references on coronaviruses. Working with reproducible research principles based on open science, disseminating scientific information, providing easy access to scientific production on this particular issue, and offering rapid integration into researchers’ workflows may help save time in this race against the virus, notably in terms of public health. Objective The goal is to simplify the workflow of interested researchers, with multidisciplinary research in mind. With more than 60,500 medical bibliographic references at the time of publication, this package is among the largest about coronaviruses. It is also, to the best of our knowledge, the only one of this size to be built in the R language. Methods This package could be of interest to epidemiologists, researchers in scientometrics, and biostatisticians, as well as data scientists broadly defined. This package collects references from PubMed and organizes the data in a data frame. We then built functions to sort through this collection of references. Researchers can also integrate the data into their pipelines and work with them in R alongside their own code libraries. Results We provide a short use case in this paper based on a bibliometric analysis of the references made available by this package. Classification techniques can also be used to sift through the large volume of references and save researchers time on this part of their research. Network analysis can be used to filter the data set. Text mining techniques can also help researchers calculate similarity indices and focus on the parts of the literature that are relevant to their research. Conclusions This package aims to accelerate research on coronaviruses. Epidemiologists can integrate this package into their workflow. Because we update this package daily, it is also possible to add a machine learning layer on top of it to model the latest advances in research about coronaviruses.
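The abstract mentions using text mining to calculate similarity indices. The package itself is written in R; as an illustration only, the following Python sketch computes one common similarity index, TF-IDF cosine similarity, to rank reference titles against a query. The function names and sample titles are hypothetical and are not part of the package.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep only runs of letters (digits are dropped)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, weighting terms by tf * log(N / df)."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc_counts in counts for term in doc_counts)
    return [
        {term: tf * math.log(n_docs / df[term]) for term, tf in doc_counts.items()}
        for doc_counts in counts
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, titles):
    """Rank titles by TF-IDF cosine similarity to the query, most similar first."""
    vectors = tfidf_vectors([query] + titles)
    query_vec, title_vecs = vectors[0], vectors[1:]
    scored = [(cosine(query_vec, vec), title) for vec, title in zip(title_vecs, titles)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


titles = [
    "Spike protein structure of SARS-CoV-2",
    "Hospital capacity during the COVID-19 epidemic",
    "Antibody response to the SARS-CoV-2 spike protein",
]
for score, title in rank_by_similarity("spike protein antibody", titles):
    print(f"{score:.3f}  {title}")
```

Terms shared by every document get a weight of zero, so the ranking is driven by the rarer terms a query shares with each title; a real pipeline would likely add stemming and stop-word removal.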


Global Research on Coronaviruses: An R Package (Preprint)

10.2196/preprints.19615 ◽

2020 ◽

Author(s):

Thierry Warin

Keyword(s):

Scientific Information ◽

R Package ◽

Open Science ◽

Research Network ◽

Reproducible Research ◽

Easy Access ◽

Multidisciplinary Research ◽

Data Set ◽

R Language ◽

Similarity Indices

BACKGROUND In these trying times, we developed an R package about bibliographic references on coronaviruses. Working with reproducible research principles based on open science, disseminating scientific information, providing easy access to scientific production on this particular issue, and offering rapid integration into researchers’ workflows may help save time in this race against the virus, notably in terms of public health. OBJECTIVE The goal is to simplify the workflow of interested researchers, with multidisciplinary research in mind. With more than 60,500 medical bibliographic references at the time of publication, this package is among the largest about coronaviruses. It is also, to the best of our knowledge, the only one of this size to be built in the R language. METHODS This package could be of interest to epidemiologists, researchers in scientometrics, and biostatisticians, as well as data scientists broadly defined. This package collects references from PubMed and organizes the data in a data frame. We then built functions to sort through this collection of references. Researchers can also integrate the data into their pipelines and work with them in R alongside their own code libraries. RESULTS We provide a short use case in this paper based on a bibliometric analysis of the references made available by this package. Classification techniques can also be used to sift through the large volume of references and save researchers time on this part of their research. Network analysis can be used to filter the data set. Text mining techniques can also help researchers calculate similarity indices and focus on the parts of the literature that are relevant to their research. CONCLUSIONS This package aims to accelerate research on coronaviruses. Epidemiologists can integrate this package into their workflow. Because we update this package daily, it is also possible to add a machine learning layer on top of it to model the latest advances in research about coronaviruses.
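The abstract notes that network analysis can be used to filter the data set. One possible reading, sketched here in Python under that assumption, builds a keyword co-occurrence network from the references and keeps only the references whose keywords fall in the connected component of a seed keyword. The record layout (a `keywords` list per reference) and all helper names are hypothetical, not the package's API.

```python
from collections import defaultdict, deque
from itertools import combinations


def keyword_graph(references):
    """Undirected co-occurrence graph: two keywords are linked if they share a reference."""
    graph = defaultdict(set)
    for ref in references:
        for a, b in combinations(sorted(set(ref["keywords"])), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_component(graph, seed):
    """Breadth-first search returning every keyword reachable from the seed."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def filter_by_network(references, seed):
    """Keep references that carry at least one keyword connected to the seed."""
    keep = connected_component(keyword_graph(references), seed)
    return [ref for ref in references if keep & set(ref["keywords"])]


references = [
    {"title": "A", "keywords": ["sars-cov-2", "vaccine"]},
    {"title": "B", "keywords": ["vaccine", "antibody"]},
    {"title": "C", "keywords": ["hospital", "capacity"]},
    {"title": "D", "keywords": ["mers"]},
]
print([ref["title"] for ref in filter_by_network(references, "antibody")])
```

Connected components are the coarsest possible filter; centrality scores or community detection would give finer control over which parts of the network to keep.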


ppx: Programmatic access to proteomics data repositories

10.1101/2021.05.29.446304 ◽

2021 ◽

Author(s):

William E Fondrie ◽

Wout Bittremieux ◽

William S Noble

Keyword(s):

Mass Spectrometry ◽

Open Science ◽

Mass Spectrometry Data ◽

Reproducible Research ◽

Easy Access ◽

Proteomics Data ◽

Data Repositories ◽

Access To Data ◽

Python Package ◽

Programmatic Access

The volume of proteomics and mass spectrometry data available in public repositories continues to grow at a rapid pace as more researchers embrace open science practices. Open access to the data behind scientific discoveries has become critical to validate published findings and develop new computational tools. Here, we present ppx, a Python package that provides easy, programmatic access to the data stored in ProteomeXchange repositories, such as PRIDE and MassIVE. The ppx package can be used either as a command line tool or as a Python package to retrieve the files and metadata associated with a project, given its identifier. To demonstrate how ppx enhances reproducible research, we used ppx within a Snakemake workflow to reanalyze a published dataset with the open modification search tool ANN-SoLo and compared our reanalysis to the original results. We show that ppx readily integrates into workflows and that our reanalysis produced results consistent with the original analysis. We envision that ppx will be a valuable tool for creating reproducible analyses, providing tool developers with easy access to data for development, testing, and benchmarking, and enabling the use of mass spectrometry data in data-intensive analyses. The ppx package is freely available and open source under the MIT license at: https://github.com/wfondrie/ppx
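Because ppx retrieves a project's files and metadata from its identifier, a first step in any such client is recognizing which repository an identifier belongs to. The Python sketch below illustrates only that dispatch step. The accession patterns (`PXD` plus six digits, `MSV` plus nine digits) are assumptions about common accession formats, and `resolve_repository` is a hypothetical helper rather than part of ppx; the ppx documentation describes its actual interface.

```python
import re

# Assumed accession formats (illustrative, not taken from the ppx source code).
ACCESSION_PATTERNS = {
    "ProteomeXchange": re.compile(r"PXD\d{6}"),
    "MassIVE": re.compile(r"MSV\d{9}"),
}


def resolve_repository(identifier):
    """Return (repository, normalized identifier), or raise ValueError if unrecognized."""
    normalized = identifier.strip().upper()
    for repository, pattern in ACCESSION_PATTERNS.items():
        # fullmatch rejects identifiers with extra leading or trailing characters.
        if pattern.fullmatch(normalized):
            return repository, normalized
    raise ValueError(f"Unrecognized project identifier: {identifier!r}")


print(resolve_repository("PXD000001"))
print(resolve_repository(" msv000087408 "))
```

In a full client, each branch would then query the corresponding repository; since a PXD accession may be hosted by more than one ProteomeXchange member repository, a real resolver would also need to look up where the project actually lives.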


cchsflow: an open science approach to transform and combine population health surveys

Can J Public Health ◽

10.17269/s41997-020-00470-8 ◽

2021 ◽

Author(s):

Warsame Yusuf ◽

Rostyslav Vyuha ◽

Carol Bennett ◽

Yulric Sequeira ◽

Courtney Maskerine ◽

...

Keyword(s):

Population Health ◽

Health Surveys ◽

R Package ◽

Canadian Community Health Survey ◽

Open Science ◽

Cross Sectional ◽

Data Set ◽

Science Approach ◽

Combine Population ◽

The Many

Setting The Canadian Community Health Survey (CCHS) is one of the world’s largest ongoing cross-sectional population health surveys, with over 130,000 respondents every two years, or over 1.1 million respondents since its inception in 2001. While the survey remains relatively consistent over the years, differences between cycles make it challenging to analyze the survey over time. Intervention A program package called cchsflow was developed to transform and harmonize CCHS variables into consistent formats across multiple survey cycles. An open science approach was used to maintain transparency, reproducibility, and collaboration. Outcomes The cchsflow R package uses CCHS survey data from 2001 to 2014. Worksheets were created that identify variables, their names in previous cycles, their category structure, and their final variable names. These worksheets were then used to recode variables in each CCHS cycle into consistently named and labelled variables, after which the survey cycles can be combined. The package was then published as a GitHub repository to encourage collaboration with other researchers. Implication The cchsflow package has been added to the Comprehensive R Archive Network (CRAN) and contains support for over 160 CCHS variables, generating a combined data set of over 1 million respondents. By implementing open science practices, cchsflow aims to minimize the time needed to clean and prepare data for the many CCHS users across Canada.
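The worksheet-driven recoding described in the abstract can be sketched as follows. cchsflow itself is an R package, so this Python version only illustrates the idea; the worksheet layout, cycle labels, source variable names, and category codes are all hypothetical and do not correspond to real CCHS variables.

```python
# Hypothetical worksheet: one row per (harmonized variable, cycle) pair, giving the
# variable's name in that cycle and how its category codes map to final categories.
WORKSHEET = [
    {"variable": "smoker", "cycle": "cycle_1", "source": "SMK_X01", "recode": {1: "yes", 2: "no"}},
    {"variable": "smoker", "cycle": "cycle_2", "source": "SMK_Y01", "recode": {1: "yes", 3: "no"}},
    {"variable": "sex", "cycle": "cycle_1", "source": "SEX_X", "recode": {1: "male", 2: "female"}},
    {"variable": "sex", "cycle": "cycle_2", "source": "SEX_Y", "recode": {1: "male", 2: "female"}},
]


def harmonize_cycle(records, cycle, worksheet):
    """Rename and recode one cycle's records using the worksheet rows for that cycle."""
    rows = [row for row in worksheet if row["cycle"] == cycle]
    harmonized = []
    for record in records:
        out = {"cycle": cycle}
        for row in rows:
            raw_value = record.get(row["source"])
            # Codes without a mapping (e.g. a "not stated" code) become None.
            out[row["variable"]] = row["recode"].get(raw_value)
        harmonized.append(out)
    return harmonized


def combine_cycles(data_by_cycle, worksheet):
    """Harmonize every cycle, then stack the results into one list of records."""
    combined = []
    for cycle, records in data_by_cycle.items():
        combined.extend(harmonize_cycle(records, cycle, worksheet))
    return combined


data = {
    "cycle_1": [{"SMK_X01": 1, "SEX_X": 2}, {"SMK_X01": 2, "SEX_X": 1}],
    "cycle_2": [{"SMK_Y01": 3, "SEX_Y": 1}, {"SMK_Y01": 9, "SEX_Y": 2}],
}
for row in combine_cycles(data, WORKSHEET):
    print(row)
```

Keeping the mapping in a table rather than in code means that supporting a new cycle or variable is a matter of adding worksheet rows, which is the property that makes the approach easy to review and extend collaboratively.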


WORCS: A workflow for open reproducible code in science

Data Science ◽

10.3233/ds-210031 ◽

2021 ◽

pp. 1-21

Author(s):

Caspar J. Van Lissa ◽

Andreas M. Brandmaier ◽

Loek Brinkman ◽

Anna-Lena Lamprecht ◽

Aaron Peikert ◽

...

Keyword(s):

Best Practices ◽

Source Code ◽

R Package ◽

Open Science ◽

Research Projects ◽

Tabular Data ◽

Step Procedure ◽

Starting Point ◽

Conducting Research ◽

And Training

Adopting open science principles can be challenging, requiring conceptual education and training in the use of new tools. This paper introduces the Workflow for Open Reproducible Code in Science (WORCS): a step-by-step procedure that researchers can follow to make a research project open and reproducible. This workflow intends to lower the threshold for adopting open science principles. It is based on established best practices and can be used either in parallel to, or in the absence of, top-down requirements by journals, institutions, and funding bodies. To facilitate widespread adoption, the WORCS principles have been implemented in the R package worcs, which offers an RStudio project template and utility functions for specific workflow steps. This paper introduces the conceptual workflow, discusses how it meets different standards for open science, and addresses the functionality provided by the R implementation, worcs. This paper is primarily targeted towards scholars conducting research projects in R that involve academic prose, analysis code, and tabular data. However, the workflow is flexible enough to accommodate other scenarios and offers a starting point for customized solutions. The source code for the R package and manuscript, and a list of examples of WORCS projects, are available at https://github.com/cjvanlissa/worcs.


Metastatic heterogeneity of the consensus molecular subtypes of colorectal cancer

npj Genomic Medicine ◽

10.1038/s41525-021-00223-7 ◽

2021 ◽

Vol 6 (1) ◽

Author(s):

Peter W. Eide ◽

Seyed H. Moosavi ◽

Ina A. Eilertsen ◽

Tuva H. Brunsell ◽

Jonas Langerud ◽

...

Keyword(s):

Gene Expression ◽

Colorectal Cancer ◽

Principal Components ◽

Prognostic Value ◽

Tumor Heterogeneity ◽

Molecular Subtypes ◽

R Package ◽

Data Set ◽

Primary Tumors ◽

External Data

Gene expression-based subtypes of colorectal cancer have clinical relevance, but the representativeness of primary tumors and the consensus molecular subtypes (CMS) for metastatic cancers is not well known. We investigated the metastatic heterogeneity of CMS. The best approach to subtype translation was delineated by comparing transcriptomic profiles from 317 primary tumors and 295 liver metastases, including multi-metastatic samples from 45 patients and 14 primary-metastasis sets. Associations were validated in an external data set (n = 618). Projection of metastases onto the principal components of primary tumors showed that metastases were depleted of CMS1-immune/CMS3-metabolic signals, enriched for CMS4-mesenchymal/stromal signals, and heavily influenced by the microenvironment. The tailored CMS classifier (available in an updated version of the R package CMScaller) therefore implemented an approach to regress out the liver tissue background. The majority of classified metastases were either CMS2 or CMS4. Nonetheless, subtype switching and inter-metastatic CMS heterogeneity were frequent and increased with sampling intensity. The poor prognostic value of CMS1/3 metastases was consistent in the context of intra-patient tumor heterogeneity.
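The abstract states that the tailored CMS classifier regresses out the liver tissue background but does not give the details. As a minimal sketch, assuming each sample has a numeric liver-background score, the Python code below fits an ordinary least-squares line of each gene's expression on that score and keeps the residuals. This is a generic covariate-adjustment technique, not the CMScaller implementation; the gene names and values are hypothetical.

```python
from statistics import fmean


def regress_out(expression, background):
    """Remove the linear effect of a per-sample background score from each gene.

    expression: dict mapping gene name -> list of values, one per sample
    background: list of background scores, one per sample (same order)
    Returns a dict mapping gene name -> list of OLS residuals.
    """
    s_mean = fmean(background)
    s_var = sum((s - s_mean) ** 2 for s in background)
    if s_var == 0:
        raise ValueError("background score is constant; nothing to regress out")
    residuals = {}
    for gene, values in expression.items():
        x_mean = fmean(values)
        cov = sum((s - s_mean) * (x - x_mean) for s, x in zip(background, values))
        slope = cov / s_var
        intercept = x_mean - slope * s_mean
        # Residual = observed expression minus the part explained by the background.
        residuals[gene] = [x - (intercept + slope * s) for s, x in zip(background, values)]
    return residuals


liver_score = [0.0, 1.0, 2.0, 3.0]
expression = {
    "GENE_A": [1.0, 3.0, 5.0, 7.0],  # driven entirely by the background
    "GENE_B": [2.0, 0.0, 2.0, 0.0],  # only weakly related to it
}
adjusted = regress_out(expression, liver_score)
print(adjusted["GENE_A"])
print(adjusted["GENE_B"])
```

The residuals are, by construction, uncorrelated with the background score, so a classifier applied to them no longer sees the linear component of the tissue signal; nonlinear or gene-specific background effects would need a richer model.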


Mapping the Spatial-Temporal Dynamics of Vegetation Response Lag to Drought in a Semi-Arid Region

Remote Sensing ◽

10.3390/rs11161873 ◽

2019 ◽

Vol 11 (16) ◽

pp. 1873 ◽

Cited By ~ 6

Author(s):

Li Hua ◽

Huidong Wang ◽

Haigang Sui ◽

Brian Wardlow ◽

Michael J. Hayes ◽

...

Keyword(s):

Vegetation Index ◽

Temporal Dynamics ◽

Risk Mitigation ◽

Scientific Information ◽

Gaussian Function ◽

Drought Risk ◽

Severe Drought ◽

Similar Response ◽

Vegetation Types ◽

Data Set

Drought, as an extreme climate event, affects the ecological environment for vegetation and agricultural production. Studies of the vegetation response to drought are paramount to providing scientific information for drought risk mitigation. In this paper, the spatial-temporal pattern of drought and the response lag of vegetation in Nebraska were analyzed from 2000 to 2015. Based on the long-term Daymet data set, the standardized precipitation index (SPI) was computed to identify precipitation anomalies, and a Gaussian function was applied to obtain temperature anomalies. Vegetation anomalies were identified by applying a dynamic time warping technique to a remote sensing Normalized Difference Vegetation Index (NDVI) time series. Finally, multilayer correlation analysis was applied to obtain the response lag of different vegetation types. The results show that Nebraska suffered severe drought events in 2002 and 2012. The response lag of vegetation to drought typically ranged from 30 to 45 days, varying with vegetation type and human activities (water use and management). Grasslands had the shortest response lag (~35 days), while forests had the longest (~48 days). For specific crop types, the response lag of winter wheat varied among regions of Nebraska (35–45 days), while soybeans, corn, and alfalfa had similar response lags of approximately 40 days.
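A response lag is, in essence, the delay at which vegetation anomalies line up best with earlier drought anomalies. The paper uses multilayer correlation analysis; as a simpler stand-in, the Python sketch below computes a plain lagged Pearson correlation between a drought-index series (such as SPI) and a vegetation-anomaly series and returns the lag with the highest correlation. The series are synthetic, the lag is measured in time steps (converting it to days depends on the sampling interval of the data), and both series are assumed to be oriented so that lower values mean drier or more stressed conditions, which is why the sketch looks for the most positive correlation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def response_lag(drought, vegetation, max_lag):
    """Return (lag, r) maximizing corr(drought[t], vegetation[t + lag]) for 0 <= lag <= max_lag."""
    best_lag, best_r = 0, pearson(drought, vegetation)
    for lag in range(1, max_lag + 1):
        # Pair each drought value with the vegetation value `lag` steps later.
        r = pearson(drought[:-lag], vegetation[lag:])
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r


# Synthetic example: vegetation anomalies repeat the drought signal 3 steps later.
drought = [0.0, 1.0, -2.0, 3.0, -1.0, 2.0, 0.0, -3.0, 1.0, 2.0, -1.0, 0.0]
vegetation = [0.5, -0.5, 0.0] + drought[:-3]
lag, r = response_lag(drought, vegetation, max_lag=5)
print(lag, round(r, 3))
```

Each additional lag shortens the overlapping window, so with short series a large `max_lag` can produce spuriously high correlations; in practice the lag range should be limited to physically plausible values.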


The 2018 Colombian Military Academy dataset

Revista Científica General José María Córdova ◽

10.21830/19006586.345 ◽

2018 ◽

Vol 16 (23) ◽

pp. 147-162

Author(s):

Andres Eduardo Fernandez-Osorio ◽

Edna Jackeline Latorre Rojas ◽

Nayiver Mayorga Zarta

Keyword(s):

Scientific Information ◽

Military Academy ◽

Sociological Study ◽

Data Set ◽

Women In The Military ◽

Social Patterns ◽

Civil Military Relations ◽

Perceptions And Attitudes ◽

Military Relations ◽

The Military

This article presents a data set on the population of military students, resulting from a sociological study completed at the Colombian Military Academy (Escuela Militar de Cadetes General Jose Maria Cordova - ESMIC). By analyzing the perceptions and attitudes of ESMIC’s students in six areas, namely socio-demographic characteristics, professional behavior, social patterns, military values, civil-military relations, and the integration of women in the military, this data set aims to provide scientific information to inform the design and implementation of the National Army of Colombia’s policies and to improve their effectiveness.


German Reproducibility Network - a new platform for Open Science in Germany

10.5194/egusphere-egu21-14724 ◽

2021 ◽

Author(s):

Bernadette Fritzsch ◽

Daniel Nüst

Keyword(s):

Local Level ◽

Scientific Work ◽

Research Process ◽

Open Science ◽

Further Education ◽

Early Career ◽

Reproducible Research ◽

Small Scale ◽

Research Software

Open Science has established itself as a movement across all scientific disciplines in recent years. It supports good practices in science and research that lead to more robust, comprehensible, and reusable results. The aim is to improve the transparency and quality of scientific results so that more trust is achieved, both within the sciences themselves and in society. Transparency requires that uncertainties and assumptions be made explicit and disclosed openly. Currently, the Open Science movement is largely driven by grassroots initiatives and small-scale projects. We discuss some examples that have taken on different facets of the topic:
- The software developed and used in the research process is playing an increasingly important role. The Research Software Engineers (RSE) communities have therefore organized themselves in national and international initiatives to increase the quality of research software.
- Evaluating the reproducibility of scientific articles as part of peer review requires proper credit and incentives for both authors and specialised reviewers to spend extra effort facilitating workflow execution. The Reproducible AGILE initiative has established a reproducibility review at a major community conference in GIScience.
- Technological advances for more reproducible scholarly communication beyond PDFs, such as containerisation, exist but are often inaccessible to domain experts who are not programmers. Targeting geoscience and geography, the project Opening Reproducible Research (o2r) develops infrastructure to support the publication of research compendia, which capture data, software (incl. execution environment), text, and interactive figures and maps.
At the core of scientific work lie replicability and reproducibility.
Even if different scientific communities use these terms differently, the need to give these aspects more attention is widely recognized, and individual communities can learn a lot from each other. Networking is therefore of great importance. The newly founded German Reproducibility Network (GRN) aims to be a platform for such networking and targets all of the above initiatives. GRN is embedded in a growing network of similar initiatives, e.g. in the UK, Switzerland, and Australia. Its goals include:
- Supporting local open science groups
- Connecting local or topic-centered initiatives for the exchange of experiences
- Attracting facilities to the goals of Open Science
- Cultivating contacts with funding organizations, publishers, and other actors in the scientific landscape
In particular, the GRN aims to promote the dissemination of best practices through various formats of further education, in order to raise awareness of the topic, particularly among early career researchers. By providing a platform for networking, local and domain-specific groups should be able to learn from one another, strengthen one another, and shape policies at a local level. We present the GRN in order to reach out to existing local initiatives and win them over as members of the GRN or of sibling networks in other countries.


User Satisfaction on Social Media Profile of E-sports Organization

Marketing and Management of Innovations ◽

10.21272/mmi.2020.4-05 ◽

2020 ◽

pp. 61-75

Author(s):

Krzystof Lukowicz ◽

Artur Strzelecki

Keyword(s):

Social Media ◽

Computer Games ◽

User Satisfaction ◽

Partial Least Square ◽

Easy Access ◽

Equation Modeling ◽

Sport Organizations ◽

Data Set ◽

The Social ◽

Sport Organization

E-sport is one of the most rapidly growing branches of modern entertainment. Many factors drive this rapid progress; easy access to match broadcasts, free e-sport games, and the enjoyment of watching a favorite match are just a few of them. Moreover, the steadily growing number of tournaments (both online and hosted in the largest sports halls in the world) is drawing more and more older people to this phenomenon. Apart from pure entertainment, electronic sports offer great business opportunities: proper use of social media can generate high financial returns for investors. The paper is dedicated to users’ satisfaction with the social media profiles of e-sport organizations, teams, and players. The research covers basic information about e-sport, social media, and forms of e-marketing on social media for e-sport organizations. This work aims to assess the factors influencing satisfaction with the use of a social media profile. Specifically, the study investigates the influence of Perceived Profile Usefulness, Perceived Entertainment, Identification with Organization and Players, and Satisfaction on users’ Intention to Follow and Recommend the social media profile of an e-sport organization. The study tested and used the model in the context of social media profiles. Partial least squares structural equation modeling (PLS-SEM) is employed to test the proposed research model. The study uses an online survey to obtain data from 209 Polish e-sport enthusiasts (both players and spectators). The data set was analyzed using SmartPLS 3 software. The results show that the best predictor of users’ Satisfaction is Identification with Organization and Players, followed by Perceived Entertainment. Satisfaction predicts users’ Intention to Follow and Recommend the social media profile of the e-sport organization.
The findings improve understanding of marketing actions on e-sport social media profiles, and this work is therefore of particular interest to e-sport organizations, teams, and players. Keywords: E-sport, social media profile, satisfaction, computer games, social media marketing.


Addressing disorder in scholarly communication: Strategies from NISO 2021

Information Services & Use ◽

10.3233/isu-210113 ◽

2021 ◽

pp. 1-15

Author(s):

Jodi Schneider ◽

Michele Avissar-Whiting ◽

Caitlin Bakker ◽

Hannah Heckner ◽

Sylvain Massip ◽

...

Keyword(s):

Scientific Information ◽

Scholarly Communication ◽

Open Science ◽

Fake News ◽

Science Standards ◽

Validation And Verification ◽

Related Research ◽

Health Related Research ◽

Health Related ◽

Conference Session

Open science and preprints have invited a larger audience of readers, especially during the pandemic. Consequently, communicating the limitations and uncertainties of research to a broader public has become important over the entire information lifecycle. This paper brings together reports from the NISO Plus 2021 conference session “Misinformation and truth: from fake news to retractions to preprints”. We discuss the validation and verification of scientific information at the preprint stage in order to support sound and open science standards, at the publication stage in order to limit the spread of retracted research, and after publication, to fight fake news about health-related research by mining open access content.
