archivist: An R Package for Managing, Recording and Restoring Data Analysis Results

2017 ◽  
Vol 82 (11) ◽  
Author(s):  
Przemysław Biecek ◽  
Marcin Kosinski
Keyword(s):  
Author(s):  
Pedro M. Esperança ◽  
Dari F. Da ◽  
Ben Lambert ◽  
Roch K. Dabiré ◽  
Thomas S. Churcher

Abstract: Near infrared spectroscopy is increasingly being used as an economical method to monitor mosquito vector populations in support of disease control. Despite this rise in popularity, strong geographical variation in spectra has proven an issue for generalising predictions from one location to another. Here, we use a functional data analysis approach (which models spectra as smooth curves rather than as a discrete set of points) to develop a method that is robust to geographic heterogeneity. Specifically, we use a penalised generalised linear modelling framework which includes efficient functional representation of spectra, spectral smoothing and regularisation. To ensure better generalisation of model predictions from one training set to another, we use cross-validation procedures favouring smoother representation of spectra. To illustrate the performance of our approach, we collected spectra for field-caught specimens of Anopheles gambiae complex mosquitoes – the most epidemiologically important vector species on the planet – in two sites in Burkina Faso. Using these spectra, we show how models trained on data from one site can successfully classify morphologically identical sibling species in another site, over 250 km away. Whilst we apply our framework to species prediction, our unified statistical framework can, alternatively, handle regression analysis (for example, to determine mosquito age) and other types of multinomial classification (for example, to determine infection status). To make our methods readily available for field entomologists, we have created an open-source R package, mlevcm. All data used are also publicly available.


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Benjamin Ulfenborg

Abstract: Background: Studies on multiple modalities of omics data such as transcriptomics, genomics and proteomics are growing in popularity, since they allow us to investigate complex mechanisms across molecular layers. It is widely recognized that integrative omics analysis holds the promise to unlock novel and actionable biological insights into health and disease. Integration of multi-omics data remains challenging, however, and requires combination of several software tools and extensive technical expertise to account for the properties of heterogeneous data. Results: This paper presents the miodin R package, which provides a streamlined workflow-based syntax for multi-omics data analysis. The package allows users to perform analysis of omics data either across experiments on the same samples (vertical integration), or across studies on the same variables (horizontal integration). Workflows have been designed to promote transparent data analysis and reduce the technical expertise required to perform low-level data import and processing. Conclusions: The miodin package is implemented in R and is freely available for use and extension under the GPL-3 license. Package source, reference documentation and user manual are available at https://gitlab.com/algoromics/miodin.


2010 ◽  
Vol 22 (1) ◽  
pp. 278
Author(s):  
A. Gad ◽  
M. Hoelker ◽  
F. Rings ◽  
N. Ghanem ◽  
D. Salilew-Wondim ◽  
...  

Estrus synchronization and superovulation are the most widely used procedures in embryo transfer technology. However, changes in the oviduct and uterine environment due to these procedures and the subsequent influence on embryos have not yet been investigated. This study was conducted to investigate the effect of the oviduct environment of synchronized-only or superovulated cyclic heifers on the gene expression profile of blastocysts. Bovine Affymetrix array analysis was performed using 2 groups of blastocysts. The first group was bovine blastocysts produced after superovulation of Simmental heifers (n = 9) using 8 consecutive FSH injections over 4 days in decreasing doses (in total, 300-400 mg of FSH equivalent according to body weight) and flushed at Day 7 by a nonsurgical endoscopic method. The second group was bovine blastocysts derived from synchronized Simmental heifers (n = 4) after transfer of 2-cell stage embryos from superovulated donor Simmental heifers (n = 9) by a nonsurgical transvaginal endoscopy tubal transfer method. Total RNA was extracted from 3 pools of embryos from each experimental group (6 embryos per pool). A total of 6 biotin-labeled cRNA samples were hybridized on 6 bovine Affymetrix arrays. Data analysis was performed using the LIMMA package, which is written in R and maintained within Bioconductor. Array data analysis revealed a total of 454 transcripts to be differentially expressed (P < 0.05, fold change > 2) between the 2 groups. Of these, 429 and 25 were up- and down-regulated, respectively, in blastocysts derived from superovulated heifers compared with those derived from synchronized animals. Genes involved in response to stress (HSPA14 and HSPE1), cellular and metabolic processes (CPSF3, ATPIF1, POMP, and MDH2), translation (RPS17, EEF1B2, and EIF4E), and cell communication (FN1, KRT18, and DSG2) were found to be enriched in blastocysts derived from superovulated animals.
On the other hand, genes related to protein metabolic processes (CLGN) were found to be enriched in blastocysts derived from the synchronized group. The KEGG analysis of the differentially expressed genes showed that the ribosome and oxidative phosphorylation pathways are the dominant pathways and that genes involved in these pathways are highly abundant in the blastocysts derived from superovulated animals. Quantitative real-time PCR confirmed the transcript abundance of 7 out of 8 genes selected for validation. In conclusion, blastocysts cultured in synchronized animals after the 2-cell stage showed significant differences in transcriptome profile compared with their counterparts that remained in superovulated heifers until Day 7. Further functional analysis of some selected candidate genes could give new insights into mechanisms regulating the ability of embryos to survive after transfer.


2018 ◽  
Author(s):  
Maziyar Baran Pouyan ◽  
Dennis Kostka

Abstract: Motivation: Genome-wide transcriptome sequencing applied to single cells (scRNA-seq) is rapidly becoming an assay of choice across many fields of biological and biomedical research. Scientific objectives often revolve around discovery or characterization of types or sub-types of cells, and therefore obtaining accurate cell–cell similarities from scRNA-seq data is a critical step in many studies. While rapid advances are being made in the development of tools for scRNA-seq data analysis, few approaches exist that explicitly address this task. Furthermore, the abundance and type of noise present in scRNA-seq datasets suggest that application of generic methods, or of methods developed for bulk RNA-seq data, is likely suboptimal. Results: Here we present RAFSIL, a random forest based approach to learn cell–cell similarities from scRNA-seq data. RAFSIL implements a two-step procedure, where feature construction geared towards scRNA-seq data is followed by similarity learning. It is designed to be adaptable and expandable, and RAFSIL similarities can be used for typical exploratory data analysis tasks like dimension reduction, visualization, and clustering. We show that our approach compares favorably with current methods across a diverse collection of datasets, and that it can be used to detect and highlight unwanted technical variation in scRNA-seq datasets in situations where other methods fail. Overall, RAFSIL implements a flexible approach yielding a useful tool that improves the analysis of scRNA-seq data. Availability and Implementation: The RAFSIL R package is available online at www.kostkalab.net/software.html


2021 ◽  
Author(s):  
Jakob P. Pettersen ◽  
Eivind Almaas

Abstract: Background: Differential co-expression network analysis has become an important tool to gain understanding of biological phenotypes and diseases. The CSD algorithm is a method to generate differential co-expression networks by comparing gene co-expressions from two different conditions. Each of the gene pairs is assigned conserved (C), specific (S) and differentiated (D) scores based on the co-expression of the gene pair between the two conditions. The result of the procedure is a network where the nodes are genes and the links are the gene pairs with the highest C-, S-, and D-scores. However, the existing CSD implementations suffer from poor computational performance, difficult user procedures and lack of documentation. Results: We created the R package csdR, aimed at reaching good performance together with ease of use, sufficient documentation, and the ability to play well with other tools for data analysis. csdR was benchmarked on a realistic dataset with 20,645 genes. After verifying that the chosen number of iterations gave sufficient robustness, we tested the performance against the two existing CSD implementations. csdR was superior in performance to one of the implementations, whereas the other did not run. Our implementation can utilize multiple processing cores. However, we were unable to achieve more than a ∼2.7-fold parallel speedup, with saturation reached at about 10 cores. Conclusions: The results suggest that csdR is a useful tool for differential co-expression analysis and is able to generate robust results within a workday on datasets of realistic sizes when run on a workstation or compute server.
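The scoring step described in this abstract can be illustrated with a minimal sketch for a single gene pair. The abstract does not state the score formulas, so the ones below are an assumption based on how the CSD method is commonly described: each score compares the pair's co-expression in the two conditions and is normalised by the combined uncertainty of the two estimates. The function name csd_scores and its parameters are hypothetical and are not part of the csdR API.

```python
import math


def csd_scores(rho1, rho2, var1, var2):
    """Return (C, S, D) scores for one gene pair.

    rho1, rho2: co-expression of the pair (e.g. a correlation coefficient)
                in condition 1 and condition 2.
    var1, var2: estimated variances of those co-expression values
                (e.g. obtained by resampling).

    ASSUMPTION: these formulas are recalled from the CSD literature,
    not taken from the abstract above.
    """
    denom = math.sqrt(var1 + var2)
    if denom == 0:
        raise ValueError("combined variance must be positive")
    # Conserved: same sign and similar strength in both conditions.
    c = abs(rho1 + rho2) / denom
    # Specific: strong co-expression in one condition, weak in the other.
    s = abs(abs(rho1) - abs(rho2)) / denom
    # Differentiated: strong co-expression in both, but with opposite sign.
    d = (abs(rho1) + abs(rho2) - abs(rho1 + rho2)) / denom
    return c, s, d


if __name__ == "__main__":
    # Synthetic values chosen only to show the three regimes.
    print(csd_scores(0.8, 0.8, 0.005, 0.005))   # conserved pair
    print(csd_scores(0.8, 0.0, 0.005, 0.005))   # specific pair
    print(csd_scores(0.8, -0.8, 0.005, 0.005))  # differentiated pair
```

In a full analysis these scores would be computed for every gene pair, and the pairs with the highest C-, S- and D-scores would become the links of the network, as the abstract describes.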


The R Journal ◽  
2018 ◽  
Vol 10 (1) ◽  
pp. 73 ◽  
Author(s):  
Juhyun Kim ◽  
Yiwen Zhang ◽  
Joshua Day ◽  
Hua Zhou

F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 7
Author(s):  
Sebastien Theil ◽  
Etienne Rifa

Bioinformatic tools for marker gene sequencing data analysis are evolving continuously and rapidly, so integrating the most recent techniques and tools is challenging. We present an R package for data analysis of 16S and ITS amplicon-based sequencing. This workflow is based on several R functions and performs automatic treatments from FASTQ sequence files to diversity and differential analysis with statistical validation. The main purpose of this package is to automate bioinformatic analysis, ensure reproducibility between projects, and be flexible enough to quickly integrate new bioinformatic tools or statistical methods. rANOMALY is an easy-to-install and customizable R package that characterizes microbial communities at the amplicon sequence variant (ASV) level. It integrates all assets of the latest bioinformatics methods, such as better sequence tracking, decontamination from control samples, use of multiple reference databases for taxonomic annotation, all main ecological analyses, for which we propose advanced statistical tests, and a differential analysis cross-validated by four different methods. Our package produces ready-to-publish figures, and all of its outputs are made to be integrated in R Markdown code to produce automated reports.


2019 ◽  
pp. 51-64

The article presents basic algorithms for categorical data analysis in R. Algorithms for the analysis of independent and non-independent nominal and ordinal data are presented.

