ascend: R package for analysis of single-cell RNA-seq data

GigaScience, 2019, Vol 8 (8)
Author(s): Anne Senabouth, Samuel W Lukowski, Jose Alquicira Hernandez, Stacey B Andersen, Xin Mei, ...

Abstract

Background: Recent developments in single-cell RNA sequencing (scRNA-seq) platforms have vastly increased the number of cells typically assayed in an experiment. Analysis of scRNA-seq data is multidisciplinary in nature, requiring careful consideration of how statistical methods apply to the underlying biology. Few analysis packages are at once robust, computationally fast, and flexible enough to integrate with other bioinformatics tools and methods.

Findings: ascend is an R package comprising tools designed to simplify and streamline the preliminary analysis of scRNA-seq data. It addresses the statistical challenges of scRNA-seq analysis, integrates flexibly with genomics packages and native R functions, and supports fast parallel computation and efficient memory management. The package incorporates both novel and established methods to provide a framework for cell and gene filtering, quality control, normalization, dimension reduction, clustering and differential expression, along with a wide range of visualization functions.

Conclusions: ascend is designed to work with scRNA-seq data generated by any high-throughput platform and includes functions to convert data objects between software packages. The ascend workflow is simple and interactive, and it is suitable for a broad range of users, including those with little programming experience.
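The filtering and normalization steps listed above can be illustrated with a minimal Python sketch. It is not ascend's API (ascend is an R package): the thresholds (`min_genes`, `max_mito_frac`), the `MT-` prefix used to recognize mitochondrial genes, the scale factor of 10,000 and the tiny count matrix are all assumptions made for illustration.

```python
import math

# Illustrative sketch only; these thresholds are assumptions, not ascend defaults.
MIN_GENES = 2          # minimum number of detected genes per cell
MAX_MITO_FRAC = 0.2    # maximum fraction of counts from mitochondrial genes


def filter_cells(counts, gene_names, min_genes=MIN_GENES, max_mito_frac=MAX_MITO_FRAC):
    """Return indices of cells (columns) that pass simple QC.

    counts is a genes x cells matrix given as a list of rows.
    """
    mito_rows = [i for i, g in enumerate(gene_names) if g.upper().startswith("MT-")]
    keep = []
    for j in range(len(counts[0])):
        column = [row[j] for row in counts]
        total = sum(column)
        detected = sum(1 for v in column if v > 0)
        mito = sum(column[i] for i in mito_rows)
        if total > 0 and detected >= min_genes and mito / total <= max_mito_frac:
            keep.append(j)
    return keep


def log_normalize(counts, cells, scale=10_000):
    """Scale each kept cell to `scale` total counts, then apply log1p."""
    totals = {j: sum(row[j] for row in counts) for j in cells}
    return [[math.log1p(row[j] / totals[j] * scale) for j in cells] for row in counts]


# Made-up example: 3 genes x 3 cells.
gene_names = ["GAPDH", "ACTB", "MT-CO1"]
counts = [
    [10, 0, 5],
    [8, 0, 1],
    [2, 9, 20],
]
kept = filter_cells(counts, gene_names)   # → [0]
normalized = log_normalize(counts, kept)
```

Cell 1 is dropped because it detects only one gene, and cell 2 because most of its counts are mitochondrial. Filtering runs before normalization so that low-quality cells do not distort the scaling.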

2017
Author(s): Anne Senabouth, Samuel W Lukowski, Jose Alquicira Hernandez, Stacey Andersen, Xin Mei, ...

Abstract

Summary: ascend is an R package comprising fast, streamlined analysis functions optimized to address the statistical challenges of single-cell RNA-seq. The package incorporates novel and established methods to provide a flexible framework for filtering, quality control, normalization, dimension reduction, clustering, differential expression and a wide range of plotting.

Availability: The R package and associated vignettes are freely available at https://github.com/IMB-Computational-Genomics-Lab/…

Supplementary information: An example dataset is available at ArrayExpress, accession number E-MTAB-6108.


2017
Author(s): Jonathan Alles, Nikos Karaiskos, Samantha D. Praktiknjo, Stefanie Grosswendt, Philipp Wahle, ...

Abstract

Background: Recent developments in droplet-based microfluidics allow the transcriptional profiling of thousands of individual cells in a quantitative, highly parallel and cost-effective way. A critical, often limiting step is the preparation of cells in an unperturbed state, not compromised by stress or ageing. Further challenges are posed by rare cells that need to be collected over several days, and by samples prepared at different times or locations.

Results: Here, we used chemical fixation to overcome these problems. Methanol fixation allowed us to stabilize and preserve dissociated cells for weeks. Using mixtures of fixed human and mouse cells, we showed that individual transcriptomes could be confidently assigned to one of the two species. Single-cell gene expression from live and fixed samples correlated well with bulk mRNA-seq data. We then applied methanol fixation to transcriptionally profile primary single cells from dissociated complex tissues. Low-RNA-content cells from Drosophila embryos, as well as mouse hindbrain and cerebellum cells sorted by FACS, were successfully analysed after fixation, storage and single-cell droplet RNA-seq. We were able to identify diverse cell populations, including neuronal subtypes. As an additional resource, we provide ‘dropbead’, an R package for exploratory data analysis, visualization and filtering of Drop-seq data.

Conclusions: We expect that the availability of a simple cell fixation method will open up many new opportunities to analyse transcriptional dynamics at single-cell resolution in diverse biological contexts.
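Species-mixing experiments like the human/mouse one described above are commonly scored by the fraction of each cell's transcripts that map to each genome. The sketch below assumes a 90% purity cutoff and uses invented barcodes and counts; it is not dropbead's code (dropbead is an R package), and the paper's actual assignment criteria may differ.

```python
from collections import Counter

MIN_PURITY = 0.9  # assumed cutoff for illustration; not taken from the paper


def assign_species(human_umis, mouse_umis, min_purity=MIN_PURITY):
    """Label one cell barcode from its human- and mouse-mapped transcript counts."""
    total = human_umis + mouse_umis
    if total == 0:
        return "empty"
    if human_umis / total >= min_purity:
        return "human"
    if mouse_umis / total >= min_purity:
        return "mouse"
    return "mixed"  # neither genome dominates, e.g. a human-mouse doublet


# Hypothetical barcodes with (human, mouse) transcript counts.
cells = {"AAACGT": (950, 50), "AAAGCT": (20, 1480), "AACTGA": (600, 550)}
labels = {bc: assign_species(h, m) for bc, (h, m) in cells.items()}
print(Counter(labels.values()))
```

The share of "mixed" barcodes gives a rough view of how cleanly transcriptomes separate by species; raising `min_purity` makes the assignment stricter.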


2015
Author(s): Keegan D. Korthauer, Li-Fang Chu, Michael A. Newton, Yuan Li, James Thomson, ...

Abstract

The ability to quantify cellular heterogeneity is a major advantage of single-cell technologies. Although understanding such heterogeneity is of primary interest in a number of studies, for convenience, statistical methods often treat cellular heterogeneity as a nuisance factor. We present a novel method to characterize differences in expression in the presence of distinct expression states within and among biological conditions. Using simulated and case study data, we demonstrate that the modeling framework is able to detect differential expression patterns of interest under a wide range of settings. Compared to existing approaches, scDD has higher power to detect subtle differences in gene expression distributions that are more complex than a mean shift, and is able to characterize those differences. The freely available R package scDD implements the approach.
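The point that two expression distributions can differ with no shift in mean can be shown with a toy example. This sketch uses the two-sample Kolmogorov-Smirnov statistic as a simple, swapped-in measure of distributional difference; it is not scDD's modeling framework, and the expression values are invented.

```python
from statistics import mean


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    for x in sorted(set(a) | set(b)):
        # Advance each pointer past all values <= x to evaluate both ECDFs at x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


# Hypothetical log-expression of one gene in two conditions with equal means.
unimodal = [5.0] * 10              # every cell at an intermediate level
bimodal = [0.0] * 5 + [10.0] * 5   # half the cells off, half strongly on

print(mean(unimodal) - mean(bimodal))   # mean shift: 0.0
print(ks_statistic(unimodal, bimodal))  # distribution difference: 0.5
```

A test built only on means would report no change for this gene, even though the two conditions contain qualitatively different expression states.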


2020, Vol 12 (8), pp. 1257
Author(s): Mercedes E. Paoletti, Juan M. Haut, Xuanwen Tao, Javier Plaza Miguel, Antonio Plaza

The storage and processing of remotely sensed hyperspectral images (HSIs) face important challenges due to the computational requirements of analysing these images, which are characterized by continuous and narrow spectral channels. Although HSIs offer many opportunities for accurately modeling and mapping the surface of the Earth in a wide range of applications, they comprise massive data cubes that place heavy demands on storage and processing. The support vector machine (SVM) has been one of the most powerful machine learning classifiers for this data: it can process HSIs without prior feature extraction, behaves robustly with high-dimensional data and obtains high classification accuracies. Nevertheless, the training and prediction stages of this supervised classifier are very time-consuming, especially for large and complex problems that require intensive use of memory and computational resources. This paper develops a new, highly efficient implementation of SVMs that exploits the high computational power of graphics processing units (GPUs), reducing execution time by massively parallelizing the operations of the algorithm while managing memory efficiently during data reading and writing. Our experiments, conducted on different HSI benchmarks, demonstrate the efficiency of our GPU implementation.
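Why SVM prediction parallelizes so well can be seen from the kernel decision function: each pixel is scored independently against the stored support vectors. The sketch below assumes an RBF kernel, a binary problem and hand-picked toy parameters rather than a trained model; it is not the authors' GPU implementation, and training (the other time-consuming stage) is not shown.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def rbf_kernel(u, v, gamma):
    """Gaussian (RBF) kernel between two spectra."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def decision(pixel, support_vectors, dual_coefs, bias, gamma):
    """Binary SVM score f(x) = sum_i (alpha_i * y_i) * K(x_i, x) + b."""
    return sum(c * rbf_kernel(sv, pixel, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias


def predict(pixels, support_vectors, dual_coefs, bias, gamma=0.5, workers=4):
    # Each pixel's score depends only on that pixel and the trained model, so
    # pixels can be scored independently. A GPU exploits this by assigning pixels
    # to separate threads; the thread pool here only illustrates the decomposition
    # (CPython's GIL means it gives no real speedup for this pure-Python code).
    def score(p):
        return decision(p, support_vectors, dual_coefs, bias, gamma)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [1 if s >= 0 else -1 for s in pool.map(score, pixels)]


# Toy two-band "spectra" and hand-picked parameters (not a trained model).
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
dual_coefs = [-1.0, 1.0]  # alpha_i * y_i for each support vector
predicted = predict([[0.9, 1.0], [0.1, 0.0]], support_vectors, dual_coefs, bias=0.0)
```

The cost per pixel grows with the number of support vectors and spectral bands, which is why large hyperspectral scenes make prediction expensive and why spreading pixels across many GPU threads pays off.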


2020, Vol 36 (10), pp. 3276-3278
Author(s): Alemu Takele Assefa, Jo Vandesompele, Olivier Thas

Abstract

Summary: SPsimSeq is a semi-parametric simulation method for generating bulk and single-cell RNA-sequencing data. It is designed to simulate gene expression data that retain as many characteristics of real data as possible. It is reasonably flexible and can accommodate a wide range of experimental scenarios, including different sample sizes, biological signals (differential expression) and confounding batch effects.

Availability and implementation: The R package and associated documentation are available from https://github.com/CenterForStatistics-UGent/SPsimSeq.

Supplementary information: Supplementary data are available at Bioinformatics online.
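One way to keep simulated data close to real data is to estimate each gene's expression distribution from the data and then sample from that estimate. The sketch below shows this for a single gene using a Gaussian kernel density estimate (a "smoothed bootstrap") on log-CPM values. It is a simplified stand-in, not SPsimSeq's estimator: it ignores gene-gene correlation, library-size modeling and batch effects, and the counts, library sizes and bandwidth are invented.

```python
import math
import random


def log_cpm(counts, lib_sizes):
    """log2(counts per million + 1) for one gene across samples."""
    return [math.log2(c / n * 1e6 + 1) for c, n in zip(counts, lib_sizes)]


def simulate_gene(observed, n_new, bandwidth=0.2, seed=None):
    """Smoothed bootstrap: draw from a Gaussian kernel density estimate of the
    observed log-CPM values (pick an observed value, add N(0, bandwidth^2) noise).
    Draws are clipped at 0 because log2(CPM + 1) cannot be negative."""
    rng = random.Random(seed)
    return [max(0.0, rng.choice(observed) + rng.gauss(0.0, bandwidth))
            for _ in range(n_new)]


def to_counts(logcpm_values, lib_size):
    """Convert simulated log-CPM values back to integer counts for one library size."""
    return [round((2 ** y - 1) * lib_size / 1e6) for y in logcpm_values]


# Hypothetical counts for one gene in four samples, with their library sizes.
observed = log_cpm([12, 30, 0, 25], [1_000_000, 2_000_000, 1_500_000, 1_200_000])
simulated = simulate_gene(observed, n_new=6, seed=42)
new_counts = to_counts(simulated, lib_size=1_000_000)
```

The bandwidth controls how far simulated values may stray from the observed ones: at zero the method reduces to resampling the observed values, and larger values smooth the distribution more.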


2021
Author(s): Jiawei Zou, Miaochen Wang, Zhen Zhang, Zheqi Liu, Xiaobin Zhang, ...

Differential expression (DE) gene detection in single-cell RNA-seq (scRNA-seq) data is a key step in addressing the biological question under investigation. We find that the choice of DE method, together with gene filtering, has a profound impact on DE gene identification, and that different datasets benefit from personalized DE gene detection strategies. Existing tools do not take gene filtering into account and cannot evaluate DE performance on real datasets without prior knowledge of the true results. Based on two new metrics, we propose scCODE (single cell Consensus Optimization of Differentially Expressed gene detection), an R package that automatically optimizes DE gene detection for each experimental scRNA-seq dataset.
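The interaction between gene filtering and DE calling that motivates scCODE can be seen in a toy example: changing only the filter changes which genes are reported. The sketch uses a hypothetical detection-rate filter and a plain fold-change cutoff on invented counts; it is not one of the DE methods scCODE evaluates, and it does not implement scCODE's two metrics.

```python
import math


def detection_rate(values):
    """Fraction of cells in which the gene has a nonzero count."""
    return sum(1 for v in values if v > 0) / len(values)


def call_de_genes(expr, groups, min_detect=0.2, min_abs_lfc=1.0, pseudocount=1.0):
    """Return genes passing a detection-rate filter and a |log2 fold change| cutoff.

    expr maps gene -> per-cell counts; groups gives a 0/1 label per cell.
    Both cutoffs are hypothetical, and a fold-change rule is not a statistical test.
    """
    called = []
    for gene, values in expr.items():
        if detection_rate(values) < min_detect:
            continue  # gene filtering step
        g1 = [v for v, g in zip(values, groups) if g == 1]
        g0 = [v for v, g in zip(values, groups) if g == 0]
        lfc = math.log2((sum(g1) / len(g1) + pseudocount) / (sum(g0) / len(g0) + pseudocount))
        if abs(lfc) >= min_abs_lfc:
            called.append(gene)
    return sorted(called)


groups = [0] * 5 + [1] * 5
expr = {
    "GENE_A": [1, 2, 1, 2, 1, 8, 9, 7, 8, 9],    # broadly expressed, higher in group 1
    "GENE_B": [0, 0, 0, 0, 0, 0, 0, 0, 0, 12],   # driven by a single cell
    "GENE_C": [3] * 10,                          # unchanged
}
print(call_de_genes(expr, groups))                  # ['GENE_A']
print(call_de_genes(expr, groups, min_detect=0.0))  # ['GENE_A', 'GENE_B']
```

Without the filter, GENE_B is called only because of one high-count cell; whether such genes should be kept is exactly the kind of dataset-dependent choice the abstract describes.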


The recycling and reuse of materials and objects were extensive in the past, but have rarely been embedded into models of the economy; even more rarely has any attempt been made to assess the scale of these practices. Recent developments, including the use of large datasets, computational modelling, and high-resolution analytical chemistry, are increasingly offering the means to reconstruct recycling and reuse, and even to approach the thorny matter of quantification. Growing scholarly interest in the topic has also led to an increasing recognition of these practices from those employing more traditional methodological approaches, which are sometimes coupled with innovative archaeological theory. Thanks to these efforts, it has been possible for the first time in this volume to draw together archaeological case studies on the recycling and reuse of a wide range of materials, from papyri and textiles, to amphorae, metals and glass, building materials and statuary. Recycling and reuse occur at a range of site types, and often in contexts which cross-cut material categories, or move from one object category to another. The volume focuses principally on the Roman Imperial and late antique world, over a broad geographical span ranging from Britain to North Africa and the East Mediterranean. Last, but not least, the volume is unique in focusing upon these activities as a part of the status quo, and not just as a response to crisis.


Polymers, 2020, Vol 12 (10), pp. 2237
Author(s): P. R. Sarika, Paul Nancarrow, Abdulrahman Khansaheb, Taleb Ibrahim

Phenol–formaldehyde (PF) resin continues to dominate the resin industry more than 100 years after its first synthesis. Its versatile properties, such as thermal stability, chemical resistance, fire resistance and dimensional stability, make it a suitable material for a wide range of applications. PF resins have been used in the wood industry as adhesives, in paints and coatings, and in the aerospace, construction and building industries as composites and foams. Currently, petroleum is the key source of the raw materials used to manufacture PF resin. However, increasing environmental pollution and fossil fuel depletion have driven industries to seek sustainable alternatives to petroleum-based raw materials. Over the past decade, researchers have replaced phenol and formaldehyde with sustainable materials such as lignin, tannin, cardanol, hydroxymethylfurfural and glyoxal to produce bio-based PF resin. Several synthesis modifications aimed at improving the properties of bio-based phenolic resin are currently under investigation. This review discusses recent developments in the synthesis of PF resins, particularly those created from sustainable raw material substitutes, and modifications applied to the synthetic route to improve the mechanical properties.


Author(s): Irzam Sarfraz, Muhammad Asif, Joshua D Campbell

Abstract

Motivation: R Experiment objects such as the SummarizedExperiment or SingleCellExperiment are data containers for storing one or more matrix-like assays along with associated row and column data. These objects have been used to facilitate the storage and analysis of high-throughput genomic data generated from technologies such as single-cell RNA sequencing. One common computational task in many genomics analysis workflows is to subset the data matrix before applying downstream analytical methods. For example, one may need to subset the columns of the assay matrix to exclude poor-quality samples, or subset the rows of the matrix to select the most variable features. Traditionally, a second object is created that contains the desired subset of the assay from the original object. However, this approach is inefficient, as it requires creating an additional object containing a copy of the original assay, and it makes data provenance difficult to track.

Results: To overcome these challenges, we developed an R package called ExperimentSubset, a data container that implements classes for efficient storage and streamlined retrieval of assays that have been subsetted by rows and/or columns. These classes inherently provide data provenance by maintaining the relationship between the subsetted and parent assays. We demonstrate the utility of this package on a single-cell RNA-seq dataset by storing and retrieving subsets at different stages of the analysis while maintaining a lower memory footprint. Overall, ExperimentSubset is a flexible container for the efficient management of subsets.

Availability and implementation: The ExperimentSubset package is available at Bioconductor: https://bioconductor.org/packages/ExperimentSubset/ and GitHub: https://github.com/campbio/ExperimentSubset.

Supplementary information: Supplementary data are available at Bioinformatics online.
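The core idea, storing a subset as row and column indices into its parent instead of as a copied matrix, can be sketched in a few lines of Python. `SubsetStore` and its methods are hypothetical names for illustration and do not mirror the ExperimentSubset API (an R/Bioconductor package); the sketch covers only index-based storage, nested subsets and provenance, and ignores row and column metadata.

```python
class SubsetStore:
    """Keep one copy of each full assay; store subsets as index references."""

    def __init__(self):
        self.assays = {}   # name -> full matrix (list of rows)
        self.subsets = {}  # name -> (parent name, row indices, column indices)

    def add_assay(self, name, matrix):
        self.assays[name] = matrix

    def create_subset(self, name, parent, rows, cols):
        if parent not in self.assays and parent not in self.subsets:
            raise KeyError(f"unknown parent: {parent}")
        self.subsets[name] = (parent, list(rows), list(cols))

    def _resolve(self, name):
        """Map a (possibly nested) subset back to indices in its root assay."""
        if name in self.assays:
            m = self.assays[name]
            return name, list(range(len(m))), list(range(len(m[0]) if m else 0))
        parent, rows, cols = self.subsets[name]
        root, prow, pcol = self._resolve(parent)
        return root, [prow[r] for r in rows], [pcol[c] for c in cols]

    def lineage(self, name):
        """Provenance chain from a subset back to its root assay."""
        chain = [name]
        while chain[-1] in self.subsets:
            chain.append(self.subsets[chain[-1]][0])
        return chain

    def get(self, name):
        """Materialize the values of an assay or subset on demand."""
        root, rows, cols = self._resolve(name)
        m = self.assays[root]
        return [[m[r][c] for c in cols] for r in rows]


store = SubsetStore()
store.add_assay("counts", [[1, 2, 3], [4, 5, 6], [7, 8, 9]])
store.create_subset("good_cells", "counts", rows=[0, 1, 2], cols=[0, 2])
store.create_subset("hvg", "good_cells", rows=[2], cols=[1])
```

Here `store.get("hvg")` returns `[[9]]` and `store.lineage("hvg")` returns `["hvg", "good_cells", "counts"]`. Each subset costs only its index lists, and the parent link doubles as a provenance record.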


2021, Vol 22 (3), pp. 1399
Author(s): Salim Ghannoum, Waldir Leoncio Netto, Damiano Fantini, Benjamin Ragan-Kelley, Amirabbas Parizadeh, ...

The growing attention toward the benefits of single-cell RNA sequencing (scRNA-seq) is leading to a myriad of computational packages for the analysis of different aspects of scRNA-seq data. For researchers without advanced programming skills, it is very challenging to combine several packages to perform the desired analysis in a simple and reproducible way. Here we present DIscBIO, an open-source, multi-algorithmic pipeline for easy, efficient and reproducible analysis of cellular sub-populations at the transcriptomic level. Starting from single-cell sequencing read counts, the pipeline integrates multiple scRNA-seq packages and, through clustering and differential analysis, enables biomarker discovery with decision trees and gene enrichment analysis in a network context. DIscBIO is freely available as an R package. It can be run either in command-line mode or through a user-friendly computational pipeline using Jupyter notebooks. We showcase all pipeline features using two scRNA-seq datasets. The first dataset consists of circulating tumor cells from patients with breast cancer. The second is a cell cycle regulation dataset in myxoid liposarcoma. All analyses are available as notebooks that combine R code, explanatory text, and output data and images in a sequential narrative. The notebooks help R users understand the different steps of the pipeline and guide them in exploring their own scRNA-seq data. We also provide a cloud version using Binder that allows the pipeline to be executed without downloading R, Jupyter or any of the packages used by the pipeline. The cloud version can serve as a tutorial for training purposes, especially for those who are not R users or have limited programming skills. However, to perform meaningful scRNA-seq analyses, all users will need to understand the implemented methods and their possible options and limitations.
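The decision-tree step used for biomarker discovery can be illustrated by its simplest building block: a depth-one tree that picks the single gene and expression threshold that best separate two clusters by Gini impurity. This is a generic sketch, not DIscBIO's implementation; the gene names and values are invented, and a full tree would apply the same split search recursively to each side.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 cluster labels (2p(1-p) for two classes)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_split(expr, labels):
    """Return (gene, threshold, weighted_gini) for the best single-gene split.

    expr maps gene -> expression per cell; labels gives the 0/1 cluster per cell.
    Cells with expression <= threshold go left, the rest go right.
    """
    best = None
    n = len(labels)
    for gene, values in expr.items():
        for t in sorted(set(values)):
            left = [l for v, l in zip(values, labels) if v <= t]
            right = [l for v, l in zip(values, labels) if v > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (gene, t, score)
    return best


# Hypothetical expression of two genes in six cells from two clusters.
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "GENE_A": [0.1, 0.3, 0.2, 2.5, 3.1, 2.8],  # separates the clusters cleanly
    "GENE_B": [1.0, 1.2, 0.9, 1.1, 1.0, 1.3],  # uninformative
}
print(best_split(expr, labels))
```

A weighted impurity of 0.0 means the chosen gene and threshold split the two clusters perfectly, which is what makes such a gene a candidate biomarker.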

