Transcriptome dataset of human corneal endothelium based on ribosomal RNA-depleted RNA-Seq data

Abstract: The corneal endothelium maintains corneal transparency; consequently, damage to this endothelium caused by a number of pathological conditions results in severe vision loss. Publicly available expression databases of human tissues are useful for investigating the pathogenesis of diseases and for developing new therapeutic modalities; however, databases for ocular tissues, especially the corneal endothelium, are scarce. Here, we generated a transcriptome dataset from ribosomal RNA-depleted total RNA extracted from the corneal endothelium of eyes from seven Caucasian donors without ocular disease. Principal component analysis and inter-sample correlation coefficients (ranging from 0.87 to 0.96) suggested high homogeneity of our RNA-Seq dataset across samples, as well as sufficient data quantity and quality. The expression profile of tissue-specific marker genes indicated only limited, if any, contamination by other layers of the cornea, and the Smirnov-Grubbs test confirmed the absence of outlier samples. The dataset presented here should be useful for investigating the function and dysfunction of the cornea, as well as for extended transcriptome analyses integrated with expression data for non-coding RNAs.
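
The homogeneity and outlier checks can be sketched in Python. This is a minimal illustration under assumptions the abstract does not state: each sample is summarized by its mean Pearson correlation with the other samples, and a single two-sided Smirnov-Grubbs test at α = 0.05 is applied to those values. The function names are hypothetical, and the Student-t quantile is computed numerically (Simpson's rule plus bisection) only to keep the sketch dependency-free; `scipy.stats.t.ppf` would be the usual choice.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_correlations(samples):
    """For each sample (a list of gene expression values), its mean r with all other samples."""
    n = len(samples)
    return [
        statistics.fmean([pearson(samples[i], samples[j]) for j in range(n) if j != i])
        for i in range(n)
    ]


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for Student's t with df degrees of freedom (t >= 0), via Simpson's rule."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps  # steps must be even for Simpson's rule
    inner = sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    return 0.5 - (pdf(0) + inner + pdf(t)) * h / 3


def grubbs_critical(n, alpha=0.05):
    """Two-sided Grubbs critical value for n observations."""
    p = alpha / (2 * n)
    lo, hi = 0.0, 50.0
    for _ in range(50):  # bisection for the t(n - 2) value with upper-tail area p
        mid = (lo + hi) / 2
        if t_upper_tail(mid, n - 2) > p:
            lo = mid
        else:
            hi = mid
    t = (lo + hi) / 2
    return (n - 1) / math.sqrt(n) * math.sqrt(t * t / (n - 2 + t * t))


def grubbs_outlier(values, alpha=0.05):
    """Index of the most extreme value if the Grubbs test flags it, else None."""
    mean, sd = statistics.fmean(values), statistics.stdev(values)
    idx = max(range(len(values)), key=lambda i: abs(values[i] - mean))
    g = abs(values[idx] - mean) / sd
    return idx if g > grubbs_critical(len(values), alpha) else None
```

For seven samples, `grubbs_outlier(mean_correlations(samples))` returning `None` would correspond to "no outlier sample"; the critical value for n = 7 comes out near the tabulated 2.02.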

JIND: Joint Integration and Discrimination for Automated Single-Cell Annotation

10.1101/2020.10.06.327601 ◽

2020 ◽

Author(s):

Mohit Goyal ◽

Guillermo Serrano ◽

Ilan Shomorony ◽

Mikel Hernaez ◽

Idoia Ochoa

Keyword(s):

Single Cell ◽

Cell Types ◽

Marker Genes ◽

Specific Marker ◽

Rna Seq ◽

Batch Effects ◽

Cell Type ◽

Latent Space ◽

Cell Type Specific ◽

Low Dimensional

Abstract: Single-cell RNA-seq is a powerful tool for studying the cellular composition of different tissues and organisms. A key step in the analysis pipeline is the annotation of cell types based on the expression of specific marker genes. Since manual annotation is labor-intensive and does not scale to large datasets, several methods for automated cell-type annotation based on supervised learning have been proposed. However, these methods generally require feature extraction and batch alignment prior to classification, and their performance may become unreliable in the presence of cell types with very similar transcriptomic profiles, such as differentiating cells. We propose JIND, a framework for automated cell-type identification based on neural networks that directly learns a low-dimensional representation (latent code) in which cell types can be reliably determined. To account for batch effects, JIND performs a novel asymmetric alignment in which the transcriptomic profiles of unseen cells are mapped onto the previously learned latent space, avoiding the need to retrain the model whenever a new dataset becomes available. JIND also learns cell-type-specific confidence thresholds to identify and reject cells that cannot be reliably classified. We show on datasets with and without batch effects that JIND classifies cells more accurately than previously proposed methods while rejecting only a small proportion of cells. Moreover, JIND batch alignment is parallelizable, making it more than five or six times faster than Seurat integration. Availability: https://github.com/mohit1997/JIND.
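
The idea of cell-type-specific rejection thresholds can be illustrated with a small sketch. JIND learns its thresholds inside its own neural-network framework; here, as a hypothetical stand-in, each cell type's threshold is a low quantile of the top predicted probability among training cells correctly classified as that type. The function names and the `quantile` parameter are illustrative assumptions, not JIND's API.

```python
import math
from collections import defaultdict


def fit_class_thresholds(probs, labels, classes, quantile=0.05):
    """Learn one confidence threshold per cell type from labelled training cells.

    probs: per-cell lists of predicted class probabilities (ordered like `classes`).
    labels: true cell-type label of each training cell.
    Hypothetical rule: the threshold for a type is a low quantile of the top
    probability among training cells correctly predicted as that type.
    """
    scores = defaultdict(list)
    for p, y in zip(probs, labels):
        k = max(range(len(classes)), key=lambda i: p[i])
        if classes[k] == y:
            scores[y].append(p[k])
    thresholds = {}
    for c in classes:
        s = sorted(scores[c])
        # A type with no correct training predictions rejects every cell assigned to it.
        thresholds[c] = s[int(quantile * (len(s) - 1))] if s else math.inf
    return thresholds


def classify_with_rejection(p, classes, thresholds):
    """Return the predicted cell type, or "Unassigned" if below that type's threshold."""
    k = max(range(len(classes)), key=lambda i: p[i])
    return classes[k] if p[k] >= thresholds[classes[k]] else "Unassigned"
```

Keeping a separate threshold per type lets a type whose training cells are classified with lower confidence (for example, closely related differentiating cells) still be assigned, instead of being rejected wholesale by one global cutoff.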

ASAP: A web-based platform for the analysis and interactive visualization of single-cell RNA-seq data

10.1101/096222 ◽

2016 ◽

Cited By ~ 5

Author(s):

Vincent Gardeux ◽

Fabrice David ◽

Adrian Shajkofci ◽

Petra C Schwalie ◽

Bart Deplancke

Keyword(s):

Single Cell ◽

Single Cell Analysis ◽

Transcriptome Profiling ◽

Cell Types ◽

Complete Analysis ◽

Marker Genes ◽

Specific Marker ◽

Rna Seq ◽

Web Based ◽

Wide Range

Motivation: Single-cell RNA sequencing (scRNA-seq) allows whole-transcriptome profiling of thousands of individual cells, enabling the molecular exploration of tissues at the cellular level. Such analytical capacity is of great interest to many research groups worldwide, yet these groups often lack the expertise to handle complex scRNA-seq datasets. Results: We developed a fully integrated, web-based platform for the complete analysis of scRNA-seq data after genome alignment: from parsing, filtering, and normalizing the input count data files, to visualizing the data, identifying cell clusters and differentially expressed genes (including cluster-specific marker genes), and performing functional gene set enrichment. This Automated Single-cell Analysis Pipeline (ASAP) combines a wide range of commonly used algorithms with sophisticated visualization tools. Compared with existing scRNA-seq analysis platforms, ASAP lets researchers (including those lacking computational expertise) interact with the data in a straightforward fashion and in real time. Furthermore, given the overlap between scRNA-seq and bulk RNA-seq analysis workflows, ASAP should be conceptually applicable to any RNA-seq dataset. As a validation, we demonstrate how ASAP can be used to easily reproduce the results of a single-cell study of 91 mouse cells comprising five distinct cell types. Availability: The tool is freely available online.
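
The filtering and normalization steps at the start of such a pipeline can be sketched as follows. This is a generic example, not ASAP's code: the thresholds `min_cells` and `min_counts_per_cell` and the log2(CPM + 1) transform are common choices assumed here for illustration.

```python
import math


def filter_and_log_cpm(counts, min_cells=1, min_counts_per_cell=1):
    """Filter a genes x cells count matrix and return log2(CPM + 1) values.

    counts: dict mapping gene name -> list of per-cell counts (same length for all genes).
    Returns a dict with the same layout, restricted to the kept genes and cells.
    """
    genes = list(counts)
    n_cells = len(next(iter(counts.values())))
    # Keep cells whose library size (total counts) reaches the threshold.
    lib = [sum(counts[g][c] for g in genes) for c in range(n_cells)]
    keep_cells = [c for c in range(n_cells) if lib[c] >= min_counts_per_cell]
    # Keep genes detected (count > 0) in at least min_cells of the kept cells.
    keep_genes = [g for g in genes if sum(counts[g][c] > 0 for c in keep_cells) >= min_cells]
    # Recompute library sizes on the filtered matrix, then scale to counts per million.
    lib_kept = {c: sum(counts[g][c] for g in keep_genes) for c in keep_cells}
    return {
        g: [
            math.log2(counts[g][c] / lib_kept[c] * 1e6 + 1) if lib_kept[c] else 0.0
            for c in keep_cells
        ]
        for g in keep_genes
    }
```

Recomputing library sizes after gene filtering is one design choice among several; some tools scale by the unfiltered totals instead.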

Celda: A Bayesian model to perform bi-clustering of genes into modules and cells into subpopulations using single-cell RNA-seq data

10.1101/2020.11.16.373274 ◽

2020 ◽

Author(s):

Zhe Wang ◽

Shiyi Yang ◽

Yusuke Koga ◽

Sean E. Corbett ◽

Evan Johnson ◽

...

Keyword(s):

T Cells ◽

Single Cell ◽

Cell Population ◽

Latent Dirichlet Allocation ◽

Expression Patterns ◽

Building Blocks ◽

Marker Genes ◽

Specific Marker ◽

Rna Seq ◽

Biological Functions

Abstract: Complex biological systems can be understood by dividing them into hierarchies. Each level of such a hierarchy is composed of different subunits that cooperate to perform distinct biological functions. Single-cell RNA-seq (scRNA-seq) has emerged as a powerful technique to quantify gene expression in individual cells and is being used to elucidate the molecular and cellular building blocks of complex tissues. We developed a novel Bayesian hierarchical model called Cellular Latent Dirichlet Allocation (Celda) to perform bi-clustering of co-expressed genes into modules and of cells into subpopulations. This model can also quantify the relationships between different levels of a biological hierarchy by determining the contribution of each gene to each module, of each module to each cell population, and of each cell population to each sample. We used Celda to identify transcriptional modules and cell subpopulations in a publicly available peripheral blood mononuclear cell (PBMC) dataset. In addition to the major classes of cell types, Celda identified a population of proliferating T cells and a single plasma cell that were missed by other clustering methods in this dataset. Transcriptional modules captured consistent expression patterns among genes linked to the same biological functions. Furthermore, transcriptional modules provided direct insights into cell-type-specific marker genes and helped in understanding the subtypes of B and T cells. Overall, Celda presents a novel, principled approach to characterizing transcriptional programs and cellular heterogeneity in single-cell data.
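
The "contribution of each module to each cell population" layer can be illustrated with a deliberately simplified sketch. Celda infers the gene-to-module and cell-to-population assignments with a Bayesian model; here those assignments are simply given as input (an assumption), and each module's contribution to a population is estimated as its share of that population's total counts, without the Dirichlet priors a full model would include. The function name is hypothetical.

```python
def module_by_population(counts, gene_module, cell_pop, n_modules, n_pops):
    """Proportion of each population's counts that falls in each gene module.

    counts: genes x cells matrix (list of rows).
    gene_module[g]: module index of gene g; cell_pop[c]: population index of cell c.
    Returns an n_modules x n_pops matrix whose columns sum to 1 (or 0 if empty).
    """
    totals = [[0.0] * n_pops for _ in range(n_modules)]
    for g, row in enumerate(counts):
        m = gene_module[g]
        for c, x in enumerate(row):
            totals[m][cell_pop[c]] += x
    # Normalize each population column so module contributions become proportions.
    for p in range(n_pops):
        s = sum(totals[m][p] for m in range(n_modules))
        if s > 0:
            for m in range(n_modules):
                totals[m][p] /= s
    return totals
```

The same counting pattern, applied one level up (cells aggregated by sample), gives the population-by-sample proportions mentioned in the abstract.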

The transcriptome of rat hippocampal subfields

10.1101/2021.06.23.449669 ◽

2021 ◽

Author(s):

Joao P.D. Machado ◽

Maria C.P. Athie ◽

Alexandre H.B. Matos ◽

Iscia Lopes-Cendes ◽

Andre Schwambach Vieira

Keyword(s):

Molecular Mechanisms ◽

Long Term Potentiation ◽

Cdna Libraries ◽

Marker Genes ◽

Specific Marker ◽

Rna Seq ◽

Region Gene ◽

Hippocampal Subfields ◽

Neuronal Populations ◽

Molecular Machinery

The hippocampus comprises several neuronal populations, such as CA1, CA2, CA3, and the dentate gyrus (DG), which differ in neuronal origin, morphology, and molecular mechanisms. Laser capture microdissection (LCM) allows samples to be collected selectively from target regions while excluding unwanted cells, yielding more specific results. LCM of hippocampal neuronal populations coupled with RNA-seq analysis has the potential to reveal the molecular machinery unique to each of these subfields. Previous RNA-seq investigations have already provided a molecular blueprint of the hippocampus; however, no RNA-seq data specific to each of the rat hippocampal regions are available. Serial tissue sections covering the hippocampus were produced from frozen brains of adult male Wistar rats, and the hippocampal subfields CA1, CA2, CA3, and DG were identified and isolated by LCM. Total RNA was extracted from the samples, and cDNA libraries were prepared and run on a HiSeq 2500 platform. Reads were aligned using STAR, and the DESeq2 statistics package was used to estimate gene expression. We found evident segregation of the transcriptomic profiles of different hippocampal regions and identified known as well as novel specific marker genes for each region. Gene ontology enrichment analysis of the CA1 subfield indicated an enrichment of genes for actin regulation and postsynaptic membrane AMPA receptors, which are indispensable for long-term potentiation. CA2 and CA3 transcripts were associated with increased metabolic processes. DG expression was enriched for ribosome and spliceosome genes, both required for protein synthesis and the maintenance of cell life. The present findings contribute to a deeper understanding of the differences in the molecular machinery expressed by rat hippocampal neuronal populations and further explore the mechanisms underlying the specific functions of each subfield.
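
Before estimating expression differences, DESeq2 corrects for sequencing depth with median-of-ratios size factors. The sketch below reimplements that normalization step in plain Python for illustration; it is not DESeq2's code, and it omits DESeq2's dispersion estimation and statistical testing. The function names are hypothetical.

```python
import math
import statistics


def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count matrix (list of rows)."""
    n_samples = len(counts[0])
    # Only genes with a nonzero count in every sample have a finite log geometric mean.
    usable = [row for row in counts if all(x > 0 for x in row)]
    log_geo_means = [statistics.fmean([math.log(x) for x in row]) for row in usable]
    factors = []
    for j in range(n_samples):
        # Median over genes of log(count / geometric mean), then back-transformed.
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(statistics.median(log_ratios)))
    return factors


def normalize(counts, factors):
    """Divide each sample's counts by that sample's size factor."""
    return [[x / s for x, s in zip(row, factors)] for row in counts]
```

Using the median rather than the total count makes the factor robust to a few very highly expressed or strongly region-specific genes, which matters when comparing subfields with distinct marker genes.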

Detection of condition-specific marker genes from RNA-seq data with MGFR

PeerJ ◽

10.7717/peerj.6970 ◽

2019 ◽

Vol 7 ◽

pp. e6970 ◽

Cited By ~ 1

Author(s):

Khadija El Amrani ◽

Gregorio Alanis-Lobato ◽

Nancy Mah ◽

Andreas Kurtz ◽

Miguel A. Andrade-Navarro

Keyword(s):

Gene Expression ◽

Marker Gene ◽

Cell Types ◽

The Other ◽

Marker Genes ◽

Specific Marker ◽

Sample Type ◽

Rna Seq ◽

Cell Fate Decisions ◽

Daunting Task

The identification of condition-specific genes is key to advancing our understanding of cell fate decisions and disease development. Differential gene expression analysis (DGEA) has been the standard tool for this task. However, the number of samples that modern transcriptomic technologies allow us to study makes DGEA a daunting task. On the other hand, experiments with low numbers of replicates lack the statistical power to detect differentially expressed genes. We previously developed MGFM, a tool for marker gene detection from microarrays that is particularly useful in the latter case. Here, we adapted the algorithm behind MGFM to detect markers in RNA-seq data. MGFR groups samples with similar gene expression levels and flags a gene as a potential marker of a sample type if its highest expression values correspond to all replicates of that type. We benchmarked MGFR against other methods and found that its proposed markers accurately characterize the functional identity of different tissues and cell types in standard and single-cell RNA-seq datasets. We then performed a more detailed analysis of three of these datasets, which profile the transcriptomes of different human tissues, immune cell types, and human blastocyst cell types, respectively. MGFR's predicted markers were compared with gold-standard lists for these datasets and outperformed those of the other marker detectors. Finally, we suggest novel candidate marker genes for the examined tissues and cell types. MGFR is implemented as a freely available Bioconductor package (https://doi.org/doi:10.18129/B9.bioc.MGFR), which facilitates its use and integration with bioinformatics pipelines.
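
The marker-flagging rule can be sketched as follows. This simplified version only checks whether a gene's top-ranked samples are exactly the replicates of one sample type and are strictly separated from the remaining samples; MGFR's grouping of samples with similar expression levels is not reproduced, and the function name is hypothetical.

```python
from collections import Counter


def top_replicate_markers(expr, sample_types):
    """Flag genes whose highest values come from all replicates of a single sample type.

    expr: dict gene -> list of expression values, one per sample.
    sample_types: sample-type label of each sample, aligned with the value lists.
    Returns dict gene -> sample type for the flagged genes.
    """
    n_reps = Counter(sample_types)
    markers = {}
    for gene, values in expr.items():
        order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
        top_type = sample_types[order[0]]
        k = n_reps[top_type]
        if k >= len(values):
            continue  # a single sample type cannot be distinguished from anything
        # All k replicates of top_type must occupy the top k ranks...
        if all(sample_types[i] == top_type for i in order[:k]):
            # ...with no tie across the boundary to the first sample of another type.
            if values[order[k - 1]] > values[order[k]]:
                markers[gene] = top_type
    return markers
```

Requiring every replicate to rank above all other samples is what makes the rule usable with very few replicates, where a formal differential-expression test would lack power.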

Chemometric Analysis for the Classification of some Groups of Drugs with Divergent Pharmacological Activity on the Basis of some Chromatographic and Molecular Modeling Parameters

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207321666180129102149 ◽

2018 ◽

Vol 21 (2) ◽

pp. 125-137

Author(s):

Jolanta Stasiak ◽

Marcin Koba ◽

Marcin Gackowski ◽

Tomasz Baczek

Keyword(s):

Correlation Analysis ◽

Pharmacological Activity ◽

Correlation Coefficients ◽

Principal Component ◽

Cardiovascular Drugs ◽

New Drugs ◽

Analgesic Drugs ◽

Starting Point ◽

Chromatographic Parameters

Aim and Objective: In this study, chemometric methods such as correlation analysis, cluster analysis (CA), principal component analysis (PCA), and factor analysis (FA) were used to reduce the number of chromatographic parameters (log k/log kw) and various (e.g., 0D, 1D, 2D, 3D) structural descriptors for three different groups of drugs, namely 12 analgesic drugs, 11 cardiovascular drugs, and 36 “other” compounds, and especially to select the most important of these data. Material and Methods: All chemometric analyses were carried out, presented graphically, and discussed for each group of drugs. First, the compounds’ structural and chromatographic parameters were correlated. The best correlation results were R = 0.93, R = 0.88, and R = 0.91 for cardiovascular drugs, analgesic drugs, and the 36 “other” compounds, respectively. Next, part of the molecular and HPLC experimental data from each group of drugs was submitted to FA/PCA and CA. Results: In FA/PCA, most of the total data variance across all analyzed parameters (experimental and calculated) was explained by the first two or three factors: 84.28%, 76.38%, and 69.71% for cardiovascular drugs, analgesic drugs, and the 36 “other” compounds, respectively. Compound clustering by CA showed characteristics similar to those obtained by FA/PCA. The statistical classification of these drugs is characterized and discussed in detail in terms of their molecular structure and pharmacological activity. Conclusion: The proposed QSAR strategy, based on a reduced number of parameters, could be a useful starting point for further statistical analysis, as well as support for designing new drugs and predicting their possible activity.
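
The "variance explained by the first two or three factors" percentages come from the eigenvalues of the matrix analyzed by PCA. A small dependency-free sketch, assuming PCA on the correlation matrix (standardized variables) and using the cyclic Jacobi eigenvalue method, computes the share of total variance carried by each component; in practice a library routine such as `numpy.linalg.eigh` would be used. The function names are hypothetical.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of variables given as equal-length columns."""
    def r(x, y):
        mx, my = sum(x) / len(x), sum(y) / len(y)
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return [[r(x, y) for y in columns] for x in columns]


def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def explained_variance_ratio(columns):
    """Share of total variance carried by each principal component, largest first."""
    eig = sorted(jacobi_eigenvalues(correlation_matrix(columns)), reverse=True)
    total = sum(eig)
    return [max(e, 0.0) / total for e in eig]
```

Summing the first two or three ratios gives cumulative figures of the kind reported above (e.g., 84.28% for the cardiovascular drugs).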

scSorter: assigning cells to known cell types according to marker genes

Genome Biology ◽

10.1186/s13059-021-02281-7 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Hongyu Guo ◽

Jun Li

Keyword(s):

Real Data ◽

Cell Types ◽

Exact Expression ◽

Marker Genes ◽

Specific Marker ◽

Sequencing Data ◽

Reference Dataset ◽

Over Expression ◽

Higher Power ◽

Cell Type Specific

Abstract: For single-cell RNA-sequencing data, we consider the problem of assigning cells to known cell types, assuming that the identities of cell-type-specific marker genes are given but their exact expression levels are unavailable, that is, without using a reference dataset. Based on the observation that the expected over-expression of marker genes is often absent in a non-negligible proportion of cells, we developed a method called scSorter. scSorter allows marker genes to be expressed at low levels and borrows information from the expression of non-marker genes. On both simulated and real data, scSorter shows much higher power than existing methods.

Seed Set Patterns in East African Highland Cooking Bananas Are Dependent on Weather before, during and after Pollination

Horticulturae ◽

10.3390/horticulturae7070165 ◽

2021 ◽

Vol 7 (7) ◽

pp. 165

Author(s):

Allan Waniale ◽

Rony Swennen ◽

Settumba B. Mukasa ◽

Arthur K. Tugume ◽

Jerome Kubiriba ◽

...

Keyword(s):

Critical Period ◽

Floral Development ◽

Seed Set ◽

Correlation Coefficients ◽

Principal Component ◽

Maximum Temperature ◽

Fruit Pulp ◽

East African ◽

Average Temperature ◽

Wild Banana

Seed set in banana is influenced by weather, yet the key weather attributes and the critical period of influence are unknown. We therefore investigated the influence of weather during floral development to gain a better perspective on how to increase seed set. Three East African highland cooking bananas (EAHBs) were pollinated with the pollen-fertile wild banana ‘Calcutta 4’. At full maturity, bunches were harvested and ripened, and seeds were extracted from the fruit pulp. Pearson’s correlation analysis was then conducted between seed set per 100 fruits per bunch and weather attributes at 15-day intervals from 105 days before pollination (DBP) to 120 days after pollination (DAP). Seed set was positively correlated with average temperature (P < 0.05 to P < 0.001, r = 0.196 to 0.487) and negatively correlated with relative humidity (RH) (P < 0.05 to P < 0.001, r = −0.158 to −0.438) between 75 DBP and the time of pollination. After pollination, average temperature was negatively correlated with seed set in ‘Mshale’ and ‘Nshonowa’ from 45 to 120 DAP (P < 0.05 to P < 0.001, r = −0.213 to −0.340). Correlation coefficients were highest at 15 DBP for ‘Mshale’ and ‘Nshonowa’, whereas for ‘Enzirabahima’ they were highest at the time of pollination. As revealed by principal component analysis, maximum temperature at the time of pollination should be the main focus for increasing seed set.
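
The correlation scan over time points can be sketched in Python. Assumptions: each time point (15-day steps from 105 DBP, written as day −105, to 120 DAP) has one value of the weather attribute per bunch, aligned with the bunches' seed-set values, and significance is judged with the usual statistic t = r√(n − 2)/√(1 − r²) on n − 2 degrees of freedom. The data layout and function names are hypothetical.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def t_statistic(r, n):
    """t statistic for testing r != 0 with n paired observations (n - 2 df); needs |r| < 1."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def correlation_profile(seed_set, weather_by_day, start=-105, stop=120, step=15):
    """Pearson r between seed set and one weather attribute at each time point.

    weather_by_day maps a day offset (negative = before pollination) to a list of
    attribute values aligned with seed_set (one value per bunch).
    """
    return {day: pearson(seed_set, weather_by_day[day]) for day in range(start, stop + 1, step)}
```

The 15-day grid from −105 to 120 contains 16 time points, so a profile like this would show at which point before or after pollination the correlation with temperature or humidity peaks.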