Grape-RNA: A Database for the Collection, Evaluation, Treatment, and Data Sharing of Grape RNA-Seq Datasets

Genes ◽  
2020 ◽  
Vol 11 (3) ◽  
pp. 315 ◽  
Author(s):  
Yi Wang ◽  
Rui Zhang ◽  
Zhenchang Liang ◽  
Shaohua Li

Since its inception, RNA sequencing (RNA-seq) has become the most effective way to study gene expression. After more than a decade of development, numerous RNA-seq datasets have been created, and making full use of these datasets has emerged as a major issue. In this study, we built a comprehensive database named Grape-RNA, which is focused on the collection, evaluation, treatment, and data sharing of grape RNA-seq datasets. This database contains 1529 RNA-seq samples, 112 microRNA samples from the public platform, and 485 RNA-seq in-house datasets sequenced by our lab. We classified these data into 25 conditions and provide the sample information, cleaned raw data, expression levels, assembled unigenes, useful tools, and other relevant information to the users. Thus, this study provides data and tools that should benefit researchers by allowing them to use RNA-seq data easily. The provided information can greatly contribute to grape breeding and to genomic and biological research. This study may improve the usage of RNA-seq.

PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e11875
Author(s):  
Tomoko Matsuda

Large volumes of high-throughput sequencing data have been submitted to the Sequence Read Archive (SRA). The lack of experimental metadata associated with the data makes reuse and understanding data quality very difficult. In the case of RNA sequencing (RNA-Seq), which reveals the presence and quantity of RNA in a biological sample at any moment, it is necessary to consider that gene expression responds over a short time interval (several seconds to a few minutes) in many organisms. Therefore, to isolate RNA that accurately reflects the transcriptome at the point of harvest, raw biological samples should be processed as soon as possible by freezing in liquid nitrogen, immersing in RNA stabilization reagent, or lysing and homogenizing in RNA lysis buffer containing guanidine thiocyanate. As the number of samples handled simultaneously increases, the time until the RNA is protected can increase. Here, to evaluate the effect of different lag times in RNA protection on RNA-Seq data, we harvested CHO-S cells after 3, 5, 6, and 7 days of cultivation, added RNA lysis buffer in a time course of 15, 30, 45, and 60 min after harvest, and conducted RNA-Seq. These RNA samples showed high RNA integrity number (RIN) values indicating non-degraded RNA, and sequence data from libraries prepared with these RNA samples were of high quality according to FastQC. We observed that, at the same cultivation day, global trends of gene expression were similar across the time course of addition of RNA lysis buffer; however, the expression of some genes was significantly different between the time-course samples of the same cultivation day, and most of these differentially expressed genes were related to apoptosis. We conclude that the time lag between sample harvest and RNA protection influences the expression of specific genes. It is, therefore, necessary to know not only the RIN values of the RNA and the quality of the sequence data but also how the experiment was performed when acquiring RNA-Seq data from the database.


2019 ◽  
Vol 97 (Supplement_3) ◽  
pp. 135-135
Author(s):  
Shengfa F Liao ◽  
Shamimul Hasan ◽  
Jean M Feugang

Abstract Animal life essentially is a set of gene expression processes. Thorough understanding of these processes driven by dietary nutrients and other environmental factors can be regarded as a bottom line of modern advanced animal nutrition research for improving animal growth, development, health, production, and reproduction performance. Nutrigenomics, a genome-wide approach using the knowledge and techniques obtained from the disciplines of genomics (including transcriptomics) and molecular biology, studies the effects of dietary nutrients on cellular gene expression, cellular metabolic responses and, ultimately, the phenotypic changes of a living organism. Transcriptomics can be applied to investigate the animal tissue transcriptome at a defined physiological or nutritional state, which provides a holistic view of the intracellular expression of RNA, especially mRNA. As a novel, promising transcriptomics approach, RNA sequencing (RNA-Seq) technology can monitor the expression of all genes simultaneously in response to dietary intervention. The principle and history of RNA-Seq technology will be briefly reviewed, and the three principal steps of this methodology, including the laboratory analysis of tissue samples, the bioinformatics analysis of the generated sequence data, and the subsequent biological interpretation of the data, will be described. The application of RNA-Seq technology in different areas of animal nutrition research, which include maternal nutrition, feeding strategy and gut microbiota, will be summarized. Lastly, the application of RNA-Seq technology in swine science and nutrition research will also be discussed. In short, to further improve animal feeding or production efficiency, RNA-Seq technology holds great potential for gaining new insights into nutrient-gene interactions in agricultural animals, and it is expected that the application of this cutting-edge technology in animal nutrition research will continue to grow in the foreseeable future. This research was supported in part by a USDA-NIFA Multistate Project (No. 1007691).


Author(s):  
Alemu Takele Assefa ◽  
Jo Vandesompele ◽  
Olivier Thas

Abstract Background: In gene expression studies, RNA sample pooling is sometimes considered because of budget constraints or lack of sufficient input material. Using microarray technology, RNA sample pooling strategies have been reported to optimize both the cost of data generation as well as the statistical power for differential gene expression (DGE) analysis. For RNA sequencing, with its different quantitative output in terms of counts and tunable dynamic range, the adequacy and empirical validation of RNA sample pooling strategies have not yet been evaluated. In this study, we comprehensively assessed the utility of pooling strategies in RNA-seq experiments using empirical and simulated RNA-seq datasets. Results: The data generating model in pooled experiments is defined mathematically to evaluate the mean and variability of gene expression estimates. The model is further used to examine the trade-off between the statistical power of testing for DGE and the data generating costs. Empirical assessment of pooling strategies is done through analysis of RNA-seq datasets under various pooling and non-pooling experimental settings. A simulation study is also used to rank experimental scenarios with respect to the rate of false and true discoveries in DGE analysis. The results demonstrate that pooling strategies in RNA-seq studies can be both cost-effective and powerful when the number of pools, pool size and sequencing depth are optimally defined. Conclusion: For high within-group gene expression variability, small RNA sample pools are effective to reduce the variability and compensate for the loss of the number of replicates. Unlike the typical cost-saving strategies, such as reducing sequencing depth or number of RNA samples (replicates), an adequate pooling strategy is effective in maintaining the power of testing DGE for genes with low to medium abundance levels, along with a substantial reduction of the total cost of the experiment. In general, pooling RNA samples or pooling RNA samples in conjunction with moderate reduction of the sequencing depth can be good options to optimize the cost and maintain the power.
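
The variance argument in this abstract can be illustrated with a small sketch. It assumes a deliberately simplified additive model that is not taken from the paper: each individual contributes biological variance `bio_var`, each sequenced library adds technical variance `tech_var`, and a pool averages `pool_size` individuals before sequencing. The function name and parameters are hypothetical.

```python
def group_mean_variance(bio_var, tech_var, n_libraries, pool_size=1):
    """Variance of a group's mean expression estimate.

    Assumed toy model (not the authors' data generating model):
    pooling `pool_size` individuals divides the biological variance
    by the pool size, while every library still carries the full
    technical variance. Averaging over `n_libraries` libraries then
    divides the per-library variance by the number of libraries.
    """
    if n_libraries < 1 or pool_size < 1:
        raise ValueError("n_libraries and pool_size must be at least 1")
    per_library_var = bio_var / pool_size + tech_var
    return per_library_var / n_libraries


# Same sequencing budget (3 libraries), with and without pooling.
unpooled = group_mean_variance(bio_var=4.0, tech_var=1.0, n_libraries=3)
pooled = group_mean_variance(bio_var=4.0, tech_var=1.0, n_libraries=3, pool_size=4)
print(f"3 single-sample libraries: {unpooled:.3f}")
print(f"3 pools of 4 samples:      {pooled:.3f}")
```

Under this toy model, pooling helps most when biological variance dominates technical variance, which is in line with the abstract's statement about high within-group variability.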


2017 ◽  
Author(s):  
John M Bryan ◽  
Temesgen D Fufa ◽  
Kapil Bharti ◽  
Brian P Brooks ◽  
Robert B Hufnagel ◽  
...  

Abstract The human eye is built from several specialized tissues which direct, capture, and pre-process information to provide vision. The gene expression of the different eye tissues has been extensively profiled with RNA-seq across numerous studies. Large consortium projects have also used RNA-seq to study gene expression patterning across many different human tissues, minus the eye. There has not been an integrated study of expression patterns from multiple eye tissues compared to other human body tissues. We have collated all publicly available healthy human eye RNA-seq datasets as well as dozens of other tissues. We use this fully integrated dataset to probe the biological processes and pan expression relationships between the cornea, retina, RPE-choroid complex, and the rest of the human tissues with differential expression, clustering, and GO term enrichment tools. We also leverage our large collection of retina and RPE-choroid tissues to build the first human weighted gene correlation networks and use them to highlight known biological pathways and eye gene disease enrichment. We also have integrated publicly available single cell RNA-seq data from mouse retina into our framework for validation and discovery. Finally, we make all these data, analyses, and visualizations available via a powerful interactive web application (https://eyeintegration.nei.nih.gov/).


2021 ◽  
Vol 15 (Supplement_1) ◽  
pp. S062-S062
Author(s):  
A Lewis ◽  
B Pan-Castillo ◽  
G Berti ◽  
C Felice ◽  
H Gordon ◽  
...  

Abstract Background: Histone-deacetylase (HDAC) enzymes are a broad class of ubiquitously expressed enzymes that modulate histone acetylation, chromatin accessibility and gene expression. In models of inflammatory bowel disease (IBD), HDAC inhibitors, such as valproic acid (VPA), are proven anti-inflammatory agents, and evidence suggests that they also inhibit fibrosis in non-intestinal organs. However, the role of HDAC enzymes in stricturing Crohn’s disease (CD) has not been characterised; this is key to understanding the molecular mechanism and developing novel therapies. Methods: To evaluate HDAC expression in the intestine of stricturing CD (SCD) patients, we performed unbiased single-cell RNA sequencing (sc-RNA-seq) of over 10,000 cells isolated from full-thickness surgical resection specimens of non-SCD (NSCD; n=2) and SCD intestine (n=3). Approximately 1000 fibroblasts were identified for further analysis, including a distinct cluster of myofibroblasts. Changes in gene expression were compared between myofibroblasts and other resident intestinal fibroblasts using the sc-RNA-seq analysis pipeline in Partek. Changes in HDAC expression and markers of HDAC activity (H3K27ac) were confirmed by immunohistochemistry (IHC) in FFPE tissue from patient-matched NSCD and SCD intestine (n=14 pairs). The function of HDACs in intestinal fibroblasts in the CCD-18co cell line and primary CD myofibroblast cultures (n=16 cultures) was assessed using VPA, a class I HDAC inhibitor. Cells were analysed using a variety of molecular techniques including ATAC-seq, gene expression arrays, qPCR, western blot and immunofluorescent protein analysis. Results: Class I HDAC (HDAC1, p = 2.11E-11; HDAC2, p = 4.28E-11; HDAC3, p = 1.60E-07; and HDAC8, p = 2.67E-03) expression was increased in myofibroblasts compared to other intestinal fibroblast subtypes. IHC also showed an increase in the percentage of stromal HDAC2-positive cells, coupled with a decrease in the percentage of H3K27ac-positive cells, in the mucosa overlying SCD intestine relative to matched NSCD areas. In the CCD-18co cell line and primary myofibroblast cultures, VPA reduced chromatin accessibility at Collagen-I gene promoters and suppressed their transcription. VPA also inhibited TGFB-induced up-regulation of Collagen-I, in part by inhibiting TGFB1I1/SMAD4 signalling. TGFB1I1 was identified as a mesenchymal-specific target of VPA, and siRNA knockdown of TGFB1I1 was sufficient to suppress TGFB-induced up-regulation of Collagen-I. Conclusion: In SCD patients, class I HDAC expression is increased in myofibroblasts. Class I HDAC inhibitors impair TGFB signalling and inhibit Collagen-I expression. Selective targeting of TGFB1I1 offers the opportunity to increase treatment specificity by selectively targeting mesenchymal cells.


2018 ◽  
Author(s):  
Koen Van Den Berge ◽  
Katharina Hembach ◽  
Charlotte Soneson ◽  
Simone Tiberi ◽  
Lieven Clement ◽  
...  

Gene expression is the fundamental level at which the results of various genetic and regulatory programs are observable. The measurement of transcriptome-wide gene expression has convincingly switched from microarrays to sequencing in a matter of years. RNA sequencing (RNA-seq) provides a quantitative and open system for profiling transcriptional outcomes on a large scale and therefore facilitates a large diversity of applications, including basic science studies, but also agricultural or clinical situations. In the past 10 years or so, much has been learned about the characteristics of RNA-seq datasets as well as the performance of the myriad of methods developed. In this review, we give an overall view of the developments in RNA-seq data analysis, including experimental design, with an explicit focus on quantification of gene expression and statistical approaches for differential expression. We also highlight emerging data types, such as single-cell RNA-seq and gene expression profiling using long-read technologies.


2021 ◽  
Author(s):  
Pablo E. García-Nieto ◽  
Ban Wang ◽  
Hunter B. Fraser

Abstract Background: RNA sequencing has been widely used as an essential tool to probe gene expression. While standard practices have been established to analyze RNA-seq data, it is still challenging to detect and remove artifactual signals. Several factors such as sex, age, and sequencing technology have been found to bias these estimates. Probabilistic estimation of expression residuals (PEER) has been used to account for some systematic effects, but it has remained challenging to interpret these PEER factors. Results: Here we show that transcriptome diversity – a simple metric based on Shannon entropy – explains a large portion of variability in gene expression, and is a major factor detected by PEER. We then show that transcriptome diversity has significant associations with multiple technical and biological variables across diverse organisms and datasets. This prevalent confounding factor provides a simple explanation for a major source of systematic biases in gene expression estimates. Conclusions: Our results show that transcriptome diversity is a metric that captures a systematic bias in RNA-seq and is the strongest known factor encoded in PEER covariates.
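
As a rough illustration of a Shannon-entropy-based diversity metric like the one this abstract describes, the sketch below computes the entropy of one sample's expression profile. The abstract does not give the exact definition, so the details here are assumptions: expression values are converted to proportions of the sample total, natural logarithms are used, and the normalization by the log of the number of genes is one possible choice among several. Function names are hypothetical.

```python
import math


def transcriptome_diversity(expression):
    """Shannon entropy (in nats) of one sample's expression values.

    Assumption: each gene's value is converted to its share of the
    sample total; genes with zero expression contribute nothing.
    """
    total = sum(expression)
    if total <= 0:
        raise ValueError("sample has no expression signal")
    entropy = 0.0
    for value in expression:
        if value > 0:
            share = value / total
            entropy -= share * math.log(share)
    return entropy


def normalized_diversity(expression):
    """Entropy scaled to [0, 1] by the maximum possible for this gene count.

    Assumption: the maximum is log(number of genes), reached when all
    genes are equally expressed.
    """
    n_genes = len(expression)
    if n_genes < 2:
        return 0.0
    return transcriptome_diversity(expression) / math.log(n_genes)


# A sample dominated by one gene has low diversity; an even one has high.
print(normalized_diversity([10, 0, 0, 0]))
print(normalized_diversity([5, 5, 5, 5]))
```

A sample whose reads are concentrated in a few highly expressed genes scores low on this metric, which is the kind of sample-level property that could plausibly act as a confounder across genes.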


2019 ◽  
Author(s):  
Alemu Takele Assefa ◽  
Jo Vandesompele ◽  
Olivier Thas

Summary: SPsimSeq is a semi-parametric simulation method for bulk and single cell RNA sequencing data. It simulates data from a good estimate of the actual distribution of a given real RNA-seq dataset. In contrast to existing approaches that assume a particular data distribution, our method constructs an empirical distribution of gene expression data from a given source RNA-seq experiment to faithfully capture the data characteristics of real data. Importantly, our method can be used to simulate a wide range of scenarios, such as single or multiple biological groups, systematic variations (e.g. confounding batch effects), and different sample sizes. It can also be used to simulate different gene expression units resulting from different library preparation protocols, such as read counts or UMI counts. Availability and implementation: The R package and associated documentation are available from https://github.com/CenterForStatistics-UGent/SPsimSeq. Supplementary information: Supplementary data are available at bioRχiv online.


2020 ◽  
Author(s):  
Eun Jung Koh ◽  
So Yeon Yu ◽  
Seung Jun Kim ◽  
Eun-Il Lee ◽  
Seung Yong Hwang

Abstract Background: Whole blood is one of the most widely utilized human samples in biological research and is useful for analysing the mechanisms of diverse bio-molecular phenomena. However, owing to its fluidic properties, whole blood is relatively unstable in the frozen state compared to other biopsy samples. Because RNA is structurally unstable, sample damage can severely affect RNA quality, thereby reducing its usability. This study aimed to assess the quality of RNA prepared from blood stored at different temperatures and times prior to freezing, as well as the effect of freezer storage time. Results: The quality of the RNA derived from different blood samples was assessed by determining the RNA integrity number and by RNA sequencing to identify genes (|fold-change (FC)| > 1.5, p-value < 0.05, false discovery rate (FDR) < 0.05) that were differentially expressed between the differently prepared RNA samples. We found that improper sample handling critically influenced both RNA quality and gene expression patterns. In particular, storing blood at room temperature over 12 h before freezing led to RNA degradation. Differential gene expression analysis revealed that expression of the CXCR1 gene was substantially reduced when using impaired RNA. Conclusions: This study emphasizes the importance of proper sample management for obtaining reliable downstream application outcomes and suggests the CXCR1 gene as a candidate screening marker for RNA damage caused by improper sample handling.
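
The selection thresholds quoted in this abstract (|FC| > 1.5, p-value < 0.05, FDR < 0.05) can be written as a simple filter. This is a minimal sketch, not the authors' pipeline: the `GeneResult` record, its field names, and the convention that down-regulation is reported as a negative fold change are assumptions, and the example rows are made-up placeholders rather than data from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneResult:
    gene: str
    fold_change: float  # assumed signed: negative means down-regulated
    p_value: float
    fdr: float


def is_differentially_expressed(result, fc_cutoff=1.5, p_cutoff=0.05, fdr_cutoff=0.05):
    """Apply all three thresholds from the abstract to one gene."""
    return (
        abs(result.fold_change) > fc_cutoff
        and result.p_value < p_cutoff
        and result.fdr < fdr_cutoff
    )


def select_degs(results, **cutoffs):
    """Return the names of genes that pass every threshold, in input order."""
    return [r.gene for r in results if is_differentially_expressed(r, **cutoffs)]


# Hypothetical placeholder rows, only to show the filter's behaviour.
example = [
    GeneResult("geneA", 2.0, 0.010, 0.030),   # passes all three cutoffs
    GeneResult("geneB", -1.8, 0.001, 0.010),  # passes (down-regulated)
    GeneResult("geneC", 1.2, 0.001, 0.010),   # fails the fold-change cutoff
    GeneResult("geneD", 3.0, 0.040, 0.200),   # fails the FDR cutoff
]
print(select_degs(example))
```

Requiring both the raw p-value and the FDR to pass is redundant in many pipelines, since the FDR-adjusted value is usually the stricter of the two, but the sketch keeps both to mirror the criteria as stated.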


2020 ◽  
Author(s):  
Weimiao Wu ◽  
Qile Dai ◽  
Yunqing Liu ◽  
Xiting Yan ◽  
Zuoheng Wang

Abstract Single-cell RNA sequencing provides an opportunity to study gene expression at single-cell resolution. However, prevalent dropout events result in high data sparsity and noise that may obscure downstream analyses. We propose a novel method, G2S3, that imputes dropouts by borrowing information from adjacent genes in a sparse gene graph learned from gene expression profiles across cells. We applied G2S3 and other existing methods to seven single-cell datasets to compare their performance. Our results demonstrated that G2S3 is superior in recovering true expression levels, identifying cell subtypes, improving differential expression analyses, and recovering gene regulatory relationships, especially for mildly expressed genes.

