Systematic analysis of 1,298 RNA-Seq samples and construction of a comprehensive soybean (Glycine max) expression atlas

2019 ◽  
Author(s):  
Fabricio Brum Machado ◽  
Kanhu C. Moharana ◽  
Fabricio Almeida-Silva ◽  
Rajesh K. Gazara ◽  
Francisnei Pedrosa-Silva ◽  
...  

Abstract Soybean (Glycine max [L.] Merr.) is a major crop in animal feed and human nutrition, mainly for its rich protein and oil contents. The remarkable rise in soybean transcriptome studies over the past five years generated an enormous amount of RNA-seq data, encompassing various tissues, developmental conditions, and genotypes. In this study, we have collected data from 1,298 publicly available soybean transcriptome samples, processed the raw sequencing reads, and mapped them to the soybean reference genome in a systematic fashion. We found that 94% of the annotated genes (52,737/56,044) had detectable expression in at least one sample. Unsupervised clustering revealed three major groups, comprising samples from aerial, underground, and seed/seed-related parts. We found 452 genes with uniform and constant expression levels, supporting their roles as housekeeping genes. On the other hand, 1,349 genes showed heavily biased expression patterns towards particular tissues. A transcript-level analysis revealed that 95% (70,963/74,490) of the known transcripts overlap with those reported here, whereas 3,256 assembled transcripts represent potentially novel splicing isoforms. The dataset compiled here constitutes a new resource for the community, which can be downloaded or accessed through a user-friendly web interface at http://venanciogroup.uenf.br/resources/. This comprehensive transcriptome atlas will likely accelerate research on soybean genetics and genomics.

2020 ◽  
Vol 103 (5) ◽  
pp. 1894-1909 ◽  
Author(s):  
Fabricio B. Machado ◽  
Kanhu C. Moharana ◽  
Fabricio Almeida‐Silva ◽  
Rajesh K. Gazara ◽  
Francisnei Pedrosa‐Silva ◽  
...  

2018 ◽  
Author(s):  
Verboom Karen ◽  
Everaert Celine ◽  
Bolduc Nathalie ◽  
Livak J. Kenneth ◽  
Yigit Nurten ◽  
...  

Abstract Single cell RNA sequencing methods have been increasingly used to understand cellular heterogeneity. Nevertheless, most of these methods suffer from one or more limitations, such as focusing only on polyadenylated RNA, sequencing of only the 3’ end of the transcript, an excessive fraction of reads mapping to ribosomal RNA, and the unstranded nature of the sequencing data. Here, we developed a novel single cell strand-specific total RNA library preparation method addressing all the aforementioned shortcomings. Our method was validated on a microfluidics system using three different cancer cell lines undergoing a chemical or genetic perturbation. We demonstrate that our total RNA-seq method detects an equal or higher number of genes compared to classic polyA[+] RNA-seq, including novel and non-polyadenylated genes. The obtained RNA expression patterns also recapitulate the expected biological signal. Inherent to total RNA-seq, our method is also able to detect circular RNAs. Taken together, SMARTer single cell total RNA sequencing is very well suited for any single cell sequencing experiment in which transcript level information is needed beyond polyadenylated genes.


PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e4719 ◽  
Author(s):  
Yu-Chun Chang ◽  
Yan Ding ◽  
Lingsheng Dong ◽  
Lang-Jing Zhu ◽  
Roderick V. Jensen ◽  
...  

Background: Using DNA microarrays, we previously identified 451 genes expressed in 19 different human tissues. Although ubiquitously expressed, the variable expression patterns of these “housekeeping genes” (HKGs) could separate one normal human tissue type from another. Current focus on identifying “specific disease markers” is problematic as single gene expression in a given sample represents the specific cellular states of the sample at the time of collection. In this study, we examine the diagnostic and prognostic potential of the variable expressions of HKGs in lung cancers. Methods: Microarray and RNA-seq data for normal lungs, lung adenocarcinomas (AD), squamous cell carcinomas of the lung (SQCLC), and small cell carcinomas of the lung (SCLC) were collected from online databases. Using 374 of 451 HKGs, differentially expressed genes between pairs of sample types were determined via two-sided, homoscedastic t-test. Principal component analysis and hierarchical clustering classified normal lung and lung cancer subtypes according to relative gene expression variations. We used uni- and multivariate Cox regressions to identify significant predictors of overall survival in AD patients. Classifying genes were selected using a set of training samples and then validated using an independent test set. Gene Ontology was examined by PANTHER. Results: This study showed that the differential expression patterns of 242, 245, and 99 HKGs were able to distinguish normal lung from AD, SCLC, and SQCLC, respectively. From these, 70 HKGs were common across the three lung cancer subtypes. These HKGs have low expression variation compared to current lung cancer markers (e.g., EGFR, KRAS) and were involved in the most common biological processes (e.g., metabolism, stress response). In addition, the expression pattern of 106 HKGs alone was a significant classifier of AD versus SQCLC. We further highlighted that a panel of 13 HKGs was an independent predictor of overall survival and cumulative risk in AD patients. Discussion: Here we report that HKG expression patterns may be an effective tool for evaluation of lung cancer states. For example, the differential expression pattern of 70 HKGs alone can separate normal lung tissue from various lung cancers while a panel of 106 HKGs was a capable class predictor of subtypes of non-small cell carcinomas. We also reported that HKGs have significantly lower variance compared to traditional cancer markers across samples, highlighting the robustness of a panel of genes over any one specific biomarker. Using RNA-seq data, we showed that the expression pattern of 13 HKGs is a significant, independent predictor of overall survival for AD patients. This reinforces the predictive power of an HKG panel across different gene expression measurement platforms. Thus, we propose the expression patterns of HKGs alone may be sufficient for the diagnosis and prognosis of individuals with lung cancer.


2020 ◽  
Author(s):  
Li Li ◽  
Fu Shi ◽  
Yanbin Guan ◽  
Guoli Wang ◽  
Yufan Zhang ◽  
...  

Abstract Background: The SQUAMOSA-PROMOTER BINDING PROTEIN-LIKE (SPL) genes encode a family of plant-specific transcription factors that contain a conserved SBP domain. SPL proteins play important roles in plant growth and development, such as plant architecture, flowering regulation, and grain yield. However, a systematic analysis of the TaSPL gene family in wheat is lacking. Results: In this study, 56 TaSPL genes were identified from the wheat genome and divided into eight groups (G1-G8), according to the phylogenetic analysis of TaSPL proteins among a number of plant species. Bioinformatics methods were applied to analyse the gene structure, motifs, chromosome localization, segmental duplication and synteny of all TaSPL genes, and the results showed that the groups differed in exon-intron constitution and in conserved and specific motifs. The expansion and evolution of the TaSPL genes occurred within the wheat genome. A total of 28 of the 56 TaSPL genes were predicted to be putative targets for miR156, which revealed the importance of miR156-mediated regulation in wheat. Moreover, transcript-level analysis of TaSPL genes in wheat tissues by qRT-PCR revealed diversified spatiotemporal expression patterns, based on comparison with reference RNA-seq data. Some TaSPL genes were examined under various stress treatments, including drought and hormones, and the results suggest that some of these genes are probably involved in responding to hormone signals during different stages of wheat development. Conclusions: Our findings show that TaSPL genes may regulate the development of spike and grain, contribute to resistance to abiotic stresses, and be involved in responding to hormone signals. These results provide fundamental information for further study of the functions of TaSPL genes in wheat growth and development.


2021 ◽  
Author(s):  
Diana Lobo ◽  
Raquel Godinho ◽  
John Archer

Abstract The evolution of RNA-Seq technologies yielded datasets that are of immense scientific value. Commonly, such data is generated within differential expression studies, where datasets derived from individual samples are grouped into conditions, and gene expression patterns quantified. The number of archived datasets is increasing and revisiting many at an inter-study level provides an in-depth view into transcriptome evolution. The biggest hurdle is in dealing with variation of read counts at an individual transcript level between common conditions. We present a tool, TVScript, that quantifies intra-condition variation, and subsequently, removes reference-based transcripts that are associated with high levels of this. TVScript is demonstrated at inter and intra-study levels, using data from brain samples of dogs, wolves and foxes (aggressive and tame), where a marked improvement in the distribution of the gene-wise dispersion estimates, the metric utilized by the majority of differential expression tools, lowered the number of outliers detected. We provide support for seven candidate genes with potential for being involved with selection for tameness, and that appear to play a crucial role in canine domestication. We also identify several genes previously identified as being differentially expressed, but that possessed high intra-condition variation, weakening their relevance. TVScript is available at: https://sourceforge.net/projects/tvscript/.


2018 ◽  
Author(s):  
Zhigang Lu ◽  
Matthew Berriman

Abstract Background: Since the genome of the parasitic flatworm Schistosoma mansoni was sequenced in 2009, various RNA-seq studies have been conducted to investigate differential gene expression between certain life stages. Based on these studies, an overview of gene expression in all life stages can improve our understanding of S. mansoni genome biology. Methods: Publicly available RNA-seq data covering all life stages and gonads were mapped to the latest S. mansoni genome. Read counts were normalised across all samples and differential expression analysis was performed using the generalized linear model (GLM) approach. Results: We revealed for the first time the dissimilarities among all life stages. Genes that are abundantly expressed in all life stages, as well as those preferentially expressed in certain stage(s), were determined. The latter reveal genes responsible for stage-dominant functions of the parasite, which can guide the investigation and annotation of gene functions. In addition, distinct differential expression patterns were observed between adjacent life stages, which not only correlate well with original individual studies, but also provide additional information on changes in gene expression during parasite transitions. Furthermore, thirteen novel housekeeping genes across all life stages were identified, which is valuable for quantitative studies (e.g., qPCR). Conclusions: The meta-analysis provides valuable information on the expression and potential functions of S. mansoni genes across all life stages, and can facilitate basic as well as applied research for the community.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Seon Hwa Kim ◽  
Vladimir Vujanovic

Abstract Mycoparasites are an assemblage of biotrophic and necrotrophic fungi that occur on plant pathogenic fungal hosts. Biotrophic mycoparasites are often overlooked in transcriptomic-based biocontrol studies. Sphaerodes mycoparasitica (S.m.) is a specific biotrophic mycoparasite of the plant pathogen Fusarium graminearum (F.g.), which causes the devastating Fusarium head blight (FHB) disease in small-grain cereals. To understand the biotrophic mycoparasitism comprehensively, we performed an Illumina RNA-Seq transcriptomic study on the fungus–fungus interaction in vitro. The aim was to identify the transcript-level mechanism related to the biotrophic S.m. mycoparasitism, particularly its ability to effectively control the F.g. 3-ADON chemotype. A shift in the transcriptomic profile of the mycoparasite was triggered in response to its interaction with F.g. during recognition (1.5 days) and colonization (3.5 days) steps. RNA-Seq analysis revealed ~30% of annotated transcripts with "function unknown". Further, 14 differentially expressed genes functionally linked to the biotrophic mycoparasitism were validated by quantitative real-time PCR (qPCR). The gene expression patterns of the filamentous haemagglutinin/adhesin/attachment factor as well as cell wall-degrading glucanases and chitinases were upregulated by host interaction. Besides, mycoparasitism-associated antioxidant resistance genes encoding ATP-binding cassette (ABC) transporter(s) and glutathione synthetase(s) were upregulated. However, the thioredoxin reductase was downregulated, which implies that this antioxidant gene can be used as a resistance marker to assess S.m. antifungal and antimycotoxigenic activities. The interactive transcriptome of S. mycoparasitica provides new insights into specific mycoparasitism and will contribute to future research in controlling FHB.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Jing Xu ◽  
Xiangdong Liu ◽  
Qiming Dai

Abstract Background: Hypertrophic cardiomyopathy (HCM) represents one of the most common inherited heart diseases. To identify key molecules involved in the development of HCM, gene expression patterns of the heart tissue samples in HCM patients from multiple microarray and RNA-seq platforms were investigated. Methods: The significant genes were obtained through the intersection of two gene sets, corresponding to the identified differentially expressed genes (DEGs) within the microarray data and within the RNA-Seq data. Those genes were further ranked using the minimum-Redundancy Maximum-Relevance feature selection algorithm. Moreover, the genes were assessed by three different machine learning methods for classification, including support vector machines, random forest and k-Nearest Neighbor. Results: Outstanding results were achieved by taking exclusively the top eight genes of the ranking into consideration. Since the eight genes were identified as candidate HCM hallmark genes, the interactions between them and known HCM disease genes were explored through the protein–protein interaction (PPI) network. Most candidate HCM hallmark genes were found to have direct or indirect interactions with known HCM disease genes in the PPI network, particularly the hub genes JAK2 and GADD45A. Conclusions: This study highlights transcriptomic data integration, in combination with machine learning methods, as a means of providing insight into the key hallmark genes in the genetic etiology of HCM.


Genes ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 311
Author(s):  
Zhenqiu Liu

Single-cell RNA-seq (scRNA-seq) is a powerful tool to measure the expression patterns of individual cells and discover heterogeneity and functional diversity among cell populations. Due to variability, it is challenging to analyze such data efficiently. Many clustering methods have been developed using at least one free parameter. Different choices for free parameters may lead to substantially different visualizations and clusters. Tuning free parameters is also time consuming. Thus there is a need for a simple, robust, and efficient clustering method. In this paper, we propose a new regularized Gaussian graphical clustering (RGGC) method for scRNA-seq data. RGGC is based on high-order (partial) correlations and subspace learning, and is robust over a wide range of the regularization parameter λ. Therefore, we can simply set λ=2 or λ=log(p) for AIC (Akaike information criterion) or BIC (Bayesian information criterion) without cross-validation. Cell subpopulations are discovered by the Louvain community detection algorithm that determines the number of clusters automatically. There is no free parameter to be tuned with RGGC. When evaluated with simulated and benchmark scRNA-seq data sets against widely used methods, RGGC is computationally efficient and one of the top performers. It can detect inter-sample cell heterogeneity when applied to glioblastoma scRNA-seq data.

