Improved gene co-expression network quality through expression dataset down-sampling and network aggregation

2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Franziska Liesecke ◽  
Johan-Owen De Craene ◽  
Sébastien Besseau ◽  
Vincent Courdavault ◽  
Marc Clastre ◽  
...  

Large-scale gene co-expression networks are an effective methodology to analyze sets of co-expressed genes and discover new gene functions or associations. Distances between genes are estimated according to their expression profiles and are visualized in networks that may be further partitioned to reveal communities of co-expressed genes. Creating expression profiles is now eased by the large amounts of publicly available expression data (microarrays and RNA-seq). Although many distance calculation methods have been intensively compared and reviewed in the past, it is unclear how to proceed when many samples reflecting a wide range of different conditions are available. Should as many samples as possible be integrated into network construction, or should they be partitioned into smaller sets of more closely related samples? Previous studies have indicated that network performance in capturing known associations saturates once a certain number of samples is included in distance calculations. Here, we examined the influence of sample size on co-expression network construction using microarray and RNA-seq expression data from three plant species. We tested different down-sampling methods and compared the performance of the resulting networks in recovering known gene associations with that of networks obtained from full datasets. We further examined how aggregating networks may help increase this performance by testing six aggregation methods.
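The down-sampling and aggregation idea described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the distance measures, the down-sampling schemes, or the six aggregation methods, so the sketch uses Pearson correlation, uniform random sample subsets, and a plain mean of edge weights. The function names and toy expression values are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return cov / (sx * sy)


def coexpression_network(expr, sample_idx):
    """Return {(gene_a, gene_b): r} using only the chosen sample columns."""
    genes = sorted(expr)
    sub = {g: [expr[g][i] for i in sample_idx] for g in genes}
    return {
        (a, b): pearson(sub[a], sub[b])
        for i, a in enumerate(genes)
        for b in genes[i + 1:]
    }


def aggregated_network(expr, n_samples, subset_size, n_subsets, seed=0):
    """Average edge weights over networks built from random sample subsets.

    Mean aggregation is an assumed choice, not one confirmed by the abstract.
    """
    rng = random.Random(seed)
    totals = {}
    for _ in range(n_subsets):
        idx = rng.sample(range(n_samples), subset_size)
        for edge, r in coexpression_network(expr, idx).items():
            totals[edge] = totals.get(edge, 0.0) + r
    return {edge: total / n_subsets for edge, total in totals.items()}


# Toy data (hypothetical): g1 and g2 rise together, g3 falls as they rise.
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "g2": [2.1, 3.9, 6.2, 8.0, 9.9, 12.1],
    "g3": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
}
network = aggregated_network(expr, n_samples=6, subset_size=4, n_subsets=10)
```

In a real evaluation, the aggregated edge weights would then be scored against a set of known gene associations, which is the performance comparison the abstract refers to.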

Molecules ◽  
2021 ◽  
Vol 26 (13) ◽  
pp. 3924
Author(s):  
Maria Leonor Santos ◽  
Mariaelena D’Ambrosio ◽  
Ana P. Rodrigo ◽  
A. Jorge Parola ◽  
Pedro M. Costa

The past decade has seen growing interest in marine natural pigments for biotechnological applications. One of the most abundant classes of biological pigments is the tetrapyrroles, which are prized targets due to their photodynamic properties; porphyrins are the best known examples of this group. Many animal porphyrinoids and other tetrapyrroles are produced through heme metabolic pathways, the best known of which are the bile pigments biliverdin and bilirubin. Eulalia is a marine polychaete characterized by its bright green coloration resulting from a remarkably wide range of greenish and yellowish tetrapyrroles, some of which have promising photodynamic properties. The present study combined metabolomics based on HPLC-DAD with RNA-seq transcriptomics to investigate the molecular pathways of porphyrinoid metabolism by comparing the worm’s proboscis and epidermis, which display distinct pigmentation patterns. The results showed that pigments are endogenous and seemingly heme-derived. The worm possesses homologs in both organs for genes encoding enzymes involved in heme metabolism such as ALAD, FECH, UROS, and PPOX. However, the findings also indicate that variants of the canonical enzymes of the heme biosynthesis pathway can be species- and organ-specific. These differences between molecular networks help explain not only the differential pigmentation patterns between organs, but also the worm’s variety of novel endogenous tetrapyrrolic compounds.


2020 ◽  
Author(s):  
Park MooJong ◽  
Song Youngseok ◽  
Lee Heesup ◽  
Park Juhyeok

Climate change due to global warming has recently brought frequent large-scale weather disasters of a kind not experienced in the past. Among these, drought, along with heavy rain, has become one of the representative weather disasters in Korea. Drought occurs over wide areas on both short and long time scales; its specific timing, location, and causes are difficult to identify, and its damage and influence are enormous.
In the past, the Republic of Korea has prepared for drought with non-structural measures such as securing irrigation water for drought recovery, developing emergency management, and developing a drought information system based on drought indices. Drought damage reduction measures have mainly relied on indices such as the Palmer Drought Severity Index (PDSI), Standardized Precipitation Index (SPI), Crop Moisture Index (CMI), Crop Specific Drought Index (CSDI), and Profication (DICS Index), and Survey.
In this study, we intend to establish standards for reducing drought damage by investigating and analyzing drought damage characteristics in Korea. Historically, drought damage in Korea has affected agriculture, daily living, and industry, and the ministry manages and stores the data on drought damage. From 1965 to 2018, drought damage occurred in South Korea a total of 204 times, mostly in South Gyeongsang and South Jeolla provinces rather than in special and metropolitan cities. The purpose of this study is to analyze the characteristics of drought damage in Korea and establish measures to reduce mega-drought damage.
Acknowledgements: This research was supported by a grant (2019-MOIS31-010) from the Fundamental Technology Development Program for Extreme Disaster Response funded by the Korean Ministry of Interior and Safety (MOIS).


2015 ◽  
Author(s):  
Florian Wagner

Genome-wide expression profiling is a cost-efficient and widely used method to characterize heterogeneous populations of cells, tissues, biopsies, or other biological specimens. The exploratory analysis of such datasets typically relies on generic unsupervised methods, e.g. principal component analysis or hierarchical clustering. However, generic methods fail to exploit the significant amount of knowledge that exists about the molecular functions of genes. Here, I introduce GO-PCA, an unsupervised method that incorporates prior knowledge about gene functions in the form of gene ontology (GO) annotations. GO-PCA aims to discover and represent biological heterogeneity along all major axes of variation in a given dataset, while suppressing heterogeneity due to technical biases. To this end, GO-PCA combines principal component analysis (PCA) with nonparametric GO enrichment analysis, and uses the results to generate expression signatures based on small sets of functionally related genes. I first applied GO-PCA to expression data from diverse lineages of the human hematopoietic system, and obtained a small set of signatures that captured known cell characteristics for most lineages. I then applied the method to expression profiles of glioblastoma (GBM) tumor biopsies, and obtained signatures that were strongly associated with multiple previously described GBM subtypes. Surprisingly, GO-PCA discovered a cell cycle-related signature that exhibited significant differences between the Proneural and the prognostically favorable GBM CpG Island Methylator (G-CIMP) subtypes, suggesting that the G-CIMP subtype is characterized in part by lower mitotic activity. Previous expression-based classifications have failed to separate these subtypes, demonstrating that GO-PCA can detect heterogeneity that is missed by other methods.
My results show that GO-PCA is a powerful and versatile expression-based method that facilitates exploration of large-scale expression data, without requiring additional types of experimental data. The low-dimensional representation generated by GO-PCA lends itself to interpretation, hypothesis generation, and further analysis.
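To make the enrichment step more concrete, the following is a simplified stand-in rather than the GO-PCA procedure itself. It assumes that a set of genes with the strongest loadings on one principal component has already been selected, and it swaps the nonparametric enrichment analysis mentioned in the abstract for a plain one-sided hypergeometric test. The function names, gene identifiers, and annotations are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, K, n, N):
    """P(X >= k) when drawing n genes from N, of which K carry the GO term."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enriched_terms(top_genes, annotations, universe, alpha=0.05):
    """Return {term: p} for terms over-represented among top_genes."""
    universe = set(universe)
    top = set(top_genes) & universe
    results = {}
    for term, members in annotations.items():
        members = set(members) & universe
        k = len(top & members)  # annotated genes among the top-loading genes
        p = hypergeom_pvalue(k, len(members), len(top), len(universe))
        if k > 0 and p <= alpha:
            results[term] = p
    return results


# Toy example (hypothetical): 20 genes, two GO terms with 5 members each.
universe = [f"g{i}" for i in range(20)]
annotations = {
    "cell cycle": ["g0", "g1", "g2", "g3", "g4"],
    "transport": ["g10", "g11", "g12", "g13", "g14"],
}
top_loading = ["g0", "g1", "g2", "g3", "g10"]  # assumed output of a PCA step
result = enriched_terms(top_loading, annotations, universe)
```

A signature in the sense of the abstract would then be built from the small set of genes that are both highly loaded and annotated with an enriched term; that step is omitted here.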


2018 ◽  
Author(s):  
Koen Van Den Berge ◽  
Katharina Hembach ◽  
Charlotte Soneson ◽  
Simone Tiberi ◽  
Lieven Clement ◽  
...  

Gene expression is the fundamental level at which the results of various genetic and regulatory programs are observable. The measurement of transcriptome-wide gene expression has convincingly switched from microarrays to sequencing in a matter of years. RNA sequencing (RNA-seq) provides a quantitative and open system for profiling transcriptional outcomes on a large scale and therefore facilitates a large diversity of applications, including basic science studies, but also agricultural or clinical situations. In the past 10 years or so, much has been learned about the characteristics of the RNA-seq datasets as well as the performance of the myriad of methods developed. In this review, we give an overall view of the developments in RNA-seq data analysis, including experimental design, with an explicit focus on quantification of gene expression and statistical approaches for differential expression. We also highlight emerging data types, such as single-cell RNA-seq and gene expression profiling using long-read technologies.


2016 ◽  
Author(s):  
Alan Medlar ◽  
Laura Laakso ◽  
Andreia Miraldo ◽  
Ari Löytynoja

High-throughput RNA-seq data has become ubiquitous in the study of non-model organisms, but its use in comparative analysis remains a challenge. Without a reference genome for mapping, sequence data has to be de novo assembled, producing large numbers of short, highly redundant contigs. Preparing these assemblies for comparative analyses requires the removal of redundant isoforms, assignment of orthologs and converting fragmented transcripts into gene alignments. In this article we present Glutton, a novel tool to process transcriptome assemblies for downstream evolutionary analyses. Glutton takes as input a set of fragmented, possibly erroneous transcriptome assemblies. Utilising phylogeny-aware alignment and reference data from a closely related species, it reconstructs one transcript per gene, finds orthologous sequences and produces accurate multiple alignments of coding sequences. We present a comprehensive analysis of Glutton’s performance across a wide range of divergence times between study and reference species. We demonstrate the impact choice of assembler has on both the number of alignments and the correctness of ortholog assignment and show substantial improvements over heuristic methods, without sacrificing correctness. Finally, using inference of Darwinian selection as an example of downstream analysis, we show that Glutton-processed RNA-seq data give results comparable to those obtained from full length gene sequences even with distantly related reference species. Glutton is available from http://wasabiapp.org/software/glutton/ and is licensed under the GPLv3.


2016 ◽  
Vol 22 (6) ◽  
pp. 579-592 ◽  
Author(s):  
Xiaomin Dong ◽  
Yanan You ◽  
Jia Qian Wu

The composition and function of the central nervous system (CNS) are extremely complex. In addition to hundreds of subtypes of neurons, other cell types, including glia (astrocytes, oligodendrocytes, and microglia) and vascular cells (endothelial cells and pericytes) also play important roles in CNS function. Such heterogeneity makes the study of gene transcription in the CNS challenging. Transcriptomic studies, namely the analyses of the expression levels and structures of all genes, are essential for interpreting the functional elements and understanding the molecular constituents of the CNS. Microarray has been a predominant method for large-scale gene expression profiling in the past. However, RNA-sequencing (RNA-Seq) technology developed in recent years has many advantages over microarrays, and has enabled building more quantitative, accurate, and comprehensive transcriptomes of the CNS and other systems. The discovery of novel genes, diverse alternative splicing events, and noncoding RNAs has remarkably expanded the complexity of gene expression profiles and will help us to understand intricate neural circuits. Here, we discuss the procedures and advantages of RNA-Seq technology in mammalian CNS transcriptome construction, and review the approaches of sample collection as well as recent progress in building RNA-Seq-based transcriptomes from tissue samples and specific cell types.


2020 ◽  
Vol 2 (2) ◽  
pp. 45-47
Author(s):  
Shubha Devi Sapkota ◽  
Monika Sharma ◽  
Gehendra Bhusal

COVID-19 is a newly recognized infectious disease that has spread rapidly, with no verified treatment available. It is caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). In convalescent plasma therapy, plasma, the yellowish liquid portion of blood, taken from recovered individuals is used to treat patients suffering from the same illness. For more than 100 years it has been used to treat severe infections with varying degrees of success. For the present infection, multiple clinical trials of plasma therapy are still under vigorous investigation. Despite a very low chance of risks such as allergies, lung damage, and transmission of blood-related infections, the therapy has shown positive results in the recovery of patients. Many experts view its use as a “stopgap measure” until effective vaccines and antiviral drugs become widely available. However, the main challenges are finding suitable donors, the expense of the whole procedure, and the inability to perform it on a large scale. This commentary summarizes convalescent plasma therapy as a hopeful alternative therapy for severe or critical COVID-19. It also emphasizes the promising results this therapy has shown in the past in various infectious diseases.


2018 ◽  
Author(s):  
LM Simon ◽  
S Karg ◽  
AJ Westermann ◽  
M Engel ◽  
AHA Elbehery ◽  
...  

Background: With the advent of the age of big data in bioinformatics, large volumes of data and high performance computing power enable researchers to perform re-analyses of publicly available datasets at an unprecedented scale. Ever more studies implicate the microbiome in both normal human physiology and a wide range of diseases. RNA sequencing technology (RNA-seq) is commonly used to infer global eukaryotic gene expression patterns under defined conditions, including human disease-related contexts, but its generic nature also enables the detection of microbial and viral transcripts. Findings: We developed a bioinformatic pipeline to screen existing human RNA-seq datasets for the presence of microbial and viral reads by re-inspecting the non-human-mapping read fraction. We validated this approach by recapitulating outcomes from 6 independent controlled infection experiments of cell line models and by comparison with an alternative metatranscriptomic mapping strategy. We then applied the pipeline to close to 150 terabytes of publicly available raw RNA-seq data from >17,000 samples from >400 studies relevant to human disease using state-of-the-art high performance computing systems. The resulting data of this large-scale re-analysis are made available in the presented MetaMap resource. Conclusions: Our results demonstrate that common human RNA-seq data, including those archived in public repositories, might contain valuable information to correlate microbial and viral detection patterns with diverse diseases. The presented MetaMap database thus provides a rich resource for hypothesis generation towards the role of the microbiome in human disease.


2017 ◽  
Author(s):  
Philippa Borrill ◽  
Sophie A. Harrington ◽  
Cristobal Uauy

ARTICLE SUMMARY: Transcription factors are vital in plants to regulate gene expression in response to environmental stimuli and to control developmental processes. In this study, we annotated and classified transcription factors in polyploid bread wheat into gene families and explored the NAC family in detail. We combined phylogenetic analysis and transcriptome analysis, using publicly available RNA-seq data, to characterize the NAC gene family and provide hypotheses for putative functions of many NAC transcription factors. This study lays the groundwork for future studies on transcription factors in wheat which may be of great agronomic relevance.
ABSTRACT: Many important genes in agriculture correspond to transcription factors which regulate a wide range of pathways from flowering to responses to disease and abiotic stresses. In this study, we identified 5,776 transcription factors in hexaploid wheat (Triticum aestivum) and classified them into gene families. We further investigated the NAC family exploring the phylogeny, C-terminal domain conservation and expression profiles across 308 RNA-seq samples. Phylogenetic trees of NAC domains indicated that wheat NACs divided into eight groups similar to rice (Oryza sativa) and barley (Hordeum vulgare). C-terminal domain motifs were frequently conserved between wheat, rice and barley within phylogenetic groups; however, this conservation was not maintained across phylogenetic groups. We explored gene expression patterns across a wide range of developmental stages, tissues, and abiotic stresses. We found that more phylogenetically related NACs shared more similar expression patterns compared to more distant NACs. However, within each phylogenetic group there were clades with diverse expression profiles. We carried out a co-expression analysis on all wheat genes and identified 37 modules of co-expressed genes of which 23 contained NACs.
Using GO term enrichment we obtained putative functions for NACs within co-expressed modules including responses to heat and abiotic stress and responses to water: these NACs may represent targets for breeding or biotechnological applications. This study provides a framework and data for hypothesis generation for future studies on NAC transcription factors in wheat.


F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 750
Author(s):  
Olukayode A. Sosina ◽  
Matthew N. Tran ◽  
Kristen R. Maynard ◽  
Ran Tao ◽  
Margaret A. Taub ◽  
...  

Background: Statistical deconvolution strategies have emerged over the past decade to estimate the proportion of various cell populations in homogenate tissue sources like brain using gene expression data. However, no study has been undertaken to assess the extent to which expression-based and DNAm-based cell type composition estimates agree. Results: Using estimated neuronal fractions from DNAm data, from the same brain region (i.e., matched) as our bulk RNA-Seq dataset, as proxies for the true unobserved cell-type fractions (i.e., as the gold standard), we assessed the accuracy (RMSE) and concordance (R2) of four reference-based deconvolution algorithms: Houseman, CIBERSORT, non-negative least squares (NNLS)/MIND, and MuSiC. We did this for two cell-type populations - neurons and non-neurons/glia - using matched single nuclei RNA-Seq and mismatched single cell RNA-Seq reference datasets. With the mismatched single cell RNA-Seq reference dataset, Houseman, MuSiC, and NNLS produced concordant (high correlation; Houseman R2 = 0.51, 95% CI [0.39, 0.65]; MuSiC R2 = 0.56, 95% CI [0.43, 0.69]; NNLS R2 = 0.54, 95% CI [0.32, 0.68]) but biased (high RMSE, >0.35) neuronal fraction estimates. CIBERSORT produced more discordant (moderate correlation; R2 = 0.25, 95% CI [0.15, 0.38]) neuronal fraction estimates, but with less bias (low RMSE, 0.09). Using the matched single nuclei RNA-Seq reference dataset did not eliminate bias (MuSiC RMSE = 0.17). Conclusions: Our results together suggest that many existing RNA deconvolution algorithms estimate the RNA composition of homogenate tissue, e.g. the amount of RNA attributable to each cell type, and not the cellular composition, which relates to the underlying fraction of cells.
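The distinction this abstract draws between concordance and bias can be shown with a short sketch. It assumes that R2 means the squared Pearson correlation between estimated and reference fractions, which the abstract does not state explicitly, and the fraction values are invented purely for illustration; none come from the study.

```python
import math


def rmse(estimates, truth):
    """Root mean squared error between estimated and reference fractions."""
    return math.sqrt(
        sum((e - t) ** 2 for e, t in zip(estimates, truth)) / len(truth)
    )


def r_squared(estimates, truth):
    """Squared Pearson correlation (assumed meaning of R2 here)."""
    n = len(truth)
    me, mt = sum(estimates) / n, sum(truth) / n
    cov = sum((e - me) * (t - mt) for e, t in zip(estimates, truth))
    ve = sum((e - me) ** 2 for e in estimates)
    vt = sum((t - mt) ** 2 for t in truth)
    return cov ** 2 / (ve * vt)


# Hypothetical neuronal fractions for four samples.
truth = [0.20, 0.30, 0.40, 0.50]
shifted = [t + 0.35 for t in truth]   # tracks the reference, constant offset
noisy = [0.25, 0.22, 0.47, 0.45]      # closer on average, tracks less well

shifted_scores = (r_squared(shifted, truth), rmse(shifted, truth))
noisy_scores = (r_squared(noisy, truth), rmse(noisy, truth))
```

The `shifted` estimates correlate almost perfectly with the reference while carrying a large error, and the `noisy` estimates show the reverse pattern, which is why the two metrics are reported separately.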

