BioSankey: Visualizing microbial communities and gene expression data over time

2017 ◽  
Author(s):  
Alexander Platzer ◽  
Julia Polzin ◽  
Ping Penny Han ◽  
Klaus Rembart ◽  
Thomas Nussbaumer

Abstract Metagenomics, RNA-seq, WGS (whole-genome sequencing) and other types of next-generation sequencing techniques provide quantitative measurements for single strains and genes over time. To obtain a global overview of the experiment and to explore the full potential of a given dataset, intuitive and interactive visualization tools are needed. Therefore, we established BioSankey, which visualizes microbial species in microbiome studies and gene expression over time as a Sankey diagram. These diagrams are embedded into a project-specific HTML page that contains all information provided during the installation process. BioSankey can be easily applied to analyse bacterial communities in time-series datasets. Furthermore, it can be used to analyse the fluctuations of differentially expressed genes (DEG). The output of BioSankey is a project-specific HTML page, which depends only on JavaScript to enable searches for species or genes of interest, so results can be exchanged among collaboration partners without a web server or a connection to a database. BioSankey is a tool to visualize different data elements from single and dual RNA-seq datasets as well as from metagenomic studies.

2018 ◽  
Vol 15 (4) ◽  
Author(s):  
Alexander Platzer ◽  
Julia Polzin ◽  
Klaus Rembart ◽  
Ping Penny Han ◽  
Denise Rauer ◽  
...  

Abstract Metagenomics provides quantitative measurements for microbial species over time. To obtain a global overview of an experiment and to explore the full potential of a given dataset, intuitive and interactive visualization tools are needed. Therefore, we established BioSankey to visualize microbial species in microbiome studies over time as a Sankey diagram. These diagrams are embedded into a project-specific webpage which depends only on JavaScript and Google API to allow searches of interesting species without requiring a web server or connection to a database. BioSankey is a valuable tool to visualize different data elements from single or dual RNA-seq datasets and additionally enables a straightforward exchange of results among collaboration partners.
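As a rough illustration of how a species-by-time abundance table could be turned into the links of a Sankey diagram, the Python sketch below emits (source, target, weight) rows, the shape that JavaScript Sankey charts commonly consume. It is not BioSankey's implementation: the function name `abundance_to_links`, the node label format, and the choice of link weight (the mean of two consecutive abundances) are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# One Sankey link: (source node, target node, weight).
Link = Tuple[str, str, float]


def abundance_to_links(
    table: Dict[str, List[float]], timepoints: List[str]
) -> List[Link]:
    """Connect each species at one time point to itself at the next one.

    Hypothetical helper: the weight is the mean of the two consecutive
    abundances (an assumption), and zero-weight links are skipped.
    """
    links: List[Link] = []
    for species, values in table.items():
        if len(values) != len(timepoints):
            raise ValueError(f"{species}: expected {len(timepoints)} values")
        for i in range(len(timepoints) - 1):
            weight = (values[i] + values[i + 1]) / 2
            if weight > 0:
                source = f"{species} @ {timepoints[i]}"
                target = f"{species} @ {timepoints[i + 1]}"
                links.append((source, target, weight))
    return links


# Made-up example data: two species measured at three time points.
example_table = {"A": [10.0, 20.0, 0.0], "B": [0.0, 0.0, 5.0]}
example_links = abundance_to_links(example_table, ["t0", "t1", "t2"])
```

Because every node label carries its time point, the resulting graph has no cycles, which Sankey layouts generally require.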


2019 ◽  
Author(s):  
Tim O. Nieuwenhuis ◽  
Stephanie Yang ◽  
Rohan X. Verma ◽  
Vamsee Pillalamarri ◽  
Dan E. Arking ◽  
...  

Abstract One of the challenges of next generation sequencing (NGS) is read contamination. We used the Genotype-Tissue Expression (GTEx) project, a large, diverse, and robustly generated dataset, to understand the factors that contribute to contamination. We obtained GTEx datasets and technical metadata, along with validating RNA-Seq data from other studies. Of 48 analyzed tissues in GTEx, 26 had variant co-expression clusters of four known highly expressed and pancreas-enriched genes (PRSS1, PNLIP, CLPS, and/or CELA3A). Fourteen additional highly expressed genes from other tissues also indicated contamination. Sample contamination by non-native genes was associated with a sample being sequenced on the same day as a tissue that natively expressed those genes. This was highly significant for pancreas and esophagus genes (linear model, p=9.5e-237 and p=5e-260 respectively). Nine SNPs in four genes shown to contaminate non-native tissues demonstrated allelic differences between DNA-based genotypes and contaminated sample RNA-based genotypes, validating the contamination. Low-level contamination affected 4,497 (39.6%) samples (defined as ≥10 PRSS1 TPM). It also led to eQTL assignments in inappropriate tissues among these 18 genes. We note this type of contamination occurs widely, impacting bulk and single cell data set analysis. In conclusion, highly expressed, tissue-enriched genes basally contaminate GTEx and other datasets, impacting analyses. Awareness of this process is necessary to avoid assigning inaccurate importance to low-level gene expression in inappropriate tissues and cells.
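The low-level contamination criterion in this abstract can be sketched as a simple threshold check. The Python example below assumes the cutoff reads as at least 10 PRSS1 TPM and that samples from tissues natively expressing PRSS1 (such as pancreas) are excluded from the count; the function name, sample identifiers, and TPM values are hypothetical and do not come from GTEx.

```python
from typing import Dict, List, Set

# Assumption: the abstract's cutoff is read as "at least 10 PRSS1 TPM".
PRSS1_TPM_THRESHOLD = 10.0


def flag_low_level_contamination(
    prss1_tpm: Dict[str, float], native_samples: Set[str]
) -> List[str]:
    """Return non-native samples whose PRSS1 expression meets the cutoff.

    Hypothetical helper: excluding natively expressing samples is an
    assumption about how the criterion is applied.
    """
    return sorted(
        sample
        for sample, tpm in prss1_tpm.items()
        if sample not in native_samples and tpm >= PRSS1_TPM_THRESHOLD
    )


# Made-up values for illustration only.
example_tpm = {"s1": 12.0, "s2": 3.0, "s3": 10.0, "panc": 5000.0}
flagged = flag_low_level_contamination(example_tpm, {"panc"})

# Share of non-native samples that were flagged.
non_native_count = len(example_tpm) - 1
flagged_fraction = len(flagged) / non_native_count
```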


F1000Research ◽  
2016 ◽  
Vol 5 ◽  
pp. 2851 ◽  
Author(s):  
Panu Artimo ◽  
Séverine Duvaud ◽  
Mikhail Pachkov ◽  
Vassilios Ioannidis ◽  
Erik van Nimwegen ◽  
...  

ISMARA (ismara.unibas.ch) automatically infers the key regulators and regulatory interactions from high-throughput gene expression or chromatin state data. However, given the large sizes of current next generation sequencing (NGS) datasets, data uploading times are a major bottleneck. Additionally, for proprietary data, users may be uncomfortable with uploading entire raw datasets to an external server. Both these problems could be alleviated by providing a means by which users could pre-process their raw data locally, transferring only a small summary file to the ISMARA server. We developed a stand-alone client application that pre-processes large input files (RNA-seq or ChIP-seq data) on the user's computer for performing ISMARA analysis in a completely automated manner, including uploading of small processed summary files to the ISMARA server. This reduces file sizes by up to a factor of 1000, and upload times from many hours to mere seconds. The client application is available from ismara.unibas.ch/ISMARA/client.


2018 ◽  
Author(s):  
Khaled Moustafa ◽  
Joanna M. Cross

The assessment of gene expression levels is an important step toward elucidating gene functions temporally and spatially. Decades ago, typical studies focused on a few genes individually, whereas now researchers are able to examine whole genomes at once. The upgrade of throughput levels aided the introduction of systems biology approaches whereby cell functional networks can be scrutinized in their entireties to unravel potential functional interacting components. The birth of systems biology goes hand-in-hand with huge technological advancements and enables a fairly rapid detection of all transcripts in studied biological samples. Even so, earlier technologies that were restricted to probing single genes or a subset of genes still have their place in research laboratories. The objective here is to highlight key approaches used in gene expression analysis in plant responses to environmental stresses, or, more generally, any other condition of interest. Northern blots, RNase protection assays, and qPCR are described for their targeted detection of one or a few transcripts at once. Differential display and serial analysis of gene expression represent non-targeted methods to evaluate expression changes of a significant number of gene transcripts. Finally, microarrays and RNA-seq (next-generation sequencing) contribute to the ultimate goal of identifying and quantifying all transcripts in a cell under conditions or stages of study. Recent examples of applications as well as principles, advantages, and drawbacks of each method are contrasted. We also suggest replacing the term "Next-Generation Sequencing (NGS)" with another less confusing synonym such as "RNA-seq", "high throughput sequencing", or "massively parallel sequencing" to avoid confusion with any future sequencing technologies.


2018 ◽  
Author(s):  
Khaled Moustafa

The assessment of gene expression levels is an important step toward elucidating gene functions temporally and spatially. Decades ago, typical studies focused on a few genes individually, whereas now researchers are able to examine whole genomes at once. The upgrade of throughput levels aided the introduction of systems biology approaches whereby cell functional networks can be scrutinized in their entireties to unravel potential functional interacting components. The birth of systems biology goes hand-in-hand with huge technological advancements and enables a fairly rapid detection of all transcripts in studied biological samples. Even so, earlier technologies that were restricted to probing single genes or a subset of genes still have their place in research laboratories. The objective here is to highlight key approaches used in gene expression analysis in plant responses to environmental stresses, or, more generally, any other condition of interest. Northern blots, RNase protection assays, and qPCR are described for their targeted detection of one or a few transcripts at once. Differential display and serial analysis of gene expression represent non-targeted methods to evaluate expression changes of a significant number of gene transcripts. Finally, microarrays and RNA-seq (next-generation sequencing) contribute to the ultimate goal of identifying and quantifying all transcripts in a cell under conditions or stages of study. Recent examples of applications as well as principles, advantages, and drawbacks of each method are contrasted. We also suggest replacing the term "Next-Generation Sequencing (NGS)" with another less confusing synonym such as "RNA-seq", "high throughput sequencing", or "massively parallel sequencing" to avoid confusion with any future sequencing technologies.


2012 ◽  
Vol 2012 ◽  
pp. 1-8 ◽  
Author(s):  
Robert Ekblom ◽  
Jon Slate ◽  
Gavin J. Horsburgh ◽  
Tim Birkhead ◽  
Terry Burke

Next-generation sequencing of transcriptomes (RNA-Seq) is being used increasingly in studies of nonmodel organisms. Here, we evaluate the effectiveness of normalising cDNA libraries prior to sequencing in a small-scale study of the zebra finch. We find that assemblies produced from normalised libraries had a larger number of contigs but used fewer reads compared to unnormalised libraries. Considerably more genes were also detected using the contigs produced from normalised cDNA, and microsatellite discovery was up to 73% more efficient in these. There was a positive correlation between the detected expression level of genes in normalised and unnormalised cDNA, and there was no difference in the number of genes identified as being differentially expressed between blood and spleen for the normalised and unnormalised libraries. We conclude that normalised cDNA libraries are preferable for many applications of RNA-Seq and that these can also be used in quantitative gene expression studies.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Thomas M. Adams ◽  
Tjelvar S. G. Olsson ◽  
Ricardo H. Ramírez-González ◽  
Ruth Bryant ◽  
Rosie Bryson ◽  
...  

Abstract Background Transcriptomics is being increasingly applied to generate new insight into the interactions between plants and their pathogens. For the wheat yellow (stripe) rust pathogen (Puccinia striiformis f. sp. tritici, Pst) RNA-based sequencing (RNA-Seq) has proved particularly valuable, overcoming the barriers associated with its obligate biotrophic nature. This includes the application of RNA-Seq approaches to study Pst and wheat gene expression dynamics over time and the Pst population composition through the use of a novel RNA-Seq based surveillance approach called “field pathogenomics”. As a dual RNA-Seq approach, the field pathogenomics technique also provides gene expression data from the host, giving new insight into host responses. However, this has created a wealth of data for interrogation. Results Here, we used the field pathogenomics approach to generate 538 new RNA-Seq datasets from Pst-infected field wheat samples, doubling the amount of transcriptomics data available for this important pathosystem. We then analysed these datasets alongside 66 RNA-Seq datasets from four Pst infection time-courses and 420 Pst-infected plant field and laboratory samples that were publicly available. A database of gene expression values for Pst and wheat was generated for each of these 1024 RNA-Seq datasets and incorporated into the development of the rust expression browser (http://www.rust-expression.com). This enables, for the first time, simultaneous ‘point-and-click’ access to gene expression profiles for Pst and its wheat host and represents the largest database of processed RNA-Seq datasets available for any of the three Puccinia wheat rust pathogens. We also demonstrated the utility of the browser through investigation of expression of putative Pst virulence genes over time and examined the host plant's response to Pst infection.
Conclusions The rust expression browser offers immense value to the wider community, facilitating data sharing and transparency and the underlying database can be continually expanded as more datasets become publicly available.


2019 ◽  
Vol 21 (Supplement_6) ◽  
pp. vi101-vi101
Author(s):  
Piroon Jejaroenpun ◽  
Thidathip Wongsurawat ◽  
Annick DeLoose ◽  
David Ussery ◽  
Intawat Nookaew ◽  
...  

Abstract The RNA sequencing (RNA-Seq) technique is now routinely used to quantitatively explore genome-wide expression in various research fields, including cancer research. The most common RNA-seq methodology produces billions of short reads in the range of 100–600 base pairs, from which it is occasionally difficult to reconstruct the isoform-level transcriptome and fusion genes. The limitations of short reads can be overcome by using third-generation sequencing technologies, such as Oxford Nanopore Technologies (ONT). This study aims to perform full-length cDNA sequencing using the ONT platform and investigate the abilities of ONT in (1) identifying differential gene expression, (2) detecting differential transcript isoform usage, and (3) detecting fusion genes. To this end, CNS-1 cells were implanted into the frontal lobes of three Lewis rats. The CNS-1 model is a histocompatible astrocytoma cell line with an invasive pattern mimicking glioblastoma (GBM). Two weeks after transplantation, the transplanted tumors and the normal brain on the other side were collected as matched normal-tumor pairs. Total RNA extracted from the samples was subjected to full-length cDNA sequencing on a portable MinION sequencer. In tumor samples, 615 genes involved in cell cycle were upregulated, whereas 1067 genes involved in neurological functions were downregulated. Finally, we could identify differential transcript isoform expression and fusion genes from the matched normal-tumor pairs. Overall, full-length sequencing of the cDNA molecules permitted a detailed characterization of the differential gene expression, the isoform complexity, and fusion genes. In the near future, we will use these methods on human samples.


2017 ◽  
Vol 7 (1) ◽  
Author(s):  
Carlos P. Roca ◽  
Susana I. L. Gomes ◽  
Mónica J. B. Amorim ◽  
Janeck J. Scott-Fordsmand

Abstract RNA-Seq and gene expression microarrays provide comprehensive profiles of gene activity, but lack of reproducibility has hindered their application. A key challenge in the data analysis is the normalization of gene expression levels, which is currently performed following the implicit assumption that most genes are not differentially expressed. Here, we present a mathematical approach to normalization that makes no assumption of this sort. We have found that variation in gene expression is much larger than currently believed, and that it can be measured with available assays. Our results also explain, at least partially, the reproducibility problems encountered in transcriptomics studies. We expect that this improvement in detection will help efforts to realize the full potential of gene expression profiling, especially in analyses of cellular processes involving complex modulations of gene expression.


2012 ◽  
Vol 28 (8) ◽  
pp. 1184-1185 ◽  
Author(s):  
Markus Krupp ◽  
Jens U. Marquardt ◽  
Ugur Sahin ◽  
Peter R. Galle ◽  
John Castle ◽  
...  
