HTSSIP: an R package for analysis of high throughput sequencing data from nucleic acid stable isotope probing (SIP) experiments

2017 ◽  
Author(s):  
Nicholas D. Youngblut ◽  
Samuel E. Barnett ◽  
Daniel H. Buckley

Abstract: Combining high throughput sequencing with stable isotope probing (HTS-SIP) is a powerful method for mapping in situ metabolic processes to thousands of microbial taxa. However, accurately mapping metabolic processes to taxa is complex and challenging. Multiple HTS-SIP data analysis methods have been developed, including high-resolution stable isotope probing (HR-SIP), multi-window high-resolution stable isotope probing (MW-HR-SIP), quantitative stable isotope probing (q-SIP), and ΔBD. Currently, the computational tools to perform these analyses are either not publicly available or lack documentation, testing, and developer support. To address this shortfall, we have developed the HTSSIP R package, a toolset for conducting HTS-SIP analyses in a straightforward and easily reproducible manner. The HTSSIP package, along with full documentation and examples, is available from CRAN at https://cran.r-project.org/web/packages/HTSSIP/index.html and GitHub at https://github.com/nick-youngblut/HTSSIP.

2019 ◽  
Author(s):  
Anthony Federico ◽  
Stefano Monti

ABSTRACT Summary: Geneset enrichment is a popular method for annotating high-throughput sequencing data. Existing tools fall short in providing the flexibility to tackle the varied challenges researchers face in such analyses, particularly when analyzing many signatures across multiple experiments. We present a comprehensive R package for geneset enrichment workflows that offers multiple enrichment, visualization, and sharing methods in addition to novel features such as hierarchical geneset analysis and built-in markdown reporting. hypeR is a one-stop solution to performing geneset enrichment for a wide audience and range of use cases. Availability and implementation: The most recent version of the package is available at https://github.com/montilab/hypeR. Supplementary information: Comprehensive documentation and tutorials are available at https://montilab.github.io/hypeR-docs.


Author(s):  
Anthony Federico ◽  
Stefano Monti

Abstract Summary Geneset enrichment is a popular method for annotating high-throughput sequencing data. Existing tools fall short in providing the flexibility to tackle the varied challenges researchers face in such analyses, particularly when analyzing many signatures across multiple experiments. We present a comprehensive R package for geneset enrichment workflows that offers multiple enrichment, visualization, and sharing methods in addition to novel features such as hierarchical geneset analysis and built-in markdown reporting. hypeR is a one-stop solution to performing geneset enrichment for a wide audience and range of use cases. Availability and implementation The most recent version of the package is available at https://github.com/montilab/hypeR. Contact [email protected] or [email protected]


PeerJ ◽  
2016 ◽  
Vol 4 ◽  
pp. e2209 ◽  
Author(s):  
Georgios Georgiou ◽  
Simon J. van Heeringen

Summary. In this article we describe fluff, a software package that allows for simple exploration, clustering and visualization of high-throughput sequencing data mapped to a reference genome. The package contains three command-line tools to generate publication-quality figures in an uncomplicated manner using sensible defaults. Genome-wide data can be aggregated, clustered and visualized in a heatmap, according to different clustering methods. This includes a predefined setting to identify dynamic clusters between different conditions or developmental stages. Alternatively, clustered data can be visualized in a bandplot. Finally, fluff includes a tool to generate genomic profiles. As command-line tools, the fluff programs can easily be integrated into standard analysis pipelines. The installation is straightforward and documentation is available at http://fluff.readthedocs.org. Availability. fluff is implemented in Python and runs on Linux. The source code is freely available for download at https://github.com/simonvh/fluff.


F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 1466 ◽  
Author(s):  
Erik Fasterius ◽  
Cristina Al-Khalili Szigyarto

High throughput sequencing technologies are flourishing in the biological sciences, enabling unprecedented insights into e.g. genetic variation, but require extensive bioinformatic expertise for the analysis. There is thus a need for simple yet effective software that can analyse both existing and novel data, providing interpretable biological results with little bioinformatic prowess. We present seqCAT, a Bioconductor toolkit for analysing genetic variation in high throughput sequencing data. It is a highly accessible, easy-to-use and well-documented R-package that enables a wide range of researchers to analyse their own and publicly available data, providing biologically relevant conclusions and publication-ready figures. SeqCAT can provide information regarding genetic similarities between an arbitrary number of samples, validate specific variants as well as define functionally similar variant groups for further downstream analyses. Its ease of use, installation, complete data-to-conclusions functionality and the inherent flexibility of the R programming language make seqCAT a powerful tool for variant analyses compared to already existing solutions. A publicly available dataset of liver cancer-derived organoids is analysed herein using the seqCAT package, demonstrating that the organoids are genetically stable. A previously known liver cancer-related mutation is additionally shown to be present in a sample though it was not listed in the original publication. Differences between DNA- and RNA-based variant calls in this dataset are also analysed revealing a high median concordance of 97.5%.


2017 ◽  
Author(s):  
Thomas J. Hardcastle ◽  
Irene Papatheodorou

ABSTRACT Summary: Identifying gene co-expression is a significant step in understanding functional relationships between genes. Existing methods primarily depend on analyses of correlation between pairs of genes; however, this neglects structural elements between experimental conditions. We present a novel approach to identifying clusters of co-expressed genes that incorporates these structures. Availability: The methods are released on Bioconductor as the clusterSeq package (https://bioconductor.org/packages/release/bioc/html/clusterSeq.html). Contact: [email protected]


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Enrique Blanco ◽  
Mar González-Ramírez ◽  
Luciano Di Croce

Abstract: Large-scale sequencing techniques to chart genomes are entirely consolidated. Stable computational methods to perform primary tasks such as quality control, read mapping, peak calling, and counting are likewise available. However, there is a lack of uniform standards for graphical data mining, which is also of central importance. To fill this gap, we developed SeqCode, an open suite of applications that analyzes sequencing data in an elegant but efficient manner. Our software is a portable resource written in ANSI C that can be expected to work for almost all genomes in any computational configuration. Furthermore, we offer a user-friendly front-end web server that integrates SeqCode functions with other graphical analysis tools. Our analysis and visualization toolkit represents a significant improvement in terms of performance and usability as compared to other existing programs. Thus, SeqCode has the potential to become a key multipurpose instrument for high-throughput professional analysis; further, it provides an extremely useful open educational platform for the world-wide scientific community. The SeqCode website is hosted at http://ldicrocelab.crg.eu, and the source code is freely distributed at https://github.com/eblancoga/seqcode.


2016 ◽  
Author(s):  
Georgios Georgiou ◽  
Simon J. van Heeringen

Abstract Summary: In this application note we describe fluff, a software package that allows for simple exploration, clustering and visualization of high-throughput sequencing data mapped to a reference genome. The package contains three command-line tools to generate publication-quality figures in an uncomplicated manner using sensible defaults. Genome-wide data can be aggregated, clustered and visualized in a heatmap, according to different clustering methods. This includes a predefined setting to identify dynamic clusters between different conditions or developmental stages. Alternatively, clustered data can be visualized in a bandplot. Finally, fluff includes a tool to generate genomic profiles. As command-line tools, the fluff programs can easily be integrated into standard analysis pipelines. The installation is straightforward and documentation is available at http://fluff.readthedocs.org. Availability: fluff is implemented in Python and runs on Linux. The source code is freely available for download at http://github.com/simonvh/[email protected]


2017 ◽  
Author(s):  
Xuhua Xia

ABSTRACT: Two major stumbling blocks exist in high-throughput sequencing (HTS) data analysis. The first is the sheer file size, typically in gigabytes when uncompressed, causing problems in storage, transmission and analysis. However, these files do not need to be so large and can be reduced without loss of information. Each HTS file, either in compressed .SRA or plain text .fastq format, contains numerous identical reads stored as separate entries. For example, among 44603541 forward reads in the SRR4011234.sra file (from a Bacillus subtilis transcriptomic study) deposited at NCBI’s SRA database, one read has 497027 identical copies. Instead of storing them as separate entries, one can and should store them as a single entry with the SeqID_NumCopy format (which I dub the FASTA+ format). The second is the proper allocation of reads that map equally well to paralogous genes. I illustrate in detail a new method for such allocation. I have developed ARSDA software that implements these new approaches. A number of HTS files for model species are in the process of being processed and deposited at http://coevol.rdc.uottawa.ca to demonstrate that this approach not only saves a huge amount of storage space and transmission bandwidth, but also dramatically reduces time in downstream data analysis. Instead of matching the 497027 identical reads separately against the Bacillus subtilis genome, one only needs to match the read once. ARSDA includes functions to take advantage of HTS data in the new sequence format for downstream data analysis such as gene expression characterization. ARSDA can be run on Windows, Linux and Macintosh computers and is freely available at http://dambe.bio.uottawa.ca/ARSDA/ARSDA.aspx.
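
The read-collapsing idea in the abstract above can be illustrated with a minimal Python sketch. This is not ARSDA's implementation: the function names and the `UniqueSeq<index>` identifier prefix are hypothetical, and the only detail taken from the abstract is that identical reads become one entry whose header follows a SeqID_NumCopy pattern. The sketch reads only the sequence line of each four-line FASTQ record and does not carry over the quality lines.

```python
from collections import Counter


def read_fastq_sequences(lines):
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def collapse_reads(sequences):
    """Collapse identical reads into (header, sequence) pairs.

    Headers use a SeqID_NumCopy pattern; the 'UniqueSeq' prefix is an
    assumption made for illustration. Most abundant reads come first.
    """
    counts = Counter(sequences)
    records = []
    for idx, (seq, copies) in enumerate(counts.most_common(), start=1):
        records.append((f">UniqueSeq{idx}_{copies}", seq))
    return records


def to_fasta_plus(records):
    """Render collapsed records as FASTA-style text, one header per unique read."""
    return "\n".join(f"{header}\n{seq}" for header, seq in records)


if __name__ == "__main__":
    # Three toy reads, two of which are identical.
    fastq = [
        "@r1", "ACGT", "+", "IIII",
        "@r2", "ACGT", "+", "IIII",
        "@r3", "TTGA", "+", "IIII",
    ]
    print(to_fasta_plus(collapse_reads(read_fastq_sequences(fastq))))
```

With this layout, a downstream mapper would align each unique sequence once and weight the hit by the copy number parsed from the header, which is the time saving the abstract describes.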


F1000Research ◽  
2019 ◽  
Vol 7 ◽  
pp. 1466 ◽  
Author(s):  
Erik Fasterius ◽  
Cristina Al-Khalili Szigyarto

High throughput sequencing technologies are flourishing in the biological sciences, enabling unprecedented insights into e.g. genetic variation, but require extensive bioinformatic expertise for the analysis. There is thus a need for simple yet effective software that can analyse both existing and novel data, providing interpretable biological results with little bioinformatic prowess. We present seqCAT, a Bioconductor toolkit for analysing genetic variation in high throughput sequencing data. It is a highly accessible, easy-to-use and well-documented R-package that enables a wide range of researchers to analyse their own and publicly available data, providing biologically relevant conclusions and publication-ready figures. SeqCAT can provide information regarding genetic similarities between an arbitrary number of samples, validate specific variants as well as define functionally similar variant groups for further downstream analyses. Its ease of use, installation, complete data-to-conclusions functionality and the inherent flexibility of the R programming language make seqCAT a powerful tool for variant analyses compared to already existing solutions. A publicly available dataset of liver cancer-derived organoids is analysed herein using the seqCAT package, corroborating the original authors' conclusions that the organoids are genetically stable. A previously known liver cancer-related mutation is additionally shown to be present in a sample though it was not listed in the original publication. Differences between DNA- and RNA-based variant calls in this dataset are also analysed, revealing a high median concordance of 97.5%. SeqCAT is open source software under an MIT licence, available at https://bioconductor.org/packages/release/bioc/html/seqCAT.html.

