HTSeq - A Python framework to work with high-throughput sequencing data

bioRxiv ◽

10.1101/002824 ◽

2014 ◽

Cited By ~ 242

Author(s):

Simon Anders ◽

Paul Theodor Pyl ◽

Wolfgang Huber

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Rapid Development ◽

Differential Expression Analysis ◽

RNA Seq ◽

Sequencing Data ◽

Standard Work ◽

Data Formats ◽

High Throughput Sequencing Data ◽

Python Package

Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed. Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data such as genomic coordinates, sequences, sequencing reads, alignments, gene model information, and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes. Availability: HTSeq is released as open-source software under the GNU General Public Licence and available from http://www-huber.embl.de/HTSeq or from the Python Package Index, https://pypi.python.org/pypi/HTSeq
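The read-counting idea described in the abstract can be illustrated with a minimal sketch in plain Python. It does not use the HTSeq API; the gene table, the read coordinates, the function names and the `__no_feature` / `__ambiguous` labels are all assumptions chosen for illustration, and the linear scan stands in for the coordinate-indexed data structures a real library would provide.

```python
from collections import Counter

# Hypothetical toy annotation: (gene_id, chromosome, start, end),
# with 0-based, half-open intervals [start, end).
GENES = [
    ("geneA", "chr1", 100, 200),
    ("geneB", "chr1", 180, 300),
    ("geneC", "chr2", 50, 150),
]


def overlapping_genes(chrom, start, end, genes=GENES):
    """Return the set of gene IDs whose interval overlaps [start, end)."""
    return {
        gene_id
        for gene_id, g_chrom, g_start, g_end in genes
        if g_chrom == chrom and g_start < end and start < g_end
    }


def count_reads(reads, genes=GENES):
    """Count reads per gene.

    A read is assigned to a gene only if it overlaps exactly one gene;
    reads overlapping no gene or several genes are tallied separately.
    """
    counts = Counter()
    for chrom, start, end in reads:
        hits = overlapping_genes(chrom, start, end, genes)
        if len(hits) == 1:
            counts[next(iter(hits))] += 1
        elif not hits:
            counts["__no_feature"] += 1
        else:
            counts["__ambiguous"] += 1
    return counts


# Hypothetical aligned reads: (chromosome, start, end).
reads = [
    ("chr1", 110, 160),  # overlaps geneA only
    ("chr1", 185, 195),  # overlaps geneA and geneB
    ("chr2", 10, 40),    # overlaps no gene
    ("chr2", 60, 110),   # overlaps geneC only
]
counts = count_reads(reads)
```

The resulting per-gene tallies are the kind of count table that differential expression tools take as input. Treating multi-gene reads as ambiguous rather than counting them twice is one possible design choice, made here to keep each read contributing to at most one gene.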

Application of High-Throughput Sequencing Data Mining in Comparison of Gene Expression Profile in Renal Cell Carcinoma and Normal Renal Cell by RNA-Seq

Lecture Notes in Electrical Engineering - Innovative Computing ◽

10.1007/978-981-15-5959-4_45 ◽

2020 ◽

pp. 359-365

Author(s):

Yunhai Yu ◽

Hongmei Xu ◽

Shaoning Guo ◽

Na Wang

Keyword(s):

Gene Expression ◽

Data Mining ◽

Renal Cell Carcinoma ◽

Cell Carcinoma ◽

High Throughput ◽

Renal Cell ◽

High Throughput Sequencing ◽

RNA Seq ◽

Sequencing Data ◽

High Throughput Sequencing Data

Integrative analyses of transcriptome data reveal the mechanisms of post-transcriptional regulation

Briefings in Functional Genomics ◽

10.1093/bfgp/elab004 ◽

2021 ◽

Author(s):

Jinkai Wang

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

RNA Binding ◽

RNA Binding Proteins ◽

Rapid Development ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Public Resources ◽

Integrative Analyses ◽

Post Transcriptional Regulation

Post-transcriptional processing of RNAs plays important roles in a variety of physiological and pathological processes. These processes can be precisely controlled by a series of RNA binding proteins and cotranscriptionally regulated by transcription factors as well as histone modifications. With the rapid development of high-throughput sequencing techniques, multiomics data have been broadly used to study the mechanisms underlying important biological processes. However, how to use these high-throughput sequencing data to elucidate the fundamental regulatory roles of post-transcriptional processes remains a great challenge. This review summarizes the regulatory mechanisms of post-transcriptional processes and the general principles and approaches to dissect these mechanisms by integrating multiomics data as well as public resources.

Building Genomic Analysis Pipelines in a Hackathon Setting with Bioinformatician Teams: DNA-seq, Epigenomics, Metagenomics and RNA-seq

10.1101/018085 ◽

2015 ◽

Cited By ~ 3

Author(s):

Ben Busby ◽

Allissa Dillman ◽

Claire L. Simpson ◽

Ian Fingerman ◽

Sijung Yun ◽

...

Keyword(s):

Web Service ◽

High Throughput ◽

High Throughput Sequencing ◽

Genomic Analysis ◽

National Institutes Of Health ◽

RNA Seq ◽

Sequencing Data ◽

Collaborative Software ◽

High Throughput Sequencing Data ◽

Collaborative Software Development

We assembled teams of genomics professionals to assess whether we could rapidly develop pipelines to answer biological questions commonly asked by biologists and others new to bioinformatics by facilitating analysis of high-throughput sequencing data. In January 2015, teams were assembled on the National Institutes of Health (NIH) campus to address questions in the DNA-seq, epigenomics, metagenomics and RNA-seq subfields of genomics. The only two rules for this hackathon were that either the data used were housed at the National Center for Biotechnology Information (NCBI) or would be submitted there by a participant in the next six months, and that all software going into the pipeline was open-source or open-use. Questions proposed by organizers, as well as suggested tools and approaches, were distributed to participants a few days before the event and were refined during the event. Pipelines were published on GitHub, a web service providing publicly available, free-usage tiers for collaborative software development (https://github.com/features/). The code was published at https://github.com/DCGenomics/ with separate repositories for each team, starting with hackathon_v001.

The simple fool's guide to population genomics via RNA‐Seq: an introduction to high‐throughput sequencing data analysis

Molecular Ecology Resources ◽

10.1111/1755-0998.12003 ◽

2012 ◽

Vol 12 (6) ◽

pp. 1058-1067 ◽

Cited By ~ 167

Author(s):

Pierre De Wit ◽

Melissa H. Pespeni ◽

Jason T. Ladner ◽

Daniel J. Barshis ◽

François Seneca ◽

...

Keyword(s):

Data Analysis ◽

High Throughput ◽

High Throughput Sequencing ◽

Population Genomics ◽

RNA Seq ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Sequencing Data Analysis

Faculty Opinions recommendation of Coalescent Inference Using Serially Sampled, High-Throughput Sequencing Data from Intrahost HIV Infection.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.726132071.793531014 ◽

2017 ◽

Author(s):

Sarah Rowland-Jones ◽

Sophie Andrews

Keyword(s):

HIV Infection ◽

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Data ◽

High Throughput Sequencing Data

BlindCall: ultra-fast base-calling of high-throughput sequencing data by blind deconvolution

Bioinformatics ◽

10.1093/bioinformatics/btu010 ◽

2014 ◽

Vol 30 (9) ◽

pp. 1214-1219 ◽

Cited By ~ 6

Author(s):

C. Ye ◽

C. Hsiao ◽

H. Corrada Bravo

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Blind Deconvolution ◽

Sequencing Data ◽

Base Calling ◽

High Throughput Sequencing Data

Great differences in performance and outcome of high-throughput sequencing data analysis platforms for fungal metabarcoding

MycoKeys ◽

10.3897/mycokeys.39.28109 ◽

2018 ◽

Vol 39 ◽

pp. 29-40 ◽

Cited By ~ 21

Author(s):

Sten Anslan ◽

R. Henrik Nilsson ◽

Christian Wurzbacher ◽

Petr Baldrian ◽

Leho Tedersoo ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Computation Time ◽

Potential Effect ◽

Data Sets ◽

Sequencing Data ◽

Operational Taxonomic Units ◽

High Throughput Sequencing Data ◽

Recent Developments

Along with recent developments in high-throughput sequencing (HTS) technologies and thus fast accumulation of HTS data, there has been a growing need for, and interest in, developing tools for HTS data processing and communication. In particular, a number of bioinformatics tools have been designed for analysing metabarcoding data, each with specific features, assumptions and outputs. To evaluate the potential effect of the application of different bioinformatics workflows on the results, we compared the performance of different analysis platforms on two contrasting high-throughput sequencing data sets. Our analysis revealed that the computation time, the quality of error filtering and hence the output of a given bioinformatics process depend largely on the platform used. Our results show that none of the bioinformatics workflows appears to perfectly filter out the accumulated errors and generate Operational Taxonomic Units, although PipeCraft, LotuS and PIPITS perform better than QIIME2 and Galaxy for the tested fungal amplicon dataset. We conclude that the output of each platform requires manual validation of the OTUs by examining the taxonomy assignment values.

Improvements and impacts of GRCh38 human reference on high throughput sequencing data analysis

Genomics ◽

10.1016/j.ygeno.2017.01.005 ◽

2017 ◽

Vol 109 (2) ◽

pp. 83-90 ◽

Cited By ~ 44

Author(s):

Yan Guo ◽

Yulin Dai ◽

Hui Yu ◽

Shilin Zhao ◽

David C. Samuels ◽

...

Keyword(s):

Data Analysis ◽

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Sequencing Data Analysis

SEED 2: a user-friendly platform for amplicon high-throughput sequencing data analyses

Bioinformatics ◽

10.1093/bioinformatics/bty071 ◽

2018 ◽

Vol 34 (13) ◽

pp. 2292-2294 ◽

Cited By ~ 59

Author(s):

Tomáš Větrovský ◽

Petr Baldrian ◽

Daniel Morais

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Data ◽

Data Analyses ◽

High Throughput Sequencing Data ◽

User Friendly

Computational Analysis of High Throughput Sequencing Data

Methods in Molecular Biology - Bioinformatics for Omics Data ◽

10.1007/978-1-61779-027-0_9 ◽

2011 ◽

pp. 199-217 ◽

Cited By ~ 4

Author(s):

Steve Hoffmann

Keyword(s):

High Throughput ◽

Computational Analysis ◽

High Throughput Sequencing ◽

Sequencing Data ◽

High Throughput Sequencing Data
