neoantigenR: An annotation based pipeline for tumor neoantigen identification from sequencing data

AbstractStudies indicate that more than 90% of human genes are alternatively spliced, suggesting the complexity of the transcriptome assembly and analysis. The splicing process is often disrupted, resulting in both functional and non-functional end-products (Sveen et al. 2016) in many cancers. Harnessing the immune system to fight against malignant cancers carrying aberrantly mutated or spliced products is becoming a promising approach to cancer therapy. Advances in immune checkpoint blockade have elicited adaptive immune responses with promising clinical responses to treatments against human malignancies (Tumor Neoantigens in Personalized Cancer Immunotherapy 2017). Emerging data suggest that recognition of patient-specific mutation-associated cancer antigens (i.e. from alternative splicing isoforms) may allow scientists to dissect the immune response in the activity of clinical immunotherapies (Schumacher and Schreiber 2015). The advent of high-throughput sequencing technology has provided a comprehensive view of both splicing aberrations and somatic mutations across a range of human malignancies, allowing for a deeper understanding of the interplay of various disease mechanisms.Meanwhile, studies show that the number of transcript isoforms reported to date may be limited by the short-read sequencing due to the inherit limitation of transcriptome reconstruction algorithms, whereas long-read sequencing is able to significantly improve the detection of alternative splicing variants since there is no need to assemble full-length transcripts from short reads. The analysis of these high-throughput long-read sequencing data may permit a systematic view of tumor specific peptide epitopes (also known as neoantigens) that could serve as targets for immunotherapy (Tumor Neoantigens in Personalized Cancer Immunotherapy 2017).Currently, there is no software pipeline available that can efficiently produce mutation-associated cancer antigens from raw high-throughput sequencing data on patient tumor DNA (The Problem with Neoantigen Prediction 2017). In addressing this issue, we introduce a R package that allows the discoveries of peptide epitope candidates, which are the tumor-specific peptide fragments containing potential functional neoantigens. These peptide epitopes consist of structure variants including insertion, deletions, alternative sequences, and peptides from nonsynonymous mutations. Analysis of these precursor candidates with widely used tools such as netMHC allows for the accurate in-silico prediction of neoantigens. The pipeline named neoantigeR is currently hosted in https://github.com/ICBI/neoantigeR.

Download Full-text

rMAPS2: an update of the RNA map analysis and plotting server for alternative splicing regulation

Nucleic Acids Research ◽

10.1093/nar/gkaa237 ◽

2020 ◽

Vol 48 (W1) ◽

pp. W300-W306 ◽

Cited By ~ 2

Author(s):

Jae Y Hwang ◽

Sungbo Jung ◽

Tae L Kook ◽

Eric C Rouchka ◽

Jinwoong Bok ◽

...

Keyword(s):

Alternative Splicing ◽

High Throughput ◽

Splice Site ◽

High Throughput Sequencing ◽

Rna Binding ◽

Rna Binding Protein ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Retained Intron ◽

Map Analysis

Abstract The rMAPS2 (RNA Map Analysis and Plotting Server 2) web server, freely available at http://rmaps.cecsresearch.org/, has provided the high-throughput sequencing data research community with curated tools for the identification of RNA binding protein sites. rMAPS2 analyzes differential alternative splicing or CLIP peak data obtained from high-throughput sequencing data analysis tools like MISO, rMATS, Piranha, PIPE-CLIP and PARalyzer, and then, graphically displays enriched RNA-binding protein target sites. The initial release of rMAPS focused only on the most common alternative splicing event, skipped exon or exon skipping. However, there was a high demand for the analysis of other major types of alternative splicing events, especially for retained intron events since this is the most common type of alternative splicing in plants, such as Arabidopsis thaliana. Here, we expanded the implementation of rMAPS2 to facilitate analyses for all five major types of alternative splicing events: skipped exon, mutually exclusive exons, alternative 5′ splice site, alternative 3′ splice site and retained intron. In addition, by employing multi-threading, rMAPS2 has vastly improved the user experience with significant reductions in running time, ∼3.5 min for the analysis of all five major alternative splicing types at once.

Download Full-text

Statistical and Computational Methods for High-Throughput Sequencing Data Analysis of Alternative Splicing

Statistics in Biosciences ◽

10.1007/s12561-012-9064-7 ◽

2012 ◽

Vol 5 (1) ◽

pp. 138-155 ◽

Cited By ~ 9

Author(s):

Liang Chen

Keyword(s):

Data Analysis ◽

Alternative Splicing ◽

Computational Methods ◽

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Sequencing Data Analysis

Download Full-text

Faculty Opinions recommendation of Coalescent Inference Using Serially Sampled, High-Throughput Sequencing Data from Intrahost HIV Infection.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.726132071.793531014 ◽

2017 ◽

Author(s):

Sarah Rowland-Jones ◽

Sophie Andrews

Keyword(s):

Hiv Infection ◽

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Data ◽

High Throughput Sequencing Data

Download Full-text

BlindCall: ultra-fast base-calling of high-throughput sequencing data by blind deconvolution

Bioinformatics ◽

10.1093/bioinformatics/btu010 ◽

2014 ◽

Vol 30 (9) ◽

pp. 1214-1219 ◽

Cited By ~ 6

Author(s):

C. Ye ◽

C. Hsiao ◽

H. Corrada Bravo

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Blind Deconvolution ◽

Sequencing Data ◽

Base Calling ◽

High Throughput Sequencing Data

Download Full-text

Great differences in performance and outcome of high-throughput sequencing data analysis platforms for fungal metabarcoding

MycoKeys ◽

10.3897/mycokeys.39.28109 ◽

2018 ◽

Vol 39 ◽

pp. 29-40 ◽

Cited By ~ 21

Author(s):

Sten Anslan ◽

R. Henrik Nilsson ◽

Christian Wurzbacher ◽

Petr Baldrian ◽

Leho Tedersoo ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Computation Time ◽

Potential Effect ◽

Data Sets ◽

Sequencing Data ◽

Operational Taxonomic Units ◽

High Throughput Sequencing Data ◽

Recent Developments

Along with recent developments in high-throughput sequencing (HTS) technologies and thus fast accumulation of HTS data, there has been a growing need and interest for developing tools for HTS data processing and communication. In particular, a number of bioinformatics tools have been designed for analysing metabarcoding data, each with specific features, assumptions and outputs. To evaluate the potential effect of the application of different bioinformatics workflow on the results, we compared the performance of different analysis platforms on two contrasting high-throughput sequencing data sets. Our analysis revealed that the computation time, quality of error filtering and hence output of specific bioinformatics process largely depends on the platform used. Our results show that none of the bioinformatics workflows appears to perfectly filter out the accumulated errors and generate Operational Taxonomic Units, although PipeCraft, LotuS and PIPITS perform better than QIIME2 and Galaxy for the tested fungal amplicon dataset. We conclude that the output of each platform requires manual validation of the OTUs by examining the taxonomy assignment values.

Download Full-text

Improvements and impacts of GRCh38 human reference on high throughput sequencing data analysis

Genomics ◽

10.1016/j.ygeno.2017.01.005 ◽

2017 ◽

Vol 109 (2) ◽

pp. 83-90 ◽

Cited By ~ 44

Author(s):

Yan Guo ◽

Yulin Dai ◽

Hui Yu ◽

Shilin Zhao ◽

David C. Samuels ◽

...

Keyword(s):

Data Analysis ◽

High Throughput ◽

High Throughput Sequencing ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Sequencing Data Analysis

Download Full-text

Accurate fetal variant calling in the presence of maternal cell contamination

10.1101/552414 ◽

2019 ◽

Cited By ~ 1

Author(s):

Elena Nabieva ◽

Satyarth Mishra Sharma ◽

Yermek Kapushev ◽

Sofya K. Garushyants ◽

Anna V. Fedotova ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Chorionic Villus ◽

Genetic Diagnosis ◽

Variant Calling ◽

Data Availability ◽

Training Data ◽

Sequencing Data ◽

Maternal Cell ◽

Fetal Dna

AbstractHigh-throughput sequencing of fetal DNA is a promising and increasingly common method for the discovery of all (or all coding) genetic variants in the fetus, either as part of prenatal screening or diagnosis, or for genetic diagnosis of spontaneous abortions. In many cases, the fetal DNA (from chorionic villi, amniotic fluid, or abortive tissue) can be contaminated with maternal cells, resulting in the mixture of fetal and maternal DNA. This maternal cell contamination (MCC) undermines the assumption, made by traditional variant callers, that each allele in a heterozygous site is covered, on average, by 50% of the reads, and therefore can lead to erroneous genotype calls. We present a panel of methods for reducing the genotyping error in the presence of MCC. All methods start with the output of GATK HaplotypeCaller on the sequencing data for the (contaminated) fetal sample and both of its parents, and additionally rely on information about the MCC fraction (which itself is readily estimated from the high-throughput sequencing data). The first of these methods uses a Bayesian probabilistic model to correct the fetal genotype calls produced by MCC-unaware HaplotypeCaller. The other two methods “learn” the genotype-correction model from examples. We use simulated contaminated fetal data to train and test the models. Using the test sets, we show that all three methods lead to substantially improved accuracy when compared with the original MCC-unaware HaplotypeCaller calls. We then apply the best-performing method to three chorionic villus samples from spontaneously terminated pregnancies.Code and training data availabilityhttps://github.com/bazykinlab/ML-maternal-cell-contamination

Download Full-text

HTSeq - A Python framework to work with high-throughput sequencing data

10.1101/002824 ◽

2014 ◽

Cited By ~ 242

Author(s):

Simon Anders ◽

Paul Theodor Pyl ◽

Wolfgang Huber

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Rapid Development ◽

Differential Expression Analysis ◽

Rna Seq ◽

Sequencing Data ◽

Standard Work ◽

Data Formats ◽

High Throughput Sequencing Data ◽

Python Package

Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard work flows, custom scripts are needed. Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data such as genomic coordinates, sequences, sequencing reads, alignments, gene model information, variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes. Availability: HTSeq is released as open-source software under the GNU General Public Licence and available from http://www-huber.embl.de/HTSeq or from the Python Package Index, https://pypi.python.org/pypi/HTSeq

Download Full-text