HLA Typing from RNA Sequencing and Applications to Cancer

Author(s):  
Rose Orenbuch ◽  
Ioan Filip ◽  
Raul Rabadan
2018 ◽  
Author(s):  
Rose Orenbuch ◽  
Ioan Filip ◽  
Devon Comito ◽  
Jeffrey Shaman ◽  
Itsik Pe'er ◽  
...  

The human leukocyte antigen (HLA) locus makes up the major histocompatibility complex (MHC) and plays a critical role in host response to disease, including cancers and autoimmune disorders. In the clinical setting, HLA typing is necessary for determining tissue compatibility. Recent improvements in the quality and accessibility of next-generation sequencing have made HLA typing from standard short-read data practical. However, this task remains challenging given the high level of polymorphism and homology between the HLA genes. HLA typing from RNA sequencing is further complicated by post-transcriptional splicing and bias due to amplification. Here, we present arcasHLA: a fast and accurate in silico tool that infers HLA genotypes from RNA sequencing data. Our tool outperforms established tools on the gold-standard benchmark dataset for HLA typing in terms of both accuracy and speed, with an accuracy rate of 100% at two-field precision for MHC class I genes, and over 99.7% for MHC class II. Importantly, arcasHLA takes pre-aligned BAM files as its input and outputs three-field resolution for all HLA genes in less than 2 minutes. Finally, we evaluate the performance of our tool on a new biological dataset of 447 single-end total RNA samples from nasopharyngeal swabs, and establish the applicability of arcasHLA in metatranscriptome studies. arcasHLA is available at https://github.com/RabadanLab/arcasHLA.


2019 ◽  
Vol 36 (1) ◽  
pp. 33-40 ◽  
Author(s):  
Rose Orenbuch ◽  
Ioan Filip ◽  
Devon Comito ◽  
Jeffrey Shaman ◽  
Itsik Pe’er ◽  
...  

Abstract. Motivation: The human leukocyte antigen (HLA) locus plays a critical role in tissue compatibility and regulates the host response to many diseases, including cancers and autoimmune disorders. Recent improvements in the quality and accessibility of next-generation sequencing have made HLA typing from standard short-read data practical. However, this task remains challenging given the high level of polymorphism and homology between HLA genes. HLA typing from RNA sequencing is further complicated by post-transcriptional modifications and bias due to amplification. Results: Here, we present arcasHLA: a fast and accurate in silico tool that infers HLA genotypes from RNA-sequencing data. Our tool outperforms established tools on the gold-standard benchmark dataset for HLA typing in terms of both accuracy and speed, with an accuracy rate of 100% at two-field resolution for Class I genes, and over 99.7% for Class II. Furthermore, we evaluate the performance of our tool on a new biological dataset of 447 single-end total RNA samples from nasopharyngeal swabs, and establish the applicability of arcasHLA in metatranscriptome studies. Availability and implementation: arcasHLA is available at https://github.com/RabadanLab/arcasHLA. Supplementary information: Supplementary data are available at Bioinformatics online.


2021 ◽  
Vol 12 ◽  
Author(s):  
Tiira Johansson ◽  
Satu Koskela ◽  
Dawit A. Yohannes ◽  
Jukka Partanen ◽  
Päivi Saavalainen

Identification of human leukocyte antigen (HLA) alleles from next-generation sequencing (NGS) data is challenging because of the high polymorphism and mosaic nature of HLA genes. Owing to the complex nature of HLA genes and the consequent challenges in allele assignment, Oxford Nanopore Technologies’ (ONT) single-molecule sequencing technology has been of great interest because it is well suited to sequencing long reads. In addition to read length, ONT’s advantages are its portability and its capacity for rapid real-time sequencing, which enables simultaneous data analysis. Here, we describe a targeted RNA-based method for HLA typing using ONT sequencing and SeqNext-HLA SeqPilot software (JSI Medical Systems GmbH). Twelve classical HLA genes were enriched from cDNA of 50 individuals, barcoded, pooled, and sequenced in 10 MinION R9.4 SpotON flow cell runs, producing over 30,000 reads per sample. Using barcoded 2D reads, SeqPilot assigned HLA alleles to two-field typing resolution or higher with an average read depth of 1750x. Sequence analysis resulted in 99–100% accuracy at the low-resolution level (one-field) and in 74–100% accuracy at the high-resolution level (two-field) with the expected alleles. There are still some limitations with ONT RNA sequencing, such as noisy reads, homopolymer errors, and the lack of robust algorithms, which interfere with confident allele assignment. These issues need to be inspected carefully in the future to improve the allele call rates. Nevertheless, here we show that sequencing of multiplexed cDNA amplicon libraries on ONT MinION can produce accurate high-resolution typing results for 12 classical HLA loci. For HLA research, ONT RNA sequencing is a promising method due to its capability to sequence full-length HLA transcripts. In addition to HLA genotyping, the technique could also be applied for simultaneous expression analysis.


2021 ◽  
Author(s):  
Ram Ayyala ◽  
Junghyun Jung ◽  
Sergey Knyazev ◽  
Serghei Mangul

Although precise identification of human leukocyte antigen (HLA) alleles is crucial for various clinical and research applications, HLA typing remains challenging due to the high polymorphism of the HLA loci. With next-generation sequencing (NGS) data becoming widely accessible, many computational tools have been developed to predict HLA types from RNA sequencing (RNA-seq) data. However, there is a lack of comprehensive and systematic benchmarking of RNA-seq HLA callers using large-scale and realistic gold standards. To address this limitation, we rigorously compared the performance of 12 HLA callers over 50,000 HLA tasks, including searching 30 pairwise combinations of HLA callers and reference in over 1,500 samples. In each case, we measured accuracy as the percentage of correctly predicted alleles (at two- and four-digit resolution) based on six gold-standard datasets spanning 650 RNA-seq samples. To determine how read length over the HLA region influences the prediction quality of each tool, we considered read lengths in the range 37-126 bp, which were available in our gold-standard datasets. Moreover, using the Genotype-Tissue Expression (GTEx) v8 data, we calculated the concordance of HLA types across different tissues from the same individual to evaluate how consistently each HLA caller performs across tissues. This study offers crucial information for researchers regarding appropriate choices of methods for HLA analysis.
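
The accuracy metric described above (percentage of correctly predicted alleles at a chosen resolution) can be illustrated with a minimal Python sketch. This is not the benchmark's actual code: the function names `truncate` and `allele_accuracy`, the dictionary layout keyed by (sample, gene), and the example alleles are all hypothetical, and the sketch assumes that two-digit and four-digit resolution correspond to one and two colon-separated fields of the allele name.

```python
from collections import Counter

def truncate(allele: str, fields: int) -> str:
    """Trim an HLA allele name such as 'A*02:01:01:01' to its first `fields` fields."""
    gene, _, rest = allele.partition("*")
    return gene + "*" + ":".join(rest.split(":")[:fields])

def allele_accuracy(predicted: dict, truth: dict, fields: int = 2) -> float:
    """Fraction of gold-standard alleles matched by the predictions.

    Both arguments map a hypothetical (sample, gene) key to a list of allele
    names (usually two per gene). Matching is order-independent and each
    predicted allele can match at most one gold-standard allele.
    """
    correct = total = 0
    for key, true_alleles in truth.items():
        true_counts = Counter(truncate(a, fields) for a in true_alleles)
        pred_counts = Counter(truncate(a, fields) for a in predicted.get(key, []))
        # Multiset intersection counts the alleles present in both calls.
        correct += sum((true_counts & pred_counts).values())
        total += sum(true_counts.values())
    return correct / total if total else 0.0

# Made-up example data, for illustration only.
truth = {
    ("s1", "A"): ["A*02:01:01:01", "A*24:02:01"],
    ("s1", "B"): ["B*07:02", "B*44:03"],
}
pred = {
    ("s1", "A"): ["A*24:02", "A*02:05"],
    ("s1", "B"): ["B*07:02:01", "B*44:03:01"],
}

two_field = allele_accuracy(pred, truth, fields=2)
one_field = allele_accuracy(pred, truth, fields=1)
```

In this toy example, the caller gets the A*02 allele right only at the first field, so accuracy is lower at the finer resolution. The same function could in principle be reused for a cross-tissue concordance check by treating one tissue's calls as the reference for another's.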


2007 ◽  
Vol 38 (11) ◽  
pp. 15
Author(s):  
Bruce Jancin

1997 ◽  
Vol 56 (1-3) ◽  
pp. 313
Author(s):  
A Jurado

Diabetes ◽  
2020 ◽  
Vol 69 (Supplement 1) ◽  
pp. 41-OR
Author(s):  
Farnaz Shamsi ◽  
Mary Piper ◽  
Li-Lun Ho ◽  
Tian Lian Huang ◽  
Yu-Hua Tseng
