Higher Order Restrictions of the Immunoglobulin Repertoire in CLL: The Illustrative Case of Stereotyped Subsets #2 and #169

Blood ◽  
2019 ◽  
Vol 134 (Supplement_1) ◽  
pp. 5453-5453
Author(s):  
Katerina Gemenetzi ◽  
Andreas Agathangelidis ◽  
Fotis Psomopoulos ◽  
Karla Plevova ◽  
Lesley-Ann Sutton ◽  
...  

Stereotyped subset #2 (IGHV3-21/IGLV3-21) is the largest subset in CLL (~3% of all patients). Membership in subset #2 is clinically relevant since these patients experience an aggressive disease irrespective of the somatic hypermutation (SHM) status of the clonotypic immunoglobulin heavy variable (IGHV) gene. Low-throughput evidence suggests that stereotyped subset #169, a minor CLL subset (~0.2% of all CLL), resembles subset #2 at the immunogenetic level. More specifically: (i) the clonotypic heavy chain (HC) of subset #169 is encoded by the IGHV3-48 gene, which is closely related to the IGHV3-21 gene; (ii) both subsets carry VH CDR3s comprising 9 amino acids (aa) with a conserved aspartic acid (D) at VH CDR3 position 3; (iii) both subsets bear light chains (LC) encoded by the IGLV3-21 gene with a restricted VL CDR3; and (iv) both subsets have borderline SHM status. Here we comprehensively assessed the ontogenetic relationship between CLL subsets #2 and #169 by analyzing their immunogenetic signatures. Utilizing next-generation sequencing (NGS), we studied the HC and LC gene rearrangements of 6 subset #169 patients and 20 subset #2 cases. In brief, IGHV-IGHD-IGHJ and IGLV-IGLJ gene rearrangements were RT-PCR amplified using subgroup-specific leader primers as well as IGHJ and IGLC primers, respectively. Libraries were sequenced on the Illumina MiSeq instrument. IG sequence annotation was performed with IMGT/HighV-QUEST, and metadata analysis was conducted using an in-house, validated bioinformatics pipeline. Rearrangements with identical CDR3 aa sequences were herein defined as clonotypes, whereas clonotypes with different aa substitutions within the V-domain were defined as subclones. For the HC analysis of subset #169, we obtained 894,849 productive sequences (mean: 127,836, range: 87,509-208,019).
On average, each analyzed sample carried 54 clonotypes (range: 44-68); the dominant clonotype had a mean frequency of 99.1% (range: 98.8-99.2%) and displayed considerable intraclonal heterogeneity with a mean of 2,641 subclones/sample (range: 1,566-6,533). For the LCs of subset #169, we obtained 2,096,728 productive sequences (mean: 299,533, range: 186,637-389,258). LCs carried a higher number of distinct clonotypes/sample compared to their partner HCs (mean: 148, range: 110-205); the dominant clonotype had a mean frequency of 98.1% (range: 97.2-98.6%). Intraclonal heterogeneity was also observed in the LCs, with a mean of 6,325 subclones/sample (range: 4,651-11,444), hence more pronounced than in their partner HCs. Viewing each of the cumulative VH and VL CDR3 sequence datasets as a single entity branching through diversification enabled the identification of common sequences. In particular, 2 VH clonotypes were present in 3/6 cases, while a single VL clonotype was present in all 6 cases, albeit at varying frequencies; interestingly, this VL CDR3 sequence was also detected in all subset #2 cases, underscoring the molecular similarities between the two subsets. 
Focusing on SHM, the following observations were made: (i) the frequent 3-nucleotide (AGT) deletion evidenced in the VH CDR2 of subset #2 (leading to the deletion of one of 5 consecutive serine residues) was also detected in all subset #169 cases at subclonal level (average: 6% per sample, range: 0.1-10.8%); of note, the 5-serine stretch is also present in the germline VH CDR2 of the IGHV3-48 gene; (ii) the G-to-R substitution at the VL-CL linker, a ubiquitous SHM in subset #2 and previously reported as critical for IG self-association leading to cell autonomous signaling in this subset, was present in all subset #169 samples as a clonal event with a mean frequency of 98.3%; and, finally, (iii) the S-to-G substitution at position 6 of the VL CDR3, present in all subset #2 cases (mean: 44.2%, range: 6.3-87%), was also found in all #169 samples, representing a clonal event in 1 case (97.2% of all clonotypes) and a subclonal event in the remaining 5 cases (mean: 0.6%, range: 0.4-1.1%). In conclusion, the present high-throughput sequencing data cement the immunogenetic relatedness of CLL stereotyped subsets #2 and #169, further highlighting the role of antigen selection throughout their natural history. These findings also argue for a similar pathophysiology for these subsets that could also be reflected in a similar clonal behavior, with implications for risk stratification. Disclosures Sutton: Abbvie: Honoraria; Gilead: Honoraria; Janssen: Honoraria. Stamatopoulos: Abbvie: Honoraria, Research Funding; Janssen: Honoraria, Research Funding. Chatzidimitriou: Janssen: Honoraria.
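
The clonotype and subclone definitions used in this abstract can be illustrated with a minimal sketch. It assumes that each productive read has already been annotated (for example by IMGT/HighV-QUEST) and reduced to two strings: the CDR3 aa sequence and the full V-domain aa sequence. The function name, the input layout and the toy sequences are hypothetical and are not taken from the authors' in-house pipeline.

```python
from collections import Counter, defaultdict


def summarize_repertoire(reads):
    """Group annotated reads into clonotypes and subclones.

    reads: iterable of (cdr3_aa, v_domain_aa) tuples, one per productive read.
    Clonotype = all reads sharing an identical CDR3 aa sequence.
    Subclone  = a distinct V-domain aa sequence within one clonotype.
    Assumes at least one read is supplied.
    """
    clonotype_counts = Counter()
    subclones = defaultdict(set)
    for cdr3_aa, v_domain_aa in reads:
        clonotype_counts[cdr3_aa] += 1
        subclones[cdr3_aa].add(v_domain_aa)

    total = sum(clonotype_counts.values())
    dominant, dominant_reads = clonotype_counts.most_common(1)[0]
    return {
        "n_clonotypes": len(clonotype_counts),
        "dominant_clonotype": dominant,
        "dominant_frequency_pct": 100.0 * dominant_reads / total,
        "dominant_subclones": len(subclones[dominant]),
    }


# Toy input (invented sequences, for illustration only).
toy_reads = (
    [("ARDANGMDV", "VDOMAIN_A")] * 95
    + [("ARDANGMDV", "VDOMAIN_B")] * 3
    + [("ARDQNGMDV", "VDOMAIN_A")] * 2
)
summary = summarize_repertoire(toy_reads)
```

In this toy input, two V-domain variants share the dominant CDR3 and therefore count as two subclones of one clonotype, while the read with a different CDR3 forms a separate clonotype.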

Blood ◽  
2019 ◽  
Vol 134 (Supplement_1) ◽  
pp. 4277-4277 ◽  
Author(s):  
Katerina Gemenetzi ◽  
Andreas Agathangelidis ◽  
Fotis Psomopoulos ◽  
Kostas Pasentsis ◽  
Evdoxia Koravou ◽  
...  

Classification of patients with chronic lymphocytic leukemia (CLL) based on the immunoglobulin heavy variable (IGHV) gene somatic hypermutation (SHM) status has established predictive and prognostic relevance. The SHM status is assessed based on the number of mutations within the sequence of the rearranged IGHV gene excluding the VH CDR3. This is mostly due to the difficulty in discriminating actual SHM from random nucleotides added between the recombined IGHV, IGHD and IGHJ genes. Hence, this approach may underestimate the true impact of SHM, in fact overlooking the most critical region for antigen-antibody interactions i.e. the VH CDR3. Relevant to mention in this respect, studies from our group in CLL with mutated IGHV genes (M-CLL), particularly subset #4, have revealed considerable intra-VH CDR3 diversity attributed to ongoing SHM. Prompted by these findings, here we investigated whether SHM may also be present in cases bearing 'truly unmutated' IGHV genes (i.e. 100% germline identity across VH FR1-VH FR3), focusing on two well characterized stereotyped subsets i.e. subset #1 (IGHV clan I/IGHD6-19/IGHJ4) and subset #6 (IGHV1-69/IGHD3-16/IGHJ3). These subsets carry germline-encoded amino acid (aa) motifs within the VH CDR3, namely QWL and YDYVWGSY, originating from the IGHD6-19 and IGHD3-16 gene, respectively. However, in both subsets, cases exist with variations in these motifs that could potentially represent SHM. The present study included 12 subset #1 and 5 subset #6 patients with clonotypic IGHV genes lacking any SHM (100% germline identity). IGHV-IGHD-IGHJ gene rearrangements were RT-PCR amplified by subgroup-specific leader primers and a high-fidelity polymerase in order to ensure high data quality. RT-PCR products were subjected to paired-end NGS on the MiSeq platform. Sequence annotation was performed with IMGT/HighV-QUEST and metadata analysis was undertaken using an in-house purpose-built bioinformatics pipeline. 
Rearrangements with the same IGHV gene and identical VH CDR3 aa sequences were defined as clonotypes. Overall, we obtained 1,570,668 productive reads with V-region identity 99-100%; of these, 1,232,958 (mean: 102,746, range: 20,796-242,519) concerned subset #1 while 337,710 (mean: 67,542, range: 50,403-79,683) concerned subset #6. On average, 64.4% (range: 1.7-77.5%) of subset #1 reads and 49.2% (range: 0.7-70%) of subset #6 reads corresponded to rearrangements with IGHV genes lacking any SHM (100% germline identity). Clonotype computation revealed 1,831 and 1,048 unique clonotypes for subset #1 and #6, respectively. Subset #1 displayed a mean of 157 distinct clonotypes per sample (range: 74-267), with the dominant clonotype having a mean frequency of 96.9% (range: 96-98.2%). Of note, 44 clonotypes were shared between different patients (albeit at varying frequencies), including the dominant clonotype of 11/12 cases, which was present in 2-6 additional subset #1 patients. Subset #6 cases carried a higher number of distinct clonotypes per sample (mean: 219, range: 189-243) while the dominant clonotype had a mean frequency of 95.6% (range: 94.5-96.5%). Shared clonotypes (n=30) were identified also in subset #6 and the dominant clonotype of each subset #6 case was present in 3-5 additional subset #6 patients. Focusing on the VH CDR3, in particular the IGHD-encoded part, the following observations were made: (1) in both subsets, extensive intra-VH CDR3 variation was detected at certain positions within the IGHD gene; (2) in most cases, the observed aa substitutions were conservative i.e. concerned aa sharing similar physicochemical properties. 
Particularly noteworthy in this respect were the observations in subset #6 that: (i) the valine residue (V) in the D-derived YDYVWGSY motif was very frequently mutated to another aliphatic residue (A, I, L); (ii) in cases where the predominant clonotype carried I (also in the Sanger-derived sequence), several minor clonotypes carried the germline-encoded V, providing compelling evidence that the observed substitution represented true SHM. In conclusion, we provide immunogenetic evidence for intra-VH CDR3 variations, very likely attributed to SHM, in CLL patients carrying 'truly unmutated' IGHV genes. While the prognostic/predictive relevance of this observation is beyond the scope of the present work, our findings highlight the possible need to reappraise definitions ('semantics') regarding SHM status in CLL. Disclosures Stamatopoulos: Janssen: Honoraria, Research Funding; Abbvie: Honoraria, Research Funding. Chatzidimitriou: Janssen: Honoraria.
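
The comparison of observed VH CDR3 motifs against the germline IGHD-encoded motif can be sketched as follows. This is an illustrative example only: the function name is hypothetical, and the aliphatic group (A, V, I, L) is one common grouping assumed here to mirror the residues named in the abstract; the authors' actual criteria for conservative substitutions are not specified.

```python
GERMLINE_MOTIF = "YDYVWGSY"  # IGHD3-16-derived motif named in the abstract
ALIPHATIC = set("AVIL")      # assumed grouping; other classification schemes exist


def motif_substitutions(observed, germline=GERMLINE_MOTIF):
    """List differences between an observed motif and the germline motif.

    Returns tuples of (1-based position, germline aa, observed aa,
    True if both residues fall in the aliphatic group).
    """
    if len(observed) != len(germline):
        raise ValueError("motifs must have equal length for a positional comparison")
    return [
        (pos, g, o, g in ALIPHATIC and o in ALIPHATIC)
        for pos, (g, o) in enumerate(zip(germline, observed), start=1)
        if g != o
    ]


# Hypothetical variant carrying I instead of the germline-encoded V.
changes = motif_substitutions("YDYIWGSY")
unchanged = motif_substitutions("YDYVWGSY")
```

The sketch only handles substitutions of equal-length motifs; insertions or deletions within the junction would need an alignment step first.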


2021 ◽  
Author(s):  
H. Serhat Tetikol ◽  
Kubra Narci ◽  
Deniz Turgut ◽  
Gungor Budak ◽  
Ozem Kalay ◽  
...  

Abstract Graph-based genome reference representations have seen significant development, motivated by the inadequacy of the current human genome reference for capturing the diverse genetic information from different human populations and its inability to maintain the same level of accuracy for non-European ancestries. While there have been many efforts to develop computationally efficient graph-based bioinformatics toolkits, how to curate genomic variants and subsequently construct genome graphs remains an understudied problem that inevitably determines the effectiveness of the end-to-end bioinformatics pipeline. In this study, we discuss major obstacles encountered during graph construction and propose methods for sample selection based on population diversity, graph augmentation with structural variants and resolution of graph reference ambiguity caused by information overload. Moreover, we present the case for iteratively augmenting tailored genome graphs for targeted populations and test the proposed approach on whole-genome samples of African ancestry. Our results show that, as more representative alternatives to linear or generic graph references, population-specific graphs can achieve significantly fewer read mapping errors and increased variant calling sensitivity, and can provide the improvements of joint variant calling without the need for computationally intensive post-processing steps.


Blood ◽  
2021 ◽  
Vol 138 (Supplement 1) ◽  
pp. 1314-1314
Author(s):  
Michael Svaton ◽  
Aneta Skotnicova ◽  
Leona Reznickova ◽  
Andrea Rennerova ◽  
Tatana Valova ◽  
...  

Abstract Together with multicolor flow cytometry, quantitation of clonal immunoglobulin (IG) and T-cell receptor (TR) gene rearrangements represents the current standard for the detection of minimal/measurable residual disease (MRD) in treatment protocols for pediatric acute lymphoblastic leukemia (ALL) patients. Despite the adoption of next generation sequencing (NGS) in the routine identification of clonal IG/TR gene rearrangements as markers for MRD detection, real-time quantitative (q)PCR is still the standard for MRD quantitation in follow-up samples. So far, no large-scale direct comparison of qPCR- and NGS-based MRD quantitation has been performed. We compared qPCR- and NGS-MRD evaluation in a cohort of children with B-cell precursor (BCP) ALL treated on the AIEOP-BFM ALL 2009 protocol and assessed the feasibility and relevance of this method for the stratification at day 33 (EOI). In total, 459 patients were diagnosed with BCP-ALL from 2010 to 2018, and 437 of them were included in our study based on the availability of residual DNA material isolated from day 33 bone marrow aspirates and having at least one IG/TR MRD marker detectable by standard qPCR with the protocol-required sensitivity of 10⁻⁴. Sequencing libraries were prepared according to the EuroClonality-NGS group SOP (Brüggemann et al, Leukemia 2019) with the total DNA input normalized to the equivalent of 150,000 nucleated cells to reach an MRD sensitivity of 10⁻⁵ and sequenced on Illumina NovaSeq and MiSeq instruments. In total, 780 IG/TR markers were evaluated by both NGS and qPCR. Sequencing data were analyzed using the ARResT/Interrogate (Bystry et al, Bioinformatics 2017) pipeline and a custom bioinformatic analysis process, and the NGS-MRD results were normalized to the EuroClonality-NGS central in-tube quality/quantification control (cIT-QC; Knecht et al, Leukemia 2019).
Of the 780 IG/TR MRD markers evaluated by both methods, 629 (80.6%) were concordant, with 242 markers being MRD positive and 387 negative. Of the 82 markers that were positive only by qPCR and not by NGS, 76 were positive below the quantitative range (positive non-quantifiable). Specificity analysis was performed for each marker by searching for the junction sequence across the dataset of all patients' NGS results. Based on these results, 22 out of 82 markers positive only by qPCR were classified as potentially unspecific (false positive) and similarly 32 unspecific markers were identified among the 69 positive only by NGS. This was also supported by unspecific amplification of the polyclonal control in 27 out of these 32 corresponding qPCR systems, in some cases leading to qPCR negative classification determined by the EuroMRD guidelines. Overall stratification of patients based only on day 33 MRD by qPCR or NGS was concordant in 76% of patients, while in 19% of patients, NGS-MRD quantitation led to the assignment to a lower-risk group, mainly due to the elimination of false-positive results. Furthermore, analysis of all positive markers across all patients' NGS libraries showed that one out of 10 markers (mainly in the IGK, TRG and TRD loci) used for qPCR-MRD stratification did not provide satisfactory specificity, although they fully met EuroMRD criteria during the optimization of qPCR patient-specific assays. Our results show that NGS-MRD is highly concordant with the traditional qPCR-based strategy and has comparable sensitivity and clinical value in the setting of a BFM-based clinical protocol, while being less laborious and providing significantly more specific results and additional information on the IG/TR repertoire (Kotrova et al, Blood 2016). Our study also emphasizes the importance of selecting MRD markers of adequate specificity at diagnosis.
Currently, this selection can be assisted by these broad sequencing data on the IG/TR repertoire of a large number of patients. Based on these results, we propose that frontline NGS-MRD evaluation developed by the EuroClonality-NGS working group can be used as an alternative to traditional qPCR-based MRD quantitation in future MRD-based treatment protocols. Supported by grants NU20-03-00284 and NU20-07-00322 from the Czech Health Research Council and 534120 from Charles University. All methods were established through collaboration within the EuroClonality-NGS and EuroMRD groups. Disclosures van der Velden: Agilent: Research Funding; Navigate: Other: Service Level Agreement; Janssen: Other: Service Level Agreement; EuroFlow: Other: Service Level Agreement, Patents & Royalties: for network, not personally; BD Biosciences: Other: Service Level Agreement. Brüggemann: Incyte: Other: Advisory Board; Janssen: Speakers Bureau; Amgen: Other: Advisory Board, Travel support, Research Funding, Speakers Bureau. Langerak: Erasmus MS, University Medical Center: Current Employment; F. Hoffmann-La Roche Ltd/Genentech, Inc.: Research Funding; Gilead: Research Funding; Janssen: Speakers Bureau.
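
The marker-level concordance reported in this abstract follows directly from the four counts it gives (242 positive by both methods, 387 negative by both, 82 positive only by qPCR, 69 positive only by NGS). A minimal sketch of that arithmetic is shown below; the function name and the returned dictionary layout are illustrative choices, not part of the study's analysis code.

```python
def concordance_summary(both_positive, both_negative, qpcr_only, ngs_only):
    """Summarize agreement between two MRD methods from a 2x2 table of marker counts."""
    total = both_positive + both_negative + qpcr_only + ngs_only
    concordant = both_positive + both_negative
    return {
        "total": total,
        "concordant": concordant,
        "discordant": qpcr_only + ngs_only,
        "concordance_pct": round(100.0 * concordant / total, 1),
    }


# Counts taken from the abstract.
result = concordance_summary(
    both_positive=242, both_negative=387, qpcr_only=82, ngs_only=69
)
```

With these inputs the four cells sum to the 780 markers evaluated, and the concordant fraction rounds to the 80.6% stated in the text.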


Blood ◽  
2018 ◽  
Vol 132 (Supplement 1) ◽  
pp. 1839-1839 ◽  
Author(s):  
Katerina Gemenetzi ◽  
Andreas Agathangelidis ◽  
Lesley-Ann Sutton ◽  
Elisavet Vlachonikola ◽  
Chrysi Galigalidou ◽  
...  

Abstract Subset #2 is the largest subset carrying stereotyped B cell receptor immunoglobulin (BcR IG) in chronic lymphocytic leukemia (CLL). This particular BcR IG is composed of heavy (HC) and light (LC) chains encoded by the IGHV3-21 and the lambda IGLV3-21 gene, respectively. The clonotypic IGHV3-21 genes display a variable load of somatic hypermutation (SHM), being mostly classified as mutated (M-CLL) but also including unmutated (U-CLL) cases. Subset #2 cases, independently of the SHM status, have a particularly dismal clinical outcome similar to that of patients with TP53 aberrations, although lacking such aberrations. Subset #2 BcR IG display a series of distinctive features, including conservation at certain VH and VL CDR3 positions and recurrent SHMs; as well as a capacity for self-association leading to cell autonomous signaling that is critically dependent on a substitution of Arginine (R) for Glycine (G) introduced by SHM at the lambda VL-CL linker region. These features implicate antigen selection in CLL subset #2 ontogeny. However, the available molecular evidence derives from low throughput immunogenetic analysis, precluding comprehensive assessment of antigenic impact on (sub)clonal composition. Here, we sought to overcome this limitation by performing next-generation sequencing (NGS) of HC and LC gene rearrangements of 20 subset #2 patients. RT-PCR products amplified by the IGHV3-21/IGHJ6 and IGLV3-21/IGLC primer pairs, respectively, were subjected to NGS on the MiSeq Illumina Platform. NGS data was analyzed by a validated bioinformatics pipeline. Rearrangements with identical CDR3 amino acid (aa) sequences were defined as clonotypes, whereas clonotypes with different aa substitutions within the V-domain were defined as subclones. Starting with HCs, we obtained 3,340,508 (mean: 291,751, range: 101,231-186,055) productive reads. 
On average, each analyzed sample carried 92 distinct clonotypes (range: 71-152), with the dominant clonotype having a mean frequency of 96% (range: 67-99%): in all cases the dominant clonotype was identical to that determined by Sanger sequencing. The dominant clonotype displayed considerable intraclonal heterogeneity with a mean of 5,082 subclones/sample (range: 2,946-11,041). Turning to LCs, we obtained 5,094,045 (mean: 231,547, range: 38,036-507,949) productive reads. LCs carried a higher number of distinct clonotypes/sample compared to their partner HCs (mean 222, range: 156-306). The dominant clonotype had a mean frequency of 96% (range: 74-98%); similar to HCs, it was identical to that determined by Sanger sequencing. Intraclonal heterogeneity was observed in the LCs as well, with a mean of 7,382 subclones/sample (range: 1,946-11,866), hence more pronounced vs their partner HCs. Viewing the entire subset #2 VH or VL CDR3 dataset (i.e. the CDR3 aa sequences from all clonotypes of all cases) as a single entity branching through diversification enabled the identification of 2 distinct VH CDR3 sequences present at varying frequencies in 16 and 13 cases, respectively; and, 3 distinct VL CDR3 sequences present at varying frequencies in all 20 cases: these results allude to important constraints on the composition of the antigen binding site. Focusing on SHM, the following notable observations could be made. (i) The G-to-R substitution at the VL-CL linker was a clonal event in all cases with R being degenerately encoded by different nucleotide sequences; altogether, these findings underscore the seminal role of this recurrent SHM, likely due to mediating self-association. (ii) A recurrent 3-nucleotide deletion was detected in the VH CDR2 of all cases, strongly supporting functional pressure. 
This change, previously identified by Sanger sequencing as a recurrent SHM in subset #2 (albeit at a frequency of only 25%), was clonal in 4 cases and subclonal in the remainder, where it was present in an average of 105 subclones/sample (range: 1-369). (iii) Certain positions in both the VH and VL domain bore the same aa substitution, mostly at subclonal level: the prime example concerned the substitution of glycine (G) for serine (S) within the VL CDR3, detected in all samples at a mean frequency of 44.2% (range: 6.3-87%). In conclusion, we provide compelling immunogenetic evidence for functional pressure in the ontogeny of CLL subset #2. On this evidence, subset #2 emerges as perhaps the most striking example of antigen-driven leukemogenesis reported thus far. Disclosures Gemenetzi: Gilead: Research Funding. Agathangelidis: Gilead: Research Funding. Stamatopoulos: Abbvie: Honoraria, Research Funding; Gilead: Honoraria, Research Funding; Janssen: Honoraria, Research Funding. Hadzidimitriou: Abbvie: Research Funding; Gilead: Research Funding; Janssen: Honoraria, Research Funding.


Author(s):  
Susana Posada-Céspedes ◽  
David Seifert ◽  
Ivan Topolsky ◽  
Karin J. Metzner ◽  
Niko Beerenwinkel

Abstract High-throughput sequencing technologies are used increasingly, not only in viral genomics research but also in clinical surveillance and diagnostics. These technologies facilitate the assessment of the genetic diversity in intra-host virus populations, which affects transmission, virulence, and pathogenesis of viral infections. However, there are two major challenges in analysing viral diversity. First, amplification and sequencing errors confound the identification of true biological variants, and second, the large data volumes represent computational limitations. To support viral high-throughput sequencing studies, we developed V-pipe, a bioinformatics pipeline combining various state-of-the-art statistical models and computational tools for automated end-to-end analyses of raw sequencing reads. V-pipe supports quality control, read mapping and alignment, low-frequency mutation calling, and inference of viral haplotypes. For generating high-quality read alignments, we developed a novel method, called ngshmmalign, based on profile hidden Markov models and tailored to small and highly diverse viral genomes. V-pipe also includes benchmarking functionality providing a standardized environment for comparative evaluations of different pipeline configurations. We demonstrate this capability by assessing the impact of three different read aligners (Bowtie 2, BWA MEM, ngshmmalign) and two different variant callers (LoFreq, ShoRAH) on the performance of calling single-nucleotide variants in intra-host virus populations. V-pipe supports various pipeline configurations and is implemented in a modular fashion to facilitate adaptations to the continuously changing technology landscape. V-pipe is freely available at https://github.com/cbg-ethz/V-pipe.


Blood ◽  
2014 ◽  
Vol 124 (21) ◽  
pp. 3413-3413
Author(s):  
Christopher S Carlson ◽  
Alfred L. Garfall ◽  
Wenzhao Meng ◽  
Robert Daber ◽  
Bochao Zhang ◽  
...  

Abstract Background: High-throughput sequencing (HTS) of antibody gene rearrangements is an emerging tool for minimal residual disease (MRD) monitoring in B cell malignancies in which the malignant clone harbors a monoclonal Ig heavy chain (IgH) and/or light chain (κ or λ) rearrangement. This approach has shown promise in B-ALL and CLL, but experience with this technique applied to samples from multiple myeloma patients is limited. Approach: We conducted HTS of PCR-amplified IgH (VDJ and DJ) rearrangements from bone marrow aspirates of 21 patients with various plasma cell dyscrasias (MM, MGUS, LPL) and peripheral blood of a patient with plasma cell leukemia. In 17/21 samples, an aliquot was enriched for CD138+ cells by immunomagnetic separation and analyzed separately. Dominant clones from enriched and un-enriched aliquots were compared to verify the malignant clonotype sequence(s). Disease burden in un-enriched samples was also evaluated by microscopy of the bone marrow aspirate smear and ranged from 0 (hemodilute) to 37% plasma cells. Results: In 19 out of 21 samples, a clearly dominant IgH gene rearrangement (>2.5% of total sequences, range 2.9-99.9%) was identified with clear separation from background frequency (at least 2.7-fold higher frequency than next most common clone). In 17/17 cases with paired CD138-enriched samples, the dominant sequences in the enriched and un-enriched samples were identical, indicating successful identification of the malignant clonal Ig rearrangements in the un-enriched sample. More than one IgH rearrangement suitable for longitudinally tracking the malignant clone was identified in 8 of 21 cases. The two cases without an expected, productive IgH rearrangement were IgG-κ and IgG-λ. This suggested that somatic hypermutation (SHM) in the primer binding sites might interfere with some clonal amplifications, so we investigated the degree of SHM in the VH segment of the 19 cases with at least one detected dominant Ig rearrangement. 
A total of 18 productive VDJ rearrangements were identified, with SHM frequencies ranging from 2% to 19% in the sequenced portions of the rearranged VH gene. Eight myeloma clones harbored an identifiable DJ rearrangement, none of which showed evidence of SHM. Finally, 3 myeloma clones harbored nonproductive VDJ rearrangements, two with no SHM, and one with 2 SHM in 84 bp of the sequenced VH gene. Conclusion: HTS of Ig heavy and light chain rearrangements can successfully identify the malignant plasma cell clone in clinical specimens, including those with low disease burden and significant SHM. Application of this technique to MRD evaluation in multiple myeloma warrants further development. Disclosures Carlson: Adaptive Biotechnologies: Consultancy, Equity Ownership. Vogl: Celgene Corporation: Consultancy; Amgen: Consultancy; Millennium/Takeda: Research Funding; GSK: Research Funding; Acetylon: Research Funding. Stadtmauer: Janssen: Consultancy.
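
The notion of a "clearly dominant" rearrangement in this abstract (more than 2.5% of total sequences and at least 2.7-fold more frequent than the next most common clone) can be expressed as a small check. The sketch below is hypothetical: the two default thresholds are the values the abstract reports as observed in its cohort, used here only for illustration and not as validated cutoffs, and the function name and input layout are assumptions.

```python
def find_dominant_clone(clone_counts, min_fraction=0.025, min_fold=2.7):
    """Return the ID of a clearly dominant clone, or None if there is none.

    clone_counts: dict mapping a clone identifier to its read count.
    min_fraction: the top clone must exceed this share of all reads.
    min_fold: the top clone must be at least this many times more
              frequent than the second most common clone.
    """
    total = sum(clone_counts.values())
    if total == 0:
        return None
    ranked = sorted(clone_counts.items(), key=lambda kv: kv[1], reverse=True)
    top_id, top_count = ranked[0]
    next_count = ranked[1][1] if len(ranked) > 1 else 0

    if top_count / total <= min_fraction:
        return None  # too rare to stand out from background
    if next_count > 0 and top_count / next_count < min_fold:
        return None  # not clearly separated from the runner-up
    return top_id


# Toy read counts (invented, for illustration only).
clear_case = find_dominant_clone({"clone_a": 900, "clone_b": 60, "clone_c": 40})
ambiguous_case = find_dominant_clone({"clone_a": 300, "clone_b": 100, "clone_c": 600})
```

Keeping both thresholds as parameters makes it easy to explore how sensitive the call is to the chosen cutoffs in samples with low disease burden.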


2014 ◽  
Vol 2014 ◽  
pp. 1-12 ◽  
Author(s):  
Preston Leung ◽  
Rowena Bull ◽  
Andrew Lloyd ◽  
Fabio Luciani

Rapidly mutating viruses, such as hepatitis C virus (HCV) and HIV, have adopted evolutionary strategies that allow escape from the host immune response via genomic mutations. Recent advances in high-throughput sequencing are reshaping the field of immuno-virology of viral infections, as these allow fast and cheap generation of genomic data. However, due to the large volumes of data generated, a thorough understanding of the biological and immunological significance of such information is often difficult. This paper proposes a pipeline that allows visualization and statistical analysis of viral mutations that are associated with immune escape. Taking next generation sequencing data from longitudinal analysis of HCV viral genomes during a single HCV infection, along with antigen specific T-cell responses detected from the same subject, we demonstrate the applicability of these tools in the context of primary HCV infection. We provide a statistical and visual explanation of the relationship between cooccurring mutations on the viral genome and the parallel adaptive immune response against HCV.


2021 ◽  
Vol 99 (2) ◽  
Author(s):  
Yuhua Fu ◽  
Pengyu Fan ◽  
Lu Wang ◽  
Ziqiang Shu ◽  
Shilin Zhu ◽  
...  

Abstract Despite the broad variety of available microRNA (miRNA) research tools and methods, their application to the identification, annotation, and target prediction of miRNAs in nonmodel organisms is still limited. In this study, we collected nearly all public sRNA-seq data to improve the annotation for known miRNAs and identify novel miRNAs that have not been annotated in pigs (Sus scrofa). We newly annotated 210 mature sequences in known miRNAs and found that 43 of the known miRNA precursors were problematic due to redundant/missing annotations or incorrect sequences. We also predicted 811 novel miRNAs with high confidence, which was twice the current number of known miRNAs for pigs in miRBase. In addition, we proposed a correlation-based strategy to predict target genes for miRNAs by using a large amount of sRNA-seq and RNA-seq data. We found that the correlation-based strategy provided additional evidence of expression compared with traditional target prediction methods. The correlation-based strategy also identified the regulatory pairs that were controlled by nonbinding sites with a particular pattern, which provided abundant complementarity for studying the mechanism of miRNAs that regulate gene expression. In summary, our study improved the annotation of known miRNAs, identified a large number of novel miRNAs, and predicted target genes for all pig miRNAs by using massive public data. This large data-based strategy is also applicable for other nonmodel organisms with incomplete annotation information.

