Accurate and predictive antibody repertoire profiling by molecular amplification fingerprinting

2016 ◽  
Vol 2 (3) ◽  
pp. e1501371 ◽  
Author(s):  
Tarik A. Khan ◽  
Simon Friedensohn ◽  
Arthur R. Gorter de Vries ◽  
Jakub Straszewski ◽  
Hans-Joachim Ruscheweyh ◽  
...  

High-throughput antibody repertoire sequencing (Ig-seq) provides quantitative molecular information on humoral immunity. However, Ig-seq is compromised by biases and errors introduced during library preparation and sequencing. By using synthetic antibody spike-in genes, we determined that primer bias from multiplex polymerase chain reaction (PCR) library preparation resulted in antibody frequencies with only 42 to 62% accuracy. Additionally, Ig-seq errors resulted in antibody diversity measurements being overestimated by up to 5000-fold. To rectify this, we developed molecular amplification fingerprinting (MAF), which uses unique molecular identifier (UID) tagging before and during multiplex PCR amplification to tag transcripts while accounting for PCR efficiency. Combined with a bioinformatic pipeline, MAF bias correction led to measurements of antibody frequencies with up to 99% accuracy. We also used MAF to correct PCR and sequencing errors, resulting in enhanced accuracy of full-length antibody diversity measurements, achieving 98 to 100% error correction. Using murine MAF-corrected data, we established a quantitative metric of recent clonal expansion, the intraclonal diversity index, which measures the number of unique transcripts associated with an antibody clone. We used this intraclonal diversity index along with antibody frequencies and somatic hypermutation to build a logistic regression model for prediction of the immunological status of clones. The model was able to predict clonal status with high confidence, but only when using MAF error- and bias-corrected Ig-seq data. Improved accuracy by MAF provides the potential to greatly advance Ig-seq and its utility in immunology and biotechnology.
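The abstract above describes two ideas that a small example can make concrete: collapsing reads that share a UID into a consensus sequence (error correction), and counting the unique transcripts per clone (the intraclonal diversity index). The sketch below is only an illustration under simplifying assumptions, not the MAF pipeline itself: reads sharing a UID are assumed to be equal-length and pre-aligned, the consensus is a plain per-position majority vote, and `clone_of` is a hypothetical function that assigns a sequence to a clone.

```python
from collections import Counter, defaultdict


def consensus_by_uid(reads):
    """Collapse reads into one consensus sequence per UID.

    reads: iterable of (uid, sequence) pairs. Sequences sharing a UID are
    assumed (for this sketch) to be equal-length copies of one transcript.
    """
    groups = defaultdict(list)
    for uid, seq in reads:
        groups[uid].append(seq)

    consensus = {}
    for uid, seqs in groups.items():
        # Majority vote at each position across all reads carrying this UID.
        consensus[uid] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


def intraclonal_diversity(consensus, clone_of):
    """Count distinct UIDs (unique transcripts) per clone.

    clone_of: hypothetical function mapping a consensus sequence to a clone id.
    """
    counts = Counter()
    for seq in consensus.values():
        counts[clone_of(seq)] += 1
    return dict(counts)


# Toy data: three reads share UID "AAT", and one of them carries an error.
reads = [
    ("AAT", "ACGTAC"), ("AAT", "ACGTAC"), ("AAT", "ACGAAC"),
    ("GGC", "ACGTAC"),
    ("TTA", "TTGTCA"), ("TTA", "TTGTCA"),
]
cons = consensus_by_uid(reads)
# Placeholder clone assignment: the first three bases stand in for a clone id.
diversity = intraclonal_diversity(cons, clone_of=lambda seq: seq[:3])
```

In this toy input the erroneous read is outvoted within its UID group, and the clone represented by two distinct UIDs gets a higher diversity count than the clone represented by one.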

2018 ◽  
Author(s):  
Simon Friedensohn ◽  
John M. Lindner ◽  
Vanessa Cornacchione ◽  
Mariavittoria Iazeolla ◽  
Enkelejda Miho ◽  
...  

High-throughput sequencing of immunoglobulin repertoires (Ig-seq) is a powerful method for quantitatively interrogating B cell receptor sequence diversity. When applied to human repertoires, Ig-seq provides insight into fundamental immunological questions, and can be implemented in diagnostic and drug discovery projects. However, a major challenge in Ig-seq is ensuring accuracy, as library preparation protocols and sequencing platforms can introduce substantial errors and bias that compromise immunological interpretation. Here, we have established an approach for performing highly accurate human Ig-seq by combining synthetic standards with a comprehensive error and bias correction pipeline. First, we designed a set of 85 synthetic antibody heavy chain standards (in vitro transcribed RNA) to assess correction workflow fidelity. Next, we adapted a library preparation protocol that incorporates unique molecular identifiers (UIDs) for error and bias correction which, when applied to the synthetic standards, resulted in highly accurate data. Finally, we performed Ig-seq on purified human circulating B cell subsets (naïve and memory), combined with a cellular replicate sampling strategy. This strategy enabled robust and reliable estimation of key repertoire features such as clonotype diversity, germline segment and isotype subclass usage, and somatic hypermutation (SHM). We anticipate that our standards and error and bias correction pipeline will become a valuable tool for researchers to validate and improve accuracy in human Ig-seq studies, thus leading to potentially new insights and applications in human antibody repertoire profiling.


2017 ◽  
Author(s):  
Ben S. Wendel ◽  
Chenfeng He ◽  
Mingjuan Qu ◽  
Di Wu ◽  
Stefany M. Hernandez ◽  
...  

Accurately measuring antibody repertoire sequence composition in a small amount of blood is challenging yet important to the understanding of the repertoire response to infections and vaccinations. Here, we describe an accurate and high-coverage repertoire sequencing method, MIDCIRS, which uses as few as 1,000 naïve B cells. Using it, we studied age-related antibody repertoire development and diversification before and during acute malaria in infants (< 12 months old) and toddlers (12 to 47 months old) with blood draws of 4-8 ml. Unexpectedly, we discovered high levels of somatic hypermutation (SHM) in infants as young as three months old. Antibody clonal lineage analysis revealed that both infants and toddlers increase SHM levels upon infection, and that memory B cells isolated from pre-malaria samples in malaria-experienced individuals continue to induce SHMs upon malaria rechallenge. These results highlight the vast potential of antibody repertoire diversification in infants and toddlers that has not been realized previously.


2021 ◽  
Vol 12 ◽  
Author(s):  
Qilong Wang ◽  
Huikun Zeng ◽  
Yan Zhu ◽  
Minhui Wang ◽  
Yanfang Zhang ◽  
...  

Antibody repertoire sequencing (Rep-seq) has been widely used to reveal repertoire dynamics and to interrogate antibodies of interest at single nucleotide-level resolution. However, polymerase chain reaction (PCR) amplification introduces extensive artifacts, including chimeras and nucleotide errors, leading to false discovery of antibodies and incorrect assessment of somatic hypermutations (SHMs), which subsequently mislead downstream investigations. Here, a novel approach named DUMPArts is developed. It improves the accuracy of antibody repertoires by labeling each sample with dual barcodes and each molecule with dual unique molecular identifiers (UMIs) via minimal PCR amplification to remove artifacts. When tested on ultra-deep Rep-seq data, DUMPArts removed inter-sample chimeras, which cause artifactual shared clones and constitute approximately 15% of reads in the library, as well as intra-sample chimeras, which carry erroneous SHMs and constitute approximately 20% of the reads, and corrected base errors and amplification biases by consensus building. The removal of these artifacts will provide an accurate assessment of antibody repertoires and benefit related studies, especially mAb discovery and antibody-guided vaccine design.


2019 ◽  
Author(s):  
Yicheng Guo ◽  
Kevin Chen ◽  
Peter D. Kwong ◽  
Lawrence Shapiro ◽  
Zizhang Sheng

The diversity of B cell receptors provides a basis for recognizing numerous pathogens. Antibody repertoire sequencing has revealed relationships between B cell receptor sequences, their diversity, and their function in infection, vaccination, and disease. However, many repertoire datasets have been deposited without annotation or quality control, limiting their utility. To accelerate investigations of B cell immunoglobulin sequence repertoires and to facilitate development of algorithms for their analysis, we constructed a comprehensive public database of curated human B cell immunoglobulin sequence repertoires, cAb-Rep (https://cab-rep.c2b2.columbia.edu), which currently includes 306 immunoglobulin repertoires from 121 human donors, who were healthy, vaccinated, or had autoimmune disease. The database contains a total of 267.9 million V(D)J heavy chain and 72.9 million VJ light chain transcripts. These transcripts are full-length or near full-length, have been annotated with gene origin, antibody isotype, somatic hypermutations, and other biological characteristics, and are stored in FASTA format to facilitate their direct use by most current repertoire-analysis programs. We describe a website to search cAb-Rep for similar antibodies along with methods for analysis of the prevalence of antibodies with specific genetic signatures, for estimation of reproducibility of somatic hypermutation patterns of interest, and for delineating frequencies of somatically introduced N-glycosylation. cAb-Rep should be useful for investigating attributes of B cell sequence repertoires, for understanding characteristics of affinity maturation, and for identifying potential barriers to the elicitation of effective neutralizing antibodies in infection or by vaccination.


2020 ◽  
Author(s):  
Alexander M Sevy

Motivation: Recent advances in DNA sequencing technology have allowed deep profiling of B- and T-cell receptor sequences on an unprecedented scale. However, sequencing errors pose a significant challenge in expanding the scope of these experiments. Errors can arise both by PCR during library preparation and by miscalled bases on the sequencing instrument itself. These errors compromise the validity of biological conclusions drawn from the data. Results: To address these concerns I have developed ErrorX, software for automated error correction of B- and T-cell receptor NGS datasets. ErrorX uses deep learning to automatically identify bases that have a high probability of being erroneous. In benchmark studies, ErrorX reduced the overall error rate of public datasets by up to 36% with a false positive rate of 0.05% or less. Since ErrorX is a pure bioinformatics approach, it can be directly applied to any existing antibody or T-cell receptor sequencing datasets to infer sites of probable error without any changes in library preparation. Availability: ErrorX is free for non-commercial use, with both a command-line interface and GUI available for Mac, Linux, and Windows operating systems, and full documentation available. Pre-compiled binaries are available at https://endeavorbio.com/downloads/.


Author(s):  
Rui Zang ◽  
Ying Zhao ◽  
Kangdi Guo ◽  
Kunqi Hong ◽  
Huijun Xi ◽  
...  

Bitter gourd wilt caused by Fusarium oxysporum f. sp. momordicae (FOM) is a devastating crop disease in China. A total of 173 isolates characteristic of typical Fusarium oxysporum with abundant microconidia and macroconidia on white or ruby colonies were obtained from diseased plant tissues. BLASTn analysis of the rDNA-ITS of the isolates showed 99% identity with F. oxysporum species. Among the tested isolates, three were infectious toward tower gourd and five were pathogenic to bottle gourd. However, all of the isolates were pathogenic to bitter gourd. For analysis of genetic differences, 40 ISSR primers were screened and 11 primers were used for ISSR-PCR amplification. In total, 127 loci were detected, of which 76 were polymorphic, a rate of 59.84%. POPGENE analysis showed that Nei’s gene diversity index (H) and Shannon’s information index (I) were 0.09 and 0.15, respectively, which indicated that the genetic diversity of the 173 isolates was low. The coefficient of gene differentiation (Gst = 0.33 > 0.15) indicated that genetic differentiation was mainly among populations. The strength of gene flow (Nm = 1.01 > 1.0) was weak, indicating that the population differentiation caused by genetic drift was blocked to some degree. The dendrogram based on ISSR markers showed that the nine geographical populations were clustered into two groups at the threshold of genetic similarity coefficient of 0.96. The Shandong and Henan populations were clustered into Group I, while the Guangdong, Hainan, Guangxi, Fujian, Jiangxi, and Hubei populations constituted Group II. Results of the genetic variation analysis showed that the Hunan and Guangxi populations had the highest degree of genetic differentiation, while the Hubei population had the lowest genetic differentiation. Our findings enrich the knowledge of the genetic variation characteristics of FOM populations with the goal of developing effective disease-management programs and resistance breeding programs.
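The arithmetic behind two of the figures in this abstract can be checked with a short sketch. The polymorphism rate follows directly from the reported locus counts. For gene flow, the abstract does not state how Nm was derived; the sketch assumes the commonly used estimator Nm = 0.5(1 - Gst)/Gst, which is an assumption here and not something the abstract confirms. The function names are illustrative.

```python
def polymorphic_rate(polymorphic_loci, total_loci):
    """Percentage of detected loci that are polymorphic."""
    return 100.0 * polymorphic_loci / total_loci


def gene_flow_from_gst(gst):
    """Estimate gene flow (Nm) from Gst.

    Assumed estimator (not stated in the abstract): Nm = 0.5 * (1 - Gst) / Gst.
    """
    return 0.5 * (1.0 - gst) / gst


rate = polymorphic_rate(76, 127)   # reported in the abstract as 59.84%
nm = gene_flow_from_gst(0.33)      # abstract reports Nm = 1.01
```

With the rounded Gst of 0.33 the assumed estimator gives a value slightly above the reported 1.01; a difference of that size would be consistent with Nm having been computed from an unrounded Gst, though the abstract does not say so.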


Author(s):  
Damien Jacot ◽  
Trestan Pillonel ◽  
Gilbert Greub ◽  
Claire Bertelli

Although many laboratories worldwide have developed their sequencing capacities in response to the need for SARS-CoV-2 genome-based surveillance of variants, only a few have reported quality criteria to ensure sequence quality before lineage assignment and submission to public databases. Hence, we aimed here to provide simple quality control criteria for SARS-CoV-2 sequencing to prevent erroneous interpretation of low quality or contaminated data. We retrospectively investigated 647 SARS-CoV-2 genomes obtained over ten tiled-amplicon sequencing runs. We extracted 26 potentially relevant metrics covering the entire workflow from sample selection to bioinformatics analysis. Based on data distribution, critical values were established for eleven selected metrics to prompt further quality investigations for problematic samples, in particular those with a low viral RNA quantity. Low frequency variants (<70% of supporting reads) can result from PCR amplification errors, sample cross contaminations or the presence of distinct SARS-CoV-2 genomes in the sample sequenced. The number and the prevalence of low frequency variants can be used as a robust quality criterion to identify possible sequencing errors or contaminations. Overall, we propose eleven metrics with fixed cutoff values as a simple tool to evaluate the quality of SARS-CoV-2 genomes, including cycle thresholds, mean depth, the proportion of the genome covered at least 10x, and the number of low frequency variants combined with mutation prevalence data.
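Two of the metrics named in this abstract, the proportion of the genome covered at least 10x and the number of low frequency variants (<70% of supporting reads), are simple to compute from per-position depth and variant calls. The sketch below shows one way to do so. Only the 10x and 70% values come from the abstract; the cutoffs `min_covered` and `max_low_freq` are placeholders chosen for illustration, and the function names and input layout are assumptions.

```python
def fraction_covered(depths, min_depth=10):
    """Proportion of genome positions with sequencing depth >= min_depth."""
    if not depths:
        return 0.0
    return sum(d >= min_depth for d in depths) / len(depths)


def low_frequency_variants(variants, max_support=0.70):
    """Variants supported by less than max_support of the reads.

    variants: list of (position, fraction_of_supporting_reads) pairs.
    """
    return [(pos, frac) for pos, frac in variants if frac < max_support]


def flag_sample(depths, variants, min_covered=0.95, max_low_freq=3):
    """Flag a sample for further quality investigation.

    min_covered and max_low_freq are placeholder cutoffs for this sketch,
    not the critical values established in the study.
    """
    covered = fraction_covered(depths)
    low_freq = low_frequency_variants(variants)
    return {
        "fraction_covered_10x": covered,
        "n_low_frequency_variants": len(low_freq),
        "needs_review": covered < min_covered or len(low_freq) > max_low_freq,
    }


# Toy sample: 8 of 10 positions reach 10x, and one variant has weak support.
depths = [50, 40, 12, 10, 9, 3, 25, 30, 18, 11]
variants = [(241, 0.99), (3037, 0.65), (23403, 0.98)]
report = flag_sample(depths, variants)
```

Returning the individual metrics alongside the flag keeps the reason for a review visible, which matters when the cutoffs are later tuned to a laboratory's own data distribution.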


2008 ◽  
Vol 364 (1517) ◽  
pp. 667-673 ◽  
Author(s):  
Uttiya Basu ◽  
Andrew Franklin ◽  
Frederick W Alt

The assembled immunoglobulin genes in the B cells of mice and humans are altered by distinct processes known as class switch recombination (CSR) and somatic hypermutation, leading to diversification of the antibody repertoire. These two DNA modification processes are initiated by the B cell-specific protein factor activation-induced cytidine deaminase (AID). AID is post-translationally modified by phosphorylation at multiple sites, although functional significance during CSR has been implicated only for phosphorylation at serine-38 (S38). Although multiple laboratories have demonstrated that AID function is regulated via phosphorylation at S38, the precise biological role of S38 phosphorylation has been a topic of debate. Here, we discuss our interpretation of the significance of AID regulation via phosphorylation and also discuss how this form of AID regulation may have evolved in higher organisms.


2018 ◽  
Author(s):  
Nicholas Stoler ◽  
Barbara Arbeithuber ◽  
Gundula Povysil ◽  
Monika Heinzl ◽  
Renato Salazar ◽  
...  

Duplex sequencing is the most accurate approach for identification of sequence variants present at very low frequencies. Its power comes from pooling together multiple descendants of both strands of original DNA molecules, which allows distinguishing true nucleotide substitutions from PCR amplification and sequencing artifacts. This strategy comes at a cost: sequencing the same molecule multiple times increases dynamic range but significantly diminishes coverage, making whole genome duplex sequencing prohibitively expensive. Furthermore, every duplex experiment produces a substantial proportion of singleton reads that cannot be used in the analysis and are, technically, thrown away. In this paper we demonstrate that a significant fraction of these reads contains PCR or sequencing errors within duplex tags. Correction of such errors allows “reuniting” these reads with their respective families, increasing the output of the method and making it more cost effective. Additionally, we combine error correction strategy with a number of algorithmic improvements in a new version of the duplex analysis software, Du Novo 2.0, readily available through Galaxy, Bioconda, and as the source code.
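The idea of "reuniting" singleton reads with their families can be illustrated with a minimal sketch. This is not the algorithm used by Du Novo 2.0, whose details the abstract does not give. It assumes a much simpler rule: a tag seen only once is reassigned to an existing family if exactly one family tag lies within a small Hamming distance. The function names and the `max_dist` parameter are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def reunite_singletons(tags, max_dist=1):
    """Map each observed tag to a corrected tag.

    tags: list of tag strings, one per read. Tags seen more than once are
    treated as families; a singleton is reassigned only when exactly one
    family tag is within max_dist mismatches, otherwise it is left as-is.
    """
    counts = Counter(tags)
    families = [tag for tag, n in counts.items() if n > 1]

    mapping = {}
    for tag, n in counts.items():
        if n > 1:
            mapping[tag] = tag
            continue
        close = [
            fam for fam in families
            if len(fam) == len(tag) and hamming(fam, tag) <= max_dist
        ]
        # Ambiguous or unmatched singletons keep their original tag.
        mapping[tag] = close[0] if len(close) == 1 else tag
    return mapping


tags = ["ACGT", "ACGT", "ACGA", "TTTT", "GGCC", "GGCC", "GGCA"]
corrected = reunite_singletons(tags)
```

Requiring a unique nearby family is a conservative choice: a singleton that is equally close to two families stays unassigned, so it is not merged into the wrong one.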

