Discovery of 10,828 new putative human immunoglobulin heavy chain IGHV variants

2021
Author(s): Fabio R Martins, Lucas Alves de Melo Pontes, Tiago Antônio de Oliveira Mendes, Liza F. Felicori

Abstract

The correct identification of immunoglobulin alleles in genome sequences is a challenge. Nevertheless, it can assist in the study of several human diseases associated with the antibody repertoire and in the development of new therapies using antibody engineering techniques. The advent of next-generation sequencing of human genomes and antibody repertoires enabled the development of several tools for mapping and identifying new immunoglobulin (Ig) alleles. Some of these tools use 1,000 Genomes (G1K) data for the discovery of new Ig alleles. However, genome data from G1K suffer from low coverage and variant-calling problems. Here, a computational screen of immunoglobulin alleles was carried out in the Genome Aggregation Database (gnomAD), the largest high-quality catalogue of variation, drawn from 125,748 exomes and 15,708 human genomes. A total of 10,909 putative IGHV alleles were identified, of which 10,828 are new and 2,024 appear in at least 6 different alleles from genomes/exomes. IGHV2-70 was the IGHV gene segment with the largest number of variants described. The majority of the variants were found in framework 3, and most of them are missense. Interestingly, a large number of variants were found to be population exclusive. A database integrated with a web platform (YGL-DB) was created to store the likely new variants and make them accessible. These data can help the scientific community validate new IGHV variants and can shed light on the importance of variants in disease development and immunization protocols.

2021
Author(s): Michael Schneider, Asis Shrestha, Agim Ballvora, Jens Leon

Abstract

Background: The identification of environmentally specific alleles and the observation of evolutionary processes are goals of conservation genomics. Generational changes of allele frequencies in populations make it possible to address questions regarding effective population size, gene flow, drift, and selection. Observing such effects often involves a trade-off between cost and resolution, because a sufficiently large sample of genotypes has to be genotyped at many loci. Pool genotyping approaches can achieve high resolution and precision in allele frequency estimation when high-coverage sequencing is used. Still, high-coverage pool sequencing of big genomes comes with high costs.

Results: Here we present a reliable method to estimate a barley population's allele frequency from low-coverage sequencing. Three hundred genotypes were sampled from a barley backcross population to estimate the entire population's allele frequency. The allele frequency estimation accuracy and yield were compared for three next-generation sequencing methods. To obtain accurate allele frequency estimates at a low-coverage sequencing level, a haplotyping approach was performed. Low-coverage allele frequencies of positionally connected single polymorphisms were aggregated into a single haplotype allele frequency, resulting in two to 271 times higher depth and increased precision. We compared different haplotyping tactics, showing that gene and chip marker-based haplotypes perform on par with or better than simple contig haplotype windows. The comparison of multiple pool samples and the referencing against an individual sequencing approach revealed that whole genome pool resequencing had the highest correlation to individual genotyping (up to 0.97), while transcriptomics and genotyping by sequencing showed higher error rates and lower correlations.

Conclusion: The proposed method allows the allele frequency of populations to be identified with high accuracy at low cost. This is particularly interesting for conservation genomics in species with big genomes, like barley or wheat. Whole genome low-coverage resequencing at 10x coverage can deliver a highly accurate estimation of the allele frequency when a loci-based haplotyping approach is applied. Using annotated haplotypes allows one to capitalize on biological background and statistical robustness.
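The aggregation idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `SnpCounts` record, the function names and the read counts are hypothetical, and the sketch assumes that every SNP in a haplotype block has already been polarized to the same parental allele so that reads can simply be summed.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SnpCounts:
    """Pooled read counts at one SNP (hypothetical record layout)."""
    alt_reads: int    # reads carrying the allele of interest
    total_reads: int  # all reads covering the position


def snp_allele_frequency(snp: SnpCounts) -> Optional[float]:
    """Per-SNP estimate; None when the position has no coverage."""
    if snp.total_reads == 0:
        return None
    return snp.alt_reads / snp.total_reads


def haplotype_allele_frequency(snps: Sequence[SnpCounts]) -> Optional[float]:
    """Depth-weighted estimate pooled over all SNPs of one haplotype block.

    Assumption: all SNPs are polarized to the same parental haplotype,
    so their read counts can be added together.
    """
    total = sum(s.total_reads for s in snps)
    if total == 0:
        return None
    return sum(s.alt_reads for s in snps) / total


# Made-up counts: three linked SNPs, each covered by only 3-4 reads.
block = [SnpCounts(1, 3), SnpCounts(2, 4), SnpCounts(0, 3)]
print([snp_allele_frequency(s) for s in block])   # noisy per-SNP estimates
print(haplotype_allele_frequency(block))          # one pooled estimate
print(sum(s.total_reads for s in block))          # effective depth of the block
```

Summing counts rather than averaging the per-SNP frequencies weights each SNP by its depth, which is one simple way to obtain the higher effective depth the abstract refers to.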


Biotechnology, 2019, pp. 804-837
Author(s): Hithesh Kumar, Vivek Chandramohan, Smrithy M. Simon, Rahul Yadav, Shashi Kumar

This chapter provides a complete overview of Big Data analysis and its applications in the health care industry, clinical informatics, personalized medicine and bioinformatics. The major tools and databases used for Big Data analysis are discussed. The development of sequencing machines has led to fast and effective ways of generating DNA, RNA, whole genome and transcriptomics data, which are now available in a matter of hours. The complete Next Generation Sequencing (NGS) data analysis workflow for medicinal plants is also discussed. This chapter serves as an introduction to big data analysis in Next Generation Sequencing and concludes with a summary of the topics of the remaining chapters of this book.


Blood, 2016, Vol 128 (22), pp. 1665-1665
Author(s): Elisabeth Mack, Danny Langer, André Marquardt, Alfred Ultsch, Michael G Kiehl, ...

Abstract

Background: Acute Myeloid Leukemia (AML) is the most common acute leukemia in adults and has a poor overall prognosis. Although the disease has been extensively characterized on the molecular level, this knowledge is translating only slowly into the clinic, particularly with regard to novel therapeutic concepts. Presumably, this striking imbalance is substantially due to the long time required to complete genetic analyses, so that results are not available when treatment has to be initiated. Specifically, cytogenetic examinations to determine the karyotype of the malignant blasts, which has been the most important parameter for risk stratification for more than thirty years, take up to two weeks. Next generation sequencing (NGS) technology catalyzed efforts to dissect the genomic landscape of AML, leading to the identification of a large variety of AML driver genes and distinct molecular risk groups. However, these emerging molecular classes of AML do not cover all patients, implying that karyotyping is not dispensable for AML diagnostics at this point. Here we present an integrated approach to AML diagnostics that incorporates these complementary genetic examinations - focused mutational screening of AML-related genes and karyotyping - in one NGS assay.

Methods: We combined targeted resequencing of DNA and RNA using commercially available panels (TruSight Myeloid, Illumina and FusionPlex Heme, ArcherDx) to detect AML-associated short sequence variants and gene fusions with low coverage whole genome sequencing for copy number variation (CNV) analysis. Sequencing was performed on an Illumina MiSeq instrument with a read length of 2x150 bp and a coverage of 3.75 M reads for the TruSight Myeloid panel, 2.25 M reads for the FusionPlex panel and 1.5 M reads for the whole genome library. Variants and fusions were called using the manufacturers' analysis software and a previously published algorithm to identify ITDs (ITD-seek, Au et al., 2016). CNV analysis was performed by comparing the read distribution in an AML whole genome library to in silico randomly sampled reads from the reference genome using an algorithm developed in house.

Results: Initial testing of our approach on leukemia cell lines and peripheral blood leukocytes from healthy donors revealed sensitivities of 2% and 1-25% for the detection of DNA variants and fusions, respectively. Applying stringent filter criteria, we recovered 75% of verified COSMIC variants and 100% of known fusions in undiluted AML samples without false positives. Chromosomal gains and losses were detected with high confidence with a sensitivity of 10%. We were able to reliably distinguish between normal and complex karyotypes, although NGS-karyotyping based on known fusions and CNV analysis missed some details of highly aberrant karyotypes, such as derivative chromosomes and chromosomal translocations that did not involve genes included in the FusionPlex panel. Our preliminary experience with the method in a diagnostic setting confirms a high correlation with reference laboratory results and no relevant differences with regard to treatment decisions. Moreover, we find that NGS considerably accelerates genetic diagnostics of AML, as the entire workflow from sample to report, including three parallel library preparations, sequencing and data analysis, can be completed within 5 days. Operational costs amount to approximately 1,700 USD (1,500 EUR) per sample with the low throughput equipment used in this work, which is in the range of expenses for currently established AML diagnostics.

Conclusions: NGS allows for comprehensive translocation and mutation screening; however, some technical and bioinformatics optimization is required to achieve consistently high sensitivity and specificity for all target genes. CNV analysis of low coverage whole genome sequencing data adds valuable information on numerical chromosomal aberrations, thus allowing construction of a virtual karyotype to substitute for difficult and time-consuming cytogenetics. In summary, we present a reliable, fast and cost-effective strategy to combine molecular genetics and cytogenetics for AML diagnostics in a single NGS run in order to pave the way for a more differentiated clinical management of AML patients in the near future.

Disclosures: Kiehl: Roche: Consultancy, Other: Travel grants, Speakers Bureau.
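The in-house CNV algorithm is not described in the abstract, so the following Python sketch only illustrates the general idea of comparing binned read counts from a sample with counts from reads sampled from the reference genome. The bin counts, the pseudocount and the gain/loss thresholds are assumptions chosen for illustration, not values from the study.

```python
import math
from typing import List, Sequence


def log2_ratios(sample_bins: Sequence[int],
                reference_bins: Sequence[int],
                pseudocount: float = 0.5) -> List[float]:
    """Per-bin log2 ratio of library-size-normalized read counts.

    The pseudocount (an assumed value) avoids log(0) in empty bins.
    """
    if len(sample_bins) != len(reference_bins):
        raise ValueError("sample and reference must use the same bins")
    sample_total = sum(sample_bins)
    reference_total = sum(reference_bins)
    if sample_total == 0 or reference_total == 0:
        raise ValueError("both libraries need at least one read")
    ratios = []
    for s, r in zip(sample_bins, reference_bins):
        s_frac = (s + pseudocount) / sample_total
        r_frac = (r + pseudocount) / reference_total
        ratios.append(math.log2(s_frac / r_frac))
    return ratios


def call_bins(ratios: Sequence[float],
              gain: float = 0.3, loss: float = -0.3) -> List[str]:
    """Label each bin using hypothetical log2-ratio thresholds."""
    return ["gain" if x >= gain else "loss" if x <= loss else "neutral"
            for x in ratios]


# Made-up counts for four genomic bins.
sample = [100, 100, 150, 50]
reference = [100, 100, 100, 100]
calls = call_bins(log2_ratios(sample, reference))
print(calls)
```

A practical implementation would also need to handle GC content, mappability and segmentation across neighboring bins, none of which is attempted here.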


2015, Vol 32 (4), pp. 635-637
Author(s): Juan J. Diaz-Montana, Owen J.L. Rackham, Norberto Diaz-Diaz, Enrico Petretto

2018
Author(s): Velimir Gayevskiy, Tony Roscioli, Marcel E Dinger, Mark J Cowley

Abstract

Capability for genome sequencing and variant calling has increased dramatically, enabling large scale genomic interrogation of human disease. However, discovery is hindered by the current limitations in genomic interpretation, which remains a complicated and disjointed process. We introduce Seave, a web platform that enables variants to be easily filtered and annotated with in silico pathogenicity prediction scores and annotations from popular disease databases. Seave stores genomic variation of all types and sizes, and allows filtering for specific inheritance patterns, quality values, allele frequencies and gene lists. Seave is open source and deployable locally, or on a cloud computing provider, and works readily with gene panel, exome and whole genome data, scaling from single labs to multi-institution scale.
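The kind of filtering described above (quality values, allele frequencies and gene lists) can be sketched generically in Python. This is not Seave's code or API: the `Variant` record, its field names and the thresholds are hypothetical and exist only to make the filtering logic concrete.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record."""
    gene: str
    quality: float
    population_af: Optional[float]  # None if absent from the frequency source


def filter_variants(variants: Iterable[Variant],
                    min_quality: float,
                    max_af: float,
                    genes: Optional[Set[str]] = None) -> List[Variant]:
    """Keep variants passing quality, rarity and optional gene-list filters."""
    kept = []
    for v in variants:
        if v.quality < min_quality:
            continue
        # Variants without a known frequency are treated as rare and kept.
        if v.population_af is not None and v.population_af > max_af:
            continue
        if genes is not None and v.gene not in genes:
            continue
        kept.append(v)
    return kept


# Made-up records and thresholds.
candidates = [
    Variant("GENE_A", 99.0, 0.0001),
    Variant("GENE_A", 12.0, 0.0001),  # fails the quality filter
    Variant("GENE_B", 80.0, 0.2),     # too common
    Variant("GENE_C", 75.0, None),    # not in the gene list
]
rare = filter_variants(candidates, min_quality=30.0, max_af=0.01,
                       genes={"GENE_A", "GENE_B"})
print(rare)
```

Treating a missing frequency as "rare" is a design choice made for this sketch; a stricter policy could instead drop such variants.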


2020, Vol 101 (12), pp. 1280-1288
Author(s): Roozbeh Tahmasebi, Adriana Luchs, Kaelan Tardy, Philip Michael Hefford, Rory J. Tinker, ...

Human enteric adenovirus species F (HAdV-F) is one of the most common pathogens responsible for acute gastroenteritis worldwide. Brazil is a country with continental dimensions where continuous multiregional surveillance is vital to establish a more complete picture of the epidemiology of HAdV-F. The aim of the current study was to investigate the molecular epidemiology of HAdV-F using full-genome data in rural and low-income urban areas in northern Brazil, allowing a genetic comparison between Brazilian and global HAdV-F strains. The frequency of HAdV-F infections in patients with gastroenteritis and the molecular typing of positive samples within this period were also analysed. A total of 251 stool samples collected between 2010 and 2016 from patients with acute gastroenteritis were screened for HAdV-F using next-generation sequencing techniques. HAdV-F infection was detected in 57.8 % (145/251) of samples. A total of 137 positive samples belonged to HAdV-F41 and 7 to HAdV-F40. HAdV-F40/41 dual infection was found in one sample. Detection rates did not vary significantly according to the year. Single HAdV-F infections were detected in 21.9 % (55/251) of samples and mixed infections in 37.4 % (94/251), with RVA/HAdV-F being the most frequent association (21.5 %; 54/251). Genetic analysis indicated that the HAdV-F strains circulating in Brazil were closely related to worldwide strains, and no temporal ordering was observed. This is the first large-scale HAdV-F study in Brazil in which whole-genome data and DNA sequence analyses were used to characterize HAdV-F strains. Expanding the viral genome database could improve overall genotyping success and assist the National Center for Biotechnology Information (NCBI)/GenBank in standardizing the HAdV genome records by providing a large set of annotated HAdV-F genomes.


2016
Author(s): Li Fang, Jiang Hu, Depeng Wang, Kai Wang

Abstract

Background: Structural variants (SVs) in human genomes are implicated in a variety of human diseases. Long-read sequencing delivers much longer read lengths than short-read sequencing and may greatly improve SV detection. However, due to the relatively high cost of long-read sequencing, it is unclear what coverage is needed and how to optimally use the aligners and SV callers.

Results: In this study, we developed NextSV, a meta-caller to perform SV calling from low coverage long-read sequencing data. NextSV integrates three aligners and three SV callers and generates two integrated call sets (sensitive/stringent) for different analysis purposes. We evaluated the SV calling performance of NextSV under different PacBio coverages on two personal genomes, NA12878 and HX1. Our results showed that, compared with running any single SV caller, the NextSV stringent call set had higher precision and balanced accuracy (F1 score), while the NextSV sensitive call set had a higher recall. At 10X coverage, the recall of the NextSV sensitive call set was 93.5% to 94.1% for deletions and 87.9% to 93.2% for insertions, indicating that ~10X coverage might be an optimal coverage to use in practice, considering the balance between the sequencing costs and the recall rates. We further evaluated the Mendelian errors on an Ashkenazi Jewish trio dataset.

Conclusions: Our results provide useful guidelines for SV detection from low coverage whole-genome PacBio data, and we expect that NextSV will facilitate the analysis of SVs on long-read sequencing data.
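For readers unfamiliar with the metrics named in this abstract, the Python sketch below shows how precision, recall and the F1 score relate to counts of true-positive, false-positive and false-negative calls. The function name and the counts are made up for illustration and are not taken from the NextSV evaluation.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Standard definitions; a metric is 0.0 when its denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical call set: 90 calls match the truth set, 10 do not,
# and 30 true SVs were missed.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because F1 is the harmonic mean of precision and recall, a call set tuned for sensitivity can reach a high recall yet still score a lower F1 than a stricter call set, which matches the trade-off between the two NextSV call sets described above.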

