Bibliome Variant Database: Automated Identification and Annotation of Genetic Variants in Primary Literature

2020 ◽  
Author(s):  
Samuel W. Baker ◽  
Arupa Ganguly

The Bibliome Variant Database (BVdb) is a freely available reference database containing over 1 million human genetic variants that have been mined from the primary literature and mapped to the human genome. The BVdb is designed to facilitate variant interpretation in clinical and research contexts by reducing or eliminating the time needed to search for literature describing a given variant. Users can search the database by gene symbol, HGVS variant nomenclature, genomic position, or rsID. Each variant page lists the references in the database that describe the variant, together with the exact gene symbol and variant description text identified in each reference.

Availability and implementation: The BVdb is freely available at http://bibliome.ai
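The four query types listed above can be told apart before a lookup is made. The following is a minimal, hypothetical dispatcher; the regular expressions and the `classify_query` name are illustrative assumptions, not the BVdb's own parsing rules, and real HGVS validation is considerably stricter.

```python
import re

# Illustrative patterns for the four query types; these are simplified
# assumptions, not the BVdb's own parsing rules.
RSID = re.compile(r"^rs\d+$", re.IGNORECASE)
# e.g. "chr17:7675088" or "17:7675000-7676000"
GENOMIC_POSITION = re.compile(r"^(chr)?([0-9]{1,2}|X|Y|MT?):\d+(-\d+)?$", re.IGNORECASE)
# e.g. "NM_000546.6:c.215C>G" or "p.Arg175His": optional reference, then a coordinate-type prefix
HGVS = re.compile(r"^(\S+:)?[cgmnpr]\.\S+$")
# Upper-case letters and digits, as in most human gene symbols
GENE_SYMBOL = re.compile(r"^[A-Z][A-Z0-9-]*$")


def classify_query(query: str) -> str:
    """Return which search mode a raw query string most likely belongs to."""
    q = query.strip()
    # Order matters: the more specific patterns are tried first.
    if RSID.match(q):
        return "rsid"
    if GENOMIC_POSITION.match(q):
        return "genomic_position"
    if HGVS.match(q):
        return "hgvs"
    if GENE_SYMBOL.match(q):
        return "gene_symbol"
    return "unknown"


for example in ["rs123456", "chr17:7675088", "NM_000546.6:c.215C>G", "TP53"]:
    print(example, "->", classify_query(example))
```

Trying the narrow patterns (rsID, position) first keeps a gene-symbol fallback from swallowing them.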

2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Ralf C. Mueller ◽  
Nicolai Mallig ◽  
Jacqueline Smith ◽  
Lél Eöery ◽  
Richard I. Kuo ◽  
...  

Background: Genomic and genetic studies often require a target list of genes before any hypothesis testing or experimental verification is conducted. The ever-growing number of sequenced genomes and the variety of annotation strategies create the potential for ambiguous gene symbols, which makes it cumbersome to capture the “correct” set of genes. In this article, we present and describe the Avian Immunome DB (Avimm), a database for easy extraction of gene properties, exemplified by avian immune genes. The avian immune system is characterised by a cascade of complex biological processes underlain by more than 1000 different genes. It is a particularly important trait to study in birds, because birds are a significant driver of the spread of zoonotic diseases. With the completion of phase II of the B10K (“Bird 10,000 Genomes”) consortium’s whole-genome sequencing effort, we have included 363 annotated bird genomes, in addition to other publicly available bird genome data, which serve as a valuable foundation for Avimm.

Construction and content: We designed and set up a relational database containing avian immune gene evidence from Gene Ontology, Ensembl, UniProt and the B10K consortium. The foundation, or “seed”, for the initial set of avian immune genes is the well-studied model organism chicken (Gallus gallus). Gene annotations, transcript isoforms, nucleotide sequences and protein information, including amino acid sequences, are included. Ambiguous gene names (symbols) are resolved within the database and linked to their canonical gene symbol. Avimm is supplemented by a command-line interface and a web front-end for querying the database.

Utility and discussion: The internal mapping of unique gene symbol identifiers to canonical gene symbols allows gene properties to be searched even with ambiguous gene symbols. The database is organised into core and feature tables, which makes it straightforward to extend for future purposes, and the design is ready to be applied to other taxa or biological processes. Currently, the database contains 1170 distinct avian immune genes with canonical gene symbols and 612 synonyms across 363 bird species. The command-line interface integrates readily into bioinformatics pipelines, while the intuitive web front-end offers advanced search functionality with downloads and tracks the origin of each record. Avimm is publicly accessible at https://avimm.ab.mpg.de.
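The symbol-to-canonical mapping can be pictured as a core table of canonical genes, a synonym table pointing into it, and feature tables keyed by the canonical gene. The SQLite sketch below is an assumption about how such a layout might look; the table names, columns, functions and toy records are hypothetical and are not taken from Avimm's schema or data.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical core/synonym/feature layout."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE gene (               -- core table: one row per canonical symbol
            gene_id INTEGER PRIMARY KEY,
            canonical_symbol TEXT UNIQUE NOT NULL
        );
        CREATE TABLE synonym (            -- alternative or ambiguous symbols
            symbol TEXT PRIMARY KEY,
            gene_id INTEGER NOT NULL REFERENCES gene(gene_id)
        );
        CREATE TABLE feature (            -- feature table: per-species evidence
            gene_id INTEGER NOT NULL REFERENCES gene(gene_id),
            species TEXT NOT NULL,
            source TEXT NOT NULL
        );
        """
    )
    # Toy records for illustration only.
    con.execute("INSERT INTO gene VALUES (1, 'IL1B')")
    con.executemany("INSERT INTO synonym VALUES (?, 1)", [("IL-1B",), ("IL1beta",)])
    con.executemany(
        "INSERT INTO feature VALUES (1, ?, ?)",
        [("Gallus gallus", "Ensembl"), ("Taeniopygia guttata", "B10K")],
    )
    con.commit()
    return con


def resolve(con, symbol):
    """Map a canonical symbol or a synonym to its canonical symbol (None if unknown)."""
    row = con.execute(
        """
        SELECT canonical_symbol FROM gene WHERE canonical_symbol = ?
        UNION
        SELECT g.canonical_symbol FROM synonym AS s
        JOIN gene AS g ON g.gene_id = s.gene_id
        WHERE s.symbol = ?
        """,
        (symbol, symbol),
    ).fetchone()
    return row[0] if row else None


def features_for(con, symbol):
    """Return (species, source) rows for whatever gene the symbol resolves to."""
    canonical = resolve(con, symbol)
    if canonical is None:
        return []
    return con.execute(
        """
        SELECT f.species, f.source FROM feature AS f
        JOIN gene AS g ON g.gene_id = f.gene_id
        WHERE g.canonical_symbol = ?
        ORDER BY f.species
        """,
        (canonical,),
    ).fetchall()


con = build_demo_db()
print(resolve(con, "IL-1B"))
print(features_for(con, "IL1beta"))
```

Keeping features keyed only by the canonical gene means a new evidence source needs a new feature table, not changes to the synonym logic.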


2016 ◽  
Vol 96 (4) ◽  
pp. 570-580 ◽  
Author(s):  
Catherine L. Curtis ◽  
Allon Goldberg ◽  
Jeffrey A. Kleim ◽  
Steven L. Wolf

The Human Genome Project and the International HapMap Project have yielded new understanding of the influence of the human genome on health and disease, advancing health care in significant ways. In personalized medicine, genetic factors are used to identify disease risk and to tailor preventive and therapeutic regimens. Insight into the genetic bases of cellular processes is revealing the causes of disease and the effects of exercise. Many diseases known to have a major lifestyle contribution are also strongly influenced by common genetic variants; for example, genetic variants are associated with increased risk for cardiovascular disease and osteoarthritis. Exercise response is also influenced by genetic factors. Knowledge of genetic factors can help clinicians better understand interindividual differences in disease presentation, pain experience, and exercise response. Family health history is an important genetic tool and encourages clinicians to consider the wider client-family unit. Clinicians in this new era need to be prepared to guide patients and their families on a variety of genomics-related concerns, including genetic testing and other ethical, legal, or social issues. It is therefore essential that clinicians reconsider the role of genetics in preserving wellness and in disease risk, in order to identify the best ways to optimize fitness, health, or recovery. Clinicians who understand the influence of genetic variants on health and disease will be uniquely positioned to institute individualized lifestyle interventions, thereby fulfilling roles in prevention and wellness. This article describes how discoveries in genomics are rapidly changing the understanding of health and disease by highlighting two conditions: cardiovascular disease and osteoarthritis. Genetic factors related to the effects of exercise are also considered.


2020 ◽  
Author(s):  
Diego Garrido-Martín ◽  
Beatrice Borsari ◽  
Miquel Calvo ◽  
Ferran Reverter ◽  
Roderic Guigó

We have developed an efficient and reproducible pipeline for discovering genetic variants that affect splicing (splicing quantitative trait loci, sQTLs), based on an approach that captures the intrinsically multivariate nature of splicing. We applied it to the multi-tissue GTEx transcriptome dataset, generating a comprehensive catalogue of sQTLs in the human genome. A core set of these sQTLs is shared across multiple tissues. Downstream analyses of this catalogue shed light on the mechanisms underlying splicing regulation. We found that sQTLs often target the global splicing pattern of a gene rather than individual splicing events. Many of them also affect gene expression, but not always of the same gene, potentially uncovering regulatory loci that act on different genes through different mechanisms. sQTLs are preferentially located in introns that are spliced post-transcriptionally, which may act as hotspots for splicing regulation. While many variants affect splicing patterns by directly altering the sequence of splice sites, many more modify the binding of RNA-binding proteins (RBPs) to target sequences within the transcripts. Genetic variants affecting splicing can have a phenotypic impact comparable to, or even stronger than, that of variants affecting expression, with those that alter RBP binding playing a prominent role in disease.
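Treating splicing as multivariate means analysing, for each gene and sample, the vector of relative isoform abundances rather than one splicing event at a time. The sketch below illustrates that idea with a simplified statistic: a between-genotype sum of squares on splicing-ratio vectors (the Euclidean case of a PERMANOVA-style test) assessed by permuting genotype labels. It is not the authors' pipeline or test statistic; the function names and the permutation scheme are assumptions.

```python
import random


def splicing_ratios(isoform_expr):
    """Relative abundance of each isoform in one sample (None if the gene is not expressed)."""
    total = sum(isoform_expr)
    if total == 0:
        return None
    return [x / total for x in isoform_expr]


def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def between_group_ss(ratios, genotypes):
    """Between-genotype sum of squares of splicing-ratio vectors (Euclidean distance)."""
    overall = centroid(ratios)
    ss = 0.0
    for g in set(genotypes):
        members = [r for r, gt in zip(ratios, genotypes) if gt == g]
        c = centroid(members)
        ss += len(members) * sum((a - b) ** 2 for a, b in zip(c, overall))
    return ss


def permutation_pvalue(ratios, genotypes, n_perm=999, seed=0):
    """(permutations with statistic >= observed, plus one) / (n_perm + 1)."""
    rng = random.Random(seed)
    observed = between_group_ss(ratios, genotypes)
    labels = list(genotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any genotype-splicing link
        if between_group_ss(ratios, labels) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy gene with two isoforms; genotype = copies of a hypothetical alternative allele.
expr = [[80, 20]] * 5 + [[50, 50]] * 5 + [[20, 80]] * 5
gts = [0] * 5 + [1] * 5 + [2] * 5
print(permutation_pvalue([splicing_ratios(e) for e in expr], gts))
```

Because the whole ratio vector is tested at once, a variant that shifts abundance among several isoforms is detected even when no single event changes much.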


Science ◽  
2019 ◽  
Vol 365 (6460) ◽  
pp. 1396-1400 ◽  
Author(s):  
Alexander I. Young ◽  
Stefania Benonisdottir ◽  
Molly Przeworski ◽  
Augustine Kong

Efforts to link variation in the human genome to phenotypes have progressed at a tremendous pace in recent decades. Most human traits have been shown to be affected by a large number of genetic variants across the genome. To interpret these associations and to use them reliably—in particular for phenotypic prediction—a better understanding of the many sources of genotype-phenotype associations is necessary. We summarize the progress that has been made in this direction in humans, notably in decomposing direct and indirect genetic effects as well as population structure confounding. We discuss the natural next steps in data collection and methodology development, with a focus on what can be gained by analyzing genotype and phenotype data from close relatives.


2014 ◽  
Vol 42 (1) ◽  
pp. 19-27 ◽  
Author(s):  
Pascal Borry ◽  
Mahsa Shabani ◽  
Heidi Carmen Howard

In the last few decades, great progress has been made in both genetic and genomic research. The Human Genome Project has increased our knowledge of the genetic basis of diseases, given tremendous momentum to the development of new technologies that make widespread genetic testing possible, and increased the availability of previously inaccessible genetic information. Two examples of this rapid evolution are the increasing implementation of next-generation sequencing technologies in the clinical context and the expanding commercial offer of direct-to-consumer genetic tests.

First, the rapid development of next-generation sequencing technologies (i.e., high-throughput, massively parallel DNA sequencing technologies) has substantially reduced both the cost and the time required to sequence an entire human genome. These technologies are increasingly used in the clinical setting to diagnose conditions of presumed genetic origin that cannot be explained by targeted sequencing approaches.


2021 ◽  
Vol 15 (1) ◽  
Author(s):  
Xi Long ◽  
Hong Xue

Background: Genetic variants, which underlie phenotypic diversity, are known to be distributed unevenly across the human genome. A comprehensive understanding of the distributions of different genetic variants is important for insights into genetic functions and disorders.

Methods: A sliding-window scan of the regional densities of eight kinds of germline genetic variants, including single-nucleotide polymorphisms (SNPs) and four size classes of copy-number variations (CNVs), was performed across the human genome.

Results: The study identified 44,379 hotspots with high genetic-variant densities and 1135 hotspot clusters comprising more than one type of hotspot, accounting for 3.1% and 0.2% of the genome, respectively. The hotspots and clusters co-localize with different functional genomic features; for example, hotspots of middle-size CNVs are associated with histone-modification sites. The hotspots and clusters also work with balancing and positive selection to meet the need for diversity in immune proteins, and facilitate the development of sensory-perception and neuroactive ligand-receptor interaction pathways in function-sparse, late-replicating genomic sequences. Genetic variants of different lengths co-localize with retrotransposons of different ages on a “long-with-young” and “short-with-all” basis. Hotspots and clusters are highly associated with tumor suppressor genes and oncogenes (p < 10⁻¹⁰) and are enriched with somatic tumor CNVs and with the trait- and disease-associated SNPs identified by genome-wide association studies, with enrichment exceeding tenfold in clusters comprising SNPs and extra-long CNVs.

Conclusions: Genetic-variant hotspots and clusters are double-edged swords that spearhead both positive and negative genomic changes. Their strong associations with complex traits and diseases also open up a potential “Common Disease-Hotspot Variant” approach to the missing heritability problem.
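A sliding-window density scan of the kind described in the Methods can be sketched as counting variant positions in overlapping windows and merging windows above a cutoff. The window size, step, and mean-plus-k-standard-deviations threshold used here are illustrative assumptions; the abstract does not state the study's parameters or hotspot criterion, and clustering hotspots of different variant types is not shown.

```python
from bisect import bisect_left
from statistics import mean, pstdev


def window_counts(positions, chrom_length, window=100_000, step=50_000):
    """Count variants in overlapping windows [start, end) along one chromosome."""
    pos = sorted(positions)
    counts = []
    for start in range(0, chrom_length, step):
        end = min(start + window, chrom_length)
        # Binary search gives the number of positions falling in [start, end).
        counts.append((start, end, bisect_left(pos, end) - bisect_left(pos, start)))
    return counts


def density_threshold(counts, k=3.0):
    """Hypothetical cutoff: mean window count plus k population standard deviations."""
    values = [c for _, _, c in counts]
    return mean(values) + k * pstdev(values)


def hotspots(counts, threshold):
    """Merge overlapping windows whose count meets the threshold into hotspot intervals."""
    merged = []
    for start, end, c in counts:
        if c < threshold:
            continue
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current hotspot
        else:
            merged.append([start, end])  # start a new hotspot
    return [tuple(h) for h in merged]


positions = list(range(300, 400, 5)) + [10, 700]
counts = window_counts(positions, chrom_length=1000, window=100, step=50)
print(hotspots(counts, density_threshold(counts)))
```

Windows at the chromosome end are shorter than the rest; dividing each count by its window length would turn counts into true densities.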


2018 ◽  
Author(s):  
Alex H Wagner ◽  
Brian Walsh ◽  
Georgia Mayfield ◽  
David Tamborero ◽  
Dmitriy Sonkin ◽  
...  

Precision oncology relies on the accurate discovery and interpretation of genomic variants to enable individualized diagnosis, prognosis, and therapy selection. We found that knowledgebases containing clinical interpretations of somatic cancer variants are highly disparate in interpretation content, structure, and supporting primary literature, which impedes consensus when evaluating variants and their relevance in a clinical setting. In cooperation with experts from the Global Alliance for Genomics and Health (GA4GH) and six prominent cancer variant knowledgebases, we developed a framework for aggregating and harmonizing variant interpretations, producing a meta-knowledgebase of 12,856 aggregate interpretations covering 3,437 unique variants in 415 genes, 357 diseases, and 791 drugs. This harmonization produced large gains in overlap between resources across variants, diseases, and drugs. We then demonstrated improved matching between a patient cohort and harmonized interpretations of potential clinical significance, with an increase from an average of 33% per individual knowledgebase to 56% in aggregate. Our analyses illustrate the need for open, interoperable sharing of variant interpretation data. We also provide an open, freely available web interface (search.cancervariants.org) for exploring the harmonized interpretations from these six knowledgebases.
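Part of the gain from harmonization comes from giving the same variant a single representation across resources. The sketch below assumes a hypothetical normalization key (gene symbol plus one-letter missense change) and defines the match rate as the fraction of patients with at least one variant present in a knowledgebase; neither is the published framework's schema or metric, and the variants are toy examples.

```python
import re

# Standard three-letter to one-letter amino-acid codes.
AA3_TO_1 = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
MISSENSE_3 = re.compile(r"([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})")


def normalize(gene, protein_change):
    """Hypothetical harmonized key: (gene symbol, one-letter missense change)."""
    change = protein_change.strip()
    if change.startswith("p."):
        change = change[2:]
    m = MISSENSE_3.fullmatch(change)
    if m and m.group(1) in AA3_TO_1 and m.group(3) in AA3_TO_1:
        change = AA3_TO_1[m.group(1)] + m.group(2) + AA3_TO_1[m.group(3)]
    return (gene, change)


def match_rate(patients, kb_keys):
    """Fraction of patients with at least one variant key present in the knowledgebase."""
    matched = sum(1 for variants in patients if any(v in kb_keys for v in variants))
    return matched / len(patients)


# Two toy knowledgebases that describe variants in different notations.
kb_a = {normalize("BRAF", "p.Val600Glu")}
kb_b = {normalize("KRAS", "G12D")}
patients = [
    [normalize("BRAF", "V600E")],
    [normalize("KRAS", "p.Gly12Asp")],
    [normalize("EGFR", "L858R")],
]
for name, kb in [("A", kb_a), ("B", kb_b), ("aggregate", kb_a | kb_b)]:
    print(name, round(match_rate(patients, kb), 2))
```

Without `normalize`, "p.Val600Glu" and "V600E" would never compare equal, so the aggregate would gain nothing from combining resources.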


2020 ◽  
Author(s):  
Sehyun Oh ◽  
Jasmine Abdelnabi ◽  
Ragheed Al-Dulaimi ◽  
Ayush Aggarwal ◽  
Marcel Ramos ◽  
...  

Gene symbols are recognizable identifiers for gene names, but they are unstable and error-prone because of aliasing, manual entry, and unintentional conversion to dates by spreadsheets. Official gene symbol resources such as the HUGO Gene Nomenclature Committee (HGNC) for human genes and the Mouse Genome Informatics project (MGI) for mouse genes provide authoritative sources of valid, aliased, and outdated symbols, but they lack a programmatic interface and do not correct symbols converted by spreadsheets. We present HGNChelper, an R package that identifies known aliases and outdated gene symbols based on the HGNC human and MGI mouse gene symbol databases, as well as common mislabeling introduced by spreadsheets, and provides corrections where possible. HGNChelper identified invalid gene symbols in the most recent Molecular Signatures Database (MSigDB 7.0) and in platform annotation files of the Gene Expression Omnibus, with prevalence ranging from ∼3% in recent platforms to 30-40% in the earliest platforms from 2002-03. HGNChelper is installable from CRAN, with open development and issue tracking on GitHub and an associated pkgdown site at https://waldronlab.io/HGNChelper/.
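Spreadsheet conversion typically turns symbols such as SEPT2 or MARCH1 into dates displayed as, for example, 2-Sep or 1-Mar. The Python sketch below reverses that conversion and then applies an alias table. It is not HGNChelper's R interface; the function names, the month-to-prefix map and the alias entries are small illustrative assumptions rather than the HGNC or MGI data the package uses.

```python
import re

# Illustrative assumptions, not HGNC/MGI data: which legacy symbol prefix a
# spreadsheet month abbreviation most likely came from, and a tiny alias table
# mapping outdated symbols to current approved ones.
MONTH_TO_PREFIX = {"sep": "SEPT", "mar": "MARCH"}
ALIASES = {"SEPT2": "SEPTIN2", "MARCH1": "MARCHF1"}

# A date as a spreadsheet might display it, e.g. "2-Sep" or "01-Mar".
DATE_LIKE = re.compile(r"(\d{1,2})-([A-Za-z]{3})")


def correct_symbol(symbol):
    """Return (suggested symbol, whether it differs from the input)."""
    s = symbol.strip()
    m = DATE_LIKE.fullmatch(s)
    if m and m.group(2).lower() in MONTH_TO_PREFIX:
        # Undo the date conversion: "2-Sep" -> "SEPT2"
        s = MONTH_TO_PREFIX[m.group(2).lower()] + str(int(m.group(1)))
    s = ALIASES.get(s, s)  # replace outdated symbols with current ones
    return s, s != symbol


def fraction_corrected(symbols):
    """Share of entries for which a correction was suggested."""
    return sum(correct_symbol(x)[1] for x in symbols) / len(symbols)


print([correct_symbol(x) for x in ["2-Sep", "1-Mar", "SEPT2", "TP53"]])
```

Running the date reversal before the alias lookup lets a mangled legacy symbol be carried all the way to its current name in one pass.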

