Creation and evaluation of full-text literature-derived, feature-weighted disease models of genetically determined developmental disorders

Author(s):  
Thabo Michael Yates ◽  
Antoine Lain ◽  
Jamie Campbell ◽  
T. Ian Simpson ◽  
David R FitzPatrick

There are >2500 different genetically determined developmental disorders (DD), which, as a group, show very high levels of both locus and allelic heterogeneity. This has led to the widespread use of evidence-based filtering of genome-wide sequence data as a diagnostic tool in DD. Determining whether the association of a filtered variant at a specific locus is a plausible explanation of the phenotype in the proband is crucial and commonly requires extensive manual literature review by both clinical scientists and clinicians. Access to a database of weighted clinical features extracted from rigorously curated literature would increase the efficiency of this process and facilitate the development of robust phenotypic similarity metrics. However, given the large and rapidly increasing volume of published information, conventional biocuration approaches are becoming impractical. Here, we present a scalable, automated method for extraction of categorical phenotypic descriptors from full-text literature. Papers identified through literature review were downloaded and parsed using the Cadmus custom retrieval package. Human Phenotype Ontology terms were extracted using MetaMap, with 76-83% precision and 72-81% recall. Mean terms per paper increased from 9 using title and abstract alone to 69 using full text. We demonstrate that these literature-derived disease models plausibly reflect true disease expressivity more accurately than gold standard manually curated models, through comparison with prospectively gathered data from the Deciphering Developmental Disorders study. AUC for ROC curves increased by 5-10% through use of literature-derived models. This work shows that scalable automated literature curation increases performance and adds weight to the need for this strategy to be integrated into informatic variant analysis pipelines.
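The precision and recall figures quoted above compare automatically extracted terms against a manually annotated reference. A minimal sketch of that calculation for a single paper follows; the function name and the two term sets are hypothetical illustrations, not data or code from the study, and the abstract does not state how the authors aggregated scores across papers.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for one paper's extracted term set.

    predicted: term IDs produced by an automated extractor (hypothetical input)
    gold:      term IDs from manual annotation (hypothetical input)
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Guard against empty sets so the ratios are always defined.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Illustrative term IDs only; these are not taken from the paper.
gold_terms = {"HP:0001250", "HP:0000252", "HP:0001263", "HP:0000316"}
extracted_terms = {"HP:0001250", "HP:0000252", "HP:0001263", "HP:0004322"}

precision, recall = precision_recall(extracted_terms, gold_terms)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Three of the four extracted terms appear in the reference set and three of the four reference terms were found, so both ratios are 0.75 in this toy case.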

Author(s):  
Caroline F. Wright ◽  
Ruth Y. Eberhardt ◽  
Panayiotis Constantinou ◽  
Matthew E. Hurles ◽  
...  

Abstract Purpose Automated variant filtering is an essential part of diagnostic genome-wide sequencing but may generate false negative results. We sought to investigate whether some previously identified pathogenic variants may be being routinely excluded by standard variant filtering pipelines. Methods We evaluated variants that were previously classified as pathogenic or likely pathogenic in ClinVar in known developmental disorder genes using exome sequence data from the Deciphering Developmental Disorders (DDD) study. Results Of these ClinVar pathogenic variants, 3.6% were identified among 13,462 DDD probands, and 1134/1352 (83.9%) had already been independently communicated to clinicians using DDD variant filtering pipelines as plausibly pathogenic. The remaining 218 variants failed consequence, inheritance, or other automated variant filters. Following clinical review of these additional variants, we were able to identify 112 variants in 107 (0.8%) DDD probands as potential diagnoses. Conclusion Lower minor allele frequency (<0.0005%) and higher gold star review status in ClinVar (>1 star) are good predictors of a previously identified variant being plausibly diagnostic for developmental disorders. However, around half of previously identified pathogenic variants excluded by automated variant filtering did not appear to be disease-causing, underlining the continued need for clinical evaluation of candidate variants as part of the diagnostic process.


2020 ◽  
Author(s):  
Laura Lafon-Hughes

BACKGROUND The COVID-19 pandemic prompts the study of coronavirus biology and the search for putative therapeutic strategies. OBJECTIVE To compare SARS-CoV-2 genome-wide structure and proteins with other coronaviruses, focusing on putative coronavirus-specific or SARS-CoV-2-specific therapeutic designs. METHODS The genome-wide structure of SARS-CoV-2 was compared to that of SARS and other coronaviruses in order to gain insights, with a literature review conducted through Google searches. RESULTS There are promising therapeutic alternatives. Host cell targets could be modulated to hamper viral replication, but targeting viral proteins directly would be a better therapeutic design, since fewer adverse side effects would be expected. CONCLUSIONS Therapeutic strategies (Figure 1) could include the modulation of host targets (PARPs, kinases), competition with G-quadruplexes or nucleoside analogs to hamper RDRP. The most promising anti-CoV options include inhibitors of the conserved essential viral proteases and drugs that interfere with ribosome slippage at the -1 PRF site.


Nature ◽  
2021 ◽  
Vol 590 (7845) ◽  
pp. 290-299 ◽  
Author(s):  
Daniel Taliun ◽  
Daniel N. Harris ◽  
Michael D. Kessler ◽  
Jedidiah Carlson ◽  
...  

Abstract The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes). In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.


Author(s):  
Ida Stadig ◽  
Therese Svanberg

Abstract Objectives This article aims to provide a brief review of information retrieval and hospital-based health technology assessment (HB-HTA) and describe library experiences and working methods at a regional HB-HTA center from the center's inception to the present day. Methods For this brief literature review, searches in PubMed and LISTA were conducted to identify studies reporting on HB-HTA and information retrieval. The description of the library's involvement in the HTA center and its working methods is based on the authors’ experience and internal and/or unpublished documents. Results Region Västra Götaland is the second largest healthcare region in Sweden and has had a regional HB-HTA center since 2007 (HTA-centrum). Assessments are performed by clinicians supported by HTA methodologists. The medical library at Sahlgrenska University Hospital works closely with HTA-centrum, with one HTA librarian responsible for coordinating the work. Conclusion In the literature on HB-HTA, we found limited descriptions of the role librarians and information specialists play in different units. The librarians at HTA-centrum play an important role, not only in literature searching but also in abstract and full-text screening.


GigaScience ◽  
2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Taras K Oleksyk ◽  
Walter W Wolfsberger ◽  
Alexandra M Weber ◽  
Khrystyna Shchubelka ◽  
Olga T Oleksyk ◽  
...  

Abstract Background The main goal of this collaborative effort is to provide genome-wide data for the previously underrepresented population in Eastern Europe, and to provide cross-validation of the data from genome sequences and genotypes of the same individuals acquired by different technologies. We collected 97 genome-grade DNA samples from individuals representing major regions of Ukraine who consented to public data release. BGISEQ-500 sequence data and genotypes by an Illumina GWAS chip were cross-validated on multiple samples and additionally referenced to 1 sample that has been resequenced by Illumina NovaSeq6000 S4 at high coverage. Results The genome data have been searched for genomic variation represented in this population, and a number of variants have been reported: large structural variants, indels, copy number variations, single-nucleotide polymorphisms, and microsatellites. To our knowledge, this study provides the largest survey to date of genetic variation in Ukraine, creating a public reference resource aiming to provide data for medical research in a large understudied population. Conclusions Our results indicate that the genetic diversity of the Ukrainian population is uniquely shaped by evolutionary and demographic forces and cannot be ignored in future genetic and biomedical studies. These data will contribute a wealth of new information, bringing forth many novel, endemic and medically related alleles.


2021 ◽  
Author(s):  
Robin N Beaumont ◽  
Isabelle K Mayne ◽  
Rachel M Freathy ◽  
Caroline F Wright

Abstract Birth weight is an important factor in newborn survival; both low and high birth weights are associated with adverse later-life health outcomes. Genome-wide association studies (GWAS) have identified 190 loci associated with maternal or fetal effects on birth weight. Knowledge of the underlying causal genes is crucial to understand how these loci influence birth weight and the links between infant and adult morbidity. Numerous monogenic developmental syndromes are associated with birth weights at the extreme ends of the distribution. Genes implicated in those syndromes may provide valuable information to prioritize candidate genes at the GWAS loci. We examined the proximity of genes implicated in developmental disorders (DDs) to birth weight GWAS loci using simulations to test whether they fall disproportionately close to the GWAS loci. We found birth weight GWAS single nucleotide polymorphisms (SNPs) fall closer to such genes than expected both when the DD gene is the nearest gene to the birth weight SNP and also when examining all genes within 258 kb of the SNP. This enrichment was driven by genes causing monogenic DDs with dominant modes of inheritance. We found examples of SNPs in the intron of one gene marking plausible effects via different nearby genes, highlighting the closest gene to the SNP not necessarily being the functionally relevant gene. This is the first application of this approach to birth weight, which has helped identify GWAS loci likely to have direct fetal effects on birth weight, which could not previously be classified as fetal or maternal owing to insufficient statistical power.
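The simulation-based proximity test described above can be illustrated with a simple permutation scheme. The sketch below is an assumption-laden toy version: it uses positions on a single linear coordinate, draws random gene sets of matching size as the null, and counts SNPs that have a DD gene within a fixed window. The abstract does not specify the authors' null model, so the function names, the sampling scheme and the example coordinates are all hypothetical.

```python
import random


def count_snps_near_genes(snp_positions, gene_positions, window):
    """Count SNPs with at least one gene within `window` base pairs."""
    return sum(
        any(abs(snp - gene) <= window for gene in gene_positions)
        for snp in snp_positions
    )


def proximity_enrichment(snp_positions, dd_genes, all_genes, window,
                         n_sim=1000, seed=0):
    """Empirical p-value that DD genes lie closer to SNPs than random gene sets.

    Null model (an assumption of this sketch): random gene sets of the same
    size as `dd_genes`, sampled without replacement from `all_genes`.
    """
    rng = random.Random(seed)
    observed = count_snps_near_genes(snp_positions, dd_genes, window)
    at_least_as_extreme = 0
    for _ in range(n_sim):
        random_set = rng.sample(all_genes, len(dd_genes))
        if count_snps_near_genes(snp_positions, random_set, window) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the empirical p-value strictly above zero.
    return observed, (at_least_as_extreme + 1) / (n_sim + 1)


# Toy coordinates in base pairs; not real loci.
snps = [100_000, 500_000, 900_000]
dd = [110_000, 510_000, 910_000]
background = list(range(0, 100_000_000, 1_000_000)) + dd

observed, p_value = proximity_enrichment(snps, dd, background, window=258_000)
print(observed, p_value)
```

A real analysis would work per chromosome and would probably match random gene sets on properties such as gene length or local gene density; those refinements are omitted here for brevity.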


2020 ◽  
Vol 41 (Supplement_2) ◽  
Author(s):  
A Said ◽  
Y.J Van De Vegte ◽  
N Verweij ◽  
P Van Der Harst

Abstract Background Caffeine is the most widely consumed psychostimulant and is associated with lower risk of coronary artery disease (CAD) and type 2 diabetes (T2D). However, whether these associations are causal remains unknown. Objectives This study aimed to identify genetic variants associated with caffeine intake, and to investigate possible causal links between genetically determined caffeine intake and CAD or T2D. Additionally, we aimed to replicate previous observational findings between caffeine intake and CAD or T2D. Methods Genome-wide association studies (GWAS) were performed on caffeine intake from coffee, tea or both in 407,072 UK Biobank participants. Identified variants were used in a two-sample Mendelian randomization (MR) approach to investigate evidence for causal links between caffeine intake and CAD in CARDIoGRAMplusC4D (60,801 cases; 123,504 controls) or T2D in DIAGRAM (26,676 cases; 132,532 controls). Observational associations were tested within UK Biobank using Cox regression analyses. Results Moderate observational caffeine intakes from coffee or tea were associated with lower risks of CAD or T2D compared to no or high intake, with the lowest risks at intakes of 120–180 mg/day from coffee for CAD (HR=0.77 [95% CI: 0.73–0.82]; P<1e-16), and 300–360 mg/day for T2D (HR=0.76 [95% CI: 0.67–0.86]; P=1.57e-5). GWAS identified 51 novel genetic loci associated with caffeine intake, enriched for central nervous system genes. In contrast to observational analyses, MR analyses in CARDIoGRAMplusC4D and DIAGRAM yielded no evidence for causal links between caffeine intake and the development of CAD or T2D. Conclusions MR analyses indicate caffeine intake might not protect against CAD or T2D, despite protective associations in observational analyses. Figure: Manhattan plot of caffeine intake. Funding Acknowledgement Type of funding source: None


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Pierpaolo Maisano Delser ◽  
Eppie R. Jones ◽  
Anahit Hovhannisyan ◽  
Lara Cassidy ◽  
Ron Pinhasi ◽  
...  

Abstract Over the last few years, genome-wide data for a large number of ancient human samples have been collected. Whilst datasets of captured SNPs have been collated, high coverage shotgun genomes (which are relatively few but allow certain types of analyses not possible with ascertained captured SNPs) have to be reprocessed by individual groups from raw reads. This task is computationally intensive. Here, we release a dataset including 35 whole-genome sequenced samples, previously published and distributed worldwide, together with the genetic pipeline used to process them. The dataset contains 72,041,355 sites called across 19 ancient and 16 modern individuals and includes sequence data from four previously published ancient samples which we sequenced to higher coverage (10–18x). Such a resource will allow researchers to analyse their new samples with the same genetic pipeline and directly compare them to the reference dataset without re-processing published samples. Moreover, this dataset can be easily expanded to increase the sample distribution both across time and space.


Parasitology ◽  
2009 ◽  
Vol 136 (5) ◽  
pp. 469-485 ◽  
Author(s):  
A. S. TAFT ◽  
J. J. VERMEIRE ◽  
J. BERNIER ◽  
S. R. BIRKELAND ◽  
M. J. CIPRIANO ◽  
...  

SUMMARY Infection of the snail, Biomphalaria glabrata, by the free-swimming miracidial stage of the human blood fluke, Schistosoma mansoni, and its subsequent development to the parasitic sporocyst stage is critical to establishment of viable infections and continued human transmission. We performed a genome-wide expression analysis of the S. mansoni miracidia and developing sporocyst using Long Serial Analysis of Gene Expression (LongSAGE). Five cDNA libraries were constructed from miracidia and in vitro cultured 6- and 20-day-old sporocysts maintained in sporocyst medium (SM) or in SM conditioned by previous cultivation with cells of the B. glabrata embryonic (Bge) cell line. We generated 21 440 SAGE tags and mapped 13 381 to the S. mansoni gene predictions (v4.0e) either by estimating theoretical 3′ UTR lengths or using existing 3′ EST sequence data. Overall, 432 transcripts were found to be differentially expressed amongst all 5 libraries. In total, 172 tags were differentially expressed between miracidia and 6-day conditioned sporocysts and 152 were differentially expressed between miracidia and 6-day unconditioned sporocysts. In addition, 53 and 45 tags, respectively, were differentially expressed in 6-day and 20-day cultured sporocysts, due to the effects of exposure to Bge cell-conditioned medium.

