Harnessing the 100,000 Genomes Project whole genome sequencing data - an unbiased systematic tool to filter by biologically validated regions of functionality

SUMMARY Whole genome sequencing (WGS) is championed by the UK National Health Service (NHS) to identify genetic variants that cause particular diseases. The full potential of WGS has yet to be realised, as early data analytic steps prioritise protein-coding genes and effectively ignore the less well annotated non-coding genome, which is rich in transcribed and critical regulatory regions. To address this, we developed a filter, which we call GROFFFY, and validated it in WGS data from hereditary haemorrhagic telangiectasia patients within the 100,000 Genomes Project. Before filter application, the mean number of DNA variants compared to human reference sequence GRCh38 was 4,867,167 (range 4,786,039-5,070,340), and one-third lay within intergenic areas. GROFFFY removed a mean of 2,812,015 variants per DNA. In combination with allele frequency and other filters, GROFFFY enabled a 99.56% reduction in variant number. The proportion of intergenic variants was maintained, and no pathogenic variants in disease genes were lost. We conclude that the filter applied to NHS diagnostic samples in the 100,000 Genomes pipeline offers an efficient method to prioritise intergenic, intronic and coding gDNA variants. Reducing the overwhelming number of variants while retaining functional genome variation of importance to patients enhances the near-term value of WGS in clinical diagnostics.
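To put the reported figures in proportion, the short calculation below works through the arithmetic. It is only a sketch: it assumes the 99.56% reduction can be applied to the reported mean variant count, which the summary does not state explicitly, and the variable names are illustrative.

```python
# Figures quoted in the summary above (means per DNA sample).
mean_variants = 4_867_167        # variants versus GRCh38 before filtering
removed_by_groffy = 2_812_015    # mean number removed by the GROFFFY filter alone
overall_reduction = 0.9956       # reduction with allele frequency and other filters

# Variants left after GROFFFY alone, and the share it removed.
remaining_after_groffy = mean_variants - removed_by_groffy
fraction_removed = removed_by_groffy / mean_variants

# ASSUMPTION: the overall percentage is applied to the mean count.
remaining_overall = round(mean_variants * (1 - overall_reduction))

print(f"Left after GROFFFY alone: {remaining_after_groffy:,}")
print(f"Share removed by GROFFFY: {fraction_removed:.1%}")
print(f"Approximate variants left after all filters: {remaining_overall:,}")
```

Under that assumption, GROFFFY alone removes a little under three-fifths of the variants, and the combined filters leave roughly twenty thousand variants per sample for review.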


A Bacterial Analysis Platform: An Integrated System for Analysing Bacterial Whole Genome Sequencing Data for Clinical Diagnostics and Surveillance

PLoS ONE ◽

10.1371/journal.pone.0157718 ◽

2016 ◽

Vol 11 (6) ◽

pp. e0157718 ◽

Cited By ~ 84

Author(s):

Martin Christen Frølund Thomsen ◽

Johanne Ahrenfeldt ◽

Jose Luis Bellod Cisneros ◽

Vanessa Jurtz ◽

Mette Voldby Larsen ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Clinical Diagnostics ◽

Integrated System ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Analysis Platform


VPMBench: a test bench for variant prioritization methods

BMC Bioinformatics ◽

10.1186/s12859-021-04458-0 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Andreas Ruscheinski ◽

Anna Lena Reimler ◽

Roland Ewald ◽

Adelinde M. Uhrmacher

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Test Bench ◽

Clinical Diagnostics ◽

Tool Support ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Variant Prioritization ◽

Whole Exome

Abstract Background: Clinical diagnostics of whole-exome and whole-genome sequencing data requires geneticists to consider thousands of genetic variants for each patient. Various variant prioritization methods have been developed in recent years to aid clinicians in identifying variants that are likely disease-causing. Each time a new method is developed, its effectiveness must be evaluated and compared to other approaches based on the most recently available evaluation data. Doing so in an unbiased, systematic, and replicable manner requires significant effort. Results: The open-source test bench “VPMBench” automates the evaluation of variant prioritization methods. VPMBench introduces a standardized interface for prioritization methods and provides a plugin system that makes it easy to evaluate new methods. It supports different input data formats and custom output data preparation. VPMBench exploits declaratively specified information about the methods, e.g., the variants supported by the methods. Plugins may also be provided in a technology-agnostic manner via containerization. Conclusions: VPMBench significantly simplifies the evaluation of both custom and published variant prioritization methods. As we expect variant prioritization methods to become ever more critical with the advent of whole-genome sequencing in clinical diagnostics, such tool support is crucial to facilitate methodological research.


Detection of structural mosaicism from targeted and whole-genome sequencing data

10.1101/062620 ◽

2016 ◽

Author(s):

Daniel A. King ◽

Alejandro Sifrim ◽

Tomas W. Fitzgerald ◽

Raheleh Rahbari ◽

Emma Hobson ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Exome Sequencing ◽

Genome Sequencing ◽

Developmental Disorders ◽

Large Fraction ◽

Clinical Diagnostics ◽

Next Generation Sequencing Data ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data

ABSTRACT Structural mosaic abnormalities are large post-zygotic mutations present in a subset of cells and have been implicated in developmental disorders and cancer. Such mutations have been conventionally assessed in clinical diagnostics using cytogenetic or microarray testing. Modern disease studies rely heavily on exome sequencing, yet an adequate method for the detection of structural mosaicism using targeted sequencing data is lacking. Here, we present a method, called MrMosaic, to detect structural mosaic abnormalities using deviations in allele fraction and read coverage from next generation sequencing data. Whole-exome sequencing (WES) and whole-genome sequencing (WGS) simulations were used to calculate detection performance across a range of mosaic event sizes, types, clonalities, and sequencing depths. The tool was applied to 4,911 patients with undiagnosed developmental disorders, and 11 events in 9 patients were detected. In 8 of 11 cases, mosaicism was observed in saliva but not blood, suggesting that assaying blood alone would miss a large fraction, possibly more than 50%, of mosaic diagnostic chromosomal rearrangements.
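The two signals named in the abstract, deviation in allele fraction and deviation in read coverage, can be illustrated with a small calculation. This is a minimal sketch and not the MrMosaic algorithm: the function name, the inputs, and the idea of summarising one genomic segment at heterozygous sites are assumptions made purely for illustration.

```python
import math
from statistics import mean

def segment_signals(alt_reads, total_reads, expected_depth):
    """Summarise one segment at heterozygous sites (hypothetical helper).

    Returns a tuple of:
      - mean absolute deviation of the alternate allele fraction from 0.5
      - log2 ratio of observed mean depth to the expected depth
    """
    fractions = [a / t for a, t in zip(alt_reads, total_reads) if t > 0]
    af_deviation = mean(abs(f - 0.5) for f in fractions)
    log2_ratio = math.log2(mean(total_reads) / expected_depth)
    return af_deviation, log2_ratio

# Toy data: allele fractions shifted away from 0.5, depth unchanged.
deviation, ratio = segment_signals(
    alt_reads=[30, 20, 30, 20],
    total_reads=[50, 50, 50, 50],
    expected_depth=50,
)
print(f"allele fraction deviation: {deviation:.2f}, log2 coverage ratio: {ratio:.2f}")
```

In this toy segment the allele fractions deviate from 0.5 while coverage matches expectation, the kind of pattern generally associated with a copy-neutral event; a gain or loss would also shift the coverage ratio away from zero.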


S95 Identifying new hereditary haemorrhagic telangiectasia genes by applying a machine learning approach to screen whole genome sequencing data

10.1136/thorax-2019-btsabstracts2019.101 ◽

2019 ◽

Author(s):

S Xiao ◽

D Brown ◽

IG Mollet ◽

FS Govani ◽

D Patel ◽

...

Keyword(s):

Machine Learning ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Whole Genome Sequencing Data ◽

Learning Approach ◽

Hereditary Haemorrhagic Telangiectasia ◽

Whole Genome ◽

Sequencing Data ◽

Machine Learning Approach


From whole genome sequencing data toward a simple genotyping tool: application to the animal pathogen Mycobacterium bovis

10.26226/morressier.56d5ba2ad462b80296c965c0 ◽

2016 ◽

Author(s):

Lorraine Michelet

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Mycobacterium Bovis ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data


Plasmids or no plasmids? A comparison between the Agilent TapeStation and whole-genome sequencing data in a large-scale bacterial sequencing project

10.26226/morressier.56d5ba27d462b80296c95fe7 ◽

2016 ◽

Author(s):

Sarah Alexander

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Large Scale ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Sequencing Project


Risk prediction and marker selection in nonsynonymous single nucleotide polymorphisms using whole genome sequencing data

Animal Cells and Systems ◽

10.1080/19768354.2020.1860125 ◽

2020 ◽

Vol 24 (6) ◽

pp. 321-328

Author(s):

Young-Sup Lee ◽

KyeongHye Won ◽

Donghyun Shin ◽

Jae-Don Oh

Keyword(s):

Single Nucleotide Polymorphisms ◽

Whole Genome Sequencing ◽

Risk Prediction ◽

Genome Sequencing ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Nucleotide Polymorphisms ◽

Sequencing Data ◽

Single Nucleotide ◽

Marker Selection


Development and evaluation of an outbreak surveillance system integrating whole genome sequencing data for non-typhoidal Salmonella in London and South East of England, 2016-17

Epidemiology and Infection ◽

10.1017/s0950268821001400 ◽

2021 ◽

pp. 1-26

Author(s):

Karthik Paranthaman ◽

Piers Mook ◽

Daniele Curtis ◽

Edward-Wynne Evans ◽

Emma Crawley-Boevey ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Surveillance System ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data


Whole genome sequencing reveals high differentiation, low levels of genetic diversity and short runs of homozygosity among Swedish wels catfish

Heredity ◽

10.1038/s41437-021-00438-5 ◽

2021 ◽

Author(s):

Axel Jensen ◽

Mette Lillie ◽

Kristofer Bergström ◽

Per Larsson ◽

Jacob Höglund

Keyword(s):

Genetic Diversity ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Whole Genome Sequencing Data ◽

Peripheral Populations ◽

Whole Genome ◽

Runs Of Homozygosity ◽

Sequencing Data ◽

Isolated Populations ◽

Native Populations

Abstract The use of genetic markers in the context of conservation is largely being outcompeted by whole-genome data. Comparative studies between the two are sparse, and the knowledge about potential effects of this methodology shift is limited. Here, we used whole-genome sequencing data to assess the genetic status of peripheral populations of the wels catfish (Silurus glanis), and discuss the results in light of a recent microsatellite study of the same populations. The Swedish populations of the wels catfish have suffered from severe declines during the last centuries and persist in only a few isolated water systems. Fragmented populations generally are at greater risk of extinction, for example due to loss of genetic diversity, and may thus require conservation actions. We sequenced individuals from the three remaining native populations (Båven, Emån, and Möckeln) and one reintroduced population of admixed origin (Helge å), and found that genetic diversity was highest in Emån but low overall, with strong differentiation among the populations. No signature of recent inbreeding was found, but a considerable number of short runs of homozygosity were present in all populations, likely linked to historically small population sizes and bottleneck events. Genetic substructure within any of the native populations was at best weak. Individuals from the admixed population Helge å shared most genetic ancestry with the Båven population (72%). Our results are largely in agreement with the microsatellite study, and stress the need to protect these isolated populations at the northern edge of the distribution of the species.
