Advanced Whole Genome Sequencing Using an Entirely PCR-free Massively Parallel Sequencing Workflow

Abstract Background Systematic errors can be introduced by DNA amplification during massively parallel sequencing (MPS) library preparation and sequencing array formation. Polymerase chain reaction (PCR)-free genomic library preparation methods were previously shown to improve whole genome sequencing (WGS) quality on the Illumina platform, especially in calling insertions and deletions (InDels). We hypothesized that substantial InDel errors continue to be introduced by the remaining PCR step of DNA cluster generation. In addition to library preparation and sequencing, data analysis methods are also important for the accuracy of the output data. In recent years, several machine learning variant calling pipelines have emerged that can correct systematic errors from MPS and improve variant calling performance. Results Here, PCR-free libraries were sequenced on the PCR-free DNBSEQ™ arrays from MGI Tech Co., Ltd. (referred to as MGI) to accomplish the first true PCR-free WGS, in which the whole process is PCR-free during both library preparation and sequencing. We demonstrated that PCR-based WGS libraries have significantly (about 5 times) more InDel errors than PCR-free libraries. Furthermore, PCR-free WGS libraries sequenced on the PCR-free DNBSEQ™ platform have up to 55% fewer InDel errors than on the NovaSeq platform, confirming that DNA clusters contain PCR-generated errors. In addition, low coverage bias and a read duplication rate of less than 1% were reproducibly obtained with DNBSEQ™ PCR-free libraries prepared using either ultrasonic or enzymatic DNA fragmentation MGI kits combined with MGISEQ-2000. Meanwhile, variant calling performance (single-nucleotide polymorphism (SNP) F-score > 99.94%, InDel F-score > 99.6%) exceeded widely accepted standards using machine learning (ML) methods (DeepVariant or DNAscope). Conclusions Enabled by the new PCR-free library preparation kits, ultra-high-throughput PCR-free sequencers and ML-based variant calling, true PCR-free DNBSEQ™ WGS provides a powerful solution for improving WGS accuracy while reducing cost and analysis time, thus facilitating future precision medicine, cohort studies, and large population genome projects.
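
The F-scores quoted in this abstract combine precision and recall into a single number. As a minimal sketch, assuming the true-positive, false-positive and false-negative counts come from comparing a call set against a truth set, the calculation might look like the following. The function name and the example counts are illustrative and are not taken from the paper.

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1 score (harmonic mean of precision and recall).

    tp, fp, fn are counts of true-positive, false-positive and
    false-negative variant calls (hypothetical inputs for illustration).
    """
    if tp == 0:
        # No correct calls: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only, not data from the study.
print(f_score(90, 10, 10))
```

With equal numbers of false positives and false negatives, precision, recall and the F-score all coincide, which is a quick sanity check on the formula.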

Whole-genome sequencing and comprehensive variant analysis of a Japanese individual using massively parallel sequencing

Nature Genetics ◽

10.1038/ng.691 ◽

2010 ◽

Vol 42 (11) ◽

pp. 931-936 ◽

Cited By ~ 90

Author(s):

Akihiro Fujimoto ◽

Hidewaki Nakagawa ◽

Naoya Hosono ◽

Kaoru Nakano ◽

Tetsuo Abe ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Massively Parallel Sequencing ◽

Massively Parallel ◽

Whole Genome ◽

Parallel Sequencing ◽

Variant Analysis ◽

Japanese Individual

Non-invasive prenatal testing using massively parallel sequencing of maternal plasma DNA: from molecular karyotyping to fetal whole-genome sequencing

Reproductive BioMedicine Online ◽

10.1016/j.rbmo.2013.08.008 ◽

2013 ◽

Vol 27 (6) ◽

pp. 593-598 ◽

Cited By ~ 32

Author(s):

Y.M. Dennis Lo

Keyword(s):

Genome Sequencing ◽

Massively Parallel Sequencing ◽

Prenatal Testing ◽

Maternal Plasma ◽

Massively Parallel ◽

Molecular Karyotyping ◽

Whole Genome ◽

Plasma Dna ◽

Parallel Sequencing ◽

Non Invasive

Whole Genome Sequencing Refines Knowledge on the Population Structure of Mycobacterium bovis from a Multi-Host Tuberculosis System

Microorganisms ◽

10.3390/microorganisms9081585 ◽

2021 ◽

Vol 9 (8) ◽

pp. 1585

Author(s):

Ana C. Reis ◽

Liliana C. M. Salvador ◽

Suelee Robbe-Austerman ◽

Rogério Tenreiro ◽

Ana Botelho ◽

...

Keyword(s):

Population Structure ◽

Whole Genome Sequencing ◽

Wild Boar ◽

Genome Sequencing ◽

Mycobacterium Bovis ◽

Red Deer ◽

Variable Number Tandem Repeat ◽

Variant Calling ◽

Whole Genome ◽

Network Analyses

Classical molecular analyses of Mycobacterium bovis based on spoligotyping and Variable Number Tandem Repeat (MIRU-VNTR) brought the first insights into the epidemiology of animal tuberculosis (TB) in Portugal, showing high genotypic diversity of circulating strains that mostly cluster within the European 2 clonal complex. Previous surveillance provided valuable information on the prevalence and spatial occurrence of TB and highlighted prevalent genotypes in areas where livestock and wild ungulates are sympatric. However, links at the wildlife–livestock interfaces were established mainly via classical genotype associations. Here, we apply whole genome sequencing (WGS) to cattle, red deer and wild boar isolates to reconstruct the M. bovis population structure in a multi-host, multi-region disease system and to explore links at a fine genomic scale between M. bovis from wildlife hosts and cattle. Whole genome sequences of 44 representative M. bovis isolates, obtained between 2003 and 2015 from three TB hotspots, were compared through single nucleotide polymorphism (SNP) variant calling analyses. Consistent with previous results combining classical genotyping with Bayesian population admixture modelling, SNP-based phylogenies support the branching of this M. bovis population into five genetic clades, three with apparent geographic specificities, as well as the establishment of an SNP catalogue specific to each clade, which may be explored in the future as phylogenetic markers. The core genome alignment of SNPs was integrated within a spatiotemporal metadata framework to further structure this M. bovis population by host species and TB hotspots, providing a baseline for network analyses in different epidemiological and disease control contexts. WGS of M. bovis isolates from Portugal is reported for the first time in this pilot study, refining the spatiotemporal context of TB at the wildlife–livestock interface and providing further support to the key role of red deer and wild boar on disease maintenance. The SNP diversity observed within this dataset supports the natural circulation of M. bovis for a long time period, as well as multiple introduction events of the pathogen in this Iberian multi-host system.

Clinical-grade whole-genome sequencing and 3′ transcriptome analysis of colorectal cancer patients

Genome Medicine ◽

10.1186/s13073-021-00852-8 ◽

2021 ◽

Vol 13 (1) ◽

Author(s):

Agata Stodolna ◽

Miao He ◽

Mahesh Vasipalli ◽

Zoya Kingsbury ◽

Jennifer Becq ◽

...

Keyword(s):

Colorectal Cancer ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Transcriptome Analysis ◽

Variant Calling ◽

Standard Of Care ◽

Genomic Variation ◽

Whole Genome ◽

Clinical Grade ◽

Pathway Gene

Abstract Background Clinical-grade whole-genome sequencing (cWGS) has the potential to become the standard of care within the clinic because of its breadth of coverage and lack of bias towards certain regions of the genome. Colorectal cancer presents a difficult treatment paradigm, with over 40% of patients presenting at diagnosis with metastatic disease. We hypothesised that cWGS coupled with 3′ transcriptome analysis would give new insights into colorectal cancer. Methods Patients underwent PCR-free whole-genome sequencing and alignment and variant calling using a standardised pipeline to output SNVs, indels, SVs and CNAs. Additional insights into the mutational signatures and tumour biology were gained by the use of 3′ RNA-seq. Results Fifty-four patients were studied in total. Driver analysis identified the Wnt pathway gene APC as the only consistently mutated driver in colorectal cancer. Alterations in the PI3K/mTOR pathways were seen as previously observed in CRC. Multiple private CNAs, SVs and gene fusions were unique to individual tumours. Approximately 30% of patients had a tumour mutational burden of > 10 mutations/Mb of DNA, suggesting suitability for immunotherapy. Conclusions Clinical whole-genome sequencing offers a potential avenue for the identification of private genomic variation that may confer sensitivity to targeted agents and offer patients new options for targeted therapies.
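
The tumour mutational burden threshold mentioned in this abstract (> 10 mutations/Mb) is a simple ratio of mutation count to the size of the sequenced region. A minimal sketch, assuming a count of somatic mutations and the number of callable bases are already available; the function name, parameter names and example numbers are hypothetical and do not come from the study.

```python
def tumour_mutational_burden(n_mutations: int, callable_bases: int) -> float:
    """Return mutations per megabase (Mb) of callable sequence.

    Both arguments are hypothetical inputs: a somatic mutation count and
    the number of bases over which mutations could have been called.
    """
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    return n_mutations / (callable_bases / 1_000_000)


# Illustrative numbers only: 350 mutations over 30 Mb of callable sequence.
tmb = tumour_mutational_burden(350, 30_000_000)
print(tmb > 10)  # compare against the threshold quoted in the abstract
```

How mutations are counted (for example, which variant classes are included) changes the result, so the threshold is only meaningful for a fixed counting convention.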

A common protocol for the simultaneous processing of multiple clinically relevant bacterial species for whole genome sequencing

Scientific Reports ◽

10.1038/s41598-020-80031-8 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Kathy E. Raven ◽

Sophia T. Girgis ◽

Asha Akram ◽

Beth Blane ◽

Danielle Leek ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Dna Extraction ◽

Genome Sequencing ◽

Clinical Microbiology ◽

Bacterial Species ◽

Outbreak Detection ◽

Whole Genome ◽

Library Preparation ◽

Simultaneous Processing ◽

Antimicrobial Resistance Gene

Abstract Whole-genome sequencing is likely to become increasingly used by local clinical microbiology laboratories, where sequencing volume is low compared with national reference laboratories. Here, we describe a universal protocol for simultaneous DNA extraction and sequencing of numerous different bacterial species, allowing mixed species sequence runs to meet variable laboratory demand. We assembled test panels representing 20 clinically relevant bacterial species. The DNA extraction process used the QIAamp mini DNA kit, to which different combinations of reagents were added. Thereafter, a common protocol was used for library preparation and sequencing. The addition of lysostaphin, lysozyme or buffer ATL (a tissue lysis buffer) alone did not produce sufficient DNA for library preparation across the species tested. By contrast, lysozyme plus lysostaphin produced sufficient DNA across all 20 species. DNA from 15 of 20 species could be extracted from a 24-h culture plate, while the remainder required 48–72 h. The process demonstrated 100% reproducibility. Sequencing of the resulting DNA was used to recapitulate previous findings for species, outbreak detection, antimicrobial resistance gene detection and capsular type. This single protocol for simultaneous processing and sequencing of multiple bacterial species supports low volume and rapid turnaround time by local clinical microbiology laboratories.

Estimating sequencing error rates using families

BioData Mining ◽

10.1186/s13040-021-00259-6 ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Kelley Paskov ◽

Jae-Yoon Jung ◽

Brianna Chrisman ◽

Nate T. Stockham ◽

Peter Washington ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Exome Sequencing ◽

Genome Sequencing ◽

Variant Calling ◽

Error Rates ◽

Sequencing Error ◽

Whole Genome ◽

Sequencing Data ◽

Sequencing Platform ◽

Whole Exome

Abstract Background As next-generation sequencing technologies make their way into the clinic, knowledge of their error rates is essential if they are to be used to guide patient care. However, sequencing platforms and variant-calling pipelines are continuously evolving, making it difficult to accurately quantify error rates for the particular combination of assay and software parameters used on each sample. Family data provide a unique opportunity for estimating sequencing error rates since it allows us to observe a fraction of sequencing errors as Mendelian errors in the family, which we can then use to produce genome-wide error estimates for each sample. Results We introduce a method that uses Mendelian errors in sequencing data to make highly granular per-sample estimates of precision and recall for any set of variant calls, regardless of sequencing platform or calling methodology. We validate the accuracy of our estimates using monozygotic twins, and we use a set of monozygotic quadruplets to show that our predictions closely match the consensus method. We demonstrate our method’s versatility by estimating sequencing error rates for whole genome sequencing, whole exome sequencing, and microarray datasets, and we highlight its sensitivity by quantifying performance increases between different versions of the GATK variant-calling pipeline. We then use our method to demonstrate that: 1) Sequencing error rates between samples in the same dataset can vary by over an order of magnitude. 2) Variant calling performance decreases substantially in low-complexity regions of the genome. 3) Variant calling performance in whole exome sequencing data decreases with distance from the nearest target region. 4) Variant calls from lymphoblastoid cell lines can be as accurate as those from whole blood. 5) Whole-genome sequencing can attain microarray-level precision and recall at disease-associated SNV sites. 
Conclusion Genotype datasets from families are powerful resources that can be used to make fine-grained estimates of sequencing error for any sequencing platform and variant-calling methodology.
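
The abstract relies on the idea that some sequencing errors show up as Mendelian errors within a family. The sketch below shows only that basic check for a single biallelic autosomal site in a mother-father-child trio, with genotypes coded as the number of alternate alleles (0, 1 or 2). It is not the authors' estimator of precision and recall; the function names and the genotype coding are assumptions made for illustration.

```python
def transmissible_alleles(genotype: int) -> set[int]:
    """Alleles (0 = reference, 1 = alternate) a parent can pass on.

    genotype is the alternate-allele count: 0 (hom-ref), 1 (het), 2 (hom-alt).
    """
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def is_mendelian_error(mother: int, father: int, child: int) -> bool:
    """Return True if the child's genotype cannot arise from the parents'.

    Assumes a biallelic autosomal site and ignores de novo mutation.
    """
    possible_children = {
        m + f
        for m in transmissible_alleles(mother)
        for f in transmissible_alleles(father)
    }
    return child not in possible_children


# Two hom-ref parents cannot have a heterozygous child without an error
# (or a de novo mutation, which this sketch ignores).
print(is_mendelian_error(0, 0, 1))
```

Only a fraction of genotyping errors produce an inconsistent trio, which is why the abstract describes extrapolating from observed Mendelian errors to genome-wide error estimates.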

Rapid, high throughput library preparation of SARS-CoV2 using Illumina’s DNA Prep Library Preparation Kit v2

10.17504/protocols.io.b3vgqn3w ◽

2022 ◽

Author(s):

Jason Nguyen ◽

Rebecca Hickman ◽

Tracy Lee ◽

Natalie Prystajecky ◽

John Tyson

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

High Throughput ◽

Illumina Miseq ◽

Whole Genome ◽

Library Preparation ◽

Dna Libraries ◽

Reaction Volumes ◽

Single Size ◽

Stop Buffer

This procedure provides instructions on how to prepare DNA libraries for whole genome sequencing on an Illumina MiSeq or NextSeq using Illumina’s DNA Prep Library Preparation Kit scaled to half reaction volumes, with modifications to the post-PCR procedures; the tagmentation stop buffer and associated washes are removed, and libraries are pooled post-PCR before a single size selection is performed. This protocol is used to sequence SARS-CoV-2 using the cDNA/PCR protocol: https://dx.doi.org/10.17504/protocols.io.b3viqn4e
