Genetic Sex Validation for Sample Tracking in Clinical Testing

Background: Next-generation DNA sequencing (NGS) has been rapidly adopted by clinical testing laboratories for the detection of germline and somatic genetic variants. The complexity of sample processing in a clinical DNA sequencing laboratory creates multiple opportunities for sample identification errors and demands stringent quality control procedures. Methods: We used a 96-SNP PCR genotyping panel, applied at sample acquisition and compared against the final sequence data, to track sample identity throughout the sequencing pipeline. Because the panel includes sex SNPs, it also allows the genotype-inferred sex to be compared with the sex recorded at sample collection. This approach was implemented in the clinical genomic testing pathways of the multi-center Electronic Medical Records and Genomics (eMERGE) Phase III program. Results: Comparing the 96-SNP PCR panel data with the sex provided on the test requisition, we identified 110 inconsistencies (0.44%) among 25,015 clinical samples. The panel's genetic sex predictions were confirmed using additional SNP sites in the sequencing data or high-density hybridization-based genotyping arrays. The inconsistencies were traced to clerical errors, samples from transgender participants, samples from stem cell or bone marrow transplant recipients, and undetermined sample mix-ups. Conclusion: The 96-SNP PCR panel provides a cost-effective, robust tool for tracking samples within DNA sequencing laboratories, and the ability to predict sex from genotyping data adds a quality control measure covering all procedures, beginning with sample collection. Although not sufficient to detect all sample mix-ups, matching genetic against reported sex can provide an estimate of the error rate in sample collection systems.
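
The abstract does not say how sex is called from the panel's sex SNPs, so the following is only a minimal sketch under stated assumptions: each sample is assumed to yield genotype calls at several X-linked SNPs plus a count of Y-linked assays that produced a signal. The class name `SexSnpCalls`, the function names and the thresholds are hypothetical, not the eMERGE pipeline. Samples with both a Y signal and heterozygous X calls are left undetermined for manual review rather than forced into a category.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class SexSnpCalls:
    """Hypothetical per-sample summary of the sex-informative SNP assays."""
    x_genotypes: List[Optional[str]]  # two-letter calls such as "AG"; None = no call
    y_calls: int                      # Y-linked assays that produced a signal
    y_assays: int                     # Y-linked assays attempted


def infer_sex(calls: SexSnpCalls, min_y_fraction: float = 0.5) -> str:
    """Return 'M', 'F' or 'undetermined' using illustrative thresholds."""
    called_x = [g for g in calls.x_genotypes if g]
    x_het = sum(1 for g in called_x if g[0] != g[1])
    y_fraction = calls.y_calls / calls.y_assays if calls.y_assays else 0.0
    if y_fraction >= min_y_fraction and x_het == 0:
        return "M"  # Y signal and no heterozygous X sites
    if calls.y_calls == 0 and x_het > 0:
        return "F"  # no Y signal and at least one heterozygous X site
    return "undetermined"  # conflicting or insufficient evidence


def flag_sex_mismatches(
    samples: Dict[str, Tuple[SexSnpCalls, str]]
) -> List[Tuple[str, str, str]]:
    """Return (sample_id, reported_sex, predicted_sex) for confident disagreements."""
    flagged = []
    for sample_id, (calls, reported) in samples.items():
        predicted = infer_sex(calls)
        if predicted != "undetermined" and predicted != reported:
            flagged.append((sample_id, reported, predicted))
    return flagged
```

The mismatch rate in the abstract follows directly from the counts: `f"{110 / 25015:.2%}"` evaluates to `'0.44%'`.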

Novel Nested-Seq Approach for SARS-CoV-2 Real-Time Epidemiology and In-Depth Mutational Profiling in Wastewater

International Journal of Molecular Sciences, 2021, Vol 22 (16), pp. 8498. DOI: 10.3390/ijms22168498

Author(s): Margaritis Avgeris, Panagiotis G. Adamopoulos, Aikaterini Galani, Marieta Xagorari, Dimitrios Gourgiotis, ...

Keyword(s): Real Time, Nested PCR, Mutational Analysis, National Level, RNA Stability, 3D Structure, Cost Effective, Clinical Samples, Sequencing Data, Alpha Variant

Considering the lack of effective treatments against COVID-19, wastewater-based epidemiology (WBE) is emerging as a cost-effective approach for real-time, population-wide SARS-CoV-2 monitoring. Here, we report novel molecular assays for the sensitive detection and mutational/variant analysis of SARS-CoV-2 in wastewater. Highly stable regions of SARS-CoV-2 RNA were identified by RNA stability analysis and targeted for the development of novel nested PCR assays. Targeted DNA sequencing (DNA-seq) was applied to analyze and quantify SARS-CoV-2 mutations/variants, following hexamer-primed reverse transcription and nested PCR-based amplification of the targeted regions. Three-dimensional (3D) structure models were generated to examine the structural modifications predicted to result from genomic variants. SARS-CoV-2 detection in wastewater proved to be assay-dependent, and sensitivity was significantly improved by combining assays (94%) compared with single-assay screening (30%–60%). Targeted DNA-seq allowed the quantification of SARS-CoV-2 mutations/variants in wastewater, in agreement with COVID-19 patients’ sequencing data. Mutational analysis indicated the prevalence of the D614G (S) and P323L (RdRP) variants, as well as of the B.1.1.7/alpha variant of concern, consistent with the frequency of the B.1.1.7/alpha variant in clinical samples from the same period of the third pandemic wave at the national level. Our assays provide an innovative, cost-effective platform for real-time monitoring and early identification of SARS-CoV-2 variants at the community/population level.
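
Two calculations in this abstract are easy to make concrete: a combined screen counts a wastewater sample as positive if any assay detects it, and a mutation's frequency in targeted DNA-seq data is the fraction of reads carrying the mutant base at that site. The sketch below assumes per-sample detection results are already available as booleans and per-site read counts are already tallied; the function names, assay labels and toy inputs are illustrative, not the authors' pipeline or data.

```python
from typing import Dict, List


def detection_rate(results: List[Dict[str, bool]], assay: str) -> float:
    """Fraction of samples in which a single assay detected SARS-CoV-2."""
    return sum(r[assay] for r in results) / len(results)


def combined_detection_rate(results: List[Dict[str, bool]]) -> float:
    """Fraction of samples detected by at least one assay (logical OR)."""
    return sum(any(r.values()) for r in results) / len(results)


def mutation_frequency(alt_reads: int, total_reads: int) -> float:
    """Share of reads at a site that carry the mutant base."""
    if total_reads == 0:
        raise ValueError("no coverage at this site")
    return alt_reads / total_reads


# Toy results for two hypothetical assays across four samples
toy = [
    {"assay_1": True, "assay_2": False},
    {"assay_1": False, "assay_2": True},
    {"assay_1": False, "assay_2": False},
    {"assay_1": True, "assay_2": True},
]
print(detection_rate(toy, "assay_1"), combined_detection_rate(toy))  # 0.5 0.75
```

Because the combined call is a logical OR, adding an assay can only keep or raise the detection rate, which is why assay combination can outperform every single assay.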

Highly multiplexed, fast and accurate nanopore sequencing for verification of synthetic DNA constructs and sequence libraries

Synthetic Biology, 2019, Vol 4 (1). DOI: 10.1093/synbio/ysz025. Cited by ~4

Author(s): Andrew Currin, Neil Swainston, Mark S Dunstan, Adrian J Jervis, Paul Mulherin, ...

Keyword(s): Synthetic Biology, DNA Sequencing, Cost Effective, Polymorphism Analysis, Sequencing Data, Single Nucleotide Variants, Single Nucleotide, Synthetic DNA, Design Build, Hardware Costs

Synthetic biology uses the Design–Build–Test–Learn pipeline to engineer biological systems. Typically, this requires the construction of specifically designed, large and complex DNA assemblies. The availability of cheap DNA synthesis and automation enables high-throughput assembly approaches, which in turn create heavy demand for DNA sequencing to verify that constructs are correctly assembled. Next-generation sequencing is ideally positioned to perform this task; however, because of expensive hardware and bespoke data analysis requirements, few laboratories use this technology in-house. Here, a workflow for highly multiplexed sequencing is presented that enables fast and accurate sequence verification of DNA assemblies using nanopore technology. A novel PCR-based sample barcoding system is introduced, and sequencing data are analyzed with a bespoke algorithm. Crucially, this algorithm overcomes the high error rate of nanopore data (which typically prevents identification of single nucleotide variants) through statistical analysis of strand bias, permitting accurate sequence analysis with single-base resolution. As an example, 576 constructs (6 × 96-well plates) were processed in a single workflow in 72 h (from Escherichia coli colonies to analyzed data). Given its low hardware costs and highly multiplexed capability, this procedure provides cost-effective access to powerful DNA sequencing for any laboratory, with applications beyond synthetic biology including directed evolution, single nucleotide polymorphism analysis and gene synthesis.
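
The abstract does not detail the bespoke algorithm, so the following is only a sketch of the general idea under stated assumptions: for a clonal construct, a genuine variant should be supported by most reads on both strands, whereas many systematic nanopore errors show up mainly on one strand. It applies Fisher's exact test to strand-split reference/alternate counts, a common strand-bias statistic swapped in here for illustration; the thresholds and function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def classify_site(fwd_ref: int, fwd_alt: int, rev_ref: int, rev_alt: int,
                  min_alt_fraction: float = 0.5, bias_alpha: float = 0.01) -> str:
    """Label one position of a clonal construct from strand-split base counts."""
    p_bias = fisher_exact_two_sided(fwd_ref, fwd_alt, rev_ref, rev_alt)
    if p_bias < bias_alpha:
        return "strand-biased"  # alt support differs by strand: likely a systematic error
    fwd_frac = fwd_alt / max(1, fwd_ref + fwd_alt)
    rev_frac = rev_alt / max(1, rev_ref + rev_alt)
    if fwd_frac >= min_alt_fraction and rev_frac >= min_alt_fraction:
        return "variant"  # consistent majority support on both strands
    return "reference"
```

For example, `classify_site(2, 48, 3, 47)` is labeled a variant, while `classify_site(48, 2, 10, 40)`, where almost all alternate reads come from the reverse strand, is flagged as strand-biased.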

368. Performance Characteristics of Sequencing Assays for Identification of the SARS-CoV-2 Viral Genome

Open Forum Infectious Diseases, 2021, Vol 8 (Supplement_1), pp. S286-S287. DOI: 10.1093/ofid/ofab466.569

Author(s): Danny Antaki, Mara Couto-Rodriguez, Tong Liu, Kristin Butcher, Esteban Toro, ...

Keyword(s): In Silico, Viral Genome, Board Member, Panel Member, Cost Effective, Dropout Rate, Clinical Samples, Sequencing Data, Hybrid Capture, Wide Range

Background: As the SARS-CoV-2 (SCV-2) virus evolves, diagnostics and vaccines against novel strains rely on viral genome sequencing. Researchers have gravitated towards cost-effective and highly sensitive amplicon-based (e.g., ARTIC) and hybrid-capture (e.g., SARS-CoV-2 NGS Assay) sequencing to selectively target the SCV-2 genome. We provide an in silico model to compare these two technologies and present data on the high scalability of the Research Use Only (RUO) workflow of the SARS-CoV-2 NGS Assay. Methods: In silico work included alignments of 383,656 high-quality genome sequences belonging to variant of concern (VOC) or variant of interest (VOI) isolates (GISAID). We profiled mismatches and sequencing dropouts using the ARTIC V3 primers, the SARS-CoV-2 NGS Assay probes (Twist Bioscience) and 11 synthesized viral sequences containing mutations, and compared the performance of these assays on clinical samples. Further, the miniaturized hybrid capture workflow was optimized and evaluated for high-throughput (384-plex) use. The sequencing data were processed with COVID-DX software. Results: We detected 101,432 viruses (27%) with ≥1 mismatch in the last 6 base pairs at the 3′ end of ARTIC primers; of these, 413 had ≥2 mismatches in a single primer. In contrast, only 38 viruses (0.01%) had enough mutations (≥10) in a hybrid capture probe to have a similar effect on coverage. We observed that mutations in ARTIC primers led to complete dropout of the amplicon for 4/11 isolates and diminished coverage in an additional 4. Twist probes showed uniform coverage throughout, with little to no dropout. Both assays detected a wide range of variants (~99.9% coverage at 5X depth) in clinical samples (Ct < 30) collected in NY (Spring 2020 to Spring 2021). Read counts and on-target rates were more uniform across specimens with amplicon-based sequencing. However, uneven genome coverage and primer dropouts, some in the spike protein, were observed in VOC/VOI and other isolates, highlighting limitations of an amplicon-based approach. Conclusion: The RUO workflow of the SARS-CoV-2 NGS Assay is a comprehensive and scalable sequencing tool for variant profiling that yields more consistent coverage and a lower dropout rate than ARTIC (0.05% vs. 7.7%).

Disclosures: Danny Antaki, PhD, Twist Bioscience (Employee, Shareholder); Mara Couto-Rodriguez, MS, Biotia (Employee); Kristin Butcher, MS, Twist Bioscience (Employee, Shareholder); Esteban Toro, PhD, Twist Bioscience (Employee); Bryan Höglund, BS, Twist Bioscience (Employee, Shareholder); Xavier O. Jirau Serrano, B.S., Biotia (Employee); Joseph Barrows, MS, Biotia (Employee); Christopher Mason, PhD, Biotia (Board Member, Advisor or Review Panel member, Shareholder); Niamh B. O’Hara, PhD, Biotia (Board Member, Employee, Shareholder); Dorottya Nagy-Szakal, MD PhD, Biotia Inc (Employee, Shareholder)
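
The 3′-end mismatch count behind the ARTIC comparison can be sketched as follows. This is a minimal illustration, not the authors' in silico model: it assumes the binding-site coordinate of each primer on an aligned genome is already known, ignores indels and IUPAC ambiguity codes (uppercase A/C/G/T only), and uses hypothetical function names. The six-base window and the ≥1/≥2 mismatch bins follow the abstract.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def three_prime_mismatches(primer: str, genome: str, start: int,
                           reverse: bool = False, window: int = 6) -> int:
    """Count mismatches in the last `window` bases (3' end) of a primer.

    `primer` is written 5'->3'. `start` is the 0-based position of the binding
    site on the reference (+) strand. A reverse primer binds the + strand, so
    its sequence is compared with the reverse complement of the site.
    Assumes an ungapped alignment (no indels).
    """
    site = genome[start:start + len(primer)]
    if len(site) != len(primer):
        raise ValueError("binding site runs past the end of the genome")
    if reverse:
        site = reverse_complement(site)
    return sum(p != s for p, s in zip(primer[-window:], site[-window:]))


def flag_primer(mismatches: int) -> str:
    """Bins mirroring the >=1 and >=2 mismatch categories in the abstract."""
    if mismatches >= 2:
        return ">=2 mismatches"
    if mismatches == 1:
        return "1 mismatch"
    return "perfect 3' match"
```

Running this for every primer against every aligned genome and tallying `flag_primer` results gives counts of the kind reported above; a probe-based check would instead count mismatches along the whole probe against a much larger tolerance.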

AlmostSignificant: Simplifying quality control of high-throughput sequencing data

2016. DOI: 10.1101/053702

Author(s): Joseph Ward, Christian Cole, Melanie Febrer, Geoffrey Barton

Keyword(s): Quality Control, DNA Sequencing, Illumina Sequencing, High Throughput Sequencing, Sequencing Data, Multiple Sources, Metadata, Sequencing Technologies, High Throughput Sequencing Data

Motivation: The current generation of DNA sequencing technologies produces large amounts of data quickly. All of these data need to pass some form of quality control processing and checking before they can be used for any analysis. The large number of samples run through Illumina sequencing machines makes quality control an onerous and time-consuming task that requires multiple pieces of information from several sources. Results: AlmostSignificant is an open-source platform for aggregating multiple sources of quality metrics, as well as metadata associated with DNA sequencing runs from Illumina sequencing machines. AlmostSignificant is a graphical platform that streamlines the quality control of DNA sequencing data, collects and stores these data for future reference, and collects additional metadata associated with the sequencing runs to check for errors and to monitor the volume of data produced by the associated machines. AlmostSignificant has been used to track the quality of more than 80 sequencing runs covering more than 2500 samples produced over the last three years. Availability: The code and documentation for AlmostSignificant are freely available at https://github.com/bartongroup/
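
The core idea of aggregating quality metrics from several sources can be illustrated with a short sketch. It assumes each source is a mapping from sample ID to metric values; the metric names `reads` and `mean_quality`, the source names and the thresholds are hypothetical, and nothing here reflects AlmostSignificant's actual code or data model.

```python
from typing import Dict, List, Tuple

Metrics = Dict[str, float]


def aggregate_qc(
    sources: Dict[str, Dict[str, Metrics]]
) -> Tuple[Dict[str, Metrics], List[Tuple[str, str]]]:
    """Merge per-sample metrics from several sources into one record per sample.

    Returns the merged records and a list of (sample_id, metric) pairs whose
    values disagree between sources; later sources overwrite earlier ones.
    """
    merged: Dict[str, Metrics] = {}
    conflicts: List[Tuple[str, str]] = []
    for per_sample in sources.values():
        for sample_id, metrics in per_sample.items():
            record = merged.setdefault(sample_id, {})
            for name, value in metrics.items():
                if name in record and record[name] != value:
                    conflicts.append((sample_id, name))
                record[name] = value
    return merged, conflicts


def flag_samples(merged: Dict[str, Metrics], min_reads: float = 1_000_000,
                 min_mean_quality: float = 30) -> Dict[str, List[str]]:
    """Return sample_id -> problems for samples failing the hypothetical thresholds."""
    flags: Dict[str, List[str]] = {}
    for sample_id, record in merged.items():
        problems = []
        if record.get("reads", 0) < min_reads:
            problems.append("low read count")
        if record.get("mean_quality", 0) < min_mean_quality:
            problems.append("low mean quality")
        if problems:
            flags[sample_id] = problems
    return flags


sources = {
    "demultiplexing": {"S1": {"reads": 2_500_000}, "S2": {"reads": 400_000}},
    "read_qc": {"S1": {"mean_quality": 35.1}, "S2": {"mean_quality": 34.0}},
}
merged, conflicts = aggregate_qc(sources)
print(flag_samples(merged))  # {'S2': ['low read count']}
```

Keeping a conflict list rather than silently overwriting makes disagreements between sources, such as a sample sheet and the demultiplexer reporting different values, visible for review.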

Using QC-Blind for Quality Control and Contamination Screening of Bacteria DNA Sequencing Data Without Reference Genome

Frontiers in Microbiology, 2019, Vol 10. DOI: 10.3389/fmicb.2019.01560. Cited by ~2

Author(s): Wang Xi, Yan Gao, Zhangyu Cheng, Chaoyun Chen, Maozhen Han, ...

Keyword(s): Quality Control, DNA Sequencing, Reference Genome, Sequencing Data

Deconvolution of nucleic-acid length distributions: a gel electrophoresis analysis tool and applications

Nucleic Acids Research, 2019, Vol 47 (16), p. e92. DOI: 10.1093/nar/gkz534

Author(s): Riccardo Ziraldo, Massa J Shoura, Andrew Z Fire, Stephen D Levene

Keyword(s): Quality Control, Software Tool, Cost Effective, Analysis Tool, DNA Fragments, Line Profiles, Limited Information, Sequencing Data, DNA and RNA, Average Size

Next-generation DNA-sequencing (NGS) technologies, which are designed to streamline the acquisition of massive amounts of sequencing data, nonetheless depend on various preparative steps to generate DNA fragments of the required concentration, purity and average size (molecular weight). Current automated electrophoresis systems for DNA- and RNA-sample quality control, such as Agilent’s Bioanalyzer® and TapeStation® products, are costly to acquire and use; they also provide limited information for samples with broad size distributions. Here, we describe a software tool that helps determine the size distribution of DNA fragments in an NGS library, or other DNA sample, from gel-electrophoretic line profiles. The software, developed as an ImageJ plug-in, allows straightforward processing of gel images, including lane selection and the fitting of univariate functions to intensity distributions. The user can choose to fit either discrete profiles, when distinct gel bands are visible, or continuous profiles, when multiple bands are buried under a single broad peak. The method requires only modest imaging capabilities and is a cost-effective, rigorous alternative characterization method to augment existing techniques for library quality control.
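
A common first step when reading fragment sizes off a gel line profile is a ladder calibration, in which log10(size) varies roughly linearly with migration distance over the gel's resolving range. The sketch below shows that calibration and an intensity-weighted mean size. It is a standard approach swapped in for illustration, not the plug-in's univariate-function fitting, and the function names are hypothetical. It assumes background-subtracted intensities proportional to DNA mass (as with intercalating stains), so plain intensity weighting gives a mass-weighted mean, while dividing by fragment length approximates a number-weighted mean.

```python
import math
from typing import List, Sequence, Tuple


def fit_ladder(distances: Sequence[float], sizes_bp: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log10(size) = intercept + slope * distance."""
    n = len(distances)
    ys = [math.log10(s) for s in sizes_bp]
    mean_x = sum(distances) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def size_at(distance: float, intercept: float, slope: float) -> float:
    """Fragment size (bp) predicted for a migration distance."""
    return 10 ** (intercept + slope * distance)


def mean_size(profile: List[Tuple[float, float]], intercept: float, slope: float,
              background: float = 0.0, weighting: str = "mass") -> float:
    """Average fragment size from (distance, intensity) points along one lane.

    weighting="mass" uses intensity directly (intensity ~ DNA mass);
    weighting="number" divides by length to approximate molecule counts.
    """
    num = den = 0.0
    for distance, intensity in profile:
        size = size_at(distance, intercept, slope)
        w = max(0.0, intensity - background)
        if weighting == "number":
            w /= size
        num += w * size
        den += w
    if den == 0:
        raise ValueError("no signal above background")
    return num / den
```

With a ladder of 10,000, 1,000 and 100 bp at distances 10, 20 and 30, a lane with equal intensity at distances 10 and 30 has a mass-weighted mean of 5,050 bp but a number-weighted mean of only about 198 bp, which illustrates why a single "average size" can be misleading for broad distributions.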

Deconvolution of Nucleic-acid Length Distributions: A Gel Electrophoresis Analysis Tool and Applications

2019. DOI: 10.1101/636936

Author(s): Riccardo Ziraldo, Massa J. Shoura, Andrew Z. Fire, Stephen D. Levene

Keyword(s): Quality Control, Software Tool, Cost Effective, Analysis Tool, DNA Fragments, Line Profiles, Limited Information, Sequencing Data, DNA and RNA, Average Size

Analysis of pap tests in Tepecik Education and Research Hospital as a quality control measure: An observational study

2017. DOI: 10.26226/morressier.596cd2b7d462b8029238795c

Author(s): Özge Kaya

Keyword(s): Quality Control, Observational Study, Control Measure, Pap Tests, Research Hospital, Quality Control Measure

Computer-supported Detection of M-Components and Evaluation of Immunoglobulins after Capillary Electrophoresis

Clinical Chemistry, 2001, Vol 47 (1), pp. 110-117. DOI: 10.1093/clinchem/47.1.110. Cited by ~16

Author(s): Magnus Jonsson, Joyce Carlson, Jan-Olof Jeppsson, Per Simonsson

Keyword(s): Capillary Electrophoresis, Decision Support, Protein Separation, Serum Proteins, Cost Effective, Clinical Samples, Serum Samples, Computerized Decision Support, Cost Effective Method, Mathematical Algorithms

Background: Electrophoresis of serum samples allows detection of monoclonal gammopathies indicative of multiple myeloma, Waldenström macroglobulinemia, monoclonal gammopathy of undetermined significance, and amyloidosis. The current methods, high-resolution agarose gel electrophoresis (HRAGE) and immunofixation electrophoresis (IFE), are manual and labor-intensive. Capillary zone electrophoresis (CZE) allows rapid, automated protein separation and produces digital absorbance data suitable as input for a computerized decision support system. Methods: Using the Beckman Paragon CZE 2000 instrument, we analyzed 711 routine clinical samples, including 95 monoclonal components (MCs) and 9 cases of Bence Jones myeloma, in both the CZE and HRAGE systems. Mathematical algorithms developed to detect MCs (monoclonal immunoglobulins) in the γ- and β-regions of the electropherogram were tested on the entire material. Additional algorithms evaluating oligoclonality and polyclonal immunoglobulin concentrations were also tested. Results: CZE electropherograms corresponded well with HRAGE. Only one IgG MC of 1 g/L, visible on HRAGE, was not visible after CZE. The algorithms detected 94 of 95 MCs (98.9%) and 100% of those visible after CZE. Of 607 samples lacking an MC on HRAGE, only 3 were flagged by the algorithms (specificity, 99%). Algorithms evaluating total gammaglobulinemia and oligoclonality also identified several cases of Bence Jones myeloma. Conclusions: Capillary electrophoresis provides a modern, rapid, and cost-effective method of analyzing serum proteins. The additional option of computerized decision support, which provides rapid and standardized interpretations, should increase the clinical availability and usefulness of protein analyses in the future.
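
The reported detection figures follow from the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The snippet below recomputes them from the counts given in the abstract (94 of 95 MCs detected; 3 of 607 MC-negative samples flagged); the function names are illustrative.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true positives that were detected."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of true negatives that were correctly not flagged."""
    return tn / (tn + fp)


# Counts taken from the abstract
sens = sensitivity(tp=94, fn=1)
spec = specificity(tn=607 - 3, fp=3)
print(f"sensitivity {sens:.1%}, specificity {spec:.1%}")
```

This prints `sensitivity 98.9%, specificity 99.5%`; the abstract reports the specificity as 99%.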

Distinguishing linear and branched evolution given single-cell DNA sequencing data of tumors

Algorithms for Molecular Biology, 2021, Vol 16 (1). DOI: 10.1186/s13015-021-00194-5

Author(s): Leah L. Weber, Mohammed El-Kebir

Keyword(s): DNA Sequencing, Single Cell, Evolutionary Process, Treatment Decision, Real Data, Current Data, Fast Method, Sequencing Data, Evolutionary Trajectory, Cancer Types

Background: Cancer arises from an evolutionary process in which somatic mutations give rise to clonal expansions. Reconstructing this evolutionary process is useful for treatment decision-making as well as for understanding evolutionary patterns across patients and cancer types. In particular, classifying a tumor’s evolution as either linear or branched, and understanding which cancer types and which patients follow each of these trajectories, could provide useful insights for both clinicians and researchers. While comprehensive cancer phylogeny inference from single-cell DNA sequencing data is challenging due to the limitations of current sequencing technology and the complexity of the resulting problem, current data might provide sufficient signal to accurately classify a tumor’s evolutionary history as either linear or branched. Results: We introduce the Linear Perfect Phylogeny Flipping (LPPF) problem, which we prove to be NP-hard, as a means of testing two alternative hypotheses for the pattern of evolution. We develop Phyolin, which uses constraint programming to solve the LPPF problem. Through both in silico experiments and application to real data, we demonstrate the performance of our method, which outperforms a competing machine learning approach. Conclusion: Phyolin is an accurate, easy-to-use and fast method for classifying an evolutionary trajectory as linear or branched given a tumor’s single-cell DNA sequencing data.
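
To make the two hypotheses concrete: under the infinite-sites assumption with an unmutated root, a binary cell-by-mutation matrix fits a linear phylogeny exactly when the cells' mutation sets form a chain under inclusion, and fits some (possibly branched) perfect phylogeny when no pair of mutations shows all three patterns (1,0), (0,1) and (1,1) across cells. The sketch below checks these conditions on error-free data only. Phyolin's LPPF formulation, which (as its name indicates) allows matrix entries to be flipped to accommodate sequencing errors and is solved by constraint programming, is not attempted here; the function names are hypothetical.

```python
from typing import List

Matrix = List[List[int]]  # rows = cells, columns = mutations, 1 = mutation present


def is_perfect_phylogeny(m: Matrix) -> bool:
    """Three-gamete test for a rooted perfect phylogeny (root carries no mutations)."""
    n_mut = len(m[0]) if m else 0
    for i in range(n_mut):
        for j in range(i + 1, n_mut):
            patterns = {(row[i], row[j]) for row in m}
            if {(1, 0), (0, 1), (1, 1)} <= patterns:
                return False  # mutations i and j conflict
    return True


def is_linear(m: Matrix) -> bool:
    """True if the cells' mutation sets are nested, i.e. form a single chain."""
    sets = sorted((frozenset(j for j, v in enumerate(row) if v) for row in m), key=len)
    return all(a <= b for a, b in zip(sets, sets[1:]))


def classify(m: Matrix) -> str:
    if is_linear(m):
        return "linear"
    if is_perfect_phylogeny(m):
        return "branched"
    return "no perfect phylogeny (errors or violated assumptions)"
```

Every linear matrix is also a perfect phylogeny, so testing linearity first is what separates the two outcomes; real single-cell data with dropouts will often fall into the third case, which is the situation the flipping formulation is designed to handle.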
